How to force a handshake for OTA updates


#26

If you don’t mind, please PM me some device IDs that exhibit these behaviors.


#27

Hi all, the code above for forcing a handshake doesn’t work for me.

I have major issues getting the devices to remain connected, and I need to sync the time. My devices are online all the time (they never sleep), and basically they go deaf.

I can still reset them remotely via a function call; however, they never complete a handshake, and therefore never get an OTA update or sync the time.

I have had to revert to using a roll-your-own NTP server to get the time reliably. Anyway, as part of this I thought I would try to sort out the OTA by doing the “phantom subscription” every time I do a time sync.

The code below should have updated the web console with a new value for the “last handshake”, but it doesn’t.

I’m currently firing this code every 300 seconds so I don’t have to wait too long to see some results, but this will be pushed out to ~3600 seconds.

Any thoughts?
Regards
Marshall

void ntp_sync_time(void) {
    char cpyoftnow[32];

    Particle.unsubscribe();     // get rid of old subscriptions
    Particle.disconnect();
    delay(5000);                // make sure the disconnect sticks

    snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr());
    OPITO_DEBUG("Time prior to Sync is %s", cpyoftnow);

    ntp_sync(); // this blocks

    // Print current time
    snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr());
    OPITO_DEBUG("Time after Sync is    %s", cpyoftnow);

    // Force the unit to reconnect to the Particle cloud (i.e. do a new handshake)
    // in case we have updates that we want to send down.
    // Subscribing to a different event name creates a different hash, which is meant to force a Particle reconnect.
    Particle.subscribe(String(Time.now()), dummy, MY_DEVICES);
    Particle.connect();
}

#28

Syncing the time is done by calling Particle.syncTime() and not by forcing a handshake.
Optionally you could follow that with a waitFor(Time.isValid, 10000).
Even a mere Particle.disconnect(); delay(1000); Particle.connect(); should do the same less elegantly but without a full handshake.

A full handshake would do that too, but also demands a lot more data transfer than needed to merely sync the time.
Also, why exactly are you using ntp_sync() and not the built-in feature?
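For illustration, here is roughly what a bounded wait like waitFor(Time.isValid, 10000) boils down to: poll a condition until it is true or a timeout expires. This is only a sketch, not Device OS code. waitWithTimeout is a hypothetical name, and the clock and condition are passed in as parameters (an assumption made so the logic can be tried off-device). On the device you would normally just use the built-in waitFor.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper, not part of Device OS.
// Polls `done` until it returns true or `timeoutMs` has elapsed on the
// clock supplied by `nowMs`. Returns true on success, false on timeout.
bool waitWithTimeout(const std::function<bool()>& done,
                     const std::function<uint32_t()>& nowMs,
                     uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    while (!done()) {
        // Unsigned subtraction stays correct if the counter rolls over.
        if (static_cast<uint32_t>(nowMs() - start) >= timeoutMs) {
            return false;
        }
    }
    return true;
}

// On a device the equivalent is the built-in form mentioned above:
//   Particle.syncTime();
//   bool ok = waitFor(Time.isValid, 10000);
```

One thing to keep in mind: as far as I know Time.isValid() stays true once the clock has been set, so it confirms that the device has a time at all, not that this particular sync has finished. Waiting on Particle.syncTimeDone is the check for the latter.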


#29

Thanks for the reply. My previous Particle time sync attempt is at the bottom of this post. This code was based on example code from the forums or the documentation.

On occasion it doesn’t sync, and my customer has noticed the drift, as per the attached picture. Additionally, I have seen the time jump 30 seconds forward after a sync, which messes up the timestamped data that I’m sending. All in all, it is too much black magic that I can’t rely on.

Secondly, and the reason for the post: at times the devices will not perform a handshake, and therefore will not get the OTA update that I need. I thought that I would put the handshake code in my NTP sync function, as it only needs to happen once a day or once an hour, but this doesn’t work either. I simply can’t get it to force a handshake reliably unless I reboot the modem / device.

Actually, to moan a little bit… The cloud service seems a bit flaky. I have other products that miss publishing the regular 10-minute “keep_alive” that I send with the battery health embedded in it, because a “device_came_online” message arrives in the middle of the events. I don’t really mind the device_came_online message; it’s annoying, but it is documented that the Particle keep-alive time is too long. Still, you would think that doing a publish should itself perform all the necessary functions to make sure that the device reconnects and sends. As the device is non-critical, I haven’t bothered fixing it.

For my critical products, I have put in a ton of code and modded the boards with external WDTs to try to keep these devices online and overcome all the “nuances” of the system.

I have yet another product that I have had to switch to MQTT altogether, as I can’t rely on the service (and I wanted to send messages to groups of devices). This did mean I wasn’t affected by the recent outage experienced at Particle, though.

I’m really dependent on Particle, as I have PCBs designed and built, and have just got contracts for several hundred devices, so I really appreciate the support. Sorry for the moan, and apologies if it is a little unspecific.

Regards
Marshall

Here is the Particle time sync code.

#define ONE_DAY_MILLIS (24UL * 60UL * 60UL * 1000UL)
//#define ONE_DAY_MILLIS (10UL * 1000UL)

void cloud_sync_time(bool sync_right_now) {
    time_t lastSyncTimestamp;
    char cpyoftnow[32];

    // millis() value at the last successful sync; also fills lastSyncTimestamp
    unsigned long lastSync = Particle.timeSyncedLast(lastSyncTimestamp);

    // Sync when forced, or when more than a day has passed since the last sync.
    // The forced case is tested on its own: overwriting lastSync with a fake
    // value only passes the age check for some values of millis().
    if (sync_right_now || (millis() - lastSync > ONE_DAY_MILLIS)) {
        unsigned long cur = millis();
        //OPITO_DEBUG("Time was last synchronized %lu milliseconds ago", millis() - lastSync);

        //snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr(lastSyncTimestamp));
        //OPITO_DEBUG("Last Time Sync received from Particle Cloud was @: %s", cpyoftnow);

        snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr());
        OPITO_DEBUG("Time prior to Sync is %s", cpyoftnow);

        // Request time synchronization from Particle Cloud
        Particle.syncTime();
        // Wait until Electron receives time from Particle Cloud (or connection to Particle Cloud is lost)
        waitUntil(Particle.syncTimeDone);
        // Check if synchronized successfully
        if (Particle.timeSyncedLast() >= cur) {
            // Print current time
            snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr());
            OPITO_DEBUG("Time after Sync is    %s", cpyoftnow);
        }
    }
}
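The “sync if forced, or if more than a day has passed” decision can be pulled out into a small pure function, which makes it easy to check off-device. This is just a sketch under my own assumptions: timeSyncDue is a hypothetical name, and the tick values are passed in rather than read from millis(). Forcing the sync with a separate flag avoids relying on a fake last-sync value, whose effect depends on the current uptime.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Device OS.
// nowMs and lastSyncMs are values of a free-running 32-bit millisecond
// counter such as millis(). Returns true when a time sync should be requested.
bool timeSyncDue(uint32_t nowMs, uint32_t lastSyncMs,
                 uint32_t intervalMs, bool forceNow) {
    if (forceNow) {
        return true;  // explicit flag, independent of the current uptime
    }
    // Unsigned subtraction gives the elapsed time even across a rollover.
    return static_cast<uint32_t>(nowMs - lastSyncMs) > intervalMs;
}

// Possible call site on the device (assumption):
//   if (timeSyncDue(millis(), Particle.timeSyncedLast(), ONE_DAY_MILLIS, sync_right_now)) { ... }
```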

Here is a picture of the customer noticing the drift. It’s the trend that is of interest.

[Image: drift]


#30

If this is a critical product for you, it may be best to file a support ticket, as we (most forum mods) are not Particle employees.


#31

Yeah, that’s why I appreciate the support so much. It takes a lot of time to respond on the forum so often. I hope that Particle gives you all your products for free for all the support you provide!!


#32

Have you tried

Particle.publish("spark/device/session/end", "", PRIVATE);

Publishing this event will disconnect your session and force a new session to be created. Don’t call it too often, as it will use several kilobytes of data to re-authenticate and create a new session, but it should work to force it.
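Since the event should not be published too often, one option is to put a small rate limiter in front of it. The sketch below is only an illustration of that idea: SessionResetLimiter is a hypothetical helper, not part of Device OS, and the four-hour interval in the usage comment is an assumed value. The class is plain C++ so the timing logic can be checked off-device.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Device OS.
// Allows an action at most once per minIntervalMs.
class SessionResetLimiter {
public:
    explicit SessionResetLimiter(uint32_t minIntervalMs)
        : minIntervalMs_(minIntervalMs) {}

    // nowMs is a free-running millisecond counter such as millis().
    // The first call returns true; later calls return true only after
    // minIntervalMs has elapsed since the last call that returned true.
    bool shouldReset(uint32_t nowMs) {
        if (hasReset_ &&
            static_cast<uint32_t>(nowMs - lastResetMs_) < minIntervalMs_) {
            return false;
        }
        hasReset_ = true;
        lastResetMs_ = nowMs;
        return true;
    }

private:
    uint32_t minIntervalMs_;
    uint32_t lastResetMs_ = 0;
    bool hasReset_ = false;
};

// Possible call site on the device (assumption, interval chosen arbitrarily):
//   static SessionResetLimiter limiter(4UL * 60UL * 60UL * 1000UL);
//   if (Particle.connected() && limiter.shouldReset(millis())) {
//       Particle.publish("spark/device/session/end", "", PRIVATE);
//   }
```

Note that the limiter state lives in RAM, so it starts fresh after a reset and the first check after boot will allow a session end.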


#33

No, I haven’t tried that! I’ll give it a go. Where would I have found out about this? (There might be other great tidbits I can implement.)

Regards
Marshall


#34

That’s one I didn’t know either - was this shared with the Elite before? Must have missed the memo :pensive:


#35

Here is the missing publish issue. Sorry if this is the wrong thread; please move it if it is.

As you can see, when the device comes online it is missing the regular 5min publish that should be there.


#36

BTW, are you using a Particle SIM or 3rd party?


#37

Third party.

The force handshake works!


#38

With a 3rd party SIM, your keep-alive may be too long for the provider’s requirements.
While it is true that a publish will do the UDP hole punching, the first attempt after the hole has already closed will probably fail, as it is “consumed” in the process.

Have you set Particle.keepAlive() after the connection gets established?
There is an open issue regarding that
https://github.com/particle-iot/firmware/issues/1482
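A minimal sketch of one way to make sure the keep-alive is applied after every (re)connection: watch the connected state and run a callback on each transition from disconnected to connected. ConnectEdgeDetector is a hypothetical helper, not a Device OS class, and the 30-second value in the usage comment is only a placeholder; the right value depends on the SIM provider.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper, not part of Device OS.
// Invokes onConnected once each time the state goes from
// "not connected" to "connected".
class ConnectEdgeDetector {
public:
    explicit ConnectEdgeDetector(std::function<void()> onConnected)
        : onConnected_(std::move(onConnected)) {}

    // Call regularly (e.g. from loop()) with the current connection state.
    void update(bool connectedNow) {
        if (connectedNow && !wasConnected_) {
            onConnected_();
        }
        wasConnected_ = connectedNow;
    }

private:
    std::function<void()> onConnected_;
    bool wasConnected_ = false;
};

// Possible call site on the device (assumption; 30 s is a placeholder):
//   ConnectEdgeDetector keepAliveSetter([] { Particle.keepAlive(30); });
//   void loop() { keepAliveSetter.update(Particle.connected()); }
```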


#39

What is the difference between this and just calling Particle.disconnect() + Particle.connect() (which is what I’ve been doing)? I’ve been looking for an alternative to Particle.disconnect() + Particle.connect() because I run mine in SEMI_AUTOMATIC mode and Particle.connect() comes with the blocking risk. Would calling Particle.publish() here allow me to achieve the same thing without the blocking risk? Note, I have SYSTEM_THREAD enabled.


#40

Particle.disconnect() will stop actively using the cloud connection, but will reuse the session upon reconnection. Reusing the saved session is normally a good thing because it saves several kilobytes of data usage upon Particle.connect(), including when waking from sleep.

However, there’s an unknown condition where sometimes you might have trouble communicating and starting a new session seems to help.


#41

I have struggled to get reliable OTA for several months. I have a VERY tight power budget and hence spend a lot of time asleep. The app only wakes for a few seconds every 15 minutes, and doing long idle connects didn’t seem like a great solution. After combing dozens of threads I finally stumbled upon this one and the suggestion that @rickkas7 provided to force a session disconnect / reset using

Particle.publish("spark/device/session/end", "", PRIVATE);

It works! I only do it infrequently (once every four hours), but ending the session is the ONE THING that seems to reliably force a new product firmware release to load. If this is the “best practice” for OTA updates, then it should be shared extensively with the community. Thank you @rickkas7!!


#42

OMG. This is exactly what I’m looking for. But is there any way to achieve this through the cloud, rather than initiating it from the device?


#43

Yes. If you send a PUT request to the disconnect endpoint for a device, it will reset the cloud connection from the cloud side.

curl -X PUT "https://api.particle.io/v1/devices/<deviceid>/disconnect?access_token=<token>"

This should also work from the product endpoint using a product bearer token:

curl -X PUT "https://api.particle.io/v1/products/<productid>/devices/<deviceid>/disconnect?access_token=<token>"

#44

You’re a :star:

Worked like a charm.


#46

I’ve been using the Particle Rules Engine for forcing updates, based on this guide:
https://docs.particle.io/tutorials/iot-rules-engine/dynamic-firmware-management/

Is this helpful?