How to force a handshake for OTA updates


#21

@ScruffR just FYI, in your code, we needed to take out “PRIVATE”, and it appears to have been working okay since then. We’re having problems with it handshaking every single time, but that may be something wrong in our code logic.

void dummy(const char* filter, const char* data) {
}

void setup() {
  // all your usual stuff
  ...

  if (someConditionToOnlyDoWhenNeeded == true) {
    // this is just to add mutating data into the hash
    Particle.subscribe(String(Time.now()), dummy, PRIVATE);
  }
}

Also, according to this link, Particle forces a handshake once per week (see quote below). That would be sufficient for us, but that is not what is happening. Any update on this?

“On the Electron we use UDP, which is connection-less. An Electron can go to sleep, wake up sometime later, and talk to the Particle cloud without performing a handshake, continuing to use the DTLS session credentials it previously established. We currently require an Electron to handshake at least once per week, and this is subject to change. The data usage of handshakes is thus dramatically reduced from hundreds of KB per week for sleepy Photons to a few KB per week for sleepy Electrons.”


#22

Sorry, for Particle.subscribe() it should be MY_DEVICES instead of PRIVATE :blush:

I might be wrong, but to me that could also mean that “we” (=Particle) want the device to handshake once a week (for whatever purpose), but not necessarily that the device is made to do so by itself.
That would line up with your experience that this does in fact not happen.

Maybe @rickkas7 has some insight on that.


#23

Hi folks, this thread was just brought to my attention.

Thanks for the great discussion, helpful use cases, justified and fair expressions of frustration, and concrete details of what you’ve tried. I have two follow-up notes on this thread.

  • A missing feature that is absolutely possible and on our roadmap but that we haven’t built yet is the ability to force a more immediate OTA update to devices in a product fleet from the console (instead of waiting for the next handshake). You can definitely expect this feature to be released in 2018, probably in the first half. (cc: product managers @jeiden @jberi)
  • Electron handshakes should happen at least once a week as documented. If a UDP device session were active more than a week after it was created, that would be a bug. I’ll make sure a Particle engineer tests this and responds here in January. (cc: device team lead @mdma)
    • Edit: A separate bug would be if the console doesn’t show that a handshake has happened.

Cheers,
Zachary


#24

I just posted firmware issue 1453 for those interested in following along.


#25

Thanks Zachary!

Just an update on my end, we have handshakes happening now on every session when my devices connect even when we have it configured to happen in our code once per week (not sure if it’s just a firmware issue on my end or not - using what @ScruffR suggested).

Just to clarify one thing, my devices go to deep sleep and don’t stay connected, and then when they wake up, they don’t handshake once per week. I don’t have my devices continually connected to the network. This is probably useful for @jeiden and @jberi to know. @mdma I have some devices that haven’t performed a handshake in months. I’m not sure if it’s the console not showing it or not, but it really seems like it’s not handshaking at all.
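For anyone wiring up the “once per week” condition themselves, here is a minimal sketch of the decision logic in plain C++. The function name and parameters are hypothetical and not part of the Particle API. It assumes the time of the last forced handshake is stored somewhere that survives deep sleep, and that the current Unix time (for example from Time.now()) is passed in.

```cpp
#include <cassert>
#include <cstdint>

// One week, in seconds.
constexpr int64_t ONE_WEEK_SECONDS = 7LL * 24 * 60 * 60;

// Hypothetical helper, not part of the Particle API.
// nowEpoch:        current Unix time in seconds
// lastForcedEpoch: Unix time of the last forced handshake, 0 if never;
//                  assumed to be persisted across deep sleep by the caller
// intervalSeconds: minimum spacing between forced handshakes
bool forcedHandshakeDue(int64_t nowEpoch, int64_t lastForcedEpoch, int64_t intervalSeconds) {
    assert(intervalSeconds > 0);
    if (lastForcedEpoch <= 0) {
        return true;   // never forced before
    }
    if (nowEpoch < lastForcedEpoch) {
        return true;   // clock went backwards, so the stored value cannot be trusted
    }
    return (nowEpoch - lastForcedEpoch) >= intervalSeconds;
}
```

Treating a backwards clock as “due” is a design choice in this sketch. If the clock is frequently invalid right after wake, that choice alone would make the condition true on every session, so it may be worth checking that the time is valid before calling it.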


#26

If you don’t mind, please PM me some device IDs that exhibit these behaviors.


#27

Hi all, the code for forcing a handshake above doesn’t work for me.

I have major issues getting the devices to remain connected, and I need to sync the time. My devices are online all the time (they never sleep) and basically they go deaf.

I can still reset them remotely via a function call; however, they never complete a handshake, and therefore never get an OTA or sync the time.

I have had to revert to using a roll-your-own NTP server to get the time reliably. Anyway, as part of this I thought I would try and sort the OTA by doing the “phantom subscription” every time I do a time sync.

The code below should have updated the web console with a new value for the “last handshake”, but it doesn’t.

I’m currently firing this code every 300 secs so I don’t have to wait too long to see some results, but this will be pushed out to ~3600 secs.

Any thoughts?
Regards
Marshall

void ntp_sync_time(void) {
    char cpyoftnow[32];

    Particle.unsubscribe();   // get rid of old subscriptions
    Particle.disconnect();
    delay(5000);              // make sure the disconnect sticks

    snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr());
    OPITO_DEBUG("Time prior to Sync is %s", cpyoftnow);

    ntp_sync(); // this blocks

    // Print current time
    snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr());
    OPITO_DEBUG("Time after Sync is    %s", cpyoftnow);

    // this forces the unit to reconnect to the Particle cloud (i.e. do a new handshake)
    // in case we have updates that we want to send down
    // creates a different hash on boot so that it forces a Particle reconnect
    Particle.subscribe(String(Time.now()), dummy, MY_DEVICES);
    Particle.connect();
}

#28

Syncing the time is done by calling Particle.syncTime() and not by forcing a handshake.
Optionally you could follow that with a waitFor(Time.isValid, 10000).
Even a mere Particle.disconnect(); delay(1000); Particle.connect(); should do the same less elegantly but without a full handshake.

A full handshake would do that too, but also demands a lot more data transfer than needed to merely sync the time.
Also, why exactly are you using ntp_sync() and not the built-in feature?
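The waitFor(Time.isValid, 10000) call above bounds the wait to ten seconds instead of blocking indefinitely. As a rough illustration of that pattern outside the firmware, here is a desktop C++ stand-in. The function name waitForCondition is hypothetical, and this is not how Device OS implements waitFor(); it only shows the “poll until true or give up at a deadline” idea.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a bounded wait, not the Device OS implementation.
// Polls `condition` until it returns true or `timeoutMs` milliseconds elapse.
// Returns true if the condition became true, false if the wait timed out.
bool waitForCondition(const std::function<bool()>& condition, unsigned timeoutMs) {
    assert(static_cast<bool>(condition));
    const auto deadline =
        std::chrono::steady_clock::now() + std::chrono::milliseconds(timeoutMs);
    while (!condition()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;   // gave up: the caller decides how to recover
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}
```

The point of returning a bool is that the caller can react to a failed sync (retry later, log it, fall back to another time source) rather than hanging.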


#29

Thanks for the reply. My previous Particle time sync attempt is at the bottom of this post. This code was based on example code from the forums or the documentation.

On occasion it doesn’t sync, and my customer has noticed the drift, as per the attached picture. Additionally, I have seen the time sync jump 30 secs forwards in time, which messes up the timestamped data that I’m sending. In all, it is too much black magic that I can’t rely on.

Secondly, and the reason for the post, is that at times the devices will not perform a handshake, and therefore will not get the OTA that I need. I thought that I would stuff the handshake code in my NTP sync function, as it only needs to happen once a day or once an hour, but this doesn’t work either. I simply can’t get it to force a handshake reliably unless I reboot the modem / device.

Actually, to moan a little bit… The cloud service seems a bit flaky. I have other products that miss publishing the regular 10 min “keep_alive” that I send with the battery health embedded in it, as they get a “device_came_online” message in the middle of the events. I don’t really mind the device_came_online message. It’s annoying but documented that the Particle keepalive time is too long, but you would think that doing a publish itself should perform all the necessary functions to make sure that the device reconnects and sends. As those devices are non-critical, I haven’t bothered fixing it.

For my critical products, I have put in a ton of code and modded the boards with external WDTs to try and keep these devices online and overcome all the “nuances” of the system.

I have yet another product that I have had to switch to MQTT altogether, as I can’t rely on the service (and I wanted to send messages to groups of devices). This did mean, though, that I wasn’t affected by the recent outage experienced at Particle.

I’m really dependent on Particle, as I have PCBs designed and built, and have just got contracts for several hundred devices, so I really appreciate the support - sorry for the moan, and apologies if it is a little unspecific.

Regards
Marshall

Here is the Particle time sync code.

#define ONE_DAY_MILLIS (24 * 60 * 60 * 1000)
//#define ONE_DAY_MILLIS (10 * 1000)

void cloud_sync_time(bool sync_right_now) {
  time_t lastSyncTimestamp;
  char cpyoftnow[32];

  // millis() value at the last successful sync with the Particle Cloud
  unsigned long lastSync = Particle.timeSyncedLast(lastSyncTimestamp);

  // Sync on request, or when the last sync is more than a day old.
  // Testing the flag directly avoids faking lastSync = ONE_DAY_MILLIS + 1,
  // which fails to trigger a sync while millis() is between one and two days.
  if (sync_right_now || (millis() - lastSync > ONE_DAY_MILLIS)) {
    unsigned long cur = millis();
    //OPITO_DEBUG("Time was last synchronized %lu milliseconds ago", millis() - lastSync);

    //snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr(lastSyncTimestamp));
    //OPITO_DEBUG("Last Time Sync received from Particle Cloud was @: %s", cpyoftnow);

    snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr());
    OPITO_DEBUG("Time prior to Sync is %s", cpyoftnow);

    // Request time synchronization from Particle Cloud
    Particle.syncTime();
    // Wait until Electron receives time from Particle Cloud (or connection to Particle Cloud is lost)
    waitUntil(Particle.syncTimeDone);
    // Check if synchronized successfully
    if (Particle.timeSyncedLast() >= cur) {
      // Print current time
      snprintf(cpyoftnow, sizeof(cpyoftnow), "%s", (const char*)Time.timeStr());
      OPITO_DEBUG("Time after Sync is    %s", cpyoftnow);
    }
  }
}

Here is a picture of the customer noticing the drift. It’s the trend that is of interest.

[image: drift]


#30

If this is a critical product for you, it may be best to file a support ticket, as we (most forum mods) are not Particle employees.


#31

Yeah, that’s why I appreciate the support so much. It takes a lot of time to respond to the forum so often. I hope that Particle gives you all your product for free for all the support you provide!!


#32

Have you tried

Particle.publish("spark/device/session/end", "", PRIVATE);

Publishing this event will disconnect your session and force a new session to be created. Don’t call it too often, as it will use several kilobytes of data to re-authenticate and create a new session, but it should work to force it.
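Since each forced session costs data, one way to respect the “don’t call it too often” advice is to put a minimum interval in front of the publish. The class below is a hypothetical guard in plain C++, not part of the Particle API. It assumes a millis()-style 32-bit tick counter and uses unsigned subtraction so that counter rollover is handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical guard, not part of the Particle API.
// Allows a session reset at most once per minIntervalMs.
class SessionResetThrottle {
public:
    explicit SessionResetThrottle(uint32_t minIntervalMs) : minIntervalMs_(minIntervalMs) {
        assert(minIntervalMs > 0);
    }

    // nowMs is a millis()-style tick counter that may wrap around.
    // Returns true when a reset is allowed and records nowMs as its time.
    bool tryAcquire(uint32_t nowMs) {
        // Unsigned subtraction gives the correct elapsed time across rollover.
        if (hasReset_ && static_cast<uint32_t>(nowMs - lastResetMs_) < minIntervalMs_) {
            return false;
        }
        lastResetMs_ = nowMs;
        hasReset_ = true;
        return true;
    }

private:
    uint32_t minIntervalMs_;
    uint32_t lastResetMs_ = 0;
    bool hasReset_ = false;
};
```

In firmware, the idea would be to publish the session end event only when `tryAcquire(millis())` returns true. The interval to use is a judgment call that depends on the data plan.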


#33

No, I haven’t tried that! I’ll give it a go. Where would I have found out about this? (There might be other great tidbits I can implement.)

Regards
Marshall


#34

That’s one I didn’t know either - was this shared with the Elite before? Must have missed the memo :pensive:


#35

Here is the missing publish issue. Sorry if this is the wrong thread; please move it if it is.

As you can see, when the device comes online it is missing the regular 5min publish that should be there.


#36

BTW, are you using a Particle SIM or 3rd party?


#37

Third party.

The force handshake works!


#38

With a 3rd party SIM, your keep alive may be too long for the provider’s requirements.
While it is true that a publish will do the UDP hole punching, the first attempt after the hole has already closed will probably fail, as it is “consumed” in the process.

Have you set Particle.keepAlive() after the connection gets established?
There is an open issue regarding that
https://github.com/particle-iot/firmware/issues/1482
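One way to make sure the keep-alive is set after every connection, not just the first one, is to run an action on each disconnected-to-connected transition. The class below is a hypothetical helper in plain C++, not part of the Particle API. It assumes the caller feeds it the current connection state once per loop() pass (for example from Particle.connected()) and supplies a callback that calls Particle.keepAlive() with a value suited to the SIM provider.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper, not part of the Particle API.
// Runs a callback each time the connection state goes from
// disconnected to connected.
class ConnectEdgeAction {
public:
    explicit ConnectEdgeAction(std::function<void()> onConnected)
        : onConnected_(std::move(onConnected)) {
        assert(static_cast<bool>(onConnected_));
    }

    // Call once per loop() pass with the current connection state.
    void update(bool connectedNow) {
        if (connectedNow && !wasConnected_) {
            onConnected_();   // e.g. re-apply the keep-alive interval here
        }
        wasConnected_ = connectedNow;
    }

private:
    std::function<void()> onConnected_;
    bool wasConnected_ = false;
};
```

Whether re-applying the setting on every reconnect is needed depends on the Device OS version and the behaviour described in the linked issue, so treat this as a defensive pattern and not a confirmed fix.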


#39

What is the difference between this and just calling Particle.disconnect + Particle.connect (which is what I’ve been doing)? I’ve been looking for an alternative to Particle.disconnect + Particle.connect because I run mine in SEMI_AUTOMATIC mode and Particle.connect comes with the blocking risk. Would calling Particle.publish here allow me to achieve the same thing without the blocking risk? Note, I have SYSTEM_THREAD enabled.


#40

Particle.disconnect will stop actively using the cloud connection, but will reuse the session upon reconnection. Reusing the saved session is normally a good thing because it saves several kilobytes of data usage upon Particle.connect, including when waking from sleep.

However, there’s an unknown condition where sometimes you might have trouble communicating and starting a new session seems to help.