Some devices are not updating their "last heard" or "last handshake" time. Why?

electron

#1

I have a set of devices (Electrons running Device OS 1.0.1) that are all running the same firmware. The code is almost 800 lines long, so here is the GitHub link: https://github.com/chipmc/Cellular-Pressure. Here is the concept of operations:

  1. During the park’s open hours, they “nap” until a car comes by: System.sleep(intPin, RISING, wakeInSeconds);
  2. Once an hour, they wake up (without resetting) and report their data using a Webhook. All of these devices do this; even the ones that were “last heard” days ago have been reliably reporting data every hour.
  3. Every night these devices “sleep”: System.sleep(SLEEP_MODE_DEEP, wakeInSeconds); waking (restarting program execution from scratch) every hour to pet the watchdog but not connecting.
  4. Every morning, when the park opens, we go back to #1
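To make the daily cycle above concrete, here is a minimal host-side C++ sketch of the mode decision. It is not taken from the Cellular-Pressure repository: `Mode`, `modeFor`, `secondsToNextHour`, and the open/close hours are hypothetical, and on a device the chosen mode would simply select which of the `System.sleep(...)` calls listed above to make.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operating modes matching steps 1-3 above.
enum class Mode { Nap, Report, DeepSleep };

// Hypothetical rule: during open hours [openHour, closeHour), report on the
// hour and otherwise nap until a car comes by; outside open hours, deep sleep.
Mode modeFor(int hour, int minute, int openHour, int closeHour) {
    assert(hour >= 0 && hour < 24 && minute >= 0 && minute < 60);
    bool parkOpen = hour >= openHour && hour < closeHour;
    if (!parkOpen) {
        return Mode::DeepSleep;
    }
    return minute == 0 ? Mode::Report : Mode::Nap;
}

// Seconds until the top of the next hour: one candidate for wakeInSeconds,
// so that a nap or deep sleep always ends in time for the hourly wake.
uint32_t secondsToNextHour(int minute, int second) {
    assert(minute >= 0 && minute < 60 && second >= 0 && second < 60);
    return static_cast<uint32_t>(3600 - (minute * 60 + second));
}
```

For example, with hypothetical park hours of 6 to 21, `modeFor(10, 15, 6, 21)` yields `Mode::Nap` and `secondsToNextHour(15, 30)` yields 2670.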

Some of these Electrons are not maintaining their “last handshake” or “last heard” status while the other 90% do. Why would this be the case? I looked at the device diagnostics from one of these devices, and while there are occasional disconnect events and one has “fair” signal strength, that does not seem to explain why these devices are different from the rest.

This process seems to keep the devices connected, except, again, for about 10% of them: not always the same ones, and not even those consistently. What I am looking for is a solution that will improve the consistency and reliability of the devices over time, especially as I add more to the fleet.

I did look at a similar post which discussed this in terms of availability for OTA updates, and I saw there were two solutions proposed:

But, these solutions drive additional data usage and for the 90% that are working correctly, it seems like overkill. Any suggestions?

Thanks,

Chip


#2

@ParticleD or @mstanley, are you able to assist?


#3

Hey chipmc,

Thanks for reaching out. Sorry for the delay in getting back to you. I think I’d like to get some expertise from @rickkas7 on this.


#4

@mstanley,

Thank you, I would appreciate any advice. As I posted above, I can manually send a curl command that initiates a disconnect and then, on the next hour, the issue is resolved. But, thinking ahead, I would like to know if there is anything I could do in my software to prevent this from happening as my fleet of Particle devices grows.

Chip


#5

I am trying to do a better job of identifying this state so that, perhaps, I can initiate a correction automatically.

Here are the signs that I see indicating there is an issue with the state of the device’s connection:

  1. It can publish a Webhook but has lost its subscription to the response
  2. It can publish and receive the response to Webhooks, but the “last heard” time does not get updated
  3. On the Particle Console, the lists of Particle.function() and Particle.variable() entries are empty
  4. I cannot push a firmware update to the device

In each case, initiating a “disconnect” via a curl command fixes the issue. But, this is reactive and what I want to do is either prevent or automatically fix this issue when it happens.

Thanks, Chip


#6

The SessionCheck part of the electronsample app does this.

What it does is send an event to the cloud that the device itself is subscribed to. If you’re in the state where the device has lost its cloud connection, this can discover that it has happened and then issue the session end event from the device side. This should cause it to reconnect.
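The self-subscribed check can be modeled as a small state machine. The following is a minimal host-side C++ sketch, not the actual electronsample SessionCheck code: `SessionCheckSketch`, `Action`, and the period and timeout values are hypothetical. On a device, `tick()` would presumably be driven from `loop()` with `millis()`, `PublishCheck` would correspond to a `Particle.publish()` of an event the device has subscribed to with `Particle.subscribe()`, and `EndSession` to publishing the session end event.

```cpp
#include <cassert>
#include <cstdint>

// What the (hypothetical) firmware loop should do after each tick.
enum class Action { None, PublishCheck, EndSession };

class SessionCheckSketch {
public:
    SessionCheckSketch(uint32_t checkPeriodMs, uint32_t responseTimeoutMs)
        : checkPeriodMs_(checkPeriodMs), responseTimeoutMs_(responseTimeoutMs) {
        assert(responseTimeoutMs > 0 && responseTimeoutMs < checkPeriodMs);
    }

    // Call periodically with the current time in ms. Unsigned subtraction
    // keeps elapsed-time comparisons correct across a 32-bit rollover.
    Action tick(uint32_t nowMs, bool cloudConnected) {
        if (!cloudConnected) {
            return Action::None;  // nothing to verify until the device thinks it is connected
        }
        if (awaitingResponse_) {
            if (nowMs - sentAtMs_ >= responseTimeoutMs_) {
                // The device believes it is connected, but its own event never
                // came back: treat the session as stale and end it.
                awaitingResponse_ = false;
                lastCheckMs_ = nowMs;
                return Action::EndSession;
            }
            return Action::None;
        }
        if (!hasChecked_ || nowMs - lastCheckMs_ >= checkPeriodMs_) {
            hasChecked_ = true;
            awaitingResponse_ = true;
            sentAtMs_ = nowMs;
            lastCheckMs_ = nowMs;
            return Action::PublishCheck;
        }
        return Action::None;
    }

    // Call from the subscription handler when the self-published event arrives.
    void onEventReceived() { awaitingResponse_ = false; }

private:
    uint32_t checkPeriodMs_;
    uint32_t responseTimeoutMs_;
    uint32_t sentAtMs_ = 0;
    uint32_t lastCheckMs_ = 0;
    bool awaitingResponse_ = false;
    bool hasChecked_ = false;
};
```

With a 60 s period and a 10 s timeout, a check whose event is echoed back keeps the machine quiet until the next period; a check that goes unanswered for 10 s produces `Action::EndSession`.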


#7

@rickkas7,

Thank you for sending this; it seems to be exactly what I need. I will test it and see how it does.

Thanks,

Chip


#8

One thing I’ve noticed since putting my 2G Electron back online: the Console shows a date and time on the Devices screen, but when I click the Electron’s entry, its status page shows a different date and time, usually 5 to 10 minutes before the moment I access the page. I know this date and time is most likely incorrect, because the Electron is in my field of vision and I know it has been sitting there breathing cyan for quite a while. Then, when I click back to the Devices page, it shows the date and time from the Electron’s page, but if I reload the Devices page, it shows the original date and time, which I think is correct because that was the last time I saw it reconnect.

It may be that I don’t understand what a handshake (making a secure connection) is. I think it is when, for whatever reason, the Electron goes through the cellular connect / cloud connect / connected-to-cloud sequence shown by the LED, or when code simply disconnects and reconnects (in that case, would it still be breathing cyan?). If that is what a handshake is, then the handshake times shown on the Electron’s page are incorrect. If the Electron can do a secure handshake while breathing cyan, then it must just be a coincidence that every time I open the Electron’s status page, the date and time is ~5 to 10 minutes earlier.


#9

@dkryder,

I have not noticed this behavior, but now that you mention it, I will take a look.

Thanks, Chip


#10

@rickkas7,

Thank you for reminding me about your reliability sampler; it is excellent. I have been testing it and have run into one issue: when my device sleeps for an hour, it can fall out of sync with the connection tester. Even though the Electron is waking every hour, it is getting reset during the hour because it did not have a cloud connection. Is there a way to “pet” the connection tracker when the Electron wakes on the hour, so that the connection is validated and the Electron is allowed to nap or sleep until the next period?
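One way to frame the question is a tracker that is explicitly “petted” right after an intentional sleep, so that time spent asleep is not counted as time offline. The following host-side C++ sketch is hypothetical: `ConnectionTrackerSketch`, `pet()`, `shouldReset()`, and the 10-minute limit are not part of the reliability sampler, and the sketch only illustrates the idea.

```cpp
#include <cassert>
#include <cstdint>

class ConnectionTrackerSketch {
public:
    explicit ConnectionTrackerSketch(uint32_t maxOfflineMs) : maxOfflineMs_(maxOfflineMs) {
        assert(maxOfflineMs > 0);
    }

    // Call immediately after waking from an intentional nap or sleep, so the
    // offline clock restarts at the wake time instead of the last connection.
    void pet(uint32_t nowMs) { lastGoodMs_ = nowMs; }

    // Call periodically; returns true only when the device has been
    // unexpectedly without a cloud connection for longer than the limit.
    bool shouldReset(uint32_t nowMs, bool cloudConnected) {
        if (cloudConnected) {
            lastGoodMs_ = nowMs;
            return false;
        }
        return nowMs - lastGoodMs_ >= maxOfflineMs_;
    }

private:
    uint32_t maxOfflineMs_;
    uint32_t lastGoodMs_ = 0;
};
```

Without the `pet()` call after waking, the same tracker would see an hour-long gap since the last confirmed connection and request a reset at once, which resembles the out-of-sync behavior described above. Whether this maps onto the sampler depends on how it measures time across sleep.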

Here is a console log:
[Console log screenshot not available]

Thanks,