Excessive number of disconnect events on multiple devices

We brought this issue to Particle’s attention about three months ago (having realized it had been going on since January). Nothing could be found. Last week’s cache problems didn’t cause this, but they might have exposed more devices showing this distinct behavior. We have over 20 Photons offline, because eventually they drop off and need a power cycle to recover. Then we have 90+ Photons that show this online/offline behavior, sometimes 75-95 times in a two-hour period. It is also preventing our firmware from working as designed.

Hey everyone,

We’ll be looking into this more next week now that things are settling down on the UDP front.

Based on the reports so far, I am seeing a mix of behaviors. Some incidents indicate the Photon is actually going offline, whereas in other cases it appears the Photon is reported offline but is still behaving as intended.

I anticipate there are a couple of different issues occurring here. For those whose Photons are still functioning despite being reported offline, I suspect this is just a reporting bug and may not indicate any issue with the device or application code itself. This is only a preliminary assessment, however.

We will be looking into all connectivity and status issues for TCP devices soon, though!

If it is useful to know: I have a Photon on 1.0.0 that has been doing the same thing for weeks. It runs on a breadboard with nothing connected, purely to monitor cloud connection stability.

It constantly reports offline/online in the console, e.g. twice per hour, but the console disconnect counter shows only 7 after more than 24 hours.
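For anyone who wants to put numbers on this, here is a rough sketch of how the offline events could be tallied per device over a two-hour window. It assumes you have already saved events as a list of dicts with `name`, `data`, `coreid` and `published_at` fields, and that status changes appear as `spark/status` events whose data is `offline`. The function name, the sample device names and the sample timestamps are all made up for illustration; adjust the field names to whatever your own log actually contains.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    # Older Python versions reject a trailing "Z", so swap it for an explicit offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def count_offline_events(events, window_end, window=timedelta(hours=2)):
    """Count 'spark/status' offline events per device within the window."""
    window_start = window_end - window
    counts = Counter()
    for ev in events:
        # Skip anything that is not an offline status event (assumed event shape).
        if ev.get("name") != "spark/status" or ev.get("data") != "offline":
            continue
        when = parse_ts(ev["published_at"])
        if window_start < when <= window_end:
            counts[ev["coreid"]] += 1
    return counts


# Made-up sample data, only to show the expected input shape.
events = [
    {"name": "spark/status", "data": "offline", "coreid": "device-a",
     "published_at": "2019-05-10T10:15:00.000Z"},
    {"name": "spark/status", "data": "online", "coreid": "device-a",
     "published_at": "2019-05-10T10:15:30.000Z"},
    {"name": "spark/status", "data": "offline", "coreid": "device-a",
     "published_at": "2019-05-10T11:40:00.000Z"},
    {"name": "spark/status", "data": "offline", "coreid": "device-b",
     "published_at": "2019-05-10T07:00:00.000Z"},  # outside the window
    {"name": "temperature", "data": "21.5", "coreid": "device-b",
     "published_at": "2019-05-10T11:00:00.000Z"},  # not a status event
]

window_end = datetime(2019, 5, 10, 12, 0, tzinfo=timezone.utc)
counts = count_offline_events(events, window_end)
for device, n in counts.items():
    print(device, n)
```

Comparing these per-device counts against the console disconnect counter for the same period should show whether the two disagree as much as they do for me.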

Hey everyone,

We identified the cause of the online and offline event issue and are testing a fix now. We are shooting to have it released early next week.

In some situations, Photons may incorrectly publish spark/status offline events despite the device being healthy and online.

This issue is an unintended side effect of work being done to improve the overall reliability of the online indicator across the Particle Cloud.

We are working on a more public announcement about the online indicator improvements that will include additional details about these events.

I’m sorry Stan, but that doesn’t explain our issues. During this time we are unable to send commands to or receive updates from our Photons. So an “incorrect status” doesn’t really explain away our problem. Our devices should be reporting in their measurements and they stop…and then resume after the new online status is updated.

Hi @arklabs_josh

Please refer to the post just before my latest:

More than one type of issue is being acknowledged here. One issue refers to the status indicator for users whose devices continue to operate while reported offline. There are other, yet-to-be-determined issues that are acknowledged to be more than just improper status reporting.

As stated, we will be looking into all status and connectivity issues in the next week.

The status issues were quick to identify and quick to fix, which is why they are being addressed already. The question of why devices such as yours are actually going offline will take a bit more investigation. The intent is to focus on this at the start of next week.

We recognize there's still work left to be done. So no worries, this case isn't closed yet. :slight_smile: We just need a bit of patience so that engineering has time to dig into this issue in-depth.

Stan, are there any updates on this major issue?

It’s my understanding that a fix for the online/offline status indicator was tested and is working. I am not yet certain whether it has been rolled out from staging to production. It should be soon if it has not been already.

Engineering is still investigating the other Photon connectivity issues. As I have more information, I’ll be sure to update.

Would you be able to send me specific device IDs that are experiencing this issue?

If you are able, sharing your user application would also be helpful in this case. Feel free to share both with me in a private, direct message.

We’ve been working directly with Dave Blevins on this issue. If you can’t get the IDs from him, I’ll be happy to supply them.

I sent Matt the list of devices a little while ago.

As mentioned above, Dave was able to share these with me. Much appreciated. :slight_smile:

Has there been any movement on this? I keep getting emails from Particle asking if my problem has been resolved, but replying doesn’t get me an actual response from them. I still have several devices with several hundred disconnect events, and some with over one thousand. My customers are starting to be affected, as I have one who can’t use two of their devices.

Hi Michael. Would you be able to provide me a few sample device IDs so that I can look into these on our end? Changes have gone out in the past few weeks to handle online status indication, as well as a fix for our 5/3 Redis crash incident. I'd like to dig into this a bit and see what might be going wrong here.

Sent a few of them. I do have others; if you would like them all, I’ll sit down tomorrow and go through them.
