Spark/status offline-online cycling on some devices, and cloud disconnect events

Hi guys, I’m seeing some deployed devices go through frequent online-offline cycling.
I’m trying to get a better idea of what determines the spark/status offline and spark/status online events, and what constitutes a cloud disconnect event, so that we can troubleshoot these devices better.

We are using an Argon connected to a cellular router. Sometimes the cellular antenna has been damaged or vandalized; we may still get connectivity, but in that case it is decidedly (and not unexpectedly) poor.

As the image below shows, in many cases the device gets a spark/status offline event at essentially the same timestamp as the online event.
We’ve also had situations where individual devices have had 3000+ cloud disconnect events.

My base assumption is that the devices showing this offline/online cycling are in places where we likely have antenna (and therefore connectivity) issues. Knowing more about what triggers the offline/online cycling (and the cloud disconnect events) should help determine whether that assumption is accurate.

Particle.keepAlive() is set to 20 seconds, partly to see whether that helps with this issue.

The online event occurs when the device handshakes with the cloud. This typically happens on cold boot, when waking from most sleep modes, and on reset (button, System.reset, or OTA). It also occurs whenever the device loses connectivity with the cloud and then reestablishes it.

The offline event occurs when the device tells the cloud it’s about to disconnect. This happens when entering some sleep modes, during an OTA update, and on System.reset with the graceful flag set. There are two other cases where the offline event occurs:

The first is when an online event occurs but the cloud did not realize the device was offline. In this case the offline and online timestamps will be nearly identical. This can happen with a button reset, a power outage, or poor connectivity.

The other case is when the cloud does not receive a keep-alive ping or a publish for slightly more than twice the keep-alive interval. This happens when the device loses connectivity or is powered off.
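The graceful disconnect and the two fallback cases above can be sketched as a small cloud-side state model. This is a hypothetical, simplified illustration, not Particle's cloud code: the class and method names are invented, the paired offline/online events share an exact timestamp where real ones are only nearly identical, and TIMEOUT_MARGIN_S is an assumed value standing in for "slightly more than" twice the keep-alive.

```python
from dataclasses import dataclass, field

# Assumed margin for "slightly more than" 2x keep-alive; the real value is not stated.
TIMEOUT_MARGIN_S = 5.0


@dataclass
class CloudStatusModel:
    """Illustrative cloud-side view of one device; not Particle's implementation."""
    keepalive_s: float
    online: bool = False
    last_message_t: float = 0.0
    events: list = field(default_factory=list)  # (timestamp, "online" | "offline")

    def _go_offline(self, t: float) -> None:
        self.events.append((t, "offline"))
        self.online = False

    def handshake(self, t: float) -> None:
        # If the cloud still thinks the old session is alive (reset, power loss,
        # poor link), it closes that session first, so offline and online
        # end up with the same timestamp.
        if self.online:
            self._go_offline(t)
        self.events.append((t, "online"))
        self.online = True
        self.last_message_t = t

    def graceful_disconnect(self, t: float) -> None:
        # The device announced it is leaving (some sleep modes, OTA, graceful reset).
        if self.online:
            self._go_offline(t)

    def message(self, t: float) -> None:
        # A keep-alive ping or a publish refreshes the timeout.
        self.last_message_t = t

    def tick(self, now: float) -> None:
        # Fallback: nothing heard for slightly more than twice the keep-alive.
        deadline = self.last_message_t + 2 * self.keepalive_s + TIMEOUT_MARGIN_S
        if self.online and now > deadline:
            self._go_offline(deadline)


m = CloudStatusModel(keepalive_s=20)
m.handshake(0)
m.message(20)
m.handshake(30)  # button reset: no graceful disconnect was sent first
m.tick(60)       # 60 is before the deadline of 30 + 40 + 5 = 75, still online
m.tick(80)       # nothing heard since t=30, so the deadline at 75 has passed
print(m.events)  # [(0, 'online'), (30, 'offline'), (30, 'online'), (75.0, 'offline')]
```

With the 20-second keep-alive mentioned earlier, this model reports offline about 45 seconds after the last ping or publish; the margin the real cloud uses may differ.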

Hi Rickkas - probably a stupid question, but from whose perspective is this? In other words, is the device declaring itself offline (i.e. it was unable to connect, so it will negotiate a new session) and then online, is the cloud declaring the device offline and online, or is it some mixture of both?

The status is from the point of view of the cloud. When possible, the device states its intention to go offline (a graceful disconnect), but in numerous situations this is not possible (pin or button reset, power outage, bad connectivity). In those cases the offline event comes from the fallback: no messages received for 2x the keep-alive interval.
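Because the status is always the cloud's view, one rough way to triage a device's event history is to check whether each offline event has an online event right next to it. The sketch below is a heuristic, assuming the spark/status events have already been collected as (timestamp, value) pairs; the function name and the 2-second pairing window are hypothetical choices, not part of any Particle API.

```python
from collections import Counter


def classify_offline_events(events, pair_window_s=2.0):
    """Label each "offline" event in a list of (timestamp_s, value) pairs.

    pair_window_s is a hypothetical threshold for "nearly identical" timestamps.
    """
    online_times = [t for t, value in events if value == "online"]
    labels = []
    for t, value in events:
        if value != "offline":
            continue
        # An online event at (almost) the same moment means the cloud only learned
        # the old session was gone when the device handshook again.
        paired = any(abs(o - t) <= pair_window_s for o in online_times)
        labels.append((t, "missed-disconnect" if paired else "graceful-or-timeout"))
    return labels


log = [
    (0.0, "online"),
    (600.0, "offline"),
    (600.4, "online"),   # reset, power blip, or dropped link
    (900.0, "offline"),  # graceful disconnect or keep-alive timeout
    (1500.0, "online"),
]
labels = classify_offline_events(log)
print(Counter(label for _, label in labels))
```

A high share of missed-disconnect labels points toward resets, power problems, or a flaky link, which would be consistent with the antenna theory. Timestamps alone cannot tell those causes apart, and a very fast graceful reset could also land inside the window.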