Boron - Seemingly random device reset - How to best capture the reason

In my further testing this afternoon, there is definitely some cross dependency between some or all of the following:

  1. The library PublishQueueAsyncRK
  2. Calling PublishQueue when Connected vs when not connected and then connecting later for events to be published.
  3. Threading being enabled or disabled
  4. The use of a log handler within the main sketch (i.e. SerialLogHandler logHandler;)
  5. Whether the Boron serial port is being actively monitored by a PC to accept the messages, or the device is running by itself.

Here were my results from a few initial tests in hopes of correcting the problem:
Test 1: Plug the device into a USB wall charger rather than the PC (maybe it’s a power thing; doubtful, but I’m trying to think what could be different with a USB cable).
Result from test 1: As expected, no change in behavior, still would disconnect frequently (at least once or twice per hour)

Test 2: Disable Serial monitoring in the firmware by commenting out this line in the main sketch:

//SerialLogHandler logHandler;

Results from test 2: Interestingly, the sketch would still compile, but this caused almost immediate and constant disconnects as soon as publishQueue.publish() was called. By constant I mean every 30 seconds to 2 minutes. Since each disconnect/reconnect can burn up data, I didn’t let this one run very long.

Test 3: Try Particle.publish() instead of publishQueue.publish() and repeat (maybe publishQueue.publish() is doing something that causes the cloud disconnect events).
Result from test 3: Significantly improved connectivity with zero disconnects in 3 hours (so far).

My immediate next step:

  1. Update all 8 devices to conditionally use Particle.publish() ONLY when Particle.connected(), with all devices disconnected from a PC. Set 4 devices to stayAwake and 4 devices to sleep, take readings, call publishQueue.publish() and then connect. Identify whether this corrects the random reset/cloud disconnect issue. This looks very promising and I’m hopeful it will correct the issue for me! I will use something like this:
    if (Particle.connected()){
      Particle.publish("Stat", jw.getBuffer(), 60, PRIVATE, WITH_ACK);
    }
    else {
      publishQueue.publish("Stat", jw.getBuffer(), 60, PRIVATE, WITH_ACK);
    }

Results: Although I thought this looked promising, the disconnect event still happens when the device connects to the cloud and starts unloading the queued events.

  2. If step 1 is successful, repeat the same test but with threading enabled - CANCELED due to results above.

3a) Test for tonight. Let’s validate there is not an issue when I use Particle.publish() on all 8 devices and keep the devices “awake”.

Results: With all 8 devices publishing every 5 minutes for the last 8 hours using only Particle.publish(), I had zero cases of a disconnect at the moment of a publish event with duplicate publish events showing up. So this definitely seems related to publishQueue.publish(). 6 of the 8 devices had 0 total disconnects. 1 had 4 disconnects and the last one had 5, but none of those were around a publish event. By that I mean I publish at the top of the hour and at 5 minute intervals (1:00, 1:05, 1:10, 1:15…), and the disconnects on those two devices always fell in between publish events, at 1:03 for example. I suspect this is unrelated and more about cell signal quality/strength.
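To make the "around a publish event" check less of an eyeball exercise, here is a minimal sketch in plain C++ (it runs on a PC, not on the Boron) that measures how far a disconnect timestamp is from the nearest scheduled publish. The function names and the 30 second window in the usage note are my own assumptions for illustration, not anything from Device OS or PublishQueueAsyncRK.

```cpp
#include <cassert>

// Hypothetical helper: distance in seconds from a timestamp to the nearest
// scheduled publish, assuming publishes land on exact multiples of
// intervalSec (300 s gives :00, :05, :10, ...).
inline long secondsFromNearestPublish(long timestampSec, long intervalSec) {
    assert(timestampSec >= 0 && intervalSec > 0);
    long sincePrev = timestampSec % intervalSec;   // time since the last slot
    long untilNext = intervalSec - sincePrev;      // time until the next slot
    return sincePrev < untilNext ? sincePrev : untilNext;
}

// Hypothetical helper: true when the disconnect falls within windowSec of a
// scheduled publish. windowSec is a judgment call, not a measured value.
inline bool isNearPublish(long timestampSec, long intervalSec, long windowSec) {
    return secondsFromNearestPublish(timestampSec, intervalSec) <= windowSec;
}
```

For example, with timestamps expressed as seconds since midnight, a disconnect at 1:03:00 is 3780, which is 120 seconds away from the 1:05 publish, so `isNearPublish(3780, 300, 30)` is false. A disconnect at 1:05:10 (3910) would be flagged as near a publish.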

3b) Modify the logging level to determine its impact. At one point, I tried LOG_LEVEL_NONE and it would not maintain a connection. I went back to LOG_LEVEL_INFO, flashed, and the connection was maintained. I need to test this more yet. Given this 5 second test and no issues when connected to the PC, it seems like there is something related to logging level and PublishQueue().

  4. Create a bare bones sketch and test each possible combination of options: logging, connected to a PC or not, Publish Queue used or not, and threading enabled or not. Document the sketch used for each test as well as the results and provide them to someone way smarter than me. :slight_smile: Include a test on other/earlier versions of firmware, as I do not recall having this issue in 1.5.x. I hope to be able to share this tomorrow 1/4/2021.
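As a rough way to keep track of that test plan, the four on/off options give 16 combinations. The sketch below is plain C++ for a PC, only for generating a checklist; the struct and function names are made up for illustration. It treats logging as simply on or off, so individual log levels and the firmware version would be extra dimensions on top of this.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one test run: each flag is one of the four
// options listed in the step above.
struct TestConfig {
    bool logHandlerEnabled;  // SerialLogHandler present in the sketch
    bool usbMonitoredByPc;   // serial port monitored by a PC
    bool publishQueueUsed;   // publishQueue.publish() vs Particle.publish()
    bool threadingEnabled;   // system threading enabled
};

// Enumerate all 2^4 = 16 combinations by reading the four flags from the
// low bits of a counter.
inline std::vector<TestConfig> buildTestMatrix() {
    std::vector<TestConfig> matrix;
    for (unsigned bits = 0; bits < 16; ++bits) {
        matrix.push_back({(bits & 1u) != 0, (bits & 2u) != 0,
                          (bits & 4u) != 0, (bits & 8u) != 0});
    }
    return matrix;
}

inline std::string onOff(bool v) { return v ? "on" : "off"; }

// One-line label for a run, handy as a row header in a results table.
inline std::string describe(const TestConfig& c) {
    return "log=" + onOff(c.logHandlerEnabled) +
           " pc=" + onOff(c.usbMonitoredByPc) +
           " queue=" + onOff(c.publishQueueUsed) +
           " thread=" + onOff(c.threadingEnabled);
}
```

Printing `describe()` for each entry of `buildTestMatrix()` gives one row per test, so results (disconnect count, run time) can be written next to each label.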

By the way… this testing has been on LTS release 2.0.1. Not sure if the issue is with that version of the LTS release, the PublishQueueAsyncRK library, or maybe just my lack of understanding of all of this stuff. :slight_smile: In any case, it feels like I’m getting close to a resolution (I hope!)
