Random, periodic disconnecting

My Argons periodically disconnect/reconnect. Sometimes, like in the attached image, it doesn’t matter. However, sometimes it is in the middle of a game (they are used to control Escape Room puzzles) and it can screw things up, as messages need to come through quickly.

As you can see, it hits a wide range of devices at seemingly random times. All of these devices have keepAlive set to 15 seconds, and call “if (!Particle.connected()) Particle.connect();” inside their loop functions.

Any advice would be appreciated.

Particle.keepAlive() is not a thing on Argons.
Are you using SYSTEM_THREAD(ENABLED)?
What device OS version are you running?
if(!Particle.connected()) Particle.connect(); without any guard against repeated calls while a reconnection attempt is already in progress works against your actual intent, because each new call can restart the attempt that is still under way.

OS is 2.01

Not using SYSTEM_THREAD(ENABLED)

The if(!Particle.connected()) Particle.connect() is in the loop function.

I am using RK’s asynchronous queue for publishing messages, to avoid publishing messages more frequently than once per second.

You should guard against re-triggering a reconnect that is already in progress,
e.g.

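  // Start a new connection attempt at most once every 60 s, so an attempt
  // that is still in progress is not restarted on every pass through loop()
  static uint32_t msLastConTime = 0;
  if (!Particle.connected() && (msLastConTime == 0 || millis() - msLastConTime > 60000)) {
    Particle.connect();
    msLastConTime = millis();
  }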

This way you will not knock back an ongoing connection attempt by re-initiating it for at least 60 seconds.
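The same 60-second guard can be pulled out into a small helper so the timing logic can be checked off-device. This is a minimal sketch: ReconnectGuard and shouldConnect() are hypothetical names, and on an Argon you would pass Particle.connected() and millis() as the arguments. It also lets the very first attempt through immediately and relies on unsigned subtraction so it keeps working when millis() rolls over.

```cpp
#include <cstdint>

// Hypothetical helper: decides whether loop() should call Particle.connect().
// Inputs are plain parameters so the logic can be exercised without Device OS.
struct ReconnectGuard {
    uint32_t intervalMs;          // minimum spacing between connect attempts
    uint32_t lastAttemptMs = 0;   // millis() value at the last attempt
    bool attempted = false;       // false until the first attempt is made

    explicit ReconnectGuard(uint32_t interval) : intervalMs(interval) {}

    // Returns true (and records the attempt) if a new connect should start now.
    bool shouldConnect(bool connected, uint32_t nowMs) {
        if (connected) {
            return false;  // already connected, nothing to do
        }
        // Unsigned subtraction stays correct when millis() wraps past 2^32.
        if (attempted && nowMs - lastAttemptMs < intervalMs) {
            return false;  // an attempt was started recently, let it finish
        }
        attempted = true;
        lastAttemptMs = nowMs;
        return true;
    }
};
```

In loop() this would read `static ReconnectGuard guard(60000); if (guard.shouldConnect(Particle.connected(), millis())) Particle.connect();`.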

You may want to give it a try tho’


I will, thanks, and I’ll post a follow-up. If I still get reconnects, we’ll know it didn’t work; if there are none in a week, I think we’ll know it did.

Well, adding the SYSTEM_THREAD(ENABLED) has messed up my message publishing. Many of my published events simply fail to show up. I switched from PublishQueueAsync to PublishQueue, but that didn’t help. Any ideas on that? The code looks as follows:

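PublishQueue publishQueue(1000, false);

void MyClass::Publish(const char *format, ...)
{
    va_list arg;
    static char szLine[65];
    
    va_start (arg, format);
    vsnprintf(szLine, sizeof(szLine), format, arg);
    va_end (arg);
    
    static char who[32] = "";
    
    if (who[0] == 0)
        {
        // Event name = the part of VERSION before the first comma (or all of it)
        const char *ptr = strchr(VERSION, ',');
        size_t len = ptr ? (size_t)(ptr - VERSION) : strlen(VERSION);
        if (len >= sizeof(who))
            len = sizeof(who) - 1;      // leave room for the terminating NUL
        memset(who, 0, sizeof(who));    // memset(dest, value, count)
        strncpy(who, VERSION, len);
        }

    publishQueue.Publish(who, szLine);
}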

//Public Members
void PublishQueue::Publish(String eventName, String data)
{
    node my_node= {.eventName=eventName, .data=data};
    //Debug("Adding to queue: "+ my_node.eventName);
    my_queue.push(my_node);
    Process();
}


void PublishQueue::Process()
{
    
    //if (my_queue.size()>0)
    //{Debug("Queue size:  "+String(my_queue.size())+" _IsReadyToProcess():"+String(_IsReadyToProcess()) + " wait="+String(_intervalMillis-(millis()-lastMillis)));}
    
    //If queue is not empty process the next event.
    if (!my_queue.empty() && _IsReadyToProcess()) {
        node my_node=my_queue.front();
        my_queue.pop();
        
        if (_bSerial1)
            {
            Serial1.println(my_node.data);
            Serial1.flush();
            }
        else
            Particle.publish(my_node.eventName, my_node.data, 60, PRIVATE, WITH_ACK);
        lastMillis=millis(); //Update the time we last published
    }
    
}

I call publishQueue.Process() from loop() and from Delay() (my own delay function).

If I comment out the SYSTEM_THREAD(ENABLED) line, they show up fine (both with Async and non).

Thoughts?

I think I answered my own question. I needed to add waitUntil(Particle.connected); to my setup function, before any messages get sent. It wasn’t immediately clear to me how early setup() gets called when you have SYSTEM_THREAD(ENABLED) set: with threading enabled, setup() runs right away instead of waiting for the cloud connection.
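waitUntil(Particle.connected) covers start-up, but the Process() function shown earlier pops the node from the queue before Particle.publish() runs, so an event processed while the cloud connection is down can be lost at any point, not only during setup(). A minimal sketch of a queue that leaves events in place until the device is connected and the send reports success; SafePublishQueue, QueuedEvent and the injected isConnected/send callbacks are hypothetical names, chosen so the logic runs off-device.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical event record, mirroring the eventName/data pair in node.
struct QueuedEvent {
    std::string name;
    std::string data;
};

// Rate-limited queue that only removes an event after a successful send.
class SafePublishQueue {
public:
    SafePublishQueue(uint32_t intervalMs,
                     std::function<bool()> isConnected,
                     std::function<bool(const QueuedEvent&)> send)
        : intervalMs_(intervalMs),
          isConnected_(std::move(isConnected)),
          send_(std::move(send)) {}

    void push(QueuedEvent ev) { queue_.push(std::move(ev)); }

    // Call from loop(); attempts at most one send per interval, never while offline.
    void process(uint32_t nowMs) {
        if (queue_.empty() || !isConnected_()) {
            return;  // keep events queued while the cloud connection is down
        }
        if (attempted_ && nowMs - lastAttemptMs_ < intervalMs_) {
            return;  // enforce the minimum spacing between publishes
        }
        attempted_ = true;
        lastAttemptMs_ = nowMs;  // space out attempts whether or not they succeed
        if (send_(queue_.front())) {
            queue_.pop();  // remove the event only after it was sent successfully
        }
    }

    std::size_t size() const { return queue_.size(); }

private:
    uint32_t intervalMs_;
    std::function<bool()> isConnected_;
    std::function<bool(const QueuedEvent&)> send_;
    std::queue<QueuedEvent> queue_;
    uint32_t lastAttemptMs_ = 0;
    bool attempted_ = false;
};
```

On the device, isConnected could wrap Particle.connected() and send could return the result of Particle.publish(...); how closely that result reflects actual delivery depends on the flags used (for example WITH_ACK), so treat that wiring as an assumption to verify.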

Functioning again, now I can see if it helps my connection problems.


Reconnection problem still occurs. Any new ideas?

Thanks!


@IUnknown - It seems I am troubleshooting a similar issue, but on a Boron. I’ve been documenting my steps in a separate thread. You may want to read through the entire thread, but here is the latest post: Boron - Seemingly random device reset - How to best capture the reason

It seems that choosing between Particle.publish() and publishQueue.publish() based on Particle.connected() has fixed, or at least greatly improved, my situation. I plan to roll out the test to 8 additional Borons this evening and see how long they can stay connected.

    if (Particle.connected()){
      Particle.publish("Stat", jw.getBuffer(), 60, PRIVATE, WITH_ACK);
    }
    else {
      publishQueue.publish("Stat", jw.getBuffer(), 60, PRIVATE, WITH_ACK);
    }

I would be curious to learn whether the same would help you and your Argons, or if it’s just me or something I’m doing. I’ll continue to post my results to my original thread but will follow this thread too. Please continue to report what you find.

That wouldn’t work for me, because I need the messages at the time they are generated. The disconnect is still happening, but fortunately it hasn’t happened during a game (unlike previously), so these changes are an improvement. Just not a solution.

@IUnknown Yeah, I get that. Maybe try using Particle.publish() instead of PublishQueue temporarily, just to determine whether PublishQueue is in fact the culprit. I understand this may not be a final solution, but it helps pinpoint where the issue is. You mentioned earlier that you only use PublishQueue to meter events to one per second. If you stay connected all the time, can you do that right within the sketch, i.e. limit how often it can call Particle.publish() to once per second rather than relying on PublishQueue? At least that would show whether it temporarily helps your scenario.

I did that initially. The problem is that it makes the publish calls blocking, which interferes with other actions going on. When I have a very important event to publish, it does go straight to Particle.publish(), and since it is a single event I don’t worry about throttling. Those events have been caught up in the disconnect problem as well, and when that happens it’s worse, since these are the very messages I need to get out.

I did try switching between PublishQueue and PublishQueueAsync, and that didn’t seem to make a difference in the outcome.

Hi, what I will propose is not a solution to your problem, but a way to add robustness to your system.

One way to make sure your system doesn’t miss important events is to add a mechanism to acknowledge receipt and, if no acknowledgement arrives, retransmit the event.

This is a TCP flow that shows the idea:
[Image: TCP flow diagram showing acknowledgement and retransmission]
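The acknowledge-and-retransmit idea can be sketched as a stop-and-wait sender: one message in flight at a time, tagged with a sequence number, resent every retryMs until an acknowledgement carrying that number comes back. StopAndWaitSender and its methods are hypothetical; the transport and the receiving side (which would echo the sequence number back) are left out.

```cpp
#include <cstdint>
#include <string>

// Stop-and-wait sender: a single message in flight, retransmitted until acknowledged.
class StopAndWaitSender {
public:
    explicit StopAndWaitSender(uint32_t retryMs) : retryMs_(retryMs) {}

    // Accept a new message only if nothing is in flight; returns false if busy.
    bool submit(const std::string& payload) {
        if (inFlight_) return false;
        inFlight_ = true;
        payload_ = payload;
        ++seq_;            // new sequence number for the new message
        needsSend_ = true; // first transmission is due immediately
        return true;
    }

    // Called when an acknowledgement arrives; stale sequence numbers are ignored.
    void onAck(uint32_t ackSeq) {
        if (inFlight_ && ackSeq == seq_) inFlight_ = false;
    }

    // Returns true and sets seqOut when a (re)transmission is due at nowMs.
    bool poll(uint32_t nowMs, uint32_t& seqOut) {
        if (!inFlight_) return false;
        if (needsSend_ || nowMs - lastSendMs_ >= retryMs_) {
            needsSend_ = false;
            lastSendMs_ = nowMs;
            ++transmissions_;
            seqOut = seq_;
            return true;
        }
        return false;
    }

    bool busy() const { return inFlight_; }
    uint32_t transmissions() const { return transmissions_; }
    const std::string& payload() const { return payload_; }

private:
    uint32_t retryMs_;
    uint32_t seq_ = 0;
    uint32_t lastSendMs_ = 0;
    uint32_t transmissions_ = 0;
    bool inFlight_ = false;
    bool needsSend_ = false;
    std::string payload_;
};
```

This buys reliability rather than speed: while the link is down, the message still waits until a retransmission gets through.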

Best,
Gustavo.

Thanks, I appreciate that. However, that would only partially solve the problem, as what I need is timely delivery of the messages. Once the device reconnects, the message gets delivered; it’s just late, which messes with the experience.

Hello @IUnknown,

Did you notice any SOS lights (red RGB blinking) when these reconnection events happen? If not, is that just a reconnection event or is your device resetting?

Just reconnecting. No fault or reboot.

I just noticed this on the Particle documentation site for the Argon:

For the Argon, the keep-alive is not generally needed. However, in unusual networking situations if the network router/firewall removes the port forwarded back-channels unusually aggressively, you may need to use a keep-alive.

I’m wondering if this might be router-related. We use Spectrum and the routers they provide. Is there a setting we could enable or disable that might help here?