Random, periodic disconnecting

IUnknown · December 30, 2020, 11:44pm

My Argons periodically disconnect/reconnect. Sometimes, like in the attached image, it doesn’t matter. However, sometimes it is in the middle of a game (they are used to control Escape Room puzzles) and it can screw things up, as messages need to come through quickly.

As you can see, it hits a wide range of devices and seemingly random times. All of these devices have a keepAlive set to 15 seconds, and have “if (!Particle.connected()) Particle.connect();” code inside of their loop functions.

Any advice would be appreciated.

ScruffR · December 31, 2020, 8:22am

Particle.keepAlive() is not a thing on Argons.
Are you using SYSTEM_THREAD(ENABLED)?
What device OS version are you running?
if(!Particle.connected()) Particle.connect(); without any means to guard against multiple executions while the reconnection process is active would counter your actual intent.

IUnknown · December 31, 2020, 10:37am

OS is 2.01

Not using SYSTEM_THREAD(ENABLED)

The if(!Particle.connected()) Particle.connect() is in the loop function.

I am using RK’s asynchronous queue for publishing messages, to avoid publishing messages more frequently than once per second.

ScruffR · December 31, 2020, 1:07pm

You should guard against a reconnect
e.g.

  static uint32_t msLastConTime = 0;
  if (!Particle.connected() && millis() - msLastConTime > 60000) {
    Particle.connect();
    msLastConTime = millis();
  }

This way you will not knock back an ongoing connection attempt by re-initiating it for at least 60 seconds.

You may want to give it a try tho'
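For anyone who wants to check the timing logic off-device, here is a minimal sketch of the same guard as a plain function. The name `reconnectAllowed` and its parameters are made up for illustration and are not part of Device OS. It assumes a 32-bit millisecond counter like `millis()` and relies on unsigned subtraction so the elapsed time stays correct when the counter rolls over.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Device OS): returns true when a new
// connection attempt is allowed. Unsigned subtraction keeps the elapsed-time
// calculation correct across the rollover of a 32-bit millisecond counter.
bool reconnectAllowed(bool connected, uint32_t nowMs, uint32_t lastAttemptMs,
                      uint32_t minIntervalMs = 60000) {
    if (connected) {
        return false;  // already connected, nothing to do
    }
    return static_cast<uint32_t>(nowMs - lastAttemptMs) > minIntervalMs;
}
```

In loop() this would be called with Particle.connected(), millis() and the stored time of the last attempt, and the stored time would be updated whenever Particle.connect() is actually called.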

IUnknown · December 31, 2020, 6:27pm

I will, thanks. I’ll also post a follow up. If I get reconnects we’ll know that it didn’t work, if none in a week I think we’ll know that it did.

IUnknown · December 31, 2020, 9:37pm

Well, adding the SYSTEM_THREAD(ENABLED) has messed up my message publishing. Many of my published events simply fail to show up. I switched from PublishQueueAsync to PublishQueue, but that didn’t help. Any ideas on that? The code looks as follows:

PublishQueue publishQueue(1000, false);

void MyClass::Publish(const char *format, ...)
{
    va_list arg;
    static char szLine[65];
    
    va_start (arg, format);
    vsnprintf(szLine, sizeof(szLine), format, arg);
    va_end (arg);
    
    static char who[32] = "";
    
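    if (who[0] == 0)
        {
        // Use the part of VERSION before the first comma as the event name.
        const char *ptr = strchr(VERSION, ',');
        size_t len = ptr ? (size_t)(ptr - VERSION) : strlen(VERSION);
        if (len >= sizeof(who))
            len = sizeof(who) - 1;
        memset(who, 0, sizeof(who)); // arguments are (dest, value, size)
        strncpy(who, VERSION, len);
        }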

    publishQueue.publish(who, szLine);
}

//Public Members
void PublishQueue::Publish(String eventName, String data)
{
    node my_node= {.eventName=eventName, .data=data};
    //Debug("Adding to queue: "+ my_node.eventName);
    my_queue.push(my_node);
    Process();
}


void PublishQueue::Process()
{
    
    //if (my_queue.size()>0)
    //{Debug("Queue size:  "+String(my_queue.size())+" _IsReadyToProcess():"+String(_IsReadyToProcess()) + " wait="+String(_intervalMillis-(millis()-lastMillis)));}
    
    //If queue is not empty process the next event.
    if (!my_queue.empty() && _IsReadyToProcess()) {
        node my_node=my_queue.front();
        my_queue.pop();
        
        if (_bSerial1)
            {
            Serial1.println(my_node.data);
            Serial1.flush();
            }
        else
            Particle.publish(my_node.eventName, my_node.data, 60, PRIVATE, WITH_ACK);
        lastMillis=millis(); //Update the time we last published
    }
    
}

And in loop and Delay (my delay function) I call publishQueue.Process().

If I comment out the SYSTEM_THREAD(ENABLED) line, they show up fine (both with Async and non).

Thoughts?

IUnknown · January 2, 2021, 12:46am

I think I answered my own question. I needed to add waitUntil(Particle.connected); to my setup function, before any messages get sent. It wasn't immediately clear to me how early setup gets called when you have SYSTEM_THREAD(ENABLED) set.

Functioning again, now I can see if it helps my connection problems.

IUnknown · January 2, 2021, 5:27pm

Reconnection problem still occurs. Any new ideas?

Thanks!

jgskarda · January 4, 2021, 2:07am

@IUnknown - It seems I also am troubleshooting a similar issue but on a Boron. I’ve been documenting my steps in a separate thread. You may want to read through the entire thread but here is the latest post: Boron - Seemingly random device reset - How to best capture the reason

It seems that choosing between Particle.publish() and publishQueue.publish() based on Particle.connected() has corrected, or at least greatly improved, my situation. I plan on rolling out the test to 8 additional Borons this evening to see how long they can stay connected.

    if (Particle.connected()){
      Particle.publish("Stat", jw.getBuffer(), 60, PRIVATE, WITH_ACK);
    }
    else {
      publishQueue.publish("Stat", jw.getBuffer(), 60, PRIVATE, WITH_ACK);
    }

I would be curious to learn if the same would help you and your Argons, or if it’s just me/something I’m doing. I’ll continue to post my results to my original thread but will follow this thread too. Please continue to report what you find.

IUnknown · January 4, 2021, 4:52pm

That wouldn’t work for me, because I need the messages at the time they are generated. The disconnect is still happening, but fortunately it hasn’t happened during a game (unlike previously), so these changes are an improvement. Just not a solution.

jgskarda · January 4, 2021, 5:09pm

@IUnknown Yeah I get that… maybe try just using Particle.publish() instead of PublishQueue temporarily, just to determine whether the culprit is in fact PublishQueue. I understand this may not be a final solution, but it helps pinpoint where the issue is. You mentioned earlier that you only use PublishQueue to meter events to one per second. If you stay connected all the time, I wonder whether you can do that right within the sketch, i.e. limit how often it can call Particle.publish() to once per second rather than relying on PublishQueue, at least to understand whether it helps your scenario.
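A rough sketch of what metering inside the sketch could look like, written as plain C++ so it can be tried off-device. The class name `PublishThrottle` and its method are hypothetical. It only decides whether a publish is allowed right now and never waits, so the rate check itself does not block; whether Particle.publish() blocks is a separate matter that this does not address.

```cpp
#include <cstdint>

// Hypothetical helper: allows at most one action per intervalMs without
// ever blocking. The caller passes in the current time (e.g. millis()).
class PublishThrottle {
public:
    explicit PublishThrottle(uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // Returns true (and records the time) if enough time has passed since
    // the last allowed call; otherwise returns false immediately.
    bool tryAcquire(uint32_t nowMs) {
        if (hasFired_ && static_cast<uint32_t>(nowMs - lastMs_) < intervalMs_) {
            return false;
        }
        lastMs_ = nowMs;
        hasFired_ = true;
        return true;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastMs_ = 0;
    bool hasFired_ = false;
};
```

In loop() the idea would be to call tryAcquire(millis()) and only call Particle.publish() when it returns true, keeping the event for the next pass otherwise.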

IUnknown · January 4, 2021, 5:48pm

I did that initially. The problem is that it makes the publish events blocking, which interferes with other actions going on. When I have a very important event to publish, that does go straight to Particle.publish, and since it is a single event I don’t worry about throttling. Those have been caught up in the disconnect problem as well. And when that happens it’s worse, since these are the very messages I need to get out.

I did try switching between PublishQueue and PublishQueueAsync, and that didn’t seem to make a difference in the outcome.

gusgonnet · January 4, 2021, 6:06pm

Hi, what I will propose is not a solution to your problem, but a way to add robustness to your system.

One way to go about getting events that your system should not miss is to add a mechanism to ACKnowledge the reception or if not received, retransmit the event again.

This is a TCP flow that shows the idea:
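In code, the acknowledge-or-retransmit idea might look roughly like the sketch below. This is only an illustration in plain C++, not a finished implementation: the class `AckTracker`, its methods, the timeout and the attempt limit are all assumptions, and how the event is sent and how the acknowledgement comes back is left to the caller.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical bookkeeping for "resend until acknowledged". Sending and
// receiving are left to the caller; this only tracks what is outstanding.
class AckTracker {
public:
    AckTracker(uint32_t timeoutMs, unsigned maxAttempts)
        : timeoutMs_(timeoutMs), maxAttempts_(maxAttempts) {}

    // Record a message that has just been sent for the first time.
    void sent(uint32_t id, const std::string& payload, uint32_t nowMs) {
        pending_[id] = Entry{payload, nowMs, 1u};
    }

    // Call when the receiver's acknowledgement for `id` arrives.
    void acked(uint32_t id) { pending_.erase(id); }

    // Returns the ids whose timeout has expired and that should be resent
    // now. Messages that have used up all attempts are dropped.
    std::vector<uint32_t> dueForResend(uint32_t nowMs) {
        std::vector<uint32_t> due;
        for (auto it = pending_.begin(); it != pending_.end();) {
            Entry& e = it->second;
            if (static_cast<uint32_t>(nowMs - e.lastSentMs) < timeoutMs_) {
                ++it;  // still waiting for the acknowledgement
            } else if (e.attempts >= maxAttempts_) {
                it = pending_.erase(it);  // give up on this message
            } else {
                e.lastSentMs = nowMs;
                ++e.attempts;
                due.push_back(it->first);
                ++it;
            }
        }
        return due;
    }

    const std::string& payload(uint32_t id) const { return pending_.at(id).payload; }
    bool isPending(uint32_t id) const { return pending_.count(id) != 0; }

private:
    struct Entry {
        std::string payload;
        uint32_t lastSentMs;
        unsigned attempts;
    };
    uint32_t timeoutMs_;
    unsigned maxAttempts_;
    std::map<uint32_t, Entry> pending_;
};
```

The device would call sent() after each publish, acked() when the other side confirms, and dueForResend(millis()) from loop() to find events that need to go out again.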

Best,
Gustavo.

IUnknown · January 5, 2021, 10:44pm

Thanks, I appreciate that. However, that would only partially solve the problem, as what I need is timely delivery of the messages. Once the Particle reconnects the message gets delivered, it’s just late, which messes with the experience.

Gildons · January 5, 2021, 10:54pm

Hello @IUnknown,

Did you notice any SOS lights (red RGB blinking) when these reconnection events happen? If not, is that just a reconnection event or is your device resetting?

IUnknown · January 6, 2021, 1:07am

Just reconnecting. No fault or reboot.

IUnknown · January 6, 2021, 1:29am

I just noticed this on the Particle documentation site for the Argon:

For the Argon, the keep-alive is not generally needed. However, in unusual networking situations if the network router/firewall removes the port forwarded back-channels unusually aggressively, you may need to use a keep-alive.

I’m wondering if this might be router related. We use Spectrum, and have their provided routers. Is there some function we could enable/disable that might help here?

system · July 7, 2021, 1:29pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
