Advice/best practice for unreliable internet connections

HG78 · November 14, 2020, 2:02pm

Hi,
I’m using an Argon to run a remote sampling site for some academic research. I need to have the Argon send some I2C commands at time intervals controlled by an RTC (Adalogger Featherwing).

My problem is that the remote sites where the Argon will be based don’t always have the best (or any…) internet connections. If I lose signal at the wrong time the device could try to re-establish a non-existent connection at the exact time it is supposed to be sending the I2C commands, thus missing the moment when I need to collect the data.

Is there a best-practice guide for working with this kind of unreliable internet connection?

I presume the correct thing for me to do would be to put the device in MANUAL mode and have a timeout to stop a Particle.connect() or Particle.process() if there will be a clash with the RTC-derived sampling time? If so, does anyone know of some code examples that I could take a look at?

Thanks for your help

armor · November 14, 2020, 4:01pm

@HG78 Best to break the problem down and also understand what you are doing because it is not clear from your post.

You use the RTC and micro SD card to wake the Argon and store your readings from a sensor?
You have a sensor connected via I2C bus - would be helpful to know which sensor?
You connect to the cloud every so often and publish the sensor readings?

My suggestions of what to use are based upon using a DS3231 RTC and a FRAM I2C memory, but you could use the Adafruit Adalogger with its PCF8523 I2C RTC and battery, plus your microSD card on SPI.

What you need to do to make it reliable is to decouple taking the sensor reading from sending the event, by buffering the readings in between.

Always use SYSTEM_THREAD(ENABLED); the system mode is not important if you use the following library; PublishQueueAsyncRK. This will enable you to buffer your event data and publish from your application thread without needing to worry about whether the device is online or not. You then have options where to buffer the events.

I would use Retained or Backup RAM 3K on the Argon for the send buffer, but you could use the microSD card - it all depends upon how long the device might be out of range of the WiFi and how much data you are collecting. 3K was too small for my application and I didn’t have the SD card, so I used a 128KB FRAM memory chip via I2C.
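To make the buffering idea concrete, here is a minimal sketch of a fixed-capacity event queue in plain C++. It is not how PublishQueueAsyncRK is implemented; `EventBuffer`, `push` and `drain` are hypothetical names, and the `send` callable only stands in for whatever publish call you use. The drop-oldest policy when the buffer fills is an assumption you may want to change.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical fixed-capacity FIFO for events waiting to be published.
// When full, the oldest event is dropped so the newest data survives.
template <std::size_t Capacity>
class EventBuffer {
public:
    // Queue an event; never blocks. Returns false if an old event was dropped.
    bool push(const std::string& event) {
        bool dropped = false;
        if (count_ == Capacity) {            // buffer full: discard the oldest
            head_ = (head_ + 1) % Capacity;
            --count_;
            dropped = true;
        }
        slots_[(head_ + count_) % Capacity] = event;
        ++count_;
        return !dropped;
    }

    // Send queued events, oldest first, while `online` is true.
    // `send` is any callable returning true on success (a stand-in for a
    // publish call). Stops at the first failure and keeps the rest queued.
    template <typename Send>
    std::size_t drain(bool online, Send send) {
        std::size_t sent = 0;
        while (online && count_ > 0 && send(slots_[head_])) {
            head_ = (head_ + 1) % Capacity;
            --count_;
            ++sent;
        }
        return sent;
    }

    std::size_t size() const { return count_; }

private:
    std::array<std::string, Capacity> slots_{};
    std::size_t head_ = 0;   // index of the oldest queued event
    std::size_t count_ = 0;  // number of queued events
};
```

The sampling code only ever calls `push`, which returns immediately; a separate step calls `drain` when the connection is up. The capacity you need depends on how long the device may be offline and how often it produces events.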

There is also a library for the RTC and datalogger - SdCardLogHandlerRK (https://github.com/rickkas7/SdCardLogHandlerRK).

robc · November 14, 2020, 6:54pm

Below is an example you can try. @armor mentioned using the library: PublishQueueAsyncRK
I placed that library in the example so you could check it out if you have not already done so.

//  device  ???
//  device os ???

#include "PublishQueueAsyncRK.h"
//  v(0.2.0)   

#include "Particle.h"

// retained memory block   ******************************************************
//  When adding new retained variables to an existing set of retained variables, 
//  it's a good idea to add them after the existing variables. 
//  this ensures the existing retained data is still valid even with the new code.
//  https://docs.particle.io/reference/device-os/firmware/argon/#eeprom
retained uint8_t publishQueueRetainedBuffer[2048];
PublishQueueAsync publishQueue(publishQueueRetainedBuffer, sizeof(publishQueueRetainedBuffer));


int errCode = 0;
uint32_t g_tmsNow; //  current millis
uint32_t g_tmsMyTimer;
bool goOnline = true; 
bool sendInfo = true; 

SerialLogHandler logHandler; // Use primary serial over USB interface for logging output

// choose your SYSTEM_MODE       uncomment one statement to activate or skip
//SYSTEM_MODE(AUTOMATIC);         //  default SYSTEM_MODE is AUTOMATIC
SYSTEM_MODE(SEMI_AUTOMATIC);          
//SYSTEM_MODE(MANUAL);        

SYSTEM_THREAD(ENABLED);         //  uncomment to activate   

STARTUP(System.enableFeature(FEATURE_RETAINED_MEMORY));

void setup()  {   
  // Log some messages with different logging levels
  Log.info("This is info message");
  Log.warn("This is warning message");
  Log.error("This is error message, error=%d", errCode);

  // Format text message
  Log.info("System version: %s", (const char*)System.version());
    
  //  your code .......
  
} // end setup()

void loop() {   
    g_tmsNow = millis();
    if (goOnline)   {
        if (!Particle.connected()) { //  NOT connected to Particle cloud
            Particle.connect();
            //  With SYSTEM_THREAD(ENABLED), Particle.connect() itself returns right away.
            //  The waitFor() below then holds up loop() until connected or 60 seconds have passed.
            waitFor(Particle.connected, 60000); 
        }
    }
    if (Particle.connected()) {
        if (!goOnline)   {
            Particle.disconnect(); //  disconnect from particle cloud
        } else if (sendInfo) {
            publishQueue.publish("eventName", "eventData", PRIVATE, WITH_ACK); // post a message
            sendInfo = false; // send info only once
        }
    }
    
    //  your code here & below .......
    
    if (g_tmsNow - g_tmsMyTimer >= 60000) { // timer expired           
        g_tmsMyTimer = g_tmsNow; // reset timer
        // do something useful here
        // decide to connect or disconnect by modifying: goOnline
        // decide to report info by modifying: sendInfo
        //
        Log.info("myLogInfo"); //  modify to meet data structure in "your code" above
        sendInfo = true; // TEST!   keep sending info
    }
    Particle.process();
} // end loop()

I ran this on an Argon with Device OS 2.0.0-rc.4

HG78 · November 15, 2020, 1:41pm

Hi armor, robc,

Thanks for your offers of help, I really appreciate it. The PublishQueueAsyncRK library looks nice and I’ll use that in future for publish uploads.

For this application, however, I’m not actually reading a sensor; instead I’m actuating a bunch of relays connected via I2C to open valves/pumps and take air samples for pollution monitoring. To take a sample I run a 60-180 second long sequence, timed using millis(), that sends the relevant I2C commands to open the correct relays in turn.

This sampling setup has two main functions:

Take a single sample on demand.
Run a sampling sequence - e.g. take a sample every hour / every day at 14:33 for a month / one sample a month for a year / etc…

A single sample is taken by entering the correct String in a Particle.function() on console.particle.io

For the sequence I set the time for the next sample and run countdowns using millis() and Time.now(). I can start, stop, and change the sequence interval using String commands in a Particle.function().

@armor To answer your three questions:

The Argon is always awake, I use the RTC (via the SdCardLogHandlerRK library) so I can have long, yet precise sampling intervals without being concerned with millis() rollovers or having a power glitch losing the millis() count.
The I2C controls a MCP23017 I/O expander that opens/closes relays.
I want to be connected to the cloud at all times so the user can send commands via the Particle.functions() to control the device.

From this thread (Particle.connect() blocking main loop permanently, even with SYSTEM_THREAD(ENABLED)) I’m assuming that the Particle.connect() call will block the code in my loop(). This means that if I leave the Argon in AUTOMATIC mode and lose connection for a moment, my main code will not run while the Argon tries to re-connect, until the connection is re-established. The Argon can be flashing green for ~45 seconds while connecting even on a good day.

It seems like this causes me three problems:

If the Argon tries to re-connect just before a sample is due to be taken I risk missing the sampling time.
If the Argon tries to re-connect during a sample, the relay timing sequence will be ruined and I won’t take a valid sample.
Finally, if the WiFi/cloud connection goes down and stays down, won’t I be stuck in a Particle.connect() block forever?

I’ve found a few other threads where people are asking similar questions (e.g. Particle.connect(timeout, retryDelay), Photon set Particle.connect() Timeout?), and there are hints that there might be a solution via a timeout or something to do with running SYSTEM_THREAD(ENABLED), but I haven’t been able to figure out what that solution is!

robc · November 15, 2020, 5:54pm

I did not think about it until now, you might try a short Particle.keepAlive() statement to see if that might help matters:

void setup() {
  Particle.keepAlive(5s);     // send a keep-alive ping every 5 seconds
...

https://docs.particle.io/reference/device-os/firmware/argon/#particle-keepalive-

robc · November 15, 2020, 7:43pm

Yes. In this example block of code loop() will be blocked until either a connection is made or the block times out (60 seconds):

Particle.connect();
//  After you call Particle.connect(), your loop will not be called again until the device finishes 
//  connecting to the Cloud. Typically, you can expect a delay of approximately one second.
waitFor(Particle.connected, 60000);

Let's say you successfully connect. Your loop() is then no longer blocked, even during the dropouts that can occur, sometimes frequently. The Argon will try to maintain the connection while your loop() continues to cycle. During these periods of "noise" Particle.connected() will return false. However, if you execute, for example, Particle.publish() during this particular condition you will block your code. The following is non-blocking:

if (Particle.connected()) { // keeps the following statement from blocking
  Particle.publish("eventName", "eventData", PRIVATE);
  delay(1000);
}

How does any of this help you? Well, trying Particle.keepAlive() may help your incoming messages reach you consistently. In a nutshell, you can create an int defectCounter and zero it out before every incoming message. Increment the counter every time Particle.connected() returns false, let's say at the top of loop(). Monitor the defectCounter while processing the incoming message, calculating the response and transmitting the message. If you get a defect anywhere along the process you can terminate accordingly.

Also recommend using SYSTEM_MODE(SEMI_AUTOMATIC) and SYSTEM_THREAD(ENABLED)

armor · November 15, 2020, 10:42pm

Incorrect - this is why SYSTEM_THREAD(ENABLED); should be used because your application will be running on a separate thread and will not get blocked whilst the connection is being established or re-established if lost. It sounds from what you have described that you should use SYSTEM_MODE(AUTOMATIC); and leave the device OS to connect initially and reconnect as required. Using the PublishQueueAsyncRK library means your application thread can 'fire and forget' events safe in the knowledge that if the event data cannot be sent it will be queued until it can be.

The Argon is always awake, I use the RTC (via the SdCardLogHandlerRK library) so I can have long, yet precise sampling intervals without being concerned with millis() rollovers or having a power glitch losing the millis() count.

Not used the RTC on the Adafruit feather much - are you using an alarm/timer and interrupt or just reading the time and comparing to the next time? millis() is an unsigned long so you do not have to worry about rollovers with this sort of logic (millis() - last_sample_millis >= SAMPLE_INTERVAL), power glitch is a valid issue but in practice is a very difficult problem to totally solve/guard against.
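As a small illustration of why the subtraction form is rollover-safe, here is a sketch in plain C++ with the counter values passed in as arguments, so it can be checked off-device. The function names `elapsedMs` and `intervalElapsed` are made up for this example; on the Argon the `now` value would come from millis().

```cpp
#include <cstdint>

// Elapsed milliseconds between two readings of a 32-bit millisecond counter.
// Unsigned subtraction wraps modulo 2^32, so the result stays correct even
// if the counter rolled over between `start` and `now`.
inline uint32_t elapsedMs(uint32_t now, uint32_t start) {
    return now - start;
}

// True once at least `intervalMs` has passed since `start`.
inline bool intervalElapsed(uint32_t now, uint32_t start, uint32_t intervalMs) {
    return elapsedMs(now, start) >= intervalMs;
}
```

For example, with start = 0xFFFFFFF0 and now = 0x00000010 the counter has wrapped, yet `elapsedMs` still gives 32. This only holds for intervals shorter than one full wrap of the counter (2^32 ms, roughly 49.7 days); a comparison written as `now >= start + interval` does not have this property.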

The I2C controls a MCP23017 I/O expander that opens/closes relays.

I2C instructions to the MCP23017 are very quick, so I guess the issue you are worried about is that the solenoid is turned on and then needs to be turned off at a specific/accurate time. A straightforward way to handle this would be to use an array of millis() start times, with a stored value of 0 acting as the state indicator for "timer not running".

const int NUMBER_SOLENOIDS = 16;
const unsigned long SOLENOID_ON_TIME_MS = 20000; //sample period is 20 seconds
unsigned long solenoid_millis[NUMBER_SOLENOIDS];

In loop():

// to monitor for the switch off after sample time is ended
for (int s = 0; s < NUMBER_SOLENOIDS; s++)
{
  if (solenoid_millis[s] > 0)  //timer is active
  {
    if (millis()-solenoid_millis[s] >= SOLENOID_ON_TIME_MS)
    {
      solenoid_millis[s] = 0UL;
      //command MCP23017 to switch off GPIO
    }
  }
}

When you switch on solenoid s connected to GPIO 0-15:

//command MCP23017 to switch on GPIO
solenoid_millis[s] = millis();

The beauty of doing this type of logic in the loop() is that you don't need to worry about calling Particle.process() or keepAlive() because you are not blocking the loop, and since loop() normally runs about 1000 times per second your timings will be accurate to about a millisecond.
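The same non-blocking idea can be extended to a multi-step relay sequence. The following is only a sketch under assumptions: `SequenceStep` and `SequenceRunner` are hypothetical names, the 16-bit mask is assumed to map one bit to each MCP23017 output, and the current time is passed in (on the device it would be millis()) so the logic can be tested without hardware. Writing the returned mask to the expander over I2C is left to the caller.

```cpp
#include <cstddef>
#include <cstdint>

// One step of a hypothetical sampling sequence: which relay outputs are on,
// and how long to hold that state.
struct SequenceStep {
    uint16_t relayMask;  // bit n set = relay n energised (16 outputs assumed)
    uint32_t holdMs;     // how long to stay in this step
};

class SequenceRunner {
public:
    SequenceRunner(const SequenceStep* steps, std::size_t stepCount)
        : steps_(steps), stepCount_(stepCount) {}

    // Begin the sequence at time `nowMs`.
    void start(uint32_t nowMs) {
        index_ = 0;
        stepStartMs_ = nowMs;
        running_ = stepCount_ > 0;
    }

    // Call on every pass of loop(); returns immediately.
    // Returns the relay mask that should be applied right now (0 when idle).
    uint16_t update(uint32_t nowMs) {
        while (running_ && nowMs - stepStartMs_ >= steps_[index_].holdMs) {
            stepStartMs_ += steps_[index_].holdMs;  // steps run back-to-back
            if (++index_ == stepCount_) running_ = false;
        }
        return running_ ? steps_[index_].relayMask : 0;
    }

    bool running() const { return running_; }

private:
    const SequenceStep* steps_;
    std::size_t stepCount_;
    std::size_t index_ = 0;       // current step
    uint32_t stepStartMs_ = 0;    // when the current step began
    bool running_ = false;
};
```

Because each step's start time is advanced by the step's own duration instead of being reset to the current time, a late pass through loop() shortens the following step instead of stretching the whole sequence.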

HG78 · November 16, 2020, 11:20am

SYSTEM_THREAD(ENABLED); should be used because your application will be running on a separate thread and will not get blocked whilst the connection is being established or re-established if lost.

Great to have a confirmation of this.

I'm just doing a simple "read-and-compare" of the time with an if-statement to activate the sampling sequence. I create the Unix timestamp for the next sample and then compare this to Time.now() so it doesn't matter if I have a power glitch and lose the millis() count. I also use the RTC and SD card to log the time/date that the sample was collected. It seems to work nicely so far.
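For anyone wanting to see the read-and-compare approach spelled out, here is a rough sketch in plain C++. The function names are invented for illustration, and the choice to skip sample slots missed during an outage (instead of replaying them) is an assumption. On the device, `now` would be the value from Time.now() or the RTC.

```cpp
#include <cstdint>

// True when the sample scheduled for `scheduled` (Unix seconds) is due.
inline bool sampleDue(int64_t scheduled, int64_t now) {
    return now >= scheduled;
}

// Given the previously scheduled sample time and a fixed interval (both in
// Unix seconds), return the first scheduled time that is still in the future.
// Slots that passed while the device was off are skipped, not replayed.
inline int64_t nextSampleTime(int64_t scheduled, int64_t intervalSec, int64_t now) {
    if (intervalSec <= 0) return scheduled;  // invalid interval: leave unchanged
    if (scheduled > now) return scheduled;   // already in the future
    int64_t slots = (now - scheduled) / intervalSec + 1;  // slots at or before now
    return scheduled + slots * intervalSec;
}
```

Since the schedule is anchored to wall-clock time, a reset or power glitch only matters if it spans a sample slot; after reboot the next call to `nextSampleTime` lands back on the original grid.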

I2C instructions to the MCP23017 are very quick, so I guess the issue you are worried about is that the solenoid is turned on and then needs to be turned off at a specific/accurate time. A straightforward way to handle this would be to use an array of millis() start times, with a stored value of 0 acting as the state indicator for "timer not running".

This is exactly what I do, but I had a Particle.publish() inside the loop to publish that the sample was being taken. This was then blocking my code.

In general it seems that if I make the following changes everything should be OK:

Use the PublishQueueAsyncRK library so my publish that the sample has been taken won't block.
Use SYSTEM_THREAD(ENABLED); so re-connection attempts won't block.
Use SYSTEM_MODE(SEMI_AUTOMATIC); with a physical switch so I can leave the Cloud functions completely turned off if at a sampling site where internet connection is impossible.
Potentially use Particle.keepAlive() to reduce the likelihood of disconnects, but the system_thread and AsyncRK library should cope even if disconnects occur.

Thanks for all your help. I'm pretty busy right now, but when I've had a chance to test this out on-site with the proper hardware I'll report back.

BerenV · February 15, 2021, 9:46pm

Did you manage to get your system working how it should @HG78? Thanks for the useful information in this thread everyone. I was trying to get one of my monitor systems working more reliably and this helped.

HG78 · February 16, 2021, 5:52pm

Hi @BerenV,
I haven’t been able to get to the remote site that was causing problems due to my country’s COVID restrictions. It’s working nicely in the lab however so I’m hopeful it’ll be ok. I’ll post an update when we finally get back into the field.

system · August 18, 2021, 5:53am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.
