I’ve seen more and more examples of this in the field, so I wanted to share it here.
For some reason, on some devices and from time to time, the publish queue seems to back up, even though I can see the device coming online again and again. I might have been too naive with the publishing times. First, a chart of how the publish queue backs up:
The code surrounding the transmission is as follows, with events typically being 1000-1500 bytes:
// Publish via CloudEvent (structured JSON)
measurementEvent.name(CloudConstants::MEASUREMENT_EVENT_NAME).data(root).contentType(ContentType::STRUCTURED);
Then I have this code:
void ClassCloud::transmitPublishQueue()
{
    // Calculate publish timeout based on number of events in queue
    unsigned long TIMEOUT_START = System.millis();
    unsigned long TIMEOUT_DURATION = CloudConstants::PUBLISH_TIMEOUT_CONSTANT + (PublishQueueExt::instance().getNumEvents() * CloudConstants::PUBLISH_QUEUE_TRANSMIT_DELAY);
    // 20000 milliseconds times number of events in queue as timeout for transmission
    if (Particle.connected())
    {
        networkLogger.info("Transmitting Queue...");
        // Start transmitting queued payloads
        while ((PublishQueueExt::instance().getNumEvents() > 0) && (System.millis() - TIMEOUT_START) <= TIMEOUT_DURATION)
        {
            Class::instance().systemCheckin();
            PublishQueueExt::instance().loop();
            delay(25); // avoid starving the system thread
        }
        PublishQueueExt::instance().loop();
        if (Class::instance().getPublishVitalsFlag())
        {
            Particle.publishVitals();
        }
        ClassCloud::instance().setSignalParameters();
        networkLogger.info("Transmission Completed...");
    }
}
I’m unsure, because with the earlier PublishQueue library I had to set the delay to approx. 1050 ms instead of just 25 ms, but rate limiting should no longer be an issue when running 6.3.3.
The weird thing is that I can see the device going online on schedule, but no events come through, and since these devices are in the field, I can’t see the PublishQueue log.
Lastly, remotely restarting the device seems to let it send the queued payloads again. I’m not entirely sure why; it may be the extra time it stays connected at the beginning of the power-up phase before it goes into the measurement loop.
A stateHandler may not be correct when you call your thread via multiple statements as you have in your code. You might try using this statement only once in your loop(), at the top (or bottom):
PublishQueueExt::instance().loop();
Essentially, you want your thread to run every loop.
You can check out the example on GitHub: PublishQueueExtRK 2-test-suite.cpp, which uses a single statement only in loop().
You shouldn't use the library that way. It wasn't designed to work that way, and it will almost certainly fail. You need to call PublishQueueExt::instance().loop(); on every loop, preferably without any delays. It takes care of publishing only when the cloud connection is up, and it handles retry and rate limiting.
If you only want to publish at certain times under your control, use the setPausePublishing() method to pause publishing when you don't want to publish. This causes events to be queued even if the cloud connection is up. If you need to know when it's done, use getCanSleep(), which returns true when the queue is empty and all transmissions are complete.
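As a rough illustration of that pattern, here is a host-side sketch that can be compiled and run off-device. MockPublishQueue, setConnected(), sentCount() and appLoopOnce() are hypothetical stand-ins invented for this example; the mock only borrows the method names mentioned above and does not reproduce the library's retry or rate-limiting behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical host-side stand-in for the publish queue. It mirrors only the
// method names mentioned in this thread; it is NOT the library implementation.
class MockPublishQueue {
public:
    void publish(const std::string &event) { queue_.push_back(event); }
    void setPausePublishing(bool value) { paused_ = value; }
    std::size_t getNumEvents() const { return queue_.size(); }

    // True once nothing is queued (the mock has no in-flight transmissions).
    bool getCanSleep() const { return queue_.empty(); }

    // Test hooks (assumptions, not part of any real API).
    void setConnected(bool value) { connected_ = value; }
    std::size_t sentCount() const { return sent_; }

    // Called on every pass; sends at most one event per call when allowed.
    void loop() {
        if (paused_ || !connected_ || queue_.empty()) {
            return;
        }
        queue_.pop_front();
        ++sent_;
    }

private:
    std::deque<std::string> queue_;
    bool paused_ = false;
    bool connected_ = false;
    std::size_t sent_ = 0;
};

// One pass of the application loop: the queue is serviced unconditionally,
// and the transmit window is controlled only through the pause flag.
// Returns true when it would be safe to disconnect or sleep.
bool appLoopOnce(MockPublishQueue &q, bool transmitWindowOpen) {
    q.setPausePublishing(!transmitWindowOpen);
    q.loop();
    return transmitWindowOpen && q.getCanSleep();
}
```

The point of the sketch is that loop() is called on every pass with no timeout loop or delay() wrapped around it; the application decides when to transmit by toggling the pause flag and decides when to disconnect by polling getCanSleep().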
I’ll refactor it to be called from loop() instead; however, I’m unsure whether it will work properly, since a single pass through loop() might take 6 hours or so, e.g. with the following code:
void loop()
{
    // Measurement Cycle
    Class::instance().measurementSequence(Class::instance().getNumberOfMeasurements(), Class::instance().getSleepTime(), true); // E.g. measure every 2 min, transmit every 6 hours
    // Check for OTA updates, if one is pending - update
    ClassCloud::instance().handleUpdates();
    // Check for Device configuration
    ClassCloud::instance().getConfiguration();
    // Check for Sensor configurations
    ClassCloud::instance().getSensorConfiguration();
    // Sync Time
    ClassCloud::instance().syncTime(CloudConstants::TIME_SYNC_TIMEOUT);
    // Disconnect gracefully and turn off cellular module
    ClassCloud::instance().disconnect();
    // Process Background Tasks if required
    Class::instance().processBackgroundTasks();
}
You should restructure your code to use a finite state machine instead of a long-running loop.
While the system thread allows the cellular modem to work while blocking loop, other things like Particle.function, Particle.variable, serial events, and some other things only dispatch after returning from loop.
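A minimal sketch of what such a state machine could look like for a measure, sleep, and transmit cycle is below. It is plain C++ so it can be run on a host; the State names, the Cycle and Inputs structs, and step() are all hypothetical and stand in for whatever the real firmware would read from the sensors, the sleep API, and the publish queue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical states for a measure / sleep / transmit cycle.
enum class State { Measure, Sleep, Connect, Publish, Disconnect };

struct Cycle {
    State state = State::Measure;
    uint32_t measurementsTaken = 0;
    uint32_t measurementsPerBatch = 180; // assumed configurable batch size
    uint32_t batchesCompleted = 0;
};

// Inputs the real firmware would obtain from hardware or the cloud library;
// here they are plain flags so the sketch runs on a host.
struct Inputs {
    bool wokeUp = false;         // the sleep interval has elapsed
    bool cloudConnected = false; // the cloud connection is up
    bool queueEmpty = false;     // the publish queue has drained
};

// Perform at most one transition and return immediately, so the caller's
// loop() keeps returning and anything serviced once per loop keeps running.
void step(Cycle &c, const Inputs &in) {
    switch (c.state) {
    case State::Measure:
        ++c.measurementsTaken;
        c.state = (c.measurementsTaken >= c.measurementsPerBatch)
                      ? State::Connect
                      : State::Sleep;
        break;
    case State::Sleep:
        if (in.wokeUp) {
            c.state = State::Measure;
        }
        break;
    case State::Connect:
        if (in.cloudConnected) {
            c.state = State::Publish;
        }
        break;
    case State::Publish:
        if (in.queueEmpty) {
            c.state = State::Disconnect;
        }
        break;
    case State::Disconnect:
        c.measurementsTaken = 0;
        ++c.batchesCompleted;
        c.state = State::Sleep;
        break;
    }
}
```

Each call to step() does a small amount of work and returns, so a loop() built around it can also service the publish queue on every pass instead of blocking for the whole measurement sequence.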
I'm working on refactoring into an FSM now; however, I wanted to hear your advice on how a measurement sequence with sleep intervals should be structured in terms of best practice.
We typically measure every 2 minutes and sleep between measurements - after 180 measurements (configurable), they're packaged and transmitted.
Sleeping for less than 15 minutes with cellular off is not recommended because frequent reconnection can cause your SIM to be banned by the mobile network.
Sleep with cellular on is an option for Gen 3 (nRF52, Boron, B-SoM, Tracker) but it saves little power because the cellular modem is what is consuming the most power, not the MCU. Sleep with cellular on is not supported on Gen 4 (M-SoM).
I sleep with cellular turned off every 2 minutes; it's actually off over the whole sequence that measures every 2 minutes, and it is only turned on after 180 measurements.
But I think I have the solution now; I just need to do some longer-term testing and test the power consumption.