Publish event not showing up in console logs, inconsistently triggering IFTTT, not triggering webhook

I have a 3G Electron running with system threading enabled in semi_automatic mode.

It has been publishing data to the Particle Cloud every 15 minutes for the past 4 days without issue. The publish event triggers an IFTTT recipe to add the published data to a Google Spreadsheet, and also triggers a webhook to send the data to a ThingSpeak channel.

The problem is, yesterday at 10 pm GMT my Electron published its last data successfully, after which things got wonky. The IFTTT recipe triggered twice more (2 am GMT and 10 am GMT), and the webhook didn’t get triggered at all after 10 pm GMT yesterday, despite the fact that the Electron was attempting to publish every 15 minutes.

I then loaded the Particle Console logs web service in my browser, and used the cURL command there to begin monitoring the logs via the command terminal on my PC.

Without resetting the Electron, I opened up the serial monitor interface to it using PuTTY and began manually making it publish to the Particle Cloud to see if the publish event would show up in either the online console log or on my command terminal.

The result is that the Particle.publish() call in my firmware returns true, but the publish event doesn’t show up in the live log feed in either the Particle Console or on my command terminal.

What’s even stranger is that the IFTTT recipe now appears to be triggering again some of the time (about 50%), whereas the webhook is not being triggered at all.

About 10% of the time, the Particle.publish() call returns true, and then a subsequent check of Particle.connected() returns false and the RGB indicator LED on board the Electron starts rapid cyan flash (trying to connect to particle cloud) and then fails with rapid red blinks… then repeats… and repeats… and repeats… until it reconnects to the Particle Cloud after about 10 tries.

The remaining 40% of the time Particle.publish() appears to return true but doesn’t show up anywhere on the cloud (not in console, my terminal, IFTTT trigger, or webhook trigger) and Particle.connected() still returns true.

Here is a screenshot of the console log, which clearly shows the IFTTT count incrementing and decrementing despite there being no publish events being listed.

So, can someone help me answer the following questions:

  • why would Particle.publish() return true if it is not being logged on Particle console, IFTTT, or webhook, and a subsequent call to Particle.connected() returns false?

  • Why would the ifttt-trigger-event-check event increment and then decrement without a publish event showing up in the Particle console logs?

  • What does the ifttt-trigger-event-check “count” variable actually count? My IFTTT recipe has triggered several hundred times, so why is the count only at a few dozen?

  • Why would an IFTTT recipe get triggered but not a webhook?

  • Is there a better way for me to confirm that a publish event has actually taken place other than looking at my google spreadsheet, ThingSpeak channel, and using the online/command terminal log utility?


UPDATE:

If I press the reset button, everything starts working correctly again. Obviously this isn’t a real solution, but may help one of you speak to my problem with a bit more information.

This suggests that your device's firmware may have an issue. For example, you may be overflowing the stack, overflowing a variable, or even running into heap fragmentation (e.g. from over-reliance on the String class).

What does your code look like?


Because the return value only indicates a successful enqueuing of the event request to the background task for delivery, but does not give you any feedback about the actual delivery. This might fail due to several reasons, including what @BulldogLowell said.

This might also answer some of your other questions (in parts)
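Since a true return only means "queued", one option is to track confirmed deliveries yourself and react when several go missing in a row. Below is a minimal sketch of that idea, not anything from the Particle API: PublishMonitor, record() and needsRecovery() are hypothetical names, and how you obtain the confirmation is an assumption (for example a WITH_ACK publish if your Device OS version offers it, or subscribing to your own event).

```cpp
#include <cstdint>

// Hypothetical helper: counts publishes that were actually confirmed,
// independent of what the enqueue call returned.
struct PublishMonitor {
    uint32_t attempts = 0;
    uint32_t confirmed = 0;
    uint32_t consecutiveFailures = 0;

    // Call once per publish attempt with the confirmation result.
    void record(bool wasConfirmed) {
        ++attempts;
        if (wasConfirmed) {
            ++confirmed;
            consecutiveFailures = 0;   // a success clears the failure streak
        } else {
            ++consecutiveFailures;
        }
    }

    // True once `limit` publishes in a row went unconfirmed.
    bool needsRecovery(uint32_t limit) const {
        return consecutiveFailures >= limit;
    }
};
```

What "recovery" means is up to the application; reconnecting to the cloud or resetting the device after a few unconfirmed publishes are two possibilities.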


If memory fragmentation is my issue, then I gather that the fragmentation gets worse and worse the longer my application runs. This may explain why I got this issue after the code ran for 4 days. I’m guessing that this would cause a failure because it could potentially create a situation where the background system task responsible for communicating with the uBlox modem to publish my data to the cloud would not be able to allocate memory for the serial comms buffers between the MCU and modem?

What would be an appropriate test to determine if it is a fragmentation issue vs. an overflow issue vs some other unknown issue?

Boring old C strings (char arrays) are the safest alternative - and my preferred choice (FWIW :blush:)


Use simple C-style strings (null terminated char arrays).
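As a rough illustration of what that can look like, here is a sketch that formats a reading into a fixed-size char buffer with snprintf, so nothing is allocated on the heap. The function name formatPayload, the JSON field names and the buffer size are made up for the example; they are not taken from the original firmware.

```cpp
#include <cstddef>
#include <cstdio>

// Formats a reading into a caller-supplied buffer. Returns false if the
// text did not fit (snprintf still null-terminates the truncated result
// as long as len is greater than zero).
bool formatPayload(char* buf, size_t len, double temperature, unsigned long seq) {
    int n = std::snprintf(buf, len, "{\"seq\":%lu,\"temp\":%.1f}", seq, temperature);
    return n >= 0 && static_cast<size_t>(n) < len;
}
```

A caller would declare something like char payload[64]; once, pass it in with sizeof(payload), and hand the buffer to the publish call only when the function returns true.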

I’d start with commenting out all of the serial debug String manipulation (if you are not using the Serial monitor to debug your issues) and see if that improves the time that your device continues to run.


Okay thanks buds! I’ll try that out and see what happens.

Still seems strange though that IFTTT was being triggered despite no publish event being logged in the Particle Cloud. That seems like more of a server-side issue than firmware issue.

Well, it would appear that removing all of the serial monitor related code from my application has increased the amount of time that my Electron can run from 4 days to 6 days.

Now it appears that after about 6 days of running without missing a scheduled publish event, the Electron goes into a rapid green blink and cannot get any further than that. Hitting the reset button fixes the problem.

I am confused as to how to proceed with troubleshooting from here, other than to write a simple program that publishes to the cloud every hour and see how many days it can run for.

Also, regarding heap fragmentation: if this is a problem related to the heap getting fragmented after running for a long time, then wouldn’t the Electron reset itself as soon as it got to a line of code that resulted in a hard fault (i.e. SOS blink)? So, in other words, me pressing the reset button has the same effect as the Electron encountering a hard fault?

Nope, if the allocation attempt fails and that failing is treated courteously (as is expected of the system firmware) then only the function trying to allocate will fail but not crash the whole system.
SOS panics usually happen when the code runs into such conditions unprepared (as user code often does).
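To illustrate the difference in user code, here is a small sketch of handling a failed allocation gracefully. copyOrNull is a hypothetical helper, not part of any Particle library; the point is only that the result of malloc is checked before it is used.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Copies `src` into a freshly allocated buffer. Returns nullptr instead of
// crashing when the heap cannot satisfy the request. Caller must free().
char* copyOrNull(const char* src) {
    size_t len = std::strlen(src) + 1;          // include the terminator
    char* dst = static_cast<char*>(std::malloc(len));
    if (dst == nullptr) {
        return nullptr;   // allocation failed: report it, do not dereference
    }
    std::memcpy(dst, src, len);
    return dst;
}
```

Code that skips the nullptr check and writes through the pointer anyway is the "unprepared" case that tends to end in a fault.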

Not quite. Running into a hard fault causes the system to reset the device which will then have the same (or similar) effect as a RESET button press.


I have one variable that is definitely overflowing, but I'm not sure that it would affect the program behaviour. It is basically this:

char testChaBefore = 'a';
unsigned long numLoops = 0;
char testCharAfter = 'b';
void loop(){
   numLoops++;
   // etc etc....
}

I wrote a test program that repeatedly overflows numLoops in the way shown above, but it did not appear to have any effect on the program continuing to run. When it overflows, numLoops just goes back to 0, and testChaBefore and testCharAfter both remain unaffected.

I've removed the numLoops++ line and reflashed my Electron. Now to see how long it lasts!

In that context, it is not really possible to see if that will have any effects related to your issue, though one must wonder what numLoops could possibly have to do with the price of tea in China… It suggests that numLoops is some sort of a timer, and that doesn’t seem particularly useful, considering all of the alternatives, but who knows except you.

I saw the related posts concerning String class. Have you gone through all of that and tried to get everything over to C style strings?

It seems that your issue is still eluding you. You can try to 1) post your code and see if someone can help identify your issues or 2) find a programmer that can assist you. Right now I’d only be guessing if I offered up help.


BTW, the wrap-round of your counter is not what’s meant when talking about the dreaded effects of “overflowing” variables.
That’s usually in connection with buffers and arrays (buffer overflow) or out-of-bounds access.
While a wrap-round of numeric variables is a normal procedure with (usually) no ill side effects, buffer overflow will usually corrupt other data (including jump addresses on the stack) that will inevitably cause trouble.
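A short sketch of the two cases side by side, with made-up names (nextCount, storeSample) that are not from the code in this thread: the first function shows the harmless wrap of an unsigned counter, the second shows the kind of bounds check that keeps an array write from spilling into neighbouring memory.

```cpp
#include <cstddef>
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^N; this is well defined in C and C++.
uint32_t nextCount(uint32_t n) {
    return n + 1;   // 0xFFFFFFFF + 1 becomes 0
}

// Bounds-checked store: refuses the write instead of clobbering whatever
// sits after the array in memory.
bool storeSample(int* samples, size_t capacity, size_t index, int value) {
    if (index >= capacity) {
        return false;   // out of bounds: nothing is written
    }
    samples[index] = value;
    return true;
}
```

Without the index check, a write one element past the end would land on whatever the compiler placed next, which is exactly the corruption described above.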

I might have missed it, but if you see an SOS code, what’s the blink count in between?
One possible scenario when a wrap-round of a counter might cause an SOS is when this leads to a DIV/0 exception (SOS+13).
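For example, if a counter like numLoops were ever used as a divisor (an assumption, the posted snippet does not show this), a guard along these lines would avoid the exception. averagePerLoop is a hypothetical function written for this illustration.

```cpp
#include <cstdint>

// Average time per loop. Returns 0 when the counter is 0 (for example
// right after a wrap-round) instead of dividing by zero.
uint32_t averagePerLoop(uint32_t totalMillis, uint32_t numLoops) {
    if (numLoops == 0) {
        return 0;
    }
    return totalMillis / numLoops;
}
```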
