What Can Cause an Assertion Failure?

DougJohnson · July 26, 2018, 11:44pm

I have a Photon that was running fine. I put some test code on it and that ran fine. I then recompiled the previously working code on 0.7.0 and loaded it. Now the Photon comes online and promptly blinks red SOS 10 – assertion failure. After a bit, it reboots and does the whole thing again. I’ve tried going into safe mode and reloading the code compiled under 0.6.0, 0.6.1, and 0.6.2 with the same SOS 10.

I can’t find anything that explains what an assertion failure is telling me. Any help would be appreciated.
Thanks,
Doug

ScruffR · July 27, 2018, 7:05am

Have you tried the search feature of this forum?

Since we don’t know your code, it’s hard to tell, as this is a somewhat unspecific error (like “if it’s none of the others, let’s call it ‘assertion failure’”), but we have had reports of assertion failures where we were able to help users solve the issue.

DougJohnson · July 27, 2018, 1:33pm

Searching the forum is always a good place to start and I did. None seemed relevant. Motor noise, calling the watch dog (I don’t), several unresolved problems, and some stuff that is 4 years old.

My code is fairly large and was working until I refreshed it yesterday. Can I get some hints?

Thanks,
Doug

ScruffR · July 27, 2018, 2:22pm

IIRC, that was not actually the reason, but only gave rise to the actual issue of stack overflow and the respective circumstances caused it to show up as assertion fault instead of SOS+13.

DougJohnson · July 27, 2018, 3:20pm

Sure. Calling the watch dog with a larger stack made the problem go away. But I miss the relevance. I don’t call watch dog at all.

I’m still looking for hints as to how to approach the problem. Are assertion failures caused by bad calls to system code? Do I just need to start commenting stuff out until it stops? I’m truly clueless here.
Thanks,
Doug

ScruffR · July 27, 2018, 3:35pm

Still: No, the cause is not the watchdog!
It was not calling the watchdog that caused the crash, but the watchdog calling System.reset() (with all its internal shenanigans) from a thread whose stack was too small for it to execute successfully.
Consequently it's irrelevant that your code is not using the watchdog.
It might be something else but similar that's happening in your code.

Since we do have at least some experience in spotting potentially suspicious constellations of code, I hinted that seeing your code might be a start. But if you can't believe it might be that way, I'm fine with that too.

DougJohnson · July 27, 2018, 8:12pm

OK. If that is the best way to handle it, the code is on github:

It uses some private Particle libraries that I can add if necessary.

Thanks,
Doug

ScruffR · July 28, 2018, 8:34am

Since I can’t see the implementation of class Measurement, I’m not sure whether this might pose a problem, but you are creating a “local” object, copying its pointer and passing that copy to other global objects.

I’m not entirely sure about the actual behaviour of STM32 gcc when copying an instance pointer in regards to reference count, I could imagine that your local object “decays” once the local instance variable goes out of scope.
So I’d rather be safe than sorry and have a global measurement instance.

One other thing you could try may be to actually create the “local” object like this

  Measurement* measure = new Measurement();

and then pass the actual instance pointer (vs. a mere copy of it)

or

  Measurement measurement;
  // act on the object directly (i.e. measurement.xxx() instead of measure->xxx())

and then adapt your other objects to take a reference (Measurement& m) instead of a pointer (Measurement* m).
This way you can be absolutely sure, the reference counter will be managed correctly.
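
For reference, plain C++ pointers carry no reference count: a local object is destroyed when its scope ends, no matter how many copies of its address exist. Below is a minimal sketch in portable C++ of that behaviour and of the by-reference alternative. The `Measurement` struct, its `live` counter, and the two functions are hypothetical stand-ins, not the classes from the repository.

```cpp
#include <cstdio>

// Hypothetical stand-in for the Measurement class discussed above.
struct Measurement {
    static int live;          // number of instances currently alive
    double value = 0.0;
    Measurement()  { ++live; }
    Measurement(const Measurement& other) : value(other.value) { ++live; }
    ~Measurement() { --live; }
};
int Measurement::live = 0;

// Takes a reference and uses it only for the duration of the call.
double readValue(const Measurement& m) {
    return m.value;
}

// Returns how many Measurement objects are alive after the local one
// has gone out of scope, even though a copy of its address was made.
int liveAfterScope() {
    {
        Measurement measurement;                    // automatic (stack) object
        measurement.value = 38.179054;
        Measurement* copyOfAddress = &measurement;  // copies the address only
        std::printf("inside scope: %d alive, value %f\n",
                    Measurement::live, readValue(*copyOfAddress));
    }   // measurement is destroyed here; a saved pointer would now dangle
    return Measurement::live;
}
```

Passing `const Measurement&` is safe as long as the callee does not store the reference beyond the call; whether the real code does that cannot be told from this thread.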

DougJohnson · July 28, 2018, 6:25pm

I’ve added the libraries to github for completeness.

Like you, I believe a locally allocated object will be deallocated when it goes out of scope. So the convention Measurement uses is for functions to never retain a passed-in pointer to Measurement, but to copy the Measurement values to a locally valid Measurement. The passed-in Measurement will be free to be deallocated without any dangling references.
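
The copy convention described above could look roughly like this. It is only a sketch in portable C++: the field layout of `Measurement`, the `Publisher` class, and `takeReading` are assumptions made for illustration, not the code from the repository.

```cpp
// Hypothetical types; the real Measurement class is not shown in the thread.
struct Measurement {
    char type;
    char subtype;
    double value;
    long time;
};

class Publisher {
public:
    // Copies the values out of the caller's object; the pointer itself
    // is not kept after this call returns.
    void accept(const Measurement* m) {
        if (m != nullptr) {
            last_ = *m;
            hasValue_ = true;
        }
    }
    bool hasValue() const { return hasValue_; }
    const Measurement& last() const { return last_; }

private:
    Measurement last_{};     // locally owned copy
    bool hasValue_ = false;
};

// The caller's Measurement is local and is gone when this returns,
// but the Publisher still holds its own copy of the values.
void takeReading(Publisher& publisher) {
    Measurement local{'I', 'H', 38.179054, 1533754532L};
    publisher.accept(&local);
}
```

Because only values are copied, nothing in `Publisher` refers to the caller's object after `accept` returns, so there is no dangling pointer to worry about.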

Thanks,
Doug

DougJohnson · July 28, 2018, 11:20pm

At this point, the problem seems to be hardware related. I flashed the code with 0.6.2 to another Photon and all was well. I replaced the problem Photon in the wind and rain unit with the new one and all was well.

I put the problem Photon in a test jig, flashed it with a bit of test code, and all was well. I flashed with the production code and 0.6.2 and it gave a nice SOS 10.

Things are working again so I’m going to declare victory and thank you for your help.

– Doug

DougJohnson · August 8, 2018, 7:18pm

Here is a snippet of the code that caused the SOS 10:


void setup() {

    String measurementString = toString();
    Particle.publish("PUBLISH_CODE",measurementString,PRIVATE);
    delete measurementString;

}

String toString()
{
  return String("{\"Measurement\": ") +
  String("{\"Type\": I, \"Subtype\": H, \"Value\": 38.179054,") +
  String(" \"Time\": 1533754532}}");
}

void loop()
{}

If I remove the “delete measurementString;” all is well. Apparently, 0.6.2 tolerated the delete and 0.7.0 doesn’t. Which leads to a question and a comment.

I can flash a 0.6.2 version of the bad code to a new device and it works fine. I flash a 0.7.0 version of the code to the same device and it fails. Once I’ve done that, 0.6.2 versions will fail also. The question is “What state is preserved across flashes that might cause this?”.

The comment is that this device was located in a very inaccessible spot. Putting it in safe mode to flash correct code was difficult. Can there be a way of putting the device in safe mode automatically after an SOS?

Thanks,
Doug

ScruffR · August 8, 2018, 7:26pm

delete is meant for objects that were instantiated via new, and delete wants an object pointer, not an object or an object reference.
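
To illustrate the pairing, here is a small sketch in portable C++. It uses `std::string` as a stand-in for the Particle `String` class (an assumption, since the two are different types), and the function names are made up for the example.

```cpp
#include <cstddef>
#include <string>

// Builds the payload from the snippet above as an ordinary value.
std::string buildPayload() {
    return std::string("{\"Measurement\": ") +
           std::string("{\"Type\": I, \"Subtype\": H, \"Value\": 38.179054,") +
           std::string(" \"Time\": 1533754532}}");
}

// Automatic object: created without new, cleaned up on its own at the
// end of the scope, so there is nothing to delete.
std::size_t automaticLength() {
    std::string payload = buildPayload();
    return payload.size();
}

// Heap object: created with new, so exactly one matching delete on the
// pointer is required.
std::size_t heapLength() {
    std::string* payload = new std::string(buildPayload());
    std::size_t n = payload->size();
    delete payload;
    return n;
}
```

In the setup() snippet above, measurementString is of the first kind, so simply dropping the delete line is the fix.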

peekay123 · August 8, 2018, 7:36pm

Step away from the Arduino String!!! These are notorious for fragmenting the heap, and since these are small systems with no garbage collection, you will get a heap error eventually. Instead, you should be using cstrings:

http://www.cplusplus.com/reference/cstring/

DougJohnson · August 8, 2018, 7:41pm

Thanks for the cstring suggestion. I’ll look into it, although I have a number of devices that have been running this code for 6 months or more.

Yeah, I knew delete was wrong when I saw it, ScruffR. Do you have any thoughts on my question and comment?

Thanks,
Doug

ScruffR · August 8, 2018, 7:50pm

If you flash an application targeted for 0.6.2 to a device with 0.7.0 system OS firmware, the system will not be downgraded.

If a device crashes several times during bootup the system OS will put the device in Safe Mode. If your code starts running and then crashes, your code needs to take care of that by calling System.enterSafeMode() - e.g. when your code doesn't make it to a certain checkpoint for x attempts.
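
The checkpoint idea could be sketched as below. This is portable C++ that only models the counting logic; `BootGuard` and its members are hypothetical names. On a real device the counter would have to live in memory that survives a reset (for example retained memory, which is an assumption here), and the place where `onBoot()` returns true is where System.enterSafeMode() would be called.

```cpp
// Hypothetical helper: counts boots that never reached a "healthy" checkpoint.
struct BootGuard {
    unsigned failedBoots;   // must be kept in memory that survives a reset
    unsigned limit;         // the "x attempts" mentioned above

    // Call first thing at startup. Returns true when the limit of
    // unfinished boots has been reached and safe mode should be entered.
    bool onBoot() {
        if (failedBoots >= limit) {
            failedBoots = 0;    // give the application a fresh chance next time
            return true;
        }
        ++failedBoots;          // assume failure until the checkpoint says otherwise
        return false;
    }

    // Call once the code has run past the point where it used to crash.
    void onCheckpoint() { failedBoots = 0; }
};
```

With a limit of 3, three boots in a row that never reach `onCheckpoint()` make the fourth boot request safe mode; a single successful checkpoint resets the count.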
