What Can Cause an Assertion Failure?


#1

I have a Photon that was running fine. I put some test code on it and that ran fine. I then recompiled the previously working code on 0.7.0 and loaded it. Now the Photon comes online and promptly blinks red SOS 10 – assertion failure. After a bit, it reboots and does the whole thing again. I’ve tried going into safe mode and reloading the code compiled under 0.6.0, 0.6.1, and 0.6.2 with the same SOS 10.

I can’t find anything that tells me what an assertion failure means. Any help would be appreciated.
Thanks,
Doug


#2

Have you tried the search feature of this forum?

Since we don’t know your code, it’s hard to tell, as this is a somewhat unspecific error (along the lines of “if it’s none of the others, let’s call it ‘assertion failure’”). That said, we have had reports of assertion failures where we were able to help users solve the issue.


#3

Searching the forum is always a good place to start, and I did. None of the results seemed relevant: motor noise, calling the watchdog (I don’t), several unresolved problems, and some stuff that is 4 years old.

My code is fairly large and was working until I refreshed it yesterday. Can I get some hints?

Thanks,
Doug


#4

IIRC, the watchdog was not actually the reason. It only gave rise to the actual issue, a stack overflow, and the circumstances caused that to show up as an assertion failure instead of SOS+13.


#5

Sure. Calling the watchdog with a larger stack made the problem go away. But I don’t see the relevance. I don’t call the watchdog at all.

I’m still looking for hints as to how to approach the problem. Are assertion failures caused by bad calls to system code? Do I just need to start commenting stuff out until it stops? I’m truly clueless here.
Thanks,
Doug


#6

Still: No, the cause is not the watchdog!
It was not calling the watchdog that caused the crash. The crash happened when the watchdog called System.reset() (with all its internal shenanigans) from a thread whose stack was too small for the call to execute successfully.
Consequently it’s irrelevant that your code is not using the watchdog.
It might be something else but similar that’s happening in your code.

Since we do have at least some experience in spotting potentially suspicious constellations of code, I hinted that seeing your code might be a start. But if you can’t believe it might be that way, I’m fine with that too.


#7

OK. If that is the best way to handle it, the code is on github:

It uses some private Particle libraries that I can add if necessary.

Thanks,
Doug


#8

Since I can’t see the implementation of class Measurement, I’m not sure whether this poses a problem, but you are creating a “local” object, copying its pointer, and passing that copy to other global objects.

I’m not entirely sure how STM32 gcc behaves here, but I could imagine that your local object “decays” once the local instance variable goes out of scope, leaving the copied pointers dangling.
So I’d rather be safe than sorry and have a global measurement instance.

One other thing you could try may be to actually create the “local” object like this

  Measurement* measure = new Measurement();

and then pass the actual instance pointer (vs. a mere copy of it)

or

  Measurement measurement;
  // act on the object directly (i.e. measurement.xxx() instead of measure->xxx())  

and then adapt your other objects to take a reference (Measurement& m) instead of a pointer (Measurement* m).
This way no copied pointer is involved, and the object is guaranteed to exist for as long as measurement itself is in scope.


#9

I’ve added the libraries to github for completeness.

Like you, I believe a locally allocated object will be deallocated when it goes out of scope. So the convention Measurement uses is for functions to never retain a passed-in pointer to Measurement, but to copy the Measurement values to a locally valid Measurement. The passed-in Measurement will be free to be deallocated without any dangling references.
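
The copy-the-values convention described above can be sketched in standard C++ as follows. This is only an illustration under stated assumptions: the real Measurement class is not shown in the thread, so the Measurement struct, the Publisher class, and takeReading() here are hypothetical stand-ins.

```cpp
// Hypothetical stand-in for the thread's Measurement class; the real
// implementation is not shown in the thread.
struct Measurement {
    char type = '?';
    char subtype = '?';
    double value = 0.0;
    long time = 0;
};

// Hypothetical consumer that follows the convention: it copies the values
// it is handed and never keeps the caller's pointer.
class Publisher {
public:
    void accept(const Measurement* m) {
        if (m != nullptr) {
            last_ = *m;        // copy the values; the pointer is not retained
            hasValue_ = true;
        }
    }
    bool hasValue() const { return hasValue_; }
    const Measurement& last() const { return last_; }

private:
    Measurement last_;         // the consumer's own copy
    bool hasValue_ = false;
};

// Builds a local Measurement and hands its address to the publisher.
void takeReading(Publisher& publisher) {
    Measurement local;
    local.type = 'I';
    local.subtype = 'H';
    local.value = 38.179054;
    local.time = 1533754532L;
    publisher.accept(&local);
}   // local is destroyed here; publisher still holds its own copy
```

Because accept() stores a copy rather than the pointer, nothing refers to the local object after takeReading() returns. If accept() stored the pointer instead, reading it later would be undefined behaviour.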

Thanks,
Doug


#10

At this point, the problem seems to be hardware related. I flashed the code with 0.6.2 to another Photon and all was well. I replaced the problem Photon in the wind and rain unit with the new one and all was well.

I put the problem Photon in a test jig, flashed it with a bit of test code, and all was well. I flashed with the production code and 0.6.2 and it gave a nice SOS 10.

Things are working again so I’m going to declare victory and thank you for your help.

– Doug


#11

Here is a snippet of the code that caused the SOS 10:


void setup() {

    String measurementString = toString();
    Particle.publish("PUBLISH_CODE",measurementString,PRIVATE);
    delete measurementString;

}

String toString()
{
  return String("{\"Measurement\": ") +
  String("{\"Type\": I, \"Subtype\": H, \"Value\": 38.179054,") +
  String(" \"Time\": 1533754532}}");
}

void loop()
{}

If I remove the “delete measurementString;” all is well. Apparently, 0.6.2 tolerated the delete and 0.7.0 doesn’t. Which leads to a question and a comment.

I can flash a 0.6.2 version of the bad code to a new device and it works fine. I flash a 0.7.0 version of the code to the same device and it fails. Once I’ve done that, 0.6.2 versions will fail also. The question is “What state is preserved across flashes that might cause this?”

The comment is that this device was located in a very inaccessible spot. Putting it in safe mode to flash correct code was difficult. Can there be a way of putting the device in safe mode automatically after an SOS?

Thanks,
Doug


#12

delete is meant for objects that were instantiated via a new expression, and delete wants an object pointer, not an object or an object reference.
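
As a minimal sketch of that pairing rule in standard C++: the example below uses std::string as a stand-in for the Arduino String class (an assumption made so that it runs anywhere), and both function names are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Automatic storage: the object was not created with new, so there is
// nothing to delete. It is destroyed when it goes out of scope.
std::string makeAutomatic() {
    std::string text = "SOS";
    text += " 10";
    return text;               // returned by value; no delete anywhere
}

// Dynamic storage: exactly one delete for each new, applied to the pointer.
std::size_t measureDynamic() {
    std::string* text = new std::string("SOS 10");
    std::size_t length = text->size();
    delete text;               // pairs with the new above
    return length;
}
```

With std::string, writing delete on a non-pointer object does not even compile. The snippet earlier in the thread did compile with the Arduino String, which is why the mistake only showed up at run time.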


#13

Step away from the Arduino String!!! These are notorious for fragmenting the heap, and since these are small systems with no garbage collection, you will get a heap error eventually. Instead, you should be using cstrings:

http://www.cplusplus.com/reference/cstring/


#14

Thanks for the cstring suggestion. I’ll look into it, although I have a number of devices that have been running this code for 6 months or more.

Yeah, I knew delete was wrong when I saw it, ScruffR. Do you have any thoughts on my question and comment?

Thanks,
Doug


#15

If you flash an application targeted at 0.6.2 to a device that already has 0.7.0 system firmware, the system will not be downgraded.

If a device crashes several times during bootup the system OS will put the device in Safe Mode. If your code starts running and then crashes, your code needs to take care of that by calling System.enterSafeMode() - e.g. when your code doesn’t make it to a certain checkpoint for x attempts.
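
The checkpoint idea could be sketched roughly like this. The BootGuard struct is hypothetical and written in plain C++ so the counting logic can be checked on its own. On a device, the attempts counter would have to live in storage that survives a reset, and where to keep it is an assumption the thread does not cover.

```cpp
// Hypothetical helper that counts consecutive runs that never reached a
// checkpoint. The counter must be kept somewhere that survives a reset
// (an assumption; the thread does not say where).
struct BootGuard {
    unsigned attempts;   // consecutive starts without reaching the checkpoint
    unsigned limit;      // the "x attempts" mentioned above

    // Call once at the start of every run. Returns true when the limit has
    // been reached and the caller should request safe mode instead of
    // running the application code.
    bool startRun() {
        if (attempts >= limit) {
            attempts = 0;          // start counting afresh after safe mode
            return true;
        }
        ++attempts;
        return false;
    }

    // Call once the code has made it to the checkpoint.
    void checkpointReached() { attempts = 0; }
};
```

In a sketch, setup() would call startRun() first and, if it returns true, call System.enterSafeMode(). Otherwise the application runs as usual and calls checkpointReached() once it gets past the code that might crash.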