Particle publish permanently blocks when internet goes down

So look at this doc:

Especially this statement:

Overview of API field limits
Publish/Subscribe Event Data 622 bytes

Maybe this is the case :slight_smile:
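
To rule the size limit in or out, the payload length can be checked before publishing. Below is a minimal sketch in plain C++ with no Particle calls. The names `kMaxEventData` and `fitsEventLimit` are made up for illustration, and 622 is simply the figure quoted above, so it is an assumption that may not hold for every Device OS version.

```cpp
#include <cstddef>
#include <cstring>

// Limit quoted from the docs above. Treat it as an assumption and verify it
// against the documentation for the Device OS version in use.
constexpr std::size_t kMaxEventData = 622;

// Returns true if the null-terminated payload fits in the event data field.
// The length is counted in bytes, so multi-byte UTF-8 characters count in full.
bool fitsEventLimit(const char* data, std::size_t limit = kMaxEventData) {
    return data != nullptr && std::strlen(data) <= limit;
}
```

In the loop, this check would go right before the publish call, with a log line whenever it returns false.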

You’re ignoring the fact that everything works until wifi is killed.

Sorry, I just want to help.
I don’t know what happens to your data inside the loop when WiFi is killed, especially with the system thread enabled. Maybe the data is somehow growing. I can’t tell.

Also, I’m not ignoring anything. As I mentioned before, I use just Particle.connected(), because if it returns true, WiFi.ready() can be considered true as well.

One more idea is to modify the debug part of your code slightly, like this:

if (WiFi.ready() && Particle.connected())
{
    Serial.println("Before");
    Particle.publish("name", data, PRIVATE, NO_ACK);
    Serial.println("After");
}

Then you can determine whether the issue is with WiFi.ready()/Particle.connected() or with Particle.publish() itself.
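
If "After" does show up, it can also help to know how long the call took. Here is a rough sketch of a timing wrapper in standard C++. `elapsedMs` is a hypothetical helper, not a Particle API, and on a device the same idea would more likely be written with two millis() readings.

```cpp
#include <chrono>
#include <utility>

// Runs fn() once and returns how long the call took, in milliseconds.
template <typename Fn>
long long elapsedMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A call that never returns will of course never report a time, so the "Before" print is still needed.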

Nope, I didn’t say you cannot use Serial.print() or Serial1.write() at all when using Particle.publish(). What I meant was that using either of them in a system callback or ISR may lead to issues and should be avoided.

Why is it not possible?
For debugging such issues it’s always best to break out the suspected sections that cause and/or exhibit the problem into a minimal test case and that should well be postable :wink:

Thanks for the help so far,
I wasn’t listening for system events. That was added to debug the issue.

There is proprietary stuff in there, and it’s talking to another proprietary device on Serial1. So far I have been unable to figure out what could be causing the failure, so I can’t strip down the firmware. I am not using any threads, ISRs, or external libraries, and even the blocking version of Particle.publish fails in the same way. If you or someone else could point me to a list of things that can cause the blocking version of Particle.publish to end up with solid cyan, I could perhaps create a minimal test case that others can replicate.

Solid LED colors should never occur. They occur because the system locks up and prevents the LED from changing color, breathing, or blinking. It’s not really meaningful that it’s solid cyan; that was just the color prior to locking up. Solid red can be an exception, because the system most likely locked up while outputting an SOS indication.

There are some things that are known to be able to cause it, however:

  • Disabling interrupts forever, or entering an infinite loop with interrupts disabled.

  • Blocking on a mutex while in a noInterrupts() or SINGLE_THREADED_BLOCK with the mutex held by another thread. This deadlocks the system because thread swaps are prohibited, but the mutex can never become free without thread swapping. Note that many things have internal mutexes that can cause this: SPI transactions, Log.* calls, and anything that uses the cloud (Particle.publish) or network (Cellular or WiFi) may block on a resource mutex internally.

  • Severely corrupting RAM in a way that confuses FreeRTOS and scheduling gets stuck. Freeing memory twice, uninitialized pointers, overwriting the end of blocks or the stack, all have unpredictable side-effects that may not immediately crash, but could cause problems later.
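
The mutex case can be modeled in plain C++. This is only a sketch under assumptions: `ResourceLock` and `logFromCriticalSection` are hypothetical stand-ins, not Device OS APIs. It shows one possible way to avoid the deadlock, which is to try the lock and bail out instead of blocking when thread swaps are prohibited.

```cpp
#include <atomic>

// Hypothetical stand-in for an internal resource mutex (logging, cloud, SPI).
class ResourceLock {
public:
    // Non-blocking attempt: returns true only if the lock was free.
    bool tryLock() { return !held_.exchange(true, std::memory_order_acquire); }
    void unlock() { held_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> held_{false};
};

// Models code that runs while thread swaps are prohibited. A blocking lock
// here could wait forever, because the owner can never run to release it.
// This version gives up instead and reports that the work was skipped.
bool logFromCriticalSection(ResourceLock& lock, int& linesWritten) {
    if (!lock.tryLock()) {
        return false;  // lock is held elsewhere; do not wait
    }
    ++linesWritten;    // the protected work
    lock.unlock();
    return true;
}
```

The simpler rule is to keep anything that may take such a lock out of those sections entirely.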

Maybe this template could help. Answer the two questions at the top, comment/uncomment the appropriate lines, then run the code to see whether it runs OK. I hope it does.
Then, keep adding your code to it until it breaks. Then, please share your code & results with us, if you like. Hopefully, this process will help you quickly discover any underlying issue(s) in the firmware.

I have been wanting to make this template for a while so I didn’t want to pass this up. I ran it on an Argon with Device OS 2.0.0-rc.2

//  device  ???
//  device os ???

const uint32_t baudRate = 115200;
int errCode = 0;

const char *someData = "Could be a large amount of data.......!";
char msg512[512];                                      


SerialLogHandler logHandler; // Use primary serial over USB interface for logging output

//  choose your SYSTEM_MODE       uncomment one statement to activate or skip
//SYSTEM_MODE(AUTOMATIC);         //  default SYSTEM_MODE is AUTOMATIC
//SYSTEM_MODE(SEMI_AUTOMATIC);          
//SYSTEM_MODE(MANUAL);        

SYSTEM_THREAD(ENABLED);         //  comment out to deactivate

void setup()
{   Serial.begin(baudRate); //Initialize serial port
    delay(1s);
    
    // Log some messages with different logging levels
    Log.info("This is info message");
    Log.warn("This is warning message");
    Log.error("This is error message, error=%d", errCode);

    // Format text message
    Log.info("System version: %s", (const char*)System.version());
    
    
    //  your code .......
}

void loop()
{   //  your code .......

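    snprintf(msg512, sizeof(msg512), "%s", someData);   //  bounded copy, cannot overrun msg512

    if (Particle.connected()) {
        Particle.publish("eventName", msg512, PRIVATE); //  modify to meet data structure in "your code" above
        Log.info("%s", msg512);                         //  pass data as an argument, not as the format string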
    }

    delay(1s);
}

Wow, great discussion. Very informative, as I have a similar problem. I’m using a Photon in an automotive application that publishes data only when connected to the home WiFi. I call System.sleep() to conserve power. If publish() is unsuccessful, it stores the data and retries 10 minutes later. This part works great.
However, on occasion, if the home WiFi is active but internet connectivity is down, the Photon subsequently stops attempting to connect, even after internet connectivity has been restored. Recovering requires a hard power cycle on the vehicle.
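
For anyone who wants to try the store-and-retry part, here is a rough sketch of the idea in plain C++. It is not my actual firmware: `PendingQueue` and its methods are hypothetical names, and the `send` callback stands in for whatever wraps the publish call and reports success.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Bounded FIFO of readings that could not be published yet.
class PendingQueue {
public:
    explicit PendingQueue(std::size_t capacity) : capacity_(capacity) {}

    // Stores a reading; drops the oldest one when the queue is full.
    void push(const std::string& reading) {
        if (capacity_ == 0) return;
        if (items_.size() == capacity_) items_.pop_front();
        items_.push_back(reading);
    }

    // Sends readings oldest first and stops at the first failure, so the
    // rest are kept for the next retry. Returns the number of readings sent.
    std::size_t flush(const std::function<bool(const std::string&)>& send) {
        std::size_t sent = 0;
        while (!items_.empty() && send(items_.front())) {
            items_.pop_front();
            ++sent;
        }
        return sent;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::string> items_;
};
```

This version allocates on the heap through std::string and std::deque. On a small device, a fixed-size array of buffers may be the safer choice.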

I have had this issue with my Photons through all of the OS updates. Our workaround is a high-priority hardware interrupt that acts as a watchdog and resets the system.
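
The timing logic behind a watchdog like that can be sketched in plain C++. This is only a model under assumptions, not the interrupt code described above: `Watchdog`, `kick`, and `expired` are hypothetical names, and timestamps are assumed to come from a 32-bit millisecond counter such as millis().

```cpp
#include <cstdint>

// Software model of a watchdog: expired() becomes true when kick() has not
// been called within timeoutMs.
class Watchdog {
public:
    Watchdog(std::uint32_t timeoutMs, std::uint32_t nowMs)
        : timeoutMs_(timeoutMs), lastKickMs_(nowMs) {}

    // Called from healthy code paths to signal that the system is alive.
    void kick(std::uint32_t nowMs) { lastKickMs_ = nowMs; }

    // Unsigned subtraction keeps the elapsed time correct even when the
    // millisecond counter wraps around.
    bool expired(std::uint32_t nowMs) const {
        return static_cast<std::uint32_t>(nowMs - lastKickMs_) >= timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t lastKickMs_;
};
```

The piece that checks expired() and triggers the reset has to keep running when the rest of the firmware is stuck, which is why a hardware timer or external watchdog is the usual home for it.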