Non-Cyan-Flash Offline Core

Is there a known issue with Cores becoming unresponsive where they don’t have the ‘Cyan flash of death’?

It seems like the more I try to use TcpClient, the quicker my core gets into a state where it’s dead but the LED doesn’t flash: it still has the slow fade in/out blue that it has when it’s normally connected.

I have it set to post a short HTTP GET every 10 seconds to dweet.io, taking great care never to block the loop() for too long, so as not to upset the cloud heartbeat. I see that after a few minutes (a different amount each time) it just stops sending those HTTP GETs, and if I try and query a variable from the core I get back:

{
  "error": "Timed out."
}

All I can do at this point is reset it.

Basically, the core is unusable for me for sending even small amounts of data out.

Contrast this to my other core, which is being queried once an hour: it stays up for a week or so at a time without becoming unresponsive.

I’ve seen this happen also, but not as often as other errors.

You can try Bdub’s watchdog firmware if you’re looking for an auto-resetting solution.

Thanks. I’m just hoping Spark are aware of this issue too and working to resolve it.

If you can build locally then pull today’s master it is much improved.

@david_s5 are you talking about Spark/master? It was updated today?

Yes. There is one PR against it that you need if you run in debug mode, but if you build in release mode it is not needed.

FYI Spark/master is not where webIDE is built from

I haven’t set up the tool chain for local builds but I can set it up and give that version a go.

You mean if I don't run this new master firmware in Debug mode I should have other known issues?

You’re talking about this master firmware file, right? GitHub - particle-iot/device-os: Device OS (Firmware) for Particle Devices

It is the 2 repos at branch master. https://github.com/spark/core-firmware , https://github.com/spark/core-common-lib

If you run in release mode (no DEBUG_BUILD=y) It will be fine as is.

If you want to run debug, then you need the one file from the serial_fix branch of https://github.com/davids/core-firmware. See the PR: https://raw2.github.com/davids5/core-firmware/serial_fix/src/spark_wiring_usartserial.cpp

Thanks, I’m going to give it a try again soon as I have a few hours to spare.

I built at 5cc4b6b9f058365cc762f84fdae021052aedae06 and installed it, but now it’s not running my ‘sketch’. If I flash the sketch via the Web IDE it will leave the core firmware alone, right?

Reflashed my sketch via the Web IDE and it lasted 4 minutes before hanging, so maybe it did reflash the old firmware along with the sketch?

If you flash the Spark via the Web IDE, it will overwrite the firmware that you compiled on your PC.

Hi @finsprings,

Make sure you’re cleaning up your sockets after you post data. I think it’s still possible right now to open too many sockets without closing them properly, and this can kill your sketch. Any chance you can share the code that is crashing?

Thanks,
David

How do I get the sketch into what I install via dfu-util?

I’m keeping a socket open and using HTTP/1.1, but I see the same thing if, after .available() reports data, I do a flush() then stop(), and then a connect() the next time around.

Here’s the loop:

void loop()
{
    dweetConnected = dweetClient.connected();
    
    if (dweetEnabled && !dweetConnected) {
        if (dweetClient.connect(DWEET_ADDRESS, 80)) {
            ringBuffer.log("Conn.");
        } else {
            ringBuffer.log("ConnFail.");
        }
    }

    if (dweetConnected && dweetClient.available()) {
        // just read on this loop, so we read as fast as possible
        //ringBuffer.log(dweetClient.read());
        dweetClient.flush();
        //dweetClient.stop();
        //ringBuffer.log("Disc.");
    } else {
        const unsigned long now = millis();
        since = now - lastSendTime;
        const bool updateIsDue = since > DWEET_EVERY_MS; // This is 10,000

        if (dweetClient.connected() && updateIsDue) {
            lastSendTime = now;
            ringBuffer.log("Send.");

            dweetClient.print("GET /dweet/for/foo?temperature=");
            dweetClient.print(tempString);
            dweetClient.println(" HTTP/1.1");
            dweetClient.println();
            dweetClient.println();
        } else {
            update(); // This would go do 1-wire stuff but that’s a NOP right now for testing this HTTP stuff
        }
    }
}

Ah, I see, I need to just edit application.cpp.

It looks like maybe you could do the millis / lastSendTime check before you connect the socket? Otherwise you’re just making their server keep a socket open for 10 seconds, which may time out.

Ahh yeah, and I’d add this as well:

dweetClient.flush();
dweetClient.stop();
delay(100);

after your dweetClient.println() calls, just to make sure you’ve completed the request.
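For illustration only, here is a rough sketch of how the two suggestions above (check the timer before connecting, then flush and stop after sending) might fit together. It is not tested on a Core. FakeClient is a hypothetical stand-in for TCPClient so the logic can be compiled off-device, and updateIsDue and sendIfDue are made-up helper names. The Host and Connection: close headers are additions that are not in the original loop; HTTP/1.1 requires a Host header. The delay(100) from the suggestion is left out to keep the stand-in self-contained.

```cpp
#include <string>

// Hypothetical stand-in for the Core's TCPClient; it only records what would be sent.
struct FakeClient {
    bool open = false;
    int connects = 0;
    std::string sent;
    bool connect(const char*, int) { open = true; ++connects; return true; }
    bool connected() const { return open; }
    void print(const char* s) { sent += s; }
    void println(const char* s = "") { sent += s; sent += "\r\n"; }
    void flush() {}
    void stop() { open = false; }
};

// True once more than intervalMs has elapsed. Unsigned subtraction
// keeps giving the right answer when millis() rolls over.
inline bool updateIsDue(unsigned long now, unsigned long last, unsigned long intervalMs) {
    return (now - last) > intervalMs;
}

// One short-lived connection per update: connect only when an update is due,
// send the request, then close the socket so none are left open.
template <typename Client>
bool sendIfDue(Client& client, unsigned long now, unsigned long& lastSendTime,
               unsigned long intervalMs, const char* host, const char* temp) {
    if (!updateIsDue(now, lastSendTime, intervalMs)) return false;
    lastSendTime = now;  // set first, so a failed connect waits a full interval before retrying
    if (!client.connect(host, 80)) return false;
    client.print("GET /dweet/for/foo?temperature=");
    client.print(temp);
    client.println(" HTTP/1.1");
    client.print("Host: ");          // assumption: not in the original request
    client.println(host);
    client.println("Connection: close");
    client.println();                // blank line ends the headers
    client.flush();
    client.stop();
    return true;
}
```

On the Core, loop() would call something like sendIfDue(dweetClient, millis(), lastSendTime, DWEET_EVERY_MS, DWEET_ADDRESS, tempString) once per pass, assuming DWEET_ADDRESS is a host name string.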

True, but it stays connected (HTTP 1.1 keep alive) and doesn’t time out though, so I don’t think that’s why the core is becoming unresponsive.

The flush will block, which could get me killed by the 10s heartbeat.
The stop will disconnect, which means I’ll just need to reconnect next time. Why spend all that time on TCP socket establishment every time? (And even if I do it still hangs.)

If you don’t think blocking is an issue I can try it.
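A possible middle ground, sketched here under assumptions rather than tested on a Core: instead of calling flush(), discard a bounded number of response bytes on each pass through loop(), so no single pass can run long. FakeRxClient is a hypothetical stand-in for TCPClient, drainSome is a made-up helper name, and the 128-byte budget in the usage note is arbitrary. It relies only on available() and read().

```cpp
// Hypothetical stand-in for TCPClient: pretends a fixed number of bytes are waiting.
struct FakeRxClient {
    int pending;
    int available() const { return pending; }
    int read() {
        if (pending <= 0) return -1;  // nothing left to read
        --pending;
        return 'x';
    }
};

// Discard at most maxBytes of the pending response and return how many
// bytes were dropped. The work per call is capped by maxBytes.
template <typename Client>
int drainSome(Client& client, int maxBytes) {
    int dropped = 0;
    while (dropped < maxBytes && client.available() > 0) {
        client.read();
        ++dropped;
    }
    return dropped;
}
```

In loop() this would replace the dweetClient.flush() call with something like drainSome(dweetClient, 128), leaving the rest of the response for later passes.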