Cyan blink of death - need debugging tips

Hi There,

A while back I posted about a cyan blink of death, and we agreed that the most likely candidate was String use.

I’ve since removed all instances of String from my code, but I am still able to reproduce the cyan blink of death.

My setup is a Photon with a DS18B20.
I have SYSTEM_THREAD(ENABLED); and the system mode is left at the default (AUTOMATIC).

I hit the DS18B20 every couple of seconds, but with a small modification to the library: rather than calling delay(750) after requesting a temperature, I return to the main loop, and once at least that much time has passed (counting millis()) I retrieve the temperature (which I understand does some timing-related bit-banging with interrupts disabled).

My code spends most of its time loop()'ing, waiting for the DS18B20 to be ready or waiting until it’s a good time to request a new temperature.
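
The non-blocking wait described above can be sketched in plain C++ as a small state machine. This is a minimal sketch, not the actual library modification: `ConversionTimer`, `poll()` and the 2000 ms request interval are hypothetical names and values chosen for illustration; only the 750 ms conversion time comes from the post. On the Photon, `now` would be the value returned by millis(), and the caller would talk to the sensor whenever `poll()` asks for it.

```cpp
#include <cstdint>

// Hypothetical helper: tracks a DS18B20 conversion without blocking loop().
// It never touches the sensor itself; it only answers "what should I do now?".
class ConversionTimer {
public:
    enum class Action { None, RequestConversion, ReadResult };

    explicit ConversionTimer(uint32_t conversionMs = 750,
                             uint32_t intervalMs = 2000)
        : conversionMs_(conversionMs), intervalMs_(intervalMs) {}

    // Call once per loop() pass with the current millis() value.
    Action poll(uint32_t now) {
        if (!pending_) {
            // Idle: start a conversion on the first call, or once the
            // request interval has elapsed since the previous request.
            if (!started_ || now - lastRequest_ >= intervalMs_) {
                started_ = true;
                pending_ = true;
                lastRequest_ = now;
                return Action::RequestConversion;
            }
            return Action::None;
        }
        // A conversion is running: only read once it has had enough time.
        if (now - lastRequest_ >= conversionMs_) {
            pending_ = false;
            return Action::ReadResult;
        }
        return Action::None;
    }

private:
    uint32_t conversionMs_;
    uint32_t intervalMs_;
    uint32_t lastRequest_ = 0;
    bool pending_ = false;
    bool started_ = false;
};
```

Comparing `now - lastRequest_` with unsigned arithmetic, instead of comparing against a precomputed deadline, keeps the timing correct when millis() rolls over.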

REPRO:
The Photon is connected via a little wifi router on my desk on a powerbar. I can simply flip the powerbar on/off to trigger a photon disconnect.
Every 5th time or so, I can reproduce a Cyan Blink of Death.

I’ve put Serial.println(" ") debug statements all frigging over the place.
It doesn’t crash in the same spot all the time.
It doesn’t crash between requesting and retrieving a temperature; often it happens just while waiting for a timer to elapse before requesting the next reading.

I’m pretty stuck. My next step is to start paring down my code to something that does almost nothing, and building it back up until I can start reproducing it again. What a painful proposition.

Q1) Does anyone have good debugging tips, or know how I might get better information or crash reports?
Q2) Is Serial.println() actually giving me good data here, or is it possible that the crash cuts off Serial output before the line that really points at the cause has been flushed?

I’m going bug-eyed from reading my terminal spitting out thousands of A,B,C,D,Es.

Cyan flashing indicates trying to connect to Wi-Fi.

So, to clarify, your program keeps running (the Serial Monitor sees loop() activity) but your device gets ‘stuck’ trying to re-connect to WiFi?

maybe post your code…

Thanks for your interest!

Most times the photon does reconnect, and the whole while the serial monitor sees loop() activity (and temps being taken while disconnected, etc).

Frequently, though, the serial monitor stops getting loop() activity and the device never recovers, remaining in a steady cyan blink.

There are still circumstances where user code can affect the System Thread so you may want to review that.

Also, you could try to manage the Wi-Fi connection yourself, and still allow your main function to run.

You can add this line to your code to get some extra info from the system during the boot and connection tasks:

SerialLogHandler logHandler;

If it’s consistently happening after 5 times, it might indicate that all available sockets have stalled.
I would opt for SEMI_AUTOMATIC and actively check the connection using a millis() check or a Software Timer, and in case of prolonged disconnects issue a WiFi.off() and start the module from scratch.

If your code uses any other sockets (e.g. in a lib), make sure sockets held by these are also returned to the system.
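
The "check the connection and restart the module after a prolonged disconnect" idea can be sketched as a small, device-independent C++ helper. This is only a sketch under assumptions: `ReconnectWatchdog`, `shouldRestart()` and the timeout value are hypothetical and not part of Device OS. On the Photon, running under SYSTEM_MODE(SEMI_AUTOMATIC), `connected` would presumably come from Particle.connected(), and a `true` result would be followed by something like WiFi.off(), WiFi.on() and Particle.connect().

```cpp
#include <cstdint>

// Hypothetical helper: decides when a disconnect has lasted long enough
// to justify power-cycling the Wi-Fi module.
class ReconnectWatchdog {
public:
    explicit ReconnectWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call regularly (from loop() or a software timer) with the current
    // millis() value and the current connection state.
    // Returns true only on the pass where the module should be restarted.
    bool shouldRestart(uint32_t now, bool connected) {
        if (connected) {
            disconnected_ = false;   // healthy again, stop timing
            return false;
        }
        if (!disconnected_) {
            disconnected_ = true;    // first pass of a new outage
            since_ = now;
            return false;
        }
        if (now - since_ >= timeoutMs_) {
            since_ = now;            // restart requested; time the next attempt
            return true;
        }
        return false;
    }

private:
    uint32_t timeoutMs_;
    uint32_t since_ = 0;
    bool disconnected_ = false;
};
```

Resetting `since_` when a restart is requested means a router that stays down triggers one restart per timeout period instead of one per loop() pass.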

Thanks for the tips. I’ve added the SerialLogHandler, and this is what I’m seeing in the tail of two crash logs

0000623931 [app] TRACE: A
0000623932 [system] WARN: Resetting WLAN due to SPARK_WLAN_RESET
0000623931 [app] TRACE: B
0000623932 [ap

–and–

0002110391 [app] TRACE: A
0002110392 [system] WARN: Resetting WLAN due to SPARK_WLAN_RESET
0002110392 [app

“TRACE A” and “TRACE B” are just two of the Log calls in my loop() (I have A,B,C,D, etc… inserted between each function call in my loop() )

I’m not using sockets in any libraries, and it didn’t seem like it was consistently “every 5 disconnects”… I had to cycle the router a dozen or so times to get one of the crashes.

Any other ideas before I look into managing the connections myself?

So I’ve had some more interesting results.

In my main .ino file, I have commented out the entire body of the setup() and loop() functions.
The only exception is Serial1.begin(9600); in setup, which I figured was important for the log tracing.
I also have:

SYSTEM_THREAD(ENABLED);
SerialLogHandler logHandler(LOG_LEVEL_ALL);

I can still reproduce the Cyan blink of death if I disconnect and reconnect the little wifi router the photon uses. Firmware is 0.6.1

This points to a critically important bug in the System code, as the device can only be recovered by a manual reset, which isn’t possible (or desirable) for a field/fleet device.

Here are two separate logs from the reproduction:
https://pastebin.com/K9g8NLQ9
https://pastebin.com/vh5T2HZi

I’m not sure if this post will get any new attention being a bit old now, but this feels like a significantly new revelation. Thanks for your interest.

Hmm, I can reproduce neither the cyan blink of death nor the need for Serial1.begin(9600) with any of my devices.

SYSTEM_THREAD(ENABLED)
SerialLogHandler logHandler(LOG_LEVEL_ALL);

void setup() {
}

void loop() {
}

This gives me the following log when I start off connected, then depower my WiFi AP and power it up again:

0000018658 [comm.sparkprotocol] WARN: bytes recieved error -9
0000018658 [system] WARN: Communication loop error, closing cloud socket
0000018758 [system] INFO: Cloud: connecting
0000018762 [system] ERROR: Cloud: unable to resolve IP for device.spark.io
0000018762 [system] WARN: Cloud socket connection failed: -1
0000018764 [system] WARN: Internet Test Failed!
0000018764 [system] WARN: Resetting WLAN due to 2 failed connect attempts
0000018764 [system] WARN: Handling cloud error: 2
0000018864 [system] WARN: Resetting WLAN due to SPARK_WLAN_RESET
0000018965 [system] INFO: Network Connect: !SPARK_WLAN_STARTED
0000018965 [system] INFO: ready(): 0; connecting(): 0; listening(): 0; WLAN_SMART_CONFIG_START: 0
0000018967 [hal.wlan] INFO: Using internal antenna
0000018973 [system] INFO: ARM_WLAN_WD 1
0000018974 [system] INFO: ARM_WLAN_WD 4
0000018974 [system] INFO: ARM_WLAN_WD 4
0000040028 [system] INFO: DHCP fail, ARM_WLAN_WD 5
0000040129 [system] INFO: Network Connect: SPARK_CLOUD_CONNECT && !network.connected()
0000040129 [system] INFO: ready(): 0; connecting(): 0; listening(): 0; WLAN_SMART_CONFIG_START: 0
0000040131 [system] INFO: ARM_WLAN_WD 1
0000040908 [system] INFO: ARM_WLAN_WD 2
0000040908 [hal.wlan] INFO: Bringing WiFi interface up with DHCP
0000040980 [system] INFO: CLR_WLAN_WD 1, DHCP success
0000040982 [system] INFO: Cloud: connecting
0000041095 [system] INFO: Resolved host device.spark.io to 52.90.98.3
0000041230 [system] INFO: connected to cloud 52.90.98.3:5683
0000041230 [system] INFO: Cloud socket connected
0000041232 [system] INFO: Starting handshake: presense_announce=1
0000041232 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000041379 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000041473 [comm.sparkprotocol.handshake] INFO: Sending encrypted nonce
0000041473 [comm.sparkprotocol.handshake] INFO: Receive key
0000041637 [comm.sparkprotocol.handshake] INFO: Setting key
0000042017 [comm.sparkprotocol.handshake] INFO: Sending HELLO message
0000042020 [comm.sparkprotocol.handshake] INFO: Receiving HELLO response
0000042177 [comm.sparkprotocol.handshake] INFO: Completed
0000042177 [system] INFO: Send spark/hardware/max_binary event
0000042178 [system] INFO: spark/hardware/ota_chunk_size event
0000042185 [system] INFO: Send subscriptions
0000042186 [comm.sparkprotocol] INFO: Sending TIME request
0000042188 [system] INFO: Cloud connected
0000042593 [comm.sparkprotocol] INFO: Received TIME response: 1492670398
0000043502 [comm.sparkprotocol] INFO: Sending A describe message
0000043735 [comm.sparkprotocol] INFO: Sending S describe message
0000044643 [comm.sparkprotocol] WARN: bytes recieved error -17
0000044643 [system] WARN: Communication loop error, closing cloud socket
0000044745 [system] INFO: Cloud: connecting
0000044795 [system] INFO: Resolved host device.spark.io to 52.90.98.3
0000044935 [system] INFO: connected to cloud 52.90.98.3:5683
0000044935 [system] INFO: Cloud socket connected
0000044935 [system] INFO: Starting handshake: presense_announce=1
0000044937 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000045069 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000045163 [comm.sparkprotocol.handshake] INFO: Sending encrypted nonce
0000045163 [comm.sparkprotocol.handshake] INFO: Receive key
0000045346 [comm.sparkprotocol.handshake] INFO: Setting key
0000045726 [comm.sparkprotocol.handshake] INFO: Sending HELLO message
0000045728 [comm.sparkprotocol.handshake] INFO: Receiving HELLO response
0000045874 [comm.sparkprotocol.handshake] INFO: Completed
0000045874 [system] INFO: Send spark/hardware/max_binary event
0000045875 [system] INFO: spark/hardware/ota_chunk_size event
0000045882 [system] INFO: Send subscriptions
0000045883 [comm.sparkprotocol] INFO: Sending TIME request
0000045885 [system] INFO: Cloud connected
0000046391 [comm.sparkprotocol] INFO: Received TIME response: 1492670402
0000047199 [comm.sparkprotocol] INFO: Sending A describe message
0000047432 [comm.sparkprotocol] INFO: Sending S describe message

What exactly did you do when you said you disconnect and reconnect the router?

What does disconnect and reconnect mean?

  • Your router keeps maintaining the WiFi connection to the Photon but has no internet connectivity, or
  • You cut the WiFi connectivity on the router, or
  • You depower the router?

What does your local network look like?
Is your WiFi router also the DHCP server, or is the IP address handed out by another router? If it is another router, does that one get disconnected from the device as well, or not?

Thanks again ScruffR for your persistence! I c&p’d your code into a brand new .ino and flashed it and was still able to reproduce it after 4 or 5 disconnects.
The log is here: https://pastebin.com/jSJjgFJg

I’m using the Particle(Atom) IDE, Firmware 0.6.1

So essentially I get what appears on the surface to be similar logs to you. A successful connection seems to end with:
"[comm.sparkprotocol] INFO: Sending S describe message"
At this point the photon is breathing cyan as expected. Then I disconnect and reconnect (which I will describe), which is first noticeable in the logs with:
"[comm.sparkprotocol] WARN: bytes recieved error -9"

If you look through my logs, you’ll see that pattern a few times, indicating each time I dis/re-connected.

My network setup is:
Photon > DLink DAP-1320 Wifi Extender > Dlink DIR-825 Wifi Router > Cable Modem

The Photon is the only thing using the DAP-1320 extender. The Photon is exclusively configured with credentials for that same extender.

The DIR-825 Router is the one serving DHCP.

The extender is on a powerbar here at my desk. When I say I disconnect/reconnect, what I’m really doing is simply turning the powerbar off and on again, which cycles the extender.

About 5 seconds after I flip the switch off and on, the photon starts blinking green (and the extender has not yet booted back up).
About 20 seconds after that, the extender has booted up enough that the photon appears to connect to it (changes from blinking green to blinking cyan).
The extender is usually still trying to connect to the router at this point, and the Photon will continue to blink cyan, and sometimes go back to blinking green.

When the extender is fully connected again, it seems to take another 5 or so seconds for the photon to go from blinking cyan to breathing cyan. Sometimes, it just stays blinking cyan forever.

I hope that’s a more thorough description!

That definitely makes things a bit clearer.
Maybe add some logs to print out the IP address the Photon gets when it starts blinking cyan. By that time your DHCP server won’t be able to provide the IP, but for some reason the Photon thinks it has a valid IP, otherwise it wouldn’t transition to cyan.
For such a network setup you’d need to prevent the Photon from getting a wrong IP: either use a static IP or knock the WiFi back so it re-requests a valid DHCP IP.

Hey @ScruffR and @Horganic - hate to “thread jack”, but did you guys ever get anywhere with this?

My Photon was breathing cyan all night here on my desk, then this morning my Internet connection flaked out. I reset my router (pulled power, waited, plugged it back in) and the Photon is now showing the same behavior you describe: blinking cyan endlessly. I have unplugged it and waited a few times, with no result.

I was able to redo the WiFi setup via my phone and the setup button (thought that might be it), but after I gave it the WiFi credentials it just fell back into the cyan blink of death. I was just about to fire up my new ones and finally build something, and now my first one is flaking out.

@charrold303, did you see this part from @ScruffR?

Maybe add some logs to print out the IP address the Photon gets when it starts blinking cyan.

Then report your findings.

Would love to know how to do that. I still do not know how to access the Photon from the console on my Mac (working on learning how to do that this morning).

Also my scenario is not 100% the same, but this thread was as close as I found. To wit:

  • I do not use an extender - direct to the router
  • I can “see” the Photon connected to my network (from the router) with an IP and no conflict therein
  • My flashing lights cycle is slightly different but ends the same with the endless Cyan blink

@charrold303, what is your router model? It could be blocking the CoAP port (UDP 5683), preventing connection to the cloud. As @ScruffR pointed out in an older thread:

You have no MAC filtering on your router?
The COAP port is open on your firewall?
You’ve got free IPs in your DHCP range on the router?
Does LED ever start flashing green rapidly?
How long is your SSID and passphrase?
Any “funny” characters (non ASCII) in it?
Can you try different (simple) SSID/PWD or even an open network?
Can you try your mobile phone as AP?

It is an ASUS RT-N66U

@charrold303, before proceeding you need to search this forum for solutions for:

  • Still do not know how to access the Photon from the console on my Mac
  • Print out the IP address the Photon gets

Then come back and report your results :)

As I said, that’s what I am working on this morning. I just hoped that something that was marketed as the foundation for “simple and approachable IoT development” actually was.

I truly do not mind learning this stuff, and my statement is not meant to be as acerbic as it sounds, but as I go to productize what I build, things like this really give me pause. I literally did nothing but simulate a power outage, and since I am not the only one posting about this, I would expect the platform to handle a common consumer-grade situation better.

I will continue to plug away at it.

@charrold303, a lot of factors can affect cloud connectivity. You may want to install the Particle CLI as it will come in handy for all your development needs.

Well, the thread jacking was unfortunate, but here is a trimmed crash log where I do a Log.trace(String(WiFi.localIP())); inside my loop.
https://pastebin.com/5kZFm3Gx

Compared with my last log (without IP, showing just system/hal https://pastebin.com/jSJjgFJg ), the tail is essentially the same - but now you can see the IP addresses interspersed.

Does this give you guys (@peekay123, @ScruffR) any insight? (Also, is it possible for mods to clean up threads?)