Bug bounty: Kill the 'Cyan flash of death'

My setup (if it helps in any way):

I have two cores in “production”.
One controls my blinds and has a Spark function set up to receive commands. This one goes down with a CFOD roughly every 36 hours.

The other one is a simple light switch with a dimmer control (actually, a PSP joystick). On each loop iteration it calls analogRead on a button pin and on the joystick's X and Y pins. It has no reason to talk to the outside world. It's encased in a box, so I cannot confirm it's actually flashing cyan, but it dies every few hours, somewhere between 2 and 6 hours. I have it connected to an LED strip. Sometimes the LED strip comes on by itself when the freeze occurs, but if I power cycle the core, the strip is always off at startup (as designed).

My router is an Airport Extreme (not most recent gen, the one before it). Running B/G/N on 2.4 and 5GHz with visible SSID. Internal IP range is 10.0.0.1 - 255 with a static external IP. Airport does DHCP. Connected to ADSL.

In the last week I’ve been having packet-loss issues (about 3.5%) with my ISP. I only noticed this because my VPN to the office started dropping late last week; I’m not sure whether that’s affecting the cores. I can’t confirm that I did NOT have packet loss before last week, but it wasn’t affecting me in any noticeable way. However, the cores were dropping out before I noticed the packet-loss issue.

The Cores pretty much have line of sight to the base station, and are within 5 - 6 metres of it. Let me know if there is any other info I can provide.

Symptom: CFOD
Router: Apple Airport Extreme (MB763LL/A)
Wifi Repeater: Apple Airport Express (MC414LL/A)
Wireless Protocol: 802.11n
Location: Chattanooga, TN

Primary Network SSID: Shea Weber
Primary Network Security: WPA2 passphrase
Primary Network Range: 10.0.0.1/24

Guest Network SSID: Digital Minions
Guest Network Security: WPA2 passphrase
Guest Network Range: Don’t remember off the top of my head

I didn’t create a guest network until it was suggested as a possible fix in the other thread. Since then, I haven’t seen a drop at all. To my knowledge, no other wireless devices on my primary network have had any issues.

Thanks for that, I’ll try it later and report back after the weekend if I see an improvement too.


No improvement. Still got 'em…

:frowning:

Sometimes the LED keeps breathing as if to say “I’m OK”, but loop() has stopped!

Frido.


@timb, would you mind telling me which display you have? I have been looking for a nice display for the Spark and was looking at serial displays, but I2C sounds much better.


Symptom: CFOD
Router: Asus RT-N66U running dd-wrt
Wireless Protocol: 802.11n
Location: PA
Network Security: WPA2 passphrase


Can you tell us how the Spark Core is supposed to behave when the network is lost? I understand the CBoD that occurs while it is trying to reconnect, but why does the user application stop? I would think we want the Spark Core to keep running the user application while it continues trying to re-establish a connection with the cloud. It would be great if we could register an event handler that gets notified when the cloud connection is lost or re-established. A device that hangs whenever the network is down is not very useful. I understand the current CBoD problem happens even with a good connection, but when the real connection is down, we can’t have the system go into a state where the user application gets no cycles. Anyway, those are my thoughts.
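The event-handler idea above can be sketched as a small edge detector. This is a hypothetical helper, not part of the Spark firmware: it assumes something in loop() can report whether the cloud connection is currently up, and it calls the user's handlers only when that state changes. It does nothing about the underlying problem of connection handling starving loop() of cycles.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper: the firmware discussed here has no such event API.
// Call update() once per loop() iteration with the current connection state.
class ConnectionWatcher {
public:
    using Handler = std::function<void()>;

    ConnectionWatcher(Handler onLost, Handler onRestored)
        : onLost_(std::move(onLost)), onRestored_(std::move(onRestored)) {}

    // Fires a handler only on a state transition, not on every call.
    void update(bool connectedNow) {
        if (connectedNow == connected_) {
            return;  // no change, nothing to report
        }
        connected_ = connectedNow;
        if (connected_) {
            if (onRestored_) onRestored_();
        } else {
            if (onLost_) onLost_();
        }
    }

    bool connected() const { return connected_; }

private:
    Handler onLost_;
    Handler onRestored_;
    bool connected_ = false;  // treated as "not connected" until told otherwise
};
```

Because the watcher starts in the "not connected" state, the first update(true) after boot fires the "restored" handler once; whether that is desirable is a design choice for the application.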

So I continue to get the CBOD using the updated firmware base. I have a Linksys E4200 router and a 1.5Mb DSL connection (sigh - I used to have a 40Mb connection back East). My application just blinks LEDs and responds to Cloud function calls to change the state and rate of blinking. It outputs an iteration counter to the serial port.

Hey @mtnscott,

There is separate work in progress to decouple the wifi connection from user code. See this thread:


I don’t know if this report will help, but I have not seen a CFOD since I stopped calling TCPClient.stop. My experiments do not currently use the cloud features other than the Build IDE and flashing the core. I have several little toy applications that scrape web pages or RSS feeds for interesting things and display the info on a 2x16 LCD.

I had the core keep a count of the number of times it hit a web page (scheduled every 5 minutes) and display that count on the LCD. The highest count I saw was 109, which is just over 9 hours of uptime.
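The uptime figure follows directly from the hit counter: 109 hits × 5 minutes = 545 minutes, or 9 hours 5 minutes. A minimal sketch of that conversion, assuming the first hit lands one interval after boot and no scheduled hit was skipped (the function name is illustrative):

```cpp
// Hours and minutes of uptime implied by a periodic hit counter.
struct Uptime {
    long hours;
    long minutes;
};

// Assumes one hit per interval since boot, with none missed.
Uptime uptimeFromHits(long hits, long intervalMinutes) {
    long totalMinutes = hits * intervalMinutes;  // e.g. 109 * 5 = 545
    return Uptime{totalMinutes / 60, totalMinutes % 60};
}
```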

I also have a UDP NTP client that I have been working on and it runs overnight without crashing as well. I don’t recall ever seeing UDP have a CFOD.

My apps DO spontaneously crash sometimes, but the core reboots gracefully and reconnects in a normal way as if I hit the reset button. During development, I have walked off the end of memory and had to reset the WiFi credentials and reflash tinker to get it to work sometimes. I would say these crashes are my fault, to the best of my understanding.

I am using an el-cheapo Netgear router with WPA2, since WEP didn’t work for me. I have the core print its MAC and IP addresses to the display at startup, so I know that I am getting a 10.x.x.x address.

I don’t know if my good luck comes from the router (doubt it) or the lack of cloud IO or the lack of TCPClient.stop calls.

@mtnscott If you want a nice little graphical backpack that will work over UART, I2C or SPI, I’d take a look at Digole! They sell a range of LCDs and OLEDs with an integrated backpack, as well as just the backpacks by themselves. I’m using the 1.3" White OLED with the Core right now and it’s working great! One of the nice things is that the protocol is universal across all the different displays, so one library fits all.

I’d also recommend checking out the 1.8" Color OLED Module, 2.7" Backlit LCD Module, 1.8" White Backlight LCD Module and the Universal KS0108 Adapter.

Another handy feature is the built-in (user-replaceable) U8G-compatible fonts and the ability to upload a startup bitmap or animation.

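The raw command set is pretty simple and generally consists of ASCII command characters followed by some number of option bytes. For example:

```cpp
// Assumes Wire.begin() was already called in setup().
Wire.beginTransmission(0x4E);   // I2C address of the Digole backpack
Wire.print("CL");               // CL: clear the screen
Wire.print("SF");               // SF: set font...
Wire.write(18);                 // ...to font number 18
Wire.print("TT");               // TT: draw text...
Wire.print("Cloud Uptime");     // ...this string...
Wire.write(0x00);               // ...terminated by 0x00
Wire.endTransmission();         // send the buffered bytes
```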

CL = Clear
SF = Set Font, 18 = Font
TT = Text (Followed by the text you want to display and 0x00 for EOL.)
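To make the framing concrete, the byte sequence sent by the Wire snippet above can be built and checked without a display attached. This is an illustrative helper (name and signature are hypothetical, not part of the Digole library) that assembles the same CL / SF / TT frame into a byte vector:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: builds "CL", "SF" + font byte, "TT" + text + 0x00.
std::vector<uint8_t> buildClearFontText(uint8_t font, const std::string& text) {
    std::vector<uint8_t> out;
    auto append = [&out](const std::string& s) {
        out.insert(out.end(), s.begin(), s.end());
    };
    append("CL");          // clear screen
    append("SF");          // set font...
    out.push_back(font);   // ...one option byte: the font number
    append("TT");          // text...
    append(text);          // ...the characters to draw...
    out.push_back(0x00);   // ...terminated by 0x00
    return out;
}
```

On the Core, the resulting bytes could then be passed to Wire.write() between beginTransmission() and endTransmission(), which keeps the command layout testable on a desktop machine.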

Anyway, I’ve almost got the full Digole Arduino library ported over to the Spark Core. If you want to step up to a graphic display and don’t need a touchscreen, I highly recommend picking up at least one of the Digole OLED units! As an aside, Digole ships displays from both Canada and China; the product page should say which. The stuff coming from Canada normally reaches the US in about a week!

I am not sure whether my report will be helpful, but as of today I am experiencing CFOD as well.

I have not started using the web IDE yet, but I used the core with the relay shield and Tinker. It worked normally before; then I did not power it up for a few days. Today I powered it up and sent a few commands from Tinker, and it worked for a few seconds before the CFOD occurred. I thought something was wrong, so I used the reflash Tinker command. After that, none of my commands went through.

The LED cycle: white -> green -> breathing cyan for 30 seconds -> flashing cyan for a few minutes -> red -> flashing cyan again.

I do not think the problem is with the router since it worked before, but here are the Wi-Fi details:

Router: TP-Link TL-WR41N
Mode: 11bgn mixed
Channel: Auto (current channel 7)
Location: MY
Network Range: 192.168.1.1/24
Security: WPA/WPA2 - Personal

I think this problem is more than just losing the connection with the cloud. Polling a variable read will CFOD my core consistently. Not polling it, while still running the same program, lets it run for days (although, as my previous post shows, it appeared online but a variable read returned nonsense).

I have been looking into this, but so far I have been unable to reproduce it: my RGB brightness demo has run for 96 hours on the JTAG shield with a 1A USB wall power supply. I have actually never seen this problem occur.

Can anyone provide a reliable way to reproduce it? Running Tinker and polling variables at a fixed rate?

@dorth, can you provide the Python script you’re using to poll the core and report uptimes?

In my case, the PWS makes no difference; I have used a 1A PWS as well as powering it from my MacBook Pro. I expose some functions, but no variables, to the cloud. I consistently get CBOD within 30 minutes; once it lasted for 1 hour, but never longer than that. If I let it sit, it briefly flashes red during the CBOD, then goes back to CBOD.

Here is the Python script I run with cron (every minute) to query an analog value from the Tinker app running on the core. If I poll every 15 minutes, my core will run for well over 24 hours. If I run cron every minute, it will CFOD within a few hours (see my post above for my results).

The script uses “requests” for REST handling, so you need to have that installed (http://docs.python-requests.org/en/latest/). This is running on an Ubuntu box, but it should be fairly portable.

Python Script to Poll Spark

Dave O

You just rocked my world with this find. This will actually be perfect for another project I have in my queue.

@dorth Thanks for posting the python script!

I have flashed the code from https://github.com/spark/core-firmware/blob/feature/debug-cfod/build/core-firmware.bin to my core.

The script is running at intervals of 1 minute. Hopefully it all turns out well with no disconnection :smiley:

Will keep you guys updated!

Awesome! I know a few other members here have had nothing but positive experiences with their displays. The backpacks are actually rather beefy little 64MHz PIC micros, so they handle all the actual drawing functions. You can just tell one “draw a box of this size at these coordinates” and it handles the rest, which takes a ton of math off your device.

Oh, another cool feature is all of the displays (sans the color OLED) feature between 5 and 8 extra I/O ports! You use the DOUT command followed by a byte. (Each bit of the byte controls the state of each port.)
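Packing the port states into the DOUT byte is plain bit manipulation. A minimal sketch, assuming bit 0 drives the first extra port, bit 1 the second, and so on (that bit order is an assumption; check the display's manual). Both helper names are hypothetical:

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// Packs per-port on/off states into one byte; bit 0 = first port (assumed).
uint8_t packPorts(std::initializer_list<bool> states) {
    uint8_t value = 0;
    int bit = 0;
    for (bool on : states) {
        if (bit >= 8) break;  // at most 8 ports fit in one byte
        if (on) value |= static_cast<uint8_t>(1u << bit);
        ++bit;
    }
    return value;
}

// Frame: the ASCII command "DOUT" followed by the packed byte.
std::vector<uint8_t> buildDout(uint8_t ports) {
    std::vector<uint8_t> frame = {'D', 'O', 'U', 'T'};
    frame.push_back(ports);
    return frame;
}
```

For example, turning on the first and third ports gives packPorts({true, false, true}) == 0x05, and the frame is "DOUT" followed by that byte.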

Finally, you can pass raw data to the LCD/OLED controller with the MCD (Manual Command) and MDT (Manual Data) commands.

They really are neat little devices and very fairly priced, too!

Another (simpler) method is just to use a curl call (see http://docs.spark.io/#/api/basic-functions) in a shell loop that cycles every 10 seconds or so. That should pretty much replicate what the Python script is doing. I ended up with the script because I had wanted to log a bunch of data (1-wire temperature readings), but kept CFOD-ing the core.
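The polling approach (curl in a shell loop, or the Python script) boils down to "request at a fixed interval until a request fails, then report how long it survived". A minimal C++ sketch of that loop, where pollOnce is a hypothetical stand-in for one cloud request (the actual HTTP call is not implemented here) that returns true on a valid reply:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls at a fixed interval and returns how many consecutive polls
// succeeded before the first failure, or maxPolls if none failed.
// pollOnce is a hypothetical stand-in for one request to the core.
long pollUntilFailure(const std::function<bool()>& pollOnce,
                      std::chrono::milliseconds interval,
                      long maxPolls) {
    for (long n = 0; n < maxPolls; ++n) {
        if (!pollOnce()) {
            return n;  // n successful polls happened before this failure
        }
        std::this_thread::sleep_for(interval);
    }
    return maxPolls;
}
```

Multiplying the returned count by the interval gives the uptime; at a 10-second interval, for example, 180 successful polls corresponds to 30 minutes.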

Cycling every 10 seconds definitely reduces the core running time until CFOD. Mine will not run over 30 minutes when polling this frequently.

I also noticed (and I believe someone else mentioned) that the CFOD is interrupted every 2 minutes by 2 red flashes, then goes back to CFOD. Just figured I’d mention it, as this is the first time I noticed this.

Dave O