Bug bounty: Kill the 'Cyan flash of death'


#21

This smells like a race condition to me, so adding timeouts is just a kludge that kicks the can down the road.


#22

@mtnscott there are two potential issues to debug: why the Core disconnects, and why it fails to re-connect.

For the former (why it disconnects), the failure, I believe, occurs somewhere in the SparkProtocol::event_loop() function:

As part of the main loop (main.cpp:142), Spark_Communication_Loop() is called (main.cpp:178, spark_utilities.cpp:402). This returns the result of spark_protocol.event_loop() (spark_utilities.cpp:404), which is defined in core-communication-lib, spark_protocol.cpp:125. If this function returns false, I believe that’s what causes the CFOD (main.cpp:185).
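To make that call chain easier to follow, here is a minimal, self-contained simulation of it. This is only a sketch: FakeProtocol and run_main_loop are hypothetical stand-ins I made up for illustration, and the "fail after N calls" behaviour is an assumption, not how the real SparkProtocol works.

```cpp
// Hypothetical stand-in for SparkProtocol. The real class lives in
// core-communication-lib and does actual network I/O.
struct FakeProtocol {
    int calls_until_failure;  // assumption: simulate a failure after N calls

    explicit FakeProtocol(int n) : calls_until_failure(n) {}

    // Returns true while the simulated connection is healthy.
    bool event_loop() {
        return calls_until_failure-- > 0;
    }
};

// Mirrors the behaviour described above: forward event_loop()'s result.
bool Spark_Communication_Loop(FakeProtocol& protocol) {
    return protocol.event_loop();
}

// Runs the simulated main loop. Returns the iteration on which the
// connection was declared lost, or -1 if it never failed.
int run_main_loop(FakeProtocol& protocol, int max_iterations) {
    for (int i = 0; i < max_iterations; ++i) {
        if (!Spark_Communication_Loop(protocol)) {
            // In the firmware, this is roughly where the Core would be
            // treated as disconnected (the suspected CFOD trigger).
            return i;
        }
    }
    return -1;
}
```

With FakeProtocol p(3), run_main_loop(p, 10) returns 3: three healthy iterations, then the first false return ends the loop.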

Now as for the second issue: in SPARK_WLAN_Loop(), if the Core is not connected, it calls Spark_Connect() (spark_wlan.cpp:533). For some reason, this is not re-initiating the connection like it's supposed to. The fact that we never see any further changes to the LED suggests to me that Spark_Connect() may be blocking forever.
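One way to check the "blocking forever" theory would be to mark the entry and exit of the connect call and see whether the exit marker ever shows up. The sketch below simulates that idea off-device. Trace, traced_connect and wlan_loop_step are hypothetical names, and treating a return value of 0 as success is an assumption on my part.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical trace sink. On the Core this could be an LED colour change
// or a serial print instead of a vector of strings.
using Trace = std::vector<std::string>;

// Wraps a connect attempt with "enter" and "exit" markers. If the trace
// ends with "connect:enter" and no matching exit, the call never returned.
int traced_connect(const std::function<int()>& connect_fn, Trace& trace) {
    trace.push_back("connect:enter");
    int result = connect_fn();  // assumption: 0 means success
    trace.push_back(result == 0 ? "connect:exit:ok" : "connect:exit:error");
    return result;
}

// Simplified stand-in for the reconnect branch of SPARK_WLAN_Loop():
// only attempt to connect while we are disconnected.
void wlan_loop_step(bool& connected,
                    const std::function<int()>& connect_fn,
                    Trace& trace) {
    if (!connected) {
        connected = (traced_connect(connect_fn, trace) == 0);
    }
}
```

If the real Spark_Connect() hangs, the last marker recorded would be the "enter" one, which would match the LED never changing again.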


#23

dermotos - take a look at this thread - https://community.spark.io/t/core-without-cloud-fixed/1072


#24

Thanks, but I can't really understand that. I'm currently using the cloud IDE, and I haven't had much luck figuring out how to do it all locally.


#25

@dermotos Just follow the instructions from https://community.spark.io/t/the-spark-core-firmware/532

I'm an ARM newbie and got it to work (on OS X)!

Best Wishes,

Frido


#26

I'm testing a potential fix to CFOD (deactivating the watchdog reset), which I have committed to a branch:

https://github.com/spark/core-firmware/tree/feature/debug-cfod

(core-communication-lib and core-common-lib stay on the master branch)

Unfortunately it takes a while to run the test, but if anyone wants to get their Core going with this and tell me if it fixes the issue, let me know!

Here's my test:


#27

Is it possible to perhaps post the .BIN so that I only have to install the dfu upload utility instead of the whole toolchain? I'm able to reliably get a CFOD, so I think I'd be a good test case. I'm just not sure I'm ready to load up yet another tool-chain onto my laptop.

Dave O


#28

If you follow the Github link Zack provided, there appears to be a .BIN file in the Build folder


#29

Indeed! Thanks very much. I hadn't even thought to look there. I'll get dfu installed and give it a whirl!

Dave O


#30

Awesome, thanks for your help!

The annoying thing about this bug is that it takes forever to test. :tired_face:


#31

I just flashed my Core with this firmware over USB. If I do an OTA upload of a new application through the Spark Cloud, does that also overwrite the firmware each time?


#32

Yup, it sure does! Anything you flash OTA is compiled from the compile-server2 branch and not master.


#33

OK, perfect. I now have one Core running code that was freezing frequently, with plenty of calls through the Spark Cloud to both functions and variables.


#34

After 11 minutes my Core got stuck in a cyan blinking loop, and it took about 2.5 minutes for it to restart, reconnect to the network, and start running the code again.


#35

So I've got the fix running on my Core displaying a countdown on an I2C OLED display.


#36

So my Core froze up again, this time breathing the cyan LED nicely, but no calls through the Spark Cloud API go through. They all return a 408.

After 10 minutes of no response I pressed reset on the Core, and it connected straight back up and became responsive through the Cloud API.


#37

@zach Just downloaded the changed main.h file and rebuilt my firmware. I get further, but my Core still goes into CFOD. This time it took 67 minutes; before, it was ~30. Sorry :frowning:


#38

I was able to upload the core-firmware.bin by downloading it from git and then just using the dfu-util to flash it over. However, that will give me "Tinker" installed on my core and not my other simple app (which I know will CFOD within 120 minutes.)

I assume that pushing code via a REST call results in the same pipeline being exercised, and that it will revert my Core to the compile-server2 branch as well. So it sounds like if I want to test this out, I need to get the dev environment set up?

Trying to help out here, not add to the burden! :smile:

Dave O


#39

@sjunnesson that actually sounds like a different issue that has to do with blocking code and long delays in the loop().

Is that true of your code?
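For reference, the usual way to avoid long delays in loop() is to check elapsed time and return right away when there is nothing to do. Here is a rough sketch of that pattern in plain C++ so it can be tried off-device. PeriodicTask is a hypothetical helper, and on the Core the tick value would presumably come from millis().

```cpp
#include <cstdint>

// Hypothetical periodic task that avoids delay(): it only does work when
// enough time has elapsed and otherwise returns immediately, so the rest
// of the main loop is not held up between loop() calls.
struct PeriodicTask {
    uint32_t interval_ms;
    uint32_t last_run_ms;
    int runs;

    explicit PeriodicTask(uint32_t interval)
        : interval_ms(interval), last_run_ms(0), runs(0) {}

    // Call once per loop() iteration with the current millisecond tick.
    // Returns true if the work was performed on this call.
    bool poll(uint32_t now_ms) {
        // Unsigned subtraction stays correct across counter wrap-around.
        if (now_ms - last_run_ms < interval_ms) {
            return false;
        }
        last_run_ms = now_ms;
        ++runs;  // placeholder for the real work (sensor read, display update)
        return true;
    }
};
```

Each call to poll() does a single comparison when no work is due, so the loop never stalls for the length of the interval.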


#40

Yes, that's correct - to test out firmware-in-progress, you'll have to build on your own machine.