This adds some additional checks and should prove to be more reliable.
It sends a UDP packet every 60 seconds to 10.10.0.1 and does the same reading from the HTTP GET.
All the log data is critical, because the faults are logged after being detected and mitigated so they do not send the CPU into a hard fault. Please send logs and LED observations.
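For anyone collecting observations on the receiving end, a small host-side script can timestamp the 60-second UDP packets and flag the stretches where they stop arriving. This is only a sketch: the port number (9999), the 15-second tolerance, and the function names are my assumptions and are not taken from the firmware.

```python
import socket
import time

EXPECTED_PERIOD_S = 60.0   # the firmware sends one UDP packet every 60 seconds
TOLERANCE_S = 15.0         # assumed slack before a packet counts as missed


def find_gaps(timestamps, period=EXPECTED_PERIOD_S, tolerance=TOLERANCE_S):
    """Return (previous, current) arrival pairs that are too far apart."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > period + tolerance:
            gaps.append((prev, cur))
    return gaps


def listen(port=9999, bind_addr="0.0.0.0"):
    """Print the arrival time of every packet; the port is hypothetical.

    Runs until interrupted (Ctrl-C).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    last = None
    while True:
        data, sender = sock.recvfrom(2048)
        now = time.time()
        if last is not None and now - last > EXPECTED_PERIOD_S + TOLERANCE_S:
            print(f"GAP of {now - last:.0f}s before this packet")
        stamp = time.strftime("%H:%M:%S", time.localtime(now))
        print(stamp, sender[0], len(data), "bytes")
        last = now
```

Calling `listen()` on the machine at 10.10.0.1 gives a timeline to compare against the Core's log; `find_gaps()` does the same check on a saved list of arrival times.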
Thanks @david_s5
I ran the new binary on the Core and unlike the last time it smoothly recovered from most CFODs but then failed and got stuck in one in the end.
I'm not sure if I understood the question correctly. The RGB LED flashed RED twice in between CFODs. The Core never connected back to the cloud. Does that make sense? I'm capturing the log on Serial1; do I need to do anything further?
I have written a decoder that (partially) groks the SPI traffic when captured by a Saleae logic probe. Matching this against tcpdumps of the wireless traffic shows me pretty conclusively that the CFOD that we have in captivity is not caused by a race condition/mutex snafu in the driver. This was the first place we looked.
To be sure, there is plenty of opportunity to remove knuckleheadedness in the TI driver, but we're pretty confident at this point that the CFOD is caused by one or more CC3000-internal issue(s).
Understood. It is the compound issues and recovery I am looking at. The code I have does nothing to detect the connect failures and reset anything. It is left this way so as not to mask any underlying issues. I have fixed 8 issues that caused the code to hard fault or hang. After it is stable, I can add the recovery, and it will take less than 60 seconds to recover.
No. The code is not faulting or the core would be stuck in a red blink pattern of ...---... N Blinks ...---... repeat.
Where N is one of the panic codes below:
From the logs you uploaded I see one issue that still needs to be resolved.
That is socket management, so the SparkProtocol does not read from the user's socket as a result of the inactivity timer closing the sockets asynchronously to the user/spark code opening them.
I have something I will try in the AM that may resolve this as well.
Sure - how do I get the decoder to you? This forum software only attaches pretty pictures. It's a bit long to cut and paste here.
The decoder reads the data files created by the Logic app when it is configured to decode the SPI bus into hex. You dump that to a file, and feed it to my spectacularly ugly Python hack that decodes the packets I care about to hunt down CFOD.
That shows the last two packet writes (2 byte header, indicating 16 byte payload following) - these packets never make it out over the radio. Then the main core firmware gives up and starts its error handling - ipconfig(), close(), select() - but the CC3000 is borked at that point. The SPI interface functions in zombie mode, but the radio side of the house is dead as a doornail. Packet traces show the cloud keeping on retrying until it quits in disgust.
And I want to be clear, I'm not knocking your work; just saying that I don't think quashing those bugs will stop CFOD.
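For those who want to try the same approach on their own captures, the framing step can be sketched roughly as follows. This is not the decoder described above. It assumes the exported SPI decode is a CSV with a `MOSI` column holding hex bytes such as `0x10`, and it assumes a simplified framing of a 2-byte big-endian length followed by that many payload bytes, based only on the "2 byte header, 16 byte payload" description. The actual CC3000 SPI framing may carry more fields, so the column name and both function names are placeholders.

```python
import csv
import io


def read_mosi_bytes(csv_text, column="MOSI"):
    """Collect host-to-CC3000 bytes from an exported SPI decode.

    The CSV layout and column name are assumptions about the export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return bytes(int(row[column], 16) for row in reader if row.get(column))


def split_frames(stream):
    """Split a byte stream into payloads.

    Assumed framing: 2-byte big-endian length, then that many payload bytes.
    """
    frames = []
    pos = 0
    while pos + 2 <= len(stream):
        length = int.from_bytes(stream[pos:pos + 2], "big")
        payload = stream[pos + 2:pos + 2 + length]
        if len(payload) < length:
            break  # truncated capture: stop rather than guess
        frames.append(payload)
        pos += 2 + length
    return frames
```

Comparing the payloads recovered this way against a tcpdump of the wireless side is one way to see which writes never reached the radio.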
Just adding my experience with the "Flash of Death".
I'm using my cellphone as my WiFi hotspot and have been for years now.
The Spark Core stays connected just fine to it.
So I'm pushing data to Xively and that's working just fine.
I start downloading movie torrents and streaming a video on YouTube, pulling about 1.5MB per second on the hotspot data stream. Verizon kills the data connection due to the high bandwidth, killing all connections to the internet. The torrent data transfer rate drops to 0. The WiFi signal is fine and all connected devices stay connected; there is just no data to and from the net.
Because of this the Spark Core starts to flash blue because the cloud is not accessible anymore. I reset the hotspot on my cell phone and everything reconnects to the phone and data is flowing again. But the Spark Core never tries to reconnect to the WiFi network; it remains flashing blue and no further attempts are made to reconnect to my phone's WiFi hotspot.
The only solution for my application is to program the Spark Core to power off and then on every 5 to 10 mins to guarantee it reconnects to the network like it does successfully after every reset. But if the Blue Flashing freezes up the code then there is no way a reset would actually run because that part of the code would never execute due to the Spark Core Freezing up.
Looking forward to the fix!
I wonder if Adafruit has this same issue with any of their CC3000 breakouts? If not, then maybe this is an issue they already solved or could help out with? They have sold a shit ton of them, so I figure they should have had some of the same issues if they are using the same chip.
I guess this is not really a contribution but a comment about this issue in different environments.
I'm very impressed by something: I moved a core from my house, where I can easily get days without any CFOD, to my work, and it doesn't last more than 5 minutes. Here at work not only are there many WiFi-connected clients, but there are also many other networks from other companies (22 networks * their clients). Still, I'm able to easily Skype and use the internet from my phone, but the core becomes very unstable, rebooting every time in less than 5 minutes.
I can do this but I need some clear step by step instructions on how to properly load this firmware.
Another thing I noticed is that if my WiFi connection dropped out completely, the Spark Core would go to flashing green, and as soon as the WiFi network was available again it would successfully connect to the WiFi network and the Spark cloud.
It's only when the Spark Core remains connected to the WiFi network and the connection to the Spark cloud is lost that the blue LED flash begins and never recovers.
I've been running a data stream to Xively successfully, but about 2 hours ago it stopped sending data, even though the Spark Core is successfully connected to my WiFi network and the network is connected to the internet (all my other devices work). I have the on-board blue LED flash 3 times after a successful transmission to Xively, but it has just stopped for some unknown reason even though the Spark Core says it's successfully connected to the internet and the Spark Cloud.
Hey, I've had some success in my environment by turning on fast reconnect mode. Usually I would have the CC3000 act strange in that it would not respond to ICMP ping from my LAN like it normally does, but serial debugging output on the spark would show that netapp_ipconfig() was still returning the DHCP configuration. It would normally hang at this point on a connect() call with CFOD.
This would happen between 1 minute and 1 hour, and I've yet to have a CFOD in 8 hours with fast mode. I'm still running under heavily modified driver code (currently have DMA disabled on SPI).
In spark_wlan.cpp, the second argument to wlan_ioctl_set_connection_policy() will enable fast mode.
I've done some hacking on the Adafruit CC3000 library and am definitely watching this thread for any insight into TI CC3k driver issues. Unfortunately I'm not familiar with the spark core and don't have a ton of input, other than to share your pain of working with the CC3k driver code.
Something that has been really helpful to me to debug lockup issues though is toggling output pins at the start and end of certain functions and using a multimeter to check if those outputs are high or low so I know where execution is getting stuck. In particular any functions in the driver that have loops are worth investigating deeply: it looks like there are quite a few places where some condition might not happen and the driver will happily bury itself in a loop forever (am looking at a lockup issue in the HostFlowControlConsumeBuff() function inside socket.cpp now actually).