Bug bounty: Kill the 'Cyan flash of death'

@zach - I’ve had my spark running now for 8+ hours. It does seem to go into a flashing red followed by a reconnect and recovery, but the application code does not seem to hang. What I did was to pull the latest from all of the source bases, make one change to add the SPARK_WLAN_RESET = 1 in SPARK_WLAN_Loop. I did not reduce the other timeout. So this change has made a HUGE difference for me, now I can actually have a spark running for more than 30 mins. Thanks very much!

I was able to get my 2nd core running all night long on my home network last night. It disconnected quite a bit, but reconnected like a champ every time.

Wonder what it is about my work wifi that causes flashing blue on a reconnect sometimes.

Setting aucInactivity back to 60 seems to give better results. I’ve also observed that sending data to the Core more often than once every 200 ms makes the CFoD worse. I’m going to look into this deeper today and post the results.

@Hypnopompia the blue flashing is indeed weird.

I put aucInactivity back to 60 and it works fine too.

Exactly - I never changed aucInactivity; I left it at its previous setting of 60 - works very nicely.

I flashed the latest CFoD .bin and found that my core is “rebooting” almost continuously (running Tinker) if no Cloud requests are coming in. If I make a Cloud request every 5 seconds, that seems to help it stay alive for longer. I let it run overnight, making an analogRead to Tinker every 5 seconds, and here are my results:

Total Reading: 64
Average Uptime: 7.04 minutes
Max uptime: 27.65 minutes
Min uptime: 0.27 minutes

The only good news is that the core does indeed recover from the Cloud disconnects, but the increase in the frequency of disconnects (in my case) makes this not a very good fix.

Dave O

You can reduce the frequency of reconnects by setting the aucInactivity back to 60. I have made that change in the latest commit to feature/debug-cfod branch.

Previously I was getting the CFoD at fairly regular intervals, and the Core would stay stuck on cyan. When left running overnight in my office, it would be inaccessible by the next morning. I moved the Core from about 10 m away to right next to the router (2 m), and it has now been running since 9th Jan without disconnecting or needing a reset. Nothing else changed on our network or on my Core.

So I wonder if it has something to do with signal strength or other networks in range (particularly open networks). When scanning on my phone I can see over 10 Wi-Fi networks… and I surmise that the correct Wi-Fi connection momentarily drops and the CC3000 tries to connect to another, open network, especially when at a distance the other networks have a stronger signal. That then leads to a ‘credentials not accepted’.

Maybe this can be replicated by momentarily switching the original router off and on with a stronger second open network present.

@mohit rocks!

My :spark: made it the first time “through the night”!

Edit:

  • Over 45.000 seconds uptime and still running!
  • :frowning: loop() died at 82.866 seconds - LED still pulsing… -> RESET
  • 141.260 seconds and running - LED is solid (sort of Soylent :wink: ) Green

I have 24 hours of uptime! With only 5 reconnects. :smiley:

Thank you @mohit!

By the way, does Spark have a way to update the CC3000 firmware? I mean, how hard would it be to update if you find a solution (not a ‘reconnect’ workaround)?

Yes, we can update the CC3000 firmware; right now there’s a process to do so over USB using our patch programmer, but if this is a widespread issue that comes with a Texas Instruments fix (rather than a fix on our side), we can deliver a firmware update over the air to the CC3000.

Any new updates regarding this bug? It seems to be kind of solved, but only in a hacky, non-permanent way. I’m experiencing longer uptimes now, but a couple of times I have also gotten a permanently blinking green light, which requires a hard reset to kick the Core back into working. Do you believe this is related?

I’m more than happy to try out some experiments if needed; just let me know when and what to pull.

Also the Texas thread seems dead. Do we have any other updates regarding that lead? Are you working on getting the logs they requested?

@sjunnesson - I can only say I have run one test for 12 hrs with no freeze. I only have one :spark: to dev and test on, so I don’t get much time for long-term tests. I ordered several more :spark: yesterday, so when they arrive I can do more long-term tests. Does anyone know the current lead time on new :spark:?

@sjunnesson This is definitely a work-around, not a complete fix, and it’s mostly about making sure that when it does run into an issue, the Core restarts so that it stays on long term.

Surprised it’s getting stuck in blinking green - what that suggests to me is that the Wi-Fi connection is weak, and perhaps it’s failing to connect to the network. Does this behavior happen when it’s very close to the router? We could put in some logic to automatically retry the network too.

We are getting a Core set up with the CC3000 logger so that TI can get the data they need to do debugging, so while the TI thread is dead for the moment, we’ll be reviving it soon.

Just had a great :spark: of creativity

Blip the green led based on the RSSI (received signal strength indication)

// blip = 150ms on, 150ms off
// 1 blip - RSSI = 0 - 20%
// 2 blips - RSSI = 20 - 40%
// 3 blips - RSSI = 40 - 60%
// 4 blips - RSSI = 60 - 80%
// 5 blips - RSSI = 80 - 100%
//
// Demo walks through blipping once through five times, then repeats.
//----------------------------------------------------------------
int y = 1;
void setup() {
    pinMode(D7,OUTPUT);
    digitalWrite(D7,HIGH);
    delay(1000);
    digitalWrite(D7,LOW);
}

void loop() {
    // This looping just ensures we don't block 
    // the main loop for more than 10 seconds...
    for(int x=0; x<5; x++) {
        blipLED(y);
    }
    y = (y==5)?1:y+1; // if y==5, set y back to 1, else inc. y
}

void blipLED( uint8_t num ) {
    RGB.control( true );
    while(num-- > 0) {
        RGB.color( 0, 255, 0 ); // green
        delay(150);
        RGB.color( 0, 0, 0 ); // off
        delay(150);
    }
    delay(850);
    RGB.control( false );
}
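To drive the demo from a real reading instead of the 1-to-5 walk-through, something has to turn a signal strength percentage into a blip count. Here is a minimal sketch of that mapping; `blipsForPercent` is a made-up name, and since the table in the comments lists 20, 40, 60 and 80 in two rows each, this sketch assumes those boundary values belong to the higher band.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Spark firmware): convert a signal
// strength percentage into the number of blips from the table above.
// Assumption: boundary values (20, 40, 60, 80) fall into the higher band.
uint8_t blipsForPercent(uint8_t percent) {
    if (percent >= 100) {
        return 5;              // 100% and anything above stays in the top band
    }
    return percent / 20 + 1;   // 0-19 -> 1, 20-39 -> 2, ..., 80-99 -> 5
}
```

In the sketch above you would then call `blipLED(blipsForPercent(strength))` from `loop()`, where `strength` is whatever percentage you end up measuring.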

Interesting idea. Do we currently have a way to get the RSSI? There has to be some low level code in the WiFi library that has access to it.

Never mind! I figured out how to get the RSSI. I'm writing up a quick function for you guys to test out.

Essentially, you use wlan_ioctl_get_scan_results(unsigned long ulScanTimeout, unsigned char *ucResults) to read the scan table. This table is updated at a predetermined interval, the default being 10 minutes but the Spark Team may have it set to less. You can also set the scan interval yourself with another command, which I'll work on later.

The function spits out 50 bytes per AP found (4 + 4 + 42, per the documentation below), the first four bytes being the number of APs left in the scan table; the CC3000 keeps track of the nearest 16 APs.

Here's a quick script I whipped up to read out a list of APs:

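unsigned long loopNum = 0;  // counts how many times the table has been queried

void loop() {
    // One scan table entry; the documented format is 4 + 4 + 42 = 50 bytes
    unsigned char wlan_results_raw[50];

    // Fetch the next entry from the CC3000 scan table (timeout is not supported)
    wlan_ioctl_get_scan_results(0, wlan_results_raw);

    // Print every byte as a decimal number, stopping at the end of the buffer
    for (unsigned int i = 0; i < sizeof(wlan_results_raw); i++) {
        Serial.print(wlan_results_raw[i]);
    }
    Serial.print(" ");
    Serial.println(loopNum);
    delay(1000);
    loopNum++;
}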

This will query the scan table once a second. Here's the result from my core:

    50001000139596208410510911111610412132383283116971140000000000000000009651752322417116139208 0
    40001000137406201161051099895103117101115116000000000000000000000010251752322417116139208 1
    300010008328620108105110107115121115105108108327097114109000000000000000000221822051835216139208 2
    200010007721620688570504800000000000000000000000000003898118817716139208 3
    1000100011963620801111071011143272105108108327097114109000000000000000000308224545116139208 4
    0000100000000000000000000000000000000000000000000016139208 5

So we can easily use this to parse the RSSI from the current WLAN network!
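One caveat with the dump above: the bytes are printed as decimal numbers back to back, so you can't tell where one byte ends and the next begins. A minimal sketch of a formatter that prints each byte as two hex digits with a space in between is below. `toHexString` is a hypothetical helper, and it builds a `std::string` just to keep the example self-contained; on the Core you would more likely print each piece with Serial.print as you go.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: render a byte buffer as space-separated two-digit
// hex values, so every byte boundary in the scan entry stays visible.
std::string toHexString(const uint8_t* data, size_t len) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (size_t i = 0; i < len; i++) {
        if (i > 0) {
            out += ' ';                  // separator between bytes
        }
        out += digits[data[i] >> 4];     // high nibble
        out += digits[data[i] & 0x0F];   // low nibble
    }
    return out;
}
```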

Here's the documentation on the wlan_ioctl_get_scan_results(unsigned long ulScanTimeout, unsigned char *ucResults) function:

//*****************************************************************************
//
//!  wlan_ioctl_get_scan_results
//!
//!  @param[in]    scan_timeout   parameter not supported
//!  @param[out]   ucResults  scan results (_wlan_full_scan_results_args_t)
//!
//!  @return    On success, zero is returned. On error, -1 is returned        
//!
//!  @brief    Gets entry from scan result table.
//!            The scan results are returned one by one, and each entry 
//!            represents a single AP found in the area. The following is a 
//!            format of the scan result: 
//!                 - 4 Bytes: number of networks found
//!          - 4 Bytes: The status of the scan: 0 - aged results,
//!                     1 - results valid, 2 - no results
//!          - 42 bytes: Result entry, where the bytes are arranged as  follows:
//!              
//!                                          - 1 bit isValid - is result valid or not
//!                                         - 7 bits rssi - RSSI value;         
//!                 - 2 bits: securityMode - security mode of the AP:
//!                           0 - Open, 1 - WEP, 2 WPA, 3 WPA2
//!                                         - 6 bits: SSID name length
//!                                         - 2 bytes: the time at which the entry has entered into 
//!                            scans result table
//!                                         - 32 bytes: SSID name
//!                 - 6 bytes:        BSSID 
//!
//!  @Note      scan_timeout, is not supported on this version.
//!
//!  @sa        wlan_ioctl_set_scan_params 
//
//*****************************************************************************
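Going by that comment block, one entry can be split into fields by offset alone. The sketch below is only that, a sketch: `ScanEntry`, `readLe32` and `parseScanEntry` are names I made up, the little-endian byte order is an assumption based on the dump above (where the first four bytes print as 5, 0, 0, 0), and the two packed bytes are left undecoded because the bit order inside them is not spelled out in the comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

const size_t SCAN_ENTRY_SIZE = 50;  // 4 + 4 + 42 bytes, per the comment block

// Hypothetical container for one scan table entry.
struct ScanEntry {
    uint32_t networksLeft;  // bytes 0-3
    uint32_t scanStatus;    // bytes 4-7: 0 aged, 1 valid, 2 no results
    uint8_t  rssiByte;      // byte 8: packed isValid + 7-bit RSSI (undecoded)
    uint8_t  secSsidByte;   // byte 9: packed security mode + SSID length (undecoded)
    uint16_t entryTime;     // bytes 10-11
    char     ssid[33];      // bytes 12-43, plus a terminating NUL
    uint8_t  bssid[6];      // bytes 44-49
};

// Assumption: multi-byte fields are little-endian.
static uint32_t readLe32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
}

// Split a raw buffer of SCAN_ENTRY_SIZE bytes into its fields.
ScanEntry parseScanEntry(const uint8_t* raw) {
    ScanEntry e;
    e.networksLeft = readLe32(raw);
    e.scanStatus   = readLe32(raw + 4);
    e.rssiByte     = raw[8];
    e.secSsidByte  = raw[9];
    e.entryTime    = static_cast<uint16_t>(raw[10] | (raw[11] << 8));
    std::memcpy(e.ssid, raw + 12, 32);
    e.ssid[32] = '\0';                 // SSID bytes are not NUL-terminated
    std::memcpy(e.bssid, raw + 44, 6);
    return e;
}
```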

Now, once you get the right SSID and read the RSSI bits, you need to do some math on it:

The RSSI is represented by the 7 most significant bits of byte #8. You then subtract 128 from the result.

The least significant bit only indicates a valid result.

For instance, if you get 0xA7 ==> 10100111 ==> 1010011 (without the last bit) ==> 0x53 ==> 83 (dec) --> the RSSI is 83 - 128 = -45 dBm.
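The same arithmetic as a minimal sketch in code. `rssiIsValid` and `rssiDbm` are hypothetical names, and the bit layout (valid flag in bit 0, RSSI in the upper 7 bits) is taken from the description above, not verified against the hardware.

```cpp
#include <cstdint>

// Assumption: bit 0 of the packed byte is the valid flag.
bool rssiIsValid(uint8_t packed) {
    return (packed & 0x01) != 0;
}

// Assumption: the upper 7 bits hold the RSSI; subtracting 128 gives dBm.
int rssiDbm(uint8_t packed) {
    return static_cast<int>(packed >> 1) - 128;
}
```

For the 0xA7 example, `rssiDbm(0xA7)` works out to 83 - 128 = -45.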

Once you've got the actual signal strength, you can either display it on an LCD or use the map() function to work with @BDub's script above.

Now, I laid all this out here for general reference. I think I can have a working function up within a couple of hours. If you guys want to test it and it works well, I'll submit it to core-common to be included as a full function. :smile:

Good job Tim! I was doing the same exact thing :smile: But I started watching my backlog of 3 episodes of Gold Rush so you got more results than I did. I was going to hard code the length of the array, but used strlen() instead and it was giving me trouble.

Glad to have some help!

Keep in mind we need to find the RSSI of the AP we are connected to, not our neighbor’s AP. I guess we need to compare the SSID to the stored one?
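A minimal sketch of that comparison, assuming we already have the connected SSID as a C string from somewhere (where to get it is still an open question). `ssidMatches` is a hypothetical helper; the entry's SSID is not NUL-terminated, so it compares the length first and then the bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the SSID bytes of a scan entry equal the
// target SSID. entryLen is the SSID length reported in the scan entry.
bool ssidMatches(const uint8_t* entrySsid, size_t entryLen, const char* target) {
    size_t targetLen = std::strlen(target);
    return entryLen == targetLen &&
           std::memcmp(entrySsid, target, targetLen) == 0;
}
```

Several APs can broadcast the same SSID, so if the BSSID of the connected AP turns out to be available, comparing the 6 BSSID bytes would be the more exact test.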

The first bit (bit0? or bit7?) denotes if the other 7 bits are valid apparently, so it’s important to check that. That forum post said lsb is first bit, but who’s to say it’s not bit 7? The way they organize that table of bytes seems like they are organizing it from MSB to LSB. Where the length of the SSID is in that character array will help I guess, can double check against actual SSID. Reverse engineering stuff that’s supposed to be released for production kinda sucks… grumbles… Ti.

This file seems to suggest it’s bit7:
http://mbed.org/users/dflet/code/CC3000Test/raw-file/17c37c0b0534/CC3000TestApp.cpp

isValid & rssi - 1 byte - a packed structure. The top bit (isValid)
indicates whether or not this structure has valid data,
the bottom 7 bits (rssi) are the signal strength of this AP.
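If that reading is the right one, decoding is a mask instead of a shift. A minimal sketch with made-up names (`isValidTopBit`, `rssiLow7`), assuming isValid is bit 7 and the RSSI is bits 0-6:

```cpp
#include <cstdint>

// Assumption: the top bit (bit 7) is the valid flag.
bool isValidTopBit(uint8_t packed) {
    return (packed & 0x80) != 0;
}

// Assumption: the bottom 7 bits hold the raw RSSI value.
uint8_t rssiLow7(uint8_t packed) {
    return packed & 0x7F;
}
```

With this layout a byte of 0xA7 would be valid with a raw RSSI of 0x27 (39), so checking a known byte against a known distance from the router should show which bit order is real.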

Also, RSSI is only a relative percentage. I wouldn’t try to convert to dBm because without calibration for the Spark Core specifically it’s going to be worthless. Hopefully with 0 signal we get a value that’s 0?? Then it should be easy enough to map it to a percentage.
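If it does turn out that way, the mapping could be as simple as the sketch below. `rssiToPercent` is a hypothetical helper, and both endpoints are assumptions: 0 meaning no signal and 127 (the largest 7-bit value) meaning full strength. Neither has been checked on a Core.

```cpp
#include <cstdint>

// Hypothetical helper: scale a raw 7-bit RSSI reading (0-127) to 0-100%.
// Assumes 0 is no signal and 127 is the strongest possible reading.
uint8_t rssiToPercent(uint8_t raw7) {
    if (raw7 > 127) {
        raw7 = 127;   // clamp anything that does not fit in 7 bits
    }
    return static_cast<uint8_t>((raw7 * 100u) / 127u);
}
```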

These are the numbers you calibrate to make the scan results meaningful.

wlan_ioctl_set_scan_params(
            1000,   // enable start application scan
            100,    // minimum dwell time on each channel
            100,    // maximum dwell time on each channel
            5,      // number of probe requests
            0x7ff,  // channel mask
            -80,    // RSSI threshold
            0,      // SNR threshold
            205,    // probe TX power
            aiIntervalList  // table of scan intervals per channel
            );

I thought you might be working on it too. :wink:

I’ve made some great progress! I’ve almost got it decoding the entire 50 byte string and printing it out over the serial port:

5000100015359185084105109111116104121323832831169711400000000000000000096517523224171 SSID: Timothy & Star Loop: 183

I’m writing the code to decode the rest of the data, but I thought I’d start with SSID since it was easily verifiable data. Here’s the code to do that:

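char ssid_name[33];  // 32 SSID bytes plus a terminating NUL
for (int i = 12; i <= 43; i++) {
    int arrayPos = i - 12;  // the SSID occupies indices 12..43 of the raw entry
    ssid_name[arrayPos] = wlan_results_raw[i];
}
ssid_name[32] = '\0';  // terminate so it can be printed as a C string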

The SSID is supposed to be 32 bytes; it should start on byte 13 and end on byte 44. So we simply start at the appropriate place in the wlan_results_raw char array and iterate 32 bytes.

That dBm calibration was confirmed by a TI engineer, apparently. The Spark Core conforms to TI’s reference design for the CC3000, right? Then it should be valid.

I don’t think we need to mess with wlan_ioctl_set_scan_params until we find out what the default Spark Core values are, since everything but the timeout is saved across restarts. It does seem like the Core is changing the default timeout value though, because my list is updating every 60 seconds.

I’m gonna finish up this parsing code so we have good values to work with. Do you want to see if you can find out a way to read what SSID the Core is currently connected to @BDub? I took a quick look through some of the code, but couldn’t see a way to easily get at the SSID acquired through SmartConfig. I imagine it’s got to be in the CC3000 EEPROM, right?

It would be nice if wlan_ioctl_get_scan_results would mark the currently connected AP. :signal_strength: