Can't Stop Particle.connect() If Bad Keys

Sometimes Particle private keys get corrupted, so I’m working on ways to consistently detect and correct this issue.

Unfortunately, if a device has bad private keys, the Particle.connect() handshake will take around 75 seconds. This process ALSO locks down the modem, meaning that I cannot perform some TCPClient operations until after that lengthy timeout. I also must manually stop Particle from reconnecting right away, or this blocking will keep happening.

I tried calling Particle.disconnect() to stop the process, but it has no effect until after the Particle.connect() timeout is reached. Is there any way to stop a Particle.connect() handshake that has already started?

Steps to reproduce:

  1. Load the private key of a different device of the same type onto the device-under-test: run particle keys save test on the other device, then particle keys load test on the device-under-test.
  2. Flash the following code
#include "Particle.h"

// STARTUP(cellular_credentials_set("hologram", "", "", NULL));    // if using 3rd party sim

// SYSTEM_THREAD(ENABLED);      // same result either way
SYSTEM_MODE(SEMI_AUTOMATIC);

#define START start_time = millis()
#define ELAPSED (millis() - start_time)

uint32_t start_time;

bool particle_disconnected() { return !Particle.connected(); }
bool cellular_disconnected() { return !Cellular.ready(); }

int last_cloud_status = cloud_status_disconnected;

bool wait_cloud_status_connecting() { return last_cloud_status == cloud_status_connecting; }

void cloud_status_handler(system_event_t event, int param)
{
    last_cloud_status = param;

    if (param == cloud_status_connecting)
    {
        Serial.println("Connecting to Particle Cloud...");
    }
    else if (param == cloud_status_connected)
    {
        Serial.println("Connected to Particle Cloud");
    }
    else if (param == cloud_status_disconnecting)
    {
        Serial.println("Disconnecting from Particle Cloud...");
    }
    else if (param == cloud_status_disconnected)
    {
        Serial.printlnf("Disconnected from Particle Cloud after %lu ms", ELAPSED);
    }
}


/* This function is called once at start up ----------------------------------*/
void setup()
{
    Particle.disconnect();

    Cellular.on();
    Cellular.connect();

    System.on(cloud_status, cloud_status_handler);

    Serial.begin(9600);
    delay(10000);

    Serial.println("Beginning Tests for AUTOMATIC Mode with thread ENABLED");

    Serial.println("Connecting to Particle");
    START;
    Particle.connect();
    Serial.printlnf("Particle.connect() returned after %lu ms", ELAPSED);

    Serial.println("Waiting for Particle Cloud connection process to begin");
    waitUntil(wait_cloud_status_connecting);
    Serial.printlnf("Connection process started after %lu ms", ELAPSED);

    Serial.println("Attempting Particle cloud disconnect");
    START;
    Particle.disconnect();
}


/* This function loops forever --------------------------------------------*/
void loop() {

}
  3. Monitor with particle serial monitor:
Beginning Tests for AUTOMATIC Mode with thread ENABLED
Connecting to Particle
Particle.connect() returned after 0 ms
Waiting for Particle Cloud connection process to begin
Connecting to Particle Cloud...
Connection process started after 91 ms
Attempting Particle cloud disconnect
Disconnected from Particle Cloud after 81058 ms

I believe interrupting the Particle connection in this case is impossible. Or at least I’ve never managed to do it.

However, there is another way you could do this. When you have “bad keys” they’re not really bad. What happens is Device OS generates a new pair of keys, which no longer match what the cloud has. The Device Key Helper restores the keys to what they were previously, so they match what’s stored in the cloud.

While I didn’t implement it this way, I considered it:

There’s no reason you have to wait until the connection fails to restore the keys. If you know what the keys are supposed to be, you can examine them before calling Particle.connect(). If they don’t match what you think they should be (stored in EEPROM, external flash, FRAM, sent over MQTT, etc.), you could preemptively put the correct keys back and the connection won’t fail.
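The preemptive check described above could be sketched roughly as follows. This is plain C++ rather than Device OS code: KeyStore, its three hooks, kKeySize, and ensureKeyBeforeConnect are all hypothetical names, and the hooks stand in for whatever actually reads and writes the device key and the backup copy on your hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical blob size; the real length depends on the platform and key type.
constexpr std::size_t kKeySize = 128;
using KeyBlob = std::array<std::uint8_t, kKeySize>;

// Hypothetical storage hooks. In firmware these would wrap whatever reads and
// writes the active device key and the backup copy (EEPROM, external flash, FRAM).
struct KeyStore {
    std::function<bool(KeyBlob&)> readDeviceKey;
    std::function<bool(KeyBlob&)> readBackupKey;
    std::function<bool(const KeyBlob&)> writeDeviceKey;
};

enum class KeyCheck { Match, Restored, NoBackup, Failed };

// Intended to run before Particle.connect(): restores the backup if the keys differ.
KeyCheck ensureKeyBeforeConnect(const KeyStore& store) {
    assert(store.readDeviceKey && store.readBackupKey && store.writeDeviceKey);

    KeyBlob backup{};
    if (!store.readBackupKey(backup)) {
        return KeyCheck::NoBackup;   // nothing to compare against
    }
    KeyBlob current{};
    if (!store.readDeviceKey(current)) {
        return KeyCheck::Failed;     // could not read the active key
    }
    if (current == backup) {
        return KeyCheck::Match;      // keys agree, safe to connect
    }
    // Mismatch: put the known-good key back before connecting.
    return store.writeDeviceKey(backup) ? KeyCheck::Restored : KeyCheck::Failed;
}
```

Returning a status instead of connecting directly leaves it to the caller to decide what to do when no backup exists or the restore itself fails.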


The reason I didn’t do it that way in DeviceKeyHelperRK is that preemptive restoring gets messy if you manually reset the keys using particle keys doctor.

If you wait for failure, then resetting the keys externally will update the keys on both sides, the connection will succeed, then on the next successful connection it will notice that the saved key no longer matches and will rewrite the saved key.

Just something to keep in mind if you preemptively restore the keys - you’ll need a way to deal with that case.
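To make the wait-for-failure behavior concrete, here is a minimal sketch of the decision it implies. It is not the DeviceKeyHelperRK implementation; the enum names and reconcileKeys are made up for illustration, and the comparison of device key against saved key is assumed to happen elsewhere.

```cpp
#include <cassert>

enum class CloudEvent { ConnectFailed, Connected };
enum class KeyAction { None, RestoreDeviceKeyFromBackup, UpdateBackupFromDeviceKey };

// Decide what to do with the saved key once the outcome of a connection
// attempt is known. deviceKeyMatchesBackup is computed by the caller.
KeyAction reconcileKeys(CloudEvent event, bool deviceKeyMatchesBackup) {
    if (deviceKeyMatchesBackup) {
        return KeyAction::None;  // nothing to fix on the key side
    }
    switch (event) {
        case CloudEvent::ConnectFailed:
            // The device key changed and the cloud rejects it: restore the old one.
            return KeyAction::RestoreDeviceKeyFromBackup;
        case CloudEvent::Connected:
            // The cloud accepted a key we have not saved (e.g. after an external
            // reset with particle keys doctor): refresh the saved copy.
            return KeyAction::UpdateBackupFromDeviceKey;
    }
    return KeyAction::None;
}

void exampleUsage() {
    assert(reconcileKeys(CloudEvent::Connected, false) ==
           KeyAction::UpdateBackupFromDeviceKey);
}
```

A preemptive restore never reaches the Connected branch with a mismatched key, which is why that approach needs some other way to learn that the keys were reset externally.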


Ah, that’s right, I hadn’t caught that nuance before. I had previously thought it only generated new keys if you wrote "0xFF"s to the key location, but I just confirmed via test that you’re correct.

Btw, if I load up a key that isn’t just “wrong” but doesn’t represent a valid ECP_DP_SECP256R1 private key (same size, but randomly change some bytes), the Electron immediately panics with an Assertion Failure. Any clue why this is happening and if there are any checks I can do to avoid this (besides a robust checksum)? Not sure why this is an irrecoverable condition.
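For reference, the checksum option mentioned above might look something like this: a CRC-32 stored next to the backup blob and verified before the blob is ever loaded. The function names are hypothetical. Note that this only detects that the stored bytes changed since the checksum was recorded; it does not prove the bytes form a valid key for the curve.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 using the reflected polynomial 0xEDB88320 (the zlib/Ethernet CRC).
std::uint32_t crc32Of(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// True if the blob still matches the checksum recorded when the backup was saved.
bool keyBlobIntact(const std::uint8_t* blob, std::size_t len, std::uint32_t storedCrc) {
    return crc32Of(blob, len) == storedCrc;
}
```

A table-driven CRC would be faster, but for a key blob checked once per boot the bitwise loop keeps the code small.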

To your point of restoring the keys immediately - that makes a lot of sense. I suppose I could use a file on my SD card as a flag that I’ve already tried my backup keys, so that I avoid an infinite reset loop if they are also corrupt. That way, the first time a cloud connection attempt fails, I’ll load the backups and compare them with the keys on the device. If they are the same and I still can’t connect, OR they are different but my special flag is set on the SD card, I’ll stop trying to reconnect and send my server my device’s new key via MQTT to be registered with Particle.
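The plan above boils down to a small decision function, sketched here under the assumption that the key comparison and the SD card flag are handled elsewhere. RecoveryAction, RecoveryDecision, and onCloudConnectFailure are illustrative names only.

```cpp
#include <cassert>

enum class RecoveryAction { RestoreBackupAndRetry, RegisterNewKeyViaServer };

struct RecoveryDecision {
    RecoveryAction action;
    bool setBackupTriedFlag;  // caller persists this (e.g. a file on the SD card)
};

// Called after a cloud connection attempt has failed.
RecoveryDecision onCloudConnectFailure(bool backupMatchesDeviceKey, bool backupTriedFlag) {
    // The backup is either already in use or was already tried once:
    // stop reconnecting and hand the new key to the server instead.
    if (backupMatchesDeviceKey || backupTriedFlag) {
        return {RecoveryAction::RegisterNewKeyViaServer, false};
    }
    // First failure with a differing backup: record the attempt, then restore.
    return {RecoveryAction::RestoreBackupAndRetry, true};
}

void exampleUsage() {
    RecoveryDecision first = onCloudConnectFailure(false, false);
    assert(first.action == RecoveryAction::RestoreBackupAndRetry);
    assert(first.setBackupTriedFlag);
}
```

Writing the flag before restoring the backup is what breaks the reset loop: if the backup is also bad, the next failure sees the flag and falls through to the server path.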

I didn’t fully understand your last point though - the important thing is that regardless of anything else I need to make sure that the private key on the device matches the public key that Particle has stored for that device, correct? Or are you just saying that if I send over a new key that I’ve newly registered, I have to make sure the backup gets updated too so that it doesn’t overwrite my new, correct key?

As an added note, device key corruption is known to be related to this bug bounty: “Electron not booting after battery discharges completely”.

The article is a bit dated, as it refers to the Electron specifically, but the issue can occur with the Photon too. There seems to be an issue with the STM32 microcontroller, in rare instances, corrupting the flash in different areas. In the most extreme cases it can be the bootloader, but I have also seen it occur with keys, Wi-Fi credentials, and at times even Device OS and user applications.

Yes, that's what I was saying.

“Btw, if I load up a key that isn’t just “wrong” but doesn’t represent a valid ECP_DP_SECP256R1 private key (same size, but randomly change some bytes)”

As far as I know, a corrupted key has never happened. What does happen is that either both configuration sectors get erased, or Device OS can't figure out which one is the valid one, so it just recreates everything from scratch with new keys.


Good to know. I think I’m just going to upload the new key to my servers and make a server-side function to update the public key in the Particle cloud, so that I never need to write keys to memory on the device. It seems to be working pretty well, and since the problem is so rare, I probably won’t bother with the local backup either.

Thanks for all the insights - super valuable to have you so active in the community here!

Hmm, so if I’m understanding this correctly, Particle.connect() is completely blocking in all system modes, with the system thread either enabled or disabled? I thought so at first, but then I saw the system event cloud_status_connecting and wondered why it would exist if no portion of user code could ever observe it. Am I misunderstanding something?

No. It only blocks if the Cellular is not already connected, AND it is in single threaded mode. In threaded operation it simply sets the flag for the system to attempt to reconnect at the next available opportunity. System Events don’t always hit user firmware immediately but the event does in fact imply what you think it does. It is also useful for tracking state changes irrespective of using it as a direct trigger.

Here is a table I created by testing each function myself for every mode:


(edit for a correction on the mode and table for particle.connect when cellular is not connected)


Thank you, much appreciated!