Corrupt Particle Keys Cause Constant Network/MQTT Disconnection

The problem
I recently had a number of Electron devices out in the field (remote installations) develop corrupted keys (both device and server keys). My primary fix for now will be to use this library to back up the keys to my SD cards, but that is still not a perfect solution, and I want an OTA option for true redundancy.

However, when the devices’ keys became corrupted, the devices also began constantly resetting their network connection at the system firmware level. I have other services via MQTT that I use, and every few seconds the devices would disconnect from the cell network. This was so bad that a cellular provider said they might have to reduce service quality for those devices because of the connection spam.

What I want to do
I ultimately would be interested in storing all my device key backups in a cloud database and sending them to the device over MQTT when I have a Particle Key error. However, I cannot do this if the system firmware is constantly disconnecting my network.

My questions
I am running my Electron devices in SYSTEM_MODE(AUTOMATIC); with System Firmware v0.6.4. I would really like to separate the network connectivity from whether or not I have proper Particle Cloud keys, so I have some questions to that end:

  • Why is the system firmware constantly resetting the entire network connection when only the keys are bad? Is this because the state machine is assuming a connection error after an unsuccessful cloud handshake?
  • How can I prevent the device from doing this? Is my only recourse to detect the number of disconnections and then explicitly permanently disconnect from the Particle cloud until I have the backup keys ready to load? (which is how the above library approaches bad key detection for v0.6.4)
  • Do subsequent system firmware versions fix any of this behavior? I use a lot of RAM and later system fw versions hog too much RAM for stable operation (amongst other issues) for me but if this is truly the best solution I can begin testing my fleet for a widespread upgrade (ugh).

@rickkas7 since you’ve worked with this in the past, do you have any insights? Btw your library above is rad / super helpful to probably fix this issue 99% of the time once I implement it, so thanks!

Also @ParticleD (whoops had tagged wrong Dave at particle haha) this was the issue I emailed support about a month ago. Zendesk auto-closed my ticket and my emails clearly weren’t getting through, so I gave up trying to get ahold of you guys. The support email thing really needs to get fixed, it hasn’t been functional for me at all.


Wow I’m sorry the ticket got auto-closed. Do you have a ticket #?

If not I can try to search for it.

We have been really (really) busy over the last few weeks and our response times are higher than what they’ve been in the past, but a ticket that’s open should never close, so I want to determine what happened

So step one is generally reproducing the problem, and I’m trying to reproduce this behavior locally.

My error as experienced in the field was blinking cyan followed by three flashes of orange. This was only fixed once I updated BOTH the device private/public key AND the server key. The device would continue to (kinda) function normally.

If I use the Particle CLI to load some intentionally “bad/corrupt” private/public device keys onto the device I get different behavior, with blinking cyan followed by a single flash of red (the first time) and then, after reset, the device blinks cyan then hard faults over and over and over.

So I have three questions (I can’t find anything in the docs):

1. What is the difference between three flashes of orange, and one flash of red between cyan blinking?
2. Why is the device hard faulting over and over, but only after the second reset?

3. When using the System.set(SYSTEM_CONFIG_DEVICE_KEY, myKeyPtr, myKeyLength); command, it returns successfully, but if I set a “bad/corrupt” key with it, the device doesn’t appear to save it, even after both pin and power resets. Running particle keys save still gives me the original key. How do I use System.set() properly? I want to avoid writing to the DCT directly because I don’t have enough space left in program flash for the overhead.

PS @ParticleD pm’d about the ticket part.

I’m pretty sure that System.set(SYSTEM_CONFIG_DEVICE_KEY, ...) does not work on the Electron, E Series, or Gen3 devices.

The function always writes to DCT_DEVICE_PRIVATE_KEY_OFFSET however the key for UDP devices is actually at DCT_ALT_DEVICE_PRIVATE_KEY_OFFSET. Same for System.get().

The code in my DeviceKeyHelperRK library to restore keys does not use System.set, it uses the underlying functions directly.
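To illustrate why a call can report success while the key the cloud connection uses stays unchanged, here is a small host-side sketch. It does not use the real DCT or Device OS functions: FakeDct, the two offsets, the key size, and both helper functions are made-up stand-ins for illustration, and the real offsets and sizes come from the Device OS headers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout, for illustration only. The real values live in the
// Device OS DCT headers and are NOT these numbers.
constexpr std::size_t kPrimaryKeyOffset = 0;   // stands in for DCT_DEVICE_PRIVATE_KEY_OFFSET
constexpr std::size_t kAltKeyOffset     = 64;  // stands in for DCT_ALT_DEVICE_PRIVATE_KEY_OFFSET
constexpr std::size_t kKeySize          = 16;  // made-up key length

// In-memory stand-in for the DCT so the logic can run on a desktop.
struct FakeDct {
    std::array<std::uint8_t, 128> bytes{};  // zero-initialized

    // Copies len bytes to offset; fails if the write would run past the end.
    bool write(std::size_t offset, const std::uint8_t* data, std::size_t len) {
        if (offset > bytes.size() || len > bytes.size() - offset) {
            return false;
        }
        std::memcpy(bytes.data() + offset, data, len);
        return true;
    }

    // True if the len bytes stored at offset match data.
    bool matches(std::size_t offset, const std::uint8_t* data, std::size_t len) const {
        if (offset > bytes.size() || len > bytes.size() - offset) {
            return false;
        }
        return std::memcmp(bytes.data() + offset, data, len) == 0;
    }
};

// Mimics the behavior described above: always writes the primary slot,
// regardless of which slot the device's transport actually reads.
bool setKeyPrimaryOnly(FakeDct& dct, const std::uint8_t* key, std::size_t len) {
    return dct.write(kPrimaryKeyOffset, key, len);
}

// Writes the slot that matches the transport: the alternate slot for UDP
// devices, the primary slot otherwise.
bool setKeyForTransport(FakeDct& dct, bool usesUdp, const std::uint8_t* key, std::size_t len) {
    return dct.write(usesUdp ? kAltKeyOffset : kPrimaryKeyOffset, key, len);
}
```

In this model, setKeyPrimaryOnly() returns true on a UDP device while the alternate slot keeps its old contents, which would be consistent with a particle keys save still returning the original key.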


Thanks, that makes sense, though it’s a bummer and should probably be removed from the Electron docs if possible. Guess I’ll have to see if I can gut some code to get enough space for those dct functions.

Regarding the dct stuff - I’ve found the equivalent function calls for fw v0.6.4, but I understand that they aren’t thread safe. Am I a fool to try to use them at all at that fw version in multi-threaded mode (even in single threaded block operation)? Am I setting myself up for failure if I don’t just upgrade to v0.7.0? I would only ever be writing to the block from an online database backup of the keys if there was a failure, so it would be rare, but I’m banking on it to be reasonably safe.

Then the other primary remaining issue is finding a way to reproduce the problem / understand the difference between the orange and red flashes.

@rickkas7, in your library you use the private key offset, and then the size of both keys to copy the data from both keys. This makes sense for the Photon.

However, for the Electron, DCT_DEVICE_PRIVATE_KEY_OFFSET is located after DCT_DEVICE_PUBLIC_KEY_OFFSET, so the library appears to back up and restore the private key + part of the alt server public key. Is this intentional?

Looking at the public key on my electron, it’s all null. Further, particle keys doctor doesn’t touch the public key on the device at all.

Is there any reason to ever update the public key? It seems like all I need to do is just update the private key, but maybe I’m missing something. I guess I’ll use v0.7.0 after all to gain back some program memory size anyways, so the server key is a moot point.

I discovered that the attempts at connecting to the Particle cloud were blocking my TCP Client actions from completing, resulting in the disconnections I experienced.

I now resolve this by turning off the Particle cloud after consecutive failures and registering the device’s new keys over an MQTT connection.
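For anyone wanting to do something similar, here is a minimal sketch of the consecutive-failure counting. It is plain C++ so it can be tested off-device; the class name, the threshold, and where you hook it in are my own assumptions rather than anything from the system firmware or the library mentioned above. On the device you would presumably call Particle.disconnect() once shouldDisableCloud() returns true, then fetch the new keys over MQTT.

```cpp
#include <cstdint>

// Hypothetical helper: counts cloud connection attempts that failed in a row
// and reports when the application should stop retrying the cloud.
class CloudFailureGate {
public:
    explicit CloudFailureGate(std::uint8_t maxFailures)
        : maxFailures_(maxFailures) {}

    // Call once per finished connection attempt.
    // A successful attempt clears the streak; a failed one extends it.
    void recordAttempt(bool connected) {
        if (connected) {
            failures_ = 0;
            return;
        }
        if (failures_ < maxFailures_) {
            ++failures_;  // saturate at the threshold so the counter cannot wrap
        }
    }

    // True once the streak has reached the threshold.
    bool shouldDisableCloud() const {
        return failures_ >= maxFailures_;
    }

    // Call after new keys have been loaded, before retrying the cloud.
    void reset() {
        failures_ = 0;
    }

    std::uint8_t failures() const {
        return failures_;
    }

private:
    std::uint8_t maxFailures_;
    std::uint8_t failures_ = 0;
};
```

The counter only trips on failures in a row, so an occasional dropped connection on a flaky cell link does not take the device off the cloud.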

System firmware versions v0.7.0 and above automatically restore the server public key, though it is also possible to restore a backup of that key if desired.

Since all of my questions were ultimately resolved, marking this as complete.
