Consistently losing credentials on Argon

Hi Everybody,
I've recently been facing an issue with our Argon fleet: the devices are losing their Wi-Fi credentials at random, usually within a day or two.
The devices always have two sets of credentials configured: one for a mobile hotspot that our technicians use for setup in the field, and the customer's Wi-Fi credentials.
The only way to get them going again is to clear the Wi-Fi credentials with the buttons and rewrite them.
I've not seen any red flashes so far; it just loses them.
We don't have any code for clearing the credentials, only for writing them, and that code is only called from a Particle function (so we can write the credentials from the console).
At this point I don't know whether the firmware is somehow overwriting and corrupting the credentials, or whether it is another issue.
Have you seen anything similar? Have you solved it? I keep searching for bugs in the firmware, but I'm running out of ideas.
Any help would be appreciated.

Best regards

The Wi-Fi credentials are stored on the flash file system. If the flash file system is corrupted, it could be erased, which would cause the credentials to be cleared. Do you use the file system at all?

Hi Rick
We do use the EEPROM, not sure what else uses it...
I do have one specific device that was replaced with a P2 (no issues there), and I did a full memory erase as instructed, reflashed the firmware and still the same issue.
How do I know if the flash file system is corrupted? Some devices work fine on the same firmware and some do not, and sometimes we wake up to find 5 out of 7 devices at the same customer offline.
Best regards

If you retrieve the saved credentials when they have been "lost", what is stored on the device?
Is anything/anyone capable of triggering the MODE button? Holding it down for 10s wipes your saved wifi credentials: Status LED and Device Modes | Troubleshooting | Particle

The EEPROM is saved on the device's flash, but should not overwrite the WiFi credentials.

Adding trace logs and replicating the conditions where the credentials are lost should help identify what's happening. Any function that calls either WiFi.clearCredentials() or WiFi.setCredentials() (see the WiFi section of the Particle reference) should preferably include Log.info lines so that every call is recorded.

I added some log output to retrieve the credentials.
Once they are lost, it finds no credentials (there is an example in the Particle docs for retrieving them, and I used it).
I expected to find some corrupted SSIDs, but no: the list is empty.
We have to go on site to press the MODE button; we've done it as you said, and then we rewrite the credentials. There is someone there who could press it for us, but there is no point, since he can't rewrite the credentials. They don't want to do it either! They are upset that the devices aren't working.
There is no way to log the devices on site other than publishing some lines, and I don't know whether that will work reliably just before the credentials are cleared by the firmware.

I just rechecked the only function that sets credentials, and it actually publishes the credentials it is about to save.
I haven't seen that message in the event feed, so the devices are not going through that function.

int changeAPpassword(String command) {
    JsonDocument doc;
    char data[256];
    String newSSID = command.substring(0, command.indexOf(','));
    String newPW = command.substring(command.indexOf(',') + 1);

    doc["SSID"] = newSSID;
    doc["Password"] = newPW;
    serializeJson(doc, data, sizeof(data));
    Particle.publish("Wifi Credentials", data, NO_ACK);
    delay(1000);
    WiFi.disconnect();
    WiFi.setCredentials(newSSID, newPW, WPA2, WLAN_CIPHER_AES_TKIP);
    WiFi.connect();
    Particle.connect();
    return 1;
}
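
A side note on the handler above: it does not reject a command that has no comma or has an empty field before calling WiFi.setCredentials(). That is not confirmed as the cause of the lost credentials, but it is cheap to rule out. Below is a minimal, host-testable sketch of a stricter parser. It is written in portable C++ with std::string rather than the Particle String class, the names parseCredentialCommand and ParsedCredentials are hypothetical, and the length limits are the general Wi-Fi ones (SSID up to 32 bytes, WPA2 key of 8 to 64 characters), not something stated in this thread.

```cpp
#include <string>

// Hypothetical result type: valid is false when the command is rejected.
struct ParsedCredentials {
    bool valid;
    std::string ssid;
    std::string password;
};

// Splits "ssid,password" on the FIRST comma, so the password may itself
// contain commas (same behaviour as the original handler).
ParsedCredentials parseCredentialCommand(const std::string &command) {
    ParsedCredentials result;
    result.valid = false;

    std::string::size_type comma = command.find(',');
    if (comma == std::string::npos) {
        return result; // no separator: refuse instead of guessing
    }

    std::string ssid = command.substr(0, comma);
    std::string password = command.substr(comma + 1);

    // General Wi-Fi limits (assumption, see the note above the code).
    if (ssid.empty() || ssid.size() > 32) {
        return result;
    }
    if (password.size() < 8 || password.size() > 64) {
        return result;
    }

    result.valid = true;
    result.ssid = ssid;
    result.password = password;
    return result;
}
```

In the firmware, the handler would return an error code and leave the stored credentials untouched whenever valid is false.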

Referring to the ticket that's open about this topic - I believe you were able to replicate this on a device under your control.
Enabling logging to catch when the device falls offline will help us understand what could have caused this.
While the EEPROM is emulated in flash, frequent use or overflows should not write into the DCT, where the Wi-Fi credentials are stored.

Also do you have an exposed physical button connected to the MODE pin? Holding down the MODE button will clear credentials.

If you use an external momentary MODE switch connected by wires, you should add an additional pull-up resistor, as the 100K pull-up on the Argon may not be sufficient with longer wires.

Hi Rick, thank you!
No, the devices are inside an enclosure and you need a special key to open them. It is not that easy to access, and that is not what is happening here. There are no external switches for mode or reset.
Is there something to look for in the code?

Something the devices have in common is that they use Ethernet for Modbus TCP and Wi-Fi for the network connection, and they have all been updated via OTA.

Also, the exact same code is running on three Photon 2 units with no issues.

Losing Wi-Fi credentials on the Argon is not a known issue. What I would do is store a backup copy of the credentials on the flash file system. If the backup file also disappears, the problem is that the flash file system is getting erased. If the backup file is still present, you can restore the credentials. This isn't ideal, but it will keep the device online and at least eliminates the file system itself as the cause.
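
The backup idea could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested Device OS implementation: it uses plain POSIX file calls (open, read, write, close), which is the style of API the Gen 3 file system is documented as exposing, so the same logic can be exercised on a desktop first. The record layout, the names CredentialBackup, saveBackup and loadBackup, and the FNV-1a checksum are all hypothetical choices made for illustration.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical backup record: 32-byte SSID and 64-character key, each
// with room for a terminating NUL.
struct CredentialBackup {
    uint32_t magic;
    char ssid[33];
    char password[65];
    uint32_t checksum;
};

// Missing: no file at all. Corrupt: file exists but fails validation.
enum class BackupState { Missing, Corrupt, Ok };

static const uint32_t BACKUP_MAGIC = 0x57494649; // ASCII "WIFI"

// FNV-1a hash over every byte that precedes the checksum field.
static uint32_t backupChecksum(const CredentialBackup &b) {
    const uint8_t *p = reinterpret_cast<const uint8_t *>(&b);
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < offsetof(CredentialBackup, checksum); i++) {
        h = (h ^ p[i]) * 16777619u;
    }
    return h;
}

bool saveBackup(const char *path, const char *ssid, const char *password) {
    if (strlen(ssid) >= sizeof(CredentialBackup::ssid) ||
        strlen(password) >= sizeof(CredentialBackup::password)) {
        return false; // would not fit in the record
    }
    CredentialBackup b;
    memset(&b, 0, sizeof(b)); // zero padding so the hash is repeatable
    b.magic = BACKUP_MAGIC;
    strcpy(b.ssid, ssid);
    strcpy(b.password, password);
    b.checksum = backupChecksum(b);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return false;
    }
    bool ok = write(fd, &b, sizeof(b)) == (ssize_t)sizeof(b);
    close(fd);
    return ok;
}

BackupState loadBackup(const char *path, CredentialBackup &out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return BackupState::Missing;
    }
    ssize_t n = read(fd, &out, sizeof(out));
    close(fd);
    if (n != (ssize_t)sizeof(out) || out.magic != BACKUP_MAGIC ||
        out.checksum != backupChecksum(out)) {
        return BackupState::Corrupt;
    }
    return BackupState::Ok;
}
```

The three-way result maps onto the diagnosis described above: Missing after a backup was written suggests the file system was erased, Ok with no stored Wi-Fi credentials suggests only the credentials were affected (and they can be restored from out.ssid and out.password), and Corrupt points at damaged file contents. Note that this stores the password in plain text on the device, which may or may not be acceptable for a given deployment.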

Thank you.
I've been working on a backup solution for the credentials.
But the issue is growing across our fleet.
We keep losing the credentials.
If we erase and save the credentials again, the devices connect right away.
Also, if we reset the devices, they struggle to connect again.
Some devices still have the credentials stored but do not connect; we have to erase and save them again.
This is not happening on the Photon 2 devices in our fleet, which run the same firmware.
I have run out of ideas looking at the code for potential issues related to this.
Please help!!

Implementing the backup idea has two purposes: it may mask the symptom of the issue, but it also may help determine whether the file system is being corrupted (and reformatted), or whether something is specifically happening to the Wi-Fi configuration file, which is stored on the flash file system. The Argon and P2 share the same flash file system code (LittleFS) and most of Device OS, and while the Wi-Fi hardware is completely different, the way credentials are stored is the same. Like I said, it's not a known problem, so there's something unusual about your situation, but it's impossible to say what that might be without further information.

Hi Rick
My idea is to store them on the flash and recover them as soon as I detect that they are gone (sending a debug message when that happens).
That exposed another issue: it seems I have to wait several seconds before saving another credential. The Argon gets stuck somehow if I save two credentials one after the other.

In the WiFiCredentials structure, use setValidate(false) to prevent the newly added credentials from being validated, which is probably why it seems like it's hanging.

Hi Rick.
Rereading this... We do store a considerable amount of data in the emulated EEPROM, and it is working properly, with no data loss there. We configure the devices with several parameters that are loaded each time they reboot, so my conclusion is that this is only affecting the credentials, not the whole file system?

If the EEPROM is working, then it's probably just the Wi-Fi credentials. This is not a known issue, and it's weird because it's just a file, and the code that manipulates the file is the same on the P2 and Argon.

Hi
Here is an update!
I took 3 Argons that weren't connecting or had the credential issue and updated them to 6.2.1. One connected OK afterwards, one connected but had a key synchronization problem after a reboot (fast blinking cyan), and the other did not connect at all.
I did not rewrite the credentials.
Any thoughts?