Ledger not syncing on a limited number of devices

Hi,

We've flashed around 10 Borons with a set of code running on 6.1.0. This code has a ledger implementation meant to receive a cloud-to-device ledger update on occasion which updates various configuration settings (a long list, the ledger is 1.2k). Out of this group 2/10 of the devices don't seem to ever receive a ledger update though. They are both registered along with all other devices to receive the ledger, and we alter a parameter or more in the ledger and can see the cloud update the "Last Updated On" timestamp on the ledger, but the registered callback function in firmware never fires (firmware is identical on all 10 units):

deviceConfig = Particle.ledger("test-conf1");
...

// Register onSync callback
deviceConfig.onSync(&LedgerLoader::onLedgerSync);

...

void LedgerLoader::onLedgerSync(particle::Ledger ledger, void* context) {
    Serial.println("Ledger sync callback triggered.");
    LedgerLoader::instance().load();
    LedgerLoader::instance().hasSynced = true;
    Particle.publish("ledgerSync", "{\"e\":" + String::format("%x", LedgerLoader::instance().hasSynced) + "}",60, PRIVATE, WITH_ACK);
}

The other 8/10 devices all seem to work fine, receive the ledger, update parameters and function as expected. On the 2 failing devices nothing inside onLedgerSync() seems to fire, which I believe means something is broken between saving the ledger in the cloud and the callback firing.

A couple of questions:

  1. Can I force a ledger sync from the device in any manner to try to ensure the device is actively receiving the ledger?
  2. Can I use any debug indication between the ledger update timestamp verification, and the callback verification to trace the failure to receive?
  3. Is there any device parameter other than the firmware version, ledger registration, ledger content, or Device OS version that I should check that may be out of sorts on these 2 misbehaving units?

Another anecdotal note: both were previously loaded with other firmware, which was replaced with the ledger-enabled code. We ran a particle doctor wipe on the main test unit and that didn't seem to fix the issue. I'm now at the end of my list of ideas on how to isolate the failure point in the ledger sync communication. Any ideas?

If you can attach a laptop to one of the affected machines by USB, there are some ledger log messages that may be helpful.

  • Use SerialLogHandler logHandler(LOG_LEVEL_TRACE)
  • If you are using SYSTEM_MODE(AUTOMATIC) switch it to SEMI_AUTOMATIC
  • At the beginning of setup(), waitFor(Serial.isConnected, 10000);
  • Then Particle.connect() if you switched from AUTOMATIC mode
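
Put together, the four steps above might look like the following firmware fragment. This is only a minimal sketch of the logging setup, assuming the rest of the application (including the ledger registration) is added where the placeholder comment sits; it needs Device OS and will not build on a desktop compiler.

```cpp
#include "Particle.h"

// Connect manually so the serial monitor can attach before the cloud session starts
SYSTEM_MODE(SEMI_AUTOMATIC);

// Trace-level logging over USB serial, including the ledger and CoAP messages
SerialLogHandler logHandler(LOG_LEVEL_TRACE);

void setup() {
    // Wait up to 10 seconds for a serial monitor to connect
    waitFor(Serial.isConnected, 10000);

    // Needed because SEMI_AUTOMATIC does not connect to the cloud on its own
    Particle.connect();

    // ... existing application setup, including ledger registration ...
}

void loop() {
}
```

Waiting for the serial connection before calling Particle.connect() means the first ledger sync attempt after the cloud handshake is captured in the log.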

Also, have the affected devices ever moved between sandbox and org, or between products using product ledgers, with the same name? There might be an edge case where it can keep looking at the old ledger with the same name.

I'll set up the trace log when I have a chance.

The devices were part of a recent migration from a personal sandbox into an org account product, although I don't think the previous product had the same name. I'll check with the other team members who may have more info here. If the edge case is in play can we do anything to fix it directly? Maybe change the name of the ledger we're using?

There may be other ways, but I do know that changing the name definitely works.

I tried changing the ledgers to reference the following (I'm using "prefix" to replace a name but the form is the same):
// Bind the named ledgers
deviceConfig = Particle.ledger("prefix-conf1");
statusLedger = Particle.ledger("prefix-state1");

I have the following logs now:

0000034131 [comm.coap] TRACE: Sending CoAP message
0000034131 [comm.coap] TRACE: CON POST /L size=87 token=62 id=282
0000034133 [comm.coap] TRACE: Received CoAP message
0000034133 [comm.coap] TRACE: CON POST /E/particle/device/updates/pending size=47 token=01 id=28912
0000034134 [comm.coap] TRACE: Sending CoAP message
0000034135 [comm.coap] TRACE: ACK 0.00 size=4 token= id=28912
0000034772 [comm.coap] TRACE: Received CoAP message
0000034772 [comm.coap] TRACE: ACK 2.04 size=59 token=62 id=282
0000034782 [comm.coap] TRACE: Sending CoAP message
0000034783 [comm.coap] TRACE: CON POST /L size=785 token=63 id=283
0000035125 [comm.coap] TRACE: Received CoAP message
0000035125 [comm.coap] TRACE: ACK 4.00 size=8 token=63 id=283
0000035126 [system.ledger] WARN: Ledger request failed: 4.00
0000035127 [system.ledger] ERROR: Failed to sync ledger: prefix-test-state1; result: 5
0000035127 [system.ledger] ERROR: Error while handling response: -2010
0000035128 [system.ledger] ERROR: Synchronization failed: -2010; retrying in 30s
0000044130 [app] WARN: Ledger not synced yet. Skipping load() to protect EEPROM.

Note that I have replaced the actual ledger name with "prefix-test-state1" in the log above to hide the real name, as mentioned. This follows the old ledger naming format, which used the following:

// Bind the named ledgers
deviceConfig = Particle.ledger("prefix-test-conf1");
statusLedger = Particle.ledger("prefix-test-state1");

The test ledgers are still present and available in the product through the online console. The new ledger ("prefix-conf1") still doesn't sync and I don't see it referenced in the log. I can't explain why "prefix-test-state1" still shows up, because that name no longer exists in our code. We basically intend to have conf1 be a cloud-to-device ledger that provides configuration data, and then mirror this back to the state1 ledger as a device-to-cloud ledger in order to confirm that the device has the up-to-date data. Any next steps on troubleshooting? Can I manually wipe or flush the ledger? We even tried particle doctor to flush any stale data in the device that could be getting us stuck.

That log is very helpful, thank you. I don't know the cause yet, but -2010 is "Ledger request failed", so the device is attempting to synchronize the ledger but an error occurs while reading from the cloud, which is a step toward figuring out what is actually failing.
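
For anyone reading a similar trace: the number after ACK in the comm.coap lines is a CoAP response code in class.detail form, as defined in RFC 7252. In the log above, 2.04 is Changed (the first POST /L was accepted), 0.00 is an empty acknowledgement, and 4.00 is Bad Request (the second POST /L was rejected). The following is a small host-side sketch, not firmware, for decoding those codes from a log line. The function names extractAckCode and describeCoapCode are hypothetical helpers made up for this illustration, and only a handful of common codes are listed.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Pull the response code out of a trace line such as
// "0000035125 [comm.coap] TRACE: ACK 4.00 size=8 token=63 id=283".
// Returns an empty string when the line does not contain "ACK ".
std::string extractAckCode(const std::string& line) {
    const std::string marker = "ACK ";
    std::size_t pos = line.find(marker);
    if (pos == std::string::npos) {
        return "";
    }
    std::size_t start = pos + marker.size();
    std::size_t end = line.find(' ', start);
    if (end == std::string::npos) {
        return line.substr(start);
    }
    return line.substr(start, end - start);
}

// Map a CoAP code written as "class.detail" to its RFC 7252 name.
// Codes not listed here fall back to a class-level description.
std::string describeCoapCode(const std::string& code) {
    int cls = 0;
    int detail = 0;
    if (std::sscanf(code.c_str(), "%d.%d", &cls, &detail) != 2) {
        return "unparseable";
    }
    if (cls == 0 && detail == 0) return "Empty";
    if (cls == 2 && detail == 4) return "Changed";
    if (cls == 2 && detail == 5) return "Content";
    if (cls == 4 && detail == 0) return "Bad Request";
    if (cls == 4 && detail == 1) return "Unauthorized";
    if (cls == 4 && detail == 4) return "Not Found";
    if (cls == 5 && detail == 0) return "Internal Server Error";
    switch (cls) {
        case 2: return "Success (other)";
        case 4: return "Client error (other)";
        case 5: return "Server error (other)";
        default: return "Unknown";
    }
}
```

A 4.xx class code means the cloud considered the request itself invalid, which fits a device asking for a ledger that no longer matches what the cloud expects for it.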

Thank you. Let me know if I can do anything else, including altering the device-to-cloud ledger on our side; otherwise I will wait to see if anything can be cleared out on Particle's side. This device is on the bench, by the way, so a full factory reset/reinitialization is fine, and I can also delete or remove its ledgers without customer impact at the moment.

I ran the following code after some additional help:

// PROTOTYPE
static int removeAll();

// EXAMPLE
#include "Particle.h"

SYSTEM_MODE(SEMI_AUTOMATIC);

SYSTEM_THREAD(ENABLED);

SerialLogHandler logHandler(LOG_LEVEL_INFO);

void setup() {
    // Remove any ledger data from the device.
    // The device must not be connected to the Cloud. The operation will fail if any of the ledgers are in use.
    Ledger::removeAll();
}

void loop() {
    Serial.println("Ledger::removeAll completed");
}

Following this I flashed the operational code again. When the device booted it encountered an issue connecting to cellular, which may have been the MNO rather than the device. I ran Particle Doctor anyway and then flashed the operational code once more. On this last boot, the unit received the intended ledger data and has recovered. Thanks for your help @rickkas7; the log tracing certainly shed some light on the issue, and some added input from support brought us to the complete resolution.


I'm glad you got it working! Using removeAll() was going to be my next suggestion; I'm glad you were able to figure that out and it solved the problem.
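
For others who land here with the same symptom, a variant of the removeAll() sketch that reports whether the call succeeded might look like this. It is only a sketch under assumptions: it relies on the prototype shown earlier in the thread (static int removeAll()) and assumes the usual Device OS convention that 0 means success and a negative value is an error code. It is firmware and will not build on a desktop compiler.

```cpp
#include "Particle.h"

// Stay offline: removeAll() must run while the device is not connected to the cloud
SYSTEM_MODE(SEMI_AUTOMATIC);

SerialLogHandler logHandler(LOG_LEVEL_INFO);

void setup() {
    // Give a serial monitor up to 10 seconds to attach so the result is visible
    waitFor(Serial.isConnected, 10000);

    // Assumption: 0 indicates success, any other value is an error code
    int result = Ledger::removeAll();
    if (result == 0) {
        Log.info("Ledger::removeAll succeeded");
    } else {
        Log.error("Ledger::removeAll failed: %d", result);
    }
}

void loop() {
}
```

After the log shows success, flash the normal application firmware again so the device can fetch a fresh copy of its ledgers on the next connection.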
