StorageHelperRK - Inconsistent Results Loading Data

I am getting close to fielding my LoRA Particle Gateway but I have run into an issue of the configuration and state data not being consistently loaded from FRAM at reset.

I currently have three storage classes - sysStatus (system configuration), current (current data from the last node) and nodeID (a JSON object that stores node configuration).

I shared the approach for putting these in classes here:

But, you can see the latest (large - sorry) release here:

My plan is to document and share this project with the community once I get this issue figured out. This is part of a long-running project with @jgskarda and others to realize the best of both worlds with Particle and LoRA.

OK, here is the issue. About 50% of the time, the persistent data fails to load and I need to reset to factory defaults. I don’t know why, upon reset, the three objects (sysStatus, current and nodeID) fail to load about half the time and load correctly the other half. Also, before you ask, it is not “every other” reset: sometimes it loads correctly two or three times in a row, and other times it fails consistently for a while. Since it works sometimes, I have to assume the code is basically correct.

My current theory is that there is i2c bus contention between the FRAM, the AB1805 clock / watchdog and the Boron’s use of that bus.

Any ideas / suggestions for mitigation?

Thanks

Chip

Huh… Sorry to hear you are having troubles. Hopefully Rick or someone else can help work through it. I’m not as familiar with the StorageHelperRK as I just write the entire configuration JSON to the flash file system instead of using StorageHelperRK. This works since my config data is fairly static. That said I was wondering:

  • Anything interesting in the serial log that indicates some sort of failure in loading data? Generally speaking it seems most of Rick’s library still has debug logs and would possibly provide meaningful info there upon any errors.

  • Am I understanding correctly that if the data is not loaded properly, if you reset the same device again it will never load properly again until you perform a “save” or does it behave more like the file/data is corrupted and will never read again until you save again? Not sure if what I’m saying even makes sense with that library.

  • Is there an easy way to switch the storage mechanism with that library? I believe you are using FRAM. As a test, could you use the flash file system instead of FRAM? This isn’t a solution but may help isolate the problem.

The only unusual behavior I’ve seen with similar hardware is that I had to add a 1 second delay at the very beginning of setup(), otherwise the device would SOS. I can’t say for certain, but I think the red SOS occurred when calling setup on the AB1805. I spent some time troubleshooting it, but just adding the 1 second delay fixed it, so I didn’t dive any deeper. I also can’t recall with 100% certainty if it was the AB1805.setup() or one of the other setup() calls from other hardware libraries/singleton classes. It’s been a while since I had that issue and I didn’t take good notes. Does adding a delay at the very first line in setup() change the behavior at all? Maybe things need to “power on” fully before they can respond to an I2C read command?

Are you using any I2C device from anything other than the loop thread? This includes worker threads or software timers.

The MB85RC256V-FRAM-RK library does not lock the I2C bus before accessing it. Actually, most I2C libraries don’t lock the I2C bus. If you have multi-threaded access to I2C that would definitely cause random failure to read data from the FRAM. It could also cause data to be read or written to the wrong location, causing data corruption.
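
A minimal sketch of what locking the bus means, written in portable C++ so it can run off-device. `FakeBus` and `readRegister` are hypothetical stand-ins, not the FRAM library's API. On Device OS the analogous step, as far as I know, is wrapping each I2C transaction in `WITH_LOCK(Wire)`.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for an I2C bus shared by several devices.
// A transaction has two steps: select an address, then read from it.
class FakeBus {
public:
    std::recursive_mutex mutex;   // one lock for the whole bus

    void selectAddress(uint16_t addr) { selected_ = addr; }
    uint8_t readSelected() const { return static_cast<uint8_t>(selected_ & 0xFF); }

private:
    uint16_t selected_ = 0;
};

// Hold the bus lock for the whole transaction so another thread cannot
// change the selected address between the two steps.
uint8_t readRegister(FakeBus &bus, uint16_t addr) {
    std::lock_guard<std::recursive_mutex> guard(bus.mutex);
    bus.selectAddress(addr);
    return bus.readSelected();
}
```

Without the guard, a second thread calling selectAddress() between the two steps would make the read return data from the wrong location, which is the kind of random failure described above.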

@rickkas7, thank you for your reply.

I do not use threading in my code but I have enabled the system thread.

On the i2c bus, in addition to the Boron, I have the AB1805 and this FRAM, nothing else.

Is there a way to know if there are reads or writes pending? My timing is not so tight so I could give those operations time to complete. I guess I could also try @jgskarda 's suggestion to add delays - this is easy in setup but I try to avoid delays in my main code.

Any suggestions?

Thanks,

Chip

I created a new version of StorageHelperRK (0.0.3):

  • Added a new example for data validation and initialization (07-validate).
  • Added a new withLogData(true) option to log the data after reading and saving.
  • Added a new method to update the hash. This is normally done automatically, but the method is useful in special cases.

The first item will simplify your code for loading and initializing your data. There are two virtual methods that are designed for that purpose, and example 07-validate shows how to use them from your code.

I added a withLogData(true) that enables logging of the actual data that is read or saved, which may help isolate where the problem is.
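
As a rough, library-independent illustration of the load, validate, then initialize-on-failure pattern, the sketch below uses a magic number and a simple checksum. All names here (`Record`, `kMagic`, `checksum`, `isValid`, `loadOrInit`) are hypothetical and are not the StorageHelperRK API; the library uses its own header and hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical persistent record: small header followed by payload.
struct Record {
    uint32_t magic;      // identifies a record written by this firmware
    uint32_t sum;        // checksum over the payload bytes
    uint8_t  payload[8]; // application data
};

constexpr uint32_t kMagic = 0x53595331;

// Simple rolling checksum; a real implementation would use a stronger hash.
uint32_t checksum(const uint8_t *data, std::size_t len) {
    uint32_t sum = 0;
    for (std::size_t i = 0; i < len; i++) sum = sum * 31 + data[i];
    return sum;
}

bool isValid(const Record &r) {
    return r.magic == kMagic && r.sum == checksum(r.payload, sizeof(r.payload));
}

// Copies raw storage into out. Returns true if the stored data was valid;
// otherwise resets out to defaults and returns false.
bool loadOrInit(const uint8_t *storage, Record &out) {
    std::memcpy(&out, storage, sizeof(Record));
    if (isValid(out)) return true;
    std::memset(&out, 0, sizeof(Record));   // defaults: all-zero payload
    out.magic = kMagic;
    out.sum = checksum(out.payload, sizeof(out.payload));
    return false;
}
```

Logging the return value of a function like loadOrInit() on every boot makes it easy to see how often the stored data was rejected, and whether the rejection came from the header or from the payload.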

@rickkas7 ,

Thank you very much for your help on this. I am working on implementing the new validate and initialize values. I am getting a bigger value for the storage object than I expected. Here is what I have now:

	class SysData {
	public:
		// This structure must always begin with the header (16 bytes)
		StorageHelperRK::PersistentDataBase::SavedDataHeader sysHeader;
		// Your fields go here. Once you've added a field you cannot add fields
		// (except at the end), insert fields, remove fields, change size of a field.
		// Doing so will cause the data to be corrupted!
		// Size of sysStatus = 30 bytes + 16 for the header for 46 bytes total
		uint8_t nodeNumber;                               // Assigned by the gateway on joining the network
		uint8_t structuresVersion;                        // Version of the data structures (system and data)
		uint16_t magicNumber;							  // A way to identify nodes and gateways so they can trust each other
		uint8_t stayConnected;                          // Version of the device firmware (integer - aligned to particle product firmware)
		uint8_t resetCount;                               // reset count of device (0-255)
		uint8_t messageCount;							  // This is how many messages the Gateway has composed for the day
		time_t lastHookResponse;                   		  // Last time we got a valid Webhook response
		time_t lastConnection;                     		  // Last time we successfully connected to Particle
		uint16_t lastConnectionDuration;                  // How long - in seconds - did it take to last connect to the Particle cloud
		uint16_t frequencyMinutes;                        // When we are reporting at minute increments - what are they - for Gateways
		uint16_t updatedFrequencyMinutes;				  // When we update the reporting frequency, it is stored here
		uint8_t alertCodeGateway;                         // Alert code for Gateway Alerts
		time_t alertTimestampGateway;              		  // When was the last alert
		uint8_t openTime;                                 // Open time 24 hours
		uint8_t closeTime;                                // Close time 24 hours
		bool verizonSIM;                                  // Are we using a Verizon SIM?
		uint8_t sensorType;								  // What sensor if any is on this device (0-none, 1-PIR, 2-Pressure, ...)
	};
	SysData sysData;

When I run the code (using the new .withLogData(true)) it seems that sysData is 64 bytes not the 46 I expected. Can you please tell me where I am going wrong in my counting?

Also, since I am creating my own storage objects / classes, I would use the following in my main program’s setup():

    sysStatus
        .withLogData(true)
        .withSaveDelayMs(500)
        .load();

not sysStatus.setup() - correct?

Thanks,

Chip

The reason for the size discrepancy is alignment on ARM CPUs. The rules are:

  • Any 4-byte variable (int, long, uint32_t, int32_t, float, time32_t, …) must be 4-byte aligned (address ending in 0x0, 0x4, 0x8, 0xC).
  • Any 8-byte variable (double, long long, uint64_t, time_t, …) is 8-byte aligned by the compiler (address ending in 0x0 or 0x8).
  • Any 2-byte variable (short, uint16_t, int16_t) must be 2-byte aligned (address ending in 0x0, 0x2, 0x4, 0x6, …)
  • Any 1-byte variable (char, unsigned char, uint8_t, int8_t, bool, …) can be at any address.

The compiler automatically makes structures that comply with the rules. Some examples:


		uint8_t nodeNumber;                               // Assigned by the gateway on joining the network
		uint8_t structuresVersion;                        // Version of the data structures (system and data)
		uint16_t magicNumber;							  // A way to identify nodes and gateways so they can trust each other

This actually is 4 bytes because the uint16_t happens to be 2-byte aligned.


		uint8_t stayConnected;                          // Version of the device firmware (integer - aligned to particle product firmware)
		uint8_t resetCount;                               // reset count of device (0-255)
		uint8_t messageCount;							  // This is how many messages the Gateway has composed for the day
		// <-- there's a filler uint8_t here
		time_t lastHookResponse;

This part, however, is 12 bytes, not 7. The first reason is that lastHookResponse must be aligned, so one filler byte is inserted to move it from offset 23 to offset 24.

The second issue is that time_t is 64 bits (8 bytes) long. This is because the C standard library (newlib) switched to this a few years ago to avoid the Year 2038 problem, where a signed 32-bit Unix time overflows.

This caused some issues, however, as parts of Device OS still depend on it being 32 bits long. If you look at the Device OS calls, they return a time32_t, not a time_t, because they use the older 32-bit time format.
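
To make the 2038 limit concrete: the largest value a signed 32-bit time can hold is 2,147,483,647 seconds after the Unix epoch, which is January 19, 2038 at 03:14:07 UTC. A small portable sketch (the function name `lastInt32Second` is hypothetical) converts that value to a calendar date:

```cpp
#include <cstdint>
#include <ctime>
#include <limits>

// Calendar date (UTC) of the last second a signed 32-bit time value can hold.
std::tm lastInt32Second() {
    std::time_t t = static_cast<std::time_t>(std::numeric_limits<int32_t>::max());
    // gmtime returns a pointer to a shared static buffer, so copy the result.
    std::tm out = *std::gmtime(&t);
    return out;
}
```

A 64-bit time_t has no such limit in practice, at the cost of 8 bytes per timestamp in a stored structure instead of 4.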

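The only other caveat is that the size of a structure is rounded up to a multiple of the alignment of its most strictly aligned member, which is why the last field ends at offset 60 but sizeof reports 64.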

These are all of the offsets:

0000074628 [app] INFO: 16 nodeNumber
0000074629 [app] INFO: 17 structuresVersion
0000074629 [app] INFO: 18 magicNumber
0000074629 [app] INFO: 20 stayConnected
0000074629 [app] INFO: 21 resetCount
0000074630 [app] INFO: 22 messageCount
0000074630 [app] INFO: 24 lastHookResponse
0000074631 [app] INFO: 32 lastConnection
0000074631 [app] INFO: 40 lastConnectionDuration
0000074631 [app] INFO: 42 frequencyMinutes
0000074632 [app] INFO: 44 updatedFrequencyMinutes
0000074632 [app] INFO: 46 alertCodeGateway
0000074632 [app] INFO: 48 alertTimestampGateway
0000074633 [app] INFO: 56 openTime
0000074633 [app] INFO: 57 closeTime
0000074633 [app] INFO: 58 verizonSIM
0000074634 [app] INFO: 59 sensorType
0000074634 [app] INFO: sizeof(SysData): 64

And how I generated them:

    Log.info("%2u nodeNumber", offsetof(SysData, nodeNumber));
    Log.info("%2u structuresVersion", offsetof(SysData, structuresVersion));
    Log.info("%2u magicNumber", offsetof(SysData, magicNumber));
    Log.info("%2u stayConnected", offsetof(SysData, stayConnected));
    Log.info("%2u resetCount", offsetof(SysData, resetCount));
    Log.info("%2u messageCount", offsetof(SysData, messageCount));
    Log.info("%2u lastHookResponse", offsetof(SysData, lastHookResponse));
    Log.info("%2u lastConnection", offsetof(SysData, lastConnection));
    Log.info("%2u lastConnectionDuration", offsetof(SysData, lastConnectionDuration));
    Log.info("%2u frequencyMinutes", offsetof(SysData, frequencyMinutes));
    Log.info("%2u updatedFrequencyMinutes", offsetof(SysData, updatedFrequencyMinutes));
    Log.info("%2u alertCodeGateway", offsetof(SysData, alertCodeGateway));
    Log.info("%2u alertTimestampGateway", offsetof(SysData, alertTimestampGateway));
    Log.info("%2u openTime", offsetof(SysData, openTime));
    Log.info("%2u closeTime", offsetof(SysData, closeTime));
    Log.info("%2u verizonSIM", offsetof(SysData, verizonSIM));
    Log.info("%2u sensorType", offsetof(SysData, sensorType));
    Log.info("sizeof(SysData): %u", sizeof(SysData));
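
The padding can also be checked at compile time in portable C++, without a device. The sketch below mirrors only the first few fields and predicts the offset of the first timestamp by rounding up to its alignment. `Header16`, `Sample`, `alignUp` and `predictedTimeOffset` are hypothetical names for illustration, and `int64_t` stands in for the 64-bit time_t.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte stand-in for the saved data header.
struct Header16 { uint8_t bytes[16]; };

// Mirrors the start of the struct; int64_t stands in for the 64-bit time_t.
struct Sample {
    Header16 header;
    uint8_t  nodeNumber;
    uint8_t  structuresVersion;
    uint16_t magicNumber;
    uint8_t  stayConnected;
    uint8_t  resetCount;
    uint8_t  messageCount;
    int64_t  lastHookResponse;
};

// Round offset up to the next multiple of align (align must be a power of two).
constexpr std::size_t alignUp(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Where lastHookResponse should land: just past messageCount, then padded.
constexpr std::size_t predictedTimeOffset() {
    return alignUp(offsetof(Sample, messageCount) + sizeof(uint8_t),
                   alignof(int64_t));
}

static_assert(offsetof(Sample, magicNumber) == 18, "uint16_t is already 2-byte aligned");
static_assert(offsetof(Sample, messageCount) == 22, "single bytes need no padding");
static_assert(offsetof(Sample, lastHookResponse) == predictedTimeOffset(),
              "compiler inserted the predicted filler byte");
```

A static_assert on sizeof() of the real structure is a cheap way to catch an accidental layout change before it corrupts data that is already stored.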

Correct on calling sysStatus that way.

@rickkas7 ,

Thank you for your comprehensive and illustrative response. I now see why my estimates of the size of each object and the required off-sets were too small. I think this was one of my key issues.

Another issue I had was on the frequency of flushing the data. I have moved to a model where I will flush the nodeID and current data on demand rather than polling in the main loop, so I can be sure that I catch changes as they occur during the few seconds that the gateway and nodes are awake each hour.
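
The difference between a save delay and flushing on demand can be pictured with a small portable sketch. `DeferredSaver` and its methods are hypothetical and only model the idea behind an option like withSaveDelayMs(500); this is not how StorageHelperRK is implemented.

```cpp
#include <cstdint>

// Hypothetical model: changes mark the data dirty, and a write happens
// either when the delay expires (polled from loop) or when flush() is called.
class DeferredSaver {
public:
    explicit DeferredSaver(uint32_t delayMs) : delayMs_(delayMs) {}

    // Record that the data changed; the delay starts at the first change.
    void markDirty(uint32_t nowMs) {
        if (!dirty_) { dirty_ = true; dirtySinceMs_ = nowMs; }
    }

    // Call from loop(); writes only once the delay has elapsed.
    void poll(uint32_t nowMs) {
        if (dirty_ && (nowMs - dirtySinceMs_) >= delayMs_) write();
    }

    // Call before sleeping so a pending change is never lost.
    void flush() { if (dirty_) write(); }

    int writeCount() const { return writes_; }
    bool dirty() const { return dirty_; }

private:
    void write() { writes_++; dirty_ = false; }  // real code would write to FRAM here

    uint32_t delayMs_;
    uint32_t dirtySinceMs_ = 0;
    bool dirty_ = false;
    int writes_ = 0;
};
```

With polling alone, a device that sleeps before the delay expires loses the pending change; an explicit flush() before sleep closes that window while still coalescing several changes into one write.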

If I may, I have one more area that is nagging me and that is where the gateway stores information on the nodes. Most of the gateway’s interaction with the nodes is transactional - the node sends data, the gateway sends a response and the data from the node is captured in a webhook stored in the queue for the next connect time. However, there is a need for the gateway to keep track of some node data over time:

  • Node number for each node (this is for LoRA and is not too important for reporting)
  • deviceID mapped to each node (used in reporting to the back-end)
  • LastConnect time - helps the gateway tell if there may be an issue in communications and can trigger resetting of the LoRA radio or the Gateway itself.
  • Success rate: the percentage of each node’s data reports that get through, according to the node
  • Pending alerts that the gateway will send on the next interaction with the node (such as updating the sensor type)

I am storing this in a JSON object and saving this big (1024 bytes) object in the LoRA_Functions class using StorageHelperRK. I have accessor functions that use JSONParserGeneratorRK to access data to do things like get a node number given a deviceID, store connection times and test for node connection health.

Here is the question: In this model, I need to store the big JSON object every time there is a change, and while this is not that often, the object is large and the reporting window can get busy. I am trying to create objects in the scope of a function, but is this the right approach here?

Some specifics:

LoRA_Functions class header:

// JSON for node data
JsonParserStatic<1024, 50> jp;						// Make this global - reduce possibility of fragmentation

Then, in my LoRA_Functions::instance().setup() function, I load the JSON object and parse it:

...
	// Here is where we load the JSON object from memory and parse
	jp.addString(nodeID.get_nodeIDJson());				// Read in the JSON string from memory

	if (jp.parse()) Log.info("Parsed Successfully");
	else {
		nodeID.resetNodeIDs();
		Log.info("Parsing error resetting nodeID database");
	}
...

Finally, when I want to access the data, I use functions that create the node array and object container for the structure and the key value pairs that I can then examine and update, here is one such function again in the LoRA_Functions class:

bool LoRA_Functions::changeAlert(int nodeNumber, int newAlert) {
	int currentAlert = 0;													// Default in case the "pend" key cannot be read

	if (nodeNumber < 1 || nodeNumber > 10) return false;					// Function only for configured nodes (1-10)

	const JsonParserGeneratorRK::jsmntok_t *nodesArrayContainer;			// Token for the outer array
	if (!jp.getValueTokenByKey(jp.getOuterObject(), "nodes", nodesArrayContainer)) return false;	// No "nodes" array found
	const JsonParserGeneratorRK::jsmntok_t *nodeObjectContainer;			// Token for the objects in the array

	nodeObjectContainer = jp.getTokenByIndex(nodesArrayContainer, nodeNumber-1);	// Find the entry for the node of interest
	if (nodeObjectContainer == NULL) return false;							// Ran out of entries - node number entry not found triggers alert

	jp.getValueByKey(nodeObjectContainer, "pend", currentAlert);			// Now we have the object for the specific node
	Log.info("Changing pending alert from %d to %d", currentAlert, newAlert);

	const JsonParserGeneratorRK::jsmntok_t *value;							// Now we have the key value pair for the "pend"ing alerts
	if (!jp.getValueTokenByKey(nodeObjectContainer, "pend", value)) return false;	// Nothing to modify without a "pend" key

	JsonModifier mod(jp);													// Create a modifier object
	mod.startModify(value);													// Update the pending alert value for the selected node
	mod.insertValue((int)newAlert);
	mod.finish();

	nodeID.set_nodeIDJson(jp.getBuffer());									// This updates the JSON object but does not commit it to persistent storage

	return true;
}

So, after calling this function, if I have this correct, the JSON object “jp”, which is scoped to the LoRA_Functions class, will be updated with the new pending alert data. If I call another function in this class, for example to print the current node data, it will see the update I made here.

So, I am only flushing the data to FRAM once on each interaction with a node, even if there are multiple calls to functions that query and update the nodeID data. Still, since this is just one huge object, the whole 1024 bytes gets backed up each time. It works, but is this the best way to do this?

As always, any advice appreciated.

Chip