PublishQueueAsyncRK - incorrect queue count after spark/status offline for system/status

Using workbench/OS 2.0.1 and local compile/flash. In testing a new product I have been struggling with delayed publishing of events queued using PublishQueueAsyncRK 0.2.0 and my own version of the FRAM library designed to support SPI FRAM up to 4Mbit/512KB. Each event is tagged with the local time when created “date” in the JSON:

When there is a second or so time difference then all is well.
I have noticed that occasionally the time difference between creation and publish can stretch to 59 seconds. There is a device regular update event sent every minute. The creation of one appears to release the previous queued event - rather like the queue count has been zero’d when there is actually 1 in queue. A restart will clear the problem but will also mean the last event queued is not sent. I have been struggling to identify the cause/scenario.
I have been investigating an issue with SPI bus interference introduced with a board change, so have assumed this was corrupting the writing to FRAM and causing the queue count to lose sync. Then last night I noticed a spark/status offline followed by spark/status online and system/status - the regular cloud session update? After the diagnostics/update event the events were delayed by 58 seconds. @rickkas7 Any ideas what could be happening?

I created a new version of the library that has trace debugging statements in the code that manipulates the queue. If you can make it happen with logging captured, that will help determine what is failing.

0.2.2 (2021-05-10)

  • Added trace messages to event addition and removal with retained memory
  • Fixed several problems with clearEvents
  • Added automated test suite

I ported the automated test from PublishQueuePosixRK into this library. There’s device firmware that does simple tests like publishing, going online, offline, resetting, etc. It’s controlled over USB serial.

There’s a test runner written in node.js that chains together simpler tests to make complex tests.

The test runner monitors and sends commands to the USB serial port of the device. It also monitors the SSE event stream for the device to see what events reach the cloud. It also implements the electron-cloud-manipulator so it can discard packets sent from the device without touching the user firmware or Device OS.

I had to fix several bugs in the PublishQueueAsyncRK library to get the test suite to pass. Most were in clearEvents, so that probably won’t affect most code. However, there could be a few corner cases that could possible cause a similar thing to happen.

1 Like

@rickkas7 Thanks for the swift reply - I will try this new version out. I had originally run logging and used a modified version of your example sketch to test under various scenarios.

@rickkas7 I have installed 0.2.2 and added support for MB85RS (SPI FRAM) all seemed to go well in my testing with logging until the problem in the log below:

Starting just after the highlighted event the time between event creation “date” and publish at starts to increase and then there is a corruption of the message LKUS, in fact the LKUS event is sent but appears to not have been deleted correctly and then get partially sent twice. After this the queue appears out of sync and the gap between created at and published at extends to 60 seconds.

Looking at the log it appears this happened after more than 1 event (3 events) were queued?
I can’t see anything in the code to suggest what could be happening under this condition.

Any ideas please.
I enabled a bit more logging in the ‘discardOldEvent’ method. This appears to show that the when ‘after numEvents=0 nextFree=312’ shouldn’t nextFree be = 8?

[2nd Update]
Noticed this problem immediately on start-up, completely missed the data in the DEUP event. Appears to recover thereafter.

It does look like numEvents is out of sync with nextFree, but I can’t figure out why. I added some more debugging trace statements to the FRAM code to help figure out what’s happening.

0.2.3 (2021-05-18)

  • Added trace messages for debugging FRAM implementation
  • Optimized the way entries are read out of FRAM to not read way past the end of the actual data
  • Added automated test for FRAM

Hey Rick, thanks for the quick reply. There is a possibility that the access to the FRAM is the reason - something on the SPI bus. The PCB is going through a spin to optimise the SPI bus. I though initially it was due to discardOldEvent but then the read for send went wrong. I will try the new changes / extra debug and see if that solves or uncovers the problem.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.