Why isn't PublishQueueAsyncRK working with an SPI flash chip on a Boron?

Dear @rickkas7,

I have a Winbond W25Q128JVSIQ correctly connected to Boron (CS=D2).

When I run the following PublishQueueAsyncRK test code using retained SRAM (that is, without “USE_SPI_FLASH” defined below), it works perfectly. I can call store() from the cloud, which stores 7 packets in retained memory and disconnects from the cloud; then, when I press Reset and the device reboots, it correctly re-uploads the previously trapped data packets.

When I run exactly the same code using the SPI flash, this does not happen. However, after the same store() + Reset sequence, gn() returns “7”. So it is writing the packets to the SPI flash chip, but not reading them back and re-uploading them once it can.

Why is this? The W25Q128JVSIQ is definitely connected correctly: the file system mount succeeds (res=0), and getNumEvents() shows that the trapped packets are being queued and written to the SPI flash chip, but they are never read back and re-uploaded.

This is disappointing, since I went through the hassle, wait, and expense of doing my own PCB design with assembled surface-mount components, including the memory chip. If this library doesn’t work, I will need to go back to my SD card design.

Thanks for help.

```cpp
#include <SpiffsParticleRK.h>
#include <PublishQueueAsyncRK.h>

SYSTEM_MODE(SEMI_AUTOMATIC);

SpiFlashWinbond spiFlash(SPI, D2);  // Winbond flash, CS on D2
SpiffsParticle fs(spiFlash);

#define USE_SPI_FLASH
#ifdef USE_SPI_FLASH
// Queue stored in the "events" file on the SPIFFS file system
PublishQueueAsyncSpiffs publishQueue(fs, "events");
#else
// Queue stored in retained SRAM
retained uint8_t publishQueueRetainedBuffer[2048];
PublishQueueAsync publishQueue(publishQueueRetainedBuffer, sizeof(publishQueueRetainedBuffer));
#endif

// Cloud function: go offline, then queue 7 events that cannot be sent yet
int store(String command) {
    Particle.disconnect();
    Cellular.off();
    delay(2000);
    publishQueue.publish("M1", "test", PRIVATE, WITH_ACK);
    publishQueue.publish("M2", "test", PRIVATE, WITH_ACK);
    publishQueue.publish("M3", "test", PRIVATE, WITH_ACK);
    publishQueue.publish("M4", "test", PRIVATE, WITH_ACK);
    publishQueue.publish("M5", "test", PRIVATE, WITH_ACK);
    publishQueue.publish("M6", "test", PRIVATE, WITH_ACK);
    publishQueue.publish("M7", "test", PRIVATE, WITH_ACK);
    return 0;
}

// Cloud function: number of events currently in the queue
int gn(String c) { return publishQueue.getNumEvents(); }

// Cloud function: queue a single event
int add(String c) { publishQueue.publish("M1", "test", PRIVATE, WITH_ACK); return 0; }

void setup() {
    Cellular.on();
    Cellular.connect();
    Particle.connect();

    spiFlash.begin();
    fs.withPhysicalSize(16 * 1024);
    s32_t res = fs.mountAndFormatIfNecessary();
    Particle.publish("Setup", String::format("mount res=%ld", res));
    if (res == 0) {
        publishQueue.setup();
    } else {
        Particle.publish("Setup", "Error:SPIFlashInit");
    }

    Particle.function("Store", store);
    Particle.function("gn", gn);
    Particle.function("add", add);
}

void loop() { Particle.process(); }
```

Proof of correct connection to the Boron on my PCB:
[Image of the PCB]

PublishQueueAsyncRK is based on the SpiFlashRK library. To rule out that library, the underlying hardware, and the connections, first run the unit tests in SpiFlashRK.

This will confirm that the flash can be read and written using the SpiFlashRK library. If that works, the problem is most likely a bug in the SpiFlash code in PublishQueueAsyncRK.
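For context on what those unit tests exercise: a SPI NOR flash such as the W25Q128 is erased in 4 KB sectors (erased bytes read as 0xFF), programming can only clear bits, and a single Page Program command covers at most one 256-byte page, wrapping to the start of that same page if it runs past the end. That is why a "write across page boundary" test exists. The following is a minimal standalone model in standard C++, assuming a RAM array in place of the chip; the class and method names (`FakeNorFlash`, `pageProgram`, `writeData`, and so on) are hypothetical, and this is not SpiFlashRK's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory model of a SPI NOR flash chip (not SpiFlashRK).
class FakeNorFlash {
public:
    static constexpr size_t kPageSize = 256;     // Page Program limit
    static constexpr size_t kSectorSize = 4096;  // smallest erase unit

    explicit FakeNorFlash(size_t size) : mem_(size, 0xFF) {}

    // Erase the 4 KB sector containing addr: every byte becomes 0xFF.
    void sectorErase(size_t addr) {
        size_t start = addr - (addr % kSectorSize);
        assert(start + kSectorSize <= mem_.size());
        for (size_t i = 0; i < kSectorSize; i++) mem_[start + i] = 0xFF;
    }

    // One Page Program command (len <= one page): bytes past the end of the
    // page wrap to the start of the same page, and bits can only go 1 -> 0.
    void pageProgram(size_t addr, const uint8_t *buf, size_t len) {
        assert(len <= kPageSize);
        size_t pageStart = addr - (addr % kPageSize);
        assert(pageStart + kPageSize <= mem_.size());
        for (size_t i = 0; i < len; i++) {
            size_t offset = (addr - pageStart + i) % kPageSize;
            mem_[pageStart + offset] &= buf[i];
        }
    }

    // Driver-level write: split at page boundaries so nothing wraps.
    void writeData(size_t addr, const uint8_t *buf, size_t len) {
        while (len > 0) {
            size_t room = kPageSize - (addr % kPageSize);
            size_t chunk = len < room ? len : room;
            pageProgram(addr, buf, chunk);
            addr += chunk;
            buf += chunk;
            len -= chunk;
        }
    }

    void readData(size_t addr, uint8_t *buf, size_t len) const {
        assert(addr + len <= mem_.size());
        for (size_t i = 0; i < len; i++) buf[i] = mem_[addr + i];
    }

private:
    std::vector<uint8_t> mem_;
};
```

Calling `pageProgram` directly with data that crosses a page boundary corrupts the start of the page, while `writeData` splits the same write into a 56-byte and a 244-byte chunk when starting at address 200 with 300 bytes.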

Thanks @rickkas7, I will do this and then report back with results. Your library will be very useful with SPI flash.

@rickkas7, here is the output of that test, which points to a bug in PublishQueueAsyncRK:

```
0000191657 [app] INFO: jedecId=ef4018
0000191658 [app] INFO: starting chipErase
0000194978 [net.ppp.client] TRACE: TX: 16
0000195011 [net.ppp.client] TRACE: RX: 25
0000199979 [net.ppp.client] TRACE: TX: 16
0000200011 [net.ppp.client] TRACE: RX: 24
0000204980 [net.ppp.client] TRACE: TX: 16
0000205011 [net.ppp.client] TRACE: RX: 25
0000209981 [net.ppp.client] TRACE: TX: 16
0000210012 [net.ppp.client] TRACE: RX: 25
0000214982 [net.ppp.client] TRACE: TX: 16
0000215013 [net.ppp.client] TRACE: RX: 24
0000219983 [net.ppp.client] TRACE: TX: 16
0000220014 [net.ppp.client] TRACE: RX: 24
0000220899 [app] INFO: finished chipErase: 29241 ms
0000220899 [app] INFO: running tests…
0000220900 [app] INFO: starting writePage
0000220901 [app] INFO: finished writePage: 1 ms
0000220902 [app] INFO: starting writePage one byte at a time
0000220929 [app] INFO: finished writePage one byte at a time: 27 ms
0000220929 [app] INFO: starting readPage one byte at a time
0000220938 [app] INFO: finished readPage one byte at a time: 9 ms
0000220939 [app] INFO: starting write across page boundary
0000220941 [app] INFO: finished write across page boundary: 2 ms
0000220950 [app] INFO: starting write 1K
0000220953 [app] INFO: finished write 1K: 3 ms
0000220954 [app] INFO: starting read 1K
0000220957 [app] INFO: finished read 1K: 3 ms
0000220959 [app] INFO: starting write 256K
0000221771 [app] INFO: finished write 256K: 812 ms
0000221771 [app] INFO: starting read 256K
0000222225 [app] INFO: finished read 256K: 454 ms
0000222226 [app] INFO: starting sectorErase
0000222253 [app] INFO: finished sectorErase: 27 ms
0000222677 [app] INFO: test complete!
```

It seems, therefore, that the library can access the Winbond flash chip just fine. PublishQueueAsyncRK seems to be writing to the chip, but does not re-upload the events from it once the connection is restored.

It looks like your example code is missing:

```cpp
SYSTEM_THREAD(ENABLED);
```

The PublishQueueAsyncRK library includes SpiffsExample in its more-examples directory. Running it as-is on a Boron with a Winbond SPI flash works, using the test command:

```
particle call boron4 test "3,4"
```

(3 = test publishing while offline, 4 = number of events to test)

If I comment out the call to enable threading, it will fail with:

```
0000079908 [comm.protocol] ERROR: Event loop error 18
0000079908 [system] WARN: Communication loop error, closing cloud socket
0000079909 [system] INFO: Cloud: disconnecting
0000079910 [system] INFO: Cloud: disconnected
0000079910 [system] INFO: Cloud: connecting
0000079910 [app.pubq] INFO: published failed, will retry in 30000 ms
```

I presume this is because you can’t publish from a background worker thread with threading disabled.

If it’s just missing from what you copied and pasted above, then trying SpiffsExample is the next debugging step.
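Rick's explanation is that PublishQueueAsyncRK publishes from a background worker thread. The general pattern is a producer/consumer queue: the application thread enqueues an event and returns immediately, while a worker thread waits for events and performs the slow, possibly failing publish, retrying after a delay. A minimal sketch in standard C++ follows, assuming a caller-supplied `publishFn` callback as a stand-in for the cloud publish; `AsyncPublisher` and its members are hypothetical, not the library's classes, and on a device it is Device OS (and SYSTEM_THREAD(ENABLED)) that decides whether cloud calls from such a thread can succeed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical producer/consumer publisher (not PublishQueueAsyncRK's code).
class AsyncPublisher {
public:
    using PublishFn = std::function<bool(const std::string &)>;

    AsyncPublisher(PublishFn fn, std::chrono::milliseconds retryDelay)
        : publishFn_(std::move(fn)), retryDelay_(retryDelay),
          worker_([this] { run(); }) {}

    // Ask the worker to stop and wait for it to finish.
    ~AsyncPublisher() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Application thread: enqueue the event and return without blocking.
    void publish(const std::string &event) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(event);
        }
        cv_.notify_one();
    }

private:
    // Worker thread: publish queued events in FIFO order. A failed event
    // stays at the front and is retried after retryDelay_. On shutdown the
    // worker exits once the queue is empty or a publish fails.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stopping with nothing left to send
            std::string event = queue_.front();
            lock.unlock();
            bool ok = publishFn_(event);  // slow call made without the lock
            lock.lock();
            if (ok) {
                queue_.pop_front();
            } else {
                cv_.wait_for(lock, retryDelay_, [this] { return stopping_; });
                if (stopping_) return;
            }
        }
    }

    PublishFn publishFn_;
    std::chrono::milliseconds retryDelay_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the members above
};
```

The lock is released around the publish call, so `publish()` on the application thread never waits on the network; only the worker blocks.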

Rick, you are correct that my sketch omits SYSTEM_THREAD(ENABLED). I was trying to move away from it in an attempt to improve stability: my seemingly error-free code still occasionally hard faults (panic, red SOS) for no apparent reason. With your library this shouldn’t be an issue, though, because I upload packets once a minute and the recovery does seem to happen within that minute.

I will try with SYSTEM_THREAD(ENABLED); and report results.

I note that the other methods I’ve tested with PublishQueueAsyncRK do work without threading enabled.
Also, the docs don’t mention that system threading must be enabled to use this library, so hopefully this post is useful to others.

I will try this. Thanks, Rick.

@rickkas7, unfortunately this did not work. Adding SYSTEM_THREAD(ENABLED); did not make PublishQueueAsyncRK re-upload cached messages. The code I posted above still fails to re-upload the “M1” to “M7” packets stored by store() after a power cycle, even though they are successfully written to the SPI flash chip: gn() returns 7 after the power cycle.

The output of the SpiffsExample is as follows:

```
0000131827 [app] INFO: TEST_PUBLISH_OFFLINE count=4
0000131828 [app] INFO: Going to Particle.disconnect()…
0000133829 [app] INFO: before publishing numEvents=0
0000133830 [app] INFO: publishing padded counter=0 size=0
0000133835 [app] INFO: publishing padded counter=1 size=0
0000133841 [app] INFO: publishing padded counter=2 size=0
0000133847 [app] INFO: publishing padded counter=3 size=0
0000133852 [app] INFO: after publishing numEvents=4
0000133852 [app] INFO: Going to Particle.connect()…
0000135087 [app.pubq] INFO: publishing testEvent 00000 ttl=60 flags=9
0000135390 [app.pubq] INFO: published successfully
0000136407 [app.pubq] INFO: publishing testEvent 00001 ttl=60 flags=9
0000137868 [app.pubq] INFO: published successfully
0000138886 [app.pubq] INFO: publishing testEvent 00002 ttl=60 flags=9
0000139194 [app.pubq] INFO: published successfully
0000140211 [app.pubq] INFO: publishing testEvent 00003 ttl=60 flags=9
0000140513 [app.pubq] INFO: published successfully
```

I found the bug.

0.2.0 (2020-11-06)

  • Fixed a bug in all file-based implementations (Spiffs, SdFat) where events were not published after a reboot.
  • Added a new test suite function (7) to disconnect, post events to the queue, then reboot.
  • Added support for storing events on the POSIX file system on Gen 3 devices (Argon, Boron, Tracker SoM) in 2.0.0-rc.3 and later.
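The 0.2.0 changelog describes file-based queues whose events were not published after a reboot, which matches the symptom above where gn() reported 7 events yet none were re-uploaded. The property the new reboot test (7) checks can be modeled off-device: events are written to a file as they are queued, and a fresh queue object, standing in for the firmware after a reset, rebuilds its in-RAM state from that file and can then drain it. Below is a minimal sketch in standard C++ under those assumptions; `FileEventQueue`, its one-event-per-line file format, and the `send` callback are hypothetical, and the changelog does not say what the actual bug was, so this is not a reconstruction of it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <fstream>
#include <functional>
#include <string>
#include <utility>

// Hypothetical file-backed FIFO: one event name per line in a text file.
class FileEventQueue {
public:
    // Runs at startup (i.e. after a reboot): reload anything left on disk.
    explicit FileEventQueue(std::string path) : path_(std::move(path)) {
        std::ifstream in(path_);
        std::string line;
        while (std::getline(in, line)) {
            if (!line.empty()) events_.push_back(line);
        }
    }

    // Append to the file first so a reset cannot lose the event.
    void push(const std::string &event) {
        std::ofstream out(path_, std::ios::app);
        out << event << '\n';
        events_.push_back(event);
    }

    std::size_t size() const { return events_.size(); }

    // Send queued events in order, stopping at the first failure, then
    // rewrite the file so it holds only the events still unsent.
    std::size_t drain(const std::function<bool(const std::string &)> &send) {
        std::size_t sent = 0;
        while (!events_.empty() && send(events_.front())) {
            events_.pop_front();
            ++sent;
        }
        std::ofstream out(path_, std::ios::trunc);
        for (const auto &e : events_) out << e << '\n';
        return sent;
    }

private:
    std::string path_;
    std::deque<std::string> events_;
};
```

Writing to the file before updating the in-RAM copy, and reloading in the constructor, is what lets queued events survive a reset; a test in the spirit of test 7 then checks both the count and that every reloaded event is actually sent.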

Thanks @rickkas7! I will give this a spin.

@rickkas7 Confirmed: the reboot test works with the new version of the library. Thanks, Rick.
