PublishQueueAsyncRK publish hanging

cbrake · November 15, 2019, 3:28am

Hi, I’m routinely seeing a PublishQueueAsyncRK publish() function hang (block?) on Argon and Boron systems. Here is the code:

github.com/simpleiot/firmware/blob/master/siot-fw/src/siot-fw.ino#L133


    // ... (excerpt begins mid-statement)
            OneWireErrorString(ret));
    } else if (!ret) {
        Serial.printf("sample: %s\n", sample.string().c_str());
        if (publish) {
            jw.startArray();
            jw.startObject();
            sample.toJSON(&jw);
            jw.finishObjectOrArray();
            jw.finishObjectOrArray();
            Serial.printf("publishing %s\n", jw.getBuffer());
            // Queue the JSON array as a private "sample" event, requesting an ACK
            publishQueue.publish("sample", jw.getBuffer(), PRIVATE, WITH_ACK);
        }
    }

    if (i >= 100) {
        Serial.println("Warning, read loop is not terminating properly");
        break;
    }
}

if (BLE.connected()) {
    // ... (excerpt ends here)

I suspect I have a memory leak somewhere, but I thought I would check whether there is anything obviously wrong with how I’m using the library.

How can we tell if we’re out of memory or in some other critical situation?

Thanks!

RWB · November 15, 2019, 3:48am

@rickkas7

ScruffR · November 15, 2019, 6:14am

If you suspect a memory leak, you should be able to see it via System.freeMemory().
But if it's heap fragmentation rather than a real leak, you may not see it that way.
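A minimal sketch of the System.freeMemory() approach, written as portable C++ so it runs off-device. The hypothetical FreeMemoryMonitor class takes one free-heap reading per period (on a Particle device, the value returned by System.freeMemory()) and flags a suspected leak when the low-water mark drops several readings in a row. The class name and the threshold are assumptions for illustration, and, as noted above, it cannot see fragmentation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: feed it one free-heap reading per period (on a Particle
// device, the value returned by System.freeMemory(), taken at the same point
// in loop() each time). A low-water mark that keeps dropping, reading after
// reading, suggests a leak. It cannot detect fragmentation: total free memory
// can stay high while the largest contiguous free block shrinks.
class FreeMemoryMonitor {
public:
    explicit FreeMemoryMonitor(uint32_t newLowsBeforeAlarm)
        : threshold_(newLowsBeforeAlarm) {
        assert(newLowsBeforeAlarm > 0);
    }

    // Records one reading; returns true if a leak is suspected.
    bool sample(uint32_t freeBytes) {
        if (samples_ == 0) {
            lowWater_ = freeBytes;        // first reading sets the baseline
        } else if (freeBytes < lowWater_) {
            lowWater_ = freeBytes;        // new low-water mark
            ++consecutiveNewLows_;
        } else {
            consecutiveNewLows_ = 0;      // memory recovered or held steady
        }
        ++samples_;
        return consecutiveNewLows_ >= threshold_;
    }

    uint32_t lowWater() const { return lowWater_; }

private:
    uint32_t threshold_;
    uint32_t samples_ = 0;
    uint32_t lowWater_ = 0;
    uint32_t consecutiveNewLows_ = 0;
};
```

On a device you might call monitor.sample(System.freeMemory()) once a minute from loop() and log a warning whenever it returns true.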

armor · November 15, 2019, 9:38am

I think this is the problem you’re having with cellular comms.

I have been using this library and observed two things with SerialLogHandler switched on:

1. It fails silently if the buffer overflows (publishes are lost). The solution in my case was to increase the buffer size and reduce some of the message data payloads. A better solution would be to increase the buffer size significantly, say by using FRAM.
2. Removing WITH_ACK seemed to cure the issue I was seeing with WITH_ACK, but that was with a Xenon in an Ethernet FeatherWing. [Edit: Removing WITH_ACK seemed to cure the problem, but in time it came back, so it was not the cause; as seen below, it was a problem with the library.]
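To avoid the silent-failure mode described in the first point, an application-side queue can report rejected events instead of dropping them. A minimal sketch, assuming a simple byte budget (name length plus data length) in place of the retained-RAM or FRAM buffer; BoundedEventQueue and Event are hypothetical names, and this is not how PublishQueueAsyncRK stores events:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

struct Event {
    std::string name;
    std::string data;
    std::size_t bytes() const { return name.size() + data.size(); }
};

// Hypothetical bounded queue with a fixed byte budget. When full, push()
// returns false and counts the rejection instead of losing it silently.
class BoundedEventQueue {
public:
    explicit BoundedEventQueue(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    bool push(const Event& ev) {
        if (used_ + ev.bytes() > capacity_) {
            ++dropped_;  // visible to the caller via droppedCount()
            return false;
        }
        used_ += ev.bytes();
        events_.push_back(ev);
        return true;
    }

    // Removes the oldest event, e.g. once it has been published successfully.
    bool pop(Event& out) {
        if (events_.empty()) return false;
        out = events_.front();
        assert(used_ >= out.bytes());
        used_ -= out.bytes();
        events_.pop_front();
        return true;
    }

    std::size_t droppedCount() const { return dropped_; }
    std::size_t usedBytes() const { return used_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::size_t dropped_ = 0;
    std::deque<Event> events_;
};
```

The application can check droppedCount() periodically and log it through the SerialLogHandler output, so overflows show up in the log rather than disappearing.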

cbrake · November 15, 2019, 1:59pm

Is there any way to know when you have heap fragmentation issues? I’m never quite sure how much dynamic memory allocation I can get away with. With systems that are dynamic (say a variable number of sensors may be connected), it is hard to do without dynamic allocations.

(I’ve been programming with Go on Embedded Linux for the last 3 years, so a bit spoiled.)

ScruffR · November 15, 2019, 2:07pm

Heap fragmentation may become an issue when you allocate and free memory a lot.
If you only allocate and then keep hold of the objects stored there, this shouldn’t be much of an issue.

“Mutating” objects (e.g. String) can also cause unintended creation of temporary objects and relocation of their payload, resulting in a heap that looks like Swiss cheese.
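One way to follow that advice for event payloads is to format into a buffer that is allocated once, rather than building a String with repeated concatenation. A minimal sketch using snprintf; formatSampleJson and the JSON field names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats a one-element JSON array into a caller-owned buffer, so no heap
// allocation or reallocation happens. Returns false if the output would not
// fit (snprintf still NUL-terminates the truncated text).
bool formatSampleJson(char* buf, std::size_t size, const char* id, double value) {
    assert(buf != nullptr && size > 0);
    int n = std::snprintf(buf, size, "[{\"id\":\"%s\",\"value\":%.2f}]", id, value);
    // snprintf returns the length it wanted to write; >= size means truncated.
    return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

Because snprintf reports the length it would have written, truncation is detected without any dynamic allocation; the buffer can be a static or member array sized for the largest expected payload.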

rickkas7 · November 15, 2019, 3:26pm

It’s a bug in version 0.1.0 of the library. I released a new version:

0.1.1 (2019-11-15)

Fixed a bug that causes thread deadlock when the publish queue is full.
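The changelog doesn’t say what caused the deadlock, so the following is only a generic illustration, not the library’s code: a bounded queue shared between an application thread and a publishing thread, where the producer waits for space on a condition variable. std::condition_variable::wait_for releases the mutex while blocked, so the consumer can still lock it, remove an event, and notify; waiting for space while holding a lock the consumer needs is a classic way for a full queue to deadlock. The timeout also keeps the producer from blocking indefinitely. BlockingEventQueue and its methods are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

class BlockingEventQueue {
public:
    explicit BlockingEventQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer: waits up to `timeout` for space; returns false if still full.
    // wait_for() unlocks mutex_ while waiting, so the consumer can make progress.
    bool pushFor(const std::string& ev, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!notFull_.wait_for(lock, timeout,
                               [this] { return events_.size() < capacity_; })) {
            return false;
        }
        events_.push_back(ev);
        notEmpty_.notify_one();
        return true;
    }

    // Consumer: blocks until an event is available, then frees one slot.
    std::string pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !events_.empty(); });
        std::string ev = events_.front();
        events_.pop_front();
        notFull_.notify_one();  // wake a producer waiting for space
        return ev;
    }

private:
    std::size_t capacity_;
    std::deque<std::string> events_;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
};
```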

cbrake · November 15, 2019, 4:53pm

thanks! – giving it a try here …

cbrake · November 16, 2019, 11:57am

19h without deadlock – looking good …

armor · November 16, 2019, 4:03pm

Rick, would you mind confirming what you changed? I can see on GitHub that you changed the .cpp file, but in Workbench (VS Code) I am not sure it has pulled in the correct copy.

Secondary question: looking at the log output, I have been getting quite a few “retained RAM full” messages, which I can’t afford, and hanging around in loop() waiting for queue space isn’t an option. I have a 1 Mbit I2C FRAM, which should cure this space problem. Is it fast enough to use as the buffer memory? I noticed that a full initialisation of the 1 Mbit part can take 15 seconds.
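For a rough sense of why a full write of the 1 Mbit part takes that long, here is a lower-bound estimate. It assumes each byte costs 9 SCL clocks (8 data bits plus ACK) and ignores addressing and per-transaction overhead; the bus clock $f_{\text{SCL}}$ is an assumption, since the post doesn’t state it:

```latex
t_{\text{fill}} \gtrsim \frac{N_{\text{bytes}} \times 9}{f_{\text{SCL}}},
\qquad N_{\text{bytes}} = \frac{2^{20}\ \text{bits}}{8} = 131\,072

f_{\text{SCL}} = 100\ \text{kHz}: \quad t_{\text{fill}} \gtrsim \frac{131\,072 \times 9}{100\,000\ \text{Hz}} \approx 11.8\ \text{s}

f_{\text{SCL}} = 400\ \text{kHz}: \quad t_{\text{fill}} \gtrsim \frac{131\,072 \times 9}{400\,000\ \text{Hz}} \approx 2.9\ \text{s}
```

Addressing overhead on each write transaction pushes the real figure above these bounds, so about 15 s is plausible at 100 kHz. A single queued event is only tens of bytes, so its write time is a tiny fraction of a full-chip fill.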

ScruffR · November 16, 2019, 4:38pm

0.1.1 is online in the public library store (as seen in the Web IDE).

In order to use it you may need to remove the previous version and install the new version.
Workbench (just like the other IDEs) doesn’t automatically update libraries, and that is a good thing in case of a “broken” update or an application that relies on the presence of some “bug”.

You can check the version by looking into the library.properties file of that lib and/or the dependency entry in your project.properties file.
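For reference, the two entries look roughly like this. The version numbers are only examples, and the lib/PublishQueueAsyncRK/ path assumes the library was installed into the project’s lib directory, as Workbench typically does:

```properties
# Application's project.properties: the dependency entry (used by cloud builds)
dependencies.PublishQueueAsyncRK=0.1.1

# lib/PublishQueueAsyncRK/library.properties: the version of the copy
# actually present in the project
name=PublishQueueAsyncRK
version=0.1.1
```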

armor · November 16, 2019, 4:52pm

“You can check the version by looking into the library.properties file of that lib and/or the dependency entry in your project.properties file.”

I am familiar with this, but I wasn’t sure that changing the version in project.properties actually pulls in that version; there is no change log or version tag in the .cpp or .h files. I did not see a log entry describing the library being pulled in, and the source does not appear to have changed.

ScruffR · November 16, 2019, 5:14pm

I only mentioned these for checking which version is currently in use. Changing it in project.properties will only affect cloud builds.

Hence I said this: “In order to use it you may need to remove the previous version and install the new version.”

That would affect local and cloud builds.

armor · November 16, 2019, 8:46pm

And indeed, a local build did not update the files, so I had to update them manually.

rickkas7 · November 17, 2019, 10:46am

It’s hard to say for sure, but the FRAM should be fast enough for normal use in storing events. It only writes 8 bytes + the length of the event name + the length of the event data + a few more bytes of overhead and padding. Because of the way the queue is maintained in FRAM, things might slow down if you have a very large event queue that has fallen behind in sending: the queue is rewritten on remove, which is different from how it works for flash.

Still, if you’re only going for, say, 10 Kbytes of stored events, it should be more than fast enough.
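A rough sizing sketch based on that description. The 8-byte header and the name and data lengths come from the post; counting one terminating NUL per string and rounding up to a 4-byte boundary are assumptions standing in for “a few more bytes of overhead and padding”. The second function assumes that “rewritten on remove” means every event behind the removed one is moved, which is why a large backlog costs roughly quadratic rewriting. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Estimated storage for one event: 8-byte header + name + data, plus
// ASSUMED NUL terminators and ASSUMED 4-byte alignment padding.
std::size_t estimateEventBytes(const char* name, const char* data) {
    assert(name != nullptr && data != nullptr);
    std::size_t raw = 8 + std::strlen(name) + 1 + std::strlen(data) + 1;
    return (raw + 3) & ~static_cast<std::size_t>(3);  // round up to multiple of 4
}

// ASSUMING each remove shifts all remaining events down, draining n equal-size
// events rewrites (n-1) + (n-2) + ... + 0 = n(n-1)/2 events' worth of bytes.
std::size_t bytesRewrittenToDrain(std::size_t n, std::size_t eventBytes) {
    if (n == 0) return 0;
    return eventBytes * (n * (n - 1) / 2);
}
```

Under these assumptions, a "sample" event with a 10-byte payload takes 28 bytes, so 10 Kbytes holds a few hundred events; the quadratic drain cost only matters once such a backlog has built up.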

Also, it’s not necessary to erase the FRAM before passing it to PublishQueueAsync.

armor · November 17, 2019, 3:18pm

That was just to illustrate the speed of writing!

armor · November 17, 2019, 11:50pm

Rick, I have just tried your MB85RC256V FRAM library with PublishQueueAsync (V0.1.1).

Previously I had tried an MB85RC1MT FRAM, modifying PublishQueueAsync to handle it, and could not get my application to work, whereas with retained RAM it is fine (apart from occasionally filling the queue). With both FRAMs I get the same error: the first event is queued and published and the log reports success, but then the same message is retried again and again, while in the meantime I can see other publishes being queued. The first message’s data appears truncated in both the log and the console. Is it possible something is going on with this FRAM in an Ethernet FeatherWing? You mentioned the delete process is different with FRAM?

rickkas7 · November 18, 2019, 5:49pm

Sorry, the bug I fixed in 0.1.1 for thread deadlock also affected FRAM, but in a different way.

While I was at it, I added support for the MB85RC1M and other sizes of similar FRAM.

0.1.2 (2019-11-18)

Fixed a bug that causes thread deadlock when using FRAM
Fixed a bug that can cause corrupted event data when FRAM is full
Upgraded to MB85RC256V version 0.0.4 for FRAM example (adds support for MB85RC64, MB85RC512, and MB85RC1M)

armor · November 18, 2019, 6:20pm

Cool - I will try it out now.

By the way, could you help with some of your expertise on this topic: “Mesh pub-sub at the gateway”? Thanks in advance.

I have tried PublishQueueAsyncRK 0.1.2 both with and without FRAM. On retesting I got an SOS with 10 flashes, but I had forgotten that retained RAM was already full of messages. Perhaps there needs to be a method to reset the message queue?

It appears to be working fine with 0.1.2, both with retained RAM and with FRAM (I have only tried the 256V). I am doing a fram.erase() at startup to avoid debris causing startup issues.
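Instead of erasing the whole FRAM at every startup, one common alternative is to keep a small header at the start of the storage and reset the queue only when that header doesn’t look valid; writing an empty header is also a cheap way to clear the queue on demand. A minimal sketch, with the header layout, magic value, and validateOrReset name all hypothetical, and a plain byte array standing in for retained RAM or FRAM:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header stored at offset 0 of the queue storage.
struct QueueHeader {
    uint32_t magic;
    uint16_t version;
    uint16_t reserved;
    uint32_t usedBytes;
};

constexpr uint32_t kQueueMagic = 0x51554555;  // arbitrary value ("QUEU")
constexpr uint16_t kQueueVersion = 1;

// Returns true if the existing contents look valid and were kept; returns
// false if the storage held debris and the queue was reset to empty.
bool validateOrReset(uint8_t* storage, std::size_t storageSize) {
    assert(storageSize >= sizeof(QueueHeader));
    QueueHeader hdr;
    std::memcpy(&hdr, storage, sizeof(hdr));
    bool valid = hdr.magic == kQueueMagic && hdr.version == kQueueVersion &&
                 hdr.usedBytes <= storageSize - sizeof(QueueHeader);
    if (!valid) {
        hdr = QueueHeader{kQueueMagic, kQueueVersion, 0, 0};  // empty queue
        std::memcpy(storage, &hdr, sizeof(hdr));
    }
    return valid;
}
```

Only the 12-byte header is written on a reset, so startup cost does not grow with the size of the FRAM.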

system · January 17, 2020, 6:20pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Related topics

Adapting PublishQueueAsyncRK for Flash Memory (General, electron) · 10 replies · 1060 views · November 22, 2019
PublishQueue Library bug (Firmware) · 4 replies · 609 views · June 7, 2018
PublishQueueAsyncRK and P2 (Libraries) · 15 replies · 91 views · March 21, 2025
PublishQueueAsyncRK issue after a restart (Libraries, photon) · 3 replies · 533 views · March 29, 2021
PublishQueueAsyncRK Hanging (Libraries) · 4 replies · 383 views · June 3, 2022
