PublishQueueAsyncRK publish hanging

Hi, I’m routinely seeing a PublishQueueAsyncRK publish() function hang (block?) on Argon and Boron systems. Here is the code:

I suspect I have a memory leak somewhere, but thought I would check whether there is anything obviously wrong with how I’m using the library.

How can we tell when we’re out of memory or in some other critical state?

Thanks!

@rickkas7

If you suspect a memory leak you should be able to see that via System.freeMemory().
But if it's not a real leak but heap fragmentation you may not see it that way.
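One way to watch for a leak is to sample free memory periodically and keep the lowest value seen. The following is only a minimal sketch: `HeapWatch` is a hypothetical helper, not part of Device OS or PublishQueueAsyncRK, and it takes the reader as a callback so it can be tried off-device. On a device you would presumably pass something like `[] { return System.freeMemory(); }`. As noted above, this kind of total figure may not reveal fragmentation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper (an assumption for illustration, not a library class).
// Tracks the lowest free-memory reading and the net change since the first sample.
class HeapWatch {
public:
    explicit HeapWatch(std::function<std::uint32_t()> readFree)
        : readFree_(std::move(readFree)) {
        assert(static_cast<bool>(readFree_));  // a reader callback is required
    }

    // Call periodically (for example once per loop pass); returns the new reading.
    std::uint32_t sample() {
        const std::uint32_t now = readFree_();
        if (!haveFirst_) {
            first_ = now;
            low_ = now;
            haveFirst_ = true;
        } else if (now < low_) {
            low_ = now;
        }
        last_ = now;
        return now;
    }

    // Lowest reading seen so far (0 before the first sample).
    std::uint32_t lowWater() const { return low_; }

    // Bytes lost between the first and latest sample. A value that keeps
    // growing over hours is a hint of a leak; it says nothing about fragmentation.
    std::int64_t lostSinceStart() const {
        return static_cast<std::int64_t>(first_) - static_cast<std::int64_t>(last_);
    }

private:
    std::function<std::uint32_t()> readFree_;
    std::uint32_t first_ = 0;
    std::uint32_t last_ = 0;
    std::uint32_t low_ = 0;
    bool haveFirst_ = false;
};
```

Logging `lowWater()` and `lostSinceStart()` every few minutes should make a slow downward trend visible long before the device actually runs out.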

I think that this is the problem for you with cellular comms.

I have been using this library and observed two things with SerialLogHandler switched on:

  1. It fails silently if the buffer overflows (publishes are lost). The solution in my case was to increase the buffer size and reduce some of the message data payloads. A better solution would be to significantly increase the buffer size by using FRAM, say.
  2. Removing WITH_ACK seemed to cure the issue I was seeing, but that was with a Xenon in an Ethernet FeatherWing. [Edit: Removing WITH_ACK seemed to cure the problem, but in time it came back, so it was not the cause; as seen below, it was a problem with the library.]
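To illustrate the first point, here is a rough sketch of a bounded queue that counts rejected items rather than losing them silently. `CountingQueue` is a hypothetical class written for this example; it is not how PublishQueueAsyncRK is implemented, and the names are my own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity FIFO (illustration only, not the library's queue).
// When full, push() refuses the item and increments a drop counter
// so the application can log or react to lost events.
template <typename T, std::size_t N>
class CountingQueue {
    static_assert(N > 0, "capacity must be non-zero");

public:
    // Returns true if the item was stored, false if the queue was full.
    bool push(const T& item) {
        assert(count_ <= N);  // internal invariant
        if (count_ == N) {
            ++dropped_;
            return false;
        }
        items_[(head_ + count_) % N] = item;
        ++count_;
        return true;
    }

    // Removes the oldest item into 'out'; returns false if the queue is empty.
    bool pop(T& out) {
        if (count_ == 0) {
            return false;
        }
        out = items_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }
    std::uint32_t dropped() const { return dropped_; }

private:
    std::array<T, N> items_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
    std::uint32_t dropped_ = 0;
};
```

Checking the return value of `push()` (or publishing `dropped()` occasionally) would at least tell you that the buffer is too small for the traffic.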

Is there any way to know when you have heap fragmentation issues? I’m never quite sure how much dynamic memory allocation I can get away with. With systems that are dynamic (say a variable number of sensors may be connected), it is hard to do without dynamic allocations.

(I’ve been programming with Go on Embedded Linux for the last 3 years, so a bit spoiled.)

Heap fragmentation may become an issue when you allocate and free space a lot.
If you only allocate and keep hold of the objects stored therein this shouldn’t be too much of an issue.

Objects that “mutate” (e.g. String) may also cause unintentional creation of temporary objects and relocation of their payload, resulting in a heap that looks like Swiss cheese.
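A common way to sidestep this is to format into a fixed-size buffer instead of growing a String piece by piece. The sketch below assumes a made-up JSON payload; `formatReading` and its parameters are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes a small JSON payload into a caller-owned buffer.
// No heap allocation happens here, so repeated calls cannot fragment the heap.
// Returns true only if the whole message fit (snprintf always null-terminates).
bool formatReading(char* buf, std::size_t bufLen, const char* sensor, int value) {
    assert(buf != nullptr && bufLen > 0);
    const int n = std::snprintf(buf, bufLen,
                                "{\"sensor\":\"%s\",\"value\":%d}", sensor, value);
    return n >= 0 && static_cast<std::size_t>(n) < bufLen;
}
```

The buffer can be a static or stack array sized for the largest event you expect, and a `false` return tells you the payload was truncated rather than letting it pass unnoticed.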

It’s a bug in version 0.1.0 of the library. I released a new version:

0.1.1 (2019-11-15)

  • Fixed a bug that causes thread deadlock when the publish queue is full.

thanks! – giving it a try here …

19h without deadlock – looking good …

Rick, would you mind confirming what you changed? I can see on GitHub that you changed the .cpp file, but in VS Code Workbench I am not sure it has pulled in the correct copy.

Secondary question: looking at the log output, I have been getting quite a few "retained RAM full" messages, which I can’t afford, and hanging around in loop() waiting for queue space isn’t an option. I have a 1 Mbit I2C FRAM, which should cure the space problem. Is it fast enough to serve as the buffer memory? I noticed that a full initialisation of the 1 Mbit can take 15 seconds.

0.1.1 is online in the public library store (as seen in Web IDE)

In order to use it you may need to remove the previous version and install the new version.
Workbench (just like the other IDEs) doesn’t automatically update libraries, and that is a good thing in case of a “broken” update or an application that relies on the presence of some “bug”.

You can check the version by looking into the library.properties file of that lib and/or the dependency entry in your project.properties file.

I am familiar with this but wasn't sure that changing the version in project.properties actually pulls in that version, since there is no change log or version tag in the .cpp or .h files. I did not see a log item describing it being pulled in, and the src does not appear to have changed.

I only mentioned these for checking the currently used version. Changing that in project.properties will only affect cloud builds.

Hence I said this

That would affect local and cloud builds.

And indeed a local build did not update the files so I had to do so manually.

It’s hard to say for sure, but the FRAM should be fast enough for normal use in storing events. It only writes 8 bytes + length of event name + length of event data + a few more bytes of overhead and padding. Things might slow down if you have a very large event queue that has fallen behind in sending because of the way the queue is maintained in FRAM. The queue is rewritten on remove, which is different than how it works for flash.

Still, if you’re only going for say 10 Kbytes of stored events it should be more than fast enough.
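For a back-of-the-envelope feel for how many events fit, the per-event size described above can be turned into a small calculation. This is only an estimate: the "few more bytes of overhead and padding" is not specified here, so `extraBytes` is an assumed parameter, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Rough per-event storage estimate based on the description above:
// 8 bytes + event name length + event data length + some overhead/padding.
// extraBytes is an ASSUMPTION; the exact overhead is not stated in this thread.
std::size_t estimateEventBytes(const char* name, const char* data,
                               std::size_t extraBytes) {
    assert(name != nullptr && data != nullptr);
    return 8 + std::strlen(name) + std::strlen(data) + extraBytes;
}

// How many events of a given size fit into a storage area of capacityBytes.
std::size_t eventsThatFit(std::size_t capacityBytes, std::size_t bytesPerEvent) {
    assert(bytesPerEvent > 0);
    return capacityBytes / bytesPerEvent;
}
```

With an event named `temp`, data `21.5`, and an assumed 4 bytes of overhead, each event would take about 20 bytes, so roughly 512 such events would fit in 10 Kbytes.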

Also, it’s not necessary to erase the FRAM before passing it to PublishQueueAsync.

That was just to illustrate the speed of writing!

Rick, I have just tried your MB85RC256V FRAM library with PublishQueueAsync (v0.1.1).

Previously I had tried an MB85RC1MT FRAM and had modified PublishQueueAsync to handle it, but could not get my application to work, whereas with retained RAM it is fine (apart from occasionally filling the queue). With both FRAMs I get the same error: the first event is queued and published, and the log says it was successful, but then the same message is tried again and again, and in the meantime I can see other publishes being queued. The first message's data appears to be truncated in the log and on the console. Is it possible there is something going on with this FRAM in an Ethernet FeatherWing? You mentioned the delete process is different with FRAM?

Sorry, the bug I fixed in 0.1.1 for thread deadlock also affected FRAM, but in a different way.

While I was at it, I added support for the MB85RC1M and other sizes of similar FRAM.

0.1.2 (2019-11-18)

  • Fixed a bug that causes thread deadlock when using FRAM
  • Fixed a bug that can cause corrupted event data when FRAM is full
  • Upgraded to MB85RC256V version 0.0.4 for FRAM example (adds support for MB85RC64, MB85RC512, and MB85RC1M)

Cool - I will try it out now.

By the way, could you help with some of your expertise on this topic: “Mesh pub-sub at the gateway”? Thanks in advance.

I have tried PublishQueueAsyncRK 0.1.2 both with and without FRAM. On retesting I got an SOS with 10 flashes, but I had forgotten that the retained RAM had filled with messages. Perhaps there needs to be a method to reset the message queue?

It appears to be working fine with 0.1.2, both with retained RAM and with FRAM (I have only tried the 256V). I am doing a fram.erase() at startup to avoid debris causing startup issues.
