Argon failing to receive subscribed messages

I’m about to describe something really strange, and I hope someone can help. Our Escape Rooms use Argons running 2.0.1 firmware. Recently, something very strange started happening. Some of the Argons would no longer receive messages. I can call their functions, and those respond correctly, but a published message is not received. OK, that’s strange part number 1.

Strange part number 2 is that, once they get into this state, the only way to get them out is to power them off for 10 seconds and then power them back up. Rebooting will not clear the state. Reflashing will not clear the state!

Some notes on the environment. I mentioned that I am using 2.0.1. I use PublishQueueAsyncRK, but I removed the “retained” operator from the message queue. At first I thought that I hadn’t, and that the “retained” was the source of the problem (I still think it is related), but when I checked, I had already removed it. I also use SYSTEM_THREAD(ENABLED). I commented it out to see if it would help, and it didn’t. This is also where I discovered that reflashing doesn’t clear the state.

Another item in the environment is that I use a C++ class as the subscription handler. I don’t know if that matters, but I’m trying to list as many details as possible in case it helps.

The problem has occurred on multiple, but not all, of my Argons. They all run the same framework, so the message handling, firmware version, and other items are identical, with only the game logic being different. It recurs randomly, sometimes within 30 minutes.

I am able to detect the current state by publishing a message that they must respond to; any device that doesn’t respond is one that has the problem. We’ve had to hook up remote power switches to bring them back online. I have also had to take other measures to handle it (using published functions) for when the problem occurs in the middle of a game.
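
For anyone wanting to run the same kind of check, here is a rough sketch of the bookkeeping side only, written in plain C++ rather than Particle firmware code. The function name `findSilentDevices` and the idea of collecting replies into a set are assumptions for illustration; the actual publish and the collection of replies before a timeout are not shown.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the devices from `expected` that never
// answered the ping. `responders` holds the names of devices whose reply
// arrived before a caller-chosen timeout expired.
std::vector<std::string> findSilentDevices(
    const std::vector<std::string>& expected,
    const std::set<std::string>& responders) {
    std::vector<std::string> silent;
    for (const auto& name : expected) {
        // A device that is expected but not in the reply set is stuck
        // (or offline) and is a candidate for a power cycle.
        if (responders.count(name) == 0) {
            silent.push_back(name);
        }
    }
    return silent;
}
```

The result keeps the order of `expected`, so it can be handed straight to whatever drives the remote power switches.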

Another room using a similar system does not have the problem, but that is running 1.5.2 because we still have some Xenons there.

So, my suspicions:

  1. It is related to 2.0.1
  2. It is related to retained (because only a power-off clears it)

Question: Can I do a software reboot that dumps all the retained values? If so, I could at least clear the situation with a reboot.

Thanks. It’s a real head scratcher, but hopefully someone has an idea.

I’ve started seeing the same problem on Argons running 1.5.2. They will not receive messages they are subscribed to, but I can call functions. Rebooting does not clear the problem. Neither, amazingly, does reflashing. Only a 10-second power-down seems to reset it.

This may be related, not sure. When they are in that state, try to recover by downloading slightly modified code over the air:

I have tried a modified version and it did not work. I filed a support ticket, and it looks like this might be a Particle issue, and they are working on it. Not sure if it is the same thing that you are experiencing, since I don’t use webhooks. I hope it is fixed soon, as it is really problematic for our room.

Thanks.

OK. In our case it is also a subscription that stops working, just a subscription to a webhook response.

Hey, an update for you all. I had a device enter this “no respond” state. I flashed a completely different program onto it, then put the original program back. After that it started responding again. So, there is a software way out.

BTW, Particle is actively working on the underlying problem.