Listening mode on the Photon cannot work reliably in the current implementation

I already created a GitHub issue, but posting here for visibility and discussion.

Listening mode on the Photon requires 70% of the user-available RAM if threading is enabled.

This means that if the user application uses more than 13kB of RAM, the device will run out of memory when listening mode is triggered. This causes lockups, WiFi failing to restart, or hard faults.

Now you might think, what if I use System.on(setup_begin) to try to free all my memory before listening mode runs? Nope.
The SoftAP application is constructed before that hook is called.

So the only way to make listening mode work at the moment is to keep the application's memory use under 13kB at all times.

See more details in this GitHub issue:

Possible solutions:

  • Don’t include the HTTP server on the Photon. I have no idea why it is included by default on the Photon but not on other platforms. Leaving it out saves a lot of RAM.
  • Check available memory on SoftAP start. If free RAM is really low, only start Serial. Add more services if there is room.
  • Give the user control over which services are included.
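
A minimal sketch of the second suggestion (always start Serial, then add optional services only while they fit in the free heap), written as plain C++ so the logic can be tried off-device. The Service type, chooseServices(), the reserve, and all byte figures are hypothetical placeholders; on a Photon the free-heap figure would presumably come from System.freeMemory(), and real per-service costs would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of an optional listening-mode service and the heap
// it is assumed to need. The numbers used with it are placeholders, not Device OS figures.
struct Service {
    std::string name;
    std::size_t heapBytes;
};

// Serial is always started. Optional services are then added in priority order
// while they fit in the free heap minus a safety reserve; the first one that
// does not fit stops the loop, because later services may depend on it.
std::vector<std::string> chooseServices(std::size_t freeHeap,
                                        std::size_t reserve,
                                        const std::vector<Service>& optional) {
    std::vector<std::string> started{"Serial"};
    std::size_t budget = freeHeap > reserve ? freeHeap - reserve : 0;
    for (const Service& s : optional) {
        if (s.heapBytes > budget) {
            break;
        }
        started.push_back(s.name);
        budget -= s.heapBytes;
    }
    return started;
}
```

Stopping at the first service that does not fit keeps the priority order meaningful: a later entry (say an HTTP server) may depend on an earlier one (SoftAP), so skipping ahead could start a service without its prerequisite.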

Anyway, the current implementation is horribly broken, but most users won’t notice. This is because:

  • They have a very small app not using much RAM
  • They hit a memory limit, but the symptoms are vague. Hard fault on out-of-memory is not enabled by default (which is good), but that means all kinds of other things randomly stop working instead of the device crashing outright.

Would it be possible to have my own version of SoftAP triggered on button hold, so I can exclude some of the services myself? I’d rather see a fix from Particle to save myself the work, but if it takes too long I will have to find a workaround.

Alternatively, can I completely disable listening mode in the user app? I have other ways to configure WiFi over Serial. No listening mode is better than a crash-on-demand button.

Is safe mode a viable alternative? Can I jump to safe mode instead when listening mode is triggered?

@Elco, some testing I did over the weekend indicates that the problem goes back to at least Device OS 1.2.1, showing up in different ways, and that it makes SoftAP for WiFi setup unusable alongside commercial applications. @UMD has a topic on something very similar.

PS: if anyone at Particle needs help analyzing memory use (@avtolstoy?), I use this script to start the gcc (host) build of the app under Valgrind, with Massif logging the allocations.

The result can be viewed with massif-visualizer and kcachegrind to see what uses memory.
EEPROM emulation in the gcc build allocates a lot of memory because the file is reopened and closed for every byte. That pollutes the view a bit and might be improved.
My attempt to ignore these didn’t really work. See below. If you find a fix, I’d be happy to hear about it.

--ignore-fn=call_init.part.0 \
--ignore-fn=_IO_file_doallocate \

To be a truly valuable tool, the gcc build probably needs to mimic the hardware build more closely, but when trying to reduce memory use, it has been a great tool for me.

The gcc build doesn’t have listening mode as far as I know, but perhaps mocking parts of listening mode in the gcc build will be easier for you than tracking memory use on hardware.

#!/bin/bash
MY_DIR=$(dirname "$(readlink -f "$0")")
BUILD_DIR="$MY_DIR/target/user/platform-3/firmware/brewblox"
EXECUTABLE_DIR="$MY_DIR/target/brewblox-gcc"
EXECUTABLE="$EXECUTABLE_DIR/brewblox"
OUTPUT_DIR="$MY_DIR/coverage"
DEVICE_KEY="$EXECUTABLE_DIR/device_key.der"
SERVER_KEY="$EXECUTABLE_DIR/server_key.der"
STATE_DIR="$EXECUTABLE_DIR/state"
EEPROM_FILE="$EXECUTABLE_DIR/eeprom.bin"

if [ ! -f "$EXECUTABLE" ]; then
    echo "brewblox executable not found: $EXECUTABLE"
    exit 1
fi

pushd "$EXECUTABLE_DIR" || exit

# EEPROM file writes cause a lot of memory allocation. Import blocks manually after start for a less polluted view.
touch "$DEVICE_KEY" "$SERVER_KEY" "$EEPROM_FILE"
mkdir -p "$STATE_DIR"
mkdir -p "$OUTPUT_DIR"

# -f: don't complain when there is no output from a previous run
rm -f "$EXECUTABLE_DIR/massif.out"
rm -f "$EXECUTABLE_DIR/xtmemory.kcg"

valgrind --tool=massif --threshold=0.1 \
--xtree-memory=full --xtree-memory-file="$EXECUTABLE_DIR/xtmemory.kcg" \
--ignore-fn=call_init.part.0 \
--ignore-fn=_IO_file_doallocate \
--massif-out-file="$EXECUTABLE_DIR/massif.out" "$EXECUTABLE" --device_id 123456789012345678901234 --device_key="$DEVICE_KEY" --server_key="$SERVER_KEY"

popd || exit

# open massif.out with massif-visualizer
massif-visualizer "$EXECUTABLE_DIR/massif.out" &
# open xtmemory.kcg with kcachegrind
kcachegrind "$EXECUTABLE_DIR/xtmemory.kcg" 
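
On the EEPROM noise mentioned above: one way an emulated EEPROM could avoid reopening its file for every byte is to keep the whole image in memory and hand it to storage in a single call. This is a hypothetical sketch, not the Device OS gcc-platform code; EepromImage and its Writer callback are invented for illustration, and using 0xFF as the erased value is an assumption borrowed from how erased flash usually reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical emulated EEPROM that keeps the whole image in memory instead of
// reopening a file for every byte. Nothing touches storage until flush().
class EepromImage {
public:
    // Called once per flush with the complete image (e.g. one open/write/close).
    using Writer = std::function<void(const std::uint8_t* data, std::size_t len)>;

    explicit EepromImage(std::size_t size) : bytes_(size, 0xFF) {}  // 0xFF: assumed "erased" value

    std::uint8_t read(std::size_t addr) const { return bytes_.at(addr); }

    void write(std::size_t addr, std::uint8_t value) {
        if (bytes_.at(addr) != value) {  // skip no-op writes so flush() can stay idle
            bytes_[addr] = value;
            dirty_ = true;
        }
    }

    // Hands the full image to the writer if anything changed; returns bytes written.
    std::size_t flush(const Writer& writeAll) {
        if (!dirty_) {
            return 0;
        }
        writeAll(bytes_.data(), bytes_.size());
        dirty_ = false;
        return bytes_.size();
    }

private:
    std::vector<std::uint8_t> bytes_;
    bool dirty_ = false;
};
```

On the host, the writer would wrap a single open/write/close of eeprom.bin, so Massif should see file-buffer allocations once per flush rather than once per byte.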

Recently, Particle employees have repeatedly said that filing a GitHub issue will not count as a bug report, and that they won’t monitor the repository issues as “closely” as they “used to do”.

If you want a bug report to be considered, you should open a support ticket instead.

Yes, I don’t understand this policy at all. And I have let them know this.

I have an issue about a software bug, but the place to report it is not a website for open-source software that is specifically designed for reporting issues and collaborating?

No, I had to open a support ticket, pick a category that didn’t really fit, and use a form that is less fit for purpose than a GitHub issue. And once they start working on the issue, there is no real way to follow its progress, unlike on GitHub.

To quote:
“we prefer that issue are reported using the support portal. It allows us much better visibility and avoids issues being created that are related to something other than that which Particle has control over.
We have an internal issue escalation process where incoming issues are discussed and prioritised with engineering weekly - GitHub issues are not part of that discussion.”

Why the hell are GitHub issues not part of the engineering weekly discussions?
Please use this inferior tool, because we are ignoring the tool that is designed to manage software issues in favor of our generic catch all ticket system.

Maybe they don’t want the discussion of software bugs public? I cannot think of another reason.

P.S. I understand that “I have an issue getting my program to work correctly” should not really be a GitHub issue if it is likely a user error. It should be a forum post.
But a bug report like this, with a test app and a well-researched problem report, is another story.

Me neither :wink:

@elco, you have well explained the ongoing SoftAP issue which is a sore point for some.

I have been in the process of completing the long-winded (read: years of work) commercialisation of a codebase which is filled to the brim.

I have made many posts on this topic over the years and, now that I am nearing completion, wish to get SoftAP functioning again.

I had to perform the steps in the topic “Using Soft AP example - can't connect to HTTP” to get it to work. That was fine under Device OS 1.4.4 but failed under Device OS 1.5.0 due to the heap memory issue.

In the interim, I have simply disabled SoftAP and use another “user friendly” method: generating a configuration barcode and reading it in (obviously using an onboard barcode reader, which is at my disposal).

Re a fix: I broke setup() up into two parts, setup() and setup2(); setup2() is called once from loop(), not from setup().

I moved into setup2() the objects whose begin() methods allocate their own heap memory. Upon entry, setup2() first checks whether the device is in listening mode; if so, it exits without calling the begin() methods.

Of course, you lose some functionality, but this is acceptable because you are in “SoftAP configuration mode”, which means you need to reset once you have finished using SoftAP, i.e. no big deal.
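
The setup()/setup2() split can be modelled in plain C++ so the control flow can be exercised off-device. The App struct and its injected isListening hook are hypothetical stand-ins: on a Photon the check would presumably be WiFi.listening(), and the heap-hungry begin() calls would replace the flag assignment.

```cpp
#include <cassert>
#include <functional>

// Off-device model of the setup()/setup2() split. isListening is a hypothetical
// injected hook; on a Photon the check would presumably be WiFi.listening().
struct App {
    std::function<bool()> isListening;
    bool heavyObjectsStarted = false;  // stands in for the begin() calls that allocate heap
    bool setup2Done = false;

    void setup() {
        // Only cheap initialisation that does not allocate significant heap.
    }

    // Runs once, from loop(). In listening mode it returns early so SoftAP
    // keeps the heap it needs; the device is reset after SoftAP setup anyway.
    void setup2() {
        if (isListening()) {
            return;
        }
        heavyObjectsStarted = true;  // e.g. sensor.begin(), display.begin() (hypothetical objects)
    }

    void loop() {
        if (!setup2Done) {
            setup2Done = true;  // marked done even on early return: no retry until reset
            setup2();
        }
        // ... normal application work ...
    }
};
```

Because setup2Done is set even when setup2() bails out, the heavy objects stay uninitialised until the next reset, which matches the "reset once you have finished using SoftAP" behaviour.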

Hope this has been helpful. Keep up the investigations!

PS - Really impressed with your memory analysis work. Phenomenal!

We are closing in on product launch, and today a tester reported that listening mode was not working.

So I went looking for an old thread I recalled about issues with listening mode and memory usage just as a refresher.

I saw this post and thought to myself “Yes this looks like it, how long ago was this posted?”
1 day. 1 ****** day.

Having now seen this thread I’ve thrown up.

You should all just assume that I’m now posting lots of sweary words because I don’t want my post deleted.

This issue is not new. If you search the forum for “listening mode memory”, you’ll see many old topics saying that you need 32kB of memory for listening mode to work.

The problem is that it was not considered an issue by Particle that listening mode uses so much memory. And that freeing memory before entering listening mode is impossible, because the callback hook is executed AFTER SoftAP allocates all that extra memory.

Your only reliable option, if your app always requires WiFi, is to not start the app until WiFi is configured. I hope to see a confirmation from Particle soon that they agree this is a serious issue and that the current design is flawed.

The “new” part is that it was working until recently, before v1.5.0.

My mistake was not rechecking memory usage after moving to 1.5.0.

I have not tested the sample app I provided in the GitHub issue on 1.4. It could be that the base memory use increased too, pushing it over the edge.
But it is also possible that the bugs just manifest differently. When the system runs out of memory and random things stop functioning, behavior becomes unpredictable.

PS: please also send in a support ticket referring to this bug. I think many users are suffering from this, but it can be hidden and hard to pinpoint. I think this should be top priority at Particle right now.

This seems to be the only workable approach at the moment, but it has a downside:

  • The user can only trigger listening mode before setup2 is called. It requires a delay in startup to give the user time to do that.
  • After setup2 has been called, the listening mode button still exists and has become a 'crash on demand' button.

My application can also function on USB alone, without WiFi. So I don't want to wait until WiFi has been set up. But I can configure WiFi over USB with my own protocol. That is why I would be happy to disable listening mode entirely, just to remove the 'crash on demand' button. I would just instruct my users to use USB at first and switch to WiFi after configuring it over USB.
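
For the USB-only route, the device needs some way to receive credentials over Serial. Below is a sketch of a parser for a made-up one-line command; the wifi&lt;TAB&gt;ssid&lt;TAB&gt;password format and parseWifiCommand() are hypothetical, not an existing Particle protocol. On the device, the parsed values would presumably be passed on to WiFi.setCredentials().

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

struct WifiCredentials {
    std::string ssid;
    std::string password;
};

// Parses one line (line ending already stripped) of a hypothetical USB serial
// command: "wifi<TAB>ssid<TAB>password". Tabs separate the fields so SSIDs with
// spaces survive. Returns std::nullopt for anything that is not well formed.
std::optional<WifiCredentials> parseWifiCommand(const std::string& line) {
    const std::string prefix = "wifi\t";
    if (line.compare(0, prefix.size(), prefix) != 0) {
        return std::nullopt;
    }
    const std::size_t sep = line.find('\t', prefix.size());
    if (sep == std::string::npos) {
        return std::nullopt;
    }
    WifiCredentials creds{line.substr(prefix.size(), sep - prefix.size()),
                          line.substr(sep + 1)};
    if (creds.ssid.empty() || creds.ssid.size() > 32) {  // 802.11 SSIDs are 1..32 bytes
        return std::nullopt;
    }
    return creds;
}
```

A tab separator is just one convenient choice here; any delimiter that cannot appear in an SSID would do.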

@Elco, have never experienced "cash on demand" from the Setup button... not good!

Have you tried using System.buttonPushed() as described here: https://docs.particle.io/reference/device-os/firmware/photon/#buttonpushed-

I have never used it, but it looks like registering a button_handler might take over what the button normally does.

Agree with your comments re the downsides of the method. In the end, I disabled SoftAP, for now.

Re:

Your only reliable option, if your app always requires WiFi, is to not start the app until WiFi is configured.

What we do is to always set up a "factory default" WiFi credential (after first checking to see if it already exists).
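
The "add it only if it is missing" check can be sketched like this. CredentialStore and ensureFactoryDefault() are hypothetical, standing in for the device's credential storage (on a Photon, presumably WiFi.getCredentials() and WiFi.setCredentials()); the SSID and password are placeholders.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one stored WiFi network.
struct StoredNetwork {
    std::string ssid;
    std::string password;
};

// Hypothetical stand-in for the device's credential storage.
struct CredentialStore {
    std::vector<StoredNetwork> networks;

    bool contains(const std::string& ssid) const {
        return std::any_of(networks.begin(), networks.end(),
                           [&](const StoredNetwork& n) { return n.ssid == ssid; });
    }
};

// Adds the factory-default network only if it is not stored yet, so repeated
// boots do not pile up duplicate entries. Returns true if it was added.
bool ensureFactoryDefault(CredentialStore& store,
                          const std::string& ssid,
                          const std::string& password) {
    if (store.contains(ssid)) {
        return false;
    }
    store.networks.push_back({ssid, password});
    return true;
}
```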

I wouldn't mind a 'cash on demand' button :wink:

That button handler might be a good solution for me indeed. I could just start my own version of listening mode that just does the serial bit and skips the rest. Thanks for pointing that out.

But listening mode is triggered when the button is held for more than 5 seconds, and I'm not sure this hook would override that. The docs are lacking.

How did you disable SoftAP?

Are you doing that because you don't know about this option?
WiFi.connect(WIFI_CONNECT_SKIP_LISTEN);

https://docs.particle.io/reference/device-os/firmware/photon/#connect-

The “cash on demand” statement was technically correct!!

Re the button handler - just test it out using the Web IDE (I find that it is the quickest way of performing little experiments like this). Let us know how it goes. Hope it works for you.

By disabling SoftAP, I mean just not including the handler in the STARTUP() macro:

// Preprocessor directives (#ifdef) inside a macro argument are undefined
// behavior in standard C++, so the startup code lives in a plain function.
static void early_startup()
{
    Keyboard.begin();    // Allows the HID device attach for the first time after boot
                         // with *both* Serial and Keyboard

#ifdef SOFTAP_ENABLED
    // Refer https://community.particle.io/t/application-softap-http-pages-issue/22499/4
    // Be sure to initialize the softAP pages in a STARTUP() macro so they
    // are setup *before* the device connects to the internet.
    //
    // If it is initialized in the setup() method, then SoftAP pages
    // won’t be available until the device has connected to the cloud.
    softap_set_application_page_handler(myPage, nullptr);
#endif  // SOFTAP_ENABLED
}

STARTUP(early_startup());

I can’t tell you if performing the above saves on heap memory as I have not bothered checking.

Re the WiFi factory default, we just find it really handy for a number of reasons.

That just overrides the default pages. If you don’t add that handler, SoftAP is still enabled, but with the default pages.

Not new at all… I’ve been slamming into the AP mode RAM consumption issue ever since the initial WPA Enterprise compatibility integration, starting with 0.7.x. I have a large fleet of P-series devices that still run 0.6.3 because we have an app-based AP mode provisioning process.

This is just in. Particle considers this a feature request, not a bug.

:angry:

Well, technically, having your WiFi setup process work without 70% of your memory being free could be considered a feature request.
But given how unlikely it is that 70% of memory is free in any serious application, I think assuming that’s the case and just starting the HTTP server with SoftAP, without checking memory, is a bug.

And given how easy it would be to NOT make SOFTAP_HTTP default to 1 on the Photon, as a first remedy, I think this should be part of the 1.5.1 milestone.

@avtolstoy, did you get a look at this? I find it hard to believe you think this implementation is okay too.

Shall we collect some names of application developers affected by this issue?

The folks at Particle don’t appear to think this is a serious issue. I think @jimini being stuck at 0.6.3 is a telling example of how this bug affects developers of serious applications on the device-os framework.

I feel a bit like a whistle-blower trying to get attention for a major flaw in the framework, but so far I have not gotten one serious reply from Particle that showed me they understood what I am trying to tell them.

One of our applications is quite RAM intensive, doing a lot of extended 1 kHz analog sampling/recording for waveform analytics, and we’ve gotten around this by staying on 0.6.3. These devices are WiFi provisioned, and SoftAP mode operation is paramount.

While we don’t feel that hamstrung by not being able to use a DevOS > 0.6.3, we can indeed debate the merit of feature vs. bug (where have I heard this trade-off before?).

If we’re on the SoftAP subject in general, I think another, bigger oversight/omission is that I can’t surface SoftAP on the Gen3 platform (that might be on the release radar).