WiFi reconnection issue - what does this trace mean?

@dan.s, suggest that you answer the question by disabling SoftAP and seeing if this resolves the problem. Please report back!

I had so many issues with SoftAP with my specific environment that I disabled it in the end.

A workaround I have used successfully is to check that free memory is >38800 bytes before starting SoftAP WiFi setup. If free memory is lower than that, odd things happen: where it fails varies, but generally it will hang at some point. The exact value of free memory changes with Device OS; as I understand it, 17K (heap and stack) is required to keep Device OS working correctly and 21-22K is needed by SoftAP to load the pages.
Device OS 1.5.X just uses 10-11K more memory than 1.4.4.
Fortunately, 2.0.0-rc.4 fixes this issue (but increases App flash space by 1800 bytes).
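
A minimal sketch of that memory guard, assuming the figures quoted above (roughly 17K for Device OS plus 21-22K for SoftAP, which is where the 38800-byte threshold comes from). The constant and function names are hypothetical, and the split of the SoftAP share into exactly 21800 bytes is an assumption chosen to match the threshold. On a Photon the argument would come from System.freeMemory(), and a true result would be followed by WiFi.listen().

```cpp
#include <cstdint>

// Figures taken from the post above. They vary with Device OS version,
// so treat them as assumptions to re-measure, not as fixed limits.
constexpr uint32_t kDeviceOsReserveBytes = 17000;  // heap + stack for Device OS
constexpr uint32_t kSoftApReserveBytes   = 21800;  // SoftAP page loading (21-22K)
constexpr uint32_t kSoftApMinFreeBytes   = kDeviceOsReserveBytes + kSoftApReserveBytes;

// Returns true only when free memory is strictly above the threshold.
constexpr bool canStartSoftAp(uint32_t freeBytes) {
    return freeBytes > kSoftApMinFreeBytes;
}

// Intended use in firmware (not compiled here):
//   if (canStartSoftAp(System.freeMemory())) { WiFi.listen(); }
```

With 26kB free, as reported later in this thread for v1.5.0, the guard would refuse to start SoftAP.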

@armor,

Regarding your comment:

Which issue is it fixing? Is it specifically in the rc.4 release? I admit to not having tried the latest DeviceOS because it did not seem to have any great relevance to the Photon/P1 operation.

2.0.0-rc.4 fixes the memory issue that was introduced in 1.5.X and fixes other issues with WiFi behaviour that have been broken since 1.0.1.
I can only attest to rc.4 behaviour.

How do you disable softAP?

I don't really understand how it could be affecting it. When running normally, until the user clears credentials, it should never reach softAP again.
It is during this normal operation that I'm seeing strange connection/reset issues due to memory, even though free memory is at 26kB (running v1.5.0).
As @armor suggests, maybe there is a solution in v2.0 when the production release comes out...

@dan.s,
To disable SoftAP I used #ifdefs - here are the snippets of code.

#ifdef SOFTAP_ENABLED
#include "softap.h"
#endif

STARTUP(
...
#ifdef SOFTAP_ENABLED
    // Refer https://community.particle.io/t/application-softap-http-pages-issue/22499/4
    // Be sure to initialize the softAP pages in a STARTUP() macro so they
    // are setup *before* the device connects to the internet.
    //
    // If it is initialized in the setup() method, then SoftAP pages
    // won't be available until the device has connected to the cloud.
    softap_set_application_page_handler(myPage, nullptr);
#endif
);

#ifdef SOFTAP_ENABLED
    //
    // Set up SoftAP pages
    //
    softap_setup(); // NOTE - calls WiFi.on()
#endif              // SOFTAP_ENABLED

Ah it looks like you're setting up your own custom softAP page. Not sure this would work for us as we just use the default, so no explicit calls to softAP are ever mentioned - I guess the references would be in WiFi.listen()

Dan
I think the answer/confirmation you may be looking for is this: if free memory is above a certain size, call WiFi.listen() and wait, with a timeout, for WiFi credentials to be entered; otherwise don't, because there is a high likelihood it will hang due to insufficient heap.

@dan.s,

Am not sure what you mean. Referring to https://docs.particle.io/reference/device-os/firmware/photon/#softap-http-pages you must supply the pages within your code.

Have you got this include in your code?

#include "softap_http.h"

If so, remove it and see what happens.

@armor that is a good idea, however I've programmed it so whenever the device is in listening mode, it restarts and does not load any of the large memory buffers, therefore reducing the free memory down to as low as it could go. See here. This has worked well for us and we've not had problems with softAP since. The problem lies in normal operation after a long amount of time on weak wifi. I know it is happening on weak wifi because there are these [hal.wlan] TRACE: connect cancel scattered around the logs before it goes offline.

@UMD we don't use any softap include statements. From the docs, it states there is a default page;

When a browser requests the default page ( http://192.168.0.1/ ) the system internally redirects this to /index so that it can be handled by the application.

This index is pre-written, although it can be redirected to a custom page like you may have done. But I'm not sure this default page can be disabled?

@dan.s, well that is news to me, I had no idea that SoftAP is enabled by default, operating as an Access Point in listening mode. I will test that...

Am now wondering if SoftAP can actually be disabled?

Hopefully you will be able to reduce your memory usage to assist with the issue.

Got it. I think the good news is that this behaviour (memory loss or fragmentation with poor WiFi) seems to have been corrected in OS 2.0.0-rc.4.

@UMD
If you have a look at the reference docs for Photon it is described thus:

When the device is in listening mode, it creates a temporary access point (AP) and a HTTP server on port 80. The HTTP server is used to configure the Wi-Fi access points the device attempts to connect to. As well as the system providing HTTP URLs, applications can add their own pages to the SoftAP HTTP server.

SoftAP HTTP Pages is presently an advanced feature, requiring moderate C++ knowledge. To begin using the feature:

  • add #include "Particle.h" below that, then
  • add #include "softap_http.h" below that still
// SYNTAX

void myPages(const char* url, ResponseCallback* cb, void* cbArg, Reader* body, Writer* result, void* reserved);

STARTUP(softap_set_application_page_handler(myPages, nullptr));

The softap_set_application_page_handler is set during startup. When the system is in setup mode (listening mode, blinking dark blue), and a request is made for an unknown URL, the system calls the page handler function provided by the application (here, myPages .)

To not use SoftAP - don't include softap_http.h, don't call the STARTUP macro with the softap_set_application_page_handler and lastly don't put the device in listening mode. The RAM is only used once the temporary AP is created, i.e. once WiFi.listen() is called.

@armor, we are on the same page so to speak!

I have just confirmed for myself that SoftAP is operational in Listening Mode and that without the DIY pages, it does not do much. You live and learn.

So it seems that the implementation of the DIY HTML pages can be a cause of memory grief, not the underlying SoftAP mechanism itself.

Anyhow, back to @dan.s's issue - this may be related: Comm.protocol Event Loop error 24.

Would be real interesting @dan.s if you take on @armor's (implied) advice and try DeviceOS 2.0.0-rc.4. Does this resolve your issue?

@UMD Problem is I can't seem to reproduce the problem, just seeing it on a small percentage of devices we have in the field. I may try updating one of the devices showing the problem - it's a bit risky though because these are fairly remote customer devices and I don't want to end up crashing a device due to a completely different bug appearing because of the update! Would rather do some fully formed testing on the LTS production release.

Any idea when the production version of DeviceOS 2.0.0 will come out?

@dan.s, I know exactly what you mean re fault reproduction - in-field and lab bench environments can be wildly different. For example, WiFi interference, power issues, electrical interference (like a train line nearby), etc.

Would be great if you could reproduce the fault somehow using the exact same hardware and current firmware.

Have you tried the usuals whilst viewing fully instrumented logs in the lab?

  • log System.freeMemory() in loop() <=== really important
  • power cycle the access point
  • reduce the transmit power from the access point
  • force channels on the access point
  • place the access point at a large distance from the device
  • disconnect and reconnect the ethernet cable from the access point
  • (others can pitch in!)
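
For the first item in that list, a rough sketch of how free memory could be tracked in loop(): keep a low-water mark and only log every few seconds so the logs stay readable. The helper names and the 5000 ms interval are hypothetical; System.freeMemory(), millis() and Log.info() are the Device OS calls the commented usage assumes.

```cpp
#include <cstdint>

// Returns the smaller of the previous low-water mark and the new sample.
constexpr uint32_t updateLowWaterMark(uint32_t lowestSoFar, uint32_t sampleBytes) {
    return sampleBytes < lowestSoFar ? sampleBytes : lowestSoFar;
}

// True when at least intervalMs has elapsed since the last log line.
// Unsigned subtraction keeps this correct across millis() rollover.
constexpr bool logIsDue(uint32_t nowMs, uint32_t lastLogMs, uint32_t intervalMs) {
    return static_cast<uint32_t>(nowMs - lastLogMs) >= intervalMs;
}

// Intended use in firmware (not compiled here):
//   static uint32_t lowest = UINT32_MAX, lastLog = 0;
//   uint32_t freeBytes = System.freeMemory();
//   lowest = updateLowWaterMark(lowest, freeBytes);
//   if (logIsDue(millis(), lastLog, 5000)) {
//       lastLog = millis();
//       Log.info("free=%lu lowest=%lu", (unsigned long)freeBytes, (unsigned long)lowest);
//   }
```

The low-water mark matters more than the instantaneous value, since a short dip during a reconnect attempt is easy to miss between log lines.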

If you can't reproduce the fault, suggest that you try the DeviceOS upgrade (after in-house testing) on the most problematic in-field device. Unfortunately, due to the nature of the fault, this is going to take time...

Some time after 3 December which is when the bug bounty ends and depending upon the issues discovered that need to be addressed before release. Someone from Particle Product Management would be better at giving a definite date - end of 2020 was the stated aim.

  • (others can pitch in!) People moving around attenuate the WiFi signal!

So after some effort, I managed to reproduce the problems with a portable hotspot placing it far away and turning it on and off. When signal was weak or non existent, the photon was stuck flashing cyan and stalled after outputting:
v1.5.0 blinking cyan freeze after disconnect AP

So tried upgrading to v2.0.0-rc.4 which of course wasn't straightforward - it gave us lots to fix such as casting time_t everywhere etc. but then the compiler complained that our SRAM overflowed by 1800 bytes! So it looks like the new firmware uses some of the SRAM space now?

The benefit of the new firmware seems that it goes to hard error more easily rather than stalling. This was eye opening too because it eventually made us realise that we were also overflowing on EEPROM. We are pushing the SRAM and EEPROM to the limits and I hold my hands up, we were not keeping track. This was not an easy fix though and has forced us to rethink the whole product architecture (not just the code).

Long story short, thanks to the upgraded compilation/runtime errors of 2.0.0 we were able to bring the SRAM and EEPROM memory within limits and the reconnection problems seem to disappear, even on v1.5.0!
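
For anyone else pushing EEPROM to the limit, one way to keep track is to let the compiler check the layout. This is only a sketch: the record structs and names are hypothetical, and the 2047-byte capacity is the figure I understand to apply to the Photon/P1 emulated EEPROM, so confirm it with EEPROM.length() on your own device.

```cpp
#include <cstddef>

// Assumption: 2047 bytes of emulated EEPROM on the Photon/P1.
// Confirm with EEPROM.length() on the target device.
constexpr std::size_t kEepromCapacityBytes = 2047;

// Hypothetical persistent records; replace with the application's own.
struct WifiSettings { char ssid[33]; char password[65]; };
struct Calibration  { float gain; float offset; };

// Each record starts where the previous one ends.
constexpr std::size_t kWifiSettingsAddr = 0;
constexpr std::size_t kCalibrationAddr  = kWifiSettingsAddr + sizeof(WifiSettings);
constexpr std::size_t kEepromBytesUsed  = kCalibrationAddr + sizeof(Calibration);

// The build fails here if a new field pushes the layout past capacity.
static_assert(kEepromBytesUsed <= kEepromCapacityBytes,
              "EEPROM layout exceeds capacity");
```

The addresses would then be passed to EEPROM.get() and EEPROM.put(), so an overflow shows up at compile time rather than as odd behaviour in the field.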

@dan.s,

Excellent!

Keeping up with the latest DeviceOS is the way forward, even though free memory reduces with each release...

So, in short, the issue was memory, memory, memory...

Wouldn't it be great if a hardware compatible "P2" was released with double the RAM and FLASH... that would solve a lot of problems...