SoftAP .scan() on Photon Triggers SOS in Busy WiFi Environment

I’ve discovered a problem with the SoftAP .scan() call. Sometimes the Photon returns the scan results for nearby APs properly. But sometimes, immediately after the .scan() request is issued, the Photon starts flashing red irregularly. This lasts 5-15 seconds on average, after which the Photon power cycles. I am on v0.4.4 with no custom application code.

In my testing, this problem occurs both with @mebrunet’s softap-setup-page and with @brewnerd’s browserified implementation of softap-setup, so I don’t think it has much to do with the JS side of things.

In my home, there are at most about 10 APs within range, and I have never seen this problem there. (I am going to script up a loop to test this more rigorously later today.)

However, at my office, where there are 20+ APs on a regular basis, this issue occurs on almost half of my .scan() requests.

The irregular red flashing will be an SOS followed by a number of flashes. The number of flashes tells you what type of error made the Photon crash. Have a look at the docs page for more info.
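For reference, the lookup can be done mechanically. A minimal Python sketch; the table is deliberately partial (only the codes that come up in this thread, plus hard fault), and the authoritative list is the one on the Particle docs page.

```python
# Partial table of Particle SOS blink codes, keyed by the number of red
# flashes that FOLLOW the SOS pattern. Not exhaustive: see the Particle docs.
SOS_CODES = {
    1: "Hard fault",
    2: "Non-maskable interrupt fault",
    3: "Memory Manager fault",
    11: "Invalid case",
    12: "Pure virtual call",
}


def describe_sos(flash_count: int) -> str:
    """Return a human-readable name for an SOS flash count."""
    return SOS_CODES.get(flash_count, f"unknown code {flash_count} (see docs)")
```

For example, `describe_sos(11)` gives "Invalid case".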


Ok, understood!

I am currently in my home environment (10 WiFi APs), and I made a short script that just keeps executing .scan(). So far, the Photon has returned results with 100% success. Not a single SOS.

Once I manage to cause an SOS, I will post back with what type of SOS it is.
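A loop like the one described above can be sketched in Python. This is an illustration under assumptions, not the script from the post: `scan` is a hypothetical callable standing in for whatever client issues the SoftAP scan (softap-setup or a direct request), and a Photon crash is assumed to surface on the host as an exception such as a timeout or dropped connection.

```python
from typing import Callable


def stress_scan(scan: Callable[[], list], iterations: int) -> dict:
    """Call scan() repeatedly and tally successes and failures.

    `scan` is hypothetical: it should return the list of APs on success and
    raise on timeout/disconnect (what the host sees when the Photon SOSes).
    """
    results = {"ok": 0, "failed": 0, "first_failure": None}
    for i in range(1, iterations + 1):
        try:
            scan()
            results["ok"] += 1
        except Exception:  # broad on purpose: any client error counts as a failed scan
            results["failed"] += 1
            if results["first_failure"] is None:
                results["first_failure"] = i
    return results
```

After an SOS the Photon reboots and its SoftAP network disappears, so a real loop would also have to wait for the device to return to listening mode and reconnect before the next call; that part is left out here.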

> I share this because it does provide one piece of information: whatever is causing the SOS is correlated not with the machine, browser, or Photon I use, but with the surrounding WiFi environment alone.


Edit:
Ok, sorry, I was wrong: the SOS count is #11 (Invalid Case).

Original:
Ok, I’ve gone through the SoftAP process back at my office (3 separate times). The SOS count is #12 (pure virtual call).


Ok, we've canvassed our office and discovered something interesting: the problem is directly correlated to the presence of a specific router:

Belkin AC750 Dual-Band Wireless Router
F9K1116 v1

When this router is on and nearby, SOS #11 occurs frequently. As soon as we unplug it, the SOSs stop (as far as we can determine experimentally).

  • This router was broadcasting an SSID 32 chars long. We changed this to 6 chars as our first attempt to determine the underlying issue. This did not stop the SOS problem.
  • We also moved the router to different locations, including locations where the SOS error had never occurred before. The SOS errors always followed it. The errors seemed more frequent when we pointed the Photon’s internal antenna right at the router.
  • We next left the router on, but logged in and disabled both its 2.4GHz and 5.0GHz SSIDs. The SOS errors stopped.
  • We turned 5.0GHz back on. The SOS errors still did not occur.
  • We turned 5.0GHz off, and 2.4GHz on. Still, no errors.
  • We turned both 2.4GHz and 5.0GHz on, and restored the original 32 char SSIDs. This is the exact same configuration state as the router was in at the very beginning of this thread. The errors have not yet returned.
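Because the failure is intermittent (almost half of scans at the office), a few clean scans after each config change are weak evidence that the problem is gone. A short Python sketch of the arithmetic, assuming each scan fails independently with a fixed probability (an idealization; real failures may cluster):

```python
import math


def prob_all_pass(failure_rate: float, trials: int) -> float:
    """Chance of zero failures in `trials` independent scans if the bug is
    still present with the given per-scan failure rate."""
    return (1.0 - failure_rate) ** trials


def trials_needed(failure_rate: float, alpha: float = 0.05) -> int:
    """Smallest number of consecutive clean scans such that 'the bug is still
    there' explains the clean run with probability at most `alpha`."""
    if not (0.0 < failure_rate < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("failure_rate and alpha must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1.0 - failure_rate))
```

At a 50% failure rate, five clean scans in a row still happen by chance about 3% of the time (`prob_all_pass(0.5, 5)` is 0.03125); if the true rate were 10%, `trials_needed(0.1)` is 29.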

I'm not sure what to conclude from this. I'd like to say it was a router-specific issue that was resolved after we updated the configuration parameters a few times. However, if this was true:

  1. Why haven't any other WiFi devices ever had a problem with this router?
  2. Why would a corrupt or improperly configured router (again, an assumption that seems to contradict #1) cause an "Invalid Case" software failure in the Photon's firmware?

To my mind #2 is the most important issue at hand here. Even if the WiFi environment is improperly configured (assumption), the Photon should at least fail gracefully.


Thanks for this insight and for taking the time to troubleshoot and research the issue :+1:

An Invalid Case SOS is much more tractable. That said, looking through our source code, Invalid Case isn’t used anywhere on the Photon, only in the firmware for the Core, so I’m really puzzled!

I totally agree that the Photon should never SOS regardless of the WiFi environment. As soon as we can determine what the exact cause is we’ll take steps to avoid this from happening.

OK, I just went back into the router and changed the 5GHz channel from “Do Not Broadcast” to “Broadcast”. I hit scan and immediately got another Invalid Case SOS. Definitely 11 flashes: we filmed it and watched it in slo-mo. It’s definitely back to being a highly reproducible error now.

The main difficulty in diagnosing the true cause is that the scan succeeds and returns the AP results almost as often as it SOSs.


Thanks for this; I always appreciate more data, and it’s great that you’ve got the issue back to being reproducible so we can test any proposed fixes.

It puzzles me, though, since the Photon doesn’t even have a 5GHz radio, so whatever triggers the SOS must originate in something the router is doing. Stranger still, our firmware doesn’t use Invalid Case at all! I’m sure things will come to light to explain this, so I’ll keep digging! :smile:

I’d like to point out that we get the same error when the 5GHz channel is on but set to “Do Not Broadcast.”

Regarding the SOS I could swear I’ve seen both #11 and #12, but I only have video evidence of #11 so that’s all I can really stand behind.

Other data that I’d probably check:

  • Are the 5GHz and 2.4GHz radios using the same SSID, or different ones?
  • What are the WiFi channel numbers for each band?
  • Is there anything unusual about the security config on that router? (WPA2 Enterprise, rather than PSK, for example)

That’s all I can think of at the moment…

  • The SSIDs for both frequencies are and have always been the same.
  • 5GHz has always been set to “Auto” channel. 2.4GHz has been set to channel 3 for months. A few minutes ago I set that to “Auto” as well, but then immediately got SOS #11 on my next scan request, so I’m not sure if that helps…
  • The only security option on this router is WPA/WPA2-Personal (PSK). Current authentication is set to WPA2-PSK.

The next thing I would try, just for fun (if you haven’t already): Change the 2.4GHz SSID to something less than 32 characters. Maybe even less than 16 characters.
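The reasoning behind this suggestion: 32 octets is the 802.11 maximum SSID length, and code that copies a full-length SSID into a NUL-terminated C buffer needs 33 bytes. The Python sketch below only illustrates that off-by-one check; `buf_size` is a hypothetical parameter, not a value taken from the Photon firmware, and UTF-8 encoding of the SSID is assumed.

```python
MAX_SSID_OCTETS = 32  # IEEE 802.11 limit on SSID length, in bytes


def ssid_octets(ssid: str) -> int:
    """SSID length in bytes as sent over the air (UTF-8 assumed)."""
    return len(ssid.encode("utf-8"))


def fits_c_buffer(ssid: str, buf_size: int) -> bool:
    """True if the SSID plus a terminating NUL fits in a C char[buf_size]."""
    return ssid_octets(ssid) + 1 <= buf_size
```

A 32-character ASCII SSID fits a 33-byte buffer but not a 32-byte one, while non-ASCII characters count as more than one byte each.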

Just to confirm: are you counting only the flashes after the SOS, or are you including the SOS flashes too?

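I’m wondering if you’re actually seeing a Non-maskable interrupt fault or a Memory Manager fault (codes 2 and 3, which is what 11 and 12 flashes become if the 9 SOS flashes are included in the count).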

Sorry if it’s less than clear: I have already tried that. It caused no change in behaviour. The SOSs still occurred.

By “the SSIDs are and have always been the same” I mean that I always set the 5GHz and 2.4GHz bands to the same SSID; but I have tried different SSIDs. The shortest SSID I tried was 6 chars long and still caused an SOS on scan.

No. I start counting after the 3 short, 3 long, 3 short pattern, so I treat the 10th red flash as 1 for SOS error-code purposes.
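The two counting conventions differ by exactly the nine flashes of the SOS pattern. A minimal Python sketch of the conversion (the function name is illustrative):

```python
SOS_PATTERN_FLASHES = 9  # 3 short + 3 long + 3 short


def error_code(red_flashes: int, counted_sos_pattern: bool) -> int:
    """Convert a raw count of red flashes into the SOS error code.

    If counting started at the first flash of the SOS pattern, subtract the
    nine pattern flashes; otherwise the count already is the code.
    """
    code = red_flashes - SOS_PATTERN_FLASHES if counted_sos_pattern else red_flashes
    if code < 1:
        raise ValueError("count too small to contain an error code")
    return code
```

Counting as described (starting after the pattern), 11 flashes is code 11, Invalid Case; had the pattern flashes been included, 11 and 12 would become 2 and 3.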


Hey, is there an update on this topic?
I have the same problem. We counted SOS + 11, but I think it is really a repeated sequence:
SOS +1, SOS +1 … We’ve watched the video a couple of times and it’s pretty consistent.
I have a pretty noisy environment here as well. Sometimes I can scan up to 15 times in a row,
but it often crashes on the 2nd or 3rd try.
Thanks,
JKW

I’m in the same boat here, I think. I’ve got an ASUS router (AC1900) that crashes the Photon if it’s on the network during an AP scan. I’ve tried separating the 2.4 and 5GHz SSIDs, hiding 5GHz, and shutting off the 5GHz radio, but there was no effect. It seems like maybe 10% of the time it will complete a successful scan, but I can’t figure out a way to reliably eliminate the error, except to shut the router off and use a different one.

Please see this GitHub issue - if it’s the same as what you’re experiencing, then it will be fixed in 0.5.0. https://github.com/spark/firmware/issues/651

Ahhh, thanks. I’ll try that dev firmware and see if it fixes the issue. Would you expect any other related weird behavior? Since my post, I’ve seen it reset the SSID prefix from our custom one back to “Photon”, change serial numbers entirely, and now it seems I can’t configure WiFi at all, even after re-flashing the system firmware and reverting to Tinker over USB.

That doesn’t sound like the same issue, but rather it sounds like the DCT area has become corrupted - this is the region where many persistent settings are stored.

This of course shouldn’t happen, so if you’d like us to investigate, please run:

dfu-util -d 2b04:d006 -a 0 -s 0x8004000:0x8000 -U dct.bin

and send the file to me.
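Before sending the dump, a quick sanity check can confirm that the file is the expected 0x8000 bytes (32 KB, the length in the dfu-util `-s 0x8004000:0x8000` argument above) and not just erased flash. A minimal Python sketch; the 0xFF check assumes erased STM32 flash reads back as 0xFF, and it does not attempt to parse the DCT layout, which isn’t described here.

```python
import os

DCT_LENGTH = 0x8000  # length requested by dfu-util -s 0x8004000:0x8000


def check_dct_dump(path: str) -> str:
    """Return "ok", or a short description of why the dump looks wrong."""
    size = os.path.getsize(path)
    if size != DCT_LENGTH:
        return f"unexpected size {size} bytes (expected {DCT_LENGTH})"
    with open(path, "rb") as f:
        data = f.read()
    # Erased NOR flash reads as 0xFF, so an all-0xFF dump holds no settings.
    if all(b == 0xFF for b in data):
        return "dump is entirely 0xFF (erased flash)"
    return "ok"
```

For example, `check_dct_dump("dct.bin")` after running the command above.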