[Bug] Argon won't go back online after a reboot


#1

Observed behavior: the Argon can be brought online only by rerunning the new-device setup procedure in the Particle mobile app. It cannot be brought back online with particle serial wifi.

After a reboot, it flashes green at a moderate rate, then switches to a rapid green flash. It never gets past this point.

When using particle serial wifi, it successfully detects the network. If I intentionally enter the wrong password, the green light flashes in exactly the same pattern as when I enter the correct one.

Expected behavior: the Argon should reconnect after boot (i.e. it should not get stuck flashing green rapidly, and it should show up in the console).

Firmware: 1.4.4

Note 1: I do not have a second Argon, so I cannot test whether the problem is the network or the device.
Note 2: I have a JTAG debugger and can poke around.


#2

More information, and a refined understanding of the bug:

Opening serial monitor for com port: "/dev/tty.usbmodem1421301"
Serial monitor opened successfully:
0000005341 [net.th] TRACE: OpenThread state changed: 12a5
0000005342 [net.ifapi] INFO: Netif th1 link UP
0000005343 [system.nm] INFO: State changed: IFACE_UP -> IFACE_LINK_UP
0000005345 [net.th] TRACE: OT_CHANGED_THREAD_NETDATA
0000005345 [net.th] TRACE: Synchronizing IP state with LwIP
0000005346 [net.ifapi] TRACE: Netif th1 ipv6 addr state changed
0000005348 [net.ifapi] TRACE: Netif th1 ipv6 addr state changed
0000005350 [net.th] TRACE: Added FD9C:4A3:4B8F::FF:FE00:FC00 0
0000005351 [system.ot] TRACE: IPv6 address was added
0000005351 [system.ot] INFO: Role changed: leader
0000005352 [system.ot] TRACE: RLOC was added
0000005353 [system.ot] TRACE: Partition ID changed
0000005354 [system.ot] TRACE: Thread network data changed
0000005355 [system.ot] TRACE: Subscribed to IPv6 multicast address
0000005356 [hal] TRACE: NCP ready to accept AT commands
0000005357 [ncp.at] TRACE: > AT+CMUX=0
0000005359 [ncp.at] TRACE: < OK
0000005360 [gsm0710muxer] INFO: Starting GSM07.10 muxer
0000005361 [gsm0710muxer] INFO: Openning mux channel 0
0000005361 [gsm0710muxer] INFO: GSM07.10 muxer thread started
0000005414 [gsm0710muxer] INFO: Resuming channel 0
0000005415 [gsm0710muxer] INFO: Openning mux channel 1
0000005466 [gsm0710muxer] INFO: Resuming channel 1
0000005467 [gsm0710muxer] INFO: Resuming channel 1
0000005468 [ncp.at] TRACE: > AT
0000005519 [ncp.at] TRACE: < OK
0000005520 [ncp.at] TRACE: > AT+CWDHCP=0,3
0000005569 [ncp.at] TRACE: < OK
0000005569 [hal] TRACE: NCP state changed: 1
0000005570 [net.esp32ncp] TRACE: NCP event 1
0000005574 [gsm0710muxer] INFO: Openning mux channel 2
0000005669 [gsm0710muxer] INFO: Resuming channel 2
0000005670 [hal] TRACE: Connecting to "Interpenguin"
0000008071 [ncp.at] TRACE: < WIFI CONNECTED
0000009021 [ncp.at] TRACE: < OK
0000009021 [hal] TRACE: NCP connection state changed: 2
0000009022 [net.esp32ncp] TRACE: NCP event 2
0000009023 [net.esp32ncp] TRACE: State changed event: 2
0000009024 [net.ifapi] INFO: Netif wl3 link UP

This line tells the tale: 0000005670 [hal] TRACE: Connecting to "Interpenguin"

Interpenguin is not the correct AP. Somewhere, the credentials for an old access point that hasn't been used in a year are being retained, and the new access point is being lost after reboot.
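To spot this kind of mismatch in a longer serial capture, a small host-side helper can pull the SSID out of each "Connecting to" line. This is only a rough sketch: the function name connectingSsid is made up for illustration, and the line format is assumed to match the trace above.

```cpp
#include <string>

// Returns the SSID from a trace line such as
//   0000005670 [hal] TRACE: Connecting to "Interpenguin"
// or an empty string if the line is not a connection attempt.
// (Hypothetical helper; the format is taken from the log in this thread.)
std::string connectingSsid(const std::string& line) {
    const std::string marker = "Connecting to \"";
    const std::string::size_type start = line.find(marker);
    if (start == std::string::npos) {
        return "";
    }
    const std::string::size_type begin = start + marker.size();
    // Use the last quote on the line so an SSID containing quotes survives.
    const std::string::size_type end = line.rfind('"');
    if (end == std::string::npos || end < begin) {
        return "";
    }
    return line.substr(begin, end - begin);
}
```

Comparing the returned SSID against the one entered during setup would flag the stale access point right away.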

@rickkas7, is this something which you’ve seen before?


#3

Okay, I figured out the bug. The access point Interpenguin was still active, although it has no internet connection. The Argon was trying this access point first, before attempting to connect to the one that had been configured during setup. Because it could join that access point but could not get online through it, it got stuck.

The problem then comes down to the Argon either not trying, or not trying hard enough, to reconnect to the last known good AP. Furthermore, after an extended period of failing to get online, it did not try any of the other known access points.
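The behavior I expected can be sketched as follows. This is a purely hypothetical model of the policy described above, not how Device OS actually selects an access point; KnownAp, pickWorkingAp, and the two callbacks are names invented for the illustration.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record of a stored access point.
struct KnownAp {
    std::string ssid;
    bool lastKnownGood;  // true for the AP that most recently reached the cloud
};

// Try the last known good AP first, then every other stored AP.
// An AP is accepted only if the device can both join it and reach the cloud.
// Returns the chosen SSID, or an empty string if none works.
std::string pickWorkingAp(std::vector<KnownAp> aps,
                          const std::function<bool(const std::string&)>& joins,
                          const std::function<bool(const std::string&)>& reachesCloud) {
    // Move last-known-good entries to the front, keeping the rest in order.
    std::stable_sort(aps.begin(), aps.end(),
                     [](const KnownAp& a, const KnownAp& b) {
                         return a.lastKnownGood && !b.lastKnownGood;
                     });
    for (const KnownAp& ap : aps) {
        if (joins(ap.ssid) && reachesCloud(ap.ssid)) {
            return ap.ssid;
        }
    }
    return "";
}
```

Under this model, an AP that accepts the connection but has no WAN link is simply skipped instead of holding the device forever.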

Strangely, exactly once after a power-down I saw the device get onto the cloud, which means that on that occasion it connected to the correct AP. Maybe it’s a race condition?

I’d love to know more about this, because I think it has some sharp edges. From this experience, we know that an Argon can effectively brick itself if it has previously connected to an AP that later loses its WAN connection.

P.S. The solution was to boot with SYSTEM_MODE(MANUAL), call WiFi.clearCredentials() and then WiFi.setCredentials(), and afterwards return to normal by removing all the new code.
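For anyone hitting the same thing, the temporary firmware looked roughly like the sketch below. Treat it as a sketch under assumptions rather than the exact code: the SSID and password are placeholders, resetCredentials is a helper name made up here, and guarding the device-only parts with the PARTICLE macro is an assumption about the build environment.

```cpp
// Temporary firmware: wipe every stored access point and keep only one.
// "MyHomeAP" and "my-password" are placeholders, not real credentials.
#if defined(PARTICLE)  // assumed to be defined in Device OS builds
#include "Particle.h"
SYSTEM_MODE(MANUAL);   // keep Device OS from connecting on its own at boot
#endif

// Works with any object exposing the same calls as the Device OS WiFi object.
template <typename WiFiT>
bool resetCredentials(WiFiT& wifi, const char* ssid, const char* password) {
    wifi.on();                                   // power up the Wi-Fi module
    if (!wifi.clearCredentials()) {              // forget every stored AP
        return false;
    }
    return wifi.setCredentials(ssid, password);  // store only the wanted AP
}

#if defined(PARTICLE)
void setup() {
    if (resetCredentials(WiFi, "MyHomeAP", "my-password")) {
        Particle.connect();  // brings up Wi-Fi, then the cloud connection
    }
}

void loop() {
    Particle.process();  // must be called regularly in MANUAL mode
}
#endif
```

Flash it once, confirm the device shows up in the console, then flash the normal application again. My understanding is that on the Argon the target access point should be in range when WiFi.setCredentials() is called.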