Particle unable to connect to the Particle cloud after network connectivity issues; unable to recover

Hello,
I want to understand the behavior of WiFi in the following scenario.

I have configured a P1 with two valid WiFi network configurations stored, say network1 and network2. At the time of storing the WiFi configurations, network1 and network2 were both good, i.e. the P1 could connect to the internet on either network.

After a while, the network behaviour changed as follows:
On network1, the P1 can connect to WiFi but can't connect to the internet.
On network2, the P1 can connect to both WiFi and the internet.

Whenever a device can't connect to the internet, I see the following log messages on the Particle console:

unable to resolve IP for device.spark.io
Cloud socket connection failed: -1
Internet Test Failed!
Resetting WLAN due to 2 failed connect attempts
Handling cloud error: 2
Resetting WLAN due to SPARK_WLAN_RESET

I believe this calls WiFi.off() and WiFi.on(). The P1 can store 5 WiFi configurations. Since I have 2 WiFi configurations, the P1 will attempt to connect to network1 and, if that fails, it should try network2.

Does this retry mechanism also apply if the P1 is able to connect to WiFi on network1 but not to the internet? When WLAN is reset due to a failed internet test, does the Particle try network1 again, or does it try connecting with the next saved network credentials?

Based on observations of some Particle devices, they end up in a state where they are connected to WiFi but not to the Particle cloud (flashing cyan).

Thanks
Dheeraj

Sometimes my particle ends up in rapid blinking cyan. It doesn’t recover.
Here are the log messages:

0000387191 [hal.wlan] INFO: Using internal antenna
0000387196 [system] INFO: ARM_WLAN_WD 1
0000402033 [system] INFO: ARM_WLAN_WD 2
0000402033 [hal.wlan] INFO: Bringing WiFi interface up with DHCP
0000402067 [system] INFO: CLR_WLAN_WD 1, DHCP success
0000402071 [system] INFO: Cloud: connecting
0000402071 [system] INFO: Read Server Address = type:1,domain:device.spark.io
0000402081 [system] INFO: Resolved host device.spark.io to 34.224.xx.yy
0000402218 [system] INFO: connected to cloud 34.224.xx.yy:5xx3
0000402220 [system] INFO: Cloud socket connected
0000402220 [system] INFO: Starting handshake: presense_announce=1
0000402222 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000402318 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000402412 [comm.sparkprotocol.handshake] INFO: Sending encrypted nonce
0000402413 [comm.sparkprotocol.handshake] INFO: Receive key
0000402529 [comm.sparkprotocol.handshake] INFO: Setting key
0000402811 [comm.sparkprotocol.handshake] ERROR: Could not set key, 134918311
0000402813 [system] WARN: Cloud handshake failed, code=8
0000403063 [system] INFO: Cloud: disconnecting
0000403064 [system] INFO: Cloud: disconnected
0000403864 [system] INFO: Cloud: connecting
0000403864 [system] INFO: Read Server Address = type:1,domain:device.spark.io
0000403874 [system] INFO: Resolved host device.spark.io to 34.224.xx.yy
0000403997 [system] INFO: connected to cloud 34.224.xx.yy:5xx3
0000403997 [system] INFO: Cloud socket connected
0000403999 [system] INFO: Starting handshake: presense_announce=1
0000403999 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000404105 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000404107 [comm.sparkprotocol.handshake] ERROR: RSA encrypt error -1087
0000404107 [system] WARN: Cloud handshake failed, code=-1087
0000404359 [system] INFO: Cloud: disconnecting
0000404360 [system] INFO: Cloud: disconnected
0000405160 [system] INFO: Cloud: connecting
0000405160 [system] INFO: Read Server Address = type:1,domain:device.spark.io
0000405170 [system] INFO: Resolved host device.spark.io to 34.224.xx.yy
0000405279 [system] INFO: connected to cloud 34.224.xx.yy:5683
0000405279 [system] INFO: Cloud socket connected
0000405279 [system] INFO: Starting handshake: presense_announce=1
0000405281 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000405387 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce

After this rapid blinking cyan…

Any ideas?

Thanks
Dheeraj

Try to use particle.io

npm uninstall -g spark-cli
https://nodejs.org/en/
npm update -g particle-cli

Running from source (advanced)
To grab the CLI source and play with it locally
git clone git@github.com:spark/particle-cli.git
cd particle-cli
npm install
node bin/particle help

This is irrelevant to my post

Here is a second connectivity behavior where the Particle ends up in blinking cyan:

0000403284 [system] INFO: Resolved host device.spark.io to 107.xx.yy.43
0000403438 [system] INFO: connected to cloud 107.xx.yy.43:5683
0000403438 [system] INFO: Cloud socket connected
0000403438 [system] INFO: Starting handshake: presense_announce=1
0000403440 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000403537 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000403537 [comm.sparkprotocol.handshake] ERROR: RSA encrypt error -1087
0000403539 [system] WARN: Cloud handshake failed, code=-1087
0000403789 [system] INFO: Cloud: disconnecting
0000403790 [system] INFO: Cloud: disconnected
0000419590 [system] INFO: Cloud: connecting
0000419590 [system] INFO: Read Server Address = type:1,domain:device.spark.io
0000419758 [system] INFO: Resolved host device.spark.io to 52.xx.yy.61
0000419861 [system] INFO: connected to cloud 52.xx.yy.61:5683
0000419861 [system] INFO: Cloud socket connected
0000419861 [system] INFO: Starting handshake: presense_announce=1
0000419861 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000419971 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000419971 [comm.sparkprotocol.handshake] ERROR: RSA encrypt error -1087
0000419971 [system] WARN: Cloud handshake failed, code=-1087
0000420223 [system] INFO: Cloud: disconnecting
0000420224 [system] INFO: Cloud: disconnected
0000436024 [system] INFO: Cloud: connecting
0000436024 [system] INFO: Read Server Address = type:1,domain:device.spark.io
0000436038 [system] INFO: Resolved host device.spark.io to 52.xx.yy.61
0000436160 [system] INFO: connected to cloud 52.xx.yy.61:5683
0000436160 [system] INFO: Cloud socket connected
0000436162 [system] INFO: Starting handshake: presense_announce=1
0000436162 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000436264 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000436264 [comm.sparkprotocol.handshake] ERROR: RSA encrypt error -1087
0000436266 [system] WARN: Cloud handshake failed, code=-1087
0000436516 [system] INFO: Cloud: disconnecting
0000436517 [system] INFO: Cloud: disconnected
0000452317 [system] INFO: Cloud: connecting
0000452317 [system] INFO: Read Server Address = type:1,domain:device.spark.io
0000452333 [system] INFO: Resolved host device.spark.io to 52.xx.yy.61
0000452563 [system] INFO: connected to cloud 52.xx.yy.61:5683
0000452563 [system] INFO: Cloud socket connected
0000452563 [system] INFO: Starting handshake: presense_announce=1
0000452565 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000452664 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000452664 [comm.sparkprotocol.handshake] ERROR: RSA encrypt error -1087
0000452666 [system] WARN: Cloud handshake failed, code=-1087
0000452916 [system] INFO: Cloud: disconnecting
0000452917 [system] INFO: Cloud: disconnected
0000484717 [system] INFO: Cloud: connecting
0000484717 [system] INFO: Read Server Address = type:1,domain:device.spark.io
0000485162 [system] INFO: Resolved host device.spark.io to 107.xx.yy.43
0000485293 [system] INFO: connected to cloud 107.xx.yy.43:5683
0000485293 [system] INFO: Cloud socket connected
0000485295 [system] INFO: Starting handshake: presense_announce=1
0000485295 [comm.sparkprotocol.handshake] INFO: Started: Receive nonce
0000485395 [comm.sparkprotocol.handshake] INFO: Encrypting handshake nonce
0000485395 [comm.sparkprotocol.handshake] ERROR: RSA encrypt error -1087
0000485397 [system] WARN: Cloud handshake failed, code=-1087
0000485647 [system] INFO: Cloud: disconnecting
0000485648 [system] INFO: Cloud: disconnected

In this case, the P1 isn’t stuck, but it keeps retrying and the error code is -1087.

Is there a way for the application to access these error codes?

Thanks
Dheeraj

Hi @dheerajdake

I didn’t find -1087 in the RSA error codes in MBED pem.h but there is a -0x1080 which is a key file format error.

Could the keys on this device be corrupted?

Has this device been provisioned to Particle Cloud?

You are definitely having key problems connecting to cloud.
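To compare the logged value with the constants in the header, it helps to put both in the same base. The following is a minimal sketch, assuming the firmware log prints the code in decimal (the thread does not confirm this); `toSignedHex` is a hypothetical helper and not part of any Particle or mbed TLS API.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a signed error code in the "-0xNNNN" style
// used for error constants in the mbed TLS headers.
std::string toSignedHex(int32_t code) {
    // Widen before negating so the most negative int32_t does not overflow.
    int64_t wide = code;
    uint32_t magnitude = static_cast<uint32_t>(wide < 0 ? -wide : wide);
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%s0x%04X",
                  code < 0 ? "-" : "", static_cast<unsigned>(magnitude));
    return std::string(buf);
}
```

Under that assumption, `toSignedHex(-1087)` gives `-0x043F`, while the header constant `-0x1080` is -4224 in decimal, so the two values are not the same code written in different bases.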

Hi @bko,
I don't believe that it's a key corruption issue because restarting the device fixes the problem. Yes, this device was provisioned to the cloud long ago. This behaviour is easily reproducible. My test procedure is as follows:

  1. Launch a hotspot on my phone and connect particle P1 to it
  2. After the Particle is connected to the mobile phone hotspot, turn mobile data off. This keeps the WiFi on and the internet off. After about 20 seconds, the Particle starts to blink cyan. At this point, if I enable and then disable data before the WLAN gets reset, I end up in one of the states I mentioned.

Either the Particle keeps retrying to connect forever in blinking cyan, or it stops retrying with this as the last message:

0000405279 [system] INFO: Cloud socket connected

Could you also clarify how the WiFi retry mechanism works?

Thanks
Dheeraj

I doubt that the device will switch WiFi networks once it has found that WiFi access is possible, irrespective of internet availability.
But this would be a good use-case for this issue.

As for why your connection fails, check your firewall settings. (I don’t think it’s a keys issue when the device can connect via alternative networks.)

This part of the most recent log, which is repeated many times, shows that:

  1. You can resolve the DNS name of the cloud and get an address. Good, the internet is connected.
  2. You can open a socket to the cloud. Good, you can reach the cloud.
  3. You can announce your presence and receive the 40-bit nonce from the cloud. Still good.
  4. You somehow cannot encrypt the nonce+deviceID with the cloud's public key and return it.

According to this log, you are connecting over TCP here; you just cannot get the cloud connection all the way up.

As to why you cannot encrypt the nonce+deviceID, there could be many reasons. The cloud public key could be wrong on the device, the deviceID could be wrong on the device, the encrypted message from the device (which should be 256 bytes) could be the wrong length, the decrypted message on the cloud server could have the wrong deviceID or nonce, the cloud might not have a public key for your device, etc. That is why I asked about provisioning and keys.
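The stage-by-stage reading of the log above can be sketched as a small classifier that reports the furthest connection stage a captured attempt reached. This is only an illustration: the marker strings are copied from the log lines posted in this thread, while the `Stage` enum and `furthestStage` function are hypothetical names, not part of the Particle firmware.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stages, ordered by how far the connection attempt got.
enum class Stage {
    None, DnsResolved, SocketConnected, ReceivingNonce, EncryptingNonce, EncryptFailed
};

// Scan captured log lines and return the furthest stage that appears.
Stage furthestStage(const std::vector<std::string>& logLines) {
    // Marker substrings taken from the log output quoted in the thread.
    static const std::vector<std::pair<std::string, Stage>> markers = {
        {"Resolved host", Stage::DnsResolved},
        {"Cloud socket connected", Stage::SocketConnected},
        {"Started: Receive nonce", Stage::ReceivingNonce},
        {"Encrypting handshake nonce", Stage::EncryptingNonce},
        {"RSA encrypt error", Stage::EncryptFailed},
    };
    Stage furthest = Stage::None;
    for (const auto& line : logLines) {
        for (const auto& marker : markers) {
            if (line.find(marker.first) != std::string::npos && furthest < marker.second) {
                furthest = marker.second;
            }
        }
    }
    return furthest;
}
```

Fed one of the repeated attempts from the log, this returns `Stage::EncryptFailed`. An attempt whose last line is `Cloud socket connected` returns `Stage::SocketConnected`, which separates a failed handshake from an attempt that stalled before the handshake started.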

Yes, I agree the network definitely has internet problems, and I am working on getting the connection fixed. I was able to emulate this by connecting the P1 to a mobile hotspot and turning data on and off. Also, I am not able to disconnect/reconnect WiFi or turn off the WiFi module when the P1 is stuck in the blinking green state. So the only way out of this situation appears to be rebooting the Particle.

How can I make sure that my provisioning keys aren’t corrupted? Also, this happens at random times after a restart. I was able to reproduce it by disabling/enabling internet access on the AP.

I think we should get someone from Particle to check the logs for your device on the cloud side.

@KyleG can you get someone to help @dheerajdake with his P1 connectivity issue?

This is surprising. I looked at the spark firmware too and I couldn't find that error code.

https://github.com/spark/firmware/blob/develop/crypto/mbedtls/include/mbedtls/rsa.h#L42

Let me ping someone that might be able to help, @rickkas7 are you able to assist?

Hey All / @dheerajdake,

So, it sounds like you’re inducing a scenario where the network is valid but the internet connection is lost, and you’ve found that resetting the device fixes the issue. If you’re managing your device in a system mode other than AUTOMATIC, you might need to recreate some of the automatic recovery behaviour in your firmware.

In particular, I recommend measuring the time your device has spent offline and, if it exceeds some value (let’s say 5 minutes or 15 minutes), performing a full restart of the device. This can help you recover from any kind of stuck state that your firmware has landed in while running in something other than automatic mode.
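A minimal sketch of that offline timer is shown below. `OfflineWatchdog` is a hypothetical class name, and the logic is kept free of hardware calls so it can be checked on a desktop. It assumes the application loop keeps running while the device is offline, which may not hold if the system firmware blocks the loop during reconnection.

```cpp
#include <cstdint>

// Hypothetical helper: tracks how long the device has been offline and
// reports when a restart is due.
class OfflineWatchdog {
public:
    explicit OfflineWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call regularly with the current uptime in milliseconds and the
    // current cloud connection state. Returns true once the device has
    // been continuously offline for at least timeoutMs.
    bool shouldReset(uint32_t nowMs, bool connected) {
        if (connected) {
            offline_ = false;          // back online: clear the timer
            return false;
        }
        if (!offline_) {
            offline_ = true;           // first offline sample: start timing
            offlineSinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct across uptime counter rollover.
        return static_cast<uint32_t>(nowMs - offlineSinceMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t offlineSinceMs_ = 0;
    bool offline_ = false;
};

// Possible wiring in Particle firmware (an assumption, not tested on hardware):
//   OfflineWatchdog watchdog(10 * 60 * 1000UL);   // 10 minutes
//   void loop() {
//       if (watchdog.shouldReset(millis(), Particle.connected())) {
//           System.reset();
//       }
//   }
```

Passing the time and the connection state in as arguments keeps the timeout easy to change and lets the decision logic be tested without a device.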

Also, if your device has an additional Bluetooth radio, it’s possible your firmware is generating harmful Bluetooth interference during the WiFi reconnection, for example if it transmits over Bluetooth while the WiFi module is attempting to connect.

Thanks,
David

Thank you for the reply. I am running in semi-automatic mode. For now, I have added the logic to reboot the system if it’s offline for 10 minutes.

I’ll disable BLE advertisement and test the behaviour.

Thanks
Dheeraj