SOLVED: "Error claiming the device. Could not claim the device to your account"

I just updated to the latest version of particle-agent and I can no longer SSH into my Pi.

Here is what happens when I try right after reboot:

nrobinson@particle.local's password: 

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Nov 24 15:40:53 2016 from 2604:6000:8680:d300:1487:c185:f414:bb3e
-bash: /home/nrobinson/.bashrc: Input/output error
Connection to particle.local closed.

And if I try again:

Connection reset by 2604:6000:8680:d300:2cd3:eddd:805c:2b74 port 22

I’m going to try booting with a monitor.

Update:
I can now SSH again. Weird.

So that I can investigate this further, what is the DNS name that the agent connects to? Everything else on my network is handling DNS just fine. I’ll try today on a totally different network and see if it makes a difference.

It connects to YOUR_PI_DEVICE_ID.agent.particle.io

You can test the DNS lookup by running

dig YOUR_PI_DEVICE_ID.agent.particle.io

in a terminal.
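dig queries DNS servers directly, whereas ordinary programs usually resolve names through the system resolver. For comparison, here is a minimal C++ sketch, assuming a Linux/POSIX system, that resolves a hostname with getaddrinfo(). resolve_ipv4 is a hypothetical helper, not part of particle-agent or the firmware. Unlike dig, getaddrinfo() also consults /etc/hosts and the nsswitch configuration, so the two can occasionally disagree.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <string>
#include <vector>

// Hypothetical helper: resolves `host` to IPv4 addresses via getaddrinfo(),
// the system resolver path most programs use. Returns an empty vector if
// the lookup fails.
std::vector<std::string> resolve_ipv4(const std::string& host) {
    addrinfo hints{};
    hints.ai_family = AF_INET;        // IPv4 addresses only
    hints.ai_socktype = SOCK_STREAM;  // one result per address, not one per socket type

    addrinfo* results = nullptr;
    std::vector<std::string> addresses;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &results) != 0) {
        return addresses;  // e.g. unknown host or no working DNS
    }
    for (const addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
        const auto* sin = reinterpret_cast<const sockaddr_in*>(ai->ai_addr);
        char text[INET_ADDRSTRLEN];
        if (inet_ntop(AF_INET, &sin->sin_addr, text, sizeof text) != nullptr) {
            addresses.emplace_back(text);
        }
    }
    freeaddrinfo(results);
    return addresses;
}
```

If resolution works from a program's point of view, resolve_ipv4("YOUR_PI_DEVICE_ID.agent.particle.io") should return one or more of the same addresses that dig reports.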

There are several different IPs that you can get.

Okay, that resolves to one of the Spark nodes with dig and nslookup, so the DNS part appears to be working… though the logs are still showing 0.0.0.0.

What if you try the installer again?

bash <( curl -sL https://particle.io/install-pi )

No joy. I’m now looking for the part of the particle-agent source code that generates those log lines so I can backtrack and debug, but it is unusually elusive…

I’m looking deeper into the logs and I’m finding this:

0000000004 system: INFO: Device 7ab717a71de432a31bcbe174 started
0000000004 hal: INFO: Virtual WLAN init
0000000004 system: INFO: ready():false,connecting():false,listening():false
0000000004 hal: INFO: Virtual WLAN on
0000000004 hal: INFO: Virtual WLAN connecting
0000000005 hal: INFO: Virtual WLAN connected
0000000006 hal: INFO: device key: (an actual valid key)~
0000000006 hal: INFO: server key: 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000~
0000000006 system: INFO: Cloud: connecting
0000000007 system: ERROR: connection failed to 0.0.0.0:5683, code=111
0000000007 system: INFO: Cloud socket connected
0000000008 comm: WARN: receive error -1
0000000008 comm: ERROR: Handshake: could not receive nonce: -1
0000000008 comm: WARN: handshake failed with code 2
0000000009 system: WARN: Cloud handshake failed, code=2
0000000259 system: INFO: Cloud: disconnecting
0000000264 system: INFO: Cloud: disconnected

So two things come to mind. First, why is the server key 0000… when I checked /usr/share/particle/keys/server_key.der and there is definitely one there? Second, it turns out that the actual firmware (tinker) is reporting these errors, not particle-agent itself.
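One way a key can end up printed as all zeros, offered as an assumption rather than a reading of the firmware source: if the key is copied into a zero-initialised, fixed-size buffer, a file that is shorter than expected (or empty) leaves the buffer full of zeros and nothing complains. A minimal C++ sketch of that failure mode; kServerKeySize and all helper names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical buffer size for illustration only; not the firmware's real value.
constexpr std::size_t kServerKeySize = 512;
using KeyBuffer = std::array<std::uint8_t, kServerKeySize>;

// Reads the whole file; a missing or zero-length file yields an empty vector.
std::vector<std::uint8_t> read_file_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// Copies as many bytes as fit into a zero-initialised buffer. Anything the
// file does not supply stays 0, so an empty file silently becomes an
// all-zero "key".
KeyBuffer copy_into_key_buffer(const std::vector<std::uint8_t>& file_bytes) {
    KeyBuffer key{};  // value-initialised: every byte is 0
    const std::size_t n = std::min(file_bytes.size(), key.size());
    std::copy_n(file_bytes.begin(), n, key.begin());
    return key;
}

bool is_all_zero(const KeyBuffer& key) {
    return std::all_of(key.begin(), key.end(), [](std::uint8_t b) { return b == 0; });
}
```

A loader written this way would print exactly the kind of 0000… key seen in the log without raising any error, which is why an explicit emptiness check is worth having.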

Also worth noting: I am now on a totally different network with the same results. No other devices have any DNS issues (though I think my earlier post already put that possibility to bed).

Are you sure the /usr/share/particle/keys/server_key.der is intact?

Here is what I get for the md5sum.

5c8c2370820fe5fcfd00a08e504524a7  /usr/share/particle/keys/server_key.der

Yes, that matches mine.

Strange. What if you try to put new firmware on the Pi?

Here are the commands to restore tinker:

$ sudo service particle-agent stop
$ sudo cp /usr/share/particle/binaries/tinker /var/lib/particle/devices/<id>/firmware.bin
$ sudo service particle-agent start

Nope, that hasn’t made any difference either.

On the command line I am able to nc <devid>.agent.particle.io 5683 and it does connect and spew some encrypted data as I would expect (likely asking for a handshake). I think this whole issue has something to do with the build toolchain or “system firmware” for Pi; the firmware itself doesn’t seem to be resolving DNS. My guess (not knowing too much about the cloud claiming system) is that the device needs to be reachable by the cloud before it can be claimed… and if the firmware isn’t connecting, then it can’t contact the “device” (firmware) and thus cannot “claim” it… correct me if I’m wrong.
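The log line "connection failed to 0.0.0.0:5683, code=111" fits with the nc result: on Linux, errno 111 is ECONNREFUSED, and a connect() to 0.0.0.0 is typically treated as a connection to the local host, where nothing is listening on 5683. nc, by contrast, resolves the real hostname first. A minimal C++ sketch of the same connect-and-report-errno check, assuming a POSIX system; try_tcp_connect is a hypothetical helper.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <string>

// Attempts a TCP connection to ipv4:port, roughly what `nc host 5683` does
// after resolving the name. Returns 0 on success, otherwise an errno value.
int try_tcp_connect(const std::string& ipv4, std::uint16_t port) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4.c_str(), &addr.sin_addr) != 1) {
        return EINVAL;  // not a dotted-quad IPv4 address
    }
    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return errno;
    }
    int result = 0;
    if (connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof addr) != 0) {
        result = errno;  // capture before close() can overwrite errno
    }
    close(fd);
    return result;
}
```

On a Pi with nothing listening locally, try_tcp_connect("0.0.0.0", 5683) would be expected to return ECONNREFUSED (111), matching the firmware log.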

The cloud must be reachable by the device. The device contacts the cloud.

Okay, just a flip from the way I interpreted it. Because the device cannot contact the cloud, the cloud has no record of it being a valid device… so it cannot be claimed. Does that make sense?

I’m digging through the firmware source (raspberry-pi-0.6.0 branch) and comparing it with the Photon code, and I see that it uses a different mechanism to resolve hostnames.

Is there a way to compile tinker for Raspberry Pi with the firmware-level debug/trace statements enabled? I think this could offer a more telling story. I don’t have a full local toolchain set up for this.

Your problem sounds somewhat similar to the one in another thread.

Over there, the hostname will not resolve correctly until after a restart of the Particle agent. Have you tried stop/start or restart?

One very strange difference:

His log file contained additional error messages:

0000000005 system: INFO: Cloud: connecting
terminate called after throwing an instance of 'boost::exception_detail::clone_impl >'
what(): resolve: Host not found (non-authoritative), try again later
Firmware exited with status pid 682 SIGABRT (signal 6)

(after particle agent restart)

0000000005 system: INFO: Cloud: connecting
0000000121 system: INFO: Resolved host xxxxxxxxxxxxxxxxxx.agent.particle.io to 52.91.48.237
0000000239 system: INFO: connected to cloud 52.91.48.237:5683
0000000239 system: INFO: Cloud socket connected

Mine does not include ANY of the lines between “Cloud: connecting” and “Cloud socket connected”. No successful DNS lookup, no “connected to cloud”… nothing. It’s strange because I have been reviewing the code and the process should throw some sort of error before it even hits “Cloud socket connected” if it can’t resolve the host.

I did some digging: in firmware/hal/src/photon/inet_hal.cpp, inet_gethostbyname() has a code path that essentially allows the lookup to fail but still sets the IP address to 0.0.0.0. The only way this would be caught is if the caller checks the return value, which it does.
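The pattern described here, a lookup that writes 0.0.0.0 into its output even on failure so that only the return code reveals the failure, can be sketched as follows. This is not the actual inet_gethostbyname() code; hal_lookup_sketch and resolve_checked are hypothetical names, and the lookup is faked with a single hard-coded host.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a HAL lookup like inet_gethostbyname(). It writes
// 0.0.0.0 into out_ip up front, so on failure the output still looks like an
// address; only the return code says the lookup failed.
int hal_lookup_sketch(const std::string& host, std::uint32_t& out_ip) {
    out_ip = 0;  // 0.0.0.0, written whether or not the lookup succeeds
    if (host == "localhost") {
        out_ip = 0x7F000001;  // 127.0.0.1
        return 0;
    }
    return -1;  // failure, but out_ip is still a plausible-looking value
}

// A careful caller: the return code, not the address, decides success.
bool resolve_checked(const std::string& host, std::uint32_t& ip) {
    if (hal_lookup_sketch(host, ip) != 0) {
        return false;  // a real caller would log a resolution error here
    }
    return true;
}
```

The catch is that this check only protects code paths that actually go through the lookup.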

There’s no code path past this step without generating at least one of those two log lines. How in the world is it skipping this on my Pi?

I think I’m getting closer… I found a way for those lines to be skipped.

When the cloud connection is initialized in system_cloud_internal.cpp, it “initializes” the server address with whatever it can pull from deviceConfig.server_key.

As my previous logs show, the server_key is being interpreted as entirely zeros. This results in a “server address” of 0 (0.0.0.0), which parseServerAddressData() then treats as a valid IP address: IP_ADDRESS is defined as 0 in ota_flash_hal.h, so if the first byte of the server_key at the specified offset is 0, the data is interpreted as an IP address consisting entirely of 0s.

This is then checked by determine_server_address(), but because only the port is validated, it passes the check.

At this point the system believes it has a totally valid IP address and attempts connecting to it, without bothering to check the DNS records.
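A simplified C++ sketch of how an all-zero key can turn into a “valid” 0.0.0.0 address, modelled on the description above rather than copied from the firmware. Assumptions: the address data is stored as a type byte, a length byte, then the value. Only IP_ADDRESS = 0 comes from the thread; DOMAIN_NAME, INVALID_ADDRESS, the struct layout, and is_usable() are made up for illustration. The port is left at 0 so that the port-only fallback fills in 5683, the port seen in the logs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Only IP_ADDRESS = 0 is taken from the thread (ota_flash_hal.h);
// the other values are placeholders for this sketch.
enum AddressType : std::uint8_t { IP_ADDRESS = 0, DOMAIN_NAME = 1, INVALID_ADDRESS = 0xFF };

struct ServerAddress {
    AddressType type = INVALID_ADDRESS;
    std::uint32_t ip = 0;     // IPv4 address, host byte order
    std::string domain;
    std::uint16_t port = 0;   // 0 = not set; filled in by determine_server_address()
};

constexpr std::uint16_t kDefaultPort = 5683;  // the cloud port seen in the logs

// Assumed layout: buf[0] = type, buf[1] = length, buf[2..] = value.
ServerAddress parse_server_address(const std::uint8_t* buf, std::size_t len) {
    ServerAddress addr;
    if (len < 6) {
        return addr;  // too short for even an IPv4 entry
    }
    switch (buf[0]) {
    case IP_ADDRESS:  // an all-zero key takes this branch...
        addr.type = IP_ADDRESS;
        addr.ip = (std::uint32_t{buf[2]} << 24) | (std::uint32_t{buf[3]} << 16) |
                  (std::uint32_t{buf[4]} << 8) | std::uint32_t{buf[5]};  // ...and yields 0.0.0.0
        break;
    case DOMAIN_NAME:
        if (std::size_t{buf[1]} <= len - 2) {
            addr.type = DOMAIN_NAME;
            addr.domain.assign(reinterpret_cast<const char*>(buf + 2), buf[1]);
        }
        break;
    default:
        break;  // unknown type: stays INVALID_ADDRESS
    }
    return addr;
}

// Mirrors the weakness described above: only the port is validated, so an
// IP_ADDRESS entry of 0.0.0.0 is accepted as a real destination.
ServerAddress determine_server_address(ServerAddress addr) {
    if (addr.port == 0 || addr.port == 0xFFFF) {
        addr.port = kDefaultPort;
    }
    return addr;
}

// A stricter (hypothetical) check that would reject the all-zero case.
bool is_usable(const ServerAddress& addr) {
    if (addr.type == IP_ADDRESS) {
        return addr.ip != 0;
    }
    if (addr.type == DOMAIN_NAME) {
        return !addr.domain.empty();
    }
    return false;
}
```

With an all-zero buffer, parse_server_address() returns an IP_ADDRESS entry of 0.0.0.0 and determine_server_address() keeps it unchanged apart from the port; only an extra check like is_usable() would reject it.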

None of this would happen if one thing were different: the server key. Why is the server_key being read as all zeros?

I fixed it. Somehow, through all of the installs and reinstalls of the Particle agent, it never once copied a valid server_key.der to /var/lib/particle/devices/<device_id>/server_key.der; it was a completely empty file. Once I copied it from /usr/share/particle/keys/server_key.der to that location and restarted the particle-agent service, it connected instantly!

I was then able to re-run particle-agent setup and claim my device.


Okay, so I found out what ultimately caused this.

I have known Raspberry Pis to take quite a while (seconds) to sync filesystem changes to the SD card. Keep this in mind: RPi 1s are especially vulnerable to this.

  1. Install fresh Raspbian to SD card.
  2. Boot on an old RPi 1 B
  3. Install particle-agent. This provisions a device_id and copies files into the proper directory in /var/lib/particle/devices.
  4. When the USB bus fails (as detailed in my other thread), unplug power within seconds. At this point, an inode has been created for /var/lib/particle/devices/<devid>/server_key.der, but no data was actually synced to the SD card!
  5. Now, move that SD card into a RPi 2 B which is not vulnerable to the USB bus weirdness, and boot.
  6. The Particle firmware silently accepts the zero-length server_key, and extrapolates a 0.0.0.0 IP address from it, causing the behaviour detailed in this thread.
  7. Subsequent re-runs of particle-agent setup do not fix this problem, due to this code, which doesn’t bother copying the key if the server_key merely exists (ignoring whether it is actually valid data or not): https://github.com/spark/particle-agent/blob/master/lib/particle_agent/setup.rb#L189-L191
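On the software side, the usual defence against the pull-the-plug scenario in step 4 is to flush file data to storage before reporting success (from a shell, running sync before removing power has a similar effect). A minimal C++ sketch, assuming a POSIX system; write_file_durably is a hypothetical helper and is not taken from particle-agent’s setup.rb.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Writes `data` to `path` and calls fsync() before returning, so the bytes
// have been handed to the storage device rather than sitting only in the
// page cache. Returns 0 on success or an errno value on failure.
int write_file_durably(const std::string& path, const std::vector<std::uint8_t>& data) {
    const int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return errno;
    }
    std::size_t written = 0;
    while (written < data.size()) {
        const ssize_t n = write(fd, data.data() + written, data.size() - written);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal: retry
            }
            const int err = errno;
            close(fd);
            return err;
        }
        written += static_cast<std::size_t>(n);
    }
    if (fsync(fd) != 0) {  // block until the data has been flushed
        const int err = errno;
        close(fd);
        return err;
    }
    return close(fd) == 0 ? 0 : errno;
}
```

For a newly created file, fsync() on the file covers its data and inode; to be sure the directory entry itself survives a power cut, the containing directory should be fsync()ed as well.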

Things that could have avoided this:

  • If the Raspberry Pi 1 B synced to SD card faster, or the user gave it more time to do so (what can I say, I was excited to get all of this working!)
  • Check that the server_key is not empty, either in firmware or in particle-agent setup.
  • In firmware, throw errors if it is empty.
  • In particle-agent setup, re-copy the file from /usr/share/particle/keys if it is determined to be empty.
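The last two suggestions could look roughly like this. particle-agent’s setup is Ruby, so this C++17 sketch only illustrates the logic: re-copy the bundled key when the device copy is missing or empty, rather than skipping the copy whenever the file merely exists. needs_copy, ensure_server_key, and same_contents are hypothetical helpers.

```cpp
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <system_error>

namespace fs = std::filesystem;

// True when the device copy of the key is missing, unreadable, or zero-length,
// i.e. the cases where it should be (re)copied from the bundled file.
bool needs_copy(const fs::path& device_copy) {
    std::error_code ec;
    const std::uintmax_t size = fs::file_size(device_copy, ec);  // sets ec if missing
    return ec || size == 0;
}

// Copies the bundled key over the device copy only when needs_copy() says so.
// Returns true if a copy was made; fs::copy_file throws fs::filesystem_error on failure.
bool ensure_server_key(const fs::path& bundled, const fs::path& device_copy) {
    if (!needs_copy(device_copy)) {
        return false;  // a non-empty key is already present; leave it alone
    }
    fs::copy_file(bundled, device_copy, fs::copy_options::overwrite_existing);
    return true;
}

// Optional stricter check: byte-for-byte comparison of two files, similar in
// spirit to comparing md5sums as done earlier in the thread.
bool same_contents(const fs::path& a, const fs::path& b) {
    std::ifstream fa(a, std::ios::binary);
    std::ifstream fb(b, std::ios::binary);
    if (!fa || !fb) {
        return false;
    }
    return std::equal(std::istreambuf_iterator<char>(fa), std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(fb), std::istreambuf_iterator<char>());
}
```

For example, ensure_server_key("/usr/share/particle/keys/server_key.der", "/var/lib/particle/devices/<device_id>/server_key.der") would have repaired the empty file from this thread. A size check alone does not catch a truncated but non-empty key; same_contents() is one stricter option, though comparing against the bundled key would also flag a deliberately customised key.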

Hi there,
I am pretty new to this; can you please help me?
I have a Raspberry Pi 4B with Raspbian GNU/Linux 10 (buster) installed (Linux 5.10.103-v7l+).
I also use this device as a UniFi server controller and for the pirelay application, so installing a fresh Raspbian is not an option.
I tried installing particlepi and I also get the error “Error claiming the device. Could not claim the device to your account”.
I tried updating and upgrading Raspbian and all applications.
What am I doing wrong, or am I missing something?
If you need more information please let me know.
