Why do some Photons lose their Particle connection so often?

Hi. I’m trying to figure out why some of our Photons have trouble connecting to Particle even when they have WiFi connectivity.

For brief background, we poll all our Photons in the field every 4 hours. At each poll, we’ll have a handful of devices that time out. Occasionally, we go to the console and, yep, offline.

So, in the last build we added a new capability: a 30-second timer that checks WiFi.ready() and Particle.connected(). If either returns false, the code increments the corresponding counter, and the two numbers are then available in a published variable.
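A minimal sketch of the counting logic described above, written as portable C++ rather than actual firmware. The struct name, the method names, and the assumption that the two checks are counted independently are all hypothetical; in firmware the two booleans would come from WiFi.ready() and Particle.connected() on each timer tick.

```cpp
#include <string>

// Hypothetical stand-in for the firmware counters described above.
// In real firmware the two booleans would come from WiFi.ready() and
// Particle.connected(), sampled by a 30-second timer.
struct ConnectionCounters {
    unsigned wifiNotReady = 0;       // samples where WiFi was not ready
    unsigned cloudNotConnected = 0;  // samples where the cloud was not connected

    // Record one 30-second sample. Assumption: the checks are independent,
    // so a WiFi outage is normally counted as a cloud outage too.
    void sample(bool wifiReady, bool cloudConnected) {
        if (!wifiReady) ++wifiNotReady;
        if (!cloudConnected) ++cloudNotConnected;
    }

    // Format the counters as "wifi;cloud", the same shape as "0;0".
    std::string report() const {
        return std::to_string(wifiNotReady) + ";" +
               std::to_string(cloudNotConnected);
    }
};
```

Calling sample() once per timer tick and exposing report() as the published variable would give the "wifi;cloud" string that gets polled.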

With a random check of devices, we usually get “0;0”. However, one device reported “15;432” after being up for only about 28 hours (both counters start at 0 on boot). That means that on 15 of the 30-second checks its WiFi.ready() returned false, BUT on 432-15 = 417 checks the Particle cloud was not connected while WiFi was working.
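To put the “15;432” reading in perspective, here is a rough worked calculation. It assumes every WiFi failure was also counted as a cloud failure and that each failed 30-second sample stands for a full 30 seconds of downtime, which is only an approximation. The struct and function names are hypothetical.

```cpp
// Hypothetical helper: estimate "cloud down while WiFi up" from the counters.
struct OutageEstimate {
    unsigned cloudOnlySamples;  // samples with cloud down but WiFi ready
    double cloudOnlyHours;      // the same figure expressed as hours
    double fractionOfUptime;    // share of all samples taken since boot
};

OutageEstimate estimate(unsigned wifiFails, unsigned cloudFails,
                        double hoursSinceBoot, unsigned intervalSec = 30) {
    // Assumes every WiFi failure is also counted as a cloud failure;
    // clamp to zero if the counters do not fit that assumption.
    unsigned cloudOnly = cloudFails >= wifiFails ? cloudFails - wifiFails : 0;
    double totalSamples = hoursSinceBoot * 3600.0 / intervalSec;
    return {cloudOnly, cloudOnly * intervalSec / 3600.0,
            cloudOnly / totalSamples};
}
```

For estimate(15, 432, 28.0) this works out to 417 samples, roughly 3.5 hours, or about 12% of the time since boot with the cloud disconnected while WiFi reported ready.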

That doesn’t sound good! Thoughts?

Tahl

What do you mean by in the field?
How is the WiFi coverage at the particular location?
How noisy is the radio band?
Any other 2.4GHz signals (e.g. BT)?
How many other devices are using the same AP?
How is the IP lease time set on the AP/router?
Have you ever set static IP on the device?
Have you ever had external antenna selected on the device but are now running without?

Loads of possible contributing factors, so we can’t tell without a full set of info to base an answer on.


@ScruffR,

Hi. Our devices sit in users’ homes, so each device is a different story. I’m not necessarily trying to diagnose a problem with a specific device in a specific home, as we have lotsa users and lotsa homes. Instead, I’m trying to narrow down why, when we poll devices, a random handful will time out when we check that device’s variable. If a particular device always or usually has a problem, we reach out to the user and try to solve the “install problem”. I’m not trying to solve a device problem that recurs. I’m seeing if there is anything we can do to reduce the number of random devices that are temporarily down.

So, in that last firmware release we started counting WiFi.ready() and Particle.connected() failures on 30-second intervals. I did this to get a large sample size of the connection problem I just described. I found it interesting that a particular device might have no WiFi.ready() problems but lots of Particle.connected() problems. I found it interesting because I’ve been assuming that WiFi.ready() basically means the device can talk to the Internet, so does that mean Particle is off-line? Or?

Given that background, I suppose it’s possible that a given location might have WiFi coverage problems, noisy radio band problems, etc. at random, and then they go away. If that’s the reality, we’ll have to create a strategy around that. Like I say, if it recurs with a particular device, we address it. I cannot rule out a design problem with the device.

Sorry for the long email. Hope that explains. Thanks for any insight.


I see the Photon in my house dropping connection when the network is congested and at times, randomly.

Try to see if you can correlate the temporary disconnects with the network usage (video streaming, multiple laptops, etc.) of a particular customer.

Definitely seen it happening in my home and even the NAS would say it’s offline and back up momentarily.

I have IFTTT triggers on several of my devices (Core and Photon) for the “Offline” event and all of the devices on a particular router seem to go offline once per day almost exactly 24 hours from one event to the next.

Checking my router logs, I have found out that this is when the DHCP lease is renewed for each device. The devices go offline for 30-120 seconds typically.


@kennethlimcp,

If I can find a group of willing users, I could do that. Problem is that the “random” problem may never hit that group of users, and we’d have to somehow notify the user that the Photon device is offline so that other activity in the home could be noted. It’s a daunting problem.

Indeed! I wanted to find out the root cause for mine, but it’s hard to correlate even in my own home.

That’s why designing the system to run when WiFi is not available is critical (and recover once it’s up) :wink:

@bko,

If the Photon-based device was off-line for a couple of minutes, no problem! The problem we’re trying to solve is where the device is unresponsive for hours. For example, a device works fine for, let’s say, a week, then it “goes dark” for, let’s say, 3 hours. We have no idea why.

That’s why we have the device gathering what information it can so when it does come back on line, it has a story to tell, so to speak.

I guess one thing we could do is ping the Internet from the device. That way we could test the WiFi without assuming Particle is connected… Thinking out loud here…

I’ve been checking this too, and it seems to be random in my case. I had two Photons running a logging program, and there was no correlation between when the two devices lost connection to the cloud. The one that was running nothing but the logging program (just checking WiFi.connected() and Particle.connected()) rarely lost the cloud connection, while the one running my cat door loses it 5-15 times per day. I’m still trying to figure out why this is; there’s no blocking code, and it’s not doing much more than taking a couple of analog reads most of the time. It does have a somewhat weaker signal from the router, but it almost never reports false for WiFi.connected().

@Ric, @Tahl, on the Photon side, does your code time a cloud not connected condition and do a System.reset() if it exceeds a certain amount of time?

Mine doesn’t, but it doesn’t need that. They always reconnect on their own, usually in either 1 second or 7 seconds. I don’t know if those particular reconnect times are significant or not.

@peekay123,

Great question. The answer is no, because I’m trying to characterize the problem first. I’m afraid that if the system counts the number of not-connects and then reboots, it would erase the “evidence”. (Btw, we do log every time a device boots, including whether it is a power-up or a soft boot.) Secondly, I’m afraid that the unit might reboot over and over if the problem lasts for an extended period of time. Again, I want to characterize the problem first, then come up with a solution. I suspect I’ll have to do a reboot as you mentioned at some point.

So maybe in addition to checking WiFi.ready() and Particle.connected(), I should do an Internet ping as a way of testing whether WiFi is really ready (i.e., able to connect to the Internet). Sounds messy but …

Bottom line: I need to know why Photon-based devices go off-line, randomly, for hours at a time and then come back on-line like nothing happened.

Not if you use retained variables for your counters (or even EEPROM)


That’s true.

Okay, looks like if I don’t get a “good” connection, I’ll reboot. “Good” I’ll define as a Photon outbound message to my cloud with an ack and/or a ping of a public Internet website. I’ll do that in addition to counting WiFi.ready() and Particle.connected() failures, both of which are good information in their own right.
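A minimal sketch of one way such a reboot policy could look, with a cap on consecutive resets to address the worry about rebooting over and over. This is portable C++ and not firmware; the class name, timeout, and reset budget are hypothetical. In firmware, `nowMs` would be millis(), `connected` would be whatever “good connection” test is chosen, and a true result would be followed by System.reset().

```cpp
#include <cstdint>

// Hypothetical reboot policy; names and thresholds are illustrative only.
class CloudWatchdog {
public:
    CloudWatchdog(uint32_t timeoutMs, unsigned maxResets)
        : timeoutMs_(timeoutMs), maxResets_(maxResets) {}

    // Call periodically. Returns true when a reset is due.
    bool update(uint32_t nowMs, bool connected) {
        if (connected) {
            isDown_ = false;
            resets_ = 0;  // a good connection restores the reset budget
            return false;
        }
        if (!isDown_) {   // first sample of a new outage
            isDown_ = true;
            downSince_ = nowMs;
        }
        // Unsigned subtraction stays correct across a millis() rollover.
        if (nowMs - downSince_ >= timeoutMs_ && resets_ < maxResets_) {
            ++resets_;           // in firmware this count must survive the reset
            downSince_ = nowMs;  // restart the timer for the next attempt
            return true;
        }
        return false;
    }

private:
    uint32_t timeoutMs_;
    unsigned maxResets_;
    uint32_t downSince_ = 0;
    bool isDown_ = false;
    unsigned resets_ = 0;
};
```

Because a real reset reinitializes ordinary variables, the reset count (and the failure counters) would need to live in retained memory or EEPROM, as suggested earlier in the thread, for the cap to have any effect.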


Hey @Tahl,
Did you find out a way to reduce the disconnects or something interesting regarding this?

Nothing to reduce the disconnects. I think that is more of a hardware issue and does seem to be very site-specific.

Thanks Tahl

I have been working on a similar problem for some time. In my case I see many disconnects on the console, maybe 50 overnight. I am 5 ft from the WiFi access point, and I tried another location 20 ft away with the same problem. I get blinking blue, and sometimes I even get magenta.

I have a working theory. Last night I disconnected all the leads except USB power and only got 1 disconnect. I think some stray RF is getting into the device on the leads; I am using D6 and A0 as inputs. These leads are fairly long - maybe 50 ft - and go to a switch and a thermistor. I plan on putting a scope on max hold on these leads and looking for stray RF over time. If I find something, perhaps I can filter out the RF and fix the problem. I could also try to eliminate the 50 ft wire and move the Photon closer to the sensors.

Leads that long will pick up lots of stray interference, esp. if there are fluorescent tubes or dimmable LED lights. These tend to generate significant EMI, and on long cables it’s a sure bet.

2 solutions - move closer (2-3 feet) or try a shielded coaxial cable and attach the shield only at the Photon end to 0V/GND


I added 1uF electrolytic capacitors to the inputs A0 and D6 and the strange disconnects are gone. The cap smoothed out the thermistor voltage and added a delay to the switch input, but both changes work well in my application.
