I have many (>20) Xenons with Ethernet FeatherWings, all running the same code on different customers' networks. Each is a standalone device without a mesh network. One of them momentarily disconnects often, 10 times an hour. How do I go about troubleshooting? The customer claims there is nothing wrong with their network. I have already tried replacing the Xenon. Running Device OS 1.1.0.
What is unique about that customer’s network? I know they say “nothing wrong” but that is very subjective. What type of router/switch are they using (SOHO like Linksys/Netgear or Enterprise like Sonicwall/Watchguard/Cisco/Meraki etc)? What are their UDP timeout settings (since Xenons use UDP to communicate with the cloud)? What is the customer’s ISP? (…just to compare small biz ISPs like Comcast vs enterprise connections like from an LEC that provides an SLA-type agreement with symmetrical, guaranteed bandwidth.) How large is the internal network? How congested is the internal network? What is the topology (which would answer questions about controlling excessive multicast/broadcast traffic)? What QoS settings are active on the network and gateway routers, if any?
Also, is the device exhibiting abnormal behavior because of the disconnects? Or is this just “kind of concerning” to see? I ask because as long as they don’t affect function, the disconnects could be benign.
From the code perspective, is there any part of your code that is blocking? Is the system thread enabled? It is possible that there are small hiccups in sensors, network, etc. that cause your code to hang briefly, causing the disconnect. Certain operations block regardless of whether system threading is enabled (for example, calling Particle.publish() while the cloud is disconnected, intentionally or not, may block and cause disconnects).
I don’t know much about the network since I’m just a vendor. I do know that it is an enterprise network with multiple locations, with 50-70 computers/printers/etc. per location. I don’t know what their UDP timeout setting is. However, I have run into UDP timeout settings causing problems before; I added Particle.keepAlive(20); to my code and haven’t had any issues since. I can find out which ISP they use and get back to you, but I’m not sure how much they would be willing to share about congestion, topology, or QoS.
The product uses a web browser to interact with the Particle device via webhooks to update the screen in (near) real-time. When the device is offline the end user can’t complete their task. Since it is a web page on my server, if their internet was down, they wouldn’t even get to the step of opening the webhook.
Since reading your post I have set SYSTEM_THREAD(ENABLED); and will report back if it helps. However, there has already been a disconnect since then.
This is the code running on the Xenon.
SYSTEM_THREAD(ENABLED);

int photoresistor = A0;
int power = A5;
int relay1 = D7;
int relay2 = D6;

// Note: despite its name, beamBroken is true while the beam is intact
// (it backs the "beamIntact" cloud variable below).
bool beamBroken = false;
bool isLockOrOpen = false;
int analogvalue = 0;
String localIP;

void setup() {
    pinMode(photoresistor, INPUT);
    pinMode(power, OUTPUT);
    pinMode(relay1, OUTPUT);
    pinMode(relay2, OUTPUT);
    digitalWrite(power, HIGH);
    digitalWrite(relay1, LOW);
    digitalWrite(relay2, LOW);
    localIP = Ethernet.localIP().toString();
    Particle.variable("beamIntact", beamBroken);
    Particle.variable("isLockOrOpen", isLockOrOpen);
    Particle.function("cabinet", cabinetToggle);
    Particle.variable("localIP", localIP);
    Particle.keepAlive(20);
}

void loop() {
    analogvalue = analogRead(photoresistor);
    if (analogvalue > 3400) {
        // High reading: beam looks broken. Re-check after 200 ms before publishing.
        if (beamBroken == true) {
            delay(200);
            if (analogRead(photoresistor) > 3400) {
                Particle.publish("beamStatus", "broken", 60, PRIVATE);
                beamBroken = false;
                isLockOrOpen = true;
            }
        }
    }
    else if (analogvalue < 2800) {
        // Low reading: beam looks intact. Re-check after 200 ms before publishing.
        if (beamBroken == false) {
            delay(200);
            if (analogRead(photoresistor) < 2800) {
                Particle.publish("beamStatus", "intact", 60, PRIVATE);
                beamBroken = true;
                isLockOrOpen = false;
            }
        }
    }
}

// Cloud function: pulses relay1 for "unlock" or relay2 for "lock".
int cabinetToggle(String command) {
    if (command == "unlock") {
        digitalWrite(relay1, HIGH);
        delay(250);
        digitalWrite(relay1, LOW);
        return 1;
    }
    else if (command == "lock") {
        digitalWrite(relay2, HIGH);
        delay(500);
        digitalWrite(relay2, LOW);
        return 0;
    }
    else {
        return -1;
    }
}
Also worth noting: as a temporary solution before deploying the Xenon, we had a Photon running the same code and didn’t have these disconnects. We plugged a Linksys (home) router into the customer’s network, connected a computer to the Linksys Ethernet port, and connected the Photon to the Linksys Wi-Fi. The Photon rarely ever dropped its connection to the cloud, once a week or less. The disconnects started when we swapped the Linksys router and Photon for a StarTech switch and a Xenon (with Ethernet). I have also swapped out the Xenon, but that didn’t change anything.
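Whether or not it is related to the disconnects, the delay(200) re-check in loop() above can be done without blocking. Below is a minimal sketch of one way to do it, written as plain C++ so it can be tried off-device. BeamDebouncer and its method names are hypothetical, not part of Device OS; the 3400/2800 thresholds and the 200 ms hold time are taken from the posted code. Unlike the original, it starts in an "unknown" state, so the first stable reading counts as a change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Device OS API): debounces a two-threshold
// analog reading without ever calling delay().
class BeamDebouncer {
public:
    enum class State { Unknown, Intact, Broken };

    BeamDebouncer(int brokenAbove, int intactBelow, uint32_t holdMs)
        : brokenAbove_(brokenAbove), intactBelow_(intactBelow), holdMs_(holdMs) {
        assert(intactBelow < brokenAbove);  // thresholds must leave a dead band
    }

    // Feed one reading plus the current time in milliseconds.
    // Returns true only at the moment the debounced state changes.
    bool update(int reading, uint32_t nowMs) {
        State candidate = classify(reading);
        if (candidate == State::Unknown || candidate == state_) {
            pending_ = State::Unknown;      // dead band or no change: cancel
            return false;
        }
        if (candidate != pending_) {
            pending_ = candidate;           // new candidate: start timing it
            pendingSince_ = nowMs;
            return false;
        }
        // Unsigned subtraction stays correct across timer rollover.
        if (nowMs - pendingSince_ < holdMs_) {
            return false;                   // not stable for long enough yet
        }
        state_ = candidate;
        pending_ = State::Unknown;
        return true;
    }

    State state() const { return state_; }

private:
    State classify(int reading) const {
        if (reading > brokenAbove_) return State::Broken;
        if (reading < intactBelow_) return State::Intact;
        return State::Unknown;              // between the two thresholds
    }

    int brokenAbove_;
    int intactBelow_;
    uint32_t holdMs_;
    State state_ = State::Unknown;
    State pending_ = State::Unknown;
    uint32_t pendingSince_ = 0;
};
```

On the device, the idea would be to call update(analogRead(photoresistor), millis()) on every pass through loop() and publish only when it returns true, so loop() never sits in a delay().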
I have a similar application to this, and prior to firmware 1.2.1-rc.1 I would get many cloud/network events over a 24-hour period, especially when the USA “woke up” and started the business day (issues were far fewer on weekends, for example). So my view is that this is less about the customer network and more about the general state of the internet and connectivity paths. I put this
System.on(all_events, handle_all_the_events);
in my setup() and a function to display the system events as they happen using
void handle_all_the_events(system_event_t event, int param)
{
Serial.printlnf("got event %d with value %d", event, param);
}
and then referred to the System Events table in the Particle documentation to work out what’s going on with the connections.
P.S. I may be wrong, but as I understand it the communications to the Particle Cloud are UDP based, and UDP, unlike TCP, does not have handshaking and retry structures built into it. It’s more of a “fire and forget” method, so if a packet is dropped due to contention or congestion, it is gone…
Update: We swapped out the switch for a Linksys home router and plugged the Xenon into the router’s Ethernet port and the uplink into the customer’s network. Since then zero disconnects over 24 hours.
Clearly, there is something that the Xenon does not like about their network that the router is “fixing”, but where do I even start? All of their computers and my Raspberry Pi computer work just fine.
Looking for suggestions on where to begin diagnosing.
Does Particle have a network requirements list somewhere?
@amillen, which switch are they using?
@peekay123 the switch we had out there was a StarTech DS81072 8 port desktop switch that was plugged into the customer’s network. The customer’s switch that our switch was plugged into I’m sure was a managed enterprise grade switch, but I don’t know what one because it is locked away in an IT closet.
The addition of an intermediate router is an interesting fix. So consider what that changes:
- DHCP for the Particle device is now supplied by the local router instead of the Enterprise router. There could be a change in DHCP lease durations and scope options. DNS should pass through unless you set that specifically on the local router. If you didn’t set a static IP on the local router WAN interface, perhaps that local router just handles the DHCP renewal more gracefully.
- The local router effectively stops all multicast and broadcast traffic on the Enterprise network from reaching the Particle device.
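One way to tell these two apart, assuming you log a timestamp for each disconnect (for example from the system event handler posted earlier in the thread): if the drops are tied to a timer such as a lease renewal or a UDP timeout, the gaps between them should be fairly regular, whereas drops caused by bursts of traffic would usually look irregular. The sketch below is plain C++ for off-device analysis; gapsBetween, summarize, looksPeriodic, and the 10% tolerance are all hypothetical choices for illustration, not anything provided by Particle.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Gaps (in seconds) between consecutive disconnect timestamps.
std::vector<double> gapsBetween(const std::vector<double>& disconnectTimesSec) {
    std::vector<double> gaps;
    for (std::size_t i = 1; i < disconnectTimesSec.size(); ++i) {
        gaps.push_back(disconnectTimesSec[i] - disconnectTimesSec[i - 1]);
    }
    return gaps;
}

struct GapStats {
    double mean;
    double stddev;
};

// Mean and (population) standard deviation of the gaps.
GapStats summarize(const std::vector<double>& gaps) {
    if (gaps.empty()) {
        return {0.0, 0.0};
    }
    double sum = 0.0;
    for (double g : gaps) {
        sum += g;
    }
    const double mean = sum / static_cast<double>(gaps.size());
    double squares = 0.0;
    for (double g : gaps) {
        squares += (g - mean) * (g - mean);
    }
    return {mean, std::sqrt(squares / static_cast<double>(gaps.size()))};
}

// Rough heuristic: call the gaps "periodic" when their spread is small
// relative to their mean. The tolerance is an arbitrary assumption.
bool looksPeriodic(const GapStats& stats, double tolerance = 0.1) {
    return stats.mean > 0.0 && stats.stddev < tolerance * stats.mean;
}
```

If the gaps do look periodic, comparing the mean gap against the DHCP lease time and any known UDP/NAT timeout on the customer's equipment would be a reasonable next step; if they do not, broadcast or multicast load becomes the more likely suspect.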
Having an official network requirements document would definitely help troubleshooting. I am currently testing Xenon and have similar problems.