How to Diagnose Disconnects with Xenon on Ethernet

amillen · May 21, 2019, 9:06pm

I have many (>20) Xenons with Ethernet FeatherWings, all running the same code on different customers’ networks. Each is a standalone device without a mesh network. One of them momentarily disconnects often, about 10 times an hour. How do I go about troubleshooting? The customer claims there is nothing wrong with their network. I have already tried replacing the Xenon. Running Device OS 1.1.0.

ninjatill · May 24, 2019, 3:27pm

What is unique about that customer’s network? I know they say “nothing wrong” but that is very subjective. What type of router/switch are they using (SOHO like Linksys/Netgear or Enterprise like Sonicwall/Watchguard/Cisco/Meraki etc)? What are their UDP timeout settings (since Xenons use UDP to communicate with the cloud)? What is the customer’s ISP? (…just to compare small biz ISPs like Comcast vs enterprise connections like from an LEC that provides an SLA-type agreement with symmetrical, guaranteed bandwidth.) How large is the internal network? How congested is the internal network? What is the topology (which would answer questions about controlling excessive multicast/broadcast traffic)? What QoS settings are active on the network and gateway routers, if any?

Also, is the device exhibiting abnormal behavior because of the disconnects? Or is this just “kind of concerning” to see? I ask because, as long as they don’t affect function, the disconnects could be benign.

From the code perspective, is there any part of your code that is blocking? Is the system thread enabled? It is possible that small hiccups in sensors, network, etc. cause your code to hang briefly, causing the disconnect. Certain operations block regardless of whether system threading is enabled (for example, calling Particle.publish() with the cloud disconnected, intentionally or not, may block and cause disconnects).
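To make the blocking point concrete: a debounce that waits with delay() can be replaced by one that only compares timestamps, so loop() keeps returning quickly. Below is a minimal sketch in plain C++ (so it can be tried off-device). The class name DebouncedThreshold, its parameters, and the idea of feeding it analogRead() and millis() values are assumptions for illustration, not part of Device OS, and nothing here establishes that a debounce delay is the cause of these particular disconnects.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Device OS API): hysteresis plus a time-based
// debounce that never sleeps. The caller supplies the sample and the time.
class DebouncedThreshold {
public:
    DebouncedThreshold(int lowThreshold, int highThreshold,
                       uint32_t holdMs, bool initiallyHigh)
        : low_(lowThreshold), high_(highThreshold),
          holdMs_(holdMs), isHigh_(initiallyHigh) {
        assert(lowThreshold < highThreshold);
    }

    // Feed one sample. Returns true only when the stable state changes.
    bool update(int sample, uint32_t nowMs) {
        bool wantHigh;
        if (sample > high_) {
            wantHigh = true;
        } else if (sample < low_) {
            wantHigh = false;
        } else {
            pending_ = false;          // inside the hysteresis band: cancel
            return false;
        }
        if (wantHigh == isHigh_) {     // already in that state
            pending_ = false;
            return false;
        }
        if (!pending_) {               // first sample beyond the threshold
            pending_ = true;
            pendingSince_ = nowMs;
            return false;
        }
        // Unsigned subtraction stays correct across timer rollover.
        if (nowMs - pendingSince_ >= holdMs_) {
            isHigh_ = wantHigh;
            pending_ = false;
            return true;
        }
        return false;
    }

    bool isHigh() const { return isHigh_; }

private:
    int low_;
    int high_;
    uint32_t holdMs_;
    bool isHigh_;
    bool pending_ = false;
    uint32_t pendingSince_ = 0;
};
```

In firmware this would presumably be called once per loop() as update(analogRead(pin), millis()), publishing only when it returns true.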

amillen · May 24, 2019, 6:41pm

I don’t know much about the network since I’m just a vendor. I do know that it is an enterprise network with multiple locations, with 50-70 computers/printers/etc. per location. I don’t know what their UDP timeout setting is; however, I have run into UDP timeout problems before, and since adding Particle.keepAlive(20); to my code I haven’t had any issues. I can find out which ISP they use and get back to you, but I’m not sure how much they would be willing to share about congestion, topology, or QoS.

The product uses a web browser to interact with the Particle device via webhooks to update the screen in (near) real-time. When the device is offline the end user can’t complete their task. Since it is a web page on my server, if their internet was down, they wouldn’t even get to the step of opening the webhook.

Since reading your post I have set SYSTEM_THREAD(ENABLED); and will report back if it helps; however, there has already been a disconnect since then.

This is the code running on the Xenon.

SYSTEM_THREAD(ENABLED);

int photoresistor = A0;
int power = A5;
int relay1 = D7; 
int relay2 = D6; 
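// Note: despite its name, beamBroken is true while the beam is intact;
// it is exposed below as the "beamIntact" variable.
bool beamBroken = false;
bool isLockOrOpen = false;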

int analogvalue = 0;

String localIP;

void setup() {
    pinMode(photoresistor,INPUT);
    pinMode(power,OUTPUT);
    
    pinMode(relay1,OUTPUT);
    pinMode(relay2,OUTPUT);

    digitalWrite(power,HIGH);
    digitalWrite(relay1,LOW);
    digitalWrite(relay2,LOW);

    localIP = Ethernet.localIP().toString();

    Particle.variable("beamIntact", beamBroken);
    Particle.variable("isLockOrOpen", isLockOrOpen);
    Particle.function("cabinet",cabinetToggle);
    Particle.variable("localIP", localIP);
    
    Particle.keepAlive(20);
}


void loop() {
    analogvalue = analogRead(photoresistor);
    if (analogvalue>3400) {
        if (beamBroken==true) {
            delay(200);
            if (analogRead(photoresistor)>3400) {
                Particle.publish("beamStatus","broken",60,PRIVATE);
                beamBroken=false;
                isLockOrOpen=true;
            }
        }
    }
    else if (analogvalue<2800) {
        if (beamBroken==false) {
            delay(200);
            if (analogRead(photoresistor)<2800) {
                Particle.publish("beamStatus","intact",60,PRIVATE);
                beamBroken=true;
                isLockOrOpen=false;
            }
        }
    }
}

int cabinetToggle(String command) {
    if (command=="unlock") {
        digitalWrite(relay1,HIGH);
        delay(250);
        digitalWrite(relay1,LOW);
        return 1;
    }
    else if (command=="lock") {
        digitalWrite(relay2,HIGH);
        delay(500);
        digitalWrite(relay2,LOW);
        return 0;
    }
    else {
        return -1;
    }
}

Also worth noting: as a temporary solution before deploying the Xenon, we had a Photon running the same code and didn’t have these disconnects. We plugged a Linksys (home) router into the customer’s network, connected a computer to the Linksys Ethernet port, and connected the Photon to the Linksys Wi-Fi. The Photon rarely ever dropped its connection to the cloud, once a week or less. The disconnects started when we swapped the Linksys router and Photon for a StarTech switch and a Xenon (with Ethernet). I have also swapped out the Xenon, but that didn’t change anything.

shanevanj · May 25, 2019, 6:56am

I have a similar application to this, and prior to firmware 1.2.1-rc.1 I would get many cloud/network events over a 24-hour period, especially when the USA “woke up” and started the business day (issues were far fewer on weekends, for example). So my view is that this is less about the customer network and more about the general state of the internet and connectivity paths. I put this

System.on(all_events, handle_all_the_events);

in my setup() and a function to display the system events as they happen using

void handle_all_the_events(system_event_t event, int param)
{
    Serial.printlnf("got event %d with value %d", event, param);
}

and referencing the System Events table here to work out what’s going on in the connections.
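To turn those events into a number you can compare before and after a network change, a small log of disconnect timestamps may help. The sketch below is plain C++ so it can be tried off-device; DisconnectLog and its methods are hypothetical names, not part of Device OS. On the device you would presumably call record(millis()) from the event handler when the cloud_status event arrives with the value cloud_status_disconnected.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: remembers the most recent disconnect timestamps in a
// fixed-size ring buffer (no heap use) and counts how many fall in a window.
class DisconnectLog {
public:
    // Store one disconnect timestamp, overwriting the oldest when full.
    void record(uint32_t nowMs) {
        stamps_[next_] = nowMs;
        next_ = (next_ + 1) % kCapacity;
        if (count_ < kCapacity) {
            ++count_;
        }
    }

    // Number of stored disconnects no older than windowMs at time nowMs.
    // Unsigned subtraction keeps this correct across timer rollover.
    size_t countWithin(uint32_t nowMs, uint32_t windowMs) const {
        size_t n = 0;
        for (size_t i = 0; i < count_; ++i) {
            if (nowMs - stamps_[i] <= windowMs) {
                ++n;
            }
        }
        return n;
    }

private:
    static constexpr size_t kCapacity = 32;  // assumed size; adjust to taste
    uint32_t stamps_[kCapacity] = {};
    size_t next_ = 0;
    size_t count_ = 0;
};
```

Exposing countWithin(millis(), 3600000) through a Particle.variable would give a rough disconnects-per-hour figure, capped at the buffer size.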

P.S. I may be wrong, but as I understand it the communications to the Particle cloud are UDP based, and UDP, unlike TCP, does not have handshaking and retry structures built into it. It’s more of a “fire and forget” method, so if a packet is dropped due to contention or congestion, it is gone…

amillen · May 29, 2019, 6:48pm

Update: We swapped out the switch for a Linksys home router and plugged the Xenon into the router’s Ethernet port and the uplink into the customer’s network. Since then zero disconnects over 24 hours.

Clearly, there is something that the Xenon does not like about their network that the router is “fixing”, but where do I even start? All of their computers and my Raspberry Pi computer work just fine.

Looking for suggestions on where to begin diagnosing.

Does Particle have a network requirements list somewhere?

peekay123 · May 29, 2019, 7:30pm

@amillen, which switch are they using?

amillen · May 29, 2019, 8:58pm

@peekay123 the switch we had out there was a StarTech DS81072 8-port desktop switch that was plugged into the customer’s network. The customer’s switch that ours was plugged into was, I’m sure, a managed enterprise-grade switch, but I don’t know which one because it is locked away in an IT closet.

ninjatill · June 4, 2019, 7:02pm

The addition of an intermediate router is an interesting fix. So consider what that changes:

DHCP for the Particle device is now supplied by the local router instead of the enterprise router. There could be a change in DHCP lease durations and scope options. DNS should pass through unless you set that specifically on the local router. If you didn’t set a static IP on the local router’s WAN interface, perhaps that local router just handles the DHCP renewal more gracefully.
The local router effectively stops all multicast and broadcast traffic on the enterprise network from reaching the Particle device.
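One way to test the multicast/broadcast idea, assuming you can get a packet capture from the port the Xenon uses (for example via a mirrored port, which depends on what the customer’s IT team allows), is to tally frames by destination type. A minimal sketch of the classification step in plain C++ follows; the function and enum names are hypothetical, and reading the capture file is left out.

```cpp
#include <array>
#include <cstdint>

enum class FrameKind { Unicast, Multicast, Broadcast };

// Classify an Ethernet frame by its 6-byte destination MAC address.
// Broadcast is ff:ff:ff:ff:ff:ff; any other address with the least
// significant bit of the first octet set is a group (multicast) address.
FrameKind classifyDestination(const std::array<uint8_t, 6>& mac) {
    bool allOnes = true;
    for (uint8_t octet : mac) {
        if (octet != 0xFF) {
            allOnes = false;
        }
    }
    if (allOnes) {
        return FrameKind::Broadcast;
    }
    return (mac[0] & 0x01) ? FrameKind::Multicast : FrameKind::Unicast;
}
```

Comparing the broadcast and multicast counts per second on the enterprise side with those behind the Linksys router would show how much traffic the router is actually filtering out.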

patboul · August 28, 2019, 11:06am

Having an official network requirements document would definitely help troubleshooting. I am currently testing Xenon and have similar problems.
