Robust UDP communications


#1

I have finally tracked down an error that occasionally interrupted UDP communications with a Core/Photon server. I cannot offer a cure, but I have identified an additional try/except test at the client end that seems to keep things running.
I’ve been using a Core and more recently a Photon as a UDP server to send monitoring data to a PC or Raspberry Pi.
The Particle devices work very reliably, but they occasionally go offline (as far as UDP is concerned) while they renegotiate WiFi or Cloud communications, so it is necessary to trap socket timeout errors at the client end. Making sure that a socket is connected and then trapping timeout errors reduces crashes from hourly to once every several days, but does not eliminate them.
The additional error that needs to be trapped is "connection refused", which was a surprise because it occurs almost immediately after a successful socket.connect() call. It took a while to track this down, so it is offered here in case anyone else experiences something similar.
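
The surprise is easier to understand once you note that connect() on a UDP socket sends nothing over the network. It only records the default destination, so it succeeds whether or not anything is listening. A refusal can only show up later, typically when an ICMP "port unreachable" reply to an earlier datagram is reported on the next send or recv. The sketch below illustrates this on localhost. `probe_closed_port` is a made-up name, and whether you see a refusal or a timeout depends on the operating system, so the function reports the outcome rather than assuming one.

```python
import socket

def probe_closed_port(timeout=1.0):
    """Connect a UDP socket to a local port that nobody is listening on.

    Returns (connect_ok, outcome) where outcome is one of
    'refused', 'timeout', 'other error' or 'data'.
    """
    # Ask the OS for a free UDP port, then release it so nothing listens there.
    tmp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tmp.bind(('127.0.0.1', 0))
    port = tmp.getsockname()[1]
    tmp.close()

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.connect(('127.0.0.1', port))   # no packet is sent by this call
        connect_ok = True
        try:
            s.send(b'Pi Ready\0')        # first datagram actually leaves here
            s.recv(1024)                 # any refusal is reported here (or on send)
            outcome = 'data'
        except socket.timeout:
            outcome = 'timeout'
        except ConnectionRefusedError:
            outcome = 'refused'
        except OSError:
            outcome = 'other error'
    finally:
        s.close()
    return connect_ok, outcome

if __name__ == '__main__':
    print(probe_closed_port())
```

The point is only that connect() reports success in every case, so a second error check around recv() is needed regardless.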

With the additional try/except block, the UDP client has now been running for weeks without intervention, trapping a "connection refused" error every couple of days. I guess that sometimes the Particle takes longer than usual to read the request from the client. Perhaps there is a more elegant way to avoid the refusal.

# Python UDP client code snippet illustrating additional error trapping needed for robust UDP communication

import socket
import time
from datetime import datetime

host = '192.168.1.50'   # placeholder: replace with the Photon's IP address
port = 5212             # must match localPort on the server
socketErrors = 0
timeoutErrors = 0

while True:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print(s)
    s.settimeout(3)

    # while we have a socket poll for data from Spark every 15 seconds
    while s:
        theTime = datetime.now()
        # untidy way to start as close as possible to 15.0 secs after last poll
        if (theTime.second % 15 == 0) and (theTime.microsecond < 100000):  # it will always take more than 0.1 secs to process my request
            print(theTime)
            try:
                s.connect((host, port))        # connect to Spark server
                s.sendall(b'Pi Ready\0 ')      # client ready for data
            except socket.error:
                print('unable to connect')
                socketErrors += 1
                break

            r = b'not read anything'
            try:
                r = s.recv(1024)
            except socket.timeout:
                print("socket timeout")
                timeoutErrors += 1
                break
            except socket.error as msg:    # despite having already trapped socket errors at s.connect we need a second check here
                print("socket refused: %s" % msg)
                socketErrors += 1
                break

            if len(r) == 0:     # recv() returns bytes, never 0, so test for an empty reply
                print('socket disconnected')
                print(s)
                break

            # should now have received text from server in r
            # do stuff with the data - validate and parse etc.
            # ...
            # ...
        else:
            time.sleep(0.01)    # short pause so the wait does not spin at 100% CPU
    s.close()
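
The two except branches after recv() can also be folded into one helper that treats every failure as "no data this time". This is only a sketch: `poll_once` and `poll_with_retry` are illustrative names, and the retry count is an assumption. Note that each retry sends a fresh request, so the server will do its work again for every attempt that reaches it.

```python
import socket

def poll_once(host, port, request=b'Pi Ready\0', timeout=3.0):
    """Send one request and wait for one reply.

    Returns the reply bytes, or None if the poll failed for any reason.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        s.sendall(request)
        return s.recv(1024)
    except socket.timeout:
        return None      # server busy or temporarily offline
    except OSError:
        return None      # includes "connection refused" raised by send or recv
    finally:
        s.close()        # a fresh socket is used for every poll

def poll_with_retry(host, port, attempts=3, **kwargs):
    """Try poll_once up to `attempts` times and return the first reply, or None."""
    for _ in range(attempts):
        reply = poll_once(host, port, **kwargs)
        if reply is not None:
            return reply
    return None
```

Because a new socket is created for each poll, a refusal left over from one request cannot be reported against the next one.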

The UDP server code is pretty simple - here are the relevant parts:

UDP udp;
unsigned int localPort = 5212;  // reserved for incoming traffic
char UDPinData[64];
char UDPoutData[1024];
int mydata = 0;                 // placeholder for whatever value is being reported

///////////////////////////////////////////
void setup() {
    udp.begin(localPort);
    memset(UDPoutData, 0, sizeof(UDPoutData));
}

///////////////////////////////////////////
void loop() {
    // check whether there has been a request to the server and process it
    int packetSize = udp.parsePacket();
    if (packetSize > 0) {
        udp.read(UDPinData, sizeof(UDPinData));
        // generate the data to be sent by the particle server here
        // ....
        udp.beginPacket(udp.remoteIP(), udp.remotePort());
        snprintf(UDPoutData, sizeof(UDPoutData), "put your formatting and data here %d \n", mydata);
        udp.write((unsigned char*)UDPoutData, 768); // 768 is actual size of data as against buffer size in my case
        udp.endPacket();
        memset(UDPoutData, 0, sizeof(UDPoutData));
    } // finished writing packet
} // end of loop

#2

How long does it take to process a UDP request on the Photon? If your loop() execution time is longer than about 20 seconds, the Photon will disconnect from the Cloud and need to reconnect.

If you call Particle.process() every 20 seconds or less, this might keep the Photon connected to the Cloud and solve your problem.


#3

Thanks for the suggestion @jbstcyr. The loop execution time is either a few milliseconds or about a second, depending on whether a UDP request is received, so that should keep things alive as far as calls to Particle.process() are concerned. It is never as long as 20 seconds. However, the Photon occasionally goes from breathing cyan to flashing cyan as it reconnects with the cloud. I don’t know whether it is my router (the lease time on the network connection is weeks), noise at 2.4GHz or something at the cloud end that causes this.

It takes about a second for the Photon to process each UDP request and a new request is sent every 15 seconds.
If there is a request it samples 20 half cycles of UK mains frequency and returns some statistics.
In parallel with this there are interrupts triggered by a rising edge on 3 digital inputs. The interrupts occur at intervals between around 1 sec and several hours. The interrupt routines are very quick (just incrementing a counter) and use properly declared volatile variables.
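
On the client side, the 15-second cadence can be computed and slept for rather than detected by repeatedly checking the clock. The sketch below shows one way to do that. `seconds_until_next_slot` and `wait_for_next_slot` are illustrative names, and the alignment only works as described when the period divides 60 evenly.

```python
import time
from datetime import datetime

def seconds_until_next_slot(now, period=15):
    """Seconds from `now` until the clock's seconds field is next a multiple of `period`.

    Returns 0.0 when `now` is exactly on a slot boundary.
    """
    elapsed = (now.second % period) + now.microsecond / 1e6
    return period - elapsed if elapsed > 0 else 0.0

def wait_for_next_slot(period=15):
    """Block until the next slot boundary according to the local clock."""
    time.sleep(seconds_until_next_slot(datetime.now(), period))
```

Calling `wait_for_next_slot()` before each request keeps polls aligned to :00, :15, :30 and :45 as long as each request takes less than one period.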


#4

Perhaps TCP is what you really need here? I’m sure that you are implementing UDP for a reason, but by the time you get this working bulletproof with the client and server, regardless of hardware iteration, you will have pretty much implemented most of the aspects of TCP that take care of these issues.
I hate to be one of “those guys”, but it seems as if you are going way out of your way to make UDP work in this case.
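
For comparison, the client side of a TCP version could be as small as the sketch below. It assumes, hypothetically, that the Photon runs a TCP server on the agreed port and that the reply is short enough to arrive in a single recv(). `tcp_request` is an illustrative name, not something from this thread.

```python
import socket

def tcp_request(host, port, request=b'Pi Ready\0', timeout=3.0):
    """Open a TCP connection, send one request and return the first chunk of the reply.

    Returns None if the connection fails, is refused or times out.
    """
    try:
        # create_connection performs the real handshake, so a refusal shows up here.
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(request)
            return s.recv(1024)   # may be a partial reply for longer messages
    except OSError:
        return None               # covers refused, reset and timed-out connections
```

Unlike the UDP case, a successful connection here does mean the server accepted it, although longer replies would need a loop around recv() until the whole message has arrived.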


#5

You are probably right. I spent quite a while fiddling with TCP, with rather less success than with UDP, but that is my problem. I’ll look out for someone’s successful TCP implementation. In the meantime the UDP hasn’t needed attention since I posted the note, despite a couple of power cuts - I just lost a couple of minutes’ data but everything continued where it left off after the power came back.