Sending UDP multicast fails after several hours

I have a simple app for my Photon which reads a bunch of inputs and once a second sends a UDP multicast. I have left it running overnight and it works fine. But I got home tonight, after it had been running for about 24 hours, and the calls to UDP.sendPacket were failing with a status of -26. (Serial output was still working fine, which is how I know this).

For my particular app (model train control) I guess I can just reset when this happens. But it would be nice to avoid it. Anyone know why this happens or a gentler way to cure it?

Thanks…

I wonder what -26 refers to. My best guess is this line: https://github.com/spark/firmware/blob/f924cb9bed07913579f4742f6b46674a7eac8c61/hal/src/photon/socket_hal.cpp#L716

Seems like something has gone wrong with the socket? Not too sure… Wondering what’s a good fix…

When sendPacket fails, why not close the socket using udp.stop() and then re-open it as if you are starting up again?

This can happen for a variety of reasons, such as, your router decides to give you a new IP address via DHCP, etc. You have to program defensively.

@bko I thought of suggesting that, but assumed it wouldn’t work since UDP is connectionless… sounds like it’s worth a try!

@bko - that’s what I’ll try next. And if that doesn’t fix it rapidly, then the nuclear option of resetting. Would be nice to know why it happens though. Presumably there is some kind of resource leak somewhere…

Hi @harper493

It could be resources on the Photon or it could be a router issue as I alluded to above. You cannot depend on sockets staying open forever and your program is the only mechanism by which a closed socket gets reopened.

It’s not a router issue - the DHCP address hasn’t changed in weeks (I checked). So it must be some kind of resource leak in the firmware I guess. The same thing happened last night, so it isn’t random. I’ll tweak the code tonight and see if it works.

Hi @harper493

DHCP changing was just an example. Your router can force your socket to close in a number of ways. Most likely it is something in your code, but it could be in the system firmware.

If you want more help, post your code and we will look at it for you!

Happy for you to look at it. Stripped down to the essentials, here it is. There are no calls to malloc or new in any of the code I’ve suppressed, nor anything else that consumes resources (just reads some of the pins).

TBH I don’t see how the router can do anything to affect a UDP socket. Even if I turn it off, the multicast gets transmitted, but nobody is listening (well, actually the other hosts on the same WiFi network can be). Unless it’s something in the cloud code which is burning resources somewhere. I haven’t tried turning that off.


/*
 *send_message - send a UDP multicast message, and send it also to the serial port
 */

void send_message(char *buffer, int sz)
{
    int status = udp.sendPacket(buffer, sz, remoteIP, udp_port);
    Serial.printlnf("status %5d %5d msg %s", status, errno, buffer);    
}

/*
 * make_status_message - create a status message in the given buffer
 *
 * Format is:
 *
 * unit_id message_type(=1) sequence_no time power-voltage pin-status
 */

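int make_status_message(char *buffer, int bufsz)
{
    // millis() returns an unsigned 32-bit tick count, so print it with %lu.
    int result = snprintf(buffer, bufsz, "%d %d %d %lu %d %04x",
                          my_id, MSG_SENSOR_STATUS, sequence,
                          (unsigned long)millis(), get_voltage(), detector::get_all());
    ++sequence;
    // snprintf returns the untruncated length; never report more than was written.
    if (result >= bufsz) {
        result = bufsz - 1;
    }
    return result;
}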

/*
 * read_and_send_status - send a UDP message with the detector status
 */
 
bool status_repeat_pending = false;
int32_t last_send_time = 0;

void read_and_send_status()
{
    bool changed;
    detector::get_all(changed);
    if (changed || status_repeat_pending || millis() - last_send_time > MESSAGE_INTERVAL) {
        status_repeat_pending = changed;
        int sz = make_status_message(buffer, buffer_size);
        send_message(buffer, sz);
        last_send_time = millis();
        the_led.blink();
    }
}

/*
 * setup code
 */

int detector_list[] = { D0, D1, D2, D3, D4, D5, D6, A2, A3, A4, A5, A6, -1 };

void setup() {
    pinMode(voltage_pin, INPUT);
    udp.begin(udp_port);
    Serial.begin(9600);
    //Serial.println("IP address: ", WiFi.localIP());
    detector::setup(detector_list);
    get_id();
}

/*
 * main loop
 */

void loop() {

    detector::poll_all();

    const size_t bufferSize = 1024;
    char buf[bufferSize];

    read_and_send_status();
    the_led.action();
    delay(loop_interval);
}

Well, I modified the code to reset the UDP connection after an error (see below). This does indeed hang in there: after 15 hours of operation, the restart counter is currently at 570. I suspect some flakiness in my WiFi router, although I’m not sure why this would affect UDP multicast. I notice that when the counter increases, the Photon goes through a brief period of flashing green.

void send_message(char *buffer, int sz)
{
    int status = udp.sendPacket(buffer, sz, remoteIP, udp_port);
    if (status < 0) {
        udp.stop();
        udp.begin(udp_port);
        ++restarts;
    }
}
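
With restarts happening this often, it may be worth limiting how frequently the socket is torn down, so that a failing link does not trigger a stop/begin cycle on every send. The sketch below is one possible way to do that, not something from the firmware: `ResetLimiter`, `should_reset` and the gap value are hypothetical names chosen for illustration, and it is written as plain C++ so the logic can be checked off the device.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Particle firmware): decides whether a
// failed send should trigger a socket reset, allowing at most one reset per
// min_gap_ms milliseconds.
struct ResetLimiter {
    uint32_t min_gap_ms;     // minimum time between two resets
    uint32_t last_reset_ms;  // time of the most recent reset
    bool     has_reset;      // false until the first reset has happened
    uint32_t restarts;       // number of resets approved so far

    explicit ResetLimiter(uint32_t gap_ms)
        : min_gap_ms(gap_ms), last_reset_ms(0), has_reset(false), restarts(0) {}

    // send_status: value returned by the send call (negative means failure).
    // now_ms: current tick count, e.g. from millis().
    // Returns true when the caller should stop and re-begin the socket now.
    bool should_reset(int send_status, uint32_t now_ms) {
        if (send_status >= 0) {
            return false;  // send succeeded, nothing to do
        }
        // Unsigned subtraction stays correct when the tick counter wraps.
        if (has_reset && (now_ms - last_reset_ms) < min_gap_ms) {
            return false;  // reset too recently, skip this one
        }
        last_reset_ms = now_ms;
        has_reset = true;
        ++restarts;
        return true;
    }
};
```

In send_message this would replace the bare `status < 0` test: call `udp.stop()` and `udp.begin(udp_port)` only when `limiter.should_reset(status, millis())` returns true, with `limiter` being a global such as `ResetLimiter limiter(5000);`.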

I’m having the same problem, UDP stops working after a few hours. I think it’s flaky wifi in my case.

This looks like a good solution for resetting the socket when sending a packet fails, but what about on the receiving end? My code only sends a UDP message every once in a while, but it needs to be listening for UDP all the time, so when it breaks it won’t be reset (and thus not listening) until the next time it tries to send a packet. Is there a way to check at the beginning of the loop if the UDP socket is working without actually sending a packet? Seems needless to send hundreds of packets per second just to see if the socket is working.

Guys - see my comments here

I was rebuilding my UDP socket immediately after a reconnect, and this was causing problems with my ability to reconnect. I added a short timer so that there is a delay after the reconnect before the socket is restarted, and that has solved every problem I'd been having.

May not affect you but drawing your attention to it in case!
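
For anyone who wants to try the same thing, here is a rough sketch of a non-blocking version of that delay. It is only an illustration under assumptions: `ReconnectDelay`, `update` and the settle time are made-up names, and the link state passed in is assumed to come from something like `WiFi.ready()`. It is plain C++ so it can be tested on a desktop.

```cpp
#include <cstdint>

// Hypothetical helper: waits for the link to have been up for settle_ms
// before telling the caller to (re)start the UDP socket.
class ReconnectDelay {
public:
    explicit ReconnectDelay(uint32_t settle_ms) : settle_ms_(settle_ms) {}

    // Call once per loop() with the current link state and tick count.
    // Returns true exactly once per connection, settle_ms after the link
    // came up. Because the helper starts in the "down" state, this also
    // covers the first connection after boot.
    bool update(bool link_up, uint32_t now_ms) {
        if (!link_up) {
            was_up_ = false;
            pending_ = false;
            return false;
        }
        if (!was_up_) {
            // Link has just come up: start the settle timer.
            was_up_ = true;
            pending_ = true;
            up_since_ms_ = now_ms;
        }
        // Unsigned subtraction stays correct when the tick counter wraps.
        if (pending_ && (now_ms - up_since_ms_) >= settle_ms_) {
            pending_ = false;
            return true;
        }
        return false;
    }

private:
    uint32_t settle_ms_;
    uint32_t up_since_ms_ = 0;
    bool was_up_ = false;
    bool pending_ = false;
};
```

In loop() the idea would be to call `udp.stop()` followed by `udp.begin(udp_port)` whenever `update(WiFi.ready(), millis())` returns true, instead of restarting the socket the moment the connection comes back.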

Two obvious approaches: (1) if you know you’ll be receiving a message, say, every second, reset after you’ve missed two of them; (2) transmit something. Hundreds of packets/sec sounds like a lot; do you really have real-time requirements that strict? If so, you don’t have much choice, at least based on what little I know about the API. Maybe there’s a call you can make to check locally whether the UDP object is still connected.
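
Approach (1) could look something like the sketch below. This is only an illustration: `ReceiveWatchdog` and its members are hypothetical names, and it assumes the sender's interval is known in advance. It is written as plain C++ so the timing logic can be checked without a Photon.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reports when no packet has arrived for longer than
// missed_allowed expected intervals.
class ReceiveWatchdog {
public:
    // interval_ms: how often a packet is expected to arrive.
    // missed_allowed: how many intervals may pass in silence before expiry.
    // now_ms: current tick count, used as the starting point.
    ReceiveWatchdog(uint32_t interval_ms, uint32_t missed_allowed, uint32_t now_ms)
        : timeout_ms_(interval_ms * missed_allowed), last_rx_ms_(now_ms) {
        assert(interval_ms > 0 && missed_allowed > 0);
    }

    // Call whenever a packet is received, and again right after resetting
    // the socket so the watchdog is re-armed.
    void packet_received(uint32_t now_ms) { last_rx_ms_ = now_ms; }

    // True when the silence has lasted longer than the allowed window.
    // Unsigned subtraction stays correct when the tick counter wraps.
    bool expired(uint32_t now_ms) const {
        return (now_ms - last_rx_ms_) > timeout_ms_;
    }

private:
    uint32_t timeout_ms_;
    uint32_t last_rx_ms_;
};
```

In loop() this would be fed from the receive path: call `packet_received(millis())` each time `udp.parsePacket()` returns a size greater than zero, and when `expired(millis())` is true, do `udp.stop()` and `udp.begin(udp_port)` and then call `packet_received(millis())` again to re-arm it. No extra packets need to be sent just to probe the socket.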