Hi all, bear with me, this is a long one.
I have a couple of Core and Photon devices driving custom lighting fixtures I've made for my home.
For a long time, I've simply made multiple Particle API calls through either an Alexa skill or web-based controller UI, and it works pretty well. Color/mode changes happen at roughly the same time, and the code isn't insanely messy, though it's not pretty either.
I wanted to be able to do some more precision orchestration/transitions, as well as be able to demo this stuff in other locations, so I thought I'd build a "hub" of sorts that served as a single web connection, and controlled the rest of the devices locally.
So, I built a controller called "MCP" which sends UDP broadcast packets to a certain port, and then got all my existing devices (and some simple testers) to listen to that port.
Here's what works:
Photon receives web function call, sends UDP packet to 192.168.1.255:11111 (Broadcast)
UDP broadcast packet is observed in Wireshark.
Cores/Photons receive the packet and obey correctly... for a while.
Here's what doesn't:
Somewhere between 2 and 12 hours after all devices are freshly reset and reconnected to the network, I start losing listeners, and it gets worse over time. (Seriously, I've tried to narrow the window down, but sometimes everything works fine for 10+ hours.)
Example:
0-3 hours after resets: 4/4 devices obeying the broadcast packets, instantaneous switching, exactly what I want!
3-6 hours after resets: 3/4 devices obeying - the 1 that has dropped off still responds to direct web functions, hooks, subscribe events, and code flashes, so it's running just fine, it has just stopped listening to UDP
6-12 hours after resets: 2/4 devices obeying - same as the last one, the 2 devices that no longer follow the leader are working just fine in every other respect, they just stop hearing the UDP
12-14 hours after resets: 1/4 devices obeying.
I haven't tested whether the final one drops off too, because of the amount of time involved and because I keep wanting to try fixes.
Some of the fixes I have tried:
Making sure the Controller and the Devices are bound to different UDP ports. This extended the dropoff from 15 minutes to 10+ hours.
"Kicking" the UDP server every time a command packet is received, i.e., running a function that calls Udp.stop(), waits briefly (100 ms in the listener code below), then calls Udp.begin(), thus restarting the UDP listener. (adapted from UDP listening stop after some time - #2 by dermotos)
This last "kicking" part actually works quite well - as long as I am sending commands frequently, everything works perfectly for as long as I have tried it. Similarly, if one listener drops off, and I "kick" that particular device via an Event in the Console (no reset, no flashing, just have it run stop and begin) it rejoins the flock perfectly after that. So it's definitely something with prolonged listening to UDP, and can be remedied by destroying and restarting the UDP listener, without any harm to the other device functions. The kick process even seems pretty fast.
The hiccup comes when I go to bed, and then work, and no commands are sent for a while. It's annoying to have to kick all the devices manually in order to restore desired behavior.
Now, I could set up a timed watchdog that periodically runs the kick function, and I am very confident this would work. But it could end up kicking right as I am trying to send a command, which I am not wild about, and it also seems very sloppy, like putting a bucket on the floor instead of fixing the leak.
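For reference, here is a minimal sketch of how such a watchdog could avoid most of those collisions by kicking only after a quiet period. It is plain C++ so the timing logic can be checked off-device; `KickScheduler`, `idleBeforeKickMs`, and the method names are hypothetical, not part of the Particle API, and this has not been run on hardware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the Particle API: decides when an idle
// listener should restart its UDP socket.
struct KickScheduler {
    uint32_t idleBeforeKickMs;  // quiet time required before a kick is allowed
    uint32_t lastActivityMs;    // time of the last received packet or kick

    // Call whenever a command packet arrives.
    void noteActivity(uint32_t nowMs) { lastActivityMs = nowMs; }

    // Returns true once the socket has been idle for idleBeforeKickMs.
    // Unsigned subtraction keeps the comparison correct when a 32-bit
    // millisecond counter rolls over.
    bool shouldKick(uint32_t nowMs) {
        if (nowMs - lastActivityMs < idleBeforeKickMs) {
            return false;
        }
        lastActivityMs = nowMs;  // restart the idle timer after a kick
        return true;
    }
};
```

In the listener, loop() would call noteActivity(millis()) whenever parsePacket() returns data, and call kickUDP() when shouldKick(millis()) returns true. A packet can still arrive during the stop/begin gap, so this narrows the window rather than closing it.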
My hope is that one of you can tell me why UDP listeners stop listening after some amount of time, and I can fix it in a more elegant way. I've tried to do as much troubleshooting on my own to narrow the problem scope. I don't know if listening to UDP chews memory in some way that the Stop/Begin frees up, and if there is any way to stop that from happening. I notice that the UDP.flush() method in the docs says it currently does nothing - is this the issue? Would flushing clear the memory in a less aggressive way than stop/begin?
Thanks in advance!
EDIT: Forgot to mention, these devices are on 0.6.2. If this is all fixed in the 0.7.x release candidate, I can try that.
Here's some of my code in case it helps:
MCP code:
RGBClass RGBLED;
char szArgs[13];
long lastChange;
int r = 0;
int g = 0;
int b = 0;
unsigned int localPort = 8888;
UDP Udp;
IPAddress remoteIP(192,168,1,255);
int remotePort = 11111;

void setup() {
    Udp.begin(localPort);
    RGBLED.control(true);
    RGBLED.color(255,255,255);
    lastChange = millis();
    Particle.function("setcolor",setRGB);
}

int setRGB(String args) {
    //communicate to devices that it's a color
    args.toCharArray(szArgs, 12);
    sscanf(szArgs, "%d,%d,%d", &r, &g, &b);
    RGBLED.color(r,g,b);
    lastChange = millis();
    //broadcast a command
    Particle.publish("MCP-commands", NULL, 60, PRIVATE);
    Udp.sendPacket((const char *)szArgs, sizeof(szArgs), remoteIP, remotePort);
    return 200;
}

void loop() {
    long elapsed = lastChange + 4000;
    if(millis() > elapsed) {
        RGBLED.color(255,255,255);
    }
}
Listener code:
RGBClass RGBLED;
char szArgs[13];
long lastChange;
int r = 0;
int g = 0;
int b = 0;
unsigned int localPort = 11111;
UDP Udp;
IPAddress remoteIP(192,168,1,255);

void setup() {
    Particle.subscribe("udpKick", udpKick);
    Udp.begin(localPort);
    RGBLED.control(true);
    RGBLED.color(255,255,255);
    lastChange = millis();
}

void loop() {
    // Check if data has been received
    checkNetwork();
    long elapsed = lastChange + 4000;
    if(millis() > elapsed) {
        RGBLED.color(255,255,255);
    }
}

void checkNetwork() {
    int size = Udp.parsePacket();
    if (size > 0) {
        char data[size];
        Udp.read(data,size);
        Particle.publish("Sark-obeys", String(data), 60, PRIVATE);
        sscanf(data, "%d,%d,%d", &r, &g, &b);
        RGBLED.color(r,g,b);
        lastChange = millis();
        Udp.flush();
        kickUDP();
    }
    delay(1);
}

void udpKick(const char *event, const char *data) {
    kickUDP();
}

void kickUDP() {
    Udp.stop();
    delay(100);
    Udp.begin(localPort);
    Particle.publish("listener-kicked", String("KICK IT"), 60, PRIVATE);
}
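One side note on checkNetwork(): the received bytes go straight into String(data) and sscanf() without an explicit terminator, so the listener relies on the sender including a NUL in the payload (the MCP does, because it sends the whole zero-padded szArgs buffer). This is probably unrelated to the dropouts, but a more defensive parse might look like the sketch below. It is plain C++ so it can be checked off-device; `Rgb` and `parseRgbPayload` are hypothetical names, and the 32-byte buffer size is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

struct Rgb { int r, g, b; };

// Hypothetical helper: copies at most 31 payload bytes into a local buffer,
// terminates it explicitly, and parses "r,g,b".
// Returns false (leaving *out untouched) unless all three values parse.
bool parseRgbPayload(const char *payload, std::size_t len, Rgb *out) {
    char buf[32];
    if (len >= sizeof(buf)) {
        len = sizeof(buf) - 1;  // truncate oversized packets
    }
    std::memcpy(buf, payload, len);
    buf[len] = '\0';            // never rely on the sender's terminator

    Rgb parsed;
    if (std::sscanf(buf, "%d,%d,%d", &parsed.r, &parsed.g, &parsed.b) != 3) {
        return false;
    }
    *out = parsed;
    return true;
}
```

With something like this, the listener would only change color when the function returns true, so a malformed or truncated packet is ignored instead of being half-applied.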