Strange UDP bug

dermotos · February 4, 2014, 3:54am

I’m finding that when using the UDP library, if I send messages continuously, say, once a second in the main loop, it sends them almost indefinitely (until the core restarts, flashes green, flashes cyan etc…). But if I instead attach the send function to a button press, and do it now, and in a minute from now, the messages no longer send, and wont until I reboot the core. The LED is still breathing cyan, and the D7 led flashes to indicate my sending function is executing, but nothing comes over the network.

If this is related to the CFOD then it can be ignored, but if not, is the team aware of this?

I’ll be honest, I’m pretty disappointed after spending $400+ on 10 cores and find that they are totally unreliable/unusable. I know the team and others are working on the CFOD bug, and I appreciate everyones effort, but I would’ve hoped finding a fix for such a critical bug would have been given more priority by the spark team. A $100 credit note is a bit on the cheap side as a bounty to fix basic functionality of your only product.

I know I’m probably coming across as impatient, but I’ve been experiencing this bug for over 6 weeks now, constantly, and combined with the money i’ve spent on these, compared to the value spark is putting on fixing it (about an hours wages for a developer?) I’m kind of feeling a little bit screwed over.

/rant

BDub · February 4, 2014, 6:18am

@dermotos would you please post your code… I’ll try to help out if I can. I understand your frustration with the bugs, and $400 is a lot of money. I don’t know what to really say about the bounty, I can agree that it’s definitely a big task. A bunch of people are pouring days of time into it, testing, coding, searching for bugs. I know the Spark Team is spending a bunch of time and resources internally on the problem… fwiw, I never see the CFOD at home I can run my cores for days… but I do see it pretty regularly at work. Let me see if your code at least works reliably on my core at home.

dermotos · February 4, 2014, 7:09am

Hey @BDub, thanks for the offer. I’ve just put together a simple example which also exhibits the error.

Here’s a simple node.js server to print to the console each time a message is received from the core:

Server code

var dgram = require("dgram");
var udpServer = dgram.createSocket("udp4");
udpServer.on("message", function(msg,rinfo){
	console.log(msg.toString());
});

udpServer.bind(9999);
console.info("Test server listening for UDP packets on port 9999");

If you haven’t used node before, just install it from http://nodejs.org/download/ , save the above to test.js then run node test in the directory of the js file from the terminal

Core code

Change the SERVER_IP #define to whatever address the above server is running on.

#include "application.h"

#define SERVER_IP 10,0,0,5
//------------------------------
#define SERVER_PORT 9999

UDP udpClient;
IPAddress serverAddress(SERVER_IP);

void setup() {
  pinMode(D7,OUTPUT);
  udpClient.begin(SERVER_PORT);
}

unsigned int lastMillis = 0;

void loop() {
    if(millis() > lastMillis + 60000){
        udpClient.beginPacket(serverAddress,SERVER_PORT);
        udpClient.write("tick");
        udpClient.endPacket();
        lastMillis = millis();
        digitalWrite(D7,HIGH);
        delay(20);
        digitalWrite(D7,LOW);
  }
}

You can see that every 60 seconds, “tick” is sent to the server, which subsequently prints it to the screen. If you change 60000 to 1000 (once a second), the ticks will come through steadily until the CFOD or similar happens, which could be up to several hours.
But if it is left at 60000 maybe one tick will come through, and then no more until a hard reset. The core continues to execute code (D7 LED still flashes once minute/second) and it seems connected to the cloud (cyan breathing) but no UDP traffic.

Its like the UDP need to be kept alive with traffic, which obviously makes no sense for a connectionless protocol.

bko · February 4, 2014, 12:31pm

I don’t see anything big, but I do see a few small things that are worth trying.

I noticed that udpClient.write has two signatures, one byte or n-bytes and a length. Your code is not really using either of those. Maybe you could try changing the write to:

updClient.write("tick",4);

I would also change 60000 to 60000UL and declare lastMillis as unsigned long to match millis().

I have UDP code I am working on that hits a NTP server over UDP every few minutes and it stay up for a long time, at least overnight.

BDub · February 4, 2014, 3:22pm

Are you kidding me, if I could marry NODE I would! I had the server code up and running in 5 minutes, including walking down in the basement to my server and forwarding the port. Was planning on trying from work... then left for work. Would have been nice if I looked at the code and added some kind of echo.

Got here, and see the same issue bko already pointed out:

unsigned int lastMillis = 0;

if(millis() > lastMillis + 60000) {
   ...
   lastMillis = millis();
}

millis() returns uint32_t, but unsigned int has a range of 0 to 65,535, so your if() statement should quickly start always being true... however when I test it, it seems to fire off every 60 seconds which is strange. I've accidentally made this same mistake and what usually happens is millis() gets larger than what you are comparing it to and it runs every time. Not sure what's going on here exactly.

Regardless, try bko's suggestion and see if your situation changes

BTW: @Dave the Sparkulator does not the following code:

#define SERVER_IP 25.1.45.123
IPAddress serverAddress(SERVER_IP);

the_user_app.cpp:8:25: error: too many decimal points in number
make: *** [the_user_app.o] Error 1

Dave · February 4, 2014, 3:38pm

Well, that’s because IPAddress has these constructors:

IPAddress(uint8_t first_octet, uint8_t second_octet, uint8_t third_octet, uint8_t fourth_octet);
IPAddress(uint32_t address);
IPAddress(const uint8_t *address);

a string constructor would be really nice though, imho, then you could do “a.b.c.d” instead of storing the octets separately

BDub · February 4, 2014, 3:45pm

@Dave don’t forget:

virtual size_t printTo(Print& p) const;

size_t IPAddress::printTo(Print& p) const
{
	size_t n = 0;
	for (int i =0; i < 3; i++)
	{
		n += p.print(_address[i], DEC);
		n += p.print('.');
	}
	n += p.print(_address[3], DEC);
	return n;
}

There are more as well…

EDIT: never mind!

david_s5 · February 4, 2014, 4:21pm

The original had commas 10,0,0,5… - with my eyes they are getting hard to see

david_s5 · February 4, 2014, 5:20pm

@Dave @zachary @BDub I have built and deployed the following from the spark.io build

#define SERVER_IP 10,10,0,1
//------------------------------
#define SERVER_PORT 9999
#define WAIT_TIME 60000

UDP udpClient;
IPAddress serverAddress(SERVER_IP);

void setup() {
  pinMode(D2,OUTPUT);
  udpClient.begin(SERVER_PORT);
  Serial1.begin(115200);
}

unsigned int lastMillis = (unsigned int) -WAIT_TIME;

void loop() {
    if(millis() > lastMillis + WAIT_TIME){
        udpClient.beginPacket(serverAddress,SERVER_PORT);
        udpClient.write("tick");
        udpClient.endPacket();
        lastMillis = millis();
        Serial1.print(lastMillis);
        Serial1.println(" tick ************");
        digitalWrite(D2,HIGH);
        delay(20);
        digitalWrite(D2,LOW);
  }

and can confirm only one tick is seen

I have also built this with the davids5/spark repos on git hub and can confirm it works as expected.

BDub · February 4, 2014, 5:26pm

Too right, I just typed in my IP like a good ip should be typed. Lol.

So how's that Print constructor supposed to work then? Seems like it's destined for normal IP as a string.

Dave · February 4, 2014, 5:42pm

Hi @david_s5,

This is awesome! Sorry about the delay in merging these fixes in, both the Zach’s have been traveling, and I want to give @zachary a chance to review. I think he will have a chance to review and merge pull requests later this week or next week. I’m really looking forward to getting these fixes merged in.

Thank you @david_s5!

david_s5 · February 4, 2014, 6:28pm

The issues that @dermotos is having can be fixed with the addition of

unsigned long millis(void); to his code

unsigned int lastMillis = 0;
**unsigned long millis(void);** 
void loop() {

millis() is not defined consistently and I wrapped it in extern "C" in in commit 06c2fe21 david_s5/core-firmware.

said with a wisper…listen to the warnings…

david_s5 · February 4, 2014, 6:33pm

it is just good-ol macro subsitution

a comma b comma c comma d (spelled it out so I can see it )

is just 4 bytes - it is more light weight then a string.

BDub · February 4, 2014, 8:25pm

No I mean the Print constructor I posted… I was thinking it sucks in something like “255.255.255.255” and assigns the 255’s to _address[0] - _address[4]; but it’s just printing out the IP address… derp.

dermotos · February 4, 2014, 11:19pm

The unsigned int was just a typo really when making the example app. My actual use case is a button press that triggers a udp packet send. If I didn't press the button for a few minutes, it wouldn't work. However if I had keep-alive style packets being sent every few seconds, then the button press would send its packet successfully whenever pressed.

I've tried that but it doesn't make any difference.

I've pulled the latest from git (the core-firmware repo) and tried again but the issue remains.

david_s5 · February 4, 2014, 11:21pm

Also the inactivity timeout is set at 60 Seconds. This will result in the socket being closed.

So set you time shy of the inactivity timeout.

@Dave If it is not in there, we might want to add the the network timeout to the network object’s API

The spark/master is defaulted to 60 sec, davids5/master is 32 Sec

david_s5 · February 4, 2014, 11:24pm

I would say the UDP object is going to need some more work.

I am not familiar with the typical arduio use case for udp. So I do not know what behavior is expected by the arduio community.

If you do, and describe it to me i can mod the UDP object to do what is expected.

Just keep in mind that in my repos sendTo will return the actual sent length. if it is -1 you will need to do a stop/begin again. I believe that the original TI code and thus spar/core-firmware repo
will not return the -1.

dermotos · February 4, 2014, 11:49pm

If set at 60 seconds, I receive one "tick" and no more.
When set at 55 seconds, I receive two "tick" messages, then radio silence
When set at 30 seconds, it seems to run as expected indefinitely (or at least for the last 20 minutes)

david_s5 · February 4, 2014, 11:51pm

check the call Set_NetApp_Timeout in spark_wan.cpp what is the value of aucInactivity?

the other thing would be

in loop()
…
udpClient.begin(SERVER_PORT);
udpClient.beginPacket(serverAddress,SERVER_PORT);
udpClient.write(“tick”);
udpClient.endPacket();
lastMillis = millis();
digitalWrite(D2,HIGH);
delay(20);
digitalWrite(D2,LOW);
udpClient.stop();

dermotos · February 5, 2014, 12:31am

unsigned long aucInactivity = 60;

I've tried this and it seems to work, but I would imagine that means I cannot receive messages over UDP if I stop it? I'll have to run a test later tonight (11am/Sydney here).

Interestingly, if I remove the udpClient.stop() call, and just called udpClient.begin(SERVER_PORT) before each send, using a 60 second delay, it seems to be working as expected.

void loop() {
    if(millis() > lastMillis + 60000){
        udpClient.begin(SERVER_PORT);
        udpClient.beginPacket(serverAddress,SERVER_PORT);
        udpClient.write("tick");
        udpClient.endPacket();
        lastMillis = millis();
         digitalWrite(D7,HIGH);
        delay(20);
        digitalWrite(D7,LOW);
        //udpClient.stop();
  }
}

Topic		Replies	Views
Simple UDP program breaks the core Troubleshooting	89	15034	October 9, 2014
Bug bounty: OTA flash reliability issue	68	6976	July 16, 2014
Spark Core no longer communicating with the cloud Troubleshooting	9	1308	April 7, 2015
Unreliable UDP: crashes/freezes when sending at high frequency Troubleshooting	28	10732	June 5, 2014
Problem with connection Troubleshooting	15	2386	August 29, 2014

Strange UDP bug

Related topics