Simple UDP program breaks the core

There are severe problems in the code of the Spark UDP class which, if fixed, would make Spark UDP much better. Not all the issues are in the CC3000. One of them is that the UDP packet is assembled by UDP.write() and the C sendto() is not called until UDP.endPacket(). Fixing this one issue alone would improve things a great deal.

As UDP datagrams have a practical maximum length (the only size the protocol specs guarantee every host will accept is 576 bytes), it would be entirely valid AND HIGHLY DESIRABLE for UDP.write() to return an error indicator if this length is exceeded.

But I actually see no need for UDP.startPacket() and UDP.endPacket(). Better that these be discarded so their availability gives no false impression: UDP.write() is all that is required. All UDP.write() should do is call sendto(); the UDP class then has no need to do any buffering.
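To illustrate the idea, here is a rough host-side sketch of what a buffer-free write() could look like: one call, one sendto(), with the oversize check up front. This is POSIX C++, not the actual Spark firmware; `udp_write_direct` and the 576-byte limit are illustrative assumptions, not Spark API.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// The minimum datagram size every host is required to accept (RFC 791).
static const size_t kMaxSafeDatagram = 576;

// Hypothetical direct-send write(): no buffering, one sendto() per call.
// Returns bytes sent, or -1 on an oversized payload or socket error.
int udp_write_direct(const char *host, uint16_t port,
                     const uint8_t *buf, size_t len) {
    if (len > kMaxSafeDatagram)
        return -1;                      // reject oversized datagrams up front
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;
    sockaddr_in dest;
    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    inet_pton(AF_INET, host, &dest.sin_addr);
    int sent = (int)sendto(s, buf, len, 0,
                           (sockaddr *)&dest, sizeof(dest));
    close(s);
    return sent;
}
```

Note that the send succeeds whether or not anything is listening at the destination, which is exactly the fire-and-forget behavior UDP promises.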

I’m not disputing that, but we have no direct control over the internal buffer reuse scheme in the CC3000.

AFAIK, fixing UDP in the :spark: firmware is on the list, the problems are well documented and understood.

OK, back on topic! Is there any way to debug the cloud interactions with the core? Is the cloud code available yet? Let's get the cloud working when using UDP :smile:

-Bri


Heya @SomeFixItDude,

Local cloud is coming this summer, and I’d love any help or feedback :slight_smile:

The local cloud beta list thread is here: https://community.spark.io/t/where-is-the-source-code-for-the-cloud/1381

Thanks,
David


With deep update installed, this simple program (almost identical to the above) makes the user loop stop running in under a minute. Two minutes later, rapid CFOD. Another minute goes by flashing red, then back to CFOD. Rinse and repeat.

UDP Udp;

unsigned char TxMsg[12] = { 1, 254, 1, 254, 1, 254, 43, 212, 71, 184, 3, 252 };
unsigned char recbuf[12];


void setup() {
    Udp.begin(9000);
    pinMode(D7, OUTPUT);
    digitalWrite(D7, LOW);
}

void loop() {
    int32_t packetLen = Udp.parsePacket();
    
    //Dump any packets RX
    while (packetLen > 0) {
        Udp.read(recbuf, packetLen >= 12 ? 12 : packetLen);
        packetLen = Udp.parsePacket();
    }
    
    Udp.beginPacket(IPAddress(10,0,0,2), 9000);
    Udp.write(TxMsg, 12);
    Udp.endPacket();

    digitalWrite(D7, HIGH);
    delay(200);
    digitalWrite(D7, LOW);
    delay(200);
}

Side note, there is nothing listening at 10.0.0.2.

Please note that after you flash this code, you can't flash OTA: it rarely starts to take the flash and never finishes. I am updating the title of the problem to cover more of the issue. Hope someone can help debug this problem.

Thanks


Hi @SomeFixItDude

I feel your pain but deep update really didn’t change the behavior of this code. It failed before deep update and it fails after in the same way for me.

I have a slightly modified version of this test that looks at the return values from beginPacket() and write(), so it does not CFOD, but write() does sometimes return -1, indicating failure. Over in the CFOD thread I pointed out that not receiving is what triggers the bug. I changed the test to send NTP packets to a host that responds, and it never fails; I adjusted for packet size since NTP is 48 bytes. I don't think it is a race condition per se, since I have slowed the rate down to one packet every 2 seconds.

I think there is something in the UDP TX host driver that depends on the UDP RX side doing something.

I believe this code runs out of sockets on the TI CC3000, since it always fails after exactly 6 packets are sent. From that point forward, it sends one packet, then fails (caught via the return codes), and repeats with every other packet failing, since my code calls UDP.stop() on getting -1 from UDP.write(), which releases the socket.
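For reference, the NTP variant that never fails only needs a standard 48-byte client request on the wire. A minimal sketch of building one, assuming SNTP version 3 client mode; `make_ntp_request` is a hypothetical helper, not part of the Spark UDP API:

```cpp
#include <cstdint>
#include <vector>

// Build a minimal 48-byte SNTP client request.
// The first byte packs LI=0 (no warning), VN=3 (version 3), Mode=3 (client);
// the remaining 47 bytes may be zero for a simple request.
std::vector<uint8_t> make_ntp_request() {
    std::vector<uint8_t> pkt(48, 0);
    pkt[0] = 0x1B;  // 0b00011011: LI=0, VN=3, Mode=3
    return pkt;
}
```

The server's 48-byte reply is what gives the receive side something to do, which seems to be what keeps the CC3000 happy in this test.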

I will keep looking at it but we may need to get the Spark team involved.

Thanks @bko

How do we do that? I don't think this will go anywhere otherwise.

Also, I am guessing "most" users are not using UDP or TCP to transmit data at any high rate? Have you seen, or do you have, example code of anything like that? And when you say-

Are you failing at the two-second pace, or does your code continue to execute? Because one packet every 2 seconds is really slow. This makes me wonder whether anyone is able to pull off transmitting data from the core without using publish or the other cloud functions. As you know, UDP does not require anyone to be listening on the other end. Being able to transmit data only through the cloud is a serious problem and keeps people from interfacing with third parties.

Thanks again

My test code is very similar to yours but instead of the two delay(200); calls to flash the LED I have delay(2000); just so I can see what is happening.

void loop() {
    int pin = digitalRead(D0);  //allow OTA flash by pulling D0 up and resetting
    if (HIGH==pin) {
        for(;;) {
            SPARK_WLAN_Loop();
        }
    }

    int beginReady = Udp.beginPacket(IPAddress(10,0,0,100), 9000);
    int wrBytes = 0;
    if (beginReady!=0) {
        wrBytes = Udp.write(TxMsg, TXSIZE);
        Udp.endPacket();
    }
    if (wrBytes==-1) {
        Udp.stop();
        Udp.begin(9000);
    }

    digitalWrite(D7, HIGH);
    delay(200);
    digitalWrite(D7, LOW);
//dump debug data here--I am using a serial display
    delay(2000);
}

I have similar code that uses an NTP server (one from the pool) over UDP that works great, and the only difference I can find is that it also receives UDP packets.

Here is another good experiment to try: I used the subnet broadcast address (10.0.0.255 in my case) so I would get my own packets back. I then added this to loop() to flush any received bytes.

    if (Udp.parsePacket()) {
        Udp.flush();
    }

I have run over 300 packets through this code so far without any problems.
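If you want to compute the directed broadcast address for your own subnet rather than hard-coding it, it is just the network address with all host bits set. A small host-side sketch, assuming an IPv4 dotted-quad address and netmask; `subnet_broadcast` is a hypothetical helper:

```cpp
#include <arpa/inet.h>
#include <cstdint>

// Directed (subnet) broadcast address: the IP with all host bits set to 1.
// Returns the address in host byte order, e.g. 0x0A0000FF for 10.0.0.255.
uint32_t subnet_broadcast(const char *ip, const char *netmask) {
    in_addr addr, mask;
    inet_pton(AF_INET, ip, &addr);
    inet_pton(AF_INET, netmask, &mask);
    // OR-ing in the inverted mask works directly on the network-order words.
    return ntohl(addr.s_addr | ~mask.s_addr);
}
```

For example, 10.0.0.2 with a 255.255.255.0 netmask gives 10.0.0.255, the address used in the experiment above.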

Thanks for the examples, @bko. I'll give them a try. How do we get the Spark team to actually look at the problem? This is a bug, and it can be reproduced. The implementation somewhere (host driver?) is broken. I can't be the only person sending data outside the cloud at faster than 2-second intervals :stuck_out_tongue:

Thanks

@SomeFixItDude would you mind creating an issue on Github? That’s the best way to line this up for the engineering team.


Heya @SomeFixItDude,

I’m happy to create an issue as well if that’s easier, having a solid test case and an open issue makes it easy for us to test for this and fix the problem.

Thanks!
David

@Dave thank you! That would be great if you could make the issue. Thanks for everyone's efforts as well. I hope to see the Spark Core become a great, stable development tool.

Created a starting issue for this here: https://github.com/spark/core-firmware/issues/240

Thanks,
David


How long do issues take to get looked at?

Depends how busy we are; right now we're very busy and trying to hire more people so that we can get to these things faster :slight_smile: I think @towynlin is bringing in some big guns to dig in on the UDP issues.


It looks to me like the latest firmware fixes this issue: my test program that failed after 6 iterations in the past has now run over 50 iterations and is going strong!

Nice job Spark Team!

That is great news @bko ! Hopefully I’ll have some time to flash the test app here and give it a go as well. I am flashing my other program that dies relatively fast as well that sends udp packets roughly once every 60 - 90 seconds right now to see how it does. I’ll update with the findings.

Thanks spark team and community!

@wtfuzz's fix made it in, yay! Here: https://community.spark.io/t/bug-bounty-kill-the-cyan-flash-of-death/1322/509?u=somefixitdude I am sure other users have contributed as well; that is just the one I had my eye on.

Also really psyched about the driver work @wtfuzz has been mentioning: https://community.spark.io/t/bug-bounty-kill-the-cyan-flash-of-death/1322/512?u=somefixitdude Hope to see something come of that as well.

Thanks,
Brian

I've been extremely busy with #dayjob the last little while, but I'm itching to get back to work on that driver. It's looking very promising, I'll keep you all posted!

@bko my original program still dies. I will tweak it to use the return values as you suggested and see how it does. I also tested another program I have that sends to an address every 60 - 90 seconds. It survived almost 24 hours but has died as well.

So I don't think this was resolved at all. It appears to be in the exact same state it was in before.

I’ll add the -1 test and see how it goes.
