There are severe problems in the code of the Spark UDP class which, if fixed, would make Spark UDP much better. Not all of the issues are in the CC3000. One of them is that the UDP packet is assembled by UDP.write() and the C sendto() is not called until UDP.endPacket(). Fixing this alone would improve things a whole lot.
As UDP datagrams have a maximum length (all the protocol spec guarantees is 576 bytes), it would be entirely valid AND HIGHLY DESIRABLE for UDP.write() to return an error indicator if this length is exceeded.
But I actually see no need for UDP.beginPacket() and UDP.endPacket(). Better if these were discarded so no false impression is given by their availability: UDP.write() is all that is required, and all it should do is call sendto(). The UDP class would then have no need to do any buffering.
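To make the idea concrete, here is a minimal sketch of such an unbuffered write, assuming the TI host driver's BSD-style sendto(); the member names _sock, _remoteIP, and _remotePort are hypothetical stand-ins for the socket descriptor and destination:

size_t UDP::write(const uint8_t *buffer, size_t size)
{
    if (size > 576) {
        return 0;  // only 576 bytes are guaranteed by the spec; report an error
    }

    // Build the destination address for the CC3000's BSD-style sendto()
    sockaddr to;
    to.sa_family = AF_INET;
    to.sa_data[0] = (_remotePort >> 8) & 0xFF;  // port, network byte order
    to.sa_data[1] = _remotePort & 0xFF;
    to.sa_data[2] = _remoteIP[0];               // IPv4 address, one octet per byte
    to.sa_data[3] = _remoteIP[1];
    to.sa_data[4] = _remoteIP[2];
    to.sa_data[5] = _remoteIP[3];

    // Send the datagram immediately -- no buffering, no endPacket() needed
    int sent = sendto(_sock, buffer, size, 0, &to, sizeof(to));
    return (sent < 0) ? 0 : (size_t)sent;
}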
OK, back on topic! Is there any way to debug the cloud interactions with the Core? Is the cloud code available yet? Let's get the cloud working when using UDP.
With deep update installed, this simple program (almost the above) fails: the user loop stops running in under a minute. Two minutes later, rapid CFOD. Another minute goes by flashing red, then back to CFOD. Rinse and repeat.
Side note: there is nothing listening at 10.0.0.2.
Please note that after you flash this code you can't flash OTA: it rarely starts to take the flash and never finishes when it does. I am updating the title of the problem to encompass more info. Hope someone can help debug this problem.
I feel your pain, but deep update really didn't change the behavior of this code. It failed before deep update and it fails after in the same way for me.
I have a slightly modified version of this test that looks at the return values from beginPacket() and write(), so it does not CFOD, but it does have failures where write() returns -1, indicating failure. Over in the CFOD thread I pointed out that not receiving is what triggers the bug. I changed the test to send NTP packets to a host that responds, and it never fails. I adjusted for packet size, since an NTP packet is 48 bytes. I don't think it is a race condition per se, since I have slowed the rate down to one packet every 2 seconds.
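For reference, here is a rough sketch of that NTP variant (a reconstruction, not my exact code; 129.6.15.28 is a NIST time server, and any NTP host that responds will do):

UDP Udp;
unsigned char ntpBuf[48];  // NTP packets are 48 bytes

void setup() {
    Udp.begin(8888);  // any free local port
}

void loop() {
    memset(ntpBuf, 0, sizeof(ntpBuf));
    ntpBuf[0] = 0b11100011;  // LI=3 (unsynchronized), Version=4, Mode=3 (client)

    if (Udp.beginPacket(IPAddress(129,6,15,28), 123) != 0) {
        Udp.write(ntpBuf, sizeof(ntpBuf));
        Udp.endPacket();
    }

    delay(2000);  // the same one-packet-every-2-seconds pacing

    // The key difference from the failing test: a reply actually arrives
    if (Udp.parsePacket()) {
        Udp.read(ntpBuf, sizeof(ntpBuf));
        // bytes 40-43 hold the server transmit timestamp (seconds since 1900)
    }
}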
I think there is something in the UDP TX host driver that depends on the UDP RX side doing something.
I believe this code runs out of sockets on the TI CC3000, since it always fails after exactly six packets are sent. From that point forward it will send one packet and then fail (catching the failure with return codes), repeating with every other packet failing, since my code does UDP.stop() on getting -1 from UDP.write(), which releases the socket.
I will keep looking at it but we may need to get the Spark team involved.
How do we do that? I don't think this will go anywhere otherwise.
Also, I am guessing "most" users are not using UDP or TCP to transmit data at any high rate? Have you seen, or do you have, example code of anything like that? And when you say-
Are you failing at a 2-second pace or does your code continue to execute? Because one packet every 2 seconds is really slow. This makes me wonder whether anyone is able to pull off transmitting data from the Core without using publish or the other cloud functions. UDP, as you know, does not require anyone to be listening on the other end. Being able to transmit data only through the cloud is a serious problem and keeps people from interfacing with other 3rd parties.
My test code is very similar to yours, but instead of the two delay(200); calls to flash the LED I have delay(2000);, just so I can see what is happening.
void loop() {
    int pin = digitalRead(D0);  // allow OTA flash by pulling D0 up and resetting
    if (HIGH == pin) {
        for (;;) {
            SPARK_WLAN_Loop();
        }
    }

    int beginReady = Udp.beginPacket(IPAddress(10,0,0,100), 9000);
    int wrBytes = 0;
    if (beginReady != 0) {
        wrBytes = Udp.write(TxMsg, TXSIZE);
        Udp.endPacket();
    }
    if (wrBytes == -1) {
        // write() failed: release the socket and reopen it
        Udp.stop();
        Udp.begin(9000);
    }

    digitalWrite(D7, HIGH);
    delay(200);
    digitalWrite(D7, LOW);
    // dump debug data here -- I am using a serial display
    delay(2000);
}
I have similar code that uses an NTP server (one from the pool) over UDP that works great, and the only difference I can find is that it also receives UDP packets.
Here is another good experiment to try: I used the subnet broadcast address (10.0.0.255 in my case) so I would get my own packets back. I then added this to loop() so I would flush any received bytes.
if (Udp.parsePacket()) {
    Udp.flush();
}
I have run over 300 packets through this code so far without any problems.
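For anyone who wants to repeat the experiment, the whole loop looks roughly like this (a sketch; TxMsg and TXSIZE are the same buffer names used in the test code above, and 10.0.0.255 assumes the 10.0.0.x subnet from earlier in the thread):

void loop() {
    // Send to the subnet broadcast address so our own packets come back
    if (Udp.beginPacket(IPAddress(10,0,0,255), 9000) != 0) {
        Udp.write(TxMsg, TXSIZE);
        Udp.endPacket();
    }

    delay(2000);

    // Drain the RX side; this seems to be what keeps the TX side healthy
    if (Udp.parsePacket()) {
        Udp.flush();
    }
}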
Thanks for the examples @bko. I'll give them a try. How do we get the Spark team to actually look at the problem? This is a bug, and it can be reproduced. The implementation somewhere (the host driver?) is broken. I can't be the only person sending data outside the cloud faster than 2-second intervals.
I'm happy to create an issue as well if that's easier; having a solid test case and an open issue makes it easy for us to test for this and fix the problem.
@Dave thank you! It would be great if you could make the issue. Thanks for everyone's efforts as well. I hope to see the Spark Core become a great, stable development tool.
Depends how busy we are; right now we're very busy, trying to hire more people so that we can get to these things faster. I think @towynlin is bringing in some big guns to dig in on the UDP issues.
It looks to me like the latest firmware fixes this issue: my test program that failed after 6 iterations in the past has now run over 50 iterations and is going strong!
That is great news @bko! Hopefully I'll have some time to flash the test app here and give it a go as well. Right now I am flashing my other program, the one that dies relatively fast and sends UDP packets roughly once every 60-90 seconds, to see how it does. I'll update with the findings.
I've been extremely busy with #dayjob the last little while, but I'm itching to get back to work on that driver. It's looking very promising; I'll keep you all posted!
@bko my original program still dies. I will tweak it to use the return values as you suggested and see how it does. I also tested another program I have that sends to an address every 60-90 seconds. It survived almost 24 hours but has died as well.
So I don't think this was resolved at all. It appears to be in exactly the state it was before.