Unreliable UDP: crashes/freezes when sending at high frequency

@psb777 I’ve read your discussions about the way UDP is currently implemented in a number of other topics on the Spark forum. I don’t disagree with your grievances, but I’d like to stay on topic and deal with one issue at a time.

I’ve given a specific code sample that is broken in that it will cause the core to freeze or reset. The first order of business is to try and understand the root cause of that.

In addition, there is a specific, fairly simple goal I’d like to achieve: using what we currently have available, can I send UDP packets back and forth at a fairly high frequency, say every 20 ms or less, in a stable manner?

I think discussing the design issues with the UDP implementation again would be somewhat counterproductive and would heavily derail this topic.

Also, with respect to your debating the use of the term stream: arguing over semantics just confuses the issue. Creating a stream of data is clearly one of the most common use cases for UDP; yes, UDP isn’t itself a streaming protocol, but it’s perfectly correct to say you’re creating a stream over/on top of UDP and then discuss said stream/streaming. Yes, that can be confusing when discussing some of the finer details, but for our purposes here I think everyone will follow along fine whichever term you use.

Yes, let’s stay on topic. @sparks have you tried disabling the cloud? With the reproducible UDP CFOD problem I have in another topic (https://community.spark.io/t/udp-cfod-reproducible/4791/last), I did a Spark.disconnect() and UDP stopped causing the freezes etc. Can you alter your program to see if that helps at all?
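Something like this at the top of your sketch (just from memory, untested):

#include "application.h"

void setup() {
	Spark.disconnect();	// drop the cloud connection; Wi-Fi itself stays up
}

void loop() {
	// UDP send/receive as before, now with the cloud out of the picture
}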

Thanks!

@bko thanks for your efforts. I’ve also sometimes been able to run for longish stretches with no delays, but right now I’m looking to run continuously for at least 10 minutes in a reliable manner, and I haven’t seen anything like that without delays. From my understanding of your post, you were only able to run for 52000 packets, so probably 1-2 minutes?

Currently I’ve gotten some improvements on my end. The main thing is that I’m running on a dedicated router with no internet access: there are just the 4 Sparks and my computer, all with static IPs on the LAN. This seems to have dramatically improved the reliability of the Sparks. It doesn’t narrow things down much, but it suggests that other network traffic or network load may be affecting the Sparks.

The stripped-down version of my code looks like this. Basically I consider the delays to be black magic and not a reliable way of dealing with the issue; I’m just trying to keep all the calls spaced out in the hope that it works (see also the millis()-based sketch after the code).

(Disclaimer: I deleted a bunch of stuff, so this code may not compile.)

#include "application.h"
#include "spark_disable_cloud.h"

#define numBytes 36

bool online = false;
bool live = false;

UDP Udp;

byte remoteIP[] = {192, 168, 1, 101};
unsigned int remotePort = 9001;

unsigned int localPort = 9100;

int count = 0;

byte foo[numBytes];	// receive buffer
byte bar[6];		// outgoing packet payload

void setup() {
	// Nothing needed here; loop() waits for Wi-Fi before calling Udp.begin().
}

void loop() {
	if (!online) {
		// Wait until Wi-Fi is up and we have a non-zero local IP
		// before opening the UDP socket.
		IPAddress addr = Network.localIP();

		if (addr[0] != 0 || addr[1] != 0 || addr[2] != 0 || addr[3] != 0) {
			Udp.begin(localPort);
			online = true;
		}

		delay(100);
	} else {
		if (count%3 == 2) { // every third pass: receive; otherwise send
			Udp.parsePacket();

			if (Udp.available() >= numBytes){
				while (Udp.available() > numBytes) {
					Udp.read(); // Burn off extra stuff we might have received
				}

				for (int j = 0; j < numBytes; j++){
					foo[j] = Udp.read();
				}

				// do stuff with foo[]
			}

			delay(5);
		} else {
			// generate new bar[]

			Udp.beginPacket(remoteIP, remotePort);
			Udp.write(bar, 6);			
			Udp.endPacket();

			if (!live) {
				delay(1000);
				live = true;
			} else {
				delay(10);
			}
		}

		count++;
	}
}
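An alternative I’ve been considering (just an untested sketch, not something I’ve verified on the core) is pacing the sends with millis() instead of delay(), so that nothing blocks:

unsigned long lastSend = 0;

void loop() {
	// Send at most one datagram every 20 ms without blocking.
	if (online && millis() - lastSend >= 20) {
		lastSend = millis();
		Udp.beginPacket(remoteIP, remotePort);
		Udp.write(bar, 6);
		Udp.endPacket();
	}
	// The receive path can poll Udp.parsePacket() here on every pass.
}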

I am sorry if you think I have sidetracked the topic, but I addressed an issue you discovered: endPacket() does nothing.
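If endPacket() really is a no-op and each write() goes straight to sendto(), as the debug output later in this thread suggests, the practical consequence is that every write() call emits its own datagram. A sketch of the safe pattern under that assumption (the helper name is mine, purely illustrative):

// Assemble the whole payload first, then hand it to write() in one call,
// so the datagram leaves in one piece.
void sendDatagram(UDP &udp, IPAddress ip, unsigned int port, const byte *data, size_t len) {
	udp.beginPacket(ip, port);
	udp.write(data, len);	// one write() == one sendto() == one datagram
	udp.endPacket();	// reportedly a no-op here; kept for API compatibility
}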

Also, some responses to your initial posting did seem to imply that no UDP datagrams should go missing at all, and that this too was a bug. My initial intention (perhaps I got carried away) was to try to close that down so that the real issue you identify could be concentrated on. I must however say that what UDP is and is not supposed to do is widely misunderstood here, even if not by you, and I am sure you understand you cannot rely on achieving the entirely stable, never-miss-a-beat UDP “streaming” you require, even at the quite slow 20 ms rate you intend.

Of course, the Spark shouldn’t barf or stop, and that is the primary issue. Let’s not get sidetracked by the limitations of UDP, which is by definition an unreliable protocol, or by the topic title :slight_smile:

@SomeFixItDude as I mentioned above, I’m mostly running with #include "spark_disable_cloud.h", which I believe is effectively the same as Spark.disconnect(). I haven’t actually tried Spark.disconnect() specifically.

As per my post above, that, in combination with running on a dedicated LAN with no WAN access, has helped but not eliminated the problem completely.

If you think there’s a difference, I can try Spark.disconnect().

For my project I will sometimes need to run in situations with no internet access, so spark_disable_cloud.h is a must.

@psb777 you’re right, I did bring that up, and in fact in the context of the design issues you’ve mentioned I see how it’s very relevant. Also, granted, my topic title is a little vague (let me see if I can edit that?).

To clarify re “never miss a beat”: I don’t require all the packets to arrive. I’m sending along near real-time accel/gyro data, and it’s fine if some packets are lost; in fact, that’s the reason I’m using UDP. What I cannot live with is the core freezing/crashing. Ideally I would go much faster than 20 ms, but that’s more or less the minimum I can live with.
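For what it’s worth, a sequence counter in the payload (my own addition, not in the code above) would let the receiving side tell ordinary UDP loss apart from a frozen core:

unsigned int seq = 0;

// Prefix each datagram with a 16-bit sequence number. Gaps at the
// receiver mean normal UDP loss (fine); a counter that stops advancing
// means the core froze (the actual bug).
void sendSample() {
	bar[0] = (seq >> 8) & 0xFF;
	bar[1] = seq & 0xFF;
	// bar[2..5] carry the accel/gyro sample as before
	Udp.beginPacket(remoteIP, remotePort);
	Udp.write(bar, 6);
	Udp.endPacket();
	seq++;
}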


Now with a new and improved topic title!


@sparks I compiled and ran your shortened version (it compiles :smile: ) without delays and did a debug build. I don't understand the debug build output, but I do see errors in it. Maybe it will make sense to you or someone else.

0000016135:<DEBUG> set_socket_active_status (837):Sd=0, Status SOCKET_STATUS_ACTIVE
0000016143:<DEBUG> virtual uint8_t UDP::begin(uint16_t) (55):socket=0
0000016151:<DEBUG> virtual uint8_t UDP::begin(uint16_t) (67):bind socket=0
0000016159:<DEBUG> virtual uint8_t UDP::begin(uint16_t) (69):socket=0 bound=1
0000016267:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000e6c, size=6)=6
00000000016279:<ERROR> hci_unsolicited_event_handler (814):isEvent w/Opcode ==0  0x04 0x0f 0x10 0x09 0x00 0x00 0x00 0x00 0x00 0xc6
016278:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000e6c, size=6)=6
00000160000016301:<ERROR> hci_unsolicited_event_handler (814):isEvent w/Opcode ==0  0x04 0x0f 0x10 0x09 0x00 0x00 0x00 0x00 0x00 0xc6
x00 0x00 0xc6
016278:<DEBU (814):isEvent w/Opcode ==0  0x04 0x0f 0x10 0x09 0x00 0x00 0x00 0x00 0x00 0xc6
016312:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000e6c, size=6)=6
00000160000016335:<ERROR> hci_unsolicited_event_handler (814):isEvent w/Opcode ==0  0x04 0x0f 0x10 0x09 0x00 0x00 0x00 0x00 0x00 0xc6
335:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000e6c, size=6)=6
0000031533:<ERROR> HostFlowControlConsumeBuff (144):Timeout waiting on on buffers now 31533 start 11533 elapsed 20000 cc3000__event_timeout_ms 20000
0000031548:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000e6c, size=6)=-1
0000051569:<ERROR> HostFlowControlConsumeBuff (144):Timeout waiting on on buffers now 51569 start 31569 elapsed 20000 cc3000__event_timeout_ms 20000
0000051584:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000e6c, size=6)=-1
0000071596:<ERROR> HostFlowControlConsumeBuff (144):Timeout waiting on on buffers now 71596 start 51596 elapsed 20000 cc3000__event_timeout_ms 20000
0000071611:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000e6c, size=6)=-1

To note: when I ran it, it seemed totally frozen at first, but then these HostFlowControlConsumeBuff timeouts started to spit out periodically.

Also, you said you patched the CC3000; to what version did you patch? The patch programmer here (GitHub - particle-iot-archived/cc3000-patch-programmer: TI Patch Programmer, ported to STM32) brings you up to 1.24; however, 1.28 is available from TI. I can lay out a couple of steps if you want to patch the patcher to get the current one. Hope any of this helps!

Good Luck!


So you are getting an unsolicited and unhandled response from the TI CC3000 that the driver didn't know what to do with. Then it looks like you ran out of buffers in the CC3000.
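Given the sendto(...)=-1 returns in that log, one defensive step (just a sketch, not a verified fix, and assuming the Arduino-style Udp.stop() is available here) would be to check what Udp.write() actually returns and recycle the socket instead of hammering a starved CC3000:

// The log shows UDP::write() passing back the sendto() result:
// the byte count on success, -1 once the CC3000 runs out of buffers.
Udp.beginPacket(remoteIP, remotePort);
size_t sent = Udp.write(bar, 6);
Udp.endPacket();

if (sent != 6) {
	// Buffers exhausted: stop sending, recycle the socket, retry later.
	Udp.stop();
	delay(500);
	Udp.begin(localPort);
}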

Maybe @david_s5 could add some words of wisdom here!
