Reliably responding to input

In my testing, there seems to be no way to respond to input while a network connection is being negotiated. Expecting a flawless connection is unrealistic, so the result is a flaky experience for anyone providing input.

  • Am I correct in this limitation?
  • How are people currently getting around this issue?
  • Is this something that’s on the roadmap?

Here’s some test code that uses a hardware interrupt with a switch on D4 to control the D7 LED. Switching back and forth quickly exposes the issue.

The entire cloud connection API is changing soon, so you might like the new flavors better.

The notion that Spark should run setup() and loop() regardless of the cloud connection is captured here:

I am not sure if the notion of interrupt-ability is part of that or not. It would be nice if the cloud connection start-up were non-blocking, but you would then have to poll to see if it was done before you used it. Maybe @zachary has some thoughts on this subject.
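To make the polling idea concrete, here is a minimal host-side sketch in plain C++. It is not Spark firmware: `Connection`, `begin()`, `poll()`, `ready()` and `runUntilReady()` are hypothetical names, and the "work" is just a countdown. It only illustrates how a non-blocking start-up would let user code keep running between polls.

```cpp
#include <cassert>

// Hypothetical model of a non-blocking connection: begin() only records the
// request, and each poll() call advances the work by one small step.
enum class State { Idle, Connecting, Ready };

class Connection {
public:
    explicit Connection(int stepsNeeded) : remaining_(stepsNeeded) {
        assert(stepsNeeded > 0);
    }

    // Request a connection; returns immediately.
    void begin() {
        if (state_ == State::Idle) state_ = State::Connecting;
    }

    // Do one bounded slice of work and report the current state.
    State poll() {
        if (state_ == State::Connecting && --remaining_ == 0) {
            state_ = State::Ready;
        }
        return state_;
    }

    bool ready() const { return state_ == State::Ready; }

private:
    State state_ = State::Idle;
    int remaining_;
};

// Runs "user work" on every pass while polling. Returns how many passes of
// user work happened before the connection became ready.
int runUntilReady(Connection& c) {
    int userPasses = 0;
    c.begin();
    while (!c.ready()) {
        ++userPasses;  // user code (e.g. reading a switch) still runs here
        c.poll();
    }
    return userPasses;
}
```

The cost bko mentions is visible in `runUntilReady()`: the caller has to check `ready()` before using the connection.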

Thanks again @bko. I’ll dive some more into github and follow along there.

I’m still trying to understand how/if other people are dealing with this.

Seems like if your device requires input, you can’t use the cloud. There’s always a chance an input came through while the network was being negotiated, and that input will be lost.

I believe this is the github issue related to the network blocking problems.

Unfortunately it sounds like this issue will not be fixed until a new version of the CC3000 comes out and Spark adopts it.

Someone please prove me wrong.

I don't think this is quite right. If your device requires, say, temperature monitoring at a rate somewhere between once a second and once an hour, you are all set with the current architecture.

If your device requires, say, 20 kS/s audio sampling, you are not going to be able to sample that fast consistently with the cloud connected, but you could buffer up samples gathered with the cloud off and then connect and send them (but there might be a lot of them).
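A minimal sketch of the buffer-then-send idea, written as plain host-side C++ rather than firmware. `SampleBuffer` and its methods are hypothetical names, and the overwrite-oldest policy is just one possible choice when the buffer fills up before the connection comes back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity buffer: samples are stored while offline and
// drained once a connection is available. When the buffer is full, the oldest
// sample is overwritten and counted as dropped.
class SampleBuffer {
public:
    explicit SampleBuffer(std::size_t capacity) : data_(capacity) {
        assert(capacity > 0);
    }

    void push(std::uint16_t sample) {
        data_[(head_ + count_) % data_.size()] = sample;
        if (count_ < data_.size()) {
            ++count_;
        } else {
            head_ = (head_ + 1) % data_.size();  // oldest sample was overwritten
            ++dropped_;
        }
    }

    // Removes the oldest sample into `out`; returns false when empty.
    bool pop(std::uint16_t& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % data_.size();
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }
    std::size_t dropped() const { return dropped_; }

private:
    std::vector<std::uint16_t> data_;
    std::size_t head_ = 0;     // index of the oldest stored sample
    std::size_t count_ = 0;    // number of stored samples
    std::size_t dropped_ = 0;  // samples lost to overwriting
};
```

On a device with little RAM the capacity would have to be small, which is why "there might be a lot of them" matters in practice.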

One way to think about this is that the cloud via Spark.publish() or Spark.variable() can provide around 64 bytes per second of sustained throughput. If your application requires more than that, I would consider using your own web host to receive the data instead.
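To put that figure next to the audio example: assuming 16-bit samples (the sample width is my assumption, not something stated above), 20 kS/s produces 40,000 bytes per second, so a single second of audio would take roughly 625 seconds to send at 64 bytes per second. The helper names below are hypothetical; the sketch only does the arithmetic.

```cpp
#include <cassert>

// Bytes produced per second by a sampler.
constexpr long bytesPerSecond(long samplesPerSecond, long bytesPerSample) {
    return samplesPerSecond * bytesPerSample;
}

// Whole seconds needed to send `bytes` over a link that sustains
// `linkBytesPerSecond`, rounded up.
long secondsToSend(long bytes, long linkBytesPerSecond) {
    assert(linkBytesPerSecond > 0);
    return (bytes + linkBytesPerSecond - 1) / linkBytesPerSecond;
}
```

For example, `secondsToSend(bytesPerSecond(20000, 2), 64)` evaluates to 625.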

You can then decide if the cloud brings other value to your application or is just getting in your way. It is really great that you have that control and choice!

@bko thanks again for sharing your thoughts.

I just ran a test by loading my testing app and disconnecting my router from the internet. It appears the networking loop never returns control to my loop; it just cycles continually, trying to connect.

So any input made while the Spark is not connected to the cloud is lost. (Please correct me if I’m wrong.)

If you’re monitoring temperature over an hour you’ll probably be ok. But if there’s any reason you can’t reach the cloud, there’s no fallback. I can’t gracefully notify a user or store data locally for transfer later. It’s essentially a brick until the cloud returns.

And in the case of a user pushing a button and expecting an immediate response, you’re screwed without a perfect connection.

I guess there are some uses where accepting input most of the time is fine, but the ability to fall back and interrupt if the network is having issues seems critical to me and my projects.

@chap @bko We just deployed today the result of our spec’d github issue Controlling the Connection. Documentation forthcoming this week, probably tomorrow.

Wrote and then deleted a bunch of stuff because I misunderstood your issue…

Actually, mostly never mind there—I see you’re already using the workaround spark_disable_cloud.h. You no longer need that as of today. Use SYSTEM_MODE(SEMI_AUTOMATIC) instead. However, I see that’s not what you’re asking about.

OK, switching to more in-depth firmware explanation mode.

When you call Spark.connect(), all you’re doing is setting a flag. That returns immediately.

After your loop finishes, we go handle all the background Cloud stuff in SPARK_WLAN_Loop.

If you need to connect to Wi-Fi, we do so here, which calls WiFi.connect(). This is also really fast because it’s asynchronous, sending some messages to the CC3000 and returning. If the Wi-Fi module is not even on, then wlan_start can take some milliseconds, but basically you won’t notice this.

Now your code goes back to running.

Later, when the CC3000 successfully gets an IP address from the router, we get an asynchronous event, which sets some flags and turns the LED green.

Remember that SPARK_CLOUD_CONNECT flag we set a while back? Now, after your loop runs and we go back to the background stuff, we now actually get past this return statement and because WLAN_DHCP is set, we get to Spark_Connect() defined here.

After some quick initial work to read the server address off the external flash chip, we actually open up the TCP socket to the Spark Cloud here, with the call to connect (TI doxygen). As you can see in the code surrounding the connect call, we’ve implemented a watchdog that will kill the attempt to connect if it takes too long. Over in the core-common-lib repo, you can see this is set to 8 seconds. If you’re building locally, adjust as you see fit.
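As a rough illustration of bounding a connection attempt, here is a host-side C++ sketch. It is not how the firmware's watchdog is implemented (that one kills a blocking call); this version polls against a deadline instead, and `connectWithTimeout`, `attempt` and `nowMs` are hypothetical names. The clock is passed in so that the behavior can be checked without real waiting.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of a timed connect attempt. `attempt` does one slice of
// work and returns true once connected; `nowMs` returns a millisecond clock.
bool connectWithTimeout(const std::function<bool()>& attempt,
                        const std::function<unsigned long()>& nowMs,
                        unsigned long timeoutMs) {
    assert(timeoutMs > 0);
    const unsigned long start = nowMs();
    while (nowMs() - start < timeoutMs) {
        if (attempt()) return true;  // connected within the deadline
    }
    return false;  // gave up, so the caller gets control back
}
```

With `timeoutMs` set to 8000 this mirrors the 8 second limit mentioned above: the worst case is bounded, but the caller is still held up for the whole wait.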

Assuming the TCP socket got opened successfully (potentially 7.99 seconds later, but obviously usually much faster), then we attempt the secure handshake with the Cloud in Spark_Handshake() defined here.

This is where we drop out to the core-communication-lib here to do some encryption and send a handful of messages back and forth, all of which must complete before we head back to your loop.
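The sequence described so far can be summarized as a small host-side model in C++. This is only a sketch of the ordering, not the firmware: the `Core` struct, its flags and its method names are hypothetical and only loosely mirror the real ones, and the "blocking" steps are just log entries.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the connection sequence. Names are illustrative.
struct Core {
    bool cloudConnectRequested = false;  // set by connect(), which returns at once
    bool wifiStarted = false;            // asynchronous Wi-Fi connect was kicked off
    bool dhcpDone = false;               // set later by an asynchronous event
    bool cloudReady = false;
    std::vector<std::string> log;

    void connect() { cloudConnectRequested = true; }
    void onDhcpEvent() { dhcpDone = true; }

    // Runs after each pass of the user loop.
    void background() {
        if (!cloudConnectRequested || cloudReady) return;
        if (!wifiStarted) {
            wifiStarted = true;
            log.push_back("wifi: start (fast, asynchronous)");
            return;  // user loop runs again while Wi-Fi comes up
        }
        if (!dhcpDone) return;  // still waiting for an IP address

        // Invariant: the blocking steps only happen once Wi-Fi has an address.
        assert(wifiStarted && dhcpDone);
        log.push_back("tcp: connect (blocking, watchdog-limited)");
        log.push_back("cloud: handshake (blocking)");
        cloudReady = true;
    }
};
```

In this model the user loop is only held up in the final `background()` call, where the TCP connect and handshake happen back to back.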

If you (and ten other capable people!) want to measure how long each of these steps takes, I’d be curious to hear your results. I suspect that the CC3000’s connect varies a great deal and is usually the longest blocking call. I suspect that the communication library handshake, including the encryption, is much more consistent. It would vary a bit due to internet connection speed, but we’re only sending two messages each direction (plus TCP ACKs of course), for a total of 698 payload bytes + IP and ethernet frames.
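For anyone who wants to take those measurements, the general shape is to read a monotonic clock before and after each step. The sketch below is host-side C++ using `std::chrono`; `timeStepMicros` is a hypothetical helper, and on the Core itself you would presumably take a `millis()` difference around the call you are interested in instead.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs one step and returns the elapsed time in microseconds, measured with a
// monotonic clock so that wall-clock adjustments do not affect the result.
long long timeStepMicros(const std::function<void()>& step) {
    assert(step);  // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    step();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
        .count();
}
```

Timing the connect and the handshake separately, over many runs, would show whether the connect really is the step that varies most.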

And if you have ideas for improvements, we are, as always, all ears. :smile_cat:

I hope this helps you understand all the states during which you could be receiving input, when your loop would be running, and when it will be blocked by other work. Cheers!

@zachary thanks so much for your detailed explanation. I’m looking forward to experimenting with the firmware improvements and looking through the changes.

However, I’m still disappointed that an interrupt has the possibility of being ignored because of a blocking networking call. Just a simple demo of switching an LED with a button becomes as reliable as your cloud connection.

I’m coming to this from a background in writing software for the web, so I have experience with all kinds of reasons a connection might fail. Hell, a naive DNS change could mean hours of downtime. (Perhaps this will excuse some of my ignorance on the electronics and low-level firmware side.)

But it frightens me to think of shipping a physical product where the user experience is so dependent on a flawless network connection.

Having said that, I’m really impressed with everything you guys are doing. Love the fact we can look at Github and see what’s coming down the pipe. The community here is great and the build tools and documentation are incredibly well done.

I’m sure this issue will be resolved at some point, but it confuses me that nobody else seems to be as concerned about it as me. Again, I wonder if I’m missing something…

@zachary - are user interrupts actually disabled at any point in the process you describe?
I can’t see noInterrupts() used anywhere in the firmware but I do see some irq toggling in core-common-lib.

Not that I know of. I think a user interrupt could happen at any time.

However, user interrupts are lower priority than “system” interrupts.
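If user interrupts do keep firing while the loop is blocked, a common way to avoid losing input is to have the handler only record the event and let the main loop act on it whenever it next runs. Below is a minimal host-side C++ model of that pattern. `onSwitchChange` and `takeChanges` are hypothetical names; on the Core the handler would be registered with `attachInterrupt()` and the shared variables would typically be `volatile` rather than `std::atomic`. The response is deferred, not immediate, so this does not remove the delay itself.

```cpp
#include <atomic>
#include <cassert>

// State shared between the interrupt handler and the main loop.
std::atomic<unsigned> pendingChanges{0};
std::atomic<int> lastLevel{0};

// Stand-in for the interrupt handler: record the event and return at once.
void onSwitchChange(int level) {
    assert(level == 0 || level == 1);  // host-side sanity check only
    lastLevel.store(level, std::memory_order_relaxed);
    pendingChanges.fetch_add(1, std::memory_order_relaxed);
}

// Called from the main loop whenever it next runs. Returns the number of
// changes seen since the last call and writes the latest level to `level`.
unsigned takeChanges(int& level) {
    level = lastLevel.load(std::memory_order_relaxed);
    return pendingChanges.exchange(0, std::memory_order_relaxed);
}
```

With this shape, a switch flipped three times during a blocked period shows up as three pending changes plus the final level once the loop resumes.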