User code sometimes blocked by cc3000 in manual mode?

I try in good faith to read between your lines: Are you saying that the Core is operating to spec? Where does it say user code stops and never runs again should the Cloud disappear?

I thought the problem had already been well described and understood and that Spark HQ acknowledged it and said firstly it was an unavoidable issue and later said it could be fixed but not by them, sorry.

Bug report (and not a new one): It matters not one iota which of the three modes you are in. If you are connected to the Cloud and then (for example) you disconnect your router from the WAN, thus disconnecting the Core from the Cloud, the Core stops running user code.


Hi @psb777,

I don’t want to hijack this thread, so it might be worth moving to a discussion on system modes, but the automatic mode is meant to simplify network programming. If the connection is lost, it doesn’t run the loop until it’s back. This isn’t true in manual mode. For automatic mode, the documentation says:

If the connection to the Cloud is ever lost, the Core will automatically attempt to reconnect. This re-connection will block from a few milliseconds up to 8 seconds.

I think the 8 seconds here is misleading, since automatic mode will block until it gets the connection back. This behavior is separate from any cc3000 sync / async driver issues, and separate from the tcpserver / client discussion above. :slight_smile:

In manual mode it lets you have full control over what’s blocking and when, whenever possible, and will keep running your code even when the cloud connection is lost. The current CC3000 driver does sometimes block when joining a network, and that’s something that would need to be fixed with an asynchronous driver for that module, which is a much more complex task.

edit: If you’re running an app in manual mode, and it’s failing when the internet goes away, I’d be really interested in seeing that / discussing that, since I don’t think that’s intended behavior.

Thanks!
David

There have been numerous topics which have dealt with this issue, some of them with entirely sensible and relevant topic titles. I will of course follow you, @dave, if you fork this thread.

But the behaviour which, in your last post, surprises you has been experienced and reported over months by several users, and has been acknowledged by several others, including some speaking on behalf of Spark HQ.

It seems as though there cannot be enough repetition: Whatever mode you’re in, if you’re connected to the Cloud and the Cloud disappears [and even if WiFi remains] user code stops. Dead.

@bko: A simple search does it. This is the first of several relevant hits: Semi-Automatic Mode Question - General - Particle. @bdub, @zach and @dave participated in that thread.

Hi @psb777,

Ah! That’s super helpful, thank you! Based on that thread, it seems like there are two obvious issues, one easy to fix, and one less easy:

1.) How many times the core tries to reconnect / how long it waits is important, and should be settable / controllable

2.) Some cc3000 operations seem to be blocking, and could be fixed with an asynchronous driver

– based on some other threads, it seems like sometimes the cc3000 module becomes unresponsive. There has been a more recent patch from TI, 1.14, that might be helpful, but maybe we can catalog some failure cases and put pressure on TI to help fix those, since we don’t have access to that code base. Would that be helpful?
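
For point (1), here is a rough sketch of what a settable reconnect policy could look like. This is only an illustration in plain C++: `ReconnectPolicy`, `ReconnectState` and `shouldAttempt` are made-up names, nothing like them exists in the firmware, and the time value is assumed to come from something like `millis()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy object: how often and how many times to retry.
struct ReconnectPolicy {
    int maxAttempts;             // give up after this many attempts
    std::uint32_t retryDelayMs;  // minimum gap between attempts
};

// Hypothetical bookkeeping kept between calls.
struct ReconnectState {
    int attempts = 0;
    std::uint32_t lastAttemptMs = 0;
};

// Returns true if a reconnect attempt should be made at time nowMs,
// and records the attempt. Never blocks; the caller decides what to do.
bool shouldAttempt(const ReconnectPolicy& policy, ReconnectState& state,
                   std::uint32_t nowMs) {
    assert(policy.maxAttempts >= 0);
    if (state.attempts >= policy.maxAttempts) {
        return false;  // attempt budget used up
    }
    // Unsigned subtraction stays correct if the millisecond counter wraps.
    if (state.attempts > 0 &&
        nowMs - state.lastAttemptMs < policy.retryDelayMs) {
        return false;  // too soon since the previous attempt
    }
    ++state.attempts;
    state.lastAttemptMs = nowMs;
    return true;
}
```

The idea is that the user loop keeps running between attempts, and user code could reset the state once it decides to start trying again.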

Thanks,
David

In another thread @kennethlimcp is testing that recent patch. I have tried my best to help steer that testing - trying to explain to him what exactly UDP must behave like - but I cannot participate - I compile in the cloud. He seems to report better behaviour with UDP. Everything has gone quiet recently.

Another user, @wlee, reports in the same thread that TCP in 1.14 is buggy.

@Dave, it seems to me that addressing your point (1) would make all the difference. The CC3000 blocks on difficulties connecting to the network. The Cloud is but a TCP socket. That can come and go, be closed and restarted, without blocking. Or ought to, and (1) would address that, as described by @Zach.

@zach seems to be merely encouraging you to dig into the open-source project and fix it to your liking. If you do come up with a good solution, send a pull request and I am sure Spark would be grateful.

I honestly think you are misunderstanding him and I don’t see anything there about a well understood problem that has a simple fix. On the contrary, it seems to me that he is saying very politely that if you want, you can dig in and try to fix the problem to your liking.

I don’t have more to say on this topic and will bow out now.


We disagree about several things. But thank you.

As I understand it the Cloud connect and disconnect calls need not be blocking. It is establishing the WiFi connection which is unavoidably blocking. If I understand this correctly then that is fortunate - there is no technical reason for me to go forth and multiply. As long as the LAN stays up - and it does for weeks and weeks on end - the Cloud ought to be able to come and go without blocking user code.

However, it seems Spark HQ are not going to work on this themselves.

I cannot do as @bko suggests and fix it for myself - I lack that level of knowledge if not skill. All I can do is point out (a) that were Spark HQ to choose to address the problem, @Zach says they could - it's possible; (b) that Spark HQ choose not to address the issue; and (c) that this will not be fixed by a trickle down from the Photon, as this is precisely where the two products differ - there will not be shared code in this area.

There seems to be a hint a bounty might be offered for a fix :smile:

I’m in a similar boat to @psb777

In my case, I can’t use the Spark Core until this issue is resolved. It’s just too much of a brand and safety risk to use a processor that can block indefinitely. I don’t have the time or skill to fix the issue myself. And I can’t wait till March. I need a solution in January.

If my next round of funding comes in soon I would definitely consider funding someone to fix this bug. Funding a bug fix will be cheaper for me than swapping to another product (which I don’t want to do).

In my case a relatively simplistic “fix” is all I need. The simple fix that would satisfy me is documented here: https://community.spark.io/t/semi-automatic-mode-question/7855/21


@philipq, @psb777, I’m pretty sure that the bug report here, and the pull request here may be a fix for the issue. It’s not the cc3000 code that is blocking, it’s the main loop not returning control to the user loop. Sorry this isn’t more detailed, I’m on my phone :slight_smile:

Edit: in detail, if you are building locally, the fix above should work. I’ve been running it for several days without issue, and when the WiFi or the cloud is lost, the user code keeps running. The basic issue is that the main while loop has an if that checks whether the cloud is supposed to be connected and whether it actually is. When you call Spark.connect(), the flag to connect gets set, and if the next WLAN loop doesn’t connect to the cloud, the user loop gets bypassed until it does and the SPARK_CLOUD_CONNECTED flag gets set. An alternative if building on the web, though probably not as robust, is to add something like this at the very end of the loop function:

if (!SPARK_CLOUD_CONNECTED)
    SPARK_CLOUD_CONNECT = false;
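
For anyone trying to picture the gate described above, here is a minimal model of it in plain C++ that runs off-device. It is a sketch under assumptions, not the firmware's code: `CoreModel`, its fields and `userRunsWhileUnreachable` are names invented for illustration, and the real main loop does more than this.

```cpp
#include <cassert>

// Simplified model of the gate. Every name here is a stand-in chosen
// for illustration, not a real firmware symbol.
struct CoreModel {
    bool cloudWanted = false;     // plays the role of SPARK_CLOUD_CONNECT
    bool cloudConnected = false;  // plays the role of SPARK_CLOUD_CONNECTED
    bool cloudReachable = false;  // can a connection attempt succeed?
    int userLoopRuns = 0;

    // One pass of the main loop.
    void mainLoopPass() {
        // Background step: a wanted connection succeeds only if reachable.
        cloudConnected = cloudWanted && cloudReachable;
        // The gate: skip user code while a connection is wanted but absent.
        if (cloudWanted && !cloudConnected) {
            return;
        }
        ++userLoopRuns;
    }
};

// Counts user-loop runs over `passes` main-loop passes with the cloud
// unreachable, with or without a connection being requested.
int userRunsWhileUnreachable(bool cloudWanted, int passes) {
    assert(passes >= 0);
    CoreModel core;
    core.cloudWanted = cloudWanted;
    for (int i = 0; i < passes; ++i) {
        core.mainLoopPass();
    }
    return core.userLoopRuns;
}
```

In this model the user loop never runs while a connection is requested and the cloud is unreachable, and it runs on every pass once the request flag is clear, which is the effect the workaround is after.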

I agree. I've seen the recent discussion on github too. There has been a welcome reversal on this issue. Next we must renew the call for a UDP packet boundary fix.

This is a nice fix! Although I would urge people not to use the internal flags, since these are not part of the stable API and may be removed in future. Instead, it's best to stick to the documented functions:

if (!Spark.connected()) 
   Spark.disconnect();

What this says is, if the connection to the cloud is dropped (for whatever reason) don't try to reconnect.

I can envisage a new API

Spark.maintainConnection(true|false)

which would instruct the system whether or not it should attempt to maintain the cloud connection. By default, it would be true for AUTOMATIC and SEMI-AUTOMATIC, false for MANUAL.


@mdma that’s a good point. In addition, I’d point out that this won’t auto-reconnect to the Spark cloud. It is probably advisable to put something like this at the beginning of the user loop to try reconnecting. However, this code may change as the Spark.process()/SPARK_WLAN_Loop debate gets resolved. Again, if you are using the build-locally fix, this shouldn’t be necessary.

if (!Spark.connected()) {
    // This calls for a WiFi.connect(), and then sets SPARK_CLOUD_CONNECT = true
    Spark.connect();
    // Try a connect 5 times
    for (int i = 0; i < 5; i++) {
        // This actually checks the flag and connects
        // if called for (could change in future)
        SPARK_WLAN_Loop();
        if (Spark.connected())
            break;
    }
}
// In manual mode, SPARK_WLAN_Loop() doesn't call Spark.process() (at least yet)
if (Spark.connected())
    Spark.process();

@mumblepins thanks so much for your work and fix. Great stuff. Saved my bacon and saved Spark a customer. Again, well done and thanks!


@mumblepins Just to confirm that I understand the behaviour of your fix properly:

  • Assuming the Core is running the standard firmware plus your fix code #366
  • Assuming the Core is running in MANUAL mode
  • If connectivity to the Spark Cloud is lost:
    1. The user loop will continue to be called (nice!)
    2. The Spark Core will continue over time to try to establish connectivity with the Spark Cloud
    3. SPARK_CLOUD_CONNECTED will be false until Spark Cloud connectivity is reestablished

Please confirm that my understanding in points 2 and 3 is correct.
FYI, I intend using 3 to avoid calling Spark.publish before the Spark Cloud connectivity is reestablished.


@philipq, that is all correct. However, I would follow the advice @mdma gave, and use Spark.connected() instead.
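
As a rough illustration of guarding a publish on the documented connection check rather than the internal flag, here is a small sketch that can be run off-device. `FakeCloud` and `publishIfConnected` are invented for this example; on a Core the two calls would go to Spark.connected() and Spark.publish(), and the event name used in testing is arbitrary.

```cpp
#include <cassert>
#include <string>

// Stand-in for the cloud object so the pattern can be run off-device.
struct FakeCloud {
    bool online = false;
    int published = 0;
    bool connected() const { return online; }
    void publish(const std::string&, const std::string&) { ++published; }
};

// Publishes only when the cloud reports a live connection.
// Returns true if the event was handed to publish(), false if skipped.
template <typename Cloud>
bool publishIfConnected(Cloud& cloud, const std::string& name,
                        const std::string& data) {
    assert(!name.empty());  // an event needs a name
    if (!cloud.connected()) {
        return false;  // skip rather than call publish() while offline
    }
    cloud.publish(name, data);
    return true;
}
```

The return value lets the caller decide whether to drop the event or queue it for later.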

Any idea as to when we might expect this fix to make it to compile-in-the-cloud?