I’m getting SOS flashes with this new commit. Using Tinker v0.3.2, and I deleted the entire spark-server and did a fresh git clone before testing this.
Hi @kennethlimcp,
You’d need to clone the spark-protocol code directly from the github master branch, so cloning spark-server wouldn’t do it.
Thanks,
David
Hi @Dave,
I was kind of confused about the usage of spark-server and spark-protocol. Are they independent, or is the protocol part of the server?
I think when I was trying to set up the local server, I cloned and ran only spark-server, but I guess spark-protocol must be a necessary part for my Spark Core to communicate with my local server. However, I didn’t see in any of the tutorials how I should use or install it.
Thanks,
Yan
Hi @FlyingYanz,
Good question, the spark-server module is the API / HTTP side, and the spark-protocol module is the core / device side. One knows how to talk to your code, and the other knows how to talk to your Core. When you run npm install or npm update, the spark-protocol module is brought in automatically.
Thanks,
David
Sorry for my absence. Having a spate of poor health.
EDIT …
@Dave … I just tested with the master branch core-firmware (2014-08-28 NZT) and still have the SOS fault.
I then commented out the call to Multicast_Presence_Announcement() in spark_utilities.cpp (handshake function). Then I tried putting that back but commenting out the 3x for loop inside it.
No change in either case. Still getting the SOS, exactly as before.
Sorry for the bad news.
Bear in mind that, as far as I know, I am the only person working on this, with local compiling ability AND with the fault being very consistent. Others seem to experience it only intermittently. (I have no idea why.)
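For anyone trying to picture what I mean: the suspect pattern is a short burst of UDP multicast sends at the end of the handshake. Below is only a rough, standalone illustration of that pattern using POSIX sockets, not the actual firmware code (which lives in spark_utilities.cpp and uses the CC3000 socket API); the multicast group, port and payload shown here are my assumptions.

```cpp
// Illustration only: approximates the presence-announcement burst with POSIX
// sockets. The real firmware uses the TI CC3000 socket API, and the group,
// port and payload below are assumptions, not taken from the repo.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return 1;

    // Assumed CoAP-style multicast destination (224.0.1.187:5683).
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5683);
    inet_pton(AF_INET, "224.0.1.187", &addr.sin_addr);

    const unsigned char announcement[] = "presence";  // placeholder payload

    // The "3x for loop": three back-to-back sends with no pause in between.
    // This is the loop I tried commenting out.
    for (int i = 0; i < 3; ++i)
    {
        sendto(sock, announcement, sizeof(announcement), 0,
               reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));
    }

    close(sock);
    return 0;
}
```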
Hi @gruvin,
I think the bug I patched, and the suggestion of the Multicast patch indicates a failure scenario where the core is sending too much data too quickly / something is getting overwhelmed. Are you compiling locally, and do you have the DEBUG flag defined? I seem to recall a scenario where the debug information plus fast traffic could cause a crash. Are you up to date on the various branches and building locally, or are you using the Build IDE?
– sorry if these are questions you’ve covered already!
edit: I think @kennethlimcp is also very interested in solving this, as am I!
Thanks,
David
[quote="Dave, post:126, topic:6161"]
I think the bug I patched, and the suggestion of the Multicast patch indicates a failure scenario where the core is sending too much data too quickly / something is getting overwhelmed.[/quote]
Seems reasonable.
Yes.
Not presently. And yes, there was a buffer overflow issue with debug enabled. That has been fixed in the master branch already, though, by @zachary I think.
The tests above were from locally compiled firmware, using the 2014-q2 (latest) version of arm-gcc and the latest master branches of core-firmware, core-communication-lib and core-common-lib.
Remember also the workaround I found, above ...
As far as I can tell, really nothing changed on the handshake protocol side of things between tag:spark_6 and later versions. Thus it seems that the only real change relating to this SOS problem was the compiler version. (tag:spark_7 onwards will not build with that older compiler, due to errors around a missing include file relating to atomic RAM access -- or something.)
So my greatest current suspicion is that there's something going on with atomic RAM access (or the failure to be atomic) introduced by the latest compiler and/or some code somewhere that takes advantage of that new feature. Or that issue could be merely coincidental. But why does the problem suddenly crop up when the later compiler is used (and needed), when there are no apparent changes to the protocols themselves?
I'm beginning to suspect the multicast presence announcement issue means this is related to this thread:
http://community.spark.io/t/simple-udp-program-breaks-the-core/4791/56?u=dave
I have tested using the latest master branch for all 3 repos together with the latest master of spark-server, and it works well.
You might want to update everything and test again, or I will be happy to help troubleshoot.
@Dave, during my test, having a 50µs delay between each sendto() for multicast will not cause SOS.
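To be concrete, the change I tested amounts to a short pause between the sends in that loop. This is a sketch only, in the same POSIX style as the illustration earlier in the thread; on the Core itself the pause would be something like delayMicroseconds(50), and the helper name here is made up.

```cpp
// Sketch only: same multicast burst as the earlier illustration, but with a
// ~50 microsecond gap between packets. usleep() stands in for the Core's
// delayMicroseconds(50).
#include <cstddef>
#include <sys/socket.h>
#include <unistd.h>

// 'sock', 'announcement' and 'addr' are assumed to be set up as in the
// earlier illustration.
void announce_with_pause(int sock, const unsigned char* announcement,
                         size_t len, const sockaddr* addr, socklen_t addrlen)
{
    for (int i = 0; i < 3; ++i)
    {
        sendto(sock, announcement, len, 0, addr, addrlen);
        usleep(50);  // with this gap, no SOS in my test
    }
}
```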
EDIT: @Dave, @kennethlimcp ... ignore this. Something went wrong with my git pulls. The latest master DOES appear to have fixed the local cloud issue, after all!
Whilst I do not doubt that you have, is it not also true that you could not replicate this SOS problem on your hardware in the first place?
I have double-checked everything. My results are correct.
Yes, I am able to replicate the SOS problem.
So the question is: are you saying that you are still getting the SOS even with all the latest fixes?
EDIT: @Dave, @kennethlimcp ... ignore this post. Something went wrong with my git pulls. The latest master DOES appear to have fixed the local cloud issue, after all!
That's very interesting.
EDIT: Meaningless nonsense removed by its author.
EDIT: @kennethlimcp … ignore this post. Something went wrong with my git pulls. The latest master DOES appear to have fixed the local cloud issue, after all!
EDIT: Embarrassing, time wasting nonsense removed by original author.
@Dave, @kennethlimcp, et al …
The latest master branch (as at 2014-08-29 NZT) HAS fixed the local cloud fault on my set-up, contrary to my errant claims above.
I am utterly perplexed as to what happened, but somehow my git pulls (several of them, earlier) didn’t actually get the latest version until this last one this afternoon. I thought the latest pull had grabbed a shiny new commit from just earlier today, but according to GitHub, the last commit was nine (9) freaken days ago. I’m very confused and embarrassed … but also very happy that the problem has been resolved! My reputation in tatters … and not for the first time! LOL
Thanks guys.
EDIT: This is also with my locally compiled core-firmware, using the latest ARM-GCC v2014q2. So that entire line of reasoning I had from before can be put to rest. Thank goodness, because it was just getting too crazy!
Huzzah! I’m very happy to see this fixed, in any case. Thanks for all the troubleshooting and testing!