I’m getting SOS flashes with this new commit. Using Tinker v0.3.2, and I deleted the entire spark-server and did a fresh git clone before testing this.
Hi @kennethlimcp,
You’d need to clone the spark-protocol code directly from the github master branch, so cloning spark-server wouldn’t do it.
Thanks,
David
Hi @Dave,
I was kind of confused about the usage of spark-server and spark-protocol. Are they independent, or is the protocol part of the server?
I think when I was trying to set up the local server, I cloned and ran only spark-server, but I guess spark-protocol must be a necessary part for my Spark Core to communicate with my local server. However, I didn’t see in any of the tutorials how I should use or install it.
Thanks,
Yan
Hi @FlyingYanz,
Good question, the spark-server module is the API / HTTP side, and the spark-protocol module is the core / device side. One knows how to talk to your code, and the other knows how to talk to your Core. When you run npm install or npm update, the spark-protocol module is brought in automatically.
Thanks,
David
Sorry for my absence. Having a spate of poor health.
EDIT …
@Dave … I just tested with the master branch core-firmware (2014-08-28 NZT) and still have the SOS fault.
I then commented out the call to Multicast_Presence_Announcement() in spark_utilities.cpp (handshake function). Then I tried putting that back but commenting out the 3x for loop inside it.
No change in either case. Still getting the SOS, exactly as before.
Sorry for the bad news.
Bear in mind that, as far as I know, I am the only person working on this, with local compiling ability AND with the fault being very consistent. Others seem to experience it only intermittently. (I have no idea why.)
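For anyone trying to picture what I mean: the suspect pattern is a short burst of UDP multicast sends at the end of the handshake. Below is only a rough, standalone illustration of that pattern using POSIX sockets, not the actual firmware code (which lives in spark_utilities.cpp and uses the CC3000 socket API); the multicast group, port and payload shown here are my assumptions.

```cpp
// Illustration only: approximates the presence-announcement burst with POSIX
// sockets. The real firmware uses the TI CC3000 socket API, and the group,
// port and payload below are assumptions, not taken from the repo.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return 1;

    // Assumed CoAP-style multicast destination (224.0.1.187:5683).
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5683);
    inet_pton(AF_INET, "224.0.1.187", &addr.sin_addr);

    const unsigned char announcement[] = "presence";  // placeholder payload

    // The "3x for loop": three back-to-back sends with no pause in between.
    // This is the loop I tried commenting out.
    for (int i = 0; i < 3; ++i)
    {
        sendto(sock, announcement, sizeof(announcement), 0,
               reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));
    }

    close(sock);
    return 0;
}
```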
Hi @gruvin,
I think the bug I patched, and the suggestion of the Multicast patch indicates a failure scenario where the core is sending too much data too quickly / something is getting overwhelmed. Are you compiling locally, and do you have the DEBUG flag defined? I seem to recall a scenario where the debug information plus fast traffic could cause a crash. Are you up to date on the various branches and building locally, or are you using the Build IDE?
– sorry if these are questions you’ve covered already!
edit: I think @kennethlimcp is also very interested in solving this, as am I!
Thanks,
David
[quote="Dave, post:126, topic:6161"]
I think the bug I patched, and the suggestion of the Multicast patch indicates a failure scenario where the core is sending too much data too quickly / something is getting overwhelmed.[/quote]
Seems reasonable.
Yes.
Not presently. And yes, there was a buffer overflow issue with debug enabled. That has been fixed in the master branch already, though, by @zachary I think.
The tests above were from locally compiled firmware, using the 2014-q2 (latest) version of arm-gcc and the latest master branches of core-firmware, core-communication-lib and core-common-lib.
Remember also the workaround I found, above ...
As far as I can tell, really nothing changed on the handshake protocol side of things between tag:spark_6 and later versions. Thus it seems that the only real change relating to this SOS problem was the compiler version. (tag:spark_7 onwards will not build with that older compiler, due to errors around a missing include file relating to atomic RAM access -- or something.)
So my greatest current suspicion is that there's something going on with atomic RAM access (or the failure to be atomic) introduced by the latest compiler and/or some code somewhere that takes advantage of that new feature. Or that issue could be merely coincidental. But why does the problem suddenly crop up when the later compiler is used (and needed), when there are no apparent changes to the protocols themselves?
I'm beginning to suspect the multicast presence announcement issue means this is related to this thread:
http://community.spark.io/t/simple-udp-program-breaks-the-core/4791/56?u=dave
I have tested using the latest master branch for all 3 repos together with the latest master of spark-server, and it works well.
You might want to update everything and test again, or I will be happy to help troubleshoot.
@Dave, during my test, having a 50µs delay between each sendto() for multicast will not cause SOS.
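To be concrete, the change I tested amounts to a short pause between the sends in that loop. This is a sketch only, in the same POSIX style as the illustration earlier in the thread; on the Core itself the pause would be something like delayMicroseconds(50), and the helper name here is made up.

```cpp
// Sketch only: same multicast burst as the earlier illustration, but with a
// ~50 microsecond gap between packets. usleep() stands in for the Core's
// delayMicroseconds(50).
#include <cstddef>
#include <sys/socket.h>
#include <unistd.h>

// 'sock', 'announcement' and 'addr' are assumed to be set up as in the
// earlier illustration.
void announce_with_pause(int sock, const unsigned char* announcement,
                         size_t len, const sockaddr* addr, socklen_t addrlen)
{
    for (int i = 0; i < 3; ++i)
    {
        sendto(sock, announcement, len, 0, addr, addrlen);
        usleep(50);  // with this gap, no SOS in my test
    }
}
```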
EDIT: @Dave, @kennethlimcp ... ignore this. Something went wrong with my git pulls. The latest master DOES appear to have fixed the local cloud issue, after all!
Whilst I do not doubt that you have, is it not also true that you could not replicate this SOS problem on your hardware in the first place?
I have double-checked everything. My results are correct.
Yes, I am able to replicate the SOS problem.
So the question is: are you saying that you are still getting the SOS even with all the latest fixes?
EDIT: @Dave, @kennethlimcp ... ignore this post. Something went wrong with my git pulls. The latest master DOES appear to have fixed the local cloud issue, after all!
That's very interesting.
EDIT: Meaningless nonsense removed by its author.
EDIT: @kennethlimcp … ignore this post. Something went wrong with my git pulls. The latest master DOES appear to have fixed the local cloud issue, after all!
EDIT: Embarrassing, time wasting nonsense removed by original author.
@Dave, @kennethlimcp, et al …
The latest master branch (as at 2014-08-29 NZT) HAS fixed the local cloud fault on my set-up, contrary to my errant claims above.
I am utterly perplexed as to what happened, but somehow my git pulls (several of them, earlier) didn’t actually get the latest version until this last one this afternoon. I thought the latest pull had grabbed a shiny new commit from just earlier today, but according to GitHub, the last commit was nine (9) freaken days ago. I’m very confused and embarrassed … but also very happy that the problem has been resolved! My reputation in tatters … and not for the first time! LOL
Thanks guys.
EDIT: This is also with my locally compiled core-firmware, using the latest ARM-GCC v2014q2. So that entire line of reasoning I had from before can be put to rest. Thank goodness, because it was just getting too crazy!
Huzzah! I’m very happy to see this fixed, in any case. Thanks for all the troubleshooting and testing!