@gruvin You are correct: even using setup, it still sometimes goes into the SOS cycle. I am running the local server on my laptop… it is OS X and 64-bit, if that's relevant. I am going to try running it off a CentOS server and see if I get different results.
@kennethlimcp @mdma I'm just going through all the comments now. Is there anything I can do at this point to help? Let me know.
@pixelboy - this is open source - you know your own skills, so you know best where to help! And of course, the help is truly appreciated - loved in fact!
Okay, I have something, but I'm not sure if it helps at all. I just installed the Spark local server on a remote web server I have. The server is CentOS, 64-bit. Everything is working flawlessly. Custom firmware works. Tinker works. I can power everything down and up and it still works.
The interesting thing was that before I set up the port mapping so the server could communicate back to the core, the core was perfectly happy. I don't understand this, because if the core didn't get a response from the server it should flash cyan. This is where I wish I really knew what was going on under the hood.
@kennethlimcp I tried the old tinker firmware and it works way better.
@kennethlimcp … FWIW, I found the local build environment setup pretty painless on a Mac.
Meanwhile, I got a workaround going for the DEBUG macro. But then I found that none of the DEBUG macros are available from down inside spark_protocol.cpp, where the handshake code is.
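For anyone else stuck down there: since spark_protocol.cpp can't see the firmware's debug facility, a self-contained stand-in macro is one way to get prints out of the handshake path. This is only a sketch -- `debug_write()` is a hypothetical hook, not a real API; route it to whatever output you actually have (a spare USART, a RAM ring buffer, etc.).

```cpp
// Hypothetical stand-in DEBUG macro for use inside spark_protocol.cpp.
// debug_write() is NOT a real API -- wire it up to whatever output
// you have available.
#include <stdio.h>

extern void debug_write(const char *s);   // hypothetical output hook

#ifndef DEBUG
#define DEBUG(fmt, ...)                                             \
    do {                                                            \
        char _buf[128];                                             \
        snprintf(_buf, sizeof(_buf), fmt "\r\n", ##__VA_ARGS__);    \
        debug_write(_buf);                                          \
    } while (0)
#endif
```

Then something like `DEBUG("handshake: got %d bytes", len);` at each step of the handshake would at least tell us how far it gets before the SOS.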
This is getting frustrating. We’re working in the dark. The people who wrote this code originally would be able to nail it so much faster.
Me too. I believe it is firmware though, because we're seeing a hard fault. Ideally, a hard fault should never happen -- no matter what the server does or does not do or when.
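Since it is a hard fault we're chasing, one thing that might get us out of the dark: temporarily swap in a fault handler that captures where the fault happened. This is the standard Cortex-M3 trick (the Core's STM32 is an M3), not Spark's own handler -- a sketch only, use at your own risk:

```cpp
// Minimal Cortex-M3 hard fault handler (GCC syntax) that recovers the
// stacked PC/LR, so you can see where the SOS actually originated.
#include <stdint.h>

extern "C" void HardFault_Handler(void)
{
    uint32_t *frame;
    // Bit 2 of EXC_RETURN (in LR) says which stack pointer was in use.
    __asm volatile ("tst lr, #4   \n"
                    "ite eq       \n"
                    "mrseq %0, msp\n"
                    "mrsne %0, psp\n"
                    : "=r" (frame));
    volatile uint32_t pc = frame[6];  // address of the faulting instruction
    volatile uint32_t lr = frame[5];  // return address of the caller
    (void)pc; (void)lr;
    for (;;) { /* park here and read pc/lr with a debugger */ }
}
```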
spark_6: still fails. Moving to spark_5 …
spark_5: still fails. Moving to spark_4 …
spark_4: still fails. Moving to spark_3, to clean compile locally as a sanity check …
spark_3: still fails! What the?! …
OK … so I’ll try the downloaded binary from spark_3 … works just fine.
HMMM.
Maybe that binary is actually from spark_2? Moving to download and locally compile spark_2 then …
There are no tags earlier than spark_3 for communication-lib or common-lib. spark_2 core-firmware does not compile against those.
Right. So I'm going to try the Spark HQ compiled binary from spark_7. This could be a local compiler issue. Will edit this post with results shortly.
That still fails. Far out. OK, so I'll try the Spark binary from spark_6 (and keep moving down the chain until I find the latest version that works).
The binary from spark_6 is working. No SOS.
But ALL versions I compile locally, from spark_3 to spark_7 inclusive, fail with the SOS.
So let's put that in our pipes and smoke it for a bit. Gee. Hmm. :-/
(And yes -- I always test at least three resets, to be sure I'm not being misled. If I say it worked, it means the core connects first time, every time.)
Oh and all versions that I compile locally DO WORK just fine with the global cloud ... though right now, that's not making sense. So I'm gonna double check. ... and that is definitely working fine.
So ... only local builds and local server (and tag:spark_7's Spark compiled binary) are failing on the local server. The plot has thickened.
`spark flash --usb tinker` exhibits the SOS on the local cloud server, too. That’s not surprising though, since the same is true for the core-firmware.bin binary in tag:spark_7. (Recalling that the binary from tag:spark_6 appears to be OK.)
I wonder if the Spark server build environment itself got an upgrade recently? Like the compiler and tools and stuff? I believe I am running the very latest release version of arm-gcc, and have been since I started.
That version is arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 4.8.3 20131129 (release) [ARM/embedded-4_8-branch revision 205641]
I want to add one more weird thing that seems to happen consistently… if I leave the core unplugged for a while (like hours) and plug it in, it seems to work the first time, but not on subsequent resets. Is there some kind of cache? Some bit of connection data that resides in volatile memory?
We did update the ARM toolchain recently on the build server so we could get the newlib stubs for the big RAM improvement.
HMM. If you’re compiling locally and you don’t have that newer ARM toolchain, I’m not sure why there would be local vs. public server differences. The handshakes should be essentially identical. Maybe certain sizes or types of local server / core crypto keys are causing a fault when compiled with different toolchains? I’m guessing here.
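To make that guess concrete (and it really is only a guess): one classic way two toolchains diverge is struct layout. Purely illustrative -- these are not the real handshake structures, and the name is made up:

```cpp
// Illustrative only -- NOT the actual Spark handshake structs.
// If anything like this were ever sent or stored byte-for-byte,
// two toolchains disagreeing on padding would corrupt lengths,
// and a bogus length fed to a crypto routine could hard fault.
#include <cstdint>
#include <cstdio>

struct FakeHandshakeHeader {   // hypothetical name
    uint8_t  version;
    uint32_t nonce;            // offset 1 if packed, offset 4 if padded
    uint16_t payload_len;
};

int main()
{
    // If two builds print different sizes here, they can't safely
    // exchange this struct as raw bytes.
    std::printf("sizeof(FakeHandshakeHeader) = %zu\n",
                sizeof(FakeHandshakeHeader));
    return 0;
}
```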