When this all started, I was using gcc-arm-none-eabi-4_8-2013q4. But yesterday I upgraded to gcc-arm-none-eabi-4_8-2014q2 and found no improvement.
/*
if (len > QUEUE_SIZE) { // TODO add sanity check on data, e.g. CRC
    return false;
}
*/
...
Connection from: 192.168.1.136, connId: 46
on ready { coreID: '53ff6d065067544853360587',
ip: '192.168.1.136',
product_id: 65535,
firmware_version: 65535,
cache_key: undefined }
Core online!
Core SOS. So no change whatsoever on my side -- not even the extra message reported by @pixelboy.
(This was using the latest gcc version 4.8.4 20140526 [2014q2].)
When we talk about "tinker", we're really talking about the full source set from the repository, with no local user code modifications. In other words, the repository code is the Tinker app. (Oh, I see you figured that out. Yes -- the stock firmware (specifically application.cpp) is Tinker.)
In any case, the file spark_tinker.bin in the CLI tools will have been manually renamed by someone from build/core-firmware.bin.
Indeed!
I should add that I am using (in all cases) the Mac binary versions of the GNU ARM toolchain, under OS X 10.9.4.
This may be important to note, because it is remotely possible that the Mac binaries differ in some way from the Windows versions. At a guess, that could explain why (if it's even true) some people can compile without the SOS issue while others cannot, even though we are using the same toolchain version.
I suppose I could test for that myself in a Windows VM. If I get time today, I will.
I'm getting interesting results.
This will be the log output if the core managed to connect using spark_7 tinker:
Your server IP address is: 192.168.1.247
server started { host: 'localhost', port: 5683 }
Connection from: 192.168.1.216, connId: 1
on ready { coreID: '53ff6f065075535135261687',
ip: '192.168.1.216',
product_id: 0,
firmware_version: 6,
cache_key: '_0' }
Core online!
routeMessage got a NULL coap message { coreID: '53ff6f065075535135261687' }
got counter 7377 expecting 7376 { coreID: '53ff6f065075535135261687' }
1: Core disconnected: Bad Counter { coreID: '53ff6f065075535135261687',
cache_key: '_0',
duration: 0.002 }
Session ended for _0
SparkCore - sendReply before READY { coreID: '53ff6f065075535135261687' }
Connection from: 192.168.1.216, connId: 2
on ready { coreID: '53ff6f065075535135261687',
ip: '192.168.1.216',
product_id: 0,
firmware_version: 6,
cache_key: '_1' }
Core online!
routeMessage got a NULL coap message { coreID: '53ff6f065075535135261687' }
got counter 25341 expecting 25340 { coreID: '53ff6f065075535135261687' }
1: Core disconnected: Bad Counter { coreID: '53ff6f065075535135261687',
cache_key: '_1',
duration: 0.004 }
Session ended for _1
Connection from: 192.168.1.216, connId: 3
on ready { coreID: '53ff6f065075535135261687',
ip: '192.168.1.216',
product_id: 0,
firmware_version: 6,
cache_key: '_2' }
Core online!
routeMessage got a NULL coap message { coreID: '53ff6f065075535135261687' }
got counter 63852 expecting 63851 { coreID: '53ff6f065075535135261687' }
1: Core disconnected: Bad Counter { coreID: '53ff6f065075535135261687',
cache_key: '_2',
duration: 0.004 }
Session ended for _2
Connection from: 192.168.1.216, connId: 4
on ready { coreID: '53ff6f065075535135261687',
ip: '192.168.1.216',
product_id: 0,
firmware_version: 6,
cache_key: '_3' }
Core online!
I'm testing with SYSTEM_MODE(SEMI_AUTOMATIC), and the function causing the SOS red flashes is Spark.connect().
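For reference, a minimal sketch of that setup (assuming nothing beyond the standard core-firmware application structure; the actual user code may differ):

#include "application.h"

SYSTEM_MODE(SEMI_AUTOMATIC);   // start up without connecting to the cloud

void setup()
{
    // The SOS red flashes appear when this call kicks off the cloud handshake.
    Spark.connect();
}

void loop()
{
}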
Also, another successful connection:
Connection from: 192.168.1.216, connId: 16
on ready { coreID: '53ff6f065075535135261687',
ip: '192.168.1.216',
product_id: 65535,
firmware_version: 65535,
cache_key: '_15' }
Core online!
Connection from: 192.168.1.216, connId: 17
on ready { coreID: '53ff6f065075535135261687',
ip: '192.168.1.216',
product_id: 65535,
firmware_version: 65535,
cache_key: '_16' }
Core online!
1: Core disconnected: socket error Error: read ECONNRESET { coreID: '53ff6f065075535135261687',
cache_key: '_1',
duration: 285.665 }
Session ended for _1
Connection from: 192.168.1.216, connId: 18
on ready { coreID: '53ff6f065075535135261687',
ip: '192.168.1.216',
product_id: 65535,
firmware_version: 65535,
cache_key: '_17' }
Core online!
1.) It seems to me that the server is not handling the connection properly: a socket error will result in a successful connection thereafter. Is there somewhere I can make a change so it times out earlier? That might help create a test case.
2.) The previous sockets start to get socket error Error: read ECONNRESET and get closed one after another the moment a core comes online. I think the key lies in duration: 620.017: when the server starts to kill connections around that timing, the connection gets through. We need to get it to error out faster on the local cloud. However, I tried digging but couldn't figure out which part to change...
3.) The Core disconnected: Bad Counter path, which also somehow killed connections/closed sockets, worked well too...
The culprit for the red SOS is the call to Multicast_Presence_Announcement() within Spark_Handshake() in spark_utilities.cpp.
Temporary solution till we make an official release: comment out Multicast_Presence_Announcement() (since it uses a hardcoded IP), and the local cloud should work fine.
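Illustratively, the workaround in spark_utilities.cpp looks like this (surrounding handshake code abbreviated; the exact function body differs):

void Spark_Handshake(void)   // signature abbreviated; see spark_utilities.cpp
{
    // ... existing handshake steps ...

    // Temporarily disabled: multicasts to a hardcoded IP and triggers the
    // SOS HardFault when connecting to a local cloud.
    //Multicast_Presence_Announcement();

    // ... remainder of the handshake ...
}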
Can you comment more on what the issue is, and why it does not present itself between the core and the Spark cloud?
Thanks for the work!
I got really curious and started digging, but was unable to determine why sending a multicast broadcast message caused the SOS...
@kennethlimcp, I am not sure why we multicast to address 224.0.1.187 and port 5683, as my knowledge of the server code is limited. The firmware commit (Multicast CoAP presence announcement) was made here: https://github.com/spark/core-firmware/commit/b717286dbcffedf211c343cc820c762e70a782d2
Also found that calling UDP's sendto() just once instead of thrice, as below, gets the core connected to the local cloud.
//for (int i = 3; i > 0; i--)   // loop disabled: send the announcement only once
{
    sendto(multicast_socket, announcement, 19, 0, &addr, sizeof(sockaddr));
}
The solution now is to multicast to localhost:5683 for the local cloud and to 224.0.1.187:5683 for the Spark cloud.
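A hypothetical sketch of that selection, filling the destination in the same CC3000 sockaddr style as the existing announcement code (local_cloud is an assumed flag, not an actual core-firmware variable):

sockaddr addr;
addr.sa_family = AF_INET;
addr.sa_data[0] = 0x16;   // port 5683, high byte (0x1633 == 5683)
addr.sa_data[1] = 0x33;   // port 5683, low byte

if (local_cloud)
{
    // 127.0.0.1 -- announce only to the local server
    addr.sa_data[2] = 127;
    addr.sa_data[3] = 0;
    addr.sa_data[4] = 0;
    addr.sa_data[5] = 1;
}
else
{
    // 224.0.1.187 -- the IANA "All CoAP Nodes" multicast group
    addr.sa_data[2] = 224;
    addr.sa_data[3] = 0;
    addr.sa_data[4] = 1;
    addr.sa_data[5] = 187;
}

sendto(multicast_socket, announcement, 19, 0, &addr, sizeof(sockaddr));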
@kennethlimcp, were you able to connect to the local cloud by commenting out Multicast_Presence_Announcement()?
I don't have much knowledge in this area, but it seems like 224.0.1.x is a common multicast range used locally and shouldn't affect whether it's connecting to the local or the Spark cloud.
How did you trace this down?
I have not set up my local build environment, so it's hard for me to test this. Shall wait for @gruvin!
Interesting. Found this at http://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml: 224.0.1.187 -> All CoAP Nodes -> http://www.iana.org/go/rfc7252
Will check with our CoAP lead @zachary tonight as to why the presence announcement is made to IP 224.0.1.187 after the handshake.
But I can confirm that while testing on the local cloud, commenting out Multicast_Presence_Announcement() in Spark_Handshake() prevents the core from entering the SOS HardFault handler. Awaiting @kennethlimcp and @gruvin to reconfirm this finding.
I have made a commit for this fix on branch: https://github.com/spark/core-firmware/tree/feature/new-led-interactions
Please do "git checkout feature/new-led-interactions" and rebuild core-firmware to test the fix.
@satishgn @kennethlimcp I just compiled after commenting out that line. It works... but the behavior of the LED is a bit unusual. When I unplug the power and plug it back in, the light breathes blue for 3-6 seconds, then flashes green once and flips to breathing cyan... it bypasses the green flashing connect phase. It works, but I'm not sure what the lights mean.
That's the new LED behavior for v0.3.1 and it's normal. I haven't checked the breathing blue status light, but the behavior is the same when I tested the latest firmware from the Web IDE.
It's weird that you did not manage to get a blinking green. I will only be able to test this weekend, and it would be nice for someone else to report back.
Great that you managed to use the local cloud without any issues now!
For the same reason, I had mentioned building the firmware after doing "git checkout feature/new-led-interactions", which fixes the various LED state changes as shown below. It will be merged into master soon and released.
WIFI OFF => Breathing WHITE
WIFI ON and LISTENING (OR profiles not found) => Blinking BLUE
WIFI DISCONNECTED (WiFi.disconnect() called) => Breathing BLUE
WIFI CONNECTING (trying to connect to stored profiles) => Blinking GREEN
WIFI CONNECTED (IP address issued) => Breathing GREEN
CLOUD CONNECTING => Blinking CYAN
CLOUD CONNECTED => Breathing CYAN
Thanks for tracking this down!
Yes... thank you for your help!
I have tested with the master-build default Tinker firmware and it seems to be working fine with the local cloud. Will test with some custom firmware later.
Thanks!
BOOM!
So, I re-discovered this bug during our brief window of upgrading Node.js on production, which is why we had to roll that back briefly. After a really deep dive into the server / firmware code, I think I found the real cause of this. I just rolled it out to the local server (spark-protocol master branch) as well.
I'll test it some more, and then roll it out to the npm installer when it's ready. If someone wanted to test this and let me know if it helps, that'd be grand.
Thanks,
David
Hi Dave,
I keep having problems using my local cloud server and I'd really like to try this. In order to do that, should I just update the spark-server folder from the git repository and run the server again after that?
Thanks,
Yan