Experiment – Core Cloud Connection Recovery Issue

Finally a very successful day managing somewhat stable cores. I present this set of casual test results, hoping
a discussion will help “fix” what I observe. Or possibly this is just another example of the previously documented delay(xx) issues?

Had two cores operating on the main wireless network, on the same desk, 20 feet from the DLink modem.
Both ran for more than 10 minutes without error.
Both cores had ONLY test code that blinks the blue LED (D7) to indicate the code is operating. (BST = Basic Sanity Light)
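
If delay(xx) is a suspect, one way to take it out of the picture is to drive the BST from elapsed time instead of blocking calls. Below is a minimal sketch of that idea, not the test code actually running on these cores. The `Heartbeat` struct and its field names are hypothetical; `pinMode`, `digitalWrite`, `millis` and `D7` are the usual Wiring calls and appear only in the comment. Whether this changes the lock-up behavior is untested.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original test code): decides whether the
// D7 LED should be lit at time nowMs, without ever calling delay().
struct Heartbeat {
    uint32_t periodMs;  // length of one full on/off cycle
    uint32_t onMs;      // how long the LED stays lit within each cycle

    bool ledOn(uint32_t nowMs) const {
        // Precondition: a non-zero period, and an on-time that fits inside it.
        assert(periodMs > 0 && onMs <= periodMs);
        return (nowMs % periodMs) < onMs;
    }
};

// In a Core sketch this would be used roughly as follows (assumed usage):
//   Heartbeat bst{1000, 100};
//   void setup() { pinMode(D7, OUTPUT); }
//   void loop()  { digitalWrite(D7, bst.ledOn(millis()) ? HIGH : LOW); }
```

Because loop() returns immediately on every pass, a missing BST would then point at loop() not being called at all rather than at a stuck delay.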

Observation…
Core 2 drops connection, AND has no BST
Core 2 is flashing green, looking for a connection; Core 1 is connected, happy, and flashing its BST.
Am I correct to assume MY LOOP should be executing without a wireless connection?

Several times the BST flashed once, then went dead again.
Pressing RST on core 2 made everything happy again. (I waited over 2 minutes before resetting.)
Observed this condition more than once.

Potential issue: my Comcast net connection is proving to drop a lot.
The Core 2 lock-up could have been caused by a Comcast drop, a drop that
did NOT adversely affect the Core 1 recovery sequence?

Comments are welcomed.

My comment is not specifically about your situation, but maybe related.

I have noticed that there is confusion at times between the core and the cloud as to what is the current status of communication.

For example, if I reset my core while it is connecting to the cloud, then when it comes back up, the cloud is still sending it packets from the previous connection. My core appears to ignore these, but it is then unable to start up a new TCP connection.

If I wait long enough before doing a reset, all is fine.

There seem to be some ‘dead’ states in the connection state diagram.

I think this could relate to your problem: if your core 2 lost packets on its connection to the cloud, the connection state may have become confused.

So really this gives the impression that the core is not stable. After some resets it works; after others it does not.

I observed all this by running Wireshark on my outgoing Internet connection.

Hi!

Please see https://community.spark.io/t/bug-bounty-kill-the-cyan-flash-of-death/1322/268 and give that FW a try.

Now 2 hours into the latest experiment. Left both cores undisturbed for a while.

Core 1 is fine, BST flashing; Cloud connected.
Core 2 is again “locked” with flashing blue.
Simple RST returned core 2 to operation.

There is a small timing difference between the two applications. Core 1 has fewer delays, flashing the BST 2 times per cycle; Core 2 flashes the BST 3 times per cycle. Just thinking. Going to dinner.

Hi @mariog,

This is very interesting, and not a report I’ve heard before. What kind of packets are you seeing at what point during the connection / post reset etc? Are you seeing any error packets or anything? Any packet captures I could take a peek at?

Thanks!
David

I tried recreating the scenario. It turns out that if I wait a while, the core will re-initiate the connection.

Through this testing I have seen that there are TCP keep-alives after 15 seconds. Good.
Also, when my core cannot connect to the cloud (it got rejected for some reason), it does not seem to try again.
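
For comparison, this is the kind of retry behavior I would have expected after a rejected connection: a capped exponential backoff between attempts. It is only a sketch of a common policy, not what the Core firmware does. The function name `retryDelayMs` and the base and cap values are my own assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical retry policy (not taken from the Core firmware): the wait
// before reconnect attempt N doubles each time, starting at baseMs and
// never exceeding maxMs.
uint32_t retryDelayMs(uint32_t attempt, uint32_t baseMs, uint32_t maxMs) {
    // Precondition: the starting delay must not exceed the cap.
    assert(baseMs <= maxMs);
    uint32_t delayMs = baseMs;
    for (uint32_t i = 0; i < attempt && delayMs < maxMs; ++i) {
        // Double the delay, clamping to maxMs before the multiply can overflow.
        delayMs = (delayMs > maxMs / 2) ? maxMs : delayMs * 2;
    }
    return delayMs;
}
```

With a 1 second base and a 60 second cap, the waits would run 1 s, 2 s, 4 s and so on until they level off at 60 s, so a rejected core keeps trying without hammering the cloud.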

So I guess my initial comment was incorrect. Sorry for misleading you.