Electron sometimes freezes on reconnect - LED Solid Cyan


#21

Thank you! Looking forward to hearing the results!


#22

Hey, here’s an update on our debugging process:

  • We’ve run our two 3G Electrons with our full code suite since Wednesday, and they are still going strong! Running this long without a freeze has been rare among our 2G Electrons, so great news there! :slight_smile:
  • The Tinker software has run on thirteen 2G Electrons over the weekend with unreliable cellular coverage - none have frozen :grinning: This is basically unheard of with 2G Electrons in the environment we run them in.

These results should be good enough to conclude that the Tinker software works with 2G Electrons in our desired setting, meaning there’s something between the Tinker code and our “barebones” code that does not play nice. One of the differences is SYSTEM_MODE: AUTOMATIC vs SEMI_AUTOMATIC. The reason we’re not running AUTOMATIC is that we need the devices to respond to user interaction when offline, and as fast as possible after they boot up. It is not acceptable for the devices to be unusable until a connection is established. With that said:

  1. Are there any important undocumented differences between these modes that may cause our freeze?
  2. Are there any stock code examples (like Tinker) available with SEMI_AUTOMATIC that we can test on our devices?

In the meantime we will modify our own barebones code to run SYSTEM_MODE(AUTOMATIC) and see what happens.


#23

So far I have not had any problems with the 2G Electron freezing. I use SEMI_AUTOMATIC mode.
You can also use the fault-tolerance sample code by @rickkas7.
https://github.com/rickkas7/electronsample/blob/master/README.md


#24

And the pursuit of bug-free hardware continues…

New discoveries:

  • Again, the two 3G Electrons are still going strong. No crash after a week of continuous operation, with the exact same code that ran and crashed on the 2G Electrons.
  • We tried running the aforementioned crashing code on our 2G Electrons with one change: removing all display-related code, and voilà, no crash! This may simply be a coincidence, since we only tried this over one night. But now suddenly the suspect seems to be the physical display, or the display-related code. Has anyone had similar problems? It’s a 0.96" SPI screen, with a current draw of ~25mA, and we’re using the Adafruit SSD1306 lib from the web IDE. We’re using the pins A3, A5, D0 & D1 for SCL, SDA, DC & RST respectively.
  • We’ve now run the rickkas7 “Electron Debug” code for 48 hours without any crash. This is on Electrons that have a physical display connected, but no display-related libraries included.

We’ll continue to investigate the display and see if we’re able to crash a clicker without the Adafruit SSD1306 library included.


#25

I’m having similar issues with a test module, and we don’t use the screen or any screen code.

We also use a battery well in excess of our needs - it appears to happen after a certain interrupt is triggered.

The interrupt has SMS code in it as well as Particle.publishes.

Not sure yet what is causing the system hangs, but we get similar unresponsive behaviour where interrupts are ignored and the device drops off the net.

We use multi-threading too… humm

EDIT: Correction - other interrupts seem to activate functions, but don’t bring it back onto the cloud. The interrupt that triggered the issue doesn’t re-trigger. :\


#26

A rule of thumb is not to put complex stuff into an ISR.
Set a flag and do the complex stuff in the main loop.
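
A minimal sketch of that rule of thumb, written as plain C++ so it can be compiled off-device. The names `buttonIsr`, `loopOnce`, `publishEvent` and `simulate` are hypothetical; `publishEvent` merely stands in for blocking calls such as Particle.publish() or sending an SMS. On the Electron the handler would be registered with attachInterrupt() and the flag would usually be a `volatile bool`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Set by the ISR, cleared by the main loop.
std::atomic<bool> buttonPressed{false};

// Stand-in for slow, blocking work (assumption: models a cloud publish or SMS).
std::vector<std::string> published;
void publishEvent(const std::string& name) { published.push_back(name); }

// Interrupt handler: only record that something happened, then return.
void buttonIsr() { buttonPressed.store(true); }

// One pass of the main loop: the heavy work happens here, outside the ISR.
void loopOnce() {
    // exchange() reads and clears the flag in one step, so an interrupt
    // arriving between the read and the clear is not lost.
    if (buttonPressed.exchange(false)) {
        publishEvent("button");
    }
}

// Simulates interrupts arriving between loop passes; returns events published.
std::size_t simulate() {
    published.clear();
    buttonPressed.store(false);
    loopOnce();    // no interrupt yet: nothing is published
    buttonIsr();
    loopOnce();    // one interrupt: one event
    buttonIsr();
    buttonIsr();   // two interrupts before the loop runs share one flag
    loopOnce();
    loopOnce();    // flag already cleared: nothing more to do
    assert(!buttonPressed.load());
    return published.size();
}
```

Note that a single boolean flag merges interrupts that fire before the loop gets around to them; if every trigger must be handled, a counter would be needed instead.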


#27

Can do - but why is that the case? What should I be aware of that causes that to occur? :slight_smile:

Thanks


#28

I also can’t figure out why my Watchdog doesn’t kick in and restart the system to bring it back to life when it hangs either… :frowning:

The watchdog is reset at the end of the loop, but it seems the device can’t escape the interrupt function and hangs when it runs a Publish.


#29

That has to do with interrupt priorities, masking, preemption, reentrancy and timing, which would lead a bit too far here.

Also, the application watchdog is a software construct which relies on the controller to execute system processes, and those will probably be blocked in a deadlock situation.

Dealing with interrupts needs a somewhat more fundamental understanding of the hardware than pure application programming.


#30

Thanks - in terms of deadlocks, what would that mean?

What two components are fighting against each other?


#31

Exactly.
E.g. your interrupt calls a function that uses an interface which in turn relies on a lower-priority interrupt than the one that called it. The interface will wait for the calling ISR to end while that ISR is waiting for the function to return. Add multiple layers of indirection and your debugging nightmare is born :wink:
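
To make that concrete, here is a toy model in plain C++, not real firmware. `ToyController` and everything in it are hypothetical names, and the priority numbers are made up; the only thing borrowed from real hardware is the ARM Cortex-M convention that a lower number means a higher priority. The blocking call gives up after a fixed number of ticks so the sketch terminates, whereas the real device would simply hang.

```cpp
#include <cassert>
#include <functional>

// Toy interrupt controller (hypothetical, for illustration only).
struct ToyController {
    static constexpr int kThreadMode = 255;   // no ISR running
    static constexpr int kLowPriority = 10;   // the interface's own interrupt

    int activePriority = kThreadMode;
    bool lowPending = false;
    bool transferDone = false;

    // Deliver the pending low-priority interrupt only if it is allowed
    // to preempt whatever is currently running.
    void tick() {
        if (lowPending && kLowPriority < activePriority) {
            lowPending = false;
            transferDone = true;   // the interface's ISR completes the transfer
        }
    }

    // Blocking call: start a transfer, then wait for its completion interrupt.
    // Returns false if it never completed within maxTicks.
    bool blockingSend(int maxTicks) {
        transferDone = false;
        lowPending = true;
        for (int i = 0; i < maxTicks && !transferDone; ++i) {
            tick();
        }
        return transferDone;
    }

    // Run `body` as if it were an ISR at the given priority.
    bool runAsIsr(int priority, const std::function<bool()>& body) {
        assert(priority < activePriority);   // an ISR only starts if it can preempt
        const int saved = activePriority;
        activePriority = priority;
        const bool ok = body();
        activePriority = saved;
        tick();   // anything still pending runs once the ISR has returned
        return ok;
    }
};

// From the main loop the completion interrupt is free to fire.
bool sendFromMainLoop() {
    ToyController c;
    return c.blockingSend(1000);
}

// From a higher-priority ISR it can never fire, so the wait never ends.
bool sendFromHighPriorityIsr() {
    ToyController c;
    return c.runAsIsr(5, [&c] { return c.blockingSend(1000); });
}
```

In this model `sendFromMainLoop()` succeeds on the first tick, while `sendFromHighPriorityIsr()` spins through all its ticks and fails, which is the same wait-for-each-other situation described above.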


#32

Adjusting some of the code now to see what happens - let’s hope for some good changes :slight_smile:


#33

Seems to have fixed the problem. Running Particle.publish(…) and sending an SMS in an interrupt seems to cause the device to hang.

Surprisingly, the SMS was received successfully (using the onboard SIM to send the SMS) … but Publish never went through.

Thanks @ScruffR :slight_smile:


#34

Spoke too soon… back to fast Cyan flashing and every so often going red…

Humm… what else can cause that?


#35

@ftideman, @robinandersson,

Thanks for sharing your test processes in such detail.

We are facing similar issues (Electron 3G freezes on solid cyan after 2 days of operation). It starts working normally again after a reset, then needs another reset about 2 days later.

We would like to know how your testing is going. Did you find out what was causing the freeze?

Your advice would be invaluable for us. Here is a link to our application

Thanks


#36

I’m beginning to wonder if cellular providers are growing tired of all the “things” that request a line to the network and then, even though next to no signal traffic is taking place, expect the line to be continuous with no interruptions of any kind for as long as needed by the “thing”. The fact that cellular providers allow such access to the network is a wonder in itself.

I’m old enough to remember the messages that were given by the phone companies when a handset was picked up and a connection made, but no talking was taking place. Sooner or later the line was disconnected and you got a message about hanging up the handset (back when there were pretty much only landlines): “please hang up the phone”. Or if someone picked up the handset and never made a call, that weird beeping tone at high volume would force you to hang up or go crazy listening to it…

Over the last month or so I have come to the conclusion that the cellular carriers in my area are growing tired of “things”, and if a connection does not seem to be doing much, it is dropped by the carrier. Anyway, you might spend hours trying to debug an issue thinking something is wrong with the device, when it has nothing to do with the device except that it has a connection and is doing next to nothing with it.


#37

Hi, @ftideman, @robinandersson,

Thanks for sharing your research, I’m having the same issues on connection when I’m waking the device from sleep.
I’m running on 4 different Electrons overnight, all running the same code based on the 0.6.2 firmware with both modes enabled: SYSTEM_THREAD(ENABLED); & SYSTEM_MODE(SEMI_AUTOMATIC);.

One Electron keeps getting stuck every couple of hours on solid cyan, which looks kind of bright white?

We are clueless. Our customers have returned the product because in their environment it freezes more often, I believe due to connectivity issues.

Is there any way to debug the Electron while it is in that state?