Electron sometimes freezes on reconnect - Led Solid Cyan

Can do - but why is that the case? What should I be aware of that causes that to occur? :slight_smile:

Thanks

I also can’t figure out why my Watchdog doesn’t kick in and restart the system to bring it back to life when it hangs either… :frowning:

The watchdog is reset at the end of the loop, but it seems the device can’t escape the interrupt function and hangs when it runs a Publish.

That has to do with interrupt priorities, masking, preemption, reentrancy and timing, which would lead a bit too far here.

Also, the application watchdog is a software construct which relies on the controller to execute system processes, and those will probably be blocked in a deadlock situation.

Dealing with interrupts needs a somewhat more fundamental understanding of the hardware than pure application programming.

Thanks - in terms of deadlocks, what would that mean?

What two components are fighting against each other?

Exactly.
E.g. your interrupt calls a function that uses an interface which in turn relies on a lower-priority interrupt than the one that triggered your ISR: the interface will wait for the calling ISR to end while that ISR is waiting for the function to return. Add multiple layers of indirection and your debugging nightmare is born :wink:
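A common way around this kind of ISR deadlock is to let the interrupt handler only set a flag and to do the slow work from the main loop. The following is a minimal sketch in plain C++ so it can run off-device; `slowPublish`, `onPinInterrupt` and `loopOnce` are hypothetical names, and `slowPublish` merely stands in for a call such as Particle.publish().

```cpp
#include <atomic>
#include <string>
#include <vector>

// Stand-in for a slow, possibly blocking call such as Particle.publish().
// Hypothetical: it only records the event so the sketch can run on a PC.
std::vector<std::string> published;
void slowPublish(const std::string& name) { published.push_back(name); }

// Set by the interrupt handler, cleared by the main loop.
std::atomic<bool> eventPending{false};

// Interrupt handler: record that something happened and return at once.
// No publishing, no logging, no other high-level calls in here.
void onPinInterrupt() { eventPending.store(true); }

// One pass of the main loop: the slow work happens here, outside the ISR.
void loopOnce() {
    // exchange() reads and clears the flag in one step, so an interrupt
    // arriving between "read" and "clear" is not lost.
    if (eventPending.exchange(false)) {
        slowPublish("pin-event");
    }
}
```

On a device, `onPinInterrupt` would be the function passed to attachInterrupt() and `loopOnce` would be the body of loop(); the interrupt then returns in a few cycles no matter how long the publish takes.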


Adjusting some of the code now to see what happens - let’s hope for some good changes :slight_smile:

Seems to have fixed the problem: running Particle.publish(…) and sending an SMS in an interrupt seems to cause the device to hang.

Surprisingly, the SMS was received successfully (using the onboard SIM to send the SMS) … but Publish never went through.

Thanks @ScruffR :slight_smile:


Spoke too soon… back to fast Cyan flashing and every so often going red…

Humm… what else can cause that?

@ftideman, @robinandersson,

Thanks for sharing your test processes in such detail.

We are facing similar issues (Electron 3G freezes on solid cyan after 2 days of operation). It starts working normally again after a reset for another 2 days, then requires another reset.

We would like to know where you are with your testing. Did you find out what was causing the freeze?

Your advice would be invaluable for us. Here is a link to our application

Thanks

I’m beginning to wonder if cellular providers are growing tired of all the “things” that request a line to the network and then, even though next to no traffic is taking place, expect the line to stay up with no interruptions of any kind for as long as the “thing” needs it. The fact that cellular providers allow such access to the network is a wonder in itself. I’m old enough to remember the messages the phone companies gave when a handset was picked up and a connection made, but no talking took place: sooner or later the line was disconnected and a message played about hanging up the handset (back when there were pretty much only landlines): “please hang up the phone”. Or if someone picked up the handset and never made a call, that weird beeping tone at high volume would force you to hang up or go crazy listening to it… Over the last month or so I have come to the conclusion that the cellular carriers in my area are growing tired of “things”, and if a connection does not seem to be doing much, it is dropped by the carrier. Anyway, you might spend hours trying to debug an issue thinking something is wrong with the device when it has nothing to do with the device, except that it has a connection and is doing next to nothing with it.

Hi, @ftideman, @robinandersson,

Thanks for sharing your research, I’m having the same issues on connection when I’m waking the device from sleep.
I’m running on 4 different Electrons overnight, all running the same code based on the 0.6.2 firmware with both modes enabled: SYSTEM_THREAD(ENABLED); & SYSTEM_MODE(SEMI_AUTOMATIC);.

1 Electron keeps getting stuck every couple of hours, solid cyan, which looks kind of bright white?

We are clueless. Our customers have returned the product because in their environment it freezes more often, I believe due to connectivity issues.

Is there any way to debug the Electron while it is in that state?

Hello everybody!

@ftideman, @robinandersson, @danpe it’s been a while. Have you resolved your problems?

I have the same behaviour: after a while, the Electron3G-EU freezes on solid green or cyan, sometimes after resetting out of the blue 5-20 times before freezing.
As @ScruffR pointed out, it’s “most likely a deadlock between […] code and the system”, but I’m currently not able to find it.

Setup:

  • Electron3G - EU (270), 50 devices, firmware 0.7.0
  • Each one on a custom PCB also using D0-D1 for I2C devices (and a lot of other IO for busses, ADC, PWM,…). - will test later without custom pcb
  • Regular access to power from a 5V 600mA source (99% of the time) + small high-current LiPo backup battery, 200mAh 25C. PMIC current limit set to 500mA as soon as possible on startup
  • interrupt on timer0 @1ms, with a handler used for scheduling (30-50 + == % operation on flags, no function call) .- will test later without it
  • interrupt on LOW_BAT_UC with a small handler calling millis()
  • Automatic mode, - will try semi auto but without faith
  • Thread enabled - will try disabled but will probably be a pain for user experience
  • uses ~10 Particle.function + ~10 Particle.variables declared at setup()
  • uses if(Particle.connected() ) Particle.publish() every time
  • uses max 1 Particle.publish() per 30 sec.
  • no watchdog - did not test the application watchdog yet. Looking forward to the hardware one on 0.8.0-Rc5
  • no sleep (yet). A7 is declared as an input and read every ~20ms.

How to reproduce the problem:

  • in one of our test locations, with poor network and bad RSSI (2G only probably? How can I tell whether it’s only 2G, by the way?), when taking the device inside an old house with thick walls (poor wifi coverage as well, fwiw), cell RSSI drops even lower, the device rapidly loses connection and the “death cycle” begins. A startup sound on the custom PCB buzzer makes it easy to notice. The device resets itself (how? low power from a U270 current peak? I have no watchdog) until it finally lands on a solid green or cyan LED.
  • during the “death loop” the device never reaches the cloud for a spark/device/last_reset… message.
  • at our 2 other test locations (60km and 30km away respectively), we have experienced the problem once or twice so far (12 products used 24 hours a day for 3 days)
  • better network and less publish leads to “almost no problem” which is not enough :-/

How to clear the problem:

  • with a manual reset on the reset pin. Awkward due to the casing. Unacceptable and unreachable for the final customer
  • resetting the device at the same location results in a new “death cycle” shortly after…

How to solve the problem: open

  • with more decoupling on the cell module? With a battery capable of bursting enough current? Is the total parasitic inductance relevant?
  • with a hardware watchdog (resetting is ok for our application but freezing obviously not)
  • by finding the deadlock - can this explain a reset?

I’ll continue to post updates. Any help would be great!

Hi @tdasnoy, unfortunately we weren’t able to fix that issue no matter what we tried.
We ended up not using Particle anymore.
I hope you’ll be able to fix that issue, please keep me posted if you do. :pray:

  • interrupt on timer0 @1ms, with a handler used for scheduling (30-50 + == % operation on flags, no function call) .- will test later without it => changes nothing
  • Automatic mode, - will try semi auto but without faith => changes nothing
  • Thread enabled - will try disabled but will probably be a pain for user experience => without “SYSTEM_THREAD(ENABLED);” No problem so far… but product almost unusable due to wait time. “most likely a deadlock” confirmed?

I have enabled the app watchdog and am logging the reset reason to an external SPI flash.
Will test after posting this :slight_smile:

Then, I’ll try logging as much as possible to find the deadlock cause. Wish me luck!

While SYSTEM_THREAD(ENABLED) will usually be the better choice, if that was really the cause for the issue, there should be ways around most anticipated UI issues.

SYSTEM_MODE(MANUAL) may be the better choice especially if you can keep SYSTEM_THREAD(ENABLED).

And of course any of the gotchas mentioned above (if applicable) need to be taken into account

  • be careful what you do in any ISR (no logging, no high level calls, ...)
  • don't use system logging in Software Timers
  • guard against thread collisions where possible or issues may be likely (SINGLE_THREADED_BLOCK or ATOMIC_BLOCK)
  • encapsulate any access to shared resources in "one-place-of-(potential)-failure"
  • ...

First of all, thanks for your help!

  • I have a buzzer playing a melody. Every time I call tone() the right note is played, but I'm not able to call the function at the right time (Particle.process() blocking the loop, I guess).
    This results in silences of several seconds between notes.
  • I must send a command to a motor control loop every 20ms. This causes serious trouble, as you can imagine...
  • other calls can be delayed without too much problems...

With SYSTEM_THREAD(ENABLED) I just noticed something blocking loop() while the device was blinking green... I immediately went outside for better network and the blocking disappeared... I am not even aware of anything in my loop that "waits for something"...
I read in the docs that Particle.xxx() calls are often blocking, but is there anything else?

Could this behaviour happen with this kind of code (using semi auto):

if(Particle.connected()) Particle.publish("event", buffer, 60, PRIVATE);

[loop thread] calling Particle.connected(), result is true, continue
[process thread] updating the result of Particle.connected() to false as network is off
[loop thread] blocked by Particle.publish()

Could SYSTEM_MODE(MANUAL) solve this?
Could SINGLE_THREADED_BLOCK solve this? (I'd say yes and try it now)
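One way to follow the "one-place-of-(potential)-failure" advice for the check-then-publish sequence traced above is to route every publish through a single helper that only queues requests and sends them from one point in loop(). The sketch below is plain C++ so it can run off-device; `PublishGate`, `request` and `service` are hypothetical names and not part of the Particle API, and the two injected callbacks stand in for Particle.connected() and Particle.publish(). It does not make the check and the send atomic, so the race itself remains; it only confines it to one call site that can then be guarded, timed or logged.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: every publish in the application goes through it.
class PublishGate {
public:
    // isConnected and send are injected so the sketch runs without hardware.
    PublishGate(std::function<bool()> isConnected,
                std::function<bool(const std::string&)> send,
                uint32_t minIntervalMs)
        : isConnected_(std::move(isConnected)),
          send_(std::move(send)),
          minIntervalMs_(minIntervalMs) {}

    // Called from anywhere in application code: never sends, only queues.
    // Requests beyond the queue limit are dropped.
    void request(const std::string& event) {
        if (queue_.size() < kMaxQueued) queue_.push_back(event);
    }

    // Called once per loop() pass with the current time in milliseconds.
    // This is the only place where the (possibly blocking) send happens.
    void service(uint32_t nowMs) {
        if (queue_.empty() || !isConnected_()) return;
        if (sentOnce_ && nowMs - lastSendMs_ < minIntervalMs_) return;
        if (send_(queue_.front())) queue_.pop_front();  // keep it on failure
        lastSendMs_ = nowMs;
        sentOnce_ = true;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    static constexpr std::size_t kMaxQueued = 8;
    std::function<bool()> isConnected_;
    std::function<bool(const std::string&)> send_;
    uint32_t minIntervalMs_;
    uint32_t lastSendMs_ = 0;
    bool sentOnce_ = false;
    std::deque<std::string> queue_;
};
```

With a 30000 ms interval this also enforces the "max 1 publish per 30 sec" rule in the same place, and a time-critical task such as the 20 ms motor command never calls the send path directly.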

  • Another thing I noticed in the logs: the network keeps switching between 2G (900+1800) and 3G (900+2100)

  • I'll review my code one more time looking for your bullet list gotchas :slight_smile:

Depending on the actual code doing these jobs, Software Timers or even HW timed interrupts via SparkIntervalTimer might help you out there.

Since the cellular module on Electrons is connected to the µC via a USART connection and the module does tend to block the µC while it's at work, a bad connection will keep impacting your code. You just need to find the way to make it least noticeable.

That depends on the actual cause and use-case, but always worth a try. Some Particle.xxx() functions have been made blocking for SYSTEM_MODE( [SEMI_]AUTOMATIC ) + SYSTEM_THREAD( ENABLED ) but not for MANUAL mode, so best to test.


Turns out I have 2 problems linked to "losing network in a bad cell network area", which can be separated into:

  1. unknown resets back to back, RESET_REASON_NONE every time. I don't know what the conditions for this one are. The only case I know of is after a "particle flash --usb ...". I was kind of hoping for a RESET_REASON_POWER_MANAGEMENT or a RESET_REASON_POWER_BROWNOUT. Are those implemented, by the way? When is the result of System.resetReason() computed? I guess it's OK if I call it as early as possible from setup.
  2. the deadlock. Which seems (so far) avoided by disabling threads OR removing all Particle.publish() from my code.

Really promising! I'll try! Thanks!

Unfortunately the deadlock is not solved:

  • software timer + System.reset() => nope
  • AppWatchdog + System.reset() => nope
  • hardware timer (SparkIntervalTimer ) + System.reset() => nope

perhaps I’m doing something wrong with SparkIntervalTimer?

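#include "SparkIntervalTimer.h"

IntervalTimer myTimer;
// Pre-declare the ISR callback
void resetIfUnresponsive(void);

// Shared between the timer ISR and loop(), so it must be volatile
static volatile uint8_t loopUnresponsive = 0;

void setup(){
...
// hmSec counts in half-milliseconds: 10000 ticks = one callback every 5 s
myTimer.begin(resetIfUnresponsive, 10000, hmSec);
}

// Timer ISR: reset once loop() has missed more than 9 consecutive ticks
void resetIfUnresponsive(void){
    if(loopUnresponsive>9){
        System.reset();
    }
    else{
        loopUnresponsive++;
    }
}

void loop(void) {
...
// Reaching this line proves loop() is still running
loopUnresponsive = 0;
}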
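To check the counting logic of this heartbeat scheme independently of the hardware, it can be modelled in plain C++. This is only a sketch under assumptions: `onTimerTick` and `feedFromLoop` are hypothetical names mirroring resetIfUnresponsive() and the last line of loop(), and `resetRequested` stands in for System.reset(). It says nothing about whether System.reset() is safe to call from a timer ISR on the device.

```cpp
#include <atomic>

// Number of timer ticks seen since loop() last completed a pass.
std::atomic<unsigned> missedTicks{0};

// Stand-in for System.reset(): the model only records the request.
std::atomic<bool> resetRequested{false};

// Mirrors resetIfUnresponsive(): called once per timer period.
void onTimerTick() {
    if (missedTicks.load() > 9) {
        resetRequested.store(true);
    } else {
        missedTicks.fetch_add(1);
    }
}

// Mirrors "loopUnresponsive = 0;" at the end of loop().
void feedFromLoop() { missedTicks.store(0); }
```

Stepping through it, ticks 1 to 10 only raise the counter to 10 and the reset is requested on the 11th tick without a feed. Assuming hmSec means half-millisecond units, the 10000-tick period is 5 s, so a stalled loop() would take roughly 50 to 55 s to trigger the reset.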

Another thing I found:

  • there is a 10-15 sec delay between the Electron losing network and realising it, even in MANUAL mode with a loop doing only
Particle.process();
Serial.printlnf("Cellular.connecting()  %u Cellular.ready()  %u Particle.connected()  %u           Cellular.RSSI().rssi  %i", Cellular.connecting(), Cellular.ready(), Particle.connected(), Cellular.RSSI().rssi);
  • if I call: if(Particle.connected()) Particle.publish("event", buffer, 60, PRIVATE);
    during this delay, Particle.connected() returns true and publish() blocks, causing the deadlock.
    I can reproduce this very reliably.
  • if I do not call publish at all in the same situation, the Electron does not freeze but resets a variable number of times (reason 0)

perhaps I’m doing something wrong with SparkIntervalTimer?

So thats why I guess