Electron code crashing?

I have an Electron deployed at a test site. The Electron is part of a system that was originally developed on the Core (over a year ago) and ported and well tested on 7 Photons for over a year. The same hardware and firmware is being used on the Electron as on the Photons.

When I got the Electron a few weeks ago, I plugged it into my test system (that was running with a Photon) and I flashed the firmware to it. It ran just fine. I burned it in for over a week with no problems at all. Two days ago, I installed the system at a test site and it has been running fine, until just now. This evening, I tried to read data from it and it did not work. The Dashboard shows that it is online, and my JavaScript-based debug client logs into the cloud, finds that the device is online and can communicate with it. The client receives the cloud report of all cloud functions and variables just fine. However, I was unable to read any cloud variables at all, nor could I call any cloud functions: a timeout and an error were returned in all cases.

The device is at a test site, so I cannot get to it readily to see what the LEDs say. But I did go and OTA re-flash the firmware to it (this was the only way that I could think of to remotely reset the device). It flashed OK and now everything works. But I have no idea what went wrong. The firmware has been tested for many many months on multiple Cores and Photons (about 10 such devices in all) and it does not crash for any reason. So, I cannot account for what happened – I assume that the firmware crashed somehow because of the behavior (the cloud reports all cloud functions and variables but I cannot access any of them on the device).

If anyone has ideas about this, I would appreciate knowing about them. Other than ideas about what might have happened, it would be nice to have more information from the Dashboard. At present, there is a big gap between what is available from the dashboard and what I can see from my debug client. Again, the debug client does read metadata about cloud functions and variables from the Particle cloud. This information was correct, but the functions and variables could not be accessed (none of them). And I do not know of any way to remotely reset the device other than to flash the code OTA (which is expensive, 3G data-wise). At the very least, the ability to perform a remote reset from the dashboard would be welcome. Any additional information (such as the time of the last communication to the cloud and anything the device can say about running user code) would be more than welcome. I think that such improvements to the dashboard are necessary to support operations where the Particle devices are deployed in remote (from the developer) locations.

I see my Electron as connected when I know it has been sleeping for long periods. For example, MyElectron has been asleep for 6 days:

You could try to write into your firmware a System.reset() if the device doesn’t connect to a server or receive a publish from another device (or Watchdog timer if it is locking up).
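As a rough illustration of the reset-on-silence idea above, here is a minimal C++ sketch. It assumes the firmware receives some periodic sign of life from the cloud side (for example, a cloud function that a server calls on a schedule) and that the current time comes from a free-running millisecond counter such as `millis()`. The class name `CloudHeartbeat` and its methods are made up for this example; they are not part of the Particle API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: remembers when the cloud was last heard from and
// reports when that was too long ago. Times are milliseconds taken from a
// free-running 32-bit counter.
class CloudHeartbeat {
public:
    explicit CloudHeartbeat(uint32_t timeoutMs, uint32_t nowMs = 0)
        : timeoutMs_(timeoutMs), lastSeenMs_(nowMs) {}

    // Call from whatever handler receives the heartbeat.
    void seen(uint32_t nowMs) { lastSeenMs_ = nowMs; }

    // True once at least timeoutMs have passed since the last heartbeat.
    // Unsigned subtraction keeps the result correct across counter wrap-around.
    bool overdue(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastSeenMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastSeenMs_;
};
```

In `loop()`, something like `if (hb.overdue(millis())) System.reset();` would then restart the device once the cloud has been silent for too long. The heartbeat itself costs some data, so the interval is a trade-off.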

You can also experiment with System Modes, if you are not already using that.

The Core took a long while to iron out and get really stable; the Photon took less time. I suspect that once a lot of use cases (like yours) have been developed and debugged, the Electron will stabilize too, hopefully in even less time.

I just sent out 2 systems that will be connected to Ethernet in remote locations. This was running on 2 Arduino based micros and I ended up using the Watchdog feature on both micros to make sure if the code locked up that the device would reset and reconnect as if nothing ever happened. This worked exactly how it should.

I would recommend we figure out a way to use the Watchdog timer on the Electron also.

Here is a library for watchdogs on the Photon

If you read down to @ScruffR's last post, it looks like there was no support for the Watchdog timers on the Electron yet. He pinged @mdma, and maybe we should ask @BDub about the Watchdog timer on the Electron. What is the status of the Watchdog timer function on the Electron and Photon?

Frequent resetting of the Electron will not work for me because my firmware keeps an event log (circular buffer) on board (in RAM) that is to be read out when the user needs to. Resetting the device will wipe out the log and thus lose data. Obviously, when the device is at a remote location, resetting and losing some historical data is preferable to not having a working device at all, but I’d rather see the underlying problem fixed.

If it happens again, I hope to be able to physically get to the site so that I can observe debugging data that my firmware places on the D7 LED. This way, I will be able to report if the user firmware is running correctly or not, independent of cloud communication. If it is (which I suspect, but cannot prove because I was unable to access the site this time), then it is system firmware related to cloud communication that was the problem.

The Electron has changed the way cloud communication is done (UDP vs TCP for the Photon) and it would not be surprising if there were some subtle bugs in it now. Hopefully, they will be found and fixed. If the WDT is to be used, it would have to be used here, to re-establish the full cloud communication, rather than just reset the device (which will do the latter, but reset and wipe out my circular buffer in the process).

@BobG, my Electrons are sitting there not running anything serious, so if your code is shareable, I can take a look and run it on my setup to see what happens :smile:

No promises, but it is a little something I can offer with the availability of beta Electrons.

@kennethlimcp: Thank you for the offer. In order to run the code on your Electron, you would have to build the “SIS Hub”, which includes a Photon/Electron, an I2C EEPROM and two wireless receivers (and misc other parts). You are more than welcome to do so. Complete documentation (including Particle source code) is posted at:

It is clear that something had died. Whatever it was, it was not dead enough to have the Particle cloud detect that the Electron was offline - in fact it was online and could flash code OTA. It might be that my (user) firmware crashed; however, this same firmware has been running stable on 7 Photons for over a year. Nevertheless, I wasn’t able to physically access the Electron in order to tell for sure if my firmware was running or not.

If my firmware wasn’t running, then the question becomes why - a bug (memory leak or such) or a hardware issue with the Electron or power. If my firmware was running, then there must be a state machine issue with the Electron/cloud communication that prevents access to the exposed variables and functions via the Cloud. Given the history with the Photon, and given that the Electron is new and uses different protocols to communicate with the cloud than does the Photon, I’m leaning toward the latter.

In any event, the device was reset by re-flashing the firmware and it is running again, so I will keep track of the operation and if this happens again, I will attempt to physically access the site so that I can report for certain if the user code is running or not. As I stated previously, I have some debug information on the D7 LED as the code is running and doing its thing.

Do publish messages still get delivered, or do they also have issues, as with the fn/var?

Forgive me if this does not make sense, but can you use the Backup RAM to store your circular buffer data so it survives a Watchdog Timer reset?

I had commented out the line #define CLOUD_LOG, which disables the Spark.publish(), prior to flashing the code to the Electron. I did this to save data charges on the 3G. The intended operation is to read out the internal (to the Electron) circular buffer of events via the ReadBuffer function and the CircularBuff variable. So the answer is that I don’t know if publications would have worked or not when functions and variables were not accessible.

Background (in case you are interested in why): we designed the software to work two ways – polling the Photon/Electron for data stored in an internal (in RAM) circular buffer (pull) or publishing new buffer entries to the Cloud (using the circular buffer to store and rate limit publications – push). We have used IFTTT to respond to the publications and load them into a Google sheet. The preferred method is to use the publications, for a number of reasons. However, we found that entries on the Google sheet take an unpredictable and sometimes very long time to appear. Occasionally, and under unusual but possible circumstances, the Google sheet is missing logged event entries (that are in the circular buffer on the device). We are unsure why these things happen but we have tentatively placed the blame on IFTTT. Because timeliness and completeness is essential for the unit that is on-site, I have turned off publication and query the circular buffer when data is needed.

We have considered bypassing IFTTT and scripting on the Google sheet, using the Particle JavaScript library to make a subscriber. This is on our (rather long) to-do list. We hope that this proves to be reliable. Using publication (push) vs query (pull) will save us 3G data transfers and provide many benefits of the Google spreadsheet (ability to collect a large amount of data, ability to share the collected data without sharing the Particle account, etc.). But for now, the pull method is needed to ensure that the data is up to date when queried and is not missing any key events.

@RWB: sorry but I do not know what “Backup RAM” is all about. I’ll try and look it up in the documentation.

@RWB: I just looked up Backup RAM and yes, this does appear to be the solution to preserving the log (circular buffer) data through device resets. Thank you for pointing this out. It was not available when we started developing (for the Core) and I failed to notice whenever it came in. It is available for Photon and Electron, which is important to us, since we want to support both WiFi and cellular services. This is a GREAT feature to have! The 4K that is available will be enough for our purposes.

Sweet, I’m glad I can help out around here, like others have helped me :wink:

@All: Here is an update: the same problem reoccurred between last night and this morning. This morning I was unable to access any cloud variables or functions on the Electron that is at a remote site. The dashboard reported that the Electron was online and communicating fine.

I was able to access the device on-site this afternoon. The device was absolutely fine. It was breathing cyan, the battery was charged (no red battery light) and the D7 LED would flash the proper diagnostic pattern when I tripped some sensors. However, the variables and functions could not be accessed over the Internet. I pressed the RESET button on the Electron and it reset and everything is working again.

Clearly, there is some 3G communication problem between the Electron and the cloud. The user firmware is running fine. The multi-color LED indicates that the system firmware is running fine. But cloud communication with the Electron was definitely lost. As this is the second time that this happened, there must be some state machine issue in the Electron cloud communication.

@RWB: I tried to use the backup RAM for my log (circular buffer) but it did not work. I believe that the reason is that what is logged in RAM are Strings and I do not think that the compiler knows how to retain Strings and other non-primitive data types. Sigh …

@BobG It’s nice you’re seeing this because I also plan to place some remote devices so I can monitor and communicate with them.

For the backup RAM, I’m actually not sure it’s currently working on the Electron unless you’re compiling your Electron firmware locally and pulling in the fixes for the Electron. It does work with the Photon’s latest firmware.

@joky @MDMA @Bdub @ScruffR Do you guys know if the Backup RAM on the Electron can be enabled if you build locally?

In your current code I would just reset the cellular connection every 12 or 24 hours to clear whatever is happening currently. Or create some code that checks for the non-connectivity issue you’re seeing and then calls the cellular disconnect and reconnect functions to reconnect to the mobile network. It’s not ideal and it will cost some data, but it sure beats having to go out and check on the system.
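The "reset the connection every 12 or 24 hours" idea could look roughly like the C++ sketch below. It is only an outline: `PeriodicReconnect` is a hypothetical helper, and the two callbacks stand in for whatever disconnect and reconnect calls the firmware actually uses (on an Electron these might wrap `Particle.disconnect()` and `Particle.connect()`, but that choice is an assumption and has not been tested against this problem).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper: runs a disconnect/connect pair once per interval.
// Times are milliseconds from a free-running 32-bit counter; 24 hours is
// 86,400,000 ms, which fits comfortably in 32 bits.
class PeriodicReconnect {
public:
    PeriodicReconnect(uint32_t intervalMs,
                      std::function<void()> disconnect,
                      std::function<void()> connect,
                      uint32_t nowMs = 0)
        : intervalMs_(intervalMs),
          disconnect_(std::move(disconnect)),
          connect_(std::move(connect)),
          lastMs_(nowMs) {}

    // Call on every pass of the main loop.
    // Returns true when a reconnect cycle was performed on this call.
    bool poll(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the counter wraps.
        if (static_cast<uint32_t>(nowMs - lastMs_) < intervalMs_) {
            return false;
        }
        disconnect_();
        connect_();
        lastMs_ = nowMs;
        return true;
    }

private:
    uint32_t intervalMs_;
    std::function<void()> disconnect_;
    std::function<void()> connect_;
    uint32_t lastMs_;
};
```

Passing the connection calls in as callbacks keeps the timing logic testable off the device, and makes it easy to swap in a different recovery action later.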

Let us know what happens.

Also, the Electron will almost always show up as ONLINE even if it’s not currently online. There have been posts about this in the past, so don’t expect it to be accurate the way the Photons are, which show up online as soon as they connect to the Particle Cloud.

Yup, that's part of the problem. The String object itself is not the big problem; its internal buffer is allocated dynamically on the heap, which doesn't live in the Backup RAM domain.
C strings, however, would not pose a problem.

BTW: For long running applications heavy use of String might lead to heap fragmentation and eventually to hard or usage faults. So C strings are what I'd prefer anyway.
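To make the C string suggestion concrete, here is a minimal sketch of a circular event log that holds its text in fixed-size `char` arrays, so the whole structure is plain data with no heap pointers. The names (`EventLog`, `logInit`, `logAppend`, `logGet`), the 24-byte entry length, and the magic marker are all assumptions for illustration; the 100-entry capacity and the 4K size limit come from the posts above. On a device the object would presumably be placed in backup RAM (for example with Particle's `retained` keyword once that feature is enabled); here it is an ordinary variable so the logic can be checked anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <type_traits>

constexpr uint32_t LOG_MAGIC = 0x4C4F4731u;  // arbitrary "initialized" marker
constexpr std::size_t LOG_ENTRIES = 100;     // capacity mentioned in the thread
constexpr std::size_t ENTRY_LEN = 24;        // assumed max entry size incl. '\0'

struct EventLog {
    uint32_t magic;   // equals LOG_MAGIC once the log has been initialized
    uint16_t head;    // index of the next slot to write
    uint16_t count;   // number of valid entries
    char entries[LOG_ENTRIES][ENTRY_LEN];
};

static_assert(std::is_trivially_copyable<EventLog>::value,
              "log must be plain data to survive in backup RAM");
static_assert(sizeof(EventLog) <= 4096, "log must fit in 4K");

// Backup RAM holds garbage on first power-up, so clear the log only when
// the marker is missing; after a reset the existing contents are kept.
void logInit(EventLog& log) {
    if (log.magic != LOG_MAGIC) {
        std::memset(&log, 0, sizeof log);
        log.magic = LOG_MAGIC;
    }
}

// Stores text in the next slot, truncating it to fit and overwriting the
// oldest entry once the log is full.
void logAppend(EventLog& log, const char* text) {
    std::snprintf(log.entries[log.head], ENTRY_LEN, "%s", text);
    log.head = static_cast<uint16_t>((log.head + 1) % LOG_ENTRIES);
    if (log.count < LOG_ENTRIES) {
        ++log.count;
    }
}

// Returns the i-th oldest entry (0 = oldest), or nullptr if out of range.
const char* logGet(const EventLog& log, std::size_t i) {
    if (i >= log.count) {
        return nullptr;
    }
    std::size_t oldest = (log.head + LOG_ENTRIES - log.count) % LOG_ENTRIES;
    return log.entries[(oldest + i) % LOG_ENTRIES];
}
```

The trade-off versus `String` is that long entries are silently truncated, so the entry length has to be chosen with the longest expected log line in mind.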

Hi @RWB,

yes, based on latest GIT-sources the backup RAM can be enabled & works fine.

Apologies if this is thread jacking, but is this true? For a remote sensor, that sounds like a kiss of death! I've had my electron lock on me once as solid cyan, and I'm not doing any fancy multithreaded stuff (unless it's enabled by default on .50), could this be it?

It happened again just now. The Electron has been working flawlessly since April 6 (almost one month). It was working and fully accessible OTA this morning. It is not cloud accessible now. The dashboard and Spark.js both report that the device is online, and my JavaScript client successfully reads in the metadata of all of the cloud accessible functions and variables. However, the device itself will not return any values for any cloud variable, nor will it accept any cloud function calls – they all just time out. I tried OTA flashing the firmware again, but this time it also timed out. Since the device is at a site, I cannot access it to look at the LEDs or to push the reset button. The last time that this happened, however, it was found to be breathing cyan and running my firmware but not communicating with the cloud.

The system firmware on the device is 0.4.8. It will probably be two days or so before I can get into the site to see what is happening. I hope that someone has a solution for this because it is not really possible to deploy an Electron to a remote site as long as these communication interruptions continue to happen, with no apparent way to remotely reset the device.

@BobG I’m pretty sure the Watchdog timer is working in the latest firmware, and maybe even with the 0.4.8 firmware, so I would look into using it since it will enable you to reset the microcontroller should it ever lock up or become unresponsive.

Or you could just use an extra small microcontroller to trigger an Electron reset every 24 hours, just in case this ever happens again in the future. I think the EEPROM save during power down is working now, so you could have the extra microcontroller trigger an interrupt pin on the Electron to save the critical data to EEPROM before resetting.

Just some thoughts.

I have some systems deployed remotely (not Electrons) and rely on the Watchdog feature to reset the device any time the firmware locks up for more than 8 seconds, and it works like a charm.

@RWB: Thanks for the suggestion. Assuming that this condition is the same as before, I had found that my firmware was running just fine but the Electron was not exposing the cloud variables and functions to the cloud (however, the cloud had the metadata just fine). So the question is: how is the WDT, or my firmware, for that matter, to know if the Electron is “locked up”? It really isn’t “locked up” (so it will be resetting the WDT OK), but something in the system firmware seems to think that it is communicating to the cloud when it is not. So I do not believe that WDT will solve the problem.

I suppose that I could just go ahead and issue a System.reset() once every day. Of course, as discussed before, this will blow away my buffer of 100 of the last recorded entries. I may be able to recode the whole thing using C strings instead of Strings; the former can be stored in battery-backed RAM, which MIGHT preserve the buffer. However, it would be a lot better if the underlying problem were identified and fixed by Particle. In the meantime, it would be really handy if an owner could remotely reset a device from the Dashboard. Depending upon what the underlying problem actually is, this may or may not work. But if it does work, I could remotely reset the device after determining that something in the cloud communication was broken. I had hoped that re-flashing the code OTA would accomplish this (at a high data transfer cost), but it too timed out.