Reliability of a device like the Electron

@marcuslp, I use Electrons in several harsh industrial environments.
In my personal experience, I’ve had two issues that have caused frustration (neither is Particle’s fault).

  1. Something in my code causes the Electron to stop publishing. This can happen after 3 days of runtime or after 3 months. The Electron is actually still connected to the cloud, but no publish events make it out. I would incorrectly assume the Electron had lost its cellular connection.

  2. Sometimes an I2C sensor wreaks havoc on the I2C bus and prevents any new sensor readings. This was hard to identify on a remote unit. Depending on the logic, the Electron may have nothing to publish without new values coming in. I added a 20-minute failsafe publish event that always executes, but it carries blank sensor values (zeros) during the I2C bus problems.
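A minimal sketch of the failsafe idea in item 2, written as portable C++ so the decision logic can be tested off-device. The `Reading` struct, `buildPublish()` and the JSON field name are hypothetical; on an Electron, `nowMs` would come from `millis()` and the payload would be handed to `Particle.publish()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical reading type: 'fresh' is false when no new value arrived
// (for example while the I2C bus is stuck).
struct Reading {
    bool fresh;
    float value;
};

// 20-minute failsafe interval, in milliseconds.
constexpr std::uint32_t kFailsafeMs = 20UL * 60UL * 1000UL;

// Decides whether to publish now and, if so, writes the payload into 'out'.
// Unsigned subtraction keeps the interval check correct across a millis() rollover.
bool buildPublish(const Reading& r, std::uint32_t nowMs, std::uint32_t& lastPublishMs,
                  char* out, std::size_t len) {
    const bool failsafeDue = (nowMs - lastPublishMs) >= kFailsafeMs;
    if (!r.fresh && !failsafeDue) {
        return false;  // nothing new and the failsafe is not due yet
    }
    // Report zero as the "blank" value when there is no fresh reading.
    std::snprintf(out, len, "{\"v\":%.2f}", r.fresh ? r.value : 0.0f);
    lastPublishMs = nowMs;
    return true;
}
```

The failsafe guarantees at least one publish per interval, so a silent unit means a connectivity or firmware problem rather than "no new data".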

As a workaround for both problems, I added a remote reset function so I can reset the Electron when the graphs look funny. I even experimented with calling that function from the code every 24 hours.
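The remote reset could look roughly like the sketch below. It assumes a cloud function whose handler only sets a flag, with `loop()` performing the actual `System.reset()` afterwards so the handler can return its status to the caller first. `remoteReset()`, `shouldReset()` and the `"confirm"` argument are hypothetical names, and the sketch takes a `std::string` where Particle's handler receives a `String`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Reset the unit once it has been up for 24 hours (the "daily reset" experiment).
constexpr std::uint32_t kDailyResetMs = 24UL * 60UL * 60UL * 1000UL;

// Set by the cloud function handler, acted on later from loop().
static bool g_resetRequested = false;

// Same shape as a cloud function handler: argument string in, int status out.
// The "confirm" guard is a hypothetical protection against accidental calls.
int remoteReset(const std::string& arg) {
    if (arg != "confirm") {
        return -1;
    }
    g_resetRequested = true;
    return 0;
}

// Polled from loop() with millis(); a true result would be followed by System.reset().
bool shouldReset(std::uint32_t uptimeMs) {
    return g_resetRequested || uptimeMs >= kDailyResetMs;
}
```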

Is there any chance that your altimeter/barometer (I assume you use one) could be the culprit?

Naturally, it’s doubtful my experiences will help you, but just in case…

Sure, @Moors7, please feel free; any input is welcome.

And thanks @RWB

@Rftop, thanks for reaching out; that is good to hear. It means a lot to us that you’re using Electrons in an industrial environment and were able to find workarounds to your issues.

It’s getting late here, but tomorrow either @Falcon or I will add more comments and data.

Thanks all. Appreciate your help.

Well, scratch the idea about sensor involvement, then. I’m a newbie and here to learn from the experts.

I love @RWB’s suggestion of publishing the RSSI.
Those results may point you in the right direction (cellular connectivity vs. code).
You could treat it as debug code and turn the RSSI publish on or off with a simple Particle function that sets a flag, which avoids a lot of OTA firmware updates.
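A sketch of that debug toggle, assuming a Particle function registered to call `setRssiDebug()` with "on" or "off". The handler names and JSON layout are illustrative, and the floor field is a hypothetical addition that makes it easier to look for an RSSI-versus-elevation pattern. On the device, the two signal numbers would come from `Cellular.RSSI()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Debug flag, off by default so normal operation uses no extra data.
static bool g_debugRssi = false;

// Cloud-function-style handler: "on" / "off" flips the flag without an OTA update.
int setRssiDebug(const std::string& cmd) {
    if (cmd == "on") {
        g_debugRssi = true;
        return 1;
    }
    if (cmd == "off") {
        g_debugRssi = false;
        return 0;
    }
    return -1;  // unknown command
}

// Builds the debug payload; returns false (publish nothing) when debugging is off.
bool rssiPayload(int rssi, int qual, int floorLevel, char* out, std::size_t len) {
    if (!g_debugRssi) {
        return false;
    }
    std::snprintf(out, len, "{\"rssi\":%d,\"qual\":%d,\"floor\":%d}", rssi, qual, floorLevel);
    return true;
}
```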

I’m not from a large city, but what are the chances of the cellular provider swapping towers/sites on you for load balancing in a densely populated area? The folks here would know how difficult that is to track. Tower swapping would prevent you from seeing a defined pattern of RSSI versus elevation.

@Rftop,

This is a very cool project!

I also use sensors in an outdoor environment and am new to the Particle platform, so I will be gaining outdoor experience in the coming months. I took my Electron on a road trip from Raleigh to Washington, DC to test it under changing (albeit horizontal) conditions, and this approach seems to be working.

Happy to share the hardware and software with you and the community if it is helpful.

I have seen the same situation you are referring to, where my board thinks it is sending data but nothing is being received by the back-end serial data streaming database server. Here is my approach, which is working so far but needs more validation:

  1. I have an external watchdog timer on my carrier board. The reset interval is set by choosing a resistor value; mine is 2 hours, but you may want a shorter interval.

  2. When my device sends data, it subscribes to a response handler that listens for the response code from my serial data streaming service (Ubidots).

  3. If it receives the code that indicates the successful logging of a data point, it “pets” the watchdog, resetting the timer. If it does not receive the code, or gets a code indicating that a new data point was not logged, it does not “pet” the watchdog. It also keeps that data for the next successful connection; my application is relatively simple, and I am OK if it takes 2 tries to transmit.

  4. If the external watchdog is not reset in the allotted time, it will reset the Electron using a pin reset. The next revision of the carrier board will also be able to do a “hard reset” by cycling the power.
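The four steps above map onto a small state machine. This sketch assumes the response handler receives an HTTP-style status code and treats any 2xx as "logged"; that is an assumption, so check what your service actually returns. `DeliveryWatchdog` is a hypothetical name, and the pet is only counted here; on the carrier board it would be a pulse on the pin wired to the external watchdog.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical helper for the "pet only on confirmed delivery" pattern.
class DeliveryWatchdog {
public:
    // Queue a reading; it stays queued until the back end confirms it was logged.
    void enqueue(const std::string& data) { pending_.push_back(data); }

    // Oldest unconfirmed reading (the one to send next), or nullptr if none.
    const std::string* next() const {
        return pending_.empty() ? nullptr : &pending_.front();
    }

    // Called from the response handler with the status the service returned.
    // Assumption: any 2xx status means the data point was logged.
    void onResponse(int status) {
        if (status >= 200 && status < 300 && !pending_.empty()) {
            pending_.pop_front();  // delivered, so drop it
            ++pets_;               // on hardware: pulse the external watchdog here
        }
        // Otherwise keep the data and do not pet; if this persists past the
        // watchdog interval, the external watchdog resets the Electron.
    }

    int pets() const { return pets_; }
    std::size_t queued() const { return pending_.size(); }

private:
    std::deque<std::string> pending_;
    int pets_ = 0;
};
```

On the device itself, a small fixed-size buffer would be kinder to the heap than `std::deque`.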

If you are interested in the hardware component, you can follow this thread. I can also share the software if it would be helpful.

Good luck with your very interesting project.

Chip

I too am using Electrons in an industrial environment for agriculture. Here are some of my observations:

  • I struggled in the beginning with cell-related issues. To learn how to deal with them, I would read everything @rickkas7 has posted on the subject. Some issues I’ve had:
    • Turning the cell modem on and off doesn’t always work as expected, so test carefully how you use these commands, especially calling Cellular.off() and then putting the device to sleep.
    • Getting the watchdog to perform as you’d expect. Again, read @rickkas7 on this and then test it yourself.
  • Cellular reception is actually better for me on my Electron than on my iPhone 6S.
  • Condensation is an issue you will need to deal with. A good waterproof box is important, but we now also apply conformal coating.
  • The SIM card holder can be tricky. I usually bend the 6 pins on the holder away from the Electron to get a snugger fit.
  • As mentioned above, it can be tricky to find a cell antenna that actually performs as rated. I would test any antenna not sold by Particle extensively, as I’ve been burned by this. We went through about 10 antennas before picking one for situations where service was really poor.
  • I do not have a watchdog on the board that my Electron is connected to. I may add one in the future, but I seem to be doing OK without one for now. If I were to add one, I would design it to completely power cycle the Electron.

That’s all I can think of for now! Good luck.

@hwestbrook, would you mind sharing the model number of the antenna that worked for you at your agricultural site?

Many thanks.

We’ve had good luck with this antenna: https://www.weboost.com/products/314475

We do not use it often, as it’s expensive and our luck has been good with cell service, but when we have to, it seems to work well.

The replies in this thread have been a big help to us.

@falcon has been mulling over some of the thoughts and suggestions and will reply once he has run a few tests.

From a product development point of view (I’m not a coder myself), it’s been very useful at this stage in our project to hear from the community, in particular from @hwestbrook, @chipmc and @Rftop, who have devices in a similar outdoor environment, and to get their feedback on code, antennas and weatherproofing.

We are planning our next tranche of development work and are confident we can improve and stabilise the code, but there are a couple of areas we are still thinking through and would welcome further comments on:

1) Resetting Device

If we have an issue with one of our Electron powered devices we need to be able to get that device going again quickly (within minutes rather than hours).

We will be working on improving the code to reduce connection or device issues in the first place, but when they do happen, we need to resolve them quickly. As such, we are trying to figure out the best way to get a device back to a working state after it has had an issue or lost its connection.

Our app will notify us immediately when a device goes down, but we need to identify a viable way to get it working again.
It seems our options are:

A: Manual reset button or power cycle
B: SW or HW watch dog timer
C: Via separate remote access

Does anyone have thoughts or suggestions for rapidly and reliably resetting a device in the field? Option A should be reliable and quick, but it would mean placing our device where it is easily accessible (i.e. not on top of the hoist). That’s not ideal, but we can do it.

2) Accessing Reset Buttons on Enclosure

If we go with option A (a manual reset button), it would be best if the client could press this button without having to open the enclosure (as is currently the case). We are currently using an Asset Tracker enclosure but may move to another enclosure if and when necessary. Has anyone set up such a button (or buttons) on their Electron device’s enclosure for easy access to the button functions? Does it work well?

Thanks!

An external watchdog timer would be the quickest way to reset the device, considering the Electron is powered from mains power.

The Electron should never get stuck in this solid cyan mode, though, so there may be a code issue that needs to be found and dealt with.

Your current code does have the software watchdog timer implemented, but there are cases where it will not work.

Having @peekay123 @ScruffR @rickkas7 take a look at your code would be a wise idea since they are very good at understanding potential coding issues on this platform.

@marcuslp, I’m not sure if you and @pari are working on the same project. I strongly recommend against using Arduino String variables and operations due to possible heap fragmentation, especially in a commercial application.
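As an illustration of that advice, this sketch formats an event payload into a fixed-size `char` buffer with `snprintf()` instead of concatenating `String` objects, so no heap allocation happens per publish. The function name and fields are hypothetical; a battery percentage like this could come from a globally declared `FuelGauge` object's `getSoC()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Formats the event payload into a caller-supplied, fixed-size buffer.
// snprintf() returns the length it needed, so a result >= len means the text
// was truncated; the function reports that instead of silently sending it.
bool formatFloorEvent(char* out, std::size_t len, int floor, float batteryPct) {
    const int n = std::snprintf(out, len, "{\"floor\":%d,\"batt\":%.1f}", floor, batteryPct);
    return n >= 0 && static_cast<std::size_t>(n) < len;
}
```

A buffer declared once (globally or on the stack) is reused every time, which is what keeps the heap from fragmenting over days of uptime.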

I would go with option B, with both a software-triggered reset and a hardware watchdog reset. If your code uses Particle.publish() to send out data and you also subscribe to that event, then you could reset the Electron if the published event never comes back. A pin on the Electron can be used to trigger a hardware reset.

If the Electron hangs at any time, the hardware watchdog will do the reset. So, with both of these and a creative mix of their use, you should be covered.
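A sketch of the publish-and-echo check described above, assuming the device subscribes to its own event name and that `onPublished()` and `onEcho()` are called from the publish site and the subscribe handler respectively. `EchoMonitor` and the timeout value are hypothetical; what `overdue()` triggers (a pin driving the reset circuit, or `System.reset()`) is a design choice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tracks whether our own published event came back via the cloud.
class EchoMonitor {
public:
    explicit EchoMonitor(std::uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call right after publishing the monitored event. Only the first
    // unanswered publish starts the clock, so repeated publishes that never
    // come back cannot keep postponing the reset.
    void onPublished(std::uint32_t nowMs) {
        if (!waiting_) {
            waiting_ = true;
            sentAtMs_ = nowMs;
        }
    }

    // Call from the subscribe handler for the same event name.
    void onEcho() { waiting_ = false; }

    // Poll from loop(): true means the echo is overdue and the device should be
    // reset (e.g. by driving the pin wired to the reset circuit).
    bool overdue(std::uint32_t nowMs) const {
        return waiting_ && (nowMs - sentAtMs_) >= timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t sentAtMs_ = 0;
    bool waiting_ = false;
};
```

This covers the "connected but publishes never arrive" case, while the hardware watchdog still covers a full hang where `loop()` stops running.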

The watchdog idea sounds worth considering. @Falcon wasn’t totally sold on the idea when we last spoke. I’ll let him comment / elaborate on it.

Yes, I think we have a code issue to resolve first. Yesterday the device went offline again; it appears to do this after about 48 hours. We can’t tell for sure, but we assume it has gone to solid cyan again.

@Falcon also says this should not be happening and needs to be resolved first, and that we should be aiming for a reliable device that doesn’t go down often in the first place.

Yes, @pari is also involved in this project.

Good to hear. Using both HW and SW watchdogs sounds like solid advice.

@Falcon, once you’ve had a chance to review, let us know your thoughts on the above approach.

Cheers!

@marcuslp

Good luck with your project. I wanted to show what my enclosure looks like. I have had a great experience with these BUD Industries enclosures, as they are inexpensive, roomy and weatherproof. You asked about buttons to reset the device: I like big push-button switches, which can be mounted on the board like this or panel mounted (use an IP67-rated switch like this one).

On the watchdog timer, I have had good luck using a simple external one from TI, the TPL5010, which can be set to trigger anywhere from 100 ms to 2 hours by simply changing a resistor. I think this approach may be the cleanest. There is one item I forgot to mention in my last post: the Electron can report what caused it to reset when it restarts. In my program, I use this feature to monitor and report (using a cloud variable) the number of reboots. This could be helpful as you deploy your devices, to see whether you still have software reliability problems that are being masked by the automatic resets.
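A sketch of that reboot bookkeeping. On an Electron, the cause would be mapped from `System.resetReason()` (which needs `FEATURE_RESET_INFO` enabled at startup, e.g. `RESET_REASON_PIN_RESET` for an external watchdog pulling the reset line), and the counters would live in retained memory so they survive a reset. Here both are stood in for by a plain enum and struct, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical subset of reset causes, to be mapped from System.resetReason() on the device.
enum class ResetCause { PowerOn, Pin, Watchdog, User, Other };

// On an Electron these counters would sit in retained memory so a reset does not clear them.
struct RebootStats {
    int total = 0;
    int pin = 0;       // e.g. the external watchdog pulling the reset pin
    int watchdog = 0;  // a watchdog-triggered reset
    int user = 0;      // System.reset() from code or a remote function
};

void recordBoot(RebootStats& s, ResetCause cause) {
    ++s.total;
    switch (cause) {
        case ResetCause::Pin:      ++s.pin; break;
        case ResetCause::Watchdog: ++s.watchdog; break;
        case ResetCause::User:     ++s.user; break;
        default: break;  // power-on and other causes only bump the total
    }
}

// Compact summary suitable for a cloud variable or a publish at startup.
void formatStats(const RebootStats& s, char* out, std::size_t len) {
    std::snprintf(out, len, "total=%d pin=%d wdt=%d user=%d",
                  s.total, s.pin, s.watchdog, s.user);
}
```

A steadily climbing `pin` count on a unit that otherwise looks healthy is exactly the masked reliability problem described above.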

Good luck and I hope this helps.

Chip

@chipmc

I like the route you have taken with your enclosure; it seems accessible, rugged and fit for purpose.

Hopefully, as those of us with devices that must survive extended periods in industrial / open-weather environments gain more first-hand experience, we can help each other. Will keep in touch.

And thanks for comments above - useful input as we consider and plan our next steps.

Cheers,
Marcus

FAO @ScruffR and Particle Support

To help with this issue, I am posting below the information that one of our developers (@Falcon) sent to Rick at Particle support on the 23rd of August.

I will send a link to our code direct to Rick / Particle support by Email so as to keep it private.

Our developers are saying the code we have is very small, just 60 lines or so.

Can someone from Particle look into our code to help investigate the issue, as we have already had input from forum members?

We are prototyping a device using a Particle Electron to publish floor levels from a hoist car (http://bit.ly/2wE1fgr) in real time. We have managed to make it work, but:

  1. After two days of operation, the device shows a solid cyan light (see the video at http://bit.ly/2rRhosJ).
  2. After a reset, the device starts operating normally again.

It is not practically possible for us to reset the device every two days.

Could you please advise what exactly is causing the issue?

Here is the link to my code.
–I will send this link to Rick / Particle Support by Email to keep it private-- (Marcus , 11.09.17)

Some community members have suggested that the print statements in the code could be causing a heap issue. I don’t think that is the cause.

As we have many more devices to build, we are struggling to figure out what causes the Electron to freeze on solid cyan and then work properly after a reset.

Your help and guidance would be greatly appreciated.

Thanks,
Marcus

Hi Marcus,

I just sent you an email with a number of recommendations, but for the sake of maintaining the thread and for other folks who hit these issues, I will post my recommendations here as well:

What to try if your device is locking up:

  • Use the latest stable version of System Firmware (not a release candidate)
  • Allocate the PMIC and FuelGauge objects in the global scope, not in your loop function.
  • Try to spend as little time in an interrupt handler as possible: don’t allocate memory, don’t call other functions, and don’t print to the USB serial connection, which can block if no USB host is connected. You can change this behavior if you like (https://docs.particle.io/reference/firmware/electron/#blockonoverrun-).
  • Use the SYSTEM_THREAD to help minimize any blocking as a result of maintaining the cloud connection. ( https://docs.particle.io/reference/firmware/electron/#system-thread )
  • Use the watchdog to automatically restart the device if your firmware doesn’t check in when expected, or if things are blocked. ( https://docs.particle.io/reference/firmware/electron/#application-watchdog )
  • Don’t go to sleep immediately after a publish, give the device a few seconds to tend its cloud connection. You also want to keep your device awake for something like 15-60 seconds after connecting, so you can push over-the-air firmware updates later.
  • Consider using an acknowledged publish (WITH_ACK), which can make publishes more reliable over a cellular connection.
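To illustrate the interrupt-handler advice, here is a minimal sketch in which the handler only bumps a counter and `loop()` collects it. It assumes a pulse-counting input (for example, a hypothetical floor sensor); `std::atomic` is used so the example is correct in portable C++, and on the device the handler would be attached with `attachInterrupt()`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Written only by the interrupt handler and drained by loop().
static std::atomic<std::uint32_t> g_pulseCount{0};

// Keep the ISR tiny: no Serial output, no String, no publish, no allocation.
// On the device this would be attached with attachInterrupt(pin, onPulse, FALLING).
void onPulse() {
    g_pulseCount.fetch_add(1, std::memory_order_relaxed);
}

// Called from loop(): takes the pulses seen since the last call and resets the
// count atomically, so all slow work (formatting, publishing) happens outside
// interrupt context.
std::uint32_t takePulses() {
    return g_pulseCount.exchange(0, std::memory_order_relaxed);
}
```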

My guess is that your firmware is locking up as a result of running out of memory, or of blocking and complications caused by the interrupt handling. Fixing these issues should improve the reliability of your firmware.

Thanks,
David

@Dave Thank you.

We are going through your Emails and info. Appreciate the input.
