Code Not Working after 15 Days in the Field


#22

Now, the first time this happened, I was not able to get to the device for a couple of weeks. After a few emails with Rick, I was trying random things and happened to remove the device from its product group. Maybe coincidentally, the device suddenly came back online (in that the publishes started coming across the stream again).

The second time, I was able to get to the device sooner, and after reading this I tried my normal troubleshooting (same as before), but instead of removing the device from the product, I tried reflashing my binary to the device. Boom. It started working again.

So, I’m stuck with maybe two ideas of what could have happened.
A: These were two separate incidents with similar symptoms, in which the first magically righted itself and the second was a firmware corruption that happened after several weeks of running without incident.
B: After a device broker issue, a new handshake is required to right whatever wrong happened on the cloud side. Unfortunately the time between handshakes can be a couple weeks (from what I understand).

I am certainly looking at any other ideas on what could have caused this but unfortunately it is near impossible to troubleshoot.

To answer your question, I have a waitFor set for 5 minutes. I probably picked it up from one of your other forum posts. :slight_smile:


#23

I’ve seen random connection times from day to day. Sometimes it’s super fast and sometimes it takes 20 minutes for some reason. It always ends up working again eventually.

A low battery is one reason I have seen the Electron fail to connect; a fresh battery seemed to connect quicker, all else being the same.

The Electron will automatically reset the modem after not being able to connect for 5 minutes. There isn’t much else it can do but reset the modem and try connecting again. It used to not do that, but it will solve your issue in most cases. Still, sometimes it can just take a while to connect for some reason.
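
If you want that 5-minute behavior under your own control, here is a rough sketch of the timing logic in plain C++ so it can be checked off-device. The names `ConnectWatch` and `ConnectAction` are made up for illustration, and this is not how the system firmware implements it. On the device you would presumably pass in `millis()` and `Particle.connected()`.

```cpp
#include <cstdint>

enum class ConnectAction { KeepWaiting, ResetModem, Connected };

// Hypothetical helper: decides when a connection attempt has run too long.
class ConnectWatch {
public:
    explicit ConnectWatch(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call when a connection attempt starts.
    void begin(uint32_t nowMs) {
        startMs_ = nowMs;
        resets_ = 0;
    }

    // Call periodically with the current time and connection state.
    // Unsigned subtraction stays correct across a millis() rollover.
    ConnectAction poll(uint32_t nowMs, bool connected) {
        if (connected) return ConnectAction::Connected;
        if (nowMs - startMs_ >= timeoutMs_) {
            startMs_ = nowMs;  // restart the clock for the next attempt
            ++resets_;
            return ConnectAction::ResetModem;
        }
        return ConnectAction::KeepWaiting;
    }

    // Number of modem resets requested since begin().
    uint32_t resets() const { return resets_; }

private:
    uint32_t timeoutMs_;
    uint32_t startMs_ = 0;
    uint32_t resets_ = 0;
};
```

Counting the resets lets you notice a device that has been stuck for several 5-minute rounds, which matches the 20-minute connection times mentioned above.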


#24

Thank you! I was hoping someone would have the exact same problem.

What you have described is exactly what I’m dealing with, and I see it happen very often since I have 50+ devices in the field. I have another community post about it that is still somewhat unresolved here: connecting but not logging to cloud

I also noticed a correlation (which I later found to be a false one): handshaking with the cloud seemed to fix the problem and make sure the data actually got through to the logs (see here: how to force a handshake). However, I found this week in a fleet of 9 devices that even if they did a handshake each day, this problem still happened eventually.

I have no idea what is the problem here…

With your ideas of A and B,
A sounds non-logical to me (perhaps I’m misunderstanding). Perhaps the second part of A with firmware corruption is happening… (But how do we find the problem and fix that?)
B What is a device broker issue? And how can we fix that/work with that?

@ScruffR @Ric @RWB any thoughts on this?


#25

Part of your problem is feeding the PMIC a 5 V input on the Li battery input line. You need to drop this voltage to 3.6-3.8 V to rule it out as a cause of the non-connection issues, since it can make the PMIC cut off power, which you have reported is happening.


#26

This thread also includes an in-depth discussion on this. Seems to not be resolved and a frequent recurring problem for quite a few people.


#27

I saw your post about forcing a handshake and decided to give it a try. Was hopeful but if it didn’t resolve the issue for you, I’m less so. :frowning:

I have no idea what these broker issues are. I’m assuming it has something to do with setting up two-way authentication keys for the UDP connection? According to @rickkas7, there was a broker issue at the time that it last went offline. I have to believe that is more than a coincidence.

My thought was that after one of these “broker issues”, a device would need to do a handshake to reestablish keys to communicate with the cloud. But because a device doesn’t do a handshake every time it connects, this could take days or weeks? I was hoping that forcing a handshake as @ScruffR described, by disconnecting from and reconnecting to the Particle cloud, would do the trick. Of course, this adds quite a bit of data per loop, which is not ideal.

I wish I had a way to tell when these “broker issues” happened so I could see if they correlate.


#28

I fixed that issue with the power, and it is still connecting with the cloud, but logging no data.

@dcliff9 Yeah. I’m pretty sure it’s the broker issue and something on the Particle cloud end. It’s a weird error.

If you read through this community post, they have the same issue… (won’t publish). It sounds like a cloud issue they still haven’t fixed completely.


#29

Cool, just got an email that the cellular is down for now.

I have seen the Electron stop sending data before, even though it looked like it was still sending. A battery pull fixed the problem.

Adding an external watchdog circuit to reset the unit every so often is not a bad idea either for these remotely located units, since a reset will usually fix a lot of problems.

I left a unit outside last year for about 3 months in the winter, and the only problem I ever saw was the data no longer getting to Ubidots (sent directly, no Particle cloud involved), even though everything looked fine judging by how the Electron was operating. A battery pull always fixed it.

If you subscribed to your own publishes to make sure each one was posted, you could trigger a reset whenever you did not receive the publish response, to try to fix the problem.
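
A minimal sketch of that idea, written as plain C++ so the logic can be checked off-device. The class name `PublishAckMonitor`, its methods, and any timeout you pick are hypothetical. On an Electron you would presumably call `onPublish()` next to `Particle.publish()`, call `onAck()` from the `Particle.subscribe()` handler for the same event, and feed `millis()` in as the clock.

```cpp
#include <cstdint>

// Hypothetical helper: tracks whether a publish was echoed back
// through a subscription within a timeout.
class PublishAckMonitor {
public:
    explicit PublishAckMonitor(uint32_t ackTimeoutMs)
        : ackTimeoutMs_(ackTimeoutMs) {}

    // Call right after publishing (e.g. next to Particle.publish()).
    void onPublish(uint32_t nowMs) {
        if (!waiting_) {  // keep the time of the oldest unanswered publish
            waiting_ = true;
            sentAtMs_ = nowMs;
        }
    }

    // Call from the subscribe handler when the event comes back.
    void onAck() { waiting_ = false; }

    // True when a publish has gone unanswered for too long.
    // Unsigned subtraction stays correct across a millis() rollover.
    bool shouldReset(uint32_t nowMs) const {
        return waiting_ && (nowMs - sentAtMs_) >= ackTimeoutMs_;
    }

private:
    uint32_t ackTimeoutMs_;
    uint32_t sentAtMs_ = 0;
    bool waiting_ = false;
};
```

In `loop()`, a `true` from `shouldReset(millis())` would be the cue to run whatever recovery you trust. This only detects the problem; which kind of reset actually cures it is a separate question.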


#30

Yes, whenever I disconnect all power for 5+ seconds, and then press reset and boot up the device again, it seems to start working again (but if I have 30,000 devices I can’t be doing that) and also sometimes it will only continue reporting the data for a few more days.


#31

What I was able to figure out was that it had nothing to do with the Particle Cloud since I was not using the cloud but sending data to Ubidots via their MQTT code. So it must be a cellular or network issue.

Subscribing to your publishes if you’re using the Particle Cloud is a way of catching if the events did not get received and then you can trigger an automatic reset.


#32

Okay… That’s interesting and is starting to make some sense. But just doing a normal reset doesn’t fix the problem (I’ve tried that), and from what I understand, going into Deep Sleep resets the device upon wakeup (which hasn’t fixed the problem either). Or am I missing something?

It only happens on some devices, or after a period of time. It happens with certain devices more frequently as well.


#33

Yes, a reset will not reset the modem, but there is some code you can run that will reset the modem as if you pulled the battery, and that’s what you want to run.

Check out this great post for more info:


#34

I tried using that code from that post to reset the modem.
However, I can see noticeable differences between that and a hard reset done by pulling the battery (basically, it didn’t work for me).

Here is the code I used (just like @rickkas7 used):

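void smartReboot() {
    // Disconnect from the Particle cloud first.
    Particle.disconnect();

    // AT+CFUN=16 requests a silent reset of the modem and SIM (30 s timeout).
    Cellular.command(30000, "AT+CFUN=16\r\n");
    // Power the modem down and give it a second to settle.
    Cellular.off();
    delay(1000);

    // Restart the microcontroller itself.
    System.reset();
}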
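
Since a plain reset, a modem reset, and a battery pull all behave differently in this thread, one option is to escalate through them instead of picking one. Here is a rough sketch of that idea in plain C++. `RecoveryLadder`, `Recovery`, and the thresholds are hypothetical, and on the device the failure count would need to survive resets (in EEPROM, for instance), which this sketch leaves out.

```cpp
#include <cstdint>

enum class Recovery { SoftReset, ModemReset, PowerCycle };

// Hypothetical helper: picks a stronger recovery step each time
// a reporting cycle fails, and starts over after a success.
class RecoveryLadder {
public:
    // Call when data is confirmed to have gone through.
    void onSuccess() { failures_ = 0; }

    // Call when a reporting cycle fails; returns the step to try next.
    Recovery onFailure() {
        ++failures_;
        if (failures_ == 1) return Recovery::SoftReset;   // e.g. System.reset()
        if (failures_ == 2) return Recovery::ModemReset;  // e.g. smartReboot()
        return Recovery::PowerCycle;  // needs external hardware to cut power
    }

    // Consecutive failures since the last success.
    uint32_t failures() const { return failures_; }

private:
    uint32_t failures_ = 0;
};
```

The last rung cannot be done in firmware alone, so it assumes something like an external watchdog circuit that can actually cut power.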

#35

Should work but let’s see what @rickkas7 has to say about this.


#36

@liddlem I feel like you are always one step ahead of me.
I am starting to understand more about this problem and the one correlation I have been able to find is that the events begin showing on the stream again once the device does a new handshake. But this only seems to happen if I pull the battery for a bit and plug it back in.
As a side note, I’m also having a problem getting an OTA update (automated through a product update, not by pushing the OTA button on the web IDE). This also seems to only happen if/when the device does a handshake with the Particle cloud.
So I am back to trying to figure out a way to force a handshake. Doing a Particle.disconnect() and then a Particle.connect() does not force a handshake. I was hoping that killing the modem and reconnecting would do it.
Have you found that to be true?

I’m wondering if using these 3rd party SIM cards is somehow causing an issue with handshakes to the Particle cloud. Is it possible that the shared keys just remain the same until the device is completely powered down? And if so, is there a surefire way to reestablish them (or force a handshake)?


#37

@rickkas7 I tried that code as mentioned above, but it is not acting like removing the battery and a ‘hard reset.’

@dcliff9 I know man… It’s kinda frustrating. I see as well what you mentioned, if you force a handshake by removing the battery for a bit and plugging it back in it seems to always publish and work.

For OTA updates, you need to have it handshake, and then you have to keep the device alive long enough so that the update has enough time to get downloaded and installed into the device as the new firmware.

I’m not sure what forces a handshake. It sometimes works with a Particle.disconnect() and then a Particle.connect() on the next wakeup. It’s really confusing what is actually working, though. I was hoping it would kill the modem and reconnect to do a handshake, and it does seem to kill the modem if you do the Cellular.off() commands, but it doesn’t act the same as removing the battery.

Maybe it’s something with the backend cloud receiving the devices, or it’s a firmware issue or a 3rd party SIM issue. I don’t know what to say, but I’m submitting a support request.


#38

I have seen the same issue in the past, needing to pull the battery to get data sent via MQTT to Ubidots, so I think we can rule out the Particle cloud as the cause, since I was not using it and saw the same issue you’re having.

I was using the Particle SIM cards.


#39

@liddlem, have you seen my post about the skipped handshake in the other thread where you mentioned this issue?