Electron suddenly stopped connecting

I think this is a different problem, one I also see roughly every 4 days with some (not all) of my Electrons. After I press the reset button they work for the next few days, then they fall back into the same problem state.

The “not publishing” issue is new (it does not seem to be the Particle services problem from a few weeks ago). I have had it since today and can solve it by generating new keys (keys doctor).

I’m starting to compare the log of a cloud connection where I am able to publish with one where I am not able to publish:

My current insights:

  1. The message IDs ([comm.coap] TRACE: sending message id=17) differ at the same position in the two logs. But they differ in every one of my stored logs.

  2. The remote host IP address (Particle server) often differs from log to log. Does Particle have multiple endpoints? The non-working IP address in my case was 54.89.10.58. I will check whether my working Electrons connect to this IP and can publish.

  3. In the problem case, no messages are received by the Electron. In a successful case, these lines are in the log:

     0000081161 [system] TRACE: received 33
     0000081161 [comm.coap] TRACE: recieved ACK for message id=2e
     0000081162 [comm.protocol] INFO: rcv'd message type=13
     82.162 AT read  +   17 "\r\n+UUSORD: 0,33\r\n"
     Socket 0: handle 0 has 33 bytes pending
     82.172 AT send      16 "AT+USORF=0,813\r\n"
     82.222 AT read  +   71 "\r\n+USORF:     0,\"54.86.250.117\",5684,33,\"\x17\xfe\xfd\x00\x01\x00\x00\x00\x00\x00i\x00\x14\x00\x01\x00\x00\x00\x00\x00ih\x1dq{;\xa6\xb4\x83\xd9I\xaa.\""
     82.234 AT read UNK   2 "\r\n"
     82.244 AT read OK    6 "\r\nOK\r\n" 
    

    So I think the Particle servers could not find the Electron when the “not publishing” problem appears. Wrong public IP address? Wrong or no IP address transferred to Particle? Timeouts? Only Particle members can answer this.
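To make this comparison repeatable across stored logs, a minimal sketch along the following lines could tally sends against receives. It is meant to run on a PC against a saved log, not on the Electron. The marker strings are copied from the excerpts above; the names `LogSummary`, `summarizeLog` and `looksLikeNoDownlink` are made up for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts of the two kinds of log lines compared in the points above.
struct LogSummary {
    std::size_t sent = 0;      // lines containing "sending message id="
    std::size_t received = 0;  // lines containing "[system] TRACE: received"
};

// Scan a captured log (one entry per line) and tally sends vs. receives.
// Both marker strings are assumptions taken from the log excerpts in this thread.
LogSummary summarizeLog(const std::string& logText) {
    LogSummary summary;
    std::istringstream lines(logText);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.find("sending message id=") != std::string::npos) {
            ++summary.sent;
        }
        if (line.find("[system] TRACE: received") != std::string::npos) {
            ++summary.received;
        }
    }
    return summary;
}

// A log with sends but no receives matches the "not publishing" pattern.
bool looksLikeNoDownlink(const LogSummary& summary) {
    return summary.sent > 0 && summary.received == 0;
}
```

A log that makes `looksLikeNoDownlink` return true would match the problem case described in point 3; it says nothing about why the downlink is missing.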

Update

After starting a deep dive into the Electron firmware, I think I have found a possible reason for our problems. Fast Link to GitHub

My thoughts:

  • To generate the udp remote ip address for the Electron, this function is
    called: determine_connection_address

  • There is an interesting todo comment:

     #if HAL_PLATFORM_CLOUD_UDP
     // todo - how to determine if the underlying connection has changed so that we invalidate the existing session?
     // for now, the user will have to manually reset the connection (e.g. by powering off the device.)
     if (udp && !determine_session_connection_address(ip_addr, port, server_addr)) {
     	return 0;
     }
     #endif
    
  • The constant HAL_PLATFORM_CLOUD_UDP is 1 when PLATFORM_ID is 10 or 3 (10 = Electron, 3 = GCC), see: hal_platform.h on GitHub

  • The todo comment would also explain my guess that something wrong is stored from a previous cloud session. I think so because both a reset and new keys help. This also explains why the problems only happen after some time.

  • In this file you can also see how the remote IPs of the cloud are found (the IPs in the log are not the public Electron IPs!). They are obtained by resolving the domain name $id.udp.particle.io, where $id is replaced with the device ID. This explains the different IPs from log to log.
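As a small illustration of the last point, the substitution can be sketched like this. The function name `expandHostTemplate` is hypothetical and this is not the firmware's own code; it only shows how a device ID would turn into the hostname that then gets resolved.

```cpp
#include <cstddef>
#include <string>

// Replace the first "$id" placeholder in a hostname template with the device ID.
// Illustrative helper only; the real firmware builds the name in its own way.
std::string expandHostTemplate(std::string hostTemplate, const std::string& deviceId) {
    const std::string placeholder = "$id";
    const std::size_t pos = hostTemplate.find(placeholder);
    if (pos != std::string::npos) {
        hostTemplate.replace(pos, placeholder.size(), deviceId);
    }
    return hostTemplate;
}
```

Resolving the resulting name (for example with nslookup on a PC) at different times should show whether it maps to different server IPs.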

Can someone confirm these points?

This may be something @BDub wants to know about. For me, this rarely happens, but it does happen.

Did this ever resolve itself? Sometimes these non-connecting issues can be related to your SIM card going over the data usage limit and getting paused. You can check your data usage in the Console under Billing. If that looks OK, I would contact support.particle.io and see if something recently changed with your local network provider for Particle SIMs.

When you switch back and forth between apps that use Particle SIMs and 3rd party SIMs, are you power cycling your Electron? If you do not, the modem typically stays powered and retains the settings from the previous connection with the previous SIM card. A power cycle or special AT commands are required to drop the current network settings.

Your logs seem to indicate all SEND operations and no RECEIVED operations before sleeping. This indicates there is no ACK on the Publish for one, but might also indicate you are putting the device to sleep well before the Electron has even finished handshaking with the Cloud, let alone started the Publish. Try using Publish with the WITH_ACK flag to make the operation fully synchronous:

causes Particle.publish() to return only after receiving an acknowledgement that the published event has been received by the Cloud

Particle.publish("frostEvent", pubData, PRIVATE, WITH_ACK);

You should definitely not have to do this. Try the WITH_ACK flag first and see if that helps. Requires 0.6.1 firmware.

This is just the message ID, which keeps incrementing after your session has been created. It should change, it's OK.

We have multiple servers to balance and distribute the load.

You're very keen on this point in the logs :slight_smile: I think the timing is definitely an issue, and you can improve things with the WITH_ACK flag.

@BDub, I was not power cycling. Is a reset adequate or does the device actually have to be powered down? I believe flashing new code involves a reset, so a reset is likely not adequate.

A reset does not power cycle the cellular modem, so to clear that you need to power cycle the whole device or take the extra steps in code to tell the module to power down and reset.

@Niklas @ctmorrison @BDub

For me, the never-ending green blinking happened because my modem was not able to connect to the network. I was using a 3rd party SIM. In my experience, the Electron sometimes gets confused about which network mode (2G or 3G) it should select. This happens if I frequently switch between Particle SIMs and 3rd party SIMs. Which network band your modem selects depends on how it is configured. There are a few different configurations:

  • 2G only
  • 3G only
  • Dual mode with 2G preferred
  • Dual mode with 3G preferred

Your Electron could end up in any of these configurations, and if your SIM's network and your modem configuration do not match, you can end up in the never-ending green blinking state. To make sure that the SIM and network you are using match your modem configuration, set it explicitly. There is a very nice utility firmware available at the following link that you can run on your Electron to configure your modem settings.

Band select utility

Try running this utility and set your modem configuration to match your SIM network. For 3G Electrons the best setting is ‘dual mode with 3G preferred’, but for testing purposes try setting it to ‘3G only’ mode so that it does not fall back to 2G in case the 3G coverage in your area is not 100%. Play around with it, use different settings, and I am sure you will find the one that works for you.

Thanks for your replies.

So I will use the WITH_ACK flag for my publishes and change the sleep variables to long instead of unsigned long (based on this topic: Electron Deep Sleep and never wake up?)

The docs say this about using no flag:

Unless specified otherwise, events sent to the cloud are sent as a reliable message. The Electron waits for acknowledgement from the cloud that the event has been received, resending the event in the background up to 3 times before giving up.

What is the difference between this quote (no flag) and the WITH_ACK flag? What will happen if I use the WITH_ACK flag and the acknowledgement is never received? Will the application block forever or is there a timeout? I could handle this with the watchdog.

As you can see in my code, I have a delay(10000) after the publishes. It seems that this is too short to receive the default ACK (sent without the WITH_ACK flag, see docs), or does the delay block cloud communications?

@noumanh I only have the 2G version of the Electron, so the u-blox modem only supports GSM/GPRS.

As ScruffR pointed out already, a reset does not power cycle the modem. This was an intentional part of the design of the Electron, such that operations like flashing firmware OTA or DFU, short sleeping and waking, do not cause the modem to power cycle and go through a long connection process with the cell tower.

If you are manually switching SIMs, the easiest thing to do is just remove power (battery and USB) and then plug them back in after your new SIM card is inserted and new code is on the device. Sometimes this is a three step process, you might OTA new firmware first, remove power, switch SIMs and apply power again.

It's a good idea to perform the steps above when switching SIMs. Changing the Band Select is an advanced feature and not recommended if you are switching cards or your device is moving to different locations. It's best to keep the Electron in the most automatic mode possible to ensure its radio access technology (RAT) aligns with what is currently available.

Not using a flag only makes the Publish block until the message is sent to the Cloud; it does not ensure the ACK is received. Electrons are also supposed to wait for all confirmable messages to be ACK'd before sleeping, and I don't see that happening in your logs, even though you say that's with firmware 0.6.1, right? I'm currently looking into whether the device needs to receive the ACK before the message is posted to the event stream.

If you use WITH_ACK and the ACK is never received, it will time out after 30 seconds. The function call would return false.
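Since the call returns false on timeout, the result can be checked and the publish retried a bounded number of times. Here is a minimal sketch of that idea, written so it compiles off-device: `publishOnce` stands in for a blocking publish call, and `publishWithRetry` is a hypothetical helper name, not part of the Particle API.

```cpp
#include <functional>

// Call publishOnce until it reports success or maxAttempts is reached.
// publishOnce is assumed to block and return true once the ACK arrives,
// or false on timeout (as described above for WITH_ACK).
// Returns the number of attempts used on success, or 0 if every attempt failed.
int publishWithRetry(const std::function<bool()>& publishOnce, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (publishOnce()) {
            return attempt;
        }
    }
    return 0;
}

// On the device this might be used as follows (untested assumption):
//   int used = publishWithRetry([&] {
//       return Particle.publish("frostEvent", pubData, PRIVATE, WITH_ACK);
//   }, 3);
//   if (used == 0) { /* log the failure before sleeping */ }
```

With a 30-second timeout per attempt, three attempts can block for up to about 90 seconds, so any application watchdog would need a longer period than that.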

You mean 1000. Possibly yes, but I'm not certain by looking at your logs that your Published message is represented there either. Try WITH_ACK and please post your logs and results. Thanks!

@Niklas I think even if you have a 2G-only device, it's worth making sure that the modem is configured correctly. Also, keep in mind that telcos are gradually phasing out 2G, so make sure that's not the case in your area.

So the consensus is that if the Cloud receives the Published message, it will post it to the event stream. So I would guess that you are sleeping before the Electron ever gets to publish its message.

Try two tests, logging just before the Publish with something like:
LOG(INFO,"Publish 1 Started");

Test 1, increase your 1 second delay to 10 seconds.

Test 2, keep the delay 1 second and use the WITH_ACK flag.

Since I started using the WITH_ACK flag, my Electrons no longer end up in strange situations. No key errors anymore, no connection problems, no sleeping forever.

One time I had the issue that the RGB light stayed solid cyan after a few seconds of breathing cyan. After a reset and new keys, this didn’t happen again.

If the problems appear again, I will let you know.

Many thanks for the support :+1:

Let @Bdub know the new firmware has solved some old problems.

@ScruffR Can you elaborate on the best/recommended way to power down and reset the cellular modem?

I've seen Cellular.command("AT+CFUN=1\r\n") used.

My understanding is that CFUN=1 sets the modem to "full functionality". If it is already in full functionality by default (another assumption on my part), does that cellular command reset it?

I’d leave that to some Particle engineer :wink:

Adding to @mox's reply, I have located @rickkas7's GitHub repository (https://github.com/rickkas7/electronsample), which indicates the best way to approach this:

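// Disconnect from the cloud and wait up to 15 seconds for that to complete
Particle.disconnect();
unsigned long startTime = millis();
while(Particle.connected() && millis() - startTime < 15000) {
  delay(100);
}
// Send AT+CFUN=16 (modem reset) with a 30-second command timeout
Cellular.command(30000,"AT+CFUN=16\r\n");
delay(1000);
// Deep sleep for 10 seconds; the device restarts when it wakes
System.sleep(SLEEP_MODE_DEEP,10);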

I would still like @rickkas7 or a Particle engineer, as @ScruffR mentioned, to clarify a few parts.

This repository was last updated a couple of years ago, and I want to confirm that this is still the most appropriate method to fully reset the device, since System.reset() doesn’t reset the cellular modem.

Also, according to the SARA-UBLOX-R4 datasheet, AT+CFUN=16 will not work and AT+CFUN=15 should be used instead. The datasheet also states that this will reset the SIM card on the unit. Will this cause any issues with the 4G boards? Should we avoid resetting the SIM card and go a different route?

I would not use the AT+CFUN=16 technique in electronsample. Device OS itself will power cycle the modem after 5 minutes of failing to connect. And it knows how to do it on all device models, and can also toggle the GPIO that powers it down.

However, Device OS will not reset the device itself, and that could fix some failures. What I would do is reset after failing to connect for more than 7 minutes or so. On Gen 2, go into SLEEP_MODE_DEEP for 10-30 seconds. On Gen 3, you can’t do that, so just call System.reset().
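The timing part of that suggestion can be sketched as a small helper that tracks how long the device has been continuously disconnected. This is only an illustration under my own assumptions: `ConnectionWatchdog` and `shouldReset` are hypothetical names, and the on-device calls appear only in comments so the block compiles anywhere.

```cpp
#include <cstdint>

// Tracks how long the device has been continuously disconnected.
class ConnectionWatchdog {
public:
    explicit ConnectionWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call regularly (e.g. from loop()) with the current millisecond tick and
    // the connection state. Returns true once the device has been disconnected
    // for longer than the timeout.
    bool shouldReset(uint32_t nowMs, bool connected) {
        if (connected) {
            tracking_ = false;             // connected: clear the failure timer
            return false;
        }
        if (!tracking_) {
            tracking_ = true;              // first disconnected sample: start timing
            disconnectedSinceMs_ = nowMs;
            return false;
        }
        // Unsigned subtraction stays correct across the 32-bit tick rollover.
        return (nowMs - disconnectedSinceMs_) > timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t disconnectedSinceMs_ = 0;
    bool tracking_ = false;
};

// On the device this might look like (untested assumption):
//   ConnectionWatchdog watchdog(7UL * 60UL * 1000UL);   // about 7 minutes
//   if (watchdog.shouldReset(millis(), Particle.connected())) {
//       System.sleep(SLEEP_MODE_DEEP, 30);   // Gen 2; on Gen 3 call System.reset()
//   }
```

Keeping the timeout above 5 minutes leaves Device OS time to do its own modem power cycle first, as described above.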

With the Boron, if the device is remote and you want to be able to completely power cycle the entire device with absolute certainty this circuit is handy:

Thank you for sharing the circuit to power cycle the entire device as this may come in handy for us!

Can you elaborate on why you do not recommend using the AT+CFUN=16 technique?

In your first reply, you state

"can also toggle the GPIO that powers it down"

Are you referring to the enable pin that will power down the modem or the device itself, or both?

In manual mode with system threading enabled, will Device OS still power cycle the modem after ~5 minutes, given that it's up to the user to handle the connectivity of the device?

You mentioned Gen 2 and Gen 3 devices. We know that the Argon, Boron, and Xenon are Gen 3 while the Electron and Photon are Gen 2. Is there anywhere in the documentation that states whether a device is Gen 3 or Gen 2? I can't find it anywhere. I am specifically asking about the E Series, both 3G and 4G units.

There does not appear to be any advantage of using AT+CFUN=16, and because the command varies between modules, it’s not worth the effort of using it, in my opinion.

The modem can be hardware reset using an internal, undocumented GPIO. Device OS itself knows how to use it. How to use it varies depending on the device, and the amount of time the pin needs to be held in various states varies depending on the modem, which is why I recommend letting Device OS reset the modem: it’s hard to do properly across all devices.

Yes, the modem will still reset automatically in manual mode.

Gen 2 devices include the Photon, P1, Electron, and E Series. They all have STM32F205 processors.

Gen 3 devices include the Argon, Boron, Xenon, and B Series SoM. They all have nRF52840 processors.

Both generations include a variety of cellular modems including 2G, 3G, and LTE Cat M1. The only exception is that only the Electron has 2G-only (G350), 3G Americas (U260), and 3G Europe/Asia/Africa (U270). The E Series and Gen 3 devices use a world-wide 2G/3G model (U201) instead.
