B523 fails reconnect loop with SEMI_AUTOMATIC - modem out of sync?

Dear Particle Commmunity,

I’ve started to see some of our B523 SoM fail to connect on DeviceOS 3.2.0/3.1.0. They are sleepy devices, so the connection logic is executed every hour or so, and I utilize Rickkas PublishAsyncQueuePosix to store events during offline measuring and I loop while connected. However, it seems like the connection state never reaches ‘connected’. It simply fails to initialize modem, powers it down and retries over and over without any luck.

0107049906 [ncp.client] ERROR: Failed to perform early initialization
0107049906 [ncp.client] TRACE: Hard resetting the modem
0107050306 [ncp.client] TRACE: Waiting the modem to restart.
0107061181 [ncp.client] TRACE: Successfully reset the modem.
0107061182 [net.pppncp] TRACE: NCP event 3
0107061182 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_POWERING_DOWN
0107061182 [system.nm] TRACE: Interface 4 power state changed: 3
0107066182 [ncp.client] TRACE: Powering modem off
0107087752 [net.pppncp] TRACE: NCP event 3
0107087753 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_DOWN
0107087753 [system.nm] TRACE: Interface 4 power state changed: 1
0107087754 [ncp.client] TRACE: Modem powered off
0107087754 [net.pppncp] ERROR: Failed to initialize cellular NCP client: -210
0107087854 [ncp.client] TRACE: Powering modem on, ncpId: 0x62
0107087855 [net.pppncp] TRACE: NCP event 3
0107087855 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_POWERING_UP
0107087855 [system.nm] TRACE: Interface 4 power state changed: 4
0107098960 [net.pppncp] TRACE: NCP event 3
0107098961 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_UP
0107098961 [system.nm] TRACE: Interface 4 power state changed: 2
0107098962 [ncp.client] TRACE: Modem powered on
0107099962 [ncp.at] TRACE: > AT
0107099965 [ncp.at] TRACE: < OK
0107099965 [ncp.client] TRACE: NCP ready to accept AT commands
0107099966 [ncp.at] TRACE: > AT+CFUN=1,0
0107099969 [ncp.at] TRACE: < OK
0107099970 [ncp.at] TRACE: > AT+IFC=2,2
0107099973 [ncp.at] TRACE: < OK
0107099973 [ncp.at] TRACE: > AT
0107099974 [ncp.at] TRACE: < OK
0107099975 [ncp.at] TRACE: > AT+IPR=460800
0107099978 [ncp.at] TRACE: < OK
0107100979 [ncp.at] TRACE: > AT
0107100981 [ncp.at] TRACE: < OK
0107100981 [ncp.at] TRACE: > AT+CPIN?
0107100983 [ncp.at] TRACE: < +CPIN: READY
0107100983 [ncp.at] TRACE: < OK
0107100984 [ncp.at] TRACE: > AT+CCID
0107100985 [ncp.at] TRACE: < +CCID: 89314404000821576023
0107100985 [ncp.at] TRACE: < OK
0107100985 [ncp.at] TRACE: > AT+QDSIM=0
0107100986 [ncp.at] TRACE: < OK
0107100987 [ncp.at] TRACE: > AT+CMUX=0,0,7,1509,,,,,
0107100988 [ncp.at] TRACE: < OK
0107100988 [mux] INFO: Starting GSM07.10 muxer
0107100989 [ncp.client] ERROR: Failed to perform early initialization
0107100989 [ncp.client] TRACE: Hard resetting the modem
0107101389 [ncp.client] TRACE: Waiting the modem to restart.
0107112319 [ncp.client] TRACE: Successfully reset the modem.
0107112320 [net.pppncp] TRACE: NCP event 3
0107112320 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_POWERING_DOWN
0107112320 [system.nm] TRACE: Interface 4 power state changed: 3
0107117320 [ncp.client] TRACE: Powering modem off
0107139515 [net.pppncp] TRACE: NCP event 3
0107139516 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_DOWN
0107139516 [system.nm] TRACE: Interface 4 power state changed: 1
0107139516 [ncp.client] TRACE: Modem powered off
0107139517 [net.pppncp] ERROR: Failed to initialize cellular NCP client: -210
0107139617 [ncp.client] TRACE: Powering modem on, ncpId: 0x62
0107139618 [net.pppncp] TRACE: NCP event 3
0107139618 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_POWERING_UP
0107139618 [system.nm] TRACE: Interface 4 power state changed: 4
0107150718 [net.pppncp] TRACE: NCP event 3
0107150719 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_UP
0107150719 [system.nm] TRACE: Interface 4 power state changed: 2
0107150720 [ncp.client] TRACE: Modem powered on
0107151720 [ncp.at] TRACE: > AT
0107151723 [ncp.at] TRACE: < OK
0107151723 [ncp.client] TRACE: NCP ready to accept AT commands
0107151724 [ncp.at] TRACE: > AT+CFUN=1,0
0107151728 [ncp.at] TRACE: < OK
0107151728 [ncp.at] TRACE: > AT+IFC=2,2
0107151731 [ncp.at] TRACE: < OK
0107151731 [ncp.at] TRACE: > AT
0107151733 [ncp.at] TRACE: < OK
0107151733 [ncp.at] TRACE: > AT+IPR=460800
0107151736 [ncp.at] TRACE: < OK
0107152737 [ncp.at] TRACE: > AT
0107152739 [ncp.at] TRACE: < OK
0107152740 [ncp.at] TRACE: > AT+CPIN?
0107152742 [ncp.at] TRACE: < +CPIN: READY
0107152742 [ncp.at] TRACE: < OK
0107152743 [ncp.at] TRACE: > AT+CCID
0107152744 [ncp.at] TRACE: < +CCID: 89314404000821576023
0107152744 [ncp.at] TRACE: < OK
0107152745 [ncp.at] TRACE: > AT+QDSIM=0
0107152746 [ncp.at] TRACE: < OK
0107152746 [ncp.at] TRACE: > AT+CMUX=0,0,7,1509,,,,,
0107152747 [ncp.at] TRACE: < OK
0107152748 [mux] INFO: Starting GSM07.10 muxer
0107152748 [ncp.client] ERROR: Failed to perform early initialization
0107152748 [ncp.client] TRACE: Hard resetting the modem
0107153148 [ncp.client] TRACE: Waiting the modem to restart.
0107163993 [ncp.client] TRACE: Successfully reset the modem.

My disconnect logic before sleep is as follows:

void Namespace::disconnect()
{
   // Disconnect from cloud
   Particle.disconnect();

   // Turn cellular module off
   Cellular.off();
}

and connection logic is:

bool Namespace::connect(int timeout)
{
   // Disable System OTA updates whilst connecting and transmitting
   System.disableUpdates();

   // Check whether we're connected, if not -> connect!
   if (Particle.connected() == false)
   {
      // Turn on cellular module
      Cellular.on();
      delay(500);

      // Connect GSM
      Cellular.connect();
      waitFor(Cellular.ready, timeout);

      // Connect to Cloud
      Particle.connect();
      waitFor(Particle.connected, timeout);
   }
   if (Particle.connected)
   {
      return true;
   }
   return false;
}

Edit: Not sure if I should check for Cellular.isOn after calling Cellular.on - I guess that would be good practise. Reference says I shouldn’t worry about the Cellular but simply call Particle.connect (On Gen 3 devices (Argon, Boron, B Series SoM, and Tracker), prior to Device OS 2.0.0, you needed to call WiFi.on() or Cellular.on() before calling Particle.connect() . This is not necessary on Gen 2 devices (any Device OS version) or with 2.0.0 and later.)

Two things that I immediately noticed (there could be others);

Don’t call System.disableUpdates() before connecting to the cloud. It publishes an event to the cloud, and won’t do anything before you’re connected.

This is the real problem:

 if (Particle.connected)

It should be:

 if (Particle.connected())
1 Like

Thank you @rickkas7! I’ve updated the connected field to the connected() function, will update when I’ve tested a couple of units… Also, I see that Cellular.on and Cellular.ready lacked the parathensis of a function, that is also fixed now. Seems like timing was lucky enough for this to work for some time.

In relation to the disableUpdates(), if I execute the function after Particle.connected returns true, isn’t there a slight chance of a race condition between the cloud wanted to do an OTA and the disableUpdates going through?

The amount of time between connected and checking for updates is at least 8 seconds, so you have a pretty big window to disable updates.

1 Like

Alright, now I see why I lacked the paranthesis.

In the Particle documentation for waitFor, we have the following example:

    if (WiFi.setCredentials(credentials)) {
        WiFi.connect();
        waitFor(WiFi.ready, 30000);
        Particle.connect();
        waitFor(Particle.connected, 30000);
    }

When compiling as a function call, e.g. “waitFor(Cellular.ready(), timeout);”
the compiler returns with the following error:

In lambda function:
../wiring/inc/spark_wiring_system.h:810:81: error: expression cannot be used as a function
  810 | #define waitFor(condition, timeout) System.waitCondition([]{ return (condition)(); }, (timeout))

Correct. waitFor takes a function pointer. if takes the result of calling a function.

I see the problem now, thanks! Which essentially leaves me no better than before, as the returned booleaned is not used in the main function, so whether it returned true or false does actually not matter (Confusing I know.)

I just noticed that you have a waitFor right before that, so that’s not the issue.

Is there a reason you are managing the modem manually like that? With current versions of Device OS you generally don’t need to do that; System.sleep() will take care of doing that for you.

The only exception is if you conditionally wake from sleep and sometimes connect and sometimes do not connect to the cloud after wake. If you do that, call Particle.disconnect() before calling System.sleep() to prevent automatic reconnection. You do not need to wait for it to complete.

You are right in your assumption. We use SEMI_AUTOMATIC, because we want to measure a sensor every X minute (offline to save power) and transmit it every hour or so. It is rather sleepy devices that we want to conserve as much power as possible during measurements.

I’ll try to see if the newly added waitFor(Cellular.isOn, timeout); helps!

The recommended way to handle conditional connection on wake is:

  • Use SYSTEM_THREAD(ENABLED);
  • Use SYSTEM_MODE(SEMI_AUTOMATIC);
  • Use ULTRA_LOW_POWER or STOP sleep mode (not HIBERNATE)
  • Ideally connect to cellular once after a cold boot, such as by adding a Particle.connect() in setup(). This not required, however if you go immediately to sleep after cold boot the modem will not be in the lowest power state.
  • When you are ready to sleep, call Particle.disconnect() so the modem will not reconnect on wake.
  • Use the modern sleep APIs using SystemSleepConfiguration.
  • On wake, if you do not need to connect to the cloud you should be able to go to sleep very quickly, within a second or two.
  • If you do want to connect to the cloud, call Particle.connect(). You do not need to manually start up cellular.
  • If you are going to publish, make sure you wait until Particle.connected() is true.

There will be an application note on this in a few weeks.

We are doing the following things:

  • Using SYSTEM_THREAD(ENABLED);
  • Using SYSTEM_MODE(SEMI_AUTOMATIC);
  • Using ULTRA_LOW_POWER sleep mode using the modern Sleep APIs
  • We only connect if its' time to transmit measurements (I've just outcommented the cellular function, so we only use Particle.connect now).
  • We disconnect after measurement is (hopefully) transmitted using following function:
   // Disconnect from cloud
   Particle.disconnect();
   
   // Wait for Particle disconnect
   waitFor(Particle.disconnected, 10000);

   // Turn cellular module off
   Cellular.off();
  • When we give PublishQueuePosix::instance().getNumEvents() * 2100 ms connection time to run the following function:
   // Transmit measurements
   if (Particle.connected())
   {
      while (PublishQueuePosix::instance().getNumEvents() && (System.millis() - TIMEOUT_START) <= TIMEOUT_DURATION)
      {
         // Loop publish
         PublishQueuePosix::instance().loop();
         delay(1000);
         // Tickle Watchdog
         wd->checkin();
      }
      PublishQueuePosix::instance().publish("h", getHealthStatus(), WITH_ACK); // Posix Publish instead
      PublishQueuePosix::instance().loop();
      delay(1100);
   }

This is a class function, as we cannot call PublishQueuePosix::instance().loop(); from the main loop often enough (only once when it transmits data)

A single call to PublishQueuePosix::instance().loop() is probably not enough to successfully publish. In version 0.0.3 of the library, there is a new function getCanSleep() that will be helpful.

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.