Handshake, post once, then disconnect/failing

I have a handful of Borons (six) that all have the same problem. When power-cycled, they boot up, handshake, post data once, and then disconnect from the cellular network (the LED switches to blinking green).

They’re all installed in the same rural area, so it might be something environmental, but the fact that they all manage to post on first boot suggests something else might be going on. Does anyone have experience with this sort of issue and/or ideas on troubleshooting?

Thanks,
Aaron

Could be your code blocking.
Hard to say without seeing the code, though.

@ScruffR,

I’m pasting my main code below. I can’t think of what it would be in firmware: I have ~50 other units running identical code that are not experiencing this problem.

#include <ModbusMaster.h>

#include "CloudFunctions.h"
#include "Constants.h"
#include "Json.h"
#include "Modbus.h"
#include "Register.h"

//setup AT&T sim card 
STARTUP(cellular_credentials_set("soracom.io", "sora", "sora", NULL));

ModbusMaster node(METER_ADDRESS);
Modbus* slave;

void setup() {
	//Initialize methods
	Constants::Initialize();
	RegisterList::Initialize();
	CloudFunctions::Initialize();
	Particle.keepAlive(CELL_KEEP_ALIVE);

	//Serial printing for testing purposes
	Serial.begin(SERIAL_PORT);
	Serial.println("Initializing response");

	//initiate slave 
	slave = new Modbus();
	node.begin(BAUD_RATE);
}

// loop() runs over and over again, as quickly as it can execute.
void loop() {
	//reads and adds data from slave to json object
	for(int i = 0; i<NUMBER_OF_REGISTERS; i++){
		int result = slave->readRegisters(node, RegisterList::list[i]->Register_Address, RegisterList::list[i]->Register_Type);
		ReadData data(RegisterList::list[i]->Register_Name, result);
		Json::add(data);
	}

	if(!slave->hasError){
		//Success case!
		Json::serializeSuccess();
		//it's the clean-up song!
		delete(slave);
		slave = new Modbus();
		delay(CYCLE_DELAY);
	}else{
		//Failure case :(
		//Waits 1 second and then retries entire loop again for max 5 times
		slave->addToFailureCount();
		delay(FAILURE_DELAY);
	}
	//Loop starts over totally after max retries
	if(slave->failureCount==FAILURE_LIMIT){
		Json::serializeFailure();
		//don't forget to take out the garbage!
		delete(slave);
		slave = new Modbus();
		delay(CYCLE_DELAY);
	}
}

Where is CELL_KEEP_ALIVE defined?
Try adding this:

#define CELL_KEEP_ALIVE 30

@ScruffR,

Ah, I didn’t include that. CELL_KEEP_ALIVE is defined in Constants.h; I currently have it set to 120.

You can try adding SYSTEM_THREAD(ENABLED) too.
That behaviour could come from some of the included libraries blocking the cloud task.

BTW, I also can’t see the statement that activates the external SIM.

@ScruffR

No external SIM in this case. These are all Borons using the built-in SIM.

EDIT: I see I posted a version where the Soracom SIM line hadn’t been edited out. That line isn’t active in the code I’m actually running.

Some other interesting points:

When the device is online, I cannot ping it, get the status of the RMS meter it is attached to, or do an OTA firmware update. It does, however, post a diagnostic update every minute or two, which seems to show an error code (below).

{"device":{"power":{"battery":{"charge":100,"state":"charging"},"source":"USB host"},"system":{"uptime":41908,"memory":{"total":167952,"used":157888}},"cloud":{"connection":{"status":2,"error":17,"attempts":3,"disconnect":1},"disconnects":83,"publish":{"rate_limited":0},"coap":{"unack":225}}},"service":{"device":{"status":"ok"},"coap":{"round_trip":1213},"cloud":{"uptime":1,"publish":{"sent":0}}}}

In addition to the error code, it shows "battery":{"charge":100,"state":"charging"} even though no battery is connected.
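The "memory" figures in that diagnostic payload are worth a quick calculation: 167952 - 157888 = 10064 bytes free, i.e. roughly 94% of the reported memory in use. The sketch below just checks that arithmetic at compile time; the helper names freeBytes and usedPercent are made up for illustration, and nothing in this thread establishes whether the low headroom is what triggers the disconnects.

```cpp
// Figures copied from the "memory" object in the diagnostic payload above.
constexpr long kTotalBytes = 167952;
constexpr long kUsedBytes = 157888;

// Hypothetical helpers: bytes left and percentage used.
constexpr long freeBytes(long total, long used) { return total - used; }

constexpr double usedPercent(long total, long used) {
    return 100.0 * static_cast<double>(used) / static_cast<double>(total);
}

// Roughly 10 KB free, about 94% of the reported memory in use.
static_assert(freeBytes(kTotalBytes, kUsedBytes) == 10064, "free bytes");
static_assert(usedPercent(kTotalBytes, kUsedBytes) > 93.9 &&
              usedPercent(kTotalBytes, kUsedBytes) < 94.1, "percent used");
```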

Your new and delete calls suggest you are using dynamic memory allocation a lot.
Doing that on embedded systems like these (which have no garbage collection to compact the heap) can fragment the heap, and that can lead to situations where some tasks start misbehaving.
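One way to avoid the repeated allocations is to create the Modbus object once and clear its state in place between cycles. This is only a sketch: it assumes a reset() method that the real Modbus class (in the undisclosed Modbus.h) may not have, and ModbusState and finishCycle() are hypothetical stand-ins that mirror the success/failure logic of the posted loop().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the poster's Modbus class. The point is reset():
// per-cycle state is cleared in place instead of delete + new.
class ModbusState {
public:
    bool hasError = false;
    std::uint8_t failureCount = 0;

    void addToFailureCount() { ++failureCount; }

    // Clears per-cycle state without touching the heap.
    void reset() {
        hasError = false;
        failureCount = 0;
    }
};

// Global object: allocated once for the life of the program, never deleted.
ModbusState slave;

// Mirrors the end of loop(): on success, or once the failure limit is hit,
// the object is reset rather than reallocated.
// Returns true when the cycle finished (success or give-up).
bool finishCycle(ModbusState& s, std::uint8_t failureLimit) {
    assert(failureLimit > 0);
    if (!s.hasError) {
        s.reset();
        return true;
    }
    s.addToFailureCount();
    if (s.failureCount == failureLimit) {
        s.reset();
        return true;
    }
    return false;
}
```

In loop(), the two delete(slave); slave = new Modbus(); pairs would then become a single slave.reset() (under the same assumption), so the heap layout stays fixed after setup().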

Another thing: all the undisclosed library calls that may take longer (e.g. readRegisters()) should be monitored for how long they take. If one of them takes seconds to return, you’ll definitely see a bad impact on the cloud connection.
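A minimal way to do that monitoring is to wrap each suspect call and log whenever it runs past some time budget. This sketch uses std::chrono so it builds on a desktop; on the device you would more likely use millis() and Serial instead. The timedCall() helper and the 500 ms budget in the usage line are assumptions for illustration, not part of the original code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <utility>

// Hypothetical helper: runs fn(), measures its wall-clock duration in
// milliseconds, prints a warning if it exceeded warnMs, and returns the duration.
template <typename Fn>
long timedCall(const char* label, long warnMs, Fn&& fn) {
    assert(label != nullptr && warnMs >= 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    std::forward<Fn>(fn)();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start);
    const long elapsedMs = static_cast<long>(elapsed.count());
    if (elapsedMs > warnMs) {
        std::printf("%s took %ld ms (budget %ld ms)\n", label, elapsedMs, warnMs);
    }
    return elapsedMs;
}
```

Inside the register loop, the read could then be wrapped as timedCall("readRegisters", 500, [&] { result = slave->readRegisters(node, RegisterList::list[i]->Register_Address, RegisterList::list[i]->Register_Type); }); so that any read taking longer than half a second gets reported.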

Hence my previous suggestion to try SYSTEM_THREAD(ENABLED).