Using mesh.publish kills the cloud connection

I've noticed something odd with the Xenon. A simple program that uses Particle.Publish is very reliable, but add a Mesh.publish and it quickly loses it's connection to the cloud. The Particle.Publish (w/ACK) will start returning false after awhile (~2min), though they will actually make it to the cloud for another few minutes before it starts blinking cyan and then nothing works. Oddly the Particle subscriptions, firmware updates, and function calls no longer make it through to the Xenon after that first failure (~2min). Happens on v1.1.0 as well as 1.2.1-rc.2.

Here's a test app to demonstrate the issue, it typically takes 2min for it to start showing up, regardless of publish interval. After you can reproduce the issue, comment out the Mesh.publish and try again and you'll see it remains connected.

SerialLogHandler logHandler(LOG_LEVEL_NONE, { { "app", LOG_LEVEL_TRACE  } });

unsigned long publishFrequencySec = 5;
unsigned long lastPublishTimeMs = 0;

void setup() {
	Particle.subscribe("test", handleMessage, MY_DEVICES); 
}
void loop() {
    if(millis() - lastPublishTimeMs > publishFrequencySec*1000UL) { 
        lastPublishTimeMs=millis();
        //Comment out the following line and the cloud connection remains stable
        Mesh.publish("test", String::format("uptime: %lu", millis()));
    	bool success = Particle.publish("test", String::format("uptime: %lu", millis()), 60, PRIVATE, WITH_ACK);
        if (success) {
            Log.info("Publish success");
        } else {
            Log.error("Publish failed.");
        }
    }
 }
void handleMessage(const char *event, const char *data)
{
	Log.info("Received message - " + String(event) + ": " + String(data));
}

Here's what this looks like:

0000116505 [app] INFO: Publish success
0000116514 [app] INFO: Received message - test: uptime: 116200
0000121658 [app] INFO: Publish success
0000121667 [app] INFO: Received message - test: uptime: 121201
0000126567 [app] INFO: Publish success
0000126576 [app] INFO: Received message - test: uptime: 126202
0000151201 [app] ERROR: Publish failed.
0000171201 [app] ERROR: Publish failed.
0000186792 [app] ERROR: Publish failed.
0000206792 [app] ERROR: Publish failed.

Before boiling this issue down to that test app I tried numerous ways to get the Xenon to reconnect with no success. In v1.1.0 it still thought it was connected the whole time but with the latest 1.2.1-rc.2 firmware it can tell when it disconnects but can't fix the connection on it's own, no amount of turning on/off mesh/network/cloud has any effect. The only thing that works is a system.reset. A reset of the gateway Argon has no real effect, the Xenon will go offline and reconnect but publishes/functions/firmware updates will keep failing.

Have you tried this with all logging on?

I ran in to this issue: Mesh.publish() alloc errors and function/variable issues a while back. I haven’t had a chance to re-test it on newer versions though. I’m wondering if you’ll see similar logging.

Oh, you’re right. I don’t recall seeing this error before with my full program but it’s pretty obvious with TRACE level on my test app.

0000111324 [app] INFO: Publish success
0000111329 [app] INFO: Received message - test: uptime: 111129
0000111331 [comm.protocol] INFO: rcv'd message type=8
0000116527 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0000116528 [comm.protocol] INFO: message id 112 complete with code 0.00
0000116529 [comm.protocol] INFO: rcv'd message type=13
0000116529 [app] INFO: Publish success
0000116536 [app] INFO: Received message - test: uptime: 116130
0000116536 [comm.protocol] INFO: rcv'd message type=8
0000121427 [net.th] TRACE: input failed to alloc
0000121433 [net.th] TRACE: input failed to alloc
0000126379 [net.th] TRACE: input failed to alloc
0000126386 [net.th] TRACE: input failed to alloc
0000134958 [net.th] TRACE: input failed to alloc
0000136184 [net.th] TRACE: input failed to alloc
0000137784 [net.th] TRACE: input failed to alloc
0000138712 [net.th] TRACE: input failed to alloc
0000140513 [net.th] TRACE: input failed to alloc
0000141129 [app] ERROR: Publish failed.
0000141434 [net.th] TRACE: input failed to alloc
0000141441 [net.th] TRACE: input failed to alloc
0000146695 [net.th] TRACE: input failed to alloc
0000146947 [net.th] TRACE: input failed to alloc
0000147249 [net.th] TRACE: input failed to alloc
0000148912 [net.th] TRACE: input failed to alloc
0000151250 [net.th] TRACE: input failed to alloc
0000154652 [net.th] TRACE: input failed to alloc
0000157404 [net.th] TRACE: input failed to alloc
0000158906 [net.th] TRACE: input failed to alloc
0000160155 [net.th] TRACE: input failed to alloc
0000161129 [app] ERROR: Publish failed.
0000161409 [net.th] TRACE: input failed to alloc
0000161415 [net.th] TRACE: input failed to alloc
0000164952 [net.th] TRACE: input failed to alloc
0000166608 [net.th] TRACE: input failed to alloc
0000167160 [net.th] TRACE: input failed to alloc
0000169380 [net.th] TRACE: input failed to alloc
0000170506 [net.th] TRACE: input failed to alloc
0000174273 [net.th] TRACE: input failed to alloc
0000176536 [comm.protocol] ERROR: Event loop error 1
0000176536 [system] WARN: Communication loop error, closing cloud socket
0000176537 [system] INFO: Cloud: disconnecting
0000176537 [system] INFO: Cloud: disconnected
0000176538 [system] INFO: Cloud: connecting
0000176539 [system] INFO: Read Server Address = type:1,domain:$id.udp-mesh.particle.io
0000176540 [system] INFO: Loaded cloud server address and port from session data
0000176541 [system] TRACE: Address type: 1
0000176542 [system] TRACE: Cloud socket=0, family=10, type=2, protocol=17
0000176543 [system] INFO: Cloud socket=0, connecting to 64:FF9B::36A6:B5BF#5684
0000176543 [system] TRACE: Cloud socket=0, connected to 64:FF9B::36A6:B5BF#5684
0000176544 [system] TRACE: Updating cloud keepalive for AF_INET6: 30000 -> 30000
0000176545 [system] TRACE: Applying new keepalive interval now
0000176546 [system] INFO: Cloud socket connected
0000176546 [system] INFO: Starting handshake: presense_announce=0
0000176547 [comm.protocol.handshake] INFO: Establish secure connection
0000176737 [comm.dtls] WARN: session has 0 uses
0000176743 [comm.dtls] INFO: (CMPL,RENEG,NO_SESS,ERR) restoreStatus=0
0000176743 [comm.dtls] INFO: out_ctr 0,1,0,0,0,0,0,239, next_coap_id=74
0000176744 [comm.dtls] INFO: app state crc: cached: 8cd594ee, actual: 8cd594ee
0000176745 [comm.dtls] WARN: skipping hello message
0000176745 [comm.dtls] INFO: restored session from persisted session data. next_msg_id=116
0000176746 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 2
0000176747 [comm.protocol.handshake] INFO: resumed session - not sending HELLO message
0000176751 [system] INFO: Send spark/hardware/max_binary event
0000176754 [system] INFO: Send subscriptions
0000176758 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0000176758 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0000176759 [comm] INFO: Sending TIME request
0000176760 [system] INFO: Cloud connected
0000176761 [app] ERROR: Publish failed.
0000176868 [net.th] TRACE: input failed to alloc
0000176875 [net.th] TRACE: input failed to alloc
0000176920 [net.th] TRACE: input failed to alloc
0000176928 [net.th] TRACE: input failed to alloc
0000176934 [net.th] TRACE: input failed to alloc
0000176940 [net.th] TRACE: input failed to alloc
0000176947 [net.th] TRACE: input failed to alloc
0000176953 [net.th] TRACE: input failed to alloc
0000176975 [net.th] TRACE: input failed to alloc
0000176981 [net.th] TRACE: input failed to alloc
0000178578 [net.th] TRACE: input failed to alloc
1 Like

Well I’m both happy and sad to see you’re seeing the same issue as I was. Where there’s two now I’ve sent a message to support to see if they can look.

@Fragma can you still reproduce this? If so I might ask support to email you rather than me, as I’m not having much luck in reproducing it and they’d like to check some of their logs from when it’s happening.

Yes, though it takes twice as long now, failing around 240sec, reproduced 3x. Go ahead and send them my way :wink:

1 Like

That “input failed to alloc” is tied to the following line indicating a failure to allocate a packet buffer while processing input on the OpenThread mesh.

For the Xenon any Particle.publish() has to travel across the mesh and out through a gateway so a mesh layer failure would also impact Particle.publish().

Needs confirmation, but it looks like you have a pretty minimal test case that showing possible leak of packet buffers impacting mesh/cloud connectivity on the Xenon. Thanks for the report and I’ll make sure it gets to the right place.

Thanks for reporting this! I’ve managed to reproduce the issue. Please confirm whether you have other nodes within your mesh network doing mesh publishes. If so, a temporary workaround until we include the fix in one of the next releases is to add a dummy Mesh.subscribe() to any event. Without a single subscription we are apparently not reading the data out of the mesh pub/sub socket.

I’ve also created an issue on GitHub, so you may follow it for an update on when it gets fixed: https://github.com/particle-iot/device-os/issues/1828

4 Likes

Thanks, I’ve confirmed adding a subscribe to the test program resolves my issue. I do have other devices on the mesh network and they already had a mesh subscription.

I have consistently seen the same results as originally described by Fragma using a number of Xenons with an Argon gateway.

I am using the recently updated Ubidots library to send data to Ubidots (ubidots.setCloudProtocol(UBI_TCP):wink: through the Argon gateway (ubidots.meshLoop():wink: every 20 minutes:

ubidots.add("pulseSum", pulseSum); 
ubidots.meshPublishToUbidots();
delay(5000); 

This does continue to work even after the Particle cloud connection is lost, and as Fragma describes, only a reset of the Xenon will resolve the issue temporarily.

I have tried the suggestion of a dummy Mesh.subscribe() without success:
Mesh.publish(“testVar”); on the Xenon, with Mesh.subscribe(“testVar”, myHandler); running the Argon.

I am grateful for any suggestions.

Thank you!

The workaround was to add the susbcribe on the Xenon. No change is needed on the Argon.

Oh! Thank you very much, I will try this approach today. Thanks for taking time to help!

There has been some improvement with the addition of pub-sub statements on the Xenon; however, the 4 Xenon’s I have running this code periodically start blinking madly and go off-line for a time, but are able to reconnect to the cloud. The Particle.publish statements did publish initially but no longer, and variables are not visible on the console. My simple code to collect data from a particulate matter sensor below. Thanks very much for your time and help!

Ubidots ubidots(TOKEN, UBI_INDUSTRIAL, UBI_MESH);

int once = 1;
int sendTime = 0;

int testVar = 0; //dummy variable for mesh pub-subcribe

int pulseCount = 0; // pulseCount is incremented with each interupt on D3.
int pulseSum = 0; // pulseSum is the # of pulses counted during the 20 second measurement period.
int measureTime = 0; // measureTime is the 20s measurement period specified in the data sheet.

void setup() 
{
    
    Mesh.subscribe("testVar", myHandler); //subscribe statement to dummy variable.
    
    pinMode(D3, INPUT);
    attachInterrupt(D3, count, RISING); // minimum pulse with is 0.5ms.

    Particle.variable("pulseSum", &pulseSum, INT);

} // End set-up

void loop() 
{
    
    if (measureTime == 20) {
        
        pulseSum = pulseCount; 
        
        measureTime = 0;
        pulseCount = 0;
        
    }
    
    delay(1000);
    
    measureTime ++; //inrement +1 each loop and count to 20 seconds.

    //20 minute sample and upload cycle; changed to shift+1 one minute 7/21/2019

    sendTime = Time.minute();
  
  if ((((sendTime == 21) || (sendTime == 41)) || (sendTime == 1)) && (once == 1)) {
      
      //will equal 20 until Time increments to 21
      
      transmitdata();
      Mesh.publish("testVar", String(testVar)); //mesh publish dummy variable to be received by subscribe statement.
      delay(5000); //added delay as test 7/24/2019
      
      Particle.publish("transmitdata ran", String(sendTime), 60, PRIVATE);
      
      once = 0;
      
    } //end upload 3 times per hour
   
   if (((sendTime == 22) || (sendTime == 42)) || (sendTime == 2)) once = 1;

}//End main loop

void myHandler(const char *event, const char *data) // dummy handler function for mesh subscribe statement
{
    
  testVar = atof(data);  
  Particle.publish("mesh pub-subscribe event", String(testVar), 60, PRIVATE); 
  
}

// Interupt service routine.

void count()
{ 
    
    pulseCount = pulseCount + 1;
    
} //End count function.

void transmitdata()
{
    
   //send data through gateway via mesh/tcp to ubidots
    ubidots.add("pulseSum", pulseSum); //type, format ??? Not a float value? Could this be an issue?
    ubidots.meshPublishToUbidots();
    delay(5000); 
    
}//end transmitdata function