Argon Breathing Cyan but not reachable


#1

Good Day,

I have two Argons running the same code. One is on my test bench. The other is mounted inside an outside light to control the light. I have an antenna mounted on the case of the light, connected on the inside to the Argon. The unit is about 150 feet away from an AP that also has a high-gain antenna. Internet signal is great where the light is located.

My test bench Argon runs perfectly.

The one that's in the field powers up when plugged in, connects (breathing cyan), and works perfectly for a little while (5-10 minutes). Then it continues to breathe cyan but is unreachable. Neither the app nor the console can connect, and I cannot read variables or call functions. It does list the variables and functions; I just can't do anything with them. They do show the device as breathing cyan, but I think I read that this is normal for these devices at this point.

So to talk this out… differences between test and field:
Different antennas - Test has supplied antenna and field has this

Distance from AP - Test is 5 feet and field is 150 feet (though other devices such as my phone get a good signal there)

Mounted - Test is on desk on supplied breadboard and field is inside aluminum outside light with antenna mounted outside

Pertinent Code

void setup() {
  
  Time.zone(-6);
  
  WiFi.setCredentials("XXX", "###");

  Particle.function("toggleLight", toggleLight);
  Particle.function("setRise", setRise);
  Particle.function("setSet", setSet);

  Particle.variable("rise", todayRise);
  Particle.variable("set", todaySet);
  Particle.variable("current", currentTime);
  Particle.variable("lighton", lightOn);

  Serial.begin(9600); 
  pinMode(set, OUTPUT);
  digitalWrite(set, HIGH);
  lightOn = true;
  
  while (!WiFi.ready()) {
    WiFi.connect(); 
    // wait 5 seconds for connection: 
    delay(5000); 
  }

}

void loop() {

  unsigned long currentMillis = millis();

  if (lastMillis == 0 || (currentMillis - lastMillis) > delayMillis) {
    lastMillis = currentMillis;

    currentTime = Time.hour();
    Particle.publish("Check Light");
    //getWeather();
    checkToToggleLight();
  }
}

int setRise(String command) {
  todayRise = command.toInt();
  return 1; // cloud functions are declared int, so they must return a value
}

int setSet(String command) {
  todaySet = command.toInt();
  return 1;
}

int toggleLight(String command) {
  if (command == "1") {
    digitalWrite(set, HIGH);
    lightOn = true;
    if (isManual == 2) {
      isManual = 0;
      manualMillis = 0;
    } else {
      isManual = 1;
      manualMillis = millis();
    }
    return 1;
  } else if (command == "0") {
    digitalWrite(set, LOW);
    lightOn = false;
    if (isManual == 1) {
      isManual = 0;
      manualMillis = 0;
    } else {
      isManual = 2;
      manualMillis = millis();
    }
    return 1;
  } else {
    return -1;
  }
}

void checkToToggleLight() {

  
  if (isManual == 2) {
    if ((millis() - manualMillis) > manualDelayMillis) {
      manualMillis = 0;
      if (lightOn == 0) {
        digitalWrite(set, HIGH);
        lightOn = true;
      }
    }
  } else {
    if (!WiFi.ready()) {
      digitalWrite(set, HIGH);
      lightOn = true;
    } else if (currentTime > todayRise && currentTime < todaySet) {
      if (lightOn) {
        //turn off
        digitalWrite(set, LOW);
        lightOn = false;
      }
    } else {
      if (!lightOn) {
        //turn on
        digitalWrite(set, HIGH);
        lightOn = true;
      }
    }
  }
}

I'm sure that there are details that I forgot, so if you're interested in lending a hand, just ask.

Any input would be greatly appreciated.


#2

I built a garage door opener, using essentially the same hardware as you, except I have a reed switch telling me door position. It would work solid for about 2-4 days, then stop.

I have put in a watchdog timer to check if it hangs, then reset the device. Here’s the documentation for it:
https://docs.particle.io/reference/device-os/firmware/photon/#application-watchdog


#3

Thanks. I'll try that.

I guess one should switch the relay to normally closed then, so that if it is in fact resetting every 5 minutes, the light isn't turned on and off with each reset.

I'd still like to figure out what's causing it not to reach the cloud even though it's blinking that it's connected.


#4

Hi,
here is my disappointing experience, which tells me the Argon (at least) is not reliable hardware, at least for now, at least in my project :frowning:

One Argon on my balcony, decent WiFi signal. A couple of Xenons to be placed somewhere around.
A simple test code: a Mesh.publish is sent every 2 seconds from the Argon to switch the blue LED on my Xenons on and off!
Easy, it works fine, cool, let’s go on to the next step… uh-oh, after less than 24 hours the Argon hangs! Again and again. :persevere:
No publishes, and it is no longer pingable on the home network.
It still breathes cyan, but the loop has stopped; even the local on/off switching of the blue LED no longer happens.
And of course the Xenons can’t connect to the mesh anymore.

I have 0.8.0-rc.27 on all devices, which release should I wait for to make this basic step work reliably??
Thanks!

       Andrea

#5

My experience has not been the same although I have had a couple of issues with the Argon on RC27. I have 2 separate Argon-Xenon networks running. The Argon has locked up inexplicably once or twice. I suspect the problem I saw was from a temporary WiFi or internet disruption which the Argon did not recover from. With that said, I have not had that problem consistently and my networks have been running for at least 30 days without an incident.

Post your code for review. As simple as the application might sound, it seems on this forum that about 9 times out of 10 it is the user firmware misbehaving that causes problems. Aside from that, check your WiFi environment and network connections for intermittent issues. Since the Argon's inability to recover from WiFi/internet disruptions is reported a lot on here, I suspect that is already being worked on for the next RC release.


#6

Thanks ninjatill,
I want to believe it might be an issue with the WiFi signal (reported RSSI is usually between -65 and -75).
Or even better a firmware issue!
If it is the first case I can try to keep the Argon closer to WiFi AP, at least for some testing.

Here is the code I use in the Argon in question:

// Publish events (ARGON)

int rssi;

void setup() {
    pinMode(D7, OUTPUT);
}

void loop() {

    rssi = WiFi.RSSI();
    Particle.publish(WiFi.SSID(), String(rssi));

    for (int i=0; i<10; i++) {
        Mesh.publish("event1", "off");
        digitalWrite(D7, LOW);
        delay(2000);
        Mesh.publish("event2", "on");
        digitalWrite(D7, HIGH);
        delay(2000);
    }
}

Anything wrong here?
Thanks.


#7

Yes. Your loop blocks for 40 seconds. Blocking that long causes the Particle cloud connection to drop because you are not servicing the Particle.process() routine. That routine gets serviced automatically between loop()s. You can either adopt a non-blocking approach or add Particle.process() inside your for loop. The best approach is determined by how strict your timing requirements are. The lack of cloud servicing can cause your application to appear to lock up or be unresponsive from the cloud.

Also, your Particle.publish() is not in the correct format; you need to include the PRIVATE or PUBLIC flags. You should get rid of the String() call and rather use snprintf() and a small message buffer. This most likely won’t cause a lockup but is best practice.
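
A minimal sketch of the snprintf() suggestion, written as plain C++ so it can be compiled off-device. The helper name formatPercent and the buffer sizes are assumptions for illustration only; on the device, the filled buffer would then be passed to Particle.publish() together with the PRIVATE flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not part of Device OS): formats a value as a
// whole-number percentage into a caller-supplied buffer.
// Returns true only if the complete text fit into the buffer.
bool formatPercent(char* buf, std::size_t size, float value) {
    assert(buf != nullptr && size > 0);  // caller must supply real storage

    // snprintf never writes more than `size` bytes (terminator included)
    // and returns the length the complete text would have needed.
    int needed = std::snprintf(buf, size, "%.0f%%", static_cast<double>(value));
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

Unlike sprintf(), this cannot overrun the buffer: if the text does not fit, it is truncated and the helper reports false instead of writing past the end.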


#8

Cool!
Made the corrections, let’s see.
I should have asked before! :wink:
Thanks.

PS: I had the wrong impression that a delay() would have given time to service the Particle.process() routine!


#9

While it’s true that multiple delay(2000) calls will considerably degrade the responsiveness of your device, they shouldn’t cause a complete connection loss.
In the early days this was the case, but quite early in the Spark Core days this was dealt with and extra measures have been taken to make delay() call Particle.process() once every accumulated 1000ms waiting time (when not running SYSTEM_THREAD(ENABLED), in that case it’s not required since the application thread just idles).
However that 1sec “deafness” with the occasional microsecond where the device will pay attention can well account for the device appearing unreachable.
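
To make that "deafness" concrete, here is a toy model of the behaviour described above, in plain C++. It is not Device OS source code: the DelayModel type and its members are invented for illustration, and it only counts how often the cloud would be serviced if delay() calls Particle.process() once per accumulated 1000 ms.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not Device OS code) of a delay() that services
// the cloud once for every `serviceEveryMs` of accumulated waiting time.
struct DelayModel {
    uint32_t serviceEveryMs;
    uint32_t accumulatedMs;
    uint32_t serviceCalls;

    explicit DelayModel(uint32_t everyMs)
        : serviceEveryMs(everyMs), accumulatedMs(0), serviceCalls(0) {
        assert(everyMs > 0);  // a zero interval would never drain the accumulator
    }

    // Simulates waiting `ms` milliseconds; no real time passes.
    void delay(uint32_t ms) {
        accumulatedMs += ms;
        while (accumulatedMs >= serviceEveryMs) {
            accumulatedMs -= serviceEveryMs;
            ++serviceCalls;  // stands in for one Particle.process() call
        }
    }
};

// Counts the service calls during one pass of a loop() that calls
// delay(2000) twice per iteration over ten iterations (40 s of blocking).
uint32_t serviceCallsInBlockingLoop() {
    DelayModel model(1000);
    for (int i = 0; i < 10; ++i) {
        model.delay(2000);
        model.delay(2000);
    }
    return model.serviceCalls;
}
```

In this model the 40-second pass through loop() gets 40 service opportunities, roughly one per second, whereas a loop() that returns immediately is serviced on every pass.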


#10

Gentlemen,
thanks for the clarification, as said I corrected the code, now it looks like this:

// Publish events (ARGON)

float rssi;
char buf[6];
WiFiSignal sig;

void setup() {
    pinMode(D7, OUTPUT);
}

void loop() {

    sig = WiFi.RSSI();
    rssi = sig.getStrength();
    sprintf(buf, "%.0f%%", rssi);
    Particle.publish("WiFi SSID", buf, PRIVATE);

    for (int i=0; i<10; i++) {
        Mesh.publish("event1", "off");
        digitalWrite(D7, LOW);
        delay(2000);
        Particle.process();
        Mesh.publish("event2", "on");
        digitalWrite(D7, HIGH);
        delay(2000);
    }
}

Any other suggestions or corrections? Should I use (multiple) smaller delays? Because, needless to say, it still stopped working after less than 24 hours.
One more thing: in this project I don’t mind too much if the cloud is not 100% available, but I can’t afford the mesh network to be unavailable and I don’t want to reboot the Argon every day! :frowning:

Wifi signal is between 50% and 60% now, not that bad I would say.
Thanks for your help.

          Andrea

#11

You didn’t really remove the delays, so the prime factor both of us mentioned is still unchanged.
Looking at your code you should rather turn round your logic.
You want the Mesh.publish() done more frequently than the Particle.publish() so the Mesh.publish() part should be what you do once per iteration of loop() and Particle.publish() you do only every X iterations.

like

SYSTEM_THREAD(ENABLED);

const uint32_t msMeshInterval = 2000;        // Mesh.publish should happen every 2000ms
const uint32_t msParticleInterval = 10 * msMeshInterval;

void setup() {
  pinMode(D7, OUTPUT);
  waitUntil(Mesh.ready);
}

void loop() {
  static bool     meshState = true;          // current state 
  static uint32_t msLastMeshPublish = 0;     // keep track of last action
  static uint32_t msLastParticlePublish = 0; // keep track of last action

  if (millis() - msLastMeshPublish < msMeshInterval 
  && msLastMeshPublish)                      // when time is NOT up but not the first visit 
    return;                                  // bail out immediately
                                             // otherwise proceed
  msLastMeshPublish = millis();              // store timestamp
  meshState = !meshState;                    // toggle current state
  digitalWrite(D7, meshState);               // reflect meshState on D7 LED 
  if (Mesh.ready()) {
    if (meshState) Mesh.publish("event2", "on");
    else           Mesh.publish("event1", "off");
  }

  if ((millis() - msLastParticlePublish >= msParticleInterval 
  || !msLastParticlePublish)                // when it's time to publish or on first visit
  && Particle.connected()) {                // and we have a cloud connection 
    char buf[6];
    snprintf(buf, sizeof(buf), "%.0f%%", WiFi.RSSI().getStrength());
    Particle.publish("WiFi SSID", buf, PRIVATE);
    msLastParticlePublish = millis(); 
  }
}

This code attends to the cloud connection about 1000 times more frequently than yours since it doesn’t use delay but only ever rushes through loop() in one go.

However, currently the Argon has some difficulties coming back online once the WiFi network disappears - that is being worked on.
But if that happens to you every 24 hours it might be that your WiFi has a 24 hour lease time for the IP which may cause the Argon to be kicked off the WiFi causing the trouble.
If so, try to change the lease time on your WiFi AP.
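
The elapsed-time checks in the code above rely on unsigned subtraction, which keeps working when the 32-bit millis() counter wraps around after roughly 49.7 days. The snippet below demonstrates this in plain C++; intervalElapsed and deadlineReachedNaive are hypothetical helper names, and the timestamps are made-up values standing in for millis() readings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when at least `intervalMs` has passed since
// `lastMs`. Unsigned subtraction wraps modulo 2^32, so the result stays
// correct even if the counter overflowed between the two readings.
bool intervalElapsed(uint32_t nowMs, uint32_t lastMs, uint32_t intervalMs) {
    return static_cast<uint32_t>(nowMs - lastMs) >= intervalMs;
}

// A tempting but fragile alternative: comparing against a precomputed
// deadline. It misfires whenever `lastMs + intervalMs` overflows.
bool deadlineReachedNaive(uint32_t nowMs, uint32_t lastMs, uint32_t intervalMs) {
    return nowMs >= static_cast<uint32_t>(lastMs + intervalMs);
}

// Walks through one wrap-around case with made-up counter readings.
void demoWrapAround() {
    const uint32_t last  = 0xFFFFFF00u;  // 256 ms before the counter wraps
    const uint32_t soon  = 0xFFFFFF10u;  // 16 ms after `last`, no wrap yet
    const uint32_t later = 0x00000700u;  // 2048 ms after `last`, counter wrapped

    assert(!intervalElapsed(soon, last, 2000));      // too early, correctly rejected
    assert(intervalElapsed(later, last, 2000));      // interval over despite the wrap
    assert(deadlineReachedNaive(soon, last, 2000));  // naive check fires 1984 ms early
}
```

This is why the pattern `millis() - last >= interval` is generally preferred over storing a future deadline.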


#12

Thanks ScruffR for taking the time to write it all,
I will change the logic then. This is pretty different from mine, but it's fine!

Let’s suppose the issue is coming back online from a WiFi down, is this also known to block the loop execution and mesh function? This is what I experience.

I didn’t measure exactly how long the Argon keeps working, but looking back at some logs I would say it lasts roughly between 8 and 14 hours. I’m not sure the lease time is really involved.

I will let you know (I need to go back home to reboot the Argon to test again…)
Thanks,

         Andrea

#13

Yes, in SYSTEM_MODE(AUTOMATIC) (which is default) that’s the expected behaviour, hence I’ve added SYSTEM_THREAD(ENABLED) to decouple the application thread from the system chores.


#14

Hi again,
I could finally make some more tests on my Argon:

First I made some minor changes to my code, removing the long delay() calls and using more frequent short ones, interleaved with Particle.process(). This was a failure; apparently nothing changed.

Then I tried your suggested code, but this seems to prevent my Xenon(s) from connecting to the mesh!
Also, there must be a minor issue, because the D7 LED is not staying on for 2 seconds and off for 2 seconds, as it did with my code. It is always on and flickers briefly every 2 seconds.

So I have lost faith again!
I should probably wait for a new firmware where the Wi-Fi issue is fixed?
Thanks for your help.

   Andrea

#15

Yes, I had three typos in the code above since I just wrote that code off the top of my head without actually running it.
The code should now do what you’d expect.

However, I couldn’t replicate your issue with the failing mesh connection.
But just for safety I added a waitUntil(Mesh.ready) call in setup().


#16

I think I had only found 2 of the 3 typos…
So now it looks good, let’s see over the next hours!

Next step is also experimenting with external BT antennas, as the range is not sufficient for now.
Thanks

UPDATE: after a weekend's worth of tests I can say the Argon is lasting slightly longer, but it can’t survive for more than 19-20 hours before getting stuck again. I hope to see a new firmware version soon!


#17

One thing you should try is to check to see the last time you were connected to the Particle cloud, and if it is greater than a predefined timeout interval, call a reset on the device. Something like:

// top of program
uint32_t last_time_particle_connected;
const uint32_t particle_connected_timeout = 300000;  // this is in ms

// in setup()
last_time_particle_connected = millis();

// somewhere in loop()
if (Particle.connected()) last_time_particle_connected = millis();     // we are connected, so reset the timer
else if ( (millis() - last_time_particle_connected) > particle_connected_timeout ) {
     // we have been disconnected for too long, so let's reset everything!
     #if Wiring_WiFi
     WiFi.off();
     delay(1000);
     #endif
     System.reset();
}

This just forces a fresh restart, including of the WiFi chip, if you have connectivity issues for over 5 minutes. Something like this is generally good practice, because there is always something that will go wrong in software. Thus, you need to identify your critical success conditions (in this case Particle is connected to the cloud within 5 minutes, every 5 minutes) and create a logical check to see if they have a problem. If they do, either mitigate or simply reset.

Note that there are other additional modem reset steps for Cellular devices that I did not include here. Also note that if you set your timeout too short you may have difficulties during first time connection.
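
The decision logic above can be exercised off-device by separating it from the hardware calls. The sketch below is plain C++ under that assumption: ConnectionWatch and its members are hypothetical names, and the caller is expected to pass in the values of millis() and Particle.connected() and to perform the actual reset (for example System.reset()) when update() returns true.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tracks how long the cloud connection has been down.
class ConnectionWatch {
public:
    ConnectionWatch(uint32_t nowMs, uint32_t timeoutMs)
        : lastConnectedMs_(nowMs), timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);  // a zero timeout would request a reset almost immediately
    }

    // Call once per loop() pass. Returns true when the device has been
    // disconnected for longer than the timeout and should be reset.
    bool update(uint32_t nowMs, bool connected) {
        if (connected) {
            lastConnectedMs_ = nowMs;  // connected, so restart the timer
            return false;
        }
        // Unsigned subtraction keeps this correct across counter wrap-around.
        return static_cast<uint32_t>(nowMs - lastConnectedMs_) > timeoutMs_;
    }

private:
    uint32_t lastConnectedMs_;
    uint32_t timeoutMs_;
};
```

Keeping the timing decision in a small class like this makes it possible to test the 5-minute rule with made-up timestamps before flashing it to a device that takes hours to fail.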