Cyan flashing and other issues

Hi,

OK so following on from some earlier reports of various issues using 0.4.7…

  1. I reported constant flashing Cyan. Various people have expressed concern that this is caused by memory leaks etc. Well - I ‘suspect’ that these leaks might actually be in the Photon code :-O. I added some code to display the state of the main SP and the ‘Free’ memory. When problems arise, I suddenly LOSE chunks of memory, in large fragments of around TCP block size (1400-1500 bytes or so !!). We can simulate the errors by pulling out the ethernet from the WAN side of the wireless router, so WiFi is still up but there is no cloud. This has immediately caused Cyan flashing on our code…

To work around this we are now running in ‘SYSTEM_MODE(MANUAL)’ with ‘SYSTEM_THREAD(ENABLED)’, and calling Particle.process() every 10 seconds or so. The system is now FAR more stable - but no cigar just yet !! Our code continues to run even when the connection is lost !! = GREAT - reporting SP and Free every 10 seconds…

I really think that the underlying Photon code has issues when packet transmission/reception fails…

  2. Completely different problem (but likely related). After a firmware build and update, the Photon regularly FAILS to reboot. The new code gets installed somehow, and runs after pressing reset. BUT the free mem seems to drop by a large chunk - suggesting again to me that a packet has been lost, which is why it failed to reboot, and also failed to ‘clean up’ the lost packet :-O.

  3. A completely different ‘issue’, which is more of a question. I have some bit-banging code interfacing to a number of 18B20 sensors (1-Wire) using my own code. These often report a CRC failure - and debugging often shows a single misread bit. I use delayMicroseconds(), but can this still be used safely in MANUAL mode and if I call noInterrupts()? I.e. is the microseconds delay using interrupts or just a hardware timer ??? I have some short delays (6 µs) where I disable interrupts, but the longer ones (500 µs) I run with interrupts enabled - which is where I suspect the bit errors creep in. So how long is it safe to disable interrupts for ??? NB I am not (yet) using any other ports for reception - just for transmission, and NOT when reading temperatures…
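For reference, the check that flags those misread bits is the Dallas/Maxim 1-Wire CRC-8 (polynomial x^8 + x^5 + x^4 + 1, bits taken LSB first). Below is a minimal sketch that compiles off-device; the function name `crc8Maxim` is made up for illustration and is not the bit-banging code described above.

```cpp
#include <stddef.h>
#include <stdint.h>

// Dallas/Maxim 1-Wire CRC-8: polynomial x^8 + x^5 + x^4 + 1,
// processed LSB first (reflected polynomial 0x8C), initial value 0.
// crc8Maxim is a hypothetical helper name, not part of any Particle API.
inline uint8_t crc8Maxim(const uint8_t* data, size_t len) {
    uint8_t crc = 0;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x01) {
                crc = static_cast<uint8_t>((crc >> 1) ^ 0x8C);
            } else {
                crc = static_cast<uint8_t>(crc >> 1);
            }
        }
    }
    return crc;
}
```

Running it over the whole DS18B20 scratchpad (8 data bytes plus the trailing CRC byte) gives 0 for an intact read, so any single flipped bit shows up as a nonzero result.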

Hope this feedback is useful - and that if confirmed, any memory leaks which may exist in the OS code can be found and fixed soon :-O.

Many Thanks

Best regards

Graham
NB Please don’t expect an immediate reply from me - as I am now travelling for a few days…with no access to this email :wink:

Is my ‘Problem 1’ here the same thing you are seeing in this issue perhaps?

Hi,

It certainly looks similar to problems I have seen. We managed to make it happen by pulling the ethernet out of the router - so although the WiFi remained operational, Internet connectivity was lost.

In my view this represents what will most likely happen with many users networks (especially domestic in the UK ;-)).

In fact we had another (different) anomaly last night - which may or may not be related… We deployed a unit onto a client’s site for trials, and it stopped sending UDP packets at 00:46 this morning, although Particle.publish() packets were still getting through - they may have dropped out for a short while though…

It’s a pity that you are using 0.4.8 RC - as I WAS rather hoping that this might have got fixed in 0.4.8 :-(( - I guess not… I won’t be looking at 0.4.8 until it’s ‘released’ - if it ever will be ??? I am told it’s possibly an internal interim release anyway and we are awaiting 0.4.9 ???

NB I am not running as much debugging as you patently have the ability to ;-)). All I see is FreeMem decreasing. In fact so much so that I now have some code which says if this ever drops below 0x4000 - do a System.reset() - as there is still no working ‘watchdog’ ???.

BR

Graham

We fixed a hanging issue that went into 0.4.8-rc1 when calling UDP::stop() so we took the release to mitigate that issue as we were seeing it a lot.

I am just finishing scripting the AP to go up and down so I am going to monitor memory here and see what happens as I keep having disconnects.

My issues are potentially related too - I’m wondering does it get to a state with no free memory to do what it wants (my problem 1) and sometimes it actually runs out of memory (my problem 2).

Will continue investigating - the lads seem to have noticed it anyway so maybe something will come from all the chat.

In regards to your watchdog comment - I was looking into doing my own watchdog with the InterruptTimers and then I came across this which someone has put together, doing just that.

Going to try it next week - maybe worth a look?

Hmmmm,

Looks interesting - so I grabbed a copy and the SparkIntervalTimer stuff, but as I am not a developer who rebuilds the whole OS - it won’t build. I simply use the CLI, added the cpp and h files, and it’s missing references to WWDG_xxxx, and I have no idea where I find these - or add them :-O. Tried just adding a ref to stm32f2xx_wwdg.h, but that’s obviously wrong as it can’t be found :-O.

Hopefully they will get back to me soon…

If I can get it built I will let you know how I get on with it…

BR

Graham

@GrahamS, if you are downloading a lib that’s meant for Particle Build Web IDE to use it with CLI, you need to correct the #include "<libName>/<libName>.h" statements into #include "<libName>.h" since CLI (and Dev) flatten the directory structure.

I’ve done this for the WDGS demo here

ScruffR,

Thanks - yet again - for your response. So I copied your files into a new folder, and built them just fine.

I reviewed your code, and found that it was pretty much identical to mine - I had already removed the relevant folders from the paths.

So I built my project again and it built perfectly this time :-O. Yesterday it complained about not being able to find the various WWDG_ and IWDG_ functions (6 in total) referred to in photon-wdgs.cpp.

The only difference is that I restarted the Atom editor/compiler :-O. Sorry my mistake when I referred to CLI - I use Atom for most development - then a final build directly from the CLI (as that gives me warnings and not just errors ;-)).

Bizarre - but hey it now builds at least ;-)).

Thanks again

BR

Graham

Hi,

Above, @MHazly mentioned a UDP::stop() issue. Is this fixed yet ???

I mentioned above that we are also seeing UDP stoppages when a Photon is deployed onto a user’s site.

Well… I just managed to simulate this here - again by pulling the network cable from the Photon’s router (I use a different router for the Photon for just this reason).

Anyway my app sends UDP diagnostic packets every 10 seconds, and stops sending quite often. The rest of the app continues to run so we are OK for the time being !!!.

I just added some better UDP code to print the result of a Send if it’s less than 0, and it now shows -26 (again) after pulling the ethernet, and it STAYS like that !!

So I started to trawl and found some references to Stop hanging (worrying). I was trying to figure on the best way to recover when we lose WiFi. There only seem to be ‘Begin’ and ‘Stop’, so if Stop is going to hang after a loss of WiFi, there seems no way to recover :open_mouth: - without a reset !!.

I will ‘try’ adding a stop if we get an error back and see if that works.

What we really do need is a way of testing UDP to see if it has a valid socket - so that we KNOW we have to clean up maybe ???. OR Stop should automatically clean up resources. I see some stuff about this but don’t know the underlying code well enough to understand these posts adequately :-O.

There is so little documentation on this that it seems difficult to know what the expected use should be :-O.

BR

Graham

I think I read a similar thread a while ago where @mdma suggested as workaround to put the device into Listening mode (WiFi.listen()) to clean up the sockets and "immediately" after that finish listening via WiFi.listen(false).
And he'll be looking into a better solution.

I only hope I understood this right tho' :blush:

-26 is the WICED return value for invalid socket.
You have to re-open (UDP.stop and UDP.begin) the socket to be able to send messages again.
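
A minimal sketch of that re-open step, written as a template so it compiles off-device. The helper name `sendWithReopen` and the reduced `Socket` shape (`begin(port)`, `stop()`, `sendPacket(buf, len)` returning a negative value on error) are assumptions for illustration; Particle's real `UDP::sendPacket` also takes the remote IP and port, and the sketch assumes `stop()` returns on the firmware in use.

```cpp
#include <stddef.h>
#include <stdint.h>

// Sends one datagram; if the socket reports an error (negative return,
// e.g. -26), closes and re-opens it once and retries.
// Socket is any type offering begin(port), stop() and sendPacket(buf, len);
// that reduced interface is an assumption so the sketch runs off-device.
template <typename Socket>
int sendWithReopen(Socket& sock, uint16_t localPort,
                   const uint8_t* buf, size_t len) {
    int rc = sock.sendPacket(buf, len);
    if (rc >= 0) {
        return rc;                      // sent fine, nothing to recover
    }
    sock.stop();                        // release the invalid socket
    sock.begin(localPort);              // open a fresh one on the same port
    return sock.sendPacket(buf, len);   // single retry; caller sees the result
}
```

Retrying only once keeps the call bounded; if the retry also fails the caller still gets the negative code and can decide what to do next.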
I don’t recall what the UDP.stop issue was, but the fix was merged in the dev branch and seems to have improved stability.

UDP long term stability is less than stellar. I’ve resorted to counting errors on the UDP return values and soft-restarting the photon when the error rate gets too high.
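
One way the error counting could look, sketched as a small class that can be tested off-device. The name `UdpErrorMonitor`, the window size and the error limit are all invented for illustration and are not values from this thread; on a Photon the caller would trigger `System.reset()` once `shouldRestart()` returns true.

```cpp
#include <stdint.h>

// Counts failed sends within a fixed window of attempts and latches a
// "restart" flag once the failures in one window reach the limit.
// Hypothetical helper; window and limit values are up to the caller.
class UdpErrorMonitor {
public:
    UdpErrorMonitor(uint16_t windowSize, uint16_t maxErrors)
        : windowSize_(windowSize), maxErrors_(maxErrors),
          samples_(0), errors_(0), tripped_(false) {}

    // Record the return value of one send; negative means failure.
    void record(int sendResult) {
        if (sendResult < 0) {
            ++errors_;
        }
        ++samples_;
        if (errors_ >= maxErrors_) {
            tripped_ = true;            // stays set until the device restarts
        }
        if (samples_ >= windowSize_) {  // window complete: start counting afresh
            samples_ = 0;
            errors_ = 0;
        }
    }

    bool shouldRestart() const { return tripped_; }

private:
    uint16_t windowSize_;
    uint16_t maxErrors_;
    uint16_t samples_;
    uint16_t errors_;
    bool tripped_;
};
```

Counting per window rather than forever means an occasional failed send does not eventually add up to a restart.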

Do you know if these errors happen for any particular reason, such as WiFi connection being dropped?

When the WiFi connection drops, then any networking connections are also dropped since the network interface they were bound to is no more. This is quite typical for most devices - e.g. take your windows or mac pc, disconnect wifi and any established sockets will be closed. The program has to recreate the socket.

Correspondingly, application code can re-create their own sockets if the socket reports an error, which is one way to resolve this.

However, I feel we can do better. The system could also remember the target IP/port of the socket and recreate the socket as needed when the network interface goes down and comes up again. I’m kind of thinking aloud here, so thoughts on that welcomed. :smile:
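
The same idea can be tried at application level. This is a rough sketch of a wrapper that remembers the local port and recreates the socket on the first poll after the link comes back; `ReconnectingSocket`, its `poll(linkUp)` method and the reduced `Socket` interface are hypothetical, and it says nothing about how the system firmware would do this internally. On a Photon the `linkUp` flag might come from `WiFi.ready()`.

```cpp
#include <stdint.h>

// Recreates a socket when the network link goes from down to up.
// Socket is any type offering begin(port) and stop(); that reduced
// interface is an assumption so the sketch compiles off-device.
template <typename Socket>
class ReconnectingSocket {
public:
    ReconnectingSocket(Socket& sock, uint16_t localPort)
        : sock_(sock), localPort_(localPort), wasUp_(false) {}

    // Call regularly with the current link state.
    // Returns true when the socket was (re)created during this call.
    bool poll(bool linkUp) {
        bool recreated = false;
        if (linkUp && !wasUp_) {
            sock_.stop();               // drop the stale socket first
                                        // (assumes stop() returns on this firmware)
            sock_.begin(localPort_);    // reopen on the remembered port
            recreated = true;
        }
        wasUp_ = linkUp;
        return recreated;
    }

private:
    Socket& sock_;
    uint16_t localPort_;
    bool wasUp_;
};
```

Only the local port is remembered here because that is all `begin()` needs in this reduced interface; a target IP and port would be stored the same way.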

Hi,

As I was the one who started this thread - I will re-iterate that the whole issue stems from losing the cloud. I simulate this by pulling out the ethernet cable that goes into my (local) WiFi router. So the WiFi should still be up (probably router dependent).

I am concerned that calling udp.stop() hangs (at least in 0.4.7) - I had it do just this yesterday. So I don’t understand how we should cleanly shut down the socket to prevent memory leakage ??.

Currently I am re-initialising the udp socket when Particle.connected() says it’s not. I first reconnect, then when it comes back up - I re-init udp WITHOUT calling stop (as that hangs). In Windows this would most probably return an exception with ‘invalid handle’, which could be caught and thus handled.

I DO have a catch-all which says if Free Mem ever reports less than 0x4000 bytes - force a System.reset(). Sledgehammer, but just in case, this ‘should’ stop the system crashing or hanging due to low memory ???
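
The catch-all amounts to a single threshold test. Here is a sketch with the decision split out so it can be checked off-device; `memoryTooLow` and `kMinFreeBytes` are illustrative names, not the actual code.

```cpp
#include <stdint.h>

// Threshold from the post: reset if free memory falls below 0x4000 bytes (16 KiB).
const uint32_t kMinFreeBytes = 0x4000;

// Returns true when the caller should force a reset.
inline bool memoryTooLow(uint32_t freeBytes) {
    return freeBytes < kMinFreeBytes;
}

// On-device use would look roughly like this (Particle firmware only):
//   if (memoryTooLow(System.freeMemory())) { System.reset(); }
```

Keeping the comparison in its own function makes the threshold easy to change and to log alongside the free-memory reading.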

Hope this helps ???

In a different thread - I did ask the question about UDP overhead when calling begin and stop. Basically should we create a new socket for each message to be transmitted (ie begin…do something…stop in a single function), OR should we create a UDP socket and re-use this ??? (which is what I currently do). This is where we run into issues with having no ‘handle’ to refer to though ???.

BR
Graham