We have deployed temperature sensors, each connected to an Electron. There are ~10 units currently in operation across 3 distinct locations.
On Monday, 8 units that were using the same APN (in 2 locations) suddenly stopped reporting data.
Subsequent investigation indicates that while the Electrons are working fine, the packets are not getting to our server. We have done packet captures at the firewall and have confirmed that it is still working fine. Additionally, we can see that the 8 units in question are still sending data (usage shows in the provider portal).
2 units using a different APN continued to work normally.
After doing a hardware reset of 1 of the devices, it started to work again.
This morning, however (at a similar time), that same device had the same issue. We had changed another device (at a different location) to the same APN and it did not have the issue.
We currently think it is a problem in the provider’s network but we aren’t sure (we have lodged tickets). Has anyone come across this issue?
We are implementing workarounds (such as resetting if we don’t receive communications) but aren’t sure if that is the best response.
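For anyone wanting to try the same workaround, here is a minimal sketch of the "reset if we don’t receive communications" idea. It assumes the server sends some kind of reply the device can see, because a UDP send succeeds locally even when nothing arrives at the other end. `CommsWatchdog` and its method names are hypothetical, not part of any Particle API, and the time is passed in so the logic can be checked off-device.

```cpp
#include <cstdint>

// Hypothetical helper: remembers when the server was last heard from and
// reports when the silence has lasted longer than the configured timeout.
class CommsWatchdog {
public:
    explicit CommsWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call whenever a reply from the server (e.g. an acknowledgement) arrives.
    void noteServerContact(uint32_t nowMs) { lastContactMs_ = nowMs; }

    // True once more than timeoutMs_ has elapsed since the last contact.
    // Unsigned subtraction keeps the result correct when a 32-bit
    // millisecond counter wraps around.
    bool shouldReset(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastContactMs_) > timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastContactMs_ = 0;
};
```

In firmware this would typically be fed from `millis()` in `loop()`, with `System.reset()` called when `shouldReset()` returns true. The timeout should be comfortably longer than the normal reporting interval so a single lost packet does not trigger a reset.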
Just as a side note:
UDP is a protocol that expressly doesn’t guarantee packet delivery, ordering, or duplicate avoidance.
Hence it’s up to the application layer to ensure the delivery, reception and consistency of vital packets.
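To illustrate what "up to the application layer" can look like, here is a rough sketch of send-until-acknowledged bookkeeping. It assumes each packet carries a sequence number and that the server echoes that number back as an acknowledgement; neither is stated in this thread. `AckTracker` is a hypothetical name, and the actual sending and receiving is left to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical bookkeeping for retransmitting unacknowledged UDP packets.
class AckTracker {
public:
    // maxAttempts counts total transmissions, including the first one.
    AckTracker(uint32_t retryIntervalMs, int maxAttempts)
        : retryIntervalMs_(retryIntervalMs), maxAttempts_(maxAttempts) {}

    // Record a packet that has just been sent for the first time.
    void onSent(uint16_t seq, uint32_t nowMs) { pending_[seq] = {nowMs, 1}; }

    // The server echoed this sequence number, so stop retrying it.
    void onAck(uint16_t seq) { pending_.erase(seq); }

    // Returns the sequence numbers the caller should retransmit now.
    // Packets that have used up all attempts are dropped and counted as failed.
    std::vector<uint16_t> dueForRetry(uint32_t nowMs) {
        std::vector<uint16_t> due;
        for (auto it = pending_.begin(); it != pending_.end();) {
            Entry& e = it->second;
            if (static_cast<uint32_t>(nowMs - e.lastSentMs) < retryIntervalMs_) {
                ++it;                      // not yet time to retry
            } else if (e.attempts >= maxAttempts_) {
                ++failed_;                 // give up on this packet
                it = pending_.erase(it);
            } else {
                e.lastSentMs = nowMs;      // caller retransmits it now
                ++e.attempts;
                due.push_back(it->first);
                ++it;
            }
        }
        return due;
    }

    int failedCount() const { return failed_; }
    std::size_t pendingCount() const { return pending_.size(); }

private:
    struct Entry { uint32_t lastSentMs; int attempts; };
    uint32_t retryIntervalMs_;
    int maxAttempts_;
    int failed_ = 0;
    std::map<uint16_t, Entry> pending_;
};
```

A rising `failedCount()` is also a usable signal that the path to the server is broken, as opposed to the occasional lost datagram.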
We are aware that UDP doesn’t guarantee delivery. The problem is that all the devices went off at the same time after they had been working well for a number of days.
I would also ask your network provider if anything changed in their DNS service. I can’t really think of many ways that the Electrons would be sending and yet you receive nothing, but DNS failure is one way.
Maybe you could deploy test firmware with a hardcoded IP address instead?
It turns out that there was some maintenance on the tower nearby which caused the Electrons to stop talking to our server.
We are actually using a hard-coded IP address, so it isn’t a DNS issue. It had to be a routing issue of some sort.
We have had this happen again in 2 different locations. After the above incident, we added the capability for the device to reset itself if it hadn’t contacted the server. That hasn’t worked as expected.
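One option among the "other fixes" would be to escalate instead of going straight to a single reset. The sketch below is only a guess at a policy, not something confirmed to solve this problem: it assumes the device can tell when the server has replied, and the three recovery steps (`ReopenSocket`, `PowerCycleModem`, `FullReset`) are hypothetical labels that the firmware would have to map onto real actions.

```cpp
#include <cstdint>

// Hypothetical recovery steps, ordered from least to most drastic.
enum class Recovery { None = 0, ReopenSocket = 1, PowerCycleModem = 2, FullReset = 3 };

// Hypothetical policy: each further stepMs of silence from the server
// unlocks the next, more drastic, recovery step. Assumes stepMs > 0.
class RecoveryLadder {
public:
    explicit RecoveryLadder(uint32_t stepMs) : stepMs_(stepMs) {}

    // Any reply from the server resets the ladder to the bottom.
    void noteServerContact(uint32_t nowMs) {
        lastContactMs_ = nowMs;
        level_ = 0;
    }

    // Returns the next step to perform, or None if nothing new is due.
    // Each step is returned at most once per period of silence.
    Recovery poll(uint32_t nowMs) {
        uint32_t silentMs = nowMs - lastContactMs_;   // wrap-safe
        uint32_t stepsDue = silentMs / stepMs_;
        if (stepsDue > 3) stepsDue = 3;               // FullReset is the top rung
        if (stepsDue <= level_) return Recovery::None;
        ++level_;
        return static_cast<Recovery>(level_);
    }

private:
    uint32_t stepMs_;
    uint32_t lastContactMs_ = 0;
    uint32_t level_ = 0;
};
```

The reasoning is that a hardware reset did restore communication earlier in this thread, so a step that actually power-cycles the cellular modem may behave differently from a plain software reset.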
We are looking at other fixes to the problem and trying to get a detailed description of the maintenance performed in an attempt to resolve the issue.