Long Range IoT Networks - Chapter 2 | Particle + LoRa Better together

Clocks drift, even good ones. When the Particle time functions came out (obviating the need for my NTP library) I did a bunch of testing. I knew at the time, for instance, that there was about 137 ms of latency between the Particle server handing me a timestamp and my device receiving it.

I also tested drift on Photons and found several seconds per day was the norm on the three or four devices that I tested, but if I re-synced the time using the Particle API every 24 hours or so, it never moved the time by more than about 5 seconds in my testing. I'm sure the Argon/Boron RTC has about the same amount of drift, and at variable outdoor temperatures it could be more.

You can compensate your RTCs over LoRa using the same type of algorithm that is in the NTP standard. There are four numbers, all either returned from the time server or measured on the client:

  1. Originate timestamp–the client’s time when the request for time was sent
  2. Received timestamp–the time the server got the request
  3. Transmitted timestamp–the time server sends the response
  4. Destination timestamp–the client’s time when it gets the response

You can then calculate the round-trip delay as (T4-T1)-(T3-T2), and use ((T2-T1)+(T3-T4))/2 as the offset to correct the local clock. You have to be a little careful to do the subtractions in the right data types–typically uint32. You don’t have to use the full algorithm–you can just set the client time to T3–but the full algorithm works well if you have multiple levels of client/server and different transmission delays built into the system.
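
As a concrete illustration, here is a minimal sketch of that calculation in plain C++, assuming all four timestamps have already been captured as milliseconds since a common epoch (the names are illustrative only):

// Minimal sketch of the NTP-style delay/offset math. Assumes t1..t4 are
// already captured as milliseconds since a common epoch; names are illustrative.
#include <stdint.h>

struct TimeSyncResult {
    int32_t delayMs;   // round-trip delay excluding server processing time
    int32_t offsetMs;  // amount to add to the local clock
};

TimeSyncResult computeNtpStyleCorrection(uint32_t t1, uint32_t t2,
                                         uint32_t t3, uint32_t t4) {
    TimeSyncResult r;
    // Do the subtractions on unsigned 32-bit values first, then treat the
    // (small) differences as signed so counter rollover is handled correctly.
    r.delayMs  = (int32_t)(t4 - t1) - (int32_t)(t3 - t2);
    r.offsetMs = ((int32_t)(t2 - t1) + (int32_t)(t3 - t4)) / 2;
    return r;
}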

4 Likes

Yeah... I'm guilty at times of failing this principle. :slight_smile: This might be one of those times.

I'm currently thinking this as well... I first listen to all LoRa transmissions while the cellular is still "off" and only turn on the cellular when I want to publish. In fact, today I go through several LoRa reporting sessions and then publish at the end. I.e. 10:05 AM - LoRa report only, 10:10 AM - LoRa report only, 10:15 AM - LoRa report only, 10:20 AM - LoRa report; then once done, turn on cellular and publish all data from the prior 4 LoRa sessions. That's what I do today.

My main consideration in all of this is to minimize the "on duration" of a LoRa node that needs to service the mesh, thus reducing the battery and solar requirements of that node while maximizing the number of LoRa nodes that can be adequately serviced in the mesh. I.e. the value of a LoRa node that can service the mesh comes from minimizing the LoRa reporting window.

The Boron would also benefit; however, the actual power budget of the Boron is mostly consumed by cellular. Reducing the LoRa reporting duration wouldn't have a big impact on its overall power budget.

In comparison, the power budget of a LoRa node that services the mesh would almost entirely be made up of the duration of time it needs to service the mesh. Thus the sensitivity, and why I'm trying to think through ways to reduce the impact to the power budget.

1 Like

Yeah... completely agree, and great points. This could be another new thread altogether and is certainly a topic I hope to address in more detail in this thread as this all progresses. Here is what I am currently attempting: basically, when you power up, transmit a Join Request message once. If you get a response, great, move on. If not, enter sleep mode but keep the LoRa radio in receive mode (10 mA) and use an interrupt to wake the MCU back up if any LoRa activity is detected. Once LoRa activity is detected, we must be in a LoRa reporting window, so go ahead and attempt a Join Request message again.

In addition, once the RTC is set on the device, the AB1805 RTC power is maintained even if the user turns the device on/off. Only if they remove the battery will the RTC power be lost. So once the time is set, it should stay reasonably close unless it goes 12+ hours without receiving a time sync message. So in addition to waiting until Slp Done or LoRa MsgDetected, it would also attempt another Join Request message when it thinks the reporting window is open.
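
Roughly the flow I'm attempting, as a sketch only; sendJoinRequestAndWait() and msUntilExpectedWindow() are placeholders for my own application code, and rf95 is the RadioHead RH_RF95 driver instance used elsewhere in this thread:

// Sketch of the join/sleep flow described above. sendJoinRequestAndWait() and
// msUntilExpectedWindow() are placeholders; rf95 is the RH_RF95 driver instance.
void joinNetwork() {
    bool joined = false;
    while (!joined) {
        joined = sendJoinRequestAndWait(2000);   // transmit the Join Request once, wait ~2 s for a reply
        if (joined) break;
        // No response: keep the radio in receive mode (~10 mA) and sleep the MCU.
        // Either detected LoRa activity (radio interrupt) or the RTC-estimated
        // reporting window wakes us up to try the Join Request again.
        rf95.setModeRx();
        LowPower.deepSleep(msUntilExpectedWindow());
    }
}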

Nice! I'm assuming just a little more battery and a larger solar panel? Do you go into a low power mode where it'll "wake up" when it detects channel activity, or is it always on/always scanning?

That's really good guidance! I try to strive for that. I've had good success with hub/spoke sleep models (i.e. no mesh), good success with mesh (always on), and decent success with a sleepy mesh, but the sleepy mesh uses more battery or would require a larger solar panel than I'd like, hence trying to optimize the sleepy mesh for both reliability and battery power.

As for the power budget... I'm sensitive to making this solar panel work for a sleepy node that services the mesh. It just fits my existing enclosure so well. I could go with a larger enclosure or an external solar panel, but that all comes with a lot of re-work or added cost.
[image: enclosure with solar panel]

Very interesting... Thanks for sharing! Sounds like you've done a lot of work in this area. I'll have to consider that concept further and understand what it means for my use case. I was thinking of comparing the TX start time vs. the RX finish time on a LoRa node and taking 1/2 of that difference as the "transmit" delay. So when it sets the time, it actually sets it to the time value it received but subtracts the transmit delay to account for it. The biggest variable, I'd think, is the number of hops it needs to make to get to the Particle + LoRa device. The more hops, the more transmit delay and the more correction needed.
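
For what it's worth, here's the rough shape of that idea as a sketch; every name here is a placeholder for my own code, and it assumes a single hop with a symmetric path:

// Rough sketch of the half-round-trip delay estimate (single hop assumed).
uint32_t txStartMs = millis();                 // captured just before the request goes out
sendTimeRequestAndWaitForResponse();           // placeholder for the request/response exchange
uint32_t rxDoneMs = millis();                  // captured just after the time response arrives

uint32_t transmitDelayMs = (rxDoneMs - txStartMs) / 2;   // assume the path is symmetric
// Adjust the received timestamp by this one-way estimate before writing it to the RTC.
applyTimeWithDelayCorrection(receivedTime, transmitDelayMs);   // placeholder helper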

I'll have to think through/understand this further. I've also been studying the AB1805 application manual, and they talk about "Advanced Crystal Calibration to +/- 2 PPM". The procedure is in chapter 5.9.1 of that manual. From what I understand, it's: get a scope, clock the oscillator out on an output pin, compare it to 32768 Hz, and based on what you measured, set a few registers. The big unknown for me is how much temperature affects the PPM of the clock. Also, if you have 10 PCBs with the same components, assembled/ordered the same, do they all require calibration, or once you figure out the calibration parameter for your specific PCB/design, is it the same or "close enough" for all?

Something related to this on my mind is how often Particle synchronizes time. The concern is that if the time is off by, say, 10 seconds when Particle time is synced with the cloud, the sync could make the gateway "out of sync" with the LoRa nodes. It would be best to have it only correct its own RTC by 1-2 seconds at a time. It looks like the sync happens every 3 days for most devices but can be requested on demand with Particle.syncTime(). I may need to call that every few hours to ensure it doesn't jump by a large number of seconds each time it needs to make a correction; 3 days seems too long.
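
Something like this rough sketch is what I have in mind on the gateway side; the 4-hour interval is just a guess on my part:

// Rough sketch: request a cloud time sync every few hours so each correction
// stays small. The 4-hour interval is a placeholder value.
#include "Particle.h"

const unsigned long SYNC_INTERVAL_MS = 4UL * 60UL * 60UL * 1000UL;
unsigned long lastSyncRequestMs = 0;

void loop() {
    if (Particle.connected() && (millis() - lastSyncRequestMs > SYNC_INTERVAL_MS)) {
        Particle.syncTime();            // small, frequent corrections instead of one big jump
        lastSyncRequestMs = millis();
    }
    // ... rest of the application loop ...
}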

1 Like

@Rftop

I forgot to mention... I currently "configure" this from the cloud. @chipmc and I collaborated on this a bit and his suggestions seemed like a good way to go. This sounds similar to your Particle function. Currently, when a customer "claims" a LoRa device, it is assigned to a particular "Particle + LoRa" device. Once that happens, a config JSON is written to the Particle device. It has 2 parts. The first maps a serial number (unique identifier/claim code) to a 2-byte identifier (used as the unique identifier in all future transmissions):

{
    "1234567abc": "12345"
}
(format: { "SerialNumber": "2ByteIdentifier" })

The second part maps a unique identifier to the configuration array for that device:

{
    "12345": [10, 20, 30, 255]
}
(format: { "2ByteIdentifier": ["LoRaNodeAddress", "TXOffset", "Config3", "Config Index"] })

When this information is written down to the Particle device, it is also written to the flash file system. In setup(), the JSON is read back from the flash file system to "reload" the data after an OTA update or power cycle. Secondly, once the device connects to the cloud, the cloud pushes this configuration JSON down again. So I'm trying to make the node address, device offset, and other node-specific configuration items configurable from the cloud instead of hard-coded on the Particle device.
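
For reference, here's a minimal sketch of the persist/reload part, assuming the Gen 3 POSIX file API; the "setConfig" function name and the file path are just placeholders:

// Minimal sketch of persisting the config JSON to the flash file system and
// reloading it in setup(). Assumes a Gen 3 device with the POSIX file API;
// "setConfig" and "/lora-config.json" are placeholder names.
#include "Particle.h"
#include <fcntl.h>

String configJson;

int setConfigHandler(String json) {
    configJson = json;
    int fd = open("/lora-config.json", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) return -1;
    write(fd, json.c_str(), json.length());
    close(fd);
    // applyConfig(configJson);   // placeholder: parse the JSON and update the node table
    return 0;
}

void setup() {
    Particle.function("setConfig", setConfigHandler);
    // Reload the last-known config after an OTA update or power cycle.
    int fd = open("/lora-config.json", O_RDONLY);
    if (fd >= 0) {
        char buf[512];
        int n = read(fd, buf, sizeof(buf) - 1);
        close(fd);
        if (n > 0) {
            buf[n] = 0;
            configJson = buf;
            // applyConfig(configJson);
        }
    }
}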

Finally, since only one Particle device can be configured to "listen" to a particular LoRa device, and each Particle device knows who it can "listen" to, the "de-duplication" that occurs in LoRaWAN when two gateways hear the same message from a single LoRa device is inherently done by the Particle device. I.e. if two Particle + LoRa nodes hear the same message from a LoRa node, only the one it is configured for will route the message to the Particle cloud.

Are there other aspects of this I should also be considering/thinking through?

2 Likes

No Sir, what you detailed would have eliminated those issues for me as well.... with the obvious assumption that your backend has/gets everything it needs from the config array for each node.

2 Likes

I like that !
Please please please tell me you're running trials w/ your Radio/board and this panel.
What kind of power and duty cycle are you thinking for the Sleepy Solar Routers?
And sorry, I have to ask: how many mWh would the LoRa radio consume in 24 hours of listening?

Again, I really like your enclosure and panel. Thanks for sharing that.

2 Likes

Thanks, this is what I'm using as the panel. The Voltaic solar panel just barely doesn't fit on the cover's flat spot (~3 mm too narrow). But I have a CNC router, so when I cut the hole in the cover for the wires, I also "square off" 1.5 mm of the edges on each side to provide a fully flat mounting surface. The PCB is shaped with holes to fit right inside, so it's a quick/easy assembly.

Yeah, of course! I started with just 1 node for the last ~3 months, using this solar panel/enclosure with a separate, non-integrated solar charging circuit (bq24074) charging a single-cell 18650-style LiPo. My original board didn't accommodate solar charging, so I had to add this separate component for initial testing. Just this last week I got back boards that integrate similar solar charging circuitry into a custom PCB, and I'm now testing/finalizing that design. I hope to be testing 3-5 of the fully integrated PCBs in a sleepy mesh config in the next few days/week.

As it pertains to testing in general, I started playing with 10 Adafruit RFM95 FeatherWings in year 1. Then I moved on to 50+ custom PCB LoRa nodes out “in the wild” this last spring. At that time none of them were “mesh” but rather a hub/spoke model, though I did have a sleepy hub that was kept in sync. This was typically a LoRa node awake for 1-2 seconds every 300 seconds.

In my solar LoRa node that I'm testing now:

Sleeping: ~160 uA. I'm adding PCB updates to allow deep sleep by turning all power off except the AB1805 RTC, which is used to wake it up/end deep sleep. That should get me down to <10-20 uA sleeping current. I'll likely only use that mode of sleeping for long durations (i.e. 2+ hours) since it takes more power when waking up.

Awake/listening/servicing the mesh/waiting for a response: ~21 mA today, but I could likely reduce this to ~10 mA by sleeping the MCU and using an interrupt to wake it when a LoRa message is received. I tested this and it's functional/working, I just haven't updated my state machine yet. I.e. like this:

//Keep the radio on while sleeping the MCU. Still wakes up when a LoRa msg is received: ~10 mA
rf95.setModeRx();
LowPower.deepSleep(slpTimems);
//Successfully wakes up from sleep using interrupts

vs:

//Turn the radio off during this sleep mode. All messages ignored: ~160 uA
rf95.sleep(); 
LowPower.deepSleep(slpTimems);

I've also attempted CAD sleep but haven't been successful in getting it working:

//Set the LoRa radio into Channel Activity Detection mode. ~2 mA
rf95.setModeIdle();
rf95.isChannelActive();
LowPower.deepSleep(slpTimems);

//Haven't been able to figure out how to get it to wake up the MCU in this mode as CAD doesn't produce an interrupt

Here is what power consumption looks like today (not to scale :slight_smile: ). I think this matches up with the Adafruit Feather M0's stated power consumption as well (Power Management | Adafruit Feather M0 Radio with LoRa Radio Module | Adafruit Learning System):

Transmit: 130mA for ~100ms

I likely need to do a better job of calculating and keeping track of my power consumption and be a bit more scientific. However, similar to RF, I found real-world field tests much more telling than an Excel sheet for how long the battery will last. It was hard to determine what actual power I'd get out of the solar panel (Solar Power Calculator/Map). In my testing, I deliberately set up poor solar conditions (under a large tree, almost all shade, with the panel pointing north), so there was very little if any direct sunlight. In the ~3 months since I started, the single 18650 battery is at about 90% charge remaining. In comparison, the non-solar nodes are just about dead right now. If I faced it south, I think it would keep the battery topped off/fully charged. That was with 5-8 nodes in a configuration using 3 seconds per node, i.e. a total "awake time" of 24 seconds at ~20 mA for a sleeping node that services the mesh, with the remaining 276 seconds spent sleeping (~160 uA).

The awake time is the lion's share of the power consumption for a LoRa node that services the mesh. This whole effort toward sub-second RTC synchronization comes down to: instead of the 3 seconds per node I use today, what if I got it down to, say, 1.5 seconds and cut my awake time in half? Said differently, I am "marginal" at 10 LoRa nodes, since even with a solar panel it is slightly draining the battery; if I can cut the awake time in half, then I could double the number of LoRa nodes that can be serviced.

2 Likes

Great info.

If you get Channel Activity Detection working on the Solar Routers, then your timing hassle is really just about cleaning up airtime for your battery Nodes, versus orphaning them because the Mesh went down (from timing errors).

My “back of napkin” calcs (backing into the power budget from the Solar side):

Assuming 1/2 of peak power from your Panel, for 4 hours per day = 300 mW * 4 hr = 1,200 mWh per day stored.

Assume you get Channel Activity Detection working = 2mA * 3.7V = 7.4 mW for 24 hours = 178 mWh per day. That gives you a 24/7 Mesh Footprint from your Solar Router(s).

Then calculate how much Solar Router Budget remains for actually transmitting Lora Messages from the nodes back to the Boron: 1,200 mWh - 178 mWh = 1,022 mWh “available” per day.

You mentioned an average Message was 20 mA for 3 Seconds, so 20 mA * 3.7V = 75 mW.
3 seconds / 3600 sec per hour = 0.00083 hours per message
75 mW * 0.00083 hours = 0.0625 mWh per message

1,022 mWh (available) / 0.0625 mWh per message = 16,000 messages per day.

A Node w/ a 5 minute cycle will send 288 messages per day. Let's assume we need to add 10% for collisions = 317 messages per day.

16,000 messages per day / 317 messages per node = 50 Nodes Serviced, based on the Solar Router’s Power Budget.

That seems A LOT higher than I was expecting at the start. Please check my math and assumptions.
I could have made a wrong turn somewhere.
But even if you cut my calcs in half, providing and servicing a Mesh for 25 nodes with that tiny panel would be extremely impressive.

[Edit]
After reading this, I should probably mention this works as one or more Hub/Spoke networks instead of a “real” Mesh. There's probably no reason to deal with the overhead of LoRaWAN when your routers can simply aggregate messages and forward them to the Boron. So I’ve used the term “Mesh” a little too loosely.
[/Edit]

1 Like

Your math I think is close... a few corrections:

The LoRa transmission of a message is actually ~130 mA for ~100 ms. This occurs sometime during the 3 second window dedicated to that node. During this 3 second window, a LoRa router is currently programmed to be fully "awake" to listen for and then route a LoRa message for each node. It is awake a full 3 seconds per node to provide a +/- tolerance on each side so the end-to-end communication for a node can complete before the next node needs to transmit. During this 3-second-per-node listen time, it is burning 20 mA when it's not transmitting, just sitting there listening for the message to arrive. If we have 10 nodes, it's awake burning 20 mA for 30 seconds plus ~20 LoRa transmissions (if it truly needs to re-transmit each one). What I mean is, sometimes the node will be in range of the Particle + LoRa node, so the LoRa router doesn't have to re-transmit the message; it just hears it and throws it out based on the discovered routing table.

So to squeeze the power out of the LoRa router nodes: reduce the time in this mode or reduce the current draw.

  • Reduce the +/- tolerance per node by increasing the precision of setting the RTC (~2-3X reduction in power) and configure a specific window for each node to TX. Kind of tricky but seems doable.
  • Use an RX-sleep mode (sleep the MCU, keep the radio in full receive mode) (2X reduction in power). This should be easy/very doable.
  • Use CAD mode (if we can get it to work): 10X reduction (~2 mA). This seems hard.

I may try and build on your napkin math and do an Excel sheet later today or this weekend. I'm probably at a point where I can put some better numbers behind it. I'll try and post/share it here when ready.
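
In the meantime, here's a quick re-encoding of your napkin numbers in plain C++ so the assumptions are easy to tweak (these are just the assumptions from this thread, not measurements):

// Quick re-encoding of the napkin math above; all numbers are assumptions
// from this thread, not measurements.
#include <cstdio>

int main() {
    const double storedPerDay_mWh = 300.0 * 4.0;                  // 1/2 of peak panel power for 4 h
    const double busV             = 3.7;
    const double listenPerDay_mWh = 2.0 * busV * 24.0;            // 2 mA CAD listen, 24/7
    const double available_mWh    = storedPerDay_mWh - listenPerDay_mWh;

    const double perMessage_mWh   = 20.0 * busV * (3.0 / 3600.0); // 20 mA for 3 s per message
    const double messagesPerDay   = available_mWh / perMessage_mWh;
    const double perNodePerDay    = 288.0 * 1.10;                 // 5-minute cycle + 10% collisions

    printf("Messages/day: %.0f, nodes serviced: %.0f\n",
           messagesPerDay, messagesPerDay / perNodePerDay);
    return 0;
}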

Absolutely, a 2 mA LoRa listen mode would be the golden ticket and would vastly simplify all of this. It could stay in this mode indefinitely using that solar panel. I'm not fully ruling this out yet, but man... I Google-searched a lot and went down some rabbit holes attempting this. From what I understand from my early investigation, this effectively requires the MCU to continually "micro sleep/wake" every few milliseconds. Every time it wakes up, it checks for CAD; if activity is detected, it puts the radio into listening mode to receive the full message; if not, it goes back to sleep for 5 ms and continually repeats this. It's essentially attempting to detect the preamble of a LoRa message, and the only way to do this is to probe the radio at least as often as the preamble length. You'd think this would be done within the LoRa radio itself, but in everything I've found/tried, this is the method used. Here's a few references if you want to dig into it more:

Now that I reread some of these, I think the challenge is that “false” CAD signals cause the radio to wake the MCU, which then stays awake until some LoRa RX timeout confirms it was truly a false detection. The lack of any good example of it makes me question whether it’s too complex, not reliable, or burns too much power. It does still have to spend a finite period of time in receive mode, so it’s not actually 2 mA either. Interesting concept though!

This is very much what I'm hoping to achieve through one method or another. I tried CAD and couldn't figure it out. I then tried the waitForCADTimeout() method and randomized each node's offset; it worked but couldn't scale beyond 5 nodes, so trying to orchestrate the show was the next logical option. As you pointed out earlier, I wonder how a node could "self heal". What I mean by that is if it "falls out of sync" and doesn't get a response in, say, 1 hour or after maybe 10 transmit attempts, it would occasionally put itself back into RX-sleep mode to wake up when it hears a neighboring LoRa transmission.

In any case... it's a fun challenge. :slight_smile:

1 Like

@all, Wow! there seems to be a lot of energy around this topic.

I was dropping my daughter off at college over the past few days and I was surprised at the amount of reading I needed to do to catch up.

Interestingly, my daughter is going to Virginia Tech, and the route from that august institution to my home takes me past one of my most troublesome locations for Particle devices - Pilot Mountain State Park’s Bean Shoals access. This place is seriously cellular radiation deprived, and while that might be a good thing for the Park’s patrons, it has been an enduring thorn in my side.

To get an idea, please see this map from the FCC’s 4G LTE coverage database:

Despite the hint of cellular coverage using Particle’s EtherSIM carriers (AT&T, T-Mobile and US Cellular) I could never get counters in this area to connect. Then I applied for Verizon’s developer program and was able to get the vehicle counter to connect (most of the time) but not the trail counter. You can see that no one carrier has this area covered though Verizon is the best in my experience.

I have a few parks like this where I can connect to most - but not all - of the sensors. This is where LoRA comes in. With LoRA, we can separate the location of the gateway (based on wireless coverage) from the sensor nodes (based on where the data needs to be collected). My wife and I tested the following: we found a place with adequate cellular coverage and placed the gateway. Then, we walked to the two sensor locations and tested the LoRA connection - it worked, despite the terrain being heavily wooded in places and in full late-summer North Carolina bloom.

Working with Jeff, I am confident that I can have these nodes form a much more reliable and connected sensor network. Once in place, adding new nodes (such as one at the end of a 6.6 mile corridor trail) should be straightforward.

Then, I plan to see if this approach can solve my connectivity problems in a handful of other parks where the terrain and cellular coverage has caused me significant heartburn.

Going forward, I would like to use the following Triage for connections:

  • Particle EtherSIM
  • Particle EtherSIM with a high-gain antenna
  • Verizon 3rd Party SIM (likely inheriting the high-gain antenna from the step before)
  • LoRA
  • Disconnected (data logged to microSD card and uploaded using my bulk-uploader tool)

Thanks,

Chip

2 Likes

Great to hear! Thanks for sharing some additional real world Particle + LoRa experience. It's great exploring, building and collaborating together on this!

I don't know the count... but I had several Particle + LoRa devices out there for this specific reason, i.e. I don't have cellular connectivity right where I need it (for my use case, this is typically the lowest part of the woods with the lowest cellular connection), but I do have connectivity up the hill 500 - 1000' away. In those cases, it worked quite well. Even better, the Particle + LoRa device was also used as a sensor in that area, so it served dual purposes. Thanks for sharing!

1 Like

Alright… time to get some ideas on names for these devices, and to see what the poll feature in Discourse is all about.

What’s the best name for the Particle + Lora device that listens to LoRa nodes and transfers the packets to the cloud:

  • LoRa Particle Gateway - Not to be confused with a true LoRaWAN Gateway
  • LoRa Pseudo Gateway - Since it’s not a true gateway
  • LoRa Bridge - It bridges the connection between LoRa and Cellular
  • LoRa Hub - It’s the central point/concentrator of LoRa Messages
  • Something else…

0 voters

What do we call a LoRa node that participates in the mesh by listening to and forwarding LoRa traffic from other nodes:

  • LoRa Router Node - It routes the LoRa traffic
  • LoRa Mesh Node - Not truly mesh but it could be
  • LoRa Forwarder Node - It forwards the LoRa traffic
  • LoRa Hub node - Since it’s a hub in a multi hub/spoke model.
  • LoRa Concentrator - Although it doesn’t really concentrate anything… it just sends the message on.
  • Something else…

0 voters

And since we are at it… what would we call an “end device” that does not participate in/listen to traffic from other nodes? It simply wakes up, sends its data, listens for a response and then falls asleep.

  • LoRa End Node
  • LoRa Spoke Node
  • LoRa Sleepy Node
  • Something else…

0 voters

4 Likes

Anyone out there play around with LoRa and the various modem configurations? I.e. this:
rf95.setModemConfig(RH_RF95::Bw125Cr45Sf128)

Where:

  Bw125Cr45Sf128     Bw = 125 kHz, Cr = 4/5, Sf = 128 chips/symbol, CRC on. Default medium range.
  Bw500Cr45Sf128     Bw = 500 kHz, Cr = 4/5, Sf = 128 chips/symbol, CRC on. Fast + short range.
  Bw31_25Cr48Sf512   Bw = 31.25 kHz, Cr = 4/8, Sf = 512 chips/symbol, CRC on. Slow + long range.
  Bw125Cr48Sf4096    Bw = 125 kHz, Cr = 4/8, Sf = 4096 chips/symbol, low data rate, CRC on. Slow + long range.
  Bw125Cr45Sf2048    Bw = 125 kHz, Cr = 4/5, Sf = 2048 chips/symbol, CRC on. Slow + long range.

For reference… The default and all my testing to date I believe is here on this chart:

My use case is higher-density sensors that publish fairly often, so exploring mesh seems to make more sense for me than, say, an SF of 12. However, in a use case with low-density LoRa end nodes, where you maybe just want to add 1-2 that are far away, it might make more sense to try a higher SF.

Early on when I explored this, a few of these configs didn’t work at all, but after I updated the RF9X_RK library to include all recent changes from RadioHead, maybe it’s worth testing again.
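
If anyone wants to experiment, this untested snippet is roughly where I'd start; both ends of the link must use the same settings, and "manager" here is assumed to be the RHReliableDatagram/RHMesh instance, whose ACK timeout almost certainly needs to grow to cover the much longer airtime:

// Untested: switch to the slow/long-range modem config. Both ends must match,
// and the reliable-datagram ACK timeout needs to be raised for the longer airtime.
rf95.setModemConfig(RH_RF95::Bw125Cr48Sf4096);   // Bw = 125 kHz, Cr = 4/8, Sf = 4096 chips/symbol
manager.setTimeout(4000);                        // placeholder value; the default is a few hundred ms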

@chipmc - Maybe it’s worth checking it out given your use case.

@jgskarda ,

Indeed. There is a tradeoff you make with LoRA between battery life, range, and throughput - like the old joke, pick two. I have not played with this much, but I think you are right that it is worth looking into.

For my use case with low density, low data volume and infrequent reporting windows - I may go for higher spreading factors and higher transmitting power.

For you, getting more data across in a shorter period of time is a plus. Also, if you have adjacent sap collection installations, you may not want as much transmission power.

[screenshot]

Like you, I am assuming LoRaWAN and LoRA share some characteristics here.

Chip

1 Like

Alright... so I have the basics of sub-second RTC time synchronization functional between the Particle LoRa Gateway and a LoRa end node. It seems to be working well. However, I'm shocked at how much constant drift there is even when both boards have the same RTC and crystal: ~220 PPM difference? I'm fairly new to the intricacies of RTCs. Am I doing something wrong or thinking about this wrong? The two are sitting 5-10' apart, both in my basement at the same air temperature.

Every 5 minutes, the LoRa end node wakes up, takes readings, transmits its data and waits for a response. When the Particle LoRa Gateway receives the message, it reads the time from the AB1805 and transmits this time back (formatted as outlined earlier in this thread). When the message is then received by the LoRa end node, it compares its own AB1805 RTC time with what was received and writes this out via Log.info()/serial print. If the absolute value of the time delta is > 500 ms, it sets the time again on the AB1805. I.e. currently 1/2 second accuracy... I could adjust this to, say, 250 ms to get 1/4 second accuracy in theory.

Plotting the data... it seems the LoRa node's AB1805 is advancing ~65 ms ahead of the Particle LoRa Gateway every 5 minute interval. The program then "corrects" the time once it's 500 ms off, every ~40 minutes or so. This is about 224 PPM of error. The spec sheet of the crystal used on both sides says +/- 20 PPM. I'm not sure what I'm doing wrong or not understanding, but I was not expecting that much constant drift. I was expecting a max drift of, say, 20-40 PPM (i.e. 5-10 ms per 5 minutes worst case), not 200+ PPM.

Next, I'm going to test several other boards to see whether this amount of drift is more or less with different AB1805s and crystals. I could look at adjusting the RTC calibration registers every time a correction is made, but I'm not sure I really want to try and get that fancy. Any other ideas, or what am I doing wrong?

UPDATE: It seems my observations match this: https://lowpowerlab.com/forum/low-power-techniques/ab1805-rtc-accuracy/ but the part that still isn't making sense is that I am comparing two AB1805 RTCs, so you'd think the two would stay close to each other even if both drifted from Particle Cloud time.

As an FYI... this was done by making updates to a forked version of the AB1805_RK library, and then forking another version of it compatible with an Arduino M0+ MCU for my LoRa end node. The main change required for this test was adding the ability to set and read the hundredths register of the AB1805.

i.e. From this:

ab1805.getRtcAsTime(time_t &time)
ab1805.setRtcFromTime(time_t &time)

to this:

ab1805.getRtcAsTime(time_t &time, uint8_t hundredths)
ab1805.setRtcFromTime(time_t &time, uint8_t hundredths = 0)
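
With that, the end node's threshold check ends up looking roughly like this (sketch only; it assumes the forked getter fills hundredths in by reference, and receivedSec/receivedHundredths are parsed out of the gateway's response):

// Sketch of the end node's >500 ms check using the forked hundredths API.
time_t  localSec;
uint8_t localHundredths;
ab1805.getRtcAsTime(localSec, localHundredths);

long deltaMs = (long)(localSec - receivedSec) * 1000L
             + ((int)localHundredths - (int)receivedHundredths) * 10L;

if (deltaMs > 500 || deltaMs < -500) {           // currently +/- 0.5 s; could tighten to 250 ms
    ab1805.setRtcFromTime(receivedSec, receivedHundredths);
}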

1 Like

I’m probably the last person in this forum when it comes to qualifications or experience to discuss RTC accuracies.
But your graph looks like the definition of a Systematic Error.
I would have expected drift to be a Random Error ?

Are you sure that your “Correction” takes into account the airtime and/or any processing time inherent to the transmission ?

Your deviation graph just looks too perfect to me to not be a systematic error, but this is outside my wheelhouse.

3 Likes

Well... after spending most of the day reading about, Google searching, and investigating RTCs, as well as testing 3 different carrier boards and 3 different LoRa node PCBs to see how different the clock speeds are between each, it finally hit me. :thinking:

"Hey you dummy... power to the RTC comes from LI+ not from 3V3 like everything else. Maybe you should plug a battery into LI+ to "stabilize" the voltage the RTC sees"

I had a USB cable plugged directly in since I was logging data to serial. That powered everything up, so it never occurred to me to also plug a LiPo battery into the board. Without the LiPo battery, I'm guessing the RTC was seeing all sorts of noise/spikes from the PMIC trying to charge a battery that wasn't there? As soon as I added a battery, it got MUCH better. I'd say more in the "acceptable/expected" range.

To make this all easier, I decided to order a GPS module with a 1 PPS (pulse per second) output, thinking I could use interrupts to count the pulses of the 32.768 kHz clock output over, say, a 30 second window framed by the PPS signal from the GPS. If it becomes required, I could likely come up with a program to calibrate the RTC based on the GPS 1 PPS signal. I.e. plug the GPS into the board, run this sketch, and 30 seconds later the RTC is calibrated. Yes, they will still drift based on temperature, but their baselines would be calibrated together.
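
Roughly what I have in mind is the untested sketch below; the pin numbers are placeholders, and it assumes the AB1805 square-wave output and the GPS PPS line are each wired to an interrupt-capable GPIO:

// Untested sketch of the PPS-vs-32.768 kHz counting idea. Pin numbers are placeholders.
const int CLK_PIN = 5;   // AB1805 32.768 kHz square-wave output (placeholder pin)
const int PPS_PIN = 6;   // GPS 1 PPS output (placeholder pin)

volatile uint32_t clkCount = 0;
volatile uint32_t countAtLastPps = 0;
volatile bool ppsFired = false;

void onClkEdge() { clkCount++; }

void onPps() {
    countAtLastPps = clkCount;    // snapshot the count once per GPS second
    ppsFired = true;
}

void setup() {
    Serial.begin(115200);
    pinMode(CLK_PIN, INPUT);
    pinMode(PPS_PIN, INPUT);
    attachInterrupt(digitalPinToInterrupt(CLK_PIN), onClkEdge, RISING);
    attachInterrupt(digitalPinToInterrupt(PPS_PIN), onPps, RISING);
}

void loop() {
    static uint32_t prevCount = 0;
    if (ppsFired) {
        ppsFired = false;
        uint32_t perSecond = countAtLastPps - prevCount;   // ideally 32768
        prevCount = countAtLastPps;
        Serial.print("counts in last GPS second: ");
        Serial.println(perSecond);
        // Average over ~30 s, then error in PPM = (average - 32768.0) / 32768.0 * 1e6
    }
}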

Either that or buy a decent oscilloscope. Anyone have recommendations on a reasonably priced oscilloscope? I should probably get one.

I think of drift as one clock running faster than the other, i.e. not random but constantly increasing or decreasing. It's my understanding that the transmit/processing/airtime correction (if constant) would just put things in "sync" more precisely; it wouldn't cause the time on one RTC to drift/walk relative to the other.

From my testing to date, if the LoRa message makes it from the LoRa node to the gateway and back without any re-tries, the time can be set pretty darn precisely and maintained precisely. I.e. a round-the-horn trip (LoRa --> Particle --> LoRa) from the start of transmission to the response message being received takes about 290 ms +/- ~2 ms.

The "gotcha" in this concept is the random error that comes from the variable number of re-tries that can occur with any LoRa transmission (currently set to max 3 re-tries). Re-tries are nice to help get messages through that wouldn't of made it otherwise, but this does add a level of inaccuracy/variability.

With direct LoRa node to LoRa Particle Gateway and back... we can actually account for it, since we know how long it took to transmit and the time between transmission start and when we received a message. But as soon as it makes multiple hops in a mesh, we wouldn't know the number of retries the message made on any one hop. That is, unless each node "updated/tacked on" a number-of-retries component to each message. That's not currently in the RadioHead library and would have to be developed. You can extract the number of re-tries a particular node took to send a message, and you can track the total number of hops in an end-to-end message, but currently there is no way to extract the total number of retries across all hops.

Overall, I think this concept of sub-second time synchronization seems doable and will certainly be better than +/- 1 second accuracy. I think I'll also be able to reduce the dedicated window each LoRa node uses from 3 seconds to at most 2 seconds, maybe even 1.5.

While I wait for the GPS module to show up, I may try and refine a LoRa mesh node to allow the MCU to sleep while in listening mode, to reduce the current from ~21 mA to ~10 mA when servicing the mesh.

2 Likes

Here is a quick update regarding RTC accuracy… after adding the battery to the LoRa Particle Gateway board to stabilize the RTC, I now see ~14 ppm (about 1.3 seconds per day) of drift. This is more what I was expecting and very acceptable. I then moved the devices farther apart to try and induce more “re-tries” to better understand the dynamics of what happens with missed messages. Here is what that looks like:

Notice the slow increase: this is “drift” and would total about 1.3 seconds/day if it didn’t correct itself. Second, notice the “spikes”: these occur when a message required 1 or more retries when being sent. If you look closely, the spikes are consistent, indicating a message took 2 or 3 re-tries (i.e. the error is # of retransmissions * duration to transmit). It seems that unless we can track/append/update the number of re-tries that occur as the message routes from node to node, this sub-second time synchronization will be for nothing… This may be harder/more complicated than it’s worth. I’ll peek at the RadioHead library to see what’s possible, but eh… not sure I want to try and tackle that.

3 Likes

As I said before, I never could get a Sleepy Network to pass my Validation Trials (but it was pretty stringent). I’m still hoping that you can pull it off. You’ve already made a lot of progress.

Since the goal is to not abandon/orphan a remote node while staying in the power budget, maybe consider having the solar router/hub/forwarder initiate a protocol to sync an orphan that hasn’t phoned home ?
IE: If it’s missing a child, leave the radio on longer if the budget allows ?
Unless something is BAD wrong, it should take less than 5 minutes to save an orphan.
No worries at all IF the sun’s shining.
Just “brain-storming” :wink: :wink:

PS. I like your idea about calibrating w/ GPS PPS on the bench, if necessary.

1 Like

@Rftop - This sounds like a challenge. In any case... it'll be fun trying. :sunglasses: Collaborating with you and others is the way to make it happen!

I always blamed any missed LoRa messages on two nodes talking over each other; thus, having the LoRa Particle Gateway "coordinate the show" and set the time much more precisely on the LoRa nodes is why I'm so focused on this timing thing. Hopefully I'm not going down a dead end.

Adding the number of "re-transmissions" to a LoRa message was actually much easier than I anticipated. It turns out buf[5] of an RHReliableDatagram message is the RHMesh message type, with the following possible values:
0 = RH_MESH_MESSAGE_TYPE_APPLICATION
1 = RH_MESH_MESSAGE_TYPE_ROUTE_DISCOVERY_REQUEST
2 = RH_MESH_MESSAGE_TYPE_ROUTE_DISCOVERY_RESPONSE
3 = RH_MESH_MESSAGE_TYPE_ROUTE_FAILURE

We can effectively "inspect" buf[5] within RHReliableDatagram.cpp to see what type of message we are sending. If it's an application message, then we can "hijack" the last byte of the message and increment it each time we transmit. Each node along the route will do the same. We could Hijack the first byte too but hijacking the last just made it easier for me to test.

I simply added this to the part of RHReliableDatagram.cpp that sends the LoRa messages:

    if (buf[5] == 0){       // RH_MESH_MESSAGE_TYPE_APPLICATION messages only
        buf[len-1]++;       // increment the hijacked retransmission counter in the last payload byte
    }

and that should do it...

To illustrate this... I printed out a message within the application as well as within RHReliableDatagram.cpp. In this case, I purposefully powered off the LoRa Particle Gateway to force it to re-try. Notice how several bytes get prepended to the message within RHReliableDatagram; this is the RHReliableDatagram header with a fixed length/structure. It is then followed by the same payload. In this example, I dedicate the last byte of the application message to the number of transmissions and set it to 0. I then hijack it and update it with each re-transmission. In this case, it tried to send the message 4 times and incremented it each time.

If I can take this into account, then setting the time within a node should be pretty darn precise. Multiple hops/retransmissions would be accounted for, and Actual Time = RX Time + (Number of Transmissions * Time per Transmission). The time per transmission should be constant given a specific spreading factor, bandwidth and number of bytes in the payload.
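
A rough sketch of that correction (names are illustrative; the ~100 ms per transmission is just the transmit time noted earlier and would really be measured for my specific modem config and payload length):

// Rough sketch: correct the received time by the retransmission count carried
// in the hijacked last payload byte. AIRTIME_MS_PER_TX would be measured for
// the actual modem config and payload length; ~100 ms matches the transmit
// time noted earlier in this thread.
const uint32_t AIRTIME_MS_PER_TX = 100;

uint8_t  numTransmissions = buf[len - 1];    // incremented once per (re)transmission along the route
uint32_t totalDelayMs     = (uint32_t)numTransmissions * AIRTIME_MS_PER_TX;

// Actual time = received time + (number of transmissions * time per transmission)
setRtcWithOffsetMs(receivedSec, receivedHundredths, totalDelayMs);   // placeholder helper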

Next step is to actually use this number of transmissions to correct the received time and see if that eliminates the "spikes" in that earlier chart. Then bring in the 1 PPS GPS module and see how accurate we can get this thing...

Yeah, great idea... currently a user can enter "burst mode" and request data every 1 minute. This is used especially for site surveys, to send data (including RSSI and SNR) every minute. When in burst mode, I'm thinking the sleepy LoRa mesh node would stay awake, or at least sleep in LoRa listen mode the entire time with only the MCU sleeping. So if a device became orphaned, we could just enter "burst mode", allowing the orphan to sync up with the family again. Currently the system exits burst mode after 1 hour. Is this kind of what you had in mind?

This thread is becoming fairly focused on LoRa intricacies in the last few posts... Is everyone OK with that? There are some great people in this community, and I thought this would make for some great conversation about LoRa + Particle.

2 Likes