OneWire Library Bug / Proposed FIX (June 2019) Affects DS18B20 and other OneWire devices

rick…thanks! I don’t know if you had the chance to look at the exact changes (ATOMIC BLOCK additions) I made to the OneWire file, I feel that almost all of them were critical to this issue with possibly the exception of the onewire.reset.

I made the changes rather quickly and so could be wrong there. Also, all this is pretty new to me. I will look at them a bit closer. Unfortunately, I can’t test any of this personally, I only have a Photon.

The issue with your first fix I think is mostly for those “systems” that have multiple sensors on the OneWire. The algorithm to find the addresses seems to be very transaction intensive and currently fails to consistently find all sensors when there are more than a few (4+) tied to the OneWire bus. To fix that, that algorithm could possibly be re-written to be more robust (I haven’t looked at this algorithm at all, so this may be a FALSE statement). I’m sure there are users that could accept a low percentage of CRC errors on the OneWire bus if THAT particular issue is addressed.

The 2nd fix seems nice as in opt-in for those not using mesh. EDITED: actual doesn’t really help anyone. I am assuming the ATOMIC_BLOCK change will be made in all cases. That already fixes the issue for those mesh devices that are not using Mesh…so no need for that 2nd fix unless we can find a way to apply it for those devices with mesh enabled…Rick doesn’t seem comfortable with that.

Also…I am not familiar with the particle.io family, I don’t think we have samples of the entire spectrum…or do we.

What do others think?

I’m planning on looking at the changes Rick proposed. I’m gonna look at the timings one more time, and a closer look at the Atomic Blocks and their criticality. I’ll create a file(s) to post here, unfortunately, I wont be able to get to it until probably tomorrow.

1 Like

@rickkas7 I thought (at a glance and out of ignorance…I am new to all this) that you were providing two methods to disable interrupts for the “deep down code” inside of OneWire. However, when I went to try it out, it’s obviously not that.

We have made changes to the lowest bit level of the OneWire protocol. Currently the largest “interrupt blockout time” needed (to make OneWire error-free) is about 60us+. If that is an acceptable level for mesh operation then all we need is the blockout method (REALLY disable interrupts)…so that we can put that into the OneWire.cpp file and test it.

If 60us is NOT acceptable, there is perhaps one other possibility that does not eliminate all OneWire errors (in my testing about 0.1% still occur). That method requires a blockout method for interrupts that is about 15us+ long. Is that acceptable?

In either case, we can’t move forward to test it unless we know how to TOTALLY blockout the interrupts for mesh enabled devices.

There is already a workaround for devices that don’t have mesh (or don’t enable it), and it passes 100% of the time for all that have tested it.

The __disable_irq() should block the high priority mesh interrupts. I think 15 uS should be OK. At least it would be worth a try to see.

@rickkas7 I don’t know how to compile that…what header would I need? It shows as an error when I enter that code…I didn’t actually try to compile it though

identifier “__disable_irq” is undefined

sorry…all this is new to me…I am compiling locally on my windows computer

Hmm… I’ll take a look tomorrow. I didn’t run the test with __disable_irq and there is a possibility that the tests were done from a monolithic build so system firmware functions were available. I’ll see if there’s another option.

Ok sounds good, thanks for your help…

FYI: The 15us delay option doesn’t wrap either the OneWire.reset or the OneWire.write_bit (0) with the high interrupt protection. They are the 60us operations but turns out are much less critical than the OneWire.read_bit op and the OneWire.write_bit (1).

EDIT: I have updated the first post of this thread with a status and “clearer problem statement” that I will try to keep updated.

Update: I didn’t hear from @rickkas7 today. He’s either very busy or an “interrupt lockout” for the mesh enabled devices may not be very straightforward. I am sure we will hear an update when he figures it out.

I went ahead and updated the OneWire.cpp file in the first post of this thread. I took out the redundant interrupt disable/enables which were replaced by the Atomic Blocks. I also made a judgement call to change a couple timings. I did see a thread where someone had hooked up a scope to the OneWire bus and it seemed clear that 1us to drive the bus low by a particle.io device was plenty of time…and it meets the spec.

Changes in that file (from the original) have a comment next to them with //JC in it for now.

1 Like

Well, no word from @rickkas7, I am continuing to watch this, but obviously moving on. I’m done with my project and it works great on the Photon, especially with library FIX.

I attempted to file a PR on the OneWire bus protocol that was listed about 5 or 6 posts ago. Don’t know if I was successful filing the PR on the Particle Library, my own version of it, or an entirely different version of it. Github is not intuitive to me and I don’t have a lot of time to spend on it.

It does surprise me though (having worked in the industry), that issues/bugs like this don’t seem to have a central repository, I probably just can’t find it. How do new/existing customers not keep chasing the same problems over and over…like this thread a few months ago that seems to me obviously linked to this bug:

I also have concerns about the OneWire “Library” as well since all of the various DS18xxx libraries seem to have the OneWire files inside of them, and no LINK to the library pointed out by rickkas7. Even if that library gets changed, I don’t understand how it would propagate to the DS18xxx libraries that everyone seems to grab.

In any case, I don’t use MESH but here is what I would suggest based on user comments previously: IF you are doing a OneWire address scan to discover OneWire devices, disable your mesh during that operations. Once the addresses are successfully recorded, there doesn’t seem to be any other choice but to live with the high CRC error rate associated with transfers on the bus.

As I said, I’ll continue to watch this thread and others regarding OneWire and (I think) awesome DS18B20s.

Unfortunately not on the particle-iot repo but on the upstream repo by Hotaman.
However, he has already merged that PR into his repo and Rick and Julien (both Particle) have also commented/contributed on that

Actually no
This is a “link” but to a specific version of that library

While this would pose a problem if the dependency-library got a fix and the dependent library wouldn’t pull that in - on the other hand, that same strategy should prevent a “bad fix” of the dependency from also killing the dependent library.
This is a canundrum that may want to be addressed - e.g. via an “override” feature where the dependency gets ignored when a project explicitly imports another version of the dependency-library.

BTW, @rickkas7 - could you activate the Issues feature in the this repo, please?

Scruff, I knew my attempt with that PR did not go well, I thought about trying again but worried I would make it worse, haha. And it’s a good thing that my “lack of knowledge” about library referencing was wrong, too.

Aaahhh…that’s actually what I was looking for and couldn’t seem to find a way to do it, but I did see that it was an option in other repositories.

@peekay123 you said in another thread: "The challenge with any OneWire library is that they use bit-banging with delayMicroseconds() for timing and disable interrupts to ensure that FreeRTOS doesn’t preempt use code while OneWire is doing its thing. The problem is that’s really not DeviceOS
friendly "

One of the first things that crossed my mind when the cause of OneWire’s failures was discovered …are there OTHER protocols/operations where disabling/enabling interrupts is so key to success: other hardware protocols implemented like OneWire? Are there any obvious ones? I could imagine there may be some software protocols that use these techniques as well…

This OneWire implementation was flawed from the day it was ported to the particle.io platform because of the introduction of those higher level system interrupts that can break into an assumed safe, uninterruptible window. Atomic Blocks “fixes it” for non-Mesh devices, but at some cost to the system.

With the introduction of Mesh SUPER-interrupts that can interrupt Atomic_Blocks (yes I am making up terms), I can only imagine the scope of the validation effort. Not my job though, haha.

It is great news that a solution seems to have been found. Thank you for all your efforts. I’d almost given up using OneWire with Particle.

So now, can anyone tell me what example software I could I use to let me reliably use a Xenon to poll 2 DS18B20 sensors?

If I have to copy and paste the atomic block modified code into a cpp file, which version should I use and how would I go about doing it?
Any help on this would be greatly appreciated.

Hey Colin, I would use the version of the onewire.cpp file in the first post of this thread…it works great for me, and for those that I have received feedback from. There doesn’t seem to be any real push to fix this in the official library, or to officially document the problem.

As far as coding examples, I am currently sampling six DS18B20 sensors in my Photon project. It uses the code that I posted in my thread linked (also in the first post of this thread). That code can be modified fairly easily but it is pretty different than other code that I have seen.

Thank you very much. Yes, the code is pretty different, but full of gems! I’m working on adapting it now. Thanks again.

Thank you so much for the great work here. I have struggled with the onewire lib for years. Add one more verification this fix works 100% on the Photon. Using 2 probes my error rates dropped from ~10% to 0.

Okay, well-meaning idiot here. :smile:

How do I update the OneWire library with the code earlier in this thread? I see it’s called by the DS18B20 library, but that library is called directly from Particle resources and is not in my firmware code. I’m using the Web IDE.

Any pointers in the right direction greatly appreciated.

Thanks!

Since 2.0.3 is the latest OneWire version in the cloud the DS18B20 library can’t really pull a newer version in. I’d reupload the DS18B20 lib if there was a newer public OneWire version.

Having said that, you can add new files to a Web IDE project via the (+) icon top right.
Then name the added files according to the library files you’d need to modify and copy paste the library codes into the respective file tabs.
It’s not comfortable but at least that way you can tweak the libraries to your needs.

@bsatrom, can the above fix be included in the official OneWire library and a new version made public?
BTW, the GitHub link is missing in 2.0.3 - having that available would make it easier to file issues and PRs on the library :wink:
Please also enable the Issues tab on the repo (as already mentioned half a year ago)

IMHO, official libraries should set an example in best practices on the meta-data side too :wink:

2 Likes

Thanks for flagging @ScruffR, I’ll get the library fixed this week. In the meantime, I’ve just enabled Issues for the repo.

1 Like

Thanks, @ScruffR. That did it.

I “removed” the integrated DS18B20 library from my build in the IDE, and text copied and pasted the .h and .cpp files for DS18B20, OneWire, and DS18 as new library file tabs.

As others have noted, my DS18B20 temp sensors now seem super-robust and reliable in their readings.

Thanks also to all above (@Jonpcar and others) who took the time and effort to figure this thing out and make things a bit better for all of us.

My electron would have good readings when using AUTOMATIC mode. However, when doing SEMI_AUTOMATIC it wouldn’t be reliable, after following the instructions on this post to modify the code it’s now running great in whatever mode I run it.
Thanks a lot for that, you saved a bunch of people’s headaches!

1 Like

Hey folks, just wanted to let you all know that the OneWire library has been updated with this fix, and published: https://build.particle.io/libs/OneWire/2.0.4/tab/DS18.cpp.

Thanks @ScruffR for the callout and @Jonpcar for the original fix!

4 Likes