OneWire Library Bug / Proposed FIX (June 2019) Affects DS18B20 and other OneWire devices

UPDATE: Original Post starts below

PROBLEM:

The current particle.io implementation of the OneWire protocol has a unique problem: transfers on the OneWire are subject to being “interrupted” by higher level system functions/calls. When this occurs inside the critical window of a OneWire “bit transfer”, that transfer will fail and result in a CRC error.

Solutions should always be designed to deal with a low percentage of CRC errors, but the current particle.io implementation has a rate that is too high.

Particularly troublesome, this “bit transfer fail” seems to cause havoc for the OneWire “Search for OneWire devices” algorithm which is "bit transfer" intensive. Many applications use this OneWire function to “find” and enumerate devices attached to the bus. With this “interrupt issue”, it is very common to see this algorithm fail to find the addresses of all attached devices, especially when there are 3+ devices attached to the bus.

This bug has been around for a number of years, probably every since the bus was ported to the particle.io platform. A search yields many, many old threads fighting this problem.

SOLUTION:

Copy the code in the window below to a onewire.cpp file then include that file in your project build directory.

Protect the critical windows of the OneWire protocol by REALLY blocking all interrupts during those critical "bit transfer" times. Although the OneWire protocol DOES implement a disable/enable interrupt procedure around these windows, higher level system operations are not prevented making their way in.

For the following devices, there is a FIX that works. That FIX uses ATOMIC_BLOCKS around the critical OneWire code to protect the bit transfers. That FIX works for:

Photon, Argon (mesh disabled), Xenon (mesh disabled), Boron (mesh disabled).

You see the pattern….currently a solution for “mesh enabled” devices is being sought.

With the FIX, the programs I am running have over 2 million temperature conversions without a single CRC error. This OneWire bus is VERY stable when an interrupt can not disrupt the transmission of a bit in progress. That's really the way buses are designed to work.

So far, a few particle.io users have verified that this FIX works for them. Thanks everyone!

jonpcar reports that the fix works 100% for Photon (system thread enabled or not)
@picsil reports that the fix works 100% for Boron LTE (no mesh)
@remcohn reports the fix works 100% for Argon (no mesh)
@fragma reports the fix works 100% for Xenon (no mesh)

@remcohn reports the fix doesn't work for xenon (mesh) or argon (mesh)
@fragma reports the fix doesn't work for xenon (mesh)


Original Post Below

PREVIEW : I am now getting ZERO errors reading my DS18B20 devices on the OneWire bus. I’m looking at a spreadsheet where my latest experiment has run ~30 temperature conversions per second, over 140K in 2 1/2 hours, without a single CRC error.

I am a newbie and have been working on a Photon project for the last few months Recently, I’ve been working on some sensors, one of which is the OneWire DS18B20 temperature sensor. After reading all the threads and fumbling through some code, I posted a project and was pretty happy with the results … I was ready to move on. Thanks @Bear for the help/discussion there. Here is that link:

To recap briefly: I struggled, as many have, with this sensor/bus and ended up modifying some code I sucked out of the various libraries.

Here is what I liked about my solution:

  • Five DS18B20 devices, all on the same bus and getting accurate readings at about 1/second
  • Less than 1% errors, but I had abandoned the CRC and was doing my own checks

Here is what I didn’t like:

  • My system was still in the “lab” and error rates were much lower than those I had read about
  • I wasn’t using CRC
  • Errors went up to roughly 5% when I enabled multi-threading testing, I suspected they would go higher when I start combining facets of my project.

So, yesterday I added CRC and some other test code to figure out why the failures were happening. It looked like 100% of the bus failures were due to two cases:

(1) a DS18B20 didn’t recognize that it was being asked to drive back its data, so the single wire bus floats to ‘1: a CRC error results and the raw temperature read as all ones (-1).

(2) real CRC errors occurred because the OneWire protocol failed to read any random bit (or more, but usually not) of the 72 bits serialized over the bus as a ‘0; the Photon would see it as a ‘1. I never saw the opposite case.

So, I started looking at timings in the OneWire library and found a few that I thought weren’t quite to spec, one by quite a bit. I changed those timings and the CRC error rate was lowered back down to about 1%.

Then the Eureka moment…I had perused the OneBus code a few days ago and had thought it was implemented very well…bit twiddling and disabling/enabling interrupts at all the right places to implement the “bit at a time” protocol (my background, 30+ years ago was assembly code). But I am not at all familiar with the Photon.

Having just enabled the Photon’s multi-threading capability, I had read something about ATOMIC_BLOCKS. I went back and decided I would place those around the most critical (in my view) portions of the OneBus code (I left the interrupt stuff in because I want to ask about what actually interrupts vs ATOMIC_BLOCKS accomplish). In any case, I thought the two were going to be redundant…NOT.

I haven’t seen a single CRC error since I changed that OneWire.cpp file…except the ones I induce by unhooking one of my DS18B20s from the bus. It’s perfect.

So, I am hoping there is some interest by others who are using DS18B20s, to try it out. Maybe the easiest way is by replacing the library OneWire file in your own application, but I am not sure because pretty much all of this is new to me.

Here is the edited OneWire.cpp file. The changes I have made can be found by searching for my initials: JC

Also, I realize that there is a discussion that would take place that there are two issues here and are independent from one another:

  1. ATOMIC_BLOCKS
  2. Should the timings even be changed? Not necessarily...(1) still works.
/*

Particle Verison of OneWire Libary

Hotaman 2/1/2016
Bit and Byte write functions have been changed to only drive the bus high at the end of a byte when requested.
They no longer drive the bus for High bits when outputting to avoid a holy war.
Some folks just can't accept that a 10K resistor works just fine when the calculation calls for 10,042.769 ohms.
Bit and Byte writes are now 100% compliant with specs and app notes.

Support for P1 and Electron added by Hotaman 11/30/2015

Support for Photon added by Brendan Albano and cdrodriguez
- Brendan Albano 2015-06-10

I made monor tweeks to allow use in the web builder and created this repository for
use in the contributed libs list.

6/2014 - Hotaman 

I've taken the code that Spark Forum user tidwelltimj posted 
split it back into separte code and header files and put back in the 
credits and comments and got it compiling on the command line within SparkCore core-firmware


Justin Maynard 2013

Original Comments follow

Copyright (c) 2007, Jim Studt  (original old version - many contributors since)

The latest version of this library may be found at:
  http://www.pjrc.com/teensy/td_libs_OneWire.html

OneWire has been maintained by Paul Stoffregen (paul@pjrc.com) since
January 2010.  At the time, it was in need of many bug fixes, but had
been abandoned the original author (Jim Studt).  None of the known
contributors were interested in maintaining OneWire.  Paul typically
works on OneWire every 6 to 12 months.  Patches usually wait that
long.  If anyone is interested in more actively maintaining OneWire,
please contact Paul.

Version 2.2:
  Teensy 3.0 compatibility, Paul Stoffregen, paul@pjrc.com
  Arduino Due compatibility, http://arduino.cc/forum/index.php?topic=141030
  Fix DS18B20 example negative temperature
  Fix DS18B20 example's low res modes, Ken Butcher
  Improve reset timing, Mark Tillotson
  Add const qualifiers, Bertrik Sikken
  Add initial value input to crc16, Bertrik Sikken
  Add target_search() function, Scott Roberts

Version 2.1:
  Arduino 1.0 compatibility, Paul Stoffregen
  Improve temperature example, Paul Stoffregen
  DS250x_PROM example, Guillermo Lovato
  PIC32 (chipKit) compatibility, Jason Dangel, dangel.jason AT gmail.com
  Improvements from Glenn Trewitt:
  - crc16() now works
  - check_crc16() does all of calculation/checking work.
  - Added read_bytes() and write_bytes(), to reduce tedious loops.
  - Added ds2408 example.
  Delete very old, out-of-date readme file (info is here)

Version 2.0: Modifications by Paul Stoffregen, January 2010:
http://www.pjrc.com/teensy/td_libs_OneWire.html
  Search fix from Robin James
    http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1238032295/27#27
  Use direct optimized I/O in all cases
  Disable interrupts during timing critical sections
    (this solves many random communication errors)
  Disable interrupts during read-modify-write I/O
  Reduce RAM consumption by eliminating unnecessary
    variables and trimming many to 8 bits
  Optimize both crc8 - table version moved to flash

Modified to work with larger numbers of devices - avoids loop.
Tested in Arduino 11 alpha with 12 sensors.
26 Sept 2008 -- Robin James
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1238032295/27#27

Updated to work with arduino-0008 and to include skip() as of
2007/07/06. --RJL20

Modified to calculate the 8-bit CRC directly, avoiding the need for
the 256-byte lookup table to be loaded in RAM.  Tested in arduino-0010
-- Tom Pollard, Jan 23, 2008

Jim Studt's original library was modified by Josh Larios.

Tom Pollard, pollard@alum.mit.edu, contributed around May 20, 2008

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Much of the code was inspired by Derek Yerger's code, though I don't
think much of that remains.  In any event that was..
    (copyleft) 2006 by Derek Yerger - Free to distribute freely.

The CRC code was excerpted and inspired by the Dallas Semiconductor
sample code bearing this copyright.
//---------------------------------------------------------------------------
// Copyright (C) 2000 Dallas Semiconductor Corporation, All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a
// copy of this software and associated documentation files (the "Software"),
// to deal in the Software without restriction, including without limitation
// the rights to use, copy, modify, merge, publish, distribute, sublicense,
// and/or sell copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included
// in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
// OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY,  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL DALLAS SEMICONDUCTOR BE LIABLE FOR ANY CLAIM, DAMAGES
// OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
//
// Except as contained in this notice, the name of Dallas Semiconductor
// shall not be used except as stated in the Dallas Semiconductor
// Branding Policy.
//--------------------------------------------------------------------------
*/

#include "OneWire.h"
#include "application.h"

OneWire::OneWire(uint16_t pin)
{
    pinMode(pin, INPUT);
    _pin = pin;
}
// Perform the onewire reset function.  We will wait up to 250uS for
// the bus to come high, if it doesn't then it is broken or shorted
// and we return a 0;
//
// Returns 1 if a device asserted a presence pulse, 0 otherwise.
//
uint8_t OneWire::reset(void)
{
    uint8_t r;
    uint8_t retries = 125;

    noInterrupts();
    pinModeFastInput();
    interrupts();
    // wait until the wire is high... just in case
    do {
        if (--retries == 0) return 0;

        delayMicroseconds(2);
    } while ( !digitalReadFast());

    noInterrupts();

    digitalWriteFastLow();
    pinModeFastOutput();   // drive output low

    interrupts();
    delayMicroseconds(480);

    //noInterrupts();               //JC take out interrupt disable
    ATOMIC_BLOCK(){                 //JC put in ATOMIC BLOCK

        pinModeFastInput();    // allow it to float

        delayMicroseconds(70);  

        r =! digitalReadFast();
    }                               //JC END Atomic Block
    //interrupts();                 //JC put in ATOMIC BLOCK

    delayMicroseconds(410);

    return r;
}

void OneWire::write_bit(uint8_t v)
{
    if (v & 1) {
        //noInterrupts();               //JC take out interrupt disable
        ATOMIC_BLOCK(){                 //JC put in ATOMIC BLOCK

            digitalWriteFastLow();
            pinModeFastOutput();   // drive output low

            delayMicroseconds(1);       //JC 10 -> 1

            pinModeFastInput();    // float high
        }                               //JC END Atomic Block
        //interrupts();                 //JC take out interrupt enable

        delayMicroseconds(59);          //JC 55 -> 59 (spec 60 = 1+59)
    } else {
        //noInterrupts();               //JC take out interrupt disable
        ATOMIC_BLOCK(){                 //JC put in ATOMIC BLOCK

            digitalWriteFastLow();
            pinModeFastOutput();   // drive output low

            delayMicroseconds(65);      //JC 65 -> 60 (spec 60)

            pinModeFastInput();    // float high
        }                               //JC END ATOMIC BLOCK
        //interrupts();                 //JC take out interrupt enable

        delayMicroseconds(5);           //JC Leave it...but note it requires RC time constant rise
    }
}

//
// Read a bit. Port and bit is used to cut lookup time and provide
// more certain timing.
//
uint8_t OneWire::read_bit(void)
{
    uint8_t r;

    //noInterrupts();               //JC take out interrupt disable
    ATOMIC_BLOCK(){                 //JC put in ATTOMIC BLOCK
        digitalWriteFastLow();
        pinModeFastOutput();

        delayMicroseconds(1);       //JC 3 -> 1

        pinModeFastInput();    // let pin float, pull up will raise

        delayMicroseconds(13);      //JC 10 -> 13 (spec 15 max, read at ~14 = 1 + 13)

        r = digitalReadFast();
    }                               //JC END Atomic Block
    //interrupts();                 //JC take out interrupt enable

    delayMicroseconds(46);          //JC 53 -> 46 (spec 60 = 1 + 13 + 46)

    return r;
}

//
// Write a byte. The writing code uses the active drivers to raise the
// pin high, if you need power after the write (e.g. DS18S20 in
// parasite power mode) then set 'power' to 1, otherwise the pin will
// go tri-state at the end of the write to avoid heating in a short or
// other mishap.
//
void OneWire::write(uint8_t v, uint8_t power /* = 0 */) 
{
    uint8_t bitMask;

    for (bitMask = 0x01; bitMask; bitMask <<= 1) {
        OneWire::write_bit( (bitMask & v)?1:0);
    }

    if ( power) {
        noInterrupts();

        digitalWriteFastHigh();
        pinModeFastOutput();        // Drive pin High when power is True

        interrupts();
    }
}

void OneWire::write_bytes(const uint8_t *buf, uint16_t count, bool power /* = 0 */) 
{
    for (uint16_t i = 0 ; i < count ; i++)
        write(buf[i]);

    if (power) {
        noInterrupts();

        digitalWriteFastHigh();
        pinModeFastOutput();        // Drive pin High when power is True

        interrupts();
    }
}

//
// Read a byte
//
uint8_t OneWire::read() 
{
    uint8_t bitMask;
    uint8_t r = 0;

    for (bitMask = 0x01; bitMask; bitMask <<= 1) {
        if ( OneWire::read_bit()) r |= bitMask;
    }

    return r;
}

void OneWire::read_bytes(uint8_t *buf, uint16_t count) 
{
    for (uint16_t i = 0 ; i < count ; i++)
        buf[i] = read();
}

//
// Do a ROM select
//
void OneWire::select(const uint8_t rom[8])
{
    uint8_t i;

    write(0x55);           // Choose ROM

    for (i = 0; i < 8; i++) write(rom[i]);
}

//
// Do a ROM skip
//
void OneWire::skip()
{
    write(0xCC);           // Skip ROM
}

void OneWire::depower()
{
    noInterrupts();

    pinModeFastInput();

    interrupts();
}

#if ONEWIRE_SEARCH

//
// You need to use this function to start a search again from the beginning.
// You do not need to do it for the first search, though you could.
//
void OneWire::reset_search()
{
    // reset the search state
    LastDiscrepancy = 0;
    LastDeviceFlag = FALSE;
    LastFamilyDiscrepancy = 0;

    for(int i = 7; ; i--) {
        ROM_NO[i] = 0;
        if ( i == 0) break;
    }
}

// Setup the search to find the device type 'family_code' on the next call
// to search(*newAddr) if it is present.
//
void OneWire::target_search(uint8_t family_code)
{
   // set the search state to find SearchFamily type devices

   ROM_NO[0] = family_code;

   for (uint8_t i = 1; i < 8; i++)
      ROM_NO[i] = 0;

   LastDiscrepancy = 64;
   LastFamilyDiscrepancy = 0;
   LastDeviceFlag = FALSE;
}

//
// Perform a search. If this function returns a '1' then it has
// enumerated the next device and you may retrieve the ROM from the
// OneWire::address variable. If there are no devices, no further
// devices, or something horrible happens in the middle of the
// enumeration then a 0 is returned.  If a new device is found then
// its address is copied to newAddr.  Use OneWire::reset_search() to
// start over.
//
// --- Replaced by the one from the Dallas Semiconductor web site ---
//--------------------------------------------------------------------------
// Perform the 1-Wire Search Algorithm on the 1-Wire bus using the existing
// search state.
// Return TRUE  : device found, ROM number in ROM_NO buffer
//        FALSE : device not found, end of search
//
uint8_t OneWire::search(uint8_t *newAddr)
{
    uint8_t id_bit_number;
    uint8_t last_zero, rom_byte_number, search_result;
    uint8_t id_bit, cmp_id_bit;

    unsigned char rom_byte_mask, search_direction;

    // initialize for search
    id_bit_number = 1;
    last_zero = 0;
    rom_byte_number = 0;
    rom_byte_mask = 1;
    search_result = 0;

    // if the last call was not the last one
    if (!LastDeviceFlag)
    {
        // 1-Wire reset
        if (!reset()){
            // reset the search
            LastDiscrepancy = 0;
            LastDeviceFlag = FALSE;
            LastFamilyDiscrepancy = 0;

            return FALSE;
        }

        // issue the search command
        write(0xF0);

        // loop to do the search
        do
        {
            // read a bit and its complement
            id_bit = read_bit();
            cmp_id_bit = read_bit();

            // check for no devices on 1-wire
            if ((id_bit == 1) && (cmp_id_bit == 1)){
                break;
            }
            else
            {
                // all devices coupled have 0 or 1
                if (id_bit != cmp_id_bit){
                    search_direction = id_bit;  // bit write value for search
                }
                else{
                    // if this discrepancy if before the Last Discrepancy
                    // on a previous next then pick the same as last time
                    if (id_bit_number < LastDiscrepancy)
                        search_direction = ((ROM_NO[rom_byte_number] & rom_byte_mask) > 0);
                    else
                        // if equal to last pick 1, if not then pick 0
                        search_direction = (id_bit_number == LastDiscrepancy);

                    // if 0 was picked then record its position in LastZero
                    if (search_direction == 0){
                        last_zero = id_bit_number;

                        // check for Last discrepancy in family
                        if (last_zero < 9)
                            LastFamilyDiscrepancy = last_zero;
                    }
                }

                // set or clear the bit in the ROM byte rom_byte_number
                // with mask rom_byte_mask
                if (search_direction == 1)
                  ROM_NO[rom_byte_number] |= rom_byte_mask;
                else
                  ROM_NO[rom_byte_number] &= ~rom_byte_mask;

                // serial number search direction write bit
                write_bit(search_direction);

                // increment the byte counter id_bit_number
                // and shift the mask rom_byte_mask
                id_bit_number++;
                rom_byte_mask <<= 1;

                // if the mask is 0 then go to new SerialNum byte rom_byte_number and reset mask
                if (rom_byte_mask == 0)
                {
                    rom_byte_number++;
                    rom_byte_mask = 1;
                }
            }
        }while(rom_byte_number < 8);  // loop until through all ROM bytes 0-7

        // if the search was successful then
        if (!(id_bit_number < 65))
        {
            // search successful so set LastDiscrepancy,LastDeviceFlag,search_result
            LastDiscrepancy = last_zero;

            // check for last device
            if (LastDiscrepancy == 0)
                LastDeviceFlag = TRUE;

            search_result = TRUE;
        }
    }

    // if no device found then reset counters so next 'search' will be like a first
    if (!search_result || !ROM_NO[0]){
        LastDiscrepancy = 0;
        LastDeviceFlag = FALSE;
        LastFamilyDiscrepancy = 0;
        search_result = FALSE;
    }

    for (int i = 0; i < 8; i++) newAddr[i] = ROM_NO[i];

    return search_result;
}

#endif

#if ONEWIRE_CRC
// The 1-Wire CRC scheme is described in Maxim Application Note 27:
// "Understanding and Using Cyclic Redundancy Checks with Maxim iButton Products"
//


//
// Compute a Dallas Semiconductor 8 bit CRC directly.
// this is much slower, but much smaller, than the lookup table.
//
uint8_t OneWire::crc8( uint8_t *addr, uint8_t len)
{
    uint8_t crc = 0;

    while (len--) {
        uint8_t inbyte = *addr++;
        for (uint8_t i = 8; i; i--) {
            uint8_t mix = (crc ^ inbyte) & 0x01;
            crc >>= 1;
            if (mix) crc ^= 0x8C;
                inbyte >>= 1;
        }
    }

    return crc;
}
#endif

#if ONEWIRE_CRC16
bool OneWire::check_crc16(const uint8_t* input, uint16_t len, const uint8_t* inverted_crc, uint16_t crc)
{
    crc = ~crc16(input, len, crc);

    return (crc & 0xFF) == inverted_crc[0] && (crc >> 8) == inverted_crc[1];
}

uint16_t OneWire::crc16(const uint8_t* input, uint16_t len, uint16_t crc)
{
    static const uint8_t oddparity[16] =
        { 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0 };

    for (uint16_t i = 0 ; i < len ; i++) {
        // Even though we're just copying a byte from the input,
        // we'll be doing 16-bit computation with it.
        uint16_t cdata = input[i];
        cdata = (cdata ^ crc) & 0xff;
        crc >>= 8;

        if (oddparity[cdata & 0x0F] ^ oddparity[cdata >> 4])
            crc ^= 0xC001;

        cdata <<= 6;
        crc ^= cdata;
        cdata <<= 1;
        crc ^= cdata;
    }

    return crc;
}
#endif

2 Likes

@Jonpcar Thank you! I copied your code directly over OneWire.cpp in my own project, and it compiled perfectly the first time. I notice that readings take slightly longer than before, but it is definitely more accurate. I’ve only been running with it for a few minutes, but so far no retries with readings every 20 seconds. I was getting between 1 and 20 retries previously.

Glad you could verify it! Hope it continues to work. The code will change, definitely as the interrupt enabling/disabling is redundant to the ATOMIC BLOCKS. I did test the fixes without threading and it seems to work just as well. I wasn’t sure if the ABs had any impact on non-threading, but they seem to.

I wouldn’t expect the code to slow down noticeably but possibly if you have some timing mechanism that reads millis(). I made those timing changes rather quickly and some of my reasoning may be off and would definitely be up for discussion, but in the scheme of the protocol,they don’t have much impact.

I copied this over from another thread…so discussion about this issue doesn’t get too spread out.

remcohn

Hi,
I am not using the DS18B20 part (yet?) either.
I have copied your modified version of oneWire.cpp/.h with the atomic blocks, and im still getting the same issue.

to narrow down the issue, i switched to a single device and using ‘ReadRom’. and i noticed its making single-bit errors. its always reading a (different) bit as 1 while it should be a zero.

Reducing the delayMicroseconds in the read_bit() funciton from 12 (from your example) to 1 seems to reduce the number of errors.
I also noticed that in your example, you use both noInterrupts() followed by an ATOMIC_BLOCK(). If i read the manual correctly, this should not be needed?

I will try it with an Argon later today. may be the mesh connection is messing with the timing even inside an ATOMIC_BLOCK()

Reply

Jonpcar

1m

I played with the numbers as you have too (going to extremes in some cases, like you have to test them). That particular spec, Read Data Valid, is one I also had down at ‘1’ for a while (zero didn’t work, haha). And yes it improves things (because it reduces the chance of some type of interrupt coming in and messing up your data bit read)…but keeping it at ‘1’ will really ruin any chance of extending the OneWire very far (distance) or putting more devices on the same line (capacitances/resistances). In fact, I think that particular spec (Read Data Valid) is the ONE timing that is key for the entire bus. It sets both the “drive 0 time” (falling edges) and the “RC timeconstant pullup time” (rising edges). I am making up terms because I don’t know the real ones.

Once I realized it was some type of interrupt still getting into the middle of those data bit reads/writes, I adjusted the timings back to “spec” numbers.

And, yes. I realize that ATOMIC BLOCKs and interrupts are redundant for the Photon…but I don’t know about all the other devices that use this library and didn’t want to mess those up, so I left them in there for discussion.

But…since I want the bus to be fixed, can you ask any further questions on the other thread (THIS THREAD…I MOVED IT)?..I want to get more eyes on the Library thread so that it gets fixed for everybody.

I suspect there might be a specific reason why the ATOMIC BLOCK “fix” may not have worked for your XENON.

i set the delay back to 12uS, and made a pin high just before we pull the dataline low, and set the pin high again after the bit is read. usually this is 27uS. bit long for my taste, for a piece of code that has 3+12uS delay and sets a few pins, but ok. it is still 7uS before the device lets go of the bus again when its sending a 0.

But sometimes that 27uS gets stretched. i have seen 43 - 78uS, way longer then allowed, and past the time the device keeps driving the bus.

Clearly something is messing with the timing, even inside a ATOMIC_BLOCK().

I can post in your threat, but i dont think its the same issue. or at least the lack of ATOMIC_BLOCK() in the library isnt all thats needed to solve my issue on an Xenon.

My next steps: test on an Argon and see if i can dig out a free Photon somewhere

Thanks remcohn…it’s possible the ATOMIC_BLOCK may not work for the XENON.

But I think we both agree that some type of interrupt is getting into the middle of those bit transmissions/receptions. In my case with the Photon, the ATOMIC BLOCK fixed that. I don’t know what the fix is for the XENON, but I believe it is the same underlying issue as your timings seem to show.

But further changes of the timings will not fix the interrupt issue; it may improve it but it can’t fix it. I think we are on the right track.

I posted in your threat too just now, but to complete the info here too: its mesh related. on an Argon with mesh disabled (this is important!), and your fixes, the search is 100% reliable.

I have worked with many nRF project ever since they where released, and timing has always been fucked up on them. its very hard to work around the sucky limitations of the blob they provide. This is definitely something for the Particle developers to comment on. Afaik, the nordic blob has an interface where you can ask how long you got till an interrupt happens. may be that can be made available to us so we can stop sending bits and wait for the blob to finish its job.

YEAH…now we are getting somewhere. Thanks remcohn! Sorry, I only have two Photons, can’t help with anything else.

Not the first time this solution has been brought up. @keithrussell came so close......

Back in the day (haha), we used to have a DI command that would Disable (ALL) Interrupts...except for the NMI (Non-Maskable Interrupt).

The OneWire bit transmission requires about 60us, is it really hard to lock down code for that length of time?

By the way, remcohn. I missed (didn't understand) this comment in your post until I reread it. My attempt at adjusting this timing was to get the Photon to sample right at the end of window when the data was guaranteed to be valid (Read Data Valid 15us). That maximizes the response time for devices on the bus.

But you have the logic analyzer to show that my attempt does not meet the intent, and actually violates the spec. There may be OneWire devices that don't drive the data back that long (27us). You are right to be uncomfortable with that ...you obviously know this "timing business" much better than me, what do you think?

To get right near the spec (15us), it seems that the numbers should be the (3+1us) that we both tried and agree, gave the best results. But that assumes that all devices that use the library have about 10-13us of built-in overhead for those few lines that toggle the pins. But real life is different than specs...if real devices drive longer than that Read Data Valid 15us window, then it might be best to take advantage of it.

For those trying to follow this conversation: the current delay() numbers in the OneWire.h for Read Data Valid are (3us+10us) values. 3us to drive the bus to a 0, 10us to read back the response (bus will stay low if data being read is a '0 and will be pulled high by the resistor if a '1 is to be read). I changed that to (3us + 12us = 15us spec)...but in real life it may be too much based on the remcohn's picture which reads on his xenon as an actual delay of 27us. The current library value of (3us+10us) is about 10us over the spec...but probably ok in real life. The "longer" this value, the more exposure to an interrupt on data reads and corruption of the bit.

Of course...changing these timings do not solve the underlying problem of the "interrupt interference" that will occur in any window no matter how small/large unless they can be blocked.

my timing of 27us isnt 100% ‘fair’ either. it doesnt pinpoint exactly where the sample is beeing taken, since i dont know how much delay there is after the read untill the pin is set low again.

The falling edge of the bus to the rising edge of the ‘indicator’ pin is 24uS, and the read happens a few uS before that.

But more important: my timings are for a Xenon, Argon, and most likely Boron too. In the OneWire.h, there are 3 differend ways defined to set the pins, for the Core, Photon/Electron and ‘others’, where i assume ‘others’ are Xenon,Argon,Boron, but the ‘else’ does make me nervous as well. i would rather see a clearly defined method for 3rd gen devices.
The defines for the 1st and 2nd gen devices write almost directly to the hardware, while the ‘else’ for 3rd gen, uses the Hardware Abstraction Layer (HAL), so it will be (much) slower, see this comment:

      // This could probably be speed up by digging a little deeper past
      // the HAL_Pin_Mode function.

For 1st/2nd gen, your 12uS delay would be closer to perfect then 1uS. but just use what works for you :slight_smile: let it read as fast as it can for a few hours try to fine the upper and lower bound where it still works, then pick a number in the middle.

But like you said, none of this is related to ATOMIC_BLOCK() not working as specified when the new Mesh is involved.

haha...mine works PERFECTLY now, ZERO errors. I'm in this more because I feel sorry for this little bus. There are SO many failure attempts at using it in postings on this site...When I read about this bus, I thought it was the coolest concept...(I used to design microcontrollers at Intel). I would have NEVER thought a one wire bus was viable. But truthfully, almost no-one on this particle site has had real success with the intent of this bus. I have seen no examples of people successfully hooking up 4+ devices to a single wire...I might have missed them.

Buses are designed to be robust...and I think this bus is...even with its software implementation. LOTS of devices can be attached to this bus, and are, with other microcontrollers. But like you, I almost gave up on it because the protocol to scan addresses cannot successfully make it through more than 3 or so devices. There are too many bus operations that need to take place for that function to be successful when 4+ devices are on the bus...probably all due to this "interrupt issue"

Adding to my previous comment on this new code, I still see exactly 1 attempt on both of the DS18B20 sensors I currently have attached. One of the sensors is a known to be authentic DS18B20+ I sourced from Mouser. Another is an “assumed to be authentic” waterproof temperature probe from Adafruit. I just swapped the latter for a cheap Aliexpress waterproof temperature probe of questionable authenticity. It also reads accurately in one attempt every time.

I’m using Boron LTE, and mesh networking is disabled. I do not currently use BLE or NFC.

At this point, even though it’s still early, maybe you should consider forking this library and publishing it.

On a Xenon I swapped in your OneWire in my test code which is a modified version of the DS18b20 libraries single drop example. It did help a bit, but not as much as you’re seeing. A 30min test (~1800 reads) before and after the change showed roughly a 21% vs 16% error rate; I repeated it a few times. I define as error as failing the CRC check or returning exactly 0C which I’ve seen pass the CRC in the past. This is an external/waterproof version which I’ve found can have about +10% higher error rate than the raw chip, based on the 20 or so I’ve tested.

Hey Picsil, truthfully I have a github account but I really have no clue how to use it for real purposes yet. I think we need to hear from the particle.io developers…with this being posted late on a Friday afternoon, I suspect they haven’t had time to bring any kind of focus to this. Those guys work their rears off, answering questions/issues at all hours of the workweek…they deserve weekends. Hopefully, they’ll have some answers for the mesh device issues AND I think there needs to be a bit more thought about the timings… remchon’s logic analyzer pointed out a clear shortcoming in my by-the-book methodology.

@picsil, don’t know what your setup is like, and I know nothing about mesh…but is it possible to simply enable mesh on your Boron and change nothing else to see if the failures “come back”, like can be done for system thread enabling. I realize it may not be that simple, you might actually have to USE mesh to see the failures.

Fragma, thanks for giving it a shot…your reporting is similar to renchon’s, the fix doesn’t seem to work with xenon (mesh or not). Something is different with that device. Interrupts must not get totally disabled with the atomic blocks.

The timing changes I made to the file are probably the main reason you are seeing the small improvement. Not because of the Read Valid timing discussed earlier in this thread…But I suspect because of the write 1 timing change. I think that library number was out of spec. I decreased it from 10 to 3 and saw improvement in my pre-atomic block numbers… but the spec actually says 1, which I was a bit nervous about committing to. Not sure whether the reduced timing actually makes it work better for bus timing issues, or because there is simply a smaller window for the interrupts to raise havoc. I suspect the latter.

I’m using cheap 5 for $16 sensors…I ran in the 9-bit mode all last night, six samples per second on all five sensors…well over a million transactions on the bus without a single crc error.

Try to hook up a USB cable for debugging and make some timer that toggles the mesh on an off in a 60 second interval and see what happens Mesh.off() / Mesh.on(). my guess would be that you see all errors disappear when the mesh is off.

remchon, are you suggesting that @Fragma try that experiment with a xenon ( both you and Fragma have reported that the xenon doesn’t get “fixed” with either mesh/no mesh)?

Or did you mean to suggest @picsil try that technique with his Boron which he says is “fixed”, but with NO mesh.

Or the other possibility: I am wondering if you are finding that the xenon IS working (some of the time) by toggling the mesh on/off?

I have not tried the Xenon with Mesh off, but i would suspect it would work. but since the Xenon doesnt have wifi, you would have to toggle the mesh on and off, or flash over USB.