Intermittent OneWire CRC Fail

I am using the official OneWire library to talk to a DS18B20 temperature sensor with a Core. The Core is fully updated to the most recent stable firmware, including the CC3000.

The problem is that the Core occasionally has a CRC error when talking to the DS18B20. I am reading the sensor every 15 seconds, and I get errors every few minutes. Sometimes it goes for an hour or two with no errors, then starts again. I have not noticed any time of day or any other correlations.

I am directly powering the DS18B20 from the 3.3V and Ground. I am using a 4.7K pull-up resistor.

The kicker is that I am running identical code on an ESP8266-12E with no errors.

Here is the code that is detecting and returning the error:

// *** Read the scratchpad and extract the temperature
// *** Sets *temp to the temperature (degrees C) in the scratchpad
// *** Returns true if a device answered and the CRC passed, else false
bool RetrieveTemperature(float *temp) {

  byte data[9];

  // Reset the bus; reset() returns 0 when no device answers with a presence pulse
  if (!ds.reset()) {
    return false;
  }
  ds.write(0xCC, 0);        // Skip ROM: no addressing needed because we only have 1 device
  ds.write(0xBE, 0);        // Read Scratchpad

  // Read the values: 8 data bytes plus the CRC byte
  for (byte i = 0; i < 9; i++) {
    data[i] = ds.read();
  }

  // If the CRC fails, return false
  if (data[8] != OneWire::crc8(data, 8)) {
    return false;
  }

  // Convert the data to actual temperature.
  // Because the result is a 16-bit signed integer, it should
  // be stored in an "int16_t", which is always 16 bits
  // even when compiled on a 32-bit processor.
  int16_t raw = (data[1] << 8) | data[0];

  // One LSB is 0.0625 degrees C at the default 12-bit resolution
  *temp = (float) raw * 0.0625f;

  return true;

}

Is it possible that some background communications process is interrupting the timing of the OneWire protocol and causing intermittent read fails? Is there a set of commands I can wrap around the reads to protect them?

Good question; hopefully someone more knowledgeable will answer... I would assume that nothing interrupts loop() while it runs, so that it works just like an Arduino or other microcontrollers.
It could also be external interference.

Do you have pull-up resistors ([edit] I see that you do)... How long are your wires? Sometimes you have to account for wire resistance when determining the value of the pull-up.

@keithrussell, you could call noInterrupts(), do the read, and then re-enable interrupts with interrupts().

Actually, not so. Interrupts still occur unless explicitly disabled (atomic). Also, loop() gets a 1ms time slice from FreeRTOS and will get pre-empted when it takes more than 1ms, again unless single threading is explicitly enabled. Arduino leads a simple life and doesn't have to contend with a complex multi-tasking environment. I believe the OneWire library DOES disable interrupts during time critical code portions.

If you search this forum, there are plenty of topics regarding the DS18B20. Getting CRC errors can be caused by a lot of things including code, the firmware environment, wiring, electrical environment, etc. I would consider these errors part of using these sensors and write your code to include exception handling including CRC errors, wild temperatures, etc. :wink:
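
As a rough illustration of the kind of exception handling meant here, the sketch below retries a failed read a few times and rejects values outside the DS18B20's rated range of -55 °C to +125 °C. The names `Reading`, `isPlausible` and `readWithRetry` are made up for this example, and the retry count of 3 is an arbitrary assumption; `readOnce` can be any function shaped like `RetrieveTemperature(float *temp)` from the first post.

```cpp
#include <cmath>

// Hypothetical result type for one filtered sensor reading.
struct Reading {
  bool ok;        // true if a CRC-clean, plausible value was obtained
  float celsius;  // only meaningful when ok is true
  int attempts;   // how many read attempts were made
};

// The DS18B20 is rated for -55 C to +125 C; anything else is treated as wild.
inline bool isPlausible(float celsius) {
  return std::isfinite(celsius) && celsius >= -55.0f && celsius <= 125.0f;
}

// readOnce must return true when the read (including CRC) succeeded and
// store the temperature through its pointer argument.
template <typename ReadFn>
Reading readWithRetry(ReadFn readOnce, int maxAttempts = 3) {
  Reading result{false, 0.0f, 0};
  for (int i = 0; i < maxAttempts; ++i) {
    result.attempts = i + 1;
    float celsius = 0.0f;
    if (readOnce(&celsius) && isPlausible(celsius)) {
      result.ok = true;
      result.celsius = celsius;
      break;
    }
  }
  return result;
}
```

In a sketch this would be called as `Reading r = readWithRetry(RetrieveTemperature);`, with the reading skipped or logged when `r.ok` is false.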

Good point. I had a 30-foot wire in the prototyping stage, but the installation has less than 24 inches of wire. There is no discernible difference between the performance at 18 inches vs. 30 feet. Also, I have swapped out probes. No difference.

[quote="peekay123, post:5, topic:28795, full:true"]
Actually, not so. Interrupts still occur unless explicitly disabled (atomic). Also, loop() gets a 1ms time slice from FreeRTOS and will get pre-empted when it takes more than 1ms, again unless single threading is explicitly enabled. [/quote]

So I suspected. Thanks for the confirmation.

I'll look into it.

Agreed. Error checking should always be a part of any communications protocol. However, OneWire is a current mode system, and is inherently very stable and immune to wiring differences and electrical noise. The results I am seeing are far outside the bit-error-rate we would expect from this system. Reading every 15 seconds, I should see an error once every few years, not a few times an hour.

I'm thinking about following up on two possibilities: 1) the OneWire communication routine is being interrupted at some critical time, or 2) the timing parts of the routine have a boundary condition where a read or write is too close to an edge to be reliable.

There is a distant third possibility that the dV/dt of either the transmit or receive signal is too low at the I/O pin switch point to provide a consistent edge. This is highly unlikely, because then changes in wire length and pull-up resistor would change the system's behavior, and they don't seem to.

@keithrussell, exception handling should consider all possible failures, including FreeRTOS/Interrupts causing the intermittent CRC failure. The CRC failure is actually great since without it, your data would be crap (which exception handling should have caught!). The fact that 18 inches vs 30 feet shows no discernible difference indicates that the bit-banged OneWire library is susceptible to occasional failures. You could use a Maxim DS2484 I2C-to-1Wire interface to handle the OneWire protocol, especially if you have multiple sensors.
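
For anyone who wants to check captured scratchpad bytes off the device (for example from a logic analyzer trace), here is a minimal sketch of the Dallas/Maxim CRC-8 (polynomial x^8 + x^5 + x^4 + 1, processed LSB first) together with the temperature decode. The function names `dallasCrc8` and `decodeScratchpad` are made up for this example; this is not the OneWire library's source, although it should agree with `OneWire::crc8`. The decode assumes the default 12-bit resolution.

```cpp
#include <cstdint>
#include <cstddef>

// Bitwise Dallas/Maxim CRC-8: reflected polynomial 0x8C, initial value 0.
inline uint8_t dallasCrc8(const uint8_t* data, size_t len) {
  uint8_t crc = 0;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; apply the polynomial when the bit shifted out was 1.
      crc = (crc & 0x01) ? static_cast<uint8_t>((crc >> 1) ^ 0x8C)
                         : static_cast<uint8_t>(crc >> 1);
    }
  }
  return crc;
}

// Validates a 9-byte scratchpad (8 data bytes + CRC) and decodes the
// temperature. Returns false and leaves *celsius untouched on a CRC mismatch.
inline bool decodeScratchpad(const uint8_t* scratchpad, float* celsius) {
  if (dallasCrc8(scratchpad, 8) != scratchpad[8]) {
    return false;
  }
  // Bytes 0 and 1 hold a signed 16-bit value, LSB first.
  const uint16_t bits =
      static_cast<uint16_t>((scratchpad[1] << 8) | scratchpad[0]);
  const int16_t raw = static_cast<int16_t>(bits);
  *celsius = static_cast<float>(raw) * 0.0625f;  // 1/16 degree C per LSB
  return true;
}
```

Comparing which bits differ between a failed read and the neighbouring good reads can hint at whether the corruption is a single sampled-too-late bit or a longer disturbance.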

Agreed. Good idea. That would take all the timing problems, interrupt issues, and power and control questions completely off the table. That's probably why the DS2484 exists. :smile:

OK, I looked at the library's read and write routines. They are generously sprinkled with noInterrupts() statements, but there is nothing preventing the RTOS from preempting any of the routines. So, I am going to wrap my calls to the library in SINGLE_THREADED_BLOCK() statements. ATOMIC_BLOCK() would be overkill, since all it adds is interrupt suspension, and it would likely keep interrupts turned off too long.
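
Roughly, the wrapping looks like the sketch below. `readScratchpadGuarded` is a name made up for this example, and the function is templated on the bus type only so it can be compiled and exercised off-device with a stand-in bus; on the device `Bus` would be `OneWire`. The empty fallback macro is likewise an assumption for off-device builds and provides no protection there.

```cpp
#include <cstdint>

// On a Particle build SINGLE_THREADED_BLOCK() comes from Device OS
// (Particle.h must already be included at this point). The empty fallback
// below exists only so this sketch compiles off-device.
#ifndef SINGLE_THREADED_BLOCK
#define SINGLE_THREADED_BLOCK()
#endif

// Bus is any type with reset(), write(uint8_t) and read(), such as OneWire.
// Reads the 9-byte scratchpad with thread switching suspended for the whole
// transaction. Returns false if no device answered the reset.
template <typename Bus>
bool readScratchpadGuarded(Bus& bus, uint8_t (&data)[9]) {
  bool present = false;
  SINGLE_THREADED_BLOCK() {
    present = bus.reset() != 0;  // 0 means no presence pulse
    if (present) {
      bus.write(0xCC);           // Skip ROM (single device on the bus)
      bus.write(0xBE);           // Read Scratchpad
      for (uint8_t i = 0; i < 9; ++i) {
        data[i] = bus.read();
      }
    }
  }
  return present;
}
```

The CRC check stays outside the guarded block, since it does not touch the bus and there is no reason to hold off other threads while it runs.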

I shall report back.

I have wrapped the calls to the OneWire library with SINGLE_THREADED_BLOCK() statements, and the errors persist.

My conclusion is that the OneWire library itself has a boundary condition or other timing fault. My next step is to check the communication lines with a logic analyzer, find the timing error, and repair the library.