millis() and micros() lose time when interrupts are disabled

I’m trying to drive some LED strips at a fixed framerate of 60fps using the ported Adafruit Neopixel library, but recently I noticed that they were actually refreshing at ~48fps. After some debugging, I traced the loss of time to the Adafruit_NeoPixel::show() function. Within this function, there is a period of time in which interrupts are disabled (__disable_irq()) and asm no-op instructions are issued in order to accurately time the output of pixel data to the LEDs.

My first thought was that disabling interrupts prevented the SysTick interrupt from firing, which would explain why my interval-based code runs at a lower frequency than it should. To test this, I commented out the __disable_irq()/__enable_irq() calls, but to my surprise there was still a timing drift proportional to the number of LEDs I instantiated the NeoPixel object with.

The only remaining explanation I can think of is that the asm nop instructions are somehow uninterruptible by the SysTick IRQ, causing millis()/micros() to report incorrect times. Does anyone know what else might be causing this?

The code I’ve been using to experiment is pretty simple:

#include "application.h"
#include "neopixel.h"

SYSTEM_MODE(AUTOMATIC);

#define USE_MICROS 0
#define NUM_PIXELS 10

uint32_t prev_clk = 0;
Adafruit_NeoPixel strip(NUM_PIXELS, D6, WS2812B);

void setup() {
  pinMode(D4, OUTPUT);
  Serial.begin(9600);
}

void loop() {
#if USE_MICROS
  if ((micros() - prev_clk) >= 16000) {
#else
  if ((millis() - prev_clk) >= 16) {
#endif
    digitalWrite(D4, HIGH);
    strip.show();
    digitalWrite(D4, LOW);
#if USE_MICROS
    prev_clk = micros();
#else
    prev_clk = millis();
#endif
  }
}

I’m using a logic analyzer to capture D4 and measuring fps as 1000/(time in ms between rising edges). show() should run every 16ms, but for me it runs about every 19ms.

Thanks!

Sorry, there was a bug in my last post. It turns out that commenting out the interrupt enable/disable really does solve the issue, so the problem is definitely that the SysTick interrupt is prevented from ticking the system time counters.

The corrected code is as follows:

#include "application.h"
#include "neopixel.h"

SYSTEM_MODE(AUTOMATIC);

#define USE_MICROS 0
#define NUM_PIXELS 120

uint32_t prev_clk = 0;
Adafruit_NeoPixel strip(NUM_PIXELS, D6, WS2812B2);

void setup() {
  pinMode(D4, OUTPUT);
  Serial.begin(9600);
}

void loop() {
#if USE_MICROS
  if ((micros() - prev_clk) >= 16000) {
    prev_clk = micros();
#else
  if ((millis() - prev_clk) >= 16) {
    prev_clk = millis();
#endif
    digitalWrite(D4, HIGH);
    strip.show();
    digitalWrite(D4, LOW);
  }
}

I don’t see any interrupt enable/disable code in the snippets above. Either way, ideally you shouldn’t block interrupts for more than a handful of microseconds, to keep the system running smoothly.

millis() requires interrupts to tick. micros() will continue ticking with interrupts disabled if you are on firmware 0.4.4 or later.

I completely agree with your point about interrupt policy, but those calls were in the Adafruit NeoPixel library to guarantee proper timing of the one-wire data protocol. I guess the answer is to use micros() for accurate timekeeping, but if anything I saw worse timing results with micros() than with millis() (even with fw v0.4.4). I’ll try again and report back here.

Based on this thread and this implementation, I don’t think micros() is any less prone to the missed-ticks issue except at the sub-millisecond level. Since I’m trying to time 1000/60 = 16.666…ms, I’m seeing the same problem as when I use millis(). Granted, it’s not a bug or anything - I just wanted to confirm that this is behavior I should be seeing. Since the timing of the one-wire protocol is deterministic, I can easily work around the lost time.

Thanks!

As you can see from the implementation linked, micros() takes the last millisecond count, the cycle-counter value saved at that tick, and the current cycle counter, and uses the difference between the cycle counters to determine the current time, so it should continue operating even when interrupts are disabled. It stays accurate for around a minute, until the cycle counter overflows.


I think we are agreeing. The slowdown you’re seeing is because you set prev_clk after the show function rather than before, so your timing also includes the time it takes to set up the LEDs, which is significant.

If you set prev_clk immediately after reading millis()/micros(), then you’ll find the timing is much more accurate.

EDIT: Oh I see you did that in the 2nd post. And this still doesn’t work when using micros()?

Yes, sorry for the confusion, that was the bug I introduced when paring down my code to a simple example, but one I fixed in the second post. The net result is still the same.

Here’s a bit of code to illustrate my point. It posts the values of both functions after 5ms delays, with and without interrupts enabled:

What prints out is the following (micros, millis) values:

1, 0
5618, 6  // Time lost between here
6976, 8  // and here with both fns.
13464, 14

When interrupts are disabled for that second delay, both functions are inaccurate. Apologies for harping on this - just want to get the satisfaction of hearing someone else say it is a problem with the library.

EDIT: Here’s what you see without interrupts disabled at all:

0, 0
5617, 6
11537, 12
17659, 18

No probs, I’d like to get to the bottom of this too since it’s not in line with my understanding of how the system should function.

Which platform (Photon/Core) and firmware version are you using?

Thanks, I appreciate it! I’m using a Photon running v0.4.5.

This behavior actually makes sense to me given the dependence on the millisecond counter. If the millisecond counter is inaccurate, then the microsecond counter is only (relatively) accurate until the millisecond counter next ticks or the cycle counter overflows. Here’s another gist showing how the micros counter is basically accurate (thanks to counting cycles, like you said) until you re-enable interrupts, at which point it “jumps” as it falls back to the millisecond counter:

The output is:

0, 0
5617, 0  // micros() is good since it's using cycles
978, 1   // micros() is ??? since it falls back to millisecond counter

I’m kind of surprised the system doesn’t catch up once interrupts are re-enabled. But then they aren’t queued, just thrown away, so we lose those ticks forever.

We could have millis() look at the micros() value and extrapolate elapsed time from that. But I’m not sure it’s worth putting time into, since I think disabling interrupts is a bad idea, and surely there is a better alternative waiting to be found?

That’s an interesting idea but you’re right, it’s probably not worth your time to fix, especially with the existence of a quick workaround. Under the not-too-horrible assumption that I’m losing a fixed amount of time every data output, it’s easy enough to add that time back in whenever I check the clock. That said, I’m not so sure if this will still be a fix (or even an issue!) once IntervalTimer is working with the Photon.

There are valid reasons to disable interrupts for measurable (but, in the big scheme of things, not significant) periods, so this should be made to work.


So you want the timer to keep ticking when the source of ticks is disabled. Sure, that’s easy.

What are the legit reasons for disabling interrupts? The only case I can think of is timing-sensitive code, and the solution there is to put that code itself in a high-priority interrupt so that it executes without interruption while still letting the other system interrupts be queued rather than dropped.

There’s all kinds of reasons, and all kinds of example code out there.

Mainline code that just does timing via nops is pretty common in Arduino sketches; they just want to make sure they delay for n nops, no longer. Mutexing against ISRs is another.

Reiterating:

Interrupts should never be disabled for more than about 5 or 10 microseconds.
Users’ ISRs need to run fast (say, a few microseconds, or maybe 50 at most), then exit.

If a SysTick interrupt request occurs while interrupts are disabled for a few microseconds - that’s fine; the interrupt will be taken when they are re-enabled. But if interrupts stay disabled for a full SysTick interval (e.g., 1 mSec), that interval is lost. There’s no way to stack up interrupt requests, i.e., 2+ timer interrupts.


Agreed. Lost interrupts are bad, but masking interrupts for brief, critical sections is legitimate.

Yes, and a brief disable of ALL interrupts probably won’t cause a lost 1 mSec SysTick interrupt, so the values returned by millis() and micros() will be unaffected. Again, best practice is to disable for no more than about 10 uSec, usually far less.

micros() is (supposed to be) derived from the 1 mSec SysTick timer count by a method like interpolation.