Sounds good if it can be done in very few cycles. How would you do it?
something like
millis()*1000 + (DWT->CYCCNT % (SYSTEM_US_TICKS*1000)) / SYSTEM_US_TICKS
Although since 2^32 is not evenly divisible by (SYSTEM_US_TICKS*1000), this won't be perfect in all cases.
An improvement is to save the value of DWT->CYCNT at the start of each millisecond in the SysTick() handler.
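A minimal sketch of that idea (hypothetical code, not the actual firmware; the cycle counter is passed in as a parameter so it compiles off-device, and `SYSTEM_US_TICKS` is assumed to be 72 for the Core):

```c
#include <stdint.h>

/* Hypothetical sketch: the SysTick handler snapshots DWT->CYCCNT at the
 * start of each millisecond, and micros() combines the millisecond count
 * with the cycles elapsed since that snapshot. */
#define SYSTEM_US_TICKS 72u             /* cycles per microsecond, 72 MHz Core */

static volatile uint32_t ms_count;      /* incremented every SysTick */
static volatile uint32_t ms_cycle_base; /* CYCCNT captured at that tick */

void systick_sketch(uint32_t cyccnt_now) {
    ms_count++;
    ms_cycle_base = cyccnt_now;
}

uint32_t micros_sketch(uint32_t cyccnt_now) {
    /* Unsigned subtraction is safe across a CYCCNT wrap because the
     * counter spans the full 32 bits. */
    uint32_t elapsed_cycles = cyccnt_now - ms_cycle_base;
    return ms_count * 1000u + elapsed_cycles / SYSTEM_US_TICKS;
}
```

Note that the subtraction stays correct even when CYCCNT wraps between the tick and the read, precisely because CYCCNT uses all 32 bits.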
This is rapidly spinning out into something much more complex, and probably more than we need for the original use case!
Did this get fixed in 0.4.3 to use the entire 32 bits before rolling over?
If not, the documentation still says:
Returns the number of microseconds since the device began running the current program. This number will overflow (go back to zero), after approximately 59.65 seconds.
Instead of "approximately 59.65 seconds" I suggest you put "exactly 59652323 microseconds (0 .. 59652322)". You need that number to do the math.
Hi @BDub,
I just re-read this post more carefully, and see that the DWT frequency for the Photon is different than for the core. Does that mean that the micros() rollover point is different too?
This is a real problem. It can be solved if you know exactly where the rollover point is by doing the right arithmetic. But now it looks like there may be different rollover points for micros() for the Photon and Core, and code needs to account for that.
Your previous post, where you suggest:
I'm afraid that suggestion does not work. If I need to time an event that takes less than a millisecond, I can't use the millis() counter. But remember, that event might occur right near and span the 59.652323 second boundary (for the Core), so I need to take into account that exact rollover point to do the subtraction right to get the event's duration.
Also, looking at the code of micros() there is a divide in it. This is a bit unfortunate for those who would hope that such a call would be as efficient as possible.
And looking at the code for delayMicroseconds: it caps the maximum delay at UINT_MAX / SYSTEM_US_TICKS, which is probably OK, since who would use delayMicroseconds for such a long delay? But that should be documented. And I still think that number is different for the Photon, since that function uses SYSTEM_US_TICKS, which differs between the two.
And yet another thing ... neither 72 nor 120 divides evenly into 2^32. This means that, due to the division's truncation, the last microsecond of timing will be different as it rolls over. It will be shorter. This might not seem like a big deal, but imagine a sensor that measures something by emitting pulses every few milliseconds, each 5 to 25 microseconds in duration. Every so often, sort of randomly, the measurement might be 10% off or so. A user not knowing about this anomaly would be unable to explain it and would probably suspect something was wrong with the sensor.
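The arithmetic behind that anomaly can be sketched as follows (72 ticks/us assumed for the Core, 120 for the Photon):

```c
#include <stdint.h>

/* Illustrative arithmetic only: 2^32 cycles divided by the cycles-per-
 * microsecond rate gives the early rollover point, and the remainder is
 * the number of cycles truncated away, i.e. why the last "microsecond"
 * before the wrap is shorter than a real one. */
uint64_t rollover_point_us(uint32_t ticks_per_us) {
    return 4294967296ull / ticks_per_us;   /* 2^32 / ticks per us */
}

uint32_t truncated_cycles(uint32_t ticks_per_us) {
    return (uint32_t)(4294967296ull % ticks_per_us);
}
```

This gives 59652323 us for the Core (with 40 cycles left over) and 35791394 us for the Photon (with 16 cycles left over); those leftover cycles are what make the final microsecond short.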
I think this problem is a big deal. This is why I've posted my second post to insist that this either be fixed, or that the exact rollover points for the Core and Photon be published. Or, your very good suggestion of a getDWT() function could be implemented and recommended in the documentation for doing precise timing.
@rvnash, I will leave the details of the micros() code to the pros, but one goal of the new SparkIntervalTimer library will be to implement new timer functions including input capture. This mode allows precise timing of an input pulse using a hardware timer. It is too early yet to say how and when everything will be implemented, but it is absolutely on the hit list.
@peekay123 I am very interested in that. For an RC project I did last year, I implemented such a thing on a Teensy3, but it had narrow scope to do servo-in functionality, so I could read a 6-channel remote control and have my thingy do things in response. It worked well. I'm assuming that what you have in mind is a more generalized input capture that could be used for this. Let me know if you want help with ideas, coding, or testing.
@rvnash, once the official Photon firmware is released to the compile farm (IDE, CLI, DEV) and auto-update is deployed then I will be releasing a quick port of the existing SparkIntervalTimer for folks to port their existing libraries and code. Having a stable firmware is key. After that, I will be working on a new version of the library with all the features discussed. And yes, like the existing library, it will be generalized.
The goal is to a) Exploit all hardware timers on the STM32F205 even if they have no associated GPIO pins (perfect for timer interrupts only), b) Implement new timer functions like Input Capture and Output Capture (Tone does that). There is possibly more but we have to start somewhere!
Yay! I'm glad to see you digging into this @rvnash
If you read again, I am saying that if you want precise timing for something that's almost a minute long (past the >59.652323 s micros() rollover), you should use the millis() counter. If you need a dynamic range from <1 ms to >59652 ms, you could check the micros() counter first, and if it gets above (10246464) us aka 0b10000000000000000000000 (a quick bit-mask check), switch to testing against the millis() counter. If your very short event is happening asynchronously, then you need to use interrupts. Read on:
Agreed this makes the result not super accurate, but you can roll your own precision counter pretty easily using interrupts and the DWT counter (see Stopwatch for Pinewood Derby - #17 by BDub).
SYSTEM_US_TICKS is the number of ticks per microsecond, calculated as SystemCoreClock / 1000000. So it is calculated based on each system's core clock speed (72MHz vs 120MHz). You're right we should document this rollover number for both systems (the Core is done, Photon would need to be calculated). If you are doing the math and would like to submit a pull request to update the comments of that code, please do! Even if you just submit the information as a firmware/docs issue that would be sweet.
It would also be good to document this anomaly, which could be solved by rolling your own DWT/Interrupt timer, or Peekay's proposed timer capture mode.
@BDub good answers to all my questions except one. I'm still concerned about the issue of timing things that span the 59652323 boundary. Take your own code example from above, but substituting 50 for 5000.
If the first line happens to execute at a random time when micros() happens to return 59652322, the next while loop will never exit. This is because micros() will always be <= 59652322, and the subtraction will roll around in an unsigned way and be some very large positive number. Your code would hang randomly and intermittently. A terrible problem to have with what looks like such a simple line of code.
So the user would need to know that magic number exactly, and adopt it into the subtraction the way @gorsat suggests above. THAT is why the exact numbers need to be documented. And since this number is different on the Photon, both numbers need to be documented. In fact, it would be a good idea if there were #defines for these numbers.
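As a sketch of what such a #define and the adjusted subtraction might look like (the constant is the Core figure quoted earlier in the thread; this is illustrative, not an official API):

```c
#include <stdint.h>

/* Assumed early wrap point of the pre-fix Core micros(), i.e. the value
 * at which the counter returns to zero. */
#define MICROS_ROLLOVER_CORE 59652323u

/* The naive idiom: only correct when the counter spans the full 32 bits. */
uint32_t naive_elapsed(uint32_t start_us, uint32_t now_us) {
    return now_us - start_us;
}

/* Rollover-aware version: needs the exact wrap point as a magic number. */
uint32_t aware_elapsed(uint32_t start_us, uint32_t now_us) {
    if (now_us >= start_us)
        return now_us - start_us;
    return now_us + (MICROS_ROLLOVER_CORE - start_us);  /* spanned the wrap */
}
```

When the measurement spans the wrap, the naive version returns a huge bogus number while the aware version returns the true elapsed microseconds.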
However, I still think micros() should be fixed to cover the entire 2^32 range. The reason is that very simple code like this exists in people's code all over the Arduino (and other devices) world. And none of it will work right unless you know precisely when it will roll over.
On your suggestion that I contribute to the code and documentation ... I feel I'm still a bit of a newbie in this world; give me a few months with my new toys.
Actually I got my own logic wrong here. It will exit almost immediately, not waiting the entire time. But still, the point is very anomalous behavior will result intermittently.
Yep, this is where the getDWT() would be very handy. Just add this to your code:
#define getDWT() (DWT->CYCCNT)
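Used for timing, it might look like this (the DWT register is mocked here so the sketch compiles off-device; on hardware, DWT->CYCCNT is the free-running 32-bit cycle counter, and 120 ticks/us is the assumed Photon clock):

```c
#include <stdint.h>

/* Mock of the CMSIS DWT peripheral so this compiles off-device. */
typedef struct { uint32_t CYCCNT; } DWT_Type;
static DWT_Type dwt_mock;
#define DWT (&dwt_mock)

#define getDWT() (DWT->CYCCNT)
#define SYSTEM_US_TICKS 120u   /* assumed 120 MHz Photon clock */

/* Because CYCCNT spans the full 32 bits, plain unsigned subtraction of
 * two readings is rollover-safe, with no magic numbers needed. */
uint32_t cycles_to_us(uint32_t start_cycles, uint32_t end_cycles) {
    return (end_cycles - start_cycles) / SYSTEM_US_TICKS;
}
```

The subtraction is correct even when the counter wraps between the two getDWT() reads, which is exactly the property the truncated micros() counter lacks.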
I agree Brett. In my code I will definitely do that. And my code will work.
But for every one of me who looked into this deeply and understands the need for this, I would guess there are 100 people who are just going to blindly use micros() without knowing about this rollover. Can't you at least agree to help them out by documenting the exact rollover points? I take it that fixing micros() to use a counter tuned to 1 tick per microsecond is off the table.
I'd love to make micros() return a full 32-bit value. However, for it to do that we'd need to have an actual microsecond counter. To my knowledge, there's no simple way to continually update a 32-bit micros count from the DWT counter.
If anyone knows different, please chime in!
Fixing the micros() rollover was discussed in this issue.
EDIT: I know how to fix this! Stay tuned!
Fixed. Here's the PR: https://github.com/spark/firmware/pull/493, with the code changes and tests. I also added a rollover test for good measure.
I reviewed the part of the pull request with these code changes and made a small comment. (My first time ever using that feature of GitHub.) Anyway, excellent job! And thanks. I still don't really like how much code is in that function, for performance reasons, but that can be characterized. I'll use DWT->CYCCNT when performance counts, but micros() will at least work correctly!
Thanks for sharing your thoughts on this. Any suggestions how we can trim the code? Better still would be to measure it to see if it is fast enough for what you have in mind; otherwise we may be falling into the trap of premature optimization.
I believe this will run faster and with more precision than the Arduino version. Here's a blog discussing the Arduino implementation of micros(): you'll see it involves a similar amount of code, plus the Arduino implementation has a precision of only 4 microseconds.
Given that we are running on an MCU that's at least an order of magnitude faster (due to clock speed and architecture), I'm hoping that performance will be more than fast enough. The only way to know is to try it and measure.
If it turns out you need it to go even faster, DWT->CYCCNT is there if you want very precise timing and the lowest amount of overhead.
Ideas for trimming are hard. That darned divide! I assume you have access to a manual that gives clock cycles for instructions. Question: does the 16-bit divide instruction take many fewer cycles than the 32-bit one? If so, here's a crazy idea. You don't really need all 32 bits of CYCCNT. You end up subtracting off at most a millisecond's worth of counts and dividing by 120. The trick is to first divide by 8; then you only need 1000 * 15 counts per millisecond, which fits in 16 bits.
Make system_millis_clock a 16-bit number and assign it like this:
system_millis_clock = (uint16_t)(DWT->CYCCNT >> 3); // divide by 8
Then in the math of micros() do this
uint16_t cyccntDiv8 = (uint16_t)(DWT->CYCCNT >> 3);
uint16_t elapsed_since_millis = ((cyccntDiv8-base_clock) / 15); // divide by 15 more
return (base_millis * 1000) + elapsed_since_millis;
Ok, so there are a bunch of tradeoffs here. On the plus side, the 32-bit division becomes a 16-bit division, and it uses two fewer bytes of RAM. On the minus side, you incur the shift and cast on every millisecond tick, an additional shift and cast during the micros() call, and a cast up during the final addition in the return statement. Honestly, I don't know which would win. It's probably best to try it out empirically.
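Put together as a compilable sketch (names like base_clock and base_millis follow the fragment above; the cycle counter is passed in as a parameter, since this is just the arithmetic, not firmware code):

```c
#include <stdint.h>

#define TICKS_DIV8 15u          /* (120 MHz / 8) cycles per microsecond, Photon */

static uint16_t base_clock;     /* (CYCCNT >> 3) latched at the last ms tick */
static uint32_t base_millis;    /* millisecond count at that tick */

/* Would run in the SysTick handler once per millisecond. */
void millis_tick_sketch(uint32_t cyccnt, uint32_t millis_now) {
    base_clock  = (uint16_t)(cyccnt >> 3);  /* divide by 8, keep 16 bits */
    base_millis = millis_now;
}

uint32_t micros_16bit_sketch(uint32_t cyccnt) {
    uint16_t cyccntDiv8 = (uint16_t)(cyccnt >> 3);
    /* The 16-bit subtraction wraps correctly within one millisecond window;
     * the shift discards up to 7 cycles (~0.06 us) of precision. */
    uint16_t elapsed_us = (uint16_t)((uint16_t)(cyccntDiv8 - base_clock) / TICKS_DIV8);
    return base_millis * 1000u + (uint32_t)elapsed_us;
}
```

Note the pre-shift also costs a little precision, since up to 7 raw cycles are discarded before the division.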
The Core code would need to divide by 9 instead (72 / 8 = 9).
But you are right: the best case here is a handful of cycles saved, and it may even be worse. In a world where we have 120 MHz microcontrollers (!), this kind of stuff is unnecessary.
Edit: after learning that this ARM architecture has a built-in divide instruction, my optimization would almost certainly make things worse. I'd delete this post, but it leads into the next one.
Thanks for the input there - I think we both agree that optimization isn't required here, since we can't tell how fast it will be just by looking at the source code. (Looking at the assembler produced is a different matter.)
I'm definitely a proponent of getting the design right first, then speed/memory optimizing when tests show it's necessary. Optimizing rarely makes code more readable or easier to maintain, so we need a clear payback for the time spent implementing optimizations, and that includes determining they are necessary in the first place! With low-level optimization, code becomes more obfuscated or brittle, so IMHO it's important to be sure the speed increase is needed and worth the increased code complexity. For example, in this case the original code works on the Core/Photon/Electron without modification, while the optimizations above would require a different version of the code for each platform.
(I know you know this, @rvnash; I'm just writing it down for anyone else reading, since premature optimization is the root of all evil.)
buuut… having said all that, it's still interesting to dig into the details and see if we can rebut the instinctive reaction that the divide instruction isn't fast enough.
On the STM32, the divide instruction is always 32 bits and takes between 2 and 12 cycles, depending on the number of leading/trailing zeros in the operands. The Arduino implementation of micros() also contains a divide, but since it's an 8-bit MCU, the 32-bit divide has to be done in software with a block of code. I've seen this in AVR projects and the code is not trivial, increasing the program size by several hundred bytes IIRC. So I think we are already hands-down much faster than the Arduino ever would be, even if it were running at 120 MHz.
I guess I'm just trying to help everyone kick the habit of making performance judgements by looking at source code, and instead measure the performance before assuming it's bad!
I saw that. I didn't dig deeper to see perhaps if that divide could be resolved at compile or linker time by a smart compiler and linker.
But in any case my arduino days are behind me and I'm throwing my energies behind you guys.
Also, the rest of your points are spot on true.
It looks like I have hit this rollover error and was really foxed until I found this thread. However, I still don't know how to fix it. Do I need to upgrade the firmware, add a check to my code for rollover, or use the DWT?
I am using two interrupts to measure the length of a pulse on two pins. Everything works fine until the start time is just below the rollover (2^32/120 usec = 35.79 seconds on the Photon) and the end time is past it. Then I get a weird LARGE number for the duration.
Code snippet:
volatile unsigned long duration0; // volatile, we set this in the Interrupt and read it in loop so it must be declared volatile
volatile unsigned long duration1;
volatile unsigned long StartPeriod0 = 0; // set in the interrupt
volatile unsigned long EndPeriod0 = 0; // set in the interrupt
volatile unsigned long StartPeriod1 = 0; // set in the interrupt
volatile unsigned long EndPeriod1 = 0; // set in the interrupt
volatile boolean pulse0 = false; // set in the interrupt and read in the loop
volatile boolean pulse1 = false;
...
void loop() {
    analogvalue = analogvalue + analogRead(0);
    analogcount++;
    if (pulse0)
    {
        lowpulseoccupancy0 = lowpulseoccupancy0 + duration0;
        Serial.print("Int0 ");
        Serial.print(EndPeriod0);
        Serial.print(",");
        Serial.println(duration0);
        pulse0 = false;
    }
    if (pulse1)
    {
        lowpulseoccupancy1 = lowpulseoccupancy1 + duration1;
        Serial.print("Int1 ");
        Serial.print(EndPeriod1);
        Serial.print(",");
        Serial.println(duration1);
        pulse1 = false;
    }
....
void isrduration0()
{
    // falling edge: the pin has gone low, so this is the start of the pulse
    if (digitalRead(D2) == LOW)
    {
        StartPeriod0 = micros();
    }
    else
    {
        // rising edge: the pulse has ended, so calculate its duration by
        // subtracting the start time from the current micros() value
        if (StartPeriod0 && (pulse0 == false))
        {
            EndPeriod0 = micros();
            duration0 = EndPeriod0 - StartPeriod0;
            StartPeriod0 = 0;
            pulse0 = true;
        }
    }
}
void isrduration1()
{
    // falling edge: the pin has gone low, so this is the start of the pulse
    if (digitalRead(D3) == LOW)
    {
        StartPeriod1 = micros();
    }
    else
    {
        // rising edge: the pulse has ended, so calculate its duration by
        // subtracting the start time from the current micros() value
        if (StartPeriod1 && (pulse1 == false))
        {
            EndPeriod1 = micros();
            duration1 = EndPeriod1 - StartPeriod1;
            StartPeriod1 = 0;
            pulse1 = true;
        }
    }
}
Since all the variables are unsigned long, I had expected the rollover subtraction to just work. But here is some output showing that it does not. Below, the first number is the end time of the pulse and the second is the duration, both in microseconds.
Int0 35558244,143070
Int1 41172,4259334629 (WRONG!!)
Int0 73296,4259378494 (WRONG!!)
Int0 226766,60096
Appreciate your help.
A.J.
That's a good question, AJ. There is a pull request outstanding for this fix: https://github.com/spark/firmware/pull/493
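Until that PR lands in a released firmware, one workaround, sketched here (the constant is the Photon's assumed pre-fix wrap point, floor(2^32/120); this is illustrative, not official firmware code), is to fold the wrap point into the duration calculation:

```c
#include <stdint.h>

/* Assumed wrap point of the pre-fix Photon micros(): floor(2^32 / 120). */
#define MICROS_ROLLOVER_PHOTON 35791394u

/* Drop-in replacement for "duration0 = EndPeriod0 - StartPeriod0;" that
 * stays correct when the pulse spans the early rollover. */
uint32_t pulse_duration_us(uint32_t start_us, uint32_t end_us) {
    if (end_us >= start_us)
        return end_us - start_us;
    return end_us + (MICROS_ROLLOVER_PHOTON - start_us);
}
```

In the ISR that would become duration0 = pulse_duration_us(StartPeriod0, EndPeriod0); (and likewise for channel 1). Upgrading to firmware containing the fix, or timing with DWT->CYCCNT directly, removes the need for the magic constant.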
But I'm not sure what all that means, how the various branches are organized, or what the procedure is inside Particle for fixes to get promoted into released software. I don't even quite understand what it means to be "released" in this world.