@MrRocketman, I agree with @harrisonhjones here.
With some more background info about the intended use, we might be able to give some better advice.
I guess you want to rotate the carry flag, set by the compare operation, into bit 31 of sendbyte (for clarity I would use cmps instead, to indicate you are directly interested in the flags). I don’t know of an “asm-quick” C way to code this off the top of my head.
One thing I’m not sure about is your use of cp in your second example - I guess it’s a typo.
And since you want to write your result back to its original variable, you could try this
Yes, please try it with C first. The clock rate for the core is much faster than an Arduino, so when your loop() is not distracted by cloud/wifi/internet stuff, the straight-line execution speed of the CPU and the efficiency of the GCC-ARM compiler should make hand-crafted assembly like this unnecessary except for the most extreme cases.
@bko, it’s a pity that we all have to assume things since we don’t get more info from @MrRocketman
Your assumption that we are dealing with uint8_t is logical, given the wording sendbyte and that the original code comes from an 8-bit library.
I did miss that with my answer. I was focused too much on the asm rrx which only works as expected when it’s dealing with 32bit (I’m not aware of a byte version like rrxb).
So even in asm you’d have to “rebuild” the functionality of an 8bit rrx and so - I’d guess - you’ll end up with no speed gain by using asm whatsoever.
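To make the 32-bit point concrete, here is a small C model of what a 32-bit rotate-right-through-carry does to the result. It is only a sketch: rrx32 and byte_via_rrx32 are names I made up for illustration, and the carry is passed in as a plain argument instead of coming from a real compare instruction. After eight rotations the eight results sit in bits 31..24, so a final shift by 24 is needed to get them back into a byte.

```c
#include <stdint.h>

/* Model of a 32-bit rotate-right-extended: the value moves right by one
   and the incoming carry lands in bit 31 (not bit 7). */
uint32_t rrx32(uint32_t value, uint32_t carry_in)
{
    return (value >> 1) | ((carry_in & 1u) << 31);
}

/* Hypothetical helper: run eight compare + rotate steps in a 32-bit
   accumulator, then move the result down into the low byte. */
uint8_t byte_via_rrx32(const uint8_t pwmValues[8], uint8_t counter)
{
    uint32_t acc = 0;
    for (int i = 0; i < 8; i++) {
        /* the comparison result plays the role of the carry flag */
        acc = rrx32(acc, (uint32_t)(pwmValues[i] > counter));
    }
    return (uint8_t)(acc >> 24);   /* the eight results sit in bits 31..24 */
}
```

The extra shift at the end is the "rebuilding" cost mentioned above; it is only paid once per byte, not once per channel.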
While your code does what @MrRocketman 's pseudo code asks for, interpreting the (not working) asm code I’d come up with this
sendbyte >>= 1; // rrx would always right shift by 1
if (pwmValue > counter) // if cmp had set the carry flag (counter-pwmval->carry)
sendbyte |= 0x80; // it would need to be "inserted" into bit 7
More details: I tried running with C, which was an excellent suggestion! Turns out the hard fault is due to something else that I have yet to track down. So that’s good news, sort of.
So this bit of code comes from the ShiftPWM library. I have heavily modified it for my purposes which is for dimming AC lights. I have a zero cross circuit, and opto isolated triacs connected to a shift register. I have everything working great on my little arduino.
Yes, the value I was sending in as sendbyte is a byte. I have a byte array called pwmValues, 1 byte per strand of lights I’m controlling. So I have a timer interrupt running at 120 Hz * 255 brightness levels = 30,000+ times a second. Then I have a counter that counts from 255 down to 0 in this interrupt, which I pass in as ‘counter’. So the purpose of the asm is to check each strand of lights’ pwmValue to see if it’s time to turn on, store the result in sendbyte, and repeat for the other 7 channels of the shift register. Then start over for the next shift register. So with 32 strings of lights (4 shift registers) this code could be running 1 million times a second.
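For anyone who wants the numbers spelled out, here is the arithmetic as a small C sketch. The 120 Hz, 255 levels and 32 channels come from the description above; the 72 MHz core clock is my assumption about the Spark Core, and the function names are made up for illustration.

```c
#include <stdint.h>

enum {
    ZERO_CROSS_HZ = 120,      /* zero crossings per second (from the post) */
    LEVELS        = 255,      /* brightness steps per half cycle (from the post) */
    CHANNELS      = 32,       /* 4 shift registers x 8 outputs (from the post) */
    CPU_HZ        = 72000000  /* ASSUMPTION: 72 MHz core clock */
};

/* Timer interrupts per second: 120 * 255 = 30600 */
uint32_t isr_rate_hz(void)         { return ZERO_CROSS_HZ * LEVELS; }

/* Compare/bit-set operations per second: 30600 * 32 = 979200 */
uint32_t compares_per_second(void) { return isr_rate_hz() * CHANNELS; }

/* CPU cycles available between two timer interrupts (integer division) */
uint32_t cycles_per_isr(void)      { return CPU_HZ / isr_rate_hz(); }

/* Upper bound on cycles per channel, ignoring SPI and ISR entry/exit */
uint32_t cycles_per_channel(void)  { return cycles_per_isr() / CHANNELS; }
```

Under that clock assumption the budget works out to roughly 2350 cycles per interrupt, or about 73 cycles per channel before any SPI or interrupt overhead.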
I definitely wasn’t taking into account the 32bit nature of the Spark. That’s got to be the problem I was seeing.
Thank you everyone for your feedback and ideas! I will look into changing things to be 32 bit compatible and report back.
That’s the code I’m trying to replicate in asm. sendbyte is the byte that gets sent over SPI to the shift register. This code gets called 8 times with the same sendbyte variable, but each time with a different string of lights’ ‘pwmValue’. The goal is to build the byte for each shift register as fast as possible.
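Written as one plain C function, the "called 8 times with the same sendbyte" pattern might look like the sketch below. The function name and the choice to index the array upwards are assumptions for illustration, not the actual library code.

```c
#include <stdint.h>

/* Build the byte for one 8-output shift register.
   pwmValues points at 8 duty-cycle bytes, counter is the current PWM step.
   ASSUMPTION: channel 0 should end up in bit 0 of the result. */
uint8_t build_sendbyte(const uint8_t *pwmValues, uint8_t counter)
{
    uint8_t sendbyte = 0;
    for (int i = 0; i < 8; i++) {
        sendbyte >>= 1;                 /* make room for the next result */
        if (pwmValues[i] > counter) {   /* is it time for this channel to be on? */
            sendbyte |= 0x80;           /* insert the result at bit 7 */
        }
    }
    return sendbyte;
}
```

Because every result enters at bit 7 and is then shifted right by the later iterations, the first channel processed ends up in bit 0 and the last one in bit 7.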
Not sure what that means! The interrupt runs at 30KHz or so (about 33us per interrupt). You most likely calculate all channels on each pass of the interrupt. In the ISR you service 32 count values, 32 compare/bit set operations, 4 byte values for the shift registers and I assume you write out those bytes to the SPI on each interrupt (so 4 SPI writes). I am also assuming you are clocking the SPI at DIV1 for max speed (36 MHz if I recall).
So I can see why optimizing the compare/bit set operation is important. It might be interesting to use some tight C++ code to see how the compiler optimizes things. Knowing you have 4 bytes of shift registers, you could do all your shift calculations with a 32-bit var and then write out each byte to the SPI
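A minimal sketch of that 32-bit idea, assuming 32 duty-cycle bytes in one array and that register 0 holds channels 0 to 7. The names build_sendword and split_sendword are hypothetical, and the actual SPI write is left out because I don't know how the bytes are sent in the real code.

```c
#include <stdint.h>

/* Compute the on/off bit for all 32 channels in a single 32-bit word.
   After the loop, bit i is set when pwmValues[i] > counter. */
uint32_t build_sendword(const uint8_t pwmValues[32], uint8_t counter)
{
    uint32_t word = 0;
    for (int i = 0; i < 32; i++) {
        /* shift right and drop the comparison result into bit 31 */
        word = (word >> 1) | ((uint32_t)(pwmValues[i] > counter) << 31);
    }
    return word;
}

/* Split the word into the four bytes to write out over SPI.
   ASSUMPTION: out[0] belongs to the register driving channels 0..7. */
void split_sendword(uint32_t word, uint8_t out[4])
{
    for (int r = 0; r < 4; r++) {
        out[r] = (uint8_t)(word >> (8 * r));
    }
}
```

The loop body has no branch, which gives the optimizer a fairly clear shot; whether it really beats the byte-at-a-time version would have to be measured.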
I don’t know what optimization level the web gui compiles to but -O4 optimization out of the compiler, with possible re-swizzling of the C code to make sure the optimizer has a clear shot, should be able to achieve almost max speed.
In test.txt you can see the C code along with its asm representation after build.
Alternative optimizer settings would be -O2, -O3 and -O4, however you like it, but it would be good to know what WebIDE usually does.
@MrRocketman, one thing you have to know about this community: it never sleeps and these guys are rocket fast, so once you ask your question, don’t be surprised if you get your answer before you have pressed Create Topic or Reply.
Is there really a -O4 optimization flag? I thought -O3 was the max.
The web ide and local builds both use -Os optimization flag to optimize for size so the binary will actually fit on the core - without that the binary is much much larger.
Fortunately, you can instruct the compiler to optimize individual functions for speed like this:
// first declare the function - with optimization attribute
// then the function body
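As a sketch of what this can look like with GCC's optimize function attribute (the function itself, build_sendbyte_fast, is a made-up example and not the code from the original post):

```c
#include <stdint.h>

/* first declare the function - with optimization attribute.
   GCC-specific: asks for -O3 on this one function even when the rest
   of the build uses -Os. Other compilers may ignore the attribute. */
uint8_t build_sendbyte_fast(const uint8_t *pwmValues, uint8_t counter)
    __attribute__((optimize("O3")));

/* then the function body */
uint8_t build_sendbyte_fast(const uint8_t *pwmValues, uint8_t counter)
{
    uint8_t sendbyte = 0;
    for (int i = 0; i < 8; i++) {
        sendbyte >>= 1;                 /* shift the earlier results down */
        if (pwmValues[i] > counter) {
            sendbyte |= 0x80;           /* newest result goes into bit 7 */
        }
    }
    return sendbyte;
}
```

The attribute only changes how the function is compiled, not what it returns, so it is easy to compare the generated asm with and without it.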
I too agree with comments that you really rarely need asm - gcc does a fantastic job of optimizing code. And as always try to avoid premature optimization! Profile first, then write asm if it’s truly needed.
Okay, I think you are trying to port my ShiftPWM library.
That trick in assembly works on AVR, but I do not know enough about the Spark Core’s ALU to give you the spark equivalent. At least you would have to account for the Spark being 32 bits instead of 8.
The original code is here:
The first line of assembly fetches a duty cycle setting from memory (2 clock cycles)
The second line compares the duty cycle with a counter. (1 clock cycle).
The result of this will be stored in the carry on AVR.
With a rotate over carry instruction, the compare result is shifted into a byte. (1 clock cycle).
Do that 8 times, and you’ll have 8 pin states ready in a byte to send out over SPI.
With for loop overhead, this takes about 5 clock cycles per pin to update all pins. With hardware SPI the SPI transfer can overlap with calculating the pin states. This gives very fast software PWM.
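The three steps can be modelled in plain C to see what ends up in the byte. This is only an illustration of the fetch / compare / rotate sequence described above, with the carry flag made explicit as a variable. The names ror8 and avr_style_byte are made up, and the operand order of the compare (carry set when the duty cycle is greater than the counter) is my assumption.

```c
#include <stdint.h>

/* Rotate right through carry on an 8-bit value:
   the old carry enters bit 7, the old bit 0 becomes the new carry. */
uint8_t ror8(uint8_t value, uint8_t *carry)
{
    uint8_t out = (uint8_t)((value >> 1) | (*carry << 7));
    *carry = value & 1u;
    return out;
}

/* pwmEnd points one past the last of 8 duty-cycle bytes; the values are
   fetched with a pre-decrement, walking down through memory. */
uint8_t avr_style_byte(const uint8_t *pwmEnd, uint8_t counter)
{
    uint8_t sendbyte = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t duty  = *(--pwmEnd);                /* step 1: fetch the duty cycle */
        uint8_t carry = (uint8_t)(duty > counter);  /* step 2: compare, result in carry */
        sendbyte = ror8(sendbyte, &carry);          /* step 3: rotate carry into the byte */
    }
    return sendbyte;
}
```

Because the pointer walks downwards, the last duty cycle in memory ends up in bit 0 and the first one in bit 7.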
ShiftPWM supported hardware SPI and bitbanging ports (software SPI). Paul Stoffregen (the man behind Teensy) ported bitbanging to the 32-bit Teensy 3, see this pull request:
That might be a better starting point in porting this to Spark.
I have stopped supporting ShiftPWM and selling the boards because I am way too busy with BrewPi.
So I’m thinking I’ll just use normal C. Although the asm does seem to work if I shift the result over 24 bits. The problem now is with my interrupt timer. Forgive me if this is dumb but I’ve been hunting down the issue between this timer/asm code for the past 2 days to no avail.
So first off here is my code:
So I’ve narrowed down the hard fault I’m getting to be whenever I call my *(--tmpPWMValues) within my timer interrupt handler (the one that happens ~30 kHz). So lines 663 for asm or 677 for the C version. So I messed around and moved the same code up to my hardware zero cross interrupt (line 639) and it works fine (which happens ~120 Hz). So to me this means it’s not some volatile issue but something else. I just can’t wrap my head around why the same code works in the hardware interrupt but not the timer interrupt.
I feel like this would be WAY easier if I had a JTAG debugger.
@MrRocketman, this is not related to your error but make sure you get the latest version of SparkIntervalTimer from the IDE or my repo. I made some minor but important changes to the interrupt priority level used by the timer ISR so it would not interfere with key firmware interrupts.