Great, thanks for trying - I’m interested to hear how it works for you!
That’s correct, no interrupt masking. It’s only the code path that actually tries to acquire the bus that’s affected. There are a few other states where the interrupt handler is just changing state variables, and this can proceed unhindered.
The meat of it is here - it tries to acquire the bus, and if successful changes state and starts the spi read as before. If unsuccessful, it enters a new state to flag the request as pending. When the spi bus is released, the handle_spi_request() (just a few lines up) function is called to check and service any outstanding request.
@bko, @zachary I just ended with the above step and agree regarding the disable/enable of interrupts within the get|set_socket_active_status() call. It seems odd and it is not part of the original TI code.
I raised some eyebrows when I saw that code too. Turning on/off interrupts is heavy-handed here. The get_socket_status and set_socket_status can be done without turning off interrupts by using atomic operations such as compare and swap.
And if the socket_status variable is really shared between threads, then it at least should be declared as volatile, to be sure the value is always fetched, otherwise subsequent fetches can be optimized away by the compiler.
OK - 49 of 50 iterations passed with the vanilla @mdma code. I’m not too concerned about the single failure at this point.
I am now testing a hybrid which obviates the need for handle_spi_request() and simply masks CC3000 IRQ before acquiring the lock. The IRQ masking code is just a proof of concept hack, because the stm32f10xexti.c functions are obfusticated, over complex and essentially useless.
Hybrid code passed 50 of 50 iterations so far. I will leave it running tests overnight.
Proof of concept code is in github changes are limited to core-common-lib. Commit message contains preliminary TODO list before this is anywhere near ready for consideration for prime time.
Apologies in advance if you have to manually merge/hack stuff because the baseline for this branch causes you build headaches.
@mdma, please note I did not pull over some of the additional atomic changes you had made in your repository.
I invite testing, especially with stress tests that simultaneously access the WiFi interface and the external flash. Post results here.
@AndyW, I’m surprised you got a test failure - it ran successfully for me with 1000s of iterations. Can you share your test code so I can try?
What’s the benefit of masking the interrupt and removing the call to handle_spi_request()? That’s then preventing irq code executing that can safely execute unhindered since it’s not touching the spi bus. You could disable the irq handler only if the state is IDLE, which is when it uses the bus, but then we are getting into a very tight coupling of the irq behaviour and the arbitration logic. The handle_spi_request() function takes care of this, and doesn’t couple the arbitration logic to the irq behaviour.
That shouldn't be necessary. It's not using anything from stdint.h - just the standard data type int. I just checked out the code and it compiles as is. Which line is failing for you? Could you try again - I just did a push - not sure if there were unpushed changes, but at least now I know we are seeing the same thing. This might also explain the test failure if it wasn't the latest code.
It was the int_least8_t definition that was required by stdatomic.h that was tripping this up, for my toolchain (gcc version 4.8.4 20140526 (release) [ARM/embedded-4_8-branch revision 211358]) this required stdint.h to compile.
As I said, I’m not too concerned about the single failure (my testing of the hybrid model also exhibited similar issues at approx 1% rate) - I think that is an artifact of the testbed, and not really related.
#
# Tune DEVICE/TOKEN/FIRMWARE with the correct values for your situation.
#
TOKEN=xxxxxxxxxx
DEVICE=yyyyyyyyyy
FIRMWARE=$HOME/spark/vanilla/core-firmware/build/core-firmware.bin
COUNT=0
PASS=0
FAIL=0
while true
do
COUNT=$((COUNT+1))
curl --max-time 5 https://api.spark.io/v1/devices/$DEVICE?access_token=$TOKEN
curl --max-time 5 -H "Authorization: Bearer $TOKEN" https://api.spark.io/v1/devices/$DEVICE -X PUT -F file=@$FIRMWARE
if [ $? -ne 0 ]
then
FAIL=$((FAIL+1))
else
PASS=$((PASS+1))
fi
echo
echo $COUNT/$PASS/$FAIL
sleep 60
done
WRT IRQ masking - there’s always more than one way to skin a cat. I’m used to masking/unmasking interrupts to avoid collisions, rather than setting flags and calling manually duplicated code from some other place. At this point, however we do it, we’re in sow’s ear/silk purse territory.
The flags are just a new state in the state machine which is already part of the cc3000 spi implementation, so to my mind it seems a natural direction to go in. And of course, I can refactor the duplicated code for the production version of this - I kept the duplicate so it was easier to read and understand what’s happening.
Using a mutex will also work in cases where someone wants to pull data from the network on the main thread and write that to flash, while pulling it out of flash on a timer interrupt (e.g. to render as audio) - there’s been talk of this in implementing streamed audio playback/recording. Accessing flash on an interrupt wouldn’t be possible if the spi bus is managed with interrupt masking, since you don’t know in general which interrupts to mask!
The mutex solves this simply, and would allow flash access from both the main thread and on an interrupt. Furthermore, the same general approach with mutexes can be applied to secure any shared resources that we want concurrent access to. (This was my thought when I heard @satishgn was making the code multi-threaded with user and system code running in separate threads - there was no concurrency control in the present system, so this wouldn’t work without taking steps to apply arbitration to all shared resources.)
Did a quick run of 12 tests, first one failed (maybe a fluke), and one didn’t seem to get the updatebegin (didn’t turn purple), but the rest passed. so 10 passes out of 12 tests.
Thanks. If I understand you correctly you mean this will allow updating the firmware if OTA is broken (which it is for me still).
If that’s the case I think (for me at least) it’s easier just to use ST-Link JTAG. Pretty easy and seems to always work.
But before putting much more time into development of our product we need to know that the OTA issues will be fixed, otherwise the whole thing is a no go, as we need OTA.