MUTEX and SINGLE_THREADED_BLOCK behave differently. I would expect them to behave the same with the attached code

xenon

#1

Ran these tests on a Xenon with Device OS rc.27.

I know, I know, @rickkas7 says that threading is evil and we should be using FSMs instead. :-). But I like threading, so here is my question.

I have 3 threads running at the same priority, each constantly toggling a different GPIO. The purpose is to show, using a digital logic analyzer, that the multitasking round-robins between the threads every 1 ms.

My initial code uses a SINGLE_THREADED_BLOCK around the GPIO toggle, so that when a thread loses the CPU to another thread, its GPIO will always be low.

Here is the code:

SYSTEM_MODE(MANUAL);
SYSTEM_THREAD(ENABLED);

uint8_t d4 = D4;
uint8_t d5 = D5;
uint8_t d6 = D6;

os_mutex_t myMutex;
os_thread_t thread1;
os_thread_t thread2;
os_thread_t thread3;

void myThread(void *arg) {
  uint8_t* value = (uint8_t*)arg;
  Serial.println("printing from thread, woo hoo");
  Serial.println(*value);

  while (1) {
   SINGLE_THREADED_BLOCK() {
//      os_mutex_lock(myMutex);
      digitalWrite(*value, HIGH);
      delayMicroseconds(20);
      digitalWrite(*value, LOW);
//      os_mutex_unlock(myMutex);
    }
  }
}

void setup() {
  Serial.begin(9600);
  waitFor(Serial.isConnected, 30000);

  pinMode(D4, OUTPUT);
  pinMode(D5, OUTPUT);
  pinMode(D6, OUTPUT);

  os_mutex_create(&myMutex);

  Serial.println("hello world!!!");
  os_thread_create(&thread1, "testThread1",OS_THREAD_PRIORITY_DEFAULT, myThread, &d4, 4096);
  os_thread_create(&thread2, "testThread2",OS_THREAD_PRIORITY_DEFAULT, myThread, &d5, 4096);
  os_thread_create(&thread3, "testThread3",OS_THREAD_PRIORITY_DEFAULT, myThread, &d6, 4096);
}

void loop() {

  delay(100000);

}

The visible behavior on the logic analyzer looks perfect.

Now if I change the code to use a mutex rather than the SINGLE_THREADED_BLOCK, the behavior is no longer round robin and becomes inconsistent.

SYSTEM_MODE(MANUAL);
SYSTEM_THREAD(ENABLED);

uint8_t d4 = D4;
uint8_t d5 = D5;
uint8_t d6 = D6;

os_mutex_t myMutex;
os_thread_t thread1;
os_thread_t thread2;
os_thread_t thread3;

void myThread(void *arg) {
  uint8_t* value = (uint8_t*)arg;
  Serial.println("printing from thread, woo hoo");
  Serial.println(*value);

  while (1) {
//   SINGLE_THREADED_BLOCK() {
      os_mutex_lock(myMutex);
      digitalWrite(*value, HIGH);
      delayMicroseconds(20);
      digitalWrite(*value, LOW);
      os_mutex_unlock(myMutex);
//    }
  }
}

void setup() {
  Serial.begin(9600);
  waitFor(Serial.isConnected, 30000);

  pinMode(D4, OUTPUT);
  pinMode(D5, OUTPUT);
  pinMode(D6, OUTPUT);

  os_mutex_create(&myMutex);

  os_thread_create(&thread1, "testThread1",OS_THREAD_PRIORITY_DEFAULT, myThread, &d4, 4096);
  os_thread_create(&thread2, "testThread2",OS_THREAD_PRIORITY_DEFAULT, myThread, &d5, 4096);
  os_thread_create(&thread3, "testThread3",OS_THREAD_PRIORITY_DEFAULT, myThread, &d6, 4096);
}

void loop() {

  delay(100000);

}

Here is the logic analyzer trace.

Am I misunderstanding how the mutex works, or does this look like a bug?


#2

I’m not positive, but I think you’re running into issues because you’re not explicitly yielding to other threads before looping. Add this to the bottom of your while(1) block:

os_thread_yield();

and see if that changes the behavior.

I’m thinking that os_mutex_unlock() rightly does not yield to other threads, so unless you call os_thread_yield() you’re not providing enough opportunity for the other threads to swap in.


#3

Hmm, interesting. My updated code is:

 while (1) {
//   SINGLE_THREADED_BLOCK() {
      os_mutex_lock(myMutex);
      digitalWrite(*value, HIGH);
      delayMicroseconds(20);
      digitalWrite(*value, LOW);
      os_mutex_unlock(myMutex);
      os_thread_yield();
//    }
  }

Now it is yielding to each thread on each iteration. Did I put it in the wrong place?


#4

I could be wrong, but my understanding of os_thread_yield() is that it is really an opportunity for a thread to play nice and give control back to another thread early because it has nothing else to do. With a pre-emptive scheduler, the system should never require that of a thread: if a thread holds the CPU for more than 1 ms, it will be swapped out for another thread that is eligible to run.


#5

AFAICT the SINGLE_THREADED_BLOCK postpones thread switching, so when the time slice is up but the flow is still inside the block, the respective thread keeps control for the time being; as soon as it leaves the block, that thread loses control.
With the mutex, thread switching is not prevented, but the thread that would gain control can’t proceed because the mutex is still locked, and hence has to yield control immediately.
In this case a successful switch, where the receiving thread can actually pick up control, can only happen exactly while the mutex is unlocked.
A very slim chance, hence the rather long periods where one thread can hog control.

https://docs.particle.io/reference/device-os/firmware/photon/#single_threaded_block-


#6

Ah @ScruffR. Interesting. I would think that if 3 threads are all waiting on the same mutex, then when it frees up, the waiters should be made eligible to run in the order they began waiting. More investigation tomorrow. :slight_smile:


#7

Yes, but are they?
Aren’t only two waiting and one active (most of the time)?
You have 20+ microseconds with the mutex locked (compared to a few nanoseconds unlocked) in which the thread switch can occur; control would just “bounce” back to the current thread because the mutex is not free, with the two waiting threads yielding immediately.

Your mutex is only used between the three threads, but the thread arbiter is happily unaware of its existence and just goes about its business as usual.
The mutex does not make the block between lock and unlock atomic, and hence does not prevent a switch.


#8

Yeah, I misspoke. I realize that one thread has the mutex and the other two would be waiting. So I ran a test using the Mbed RTOS on an STM32 F303RE Nucleo board. Here is the code, which I believe is essentially equivalent to what I ran on the Xenon.

#include <mbed.h>
#include <rtos.h>

DigitalOut dout1(PB_4);
DigitalOut dout2(PB_5);
DigitalOut dout3(PB_6);

DigitalOut led(LED2);
Mutex mutex;

Thread t1;
Thread t2;
Thread t3;

void toggleDigitalIO(DigitalOut *dout) {
  while (1) {
//    mutex.lock();
    *dout = 1;
    wait(0.00002);
    *dout = 0;
//    mutex.unlock();
  }
}

int main() {

  t1.start(callback(toggleDigitalIO, &dout1));
  t2.start(callback(toggleDigitalIO, &dout2));
  t3.start(callback(toggleDigitalIO, &dout3));

  while(1) {
    wait(osWaitForever);
  }
}

Without the mutex, I see this on the DLA, which is what I would expect.

Now add the mutex back in:

#include <mbed.h>
#include <rtos.h>

DigitalOut dout1(PB_4);
DigitalOut dout2(PB_5);
DigitalOut dout3(PB_6);

DigitalOut led(LED2);
Mutex mutex;

Thread t1;
Thread t2;
Thread t3;

void toggleDigitalIO(DigitalOut *dout) {
  while (1) {
    mutex.lock();
    *dout = 1;
    wait(0.00002);
    *dout = 0;
    mutex.unlock();
  }
}

int main() {

  t1.start(callback(toggleDigitalIO, &dout1));
  t2.start(callback(toggleDigitalIO, &dout2));
  t3.start(callback(toggleDigitalIO, &dout3));

  while(1) {
    wait(osWaitForever);
  }
}

So not what I expected, but as I think through it, this does make sense. t1 has the mutex; t2 and t3 are waiting. As soon as t1 releases the mutex, the RTOS gets involved and gives time to t2 or t3, whoever is next, and so on in round-robin fashion. This is what I would expect on the Xenon. I would not expect the semantics of a mutex under a pre-emptive RTOS to be that different.

I can try out another test case today using FreeRTOS directly to see what happens.


#9

I don’t think I’d count on round-robin in either case.

A mutex is just a lock. There doesn’t necessarily have to be a concept of someone else waiting for that lock. Even if 10 other threads are waiting, a thread can still give up the lock and then immediately reclaim it. Actually this is a good thing, as it avoids some potential issues (https://en.wikipedia.org/wiki/Lock_convoy).

It’s possible SINGLE_THREADED_BLOCK is only working because the threads are taking long enough that the OS does a context switch on exit, and internal to the scheduler is a queue. I don’t think that’s guaranteed though.

If you want to control the order the threads run in, then I think you’re going to need to add something to enforce that. You could probably use a queue of some sort. From a brief reading of the FreeRTOS docs, maybe the task notification mechanism would work, where waiting tasks enter themselves into a queue.

Either way, expecting round robin when you haven’t enforced round robin in some way seems unlikely to work.


#10

For fun I did some investigating in the Device OS code to see what os_mutex_lock actually does.

It looks like this:

int os_mutex_lock(os_mutex_t mutex)
{
    xSemaphoreTake(mutex, portMAX_DELAY);
    return 0;
}

and os_mutex_unlock is implemented as:

int os_mutex_unlock(os_mutex_t mutex)
{
    xSemaphoreGive(mutex);
    return 0;
}

xSemaphoreGive adds an object (the mutex) to a queue, and xSemaphoreTake checks whether an object is available on the queue. I cannot see anywhere in FreeRTOS that xSemaphoreGive yields control to another thread, and thus, as described above, the chance of a context switch landing exactly while the mutex is unlocked is very slim. SINGLE_THREADED_BLOCK works above since it encapsulates the critical section, so the context switch is postponed by the 20+ us it takes to execute the code within the SINGLE_THREADED_BLOCK.

Mbed, on the other hand, seems to do a yield when the mutex is released, and thus the behaviour looks like what you got when you explicitly yielded after releasing the mutex on the Xenon.