Particle threads tutorial


#1

Threads allow concurrent execution of multiple bits of code. They’re popular in desktop operating systems like Windows and in languages like Java. Threads exist on the Particle platform, but support for them is limited.

Though the semantics are a bit different, you might use threads in the same way you would use separate processes in Unix as well.

Updated versions of this document can be found on GitHub.

Say no to threads

While this is a thread tutorial, in most cases you can get by without threads, and not using threads will make your life easier.

  • Most APIs are not thread safe
  • Limited memory and no virtual memory makes using threads impractical
  • There is no way to stop a thread once started

If you look through the Windows or Java APIs, it’s abundantly clear which API calls are thread-safe, because they are listed as MT-safe or not. The Particle APIs are generally not thread-safe, and there’s no single reference that lists which calls are safe.

Every thread must have its own stack; that’s how threads work. The problem is that there is only about 60 KB of free memory on a Photon or Electron. The stack is normally 6 KB. Having more than a few threads will eat up your memory in no time!

In Windows or Java, there is virtual memory so each thread can be allocated a 1 MB stack and not have to worry about running out of memory, even with a large number of threads.
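
To put rough numbers on this, the sketch below works out an upper bound on the thread count from the figures in this post (about 60 KB free, 6 KB per stack, and the 3K default worker stack listed under Using Threads). It is plain C++ arithmetic with hypothetical names, and it assumes nothing else needs any of that memory, so real limits will be lower.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on how many thread stacks fit in the free memory.
// Both arguments are in bytes.
constexpr std::size_t maxThreads(std::size_t freeBytes, std::size_t stackBytes) {
    return freeBytes / stackBytes;
}

// Approximate figures quoted in this tutorial.
constexpr std::size_t kFreeBytes = 60 * 1024;
constexpr std::size_t kLoopSizedStack = 6 * 1024;
constexpr std::size_t kDefaultWorkerStack = 3 * 1024;

// Checked at compile time: 10 threads with 6 KB stacks, 20 with 3 KB stacks.
static_assert(maxThreads(kFreeBytes, kLoopSizedStack) == 10, "6 KB stacks");
static_assert(maxThreads(kFreeBytes, kDefaultWorkerStack) == 20, "3 KB stacks");
```

Either way, the whole budget is gone after a couple of dozen threads at most, before any buffers or heap allocations are counted.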

Finally, it’s really a pain to debug unsafe thread code. It’s unpredictable and timing-sensitive. When you look for the bug it can stop happening.

Say yes to finite state machines

Finite state machines are a much better paradigm for memory and processor constrained devices like the Particle devices. There’s only one stack, and no need to worry about thread concurrency.

Platforms like Node.js work in a single-threaded environment using finite state machines or chained callbacks. This is a better model, and even though the Particle platform uses C++ instead of JavaScript, the model works the same way.
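
As a minimal sketch of the idea, here is a three-state machine in plain C++. The states, timings, and names are hypothetical and only for illustration. The assumption is that step() would be called once per loop() pass with the current millis() value; each call does a little work and returns immediately, so nothing blocks and no second stack is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical states: wait, take a sample, publish it, repeat.
enum class State { Idle, Sampling, Publishing };

struct Machine {
    State state = State::Idle;
    uint32_t stateStartMs = 0;  // time at which the current state was entered
    int publishCount = 0;       // number of completed cycles

    // Call once per loop() pass, passing the current time in milliseconds.
    void step(uint32_t nowMs) {
        switch (state) {
        case State::Idle:
            // Wait 1000 ms between cycles without blocking.
            if (nowMs - stateStartMs >= 1000) {
                enter(State::Sampling, nowMs);
            }
            break;
        case State::Sampling:
            // Pretend the sample needs 50 ms to settle.
            if (nowMs - stateStartMs >= 50) {
                enter(State::Publishing, nowMs);
            }
            break;
        case State::Publishing:
            // A real program would publish here; this sketch only counts.
            publishCount++;
            enter(State::Idle, nowMs);
            break;
        }
    }

    void enter(State next, uint32_t nowMs) {
        state = next;
        stateStartMs = nowMs;
    }
};
```

The unsigned subtraction nowMs - stateStartMs is the same pattern the examples below use with millis(), and it keeps working when the millisecond counter wraps around.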

But I really want to use threads

OK. You’ve been warned. Here we go. This information is unofficial and subject to change.

Using Threads

A bit of background:

  • Threads are based on FreeRTOS (currently), but there is an abstraction layer over it in case this changes.
  • Threads are preemptively scheduled.
  • A thread that yields will be called up to 1000 times per second (1 millisecond interval).
  • Most API calls are not thread safe.
  • Basic synchronization capabilities exist, including mutex, recursive mutex, and queues.
  • Most thread calls are not safe to use in an interrupt service routine. However, you can use os_queue_put from an ISR.
  • The default worker thread stack size is 3K bytes. (The main loop stack is 6K bytes, and the software timer stack is 1K bytes.)
  • Threads are not supported on the Spark Core.

How fast does a thread run?

Here’s a simple example of implementing a thread to increment a counter in a loop.

#include "Particle.h"

SYSTEM_THREAD(ENABLED);

void threadFunction(void);

Thread thread("testThread", threadFunction);

volatile int counter = 0;
unsigned long lastReport = 0;

void setup() {
	Serial.begin(9600);
}

void loop() {
	if (millis() - lastReport >= 1000) {
		lastReport = millis();

		Serial.printlnf("counter=%d", counter);
	}
}


void threadFunction(void) {
	while(true) {
		counter++;
	}
	// You must not return from the thread function
}

The counter is incremented in a tight loop in the thread and printed once per second from the main loop. Note how fast it increments! Each line of the log represents 1 second.

counter=131861864
counter=141750524
counter=151636087
counter=161521997
counter=171402702
counter=181294657
counter=191172552
counter=201055887

With yield

A much better idea is to yield the CPU when you’re done instead of crazy looping like that. Here’s my modified threadFunction:

void threadFunction(void) {
	while(true) {
		counter++;

		os_thread_yield();
	}
	// You must not return from the thread function
}

Now the counter increments at a more sane rate, about 1000 calls per second, same as loop (on the Photon).

counter=3161
counter=4180
counter=4792
counter=5794
counter=6787
counter=7789

Periodic scheduled calls

It’s also possible to schedule periodic calls. In this case, we schedule the thread to execute every 10 milliseconds (100 times per second):

#include "Particle.h"

SYSTEM_THREAD(ENABLED);

void threadFunction(void *param);

Thread thread("testThread", threadFunction);

volatile int counter = 0;
unsigned long lastReport = 0;
system_tick_t lastThreadTime = 0;

void setup() {
	Serial.begin(9600);
}

void loop() {
	if (millis() - lastReport >= 1000) {
		lastReport = millis();

		Serial.printlnf("counter=%d", counter);
	}
}


void threadFunction(void *param) {
	while(true) {
		counter++;

		// Delay so we're called every 10 milliseconds (100 times per second)
		os_thread_delay_until(&lastThreadTime, 10);
	}
	// You must not return from the thread function
}

In the serial monitor you’ll also note how much more regular the counts are this way:

counter=300
counter=400
counter=500
counter=600

Synchronized access

System resources are not thread-safe and you must manually manage synchronization.

For example, the USB serial debug port (Serial) can only be called safely from multiple threads if you surround all accesses with WITH_LOCK(), as in:

#include "Particle.h"

SYSTEM_THREAD(ENABLED);

void threadFunction(void *param);

Thread thread("testThread", threadFunction);

volatile int counter = 0;
unsigned long lastReport = 0;
system_tick_t lastThreadTime = 0;

void setup() {
	Serial.begin(9600);
}

void loop() {
	if (millis() - lastReport >= 1000) {
		lastReport = millis();

		WITH_LOCK(Serial) {
			Serial.printlnf("counter=%d", counter);
		}
	}
}


void threadFunction(void *param) {
	while(true) {
		WITH_LOCK(Serial) {
			Serial.print(".");
		}
		counter++;

		// Delay so we're called every 100 milliseconds (10 times per second)
		os_thread_delay_until(&lastThreadTime, 100);
	}
	// You must not return from the thread function
}

Serial output:

.......counter=30
..........counter=40
..........counter=50
..........counter=60
..........counter=70
..........counter=80
..........counter=90
..........counter=100

Note that you must add WITH_LOCK in both your thread AND in the loop thread (and any software timers).

Note: The logging functions, such as Log.info, are MT-safe and you can call them from multiple threads without a lock. It’s much better to use them instead of writing directly to Serial.

Using a mutex to block a thread

One handy trick is to use a mutex to block your thread until something happens elsewhere. In this example, a SETUP/MODE button click handler unblocks the thread so that it makes one pass through its loop.

#include "Particle.h"

SYSTEM_THREAD(ENABLED);

void startupFunction();
void threadFunction(void *param);

// The mutex is initialized in startupFunction()
STARTUP(startupFunction());

Thread thread("testThread", threadFunction);

os_mutex_t mutex;


void buttonHandler();

void setup() {
	Serial.begin(9600);

	System.on(button_click, buttonHandler);
}

void loop() {
}

void buttonHandler() {
	// Release the thread mutex
	os_mutex_unlock(mutex);
}

// Note: threadFunction will be called before setup(), so you can't initialize the mutex there!
// STARTUP() is a good place to do it
void startupFunction() {
	// Create the mutex
	os_mutex_create(&mutex);

	// Initially lock it, so when the thread tries to lock it, it will block.
	// It's unlocked in buttonHandler()
	os_mutex_lock(mutex);
}

void threadFunction(void *param) {
	while(true) {
		// Block until unlocked by the buttonHandler
		os_mutex_lock(mutex);

		WITH_LOCK(Serial) {
			Serial.println("thread called!");
		}
	}
	// You must not return from the thread function
}

You should use a mutex instead of a busy wait (testing for a condition in a while loop) whenever possible, as mutexes are a fundamental and very efficient part of FreeRTOS. A thread blocked on a mutex doesn’t use any CPU.

Reading serial from a thread

One problem with the hardware UART serial is limited buffer size. One workaround for this is to read it from a thread. In this example it reads the USB serial just because it’s easier to test.

The thread reads data from the serial port and buffers it until it gets a full line. Then it makes a copy of the data and puts it in a queue. The queue is read out of loop(), but the serial port is continuously read even if loop() is blocked.

This is also handy on the Electron, as the main loop thread on the Electron is only called 100 times per second (vs. 1000 on the Photon).

#include "Particle.h"

SYSTEM_THREAD(ENABLED);

void threadFunction(void *param);

Thread thread("testThread", threadFunction);

// Instead of using STARTUP() another good way to initialize the queue is to use a lambda.
// setup() is too late.
os_queue_t queue = []() {
	os_queue_t q;
	// 20 is the maximum number of items in the queue.
	os_queue_create(&q, sizeof(void*), 20, 0);
	return q;
}();

system_tick_t lastThreadTime = 0;
char serialBuf[512];
size_t serialBufOffset = 0;

void setup() {
	Serial.begin(9600);
}

void loop() {
	// Try to take an item from the queue. First 0 is the amount of time to wait, 0 = don't wait.
	// Second 0 is the reserved value, always 0.
	char *s = 0;
	if (os_queue_take(queue, &s, 0, 0) == 0) {
		// We got a line of data by serial. Handle it here.
		// s is a copy of the data that must be freed when done.
		// Serial is also used by the thread, so lock it as described in "Synchronized access".
		WITH_LOCK(Serial) {
			Serial.println(s);
		}
		free(s);
	}
}


void threadFunction(void *param) {
	while(true) {
		// Hold the Serial lock while reading, since loop() also uses Serial
		WITH_LOCK(Serial) {
			while(Serial.available()) {
				char c = Serial.read();
				if (c == '\n') {
					// null terminate
					serialBuf[serialBufOffset] = 0;

					// Make a copy of the serialBuf
					char *s = strdup(serialBuf);
					if (s) {
						if (os_queue_put(queue, (void *)&s, 0, 0)) {
							// Failed to put into queue (queue full), discard the data
							free(s);
						}
					}

					// Clear buffer
					serialBufOffset = 0;
				}
				else
				if (serialBufOffset < (sizeof(serialBuf) - 1)) {
					// Add to buffer
					serialBuf[serialBufOffset++] = c;
				}
			}
		}

		// Delay so we're called every 1 millisecond (1000 times per second)
		os_thread_delay_until(&lastThreadTime, 1);
	}
	// You must not return from the thread function
}

Thread pools

This example uses a thread pool. Say you have an operation that takes a variable amount of time to run. You want to run these operations on one or more worker threads. The operations are put in a queue, so you can queue up operations until a thread is available to run them. The queueing operation is fast, so it won’t block the thread you call it from.

There’s more code to this on GitHub, as the thread pool is implemented as a class in user firmware, not as part of system firmware. However, this is how it’s used:

#include "Particle.h"

#include "ThreadPool.h"

SerialLogHandler logHandler;

// Create a pool of 2 threads and 10 call entries in the call queue
ThreadPool pool(2, 10);
volatile int lastCallNum = 0;


void buttonHandler();

void setup() {
	Serial.begin(9600);
	System.on(button_click, buttonHandler);
}

void loop() {
}

// This function is called when the SETUP/MODE button is pressed
void buttonHandler() {
	// When the button is pressed run a function that takes a random amount of time to complete, from 0 to 5 seconds.
	int callNum = lastCallNum++;

	// In 0.7.0 at least, Log.info from a system event handler doesn't do anything. You won't
	// see this log message.
	Log.info("thread call %d queued", callNum);

	// This is a C++11 lambda. The code in the {} block is executed later, in a separate thread.
	// It also has access to the callNum variable declared above.
	pool.callOnThread([callNum]() {
		// The code in this block runs on a separate thread. You'll see these log messages.
		int fakeRunTime = rand() % 5000;
		Log.info("thread call %d started fakeRunTime=%d", callNum, fakeRunTime);

		// You'd normally do something useful here other than delay. This
		// is to simulate a task that takes a variable amount of time.
		delay(fakeRunTime);

		Log.info("thread call %d done", callNum);
	});
}

The important part is pool.callOnThread. This queues up a call and the code within the {} block is executed later.

Here’s a sample output:

0000007325 [app] INFO: thread call 0 started fakeRunTime=933
0000008258 [app] INFO: thread call 0 done
0000117385 [app] INFO: thread call 1 started fakeRunTime=2743
0000120128 [app] INFO: thread call 1 done
0000120275 [app] INFO: thread call 2 started fakeRunTime=1262
0000120585 [app] INFO: thread call 3 started fakeRunTime=1529
0000121537 [app] INFO: thread call 2 done
0000121537 [app] INFO: thread call 4 started fakeRunTime=4700
0000122114 [app] INFO: thread call 3 done
0000122114 [app] INFO: thread call 5 started fakeRunTime=508
0000122622 [app] INFO: thread call 5 done
0000126237 [app] INFO: thread call 4 done

Threaded TCPClient

In the asynctcpclient project, threads are used to make the connect() method of the TCPClient class asynchronous.

More Details

You can find more documentation in the source.

Note that if you are browsing concurrent_hal, not all functions are exported to user firmware. In particular, you cannot use these functions from user firmware:

  • os_condition_variable
  • os_semaphore

#2

Great tutorial, thank you very much.

I have some questions:

  • I was not aware that the main loop thread on the Electron is only called 100 times per second. Why is this?
  • Is this also true with SYSTEM_THREAD(ENABLED)?
  • With SYSTEM_THREAD(ENABLED), I think there is already a queue on the application thread, wrapped by ActiveObjectCurrentThreadQueue and helper macros like APPLICATION_THREAD_CONTEXT_ASYNC. Can or should user firmware use them, or should it create its own queue for work it would like to delegate to the application thread?

#3

The Electron loop() only runs 100 times per second vs. 1000 on the Photon when you are not using SYSTEM_THREAD(ENABLED).

However, when you have SYSTEM_THREAD(ENABLED), it runs really fast, using all excess CPU, at least in 0.7.0 on the Electron.

I’d probably just use the macros like APPLICATION_THREAD_CONTEXT_ASYNC in system_threading.h. I should probably add those to the tutorial as I used the queue directly mainly to illustrate the queue functions, not as the best practice to defer to application thread. Good idea.

EDIT: I don’t think you can use APPLICATION_THREAD_CONTEXT_ASYNC from user firmware. Particle.h doesn’t include system_threading.h, and even if you include it manually, ApplicationThread is not exported. So I do think you need to use your own queue or other method.
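
For illustration, here is a minimal sketch of the "own queue" approach: calls posted from any thread are run later by the thread that drains the queue. It is written in standard C++ (std::mutex, std::queue) so it can be tried on a desktop; DeferredCalls is a hypothetical helper, not a Particle API, and on a device the os_queue_* functions shown in the tutorial are the tested route.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical helper (not a Particle API): calls posted from any thread are
// run later on whichever thread calls runPending().
class DeferredCalls {
public:
    explicit DeferredCalls(std::size_t maxCalls) : maxCalls_(maxCalls) {}

    // Safe to call from any thread. Returns false if the queue is full.
    bool post(std::function<void()> fn) {
        std::lock_guard<std::mutex> guard(lock_);
        if (calls_.size() >= maxCalls_) {
            return false;
        }
        calls_.push(std::move(fn));
        return true;
    }

    // Call from the thread that should do the work, for example once per
    // loop() pass. Runs calls until the queue is empty and returns the count.
    std::size_t runPending() {
        std::size_t ran = 0;
        for (;;) {
            std::function<void()> fn;
            {
                std::lock_guard<std::mutex> guard(lock_);
                if (calls_.empty()) {
                    break;
                }
                fn = std::move(calls_.front());
                calls_.pop();
            }
            fn();  // run outside the lock so the call itself may post()
            ran++;
        }
        return ran;
    }

private:
    std::mutex lock_;
    std::queue<std::function<void()>> calls_;
    std::size_t maxCalls_;
};
```

A worker thread would call post() with a lambda, and loop() would call runPending(). As with the serial example in the tutorial, the bounded size means a stalled consumer causes posts to fail instead of exhausting memory.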


#4

First of all, thanks for this, some great stuff in one place.

One note that might be useful to some folks - one can declare the thread before initializing it, giving an opportunity to set flags, initialize variables and queues in setup() or anywhere else as long as you do it before you initialize the thread. I don’t know of any disadvantage of this approach, though would gladly be corrected if it’s bad practice. An implementation might look like:

// declarations up top
Thread myThread;
void myThreadFunction(void *param);


void setup()
{
  // do some init stuff that myThread will use

  myThread = Thread("myThreadName", myThreadFunction);   // starts the thread
}

Unrelated Question:
Any advantage to using os_thread_delay_until(); instead of delay();?

I’m guessing with the normal delay it still visits the thread regularly even if it doesn’t continue execution of lines of code?


#5

delay() on the Electron just calls vTaskDelay, so they should be equally efficient. On the Photon it calls into WICED, but it probably works the same.

The main difference is that os_thread_delay_until takes care of the math to keep the period constant.
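
To show what that math looks like, here is a small sketch in plain C++. The function name and signature are hypothetical, and this illustrates the idea only; it is not the actual os_thread_delay_until implementation. It computes how long to sleep so that wake-ups stay one period apart no matter how long the work in between took.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns how many milliseconds to sleep so the next
// wake-up lands one period after the previous one. lastWakeMs is advanced by
// exactly one period on every call, so timing errors do not accumulate.
uint32_t sleepUntilNextPeriod(uint32_t &lastWakeMs, uint32_t periodMs, uint32_t nowMs) {
    uint32_t elapsed = nowMs - lastWakeMs;  // unsigned math survives wraparound
    lastWakeMs += periodMs;
    // If the work overran the period, do not sleep at all.
    return (elapsed >= periodMs) ? 0 : (periodMs - elapsed);
}
```

With a 10 ms period and 3 ms of work, this returns 7, so the period stays at 10 ms. Calling delay(10) after the same work would give a 13 ms period, and the error would vary with the amount of work.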

The tip to delay start of the thread until setup is good. However, it’s also currently broken in 0.7.0 so you may not want to use it right now.


#6

The tip to delay start of the thread until setup is good. However, it’s also currently broken in 0.7.0 so you may not want to use it right now.

Ah, well that explains why my firmware hard faults on 0.7.0 (at least partly)! I didn’t bother debugging it, just stuck with 0.6.4. I have a feeling you just saved me a lot of future debugging time… thanks!

I suppose also that as long as the initialization code you need to run is practically scoped only to that thread, you can also just put your init code in the body of the thread function, but before the loop portion of the thread, e.g.:

void threadFunction(void *param) {

	// do your init stuff here
	//
	//

	while(true) {

		// do the normal thread stuff here
		//


		// Delay so we're called every 1 millisecond (1000 times per second)
		os_thread_delay_until(&lastThreadTime, 1);
	}
	// You must not return from the thread function
}


#8

Hello! This is a fantastic tutorial. I was reading up on priorities in the FreeRTOS documentation and was wondering what the OS_THREAD_PRIORITY_DEFAULT is and does.


#9

I’m very new to Threads so bear with me for this question on fundamentals. In the first block of code, under How fast does a thread run? with the incrementing a counter example, could you explain the order of execution of the setup, loop, and threadFunction? I’m assuming first, setup runs. Then, does the thread function run after every time the loop runs? Do they go back and forth in running? Thanks.


#10

Threads and functions are different concepts.
When you take setup() and loop(), they are functions that run on the so-called application thread. With SYSTEM_THREAD(ENABLED) you have one other (separate) thread for the system functions to run on.
Both these threads run independently, virtually alongside each other, so you never know what function is “in execution” on the one thread while you have a particular function running on the other. That’s where synchronisation objects are needed, to “correlate” the two threads again.

If you then happen to open a new thread with its own “driving” function, you’ll just have another independent thread running alongside the others.

So to your question about order of execution (assuming SYSTEM_THREAD(ENABLED)):

  1. All object constructors are executed
  2. System thread is initialised and starts running its functions
  3. Application thread is initialised and starts running its functions

While the system thread is doing its stuff, the application thread first executes the STARTUP() functions (only once), followed by setup() (only once), and then keeps calling loop() and other regular functions like serialEvent() and the like.

If you are starting separate threads, they will be split off the application thread and then execute independently with the fixed 1 ms time slices.
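
Step 1 is also why the tutorial says the mutex cannot be created in setup(): a global Thread object is constructed, and its thread function started, before setup() runs. Here is a minimal host-side sketch of that ordering in standard C++, where FakeThread is a hypothetical stand-in and not the Particle Thread class:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which things happen. A function-local static is used
// so the log exists before any global object's constructor runs.
std::vector<std::string> &eventLog() {
    static std::vector<std::string> log;
    return log;
}

// Stand-in for a class whose constructor starts work immediately.
struct FakeThread {
    explicit FakeThread(const char *name) {
        eventLog().push_back(std::string("started ") + name);
    }
};

// Global object: its constructor runs before setup() is ever called.
FakeThread worker("worker");

void setup() {
    eventLog().push_back("setup");
}
```

Anything the thread function needs therefore has to be ready before the global object is constructed, which is what STARTUP() and the lambda initializer in the tutorial are for.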


#11

Is there a way to run the thread function periodically (call it every 2 seconds or so) without looping forever within the thread function? Can the thread function be called externally?


#12

No, that’s not how the threads work.

If you want one thread for all timers, run sequentially, use the software timers. There will only be one thread, and it will call each timer in sequence. The downside is that a misbehaving timer can stop all the others from executing.

The flip side is one thread per timer, using os_thread_delay_until. There’s no way they can interfere with each other (unless they disable interrupts), but there’s a lot of overhead.