Getting started with Machine Learning on MCUs with TensorFlow


#1

Originally published at: https://blog.particle.io/2019/11/08/particle-machine-learning-101/

Over the past several months, I’ve been working with engineers at Google, Adafruit, and our own stellar team at Particle to make machine learning possible on Particle devices.

Today I’m happy to report that, with the help of TensorFlow Lite for Microcontrollers, it’s now possible to perform fast ML inferencing — that is, making predictions on live data using pre-trained models — with Particle devices.

In this post, I’ll show you how to get started using TensorFlow Lite in your own projects. To follow along, you’ll need a Particle Photon or Electron or one of the new Gen 3 devices like an Argon, Boron, or Xenon. You’ll also need a USB cable or other power source, but that’s it.

What is machine learning?

First, let’s talk briefly about what machine learning (ML) is, in abstract terms. There are a lot of great resources available for learning the ins and outs of ML, so I won’t attempt to dive into the details here. I will, however, attempt to provide a simple definition by highlighting how ML differs from traditional, algorithmic programming.

In a traditional computer program, when you want to perform a calculation, you craft an algorithm in code that provides instructions for obtaining the output you want from an input. For instance, imagine that you’re trying to determine the y value of a straight line given some x value. Assuming you have a few other pieces of data, you can do so using the straight line equation y = mx + b. In C code on an MCU, you’d express that in the following fashion:

float m = 2;
float b = 1;

float y, x;

x = 1;
y = m * x + b; // y is 3

x = 22;
y = m * x + b; // y is 45

When executing an algorithm, you provide the inputs and the algorithm, and ask the machine to calculate the outputs. In other words, your instructions are explicit.

But what happens if you don’t have all of the inputs you need, or what if you don’t even know the algorithm? For complex problems in the world of data science, you often have a bunch of input data and associated outputs. You know there’s a correlation between these, but the dataset is too complex to work it out on your own.

Enter machine learning. ML is a subfield of Artificial Intelligence (AI) that is focused on instructing a machine to infer the algorithm for a particular problem given only a set of inputs and outputs. You’ll still program the machine — which is why I like to refer to ML as human teaching — but instead of being explicit, you provide implicit inputs and outputs, and ask the machine to calculate the best correlation between the two. This activity is called training.

After the training, when you provide a new input that the machine hasn’t seen yet, you ask it to give you an output. This is called inference because the machine is inferring a result not based on an algorithm, but its own internal measure of the correlation between sets of data.

For instance, let’s assume you don’t have a straight line equation to help determine y values given an input and a few other variables. In an ML use-case, you might only have a few arrays in a Python program.

x_vals = [-1, 0, 1, 2, 3, 4]
y_vals = [-3, -1, 1, 3, 5, 7]

With the help of ML, you can train a model that can accept any unknown x, and infer a corresponding y value based on what the model learned from the input data above. The infer function below is an example placeholder for functionality that you’ll cover in a little bit.

x_val = 0.7148662

y_val = infer(x_val) # Outputs 0.5588859
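To make the idea of inferring a relationship from data concrete, here is a minimal sketch that recovers a slope and intercept from the arrays above using ordinary least squares. This is a swapped-in, closed-form technique chosen for brevity, not what TensorFlow does during training, and the names fit_line and infer are hypothetical helpers for illustration.

```python
# Illustrative only: closed-form least squares, not TensorFlow's training loop.
x_vals = [-1, 0, 1, 2, 3, 4]
y_vals = [-3, -1, 1, 3, 5, 7]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


m, b = fit_line(x_vals, y_vals)


def infer(x):
    """Hypothetical stand-in for model inference: apply the fitted line."""
    return m * x + b


print(m, b)        # slope and intercept recovered from the data
print(infer(10))   # prediction for an x the fit never saw
```

Because these six points lie exactly on a line, the fit recovers a slope of 2 and an intercept of -1. A trained neural network only approximates those values, so its output can differ somewhat from 2x - 1.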

The mechanics by which a machine is trained, the approaches it uses, and more, are a subject too broad for this post.

If you want to learn more about machine learning in general, and Deep Learning, in particular, I recommend the book Grokking Deep Learning, by Andrew Trask.

Note: The important thing to remember is that traditional approaches are explicit and algorithmic, where ML is implicit and based on finding correlations between sets of data.

So what does this have to do with Microcontrollers?

Everything, actually! While it is true that some of what you can do on MCUs is about manipulating data with known equations (for instance, mapping an analog input value to a percentage range), much of the work happening in the IoT is about making sense of large, complex sets of sensor data gathered in real time. Imagine an accelerometer capturing the movement of a machine, or an electret microphone listening for an alarm. In these cases, you often cannot obtain an answer from your inputs with a simple equation.

Traditionally, this problem is solved by training ML models on powerful hardware, deploying those models on cloud infrastructure, and backhauling sensor data to the cloud for inference and decision-making.

But what if you could perform inference and make decisions closer to where your data is being collected? What if your MCUs could make sense of the data they collect and take action without ever needing to backhaul a single byte?

TensorFlow Lite makes this possible, and you can use the library yourself on Particle devices right now. Let’s dig into how.

Building an MCU-friendly model with TensorFlow

For the rest of this post, I’m going to walk through an end-to-end example of performing ML inferencing with TensorFlow Lite. I’ll be using the “toy” linear equation model discussed above. In a future post, I’ll cover a more complex use case with real sensor data. For now, this model is simple enough to allow you to explore the process — from model training to MCU execution — in a single post.

I’ve hinted at this already, but it’s important to point out here that TensorFlow Lite is a framework for performing real-time prediction (inference) on a pre-trained model on MCUs. The work of training and fine-tuning a model is still very much the domain of desktops, GPUs, and cloud servers. As such, the first step you need to take is to either find a pre-trained model that fits your use case, or train one yourself using a desktop, server, or cloud service.

For this example, I’ll use Google Colaboratory, a hosted, free, Python-based Jupyter notebook environment. This notebook in particular provides all of the steps needed to build a simple linear regression model for our straight line equation.

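import numpy as np
import tensorflow as tf

# Training data: points on the line y = 2x - 1
x = np.array([-1, 0, 1, 2, 3, 4], dtype=float)
y = np.array([-3, -1, 1, 3, 5, 7], dtype=float)

# A single dense layer with one unit learns one weight and one bias
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(x, y, epochs=200, verbose=1)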

This is the entire model, using the same x and y values I showed you before. Using TensorFlow’s support for Keras, you’ll create a single-layer sequential model, compile the model with some instructions for finding the best correlation between the data sets, and then fit the model. The fit process performs internal calculations to determine a correlation between x and y and repeats that for the number of epochs you specify, 200 in this case. If you’ve set your model up properly, it will get better each time through.
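To illustrate what happens on each epoch, here is a simplified sketch of gradient descent on the same data in plain Python. It is not Keras’s actual implementation: it assumes the weight and bias both start at zero (Keras initializes the weight randomly), assumes a learning rate of 0.01, and updates once per epoch using all six points.

```python
# Simplified stand-in for model.fit(): full-batch gradient descent on y = w*x + b.
xs = [-1, 0, 1, 2, 3, 4]
ys = [-3, -1, 1, 3, 5, 7]

w, b = 0.0, 0.0          # assumed starting point
learning_rate = 0.01     # assumed step size
epochs = 200
losses = []

for epoch in range(epochs):
    # Forward pass: prediction error for every x with the current parameters
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]

    # Mean squared error for this epoch
    losses.append(sum(e * e for e in errors) / len(xs))

    # Gradients of the mean squared error with respect to w and b
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)

    # Step against the gradient to reduce the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
print(f"learned w: {w:.2f}, b: {b:.2f}")
```

The loss shrinks on every pass, and after 200 epochs the parameters land near, but not exactly on, a slope of 2 and an intercept of -1.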

When you run this in Colab or locally in a Python environment, TensorFlow will output its progress.

Train on 6 samples
Epoch 1/200
6/6 [==============================] - 1s 222ms/sample - loss: 7.8501
Epoch 2/200
6/6 [==============================] - 0s 3ms/sample - loss: 6.3725
Epoch 3/200
6/6 [==============================] - 0s 2ms/sample - loss: 5.2060
Epoch 4/200
6/6 [==============================] - 0s 2ms/sample - loss: 4.2843
Epoch 5/200
6/6 [==============================] - 0s 2ms/sample - loss: 3.5553
…
Epoch 196/200
6/6 [==============================] - 0s 3ms/sample - loss: 0.0178
Epoch 197/200
6/6 [==============================] - 0s 3ms/sample - loss: 0.0174
Epoch 198/200
6/6 [==============================] - 0s 3ms/sample - loss: 0.0171
Epoch 199/200
6/6 [==============================] - 0s 2ms/sample - loss: 0.0167
Epoch 200/200
6/6 [==============================] - 0s 3ms/sample - loss: 0.0164

Notice that the loss value decreases with each epoch. Loss measures how far the model’s predictions were from the known y values in a given epoch (here, the mean squared error), so lower is better.
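As a worked example of the loss calculation, the sketch below computes mean squared error by hand for two candidate lines against the training data. The function name mean_squared_error and the candidate lines are chosen for illustration only.

```python
xs = [-1, 0, 1, 2, 3, 4]
ys = [-3, -1, 1, 3, 5, 7]


def mean_squared_error(predict, xs, ys):
    """Average of the squared differences between predictions and known values."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# A poor guess: y = x
loss_bad = mean_squared_error(lambda x: x, xs, ys)

# The line the data was generated from: y = 2x - 1
loss_exact = mean_squared_error(lambda x: 2 * x - 1, xs, ys)

print(loss_bad)    # 19 / 6, roughly 3.17
print(loss_exact)  # 0.0
```

Training is a search for the weight and bias that push this number as close to zero as possible.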

Once you have a trained model, the next step is to convert that model into something that TensorFlow Lite can work with. TensorFlow proper supports a number of different model file formats, while TensorFlow Lite supports only one, the tflite FlatBuffer format, which is optimized for size and thus perfect for constrained devices.

import pathlib

export_dir = 'saved_model/'
tf.saved_model.save(model, export_dir)

# Convert the model to an in-memory object
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()

# Write to a file
tflite_model_file = pathlib.Path('linear_regression_model.tflite')
tflite_model_file.write_bytes(tflite_model)

Once you have a TFLite model, you’ll need to convert it to a C array for use. Many MCUs (including Particle devices) do not have native filesystem support, so you cannot simply load the model from a file at runtime. The recommended way around this is to convert your model into a C array and compile it into your project.

On most operating systems, you can do this with the xxd command.

xxd -i linear_regression_model.tflite > linear_regression_model_data.cpp

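That will yield something that looks like this. Note that xxd derives the array and length names from the input file name (linear_regression_model_tflite and linear_regression_model_tflite_len), so the names below reflect a rename after generation: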

unsigned char g_linear_regresion_model_data[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, 0x00, 0x00, 0x12, 0x00,
    0x1c, 0x00, 0x04, 0x00, 0x08, 0x00, 0x0c, 0x00, 0x10, 0x00, 0x14, 0x00,
    0x00, 0x00, 0x18, 0x00, 0x12, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00,
   // Many more lines here
};
unsigned int g_linear_regresion_model_data_len = 780;
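If xxd is not available (for example, on a Windows machine without it installed), a few lines of Python can produce an equivalent C array. This is a minimal sketch: the function name to_c_array and the variable name passed to it are hypothetical, and the output layout only approximates what xxd -i emits.

```python
def to_c_array(data: bytes, name: str, per_line: int = 12) -> str:
    """Format raw bytes as a C unsigned char array plus a length variable."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("    " + ", ".join(f"0x{byte:02x}" for byte in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {name}[] = {{\n{body}\n}};\n"
        f"unsigned int {name}_len = {len(data)};\n"
    )


# Demo with the first eight bytes shown in the listing above
# (0x54 0x46 0x4c 0x33 is the ASCII identifier "TFL3").
sample = bytes([0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33])
c_source = to_c_array(sample, "g_model_data")
print(c_source)
```

To convert a real model, read the file with pathlib.Path('linear_regression_model.tflite').read_bytes() and write the returned string to a .cpp file.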

Using TensorFlow Lite for Microcontrollers

With a model in hand, you’re ready to use it in your Particle projects. First, you’ll want to install the TensorFlowLite library in your project directory. Then, you’ll add the following includes from the library to your project source:

#include "tensorflow/lite/experimental/micro/kernels/all_ops_resolver.h"
#include "tensorflow/lite/experimental/micro/micro_error_reporter.h"
#include "tensorflow/lite/experimental/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"

Next, include the C array version of the model.

#include "linear_regression_model_data.cpp"

Then, you’ll set up some objects for logging and load the model into memory.

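// Set up logging
tflite::MicroErrorReporter micro_error_reporter;
tflite::ErrorReporter *error_reporter = &micro_error_reporter;

// Map the C array into a usable model structure
const tflite::Model *model = tflite::GetModel(g_linear_regresion_model_data);

if (model->version() != TFLITE_SCHEMA_VERSION)
{
    error_reporter->Report(
        "Model provided is schema version %d not equal "
        "to supported version %d.",
        model->version(), TFLITE_SCHEMA_VERSION);

    return;
}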

Once the model is loaded, you need to configure some additional steps to load an operations resolver, instantiate an interpreter, and allocate tensors. I won’t duplicate those steps here, but if you’re interested in following along in code, check out the complete steps for the linear_regression example in the Particle TensorFlowLite library repository.

Invoking the model

Now for the fun part: inference! Once triggered, the application performs inference 100 times, passing a random float x value between 0 and 1 into the model each time, invoking the model, and obtaining the y result. After each run, the input (x) and output (y) values are logged to the serial console.

You’ll do this first by providing an x value to the input tensor of the model.

float x_val = randFloat(0, 1);

input->data.f[0] = x_val;

Then, you’ll run inference by calling the interpreter’s Invoke method and inspecting the result.

TfLiteStatus invoke_status = interpreter->Invoke();

if (invoke_status != kTfLiteOk)
{
    error_reporter->Report("Invoke failed on x_val: %f\n",
                           static_cast<double>(x_val));
    return;
}

If everything worked, you can grab the inferred y value from the output tensor.

float y_val = output->data.f[0];

Finally, you’ll log the x and y values to the serial console.

X Value: 0.14
Y Value: -0.52

X Value: 0.85
Y Value: 0.85

X Value: 0.71
Y Value: 0.57

X Value: 0.31
Y Value: -0.18

X Value: 0.96
Y Value: 1.07
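To see how close the on-device model gets, the sketch below compares the logged values above with the exact line the training data came from, y = 2x - 1. The numbers are copied from the serial output; the variable names are illustrative.

```python
# (x, y) pairs copied from the serial console output above
logged = [(0.14, -0.52), (0.85, 0.85), (0.71, 0.57), (0.31, -0.18), (0.96, 1.07)]

differences = []
for x, y_model in logged:
    y_exact = 2 * x - 1                # the line used to create the training data
    diff = y_model - y_exact
    differences.append(diff)
    print(f"x={x:.2f}  model={y_model:.2f}  exact={y_exact:.2f}  diff={diff:+.2f}")

largest = max(abs(d) for d in differences)
print(f"largest difference: {largest:.2f}")
```

Each prediction sits within about 0.2 of the exact line, which is consistent with training having stopped at a small but nonzero loss.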

When I was building this example, I added an Adafruit TFT display and plotted my straight line on the screen.

ML with Particle in action!

It’s so fast I captured a slow-motion version just so you can see the dots being drawn!

A slow-motion version of the linear regression model.

Note: The full source and instructions for this example can be found in the TensorFlowLite GitHub repository.

ML on MCUs is Here

Even with a toy example, it’s obvious that we’re closer to real “ML on the Edge” than ever before. Just imagine what you could create with some sensors and a trained model tied to your use case! In the next post, I’ll share one such case: using accelerometer data for gesture recognition.

In the meantime, check out the TensorFlowLite library and try some of the examples for yourself!


#2

Thank you for taking the time to write this! Impressive work and great read.
Gustavo.


#3

It certainly is impressive work and very well written. Thanks Will


#4

Thanks @armor and @gusgonnet!