Competing with TensorFlow Lite for Microcontrollers

Using the public magic wand dataset

For my first experiment, I used an official TensorFlow Lite MagicWand example.

For benchmarking, I used the trained TFLM model from the Arduino SDK and trained a Neuton model using the dataset in the mentioned repository.

Both models are validated on the same hold-out dataset and were tested on the same MCU (Arduino Nano 33 BLE Sense).

In the table below, you can see the resulting benchmarks. Models built with Neuton have advantages in several if not all aspects: accuracy, size, inference time, and most importantly, they are built automatically, hence also accelerating the time to market. Additionally, TensorFlow Lite for Microcontrollers (TFLM) and Neuton solutions were embedded into Arduino Nano 33 BLE Sense (CortexM4).

RESULTING BENCHMARKS
RESULTING BENCHMARKS


TensorFlow’s model was capable of recognizing only 3 classes: Wing, Ring, and Random, but it was not capable of recognizing Slope. On the other hand, Neuton successfully recognized all 4 classes. Below you can see the results of the experiments.

NUMBER OF CORRECTLY RECOGNIZED CLASSES
NUMBER OF CORRECTLY RECOGNIZED CLASSES

Collecting my own magic wand dataset

I decided to reproduce this experiment from the very beginning, namely from the data collection stage since in the previous experiment with the TF dataset, I could not control the data collection. The aim was to prove that the real-time accuracy on the device could be improved. I have used the Arduino Nano 33 BLE Sense for this experiment as well.

After training the Neuton model on the data I collected, I recorded the following real-time outcomes which showed great improvement in accuracy:

Step 1: Data Collection

The data collection included four classes: Wing, Slope, Ring, and Random class. Sensors reading were collected at 100 Hz frequency, with a sample duration of 2 seconds. So you get 200 rows of accelerometer sensor data for each sample. Check the instruction on how to perform the motion activity for each gesture.

NUMBER OF CORRECTLY RECOGNIZED CLASSES IN MY OWN MAGIC WAND DATASET

Procedure

The onboard accelerometer is the main sensor for this use case. The board is attached to the end of a stick/wand with a USB facing downwards, towards the place where you hold it so that the cable runs down the handle. Use sticky tape, a velcro strap, or some easy-to-remove method to attach the board and cable to the wand.

Step 1: Data Collection

The data collection included four classes: Wing, Slope, Ring, and Random class. Sensors reading were collected at 100 Hz frequency, with a sample duration of 2 seconds. So you get 200 rows of accelerometer sensor data for each sample. Check the instruction on how to perform the motion activity for each gesture.

Any other gestures apart from the above are treated as a random class. The codes for data collection are attached in the code section. Below is the summary of the collected training dataset.

In a similar way, collect validation dataset as well.

Step 2: Model Training

Upload your dataset on the Neuton platform and enable the following training settings:

Input data type: FLOAT32

Normalization type: Unified Scale for All Features

Window size: 2000ms

Frequency: 100 Hz

Feature Extraction Method: Digital Signal Preprocessing Neuton pipeline (enabled by default) which creates statistical features for each of the original features processing a window of 2000ms.

Feature Selection Method:

Bit depth: 8-bit

Once training is complete, you can check the metrics on the console. In my case, I got Accuracy: 0.977028.

From the confusion table, I can see that the model works really well across all target classes.

I got a model that weighs about 1 kb!

The total footprint of the solution embedded into the MCU is highlighted in the table below:

Step 3: Model Embedding and Inference

In this final step, I am going to embed the model to the device. Download your model as a Zip archive from the Neuton platform. Extract all the downloaded files and put them inside a folder named “src” in your project directory. It should look as shown below:

C:.
|   .DS_Store
|   MagicWandTestPP.ino
|
---src
    |   .DS_Store
    |   neuton.c
    |   neuton.h
    |   README.md
    |
    +---converted_models
    |       README.md
    |
    +---fe
    |   ---statistical
    |       |   Common.c
    |       |   Common.h
    |       |   DSP.h
    |       |   DSPF32.c
    |       |   DSPF32.h
    |       |
    |       ---fht
    |               fhtf32.c
    |               fhtf32.h
    |
    +---model
    |       dsp_config.h
    |       model.h
    |
    +---postprocessing
    |   ---blocks
    |       ---moving_average
    |               moving_average.c
    |               moving_average.h
    |
    +---preprocessing
    |   |   .DS_Store
    |   |
    |   ---blocks
    |       |   .DS_Store
    |       |
    |       +---normalize
    |       |       normalize.c
    |       |       normalize.h
    |       |
    |       ---timeseries
    |               timeseries.c
    |               timeseries.h
    |
    ---target_values
            multi_target_dict_csv.csv

To understand how the model will be functioning on the device, we will have a look at some important functions in the Arduino codes.

In the function, void OnDataReady(void *ctx, void *data), we pass the data buffer which contains the accelerometer sensor values for the defined window size using the callback neuton_model_window_size(). In this experiment, the window size is 2 seconds which means 200 rows of accelerometer data or 600 values of x, y and z acceleration axis to be fed into the neuton_model_set_inputs(inputs).To understand how the model will be functioning on the device, we will have a look at some important functions in the Arduino codes.

void OnDataReady(void *ctx, void *data)
{
input_t *inputs = (input_t*)data;
size_t inputsCount = neuton_model_inputs_count();
size_t windowSize = neuton_model_window_size();
for (size_t i = 0; i < windowSize; i++)
{
#ifndef NO_CALC
if (neuton_model_set_inputs(inputs) == 0)
{
float* outputs;
uint16_t index;
uint64_t start_time = micros();
if (neuton_model_run_inference(&index, &outputs) == 0)
{

uint64_t stop_time = micros();

 After the neuton_model_run_inference(&index, &outputs) callback is invoked, the model returns the output predictions. We use a custom function called NeutonPostprocessingBlockMovingAverageProcess(&avg, outputs, &outputs, &index) to accept probabilities from the neural network, average these probabilities, and return averaged probabilities if one class reaches a threshold (we have set a threshold to 0.95f). Also, if no gesture is recognized (index 3), we cancel suppression.

if (NeutonPostprocessingBlockMovingAverageProcess(&avg, outputs, &outputs, &index) == 0)
{
if (index == 0)
{
Serial.print(" Ring: O      ");
}
if (index == 1)
{
Serial.print("Slope: /_     ");
}
if (index == 2)
{
Serial.print(" Wing: \/\/   ");
}
if (index == 3)
{
//             Serial.print("Ooops! Nothing");
avg.suppressionCurrentCounter = 0;
continue;
}
Serial.print("     score: ");
Serial.println(outputs[index], 2);
}
}
}
#endif

inputs += inputsCount;
}

Conclusion

This remake of a popular case vividly demonstrates that a magic wand can be several times faster, smaller and more accurate than in the original experiment. The use of Neuton empowers the model to have better results on such ML applications. It's amazing how quickly the TinyML technology develops. A year ago, such experiments seemed pure magic, but now any schoolboy can produce similar solutions automatically.