-
Rev2 board design in progress (for motion)
12/15/2024 at 17:29

The first board version had provisions for both sound-based machine learning (using an analog MEMS microphone) and motion-based machine learning (using a digital MEMS accelerometer). However, to stay below the 1 USD BOM target, only one of these options can be populated.
As I have worked more on motion-type use-cases lately, I saw the need for a dedicated board revision for this use-case. The main pain points were the large size and the lack of a battery.

Revision 2 board for motion
The main changes are as follows:
- Size is now 20x40 mm (including USB Type A for charging), down from 26x52 mm.
- Has a connector for a LIR button cell battery; LIR1220 or LIR1632 should both fit.
- Has a button in addition to the RGB LED, to support basic user interactions.
Also fixed a couple of design issues, such as the missing 3.3V regulator (I thought running off the battery would be OK, but the voltage was too high for some components).
I might do a couple more tests on the rev1 board to make sure there are no other issues. But otherwise, I think this is ready to send to production.
Human Activity Detection with accelerometer
Over the last 6 months I have worked mainly on the MicroPython support in emlearn. This is now getting into a usable shape, and I am focusing on practical examples and demos. One of them analyzes accelerometer data for Human Activity Detection. The example can detect/classify activities such as walking/standing/lying (using a standard dataset), or one can collect custom data to implement exercise classification (squats/jumping jacks/lunges/other). This is a good starting point for our low-cost board: we can port the feature extraction code from MicroPython to C, and then collect more data to enable specialized use-cases.
Transmitting raw accelerometer data over BLE advertisements?
For training, we will need to collect raw accelerometer samples. I want to transmit this data over BLE, ideally at a 50 Hz sample rate, though 25 Hz might also be acceptable. However, the BC7161 supports only the advertisement part of the BLE stack, and a single advertisement has only 29 bytes of payload. With 3 axes at 8 bits each (3 bytes per sample), that is 9 samples per advertisement: roughly 0.36 seconds of data at 25 Hz, and only 0.18 seconds at 50 Hz. That is theoretically doable with advertisements, but it is not typical to change the advertisement data so rapidly - so we will have to see if it works in practice when connected to a phone or computer.
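As a sanity check on these numbers, a small calculation sketch (the 29-byte payload and the 8-bit-per-axis packing are the assumptions discussed above):

```python
PAYLOAD_BYTES = 29        # usable payload of a single BLE legacy advertisement
AXES = 3
BYTES_PER_AXIS = 1        # samples reduced to 8 bits per axis

samples_per_adv = PAYLOAD_BYTES // (AXES * BYTES_PER_AXIS)   # 9 full samples
for rate_hz in (25, 50):
    seconds = samples_per_adv / rate_hz
    # how often the advertisement data must be refreshed to keep up
    print(f"{rate_hz} Hz: {samples_per_adv} samples per advertisement = {seconds:.2f} s")
```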
The backup plan is to use a cable to a computer for data collection, or to have a more powerful device piggyback on the extension headers to store or transmit the data, over a BLE connection (not just advertisements) or WiFi.

More RAM, same cost
The Puya PY32F003 is now available from LCSC with 8 kB RAM and 64 kB FLASH. That is double what we had previously, at the same cost (15 cents @ 3k). Specifically, I switched to the PY32F003F16U6, which comes in QFN-20 - so it also takes much less space than the TSSOP20 used on the previous board. The extra RAM is not critical for analyzing accelerometer data, but it will come in very handy for audio analysis (which was a little cramped in 4 kB RAM).
Designing a rev2 board for audio will probably come some time later though.
-
Audio streaming over serial + USB
06/17/2024 at 18:16

TLDR: Audio data can be streamed to a computer over serial-to-USB. And using a virtual device in ALSA (or similar), we can record from the device as if it were a proper soundcard/microphone.
In a previous post we described the audio input of the prototype board, using the Puya PY32F003 microcontroller. It consists of a 10 cent analog MEMS microphone, a 10 cent operational amplifier, and the internal ADC of the PY32. To check the audio input, we need to be able to record some audio data that we can analyze.
The preferred way to record audio from a small microcontroller system would be to implement audio over USB using the Audio Device Class, and record on a PC (or an embedded device like a Raspberry Pi). This ensures plug & play with all operating systems, without needing any drivers. Alternatively, one could output the audio from the microcontroller on a standard audio protocol such as I2S, and then use a standard I2S-to-USB device to get the data onto the computer. Example: MiniDSP USBStreamer.
However, the Puya PY32F003 (like most other sub-1 USD microcontrollers) supports neither USB nor I2S. So instead we will stream the audio over serial, and use a serial-to-USB adapter to get it onto the PC. This requires some custom code, as there are no standards for this (to my knowledge at least).

Streaming audio over serial
Since the serial stream is also our primary logging stream, it is useful to keep it as readable text. This means that binary data, such as the PCM audio, must be encoded. There are several options here; I just went with the most widely supported, base64. It is a bit wasteful (33% size increase), but good enough for our usage.
The default baudrate of 115200 in the PY32 examples, on the other hand, will not do. The bandwidth needed for an 8 kHz sample rate of 16-bit PCM, base64 encoded, is at least 2*8000*(4/3)*8 = 170 kbaud (ignoring overheads for message framing). Furthermore, the standard printf/serial communication is blocking: any time spent sending serial data is time the CPU cannot spend on other tasks. It would probably be possible to set up DMA buffering here, but that would add complexity.
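The bandwidth estimate as a quick calculation:

```python
SAMPLERATE = 8000       # Hz
BYTES_PER_SAMPLE = 2    # 16-bit PCM
BASE64_OVERHEAD = 4/3   # every 3 bytes become 4 characters
BITS_PER_BYTE = 8       # ignoring start/stop bits and message framing

baud = SAMPLERATE * BYTES_PER_SAMPLE * BASE64_OVERHEAD * BITS_PER_BYTE
print(f"minimum baudrate: {baud/1000:.0f} kbaud")   # ~171 kbaud
```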
I tested the PY32 together with an FTDI serial-to-USB cable. It worked at least up to 921600 baud, which is ample.

The messages going over the serial port look like this. The data part is base64 encoded PCM for a single chunk of int16 PCM audio.
audio-block seq=631 data=AAD///7/AgAAAP3//f/8//3/AwACAAYAAAD//wAAAgAAAAgAAAD//wAA///+/wIA///+//7///8CAAAACAD///3////8/wMA//8AAAIAAgD9/wEACAAAAAEAAAAGAAAAAAACAP//BAD9//3/FwABAP7///8AAAQA/v8CAP7/AAD9/wEA/f8GAAIAAAD6//3/AAAHAAQA+f/e/wEA/v8AAAAA/v/+//3/AAAGAAIAAAD+/wYAAAABAP//AAAAAP7/AAD+//r//v/+/wEA/f/9/wAA/f/+////AAABAAYAAAD9/wAAAQABAAAA/v8GAAAAAQD+/wAAAAAAAAwAAgAAAA==
Receiving the data is done with a Python script, using pyserial. The script identifies which of the serial messages are PCM audio chunks, and then decodes and processes them. Other messages from the microcontroller are logged as-is.
Virtual soundcard using loopback
Getting the audio into our script on the PC side is useful. But preferably, we would like to use standard audio tools, and not have to invent everything ourselves. So the processing script takes the received audio data and writes it to an output sound device, using the sounddevice library. This allows playing it back on a speaker, for simple spot checking. But even more useful is to use a loopback device, to get a virtual sound card for our device.
I tested this using ALSA loopback (the snd-aloop kernel module), which creates a pair of ALSA devices. The script writes to one device, and a standard program that supports ALSA (which is practically everything on Linux) can read the audio stream from the other device.
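The actual script is User/log.py in the repository; below is a minimal sketch of the same idea, assuming the message format shown earlier, with pyserial for input and sounddevice for output (the device name is system-specific):

```python
import base64

import serial            # pyserial
import sounddevice as sd

SERIAL_PORT = '/dev/ttyUSB0'   # adjust to the serial-to-USB adapter used
OUTPUT_DEVICE = 'hw:3,0'       # playback end of the ALSA loopback; system-specific
SAMPLERATE = 8000

ser = serial.Serial(SERIAL_PORT, baudrate=921600, timeout=1.0)
out = sd.RawOutputStream(samplerate=SAMPLERATE, channels=1,
                         dtype='int16', device=OUTPUT_DEVICE)
out.start()

while True:
    line = ser.readline().decode('ascii', errors='replace').strip()
    if line.startswith('audio-block'):
        # Message format: audio-block seq=N data=<base64 of int16 PCM>
        payload = line.split('data=', 1)[1]
        pcm = base64.b64decode(payload)
        out.write(pcm)
    elif line:
        print(line)    # other log messages are passed through as-is
```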
```
# read data from serial, output to ALSA virtual device
python User/log.py --serial /dev/ttyUSB0 --sound 'hw:3,0'

# record audio from ALSA virtual device
arecord -D hw:3,1 -f S16_LE -c 1 -r 8000 recording.wav
```
Note: there is nothing ALSA-specific about the Python script, so this approach should also work with other sound systems that support virtual devices, such as PulseAudio/PipeWire on Linux, or equivalents on macOS and Windows.
Audio recording using ADC with PY32
Audio recording must be done at high sample rates (8 kHz+) with precise timing (no/minimal jitter). For that, we use the timer peripheral in the PY32, and wire it directly to the DMA subsystem. This way, the CPU and program are not involved in sampling at all, and we get the data as convenient blocks of a size we specify (ex: 64 samples). Each block is pushed onto a queue in the DMA interrupt, and can be processed at a leisurely pace in the main loop.
We used this example in the PY32F0 template repository as a starting point: ADC SingleConversion Trigger DMA.

Recorded audio
The following audio was recorded by playing back a song on a phone, with the headphone jack connected to the ADC of a standard PY32F003 development board.
In the spectrogram, we can see the voice relatively clearly. However, there is also a bunch of noise. In particular, there is a lot of tonal noise, and occasional dropouts.

The tonal noise disappeared in one of the recordings, so it might be electromagnetic interference, for example over the USB, or from the multitude of switching power supplies around the device under test.
The occasional dropouts might be due to overflows in the audio processing queue, either on the microcontroller or on the host side. More buffering might fix it.

Next
Now we can run more tests of the audio input to debug these issues, with a clean power supply and an oscilloscope at hand. Eventually, we also need to include the on-board amplifier and microphone in the tests. It would be good to have some basic measurements of the frequency response, sensitivity, and noise floor.
And we also need to write firmware that can process the audio stream and run a Machine Learning classifier.

-
Front page news
05/02/2024 at 19:38

This project recently hit the Hackaday front page. So this seems a good time for a quick update, and maybe some clarifications.
What this project is
1. Research into ultra-low cost hardware for TinyML systems.
The motivation is to explore what is possible within an artificially constrained budget, and what the implications of the computational constraints of such an environment are on the software and ML side.
2. Testing grounds for the emlearn open-source software package. The software is mostly used on slightly more powerful microcontrollers, typically 0.5-5 USD for just the microcontroller, and similar amounts in sensors. But trying to scale down is a good torture test.
What this project is not
1. NOT a good starting point for getting into ML on microcontrollers and sensors (TinyML).
For that, I recommend getting much beefier hardware, like an ESP32 with several megabytes of RAM and FLASH. That will be a lot more practical and fun. AdaFruit, Seeed Studio, Sparkfun, Olimex etc. all have good options. Arduino with TensorFlow Lite for Microcontrollers is probably still the most practical software starting point. I am working on MicroPython bindings for emlearn, which aim to be super accessible, but that project is still in its very early days.
2. NOT a ready-to-run board
Current rev0 boards have just been through basic HW bringup - with several critical problems for actual usage. But they look to be enough to continue testing on - which is all that matters for a rev0 board. A new board revision will come some time in the summer, after I have had time to test and develop some more. That might actually be usable, if we are lucky.
The BLE driver and firmware are also just skeletons at this point in time.

News
CNN running on PY32. I have been testing some Convolutional Neural Networks on the Puya PY32. I was able to port TinyMaix successfully, and run a 3-layer CNN that takes 28x28 dimensional input. This complexity would be suitable for simple audio recognition - which is of interest in this project. However, it used 2 kB RAM and 25 kB FLASH - leaving only 2 kB RAM and 7 kB FLASH for the rest of the system. That would be a tight squeeze... But the authors claim the AVR8 port used only 12 kB FLASH - so maybe it can be optimized down. To be investigated...
emlearn + MicroPython presentation at PyData Berlin. The slides are available. Video is to be published in the coming weeks, I believe.
Going to TinyML EMEA 2024 in Milano, Italy in June. I will be presenting about the emlearn TinyML software project. And maybe also a little bit about this hardware project :)
-
Audio input for 20 cents USD
02/25/2024 at 15:40

TLDR: Using an analog MEMS microphone with an analog opamp amplifier, it is possible to add audio processing to our sensor.
The added BOM cost for audio input is estimated to be 20 cents USD.
A two-stage amplifier with software-selectable high/low gain is used to get the most out of the internal microcontroller ADC.
The quality is not expected to be Hi-Fi, but should be enough for many practical Audio Machine Learning tasks.

Ultra low cost microphones
The go-to options for a microphone in a microcontroller-based system are digital MEMS (PDM/I2S/TDM protocol), analog MEMS, or an analog electret microphone.
The ultra low cost microcontrollers we have found do not have peripherals for decoding I2S or PDM. It is sometimes possible to decode I2S or PDM using fast interrupts/timers or a SPI peripheral, but usually at considerable difficulty and CPU usage. Furthermore, the cheapest digital MEMS microphone we were able to find costs 66 cents. That is too large a part of our 100 cent budget, so a digital MEMS microphone is ruled out.
Below are some examples of analog microphones that could be used. All prices are in quantity 1k, from LCSC.
Analog MEMS, SMD mount
- LinkMEMS LMA2718T421-OA5 0.06 USD
- LinkMEMS LMA2718T421-OA1 0.08 USD
- Goertek S15OT421-005 0.09 USD
- CUI CMM-2718AT-42316-TR 0.47 USD
Analog electret, capsule
- INGHAi GMI6050 0.09 USD
- INGHAi GMI9767 0.09 USD
So there looks to be multiple options within our budget.
The sensitivity of the MEMS microphones is typically -38 dBV to -42 dBV, with noise floors of around 30-39 dB(A) SPL.

Analog pre-amplifier
Any analog microphone will need an external pre-amplifier to bring the output up to a suitable level for the ADC of the microcontroller. An opamp-based pre-amplifier is the go-to solution for this. The requirements for a suitable opamp can be found using the guide in Analog Devices AN-1165, Op Amps for MEMS Microphone Preamp Circuits.
The key criteria, and their implications on opamp specifications, are as follows:
- achieve the necessary gain (Gain Bandwidth Product)
- not introduce noise (Input Noise Density)
- flat frequency response (Gain Bandwidth Product)
- not introduce too much distortion (Slew Rate, THD)
Furthermore, it must work at the voltages available in the system, typically 3.3V from a regulator, or 3.0-4.2V from a Li-ion battery.
ADC considerations
The standard bit depth for audio is 16 bits, or 24 bits for high-end audio. To cover the full audible range, the sample rate should be 44.1/48 kHz. However, for many Machine Learning tasks 16 kHz is sufficient. Speech is sometimes processed at just 8 kHz, so this can also be used.
The Puya PY32F003 datasheet specifies power consumption at 750k samples per second. However, an ADC conversion takes 12 cycles, and the ADC clock is only guaranteed to be 1 MHz (typical is 4-8 MHz). That leaves 83k samples per second in the worst case, which is still sufficient for audio. In fact, we could use an oversampling ratio of 4x or more - if we have enough CPU capacity.
The ADC resolution is specified as 12 bits. This means a theoretical max dynamic range of 72 dB (approximately 6 dB per bit). However, some of the lower bits will be noise, reducing the effective bit depth. Realistically, we are probably looking at an effective bit depth between 10 bits (60 dB) and 8 bits (48 dB). Practical sound levels at a microphone input vary quite a lot: the sound sources of interest may differ greatly in loudness, and the distance from source to sensor also has a large influence. Especially with a low dynamic range, this is a challenge: if the input signal is low, we will have a poor Signal to Noise Ratio, due to quantization and ADC noise. And if the input signal is high, we risk clipping due to maxing out the ADC.
Finding the gain
The gain is a critical parameter for amplifier design, as it influences almost all other requirements. Let us look at speech as a reference. Normal speech level at 3 meters is approximately 50 dB(A) SPL, going up to 90 dB(A) SPL for shouting up close. These are short-time average levels, and because the sound pressure is not constant, the max level (which the system also needs to represent) is quite a lot higher.
Given a microphone with a sensitivity of -38 dBV, and allowing for 20 dB headroom, the ideal gains would be between 65 dB (1800x) and 25 dB (18x), as computed in the sketch and table below.
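The table below can be reproduced with a short calculation. The 1.5 V peak (3.0 Vpp) ADC full-scale is my assumption for the target level; the sensitivity and headroom are as stated above:

```python
MIC_SENSITIVITY = -38.0      # dBV at the 94 dB SPL reference level
HEADROOM = 20.0              # dB between average level and clipping
ADC_FULLSCALE_DBV = 3.52     # 20*log10(1.5), assuming 1.5 V peak (3.0 Vpp)

for level in (50.0, 60.0, 70.0, 80.0, 90.0):           # sound level, dB SPL
    mic_output = MIC_SENSITIVITY + (level - 94.0)      # dBV at the mic output
    gain = ADC_FULLSCALE_DBV - HEADROOM - mic_output   # required preamp gain, dB
    print(f"{level:.1f} dB SPL -> {gain:.2f} dB gain")
```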
| level (dB SPL) | preamp gain (dB) |
| --- | --- |
| 50.0 | 65.52 |
| 60.0 | 55.52 |
| 70.0 | 45.52 |
| 80.0 | 35.52 |
| 90.0 | 25.52 |

A two-stage amplifier with selectable gain
Integrated circuits for operational amplifiers come with either 1, 2, or 4 opamps. It turns out that a chip with 2 opamps can be had for basically the same price as 1. It is generally a good idea to split amplification into multiple stages, as this is less likely to hit the limits of the Gain Bandwidth Product of the opamp. However, in this case we also get another, more important benefit: the ability to have two different gains. By providing both to the microcontroller as separate ADC channels, we can switch between them in software. This can either be used statically, as a high/low switch, or dynamically by monitoring the inputs, as a very crude form of Automatic Gain Control (AGC).
Selecting the operational amplifier
Now we know all the parameters to select the opamp.
- Gain needed. Up to 40 dB / 100x (per stage).
- Bandwidth. Audio range, 20 kHz.
- Mic noise floor. -102 dBV
- Output voltage. 3.0V peak to peak
From this we can compute the key opamp specs. The equations are covered in the reference design guide from Analog Devices linked previously; see the sketch after this list. We need something that has:
- Gain Bandwidth Product (GBP): 2 MHz
- Noise density: 20 nV/√Hz
- Slew rate: 0.25 V/µs
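A sketch of the arithmetic behind these targets (the underlying equations are from the AN-1165 approach referenced above; the exact margins applied are my assumption):

```python
import math

GAIN = 100.0            # 40 dB, max gain per stage
BANDWIDTH = 20e3        # Hz, audio range
V_PEAK = 1.5            # V, half of the 3.0 V peak-to-peak output

# Gain Bandwidth Product: gain times bandwidth must fit within the GBP
gbp_hz = GAIN * BANDWIDTH
print(f"GBP >= {gbp_hz/1e6:.1f} MHz")                 # 2.0 MHz

# Slew rate for a full-scale sine at the top of the audio band
slew = 2 * math.pi * BANDWIDTH * V_PEAK               # V/s
print(f"Slew rate >= {slew/1e6:.2f} V/us")            # ~0.19, round up to 0.25

# Noise: amplifier noise density should sit well below the mic noise floor
mic_noise_v = 10 ** (-102.0 / 20.0)                   # -102 dBV -> volts RMS
mic_density = mic_noise_v / math.sqrt(BANDWIDTH)      # ~56 nV/sqrt(Hz)
print(f"Mic noise density: {mic_density*1e9:.0f} nV/sqrt(Hz)")
```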
I reviewed a bunch of cheap opamps at LCSC that can run on the relevant voltages. Their specifications can be seen in the following table:
| part | manufacturer | cost (USD) | current (mA) | noise density (nV/√Hz) | slew rate (V/µs) | GBP (MHz) |
| --- | --- | --- | --- | --- | --- | --- |
| LMV321IDBVR | UMW | 0.0290 | 0.06 | 27.0 | 0.52 | 1.00 |
| TLV333 | TI | 0.2085 | 0.06 | 55.0 | 0.16 | 0.35 |
| LM321LVIDBVR | TI | 0.0320 | 0.09 | 40.0 | 1.50 | 1.00 |
| GS8621 | Gainsil | 0.0702 | 0.25 | 18.0 | 1.66 | 3.00 |
| GS8721 | Gainsil | 0.0847 | 1.50 | 12.0 | 9.00 | 11.00 |
| LMV721 | Tokmas | 0.0732 | 1.50 | 11.5 | 9.00 | 11.00 |

We see that the commodity low-cost, low-power LMV321-type chips are slightly out of spec, in both noise density and gain bandwidth product. The LMV721 class of devices has more-than-good-enough performance. The GS8621 is a good alternative with lower power consumption.
Audio input BOM
- Microphone: Goertek S15OT421-005, 0.0888 USD
- Opamp: Gainsil GS8632, 0.0789 USD

Total of 16 cents, rounding up to 20 cents with capacitors and resistors.
Next
Now that we have established that the hardware should be able to receive the audio, we need to validate that we can process the audio signal with our rather weak microcontroller. That will be the topic of an upcoming post.

-
Board bringup of first prototypes successful
02/25/2024 at 12:25

First prototype boards arrived this week.
Over the weekend I did basic tests of all the subsystems:
- Charger/regulator. Voltage levels
- Microphone+amp. Gain, noise
- Microcontroller. Flashing, toggle GPIO pin
- BLE module. I2C communication
- Accelerometer. I2C communication
- LEDs. Blink
As always with a first revision, there are some issues here and there. But thankfully all of them have usable workarounds, so we can develop with this board.

Examples of issues identified:
- LEDs are mounted the wrong way. Flip them or use external pins on header
- Battery charger output of 4.2V is too high for the BLE module and MEMS mic. Use an external 3.3V regulator
- MEMS mic did not work. Use an external electret mic
Next step will be to write some more firmware to validate more in detail that the board is functional. This includes:
- Driver for Holtek BC7161 BLE module (I2C)
- Driver for ST LIS3DH accelerometer (I2C)
- ADC readout for audio input
-
Development board sent to production
02/11/2024 at 11:09

I made an initial development board. It supports both sound-based and accelerometer-based ML tasks, as well as using the LEDs as a color detector. This board is intended for developing and validating the tech stack; further cost-optimization will happen in later revisions.
These are the key components:
- Microcontroller. Puya PY32F003
- BLE beacon transmitter
- Accelerometer. ST LIS3DH
- Microphones. Top-port or bottom-port MEMS, or external electret capsule
- Battery charger for LiPo/Li-Ion cells
- USB Type A connector for power/charge
For Bluetooth Low Energy, we are using a pre-built and FCC-certified module, the Holtek BM7161. This is a simple module based around the low cost BC7161 chip.

An initial batch of 10 boards has been ordered from JLCPCB.
Also did a check of the BOM costs. At 200 boards, the components except for passives cost:
- With microphone: 0.66 USD per board
- With accelerometer: 0.825 USD per board
Additionally, there are around 20 capacitors, 1 small inductor, and 20 resistors needed.
This is estimated to be between 0.15 - 0.20 USD per board.
So it looks feasible to get below the 1 USD target BOM, for as low as 200 boards.

Also designed a small 3D-printed case, with holes for the microphone and LEDs / light sensor.
-
Activity Recognition using accelerometer with tree-based ML models
01/21/2024 at 22:33

Summary/TLDR
This looks to be just-barely-doable on the chosen microcontroller (4 kB RAM and 32 kB FLASH).
Expected RAM usage is 0.5 kB to 3.0 kB, and FLASH usage between 10 kB and 32 kB.
There are accelerometers available that add 20 to 30 cents USD to the Bill of Materials.
Random Forest on time-domain features can do a good job at Activity Recognition.
The open-source library emlearn has an efficient Random Forest implementation for microcontrollers.

Applications of Activity Recognition
The most common sub-task for Activity Recognition using accelerometers is Human Activity Recognition (HAR). It can be used for Activities of Daily Living (ADL) recognition such as walking, sitting/standing, running, biking etc. This is now a standard feature on fitness watches and smartphones etc.
But there are ranges of other use-cases that are more specialized. For example:
- Tracking sleep quality (calm vs restless motion during sleep)
- Detecting exercise type and counting repetitions
- Tracking activities of free-roaming domestic animals
- Fall detection etc as alerting system in elderly care
And many, many more. So this would be a good task to be able to do.
Ultralow cost accelerometers
To have a sub 1 USD sensor that can perform this task, we naturally need a very low cost accelerometer.
Looking at LCSC (in January 2024), we can find:
- Silan SC7A20 0.18 USD @ 1k
- ST LIS2DH12 0.26 USD @ 1k
- ST LIS3DH 0.26 USD @ 1k
- ST LIS2DW12 0.29 USD @ 1k
The Silan SC7A20 chip is said to be a clone of LIS2DH.
So there looks to be several options in the 20-30 cent USD range.
Combined with a 20 cent microcontroller, we are still below 50% of our 1 dollar budget.

Resource constraints
It seems that our project will use a 32-bit microcontroller with around 4 kB RAM and 32 kB FLASH (such as the Puya PY32F003x6). This sets the constraints that our entire firmware needs to fit inside. The firmware needs to collect data from the sensors, process the sensor data, run the Machine Learning model, and then transmit (or store) the output data. We would like to use under 50% of RAM and FLASH for buffers and model combined, so under 2 kB RAM and under 16 kB FLASH.
Overall system architecture
We are considering an ML architecture where accelerometer samples are collected into fixed-length windows (typically a few seconds long) that are classified independently. Simple features are extracted from each of the windows, and a Random Forest is used for classification. The entire flow is illustrated in the following image, which is from A systematic review of smartphone-based human activity recognition methods for health research.
This kind of architecture was used in the paper Are Microcontrollers Ready for Deep Learning-Based Human Activity Recognition? The paper shows that it is possible to perform similarly to a deep-learning approach, but with resource usage that is 10x to 100x better. They were able to run on Cortex-M3, Cortex-M4F and Cortex-M7 microcontrollers with at least 96 kB RAM and 512 kB FLASH. But we need to fit into 5% of that resource budget...
RAM needs for data buffers
The input and intermediate buffers tend to take up a considerable amount of RAM, so an appropriate tradeoff between sampling rate, precision (bit width) and length (in time) needs to be found. Because we are continuously sampling while also processing the data, double-buffering may be needed. The following table shows the RAM usage for input buffers holding the sensor data from an accelerometer. The first two configurations were used in the previously mentioned paper:
| buffers | channels | bits | samplerate (Hz) | duration (s) | samples | size (bytes) | percent of 4 kB |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 3 | 16 | 100 | 1.28 | 128 | 1536 | 37.5% |
| 2 | 3 | 16 | 100 | 2.56 | 256 | 3072 | 75.0% |
| 2 | 3 | 8 | 50 | 1.28 | 64 | 384 | 9.4% |
| 2 | 3 | 8 | 50 | 2.56 | 128 | 768 | 18.8% |
| 1.25 | 3 | 8 | 50 | 2.56 | 128 | 480 | 11.7% |

16 bits is the typical full range of accelerometers, so it preserves all the data. It may be possible to reduce this down to 8 bits without sacrificing much performance. This can be done by scaling the data linearly, or by implementing a non-linear transform such as square-root or logarithm to reduce the range of values needed.
Using a 50 Hz sampling rate is also very beneficial for reducing RAM usage. Assuming that feature processing is quite fast, it should also be possible to avoid full double-buffering (cf. the 1.25-buffer row in the table). It may also be possible to keep a buffer of computed features (much smaller in size) for the windows and classify them together. This would allow reducing the window size, while maintaining information from a similar amount of time, in order to keep performance up.
So it seems feasible to find a configuration under 2 kB RAM that has good performance.
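A small sketch reproducing the buffer-size arithmetic from the table above:

```python
RAM_BYTES = 4096   # total RAM on the PY32F003x6

def input_buffer_size(buffers, channels, bits, samplerate, duration):
    """RAM needed for accelerometer input buffers, in bytes."""
    samples = int(samplerate * duration)
    return samples, buffers * channels * (bits // 8) * samples

configurations = [
    (2,    3, 16, 100, 1.28),
    (2,    3, 16, 100, 2.56),
    (2,    3,  8,  50, 1.28),
    (2,    3,  8,  50, 2.56),
    (1.25, 3,  8,  50, 2.56),   # partial double-buffering
]
for cfg in configurations:
    samples, size = input_buffer_size(*cfg)
    print(cfg, f"{samples} samples", f"{size:.0f} bytes",
          f"{100.0 * size / RAM_BYTES:.1f}%")
```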
Feature extraction
In the previously referenced paper, they used 9 features, computed as simple statistics directly on each window; a sketch of this kind of feature extraction follows below.
No FFT or similar heavy processing is used. This should have negligible RAM usage (under 256 bytes) and FLASH usage (under 5 kB).
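As an illustration, per-window time-domain statistics can be computed along these lines (the exact 9 features from the paper are not reproduced here; this only shows the shape of the computation):

```python
import numpy as np

def window_features(window):
    """Simple time-domain statistics for one accelerometer window,
    shaped (samples, 3 axes). Illustrative feature set only."""
    features = []
    for axis in range(window.shape[1]):
        x = window[:, axis]
        features += [x.mean(), x.std(), x.min(), x.max()]
    return np.array(features)

# Example: one 2.56 second window at 50 Hz
window = np.random.randn(128, 3).astype(np.float32)
print(window_features(window))   # 12 features
```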
Random Forest classifier, FLASH requirements

The previously mentioned paper tested using 10-100 trees, with a max_depth of 9. The authors found that more than 50 trees gave only marginal improvements in F1 score. They reported that 10 trees used 10 kB FLASH, and 50 trees around 50 kB FLASH.
However, it does not appear that they did any hyperparameter optimization to find smaller models. Therefore, I forked their git repository with the experiments and added my own tuning. I varied the depth of the trees and the number of features per tree, in order to see if a smaller number of trees could suffice. Both reducing the number of trees and the depth helps to reduce model size. The code can be found at https://github.com/jonnor/feature-on-board-activities, and the results are in the following image:
The original performance (50 trees, max_depth=9) is marked with a green dot (on the right side). We can see that 5 trees can just barely hit the same performance levels, and that 10 trees is able to improve the performance. The model size can be 4-10x smaller with no or marginal degradation in performance.

The size estimates in the plot are for the "loadable" inference strategy in emlearn. Benchmarks show that when using the "inline" inference strategy with 8-bit integers, model size is approximately halved. So the models that take 20 kB in the plot should in practice take around 10 kB.
At 10 kB these models are able to match the performance of the untuned model (which matched the deep-learning baselines), and fit in our 16 kB FLASH budget.
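A minimal sketch of producing such a model with scikit-learn and converting it with emlearn's documented convert/save workflow (the dataset here is a random placeholder; real training would use the window features and activity labels):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
import emlearn

# Placeholder data: rows of window features, and activity class labels
X = np.random.randn(500, 12)
y = np.random.randint(0, 4, size=500)

model = RandomForestClassifier(n_estimators=10, max_depth=9)
model.fit(X, y)

# 'inline' code generation is roughly half the size of 'loadable'
cmodel = emlearn.convert(model, method='inline')
cmodel.save(file='activity_model.h', name='activity_model')
```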
-
Ultra low cost microcontrollers
01/20/2024 at 01:01

If the complete BOM for the sensor is to be under 1 USD, the microcontroller needs to cost way less than this. Preferably below 25% of the budget, to leave room for sensors, power and communication.
Thankfully, there have been a lot of improvements in this area over the last few years. Looking at LCSC.com, we can find some interesting candidates:
- ST STM32G030F6P6. 32 kB FLASH / 8 kB RAM. `0.30 USD @ 1k`
- WCH CH32V003. RISC-V. 16 kB FLASH / 2 kB RAM. 48 MHz, QFN-20. `0.15 USD @ 1k`
- Puya PY32F003x6. 32 kB FLASH / 4 kB RAM. `0.13 USD @ 1k`
- Puya PY32F002. Cortex-M0+. 20 kB FLASH / 3 kB RAM. 24 MHz. `0.10 USD @ 1k`
- Padauk PFS154. 2 kB FLASH / 128 bytes RAM. `0.06 USD @ 1k`
- Fremont Micro Devices FMD FT60F011A-RB. 1 kB FLASH / 64 bytes RAM. `0.06 USD @ 1k`
There are also a few sub-1 USD microcontrollers with integrated wireless connectivity:
- WCH CH582F. Bluetooth Low Energy. `0.68 USD @ 1k`
Implications for 1 dollar TinyML project
It looks like if we budget 10-20 cents USD to the microcontroller, then we get around:
- 16-20 kB FLASH
- 2-4 kB RAM
- 24-48 Mhz clock speed
- 32 bit CPU
- No floating-point unit (FPU)
At this price point, the WCH CH32V003 and the Puya PY32F003x6 look like the most attractive options. Both have decent support in the open community: the WCH CH32 can be targeted with cnlohr/ch32v003fun, and the Puya with py32f0-template.
What kind of ML tasks can we manage to perform on such a small CPU? That is the topic for the next steps.