Log #11: docker

A project log for lalelu_drums

Gesture-controlled percussion based on markerless human-pose estimation from live video with an AI network.

Lars Friedrich, 11/02/2024 at 08:22

In this log entry I document a refresh of my backend's operating system installation that I recently had to perform. For the first time, I used docker to set up the CUDA/tensorflow/tensorrt stack, and it turned out to work very well for me.

The reason for the refresh was that, a few days ago, the operating system installation on my backend system got into a state in which I could not run the lalelu application any more. A central observation was that nvidia-smi showed an error message saying that the nvidia driver was missing. I am not sure about the root cause; I suspect an automatic ubuntu update. I tried to fix the installation by running various apt and pip install or uninstall commands, but eventually the system was so broken that I could not even connect to it via ssh (probably due to a kernel update that messed up the ethernet driver).

So I had to go through a complete OS setup from scratch. My first approach was to install the cuda/tensorflow/tensorrt stack manually into a fresh ubuntu-22.04-server installation, similar to how I had done it for the previous installation one year ago. However, this time it failed: I could not find a way to get tensorrt running, and it was unclear to me which CUDA toolkit version and tensorflow installation method I should combine.

Description of the system setup

For the second approach, I started all over again with a fresh ubuntu-22.04-server installation. I installed the nvidia driver with

sudo ubuntu-drivers install

The same command with the --gpgpu flag, which is recommended for compute servers, did not give me a working setup; at least, nvidia-smi did not suggest a usable state.
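To judge whether the driver is usable, I go by nvidia-smi. A minimal check, as a sketch (the reboot may or may not be required, depending on the driver packages):

# reboot so the freshly installed kernel modules get loaded
sudo reboot

# a table listing the GPU, driver version and CUDA version indicates a working driver;
# a message like "NVIDIA-SMI has failed ..." indicates a broken setup
nvidia-smi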

I installed the serial port driver on the host, as described in Log #02: MIDI out.

I installed the docker engine, following its apt installation method, and the nvidia container toolkit, again following the apt installation method.
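Once the apt repositories are set up as in the official instructions, the container toolkit part reduces to a few commands. A sketch of the steps and a quick GPU check (the repository setup itself is omitted here):

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# verify that containers can access the GPU; the toolkit injects nvidia-smi into the container
sudo docker run --rm --gpus all ubuntu nvidia-smi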

I created the following dockerfile.txt

FROM nvcr.io/nvidia/tensorflow:24.10-tf2-py3

RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y net-tools python3-tables python3-opencv python3-gst-1.0 gstreamer1.0-plugins-good gstreamer1.0-tools libgstreamer1.0-dev v4l-utils python3-matplotlib

and built my image with the following command

sudo docker build -t lalelu-docker --file dockerfile.txt .
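The resulting image and its size can be inspected with

sudo docker image ls lalelu-docker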

The image has a size of 17.1 GB. To start a container from it, I use

sudo docker run --gpus all -it --rm --network host -v /home/pi:/home_host --privileged -e PYTHONPATH=/home_host/lalelu -p 57000:57000 -p 57001:57001 lalelu-docker

In the running container, the following is possible:
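A sketch of such checks (the tensorrt import and the camera listing are assumptions, based on the base image and the packages installed in the dockerfile):

# the GPU is visible inside the container
nvidia-smi

# tensorflow finds the GPU
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

# the tensorrt python bindings shipped with the nvcr.io image are importable
python3 -c "import tensorrt; print(tensorrt.__version__)"

# the camera devices of the host are accessible (thanks to --privileged)
v4l2-ctl --list-devices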

Characterization of the system setup

I recorded the following startup times

boot host operating system: 30 s
startup docker container: < 1 s
startup lalelu application: 11 s
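The container startup time can be reproduced with a simple timing; a sketch, assuming the image built above:

# time from invoking docker run to a trivial command returning inside the container
time sudo docker run --gpus all --rm lalelu-docker true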

When the lalelu application is running, htop and nvidia-smi display the following

Finally, I recorded inference times in the running lalelu application. The result for 1000 consecutive inferences is shown in the following graph.

Since the inference time stays well below the 10 ms frame budget (1 s / 100 frames), the lalelu application can run at 100 fps without frame drops.
