Log #19: blazepose

A project log for lalelu_drums

Gesture controlled percussion based on markerless human-pose estimation with an AI network from a live video.

Lars Friedrich, 04/24/2025 at 18:37

The movenet pose estimation model that has been used for lalelu_drums so far has one drawback: it provides keypoints for the wrists and the ankles, but no further detail on the pose of the hands or the feet. In contrast, the blazepose pose estimation model provides additional keypoints for the pinky knuckle, index knuckle and thumb knuckle, as well as for the heel and the foot.

While a pretrained tensorflow model for movenet is available from kaggle, blazepose is only available as a tensorflow.js model. Therefore, blazepose could not be used for lalelu_drums so far, since the TensorRT conversion requires a tensorflow model as input (see log entry #01).

Fortunately, I have now found a way to convert the blazepose tensorflow.js model to a tensorflow model and compile it for fast inference using TensorRT. The conversion was done with the tfjs-to-tf converter by Patrick Levin.

In the following video you can see the keypoints related to the feet (green dots), tracked at 100 fps. For each leg, an average of the knee, ankle and heel keypoints is computed and visualized in the puredata patch ('right_y_plot' and 'left_y_plot'). This value is used to create trigger events, which are visualized by bang items in puredata and eventually trigger the rimclick and bass drum sounds that can be heard.
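
To make the averaging and triggering step more concrete, here is a minimal Python sketch. It is not the code running in lalelu_drums: the keypoint names, the dictionary layout, the threshold levels and the hysteresis scheme are all assumptions for illustration, and the actual trigger logic lives in the puredata patch. The sketch averages the y coordinates of the knee, ankle and heel of one leg and emits a trigger whenever that value rises above a level, re-arming only after it has dropped below a second, lower level.

```python
from dataclasses import dataclass
from statistics import fmean

# Assumed keypoint naming: "<side>_<joint>" -> (x, y) in normalized image coordinates.
LEG_KEYPOINTS = ("knee", "ankle", "heel")


def leg_y(keypoints, side):
    """Mean y coordinate of the knee, ankle and heel keypoints of one leg."""
    return fmean(keypoints[f"{side}_{name}"][1] for name in LEG_KEYPOINTS)


@dataclass
class Trigger:
    """Fires once when the value rises to on_level; re-arms at or below off_level.

    The two levels (hysteresis) are a hypothetical choice to avoid double
    triggers from jitter around a single threshold.
    """
    on_level: float
    off_level: float
    armed: bool = True

    def update(self, value):
        if self.armed and value >= self.on_level:
            self.armed = False
            return True   # trigger event, e.g. play a sound
        if not self.armed and value <= self.off_level:
            self.armed = True
        return False


if __name__ == "__main__":
    # Made-up sequence of averaged leg positions, one value per video frame.
    values = [0.50, 0.65, 0.75, 0.72, 0.55, 0.80]
    trigger = Trigger(on_level=0.7, off_level=0.6)
    print([trigger.update(v) for v in values])
    # prints [False, False, True, False, False, True]
```

In this sketch one Trigger instance would be kept per leg and fed with the result of leg_y once per frame, so that each leg can drive its own sound.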
