The mane way to get more fps is decreasing netInputSize. There are diminishing returns with higher neuron count & higher noise with lower neuron count.
-1x368 was the default & too big for 2GB of RAM.
-1x256 gives 2fps & might be fast enough to track old people
-1x160 gives 4fps & noisy positions, but it's the lowest framerate that still gets all the reps on video
-1x128 gives 6fps & much noisier positions
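As a rough sketch of why the smaller sizes are faster, the following resolves a "-1xH" size to a concrete width & estimates the relative number of input pixels. The 640x480 camera frame, the rounding to a multiple of 16 & the idea that cost scales with pixel count are all assumptions for illustration, not measurements from this project.

```python
def net_input_size(image_w, image_h, net_h, multiple=16):
    """Resolve a '-1xH' size to (W, H), keeping the image aspect ratio.

    Assumption: the width is rounded to the nearest multiple of 16.
    The real rounding rule inside openpose may differ.
    """
    net_w = image_w / image_h * net_h
    net_w = max(multiple, int(round(net_w / multiple)) * multiple)
    return net_w, net_h


def relative_cost(net_h, baseline_h=368):
    """Input pixel count relative to the -1x368 default (same aspect ratio)."""
    return (net_h / baseline_h) ** 2


# Hypothetical 640x480 camera frame.
for h in (368, 256, 160, 128):
    w, _ = net_input_size(640, 480, h)
    print(f"-1x{h}: {w}x{h}, {relative_cost(h):.2f} of the default pixel count")
```

The measured fps above doesn't scale exactly with pixel count, so treat this as a guide to the trend only.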
The next task was classifying exercises based on noisy pose data. With -1x160, openpose presented a few problems:

Falsely detecting humans.

Dropping lions in nearly the same pose as lions it did detect.

Differentiating between squats & a hip flex proved difficult, since the arms can either be horizontal or vertical in a squat & the knee angles are within the error bounds.

Situps & squats were also within the same error bounds.
The problem would only get harder if more exercises were added. Noise in the pose estimation & lack of 3D information reduced the angle precision.
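To show how little keypoint noise it takes to blur the angles, here's a minimal sketch of a 2D joint angle calculation. The function name & the pixel coordinates are made up for illustration. This is not the code used in the project.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the 2D points a-b-c.

    Returns None if either limb has zero length (e.g. a missing keypoint
    reported at the same position as its neighbor).
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        return None
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))


# Hypothetical hip, knee, ankle positions in pixels for a straight leg.
straight = joint_angle((0, 0), (0, 100), (0, 200))
# Same leg with the knee keypoint off by 10 pixels sideways.
noisy = joint_angle((0, 0), (10, 100), (0, 200))
print(straight, noisy)
```

With 100 pixel limbs, a 10 pixel error in the knee position moves the knee angle by over 11 degrees, & that's before any foreshortening from the missing 3D information.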
To get the 100% accuracy of a manual counter, it needed prior knowledge of the exercise being performed. Manually setting the exercise on 1 device while setting up another device as a camera is a real pain, so the easiest solution was hard coding the total number of reps & exercises to be performed. The lion wouldn't be able to throw in a few extra if it was a good day.
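A minimal sketch of what a hard coded workout could look like: a fixed list of exercises & rep totals, with a rep counted each time a joint angle dips below one threshold & comes back above another. The class name, the thresholds, the plan & the use of a single angle for every exercise are assumptions for illustration. The real counter isn't shown in this log.

```python
class RepCounter:
    """Counts reps against a hard coded plan of (exercise, reps) pairs."""

    def __init__(self, plan, down_deg=100.0, up_deg=150.0):
        self.plan = list(plan)
        self.down_deg = down_deg  # below this, the lion is in the "down" position
        self.up_deg = up_deg      # above this, the lion is back "up"
        self.index = 0            # which exercise in the plan is active
        self.reps = 0             # reps completed in the active exercise
        self.is_down = False

    def current(self):
        """Name of the active exercise, or None when the plan is finished."""
        if self.index < len(self.plan):
            return self.plan[self.index][0]
        return None

    def update(self, angle):
        """Feed one angle sample in degrees. Skips samples with no angle."""
        if angle is None or self.index >= len(self.plan):
            return
        if not self.is_down and angle < self.down_deg:
            self.is_down = True
        elif self.is_down and angle > self.up_deg:
            # The gap between the 2 thresholds keeps noise from double counting.
            self.is_down = False
            self.reps += 1
            if self.reps >= self.plan[self.index][1]:
                self.index += 1
                self.reps = 0


# Hypothetical plan & angle samples.
counter = RepCounter([("squat", 2), ("situp", 1)])
for angle in [170, 90, 120, 95, 160, 170, 90, 160]:
    counter.update(angle)
print(counter.current(), counter.reps)
```

Since the counter always knows which exercise comes next, it never has to tell a squat from a situp. The cost is that extra reps past the hard coded total get credited to the next exercise.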
A better camera might improve results, in any case.
Other options are making a neural network to classify exercises or using YOLO instead of pose detection. Pose detection is the most general purpose algorithm & eventually the only one anyone is going to use. A neural network classifier would definitely not be reliable enough.
Despite its limitations, it's amazing how what are essentially miniaturized vacuum tubes & copper can identify high level biological movements in photos.