The leading solution is to always stream frames over a network to a big computer with GPU processing.

Another solution is to solve a simpler problem than pose estimation for camera tracking & use pose estimation only for counting reps. Only camera tracking must be portable. Counting reps will always be done near the Ryzen.

There are 2 competing libraries for GPU processing: OpenCL & CUDA. The choice depends on the CPU, GPU, & stock portfolio.

A starting point for just detecting people:

The Raspberry Pi does 1fps.

https://magazine.odroid.com/article/object-detection-in-live-video-using-the-odroid-xu4-with-gstreamer/?ineedthispage=yes

The Odroid does 4fps.

The latest algorithm is YOLO. There are rumors of higher frame rates, but no good installation examples.

Running openpose with a GPU requires CUDA & CUDNN. CUDA requires 2GB of downloads from https://developer.nvidia.com/cuda-downloads. CUDNN comes from https://developer.nvidia.com/cudnn. Neither page is linked from the nvidia.com home page.

The versions of CUDA, CUDNN, & the X11 driver must all match & there's no documentation of which combinations work. Driver 410.78 happened to work with cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.
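
A rough way to see what's actually installed before guessing at combinations. This is only a sketch: it assumes nvidia-smi & nvcc are on the PATH, report_versions is a made-up name, & it doesn't cover CUDNN.

```shell
# Print the driver & toolkit versions side by side.
# Each tool is skipped with a note when it isn't installed or isn't on the PATH.
report_versions() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
    else
        echo "driver: nvidia-smi not found"
    fi
    if command -v nvcc >/dev/null 2>&1; then
        echo "toolkit: $(nvcc --version | grep release)"
    else
        echo "toolkit: nvcc not found"
    fi
}

report_versions
```

If nvcc isn't found, /usr/local/cuda/bin probably isn't on the PATH yet.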

CUDA also requires rebooting & manually loading the nvidia-uvm module.
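
A minimal sketch of the module check, assuming the module shows up as nvidia_uvm in /proc/modules. uvm_loaded is a made-up helper name. The actual loading is left as a printed hint because modprobe needs root.

```shell
# Succeed if nvidia_uvm appears in a module list.
# The list defaults to /proc/modules; the optional argument only exists
# so the check can be tried against a sample file.
uvm_loaded() {
    grep -q '^nvidia_uvm ' "${1:-/proc/modules}" 2>/dev/null
}

if uvm_loaded; then
    echo "nvidia_uvm is loaded"
else
    echo "nvidia_uvm not loaded; as root, try: modprobe nvidia-uvm"
fi
```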

To test your CUDA installation:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

make

./deviceQuery

To print the status of your graphics card:

nvidia-smi

Caffe must be rebuilt with CUDA, then openpose.

To build Caffe with CUDA:

edit caffe/Makefile.config
uncomment USE_CUDNN
comment out CPU_ONLY
tweak the CUDA_ARCH line
edit BLAS_INCLUDE, BLAS_LIB, LIBRARY_DIRS to include /root/countreps
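
The first 2 edits can be scripted. This sketch assumes GNU sed & that the file still has the stock spellings "# USE_CUDNN := 1" & "CPU_ONLY := 1". enable_cuda is a made-up name. The CUDA_ARCH & BLAS lines still need hand editing.

```shell
# Uncomment USE_CUDNN & comment out CPU_ONLY in a Caffe Makefile.config.
# $1 is the path to the config file. Lines that don't match are left alone.
enable_cuda() {
    sed -i \
        -e 's/^# *USE_CUDNN := 1/USE_CUDNN := 1/' \
        -e 's/^CPU_ONLY := 1/# CPU_ONLY := 1/' \
        "$1"
}

# Usage: enable_cuda caffe/Makefile.config
```

For CUDA_ARCH, the GeForce GTX 1050 is compute capability 6.1, so a line like -gencode arch=compute_61,code=sm_61 should be enough for that card.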

All the objects need -fPIC, but nvcc complains about it.
Placing -fPIC after -Xcompiler in NVCCFLAGS, CXXFLAGS, LINKFLAGS but not in COMMON_FLAGS seems to fix it. All the dependencies for caffe were installed in /root/openpose.

PATH=$PATH:/root/countreps/bin make
PATH=$PATH:/root/countreps/bin make distribute

Comment out the time() function in tools/caffe.cpp if there's an undefined reference to caffe::caffe_gpu_dot.

The output goes in the distribute directory & must be copied manually.

cp -a bin/* /root/countreps/bin/
cp -a include/* /root/countreps/include/
cp -a lib/* /root/countreps/lib/
cp -a proto/* /root/countreps/proto/
cp -a python/* /root/countreps/python/

To build openpose with CUDA:

mkdir build
cd build
cmake \
-DGPU_MODE=CUDA \
-DUSE_MKL=n \
-DOpenCV_INCLUDE_DIRS=/root/countreps/include \
-DOpenCV_LIBS_DIR=/root/countreps/lib \
-DCaffe_INCLUDE_DIRS=/root/countreps/include \
-DCaffe_LIBS=/root/countreps/lib/libcaffe.so \
-DBUILD_CAFFE=OFF \
-DPROTOBUF_LIBRARY=/root/countreps/lib \
-DProtobuf_INCLUDE_DIRS=/root/countreps/include \
-DGLOG_INCLUDE_DIR=/root/countreps/include \
-DGLOG_LIBRARY=/root/countreps/lib \
-DGFLAGS_INCLUDE_DIR=/root/countreps/include \
-DGFLAGS_LIBRARY=/root/countreps/lib \
-DCMAKE_INSTALL_PREFIX=/root/countreps/ \
..

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/countreps/lib make VERBOSE=1
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/countreps/lib make install

The dreaded

Check failed: error == cudaSuccess (2 vs. 0) out of memory

error is caused by the GPU running out of memory. Reduce the netInputSize variable to -1x256.
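
OpenPose documents its net resolution as multiples of 16, with -1 meaning "pick this dimension from the aspect ratio". A small sanity check for candidate sizes before rebuilding, as a sketch only: valid_net_height is a made-up helper, not part of openpose.

```shell
# Succeed if the height in a "-1xHEIGHT" string is a positive multiple of 16.
valid_net_height() {
    h=${1#-1x}                      # strip a leading "-1x" if present
    case $h in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric heights
    esac
    [ "$h" -gt 0 ] && [ $((h % 16)) -eq 0 ]
}

for res in -1x368 -1x256 -1x250; do
    if valid_net_height "$res"; then
        printf '%s ok\n' "$res"
    else
        printf '%s rejected\n' "$res"
    fi
done
```

With the stock openpose demo binary the same size is passed as --net_resolution -1x256. This assumes netInputSize is filled from that flag the way the openpose examples do it.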

Openpose on the GeForce GTX 1050 hit 14 frames per second, but the computer can't do anything else with the GPU, like play a video. CUDA is a return to 1980s single tasking, but it's still amazing how well it can track a human pose in a blurry photo.

The terabytes of opaque libraries required to make a computer vision program are how all computing is going to be done in the future. All these libraries are going to be part of the base system. Using computer vision won't involve tweaking neural networks directly or creating training sets directly, but using libraries. Unlike decoding a video or encrypting text, computer vision libraries are the result of millions of people & careers. Bedroom hackers aren't driving this generation of software.

The problems are far too complex for an end user to be directly programming the neural network part or the tensorflow part. The current training sets contain millions of photos or every experience of a hypothetical human who lived many hundreds of years. We're not far from a neural network containing every experience of every human who ever lived.
