While waiting 9 minutes for the onnx python library to load a model, lions remembered file parsers like these running a lot faster & taking a lot less memory 30 years ago in C. The C parser used in trtexec is a lot faster.
The next idea was to edit the input dimensions in pose_deploy.prototxt
name: "OpenPose - BODY_25"
input: "image"
input_dim: 1 # This value will be defined at runtime
input_dim: 3
input_dim: 256 # This value will be defined at runtime
input_dim: 256 # This value will be defined at runtime
Then convert the pretrained model with caffe2onnx as before.
python3 -m caffe2onnx.convert --prototxt pose_deploy.prototxt --caffemodel pose_iter_584000.caffemodel --onnx body25.onnx
name=conv1_1 op=Conv
inputs=[
Variable (input): (shape=[1, 3, 256, 256], dtype=float32),
Constant (conv1_1_W): (shape=[64, 3, 3, 3], dtype=<class 'numpy.float32'>)
LazyValues (shape=[64, 3, 3, 3], dtype=float32),
Constant (conv1_1_b): (shape=[64], dtype=<class 'numpy.float32'>)
LazyValues (shape=[64], dtype=float32)]
outputs=[Variable (conv1_1): (shape=[1, 64, 256, 256], dtype=float32)]
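A dump like that can be reproduced with onnx-graphsurgeon; a minimal sketch, assuming the output file from the command above & allowing for the print format changing a bit between versions:

# Load the converted model & print the 1st node to confirm the new 256x256
# input made it all the way through.
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("body25.onnx"))
print(graph.inputs[0])    # should show shape=[1, 3, 256, 256]
print(graph.nodes[0])     # conv1_1 with its weight & bias constants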
It actually honored the modified prototxt file & generated an onnx model with the revised input size. Every spatial dimension in the network gets scaled to match the multiple of 16 you enter. Then comes the fixonnx.py & the tensorrt conversion.
/usr/src/tensorrt/bin/trtexec --onnx=body25_fixed.onnx --fp16 --saveEngine=body25.engine
The next step was reading the outputs. The options were either converting the trt_pose implementation to parse the openpose outputs or converting the openpose implementation to run on the tensorrt engine. Neither of them is very easy.
openpose outputs:
trt_pose outputs:
Tensorrt seems to derive the input & output bindings from the names of the network's input & output tensors.
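A quick way to see what a given engine calls its bindings is the tensorrt python module; a minimal sketch using the TensorRT 8 era binding API:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("body25.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
# print every binding's name, direction & shape
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i),
          "input" if engine.binding_is_input(i) else "output",
          engine.get_binding_shape(i))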
The inputs are the same. TRT pose has a PAF/part affinity field in output_1 & a CMAP/confidence map in output_0. Openpose seems to concatenate the CMAP & PAF, but the 26/52 ratio is different from 18/42. The bigger number is the PAF, related to the limb mapping, & the smaller number is the CMAP, related to the number of body parts.
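Assuming the single openpose output really is just a channel-wise concatenation, pulling it back apart is a slice; which block comes 1st depends on the order of the final Concat layer in the prototxt, so that needs checking before trusting this sketch:

# Hypothetical split of a [1, 78, h, w] openpose output into its 2 blocks.
# Swap the 2 slices if the prototxt puts the PAF channels before the heatmaps.
def split_body25_output(net_output):
    assert net_output.shape[1] == 26 + 52
    cmap = net_output[:, :26]   # 25 body parts + background confidence maps
    paf = net_output[:, 26:]    # 26 limbs x 2 = 52 part affinity channels
    return cmap, paf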
There's a topology array in trt_pose which must be involved in mapping. It has 84 entries & numbers from 0-41.
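The layout looks like 21 limbs x 4 ints per limb; a rough sketch of how trt_pose builds it in coco_category_to_topology (treat the details as assumptions & check trt_pose's coco.py):

# 1 row per limb: the 2 PAF channel indices (0-41) & the 2 keypoint indices
# (0-17) that limb connects. skeleton is a list of 1-based [src, dst] pairs.
def skeleton_to_topology(skeleton):
    topology = []
    for k, (src, dst) in enumerate(skeleton):
        topology.append([2 * k, 2 * k + 1, src - 1, dst - 1])
    return topology   # 21 rows x 4 = 84 entries for the 18 keypoint model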
A similar-looking array in openpose is POSE_MAP_INDEX in poseParameters.cpp. It has 52 entries & numbers from 0-51.
The mapping of the openpose body parts to the 26 confidence map entries must be POSE_BODY_25_BODY_PARTS. trt_pose has no such table, but it's only used for drawing the GUI.
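If the trt_pose post-processing were kept, one route would be building the same kind of topology table for BODY_25 out of the openpose arrays; a hypothetical sketch, assuming POSE_MAP_INDEX holds 26 pairs of PAF channel indices & the neighboring pairs table (POSE_BODY_25_PAIRS) holds the matching 26 keypoint pairs:

# Hypothetical glue: pose_pairs = the keypoint pairs flattened, map_index =
# POSE_MAP_INDEX flattened, both 52 entries long.
def body25_topology(pose_pairs, map_index):
    assert len(pose_pairs) == 52 and len(map_index) == 52
    topology = []
    for k in range(26):
        paf_x, paf_y = map_index[2 * k], map_index[2 * k + 1]
        src, dst = pose_pairs[2 * k], pose_pairs[2 * k + 1]
        topology.append([paf_x, paf_y, src, dst])
    return topology   # 26 rows x 4, the same shape of thing trt_pose uses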
The outputs for trt_pose are handled in Openpose::detect. There are a lot of hard coded sizes which seem related to the original 42 entry PAF & 84 entry topology. Converting that to the 52 entry PAF & 52 entry POSE_MAP_INDEX is quite opaque.
The inference for openpose is done in src/openpose/net/netCaffe.cpp: forwardPass. The input goes in gpuImagePtr. The output goes in spOutputBlob. There's an option in openpose called TOP_DOWN_REFINEMENT which does a 2nd pass with the input cropped to each body. The outputs go through similarly opaque processing in PoseExtractorCaffe::forwardPass. There are many USE_CAFFE ifdefs. Converting that to tensorrt would be a big job. The trt_pose implementation is overall a lot smaller & more organized.
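For what it's worth, just running the engine from python is the simpler half; a pycuda sketch along these lines would return the raw output tensors (the binding order, the preprocessing & the pycuda dependency are all assumptions):

import numpy as np
import pycuda.autoinit            # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("body25.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# 1 host & 1 device buffer per binding
host_bufs, dev_bufs = [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(trt.volume(engine.get_binding_shape(i)), dtype=dtype)
    host_bufs.append(host)
    dev_bufs.append(cuda.mem_alloc(host.nbytes))

# assuming binding 0 is the image input: fill host_bufs[0] with a
# preprocessed 1x3x256x256 frame before this point
cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
context.execute_v2([int(d) for d in dev_bufs])
for i in range(1, engine.num_bindings):
    cuda.memcpy_dtoh(host_bufs[i], dev_bufs[i])
# host_bufs[1:] now hold the flattened heatmap & PAF data for the grouping code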