As far as lions can tell, anything running efficientdet-lite on a jetson nano is upscaling the lite model's INT8 weights to FP16. Upscaling has always been a last resort when nothing else works, but it might be the only supported path on the jetson nano.
https://github.com/NobuoTsukamoto/tensorrt-examples/blob/main/cpp/efficientdet/README.md
This one seems to be upscaling INT8 to FP16.
A final go with
https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch
entailed downloading efficientdet-d1 as a checkpoint & specifying 1 as the compound_coef, which appears to be what selects a 640x640 input size.
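The input size seems to be hard coded per coefficient in the repo's model definition. From memory of the source (worth verifying against model.py), the mapping looks like:
# approximate, from memory of EfficientDetBackbone in model.py
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
# compound_coef indexes this list, so -c 1 selects 640x640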
Download the checkpoint to the weights directory:
https://github.com/zylo117/Yet-Another-Efficient-Pytorch/releases/download/1.0/efficientdet-d1.pth
The training became:
python3 train.py -c 1 -p lion --head_only True --lr 1e-3 --batch_size 8 --load_weights weights/efficientdet-d1.pth --num_epochs 50 --save_interval 100
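The -p lion option implies a projects/lion.yml describing the dataset. A plausible sketch modeled on the repo's projects/coco.yml (the values here are guesses, not the actual file):
project_name: lion
train_set: train
val_set: val
num_gpus: 1
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
anchors_scales: '[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]'
anchors_ratios: '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'
obj_list: ['lion']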
The ONNX export needed a hack to accept a -c option & became:
python3 export.py -c 1 -p lion -w logs/lion/efficientdet-d1_49_6250.pth -o efficientdet_lion.onnx
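The hack amounted to teaching export.py the same options train.py already parses. A minimal sketch, with the option names assumed rather than copied from the actual script:
import argparse

# hypothetical argparse block mirroring train.py's options
parser = argparse.ArgumentParser('EfficientDet ONNX export')
parser.add_argument('-c', '--compound_coef', type=int, default=0)
parser.add_argument('-p', '--project', type=str, default='coco')
parser.add_argument('-w', '--weights', type=str, default=None)
parser.add_argument('-o', '--output', type=str, default='efficientdet.onnx')
args = parser.parse_args()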
But tensorrt conversion once again ended in
Error Code 4: Miscellaneous (IShuffleLayer Reshape_1935: reshape changes volume. Reshaping [1,96,160,319] to [1,96,160,79].)
In the interest of just making something work, a conversion of efficientdet_lite to tensorrt seemed like the best move. It was also appealing because the training process was known to work.
Converting tflite to tensorrt involves writing a lot of custom software. There's no off-the-shelf converter, so everyone has to write their own pipeline from scratch.
A test conversion began by downloading an example efficientdet-lite4 which supports 640x640. The example models are unlisted files.
wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-lite4.tgz
This was decompressed into /root/efficientdet-lite4
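Assuming a standard gzipped tarball, the decompression is something like:
mkdir -p /root/efficientdet-lite4
tar -xzf efficientdet-lite4.tgz -C /root/efficientdet-lite4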
The model has to be converted into a saved model (a bunch of protobuf files), then to an onnx file, & finally to the tensorrt engine. A couple of repositories have to be cloned.
git clone --depth 1 https://github.com/google/automl
git clone --depth 1 https://github.com/NVIDIA/TensorRT
Install some dependencies:
pip3 install tf2onnx
Then you have to create a /root/efficientdet-lite4/efficientdet-lite4.yaml file describing the model.
---
image_size: 640x640
nms_configs:
  method: hard
  iou_thresh: 0.35
  score_thresh: 0.
  sigma: 0.0
  pyfunc: False
  max_nms_inputs: 0
  max_output_size: 100
Inside automl/efficientdet/tf2/ run
OPENBLAS_CORETYPE=CORTEXA57 python3 inspector.py --mode=export --model_name=efficientdet-lite4 --model_dir=/root/efficientdet-lite4/ --saved_model_dir=/root/efficientdet-lite4.out --hparams=/root/efficientdet-lite4/efficientdet-lite4.yaml
The protobuf files end up in --saved_model_dir. The export needs swap space to finish on the nano.
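If swap isn't already configured, the usual recipe works (the 8G size is a guess; scale to the model):
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile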
inspector.py needs a hack to import hparams_config.py from the parent directory
import os
import sys

# make the parent automl/efficientdet directory importable
parent_directory = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
sys.path.append(parent_directory)
It needs another hack to get past an eager execution error, but this too failed later.
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
Another stream of errors & workarounds reminiscent of the pytorch errors followed.
TypeError: __init__() got an unexpected keyword argument 'experimental_custom_gradients'
Comment out experimental_custom_gradients
TypeError: vectorized_map() got an unexpected keyword argument 'warn'
Remove the warn argument
RuntimeError: Attempting to capture an EagerTensor without building a function.
Try re-enabling eager execution & commenting out the offending bits of keras
# ema_var_dict = {
#     ema.average_name(var): opt_ema_fn(var) for var in ema_vars.values()
# }
# var_dict.update(ema_var_dict)
This eventually succeeded, leaving the conversion to ONNX. Inside TensorRT/samples/python/efficientdet run
OPENBLAS_CORETYPE=CORTEXA57 python3 create_onnx.py --input_size="640,640" --saved_model=/root/efficientdet-lite4.out --onnx=/root/efficientdet-lite4.out/efficientdet-lite4.onnx
This too needs swap space. It worked, just as the ONNX export had in the pytorch attempt.
Then comes the tensorrt generator.
/usr/src/tensorrt/bin/trtexec --onnx=/root/efficientdet-lite4.out/efficientdet-lite4.onnx --saveEngine=/root/efficientdet-lite4.out/efficientdet-lite4.engine
The 1st error was
Error Code 4: Internal Error (Internal error: plugin node nms/non_maximum_suppression requires 165727488 bytes of scratch space, but only 16777216 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().)
Try setting a workspace size:
/usr/src/tensorrt/bin/trtexec --fp16 --workspace=2048 --onnx=/root/efficientdet-lite4.out/efficientdet-lite4.onnx --saveEngine=/root/efficientdet-lite4.out/efficientdet-lite4.engine
This successfully created a tensorrt engine. It's very important to include the --fp16 flag.
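The finished engine can be sanity checked by loading it back into trtexec, which benchmarks inference:
/usr/src/tensorrt/bin/trtexec --loadEngine=/root/efficientdet-lite4.out/efficientdet-lite4.engine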