As far as lions can tell, anything running efficientdet-lite on a jetson nano is upscaling the lite model's INT8 weights to FP16. Upscaling has always been a last resort when nothing else works, but it might be the only supported path on the jetson nano.
https://github.com/NobuoTsukamoto/tensorrt-examples/blob/main/cpp/efficientdet/README.md
This one seems to be upscaling INT8 to FP16.
A final go with
https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch
entailed downloading efficientdet-d1 as a checkpoint & specifying 1 as the compound_coef, which appears to be what selects a 640x640 input size.
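The input size seems to be hard coded per coefficient in the repo's model definition. From memory of the source (worth verifying against model.py), the mapping looks like:
# approximate, from memory of EfficientDetBackbone in model.py
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
# compound_coef indexes this list, so -c 1 selects 640x640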
Download the checkpoint to the weights directory:
https://github.com/zylo117/Yet-Another-Efficient-Pytorch/releases/download/1.0/efficientdet-d1.pth
The training became:
python3 train.py -c 1 -p lion --head_only True --lr 1e-3 --batch_size 8 --load_weights weights/efficientdet-d1.pth --num_epochs 50 --save_interval 100
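The -p lion option implies a projects/lion.yml describing the dataset. A plausible sketch modeled on the repo's projects/coco.yml (the values here are guesses, not the actual file):
project_name: lion
train_set: train
val_set: val
num_gpus: 1
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
anchors_scales: '[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]'
anchors_ratios: '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'
obj_list: ['lion']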
The ONNX export needed a hack to accept a -c option & became:
python3 export.py -c 1 -p lion -w logs/lion/efficientdet-d1_49_6250.pth -o efficientdet_lion.onnx
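The hack amounted to teaching export.py the same options train.py already parses. A minimal sketch, with the option names assumed rather than copied from the actual script:
import argparse

# hypothetical argparse block mirroring train.py's options
parser = argparse.ArgumentParser('EfficientDet ONNX export')
parser.add_argument('-c', '--compound_coef', type=int, default=0)
parser.add_argument('-p', '--project', type=str, default='coco')
parser.add_argument('-w', '--weights', type=str, default=None)
parser.add_argument('-o', '--output', type=str, default='efficientdet.onnx')
args = parser.parse_args()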
But tensorrt conversion once again ended in
Error Code 4: Miscellaneous (IShuffleLayer Reshape_1935: reshape changes volume. Reshaping [1,96,160,319] to [1,96,160,79].)
In the interest of just making something work, a conversion of efficientdet_lite to tensorrt seemed like the best move. It was also appealing because the training process was known to work.
Converting tflite to tensorrt involves writing a lot of custom software. There's no off-the-shelf converter, so everyone has to write their own pipeline from scratch.
A test conversion began by downloading an example efficientdet-lite4 which supports 640x640. The example models are unlisted files.
wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco/efficientdet-lite4.tgz
This was decompressed into /root/efficientdet-lite4
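Assuming a standard gzipped tarball, the decompression is something like:
mkdir -p /root/efficientdet-lite4
tar -xzf efficientdet-lite4.tgz -C /root/efficientdet-lite4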
The model has to be converted into a saved model (a bunch of protobuf files), then to an onnx file, & finally to the tensorrt engine. A couple of repositories have to be cloned.
git clone --depth 1 https://github.com/google/automl
git clone --depth 1 https://github.com/NVIDIA/TensorRT
Install some dependencies:
pip3 install tf2onnx
Then you have to create a /root/efficientdet-lite4/efficientdet-lite4.yaml file describing the model.
---
image_size: 640x640
nms_configs:
  method: hard
  iou_thresh: 0.35
  score_thresh: 0.
  sigma: 0.0
  pyfunc: False
  max_nms_inputs: 0
  max_output_size: 100
Inside automl/efficientdet/tf2/ run
OPENBLAS_CORETYPE=CORTEXA57 python3 inspector.py --mode=export --model_name=efficientdet-lite4 --model_dir=/root/efficientdet-lite4/ --saved_model_dir=/root/efficientdet-lite4.out --hparams=/root/efficientdet-lite4/efficientdet-lite4.yaml
The protobuf files end up in --saved_model_dir. The export needs swap space to finish on the nano.
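If swap isn't already configured, the usual recipe works (the 8G size is a guess; scale to the model):
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile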
inspector.py needs a hack to import hparams_config.py from the parent directory
import os
import sys

# make the parent automl/efficientdet directory importable
parent_directory = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
sys.path.append(parent_directory)
It needs another hack to get past an eager execution error, but this too failed later.
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
Another stream of errors & workarounds reminiscent of the pytorch errors followed.
TypeError: __init__() got an unexpected keyword argument 'experimental_custom_gradients'
Comment out experimental_custom_gradients
TypeError: vectorized_map() got an unexpected keyword argument 'warn'
Remove the warn argument
RuntimeError: Attempting to capture an EagerTensor without building a function.
Try re-enabling eager execution & commenting out the offending bits of keras
# ema_var_dict = {
#     ema.average_name(var): opt_ema_fn(var) for var in ema_vars.values()
# }
# var_dict.update(ema_var_dict)
This eventually succeeded, leaving the conversion to ONNX. Inside TensorRT/samples/python/efficientdet run
OPENBLAS_CORETYPE=CORTEXA57 python3 create_onnx.py --input_size="640,640" --saved_model=/root/efficientdet-lite4.out --onnx=/root/efficientdet-lite4.out/efficientdet-lite4.onnx
This too needs swap space. It worked, just as the ONNX export had in the pytorch attempt.
Then comes the tensorrt generator.
/usr/src/tensorrt/bin/trtexec --onnx=/root/efficientdet-lite4.out/efficientdet-lite4.onnx --saveEngine=/root/efficientdet-lite4.out/efficientdet-lite4.engine
The 1st error was
Error Code 4: Internal Error (Internal error: plugin node nms/non_maximum_suppression requires 165727488 bytes of scratch space, but only 16777216 is available. Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().)
Try setting a workspace size:
/usr/src/tensorrt/bin/trtexec --fp16 --workspace=2048 --onnx=/root/efficientdet-lite4.out/efficientdet-lite4.onnx --saveEngine=/root/efficientdet-lite4.out/efficientdet-lite4.engine
This successfully created a tensorrt engine. It's very important to include the --fp16 flag.
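The finished engine can be sanity checked by loading it back into trtexec, which benchmarks inference:
/usr/src/tensorrt/bin/trtexec --loadEngine=/root/efficientdet-lite4.out/efficientdet-lite4.engine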