After copying the example C++ version
https://github.com/NobuoTsukamoto/tensorrt-examples/blob/main/cpp/efficientdet/object_detector.cpp
lite4 ran at a miserable 2.5fps. Lite4 + face detection ran at 2fps, though it only used 2GB of RAM.
Buried in the readme was a benchmark table confirming 2fps for this model.
https://github.com/NobuoTsukamoto/benchmarks/blob/main/tensorrt/jetson/detection/README.md
It wasn't very obvious because his video showed a full framerate. The inference must have been done offline.
It did show 320x320 lite0 hitting 20fps, so it was back to a windowed lite0.
truckcam/label.py was rerun with 1280x720 output.
Then convert the labels to tfrecords in automl-master/efficientdet (the COCO layout the conversion script expects is sketched after the 2 commands).
PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../train_lion --image_info_file=../../train_lion/instances_train.json --output_file_prefix=../../train_lion/pascal --num_shards=10
PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../val_lion --image_info_file=../../val_lion/instances_val.json --output_file_prefix=../../val_lion/pascal --num_shards=10
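For reference, create_coco_tfrecord.py expects the instances_*.json files to be in the standard COCO layout. A minimal single-class sketch is below; the field values & the "lion" category name are made up for illustration, since the real file comes out of truckcam/label.py.

# Minimal sketch of the COCO-style annotation file create_coco_tfrecord.py reads.
# Values are illustrative; the real instances_train.json comes from truckcam/label.py.
import json

coco = {
    "images": [
        # 1 entry per 1280x720 training frame
        {"id": 1, "file_name": "frame0001.jpg", "width": 1280, "height": 720},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels; category_id 1 = the single class
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [400.0, 120.0, 220.0, 480.0],
         "area": 220.0 * 480.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "lion"},
    ],
}

with open("instances_train.json", "w") as f:
    json.dump(coco, f)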
Then download a new starting checkpoint
https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckptsaug/efficientnet-b0.tar.gz
Make a new output directory
mkdir ../../efficientlion-lite0
Then make a new training command for lite0
time python3 main.py \
    --mode=train_and_eval \
    --train_file_pattern='../../train_lion/pascal-00000-of-00010.tfrecord' \
    --val_file_pattern='../../val_lion/pascal-00000-of-00010.tfrecord' \
    --model_name=efficientdet-lite0 \
    --model_dir=../../efficientlion-lite0/ \
    --backbone_ckpt=efficientnet-b0 \
    --train_batch_size=1 \
    --eval_batch_size=1 \
    --eval_samples=100 \
    --num_examples_per_epoch=1000 \
    --hparams="num_classes=1,moving_average_decay=0,mixed_precision=true" \
    --num_epochs=300
Create the efficientlion-lite0.yaml file in ../../efficientlion-lite0/
---
image_size: 320x320
num_classes: 1
moving_average_decay: 0
nms_configs:
  method: hard
  iou_thresh: 0.35
  score_thresh: 0.
  sigma: 0.0
  pyfunc: False
  max_nms_inputs: 0
  max_output_size: 100
Inside automl/efficientdet/tf2/ run
PYTHONPATH=.:.. python3 inspector.py --mode=export --model_name=efficientdet-lite0 --model_dir=../../../efficientlion-lite0/ --saved_model_dir=../../../efficientlion-lite0.out --hparams=../../../efficientlion-lite0/efficientlion-lite0.yaml
In TensorRT/samples/python/efficientdet run
time OPENBLAS_CORETYPE=CORTEXA57 python3 create_onnx.py --input_size="320,320" --saved_model=/root/efficientlion-lite0.out --onnx=/root/efficientlion-lite0.out/efficientlion-lite0.onnx
/usr/src/tensorrt/bin/trtexec --fp16 --workspace=2048 --onnx=/root/efficientlion-lite0.out/efficientlion-lite0.onnx --saveEngine=/root/efficientlion-lite0.out/efficientlion-lite0.engine
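Once trtexec spits out the .engine file, the detector has to deserialize it & run it on the Jetson. A rough sketch of doing that with the TensorRT python API is below. The binding order, the 320x320 input & the EfficientNMS output handling are assumptions; the real detector followed the C++ example linked above.

import numpy as np
import tensorrt as trt
import pycuda.autoinit          # creates the CUDA context
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built by trtexec.
with open("/root/efficientlion-lite0.out/efficientlion-lite0.engine", "rb") as f, \
        trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate a host/device buffer pair for every binding.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(i)), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

def detect(window):
    # window: 320x320x3 crop, already converted to the engine's input dtype.
    # Binding 0 is assumed to be the image input, the rest the EfficientNMS outputs
    # (num_detections, boxes, scores, classes).
    np.copyto(host_bufs[0], window.ravel())
    cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
    context.execute_v2(bindings)
    for h, d in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh(h, d)
    return host_bufs[1:]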
The original windowing algorithm scanned 1 cropped section per frame & hit 7fps on the Raspberry Pi. It had enough brains to make the window follow the 1st body it detected. If it didn't detect a body, it cycled through the window positions.
The only evolution with the jetson is going to be face recognition on the full frame. If it matches a face, that always positions the body tracking window. If it detects a body with no current face, go for the body closest to the last face match. If it detects bodies with no previous face, position the tracking window on the largest body in the window. Only if there's no face & no body does it cycle window positions. The hope is 2 models give it a higher chance of getting the right hit.
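A sketch of that priority logic is below. The detection results are passed in as bare (center x, center y, area) tuples; the real truckcam code & its detector calls are obviously different.

def choose_window(faces, bodies, last_face, scan_index, num_positions=4):
    # faces & bodies are lists of (cx, cy, area) detections from the 2 models.
    # last_face is the (cx, cy) of the last face match, or None.
    # Returns (window center or None, updated last_face, updated scan_index).
    if faces:
        # A face match always positions the body tracking window.
        fx, fy, _ = max(faces, key=lambda f: f[2])
        return (fx, fy), (fx, fy), scan_index
    if bodies and last_face is not None:
        # No face this frame: follow the body closest to the last face match.
        bx, by, _ = min(bodies,
                        key=lambda b: (b[0] - last_face[0])**2 + (b[1] - last_face[1])**2)
        return (bx, by), last_face, scan_index
    if bodies:
        # Bodies but no face ever matched: take the largest body in the window.
        bx, by, _ = max(bodies, key=lambda b: b[2])
        return (bx, by), last_face, scan_index
    # No face & no body: keep cycling through the fixed window positions.
    return None, last_face, (scan_index + 1) % num_positions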
Efficientdet-lite0 window + face detection ran at 7fps. Efficientdet-lite0 ran at 19fps on its own. Sadly, the custom trained model didn't detect anything while a stock efficientdet-d0 worked. Stock efficientdet-d0 was just as inaccurate as lions remembered. Retraining with 1 category was supposed to be the key, but lions suspected changing the number of classes was what caused it to detect nothing.
There was also a warning from create_onnx.py
Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
Another attempt with efficientdet-d0 began
time python3 main.py \
    --mode=train_and_eval \
    --train_file_pattern='../../train_lion/pascal-00000-of-00010.tfrecord' \
    --val_file_pattern='../../val_lion/pascal-00000-of-00010.tfrecord' \
    --model_name=efficientdet-d0 \
    --model_dir=../../efficientlion-d0/ \
    --backbone_ckpt=efficientnet-b0 \
    --train_batch_size=1 \
    --eval_batch_size=1 \
    --eval_samples=100 \
    --num_examples_per_epoch=1000 \
    --hparams="num_classes=1,moving_average_decay=0,mixed_precision=true" \
    --num_epochs=300
This training process suffered from a serious memory leak, limiting it to 10 epochs between restarts if it was lucky. A 32GB swap space bought it a few more epochs, but it still got slower & slower until it ground to a halt at around 20GB RSS. It took many days to train. It might do better with a wrapper script, but it already does a full reload of the model during every epoch, as if someone already knew about the memory leak.
A wrapper script would have to compute a new num_epochs argument from the checkpoint files in the model directory, or poll them until 10 epochs were complete. Python also takes forever to start up, so a wrapper might do 10 epochs at a time to limit the restarts.
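A sketch of such a wrapper is below. It assumes the number of completed epochs can be recovered from the step number in the model directory's checkpoint file & that 1 epoch = 1000 steps (num_examples_per_epoch=1000 with a batch size of 1); the real checkpoint naming might differ.

#!/usr/bin/env python3
# Restart main.py every 10 epochs so the leak never accumulates.
import re
import subprocess

MODEL_DIR = "../../efficientlion-d0/"
STEPS_PER_EPOCH = 1000          # num_examples_per_epoch / train_batch_size
TOTAL_EPOCHS = 300
CHUNK = 10                      # epochs per python restart

def completed_epochs():
    # Assumes the checkpoint state file contains something like ckpt-12000.
    try:
        with open(MODEL_DIR + "checkpoint") as f:
            step = int(re.search(r"ckpt-(\d+)", f.read()).group(1))
        return step // STEPS_PER_EPOCH
    except (FileNotFoundError, AttributeError):
        return 0

while completed_epochs() < TOTAL_EPOCHS:
    target = min(completed_epochs() + CHUNK, TOTAL_EPOCHS)
    subprocess.run([
        "python3", "main.py",
        "--mode=train_and_eval",
        "--train_file_pattern=../../train_lion/pascal-00000-of-00010.tfrecord",
        "--val_file_pattern=../../val_lion/pascal-00000-of-00010.tfrecord",
        "--model_name=efficientdet-d0",
        "--model_dir=" + MODEL_DIR,
        "--backbone_ckpt=efficientnet-b0",
        "--train_batch_size=1", "--eval_batch_size=1", "--eval_samples=100",
        "--num_examples_per_epoch=%d" % STEPS_PER_EPOCH,
        "--hparams=num_classes=1,moving_average_decay=0,mixed_precision=true",
        "--num_epochs=%d" % target,
    ], check=True)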
The trained efficientdet-d0 once again detected nothing. Something in the training process was causing all the models to fail.
Another hit said to delete moving_average_decay=0.
time python3 main.py \
    --mode=train_and_eval \
    --train_file_pattern='../../train_lion/pascal-00000-of-00010.tfrecord' \
    --val_file_pattern='../../val_lion/pascal-00000-of-00010.tfrecord' \
    --model_name=efficientdet-lite0 \
    --model_dir=../../efficientlion-lite0/ \
    --backbone_ckpt=efficientnet-b0 \
    --train_batch_size=1 \
    --eval_batch_size=1 \
    --eval_samples=100 \
    --num_examples_per_epoch=1000 \
    --hparams="num_classes=1,mixed_precision=true" \
    --num_epochs=300
Note that this command creates a config.yaml inside ../../efficientlion-lite0/, which a previous hit had missed. The inspector command becomes
PYTHONPATH=.:.. python3 inspector.py --mode=export --model_name=efficientdet-lite0 --model_dir=../../../efficientlion-lite0/ --saved_model_dir=../../../efficientlion-lite0.out --hparams=../../../efficientlion-lite0/config.yaml
This somehow got to a .engine file without the previous ExponentialMovingAverage error. The .yaml file might have fixed it. Still no detections.