After copying the example C++ version
https://github.com/NobuoTsukamoto/tensorrt-examples/blob/main/cpp/efficientdet/object_detector.cpp
lite4 ran at a miserable 2.5fps. Lite4 + face detection ran at 2fps, though it only used 2GB of RAM.
Buried in the readme was a benchmark table confirming 2fps for this model.
https://github.com/NobuoTsukamoto/benchmarks/blob/main/tensorrt/jetson/detection/README.md
It wasn't very obvious because his video showed a full framerate. The inference must have been done offline.
It did show 320x320 lite0 hitting 20fps, so it was back to a windowed lite0.
truckcam/label.py was rerun with 1280x720 output.
Then convert the labels to tfrecords in automl-master/efficientdet (the COCO layout the conversion script expects is sketched after the 2 commands).
PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../train_lion --image_info_file=../../train_lion/instances_train.json --output_file_prefix=../../train_lion/pascal --num_shards=10
PYTHONPATH=. python3 dataset/create_coco_tfrecord.py --image_dir=../../val_lion --image_info_file=../../val_lion/instances_val.json --output_file_prefix=../../val_lion/pascal --num_shards=10
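For reference, create_coco_tfrecord.py expects the instances_*.json files to be in the standard COCO layout. A minimal single-class sketch is below; the field values & the "lion" category name are made up for illustration, since the real file comes out of truckcam/label.py.

# Minimal sketch of the COCO-style annotation file create_coco_tfrecord.py reads.
# Values are illustrative; the real instances_train.json comes from truckcam/label.py.
import json

coco = {
    "images": [
        # 1 entry per 1280x720 training frame
        {"id": 1, "file_name": "frame0001.jpg", "width": 1280, "height": 720},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels; category_id 1 = the single class
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [400.0, 120.0, 220.0, 480.0],
         "area": 220.0 * 480.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "lion"},
    ],
}

with open("instances_train.json", "w") as f:
    json.dump(coco, f)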
Then download a new starting checkpoint
https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckptsaug/efficientnet-b0.tar.gz
Make a new output directory
mkdir ../../efficientlion-lite0
Then make a new training command for lite0
time python3 main.py \
    --mode=train_and_eval \
    --train_file_pattern='../../train_lion/pascal-00000-of-00010.tfrecord' \
    --val_file_pattern='../../val_lion/pascal-00000-of-00010.tfrecord' \
    --model_name=efficientdet-lite0 \
    --model_dir=../../efficientlion-lite0/ \
    --backbone_ckpt=efficientnet-b0 \
    --train_batch_size=1 \
    --eval_batch_size=1 \
    --eval_samples=100 \
    --num_examples_per_epoch=1000 \
    --hparams="num_classes=1,moving_average_decay=0,mixed_precision=true" \
    --num_epochs=300
Create the efficientlion-lite0.yaml file in ../../efficientlion-lite0/
---
image_size: 320x320
num_classes: 1
moving_average_decay: 0
nms_configs:
  method: hard
  iou_thresh: 0.35
  score_thresh: 0.
  sigma: 0.0
  pyfunc: False
  max_nms_inputs: 0
  max_output_size: 100
Inside automl/efficientdet/tf2/ run
PYTHONPATH=.:.. python3 inspector.py --mode=export --model_name=efficientdet-lite0 --model_dir=../../../efficientlion-lite0/ --saved_model_dir=../../../efficientlion-lite0.out --hparams=../../../efficientlion-lite0/efficientlion-lite0.yaml
In TensorRT/samples/python/efficientdet run
time OPENBLAS_CORETYPE=CORTEXA57 python3 create_onnx.py --input_size="320,320" --saved_model=/root/efficientlion-lite0.out --onnx=/root/efficientlion-lite0.out/efficientlion-lite0.onnx
/usr/src/tensorrt/bin/trtexec --fp16 --workspace=2048 --onnx=/root/efficientlion-lite0.out/efficientlion-lite0.onnx --saveEngine=/root/efficientlion-lite0.out/efficientlion-lite0.engine
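Once trtexec spits out the .engine file, the detector has to deserialize it & run it on the Jetson. A rough sketch of doing that with the TensorRT python API is below. The binding order, the 320x320 input & the EfficientNMS output handling are assumptions; the real detector followed the C++ example linked above.

import numpy as np
import tensorrt as trt
import pycuda.autoinit          # creates the CUDA context
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built by trtexec.
with open("/root/efficientlion-lite0.out/efficientlion-lite0.engine", "rb") as f, \
        trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate a host/device buffer pair for every binding.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(i)), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

def detect(window):
    # window: 320x320x3 crop, already converted to the engine's input dtype.
    # Binding 0 is assumed to be the image input, the rest the EfficientNMS outputs
    # (num_detections, boxes, scores, classes).
    np.copyto(host_bufs[0], window.ravel())
    cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
    context.execute_v2(bindings)
    for h, d in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh(h, d)
    return host_bufs[1:]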
The original windowing algorithm scanned 1 cropped section per frame & hit 7fps on the Raspberry Pi. It had enough brains to make the window follow the 1st body it detected. If it didn't detect a body, it cycled through the window positions.
The only evolution with the jetson is going to be face recognition on the full frame. If it matches a face, that always positions the body tracking window. If it detects a body with no current face, go for the body closest to the last face match. If it detects bodies with no previous face, position the tracking window on the largest body in the window. Only if there's no face & no body does it cycle window positions. The hope is 2 models give it a higher chance of getting the right hit.
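A sketch of that priority logic is below. The detection results are passed in as bare (center x, center y, area) tuples; the real truckcam code & its detector calls are obviously different.

def choose_window(faces, bodies, last_face, scan_index, num_positions=4):
    # faces & bodies are lists of (cx, cy, area) detections from the 2 models.
    # last_face is the (cx, cy) of the last face match, or None.
    # Returns (window center or None, updated last_face, updated scan_index).
    if faces:
        # A face match always positions the body tracking window.
        fx, fy, _ = max(faces, key=lambda f: f[2])
        return (fx, fy), (fx, fy), scan_index
    if bodies and last_face is not None:
        # No face this frame: follow the body closest to the last face match.
        bx, by, _ = min(bodies,
                        key=lambda b: (b[0] - last_face[0])**2 + (b[1] - last_face[1])**2)
        return (bx, by), last_face, scan_index
    if bodies:
        # Bodies but no face ever matched: take the largest body in the window.
        bx, by, _ = max(bodies, key=lambda b: b[2])
        return (bx, by), last_face, scan_index
    # No face & no body: keep cycling through the fixed window positions.
    return None, last_face, (scan_index + 1) % num_positions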
Efficientdet-lite0 window + face detection ran at 7fps. Efficientdet-lite0 ran at 19fps on its own. Sadly, the custom trained model didn't detect anything while a stock efficientdet-d0 worked. Stock efficientdet-d0 was just as inaccurate as lions remembered. Retraining with 1 category was supposed to be the key, but lions suspected changing the number of classes was what caused it to detect nothing.
There was also a warning from create_onnx.py
Warning: Unsupported operator EfficientNMS_TRT. No schema registered for this operator.
Another attempt with efficientdet-d0 began
time python3 main.py \
    --mode=train_and_eval \
    --train_file_pattern='../../train_lion/pascal-00000-of-00010.tfrecord' \
    --val_file_pattern='../../val_lion/pascal-00000-of-00010.tfrecord' \
    --model_name=efficientdet-d0 \
    --model_dir=../../efficientlion-d0/ \
    --backbone_ckpt=efficientnet-b0 \
    --train_batch_size=1 \
    --eval_batch_size=1 \
    --eval_samples=100 \
    --num_examples_per_epoch=1000 \
    --hparams="num_classes=1,moving_average_decay=0,mixed_precision=true" \
    --num_epochs=300
This training process suffered from a serious memory leak, limiting it to 10 epochs between restarts if it was lucky. A 32GB swap space bought it a few more epochs, but it still got slower & slower until it ground to a halt at around 20GB RSS. It took many days to train. It might do better with a wrapper script, but it already does a full reload of the model during every epoch, as if someone already knew about the memory leak.
A wrapper script would have to compute a new num_epochs argument from the checkpoint files in the model directory, or poll them until 10 epochs were complete. Python also takes forever to start up, so a wrapper might do 10 epochs at a time to limit the restarts.
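A sketch of such a wrapper is below. It assumes the number of completed epochs can be recovered from the step number in the model directory's checkpoint file & that 1 epoch = 1000 steps (num_examples_per_epoch=1000 with a batch size of 1); the real checkpoint naming might differ.

#!/usr/bin/env python3
# Restart main.py every 10 epochs so the leak never accumulates.
import re
import subprocess

MODEL_DIR = "../../efficientlion-d0/"
STEPS_PER_EPOCH = 1000          # num_examples_per_epoch / train_batch_size
TOTAL_EPOCHS = 300
CHUNK = 10                      # epochs per python restart

def completed_epochs():
    # Assumes the checkpoint state file contains something like ckpt-12000.
    try:
        with open(MODEL_DIR + "checkpoint") as f:
            step = int(re.search(r"ckpt-(\d+)", f.read()).group(1))
        return step // STEPS_PER_EPOCH
    except (FileNotFoundError, AttributeError):
        return 0

while completed_epochs() < TOTAL_EPOCHS:
    target = min(completed_epochs() + CHUNK, TOTAL_EPOCHS)
    subprocess.run([
        "python3", "main.py",
        "--mode=train_and_eval",
        "--train_file_pattern=../../train_lion/pascal-00000-of-00010.tfrecord",
        "--val_file_pattern=../../val_lion/pascal-00000-of-00010.tfrecord",
        "--model_name=efficientdet-d0",
        "--model_dir=" + MODEL_DIR,
        "--backbone_ckpt=efficientnet-b0",
        "--train_batch_size=1", "--eval_batch_size=1", "--eval_samples=100",
        "--num_examples_per_epoch=%d" % STEPS_PER_EPOCH,
        "--hparams=num_classes=1,moving_average_decay=0,mixed_precision=true",
        "--num_epochs=%d" % target,
    ], check=True)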
The trained efficientdet-d0 once again detected nothing. Something in the training process was causing all the models to fail.
Another hit said to delete moving_average_decay=0.
time python3 main.py \
    --mode=train_and_eval \
    --train_file_pattern='../../train_lion/pascal-00000-of-00010.tfrecord' \
    --val_file_pattern='../../val_lion/pascal-00000-of-00010.tfrecord' \
    --model_name=efficientdet-lite0 \
    --model_dir=../../efficientlion-lite0/ \
    --backbone_ckpt=efficientnet-b0 \
    --train_batch_size=1 \
    --eval_batch_size=1 \
    --eval_samples=100 \
    --num_examples_per_epoch=1000 \
    --hparams="num_classes=1,mixed_precision=true" \
    --num_epochs=300
Note that this command creates a config.yaml inside ../../efficientlion-lite0/, which a previous hit had missed. The inspector command becomes
PYTHONPATH=.:.. python3 inspector.py --mode=export --model_name=efficientdet-lite0 --model_dir=../../../efficientlion-lite0/ --saved_model_dir=../../../efficientlion-lite0.out --hparams=../../../efficientlion-lite0/config.yaml
This somehow got to a .engine file without the previous ExponentialMovingAverage error. The .yaml file might have fixed it. Still no detections.