To detect uncommon, project-specific objects such as pylons (traffic cones), it is necessary to train custom models. Roktrack uses YOLOv8 (nano model) for its custom models. The number of images per class used for training is as follows:
Model for mowing navigation
Class | Images |
Pylon | 7000 |
Person | 13000 |
Roktrack | 1500 |

After training, I export the model in ONNX format at two image sizes, 320*320 and 640*640, because the inference time differs considerably between them: the former takes about 1 second on the Raspberry Pi 3A+, while the latter takes a little over 3 seconds. During actual mowing, I use the light model while the pylon is recognizable, and switch to the heavy model when it is lost so that distant objects can still be detected. In my experiments, a 1280*1280 model was able to detect a pylon 50 m away, but it takes about 10 seconds to infer one image, so it is impractical for navigation while moving.
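The light/heavy switching described above could be sketched roughly as follows. This is only an illustration, not Roktrack's actual code: the `find_pylons` function, the detector callables, the `(class_name, confidence)` detection format, and the 0.5 confidence threshold are all assumptions made for the example. It also assumes a per-frame fallback, where the heavy model runs only when the light model finds no pylon.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a detection is (class_name, confidence),
# and a detector maps one camera frame to a list of detections.
Detection = Tuple[str, float]
Detector = Callable[[object], List[Detection]]


def find_pylons(frame, light: Detector, heavy: Detector, min_conf: float = 0.5):
    """Try the fast model first; run the slow one only if no pylon was found."""

    def pylons(dets: List[Detection]) -> List[Detection]:
        # Keep only sufficiently confident pylon detections.
        return [d for d in dets if d[0] == "pylon" and d[1] >= min_conf]

    found = pylons(light(frame))
    if found:
        return "light", found
    # Pylon lost at low resolution: pay the extra inference time.
    return "heavy", pylons(heavy(frame))
```

With this arrangement the slower inference cost is only paid on frames where the pylon has been lost, which matches the motivation for keeping two exports.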
Model for number recognition
Class | Images |
0 | 200 |
1 | 200 |
2 | 200 |
3 | 200 |
4 | 200 |
5 | 200 |
6 | 200 |
7 | 200 |
8 | 200 |
9 | 200 |
The number recognition model is exported with an image size of 96*96 to speed up processing. As explained in a previous log, this model runs inference on cropped images, so the low resolution is not a problem.
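Exporting one trained checkpoint at several image sizes might look something like the sketch below. It assumes the `ultralytics` Python package is installed. The checkpoint file names and the `onnx_name` / `export_onnx` helpers are hypothetical, invented for this example. Each export is moved to a size-specific file name, on the assumption that repeated exports of the same checkpoint would otherwise write to the same default path.

```python
import shutil
from pathlib import Path
from typing import List


def onnx_name(weights: str, size: int) -> str:
    """Build a per-size output name, e.g. 'digits.pt' and 96 -> 'digits_96.onnx'."""
    p = Path(weights)
    return str(p.with_name(f"{p.stem}_{size}.onnx"))


def export_onnx(weights: str, sizes: List[int]) -> List[str]:
    """Export one trained checkpoint to ONNX once per image size."""
    # Imported lazily so onnx_name() can be used without the package installed.
    from ultralytics import YOLO

    outputs = []
    for size in sizes:
        model = YOLO(weights)  # reload so every export starts from a clean model
        exported = model.export(format="onnx", imgsz=size)
        target = onnx_name(weights, size)
        shutil.move(exported, target)  # keep each size under its own file name
        outputs.append(target)
    return outputs
```

For example, `export_onnx("navigation.pt", [320, 640])` and `export_onnx("digits.pt", [96])` would cover the sizes mentioned so far; both checkpoint names are placeholders.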
Model for animal detection
Class | Images |
Bear | 3000 |
Deer | 3000 |
Monkey | 1000 |
Raccoon | 1000 |
Fox | 1000 |
Dog | 1000 |
Cat | 1000 |
Civet | 1000 |
Boar | 1000 |
Hare | 1000 |
Badger | 1000 |
This model is also exported at 320*320 and 640*640, but it still produces some false positives. The nano model cannot capture the characteristics of every class; a larger model, such as the small model, might do better.
I feel that at least 1,500 images per class, preferably 3,000, are needed to achieve satisfactory accuracy.
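As a quick check of that rule of thumb against the animal dataset listed above, the snippet below flags every class with fewer than 1,500 images. The counts are copied from the table; the `classes_below` helper is a hypothetical name used only for this example.

```python
# Image counts per class, taken from the animal detection table above.
ANIMAL_IMAGES = {
    "Bear": 3000, "Deer": 3000, "Monkey": 1000, "Raccoon": 1000,
    "Fox": 1000, "Dog": 1000, "Cat": 1000, "Civet": 1000,
    "Boar": 1000, "Hare": 1000, "Badger": 1000,
}


def classes_below(counts: dict, minimum: int = 1500) -> list:
    """Return the class names that have fewer than `minimum` training images."""
    return sorted(name for name, n in counts.items() if n < minimum)


short = classes_below(ANIMAL_IMAGES)
print(f"{len(short)} of {len(ANIMAL_IMAGES)} classes are below 1,500 images:")
print(", ".join(short))
```

Only Bear and Deer reach the preferred 3,000 images; the other nine classes sit at 1,000, which may be one reason for the remaining false positives.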