We benchmark the YOLO v5, YOLO v7, and YOLO v8 object detection models head-to-head on Jetson AGX Orin and RTX 4070 Ti to find which ones offer the best speed-to-accuracy balance.
Object detection is an important and rapidly growing area of computer vision, and YOLO (You Only Look Once) is one of the most popular frameworks for object detection. YOLO v5, v7, and v8 are the latest versions of the YOLO framework, and in this blog post, we will compare their performance on the NVIDIA Jetson AGX Orin 32GB platform, the most powerful embedded AI computer, and on an RTX 4070 Ti desktop card. Keep reading to find out which version of YOLO is the best for your needs!
NVIDIA Jetson AGX Orin and ZED stereo camera
With every new YOLO release, the first question we all have is: should we upgrade to the latest version? Most benchmarks are run on high-end GPUs such as the A100, far from the embedded GPUs we use in production. TensorRT, which is known to speed up most neural networks dramatically, is rarely used. And few benchmarks compare YOLO v7 with the Ultralytics v5 and v8 models.
So at Stereolabs, we decided to release in 2023 a complete COCO benchmark of YOLO v5 vs. YOLO v7 vs. YOLO v8 on AGX Orin, with actual latencies measured using TensorRT 8.4 and JetPack 5.
Here are the detailed results for all YOLOv8, YOLOv5, and YOLOv7 models at 640 resolution on both NVIDIA Jetson AGX Orin (JP5) and RTX 4070 Ti (batch size 1, TensorRT 8.4, FP16):
MODEL | AP | AP0.5 | AGX ORIN (FPS) | RTX 4070 TI (FPS) |
v5n | 28 | 45.7 | 370 | 934 |
v8n | 37.3 | 52.5 | 383 | 1163 |
v7-tiny | 37.4 | 55.2 | 290 | 917 |
v5s | 37.4 | 56.8 | 277 | 877 |
v8s | 44.9 | 61.8 | 260 | 925 |
v5m | 45.4 | 64.1 | 160 | 586 |
v8m | 50.2 | 67.2 | 137 | 540 |
v5l | 49 | 67.3 | 116 | 446 |
v7 | 51.2 | 69.7 | 115 | 452 |
v8l | 52.9 | 69.8 | 95 | 391 |
v5x | 50.7 | 68.9 | 67 | 252 |
v7x | 52.9 | 71.1 | 77 | 294 |
v8x | 53.9 | 71.0 | 64 | 236 |
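To read a speed-to-accuracy balance out of the table, one option is to convert FPS to per-frame latency and keep only the models that no other model beats on both AP and FPS (a Pareto front). The sketch below does this for the AGX Orin column. It assumes the FPS figures are batch-1 inference throughput, so latency is simply 1000 / FPS; the helper names `latency_ms` and `pareto_front` are illustrative and not part of any YOLO or Stereolabs tooling.

```python
# (model, COCO AP, FPS on AGX Orin), copied from the table above
RESULTS = [
    ("v5n", 28.0, 370), ("v8n", 37.3, 383), ("v7-tiny", 37.4, 290),
    ("v5s", 37.4, 277), ("v8s", 44.9, 260), ("v5m", 45.4, 160),
    ("v8m", 50.2, 137), ("v5l", 49.0, 116), ("v7", 51.2, 115),
    ("v8l", 52.9, 95), ("v5x", 50.7, 67), ("v7x", 52.9, 77),
    ("v8x", 53.9, 64),
]


def latency_ms(fps):
    """Per-frame latency in milliseconds, assuming batch size 1."""
    return 1000.0 / fps


def pareto_front(results):
    """Return the models that are not dominated on (AP, FPS).

    A model is dominated if another model is at least as good on both
    metrics and strictly better on at least one of them.
    """
    front = []
    for name, ap, fps in results:
        dominated = any(
            o_ap >= ap and o_fps >= fps and (o_ap > ap or o_fps > fps)
            for _, o_ap, o_fps in results
        )
        if not dominated:
            front.append(name)
    return front


for name, ap, fps in RESULTS:
    if name in pareto_front(RESULTS):
        print(f"{name:8s} AP {ap:4.1f}  {fps:4d} FPS  {latency_ms(fps):5.2f} ms")
```

With the Orin numbers above, this keeps v8n, v7-tiny, v8s, v5m, v8m, v7, v8l, and v8x. Swapping in the RTX 4070 Ti column gives the same analysis for the desktop card.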
Some interesting findings:
The new YOLOv8 is a clear improvement over the classic YOLOv5 object detector. A growing trend in several industries is to combine YOLO with a depth camera, such as the ZED 2i stereo camera. This makes it possible to localize and track people and objects in space for next-level awareness. Here is an example of live digital twin creation using a Stereolabs ZED camera and the ZED SDK.
Digital twin of an indoor scene using YOLO and ZED 2i camera
In conclusion, all three versions of YOLO (v5, v7, and v8) show solid performance on the Jetson Orin platform. However, based on our testing, YOLO v8 offers the best overall trade-off of the three. It improves COCO AP for every variant compared to YOLO v5, by 3.2 points (x) to 9.3 points (n), while running at comparable frame rates on Orin and RTX 4070 Ti (at worst about 18% slower, for the l variant on Orin). If you’re looking for a fast and reliable object detection framework, YOLO v8 may be the right choice for you.
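The v8-versus-v5 comparison can be checked directly against the table. The following sketch pairs each YOLO v8 variant with the YOLO v5 variant of the same size and reports the AP gain and the FPS ratio on AGX Orin. The `compare` helper and the dictionary layout are illustrative choices, not part of the benchmark code.

```python
# size -> (COCO AP, FPS on AGX Orin), copied from the results table
V5 = {"n": (28.0, 370), "s": (37.4, 277), "m": (45.4, 160),
      "l": (49.0, 116), "x": (50.7, 67)}
V8 = {"n": (37.3, 383), "s": (44.9, 260), "m": (50.2, 137),
      "l": (52.9, 95), "x": (53.9, 64)}


def compare(v5, v8):
    """Return size -> (AP gain of v8 over v5, v8 FPS divided by v5 FPS)."""
    rows = {}
    for size, (ap5, fps5) in v5.items():
        ap8, fps8 = v8[size]
        rows[size] = (round(ap8 - ap5, 1), round(fps8 / fps5, 2))
    return rows


for size, (ap_gain, fps_ratio) in compare(V5, V8).items():
    print(f"v8{size} vs v5{size}: +{ap_gain} AP, {fps_ratio:.2f}x FPS")
```

An FPS ratio above 1 means the v8 variant is faster than its v5 counterpart; below 1 means it trades some speed for the accuracy gain.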
You can also use the new YOLO v8 with ZED cameras to obtain 3D bounding boxes by ingesting Custom Objects into the ZED SDK.
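Before 2D detections can be handed to a 3D pipeline, they usually need to be converted into the box format that pipeline expects. As a minimal sketch, the helper below turns a YOLO-style center-format box (center x, center y, width, height, in pixels) into four corner points listed clockwise from the top-left. The corner order is an assumption made for illustration; check the ZED SDK Custom Objects documentation for the exact layout it requires. The function name `xywh_to_corners` is hypothetical and not a ZED SDK or Ultralytics call.

```python
def xywh_to_corners(cx, cy, w, h):
    """Convert a center-format box to four (x, y) pixel corners.

    Corners are returned clockwise starting at the top-left:
    top-left, top-right, bottom-right, bottom-left.
    This ordering is an assumption, not a confirmed SDK requirement.
    """
    x_min, x_max = cx - w / 2.0, cx + w / 2.0
    y_min, y_max = cy - h / 2.0, cy + h / 2.0
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]


# Example: a 100x50 px box centered at (320, 240)
corners = xywh_to_corners(320, 240, 100, 50)
print(corners)
```

Each detection's corners, class label, and confidence would then be passed to the SDK together, one entry per detected object.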