Vehicle Detection and Classification with YOLO

Michał Jurzak, Albert Dańko

An object-detection study built for the Fundamentals of Artificial Intelligence course at AGH. The task is to locate and classify vehicles in street scenes, framed in the YOLO spirit as a single regression from image pixels to bounding-box coordinates and class probabilities.

Data

The Road Vehicle Images dataset provides 3,004 annotated images across 21 classes, each label giving a class and bounding-box coordinates. Exploratory analysis surfaced two practical problems:

Colour distributions were close to normal (confirmed by Kolmogorov–Smirnov and Shapiro–Wilk tests), so no normalisation was applied. The data was re-split into 70% train / 10% validation / 20% test to repair the broken class coverage.
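A minimal sketch of a 70 / 10 / 20 re-split with a class-coverage check is given below. It is an illustration only, not the procedure used in the notebooks: the function names `split_dataset` and `missing_classes`, the fixed seed, and the representation of labels as one set of class ids per image are all assumptions.

```python
import random


def split_dataset(items, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle items and cut them into train / validation / test lists.

    The test split takes whatever remains after train and validation,
    so the three parts always cover every item exactly once.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def missing_classes(split_labels, all_classes):
    """Return the classes that never occur in a split.

    split_labels is assumed to hold one set of class ids per image.
    """
    seen = set().union(*split_labels)
    return set(all_classes) - seen
```

With 3,004 images this yields 2,103 / 300 / 601 images. Calling `missing_classes` on each part shows whether any of the 21 classes is absent from it; if so, reshuffling with another seed or splitting per class are two possible remedies.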

Models

Detection quality is measured by the intersection-over-union between a predicted box B_p and the ground-truth box B_{gt},

\mathrm{IoU} = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|},

reported as mean average precision at a fixed IoU threshold of 0.5 (\mathrm{mAP}_{50}) and averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (\mathrm{mAP}_{50\text{--}95}).
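The IoU formula can be sketched in a few lines for axis-aligned boxes. The corner format `(x_min, y_min, x_max, y_max)` and the helper names `iou` and `is_match` are assumptions made for illustration; YOLO label files usually store normalised centre, width and height instead, so a conversion would be needed first.

```python
def iou(box_p, box_gt):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(box_p[0], box_gt[0])
    iy_min = max(box_p[1], box_gt[1])
    ix_max = min(box_p[2], box_gt[2])
    iy_max = min(box_p[3], box_gt[3])

    # Clamp at zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    union = area_p + area_gt - inter

    return inter / union if union > 0 else 0.0


def is_match(box_p, box_gt, threshold=0.5):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return iou(box_p, box_gt) >= threshold
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, giving an IoU of 1/7, which falls below the 0.5 threshold.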

Several variants of YOLOv8 and YOLOv5 were trained, both pretrained and from scratch. YOLOv8’s architecture follows the usual backbone / neck / head split: a CNN backbone, a neck combining a Feature Pyramid Network and a Path Aggregation Network for multi-scale features, and a single anchor-free head that predicts object centres directly.
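The difference between a pretrained and a from-scratch run can be sketched with the `ultralytics` package, where loading a `.pt` checkpoint starts from pretrained weights and loading a `.yaml` architecture file starts from random initialisation. This is a hedged illustration rather than the notebooks' code: the helper names, the variant letter, the dataset config path, the image size of 640 and the default of 40 epochs are assumptions, and only YOLOv8 is covered.

```python
def weights_for(variant: str, pretrained: bool) -> str:
    """Map a YOLOv8 variant letter (e.g. "n", "s") to a checkpoint or architecture file."""
    suffix = "pt" if pretrained else "yaml"
    return f"yolov8{variant}.{suffix}"


def train_variant(variant: str, pretrained: bool, data_yaml: str, epochs: int = 40):
    """Train one YOLOv8 variant and return (mAP50, mAP50-95) on the validation split."""
    # Imported lazily so weights_for can be used without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights_for(variant, pretrained))
    model.train(data=data_yaml, epochs=epochs, imgsz=640)  # imgsz is an assumed value
    metrics = model.val()
    return metrics.box.map50, metrics.box.map
```

A call such as `train_variant("n", pretrained=True, data_yaml="vehicles.yaml")` would then train the nano variant from pretrained weights, where `vehicles.yaml` is a hypothetical dataset config listing the split paths and the 21 class names.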

Results

Under a fixed 35–40 epoch budget (hardware-limited, so all models were undertrained but comparable):

Source

Notebooks and the full results are in the source repository.