2D Object Detection

Introduction

2D object detection is of paramount importance in autonomous vehicles as it enables them to perceive and interpret their surroundings. By accurately identifying and localizing objects such as vehicles, pedestrians, traffic signs, and obstacles, the vehicle can navigate safely and make informed decisions. This information helps in obstacle detection, lane tracking, traffic sign recognition, and predicting the movements of other road users. With reliable 2D object detection, autonomous vehicles can enhance their situational awareness, avoid collisions, and operate efficiently in complex traffic scenarios.

Moreover, 2D object detection plays a crucial role in ensuring the safety of vulnerable road users, such as pedestrians and cyclists. By accurately detecting and tracking their movements, autonomous vehicles can actively avoid accidents and adapt their behavior accordingly. This capability enhances the overall safety of autonomous vehicles and improves their interaction with human users and the surrounding environment. Ultimately, 2D object detection empowers autonomous vehicles to understand their environment, make critical decisions, and navigate the roads with increased safety and efficiency.

KITTI dataset

The KITTI dataset is a widely used benchmark dataset for autonomous driving research, particularly in the area of 2D object detection. It stands for "Karlsruhe Institute of Technology and Toyota Technological Institute" and was created by a collaboration between the two institutions.


The dataset contains diverse scenarios captured from a moving vehicle, including urban street scenes, highways, and rural areas, covering different weather conditions such as sunny, rainy, and cloudy. It comprises a large number of annotated images and point clouds, making it suitable for various computer vision tasks, including object detection, tracking, and scene understanding.

In terms of 2D object detection, the KITTI dataset provides detailed annotations for several object categories: cars, pedestrians, cyclists, person_sitting, misc, truck, tram, and van. The annotations include precise bounding box coordinates around the objects of interest, as well as labels indicating the class of each object. The KITTI annotation format for 2D object detection follows a specific structure and is stored in plain-text files. Each annotation file corresponds to an image in the dataset and describes the objects present in that image. The fields in each annotation line are:

"Object Type | Truncated | Occluded | Alpha | Bounding Box (2D) | Dimensions (3D) | Location (3D) | Rotation Yaw | Score"

The dataset also includes various additional information, such as camera calibration parameters, timestamps, and vehicle trajectories, which can be valuable for tasks like camera pose estimation and motion analysis.

KITTI has become a popular benchmark for evaluating the performance of 2D object detection algorithms due to its real-world nature and comprehensive annotations. Researchers and developers often use the dataset to compare the accuracy and efficiency of different detection methods, and it has contributed to significant advancements in the field of autonomous driving.

Although many deep learning models have been evaluated on the KITTI dataset, YOLOv8 has not been tested on it yet.

YOLOv8

YOLOv8 is the latest version of You Only Look Once (YOLO). Its architecture is shown below:
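For orientation, a minimal sketch of loading a pretrained YOLOv8 model through the ultralytics package (assuming it is installed) looks like this:

```python
# Minimal sketch: load a pretrained YOLOv8 model with the ultralytics package.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # nano variant; yolov8s/m/l/x are larger options
model.info()                # prints a summary of the architecture (layers, parameters)
```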

Annotation format

Note that the KITTI annotation format differs from the YOLOv8 format, so we had to convert the annotations to YOLOv8 format. The converter code is kitti_to_yolo_converter.py under tools.
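As a rough illustration of the core transformation (not the repository's exact code), converting one KITTI bounding box to the YOLO format amounts to normalizing the box center and size by the image dimensions:

```python
# Rough illustration of the KITTI -> YOLO box conversion (not the repo's exact code).
# KITTI gives absolute pixel corners (left, top, right, bottom); YOLO expects
# "class_id x_center y_center width height" normalized to [0, 1].

def kitti_box_to_yolo(left: float, top: float, right: float, bottom: float,
                      img_w: int, img_h: int) -> tuple:
    x_center = (left + right) / 2.0 / img_w
    y_center = (top + bottom) / 2.0 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h
    return x_center, y_center, width, height

# Example: a KITTI car box in a 1242x375 image.
print(kitti_box_to_yolo(587.01, 173.33, 614.12, 200.12, img_w=1242, img_h=375))
```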

Roboflow

Since the official KITTI test set does not include labels, we used Roboflow (a web platform for managing and transforming datasets) to split the labeled training data into Train/Valid/Test sets. We then applied several data augmentation techniques, after which the new dataset was ready to use.
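The split itself was done in Roboflow's web interface; purely for illustration, an equivalent local split could be sketched in Python as follows (folder layout and paths are hypothetical):

```python
# Hypothetical sketch of a local train/valid/test split
# (the project performed this step in Roboflow's web UI instead).
import random
import shutil
from pathlib import Path

def split_dataset(image_dir: str, label_dir: str, out_dir: str,
                  ratios=(0.7, 0.2, 0.1), seed: int = 0) -> None:
    images = sorted(Path(image_dir).glob("*.png"))
    random.Random(seed).shuffle(images)
    n = len(images)
    cuts = [int(n * ratios[0]), int(n * (ratios[0] + ratios[1]))]
    splits = {"train": images[:cuts[0]],
              "valid": images[cuts[0]:cuts[1]],
              "test": images[cuts[1]:]}
    for split, files in splits.items():
        for img in files:
            lbl = Path(label_dir) / (img.stem + ".txt")
            for src, sub in ((img, "images"), (lbl, "labels")):
                dst = Path(out_dir) / split / sub
                dst.mkdir(parents=True, exist_ok=True)
                shutil.copy(src, dst / src.name)
```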

Optimization

Getting YOLOv8 to train on the KITTI dataset was the first part of the contribution. We then needed to optimize it to gain efficiency and accuracy.

First, we increased the input image size to obtain a better resolution and therefore better detection results; this helped the model train more precisely.

Regarding data augmentation, we tested the techniques that made sense for our detection task. They were all related to the camera's viewpoint (rotation angle, noise, exposure, shear for perspective changes, etc.).
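For context, a training call with a larger input size and perspective-style augmentations can also be expressed directly through the ultralytics API; the sketch below uses illustrative values, not the project's exact settings (our augmentations were applied in Roboflow before training):

```python
# Illustrative training sketch with the ultralytics API; values are examples only.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
model.train(
    data="kitti.yaml",   # hypothetical dataset config exported from Roboflow
    epochs=15,
    imgsz=1280,          # larger input resolution for finer detections
    degrees=5.0,         # small random rotation (camera angle)
    shear=2.0,           # shear, approximating perspective changes
    hsv_v=0.4,           # brightness/exposure jitter
)
```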

Final Results

These are our final results after 15 + 15 epochs (training was done in two runs due to limited GPU availability):

Here are two examples of predictions made by the model:

To better visualize the impact of data augmentation, here are the results on some test videos. The model trained with data augmentation (right) proved more effective than the original model (left): it can detect some occluded objects and better differentiate between "Cyclist" and "Pedestrian".
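For reference, running the trained model on a test video with the ultralytics API can be sketched as follows (the weight and video paths are hypothetical):

```python
# Minimal inference sketch with ultralytics; paths are hypothetical.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # path to trained weights
results = model.predict("test_video.mp4", save=True, conf=0.25)
```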
