
Object Classification Utilising YOLOv8

This repository includes all the scripts necessary for object classification, utilizing YOLOv8 for object detection, segmentation, and classification. It integrates with RabbitMQ to receive messages and retrieves media from the Kerberos Vault based on the message details. Upon receiving the video, objects are detected, segmented, and classified, while the primary colors of the objects are calculated simultaneously. The frame is annotated with this information and can be saved locally. Additionally, the results are stored in a JSON object. Further features are available and are detailed in the sections below.

Prerequisites

To install the necessary dependencies, run the following commands. It is recommended to use a virtual environment:

python -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt

Features & Corresponding .env Variables

This repository offers numerous options and additional features to optimally configure it to your needs. Below is a list of all available features and their corresponding .env variable names. These variables can be modified in the included .env file.

Utilized Model

YOLOv8 offers a range of models catering to various accuracy-performance trade-offs. Among these, yolov8n.pt is the most performance-focused, while yolov8x.pt emphasizes accuracy. Intermediate models such as yolov8s, yolov8m, and yolov8l progressively balance performance and accuracy. The aforementioned models perform classification only. The object classification system also supports segmentation models, which share the same names with a '-seg' suffix (e.g., yolov8n-seg.pt). Segmentation models provide the advantage of removing the background and overlapping objects from the main color calculation, as detailed further in the color prediction feature description.

The model in use can be changed via the MODEL_NAME .env variable.
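
For example, to use the nano segmentation model, the .env entry could look as follows (the value is purely illustrative):

MODEL_NAME = "yolov8n-seg.pt"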

Queue Message Reader

The object classification system will automatically check for incoming messages and process them. If there is a queue build-up, it will continue to process media until the queue is empty. This functionality leverages the uugai-python-dynamic-queue dependency. More information can be found in the corresponding GitHub repository. Initialization is straightforward, as demonstrated in the code snippet below, which also lists the corresponding .env variables.

# Initialize a message broker using the python_queue_reader package
rabbitmq = RabbitMQ(
    queue_name = var.QUEUE_NAME, 
    target_queue_name = var.TARGET_QUEUE_NAME, 
    exchange = var.QUEUE_EXCHANGE, 
    host = var.QUEUE_HOST, 
    username = var.QUEUE_USERNAME,
    password = var.QUEUE_PASSWORD)

# Receive a message from the queue
message = rabbitmq.receive_message()
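
Once initialized, the broker can be polled in a simple loop. The sketch below is illustrative only: it assumes that receive_message() returns nothing when the queue is empty, and handle_message is a hypothetical placeholder for the retrieval and classification steps described in the following sections.

import time

# Minimal polling sketch (illustrative, not the repository's exact loop).
while True:
    message = rabbitmq.receive_message()
    if not message:
        # Assumption: an empty queue yields a falsy result.
        print("No message received, waiting for 3 seconds")
        time.sleep(3)
        continue
    handle_message(message)  # hypothetical handler: vault retrieval + classification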

Kerberos Vault Integration

The incoming messages provide the necessary information to retrieve media from the Kerberos Vault. The received media can then be easily written to a video file, allowing it to be used as input for the model. This functionality leverages the uugai-python-kerberos-vault dependency. More information can be found in the corresponding GitHub repository, and additional details about Kerberos Vault itself can be found here. Initialization is straightforward, as demonstrated in the code snippet below, which also lists the corresponding .env variables.

# Initialize Kerberos Vault
kerberos_vault = KerberosVault(
    storage_uri = var.STORAGE_URI,
    storage_access_key = var.STORAGE_ACCESS_KEY,
    storage_secret_key = var.STORAGE_SECRET_KEY)

# Retrieve media from the Kerberos Vault, in this case a video-file
resp = kerberos_vault.retrieve_media(
        message = message, 
        media_type = 'video', 
        media_savepath = var.MEDIA_SAVEPATH)
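
Once the media has been written to MEDIA_SAVEPATH, it can be opened like any other video file. The snippet below is a rough sketch using OpenCV, not the repository's exact code:

import cv2

# Open the video retrieved from Kerberos Vault and iterate over its frames.
cap = cv2.VideoCapture(var.MEDIA_SAVEPATH)
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break
    # ... pass `frame` to the YOLOv8 model here ...
cap.release()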

Object Classification

The primary focus of this repository is object classification, achieved using YOLO's pretrained classification or segmentation models as described in the 'utilized model' subsection. Based on your preferences, there are configurable parameters that modify the classification process. These parameters are divided into performance-based and application-based categories. The available parameters are listed below:

Performance-based .env Variables

MODEL_NAME: As discussed in the 'utilized model' section, this parameter allows you to choose a model that balances performance and accuracy according to your needs. For more details, please refer to the earlier section.

CLASSIFICATION_FPS: This parameter allows you to adjust the number of frames sent for classification. Lowering the FPS can improve performance by reducing the number of classifications required. However, setting the FPS too low may result in missing fast-moving objects and decreased tracking accuracy.

MAX_NUMBER_OF_PREDICTIONS: This feature allows you to set a limit on the number of predictions performed, enabling you to shorten a video if desired. If no limit is needed, set this parameter to a high value.
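
As an illustration only, a performance-oriented configuration could combine the MODEL_NAME choice discussed above with the two variables below (the values are example figures, not recommendations):

CLASSIFICATION_FPS = "5"
MAX_NUMBER_OF_PREDICTIONS = "500"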

Application-based .env Variables

MIN_DISTANCE: This parameter defines the minimum distance an object must travel before it is considered 'dynamic.' The distance is calculated as the sum of the distances between centroids for each classified frame. Note that this distance can be affected by shifting bounding boxes, especially for objects that are difficult to detect.

MIN_STATIC_DISTANCE: This parameter also defines the minimum distance an object must travel before being marked as dynamic. However, this distance is measured as the Euclidean distance between the centroids of the first and last bounding boxes. While this method is not sensitive to shifting bounding boxes, it may not detect dynamic objects that start and end in the same location. A short sketch comparing both distance measures follows this list.

MIN_DETECTIONS: This parameter specifies the minimum number of times an object must be detected before it is saved in the results. This is useful for filtering out sporadic background detections and misclassifications.

ALLOWED_CLASSIFICATIONS: This parameter specifies which of the model's classes are included for detection; all other classes are excluded. The available classes are model-dependent. For the default pretrained YOLOv8 models, an 'id' and 'class' table is provided below.

ID  Class          ID  Class      ID  Class           ID  Class         ID  Class
0   person         16  dog        32  sports ball     48  sandwich      64  mouse
1   bicycle        17  horse      33  kite            49  orange        65  remote
2   car            18  sheep      34  baseball bat    50  broccoli      66  keyboard
3   motorcycle     19  cow        35  baseball glove  51  carrot        67  cell phone
4   airplane       20  elephant   36  skateboard      52  hot dog       68  microwave
5   bus            21  bear       37  surfboard       53  pizza         69  oven
6   train          22  zebra      38  tennis racket   54  donut         70  toaster
7   truck          23  giraffe    39  bottle          55  cake          71  sink
8   boat           24  backpack   40  wine glass      56  chair         72  refrigerator
9   traffic light  25  umbrella   41  cup             57  couch         73  book
10  fire hydrant   26  handbag    42  fork            58  potted plant  74  clock
11  stop sign      27  tie        43  knife           59  bed           75  vase
12  parking meter  28  suitcase   44  spoon           60  dining table  76  scissors
13  bench          29  frisbee    45  bowl            61  toilet        77  teddy bear
14  bird           30  skis       46  banana          62  tv            78  hair drier
15  cat            31  snowboard  47  apple           63  laptop        79  toothbrush

In most standard use-cases, the ALLOWED_CLASSIFICATIONS parameter would conform to the following format:

ALLOWED_CLASSIFICATIONS = "0, 1, 2, 3, 4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 28"
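
To make the difference between MIN_DISTANCE and MIN_STATIC_DISTANCE more concrete, the sketch below computes both distance measures for a hypothetical list of per-frame centroids. It only illustrates the descriptions above and is not the repository's implementation:

import math

# centroids: hypothetical list of (x, y) bounding-box centres, one per classified frame.
def summed_distance(centroids):
    # Sum of the centroid-to-centroid steps, compared against MIN_DISTANCE.
    return sum(math.dist(a, b) for a, b in zip(centroids, centroids[1:]))

def static_distance(centroids):
    # Euclidean distance between the first and last centroid, compared against MIN_STATIC_DISTANCE.
    return math.dist(centroids[0], centroids[-1])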

Object's Main Color Calculation

The FIND_DOMINANT_COLORS environment variable enables the calculation of the main colors of detected objects. This feature uses the uugai-python-color-prediction dependency to determine the primary colors. More information about its functionality and available parameters can be found in the corresponding GitHub repository. The main colors are saved in BGR and HLS formats, and they are also mapped to a string using a slightly customized version of the HSL-79 color naming system. Additional details about this color naming system can be found here.

The choice between a classification or segmentation model significantly impacts the performance of the main color calculation. For classification models, the color calculation includes everything inside the bounding box. This object can be cropped using a feature in the uugai-python-color-prediction dependency. However, this method does not support off-centered objects or overlapping bounding boxes. Segmentation models, on the other hand, provide the necessary mask to isolate the object from the background and exclude any overlapping objects, with only a slight decrease in performance. Depending on the video quality, downsampling can be adjusted within the function call.

The COLOR_PREDICTION_INTERVAL environment variable allows you to adjust the interval for color prediction. Setting this variable to 1 means that the dominant colors are calculated for every frame, ensuring high accuracy. Higher integer values reduce the frequency of dominant color calculations, which increases efficiency but may decrease accuracy.

Additionally, the MIN_CLUSTERS and MAX_CLUSTERS environment variables allow you to adjust the number of dominant colors to be found. For example, setting MIN_CLUSTERS to 1 and MAX_CLUSTERS to 8 enables the function to find the optimal number of clusters using the inertias of KMeans clustering, along with an elbow point finder to identify the best fit. This method is the most accurate but requires calculating many clusters for each object.

Alternatively, setting MIN_CLUSTERS and MAX_CLUSTERS to the same value dictates the exact number of dominant colors to calculate. For example, setting both to 3 will find exactly 3 main clusters. This approach is more performant but may be less accurate if the actual number of dominant colors differs from the specified value.
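
The sketch below illustrates how a MIN_CLUSTERS/MAX_CLUSTERS range can be explored using KMeans inertias. The actual logic, including the elbow-point finder, lives in the uugai-python-color-prediction dependency; this is merely an illustration of the idea, not its API:

from sklearn.cluster import KMeans

def cluster_inertias(pixels_bgr, min_clusters=1, max_clusters=8):
    # pixels_bgr: (N, 3) array of an object's (optionally masked) pixels.
    inertias = {}
    for k in range(min_clusters, max_clusters + 1):
        kmeans = KMeans(n_clusters=k, n_init=10).fit(pixels_bgr)
        inertias[k] = kmeans.inertia_
    # An elbow-point finder would pick the k where the inertia curve flattens.
    return inertias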

Several Other Features

Multiple additional features are available, each tailored to specific use-case scenarios. These encompass various verbose and saving functionalities.

Plotting

Depending on the use case, the annotated frame can be displayed by plotting it. This behavior is controlled by the PLOT environment variable. In situations where visual output is unnecessary, such as when solely retrieving data without graphical output, this variable can be set to false as follows: PLOT = "False".

The annotated frame displays the bounding boxes of detected objects, along with their primary colors when color detection is activated. These bounding boxes are color-coded: green for dynamic objects and red for static ones. Additionally, their trajectories are plotted, accompanied by their class and confidence score.

Save Annotated Video

Another option is to save the annotated video. This can be achieved by configuring the environment variable SAVE_VIDEO to "True". Additionally, the save path for the video can be specified using OUTPUT_MEDIA_SAVEPATH = "path/to/your/output_video.mp4".

Bounding Box Static Trajectory Frame

An alternative option is to generate an image containing all bounding boxes and trajectories. This process involves utilizing the initial frame of the video to draw the first bounding box of the object and its respective trajectory. However, this feature is contingent upon the minimum detection criteria specified by the MIN_DETECTIONS parameter. Additionally, it provides insights into whether an object remained static or dynamic throughout the video duration.

The generation of this image can be enabled by setting the environment variable CREATE_BBOX_FRAME to "True". Moreover, you can specify whether to save the bounding box frame and its save path using SAVE_BBOX_FRAME = "True" and BBOX_FRAME_SAVEPATH = "path/to/your/output_bbox.jpg", respectively.
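
Gathering the variables from this section, an example configuration that enables both the annotated video and the bounding-box frame could look as follows (the paths are placeholders):

PLOT = "True"
SAVE_VIDEO = "True"
OUTPUT_MEDIA_SAVEPATH = "path/to/your/output_video.mp4"
CREATE_BBOX_FRAME = "True"
SAVE_BBOX_FRAME = "True"
BBOX_FRAME_SAVEPATH = "path/to/your/output_bbox.jpg"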

Return JSON-Object Creation

This parameter is typically left enabled; however, there is an option to skip creating a JSON data object containing all the classification data. If this repository is solely used for visual inspection without any subsequent post-processing, the creation of the JSON object can be disabled using CREATE_RETURN_JSON = "False". Furthermore, you can decide whether to save this object and customize its save path by adjusting SAVE_RETURN_JSON = "True" and RETURN_JSON_SAVEPATH = "path/to/your/json.json". The JSON object is structured as follows:

{
    "operation": "classify",
    "data": {
        "objectCount": int,
        "properties": [str],
        "details": [
            {
                "id": int,
                "classified": str,
                "distance": float,
                "staticDistance": float,
                "isStatic": bool,
                "frameWidth": int,
                "frameHeight": int,
                "frame": int,
                "frames": [int],
                "occurence": int,
                "traject": [[float]],
                "trajectCentroids": [[float]],
                "colorsBGR": [[[int]]],
                "colorsHLS": [[[int]]],
                "colorsStr": [[str]],
                "colorStr": [[str, int]],
                "valid": true,
                "w": 0,
                "x": 0,
                "y": 0
            },
            ...
        ]
    }
}
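
As an illustration of how this object might be consumed downstream (not part of the repository), the snippet below loads a saved return JSON and prints the dynamic objects it contains:

import json

# Load the JSON saved at RETURN_JSON_SAVEPATH and list the dynamic objects.
with open("path/to/your/json.json") as f:
    result = json.load(f)

for detail in result["data"]["details"]:
    if not detail["isStatic"]:
        print(detail["id"], detail["classified"], detail["colorStr"])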

Time Verbose and Logging

The final two environment variables influence the verbosity options and are split into two categories: TIME_VERBOSE and LOGGING.

The LOGGING environment variable controls the output messages depending on the application's usage. The output can be one of the following:

  • If no message is received from RabbitMQ:
1) Receiving message from RabbitMQ
No message received, waiting for 3 seconds
...
  • If messages are being processed:
1) Receiving message from RabbitMQ
2) Retrieving media from Kerberos Vault
3) Using device: cpu
4) Opening video file: data/input/in_video.mp4
5) Classifying frames
6) Annotating bbox frame
7) Creating ReturnJSON object
         - 14 objects were detected. Of which 11 objects were detected more than 5 times.
         ... (optional time verbose output)
8) Releasing video writer and closing video capture

The TIME_VERBOSE environment variable includes extra time-related verbosity options, adding the following lines to the output:

- Classification took: 20.4 seconds, @ 5 fps.
        - 2.05s for preprocessing and initialisation
        - 18.35s for processing of which:
                - 12.48s for class prediction
                - 1.31s for color prediction
                - 4.56s for other processing
        - 0.0s for postprocessing
- Original video: 29.7 seconds, @ 25.0 fps @ 1280x720. File size of 1.2 MB

License

Contributors

This project exists thanks to all the people who contribute.