
Object Classification Utilising YOLOv8

This repository includes all the scripts necessary for object classification, utilizing YOLOv8 for object detection, segmentation, and classification. It integrates with RabbitMQ to receive messages and retrieves media from the Kerberos Vault based on the message details. Upon receiving the video, objects are detected, segmented, and classified, while the primary colors of the objects are calculated simultaneously. The frame is annotated with this information and can be saved locally. Additionally, the results are stored in a JSON object. Further features are available and are detailed in the sections below.

Prerequisites

To install the necessary dependencies, run the following commands. It is recommended to use a virtual environment:

python -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt

Features & Corresponding .env Variables

This repository offers numerous options and additional features to optimally configure it to your needs. Below is a list of all available features and their corresponding .env variable names. These variables can be modified in the included .env file.

Utilized Model

YOLOv8 offers a range of models catering to various accuracy-performance trade-offs. Among these, yolov8n.pt is the most performance-focused, while yolov8x.pt emphasizes accuracy. Intermediate models such as yolov8s, yolov8m, and yolov8l progressively balance performance and accuracy. The aforementioned models perform classification only. The object classification system also supports segmentation models, which share the same names with a '-seg' suffix (e.g., yolov8n-seg.pt). Segmentation models provide the advantage of removing the background and overlapping objects from the main color calculation, as detailed further in the color prediction feature description.

The model in use can be changed via the MODEL_NAME .env variable.
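
For example, to use the nano segmentation model, the .env entry could look as follows (the value is purely illustrative):

MODEL_NAME = "yolov8n-seg.pt"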

Queue Message Reader

The object classification system will automatically check for incoming messages and process them. If there is a queue build-up, it will continue to process media until the queue is empty. This functionality leverages the uugai-python-dynamic-queue dependency. More information can be found in the corresponding GitHub repository. Initialization is straightforward, as demonstrated in the code snippet below, which also lists the corresponding .env variables.

# Initialize a message broker using the python_queue_reader package
rabbitmq = RabbitMQ(
    queue_name = var.QUEUE_NAME, 
    target_queue_name = var.TARGET_QUEUE_NAME, 
    exchange = var.QUEUE_EXCHANGE, 
    host = var.QUEUE_HOST, 
    username = var.QUEUE_USERNAME,
    password = var.QUEUE_PASSWORD)

# Receive a message from the queue
message = rabbitmq.receive_message()
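
Once initialized, the broker can be polled in a simple loop. The sketch below is illustrative only: it assumes that receive_message() returns nothing when the queue is empty, and handle_message is a hypothetical placeholder for the retrieval and classification steps described in the following sections.

import time

# Minimal polling sketch (illustrative, not the repository's exact loop).
while True:
    message = rabbitmq.receive_message()
    if not message:
        # Assumption: an empty queue yields a falsy result.
        print("No message received, waiting for 3 seconds")
        time.sleep(3)
        continue
    handle_message(message)  # hypothetical handler: vault retrieval + classification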

Kerberos Vault Integration

The incoming messages provide the necessary information to retrieve media from the Kerberos Vault. The received media can then be easily written to a video file, allowing it to be used as input for the model. This functionality leverages the uugai-python-kerberos-vault dependency. More information can be found in the corresponding GitHub repository, and additional details about Kerberos Vault itself can be found here. Initialization is straightforward, as demonstrated in the code snippet below, which also lists the corresponding .env variables.

# Initialize Kerberos Vault
kerberos_vault = KerberosVault(
    storage_uri = var.STORAGE_URI,
    storage_access_key = var.STORAGE_ACCESS_KEY,
    storage_secret_key = var.STORAGE_SECRET_KEY)

# Retrieve media from the Kerberos Vault, in this case a video-file
resp = kerberos_vault.retrieve_media(
        message = message, 
        media_type = 'video', 
        media_savepath = var.MEDIA_SAVEPATH)
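
Once the media has been written to MEDIA_SAVEPATH, it can be opened like any other video file. The snippet below is a rough sketch using OpenCV, not the repository's exact code:

import cv2

# Open the video retrieved from Kerberos Vault and iterate over its frames.
cap = cv2.VideoCapture(var.MEDIA_SAVEPATH)
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break
    # ... pass `frame` to the YOLOv8 model here ...
cap.release()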

Object Classification

The primary focus of this repository is object classification, achieved using YOLO's pretrained classification or segmentation models as described in the 'utilized model' subsection. Based on your preferences, there are configurable parameters that modify the classification process. These parameters are divided into performance-based and application-based categories. The available parameters are listed below:

Performance-based .env Variables

MODEL_NAME: As discussed in the 'utilized model' section, this parameter allows you to choose a model that balances performance and accuracy according to your needs. For more details, please refer to the earlier section.

CLASSIFICATION_FPS: This parameter allows you to adjust the number of frames sent for classification. Lowering the FPS can improve performance by reducing the number of classifications required. However, setting the FPS too low may result in missing fast-moving objects and decreased tracking accuracy.

MAX_NUMBER_OF_PREDICTIONS: This feature allows you to set a limit on the number of predictions performed, enabling you to shorten a video if desired. If no limit is needed, set this parameter to a high value.
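
As an illustration only, a performance-oriented configuration could combine the MODEL_NAME choice discussed above with the two variables below (the values are example figures, not recommendations):

CLASSIFICATION_FPS = "5"
MAX_NUMBER_OF_PREDICTIONS = "500"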

Application-based .env Variables

MIN_DISTANCE: This parameter defines the minimum distance an object must travel before it is considered 'dynamic.' The distance is calculated as the sum of the distances between centroids for each classified frame. Note that this distance can be affected by shifting bounding boxes, especially for objects that are difficult to detect.

MIN_STATIC_DISTANCE: This parameter also defines the minimum distance an object must travel before being marked as dynamic. However, this distance is measured as the Euclidean distance between the centroids of the first and last bounding boxes. While this method is not sensitive to shifting bounding boxes, it may not detect dynamic objects that start and end in the same location. A short sketch comparing both distance measures follows this list.

MIN_DETECTIONS: This parameter specifies the minimum number of times an object must be detected before it is saved in the results. This is useful for filtering out sporadic background detections and misclassifications.

ALLOWED_CLASSIFICATIONS: This parameter specifies which of the model's classes are included for detection; all other classes are excluded. The available classes are model-dependent. For the default pretrained YOLOv8 models, an 'id' and 'class' table is provided below.

ID  Class          ID  Class      ID  Class           ID  Class         ID  Class
0   person         16  dog        32  sports ball     48  sandwich      64  mouse
1   bicycle        17  horse      33  kite            49  orange        65  remote
2   car            18  sheep      34  baseball bat    50  broccoli      66  keyboard
3   motorcycle     19  cow        35  baseball glove  51  carrot        67  cell phone
4   airplane       20  elephant   36  skateboard      52  hot dog       68  microwave
5   bus            21  bear       37  surfboard       53  pizza         69  oven
6   train          22  zebra      38  tennis racket   54  donut         70  toaster
7   truck          23  giraffe    39  bottle          55  cake          71  sink
8   boat           24  backpack   40  wine glass      56  chair         72  refrigerator
9   traffic light  25  umbrella   41  cup             57  couch         73  book
10  fire hydrant   26  handbag    42  fork            58  potted plant  74  clock
11  stop sign      27  tie        43  knife           59  bed           75  vase
12  parking meter  28  suitcase   44  spoon           60  dining table  76  scissors
13  bench          29  frisbee    45  bowl            61  toilet        77  teddy bear
14  bird           30  skis       46  banana          62  tv            78  hair drier
15  cat            31  snowboard  47  apple           63  laptop        79  toothbrush

In most standard use-cases, the ALLOWED_CLASSIFICATIONS parameter would conform to the following format:

ALLOWED_CLASSIFICATIONS = "0, 1, 2, 3, 4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 28"
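
To make the difference between MIN_DISTANCE and MIN_STATIC_DISTANCE more concrete, the sketch below computes both distance measures for a hypothetical list of per-frame centroids. It only illustrates the descriptions above and is not the repository's implementation:

import math

# centroids: hypothetical list of (x, y) bounding-box centres, one per classified frame.
def summed_distance(centroids):
    # Sum of the centroid-to-centroid steps, compared against MIN_DISTANCE.
    return sum(math.dist(a, b) for a, b in zip(centroids, centroids[1:]))

def static_distance(centroids):
    # Euclidean distance between the first and last centroid, compared against MIN_STATIC_DISTANCE.
    return math.dist(centroids[0], centroids[-1])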

Object's Main Color Calculation

The FIND_DOMINANT_COLORS environment variable enables the calculation of the main colors of detected objects. This feature uses the uugai-python-color-prediction dependency to determine the primary colors. More information about its functionality and available parameters can be found in the corresponding GitHub repository. The main colors are saved in BGR and HLS formats, and they are also mapped to a string using a slightly customized version of the HSL-79 color naming system. Additional details about this color naming system can be found here.

The choice between a classification or segmentation model significantly impacts the performance of the main color calculation. For classification models, the color calculation includes everything inside the bounding box. This object can be cropped using a feature in the uugai-python-color-prediction dependency. However, this method does not support off-centered objects or overlapping bounding boxes. Segmentation models, on the other hand, provide the necessary mask to isolate the object from the background and exclude any overlapping objects, with only a slight decrease in performance. Depending on the video quality, downsampling can be adjusted within the function call.

The COLOR_PREDICTION_INTERVAL environment variable allows you to adjust the interval for color prediction. Setting this variable to 1 means that the dominant colors are calculated for every frame, ensuring high accuracy. Higher integer values reduce the frequency of dominant color calculations, which increases efficiency but may decrease accuracy.

Additionally, the MIN_CLUSTERS and MAX_CLUSTERS environment variables allow you to adjust the number of dominant colors to be found. For example, setting MIN_CLUSTERS to 1 and MAX_CLUSTERS to 8 enables the function to find the optimal number of clusters using the inertias of KMeans clustering, along with an elbow point finder to identify the best fit. This method is the most accurate but requires calculating many clusters for each object.

Alternatively, setting MIN_CLUSTERS and MAX_CLUSTERS to the same value dictates the exact number of dominant colors to calculate. For example, setting both to 3 will find exactly 3 main clusters. This approach is more performant but may be less accurate if the actual number of dominant colors differs from the specified value.
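
The sketch below illustrates how a MIN_CLUSTERS/MAX_CLUSTERS range can be explored using KMeans inertias. The actual logic, including the elbow-point finder, lives in the uugai-python-color-prediction dependency; this is merely an illustration of the idea, not its API:

from sklearn.cluster import KMeans

def cluster_inertias(pixels_bgr, min_clusters=1, max_clusters=8):
    # pixels_bgr: (N, 3) array of an object's (optionally masked) pixels.
    inertias = {}
    for k in range(min_clusters, max_clusters + 1):
        kmeans = KMeans(n_clusters=k, n_init=10).fit(pixels_bgr)
        inertias[k] = kmeans.inertia_
    # An elbow-point finder would pick the k where the inertia curve flattens.
    return inertias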

Several Other Features

Multiple additional features are available, each tailored to specific use-case scenarios. These encompass various verbose and saving functionalities.

Plotting

Depending on the use case, the annotated frame can be displayed by plotting it. This behavior is controlled by the PLOT environment variable. In situations where visual output is unnecessary, such as when solely retrieving data without graphical output, this variable can be set to false as follows: PLOT = "False".

The annotated frame displays the bounding boxes of detected objects, along with their primary colors when color detection is activated. These bounding boxes are color-coded: green for dynamic objects and red for static ones. Additionally, their trajectories are plotted, accompanied by their class and confidence score.

Save Annotated Video

Another option is to save the annotated video. This can be achieved by configuring the environment variable SAVE_VIDEO to "True". Additionally, the save path for the video can be specified using OUTPUT_MEDIA_SAVEPATH = "path/to/your/output_video.mp4".

Bounding Box Static Trajectory Frame

An alternative option is to generate an image containing all bounding boxes and trajectories. This process involves utilizing the initial frame of the video to draw the first bounding box of the object and its respective trajectory. However, this feature is contingent upon the minimum detection criteria specified by the MIN_DETECTIONS parameter. Additionally, it provides insights into whether an object remained static or dynamic throughout the video duration.

The generation of this image can be enabled by setting the environment variable CREATE_BBOX_FRAME to "True". Moreover, you can specify whether to save the bounding box frame and its save path using SAVE_BBOX_FRAME = "True" and BBOX_FRAME_SAVEPATH = "path/to/your/output_bbox.jpg", respectively.
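
Gathering the variables from this section, an example configuration that enables both the annotated video and the bounding-box frame could look as follows (the paths are placeholders):

PLOT = "True"
SAVE_VIDEO = "True"
OUTPUT_MEDIA_SAVEPATH = "path/to/your/output_video.mp4"
CREATE_BBOX_FRAME = "True"
SAVE_BBOX_FRAME = "True"
BBOX_FRAME_SAVEPATH = "path/to/your/output_bbox.jpg"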

Return JSON-Object Creation

This parameter is typically left enabled; however, there is an option to skip creating a JSON data object containing all the classification data. If this repository is solely used for visual inspection without any subsequent post-processing, the creation of the JSON object can be disabled using CREATE_RETURN_JSON = "False". Furthermore, you can decide whether to save this object and customize its save path by adjusting SAVE_RETURN_JSON = "True" and RETURN_JSON_SAVEPATH = "path/to/your/json.json". The JSON object is structured as follows:

{
    "operation": "classify",
    "data": {
        "objectCount": int,
        "properties": [str],
        "details": [
            {
                "id": int,
                "classified": str,
                "distance": float,
                "staticDistance": float,
                "isStatic": bool,
                "frameWidth": int,
                "frameHeight": int,
                "frame": int,
                "frames": [int],
                "occurence": int,
                "traject": [[float]],
                "trajectCentroids": [[float]],
                "colorsBGR": [[[int]]],
                "colorsHLS": [[[int]]],
                "colorsStr": [[str]],
                "colorStr": [[str, int]],
                "valid": true,
                "w": 0,
                "x": 0,
                "y": 0
            },
            ...
        ]
    }
}
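
As an illustration of how this object might be consumed downstream (not part of the repository), the snippet below loads a saved return JSON and prints the dynamic objects it contains:

import json

# Load the JSON saved at RETURN_JSON_SAVEPATH and list the dynamic objects.
with open("path/to/your/json.json") as f:
    result = json.load(f)

for detail in result["data"]["details"]:
    if not detail["isStatic"]:
        print(detail["id"], detail["classified"], detail["colorStr"])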

Time Verbose and Logging

The final two environment variables influence the verbosity options and are split into two categories: TIME_VERBOSE and LOGGING.

The LOGGING environment variable controls the output messages depending on the application's usage. The output can be one of the following:

  • If no message is received from RabbitMQ:
1) Receiving message from RabbitMQ
No message received, waiting for 3 seconds
...
  • If messages are being processed:
1) Receiving message from RabbitMQ
2) Retrieving media from Kerberos Vault
3) Using device: cpu
4) Opening video file: data/input/in_video.mp4
5) Classifying frames
6) Annotating bbox frame
7) Creating ReturnJSON object
         - 14 objects were detected. Of which 11 objects were detected more than 5 times.
         ... (optional time verbose output)
8) Releasing video writer and closing video capture

The TIME_VERBOSE environment variable includes extra time-related verbosity options, adding the following lines to the output:

- Classification took: 20.4 seconds, @ 5 fps.
        - 2.05s for preprocessing and initialisation
        - 18.35s for processing of which:
                - 12.48s for class prediction
                - 1.31s for color prediction
                - 4.56s for other processing
        - 0.0s for postprocessing
- Original video: 29.7 seconds, @ 25.0 fps @ 1280x720. File size of 1.2 MB

License

Contributors

This project exists thanks to all the people who contribute.