An Efficient Instance Segmentation Framework Using Segmentation Foundation Models with Oriented Bounding Box Prompts
Instance segmentation in unmanned aerial vehicle (UAV) measurement is a long-standing challenge. Since horizontal bounding boxes (HBBs) introduce many interference objects, oriented bounding boxes (OBBs) are usually used for instance identification. However, based on ``segmentation within bounding box'' paradigm, current instance segmentation methods using OBBs are overly dependent on bounding box detection performance. To tackle this, this paper proposes OBSeg, an efficient instance segmentation framework using OBBs. OBSeg is based on box prompt-based segmentation foundation models (BSMs), e.g., Segment Anything Model. Specifically, OBSeg first detects OBBs to distinguish instances and provide coarse localization information. Then, it predicts OBB prompt-related masks for fine segmentation. Since OBBs only serve as prompts, OBSeg alleviates the over-dependence on bounding box detection performance of current instance segmentation methods using OBBs. In addition, to enable BSMs to handle OBB prompts, we propose a novel OBB prompt encoder. To make OBSeg more lightweight and further improve the performance of lightweight distilled BSMs, a Gaussian smoothing-based knowledge distillation method is introduced. Experiments demonstrate that OBSeg outperforms current instance segmentation methods on multiple public datasets.
For instance segmentation in UAV measurement, (a): HBB introduces many interference objects. (b): The ``segmentation within bounding box'' paradigm limits the segmentation to be performed mainly within the detected OBB, making the segmentation performance overly dependent on the OBB detection performance. Once the OBB detection is inaccurate, the mask segmentation will also be affected. (c) The proposed OBSeg only uses OBB as a prompt to guide object segmentation, so the segmentation result is less dependent on OBB detection performance. Although the OBB detection is inaccurate, the mask can be segmented accurately.
- OBSeg
Architecture of the proposed OBSeg. It is mainly composed of four parts: an OBB detection module, an image encoder, an OBB prompt encoder, and a mask decoder. OBSeg first detects OBBs to distinguish instances, identify classes, and provide coarse localization information. Then, the mask decoder utilizes the image embeddings generated by the image encoder and the OBB prompt embeddings generated by the OBB prompt encoder to generate segmentation masks. In addition, Gaussian smoothing-based knowledge distillation is performed on the OBB prompt encoder and the mask decoder to make OBSeg more lightweight.
- OBB Prompt Encoder
Architecture of the proposed OBB prompt encoder. The input is an OBB (
- Knowledge Distillation on the OBB Prompt Encoder and Mask Decoder
The process of knowledge distillation for the OBB prompt encoder and mask decoder. ``TE``, ``BE`` and ``OE`` represent encoded feature embeddings with respect to the top-left point, bottom-right point and orientation of an OBB, respectively. ``GS`` stands for Gaussian smoothing.
pip install lightning
pip install pytorch
pip install opencv-python pycocotools matplotlib onnxruntime onnx
pip install -U openmim
mim install mmcv-full
mim install mmdet\<3.0.0
pip install mmrotate
# Train OBB detection module (e.g., Oriented R-CNN with ResNet-18 as the backbone)
python OBB_Detection_Module/tools/train.py
# Train OBB prompt-based segmentation module (``OSM'' for short, we use it to train the teacher model)
python OBB_Prompt_based_Segmentation_Module/OSM/train.py
# Train OBB prompt-based segmentation module with knowledge distillation (``OSM_KD'' for short, we use it to train the student model)
python OBB_Prompt_based_Segmentation_Module/OSM_KD/train.py
# Test oriented bounding box detection module (e.g., Oriented R-CNN with ResNet-18 as the backbone)
python OBB_Detection_Module/tools/test.py
# Test OBB prompt-based segmentation module (``OSM'' for short, we use it to test the teacher model)
python OBB_Prompt_based_Segmentation_Module/OSM/inference.py
# Test OBB prompt-based segmentation module with knowledge distillation (``OSM_KD'' for short, we use it to test the student model)
python OBB_Prompt_based_Segmentation_Module/OSM_KD/inference.py