Support Point2RBox #971

Open · wants to merge 10 commits into base: dev-1.x
1 change: 1 addition & 0 deletions README.md
@@ -173,6 +173,7 @@ A summary can be found in the [Model Zoo](docs/en/model_zoo.md) page.
- [x] [PSC](configs/psc/README.md) (CVPR'2023)
- [x] [RTMDet](configs/rotated_rtmdet/README.md) (arXiv)
- [x] [H2RBox-v2](configs/h2rbox_v2/README.md) (NeurIPS'2023)
- [x] [Point2RBox](configs/point2rbox/README.md) (CVPR'2024)

</details>

8 changes: 4 additions & 4 deletions configs/h2rbox_v2/README.md
@@ -44,9 +44,9 @@ HRSC

```
@inproceedings{yu2023h2rboxv2,
  title={H2RBox-v2: Incorporating Symmetry for Boosting Horizontal Box Supervised Oriented Object Detection},
  author={Yi Yu and Xue Yang and Qingyun Li and Yue Zhou and Feipeng Da and Junchi Yan},
  year={2023},
  booktitle={Advances in Neural Information Processing Systems}
}
```
48 changes: 48 additions & 0 deletions configs/point2rbox/README.md
@@ -0,0 +1,48 @@
# Point2RBox

> [Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision](https://arxiv.org/pdf/2311.14758)

<!-- [ALGORITHM] -->

## Abstract

<div align=center>
<img src="https://raw.githubusercontent.com/zytx121/image-host/main/imgs/point2rbox.png" width="800"/>
</div>

With the rapidly increasing demand for oriented object detection (OOD), recent research on weakly-supervised detectors that learn rotated boxes (RBox) from horizontal boxes (HBox) has attracted increasing attention. In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present our approach called Point2RBox. Specifically, we propose to leverage two principles: 1) Synthetic pattern knowledge combination: by sampling around each labelled point on the image, we transfer the object feature to synthetic visual patterns with known bounding boxes to provide the knowledge for box regression. 2) Transform self-supervision: with a transformed input image (e.g. scaled/rotated), the output RBoxes are trained to follow the same transformation so that the network can perceive the relative size/rotation between objects. The detector is further enhanced by a few devised techniques to cope with peripheral issues, e.g. the anchor/layer assignment, as the size of the object is not available in our point supervision setting. To the best of our knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm, yet it achieves competitive performance among point-supervised alternatives, 41.05%/27.62%/80.01% on the DOTA/DIOR/HRSC datasets.
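The transform self-supervision principle above can be read as a consistency constraint: predictions made on a rotated view of the image should match the rotated predictions from the original view. A minimal NumPy sketch of that idea (hypothetical helper names, not the authors' implementation; boxes are `(cx, cy, w, h, theta)` with `theta` in radians):

```python
import numpy as np

def rotate_rboxes(rboxes, angle, center):
    """Map rotated boxes (cx, cy, w, h, theta) through an image
    rotation by `angle` radians about `center`: centers rotate,
    sizes are unchanged, angles shift by `angle`."""
    out = rboxes.copy()
    c, s = np.cos(angle), np.sin(angle)
    xy = rboxes[:, :2] - center
    out[:, 0] = c * xy[:, 0] - s * xy[:, 1] + center[0]
    out[:, 1] = s * xy[:, 0] + c * xy[:, 1] + center[1]
    out[:, 4] = rboxes[:, 4] + angle
    return out

def self_supervision_loss(pred_orig, pred_rot, angle, center):
    """L1 consistency: predictions on the rotated view should equal
    the rotated predictions from the original view."""
    target = rotate_rboxes(pred_orig, angle, center)
    return np.abs(pred_rot - target).mean()
```

Because the loss only compares the two views, it gives the network a signal about relative size/rotation without ever seeing a ground-truth box.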

## Basic patterns

Extract [basic_patterns.zip](https://github.com/open-mmlab/mmrotate/files/13816461/basic_patterns.zip) into the `data` folder. The path can also be modified in the config files.
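A one-off helper for the step above (the URL is the one from this README; the `data` destination matches the default `basic_pattern` paths in the configs):

```python
import urllib.request
import zipfile
from pathlib import Path

PATTERNS_URL = ('https://github.com/open-mmlab/mmrotate/files/'
                '13816461/basic_patterns.zip')

def fetch_basic_patterns(url=PATTERNS_URL,
                         archive='basic_patterns.zip',
                         dest='data'):
    """Download the archive (skipped if already present) and unpack it
    so paths like data/basic_patterns/dota resolve."""
    if not Path(archive).exists():
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return Path(dest) / 'basic_patterns'
```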

## Results and models

DOTA1.0

| Backbone | AP50 | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size | Configs | Download |
| :----------------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| ResNet50 (1024,1024,200) | 41.87 | 1x | 16.12 | 111.7 | - | 2 | [point2rbox-yolof-dota](./point2rbox-yolof-dota.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dota/point2rbox-yolof-dota-c94da82d.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dota/point2rbox-yolof-dota.json) |

DIOR

| Backbone | AP50 | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size | Configs | Download |
| :----------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| ResNet50 (800,800) | 27.34 | 1x | 10.38 | 127.3 | - | 2 | [point2rbox-yolof-dior](./point2rbox-yolof-dior.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dior/point2rbox-yolof-dior-f4f724df.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dior/point2rbox-yolof-dior.json) |

HRSC

| Backbone | AP50 | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size | Configs | Download |
| :----------------: | :---: | :-----: | :------: | :------------: | :-: | :--------: | :-------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| ResNet50 (800,800) | 79.40 | 6x | 9.60 | 136.9 | - | 2 | [point2rbox-yolof-hrsc](./point2rbox-yolof-hrsc.py) | [model](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-hrsc/point2rbox-yolof-hrsc-9d096323.pth) \| [log](https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-hrsc/point2rbox-yolof-hrsc.json) |

## Citation

```
@inproceedings{yu2024point2rbox,
title={Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision},
author={Yi Yu and Xue Yang and Qingyun Li and Feipeng Da and Jifeng Dai and Yu Qiao and Junchi Yan},
year={2024},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition}
}
```
50 changes: 50 additions & 0 deletions configs/point2rbox/metafile.yml
@@ -0,0 +1,50 @@
Collections:
- Name: point2rbox
Metadata:
Training Data: DOTAv1.0
Training Techniques:
- AdamW
Training Resources: 1x GeForce RTX 4090
Architecture:
- ResNet
Paper:
URL: https://arxiv.org/pdf/2311.14758.pdf
Title: 'Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision'
README: configs/point2rbox/README.md

Models:
- Name: point2rbox-yolof-dota
In Collection: point2rbox
Config: configs/point2rbox/point2rbox-yolof-dota.py
Metadata:
Training Data: DOTAv1.0
Results:
- Task: Oriented Object Detection
Dataset: DOTAv1.0
Metrics:
mAP: 41.87
Weights: https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dota/point2rbox-yolof-dota-c94da82d.pth

- Name: point2rbox-yolof-dior
In Collection: point2rbox
Config: configs/point2rbox/point2rbox-yolof-dior.py
Metadata:
Training Data: DIOR
Results:
- Task: Oriented Object Detection
Dataset: DIOR
Metrics:
mAP: 27.34
Weights: https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-dior/point2rbox-yolof-dior-f4f724df.pth

- Name: point2rbox-yolof-hrsc
In Collection: point2rbox
Config: configs/point2rbox/point2rbox-yolof-hrsc.py
Metadata:
Training Data: HRSC
Results:
- Task: Oriented Object Detection
Dataset: HRSC
Metrics:
mAP: 79.40
Weights: https://download.openmmlab.com/mmrotate/v1.0/point2rbox/point2rbox-yolof-hrsc/point2rbox-yolof-hrsc-9d096323.pth
156 changes: 156 additions & 0 deletions configs/point2rbox/point2rbox-yolof-dior.py
@@ -0,0 +1,156 @@
_base_ = [
'../_base_/datasets/dior.py', '../_base_/schedules/schedule_1x.py',
'../_base_/default_runtime.py'
]
model = dict(
type='Point2RBoxYOLOF',
crop_size=(800, 800),
prob_rot=0.95 * 0.7,
prob_flp=0.05 * 0.7,
sca_fact=1.0,
sca_range=(0.5, 1.5),
basic_pattern='data/basic_patterns/dior',
dense_cls=[],
use_setrc=False,
use_setsk=True,
data_preprocessor=dict(
type='mmdet.DetDataPreprocessor',
mean=[103.530, 116.280, 123.675],
std=[1.0, 1.0, 1.0],
bgr_to_rgb=False,
pad_size_divisor=32),
backbone=dict(
type='mmdet.ResNet',
depth=50,
num_stages=4,
strides=(1, 2, 2, 1), # DC5
dilations=(1, 1, 1, 2),
out_indices=(3, ),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=False),
norm_eval=True,
style='caffe',
init_cfg=dict(
type='Pretrained',
checkpoint='open-mmlab://detectron/resnet50_caffe')),
neck=dict(
type='mmdet.DilatedEncoder',
in_channels=2048,
out_channels=512,
block_mid_channels=128,
num_residual_blocks=4,
block_dilations=[2, 4, 6, 8]),
bbox_head=dict(
type='Point2RBoxYOLOFHead',
num_classes=20,
in_channels=512,
reg_decoded_bbox=True,
num_cls_convs=4,
num_reg_convs=8,
use_objectness=False,
agnostic_cls=[2, 5, 9, 14, 15],
square_cls=[],
anchor_generator=dict(
type='mmdet.AnchorGenerator',
ratios=[1.0],
scales=[8, 8, 8, 8, 8, 8, 8],
strides=[16]),
bbox_coder=dict(
type='mmdet.DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1., 1., 1., 1.],
add_ctr_clamp=True,
ctr_clamp=16),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='mmdet.GIoULoss', loss_weight=1.0),
loss_angle=dict(type='mmdet.L1Loss', loss_weight=0.3),
loss_scale_ss=dict(type='mmdet.GIoULoss', loss_weight=0.02)),
# training and testing settings
train_cfg=dict(
assigner=dict(
type='Point2RBoxAssigner',
pos_ignore_thr=0.15,
neg_ignore_thr=0.7,
match_times=4),
allowed_border=-1,
pos_weight=-1,
debug=False),
test_cfg=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(type='nms_rotated', iou_threshold=0.1),
max_per_img=2000))

# optimizer
optim_wrapper = dict(
optimizer=dict(
_delete_=True,
type='AdamW',
lr=0.00005,
betas=(0.9, 0.999),
weight_decay=0.05),
paramwise_cfg=dict(
norm_decay_mult=0., custom_keys={'backbone': dict(lr_mult=1. / 3)}))

train_pipeline = [
dict(type='mmdet.LoadImageFromFile', backend_args={{_base_.backend_args}}),
dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
dict(type='mmdet.FixShapeResize', width=800, height=800, keep_ratio=True),
dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
dict(type='RBox2Point'),
dict(
type='mmdet.RandomFlip',
prob=0.75,
direction=['horizontal', 'vertical', 'diagonal']),
dict(type='RandomRotate', prob=1, angle_range=180),
dict(type='mmdet.RandomShift', prob=0.5, max_shift_px=16),
dict(type='mmdet.PackDetInputs')
]

dataset_type = 'DIORDataset'
data_root = 'data/dior/'
train_dataloader = dict(
batch_size=4,
num_workers=4,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
batch_sampler=None,
dataset=dict(
type='ConcatDataset',
ignore_keys=['DATASET_TYPE'],
datasets=[
dict(
type=dataset_type,
data_root=data_root,
ann_file='ImageSets/Main/train.txt',
data_prefix=dict(img_path='JPEGImages-trainval'),
filter_cfg=dict(filter_empty_gt=True),
pipeline=train_pipeline),
dict(
type=dataset_type,
data_root=data_root,
ann_file='ImageSets/Main/val.txt',
data_prefix=dict(img_path='JPEGImages-trainval'),
filter_cfg=dict(filter_empty_gt=True),
pipeline=train_pipeline,
backend_args=_base_.backend_args)
]))

train_cfg = dict(type='EpochBasedTrainLoop', val_interval=12)

val_dataloader = dict(batch_size=4, num_workers=4)

val_evaluator = dict(type='DOTAMetric', metric='mAP', iou_thrs=[0.25, 0.5])

# default_hooks = dict(logger=dict(type='LoggerHook', interval=30))

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (8 GPUs) x (8 samples per GPU)
auto_scale_lr = dict(base_batch_size=64)
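In the model settings above, `prob_rot=0.95 * 0.7` and `prob_flp=0.05 * 0.7` suggest each training step pairs the batch with a rotated view about 66.5% of the time and a flipped view about 3.5% of the time, presumably falling back to a scaled view otherwise (cf. `sca_range`). A hypothetical sketch of that sampling, not the actual `Point2RBoxYOLOF` code:

```python
import random

def sample_view(prob_rot=0.95 * 0.7, prob_flp=0.05 * 0.7):
    """Pick which transformed view accompanies the batch for
    transform self-supervision (sketch under the assumption that
    the remaining probability mass goes to the scaling branch)."""
    r = random.random()
    if r < prob_rot:
        return 'rotate'
    if r < prob_rot + prob_flp:
        return 'flip'
    return 'scale'
```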