Implementation of BasicTAD #2638

Open
wants to merge 10 commits into base: dev-1.x
8 changes: 5 additions & 3 deletions configs/localization/bsn/metafile.yml
@@ -8,16 +8,18 @@ Collections:
 Models:
   - Name: bsn_400x100_1xb16_20e_activitynet_feature (cuhk_mean_100)
     Config:
-      - configs/localization/bsn/bsn_tem_1xb16-400x100-20e_activitynet-feature.py
-      - configs/localization/bsn/bsn_pgm_400x100_activitynet-feature.py
-      - configs/localization/bsn/bsn_pem_1xb16-400x100-20e_activitynet-feature.py
+      configs/localization/bsn/bsn_pem_1xb16-400x100-20e_activitynet-feature.py
     In Collection: BSN
     Metadata:
       Batch Size: 16
       Epochs: 20
       Training Data: ActivityNet v1.3
       Training Resources: 1 GPU
       feature: cuhk_mean_100
+      configs:
+        - configs/localization/bsn/bsn_tem_1xb16-400x100-20e_activitynet-feature.py
+        - configs/localization/bsn/bsn_pgm_400x100_activitynet-feature.py
+        - configs/localization/bsn/bsn_pem_1xb16-400x100-20e_activitynet-feature.py
     Modality: RGB
     Results:
       - Dataset: ActivityNet v1.3
2 changes: 1 addition & 1 deletion docs/en/user_guides/finetune.md
@@ -45,7 +45,7 @@ model = dict(
 MMAction2 supports UCF101, Kinetics-400, Moments in Time, Multi-Moments in Time, THUMOS14,
 Something-Something V1&V2, ActivityNet Dataset.
 The users may need to adapt one of the above datasets to fit their special datasets.
-You could refer to [Prepare Dataset](prepare_dataset.md) and [Customize Datast](../advanced_guides/customize_dataset.md) for more details.
+You could refer to [Prepare Dataset](prepare_dataset.md) and [Customize Dataset](../advanced_guides/customize_dataset.md) for more details.
 In our case, UCF101 is already supported by various dataset types, like `VideoDataset`,
 so we change the config as follows.

Empty file added projects/__init__.py
133 changes: 133 additions & 0 deletions projects/basic_tad/README.md
@@ -0,0 +1,133 @@
# BasicTAD

This project implements the BasicTAD model in MMAction2. Please refer to the [official repo](https://github.com/MCG-NJU/BasicTAD) and the [paper](https://arxiv.org/abs/2205.02717) for details.


## Usage

### Setup Environment

Please refer to [Get Started](https://mmaction2.readthedocs.io/en/latest/get_started/installation.html) to install MMAction2 and MMDetection.

First, add the current folder to `PYTHONPATH` so that Python can find your code. Run the following command in the current directory to add it.

> Please run this command every time you open a new shell.

```shell
export PYTHONPATH=`pwd`:$PYTHONPATH
```

### Data Preparation

Prepare the THUMOS14 dataset according to the [instructions](https://github.com/open-mmlab/mmaction2/blob/main/tools/data/thumos14/README.md).
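
Before training, you can optionally sanity-check the prepared data with the short sketch below. The `data/thumos14` root and the sub-folder names are assumptions based on the default layout in the linked guide; adjust them if your local layout differs.

```python
# Optional sanity check. The paths below are assumptions based on the default
# layout described in the linked THUMOS14 preparation guide; change `data_root`
# and `expected` to match whatever the preparation scripts produced for you.
import os
import os.path as osp

data_root = 'data/thumos14'
expected = ['annotations', 'videos', 'rawframes']  # assumed sub-folders
for sub in expected:
    path = osp.join(data_root, sub)
    if osp.isdir(path):
        print(f'{path}: ok ({len(os.listdir(path))} entries)')
    else:
        print(f'{path}: missing')
```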

### Training commands

**To train with single GPU:**

```bash
mim train mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py
```

**To train with multiple GPUs:**

```bash
mim train mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --launcher pytorch --gpus 8
```

**To train with multiple GPUs by slurm:**

```bash
mim train mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --launcher slurm \
--gpus 8 --gpus-per-node 8 --partition $PARTITION
```

### Testing commands

**To test with single GPU:**

```bash
mim test mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --checkpoint $CHECKPOINT
```

**To test with multiple GPUs:**

```bash
mim test mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8
```

**To test with multiple GPUs by slurm:**

```bash
mim test mmaction configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py --checkpoint $CHECKPOINT --launcher slurm \
--gpus 8 --gpus-per-node 8 --partition $PARTITION
```

> Replace `$CHECKPOINT` with the path to the trained model, e.g., `work_dirs/basicTAD_slowonly_96x10_1200e_thumos14_rgb/latest.pth`.

## Results

### THUMOS14

| frame sampling strategy | resolution | gpus | backbone | pretrain | mAP@0.5 | avg. mAP | testing protocol | config | ckpt | log |
| :---------------------: | :--------: | :--: | :------: | :------: |:-------:|:--------:| :----------------: | :-------------------------------------------: | -------------------------------------: | -----------------------------: |
| 1x96x10 | 112x112 | 2 | SlowOnly | Kinetics | 50.4 | 47.9 | 1 clips x 1 crop | [config](./configs/basicTAD_slowonly_96x10_1200e_thumos14_rgb.py) | todo | todo |

> Due to limited computing resources, we only train the model in a simple setting (in terms of spatio-temporal resolution, testing augmentation, etc.). To reproduce the results in the paper, please refer to the [setting](https://github.com/MCG-NJU/BasicTAD/blob/main/configs/trainval/basictad/thumos14/basictad_slowonly_e700_thumos14_rgb_192win_anchor_based.py) used in the official repo.

> In fact, the main idea of [BasicTAD](https://arxiv.org/abs/2205.02717) lies in its modular design rather than in any sophisticated new architecture or modules.

> Currently we only support the anchor-based BasicTAD model on THUMOS14. The anchor-free version is planned.

> `avg. mAP` refers to the mAP averaged over the IoU thresholds (0.3, 0.4, 0.5, 0.6, 0.7).
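
To make the metric explicit, here is a tiny illustration of how that average is formed; the per-threshold numbers in it are placeholders, not results from this project.

```python
# Illustration of how `avg. mAP` relates to the per-threshold mAPs in the
# table. The per-IoU values below are made-up placeholders, not real results.
iou_thresholds = (0.3, 0.4, 0.5, 0.6, 0.7)
map_per_iou = [0.60, 0.56, 0.50, 0.41, 0.32]  # hypothetical mAP at each IoU
avg_map = sum(map_per_iou) / len(iou_thresholds)
print(f'avg. mAP over IoU {iou_thresholds}: {avg_map:.3f}')
```
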
## Citation

<!-- Replace to the citation of the paper your project refers to. -->

```bibtex
@article{yang2023basictad,
  title={BasicTAD: an astounding RGB-only baseline for temporal action detection},
  author={Yang, Min and Chen, Guo and Zheng, Yin-Dong and Lu, Tong and Wang, Limin},
  journal={Computer Vision and Image Understanding},
  volume={232},
  pages={103692},
  year={2023},
  publisher={Elsevier}
}
```

## Checklist

Here is a checklist of this project's progress, and you can ignore this part if you don't plan to contribute to MMAction2 projects.

- [x] Milestone 1: PR-ready, and acceptable to be one of the `projects/`.

- [x] Finish the code

<!-- The code's design shall follow existing interfaces and convention. For example, each model component should be registered into `mmaction.registry.MODELS` and configurable via a config file. -->

- [x] Basic docstrings & proper citation

<!-- Each major class should contain a docstring describing its functionality and arguments. If your code is copied or modified from other open-source projects, don't forget to cite the source project in the docstring and make sure your usage does not violate its license. Typically, we do not accept any code snippet under GPL license. [A Short Guide to Open Source Licenses](https://medium.com/nationwide-technology/a-short-guide-to-open-source-licenses-cf5b1c329edd) -->

- [ ] Converted checkpoint and results (Only for reproduction)

<!-- If you are reproducing the result from a paper, make sure the model in the project can match that results. Also please provide checkpoint links or a checkpoint conversion script for others to get the pre-trained model. -->

- [x] Milestone 2: Indicates a successful model implementation.

- [x] Training results

<!-- If you are reproducing the result from a paper, train your model from scratch and verify that the final result matches the original result. Usually, ±0.1% is acceptable for the action recognition task on Kinetics400. -->

- [ ] Milestone 3: Good to be a part of our core package!

- [ ] Unit tests

<!-- Unit tests for the major module are required. [Example](https://github.com/open-mmlab/mmaction2/blob/main/tests/models/backbones/test_resnet.py) -->

- [ ] Code style

<!-- Refactor your code according to reviewer's comment. -->

- [ ] `metafile.yml` and `README.md`

<!-- It will be used for MMAction2 to acquire your models. [Example](https://github.com/open-mmlab/mmaction2/blob/main/configs/recognition/swin/metafile.yml). In particular, you may have to refactor this README into a standard one. [Example](https://github.com/open-mmlab/mmaction2/blob/main/configs/recognition/swin/README.md) -->
@@ -0,0 +1,68 @@
_base_ = ['./basicTAD_slowonly_96x10_1200e_thumos14_rgb.py']

# model settings
model = dict(
    neck=[
        dict(type='MaxPool3d', kernel_size=(2, 1, 1), stride=(2, 1, 1)),
        dict(
            type='VDM',
            in_channels=2048,
            out_channels=512,
            conv_cfg=dict(type='Conv3d'),
            norm_cfg=dict(type='SyncBN'),
            kernel_sizes=(3, 1, 1),
            strides=(2, 1, 1),
            paddings=(1, 0, 0),
            stage_layers=(1, 1, 1, 1),
            out_indices=(0, 1, 2, 3, 4),
            out_pooling=True),
        dict(
            type='mmdet.FPN',
            in_channels=[2048, 512, 512, 512, 512],
            out_channels=256,
            num_outs=5,
            conv_cfg=dict(type='Conv1d'),
            norm_cfg=dict(type='SyncBN'))
    ],
    bbox_head=dict(anchor_generator=dict(strides=[2, 4, 8, 16, 32])))

clip_len = 192
frame_interval = 5
img_shape = (112, 112)
img_shape_test = (128, 128)

train_pipeline = [
    dict(type='Time2Frame'),
    dict(
        type='TemporalRandomCrop',
        clip_len=clip_len,
        frame_interval=frame_interval,
        iof_th=0.75),
    dict(type='RawFrameDecode'),
    # scale the short side of images to 128, keeping the aspect ratio
    dict(type='Resize', scale=(128, -1), keep_ratio=True),
    dict(type='SpatialRandomCrop', crop_size=img_shape),
    dict(type='Flip', flip_ratio=0.5),
    dict(
        type='PhotoMetricDistortion',
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18,
        p=0.5),
    dict(
        type='Rotate',
        limit=(-45, 45),
        border_mode='reflect_101',
        p=0.5),
    dict(type='Pad', size=(clip_len, *img_shape)),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(
        type='PackTadInputs',
        meta_keys=('img_id', 'img_shape', 'pad_shape', 'scale_factor'))
]

val_pipeline = [
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(128, -1), keep_ratio=True),
    dict(type='SpatialCenterCrop', crop_size=img_shape_test),
    dict(type='Pad', size=(clip_len, *img_shape_test)),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(
        type='PackTadInputs',
        meta_keys=('img_id', 'img_shape', 'scale_factor', 'offset_sec'))
]

train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(
    dataset=dict(
        clip_len=clip_len,
        frame_interval=frame_interval,
        pipeline=val_pipeline))
test_dataloader = val_dataloader
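
Note: the config above is only a delta on top of its `_base_` file; `mmengine.Config.fromfile` merges the two, so just `neck`, `bbox_head`, and the pipelines need to be overridden. The sketch below shows how to inspect the merged result; the config filename used here is hypothetical, so substitute the actual file under `projects/basic_tad/configs/`.

```python
# Illustration only (not part of this PR): inspect the merged config after
# `_base_` resolution. The file name below is a placeholder; point it at the
# actual 192-frame config in projects/basic_tad/configs/.
from mmengine.config import Config

cfg = Config.fromfile(
    'projects/basic_tad/configs/basicTAD_slowonly_192x5_1200e_thumos14_rgb.py')

# Only `neck`, `bbox_head`, and the pipelines are overridden here; the
# backbone, detector type, optimizer, and schedule come from the _base_ config.
print(cfg.model['neck'][1]['type'])                           # VDM
print(cfg.model['bbox_head']['anchor_generator']['strides'])  # [2, 4, 8, 16, 32]
print([t['type'] for t in cfg.train_dataloader['dataset']['pipeline']])
```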