[Refactor] Refactor auto-generated model zoo and dataset zoo (#2552)

open-mmlab · Jun 27, 2023 · e28ec68
1 parent 227a49c
Showing 52 changed files with 657 additions and 393 deletions.
diff --git a/.gitignore b/.gitignore
@@ -64,8 +64,16 @@ instance/
 # Scrapy stuff:
 .scrapy
 
-# Sphinx documentation
+# Auto generate documentation
 docs/*/_build/
+docs/*/model_zoo/
+docs/*/dataset_zoo/
+docs/*/_model_zoo.rst
+docs/*/modelzoo_statistics.md
+docs/*/datasetzoo_statistics.md
+docs/*/projectzoo.md
+docs/*/papers/
+docs/*/api/generated/
 
 # PyBuilder
 target/

diff --git a/.readthedocs.yml b/.readthedocs.yml
@@ -3,7 +3,7 @@ version: 2
 build:
   os: ubuntu-22.04
   tools:
-    python: "3.7"
+    python: "3.9"
 
 formats:
     - epub

diff --git a/configs/detection/videomae/README.md b/configs/detection/videomae/README.md
@@ -44,7 +44,7 @@ python tools/train.py configs/detection/ava_kinetics/vit-base-p16_videomae-k400-
     --cfg-options randomness.seed=0 randomness.deterministic=True
 ```
 
-For more details, you can refer to the **Training** part in the [Training and Test Tutorial](/docs/en/user_guides/4_train_test.md).
+For more details, you can refer to the **Training** part in the [Training and Test Tutorial](/docs/en/user_guides/train_test.md).
 
 ## Test
 
@@ -61,7 +61,7 @@ python tools/test.py configs/detection/ava_kinetics/vit-base-p16_videomae-k400-p
     checkpoints/SOME_CHECKPOINT.pth --dump result.pkl
 ```
 
-For more details, you can refer to the **Test** part in the [Training and Test Tutorial](/docs/en/user_guides/4_train_test.md).
+For more details, you can refer to the **Test** part in the [Training and Test Tutorial](/docs/en/user_guides/train_test.md).
 
 ## Citation
 

diff --git a/configs/localization/bsn/metafile.yml b/configs/localization/bsn/metafile.yml
@@ -1,5 +1,5 @@
 Collections:
-- Name: BMN
+- Name: BSN
   README: configs/localization/bsn/README.md
   Paper:
     URL: https://arxiv.org/abs/1806.02964

diff --git a/configs/recognition/mvit/README.md b/configs/recognition/mvit/README.md
@@ -63,7 +63,7 @@ the corresponding result without repeat augment is as follows:
 
 | frame sampling strategy | resolution | backbone | pretrain | top1 acc | top5 acc |      reference top1 acc       |       reference top5 acc       | testing protocol | FLOPs | params |       config       |       ckpt       |       log       |
 | :---------------------: | :--------: | :------: | :------: | :------: | :------: | :---------------------------: | :----------------------------: | :--------------: | :---: | :----: | :----------------: | :--------------: | :-------------: |
-|       uniform 16        |  224x224   | MViTv2-S |   K400   |   68.2   |   91.3   | [68.2](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [91.4](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 1 clips x 3 crop |  64G  | 34.4M  | [config](/configs/recognition/mvit/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb_20230201-4065c1b9.pth) | [log](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb.log) |
+|       uniform 16        |  224x224   | MViTv2-S |   K400   |   68.2   |   91.3   | [68.2](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [91.4](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 1 clips x 3 crop |  64G  | 34.4M  | [config](/configs/recognition/mvit/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb_20230201-4065c1b9.pth) | [log](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb/mvit-small-p244_k400-pre_16xb16-u16-100e_sthv2-rgb.log) |
 
 For more details on data preparation, you can refer to
 

diff --git a/configs/recognition/tin/README.md b/configs/recognition/tin/README.md
@@ -50,7 +50,11 @@ Here, we use `finetune` to indicate that we use [TSM model](https://download.ope
 
 :::
 
-For more details on data preparation, you can refer to Kinetics400, Something-Something V1 and Something-Something V2 in [Prepare Datasets](/docs/en/user_guides/2_data_prepare.md).
+For more details on data preparation, you can refer to
+
+- [Kinetics](/tools/data/kinetics/README.md)
+- [Something-something V1](/tools/data/sthv1/README.md)
+- [Something-something V2](/tools/data/sthv2/README.md)
 
 ## Train
 

diff --git a/configs/recognition/tpn/README.md b/configs/recognition/tpn/README.md
@@ -41,7 +41,11 @@ Visual tempo characterizes the dynamics and the temporal scale of an action. Mod
 
 :::
 
-For more details on data preparation, you can refer to Kinetics400, Something-Something V1 and Something-Something V2 in [Data Preparation](/docs/data_preparation.md).
+For more details on data preparation, you can refer to
+
+- [Kinetics](/tools/data/kinetics/README.md)
+- [Something-something V1](/tools/data/sthv1/README.md)
+- [Something-something V2](/tools/data/sthv2/README.md)
 
 ## Train
 

diff --git a/configs/recognition/tpn/metafile.yml b/configs/recognition/tpn/metafile.yml
@@ -23,7 +23,7 @@ Models:
   - Dataset: Kinetics-400
     Metrics:
       Top 1 Accuracy: 74.20
-      top5 accuracy: 91.48
+      Top 5 Accuracy: 91.48
     Task: Action Recognition
   Training Log: https://download.openmmlab.com/mmaction/v1.0/recognition/tpn/tpn-slowonly_r50_8xb8-8x8x1-150e_kinetics400-rgb/tpn-slowonly_r50_8xb8-8x8x1-150e_kinetics400-rgb.log
   Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/tpn/tpn-slowonly_r50_8xb8-8x8x1-150e_kinetics400-rgb/tpn-slowonly_r50_8xb8-8x8x1-150e_kinetics400-rgb_20220913-97d0835d.pth
@@ -45,7 +45,7 @@ Models:
   - Dataset: Kinetics-400
     Metrics:
       Top 1 Accuracy: 76.74
-      top5 accuracy: 92.57
+      Top 5 Accuracy: 92.57
     Task: Action Recognition
   Training Log: https://download.openmmlab.com/mmaction/v1.0/recognition/tpn/tpn-slowonly_imagenet-pretrained-r50_8xb8-8x8x1-150e_kinetics400-rgb/tpn-slowonly_imagenet-pretrained-r50_8xb8-8x8x1-150e_kinetics400-rgb.log
   Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/tpn/tpn-slowonly_imagenet-pretrained-r50_8xb8-8x8x1-150e_kinetics400-rgb/tpn-slowonly_imagenet-pretrained-r50_8xb8-8x8x1-150e_kinetics400-rgb_20220913-fed3f4c1.pth
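
Note on the `metafile.yml` hunks above: the rename from `top5 accuracy` to `Top 5 Accuracy` only matters if something downstream looks metrics up by their exact key, which is presumably what the auto-generated model zoo does. The sketch below shows one way such a spelling mismatch could be caught. It is not code from this commit: the function name, the `EXPECTED_KEYS` set, the `Results` and `Name` fields, and the `example-model` name are all assumptions made for illustration. The metric values are the ones shown in the diff.

```python
# Hypothetical check, not part of this commit: flag metric keys in an
# already-parsed metafile that do not match the expected spelling exactly.
EXPECTED_KEYS = {"Top 1 Accuracy", "Top 5 Accuracy"}  # assumed key set


def find_nonstandard_metric_keys(metafile):
    """Return (model name, key) pairs whose metric key is not in EXPECTED_KEYS."""
    problems = []
    for model in metafile.get("Models", []):
        # "Results" and "Name" are assumed field names; the diff does not show them.
        for result in model.get("Results", []):
            for key in result.get("Metrics", {}):
                if key not in EXPECTED_KEYS:
                    problems.append((model.get("Name", "<unnamed>"), key))
    return problems


def make_entry(top5_key):
    """Build a one-model metafile dict using the given spelling for the top-5 key."""
    return {
        "Models": [{
            "Name": "example-model",  # hypothetical model name
            "Results": [{
                "Dataset": "Kinetics-400",
                "Metrics": {"Top 1 Accuracy": 74.20, top5_key: 91.48},
                "Task": "Action Recognition",
            }],
        }]
    }


before = make_entry("top5 accuracy")   # spelling removed by this commit
after = make_entry("Top 5 Accuracy")   # spelling introduced by this commit

print(find_nonstandard_metric_keys(before))
print(find_nonstandard_metric_keys(after))
```

Under these assumptions the old spelling is reported and the new one passes; a real check would load each `metafile.yml` with a YAML parser first.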

diff --git a/configs/recognition/uniformer/README.md b/configs/recognition/uniformer/README.md
@@ -20,11 +20,11 @@ It is a challenging task to learn rich and multi-scale spatiotemporal semantics
 
 ### Kinetics-400
 
-| frame sampling strategy |   resolution   |  backbone   | top1 acc | top5 acc | [reference](<(https://github.com/Sense-X/UniFormer/blob/main/video_classification/README.md)>) top1 acc | [reference](<(https://github.com/Sense-X/UniFormer/blob/main/video_classification/README.md)>) top5 acc | mm-Kinetics top1 acc | mm-Kinetics top5 acc | testing protocol | FLOPs | params |                                              config                                               |                                                                           ckpt                                                                           |
-| :---------------------: | :------------: | :---------: | :------: | :------: | :-----------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------: | :------------------: | :------------------: | :--------------: | :---: | :----: | :-----------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------: |
-|         16x4x1          | short-side 320 | UniFormer-S |   80.9   |   94.6   |                                                  80.8                                                   |                                                  94.7                                                   |         80.9         |         94.6         | 4 clips x 1 crop | 41.8G | 21.4M  | [config](/configs/recognition/uniformer/uniformer-small_imagenet1k-pre_16x4x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv1/uniformer-small_imagenet1k-pre_16x4x1_kinetics400-rgb_20221219-c630a037.pth) |
-|         16x4x1          | short-side 320 | UniFormer-B |   82.0   |   95.0   |                                                  82.0                                                   |                                                  95.1                                                   |         82.0         |         95.0         | 4 clips x 1 crop | 96.7G | 49.8M  | [config](/configs/recognition/uniformer/uniformer-base_imagenet1k-pre_16x4x1_kinetics400-rgb.py)  | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv1/uniformer-base_imagenet1k-pre_16x4x1_kinetics400-rgb_20221219-157c2e66.pth)  |
-|         32x4x1          | short-side 320 | UniFormer-B |   83.1   |   95.3   |                                                  82.9                                                   |                                                  95.4                                                   |         83.0         |         95.3         | 4 clips x 1 crop |  59G  | 49.8M  | [config](/configs/recognition/uniformer/uniformer-base_imagenet1k-pre_32x4x1_kinetics400-rgb.py)  | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv1/uniformer-base_imagenet1k-pre_32x4x1_kinetics400-rgb_20221219-b776322c.pth)  |
+| frame sampling strategy |   resolution   |  backbone   | top1 acc | top5 acc | [reference](https://github.com/Sense-X/UniFormer/blob/main/video_classification/README.md) top1 acc | [reference](https://github.com/Sense-X/UniFormer/blob/main/video_classification/README.md) top5 acc | mm-Kinetics top1 acc | mm-Kinetics top5 acc | testing protocol | FLOPs | params |                                              config                                               |                                                                           ckpt                                                                           |
+| :---------------------: | :------------: | :---------: | :------: | :------: | :-------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------: | :------------------: | :------------------: | :--------------: | :---: | :----: | :-----------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|         16x4x1          | short-side 320 | UniFormer-S |   80.9   |   94.6   |                                                80.8                                                 |                                                94.7                                                 |         80.9         |         94.6         | 4 clips x 1 crop | 41.8G | 21.4M  | [config](/configs/recognition/uniformer/uniformer-small_imagenet1k-pre_16x4x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv1/uniformer-small_imagenet1k-pre_16x4x1_kinetics400-rgb_20221219-c630a037.pth) |
+|         16x4x1          | short-side 320 | UniFormer-B |   82.0   |   95.0   |                                                82.0                                                 |                                                95.1                                                 |         82.0         |         95.0         | 4 clips x 1 crop | 96.7G | 49.8M  | [config](/configs/recognition/uniformer/uniformer-base_imagenet1k-pre_16x4x1_kinetics400-rgb.py)  | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv1/uniformer-base_imagenet1k-pre_16x4x1_kinetics400-rgb_20221219-157c2e66.pth)  |
+|         32x4x1          | short-side 320 | UniFormer-B |   83.1   |   95.3   |                                                82.9                                                 |                                                95.4                                                 |         83.0         |         95.3         | 4 clips x 1 crop |  59G  | 49.8M  | [config](/configs/recognition/uniformer/uniformer-base_imagenet1k-pre_32x4x1_kinetics400-rgb.py)  | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv1/uniformer-base_imagenet1k-pre_32x4x1_kinetics400-rgb_20221219-b776322c.pth)  |
 
 The models are ported from the repo [UniFormer](https://github.com/Sense-X/UniFormer/blob/main/video_classification/README.md) and tested on our data. Currently, we only support the testing of UniFormer models, training will be available soon.