
Multi-modality Video Understanding

Datasets

You can find the dataset instructions in DATASET. We provide all the metadata files for our data.

Model Zoo

You can find all the models and the scripts in MODEL_ZOO.

Pre-Training

We use CLIP pretrained models as the unmasked teachers by default.

For training, you can simply run the pretraining scripts as follows:

# masked pretraining
bash ./exp_pt/videomamba_middle_5m/run.sh
# further unmasked pretraining for 1 epoch
bash ./exp_pt/videomamba_middle_5m_unmasked/run.sh

Notes:

  1. Set data_dir and the dataset paths (e.g., your_webvid_path) in data.py before running the scripts.
  2. Set the pretrained model path in vision_encoder.pretrained in the corresponding config files.
  3. Set --rdzv_endpoint to your MASTER_NODE:MASTER_PORT in torchrun.sh, as in the sketch after this list.
  4. save_latest=True automatically saves the latest checkpoint during training.
  5. auto_resume=True automatically loads the best or latest checkpoint during training.
  6. For unmasked pretraining, set pretrained_path to load the checkpoint from the masked pretraining stage.
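
As a rough sketch of note 3, a distributed launch with a rendezvous endpoint typically looks like the following; the hostname, port, process counts, and the script/config paths are placeholders for illustration, not values taken from this repository:

# inside torchrun.sh: point the rendezvous endpoint at the master node
MASTER_NODE=node0.example.com   # placeholder hostname
MASTER_PORT=29500               # placeholder port
torchrun --nnodes=1 --nproc_per_node=8 \
    --rdzv_endpoint="${MASTER_NODE}:${MASTER_PORT}" \
    tasks/pretrain.py ./exp_pt/videomamba_middle_5m/config.py   # illustrative script and config paths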

Zero-shot Evaluation

For zero-shot evaluation, you can simply run the evaluation scripts as follows:

bash ./exp_zs/msrvtt/run.sh

Notes:

  1. Set pretrained_path in the running scripts before launching them.
  2. Set zero_shot=True and evaluate=True for zero-shot evaluation, as in the sketch below.
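
As a rough sketch of these two notes, the settings to adjust before launching look like the following; the checkpoint path is a placeholder, and where exactly these fields live (the run script itself or the config file it loads) is an assumption:

# in ./exp_zs/msrvtt/run.sh (or the config file it loads), set:
#   pretrained_path=/path/to/your_pretrained_checkpoint.pth   # placeholder path
#   zero_shot=True
#   evaluate=True
# then launch the zero-shot evaluation on MSRVTT:
bash ./exp_zs/msrvtt/run.sh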