
Do you plan to release a notebook demo? #2

Open · ooza opened this issue Sep 6, 2024 · 6 comments


ooza commented Sep 6, 2024

No description provided.


ooza commented Sep 6, 2024

Thanks for the great work! I have a small dataset of 30 video clips and I want to perform zero-shot action recognition with your model. Do you have a simple demo file that I can use? Or could you tell me which function/script/config I should update to work on custom videos?


byminji commented Sep 6, 2024

Hi @ooza, thank you for your interest in our work!

I will share a sample notebook demo in the coming days.

If you want to use your custom dataset before then, please follow the instructions below (please also refer to the example instructions for public datasets in DATASETS.md).

  1. Put all your custom videos under the /PATH/TO/VIDEOS folder.
  2. Create a label file in tc-clip/labels/custom_dataset_labels.csv. The format should look like:
id,name
0,abseiling
1,air drumming
2,answering questions
3,applauding
...
  3. Create an annotation file that contains a list of video filenames and their corresponding labels in tc-clip/datasets_splits/custom_dataset_anns.txt. Each line of the txt file should be <filename> <class id>. For example, suppose that we have aaa.mp4, bbb.mp4, ..., zzz.mp4 under the /PATH/TO/VIDEOS folder:
aaa.mp4 0
bbb.mp4 0
...
zzz.mp4 3
  4. Create a dataset yaml file for your custom dataset in tc-clip/configs/data/custom_dataset.yaml. Below is an example for the inference-only case:
#@package _global_
data:
  test:
    - name: custom_dataset
      protocol: top1
      dataset_list:
      - dataset_name: custom_dataset
        root: /PATH/TO/VIDEOS
        num_classes: <YOUR_ACTUAL_NUM_CLASSES>
        label_file: tc-clip/labels/custom_dataset_labels.csv
        ann_file: tc-clip/datasets_splits/custom_dataset_anns.txt
  5. Now run the command below. Note the data=custom_dataset part:
torchrun --nproc_per_node=4 main.py -cn zero_shot \
data=custom_dataset output=/PATH/TO/OUTPUT \
trainer=tc_clip eval=test resume=/PATH/TO/CHECKPOINTS/zero_shot_k400_tc_clip.pth

If you have any follow-up questions, feel free to ask. I will also mention you after adding a sample notebook.


ooza commented Sep 11, 2024

Thanks @byminji for your quick reply.
I had to modify some source files to avoid using apex's amp, because it is deprecated; I used autocast from PyTorch's amp instead.
torch and torchvision versions: 2.1.2+cu118, 0.16.2+cu118
CUDA version: 12.2
Updated source files:
File: tc_clip.py
Function: forward
Update:
import torch.cuda.amp as amp
...

with amp.autocast():
    image_features, context_tokens, attn, source = self.image_encoder(
        image.type(self.dtype),
        return_layer_num=self.return_layer_num,
        return_attention=return_attention,
        return_source=return_source)

File: engine.py
Function: validate
Update:

with amp.autocast():
    output = model(image_input)

File: main.py
Function: main_testing
Update:

with amp.autocast():
    test_stats = validate(val_loader, model, logger, config)
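(On recent PyTorch versions the same wrapping can also be written with the device-agnostic torch.amp API; a minimal, self-contained sketch, where the model and input are just stand-ins for whichever call is wrapped:)

import torch
from torch import nn

# Stand-ins for the real model and batch; the point is the autocast wrapper,
# which is the device-agnostic form of torch.cuda.amp.autocast().
model = nn.Linear(8, 3).cuda()
image_input = torch.randn(2, 8, device="cuda")

with torch.amp.autocast(device_type="cuda"):
    output = model(image_input)  # runs under mixed precision on CUDA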

As I said before, I have a small dataset of fewer than 30 short videos and 3 classes.
So I updated the accuracy_top1_top5 function in tools.py to handle a smaller number of classes dynamically.
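Roughly, the change looks like the sketch below (my own sketch, not the repository's exact tools.py; the signature only mirrors how the function is called later in this thread):

import torch

def accuracy_top1_top5(similarity: torch.Tensor, labels: torch.Tensor):
    """Top-1 / top-k accuracy where k is capped at the number of classes.

    similarity: [batch, num_classes] scores, labels: [batch] ground-truth ids.
    Returns (#correct@1, #correct@k, top-1 indices, top-k indices).
    """
    num_classes = similarity.size(1)
    k = min(5, num_classes)                   # e.g. top-5 falls back to top-3 with 3 classes
    _, indices_k = similarity.topk(k, dim=1)  # [batch, k], sorted by score
    indices_1 = indices_k[:, 0]               # [batch]
    correct_1 = (indices_1 == labels).sum()
    correct_k = (indices_k == labels.unsqueeze(1)).any(dim=1).sum()
    return correct_1, correct_k, indices_1, indices_k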
I got this result:

(screenshot of the accuracy results attached)

My question is: how can I display / analyze the predicted class for each video?
By the way, the only output file produced is log_rank0.txt.
Thanks again.


byminji commented Sep 12, 2024

Hi @ooza, you can check individual filenames and predictions by modifying some parts of the code. You can get the file-id metadata by running your command with the ++gather_filename=true override (see datasets/build.py#L174). Below is a code snippet that I've used before.

import torch

from utils.print_utils import colorstr
# MetricLogger and accuracy_top1_top5 are the repository's existing utilities
# (accuracy_top1_top5 lives in tools.py).

@torch.no_grad()
def print_individual_predictions(val_loader, model, logger, config):
    """ Code snippet to print individual predictions """

    assert config.num_clip == 1     # Only supports single-view sampling case
    assert config.get("gather_filename")    # Run command with "++gather_filename=true" override

    model.eval()
    num_classes = len(val_loader.dataset.classes)
    class_mapping = {idx: cls for idx, cls in val_loader.dataset.classes}
    metric_logger = MetricLogger(delimiter="  ")
    header = 'Val:'

    logger.info(f"{config.num_clip * config.num_crop} views inference")
    for idx, batch_data in enumerate(metric_logger.log_every(val_loader, config.print_freq, logger, header)):
        image = batch_data['imgs'].cuda(non_blocking=True)
        image = image.view((-1, config.num_frames, 3) + image.size()[-2:])
        label_id = batch_data['label'].cuda(non_blocking=True)
        label_id = label_id.reshape(-1)  # [b]

        # Get file id metadata
        file_id = batch_data['file_id']

        b, t, c, h, w = image.size()
        tot_similarity = torch.zeros((b, num_classes)).cuda()

        # Forward
        output = model(image)
        logits = output["logits"]
        similarity = logits.view(b, -1).softmax(dim=-1)
        tot_similarity += similarity

        # Classification score
        acc1, acc5, indices_1, _ = accuracy_top1_top5(tot_similarity, label_id)
        metric_logger.meters['acc1'].update(float(acc1) / b * 100, n=b)
        metric_logger.meters['acc5'].update(float(acc5) / b * 100, n=b)

        # Print individual predictions
        for batch_idx in range(b):
            filename = val_loader.dataset.video_infos[file_id[batch_idx]]['filename']
            foldername, videoname = filename.split("/")[-2], filename.split("/")[-1]
            gt_label, pred_label = label_id[batch_idx].item(), indices_1[batch_idx].item()
            gt_cls, pred_cls = class_mapping[gt_label], class_mapping[pred_label]
            flag = colorstr("blue", "Correct") if gt_label == pred_label else colorstr("red", "Wrong")
            print(f"{videoname}: [{flag}] GT {gt_cls}, Pred {pred_cls}")

    metric_logger.synchronize_between_processes()
    logger.info(f' * Acc@1 {metric_logger.acc1.global_avg:.3f} Acc@5 {metric_logger.acc5.global_avg:.3f}')
    return metric_logger.get_stats()
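For reference, the file-id metadata override can simply be appended to the same zero-shot command as above, e.g.:

torchrun --nproc_per_node=4 main.py -cn zero_shot \
data=custom_dataset output=/PATH/TO/OUTPUT \
trainer=tc_clip eval=test resume=/PATH/TO/CHECKPOINTS/zero_shot_k400_tc_clip.pth \
++gather_filename=true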

Thank you.


ooza commented Sep 13, 2024

Thanks @byminji!
I added the print_individual_predictions function just before main_testing. Then I modified main_testing to include a check on the gather_filename flag:

# If gather_filename is true, print individual predictions
with amp.autocast():
    if config.get("gather_filename", False):
        logger.info("Using print_individual_predictions function.")
        test_stats = print_individual_predictions(val_loader, model, logger, config)
    else:
        logger.info("Using validate function.")
        test_stats = validate(val_loader, model, logger, config)

I had to add this at the beginning of the function:

if config.get("gather_filename", False):
    config.num_clip = 1

Otherwise I got this error:
File "/home/vlm/tc-clip/main.py", line 151, in print_individual_predictions
assert config.num_clip == 1 # Only supports single-view sampling case
AssertionError

The issue now is a size mismatch between the predictions and the targets:

(screenshot of the error attached)

More details:

(screenshot with further details attached)

But when I modified the existing multi-view inference logic by setting config.num_clip = 1 instead of 2 (i.e. elif config.protocol == 'zero_shot' and config.multi_view_inference: config.num_clip = 1), it works!
Is this safe and correct? Or is there a more generic way to do it? Any further explanation or details would be much appreciated.
Thanks

@byminji
Copy link

byminji commented Sep 14, 2024

Hi @ooza, multi-view inference is a common strategy for increasing the accuracy of video recognition models by ensembling multiple predictions from differently sampled frames. Our paper used a 16-frame × 2-clip setting for comparison with 32-frame sampling models. You can either remove the multi-view inference or modify the code snippet to show results from multiple predictions. I implemented only the single-view case because it was for analysis, not for evaluation.
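If it helps, a rough sketch of the ensembling direction, meant as a drop-in for the forward/score part of the snippet above (the reshaping assumes the loader stacks the clip/crop views along the batch dimension, which should be double-checked against datasets/build.py):

# Assumption: with num_clip/num_crop > 1, frames arrive as
# [batch * views, num_frames, 3, H, W] and every view of a video shares its label.
views = config.num_clip * config.num_crop
logits = model(image)["logits"]                        # [batch * views, num_classes]
similarity = logits.softmax(dim=-1)
similarity = similarity.view(-1, views, num_classes)   # [batch, views, num_classes]
tot_similarity = similarity.mean(dim=1)                # ensemble over the views
# tot_similarity can then be passed to accuracy_top1_top5 as before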
