Do you plan to release a notebook demo? #2
Comments
Thanks for this great work! I have a small dataset of 30 video clips and I want to run zero-shot action recognition on it using your model. Do you have a simple demo file that I can use? Or could you tell me which function, script, or config I should update to work on custom videos?
Hi @ooza, thank you for your interest in our work! I will share a sample notebook demo in the upcoming days. If you want to use your custom dataset before that, please follow the instructions below. (Please also refer to the example instructions for public datasets in DATASETS.md.)
If you have any follow-up questions, feel free to ask. I will also mention you after adding a sample notebook.
Thanks @byminji for your quick reply.
File:
File:
As I said before, I have a small dataset of fewer than 30 short videos and 3 classes.
Hi @ooza, you can check individual filenames and predictions by modifying some parts of the code. You can get the file id metadata by running your command with the "++gather_filename=true" override.

    import torch

    from utils.print_utils import colorstr

    # MetricLogger and accuracy_top1_top5 are assumed to be in scope already,
    # as they are in the repository's existing validation code.


    @torch.no_grad()
    def print_individual_predictions(val_loader, model, logger, config):
        """ Code snippet to print individual predictions """
        assert config.num_clip == 1  # Only supports single-view sampling case
        assert config.get("gather_filename")  # Run command with "++gather_filename=true" override

        model.eval()
        num_classes = len(val_loader.dataset.classes)
        # dataset.classes is iterated as (index, class name) pairs
        class_mapping = {idx: cls for idx, cls in val_loader.dataset.classes}

        metric_logger = MetricLogger(delimiter=" ")
        header = 'Val:'
        logger.info(f"{config.num_clip * config.num_crop} views inference")

        for idx, batch_data in enumerate(metric_logger.log_every(val_loader, config.print_freq, logger, header)):
            image = batch_data['imgs'].cuda(non_blocking=True)
            image = image.view((-1, config.num_frames, 3) + image.size()[-2:])
            label_id = batch_data['label'].cuda(non_blocking=True)
            label_id = label_id.reshape(-1)  # [b]

            # Get file id metadata
            file_id = batch_data['file_id']

            b, t, c, h, w = image.size()
            tot_similarity = torch.zeros((b, num_classes)).cuda()

            # Forward
            output = model(image)
            logits = output["logits"]
            similarity = logits.view(b, -1).softmax(dim=-1)
            tot_similarity += similarity

            # Classification score
            acc1, acc5, indices_1, _ = accuracy_top1_top5(tot_similarity, label_id)
            metric_logger.meters['acc1'].update(float(acc1) / b * 100, n=b)
            metric_logger.meters['acc5'].update(float(acc5) / b * 100, n=b)

            # Print individual predictions
            for batch_idx in range(b):
                filename = val_loader.dataset.video_infos[file_id[batch_idx]]['filename']
                foldername, videoname = filename.split("/")[-2], filename.split("/")[-1]
                gt_label, pred_label = label_id[batch_idx].item(), indices_1[batch_idx].item()
                gt_cls, pred_cls = class_mapping[gt_label], class_mapping[pred_label]
                flag = colorstr("blue", "Correct") if gt_label == pred_label else colorstr("red", "Wrong")
                print(f"{videoname}: [{flag}] GT {gt_cls}, Pred {pred_cls}")

        metric_logger.synchronize_between_processes()
        logger.info(f' * Acc@1 {metric_logger.acc1.global_avg:.3f} Acc@5 {metric_logger.acc5.global_avg:.3f}')
        return metric_logger.get_stats()

Thank you.
Thanks @byminji !
I had to add this at the beginning of the function:
Otherwise I got this error:
The issue now is a mismatch between the size of the preds and the targets:
But when I modified the existing multi-view inference logic by setting the
Hi @ooza, multi-view inference is a common strategy for increasing the accuracy of video recognition models by ensembling multiple predictions from differently sampled frames. Our paper used a 16 frames x 2 clips setting for comparison with models that sample 32 frames. You can either remove the multi-view inference or modify the code snippet to show results from multiple predictions. I implemented only the single-view case because the snippet was meant for analysis, not for evaluation.
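To make the ensembling idea concrete, here is a minimal, framework-free sketch of how per-view predictions could be averaged into one prediction per video. It is not the repository's implementation: the helper names (`softmax`, `ensemble_views`, `top1`), the toy logits, and the assumption that the views of one video sit in adjacent rows are all hypothetical. It also illustrates why a size mismatch can appear: with 2 views the model sees b * 2 rows, while there are only b labels, so the rows have to be reduced back to b before comparing with the targets.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_views(logits, num_views):
    """Average per-view class probabilities for each video.

    logits has b * num_views rows; this sketch assumes the views of one
    video are stored in adjacent rows. Returns b averaged probability rows.
    """
    if len(logits) % num_views != 0:
        raise ValueError("number of rows must be a multiple of num_views")
    num_classes = len(logits[0])
    averaged = []
    for start in range(0, len(logits), num_views):
        probs = [softmax(row) for row in logits[start:start + num_views]]
        averaged.append(
            [sum(p[c] for p in probs) / num_views for c in range(num_classes)]
        )
    return averaged


def top1(probabilities):
    """Index of the highest-probability class in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Toy example: 2 videos x 2 views x 3 classes (made-up numbers).
view_logits = [
    [2.0, 0.5, 0.1],  # video 0, view 0
    [1.5, 1.0, 0.2],  # video 0, view 1
    [0.1, 0.3, 2.5],  # video 1, view 0
    [0.2, 2.0, 1.9],  # video 1, view 1 (narrowly prefers class 1)
]
video_probs = ensemble_views(view_logits, num_views=2)
predictions = top1(video_probs)
print(predictions)
```

In the toy data the second view of video 1 narrowly favors class 1 on its own, but the averaged probabilities favor class 2, which is the kind of correction the ensemble is meant to provide. In a PyTorch pipeline the same reduction would typically be a reshape to [b, num_views, num_classes] followed by a mean over the view dimension.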