Settings for experiment with ViT-B/32 and ViT-L/14 #5

Open
andylin12 opened this issue Nov 2, 2022 · 3 comments

Comments

@andylin12

Thanks for the wonderful paper and repo.

I was able to reproduce MaskCLIP and MaskCLIP+ with ViT-B/16 + R101 on the Pascal Context dataset. The resulting mAP is 25.45 and 29.48, respectively.

However, when I changed the model to ViT-B/32 or ViT-L/14, the results were poor: less than half the score of ViT-B/16, and the qualitative results show that the predicted dense labels are generally a mess.

What I did was:

  1. convert the weights and backbone, and extract text embeddings for ViT-B/32 and ViT-L/14
  2. create a config based on the ViT-B/16 one, with the following modifications (a rough sketch follows below):
    • change the patch size to 32 for ViT-B/32
    • change the patch size to 14, embed_dims to 1024, and num_layers to 24 for ViT-L/14
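
For reference, this is roughly how I expressed the ViT-L/14 changes. It is only a sketch: the base-config filename is hypothetical, and the field names are the standard MMSegmentation VisionTransformer arguments, so they may need adjusting to match this repo's actual ViT-B/16 config.

```python
# Sketch of ViT-L/14 backbone overrides, derived from the ViT-B/16 config.
# Assumptions: the base config path below is a placeholder, and the backbone
# accepts the usual MMSegmentation VisionTransformer keyword arguments.
_base_ = './maskclip_vit16_pascal_context.py'  # hypothetical base config name

model = dict(
    backbone=dict(
        patch_size=14,    # ViT-L/14 uses 14x14 patches
        embed_dims=1024,  # ViT-L hidden size
        num_layers=24,    # ViT-L depth
        num_heads=16,     # ViT-L attention heads (1024 / 64)
    ),
)
```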

Is there anything I've done wrong or misunderstood? Do you have any suggestions on why the result is bad?

Thanks in advance.

@hewenbin

Same observation here. Any thoughts?

@111chengxuyuan

111chengxuyuan commented Feb 8, 2023

Hello, I want to ask a question: which version of mmsegmentation should I install to run this code properly? I installed 0.20.0 but couldn't run it.

@ngfuong

ngfuong commented Mar 6, 2023

Could you please share the configuration file of your reproduced MaskCLIP+ with ViT-B/16?
