
Request Demo #1

Open
jbeomlee93 opened this issue Jul 27, 2022 · 7 comments

Comments

@jbeomlee93

Dear authors.

Thank you for your nice work, and congratulations that MaskCLIP is accepted to ECCV as an oral paper!

I have tried your code on a single image with a set of object classes, but failed to obtain meaningful localization results from MaskCLIP.

I'm sorry to bother you, but could you provide a demo file? The Batman example would be nice.

Thank you so much!

@zhao1f

zhao1f commented Oct 5, 2022

I have the same issue. Is there any update? The pretrained CLIP model seems to fail to produce meaningful results.

@qdd1234

qdd1234 commented Oct 6, 2022

I ran into the same problem. I used the following code to run inference on a new image, but the result is unsatisfactory.

from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot
from mmseg.core.evaluation import get_palette

if __name__ == "__main__":
    configs = "configs/maskclip/maskclip_vit16_1024x512_cityscapes.py"
#    checkpoint_file = '/home/fyj/zky/Seg/MaskCLIP/pretrain/ViT16_clip_backbone.pth'
    # NOTE: with checkpoint_file=None, init_segmentor loads no weights at all
    # (it also clears the config's `pretrained` field), so the model runs with
    # random initialisation; the converted CLIP weights above likely need to be loaded.
    checkpoint_file = None

    model = init_segmentor(configs, checkpoint_file, device='cuda:0')
    img = 'demo/munster_000024_000019_leftImg8bit.png'
    result = inference_segmentor(model, img)
    print("result:", result[0].shape)
    show_result_pyplot(model, img, result, get_palette('cityscapes'), out_file='fine.jpg')
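For what it's worth, the zero-shot step such a config implements can be sketched without mmseg: a MaskCLIP-style baseline scores each dense image feature against the CLIP text embeddings by cosine similarity and takes a per-pixel argmax. A minimal NumPy sketch with random stand-ins (the shapes and the `dense_feats`/`text_embeds` names are illustrative, not the repo's API):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, h, w = 19, 512, 32, 64  # e.g. Cityscapes classes, ViT-B/16 embed dim

# Stand-ins: in the real pipeline these come from the frozen CLIP image/text encoders.
dense_feats = rng.standard_normal((dim, h, w))         # per-pixel image features
text_embeds = rng.standard_normal((num_classes, dim))  # one embedding per class prompt

# L2-normalise both sides so the dot product is a cosine similarity.
dense_feats /= np.linalg.norm(dense_feats, axis=0, keepdims=True)
text_embeds /= np.linalg.norm(text_embeds, axis=1, keepdims=True)

# (num_classes, h, w) similarity map; per-pixel argmax gives the segmentation.
sim = np.einsum('kd,dhw->khw', text_embeds, dense_feats)
seg = sim.argmax(axis=0)
```

With real CLIP features in place of the random arrays, `seg` is the low-resolution segmentation before any upsampling.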

@TheoPis

TheoPis commented Nov 11, 2022

Hi, and thanks for sharing the code of this interesting work. I have drawn similar conclusions to the comments above. I reimplemented the MaskCLIP baseline (i.e. using pretrained CLIP features only) in my own codebase and tested it on images from ADE20K and PASCAL Context. It frequently recognizes salient semantics in the image (as expected, given CLIP's zero-shot classification capabilities); prompt denoising / key smoothing is particularly helpful for limiting predictions to salient classes. However, it clearly fails to produce a segmentation mask.

Thus, I wonder if the authors can provide the requested demo on PASCAL-context.

Also, it is unclear to me from the paper how the final segmentation of MaskCLIP is obtained: are the CLIP features bilinearly upsampled, or is the low-resolution segmentation (i.e. after class-wise argmax) upsampled with nearest-neighbour interpolation?

Many thanks for any help with this.
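The two upsampling options in the question can be illustrated without the codebase. A minimal NumPy sketch with random logits as stand-ins (bilinear and nearest-neighbour written by hand here; this is not the authors' implementation):

```python
import numpy as np

def bilinear_upsample(x, scale):
    """Bilinearly upsample a (C, H, W) array by an integer factor (align_corners=False style)."""
    c, h, w = x.shape
    ys = (np.arange(h * scale) + 0.5) / scale - 0.5
    xs = (np.arange(w * scale) + 0.5) / scale - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[None, :, None]
    wx = np.clip(xs - x0, 0, 1)[None, None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

rng = np.random.default_rng(0)
logits = rng.standard_normal((3, 4, 4))  # low-res per-class logits (C, H, W)

# Option A: upsample logits first, then argmax -> smooth class boundaries.
seg_a = bilinear_upsample(logits, 4).argmax(axis=0)

# Option B: argmax first, then nearest-neighbour -> blocky boundaries.
seg_b = np.repeat(np.repeat(logits.argmax(axis=0), 4, axis=0), 4, axis=1)
```

Both produce a 16x16 label map, but the boundaries differ: option A lets neighbouring logits blend before the argmax, while option B just replicates each low-res label.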

@111chengxuyuan

Hello, I want to ask a question: which version of mmsegmentation should I install to run this code properly? I installed 0.20.0 but couldn't run it.

@kahnchana

I am facing similar issues trying to replicate this. A shared demo notebook would be really useful.

@comicsans-02

> [Quoting @TheoPis's comment above: the reimplemented MaskCLIP baseline recognizes salient classes but fails to produce a segmentation mask; request for a PASCAL Context demo and clarification of the upsampling step.]

Hi, I am working on a project involving MaskCLIP. Please help me understand how to reproduce the baseline results reported in the paper!

@KingOceaning

I don't know how to solve the following dataset problem; I hope someone can help:
ImportError: cannot import name 'Detail' from 'detail' (/home/asc005/anaconda3/envs/MaskCLIP/lib/python3.8/site-packages/detail/__init__.py)
