Replies: 8 comments
-
Or is this performance boost a result of pretraining on ImageNet-22K? What is the pretraining dataset for SegFormer-B5 and ConvNeXt-XL? Why is there a big gap between their performance and the leaderboard results (~60 mIoU on the ADE20K val set)? Thanks!
-
It's a good question how different pretrained models affect results on ADE20K, and many excellent scientists are trying to figure it out. In fact, we are planning to run some experiments on this problem, but due to limited human and computational resources, the plan has not been carried out yet. As I remember from the papers, the pretraining dataset of SegFormer-B5 is ImageNet-1K and that of ConvNeXt-XL is ImageNet-22K; you can check the training schedule details in their papers or in mmclassification, which is the pretrained backbone model zoo and codebase in OpenMMLab.
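If it helps, here is a minimal sketch of how a different pretrained backbone can be plugged into an mmseg (0.x-style) config; the checkpoint path is a placeholder, and the exact keys may differ across versions:

```python
# Minimal sketch of pointing an MMSegmentation (0.x-style) config at a
# pretrained backbone checkpoint. The checkpoint path is a placeholder;
# replace it with a real file or URL from the mmclassification model zoo.
model = dict(
    backbone=dict(
        init_cfg=dict(
            type='Pretrained',
            checkpoint='path/to/imagenet_pretrained_backbone.pth',  # placeholder
        ),
    ),
)
```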
-
I suspect some of the pretraining data may have visual similarities to ADE20K. Reproducing the entire experiment, including pretraining on ImageNet-22K, requires a large dataset download, many training tricks (even down to the random seeds), and, most importantly, a lot of machine and human time. For me, it is not realistic. However, benchmarks such as
I'm sorry, their results are 55 mIoU on ADE20K, not dramatically higher than ConvNeXt.
-
Thanks for your reminder.
-
I notice that for BEiT, training for 320k iterations lets the model achieve 56+ mIoU. Training for a longer time seems to be a potential cause. But this is not guaranteed, as no validation set is available for early stopping.
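For reference, extending the schedule in an mmseg (0.x-style) config looks roughly like this; the interval values below are illustrative, not from any paper:

```python
# Sketch of a longer training schedule in an MMSegmentation (0.x-style)
# config, with periodic checkpointing and mIoU evaluation so that progress
# can at least be monitored along the way.
runner = dict(type='IterBasedRunner', max_iters=320000)
checkpoint_config = dict(by_epoch=False, interval=16000)
evaluation = dict(interval=16000, metric='mIoU')
```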
-
Hi, have you reproduced such a high mIoU? I think the run-to-run fluctuation on semantic segmentation datasets (Cityscapes, ADE20K) is very large compared to object detection (COCO), meaning different random seeds can bring obvious performance variation. At least, that's what my experiments with MMSegmentation suggest. I don't know how those papers report their performance; maybe they run many times and report the best mIoU.
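One way to check this yourself is to launch the same config under several seeds. A rough sketch, assuming mmseg's tools/train.py (which in 0.x accepts --seed and --deterministic); the config path is a placeholder:

```python
# Sketch: train the same config with several seeds to estimate the
# run-to-run variance of mIoU. Paths are placeholders for your setup.
import subprocess

CONFIG = 'configs/your_model/your_config_ade20k.py'  # placeholder config path

for seed in (0, 1, 2):
    subprocess.run(
        ['python', 'tools/train.py', CONFIG,
         '--seed', str(seed), '--deterministic',
         '--work-dir', f'work_dirs/seed_{seed}'],
        check=True,
    )
```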
-
I estimate that the variance of some SOTA methods such as Segmenter and BEiT is more than 1 point of mIoU on ADE20K.
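Once you have a few runs, summarizing the spread is trivial; the numbers below are made-up placeholders, not measured results:

```python
# Summarize mIoU spread across repeated runs of the same config.
from statistics import mean, stdev

mious = [55.1, 54.2, 55.8]  # placeholder per-seed results
print(f'mIoU: {mean(mious):.2f} +/- {stdev(mious):.2f}')
```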
-
Hi, no, I did not reproduce this score. There is variance, indeed. However, I believe the comparison should at least use the same random seed, so that the initialization of the networks is identical. Running the model with multiple random seeds is too wasteful of computational resources.
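For what it's worth, here is a sketch of pinning the common sources of randomness in plain PyTorch so two runs start from the same initialization; full determinism also depends on cuDNN settings and data-loader workers:

```python
# Fix the usual sources of randomness so two training runs start from the
# same network initialization. This does not guarantee bitwise-identical
# training, but it removes seed-dependent initialization differences.
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```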
-
Mask DINO reports pretraining on the Objects365 detection dataset, and its semantic segmentation result on ADE20K reaches 60.8 mIoU, which is far higher than backbones such as ConvNeXt and SegFormer. Is there an easy way to reproduce this result using mmseg?