In this directory, we keep the scripts or GitHub links (official or custom) to evaluate SOTA methods (REC/OVD/DOD/MLLM) on the DOD benchmark. A minimal scoring sketch is included after the notes below.

Name | Paper | Original Tasks | Training Data | Evaluation Code | Intra-FULL/PRES/ABS/Inter-FULL/PRES/ABS | Source | Note |
---|---|---|---|---|---|---|---|
OFA-large | OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (ICML 2022) | REC | - | - | 4.2/4.1/4.6/0.1/0.1/0.1 | DOD paper | - |
CORA-R50 | CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching (CVPR 2023) | OVD | - | - | 6.2/6.7/5.0/2.0/2.2/1.3 | DOD paper | - |
OWL-ViT-large | Simple Open-Vocabulary Object Detection with Vision Transformers (ECCV 2022) | OVD | - | DOD official | 9.6/10.7/6.4/2.5/2.9/2.1 | DOD paper | Post-processing hyper-parameters may affect performance, so results may not exactly match the paper |
SPHINX-7B | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models (arXiv 2023) | MLLM capable of REC | - | DOD official | 10.6/11.4/7.9/-/-/- | DOD authors | Major contributions from Jie Li |
GLIP-T | Grounded Language-Image Pre-training (CVPR 2022) | OVD & PG | - | - | 19.1/18.3/21.5/-/-/- | GEN paper | - |
UNINEXT-huge | Universal Instance Perception as Object Discovery and Retrieval (CVPR 2023) | OVD & REC | - | DOD official | 20.0/20.6/18.1/3.3/3.9/1.6 | DOD paper | - |
Grounding-DINO-base | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (arXiv 2023) | OVD & REC | - | DOD official | 20.7/20.1/22.5/2.7/2.4/3.5 | DOD paper | Post-processing hyper-parameters may affect performance, so results may not exactly match the paper |
OFA-DOD-base | Described Object Detection: Liberating Object Detection with Flexible Expressions (NeurIPS 2023) | DOD | - | - | 21.6/23.7/15.4/5.7/6.9/2.3 | DOD paper | - |
FIBER-B | Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone (NeurIPS 2022) | OVD & REC | - | - | 22.7/21.5/26.0/-/-/- | GEN paper | - |
MM-Grounding-DINO | An Open and Comprehensive Pipeline for Unified Object Grounding and Detection (arXiv 2024) | DOD & OVD & REC | O365, GoldG, GRIT, V3Det | MM-GDINO official | 22.9/21.9/26.0/-/-/- | MM-GDINO paper | - |
GEN (FIBER-B) | Generating Enhanced Negatives for Training Language-Based Object Detectors (arXiv 2024) | DOD | - | - | 26.0/25.2/28.1/-/-/- | GEN paper | An enhancement built on FIBER-B |
APE-large (D) | Aligning and Prompting Everything All at Once for Universal Visual Perception (arXiv 2023) | DOD & OVD & REC | COCO, LVIS, O365, OpenImages, Visual Genome, RefCOCO/+/g, SA-1B, GQA, PhraseCut, Flickr30k | APE official | 37.5/38.8/33.9/21.0/22.0/17.9 | APE paper | Extra training data contributes to this strong performance |
Some extra notes:
- When multiple variants of a method are available, the table records the variant with the highest performance, so it is a leaderboard rather than a fair comparison.
- Methods like GLIP and FIBER are not actually evaluated on OVD benchmarks. For zero-shot evaluation on DOD, we currently do not distinguish between methods built for OVD benchmarks and methods built for zero-shot object detection, as long as their open-set detection capability is verified.
- For other variants (e.g. for a fair comparison regarding data, backbone, etc.), please refer to the papers.
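
For methods without linked evaluation code, one common fallback is to score COCO-format predictions with `pycocotools`. The sketch below is a minimal example under assumptions: the file names `gt_full.json` and `preds.json` are hypothetical placeholders, and it covers only a single split. Prefer the official DOD evaluation code linked above, which also handles the PRES/ABS and intra-/inter-scenario partitions reported in the table.

```python
# Minimal sketch (not the official DOD evaluation): score COCO-format
# detections with pycocotools. File names below are hypothetical.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth for one split (e.g. Intra-FULL) in COCO annotation format.
coco_gt = COCO("gt_full.json")
# Detections as a list of {image_id, category_id, bbox, score} records.
coco_dt = coco_gt.loadRes("preds.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()
print("mAP@[0.50:0.95]:", evaluator.stats[0])
```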