A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future.

Chaoyang Zhu, Long Chen

IEEE Trans Pattern Anal Mach Intell

Published: December 2024

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Because manual labeling is expensive, the annotated categories in existing datasets are often small-scale and pre-defined, so state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To address this limitation, the community has in the last few years paid increasing attention to Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that whether weak supervision signals are permitted, and how they are used, discriminates well between different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and methodology strengths and weaknesses are thoroughly analyzed.

DOI: http://dx.doi.org/10.1109/TPAMI.2024.3413013

Similar Publications

UpGen: Unleashing Potential of Foundation Models for Training-Free Camouflage Detection via Generative Models.

IEEE Trans Image Process

August 2025

Ji Du, Jiesheng Wu, Desheng Kong, Weiyun Liang, Fangwei Hao

Camouflaged Object Detection (COD) aims to segment objects resembling their environment. To address the challenges of extensive annotations and complex optimizations in supervised learning, recent prompt-based segmentation methods excavate insightful prompts from Large Vision-Language Models (LVLMs) and refine them using various foundation models. These are subsequently fed into the Segment Anything Model (SAM) for segmentation.

SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector.

IEEE Trans Pattern Anal Mach Intell

August 2025

Shuailei Ma, Yuefeng Wang, Ying Wei, Enming Zhang, Jiaqi Fan

Open World Object Detection (OWOD) is a novel computer vision task with a considerable challenge, bridging the gap between classic object detection (OD) and real-world object detection. In addition to detecting and classifying seen/known objects, OWOD algorithms are expected to localize all potential unseen/unknown objects and incrementally learn them. The large pre-trained vision-language grounding models (VLM, e.

Open vocabulary detection for concealed object detection in AMMW image.

Sci Rep

August 2025

Shanghai Key Laboratory of Crime Scene Evidence, Shanghai Research Institute of Criminal Science and Technology, Shanghai 200072, China.

Chenjiang Jiang, Chunyu Li, Xuejun Zhao

Currently, millimeter-wave imaging systems play a central role in security detection. Existing concealed object detectors for millimeter-wave images can only detect the categories they were trained on and fail when encountering new, unseen categories. Accurately identifying the increasingly diverse types and shapes of concealed objects is a pressing challenge.

Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown.

IEEE Trans Image Process

January 2025

Zimeng Fang, Chao Liang, Xue Zhou, Shuyuan Zhu, Xi Li

Multi-object tracking (MOT) emerges as a pivotal and highly promising branch in the field of computer vision. Classical closed-vocabulary MOT (CV-MOT) methods aim to track objects of predefined categories. Recently, some open-vocabulary MOT (OV-MOT) methods have successfully addressed the problem of tracking unknown categories.

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection.

IEEE Trans Pattern Anal Mach Intell

July 2025

Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

Open-vocabulary 3D Object Detection (OV-3DDet) addresses the detection of objects from an arbitrary list of novel categories in 3D scenes, which remains a very challenging problem. In this work, we propose CoDAv2, a unified framework designed to innovatively tackle both the localization and classification of novel 3D objects, under the condition of limited base categories. For localization, the proposed 3D Novel Object Discovery (3D-NOD) strategy utilizes 3D geometries and 2D open-vocabulary semantic priors to discover pseudo labels for novel objects during training.
