Citations: 20

Article Abstract

Open World Object Detection (OWOD) is a challenging computer vision task that bridges the gap between classic object detection (OD) and real-world detection. In addition to detecting and classifying seen/known objects, OWOD algorithms are expected to localize all potential unseen/unknown objects and learn them incrementally. Large pre-trained vision-language grounding models (VLMs, e.g., GLIP) carry rich knowledge about the open world, but they are limited by text prompts and cannot localize indescribable objects, and in many detection scenarios pre-defined language descriptions are unavailable during inference. In this paper, we specialize a VLM for OWOD tasks by distilling its open-world knowledge into a language-agnostic detector. Surprisingly, we observe that simple knowledge distillation yields unexpectedly strong unknown-object detection, even with a small amount of data. Unfortunately, distilling knowledge about unknown objects severely disrupts the learning of detectors with conventional structures, causing catastrophic damage to the model's ability to learn known objects. To alleviate these problems, we propose a down-weight training strategy for distilling knowledge from the vision-language model into a single-modality visual detector. We also propose cascade decoupled decoders that decouple the learning of localization and recognition, reducing the impact of category interactions between known and unknown objects on localization learning. Ablation experiments demonstrate that both are effective in mitigating the impact of open-world knowledge distillation on the learning of known objects.
Additionally, to address the current lack of comprehensive benchmarks for evaluating an open-world detector's ability to detect unknown objects, we refine the unknown-object detection benchmark by augmenting the annotations for unknown objects, which we name "IntensiveSet$\spadesuit$". Comprehensive experiments on OWOD, MS-COCO, and our proposed benchmarks demonstrate the effectiveness of our methods.
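The down-weight strategy above can be read as a weighted sum of the supervised loss on known objects and a distillation term toward the teacher's predictions. The following is a minimal sketch of that idea, not the paper's actual formulation: the function names, the weight `w`, and the temperature `T` are all hypothetical placeholders.

```python
import math

def soften(logits, T=2.0):
    """Temperature-scaled softmax over a list of logits."""
    z = [x / T for x in logits]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    """KL(p || q) with a small epsilon for numerical safety."""
    eps = 1e-12
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def down_weighted_loss(student_logits, teacher_logits, supervised_loss,
                       w=0.1, T=2.0):
    """Total loss: supervised term on known objects plus a down-weighted
    distillation term (w < 1) toward the teacher's soft predictions."""
    distill = kl(soften(teacher_logits, T), soften(student_logits, T))
    return supervised_loss + w * distill
```

When the student already matches the teacher, the distillation term vanishes and only the supervised loss remains; shrinking `w` limits how much the unknown-object distillation signal can interfere with known-object learning.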


Source: http://dx.doi.org/10.1109/TPAMI.2025.3600435

Publication Analysis

Top Keywords

knowledge distillation (20)
object detection (20)
unknown objects (16)
objects (9)
knowledge (8)
simple knowledge (8)
open-world knowledge (8)
performance unknown (8)
unknown object (8)
object (6)

Similar Publications

Molecular property prediction has become essential in accelerating advancements in drug discovery and materials science. Graph Neural Networks have recently demonstrated remarkable success in molecular representation learning; however, their broader adoption is impeded by two significant challenges: (1) data scarcity and constrained model generalization due to the expensive and time-consuming task of acquiring labeled data and (2) inadequate initial node and edge features that fail to incorporate comprehensive chemical domain knowledge, notably orbital information. To address these limitations, we introduce a Knowledge-Guided Graph (KGG) framework employing self-supervised learning to pretrain models using orbital-level features in order to mitigate reliance on extensive labeled data sets.
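As a rough illustration of the self-supervised pretraining idea (not the KGG framework itself), a common objective is to mask part of the input features and train the model to reconstruct them. The sketch below computes such a loss over masked positions only; all names are hypothetical.

```python
def masked_reconstruction_loss(features, mask, predictions):
    """Mean squared error between predicted and true feature values,
    computed only at the positions that were masked out."""
    errors = [(p - f) ** 2
              for f, m, p in zip(features, mask, predictions) if m]
    return sum(errors) / len(errors)

# Example: positions 0 and 2 are masked; the model must predict them
# from context, so the unmasked prediction at position 1 is ignored.
loss = masked_reconstruction_loss([1.0, 2.0, 3.0],
                                  [True, False, True],
                                  [1.0, 9.9, 4.0])
```

Pretraining on such an objective requires no labels, which is the point: it sidesteps the expensive acquisition of labeled molecular data.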


Adaptive metric for knowledge distillation by deep Bregman divergence.

Neural Netw

August 2025

Inspur Electronic Information Industry Co., Ltd, China.

Knowledge distillation (KD) makes it possible to deploy high-accuracy models on resource-limited devices and is an effective route to lightweight models. KD methods continue to evolve to suit different application scenarios and needs. To transfer knowledge from larger networks to smaller, lighter ones, KD bridges the gap in probability outputs or middle-layer representations between teacher and student networks.
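The Bregman divergence named in the title generalizes several of these matching losses: D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, and choosing phi(v) = ||v||^2 recovers the squared Euclidean distance often used to match middle-layer representations. A small sketch of that relationship, with all names hypothetical:

```python
def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# With phi(v) = ||v||^2, the Bregman divergence reduces to the squared
# Euclidean distance, a common teacher-student feature-matching loss.
def sq_norm(v):
    return sum(vi * vi for vi in v)

def grad_sq_norm(v):
    return [2.0 * vi for vi in v]
```

Other choices of phi yield other divergences (e.g., negative entropy gives KL), which is what makes the family a natural place to learn an adaptive distillation metric.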


Federated Learning (FL) offers a promising privacy-preserving framework for collaborative global model training without exposing local private data. However, system heterogeneity in FL causes the straggler issue, where resource-limited edge devices delay global model aggregation. Existing approaches, such as asynchronous mechanisms or kick-out methods, primarily focus on optimizing model convergence efficiency but often overlook edge resource constraints, potentially resulting in model bias toward high-performance devices or omission of critical data.
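For context, the baseline that such heterogeneity-aware methods modify is federated averaging (FedAvg), where the server aggregates client updates weighted by local dataset size; stragglers delay this step because the server waits for all clients. A minimal sketch of the aggregation (names hypothetical):

```python
def fed_avg(client_params, client_sizes):
    """Server-side aggregation: weighted average of client model
    parameters, with weights proportional to local dataset size."""
    total = sum(client_sizes)
    agg = [0.0] * len(client_params[0])
    for params, n in zip(client_params, client_sizes):
        for i, p in enumerate(params):
            agg[i] += (n / total) * p
    return agg
```

Asynchronous schemes relax the implicit barrier here by aggregating whichever updates have arrived, at the cost of mixing stale and fresh parameters.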


The generalization ability of foundation models in the field of computational pathology (CPath) is crucial for their clinical success. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability unclear. We establish a comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 72 specific tasks.


Early detection of Alzheimer's Disease (AD) is crucial for timely interventions and optimizing treatment outcomes. Integrating multimodal neuroimaging datasets can enhance the early detection of AD. However, models must address the challenge of incomplete modalities, a common issue in real-world scenarios, as not all patients have access to all modalities due to practical constraints such as cost and availability.
