Citations: 20

Article Abstract

Open World Object Detection (OWOD) is a challenging computer vision task that bridges the gap between classic object detection (OD) and real-world detection. In addition to detecting and classifying seen/known objects, OWOD algorithms are expected to localize all potential unseen/unknown objects and learn them incrementally. Large pre-trained vision-language grounding models (VLMs, e.g., GLIP) carry rich knowledge about the open world, but they are limited by text prompts and cannot localize indescribable objects, and in many detection scenarios pre-defined language descriptions are unavailable during inference. In this paper, we specialize a VLM for OWOD tasks by distilling its open-world knowledge into a language-agnostic detector. Surprisingly, we observe that simple knowledge distillation yields unexpectedly strong unknown-object detection, even with a small amount of data. Unfortunately, distilling knowledge about unknown objects severely disrupts the learning of detectors with conventional structures, causing catastrophic damage to the model's ability to learn known objects. To alleviate these problems, we propose a down-weight training strategy for distilling knowledge from the vision-language model into a single-modality visual detector. We also propose cascade decoupled decoders that decouple the learning of localization and recognition, reducing the impact of category interactions between known and unknown objects on localization learning. Ablation experiments demonstrate that both are effective in mitigating the impact of open-world knowledge distillation on the learning of known objects.
Additionally, to address the current lack of comprehensive benchmarks for evaluating an open-world detector's ability to detect unknown objects, we refine the unknown-object detection benchmark by augmenting the annotations for unknown objects, which we name "IntensiveSet$\spadesuit$". Comprehensive experiments on OWOD, MS-COCO, and our proposed benchmarks demonstrate the effectiveness of our methods.
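The down-weight strategy above can be read as a weighted sum of the supervised loss on known objects and a distillation term toward the teacher's predictions. The following is a minimal sketch of that idea, not the paper's actual formulation: the function names, the weight `w`, and the temperature `T` are all hypothetical placeholders.

```python
import math

def soften(logits, T=2.0):
    """Temperature-scaled softmax over a list of logits."""
    z = [x / T for x in logits]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    """KL(p || q) with a small epsilon for numerical safety."""
    eps = 1e-12
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def down_weighted_loss(student_logits, teacher_logits, supervised_loss,
                       w=0.1, T=2.0):
    """Total loss: supervised term on known objects plus a down-weighted
    distillation term (w < 1) toward the teacher's soft predictions."""
    distill = kl(soften(teacher_logits, T), soften(student_logits, T))
    return supervised_loss + w * distill
```

When the student already matches the teacher, the distillation term vanishes and only the supervised loss remains; shrinking `w` limits how much the unknown-object distillation signal can interfere with known-object learning.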


Source: http://dx.doi.org/10.1109/TPAMI.2025.3600435

Publication Analysis

Top Keywords

knowledge distillation (20)
object detection (20)
unknown objects (16)
objects (9)
knowledge (8)
simple knowledge (8)
open-world knowledge (8)
performance unknown (8)
unknown object (8)
object (6)

Similar Publications

Molecular property prediction has become essential in accelerating advancements in drug discovery and materials science. Graph Neural Networks have recently demonstrated remarkable success in molecular representation learning; however, their broader adoption is impeded by two significant challenges: (1) data scarcity and constrained model generalization due to the expensive and time-consuming task of acquiring labeled data and (2) inadequate initial node and edge features that fail to incorporate comprehensive chemical domain knowledge, notably orbital information. To address these limitations, we introduce a Knowledge-Guided Graph (KGG) framework employing self-supervised learning to pretrain models using orbital-level features in order to mitigate reliance on extensive labeled data sets.
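As a rough illustration of the self-supervised pretraining idea (not the KGG framework itself), a common objective is to mask part of the input features and train the model to reconstruct them. The sketch below computes such a loss over masked positions only; all names are hypothetical.

```python
def masked_reconstruction_loss(features, mask, predictions):
    """Mean squared error between predicted and true feature values,
    computed only at the positions that were masked out."""
    errors = [(p - f) ** 2
              for f, m, p in zip(features, mask, predictions) if m]
    return sum(errors) / len(errors)

# Example: positions 0 and 2 are masked; the model must predict them
# from context, so the unmasked prediction at position 1 is ignored.
loss = masked_reconstruction_loss([1.0, 2.0, 3.0],
                                  [True, False, True],
                                  [1.0, 9.9, 4.0])
```

Pretraining on such an objective requires no labels, which is the point: it sidesteps the expensive acquisition of labeled molecular data.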


Adaptive metric for knowledge distillation by deep Bregman divergence.

Neural Netw

August 2025

Inspur Electronic Information Industry Co., Ltd, China.

Knowledge distillation (KD) makes it possible to deploy high-accuracy models on resource-limited devices and is an effective route to lightweight models. KD methods continue to evolve to suit different application scenarios and needs. To transfer knowledge from larger networks to smaller, lighter ones, KD bridges the gap in probability outputs or middle-layer representations between teacher and student networks.
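The Bregman divergence named in the title generalizes several of these matching losses: D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, and choosing phi(v) = ||v||^2 recovers the squared Euclidean distance often used to match middle-layer representations. A small sketch of that relationship, with all names hypothetical:

```python
def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad_phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# With phi(v) = ||v||^2, the Bregman divergence reduces to the squared
# Euclidean distance, a common teacher-student feature-matching loss.
def sq_norm(v):
    return sum(vi * vi for vi in v)

def grad_sq_norm(v):
    return [2.0 * vi for vi in v]
```

Other choices of phi yield other divergences (e.g., negative entropy gives KL), which is what makes the family a natural place to learn an adaptive distillation metric.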


Federated Learning (FL) offers a promising privacy-preserving framework for collaborative global model training without exposing local private data. However, system heterogeneity in FL causes the straggler issue, where resource-limited edge devices delay global model aggregation. Existing approaches, such as asynchronous mechanisms or kick-out methods, primarily focus on optimizing model convergence efficiency but often overlook edge resource constraints, potentially resulting in model bias toward high-performance devices or omission of critical data.
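For context, the baseline that such heterogeneity-aware methods modify is federated averaging (FedAvg), where the server aggregates client updates weighted by local dataset size; stragglers delay this step because the server waits for all clients. A minimal sketch of the aggregation (names hypothetical):

```python
def fed_avg(client_params, client_sizes):
    """Server-side aggregation: weighted average of client model
    parameters, with weights proportional to local dataset size."""
    total = sum(client_sizes)
    agg = [0.0] * len(client_params[0])
    for params, n in zip(client_params, client_sizes):
        for i, p in enumerate(params):
            agg[i] += (n / total) * p
    return agg
```

Asynchronous schemes relax the implicit barrier here by aggregating whichever updates have arrived, at the cost of mixing stale and fresh parameters.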


The generalization ability of foundation models in the field of computational pathology (CPath) is crucial for their clinical success. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability unclear. We establish a comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 72 specific tasks.


Early detection of Alzheimer's Disease (AD) is crucial for timely interventions and optimizing treatment outcomes. Integrating multimodal neuroimaging datasets can enhance the early detection of AD. However, models must address the challenge of incomplete modalities, a common issue in real-world scenarios, as not all patients have access to all modalities due to practical constraints such as cost and availability.
