Category Ranking: 98%

Total Visits: 921

Avg Visit Duration: 2 minutes

Citations: 20

Article Abstract

Prompt tuning, a recently emerging paradigm, adapts vision-language pre-trained models to new tasks efficiently by learning "soft prompts" for frozen models. In few-shot scenarios, however, its effectiveness is limited by sensitivity to initialization and by the time-consuming search for a good initialization, which hinders rapid adaptation. Prompt tuning also risks reducing the model's generalizability by overfitting scarce training samples. To overcome these challenges, we introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM) framework that jointly meta-learns an efficient soft prompt initialization for better adaptation and a lightweight gradient regulating function for strong cross-domain generalizability, using only weakly labeled image-text pre-training data. This is achieved through a Cross-Modal Hierarchical Clustering algorithm that organizes the large-scale image-text data into a structured hierarchy, enabling robust meta-learning across diverse domains. Rather than being a specific prompt tuning method, GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way and brings consistent improvements to them. Further, we consider a more practical but challenging setting, test-time prompt tuning with only unlabeled test samples, and propose an improved structure-induced gradient regulating function that leverages the structured semantics of the meta-learning data for zero-shot generalization. This approach exploits the hierarchically clustered meta-learning data to model relationships between test-time data and meta-learning prototypes, transferring invariant knowledge without explicit annotations. We also introduce a structure-complexity-informed strategy for adaptively constructing meta-training tasks and generating prototypes, which accounts for the diverse semantics within hierarchical clusters of different complexities. Comprehensive experiments demonstrate the state-of-the-art few- and zero-shot generalizability of our method.
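To make the core mechanism concrete, here is a minimal PyTorch sketch of gradient-regulated prompt tuning: a soft prompt starts from a meta-learned initialization, and its raw gradient passes through a lightweight regulating function before each inner-loop update. The class name, the elementwise-gain form of the regulator, and the toy loss are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GradientRegulatedPrompt(nn.Module):
    """Sketch: a soft prompt whose update uses a regulated gradient.
    Hypothetical structure; the actual GRAM regulator may differ."""

    def __init__(self, meta_init: torch.Tensor):
        super().__init__()
        # Soft prompt starts from the meta-learned initialization.
        self.prompt = nn.Parameter(meta_init.clone())
        # Assumed regulator form: an elementwise gain on the raw
        # gradient, meta-trained once and frozen during adaptation.
        self.register_buffer("gain", torch.ones_like(meta_init))

    def adapt_step(self, loss: torch.Tensor, lr: float = 1e-3) -> None:
        """One inner-loop update on a few-shot task."""
        (grad,) = torch.autograd.grad(loss, self.prompt)
        with torch.no_grad():
            # The regulated gradient replaces the raw gradient.
            self.prompt -= lr * (self.gain * grad)

# Toy usage with a placeholder loss over a 4-token, 8-dim prompt;
# in practice the loss would come from the frozen vision-language model.
meta_init = torch.randn(4, 8)
tuner = GradientRegulatedPrompt(meta_init)
loss = (tuner.prompt ** 2).sum()
tuner.adapt_step(loss)
```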

Source
http://dx.doi.org/10.1109/TPAMI.2025.3604454

Publication Analysis

Top Keywords

prompt tuning: 20
structure-induced gradient: 8
gradient regulating: 8
regulating function: 8
meta-learning data: 8
prompt: 6
tuning: 5
meta-learning: 5
data: 5
gradient regulation: 4

Similar Publications

Temporal modeling plays an important role in effectively adapting powerful pretrained text-image foundation models to text-video retrieval. However, existing methods often rely on additional heavy trainable modules, such as transformers or BiLSTMs, which are inefficient. In contrast, we avoid introducing such heavy components by leveraging frozen foundation models.


BACKGROUND: This study used CT imaging analyzed with deep learning techniques to assess the diagnostic accuracy of lung metastasis detection in patients with breast cancer. The aim of the research was to create and verify a system for detecting malignant and metastatic lung lesions that uses YOLOv10 and transfer learning. MATERIAL AND METHODS: From January 2023 to 2024, CT scans of 16 patients with breast cancer who had confirmed lung metastases were gathered retrospectively from Erzincan Mengücek Gazi Training and Research Hospital.


Large Language Models (LLMs) show promise in augmenting digital health applications. However, the development and scaling of large models face computational constraints, data-security concerns, and limited internet accessibility in some regions. We developed and tested Med-Pal, a medical domain-specific LLM chatbot fine-tuned with a fine-grained, expert-curated medication-enquiry dataset consisting of 1,100 question-and-answer pairs.


Pay more attention to the robustness of LLMs on adversarial prompt for instruction data mining.

Neural Netw

August 2025

National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan, 410073, China.

Instruction tuning has emerged as a paramount method for tailoring the behavior of LLMs. Recent studies have shown that LLMs can achieve high performance through fine-tuning on a limited quantity of high-quality instruction data. Instruction-Following Difficulty is one of the most representative approaches to instruction data mining; it selects as high-quality instruction data those samples where LLMs fail to generate responses that align with the provided instructions.
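For context, one published formulation of Instruction-Following Difficulty scores a sample by comparing the model's loss on the response with and without the instruction: a ratio near or above 1 suggests the instruction did not help, marking the pair as difficult. The sketch below assumes a Hugging Face causal-LM and tokenizer pair and simplifies token accounting, so treat it as illustrative rather than this paper's exact procedure.

```python
import torch

def ifd_score(model, tokenizer, instruction: str, response: str) -> float:
    """IFD-style score: loss(response | instruction) / loss(response).
    Assumes a Hugging Face causal LM; details vary across papers."""

    def response_loss(prefix: str) -> float:
        enc = tokenizer(prefix + response, return_tensors="pt",
                        add_special_tokens=False)
        n_prefix = len(tokenizer(prefix, add_special_tokens=False).input_ids)
        labels = enc.input_ids.clone()
        labels[:, :n_prefix] = -100  # score only the response tokens
        with torch.no_grad():
            return model(**enc, labels=labels).loss.item()

    return response_loss(instruction + "\n") / response_loss("")
```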


Medical Entity Linking in Low-Resource Settings with Fine-Tuning-Free LLMs.

Stud Health Technol Inform

September 2025

Chair of Medical Informatics, Institute of AI and Informatics in Medicine (AIIM), TUM University Hospital, Technical University of Munich, Munich, Germany.

Introduction: Medical entity linking is an important task in biomedical natural language processing, aiming to align textual mentions of medical concepts with standardized concepts in ontologies. Most existing approaches rely on supervised models or domain-specific embeddings, which require large datasets and significant computational resources.

Objective: The objective of this work is (1) to investigate the effectiveness of large language models (LLMs) in improving both candidate generation and disambiguation for medical entity linking through synonym expansion and in-context learning, and (2) to evaluate this approach against traditional string-matching and supervised methods.
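As a rough sketch of such a two-stage pipeline (not the authors' implementation), the snippet below pairs fuzzy string matching for candidate generation with an LLM prompt for in-context disambiguation. The function names, the k-candidate cutoff, and the `llm` callable are all assumptions.

```python
from difflib import SequenceMatcher

def link_entity(mention: str, ontology: dict[str, str], llm, k: int = 5) -> str:
    """Hypothetical linker: `ontology` maps concept IDs to names;
    `llm` is any callable str -> str returning the chosen concept ID."""
    # Stage 1: candidate generation by fuzzy string matching.
    candidates = sorted(
        ontology.items(),
        key=lambda kv: SequenceMatcher(None, mention.lower(),
                                       kv[1].lower()).ratio(),
        reverse=True,
    )[:k]
    # Stage 2: in-context disambiguation by the LLM.
    options = "\n".join(f"{cid}: {name}" for cid, name in candidates)
    prompt = (
        f"Mention: {mention}\nCandidate concepts:\n{options}\n"
        "Answer with the single best concept ID."
    )
    return llm(prompt).strip()
```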
