Adapting vision-language AI models to cardiology tasks.

Nat Med

Department of Medicine, Division of Cardiology, Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.

Published: May 2024


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Source: http://dx.doi.org/10.1038/s41591-024-02956-1

Publication Analysis

Top Keywords

adapting vision-language: 4
vision-language models: 4
models cardiology: 4
cardiology tasks: 4
adapting: 1
models: 1
cardiology: 1
tasks: 1

Similar Publications

Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision-language model (VLM) that integrates Grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized.
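For context, the sketch below shows how such a prompt-driven detect-then-segment pipeline is typically wired together: an open-vocabulary detector turns the text prompt into boxes, and a promptable segmenter turns each box into a mask. The `detector` and `segmenter` objects stand in for Grounding DINO and SAM2 wrappers, and their `detect`/`predict_mask` methods are illustrative assumptions, not the real model APIs.

```python
# Illustrative sketch of a prompt-driven detect-then-segment pipeline.
# `detector` and `segmenter` are hypothetical wrappers for Grounding DINO
# and SAM2; their detect/predict_mask methods are assumptions, not real APIs.
from typing import List, Tuple

import numpy as np


def segment_with_text_prompt(
    image: np.ndarray,
    text_prompt: str,
    detector,                     # open-vocabulary detector (e.g. Grounding DINO)
    segmenter,                    # promptable segmenter (e.g. SAM2)
    box_threshold: float = 0.35,
) -> List[np.ndarray]:
    """Detect regions matching a text prompt, then segment each detected box."""
    # 1. Text prompt -> candidate bounding boxes (xyxy) above a score threshold.
    boxes: List[Tuple[float, float, float, float]] = detector.detect(
        image, text_prompt, box_threshold=box_threshold
    )
    # 2. Each box seeds a mask prediction from the promptable segmenter.
    return [segmenter.predict_mask(image, box=box) for box in boxes]
```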

Prompt tuning, a recently emerging paradigm, adapts vision-language pre-trained models to new tasks efficiently by learning "soft prompts" for frozen models. However, in few-shot scenarios, its effectiveness is limited by sensitivity to the initialization and the time-consuming search for optimal initialization, hindering rapid adaptation. Additionally, prompt tuning risks reducing the models' generalizability due to overfitting on scarce training samples.
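A minimal sketch of what "learning soft prompts for a frozen model" looks like in practice is shown below (PyTorch). The module, its shapes, and the assumption that the frozen encoder accepts a [batch, seq_len, dim] embedding tensor are illustrative, not the paper's implementation.

```python
# Minimal soft-prompt-tuning sketch (PyTorch): only the prompt vectors train.
# The frozen encoder is assumed to accept [batch, seq_len, dim] embeddings.
import torch
import torch.nn as nn


class SoftPromptWrapper(nn.Module):
    def __init__(self, frozen_encoder: nn.Module, prompt_len: int, dim: int):
        super().__init__()
        self.encoder = frozen_encoder
        for p in self.encoder.parameters():       # freeze the backbone
            p.requires_grad = False
        # Learnable continuous "soft prompt" of prompt_len embeddings.
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the shared prompt to every sequence in the batch.
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.encoder(torch.cat([prompt, token_embeds], dim=1))
```

Because the backbone stays frozen, only prompt_len × dim parameters are updated, which is why the few-shot sensitivity to prompt initialization noted above matters so much.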

LLaVA-Pose: Keypoint-Integrated Instruction Tuning for Human Pose and Action Understanding.

Sensors (Basel)

August 2025

Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan.

Current vision-language models (VLMs) are well-adapted for general visual understanding tasks. However, they perform inadequately when handling complex visual tasks related to human poses and actions due to the lack of specialized vision-language instruction-following data. We introduce a method for generating such data by integrating human keypoints with traditional visual features such as captions and bounding boxes, enabling more precise understanding of human-centric scenes.
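The sketch below shows one way such keypoint-integrated instruction samples could be assembled; the field names and the (instruction, response) schema are assumptions for illustration, not the paper's actual data format.

```python
# Illustrative assembly of a keypoint-augmented instruction-tuning sample.
# Field names and the (instruction, response) schema are assumptions.
from typing import Dict, List


def build_pose_instruction(
    caption: str,
    boxes: List[Dict],       # e.g. {"label": "person", "xyxy": [x1, y1, x2, y2]}
    keypoints: List[Dict],   # e.g. {"name": "left_elbow", "xy": [x, y]}
    question: str,
) -> Dict[str, str]:
    """Serialize caption, boxes, and keypoints into one textual context."""
    box_txt = "; ".join(f"{b['label']} at {b['xyxy']}" for b in boxes)
    kp_txt = "; ".join(f"{k['name']} at {k['xy']}" for k in keypoints)
    context = f"Caption: {caption}\nBoxes: {box_txt}\nKeypoints: {kp_txt}"
    # The context plus question forms the instruction; the answer would be
    # drafted separately (e.g. by a stronger LLM) to complete the pair.
    return {"instruction": f"{context}\n\n{question}", "response": ""}
```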

Recent progress in vision Transformers exhibits great success in various tasks driven by the new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution (gnConv) that performs high-order spatial interactions with gated convolutions and recursive designs.
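A simplified sketch of the gating-plus-recursion idea is given below (PyTorch); it illustrates the mechanism only and deliberately omits the channel-splitting and scaling details of the published gnConv.

```python
# Simplified gated-convolution block in the spirit of gnConv (illustration only).
import torch
import torch.nn as nn


class SimpleGatedConv(nn.Module):
    def __init__(self, dim: int, order: int = 3, kernel_size: int = 7):
        super().__init__()
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        # One pointwise mixer per additional interaction order.
        self.mixers = nn.ModuleList(
            [nn.Conv2d(dim, dim, kernel_size=1) for _ in range(order - 1)]
        )
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, feat = self.proj_in(x).chunk(2, dim=1)   # split into gate / value
        out = gate * self.dwconv(feat)                 # first-order interaction
        for mixer in self.mixers:                      # recursive higher orders
            out = gate * mixer(out)
        return self.proj_out(out)
```

Each pass through the loop multiplies the features by the gate again, so the output accumulates progressively higher-order spatial interactions without any attention computation.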

AbVLM-Q: intelligent quality assessment for abdominal ultrasound standard planes via vision-language modeling.

BMC Med Imaging

August 2025

Department of Ultrasound Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, China.

Background: Abdominal ultrasound is non-invasive and efficient, yet acquiring standard planes remains challenging due to operator dependency and procedural complexity. We propose AbVLM-Q, a vision-language framework for automated quality assessment of abdominal ultrasound standard planes.

Methods: In this study, we assembled a multi-center dataset comprising 7,766 abdominal ultrasound scans, which were randomly divided into training (70%), validation (15%), and testing (15%) subsets.
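As a small aside, the random 70/15/15 partition described above can be reproduced in a few lines; the seed and helper name below are illustrative, not taken from the paper.

```python
# Minimal 70/15/15 random split sketch; seed and helper name are illustrative.
import random


def split_dataset(items, train_frac=0.70, val_frac=0.15, seed=42):
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],                     # training (~70%)
            shuffled[n_train:n_train + n_val],      # validation (~15%)
            shuffled[n_train + n_val:])             # testing (remaining ~15%)
```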
