Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Recent developments underscore the potential of textual information to enhance learning models toward a deeper understanding of medical visual semantics. However, language-guided medical image segmentation remains challenging. Previous works embed textual information through implicit architectures, which leads to segmentation results that are inconsistent with the semantics expressed by the language, sometimes diverging significantly. To this end, we propose a novel cross-modal conditioned Reconstruction for Language-guided Medical Image Segmentation (RecLMIS) that explicitly captures cross-modal interactions, under the assumption that well-aligned medical visual features and medical notes can effectively reconstruct each other. We introduce conditioned interaction to adaptively predict patches and words of interest, which are then used as conditioning factors for mutual reconstruction so that the segmentation aligns with the regions described in the medical notes. Extensive experiments demonstrate the superiority of RecLMIS, which surpasses LViT by 3.74% mIoU on the MosMedData+ dataset and 1.89% mIoU on the QATA-CoV19 dataset. More importantly, it achieves a relative reduction of 20.2% in parameter count and a 55.5% decrease in computational load. The code will be available at https://github.com/ShawnHuang497/RecLMIS.
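
Illustrative sketch (not part of the article): the abstract describes predicting patches and words of interest and using them as conditions for mutual reconstruction. A minimal PyTorch sketch of that idea follows; the scoring heads, cross-attention decoders, feature dimensions, and loss are all assumptions made for illustration, not the authors' RecLMIS implementation (see the linked repository for that).

import torch
import torch.nn as nn

class ConditionedReconstruction(nn.Module):
    """Toy cross-modal conditioned reconstruction (illustrative only)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Hypothetical "interest" heads: score how relevant each patch / word is.
        self.patch_score = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 1))
        self.word_score = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 1))
        # Hypothetical decoders: reconstruct one modality conditioned on the other.
        self.text_from_vision = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.vision_from_text = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, patches: torch.Tensor, words: torch.Tensor):
        # patches: (B, Np, dim) visual patch features; words: (B, Nw, dim) text token features.
        patch_w = torch.sigmoid(self.patch_score(patches))   # patches of interest
        word_w = torch.sigmoid(self.word_score(words))       # words of interest
        cond_patches = patches * patch_w
        cond_words = words * word_w
        # Mutual reconstruction: each modality is rebuilt from the other's conditioned features.
        rec_words, _ = self.text_from_vision(words, cond_patches, cond_patches)
        rec_patches, _ = self.vision_from_text(patches, cond_words, cond_words)
        # Well-aligned features should reconstruct each other with low error.
        loss = nn.functional.mse_loss(rec_words, words) + nn.functional.mse_loss(rec_patches, patches)
        return loss, rec_patches, rec_words

# Toy usage with random tensors standing in for encoder outputs.
model = ConditionedReconstruction(dim=256)
loss, _, _ = model(torch.randn(2, 196, 256), torch.randn(2, 32, 256))
loss.backward()

The intuition, per the abstract's stated assumption, is that if visual and textual features are well aligned, each can be recovered from the other, so minimizing this reconstruction error pushes the features toward the regions the medical notes describe.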

Source
http://dx.doi.org/10.1109/TMI.2024.3523333

Publication Analysis

Top Keywords

language-guided medical (12)
medical image (12)
image segmentation (12)
cross-modal conditioned (8)
conditioned reconstruction (8)
reconstruction language-guided (8)
medical visual (8)
medical notes (8)
medical (7)
segmentation (4)

Similar Publications

Language-guided multimodal domain generalization for outcome prediction of head and neck cancer.

Comput Biol Med

August 2025

Department of Radiation Oncology, UTSW, United States of America.

Accurate prediction of head and neck cancer recurrence across medical institutions remains challenging due to inherent domain shifts in imaging data. Current domain generalization methods primarily focus on learning domain-invariant features from medical images, often overlooking structured clinical information that inherently exhibits cross-institutional consistency. To leverage clinical data and enhance the model's generalization, we propose an end-to-end Language-Guided Multimodal Domain Generalization (LGMDG) method.

A Language-Guided Progressive Fusion Network with semantic density alignment for Medical Visual Question Answering.

J Biomed Inform

May 2025

School of Biomedical Engineering, Capital Medical University, Beijing, China; Laboratory for Clinical Medicine, Capital Medical University, Beijing, China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China.

Medical Visual Question Answering (Med-VQA) is a critical multimodal task with the potential to address the scarcity and imbalance of medical resources. However, most existing studies overlook the inconsistency in information density between medical images and text, as well as the long-tail distribution of datasets, both of which keep Med-VQA an open challenge. To overcome these issues, this study proposes a Language-Guided Progressive Fusion Network (LGPFN) with three key modules: Question-Guided Progressive Multimodal Fusion (QPMF), Language-Gate Mechanism (LGM), and Triple Semantic Feature Alignment (TriSFA).

Smart microscopes of the future.

Nat Methods

July 2023

Center for Quantitative Cell Imaging, University of Wisconsin-Madison, Madison, WI, USA.

We dream of a future where light microscopes have new capabilities: language-guided image acquisition, automatic image analysis based on extensive prior training from biologist experts, and language-guided image analysis for custom analyses. Most of these capabilities have reached the proof-of-principle stage, but their implementation would be accelerated by efforts to gather appropriate training sets and to build user-friendly interfaces.

Importance: Youth living with HIV make up one-quarter of new infections and have high rates of risk behaviors but are significantly understudied. Effectiveness trials in real-world settings are needed to inform program delivery.

Objective: To compare the effectiveness of the Healthy Choices intervention delivered in a home or community setting vs a medical clinic.
