Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Recent developments underscore the potential of textual information to enhance learning models toward a deeper understanding of medical visual semantics. However, language-guided medical image segmentation remains challenging. Previous works embed textual information through implicit architectures, which leads to segmentation results that are inconsistent with the semantics expressed by the language, sometimes diverging significantly. To this end, we propose a novel cross-modal conditioned Reconstruction for Language-guided Medical Image Segmentation (RecLMIS) that explicitly captures cross-modal interactions, under the assumption that well-aligned medical visual features and medical notes can effectively reconstruct each other. We introduce conditioned interaction to adaptively predict patches and words of interest, which are then used as conditioning factors for mutual reconstruction so that the segmentation aligns with the regions described in the medical notes. Extensive experiments demonstrate the superiority of RecLMIS, which surpasses LViT by 3.74% mIoU on the MosMedData+ dataset and 1.89% mIoU on the QATA-CoV19 dataset. More importantly, it achieves a relative reduction of 20.2% in parameter count and a 55.5% decrease in computational load. The code will be available at https://github.com/ShawnHuang497/RecLMIS.
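
Illustrative sketch (not part of the article): the abstract describes predicting patches and words of interest and using them as conditions for mutual reconstruction. A minimal PyTorch sketch of that idea follows; the scoring heads, cross-attention decoders, feature dimensions, and loss are all assumptions made for illustration, not the authors' RecLMIS implementation (see the linked repository for that).

import torch
import torch.nn as nn

class ConditionedReconstruction(nn.Module):
    """Toy cross-modal conditioned reconstruction (illustrative only)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Hypothetical "interest" heads: score how relevant each patch / word is.
        self.patch_score = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 1))
        self.word_score = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 1))
        # Hypothetical decoders: reconstruct one modality conditioned on the other.
        self.text_from_vision = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.vision_from_text = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, patches: torch.Tensor, words: torch.Tensor):
        # patches: (B, Np, dim) visual patch features; words: (B, Nw, dim) text token features.
        patch_w = torch.sigmoid(self.patch_score(patches))   # patches of interest
        word_w = torch.sigmoid(self.word_score(words))       # words of interest
        cond_patches = patches * patch_w
        cond_words = words * word_w
        # Mutual reconstruction: each modality is rebuilt from the other's conditioned features.
        rec_words, _ = self.text_from_vision(words, cond_patches, cond_patches)
        rec_patches, _ = self.vision_from_text(patches, cond_words, cond_words)
        # Well-aligned features should reconstruct each other with low error.
        loss = nn.functional.mse_loss(rec_words, words) + nn.functional.mse_loss(rec_patches, patches)
        return loss, rec_patches, rec_words

# Toy usage with random tensors standing in for encoder outputs.
model = ConditionedReconstruction(dim=256)
loss, _, _ = model(torch.randn(2, 196, 256), torch.randn(2, 32, 256))
loss.backward()

The intuition, per the abstract's stated assumption, is that if visual and textual features are well aligned, each can be recovered from the other, so minimizing this reconstruction error pushes the features toward the regions the medical notes describe.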

Source
http://dx.doi.org/10.1109/TMI.2024.3523333

Publication Analysis

Top Keywords

language-guided medical (12)
medical image (12)
image segmentation (12)
cross-modal conditioned (8)
conditioned reconstruction (8)
reconstruction language-guided (8)
medical visual (8)
medical notes (8)
medical (7)
segmentation (4)

Similar Publications

Language-guided multimodal domain generalization for outcome prediction of head and neck cancer.

Comput Biol Med

August 2025

Department of Radiation Oncology, UTSW, United States of America.

Accurate prediction of head and neck cancer recurrence across medical institutions remains challenging due to inherent domain shifts in imaging data. Current domain generalization methods primarily focus on learning domain-invariant features from medical images, often overlooking structured clinical information that inherently exhibits cross-institutional consistency. To leverage clinical data and enhance the model's generalization, we propose an end-to-end Language-Guided Multimodal Domain Generalization (LGMDG) method.

A Language-Guided Progressive Fusion Network with semantic density alignment for Medical Visual Question Answering.

J Biomed Inform

May 2025

School of Biomedical Engineering, Capital Medical University, Beijing, China; Laboratory for Clinical Medicine, Capital Medical University, Beijing, China; Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing, China.

Medical Visual Question Answering (Med-VQA) is a critical multimodal task with the potential to address the scarcity and imbalance of medical resources. However, most existing studies overlook the inconsistency in information density between medical images and text, as well as the long-tail distribution of datasets, both of which keep Med-VQA an open challenge. To overcome these issues, this study proposes a Language-Guided Progressive Fusion Network (LGPFN) with three key modules: Question-Guided Progressive Multimodal Fusion (QPMF), Language-Gate Mechanism (LGM), and Triple Semantic Feature Alignment (TriSFA).

Smart microscopes of the future.

Nat Methods

July 2023

Center for Quantitative Cell Imaging, University of Wisconsin-Madison, Madison, WI, USA.

We dream of a future where light microscopes have new capabilities: language-guided image acquisition, automatic image analysis based on extensive prior training from biologist experts, and language-guided image analysis for custom analyses. Most of these capabilities have reached the proof-of-principle stage, but their implementation would be accelerated by efforts to gather appropriate training sets and to build user-friendly interfaces.

Importance: Youth living with HIV make up one-quarter of new infections and have high rates of risk behaviors but are significantly understudied. Effectiveness trials in real-world settings are needed to inform program delivery.

Objective: To compare the effectiveness of the Healthy Choices intervention delivered in a home or community setting vs a medical clinic.
