Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

The Segment Anything Model (SAM) has attracted considerable attention for its impressive performance and shows potential in medical image segmentation. Compared with SAM's native point and bounding-box prompts, text prompts offer a simpler and more efficient alternative in the medical field, yet this approach remains relatively underexplored. In this paper, we propose a SAM-based framework that integrates a pre-trained vision-language model to generate referring prompts, with SAM handling the segmentation task. The outputs of multimodal models such as CLIP serve as input to SAM's prompt encoder. A critical challenge stems from the inherent complexity of medical text descriptions: they typically encompass anatomical characteristics, imaging modalities, and diagnostic priorities, resulting in information redundancy and semantic ambiguity. To address this, we propose a text decomposition-recomposition strategy. First, clinical narratives are parsed into atomic semantic units (e.g., appearance, location, pathology). These units are then recombined into optimized text expressions. We employ a cross-attention module over the multiple texts to interact with the joint features, ensuring that the model focuses on the features corresponding to effective descriptions. To validate the method, we conducted experiments on several datasets. Compared with native SAM driven by geometric prompts, our model shows improved performance and usability.
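
No code accompanies the abstract on this page, so the following is a minimal sketch, assuming a PyTorch setup with the Hugging Face CLIP weights, of the idea described above: several decomposed text descriptions are encoded with CLIP, projected into SAM-style 256-dimensional sparse prompt tokens, and mixed with a cross-attention layer so that informative descriptions can dominate. The module name TextPromptAdapter, the 256-d prompt width, and the example descriptions are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): CLIP text embeddings for several
# decomposed descriptions are projected into SAM-style sparse prompt tokens
# and mixed with cross-attention before being handed to SAM's mask decoder.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPTokenizer

class TextPromptAdapter(nn.Module):
    """Turns multiple CLIP text embeddings into SAM-style sparse prompt tokens (assumed design)."""
    def __init__(self, clip_dim: int = 512, prompt_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(clip_dim, prompt_dim)
        # Cross-attention among the decomposed descriptions (appearance,
        # location, pathology, ...) lets the model weight the useful ones.
        self.cross_attn = nn.MultiheadAttention(prompt_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(prompt_dim)

    def forward(self, text_embeds: torch.Tensor) -> torch.Tensor:
        # text_embeds: (batch, num_texts, clip_dim)
        tokens = self.proj(text_embeds)                      # (B, T, prompt_dim)
        attended, _ = self.cross_attn(tokens, tokens, tokens)
        return self.norm(tokens + attended)                  # sparse prompt tokens

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Toy decomposed descriptions of one target structure (placeholders).
texts = ["hypoechoic nodule", "left thyroid lobe", "irregular margin"]
inputs = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    embeds = clip.get_text_features(**inputs)                # (num_texts, 512)

adapter = TextPromptAdapter()
prompt_tokens = adapter(embeds.unsqueeze(0))                 # (1, num_texts, 256)
# In a full pipeline these tokens would stand in for (or accompany) the sparse
# embeddings SAM's prompt encoder produces for points and boxes.
print(prompt_tokens.shape)
```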

Source: http://dx.doi.org/10.1109/JBHI.2025.3607023

Publication Analysis

Top Keywords

prompts sam (8)
medical image (8)
image segmentation (8)
prompts (5)
leveraging multi-text (4)
multi-text joint (4)
joint prompts (4)
sam (4)
sam robust (4)
medical (4)

Similar Publications

Purpose: Recent developments in computational pathology have been driven by advances in vision foundation models (VFMs), particularly the Segment Anything Model (SAM). This model facilitates nuclei segmentation through two primary methods: prompt-based zero-shot segmentation and the use of cell-specific SAM models for direct segmentation. These approaches enable effective segmentation across a range of nuclei and cells.
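
As a concrete illustration of the prompt-based zero-shot route mentioned above, a minimal sketch using the public segment-anything API is shown below; the checkpoint file, image path, and click coordinates are placeholders rather than values from the cited study.

```python
# Minimal sketch of prompt-based zero-shot segmentation with the public
# segment-anything API. Checkpoint, image path, and the click location are
# placeholders, not taken from the cited study.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Load a histology patch as an RGB uint8 array.
image = cv2.cvtColor(cv2.imread("patch.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click roughly centred on a nucleus of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[120, 96]]),
    point_labels=np.array([1]),          # 1 = foreground, 0 = background
    multimask_output=True,
)
nucleus_mask = masks[np.argmax(scores)]  # keep the highest-scoring proposal
```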

Objective: Accurate segmentation of breast lesions, especially small ones, remains challenging in digital mammography due to complex anatomical structures and low-contrast boundaries. This study proposes DVF-YOLO-Seg, a two-stage segmentation framework designed to improve feature extraction and enhance small-lesion detection performance in mammographic images.

Methods: The proposed method integrates an enhanced YOLOv10-based detection module with a segmentation stage based on the Visual Reference Prompt Segment Anything Model (VRP-SAM).

Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision-language model (VLM) that integrates Grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized.
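
A minimal sketch of such a Grounding DINO to SAM2 hand-off appears below, following the public APIs of both repositories; the config names, checkpoint paths, thresholds, and the text query are placeholder assumptions rather than the study's settings.

```python
# Minimal sketch: Grounding DINO proposes text-conditioned boxes, SAM2 refines
# them into masks. Paths, thresholds, and the query are placeholders.
import torch
from groundingdino.util.inference import load_model, load_image, predict
from groundingdino.util import box_ops
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image = load_image("ultrasound.png")   # image_source: RGB numpy array

boxes, logits, phrases = predict(
    model=dino, image=image,
    caption="thyroid nodule",
    box_threshold=0.35, text_threshold=0.25,
)

# Grounding DINO returns normalized cxcywh boxes; convert to pixel xyxy.
h, w, _ = image_source.shape
boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([w, h, w, h])

sam2 = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml", "sam2.1_hiera_large.pt")
predictor = SAM2ImagePredictor(sam2)
predictor.set_image(image_source)

# One mask per detected box; predict() returns (masks, scores, low-res logits).
masks = [
    predictor.predict(box=box.numpy(), multimask_output=False)[0]
    for box in boxes_xyxy
]
```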

Foreign Object Debris (FOD) on airport pavements poses a serious threat to aviation safety, making accurate detection and interpretable scene understanding crucial for operational risk management. This paper presents an integrated multi-modal framework that combines an enhanced YOLOv7-X detector, a cascaded YOLO-SAM segmentation module, and a structured prompt engineering mechanism to generate detailed semantic descriptions of detected FOD. Detection performance is improved through the integration of Coordinate Attention, Spatial-Depth Conversion (SPD-Conv), and a Gaussian Similarity IoU (GSIoU) loss.
