Semantic segmentation of volumetric medical images is essential for accurate delineation of anatomic structures and pathology, enabling quantitative analysis in precision medicine applications. While volumetric segmentation has been extensively studied, most existing methods require full supervision and struggle to generalize to new classes at inference time, particularly for irregular, ill-defined targets such as tumors, where fine-grained, high-salience segmentation is required. Consequently, conventional semantic segmentation methods cannot easily offer zero/few-shot generalization to segment objects of interest beyond their closed training set. Foundation models, such as the Segment Anything Model (SAM), have demonstrated promising zero-shot generalization for interactive instance segmentation based on user prompts. However, these models sacrifice semantic knowledge for generalization capabilities that largely rely on collaborative user prompting to inject semantics. For volumetric medical image analysis, a unified approach that combines the semantic understanding of conventional segmentation methods with the flexible, prompt-driven capabilities of SAM is essential for comprehensive anatomical delineation. On the one hand, it is natural to exploit anatomic knowledge to enable semantic segmentation without any user interaction. On the other hand, SAM-like approaches that segment unknown classes via prompting provide the flexibility needed to segment structures beyond the closed training set, enabling quantitative analysis. To address these needs in a unified framework, we introduce ProtoSAM-3D, which extends SAM to semantic segmentation of volumetric data via a novel mask-level prototype prediction approach while retaining the flexibility of SAM. Our model utilizes a spatially-aware Transformer to fuse instance-specific intermediate representations from the SAM encoder and decoder, obtaining a comprehensive feature embedding for each mask. These embeddings are then classified by computing similarity with learned prototypes. By predicting prototypes instead of classes directly, ProtoSAM-3D gains the flexibility to rapidly adapt to new classes with minimal retraining. Furthermore, we introduce an auto-prompting method to enable semantic segmentation of known classes without user interaction. We demonstrate state-of-the-art zero/few-shot performance on multi-organ segmentation in CT and MRI. Experimentally, ProtoSAM-3D achieves competitive performance compared to fully supervised methods. Our work represents a step towards interactive semantic segmentation models with SAM for volumetric medical image processing.
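As a rough illustration of the mask-level prototype prediction idea described in this abstract, the sketch below shows how per-mask embeddings could be scored against learned class prototypes by cosine similarity, so that a new class can be added by appending a prototype rather than retraining a classification head. This is a minimal sketch in plain PyTorch under our own assumptions; the class name, embedding dimension, and temperature parameter are hypothetical and do not reflect the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeClassifier(nn.Module):
    """Hypothetical sketch: classify mask embeddings by cosine similarity
    to learned class prototypes (not the authors' code)."""

    def __init__(self, num_classes: int, embed_dim: int, temperature: float = 0.07):
        super().__init__()
        # One learnable prototype vector per known class.
        self.prototypes = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.temperature = temperature

    def forward(self, mask_embeddings: torch.Tensor) -> torch.Tensor:
        # mask_embeddings: (num_masks, embed_dim), one fused embedding per predicted mask,
        # e.g. the output of a Transformer that fuses encoder/decoder features.
        emb = F.normalize(mask_embeddings, dim=-1)
        protos = F.normalize(self.prototypes, dim=-1)
        # Cosine-similarity logits; a new class could be supported by appending a
        # prototype (e.g., the mean embedding of a few support masks).
        return emb @ protos.t() / self.temperature


# Example usage with made-up shapes: 4 candidate masks, 13 known classes.
classifier = PrototypeClassifier(num_classes=13, embed_dim=256)
mask_embeddings = torch.randn(4, 256)
logits = classifier(mask_embeddings)   # (4, 13) similarity scores
pred = logits.argmax(dim=-1)           # predicted class per mask
```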
Download full-text PDF | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11875884 | PMC
http://dx.doi.org/10.1016/j.compmedimag.2025.102501 | DOI Listing
IEEE Trans Neural Netw Learn Syst
September 2025
In industrial scenarios, semantic segmentation of surface defects is vital for identifying, localizing, and delineating defects. However, new defect types constantly emerge with product iterations or process updates. Existing defect segmentation models lack incremental learning capabilities, and direct fine-tuning (FT) often leads to catastrophic forgetting.
J Korean Med Sci
September 2025
Department of Transdisciplinary Medicine, Seoul National University Hospital, Seoul, Korea.
Background: With the increasing incidence of skin cancer, the workload for pathologists has surged. The diagnosis of skin samples, especially for complex lesions such as malignant melanomas and melanocytic lesions, has shown higher diagnostic variability compared to other organ samples. Consequently, artificial intelligence (AI)-based diagnostic assistance programs are increasingly needed to support dermatopathologists in achieving more consistent diagnoses.
Front Plant Sci
September 2025
College of Big Data, Yunnan Agricultural University, Kunming, China.
Introduction: Accurate identification of cherry maturity and precise detection of harvestable cherry contours are essential for the development of cherry-picking robots. However, occlusion, lighting variation, and blurriness in natural orchard environments present significant challenges for real-time semantic segmentation.
Methods: To address these issues, we propose a machine vision approach based on the PIDNet real-time semantic segmentation framework.
Med Biol Eng Comput
September 2025
Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, Tianjin University, Tianjin, 300072, China.
Surgical instrument segmentation plays an important role in autonomous robotic surgical navigation systems, as it can accurately locate surgical instruments and estimate their posture, helping surgeons understand the position and orientation of the instruments. However, several problems still limit segmentation accuracy, such as insufficient attention to the edges and centers of surgical instruments and insufficient use of low-level feature details. To address these issues, a lightweight network for surgical instrument segmentation in gastrointestinal (GI) endoscopy (GESur_Net) is proposed.
IEEE Trans Pattern Anal Mach Intell
September 2025
Generalized visual grounding tasks, including Generalized Referring Expression Comprehension (GREC) and Segmentation (GRES), extend the classical visual grounding paradigm by accommodating multi-target and non-target scenarios. Specifically, GREC focuses on accurately identifying all referential objects at the coarse bounding box level, while GRES aims to achieve fine-grained pixel-level perception. However, existing approaches typically treat these tasks independently, overlooking the benefits of jointly training GREC and GRES to ensure consistent multi-granularity predictions and streamline the overall process.