DOI: http://dx.doi.org/10.1038/s41591-024-02956-1
IEEE Trans Ultrason Ferroelectr Freq Control
September 2025
Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision-language model (VLM) that integrates Grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized.
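As a concrete illustration of this detect-then-segment pattern, the sketch below chains a text prompt through Grounding DINO (via its HuggingFace transformers integration) into box prompts for SAM2. This is a minimal sketch, not the authors' pipeline: the checkpoint names, thresholds, prompt phrase, and input file are illustrative assumptions, and argument names vary slightly across transformers versions.

```python
# Hedged sketch: text prompt -> Grounding DINO boxes -> SAM2 masks.
# Checkpoints, thresholds, and the input file are illustrative assumptions.
import numpy as np
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
from sam2.sam2_image_predictor import SAM2ImagePredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-base")
detector = AutoModelForZeroShotObjectDetection.from_pretrained(
    "IDEA-Research/grounding-dino-base").to(device)
segmenter = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = Image.open("ultrasound_frame.png").convert("RGB")
text = "thyroid nodule."  # Grounding DINO expects lower-case phrases ending in "."

# 1) Ground the text prompt to candidate boxes.
inputs = processor(images=image, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = detector(**inputs)
detections = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids, box_threshold=0.3, text_threshold=0.25,
    target_sizes=[image.size[::-1]])[0]

# 2) Use each detected box as a spatial prompt for SAM2.
segmenter.set_image(np.array(image))
for box in detections["boxes"].cpu().numpy():
    masks, scores, _ = segmenter.predict(box=box, multimask_output=False)
```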
IEEE Trans Pattern Anal Mach Intell
September 2025
Prompt tuning, a recently emerging paradigm, adapts vision-language pre-trained models to new tasks efficiently by learning "soft prompts" for frozen models. However, in few-shot scenarios its effectiveness is limited by sensitivity to initialization and by the time-consuming search for a good one, which hinders rapid adaptation. Prompt tuning also risks reducing the models' generalizability by overfitting on scarce training samples.
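For readers unfamiliar with the paradigm, the sketch below shows soft prompt tuning in its simplest CoOp-style form: a handful of learnable context vectors are prepended to the class-name token embeddings, and only those vectors receive gradients while the pre-trained encoder stays frozen. The encoder interface and dimensions are illustrative stand-ins, and the random initialization on display is exactly the sensitivity the abstract criticizes.

```python
# Minimal CoOp-style soft prompt: trainable context vectors + frozen encoder.
# The encoder interface is an illustrative stand-in, not a specific library API.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, text_encoder, embed_dim=512, n_ctx=16):
        super().__init__()
        # Random init: the choice this abstract identifies as a weak point.
        self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)
        self.encoder = text_encoder.eval()  # frozen backbone
        for p in self.encoder.parameters():
            p.requires_grad_(False)

    def forward(self, class_token_embeds):  # (n_classes, n_tokens, embed_dim)
        ctx = self.ctx.unsqueeze(0).expand(class_token_embeds.size(0), -1, -1)
        prompts = torch.cat([ctx, class_token_embeds], dim=1)
        return self.encoder(prompts)  # class embeddings for a zero-shot head
```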
Sensors (Basel)
August 2025
Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan.
Current vision-language models (VLMs) are well-adapted for general visual understanding tasks. However, they perform inadequately when handling complex visual tasks related to human poses and actions due to the lack of specialized vision-language instruction-following data. We introduce a method for generating such data by integrating human keypoints with traditional visual features such as captions and bounding boxes, enabling more precise understanding of human-centric scenes.
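A minimal version of such data generation might look like the sketch below, which folds COCO-format keypoints into an instruction-following record next to the usual caption and bounding box. The record schema is a hypothetical stand-in for the paper's actual format; only the COCO keypoint order is a fixed convention.

```python
# Hedged sketch: turn COCO keypoints + caption + bbox into an instruction record.
# The conversation schema below is hypothetical, not the paper's exact format.
COCO_KEYPOINTS = ["nose", "left_eye", "right_eye", "left_ear", "right_ear",
                  "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
                  "left_wrist", "right_wrist", "left_hip", "right_hip",
                  "left_knee", "right_knee", "left_ankle", "right_ankle"]

def make_instruction_record(image_id, caption, bbox, keypoints):
    """keypoints: flat COCO list [x1, y1, v1, x2, y2, v2, ...]."""
    visible = {
        name: (keypoints[3 * i], keypoints[3 * i + 1])
        for i, name in enumerate(COCO_KEYPOINTS)
        if keypoints[3 * i + 2] > 0  # keep only labeled keypoints
    }
    return {
        "image_id": image_id,
        "conversation": [
            {"role": "user",
             "content": f"Describe the pose of the person in box {bbox}."},
            {"role": "assistant",
             "content": f"{caption} Keypoints: {visible}."},
        ],
    }
```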
IEEE Trans Pattern Anal Mach Intell
August 2025
Recent progress in vision Transformers shows great success on various tasks, driven by a new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind vision Transformers, namely input-adaptive, long-range, and high-order spatial interactions, can also be implemented efficiently with a convolution-based framework. We present the Recursive Gated Convolution (gnConv), which performs high-order spatial interactions with gated convolutions and recursive designs.
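The recursion is easier to see in code. The sketch below is a simplified re-implementation following the paper's channel-halving split and a 7x7 depthwise kernel; the names and defaults are ours, not the authors' released code.

```python
# Simplified sketch of order-n Recursive Gated Convolution (HorNet-style).
import torch
import torch.nn as nn

class gnConv(nn.Module):
    def __init__(self, dim, order=3, kernel_size=7):
        super().__init__()
        # Channel widths double with each order: [dim/4, dim/2, dim] for order 3.
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv2d(dim, 2 * dim, 1)  # produces p0 and all gates q_k
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), kernel_size,
                                padding=kernel_size // 2, groups=sum(self.dims))
        self.pws = nn.ModuleList(
            nn.Conv2d(self.dims[i], self.dims[i + 1], 1) for i in range(order - 1))
        self.proj_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        p0, q = torch.split(self.proj_in(x), (self.dims[0], sum(self.dims)), dim=1)
        q = torch.split(self.dwconv(q), self.dims, dim=1)  # per-order spatial gates
        h = p0 * q[0]                         # first-order gated interaction
        for i, pw in enumerate(self.pws):
            h = pw(h) * q[i + 1]              # recursively raise interaction order
        return self.proj_out(h)

# Usage: y = gnConv(64)(torch.randn(1, 64, 56, 56))  # shape-preserving
```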
BMC Med Imaging
August 2025
Department of Ultrasound Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province, China.
Background: Abdominal ultrasound is non-invasive and efficient, yet acquiring standard planes remains challenging due to operator dependency and procedural complexity. We propose AbVLM-Q, a vision-language framework for automated quality assessment of abdominal ultrasound standard planes.
Methods: In this study, we assembled a multi-center dataset comprising 7,766 abdominal ultrasound scans, which were randomly divided into training (70%), validation (15%), and testing (15%) subsets.
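A 70/15/15 random split like the one described can be produced with two chained calls to scikit-learn's train_test_split; the scan list and seed below are placeholders, not the study's actual protocol.

```python
# Hedged sketch of a 70/15/15 random split; scan IDs and seed are placeholders.
from sklearn.model_selection import train_test_split

scans = list(range(7766))  # stand-in for the 7,766 abdominal ultrasound scans
train, rest = train_test_split(scans, train_size=0.70, random_state=0)
val, test = train_test_split(rest, test_size=0.50, random_state=0)  # 15% each
```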