A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology.

Yi Luo, Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Xiaojian Chen, Rui Zhang, Quan Chen, Wil Ngwa, Kai Ding

Bioengineering (Basel)

Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, MD 21287, USA.

Published: July 2025

Lung cancer is the leading cause of cancer-related mortality worldwide. Tumor delineation, a crucial step in radiation therapy, is complex and requires expertise that is often unavailable in resource-limited settings. Artificial intelligence (AI), particularly with advances in deep learning (DL) and natural language processing (NLP), offers potential solutions but is challenged by high false positive rates. The Oncology Contouring Copilot (OCC) system was developed to leverage oncologist expertise, expressed as textual descriptions, for precise tumor contouring, with the aim of making oncological workflows more efficient by combining the strengths of AI with human oversight. OCC first identifies nodule candidates from CT scans. It then employs Language Vision Models (LVMs) such as GPT-4V to reduce false positives using clinical descriptive texts, merging textual and visual data to automate tumor delineation and incorporating knowledge from experienced domain experts. Deploying the OCC system resulted in a 35.0% reduction in the false discovery rate, a 72.4% decrease in false positives per scan, and an F1-score of 0.652 across our dataset for unbiased evaluation. Using the latest LVMs, OCC improves contouring results and contributes to oncology care by (1) streamlining oncology treatment workflows by optimizing tumor delineation and reducing manual processes; (2) offering a scalable and intuitive framework to reduce false positives in radiotherapy planning using LVMs; (3) introducing novel medical language vision prompt techniques that minimize LVM hallucinations, supported by an ablation study; and (4) conducting a comparative analysis of LVMs, highlighting their potential in addressing medical language vision challenges.
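The three reported metrics can be made concrete with a minimal sketch. It assumes the standard definitions (false discovery rate = FP / (FP + TP), false positives per scan = FP / number of scans, F1 = 2TP / (2TP + FP + FN)); the abstract does not state the exact formulas used, nor whether the quoted reductions are relative or absolute. The `DetectionCounts` container, the function names, and all counts below are hypothetical illustrations, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Aggregate detection counts over a set of scans (hypothetical container)."""
    true_positives: int
    false_positives: int
    false_negatives: int
    num_scans: int


def false_discovery_rate(c: DetectionCounts) -> float:
    """FP / (FP + TP): fraction of reported candidates that are not tumors."""
    reported = c.false_positives + c.true_positives
    return c.false_positives / reported if reported else 0.0


def false_positives_per_scan(c: DetectionCounts) -> float:
    """Average number of false-positive candidates per scan."""
    return c.false_positives / c.num_scans if c.num_scans else 0.0


def f1_score(c: DetectionCounts) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * c.true_positives + c.false_positives + c.false_negatives
    return 2 * c.true_positives / denom if denom else 0.0


def relative_reduction(before: float, after: float) -> float:
    """Relative change (before - after) / before; assumes before > 0."""
    return (before - after) / before


# Made-up counts before and after a false-positive filtering stage.
before = DetectionCounts(true_positives=80, false_positives=120,
                         false_negatives=20, num_scans=100)
after = DetectionCounts(true_positives=75, false_positives=30,
                        false_negatives=25, num_scans=100)

fdr_drop = relative_reduction(false_discovery_rate(before),
                              false_discovery_rate(after))
fp_drop = relative_reduction(false_positives_per_scan(before),
                             false_positives_per_scan(after))
print(f"FDR reduction: {fdr_drop:.1%}")
print(f"FP/scan reduction: {fp_drop:.1%}")
print(f"F1 after filtering: {f1_score(after):.3f}")
```

In this toy example, filtering removes most false positives at the cost of a few true positives, which is the trade-off the F1-score is meant to summarize.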

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12383427
DOI: http://dx.doi.org/10.3390/bioengineering12080835

Top Keywords: language vision, tumor delineation, occ system, false positives, tumor contouring, oncology care, medical language, language, tumor, oncology

Similar Publications

Performance of vision language models for optic disc swelling identification on fundus photographs.

Front Digit Health

August 2025

Department of Ophthalmology, Stanford University, Palo Alto, CA, United States.

Kelvin Zhenghao Li, Tuyet Thao Nguyen, Heather E Moss

Introduction: Vision language models (VLMs) combine image analysis capabilities with large language models (LLMs). Because of their multimodal capabilities, VLMs offer a clinical advantage over image classification models for the diagnosis of optic disc swelling by allowing a consideration of clinical context. In this study, we compare the performance of non-specialty-trained VLMs with different prompts in the classification of optic disc swelling on fundus photographs.

Efficient spatio-temporal modeling for sign language recognition using CNN and RNN architectures.

Front Artif Intell

August 2025

School of Computation and Communication Science and Engineering, The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania.

Kasian Myagila, Devotha Godfrey Nyambo, Mussa Ally Dida

Computer vision has been identified as one of the solutions to bridge communication barriers between speech-impaired populations and those without impairment as most people are unaware of the sign language used by speech-impaired individuals. Numerous studies have been conducted to address this challenge. However, recognizing word signs, which are usually dynamic and involve more than one frame per sign, remains a challenge.

Temporal Modeling With Frozen Vision-Language Foundation Models for Parameter-Efficient Text-Video Retrieval.

IEEE Trans Neural Netw Learn Syst

September 2025

Leqi Shen, Tianxiang Hao, Tao He, Yifeng Zhang, Pengzhang Liu

Temporal modeling plays an important role in effectively adapting powerful pretrained text-image foundation models to text-video retrieval. However, existing methods often rely on additional heavy trainable modules, such as transformers or BiLSTMs, which are inefficient. In contrast, we avoid introducing such heavy components by leveraging frozen foundation models.

Generative AI tools in reflective essays: Moderating moral injuries and epistemic injustices.

S Afr Fam Pract (2004)

August 2025

School of Public Health, Faculty of Health Sciences, University of Cape Town, Cape Town.

Nontsikelelo O Mapukata

The emergence of large language models such as ChatGPT is already influencing health care delivery, research and training for the next cohort of health care professionals. In a consumer-driven market, their capabilities to generate new forms of knowing and doing for experts and novices present both promises and threats to the livelihood of patients. This article explores burdens imposed by the use of generative artificial intelligence tools in reflective essays submitted by a fifth of first-year health sciences students.

Artificial intelligence in wearable biosensing: Enhancing data analysis and decision-making.

Prog Mol Biol Transl Sci

September 2025

Institute of Intelligent Machines, Chinese Academy of Science, Hefei, Anhui, P.R. China.

Zenghui Ding, Wenhui Fang, Jixue Zhang, Changguo Fang, Yining Sun

The convergence of artificial intelligence (AI) and wearable biosensors is revolutionizing personalized healthcare by enabling continuous monitoring and early detection of health issues, and by enhancing the efficiency of data processing and real-time decision-making. Multimodal Large Language Models (MLLMs) play a pivotal role in this ecosystem by offering advanced capabilities in analyzing complex health data, understanding nuanced health contexts, and generating tailored health recommendations instantaneously. This study provides insights into how machine learning, deep learning algorithms, and MLLMs can work together to facilitate the analysis of physiologic data for real-time monitoring and early warning systems as well as complex decision support mechanisms.
