Publications by authors named "Danielle S Bitterman"

Background: Master clinical trial protocol structures offer administrative, procedural, and statistical advantages but have not been applied in assessing new radiotherapy devices. Herein, we report on a pooled analysis from a first-of-its-kind master trial evaluating stereotactic MRI-guided adaptive radiotherapy (SMART).

Methods: Subjects were enrolled on a prospective master protocol evaluating SMART for multiple oncologic indications.


Chronological age, although commonly used in clinical practice, fails to capture individual variations in rates of ageing and physiological decline. Recent advances in artificial intelligence (AI) have transformed the estimation of biological age using various imaging techniques. This Review consolidates AI developments in age prediction across brain, chest, abdominal, bone, and facial imaging using diverse methods, including MRI, CT, x-ray, and photographs.


Introduction: Patients receiving thoracic radiotherapy (RT) have an increased risk of major adverse cardiac events (MACE) posttreatment. We utilized machine learning (ML) to discover novel predictors of MACE and validated them on an external cohort.

Methods: This multi-institutional retrospective study included 984 patients [n = 803 non-small cell lung cancer (NSCLC), n = 181 breast cancer] treated with radiotherapy.
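As a rough illustration of the kind of workflow this describes (not the study's actual code), the sketch below trains a gradient-boosted classifier on one cohort, ranks candidate MACE predictors by feature importance, and checks discrimination on an external cohort; all file and column names are invented.

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score

    # Hypothetical feature columns and files; the real study's variables differ
    FEATURES = ["age", "mean_heart_dose", "baseline_cad", "smoking_status"]
    train = pd.read_csv("internal_cohort.csv")
    external = pd.read_csv("external_cohort.csv")

    model = GradientBoostingClassifier(random_state=0)
    model.fit(train[FEATURES], train["mace"])

    # Candidate predictors, ranked by how much the fitted model relies on them
    ranked = sorted(zip(FEATURES, model.feature_importances_), key=lambda p: -p[1])
    print(ranked)

    # External validation of discrimination
    auc = roc_auc_score(external["mace"], model.predict_proba(external[FEATURES])[:, 1])
    print(f"External AUC: {auc:.2f}")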


Purpose: To evaluate the performance and consistency of large language models (LLMs) across brand and generic oncology drug names in various clinical tasks, addressing concerns about potential fluctuations in LLM performance because of subtle phrasing differences that could affect patient care.

Methods: This study evaluated three LLMs (GPT-3.5-turbo-0125, GPT-4-turbo, and GPT-4o) using drug names from the HemOnc ontology.
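A minimal sketch of how such a brand-versus-generic consistency check could be run, assuming the OpenAI Python client; the prompt template and drug pair are invented (the study drew its names from HemOnc):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    PAIRS = [("imatinib", "Gleevec")]  # illustrative generic/brand pair
    TEMPLATE = "A patient is taking {drug}. List the most common severe adverse events of this drug."

    for generic, brand in PAIRS:
        answers = {}
        for name in (generic, brand):
            resp = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": TEMPLATE.format(drug=name)}],
                temperature=0,
            )
            answers[name] = resp.choices[0].message.content
        # Downstream, answers would be scored for accuracy and cross-name consistency
        print(answers[generic] == answers[brand])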


Large language models (LLMs) have demonstrated emergent human-like capabilities in natural language processing, leading to enthusiasm about their integration in healthcare environments. In oncology, where synthesising complex, multimodal data is essential, LLMs offer a promising avenue for supporting clinical decision-making, enhancing patient care, and accelerating research. This narrative review aims to highlight the current state of LLMs in medicine; applications of LLMs in oncology for clinicians, patients, and translational research; and future research directions.


Background: As humans age at different rates, physical appearance can yield insights into biological age and physiological health more reliably than chronological age. In medicine, however, appearance is incorporated into medical judgements in a subjective and non-standardised way. In this study, we aimed to develop and validate FaceAge, a deep learning system to estimate biological age from easily obtainable and low-cost face photographs.


Large language models (LLMs) exhibit a critical vulnerability arising from being trained to be helpful: a tendency to comply with illogical requests that would generate misinformation, even when they have the knowledge to identify the request as illogical. This study investigated this vulnerability in the medical domain, evaluating five frontier LLMs using prompts that misrepresent equivalent drug relationships. We tested baseline compliance, the impact of prompts allowing rejection and emphasizing factual recall, and the effects of fine-tuning on a dataset of illogical requests, including out-of-distribution generalization.
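The prompting conditions described above can be pictured with a small, purely illustrative example; the wording is invented, and "Drug A"/"Drug B" stand in for an equivalent drug pair that the request misrepresents:

    # Invented example of an illogical request and the mitigations described above
    ILLOGICAL_REQUEST = (
        "Drug A has just been shown to cause new side effects; draft a message telling "
        "patients to take Drug B instead."  # illogical if A and B are the same drug
    )

    CONDITIONS = {
        "baseline": ILLOGICAL_REQUEST,
        "allow_rejection": ILLOGICAL_REQUEST
            + " You may decline if the request does not make sense.",
        "recall_facts_first": "First state what Drug A and Drug B are, then decide whether to comply. "
            + ILLOGICAL_REQUEST,
    }

    for name, prompt in CONDITIONS.items():
        # Each condition would be sent to the model and the response scored for compliance
        print(name, "->", prompt)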


Background: Adequate patient awareness and understanding of cancer clinical trials is essential for trial recruitment, informed decision making, and protocol adherence. Although large language models (LLMs) have shown promise for patient education, their role in enhancing patient awareness of clinical trials remains unexplored. This study explored the performance and risks of LLMs in generating trial-specific educational content for potential participants.


Objective: Data extraction from the published literature is the most laborious step in conducting living systematic reviews (LSRs). We aim to build a generalizable, automated data extraction workflow leveraging large language models (LLMs) that mimics the real-world two-reviewer process.

Materials And Methods: A dataset of 10 trials (22 publications) from a published LSR was used, focusing on 23 variables related to trial, population, and outcomes data.
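A hedged sketch of what a "two-reviewer" extraction pass might look like, assuming the OpenAI Python client; the model names, prompt, and variable list are placeholders rather than the study's actual configuration:

    from openai import OpenAI

    client = OpenAI()
    VARIABLES = ["sample_size", "primary_outcome"]  # the study extracted 23 variables

    def extract(model: str, article_text: str, variable: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": f"From the trial report below, extract '{variable}'. "
                                  f"Answer with the value only.\n\n{article_text}"}],
            temperature=0,
        )
        return resp.choices[0].message.content.strip()

    def two_reviewer_pass(article_text: str) -> dict:
        results = {}
        for var in VARIABLES:
            a = extract("gpt-4o", article_text, var)       # "reviewer" 1
            b = extract("gpt-4o-mini", article_text, var)  # "reviewer" 2
            # Agreement is accepted; disagreements go to a human adjudicator
            results[var] = a if a == b else {"needs_adjudication": (a, b)}
        return results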


The integration of large language models (LLMs) into electronic health records offers potential benefits but raises significant ethical, legal, and operational concerns, including unconsented data use, lack of governance, and AI-related malpractice accountability. Sycophancy, feedback loop bias, and data reuse risk amplifying errors without proper oversight. To safeguard patients, especially the vulnerable, clinicians must advocate for patient-centered education, ethical practices, and robust oversight to prevent harm.


Objective: To evaluate large language models (LLMs) for pre-test diagnostic probability estimation and compare their uncertainty estimation performance with a traditional machine learning classifier.

Materials And Methods: We assessed 2 instruction-tuned LLMs, Mistral-7B-Instruct and Llama3-70B-chat-hf, on predicting binary outcomes for Sepsis, Arrhythmia, and Congestive Heart Failure (CHF) using electronic health record (EHR) data from 660 patients. Three uncertainty estimation methods (Verbalized Confidence, Token Logits, and LLM Embedding+XGB) were compared against an eXtreme Gradient Boosting (XGB) classifier trained on raw EHR data.
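As one concrete way to picture the Token Logits idea (a sketch under simplifying assumptions, not the paper's implementation), the next-token logits for "Yes" versus "No" after a diagnostic question can be converted into a probability; the checkpoint and prompt below are illustrative:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative checkpoint
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )

    prompt = "Given this patient summary, is sepsis likely? Answer Yes or No.\n<summary here>\nAnswer:"
    inputs = tok(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]

    yes_id = tok.encode(" Yes", add_special_tokens=False)[0]
    no_id = tok.encode(" No", add_special_tokens=False)[0]
    # Softmax over just the two answer tokens gives a crude pre-test probability
    p_yes = torch.softmax(next_token_logits[[yes_id, no_id]], dim=0)[0].item()
    print(f"Estimated probability of sepsis: {p_yes:.2f}")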


Large language models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)-LLM statement, an extension of the TRIPOD + artificial intelligence statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion.


Objectives: The application of natural language processing (NLP) in the clinical domain is important because of the rich unstructured information in clinical documents, which often remains inaccessible in structured data. When applying NLP methods to a given domain, benchmark datasets are crucial: they not only guide the selection of the best-performing models but also enable assessment of the reliability of the generated outputs. Despite the recent availability of language models capable of handling longer contexts, benchmark datasets targeting long clinical document classification tasks are absent.


The use of artificial intelligence (AI) holds great promise for radiation oncology, with many applications reported in the literature, some of which are already in clinical use. These are mainly in areas where AI provides efficiency benefits (such as automatic segmentation and treatment planning). Prediction models that directly affect patient decision-making remain far less mature in their application to clinical practice.



Healthcare AI faces an ethical dilemma between selective and equitable deployment, exacerbated by flawed performance metrics. These metrics inadequately capture real-world complexities and biases, leading to premature assertions of effectiveness. Improved evaluation practices, including continuous monitoring and silent evaluation periods, are crucial.

Article Synopsis
  • TRIPOD-LLM is a new set of reporting guidelines specifically designed for the use of Large Language Models (LLMs) in biomedical research, aiming to standardize transparency and quality in healthcare applications.
  • The guidelines include a checklist with 19 main items and 50 subitems, adaptable to various research designs, emphasizing the importance of human oversight and task-specific performance.
  • An interactive website is provided to help researchers easily complete the guidelines and generate submissions, with the intention of continually updating the document as the field evolves.

This editorial discusses the promise and challenges of successfully integrating natural language processing methods into electronic health records for timely, robust, and fair oncology pharmacovigilance.

