Publications by authors named "Timothy A Miller"

Objective: To build classifiers for multiple phenotypes that categorize a cohort of adults with congenital heart disease (ACHD) and that can be used to populate variables in a biobank.

Materials And Methods: A dataset of 1,492 ACHD patients, with expert-created labels for eight phenotypes, was created and used to train classifiers with three different architectures. A larger unlabeled dataset containing 15,869 patients was used to pre-train the classifiers, and a 20% subset of the unlabeled dataset was used to validate the classifier predictions.

Objective: To evaluate the efficacy of digital twins developed using a large language model (LLaMA-3) fine-tuned with Low-Rank Adaptation (LoRA) on ICU physician notes, and to determine whether specialty-specific training improves treatment-recommendation accuracy compared with models trained on other ICU specialties or with zero-shot baselines.

Materials And Methods: Digital twins were created using LLaMA-3 fine-tuned on discharge summaries from the MIMIC-III dataset, where medications were masked to construct training and testing datasets. The medical ICU dataset (1,000 notes) was used for evaluation, and performance was assessed using BERTScore and ROUGE-L.
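The evaluation above uses ROUGE-L, which scores a generated recommendation against a reference by longest common subsequence. A minimal plain-Python sketch, assuming whitespace tokenization (production evaluations typically use a library implementation):

```python
def lcs_length(ref_tokens, hyp_tokens):
    """Longest common subsequence length via dynamic programming."""
    m, n = len(ref_tokens), len(hyp_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref_tokens[i - 1] == hyp_tokens[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def rouge_l_f1(reference, hypothesis):
    """ROUGE-L F1 between two whitespace-tokenized strings."""
    ref, hyp = reference.split(), hypothesis.split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, comparing a predicted medication line against a masked reference rewards tokens that appear in the same order, not just shared vocabulary.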

Objectives: Applying large language models (LLMs) to the clinical domain is challenging due to the context-heavy nature of processing medical records. Retrieval-augmented generation (RAG) offers a solution by facilitating reasoning over large text sources. However, there are many parameters to optimize in the retrieval system alone.
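As an illustration of two such retrieval parameters, chunk size and top-k, here is a toy sketch that stands in for a real RAG retriever, using simple term overlap instead of embedding similarity (the `chunk` and `retrieve` helpers are illustrative, not from the study):

```python
def chunk(text, chunk_size):
    """Split a note into fixed-size, non-overlapping token windows.

    Chunk size is one retrieval parameter; window overlap would be another.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]

def retrieve(query, chunks, top_k):
    """Rank chunks by term overlap with the query and keep the top_k.

    A real system would use an embedding index; the ranking-then-truncation
    structure is the same.
    """
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]
```

Sweeping `chunk_size` and `top_k` over a validation set is the kind of optimization the abstract refers to.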

Objectives: Clinical note section identification helps locate relevant information and could be beneficial for downstream tasks such as named entity recognition. However, traditional supervised methods suffer from transferability issues. This study proposes a new framework that uses large language models (LLMs) for section identification to overcome these limitations.

Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app "listener" that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API).

Methods: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata, including local utility, control, and administrative simplicity, as well as privacy preservation during robust data sharing and artificial intelligence (AI) for processing unstructured text.

Background: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records.

Objective: This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients.

Objective: To implement an open source, free, and easily deployable high-throughput natural language processing (NLP) module to extract concepts from clinician notes and map them to Fast Healthcare Interoperability Resources (FHIR).

Methods: Using a popular open-source NLP tool (Apache cTAKES), we create FHIR resources that use modifier extensions to represent negation and NLP sourcing, and another extension to represent the provenance of extracted concepts.

Results: The SMART Text2FHIR Pipeline is an open-source tool, released through standard package managers and publicly available container images, that implements the mappings, enabling ready conversion of clinical text to FHIR.

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs outperform supervised BERT-based models applied to out-of-domain data.

Objective: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for 1 institution experience a large drop of accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Objective," "Assessment," and "Plan") framework with improved transferability.
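A rule-based baseline for this task simply maps known headings to SOAP categories, which makes the transferability problem concrete: the table below is illustrative, and because headings vary across institutions, such a mapping rarely transfers intact.

```python
# Illustrative heading-to-SOAP mapping; real note headings differ by
# institution, which is the transferability gap the study targets.
SOAP_MAP = {
    "chief complaint": "Subjective",
    "history of present illness": "Subjective",
    "physical exam": "Objective",
    "vital signs": "Objective",
    "assessment": "Assessment",
    "impression": "Assessment",
    "plan": "Plan",
    "recommendations": "Plan",
}

def classify_section(heading):
    """Rule-based baseline: normalize a section heading and look it up."""
    return SOAP_MAP.get(heading.strip().lower().rstrip(":"), "Unknown")
```

Headings absent from the table fall through to "Unknown", which is exactly where a learned, transferable model is needed.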

Purpose: Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT.

Objective: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR).

Materials And Methods: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients' COVID-19 polymerase chain reaction (PCR) tests for training.
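As an illustration of the general approach, a minimal bag-of-words classifier trained on proxy-labeled notes might look like the following (a toy multinomial Naive Bayes; the study's actual feature representations and classifiers are not specified here):

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words features,
    standing in for the statistical classifiers described above."""

    def fit(self, texts, labels):
        self.counts = {label: Counter() for label in set(labels)}
        self.priors = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, text):
        best, best_lp = None, -math.inf
        for label, counts in self.counts.items():
            total = sum(counts.values())
            lp = math.log(self.priors[label])
            for w in text.lower().split():
                # Laplace smoothing over the shared vocabulary.
                lp += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

In the proxy-label setting, PCR results stand in for gold labels during training, and the classifier is then applied to notes where no structured result exists.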

Purpose: There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities.

Methods: This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019.

This study uses data from US Food and Drug Administration (FDA) databases to quantify approval of high-risk cardiovascular devices for use in pediatric populations and assess the clinical evidence supporting the approvals.

Objective: Electronic consultation (eConsult) content reflects important information about referring clinician needs across an organization, but is challenging to extract. The objectives of this work were to develop machine learning models for classifying eConsult questions by question type and question content, and to investigate whether this task can be solved with constrained expert time.

Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free-text data into structured data that can be extracted and analyzed at scale. In medicine, this unlocking of the rich, expressive data within clinical free text in electronic medical records will help tap the full potential of big data for research and clinical purposes.

Reducing rates of early hospital readmission has been recognized as a key to improving quality of care and reducing costs. A number of risk factors have been hypothesized to be important for understanding readmission risk, including problems with substance abuse, the ability to maintain work, and relations with family. In this work, we develop RoBERTa-based models to predict the sentiment of sentences describing readmission risk factors in discharge summaries of patients with psychosis.

Electronic consult (eConsult) systems give specialists more flexibility to respond to referrals efficiently, thereby increasing access in under-resourced healthcare settings like safety-net systems. Understanding the usage patterns of an eConsult system is an important part of improving specialist efficiency. In this work, we develop and apply classifiers to a dataset of eConsult questions from primary care providers to specialists, classifying the messages by how they were triaged by the specialist office and by the underlying type of clinical question posed by the primary care provider.

Objective: To develop scalable natural language processing (NLP) infrastructure for processing the free text in electronic health records (EHRs).

Materials And Methods: We extend the open-source Apache cTAKES NLP software with several standard technologies for scalability. We remove processing bottlenecks by monitoring component queue size.
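The queue-size heuristic can be sketched as follows, assuming pipeline stages communicate through in-memory queues (the deployed cTAKES system may use different messaging infrastructure):

```python
import queue

def deepest_queue(stage_queues):
    """Given a mapping of stage name -> that stage's input queue, return
    the stage whose queue is currently deepest. With balanced throughput
    elsewhere, that stage is the bottleneck candidate to scale out."""
    return max(stage_queues, key=lambda name: stage_queues[name].qsize())
```

Polling this periodically and replicating the deepest stage is one simple way to remove a bottleneck without profiling every component.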
