Publications by authors named "Timothy A Miller"

Objective: To build classifiers for multiple phenotypes that categorize a cohort of adults with congenital heart disease (ACHD) and that can be used to populate variables in a biobank.

Materials And Methods: A dataset of 1,492 ACHD patients, with expert-created labels for eight phenotypes, was created and used to train classifiers with three different architectures. A larger unlabeled dataset containing 15,869 patients was used to pre-train the classifiers, and a 20% subset of the unlabeled dataset was used to validate the classifier predictions.

Objective: To evaluate the efficacy of digital twins developed using a large language model (LLaMA-3) fine-tuned with Low-Rank Adaptation (LoRA) on ICU physician notes, and to determine whether specialty-specific training improves treatment-recommendation accuracy compared with models trained on other ICU specialties or with zero-shot baselines.

Materials And Methods: Digital twins were created using LLaMA-3 fine-tuned on discharge summaries from the MIMIC-III dataset, where medications were masked to construct training and testing datasets. The medical ICU dataset (1,000 notes) was used for evaluation, and performance was assessed using BERTScore and ROUGE-L.
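The evaluation above uses ROUGE-L, which scores a generated recommendation against a reference by longest common subsequence. A minimal plain-Python sketch, assuming whitespace tokenization (production evaluations typically use a library implementation):

```python
def lcs_length(ref_tokens, hyp_tokens):
    """Longest common subsequence length via dynamic programming."""
    m, n = len(ref_tokens), len(hyp_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref_tokens[i - 1] == hyp_tokens[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def rouge_l_f1(reference, hypothesis):
    """ROUGE-L F1 between two whitespace-tokenized strings."""
    ref, hyp = reference.split(), hypothesis.split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, comparing a predicted medication line against a masked reference rewards tokens that appear in the same order, not just shared vocabulary.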

Objectives: Applying large language models (LLMs) to the clinical domain is challenging due to the context-heavy nature of processing medical records. Retrieval-augmented generation (RAG) offers a solution by facilitating reasoning over large text sources. However, there are many parameters to optimize in the retrieval system alone.
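As an illustration of two such retrieval parameters, chunk size and top-k, here is a toy sketch that stands in for a real RAG retriever, using simple term overlap instead of embedding similarity (the `chunk` and `retrieve` helpers are illustrative, not from the study):

```python
def chunk(text, chunk_size):
    """Split a note into fixed-size, non-overlapping token windows.

    Chunk size is one retrieval parameter; window overlap would be another.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]

def retrieve(query, chunks, top_k):
    """Rank chunks by term overlap with the query and keep the top_k.

    A real system would use an embedding index; the ranking-then-truncation
    structure is the same.
    """
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]
```

Sweeping `chunk_size` and `top_k` over a validation set is the kind of optimization the abstract refers to.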

Objectives: Clinical note section identification helps locate relevant information and could be beneficial for downstream tasks such as named entity recognition. However, traditional supervised methods suffer from transferability issues. This study proposes a new framework that uses large language models (LLMs) for section identification to overcome these limitations.

Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app "listener" that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API).

Methods: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata, including local utility, control, and administrative simplicity, as well as privacy preservation during robust data sharing and artificial intelligence (AI) for processing unstructured text.

Background: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records.

Objective: This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients.

Objective: To implement an open source, free, and easily deployable high-throughput natural language processing (NLP) module to extract concepts from clinician notes and map them to Fast Healthcare Interoperability Resources (FHIR).

Methods: Using a popular open-source NLP tool (Apache cTAKES), we create FHIR resources that use modifier extensions to represent negation and NLP sourcing, and another extension to represent the provenance of extracted concepts.

Results: The SMART Text2FHIR Pipeline is an open-source tool, released through standard package managers and publicly available container images, that implements the mappings, enabling ready conversion of clinical text to FHIR.

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs outperform supervised BERT-based models applied to out-of-domain data.

Objective: The classification of clinical note sections is a critical step before doing more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy for 1 institution experience a large drop of accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Objective," "Assessment," and "Plan") framework with improved transferability.
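A rule-based baseline for this task simply maps known headings to SOAP categories, which makes the transferability problem concrete: the table below is illustrative, and because headings vary across institutions, such a mapping rarely transfers intact.

```python
# Illustrative heading-to-SOAP mapping; real note headings differ by
# institution, which is the transferability gap the study targets.
SOAP_MAP = {
    "chief complaint": "Subjective",
    "history of present illness": "Subjective",
    "physical exam": "Objective",
    "vital signs": "Objective",
    "assessment": "Assessment",
    "impression": "Assessment",
    "plan": "Plan",
    "recommendations": "Plan",
}

def classify_section(heading):
    """Rule-based baseline: normalize a section heading and look it up."""
    return SOAP_MAP.get(heading.strip().lower().rstrip(":"), "Unknown")
```

Headings absent from the table fall through to "Unknown", which is exactly where a learned, transferable model is needed.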

Purpose: Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT.

Objective: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR).

Materials And Methods: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients' COVID-19 polymerase chain reaction (PCR) tests for training.
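As an illustration of the general approach, a minimal bag-of-words classifier trained on proxy-labeled notes might look like the following (a toy multinomial Naive Bayes; the study's actual feature representations and classifiers are not specified here):

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words features,
    standing in for the statistical classifiers described above."""

    def fit(self, texts, labels):
        self.counts = {label: Counter() for label in set(labels)}
        self.priors = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, text):
        best, best_lp = None, -math.inf
        for label, counts in self.counts.items():
            total = sum(counts.values())
            lp = math.log(self.priors[label])
            for w in text.lower().split():
                # Laplace smoothing over the shared vocabulary.
                lp += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

In the proxy-label setting, PCR results stand in for gold labels during training, and the classifier is then applied to notes where no structured result exists.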

Purpose: There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities.

Methods: This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019.

This study uses data from US Food and Drug Administration (FDA) databases to quantify approval of high-risk cardiovascular devices for use in pediatric populations and assess the clinical evidence supporting the approvals.

Objective: Electronic consultation (eConsult) content reflects important information about referring clinician needs across an organization, but is challenging to extract. The objectives of this work were to develop machine learning models for classifying eConsult questions by question type and question content, and to investigate whether this task can be solved with constrained expert time.

Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free-text data into structured data that can be extracted and analyzed at scale. In medicine, this unlocking of the rich, expressive data within clinical free text in electronic medical records will help tap the full potential of big data for research and clinical purposes.

Reducing rates of early hospital readmission has been recognized as a key to improving quality of care and reducing costs. A number of risk factors have been hypothesized to be important for understanding readmission risk, including problems with substance abuse, the ability to maintain work, and relations with family. In this work, we develop RoBERTa-based models to predict the sentiment of sentences describing readmission risk factors in discharge summaries of patients with psychosis.

Electronic consult (eConsult) systems give specialists more flexibility to respond to referrals efficiently, thereby increasing access in under-resourced healthcare settings like safety-net systems. Understanding the usage patterns of an eConsult system is an important part of improving specialist efficiency. In this work, we develop and apply classifiers to a dataset of eConsult questions from primary care providers to specialists, classifying the messages by how they were triaged by the specialist office and by the underlying type of clinical question posed by the primary care provider.

Objective: To develop scalable natural language processing (NLP) infrastructure for processing the free text in electronic health records (EHRs).

Materials And Methods: We extend the open-source Apache cTAKES NLP software with several standard technologies for scalability. We remove processing bottlenecks by monitoring component queue size.
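The queue-size heuristic can be sketched as follows, assuming pipeline stages communicate through in-memory queues (the deployed cTAKES system may use different messaging infrastructure):

```python
import queue

def deepest_queue(stage_queues):
    """Given a mapping of stage name -> that stage's input queue, return
    the stage whose queue is currently deepest. With balanced throughput
    elsewhere, that stage is the bottleneck candidate to scale out."""
    return max(stage_queues, key=lambda name: stage_queues[name].qsize())
```

Polling this periodically and replicating the deepest stage is one simple way to remove a bottleneck without profiling every component.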
