Natural Language Processing and Machine Learning to Identify People Who Inject Drugs in Electronic Health Records.

David Goodman-Meza, Amber Tang, Babak Aryanfar, Sergio Vazquez, Adam J Gordon, Michihiko Goto, Matthew Bidwell Goetz, Steven Shoptaw, Alex A T Bui

Open Forum Infect Dis

Medical and Imaging Informatics Group, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, California, USA.

Published: September 2022

Background: Improving the identification of people who inject drugs (PWID) in electronic medical records can improve clinical decision making, risk assessment and mitigation, and health services research. Identification of PWID currently relies on heterogeneous, nonspecific International Classification of Diseases (ICD) codes as proxies. Natural language processing (NLP) and machine learning (ML) methods may have better diagnostic metrics than nonspecific ICD codes for identifying PWID.

Methods: We manually reviewed 1000 records of patients diagnosed with bacteremia admitted to Veterans Health Administration hospitals from 2003 through 2014. The manual review was the reference standard. We developed and trained NLP/ML algorithms with and without regular expression filters for negation (NegEx) and compared these with 11 proxy combinations of ICD codes to identify PWID. Data were split 70% for training and 30% for testing. We calculated diagnostic metrics and estimated 95% confidence intervals (CIs) by bootstrapping the hold-out test set. Best models were determined by best F-score, a summary of sensitivity and positive predictive value.
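To make the idea of a regular expression filter for negation concrete, the following is a minimal sketch of a NegEx-style check in Python. It is not the authors' implementation: the trigger phrases, the target phrases, the four-word window, and the function names are all hypothetical choices made for illustration, and the published NegEx algorithm uses a much larger curated trigger list.

```python
import re

# Hypothetical negation triggers and PWID-related target phrases.
# These short lists are illustrative assumptions, not the study's lexicon.
NEGATION_TRIGGERS = r"(?:no|denies|denied|negative for|without)"
TARGET = r"(?:injection drug use|intravenous drug use|iv drug use|ivdu)"

# A trigger followed by a target within at most four intervening words.
NEGATED = re.compile(
    rf"\b{NEGATION_TRIGGERS}\b(?:\W+\w+){{0,4}}?\W+{TARGET}\b",
    re.IGNORECASE,
)
MENTION = re.compile(rf"\b{TARGET}\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter so a trigger only affects its own sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def has_affirmed_mention(note):
    """Return True if the note mentions a target phrase that is not negated."""
    for sentence in split_sentences(note):
        # Blank out negated mentions, then look for any mention that remains.
        remaining = NEGATED.sub(" ", sentence)
        if MENTION.search(remaining):
            return True
    return False
```

In a pipeline like the one described, a filter of this kind would plausibly run before feature extraction, so that phrases such as "denies injection drug use" do not count as evidence of injection drug use.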

Results: Random forest with and without NegEx were the best-performing NLP/ML algorithms in the training set. Random forest with NegEx outperformed all ICD-based algorithms. The F-score was 0.905 (95% CI, .786-.967) for the best NLP/ML algorithm and 0.592 (95% CI, .550-.632) for the best ICD-based algorithm. The NLP/ML algorithm had a sensitivity of 92.6% and specificity of 95.4%.
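The F-score and its bootstrapped confidence interval can be sketched as follows. This is an illustration under stated assumptions: it takes the F-score to be F1 (the harmonic mean of sensitivity and positive predictive value) and uses a simple percentile bootstrap over the hold-out labels. The abstract does not state which F-score variant or bootstrap method was used, and the function names and defaults here are hypothetical.

```python
import random


def f_score(y_true, y_pred):
    """F1: harmonic mean of sensitivity (recall) and PPV (precision)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positives at all (assumed convention).
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the F-score on a hold-out test set."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample test-set records with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            f_score([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Called as `bootstrap_ci(y_true, y_pred)` on the reference-standard labels and model predictions for the test records, this returns a lower and upper bound of the kind reported alongside the point estimates above.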

Conclusions: NLP/ML outperformed ICD-based coding algorithms at identifying PWID in electronic health records. NLP/ML models should be considered in identifying cohorts of PWID to improve clinical decision making, health services research, and administrative surveillance.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9511274
DOI: http://dx.doi.org/10.1093/ofid/ofac471
