Can Machine Learning Help Identify Patients at Risk for Recurrent Sexually Transmitted Infections?

Heather R Elder, Susan Gruber, Sarah J Willis, Noelle Cocoros, Myfanwy Callahan, Elaine W Flagg, Michael Klompas, Katherine K Hsu

Sex Transm Dis

Division of STD Prevention, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA.

Published: January 2021

Background: A substantial fraction of sexually transmitted infections (STIs) occur in patients who have previously been treated for an STI. We assessed whether routine electronic health record (EHR) data can predict which patients presenting with an incident STI are at greatest risk for additional STIs in the next 1 to 2 years.

Methods: We used structured EHR data on patients 15 years or older who acquired an incident STI diagnosis from 2008 to 2015 in eastern Massachusetts. We applied machine learning algorithms to model the risk of acquiring ≥1 or ≥2 additional STI diagnoses within 365 or 730 days after the initial diagnosis using more than 180 different EHR variables. We performed a sensitivity analysis incorporating state health department surveillance data to assess whether improving the accuracy of identifying STI cases improved algorithm performance.

Results: We identified 8723 incident episodes of laboratory-confirmed gonorrhea, chlamydia, or syphilis. Bayesian Additive Regression Trees, the best-performing single algorithm, had a cross-validated area under the receiver operating characteristic curve of 0.75. Receiver operating characteristic curves for this algorithm showed a poor balance between sensitivity and positive predictive value (PPV). A predictive probability threshold with a sensitivity of 91.5% had a corresponding PPV of 3.9%. A higher threshold with a PPV of 29.5% had a sensitivity of 11.7%. Attempting to improve the classification of patients with and without repeat STI diagnoses by incorporating health department surveillance data had minimal impact on the cross-validated area under the receiver operating characteristic curve.
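To make the threshold trade-off described in the Results concrete, the following is a minimal sketch of how sensitivity, PPV, and the area under the curve can be computed from predicted probabilities. It is not the authors' code: the function names, the toy labels, and the toy probabilities are all hypothetical, and the numbers it produces have no relation to the study's results.

```python
from typing import Sequence, Tuple


def sensitivity_and_ppv(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, PPV) when flagging every case with y_prob >= threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)  # flagged, repeat STI
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)  # flagged, no repeat STI
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)   # missed repeat STI
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in sample size, which is fine for a sketch.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = repeat STI within follow-up, 0 = no repeat STI.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.3, 0.4, 0.35, 0.1, 0.2, 0.05, 0.15, 0.6]

if __name__ == "__main__":
    print("AUC:", round(auc(y_true, y_prob), 3))
    for threshold in (0.1, 0.85):
        sens, ppv = sensitivity_and_ppv(y_true, y_prob, threshold)
        print(f"threshold={threshold}: sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In this toy example, the low threshold catches every positive but flags many negatives, while the high threshold flags few patients and misses most positives. This is the same direction of trade-off the abstract reports, although the magnitudes here are illustrative only.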

Conclusions: Machine learning algorithms using structured EHR data did not differentiate well between patients with and without repeat STI diagnoses. Alternative strategies that can account for sociobehavioral characteristics could be explored.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10949112	PMC
http://dx.doi.org/10.1097/OLQ.0000000000001264	DOI Listing

Similar Publications

Updating the epidemiology of blastomycosis and histoplasmosis in the United States, using national electronic health record data, 2013-2023.

J Infect Dis

September 2025

Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.

Juliana G E Bartels, Simon K Camponuri, Theo T Snow, Brittany L Morgan Bustamante, Natalie J Kane

Introduction: Where surveillance data are limited, nationally representative electronic health records allow for geographic, temporal, and demographic characterization of the fungal diseases blastomycosis and histoplasmosis.

Methods: We identified incident blastomycosis and histoplasmosis cases from 2013 to 2023 within Oracle EHR Real-World Data, which comprises 1.6 billion healthcare encounters nationally.

Measuring alignment between the ADRC UDS data elements, FDA, and EHR data standards.

Alzheimers Dement

September 2025

Department of Population Health Sciences, University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA.

Zhan Wang, Kayla Torres, Helen Foster, Gary Walker, Maryam Y Garza

Introduction: We compared and measured alignment between the Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR) standard used by electronic health records (EHRs), the Clinical Data Interchange Standards Consortium (CDISC) standards used by industry, and the Uniform Data Set (UDS) used by the Alzheimer's Disease Research Centers (ADRCs).

Methods: The ADRC UDS, consisting of 5959 data elements across eleven packets, was mapped to FHIR and CDISC standards by two independent mappers, with discrepancies adjudicated by experts.

Results: Forty-five percent of the 5959 UDS data elements mapped to the FHIR standard, indicating that they could potentially be obtained electronically from EHRs.

Identifying levels of alcohol use disorder severity in electronic health records.

Subst Abuse Treat Prev Policy

September 2025

Centre for Interdisciplinary Addiction Research (ZIS), Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf (UKE), Martinistraße 52, 20246, Hamburg, Germany.

Jakob Manthey, Carolin Kilian, Ludwig Kraus, Ingo Schäfer, Anna Schranz

Background: Alcohol use disorder (AUD) is conceptualized as a dimensional phenomenon in the DSM-5, but electronic health records (EHRs) rely on binary AUD definitions according to the ICD-10. The present study classifies AUD severity levels using EHR data and tests whether increasing AUD severity levels are linked with increased comorbidity.

Methods: Billing data from two German statutory health insurance companies in Hamburg included n = 21,954 adults diagnosed with alcohol-specific conditions between 2017 and 2021.

Investigating Information Visualization to Combat Information Overload in Electronic Health Records: Protocol for a Randomized Controlled Trial.

JMIR Res Protoc

September 2025

School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.

Saif Khairat, Jennifer Morelli, Marcella H Boynton, Thomas Bice, Jeffrey A Gold

Background: Electronic health records (EHRs) have been linked to information overload, which can lead to cognitive fatigue, a precursor to burnout. This can cause health care providers to miss critical information and make clinical errors, leading to delays in care delivery. This challenge is particularly pronounced in medical intensive care units (ICUs), where patients are critically ill and their EHRs contain extensive and complex data.

Predicting mortality dynamics in cancer patients: A machine learning approach to pre-death events.

PLoS One

September 2025

Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto, Japan.

Tatsuki Yamamoto, Minoru Sakuragi, Yuzuha Tuji, Yuji Okamoto, Eiichiro Uchino

Capturing the dynamic changes in patients' internal states as they approach death due to fatal diseases remains a major challenge in understanding individual pathologies and improving end-of-life care. However, existing methods primarily focus on specific test values or organ dysfunction markers, failing to provide a comprehensive view of the evolving internal state preceding death. To address this, we analyzed electronic health record (EHR) data from a single institution, including 8,976 cancer patients and 77 laboratory parameters, by constructing continuous mortality prediction models based on gradient-boosting decision trees and leveraging them for temporal analyses.
