Citations

20

Article Abstract

Objective: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR).

Materials And Methods: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on its performance on the proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier.
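The proxy-label workflow described above can be sketched roughly as follows. This is a minimal illustration only: the abstract does not say which classifier or features were used, so the multinomial naive Bayes model, the whitespace tokenizer, and the toy notes below are all assumptions made for the example, not the authors' method or data.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(notes, labels):
    """Fit a multinomial naive Bayes model on bag-of-words counts."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(notes, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the more likely label (1 = positive, 0 = not positive)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Proxy dataset: invented notes for patients who DO have a PCR result,
# so the lab result can serve as the training label.
proxy_notes = [
    "patient tested positive for covid at outside clinic",
    "covid positive per outside hospital swab",
    "covid test negative no fever",
    "no covid symptoms test negative",
]
proxy_labels = [1, 1, 0, 0]

model = train(proxy_notes, proxy_labels)

# Apply the trained model to an invented note with no PCR test on record.
print(predict(model, "tested positive for covid at urgent care"))  # 1
```

The point of the design is that labels come for free from structured lab results, so no manual annotation is needed for training; manual review is reserved for validating a sample of predictions.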

Results: On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV-2 positive cases. In an expert validation, the classifier correctly identified 90.8% (79/87) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV-2 positive. The classifier identified an additional 960 positive cases that did not have SARS-CoV-2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19.
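As a quick consistency check, the reported figures can be recomputed from the numbers given above. The helper names below are hypothetical, and because the reported precision and recall are themselves rounded, the recomputed F1 is only approximate.

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

def percent(numerator, denominator):
    # Percentage rounded to one decimal place.
    return round(100 * numerator / denominator, 1)

f1 = f1_score(0.6, 0.52)
print(round(f1, 2))       # 0.56, matching the reported F1
print(percent(79, 87))    # 90.8, validated as COVID-19 positive
print(percent(91, 93))    # 97.8, validated as not positive
print(percent(177, 960))  # 18.4, share of the 960 cases with the ICD-10 code
```

The last figure is not stated in the abstract but follows directly from it: fewer than one in five of the additionally identified cases carried the ICD-10 code for COVID-19.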

Discussion: Performance on the proxy dataset may be lower than in the expert validation because proxy instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned.

Conclusion: COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882620
DOI: http://dx.doi.org/10.1101/2023.01.19.23284738

Publication Analysis

Top Keywords

proxy dataset: 20
covid-19 cases: 8
electronic health: 8
pcr tests: 8
sars-cov2 positive: 8
positive cases: 8
lab tests: 8
covid-19: 5
cases: 5
proxy: 5

Similar Publications

This article presents a multiproxy investigation of metal samples obtained from 48 Nuragic figurines (so-called bronzetti) and three copper bun ingots. These objects originate from three prominent Sardinian sanctuaries and one unidentified site, dating to the late Nuragic period of the early first millennium BCE. The dataset significantly expands the existing scientific database and unwraps the complex fabrication biographies of the figurines from ore to finished object.

Increased use of psychiatric drugs in Brazil over the years: evidence from a country-wide dataset.

Trends Psychiatry Psychother

September 2025

Programa de Pós-Graduação em Ciências Biológicas - Bioquímica, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil.

Objectives: Stressful events can impact the incidence of psychiatric disorders and, therefore, psychiatric drug use. However, it is not clear whether psychiatric drug use is stable or not across the Brazilian population over time. The aim of this study was to investigate trends in psychiatric drug sales in Brazil over the years, using sales data from private-sector pharmacies as a proxy for psychiatric drug consumption.

Circadian clocks play a crucial role in regulating the sleep-wake rhythm of organisms, aligning their activity with fluctuating environmental factors, such as light intensity. Significant and consistent interindividual differences in the timing of activity, known as chronotypes, have nonetheless been observed across various species, but whether these differences affect fitness is unknown. While previous studies have primarily focused on annual reproductive success, few have examined associations between chronotype and lifetime reproductive success.

Community composition as an overlooked driver of spatial population synchrony.

PNAS Nexus

September 2025

School of Aquatic and Fishery Sciences, University of Washington, 1122 NE Boat St, Box 355020, Seattle, WA 98105, USA.

Animal populations often display coherent temporal fluctuations in their abundance, with far-ranging implications for species persistence and ecosystem stability. The key mechanisms driving spatial population synchrony include organismal dispersal, spatially correlated environmental dynamics (Moran effect) and concordant consumer-resource dynamics. Disentangling these mechanisms, however, is notoriously difficult in natural systems, and the extent to which the biotic environment (intensity and types of biotic interactions) mediates metapopulation dynamics remains a largely unanswered question.

Background And Objectives: Brain tissue oxygenation is usually inferred from arterial partial pressure of oxygen (paO2), which is in turn often inferred from pulse oximetry measurements or other non-invasive proxies. Our aim was to evaluate the feasibility of continuous paO2 prediction in an intraoperative setting among neurosurgical patients undergoing craniotomies with modern machine learning methods.

Methods: Data from routine clinical care of lung-healthy neurosurgical patients were extracted from databases of the respective clinical systems and normalized.