Article Abstract

Objectives: Unstructured data, such as procedure notes, contain valuable medical information that is frequently underutilized due to the labor-intensive nature of data extraction. This study aims to develop a generative artificial intelligence (GenAI) pipeline using an open-source Large Language Model (LLM) with built-in guardrails and a retry mechanism to extract data from unstructured right heart catheterization (RHC) notes while minimizing errors, including hallucinations.

Materials And Methods: A total of 220 RHC notes were randomly selected for pipeline development and 200 for validation from the Pulmonary Vascular Disease Registry. The pipeline comprised three main components: the Engineered Preload Framework (EPF), which integrated schemas and instructions; the LLM module, enhanced by reasoning capabilities; and the validation and retry mechanism, which ensured data accuracy through iterative self-correction. A clinical expert manually extracted data from the validation cohort to establish the ground truth. Pipeline performance was evaluated using precision, recall, and F1 score. Additionally, the dataset was stratified into quartiles to assess the pipeline's ability to handle varying levels of data availability.
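The validation and retry mechanism described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names, the plausible ranges, the check that an extracted number appears verbatim in the note, and the `fake_llm` stand-in are hypothetical and are not taken from the study's pipeline.

```python
from typing import Callable, Optional

# Hypothetical schema: field name -> (min, max) plausible range in mmHg.
# These fields and ranges are illustrative, not the study's actual schema.
SCHEMA = {
    "mean_pa_pressure": (5.0, 120.0),
    "pcwp": (0.0, 50.0),
}


def validate(record: dict, note: str) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, (low, high) in SCHEMA.items():
        if field not in record:
            problems.append(f"{field}: missing key")
            continue
        value = record[field]
        if value is None:
            continue  # None means "not documented in the note" and is allowed
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: expected a number, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
        elif f"{value:g}" not in note:
            # Crude hallucination guardrail (an assumption of this sketch):
            # the number must appear somewhere in the source text.
            problems.append(f"{field}: {value} not found in note text")
    return problems


def extract_with_retry(
    note: str,
    llm: Callable[[str, list[str]], dict],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call the model, validate its output, and retry with feedback."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        record = llm(note, feedback)
        feedback = validate(record, note)
        if not feedback:
            return record
    return None  # give up rather than return unvalidated data


def fake_llm(note: str, feedback: list[str]) -> dict:
    """Hypothetical stand-in for a model call: wrong first, right on retry."""
    if not feedback:
        return {"mean_pa_pressure": 45, "pcwp": 99}
    return {"mean_pa_pressure": 45, "pcwp": 12}


note = "Mean PA pressure 45 mmHg. PCWP 12 mmHg."
print(extract_with_retry(note, fake_llm))
# prints {'mean_pa_pressure': 45, 'pcwp': 12}
```

Returning `None` after the final failed attempt, instead of the last unvalidated record, favors missed values over hallucinated ones.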

Results: The pipeline achieved 99.0% precision, 85.0% recall, and a 91.5% F1 score, with an overall accuracy of 90% when evaluated at the note level. The most common error was missed values (5.2%), while hallucinations were the least frequent (<0.01%).
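As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two figures. The sketch below uses only the precision and recall stated above; the function names and the counts in the second example are hypothetical and chosen purely for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Reported values from the abstract: 99.0% precision, 85.0% recall.
print(f"{f1_score(0.99, 0.85):.1%}")  # prints 91.5%

# Hypothetical counts, for illustration only (not study data).
p, r = precision_recall(tp=8, fp=2, fn=2)
print(round(f1_score(p, r), 3))  # prints 0.8
```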

Discussion And Conclusion: This study demonstrates the feasibility of a robust GenAI pipeline for automating structured data extraction from unstructured RHC procedure notes. The approach highlights the potential of LLMs in medical data mining, improving research efficiency and clinical applications.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12410982
DOI: http://dx.doi.org/10.1093/jamiaopen/ooaf097

Publication Analysis

Top Keywords

generative artificial: 8
artificial intelligence: 8
data extraction: 8
retry mechanism: 8
rhc notes: 8
data: 7
pipeline: 5
intelligence automated: 4
automated data: 4
extraction unstructured: 4

Similar Publications

Nuclear receptors (NRs) are a superfamily of ligand-activated transcription factors that regulate gene expression in response to metabolic, hormonal, and environmental signals. These receptors play a critical role in metabolic homeostasis, inflammation, immune function, and disease pathogenesis, positioning them as key therapeutic targets. This review explores the mechanistic roles of NRs such as PPARs, FXR, LXR, and thyroid hormone receptors (THRs) in regulating lipid and glucose metabolism, energy expenditure, cardiovascular health, and neurodegeneration.


Engineering resistance genes against tomato brown rugose fruit virus.

Sci China Life Sci

September 2025

MOE Key Laboratory of Bioinformatics and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China.

Tomato brown rugose fruit virus (ToBRFV) overcomes all known tomato resistance genes, including the durable Tm-2, posing a serious threat to global tomato production. Here, we employed in vitro random mutagenesis to evolve the Tm-2 leucine-rich repeat (LRR) domain and screened ∼8,000 variants for gain-of-function mutants capable of recognizing the ToBRFV movement protein (MP) and triggering hypersensitive cell death. We identified five such mutants.


Purpose: The study aims to compare the treatment recommendations generated by four leading large language models (LLMs) with those from 21 sarcoma centers' multidisciplinary tumor boards (MTBs) of the sarcoma ring trial in managing complex soft tissue sarcoma (STS) cases.

Methods: We simulated STS-MTBs using four LLMs: Llama 3.2-vision:90b, Claude 3.


Although dynamical systems models are a powerful tool for analysing microbial ecosystems, challenges in learning these models from complex microbiome datasets and interpreting their outputs limit their use. We introduce the Microbial Dynamical Systems Inference Engine 2 (MDSINE2), a Bayesian method that learns compact and interpretable ecosystem-scale dynamical systems models from microbiome time-series data. Microbial dynamics are modelled as stochastic processes driven by interaction modules, that is, groups of microbes with similar interaction structure and responses to perturbations; the noise characteristics of the data are also modelled.


Hypoxia has been extensively studied as a stressor that drives responses and adaptations in human bodily systems. Nevertheless, little evidence exists on the constituent trains of motor unit action potentials, despite recent advances that allow surface electromyographic signals to be decomposed. This study aimed to investigate motor unit properties using noninvasive approaches during maximal isometric exercise in normobaric hypoxia.
