Generalizable clinical note section identification with large language models.

JAMIA Open

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02215, United States.

Published: October 2024


Article Abstract

Objectives: Clinical note section identification helps locate relevant information and could benefit downstream tasks such as named entity recognition. However, traditional supervised methods suffer from transferability issues. This study proposes a new framework that uses large language models (LLMs) for section identification to overcome these limitations.

Materials and Methods: We framed section identification as question answering and provided the section definitions in free text. We evaluated multiple LLMs off-the-shelf without any training. We also fine-tuned our LLMs to investigate how the size and the specificity of the fine-tuning dataset impact model performance.
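The question-answering framing described above can be illustrated with a minimal sketch. It is not the study's actual prompt: the three section names, their definitions, the prompt wording, and the helper names `build_prompt` and `parse_answer` are all assumptions made for illustration, and the call to an LLM is left out entirely.

```python
from typing import Dict, Optional

# Hypothetical free-text definitions. The study uses 27 section types whose
# definitions are not given in the abstract; these three are illustrative only.
SECTION_DEFINITIONS: Dict[str, str] = {
    "Chief Complaint": "The patient's main reason for the visit, stated briefly.",
    "Medications": "Drugs the patient is taking or has been prescribed.",
    "Family History": "Health conditions of the patient's relatives.",
}


def build_prompt(segment: str, definitions: Dict[str, str]) -> str:
    """Pose section identification as a question about one note segment."""
    lines = ["You are given a segment of a clinical note.", "Section definitions:"]
    for name, description in definitions.items():
        lines.append(f"- {name}: {description}")
    lines.append("")
    lines.append(f"Segment: {segment}")
    lines.append(
        "Question: Which section does this segment belong to? "
        "Answer with exactly one section name."
    )
    return "\n".join(lines)


def parse_answer(answer: str, definitions: Dict[str, str]) -> Optional[str]:
    """Map a free-text model answer to a known section name, or None."""
    lowered = answer.lower()
    for name in definitions:
        if name.lower() in lowered:
            return name
    return None


if __name__ == "__main__":
    prompt = build_prompt("Lisinopril 10 mg daily.", SECTION_DEFINITIONS)
    print(prompt)
    # A model reply would normally come from an LLM; a fixed string stands in here.
    print(parse_answer("This segment belongs to Medications.", SECTION_DEFINITIONS))
```

Because the definitions are passed in as plain text, the same functions could be reused with a different set of section types without retraining anything, which is the kind of transferability the framing aims for.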

Results: GPT4 achieved the highest F1 score of 0.77. The best open-source model (Tulu2-70b) achieved 0.64 and is on par with GPT3.5 (ChatGPT). GPT4 also obtained F1 scores greater than 0.9 for 9 out of the 27 (33%) section types and greater than 0.8 for 15 out of 27 (56%) section types. Our fine-tuned models plateaued as the size of the general domain dataset increased. We also found that adding a reasonable amount of section identification examples is beneficial.
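As a reminder of how a per-section-type F1 score (the harmonic mean of precision and recall) and a threshold count like those above might be computed, here is a small sketch. The abstract does not say how the overall score is averaged across section types, so only per-label scores are shown; the function names and the toy labels are assumptions.

```python
from typing import Dict, List


def per_label_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Compute F1 = 2*TP / (2*TP + FP + FN) separately for each label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


def count_above(scores: Dict[str, float], threshold: float) -> int:
    """Count the labels whose F1 is strictly greater than the threshold."""
    return sum(score > threshold for score in scores.values())


if __name__ == "__main__":
    # Toy labels, not data from the study.
    gold = ["Medications", "Medications", "Family History", "Family History"]
    pred = ["Medications", "Family History", "Family History", "Family History"]
    scores = per_label_f1(gold, pred)
    print(scores)
    print(count_above(scores, 0.7))
```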

Discussion: These results indicate that GPT4 is nearly production-ready for section identification and seemingly contains both knowledge of note structure and the ability to follow complex instructions, while the best current open-source LLM is catching up.

Conclusion: Our study shows that LLMs are promising for generalizable clinical note section identification. They have the potential to be further improved by adding section identification examples to the fine-tuning dataset.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11319784
DOI: http://dx.doi.org/10.1093/jamiaopen/ooae075
