We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI, which performed comparably with several groups of specially trained and supervised human annotators. The LLM achieves performance similar to humans, between 0.91 and 0.97, on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between LLM and gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and in many cases these disagreements are not errors at all. For the specific types of annotations we tested, against carefully reviewed gold-standard values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized "micro-benchmarks," like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems on real-world metadata annotation tasks.
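The evaluation described in this abstract, comparing zero-shot LLM annotations against gold-standard human labels, can be sketched roughly as follows. The label set, the prompt wording implied by the stub, and the `llm_annotate` function are illustrative assumptions, not the authors' actual pipeline; only the agreement calculation mirrors the kind of score (0.91–0.97) reported.

```python
# Sketch: score zero-shot LLM metadata annotations against gold-standard
# human labels. The label set and the stubbed LLM call are hypothetical.

def llm_annotate(abstract: str) -> str:
    """Stand-in for a zero-shot GPT-4o call that returns one metadata
    label (e.g. imaging modality) for an article abstract."""
    # A real pipeline would send a zero-shot prompt such as:
    #   "Classify the imaging modality of this study: fMRI or EEG."
    return "fMRI" if "fMRI" in abstract else "EEG"

def agreement(llm_labels: list[str], gold_labels: list[str]) -> float:
    """Fraction of items where the LLM matches the gold standard."""
    assert len(llm_labels) == len(gold_labels)
    matches = sum(a == b for a, b in zip(llm_labels, gold_labels))
    return matches / len(gold_labels)

abstracts = [
    "A task-based fMRI study of working memory.",
    "Resting-state EEG connectivity in adolescents.",
    "An fMRI investigation of reward learning.",
]
gold = ["fMRI", "EEG", "fMRI"]
llm = [llm_annotate(a) for a in abstracts]
print(agreement(llm, gold))  # 1.0 on this toy set
```

In practice the per-item comparison would be reviewed by hand, as the abstract notes, since a "disagreement" between LLM and gold standard is not always an LLM error.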
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405296 | PMC |
| http://dx.doi.org/10.3389/fninf.2025.1609077 | DOI Listing |
Front Neuroinform
August 2025
Department of Psychiatry, The Ohio State University, Columbus, OH, United States.
ArXiv
August 2025
Nationwide Children's Hospital, Columbus, OH.
In 2024, individuals funded by NHGRI to support genomic community resources completed a Self-Assessment Tool (SAT) to evaluate their application of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and to assess their sustainability. Insights collected from the self-administered questionnaires and personal interviews provided a valuable perspective on the FAIRness and sustainability of the NHGRI resources. The results highlighted several challenges and key areas for improvement that the NHGRI resource community could address by working together to form recommendations.
bioRxiv
August 2025
Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, 06511, CT, USA.
Various Foundation Models (FMs) have been built on the pre-training and fine-tuning framework to analyze single-cell data, with differing degrees of success. In this manuscript, we propose scELMo (Single-cell Embedding from Language Models), a method for analyzing single-cell data that uses Large Language Models (LLMs) to generate both descriptions of metadata information and embeddings for those descriptions. We combine the embeddings from LLMs with the raw data under a zero-shot learning framework, and further extend the method's functionality by using the fine-tuning framework to handle different tasks.
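The zero-shot combination step described above can be sketched as follows. The fusion rule (a weighted average of an LLM-derived metadata embedding with a dimension-matched cell feature vector), the toy embedding function, and all names here are illustrative assumptions, not the published scELMo implementation.

```python
# Sketch: fuse an LLM-derived metadata embedding with a cell's raw
# feature vector for zero-shot use. The hashing "embedder" and the
# averaging rule are stand-ins for a real LLM embedding API.

def embed_description(description: str, dim: int = 4) -> list[float]:
    """Stand-in for an LLM text-embedding call: buckets each token
    into a small dense vector (deterministic shape, illustration only)."""
    vec = [0.0] * dim
    for token in description.lower().split():
        vec[hash(token) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]

def fuse(cell_features: list[float], meta_embedding: list[float],
         weight: float = 0.5) -> list[float]:
    """Weighted average of two dimension-matched vectors."""
    return [weight * c + (1 - weight) * m
            for c, m in zip(cell_features, meta_embedding)]

meta = embed_description("CD8 T cell from peripheral blood")
cell = [0.2, 0.1, 0.4, 0.3]   # toy expression-derived features
combined = fuse(cell, meta)   # zero-shot feature vector for clustering
print(len(combined))          # 4
```

The fused vector would then feed a downstream task (e.g. clustering or cell-type annotation) without any task-specific training, which is what "zero-shot" means in this setting.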
Nucleic Acids Res
September 2025
Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming," 16672 Vari, Greece.
The plant root microbiome is vital to plant health, nutrient uptake, and environmental resilience. To explore and harness this diversity, we present metagRoot, a specialized and enriched database focused on the protein families of the plant root microbiome. MetagRoot integrates metagenomic, metatranscriptomic, and reference genome-derived protein data to characterize 71 091 enriched protein families, each containing at least 100 sequences.
Sci Rep
August 2025
Musculoskeletal Digital Innovation and Informatics (MDI²) Program, Department of Orthopaedic and Sports Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
It is well known that machine learning models require a large amount of annotated data to reach optimal performance. Labelling Computed Tomography (CT) data can be particularly challenging because of its volumetric nature and its often missing or incomplete associated metadata. Even inspecting a single CT scan requires additional computer software or, in the case of programming languages, additional programming libraries.
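The point about volumetric data needing library support can be illustrated with a minimal sketch: a CT scan is a 3D grid of intensity values, and even pulling out one 2D slice to look at requires code. The synthetic volume, its toy dimensions, and the slice orientation below are illustrative assumptions; real scans arrive as DICOM or NIfTI files and need libraries such as pydicom or nibabel just to load.

```python
# Sketch: a CT volume as a 3D grid of Hounsfield-unit-like values.
# Here we fabricate a tiny volume instead of loading a real scan.

depth, height, width = 4, 8, 8  # slices x rows x columns (toy sizes)

# Synthetic volume: background of -1000 ("air") with a brighter
# 2x2x2 block in the middle standing in for soft tissue.
volume = [[[-1000.0 for _ in range(width)] for _ in range(height)]
          for _ in range(depth)]
for z in range(1, 3):
    for y in range(3, 5):
        for x in range(3, 5):
            volume[z][y][x] = 40.0  # soft-tissue-like value

def axial_slice(vol: list, z: int) -> list:
    """Return one axial (transverse) slice as a 2D list of rows."""
    return vol[z]

middle = axial_slice(volume, 2)
print(len(middle), len(middle[0]))  # 8 8
bright = sum(v > -500 for row in middle for v in row)
print(bright)  # 4 tissue-like voxels in this slice
```

Even this toy example needs explicit indexing conventions (slice vs. row vs. column), which is exactly the kind of overhead the passage describes for annotators working with CT data.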