We show that recent (mid-to-late 2024) commercial large language models (LLMs) are capable of good-quality metadata extraction and annotation, with very little work on the part of investigators, for several exemplar real-world annotation tasks in the neuroimaging literature. We investigated the GPT-4o LLM from OpenAI, which performed comparably with several groups of specially trained and supervised human annotators. The LLM achieves performance similar to humans, between 0.91 and 0.97, on zero-shot prompts without feedback to the LLM. Reviewing the disagreements between LLM and gold-standard human annotations, we note that actual LLM errors are comparable to human errors in most cases, and in many cases these disagreements are not errors at all. For the specific types of annotations we tested, against carefully reviewed gold-standard values, the LLM's performance is usable for metadata annotation at scale. We encourage other research groups to develop and make available more specialized "micro-benchmarks," like the ones we provide here, for testing the annotation performance of both LLMs and more complex agent systems on real-world metadata annotation tasks.
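The evaluation described in this abstract, comparing zero-shot LLM annotations against gold-standard human labels, can be sketched roughly as follows. The label set, the prompt wording implied by the stub, and the `llm_annotate` function are illustrative assumptions, not the authors' actual pipeline; only the agreement calculation mirrors the kind of score (0.91–0.97) reported.

```python
# Sketch: score zero-shot LLM metadata annotations against gold-standard
# human labels. The label set and the stubbed LLM call are hypothetical.

def llm_annotate(abstract: str) -> str:
    """Stand-in for a zero-shot GPT-4o call that returns one metadata
    label (e.g. imaging modality) for an article abstract."""
    # A real pipeline would send a zero-shot prompt such as:
    #   "Classify the imaging modality of this study: fMRI or EEG."
    return "fMRI" if "fMRI" in abstract else "EEG"

def agreement(llm_labels: list[str], gold_labels: list[str]) -> float:
    """Fraction of items where the LLM matches the gold standard."""
    assert len(llm_labels) == len(gold_labels)
    matches = sum(a == b for a, b in zip(llm_labels, gold_labels))
    return matches / len(gold_labels)

abstracts = [
    "A task-based fMRI study of working memory.",
    "Resting-state EEG connectivity in adolescents.",
    "An fMRI investigation of reward learning.",
]
gold = ["fMRI", "EEG", "fMRI"]
llm = [llm_annotate(a) for a in abstracts]
print(agreement(llm, gold))  # 1.0 on this toy set
```

In practice the per-item comparison would be reviewed by hand, as the abstract notes, since a "disagreement" between LLM and gold standard is not always an LLM error.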
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405296 | PMC |
| http://dx.doi.org/10.3389/fninf.2025.1609077 | DOI Listing |
Front Neuroinform
August 2025
Department of Psychiatry, The Ohio State University, Columbus, OH, United States.
ArXiv
August 2025
Nationwide Children's Hospital, Columbus, OH.
In 2024, individuals funded by NHGRI to support genomic community resources completed a Self-Assessment Tool (SAT) to evaluate their application of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and to assess their sustainability. Insights collected from the self-administered questionnaires and personal interviews provided a valuable perspective on the FAIRness and sustainability of the NHGRI resources. The results highlighted several challenges and key areas for improvement that the NHGRI resource community could address by working together to form recommendations.
bioRxiv
August 2025
Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, 06511, CT, USA.
Various Foundation Models (FMs) have been built on the pre-training and fine-tuning framework to analyze single-cell data, with differing degrees of success. In this manuscript, we propose scELMo (Single-cell Embedding from Language Models), a method for analyzing single-cell data that uses Large Language Models (LLMs) to generate both descriptions of metadata information and embeddings for those descriptions. We combine the embeddings from LLMs with the raw data under a zero-shot learning framework, and further extend the method's functionality by using the fine-tuning framework to handle different tasks.
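The zero-shot combination step described above can be sketched as follows. The fusion rule (a weighted average of an LLM-derived metadata embedding with a dimension-matched cell feature vector), the toy embedding function, and all names here are illustrative assumptions, not the published scELMo implementation.

```python
# Sketch: fuse an LLM-derived metadata embedding with a cell's raw
# feature vector for zero-shot use. The hashing "embedder" and the
# averaging rule are stand-ins for a real LLM embedding API.

def embed_description(description: str, dim: int = 4) -> list[float]:
    """Stand-in for an LLM text-embedding call: buckets each token
    into a small dense vector (deterministic shape, illustration only)."""
    vec = [0.0] * dim
    for token in description.lower().split():
        vec[hash(token) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]

def fuse(cell_features: list[float], meta_embedding: list[float],
         weight: float = 0.5) -> list[float]:
    """Weighted average of two dimension-matched vectors."""
    return [weight * c + (1 - weight) * m
            for c, m in zip(cell_features, meta_embedding)]

meta = embed_description("CD8 T cell from peripheral blood")
cell = [0.2, 0.1, 0.4, 0.3]   # toy expression-derived features
combined = fuse(cell, meta)   # zero-shot feature vector for clustering
print(len(combined))          # 4
```

The fused vector would then feed a downstream task (e.g. clustering or cell-type annotation) without any task-specific training, which is what "zero-shot" means in this setting.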
Nucleic Acids Res
September 2025
Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming," 16672 Vari, Greece.
The plant root microbiome is vital to plant health, nutrient uptake, and environmental resilience. To explore and harness this diversity, we present metagRoot, a specialized and enriched database focused on the protein families of the plant root microbiome. MetagRoot integrates metagenomic, metatranscriptomic, and reference genome-derived protein data to characterize 71 091 enriched protein families, each containing at least 100 sequences.
Sci Rep
August 2025
Musculoskeletal Digital Innovation and Informatics (MDI²) Program, Department of Orthopaedic and Sports Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
It is well known that machine learning models require a large amount of annotated data to reach optimal performance. Labelling Computed Tomography (CT) data can be particularly challenging because of its volumetric nature and its often missing or incomplete associated metadata. Even inspecting a single CT scan requires additional computer software or, in the case of programming languages, additional programming libraries.
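The point about volumetric data needing library support can be illustrated with a minimal sketch: a CT scan is a 3D grid of intensity values, and even pulling out one 2D slice to look at requires code. The synthetic volume, its toy dimensions, and the slice orientation below are illustrative assumptions; real scans arrive as DICOM or NIfTI files and need libraries such as pydicom or nibabel just to load.

```python
# Sketch: a CT volume as a 3D grid of Hounsfield-unit-like values.
# Here we fabricate a tiny volume instead of loading a real scan.

depth, height, width = 4, 8, 8  # slices x rows x columns (toy sizes)

# Synthetic volume: background of -1000 ("air") with a brighter
# 2x2x2 block in the middle standing in for soft tissue.
volume = [[[-1000.0 for _ in range(width)] for _ in range(height)]
          for _ in range(depth)]
for z in range(1, 3):
    for y in range(3, 5):
        for x in range(3, 5):
            volume[z][y][x] = 40.0  # soft-tissue-like value

def axial_slice(vol: list, z: int) -> list:
    """Return one axial (transverse) slice as a 2D list of rows."""
    return vol[z]

middle = axial_slice(volume, 2)
print(len(middle), len(middle[0]))  # 8 8
bright = sum(v > -500 for row in middle for v in row)
print(bright)  # 4 tissue-like voxels in this slice
```

Even this toy example needs explicit indexing conventions (slice vs. row vs. column), which is exactly the kind of overhead the passage describes for annotators working with CT data.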