From literature to biodiversity data: mining arthropod organismal traits with machine learning.

Biodivers Data J

Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland.

Published: August 2025

Article Abstract

The fields of taxonomy and biodiversity research have witnessed exponential growth in published literature. This vast corpus holds information on the diverse biological traits of organisms and their ecologies. However, accessing and extracting relevant data from this extensive resource remains challenging. Advances in text and data mining (TDM) and Natural Language Processing (NLP) techniques offer new opportunities for liberating such information from the literature. Testing and applying these approaches to annotate articles in machine-actionable formats is therefore necessary to enable the exploitation of existing knowledge in new biology, ecology and evolution research. Here, we explore the potential of these methods to annotate and extract organismal trait data for the most diverse animal group on Earth, the arthropods. The article-processing workflow combines manually curated trait dictionaries with trained NLP models to label entities and relationships in thousands of articles. A subset of manually annotated documents enabled a formal evaluation of the workflow's performance in entity recognition, entity normalisation and relationship extraction, highlighting several important technical challenges. The results are available to the scientific community through an interactive web tool and queryable resource, the ArTraDB Arthropod Trait Database. These methodological explorations provide a framework that could be extended beyond the arthropods. Applied to the taxonomy and biodiversity literature, TDM and NLP approaches will greatly facilitate data synthesis studies and literature reviews, the identification of knowledge gaps and biases, and the data-informed investigation of ecological and evolutionary trends and patterns.
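The dictionary-based labelling and the evaluation against manually annotated documents described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which also uses trained NLP models and relationship extraction. It assumes a toy trait dictionary whose surface forms and `TRAIT:*` identifiers are hypothetical placeholders, not ArTraDB terms. Entity recognition and normalisation are done by case-insensitive, longest-match-first lookup, and predictions are scored against a gold set with exact-span precision, recall and F1.

```python
import re
from dataclasses import dataclass

# Hypothetical mini trait dictionary: surface form -> normalised trait identifier.
# Several synonyms can map to one concept (normalisation).
TRAIT_DICT = {
    "body length": "TRAIT:body_length",
    "body size": "TRAIT:body_length",
    "wing length": "TRAIT:wing_length",
    "nocturnal": "TRAIT:nocturnal",
}


@dataclass(frozen=True)
class Mention:
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention
    text: str     # surface form as it appears in the article
    concept: str  # normalised identifier from the dictionary


def tag_traits(text, dictionary):
    """Label dictionary terms in text, longest terms first, skipping overlaps."""
    mentions = []
    taken = [False] * len(text)  # characters already covered by a mention
    for term in sorted(dictionary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            if any(taken[m.start():m.end()]):
                continue  # a longer term already claimed this span
            for i in range(m.start(), m.end()):
                taken[i] = True
            mentions.append(Mention(m.start(), m.end(), m.group(0), dictionary[term]))
    return sorted(mentions, key=lambda mention: mention.start)


def evaluate(predicted, gold):
    """Exact-match precision, recall and F1 over (start, end, concept) triples."""
    pred = {(m.start, m.end, m.concept) for m in predicted}
    ref = set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = "Adults are nocturnal; body size ranges from 2 to 3 mm and wing length is 1.5 mm."
    found = tag_traits(sentence, TRAIT_DICT)
    for mention in found:
        print(mention.text, "->", mention.concept)
    # Hypothetical gold annotation covering only two of the three mentions.
    gold = {(11, 20, "TRAIT:nocturnal"), (22, 31, "TRAIT:body_length")}
    print(evaluate(found, gold))
```

Exact span-and-concept matching is the strictest scoring choice. Real evaluations often also report lenient (overlapping-span) scores, and scoring normalisation separately from recognition shows which of the two steps is failing.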

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12344437
DOI: http://dx.doi.org/10.3897/BDJ.13.e153070

Similar Publications

Thirty years of SPM-BrainMap synergy: making and mining coordinate-based literature.

Cereb Cortex

August 2025

Research Imaging Institute, University of Texas Health Science Center at San Antonio, 8403 Floyd Curl Drive, San Antonio, TX 78229, United States.

Statistical Parametric Mapping (SPM) adheres to rigorous methodological standards, including: spatial normalization, inter-subject averaging, voxel-wise contrasts, and coordinate reporting. This rigor ensures that a thematically diverse literature is amenable to meta-analysis. BrainMap is a community database (www.

Accurate differentiation between persistent vegetative state (PVS) and minimally conscious state, and estimation of recovery likelihood in patients in PVS, are crucial. This study analyzed electroencephalography (EEG) metrics to investigate their relationship with consciousness improvements in patients in PVS and developed a machine learning prediction model. We retrospectively evaluated 19 patients in PVS, categorizing them into two groups: those with improved consciousness (n = 7) and those without improvement (n = 12).

Sorting nexin 3 promotes ischemic retinopathy through RIP1- and RIP3-mediated myeloid cell necroptosis and mitochondrial fission.

Proc Natl Acad Sci U S A

September 2025

State Key Laboratory of Bioactive Molecules and Druggability Assessment, Guangdong Province Key Laboratory of Pharmacodynamic Constituents of Traditional Chinese Medicine and New Drugs Research, International Cooperative Laboratory of Traditional Chinese Medicine Modernization and Innovative Drug De

Proliferative retinopathy is a leading cause of irreversible blindness in humans; however, the molecular mechanisms behind the immune cell-mediated retinal angiogenesis remain poorly elucidated. Here, using single-cell RNA sequencing in an oxygen-induced retinopathy (OIR) model, we identified an enrichment of sorting nexin (SNX)-related pathways, with SNX3, a member of the SNX family that is involved in endosomal sorting and trafficking, being significantly upregulated in the myeloid cell subpopulations of OIR retinas. Immunostaining showed that SNX3 expression is markedly increased in the retinal microglia/macrophages of mice with OIR, which is mainly located within and around the neovascular tufts.

Recombinant DNA technology is widely used to produce industrially and pharmaceutically important proteins. In silico analysis, performed before executing wet-lab experiments, has been greatly helpful in this regard. A shift in protein analysis has been observed over the past decade, driven by advancements in bioinformatics databases, tools, software, and web servers.

Advances in Pectinase Engineering for Food Bioprocessing: Novel Sources, Mechanisms, and Optimization Strategies.

J Agric Food Chem

September 2025

School of Food & Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang 212013 Jiangsu Province, China.

Pectinases are indispensable biocatalysts for pectin degradation in food and bioprocessing industries, yet natural enzymes often lack tailored functionalities for modern applications. While a previous review discussed pectinases in terms of production and application, this review particularly discusses an integrated framework for robust pectinases. This framework combines enzyme mining, protein engineering, and AI-assisted design to systematically discover, optimize, and customize pectinases.