The fields of taxonomy and biodiversity research have witnessed an exponential growth in published literature. This vast corpus of articles holds information on the diverse biological traits of organisms and their ecologies. However, access to and extraction of relevant data from this extensive resource remain challenging. Advances in text and data mining (TDM) and Natural Language Processing (NLP) techniques offer new opportunities for liberating such information from the literature. Testing and using such approaches to annotate articles in machine-actionable formats is, therefore, necessary to enable the exploitation of existing knowledge in new biology, ecology and evolution research. Here, we explore the potential of these methods to annotate and extract organismal trait data for the most diverse animal group on Earth, the arthropods. The article processing workflow combines manually curated trait dictionaries with trained NLP models to label entities and relationships in thousands of articles. A subset of manually annotated documents facilitated the formal evaluation of the workflow's performance in terms of entity recognition, entity normalisation and relationship extraction, highlighting several important technical challenges. The results are made available to the scientific community through an interactive web tool and queryable resource, the ArTraDB Arthropod Trait Database. These methodological explorations provide a framework that could be extended beyond the arthropods, where TDM and NLP approaches applied to the taxonomy and biodiversity literature will greatly facilitate data synthesis studies and literature reviews, the identification of knowledge gaps and biases, as well as the data-informed investigation of ecological and evolutionary trends and patterns.
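As a rough illustration of the dictionary-based part of such a workflow, the Python sketch below labels trait mentions by dictionary lookup, normalises them to identifiers, and scores the result against manually annotated spans. It is not the authors' implementation: the dictionary entries, the identifier scheme, and the function names `annotate` and `evaluate` are all assumptions made for this example, and the trained NLP models and relationship extraction described above are not covered.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    label: str   # normalised identifier from the dictionary


# Hypothetical trait dictionary: surface form -> normalised identifier.
TRAIT_DICTIONARY = {
    "body length": "TRAIT:body_length",
    "body size": "TRAIT:body_size",
    "wingspan": "TRAIT:wingspan",
    "herbivorous": "TRAIT:diet_herbivore",
}


def annotate(text, dictionary):
    """Find case-insensitive whole-word dictionary matches in text.

    Longer terms are matched first so that a short term cannot claim
    characters already covered by a longer one.
    """
    annotations = []
    taken = set()  # character offsets already covered by a match
    for term in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            span = range(match.start(), match.end())
            if taken.isdisjoint(span):
                taken.update(span)
                annotations.append(
                    Annotation(match.start(), match.end(), dictionary[term])
                )
    return sorted(annotations)


def evaluate(predicted, gold):
    """Exact-match precision, recall and F1 over (start, end, label) triples."""
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "The herbivorous beetle has a body length of 5 mm."
    for annotation in annotate(sentence, TRAIT_DICTIONARY):
        print(annotation, sentence[annotation.start:annotation.end])
```

Exact span-and-label matching is the strictest scoring choice; a real evaluation might also credit partial overlaps or score recognition and normalisation separately.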
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12344437 | PMC |
| http://dx.doi.org/10.3897/BDJ.13.e153070 | DOI Listing |