Article Abstract

Morphological data are critical for taxonomy, evolutionary biology, ecology, and species identification. However, no widely used central database for morphological data exists as it does for DNA sequences or specimen data. Most of these data are "locked up" in taxonomic literature. Various scripted and Natural Language Processing approaches have been explored to automate the extraction of morphological data from taxonomic descriptions. Here, we explore the feasibility of using Large Language Models (LLMs) and Optical Character Recognition (OCR) to rapidly extract data for 51 morphological characters of Australian native and introduced Asteraceae (daisy family) to populate a taxon × character table. ChatGPT 4o was used to process all 1,121 descriptions, which, following currently accepted taxonomy and after accounting for taxa with descriptions in multiple sources, comprise data for 95 genera and 838 species or infraspecific taxa, totalling 945 taxa. The missing data rate is 51.1%. Visual checking of 109 profiles revealed an error rate of 5.8%; most errors were misapplications of data to the wrong trait, arising from confusion between different kinds of bracts and between individual involucral bracts and the involucre as a whole. Error rates were lowest for cypsela and pappus characters, at 2.1%. When repeating 109 inferences with the same LLM, 78.9% of the table cells for which at least one replicate had data showed no substantive difference; the main source of inconsistency was 16.7% of those cells having data in only one replicate. When repeating 109 inferences with an open-source LLM run on a local computer, results were considerably less reproducible and showed numerous unit errors, retrieval of irrelevant information, and skipped characters. Our results suggest that while mining of morphological descriptions with LLMs is possible in principle, instructions for the LLM have to be extremely precise. Even then, in contrast to scripting approaches, LLMs are inherently probabilistic. This makes their responses not fully reproducible and their integration into automated workflows difficult. Future work could explore whether results can be improved using approaches such as Retrieval-Augmented Generation or fine-tuning of models on explanations of morphological terminology. The scripts used in the study and the extracted morphological data for Australian Asteraceae are made available to support future studies.
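The missing data rate and the replicate comparison reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' script. The table layout (a dictionary per taxon), the character names, the toy values, and the `normalise` rule are all assumptions made for illustration. In particular, "no substantive difference" is approximated here by a case- and whitespace-insensitive string match, which may be cruder than the judgement applied in the study.

```python
from typing import Dict, List, Optional

# Assumed layout: table[taxon][character] -> extracted value, or None if missing.
Table = Dict[str, Dict[str, Optional[str]]]


def missing_rate(table: Table, characters: List[str]) -> float:
    """Fraction of taxon x character cells that hold no data."""
    total = len(table) * len(characters)
    missing = sum(
        1 for row in table.values() for c in characters if row.get(c) is None
    )
    return missing / total if total else 0.0


def normalise(value: str) -> str:
    """Hypothetical stand-in for 'no substantive difference'."""
    return " ".join(value.lower().split())


def compare_replicates(a: Table, b: Table, characters: List[str]) -> Dict[str, float]:
    """Classify every cell that has data in at least one replicate."""
    same = one_only = conflicting = 0
    for taxon in a.keys() & b.keys():
        for c in characters:
            va, vb = a[taxon].get(c), b[taxon].get(c)
            if va is None and vb is None:
                continue  # empty in both replicates: not counted
            if va is None or vb is None:
                one_only += 1
            elif normalise(va) == normalise(vb):
                same += 1
            else:
                conflicting += 1
    n = same + one_only + conflicting
    if n == 0:
        return {"consistent": 0.0, "one_replicate_only": 0.0, "conflicting": 0.0}
    return {
        "consistent": same / n,
        "one_replicate_only": one_only / n,
        "conflicting": conflicting / n,
    }


# Toy data with hypothetical character names and values.
chars = ["leaf_length_mm", "pappus_type", "ray_floret_colour"]
rep_a: Table = {
    "Taxon A": {"leaf_length_mm": "10-25", "pappus_type": "bristles", "ray_floret_colour": None},
    "Taxon B": {"leaf_length_mm": None, "pappus_type": "scales", "ray_floret_colour": "yellow"},
}
rep_b: Table = {
    "Taxon A": {"leaf_length_mm": "10-25", "pappus_type": "Bristles ", "ray_floret_colour": "white"},
    "Taxon B": {"leaf_length_mm": None, "pappus_type": "awns", "ray_floret_colour": "yellow"},
}

rate = missing_rate(rep_a, chars)
agreement = compare_replicates(rep_a, rep_b, chars)
print(f"missing rate (replicate A): {rate:.3f}")
print(agreement)
```

Restricting the denominator to cells with data in at least one replicate mirrors the way the abstract states its 78.9% and 16.7% figures, so cells that are empty in both runs do not inflate the agreement score.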

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12381580
DOI: http://dx.doi.org/10.3897/phytokeys.261.158396
