Objective: To explore new strategies for making document selection more transparent, reproducible, and effective in active learning. The ultimate goal is to leverage active learning for keyphrase identification to facilitate ontology development and construction, streamline the process, and support long-term maintenance.
Methods: The active learning pipeline used a BiLSTM-CRF model and over 2,900 PubMed abstracts relevant to clinical decision support systems. We started model training with synthetically labeled abstracts, then used different strategies to select abstracts annotated by domain experts (gold standards). Random sampling served as the baseline. Recall and F1 (beta = 1, 5, and 10) scores were used to compare the performance of the active learning pipeline under the different strategies.
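For reference, the F-beta score used here weights recall beta times as heavily as precision, so beta = 5 and beta = 10 strongly favor recall. A minimal sketch of the standard formula follows; the precision/recall values in the example are hypothetical, not results from the paper.

```python
# Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R).
# Larger beta weights recall more heavily than precision.

def f_beta(precision: float, recall: float, beta: float) -> float:
    """Return the F-beta score for the given precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative values only (not from the study):
for beta in (1, 5, 10):
    print(f"F{beta} = {f_beta(0.40, 0.70, beta):.3f}")
# F1 = 0.509, F5 = 0.680, F10 = 0.695 -- high-beta scores track recall.
```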
Results: We tested four novel document-level uncertainty aggregation strategies (KPSum, KPAvg, DOCSum, and DOCAvg) that operate over standard token-level uncertainty scores such as Maximum Token Probability (MTP), Token Entropy (TE), and Margin. All strategies show significant improvement in recall and F1 during early active learning cycles. Systematic evaluations show that KPSum (actual order) yields consistent improvement in both recall and F1 and outperforms random sampling. Document order (actual versus reverse) does not appear to play a critical role in model learning and performance across strategies in our datasets, although actual order is slightly more effective for some strategies. The weighted F1 scores (beta = 5 and 10) provided results complementary to raw recall and F1 (beta = 1).
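The abstract does not spell out how the four strategies are computed. The sketch below assumes one plausible reading: DOCSum/DOCAvg aggregate token-level uncertainty over all tokens in a document, while KPSum/KPAvg aggregate only over tokens the model predicts as keyphrase tokens. All function names, the use of per-token marginal probabilities from the BiLSTM-CRF, and the batch size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def token_entropy(probs: np.ndarray) -> np.ndarray:
    """Token Entropy (TE): -sum_c p_c * log(p_c) per token.
    probs: (n_tokens, n_labels) per-token label probabilities
    (e.g., marginals approximated from the BiLSTM-CRF)."""
    eps = 1e-12
    return -(probs * np.log(probs + eps)).sum(axis=1)

def mtp_uncertainty(probs: np.ndarray) -> np.ndarray:
    """Maximum Token Probability (MTP) uncertainty: 1 - max_c p_c."""
    return 1.0 - probs.max(axis=1)

def margin_uncertainty(probs: np.ndarray) -> np.ndarray:
    """Margin: gap between the top-2 label probabilities;
    a small margin means high uncertainty, so we negate it."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])

def aggregate(scores: np.ndarray, kp_mask: np.ndarray, strategy: str) -> float:
    """Collapse token uncertainties into one document-level score.
    kp_mask: boolean mask of tokens predicted as keyphrase tokens."""
    if strategy == "DOCSum":
        return scores.sum()
    if strategy == "DOCAvg":
        return scores.mean()
    kp = scores[kp_mask]
    if kp.size == 0:  # no predicted keyphrases in this document
        return 0.0
    return kp.sum() if strategy == "KPSum" else kp.mean()

def select_batch(docs, k=20, strategy="KPSum"):
    """Rank unlabeled documents by aggregated uncertainty and return
    the indices of the top-k candidates for expert annotation.
    docs: list of (probs, kp_mask) pairs from the current model."""
    scored = [aggregate(token_entropy(p), m, strategy) for p, m in docs]
    return np.argsort(scored)[::-1][:k]
```

Under this reading, KPSum favors documents with many uncertain keyphrase tokens, which is consistent with the reported early-cycle recall gains: such documents are the ones most likely to correct the model's keyphrase boundary errors once annotated.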
Conclusion: While prior work on uncertainty sampling typically focuses on token-level uncertainty metrics within generic NER tasks, our work advances this line of research by introducing a higher-level abstraction: document-level uncertainty aggregation. Within a human-in-the-loop active learning pipeline, this approach can effectively prioritize high-impact documents, improve early-cycle recall, and reduce annotation effort. Our results show promise for automating part of ontology construction and maintenance, i.e., monitoring and screening new publications to identify candidate keyphrases. However, future work must improve model performance before the approach is usable in real-world operations.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12047919 | PMC |
| http://dx.doi.org/10.1101/2025.04.15.25325868 | DOI Listing |