Quality indices for topic model selection and evaluation: a literature review and case study.

Christopher Meaney, Therese A Stukel, Peter C Austin, Rahim Moineddin, Michelle Greiver, Michael Escobar

BMC Med Inform Decis Mak

Dalla Lana School of Public Health, University of Toronto, Toronto, Canada.

Published: July 2023

Background: Topic models are a class of unsupervised machine learning models, which facilitate summarization, browsing and retrieval from large unstructured document collections. This study reviews several methods for assessing the quality of unsupervised topic models estimated using non-negative matrix factorization. Techniques for topic model validation have been developed across disparate fields. We synthesize this literature, discuss the advantages and disadvantages of different techniques for topic model validation, and illustrate their usefulness for guiding model selection on a large clinical text corpus.
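The abstract does not say how the non-negative matrix factorization was estimated. Below is a minimal sketch that assumes a Frobenius-norm loss and the classic Lee-Seung multiplicative updates. Under those assumptions, a document-term matrix V (documents by terms) is factored into document-topic weights W and topic-term weights H, and the fit is scored by the reconstruction error ||V - WH||_F. The helper names (nmf, frobenius_error) and the toy matrix are hypothetical. The study's solver, initialization and cross-validation scheme may differ, and this in-sample error is not the cross-validated error reported in the Results.

```python
import random


def matmul(A, B):
    """Multiply dense matrices stored as lists of rows."""
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """In-sample reconstruction error ||V - WH||_F."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Hypothetical helper: factor non-negative V (n x m) into W (n x k) and
    H (k x m) with Lee-Seung multiplicative updates for the Frobenius loss.
    Assumption: random uniform initialization; the study's solver may differ."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
    return W, H


if __name__ == "__main__":
    # Hypothetical toy document-term matrix: 4 documents x 4 terms, two word groups.
    V = [[2, 2, 0, 0], [1, 1, 0, 0], [0, 0, 3, 3], [0, 0, 1, 1]]
    for k in (1, 2):
        W, H = nmf(V, k)
        print(k, round(frobenius_error(V, W, H), 4))
```

Multiplicative updates keep every entry non-negative without an explicit projection step. Because they never increase the Frobenius loss, the printed error can be compared across candidate values of K.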

Design, Setting And Data: Using a retrospective cohort design, we curated a text corpus containing 382,666 clinical notes collected between 01/01/2017 and 12/31/2020 from primary care electronic medical records in Toronto, Canada.

Methods: Several topic model quality metrics have been proposed to assess different aspects of model fit. We explored the following metrics: reconstruction error, topic coherence, rank biased overlap, Kendall's weighted tau, partition coefficient, partition entropy and the Xie-Beni statistic. Depending on context, cross-validation and/or bootstrap stability analysis were used to estimate these metrics on our corpus.
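The partition coefficient, partition entropy and Xie-Beni statistic come from fuzzy clustering, so applying them to a topic model requires a soft membership matrix. The sketch below makes two assumptions. First, memberships are the rows of the document-topic matrix normalized to sum to one. Second, Xie-Beni centroids are membership-weighted means in the style of fuzzy c-means, with fuzzifier m = 2. A higher partition coefficient (range 1/K to 1) and a lower partition entropy (range 0 to log K) indicate crisper topic assignments. A lower Xie-Beni value indicates compact, well-separated clusters. The function names are hypothetical, and the abstract does not give the study's exact construction of memberships or centroids.

```python
import math


def normalize_rows(W):
    """Assumption: memberships = document-topic weights rescaled to sum to 1."""
    U = []
    for row in W:
        s = sum(row)
        U.append([w / s for w in row] if s > 0 else [1.0 / len(row)] * len(row))
    return U


def partition_coefficient(U):
    """Mean sum of squared memberships; ranges from 1/K (uniform) to 1 (crisp)."""
    return sum(u * u for row in U for u in row) / len(U)


def partition_entropy(U):
    """Mean membership entropy; ranges from 0 (crisp) to log K (uniform)."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(X, U, m=2.0):
    """Compactness / (N * minimum squared centroid separation); lower is better.
    Assumption: centroids are membership-weighted means as in fuzzy c-means."""
    N, K, D = len(U), len(U[0]), len(X[0])
    centroids = []
    for k in range(K):
        w = [row[k] ** m for row in U]
        total = sum(w)
        centroids.append([sum(wi * x[j] for wi, x in zip(w, X)) / total for j in range(D)])
    compactness = sum(U[i][k] ** m * sq_dist(X[i], centroids[k])
                      for i in range(N) for k in range(K))
    separation = min(sq_dist(centroids[k], centroids[l])
                     for k in range(K) for l in range(K) if k != l)
    return compactness / (N * separation)


if __name__ == "__main__":
    # Hypothetical document vectors and document-topic weights.
    X = [[0, 0], [0, 1], [5, 0], [5, 1]]
    U = normalize_rows([[3, 0], [2, 0], [0, 4], [0, 1]])
    print(partition_coefficient(U), partition_entropy(U), xie_beni(X, U))
```

Crisp memberships give a partition coefficient of 1 and a partition entropy of 0. Xie-Beni additionally rewards topics whose centroids lie far apart relative to the spread of their documents.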

Results: Cross-validated reconstruction error favored large topic models (K ≥ 100 topics) on our corpus. Stability analysis using topic coherence and the Xie-Beni statistic also favored large models (K = 100 topics). Rank biased overlap and Kendall's weighted tau favored small models (K = 5 topics). Few model evaluation metrics suggested mid-sized topic models (25 ≤ K ≤ 75) as being optimal. However, human judgement suggested that mid-sized topic models produced expressive low-dimensional summarizations of the corpus.
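Rank biased overlap compares two ranked lists, such as the top words of a topic estimated on two bootstrap resamples. Agreement near the top of the lists is weighted geometrically more heavily through a persistence parameter p. The sketch below implements the extrapolated RBO estimator of Webber, Moffat and Zobel for two equal-length, tie-free lists. It adds a hypothetical stability score that averages, for each topic in one run, its best-matching RBO against the topics of a second run. The best-match averaging, the default p = 0.9 and the function names are assumptions for illustration, not the study's procedure.

```python
def rbo_ext(S, T, p=0.9):
    """Extrapolated rank-biased overlap for two equal-length rankings without ties.
    Returns 1.0 for identical lists and 0.0 for disjoint ones."""
    if len(S) != len(T):
        raise ValueError("this sketch assumes equal-length lists")
    depth = len(S)
    seen_s, seen_t = set(), set()
    overlap = 0      # X_k = size of (S[:k] & T[:k]), updated incrementally
    weighted = 0.0   # sum over k of (X_k / k) * p**k
    for k in range(1, depth + 1):
        s, t = S[k - 1], T[k - 1]
        if s == t:
            overlap += 1
        else:
            overlap += (s in seen_t) + (t in seen_s)
        seen_s.add(s)
        seen_t.add(t)
        weighted += overlap / k * p ** k
    return overlap / depth * p ** depth + (1 - p) / p * weighted


def mean_best_match_rbo(topics_a, topics_b, p=0.9):
    """Hypothetical stability score: each topic in run A is matched to its most
    similar topic in run B (no one-to-one constraint) and the RBOs are averaged."""
    return sum(max(rbo_ext(a, b, p) for b in topics_b) for a in topics_a) / len(topics_a)


if __name__ == "__main__":
    # Hypothetical top-5 word lists from two bootstrap fits.
    run_a = [["pain", "back", "knee", "ice", "rest"], ["bp", "ramipril", "salt", "diet", "renal"]]
    run_b = [["bp", "salt", "ramipril", "renal", "diet"], ["back", "pain", "knee", "rest", "ice"]]
    print(round(mean_best_match_rbo(run_a, run_b), 3))
```

Smaller p concentrates the weight on the first few words. The value of p therefore affects how sharply a stability analysis penalizes reordering among a topic's top terms.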

Conclusions: Topic model quality indices are transparent quantitative tools for guiding model selection and evaluation. Our empirical illustration demonstrated that different topic model quality indices favor models of different complexity and may not select models aligning with human judgement. This suggests that different metrics capture different aspects of model goodness of fit. A combination of topic model quality indices, coupled with human validation, may be useful in appraising unsupervised topic models.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10362613
DOI: http://dx.doi.org/10.1186/s12911-023-02216-1

Similar Publications

Testing for Similarity of Dose Response in Multiregional Clinical Trials.

Stat Med

September 2025

Statistical Methodology, Novartis Pharma AG, Basel, Switzerland.

Holger Dette, Lukas Koletzko, Frank Bretz

This article addresses the problem of determining whether the dose response relationships between subgroups and the full population in a multiregional trial are similar. Similarity is assessed in terms of the maximal deviation between the dose response curves. We consider a parametric framework and develop two powerful bootstrap tests: one for assessing the similarity between the dose response curves of a single subgroup and that of the full population, and another for comparing the dose response curves of multiple subgroups with that of the full population.
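The similarity criterion above, the maximal deviation between two dose response curves, can be illustrated for a parametric model. The sketch below assumes an Emax curve E0 + Emax * d / (ED50 + d) for each population, since the abstract does not name the model. It approximates the supremum over the dose range with a fine grid. The parameter values and function names are hypothetical, and the bootstrap tests themselves are not reproduced.

```python
def emax_curve(dose, e0, emax, ed50):
    """Assumed Emax dose-response model: E0 + Emax * dose / (ED50 + dose)."""
    return e0 + emax * dose / (ed50 + dose)


def max_deviation(params_a, params_b, dose_max, n_grid=1001):
    """Approximate sup over [0, dose_max] of |f_a(d) - f_b(d)| on an even grid."""
    grid = [dose_max * i / (n_grid - 1) for i in range(n_grid)]
    return max(abs(emax_curve(d, *params_a) - emax_curve(d, *params_b)) for d in grid)


if __name__ == "__main__":
    # Hypothetical (E0, Emax, ED50) for a regional subgroup and the full population.
    subgroup, full = (0.1, 1.2, 8.0), (0.0, 1.0, 5.0)
    print(round(max_deviation(subgroup, full, dose_max=100.0), 4))
```

A test of similarity would compare an estimate of this distance against a pre-specified margin. The bootstrap supplies the reference distribution that a single point estimate lacks.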

Evaluation of satisfaction on additional postpartum care - a comparative, multicentre study.

BMC Pregnancy Childbirth

September 2025

Department of Obstetrics and Gynecology, Sahlgrenska Academy at Gothenburg University, Sahlgrenska University Hospital, Medicinaregatan 3, Gothenburg, SE-413 45, Sweden.

Kajsa Sandberg Kedfors, Mattias Molin, Ingela Lindh

Background: A growing body of knowledge is questioning the timing of postpartum care (PPC) and suggesting a structural change. The primary aim was to evaluate individuals' satisfaction with additional PPC, and the secondary aim was to identify different needs postpartum.

Methods: This comparative study was conducted in six maternity clinics in Gothenburg, Sweden 2019-2020.

The 2024 UK clinical guideline for the prevention and treatment of osteoporosis.

Arch Osteoporos

September 2025

School of Clinical Medicine, University of Cambridge, Cambridge, UK.

Celia L Gregson, David J Armstrong, Christina Avgerinou, Jean Bowden, Cyrus Cooper

Unlabelled: The National Osteoporosis Guideline Group (NOGG) has updated the revised UK guideline for the assessment and management of osteoporosis and the prevention of fragility fractures in postmenopausal women, and men age 50 years and older. This guideline is relevant for all healthcare professionals involved in osteoporosis management.

Introduction: The UK National Osteoporosis Guideline Group (NOGG) first produced a guideline on the prevention and treatment of osteoporosis in 2008, with updates in 2013, 2017 and 2021.

Multiple large language models versus clinical guidelines for postmenopausal osteoporosis: a comparative study of ChatGPT-3.5, ChatGPT-4.0, ChatGPT-4o, Google Gemini, Google Gemini Advanced, and Microsoft Copilot.

Arch Osteoporos

September 2025

Department of Family Medicine, Chang-Gung Memorial Hospital, Linkou Branch, Taoyuan City, Taiwan.

Chun-Ru Lin, Yi-Jun Chen, Po-An Tsai, Wen-Yuan Hsieh, Sung Huang Laurent Tsai

Unlabelled: The study assesses the performance of AI models in evaluating postmenopausal osteoporosis. We found that ChatGPT-4o produced the most appropriate responses, highlighting the potential of AI to enhance clinical decision-making and improve patient care in osteoporosis management.

Purpose: The rise of artificial intelligence (AI) offers the potential for assisting clinical decisions.

Superior vena cava isolation added to pulmonary vein isolation enhances outcomes in paroxysmal atrial fibrillation: a meta-analysis with trial sequential analysis.

BMJ Open

September 2025

Arrhythmia Center, Chinese Academy of Medical Sciences Fuwai Hospital, Beijing, China.

Wenchi Guan, Jun Liu, Xiaofeng Li, Keping Chen, Yan Yao

Objectives: To evaluate the efficacy and safety of adding Superior Vena Cava Isolation (SVCI) to Pulmonary Vein Isolation (PVI) in patients with drug-refractory paroxysmal atrial fibrillation (PAF).

Design: Systematic review and meta-analysis of randomised controlled trials (RCTs) using the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach, supplemented with Trial Sequential Analysis (TSA) to assess evidence sufficiency.

Data Sources: We searched PubMed, EMBASE, the Cochrane Library (CENTRAL) and Web of Science for relevant studies published up to 13 July 2025.