Evaluating the predictive performance of presence-absence models: Why can the same model appear excellent or poor?

Nerea Abrego, Otso Ovaskainen

Ecol Evol

Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland.

Published: December 2023

When comparing multiple models of species distribution, models yielding higher predictive performance are clearly to be favored. A more difficult question is how to decide whether even the best model is "good enough". Here, we clarify key choices and metrics related to evaluating the predictive performance of presence-absence models. We use a hierarchical case study to evaluate how four metrics of predictive performance (AUC, Tjur's R², max-Kappa, and max-TSS) relate to each other, the random and fixed effects parts of the model, the spatial scale at which predictive performance is measured, and the cross-validation strategy chosen. We demonstrate that the very same metric can achieve different values for the very same model, even when similar cross-validation strategies are followed, depending on the spatial scale at which predictive performance is measured. Among metrics, Tjur's R² and max-Kappa generally increase with species' prevalence, whereas AUC and max-TSS are largely independent of prevalence. Thus, Tjur's R² and max-Kappa often reach lower values when measured at the smallest scales considered in the study, while AUC and max-TSS reach similar values across the different spatial levels included in the study. However, the metrics provide complementary insights on predictive performance. The very same model may appear excellent or poor not only due to the applied metric, but also due to how exactly predictive performance is calculated, calling for great caution in the interpretation of predictive performance. The most comprehensive evaluation of predictive performance can be obtained by evaluating predictive performance through a combination of measures providing complementary insights. Instead of following simple rules of thumb or focusing on absolute values, we recommend comparing the achieved predictive performance to the researcher's own a priori expectations of how easy it is to make predictions related to the same question that the model is used for.
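The four metrics named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration using the standard textbook definitions, not the authors' code: the function names (`auc`, `tjur_r2`, `max_kappa`, `max_tss`) are hypothetical, thresholds are taken as the observed predicted probabilities, and both presences and absences are assumed to occur in the data.

```python
import numpy as np

def auc(y, p):
    """Probability that a random presence gets a higher prediction than a random absence (ties count 0.5)."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def tjur_r2(y, p):
    """Mean predicted probability at presences minus mean predicted probability at absences."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    return float(p[y == 1].mean() - p[y == 0].mean())

def _confusion(y, p, t):
    """Confusion-matrix counts when predicting presence wherever p >= t."""
    pred = p >= t
    tp = int(np.sum(pred & (y == 1)))
    fp = int(np.sum(pred & (y == 0)))
    fn = int(np.sum(~pred & (y == 1)))
    tn = int(np.sum(~pred & (y == 0)))
    return tp, fp, fn, tn

def _kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected) if expected < 1 else 0.0

def _tss(tp, fp, fn, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1

def _max_over_thresholds(y, p, stat):
    """Maximum of a thresholded statistic over all observed prediction values."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    return float(max(stat(*_confusion(y, p, t)) for t in np.unique(p)))

def max_kappa(y, p):
    return _max_over_thresholds(y, p, _kappa)

def max_tss(y, p):
    return _max_over_thresholds(y, p, _tss)

if __name__ == "__main__":
    # Toy data (made up for illustration): observed presence-absence and predicted probabilities.
    y = np.array([0, 0, 1, 1])
    p = np.array([0.1, 0.4, 0.35, 0.8])
    for name, fn in [("AUC", auc), ("Tjur's R2", tjur_r2), ("max-Kappa", max_kappa), ("max-TSS", max_tss)]:
        print(name, round(fn(y, p), 3))
```

Because Kappa's chance-agreement term depends on the marginal frequencies of presences and absences, while TSS combines sensitivity and specificity that are each computed within one class, a sketch like this can be used to explore how the metrics respond when prevalence is varied in simulated data.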

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10726276
DOI: http://dx.doi.org/10.1002/ece3.10784

Similar Publications

Early warning of harmful cyanobacteria blooms based on high frequency in situ monitoring and intelligible machine learning modelling: The case study of Lake Müggelsee (Germany).

Water Res

September 2025

Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany.

Friedrich Recknagel, Kun Shan, Rita Adrian, Jan Köhler

Driven by eutrophication and global warming, the occurrence and frequency of harmful cyanobacteria blooms (CyanoHABs) are increasing worldwide, posing a serious threat to human health and biodiversity. Early warning enables precautional control measures of CyanoHABs within water bodies and in water works, and it becomes operational with high frequency in situ data (HFISD) of water quality and forecasting models by machine learning (ML). However, the acceptance of early warning systems by end-users relies significantly on the interpretability and generalizability of underlying models, and their operability.


Integrating opinion dynamics and differential game modeling for sustainable groundwater management.

Water Res

September 2025

College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China.

Hao Chen, Changxin Xu, Chenchen Tong

Groundwater overextraction presents persistent challenges due to strategic interdependence among decentralized users. While game-theoretic models have advanced the analysis of individual incentives and collective outcomes, most frameworks assume fully rational agents and neglect the role of cognitive and social factors. This study proposes a coupled model that integrates opinion dynamics with a differential game of groundwater extraction, capturing the interaction between institutional authority and evolving stakeholder preferences.


Predicting In-Hospital Cardiac Arrest Using Machine Learning Models: Protocol for a Scoping Review.

JMIR Res Protoc

September 2025

University of Nevada, Las Vegas, Las Vegas, NV, United States.

Mina Attin, Bryar Shareef, Nelson Appiah-Agyei, Farzana Mahamud Rini, Xan Goodman

Background: In-hospital cardiac arrest (IHCA) remains a public health conundrum with high morbidity and mortality rates. While early identification of high-risk patients could enable preventive interventions and improve survival, evidence on the effectiveness of current prediction methods remains inconclusive. Limited research exists on patients' prearrest pathophysiological status and predictive and prognostic factors of IHCA, highlighting the need for a comprehensive synthesis of predictive methodologies.


A Novel CART-Driven Decision Tree Combining NLR and CRP for Early Prognostication of Severe Acute Pancreatitis: A Prospective Vietnamese Cohort Study.

Clin Transl Gastroenterol

September 2025

Department of Internal Medicine, School of Medicine, University of Medicine and Pharmacy at Ho Chi Minh City, Vietnam.

Tien Manh Huynh, An Tran, Duy Thanh Tran, Yen Hoang Thi Dao, Thong Duy Vo

Background: Severe acute pancreatitis (SAP) is a life-threatening condition requiring early risk stratification. While the Bedside Index for Severity in Acute Pancreatitis (BISAP) is widely used, its reliance on complex parameters limits its applicability in resource-constrained settings. This study introduces a decision tree model based on Classification and Regression Tree (CART) analysis, utilizing Neutrophil-to-Lymphocyte Ratio (NLR) and C-reactive Protein (CRP), as a simpler alternative for early SAP prediction.


STmiR: A Novel XGBoost-based framework for spatially resolved miRNA activity prediction in cancer transcriptomics.

PLoS One

September 2025

Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China.

Jiaqi Yuan, Peng Xu, Zheng Ye, Wenbin Liu

MicroRNAs (miRNAs) are critical regulators of gene expression in cancer biology, yet their spatial dynamics within tumor microenvironments (TMEs) remain underexplored due to technical limitations in current spatial transcriptomics (ST) technologies. To address this gap, we present STmiR, a novel XGBoost-based framework for spatially resolved miRNA activity prediction. STmiR integrates bulk RNA-seq data (TCGA and CCLE) with spatial transcriptomics profiles to model nonlinear miRNA-mRNA interactions, achieving high predictive accuracy (Spearman's ρ > 0.
