Article Abstract

Background: Clinical risk prediction models integrated into digitized health care informatics systems hold promise for personalized primary prevention and care, a core goal of precision health. Fairness metrics are important tools for evaluating potential disparities across sensitive features, such as sex and race or ethnicity, in the field of prediction modeling. However, fairness metric usage in clinical risk prediction models remains infrequent, sporadic, and rarely empirically evaluated.

Objective: We seek to assess the uptake of fairness metrics in clinical risk prediction modeling through an empirical evaluation of popular prediction models for 2 diseases: 1 chronic and 1 infectious.

Methods: We conducted a scoping literature review in November 2023 of recent high-impact publications on clinical risk prediction models for cardiovascular disease (CVD) and COVID-19 using Google Scholar.

Results: Our review resulted in a shortlist of 23 CVD-focused articles and 22 COVID-19 pandemic-focused articles. No articles evaluated fairness metrics. Of the CVD-focused articles, 26% used a sex-stratified model, and of those with race or ethnicity data, 92% had study populations that were more than 50% from 1 race or ethnicity. Of the COVID-19 models, 9% used a sex-stratified model, and of those that included race or ethnicity data, 50% had study populations that were more than 50% from 1 race or ethnicity. No articles for either disease stratified their models by race or ethnicity.

Conclusions: Our review shows that the use of fairness metrics for evaluating differences across sensitive features is rare, despite their ability to identify inequality and flag potential gaps in prevention and care. We also find that training data remain largely racially and ethnically homogeneous, demonstrating an urgent need for diversifying study cohorts and data collection. We propose an implementation framework to initiate change, calling for better connections between theory and practice when it comes to the adoption of fairness metrics for clinical risk prediction. We hypothesize that this integration will lead to a more equitable prediction world.
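The group fairness metrics whose absence the review documents can be illustrated with a minimal sketch (not from the article; the toy data, group labels, and function names below are hypothetical). Demographic parity compares positive-prediction rates across sensitive-feature groups, while equal opportunity compares true-positive rates; a large gap in either flags a potential disparity worth investigating.

```python
# Minimal sketch of two common group fairness metrics for a binary
# clinical risk classifier, computed across a sensitive feature
# (here a hypothetical binary "sex" attribute, groups "F" and "M").

def demographic_parity_diff(y_pred, group):
    """Max gap in positive-prediction rates between groups."""
    rates = {}
    for g in set(group):
        preds = [p for p, gi in zip(y_pred, group) if gi == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())

def equal_opportunity_diff(y_true, y_pred, group):
    """Max gap in true-positive rates (recall) between groups."""
    tprs = {}
    for g in set(group):
        pos = [(p, t) for p, t, gi in zip(y_pred, y_true, group)
               if gi == g and t == 1]
        tprs[g] = sum(p for p, _ in pos) / len(pos)
    return max(tprs.values()) - min(tprs.values())

# Toy labels and predictions for two groups of four patients each.
y_true = [1, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
group  = ["F", "F", "F", "F", "M", "M", "M", "M"]

print(demographic_parity_diff(y_pred, group))          # → 0.5
print(equal_opportunity_diff(y_true, y_pred, group))   # → 0.5
```

In this toy example the model predicts positives for 75% of group F but only 25% of group M, and catches all of F's true positives but only half of M's; reporting such gaps alongside overall accuracy is what the review's implementation framework calls for.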


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11966066 (PMC)
http://dx.doi.org/10.2196/66598 (DOI)

Publication Analysis

Top Keywords

fairness metrics (24), clinical risk (24), risk prediction (24), prediction models (20), race ethnicity (20), metrics clinical (12), prediction (9), precision health (8), prevention care (8), sensitive features (8)

Similar Publications

Applications of Federated Large Language Model for Adverse Drug Reactions Prediction: Scoping Review.

J Med Internet Res

September 2025

Department of Information Systems and Cybersecurity, The University of Texas at San Antonio, San Antonio, TX, United States.

Background: Adverse drug reactions (ADRs) present significant challenges in health care, where early prevention is vital for effective treatment and patient safety. Traditional supervised learning methods struggle with heterogeneous health care data due to its unstructured nature, regulatory constraints, and restricted access to sensitive personal identifiable information.

Objective: This review aims to explore the potential of federated learning (FL) combined with natural language processing and large language models (LLMs) to enhance ADR prediction.


Background: The growing adoption of diagnostic and prognostic algorithms in health care has led to concerns about the perpetuation of algorithmic bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration with varying levels of success and tradeoffs. However, there have been limited substantive efforts to address bias at the level of the data used to generate algorithms in health care datasets.


Unlabelled: This investigation explores whether machine learning can predict recidivism while addressing societal biases. To investigate this, we obtained conviction data from the UK's Police National Computer (PNC) on 346,685 records between January 1, 2000, and February 3, 2006 (His Majesty's Inspectorate of Constabulary. Use of the Police National Computer: An inspection of the ACRO Criminal Records Office. Birmingham, https://assets-hmicfrs.


Introduction: Synthetic data generation is a rapidly evolving field, with significant potential for improving data privacy. However, evaluating the performance of synthetic data generation methods, especially the tradeoff between fairness and utility of the generated data, remains a challenge.

Methodology: In this work, we present our comprehensive framework, which evaluates fair synthetic data generation methods, benchmarking them against state-of-the-art synthesizers.


Importance: Focused professional practice evaluation (FPPE) was designed to increase patient safety, and anonymous reporting systems are often implemented to empower staff at all levels of an organization to speak up without the fear of reproach. However, the impact of these systems on patient outcomes remains largely unexamined. In this study, we survey physicians to assess whether they believe these mechanisms improve patient care or well-being.
