Publications by Iain S Forrest | LitMetric

Publications by authors named "Iain S Forrest"

Machine learning-based penetrance of genetic variants.

Iain S Forrest, Ha My T Vy, Ghislain Rocheleau, Daniel M Jordan, Ben O Petrazzini

Science

August 2025

Accurate variant penetrance estimation is crucial for precision medicine. We constructed machine learning (ML) models for 10 diseases using 1,347,298 participants with electronic health records, then applied them to an independent cohort with linked exome data. Resulting probabilities were used to evaluate ML penetrance of 1648 rare variants in 31 autosomal dominant disease-predisposition genes.

Genetic analyses of eight complex diseases using predicted continuous representations of disease.

Robert Chen, Ghislain Rocheleau, Ben Omega Petrazzini, Iain S Forrest, Joshua K Park

Cell Rep Methods

August 2025

We evaluated whether predicted continuous disease representations could enhance genetic discovery beyond case-control genome-wide association study (GWAS) phenotypes across eight complex diseases in up to 485,448 UK Biobank participants. Predicted phenotypes had high genetic correlations with case-control phenotypes (median r = 0.66) but identified more independent associations (median 306 versus 125).

Using large-scale population-based data to improve disease risk assessment of clinical variants.

Iain S Forrest, Kuan-Lin Huang, Julie M Eggington, Wendy K Chung, Daniel M Jordan

Nat Genet

July 2025

Understanding the disease risk of genetic variants is fundamental to precision medicine. Estimates of penetrance (the probability of disease for individuals with a variant allele) rely on disease-specific cohorts, clinical testing and emerging electronic health record (EHR)-linked biobanks. These data sources, while valuable, each have limitations in quality, representativeness and analyzability.

Evaluation of a machine learning-based metabolic marker for coronary artery disease in the UK Biobank.

Kyle Gibson, Iain S Forrest, Ben O Petrazzini, Áine Duffy, Joshua K Park

Atherosclerosis

February 2025

Background and Aims: An in silico quantitative score of coronary artery disease (ISCAD), built using machine learning and clinical data from electronic health records, has been shown to result in gradations of risk of subclinical atherosclerosis, coronary artery disease (CAD) sequelae, and mortality. Large-scale metabolite biomarker profiling provides increased portability and objectivity in machine learning for disease prediction and gradation. However, these models have not been fully leveraged.

Ensemble and consensus approaches to prediction of recessive inheritance for missense variants in human disease.

Ben O Petrazzini, Daniel J Balick, Iain S Forrest, Judy Cho, Ghislain Rocheleau

Cell Rep Methods

December 2024

Article Synopsis

Mode of inheritance (MOI) is crucial for understanding pathogenic variants, yet most variants lack this information, particularly impacting recessive diseases.
MOI-Pred and ConMOI are new tools developed to predict variant pathogenicity by incorporating MOI, with MOI-Pred focusing on both dominant and recessive variants through evolutionary and functional data.
Both tools have shown high accuracy in benchmarks and real-world evaluations, with ConMOI outperforming individual methods, underscoring the benefits of using a consensus approach for variant predictions.

Exome sequence analysis identifies rare coding variants associated with a machine learning-based marker for coronary artery disease.

Ben Omega Petrazzini, Iain S Forrest, Ghislain Rocheleau, Ha My T Vy, Carla Márquez-Luna

Nat Genet

July 2024

Article Synopsis

Coronary artery disease (CAD) involves a mix of risk factors and processes, and a new machine learning-based score can help track its progression and severity.
Researchers tested this score against rare gene variants in different biobanks and found significant associations in 17 genes, with 14 receiving prior support related to CAD.
The study suggests that there are likely more ultrarare gene variants associated with CAD, highlighting how digital tools can improve genetic research in complex diseases.

Muesli Intake May Protect Against Coronary Artery Disease: Mendelian Randomization on 13 Dietary Traits.

Joshua K Park, Ben Omega Petrazzini, Shantanu Bafna, Áine Duffy, Iain S Forrest

JACC Adv

April 2024

Background: Diet is a key modifiable risk factor of coronary artery disease (CAD). However, the causal effects of specific dietary traits on CAD risk remain unclear. With the expansion of dietary data in population biobanks, Mendelian randomization (MR) could help enable the efficient estimation of causality in diet-disease associations.

Genome-first evaluation with exome sequence and clinical data uncovers underdiagnosed genetic disorders in a large healthcare system.

Iain S Forrest, Áine Duffy, Joshua K Park, Ha My T Vy, Louis R Pasquale

Cell Rep Med

May 2024

Article Synopsis

Population-based genomic screening helps identify individuals at risk for diseases by analyzing their genetic variants alongside their health records.
In a study of over 29,000 participants, researchers found 614 individuals with significant genetic variants, but 76% of these cases had no prior clinical diagnosis.
The findings suggest that genomic screening may uncover previously undiagnosed conditions, showing a higher prevalence of harmful genetic variants than clinical diagnoses and illustrating the importance of genetic testing in identifying untreated diseases.

Development of a human genetics-guided priority score for 19,365 genes and 399 drug indications.

Áine Duffy, Ben Omega Petrazzini, David Stein, Joshua K Park, Iain S Forrest

Nat Genet

January 2024

Studies have shown that drug targets with human genetic support are more likely to succeed in clinical trials. Hence, a tool integrating genetic evidence to prioritize drug target genes is beneficial for drug discovery. We built a genetic priority score (GPS) by integrating eight genetic features with drug indications from the Open Targets and SIDER databases.

Machine learning-based markers for CAD - Authors' reply.

Iain S Forrest, Ben O Petrazzini, Ron Do

Lancet

July 2023

Cholesterol Contributes to Risk, Severity, and Machine Learning-Driven Diagnosis of Lyme Disease.

Iain S Forrest, Anya J O'Neal, Joao H F Pedra, Ron Do

Clin Infect Dis

September 2023

Background: Lyme disease is the most prevalent vector-borne disease in the US, yet its host factors are poorly understood and diagnostic tests are limited. We evaluated patients in a large health system to uncover cholesterol's role in the susceptibility, severity, and machine learning-based diagnosis of Lyme disease.

Methods: A longitudinal health system cohort comprised 1 019 175 individuals with electronic health record data and 50 329 with linked genetic data.

A machine learning model identifies patients in need of autoimmune disease testing using electronic health records.

Iain S Forrest, Ben O Petrazzini, Áine Duffy, Joshua K Park, Anya J O'Neal

Nat Commun

April 2023

Systemic autoimmune rheumatic diseases (SARDs) can lead to irreversible damage if left untreated, yet these patients often endure long diagnostic journeys before being diagnosed and treated. Machine learning may help overcome the challenges of diagnosing SARDs and inform clinical decision-making. Here, we developed and tested a machine learning model to identify patients who should receive rheumatological evaluation for SARDs using longitudinal electronic health records of 161,584 individuals from two institutions.

Phenome-wide Mendelian randomization study of plasma triglyceride levels and 2600 disease traits.

Joshua K Park, Shantanu Bafna, Iain S Forrest, Áine Duffy, Carla Marquez-Luna

Elife

March 2023

Background: Causality between plasma triglyceride (TG) levels and atherosclerotic cardiovascular disease (ASCVD) risk remains controversial despite more than four decades of study and two recent landmark trials, STRENGTH, and REDUCE-IT. Further unclear is the association between TG levels and non-atherosclerotic diseases across organ systems.

Methods: Here, we conducted a phenome-wide, two-sample Mendelian randomization (MR) analysis using inverse-variance weighted (IVW) regression to systematically infer the causal effects of plasma TG levels on 2600 disease traits in the European ancestry population of UK Biobank.

Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts.

Iain S Forrest, Ben O Petrazzini, Áine Duffy, Joshua K Park, Carla Marquez-Luna

Lancet

January 2023

Background: Binary diagnosis of coronary artery disease does not preserve the complexity of disease or quantify its severity or its associated risk with death; hence, a quantitative marker of coronary artery disease is warranted. We evaluated a quantitative marker of coronary artery disease derived from probabilities of a machine learning model.

Methods: In this cohort study, we developed and validated a coronary artery disease-predictive machine learning model using 95 935 electronic health records and assessed its probabilities as in-silico scores for coronary artery disease (ISCAD; range 0 [lowest probability] to 1 [highest probability]) in participants in two longitudinal biobank cohorts.

A tissue-level phenome-wide network map of colocalized genes and phenotypes in the UK Biobank.

Ghislain Rocheleau, Iain S Forrest, Áine Duffy, Shantanu Bafna, Amanda Dobbyn

Commun Biol

August 2022

Phenome-wide association studies identified numerous loci associated with traits and diseases. To help interpret these associations, we constructed a phenome-wide network map of colocalized genes and phenotypes. We generated colocalized signals using the Genotype-Tissue Expression data and genome-wide association results in UK Biobank.

Penetrance of Deleterious Clinical Variants-Reply.

Iain S Forrest, Girish N Nadkarni, Ron Do

JAMA

May 2022

Genome-first recall of healthy individuals by polygenic risk score reveals differences in coronary artery calcium.

Iain S Forrest, Lili Chan, Kumardeep Chaudhary, Aparna Saha, Huei Hsun Wen

Am Heart J

August 2022

Genetic risk for coronary artery disease (CAD) is commonly measured with polygenic risk scores (PRS); yet, the relationship of atherosclerotic burden with PRS in healthy individuals not at high clinical risk for CAD (ie, without a high pooled cohort equations [PCE] score) is unknown. Here, we implemented a novel recall-by-PRS strategy to measure coronary artery calcium (CAC) scores prospectively in 53 healthy individuals with extreme high PRS (median [IQR] PRS = 94% [83-98]) and low PRS (median [IQR] PRS = 3.6% [1.

Coronary Risk Estimation Based on Clinical Data in Electronic Health Records.

Ben O Petrazzini, Kumardeep Chaudhary, Carla Márquez-Luna, Iain S Forrest, Ghislain Rocheleau

J Am Coll Cardiol

March 2022

Background: Clinical features from electronic health records (EHRs) can be used to build a complementary tool to predict coronary artery disease (CAD) susceptibility.

Objectives: The purpose of this study was to determine whether an EHR score can improve CAD prediction and reclassification 1 year before diagnosis, beyond conventional clinical guidelines as determined by the pooled cohort equations (PCE) and a polygenic risk score for CAD.

Methods: We applied a machine learning framework using clinical features from the EHR in a multiethnic, clinical care cohort (BioMe) comprising 555 CAD cases and 6,349 control subjects and in a population-based cohort (UK Biobank) comprising 3,130 CAD cases and 378,344 control subjects for external validation.

Genetic and phenotypic profiling of supranormal ejection fraction reveals decreased survival and underdiagnosed heart failure.

Iain S Forrest, Ghislain Rocheleau, Shantanu Bafna, Edgar Argulian, Jagat Narula

Eur J Heart Fail

November 2022

Aims: Individuals with supranormal left ventricular ejection fraction (snLVEF; LVEF >70%) have increased mortality. However, the genetic and phenotypic profile of snLVEF remains unknown. This study aimed to determine the relationship of both snLVEF genetic risk and phenotype with survival and underdiagnosed heart failure (HF).

Population-Based Penetrance of Deleterious Clinical Variants.

Iain S Forrest, Kumardeep Chaudhary, Ha My T Vy, Ben O Petrazzini, Shantanu Bafna

JAMA

January 2022

Importance: Population-based assessment of disease risk associated with gene variants informs clinical decisions and risk stratification approaches.

Objective: To evaluate the population-based disease risk of clinical variants in known disease predisposition genes.

Design, Setting, and Participants: This cohort study included 72 434 individuals with 37 780 clinical variants who were enrolled in the BioMe Biobank from 2007 onwards with follow-up until December 2020 and the UK Biobank from 2006 to 2010 with follow-up until June 2020.

Derivation and Validation of Genome-Wide Polygenic Score for Ischemic Heart Failure.

Ishan Paranjpe, Noah L Tsao, Jessica K De Freitas, Renae Judy, Kumardeep Chaudhary, Iain S Forrest

J Am Heart Assoc

November 2021

Background: Despite advances in cardiovascular disease and risk factor management, mortality from ischemic heart failure (HF) in patients with coronary artery disease (CAD) remains high. Given the partial role of genetics in HF and lack of reliable risk stratification tools, we developed and validated a polygenic risk score for HF in patients with CAD, which we term HF-PRS.

Methods and Results: Using summary statistics from a recent genome-wide association study for HF, we developed candidate PRSs in the Mount Sinai BioMe CAD patient cohort (N=6274) by using the pruning and thresholding method and LDPred.

Non-invasive ventilation versus mechanical ventilation in hypoxemic patients with COVID-19.

Iain S Forrest, Suraj K Jaladanki, Ishan Paranjpe, Benjamin S Glicksberg, Girish N Nadkarni

Infection

October 2021

Purpose: Limited mechanical ventilators (MV) during the Coronavirus disease (COVID-19) pandemic have led to the use of non-invasive ventilation (NIV) in hypoxemic patients, which has not been studied well. We aimed to assess the association of NIV versus MV with mortality and morbidity during respiratory intervention among hypoxemic patients admitted with COVID-19.

Methods: We performed a retrospective multi-center cohort study across 5 hospitals during March-April 2020.

Genetic pleiotropy of ERCC6 loss-of-function and deleterious missense variants links retinal dystrophy, arrhythmia, and immunodeficiency in diverse ancestries.

Iain S Forrest, Kumardeep Chaudhary, Ha My T Vy, Shantanu Bafna, Soyeon Kim

Hum Mutat

August 2021

Biobanks with exomes linked to electronic health records (EHRs) enable the study of genetic pleiotropy between rare variants and seemingly disparate diseases. We performed robust clinical phenotyping of rare, putatively deleterious variants (loss-of-function [LoF] and deleterious missense variants) in ERCC6, a gene implicated in inherited retinal disease. We analyzed 213,084 exomes, along with a targeted set of retinal, cardiac, and immune phenotypes from two large-scale EHR-linked biobanks.

Genome-wide polygenic risk score for retinopathy of type 2 diabetes.

Iain S Forrest, Kumardeep Chaudhary, Ishan Paranjpe, Ha My T Vy, Carla Marquez-Luna

Hum Mol Genet

May 2021

Diabetic retinopathy (DR) is a common consequence in type 2 diabetes (T2D) and a leading cause of blindness in working-age adults. Yet, its genetic predisposition is largely unknown. Here, we examined the polygenic architecture underlying DR by deriving and assessing a genome-wide polygenic risk score (PRS) for DR.

Tissue-specific genetic features inform prediction of drug side effects in clinical trials.

Áine Duffy, Marie Verbanck, Amanda Dobbyn, Hong-Hee Won, Joshua L Rein, Iain S Forrest

Sci Adv

September 2020

Adverse side effects often account for the failure of drug clinical trials. We evaluated whether a phenome-wide association study (PheWAS) of 1167 phenotypes in >360,000 U.K.
