A versatile, fast and unbiased method for estimation of gene-by-environment interaction effects on biobank-scale datasets.

Matteo Di Scipio, Mohammad Khan, Shihong Mao, Michael Chong, Conor Judge, Nazia Pathan, Nicolas Perrot, Walter Nelson, Ricky Lali, Shuang Di, Robert Morton, Jeremy Petch, Guillaume Paré

Nat Commun

Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada.

Published: August 2023

Identification of gene-by-environment interactions (GxE) is crucial to understanding the interplay of environmental effects on complex traits. However, current methods for evaluating GxE on biobank-scale datasets have limitations. We introduce MonsterLM, a multiple linear regression method that does not rely on model specification and provides unbiased estimates of the variance explained by GxE. We demonstrate the robustness of MonsterLM through comprehensive genome-wide simulations using real genetic data from 325,989 individuals. We estimate GxE using waist-to-hip ratio, smoking, and exercise as the environmental variables on 13 outcomes (N = 297,529 to 325,989) in the UK Biobank. GxE variance is significant for 8 environment-outcome pairs, with estimates ranging from 0.009 to 0.071. The majority of GxE variance involves SNPs without strong marginal or interaction associations. We observe modest improvements in polygenic score prediction when incorporating GxE. Our results imply a significant contribution of GxE to complex trait variance, and we show that MonsterLM is well suited to estimating it with biobank-scale data.
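The general idea of estimating GxE variance with multiple linear regression can be illustrated with a minimal sketch. This is not the MonsterLM implementation: it assumes simulated genotypes, a single standardized environmental variable, and the gain in adjusted R² from adding SNP-by-environment product terms as the variance estimate. All function names and parameter values (`simulate`, `estimate_gxe_variance`, `h2_g`, `h2_gxe`, `e_var`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def adjusted_r2(y, X):
    """Adjusted R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    # Penalise for the p fitted predictors so the estimate is not inflated
    # simply by adding more SNP terms.
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def simulate(n=20000, m=50, h2_g=0.10, h2_gxe=0.05, e_var=0.05, seed=0):
    """Simulate a phenotype with additive, environmental and GxE components.

    Assumption: independent SNPs and an environment independent of genotype.
    """
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, size=m)
    G = rng.binomial(2, maf, size=(n, m)).astype(float)
    G = (G - G.mean(axis=0)) / G.std(axis=0)       # standardise each SNP
    E = rng.standard_normal(n)                     # standardised environment

    def scaled(component, variance):
        # Rescale a component so its sample variance equals `variance`.
        return component / component.std() * np.sqrt(variance)

    g = scaled(G @ rng.standard_normal(m), h2_g)
    gxe = scaled((G * E[:, None]) @ rng.standard_normal(m), h2_gxe)
    noise_var = 1.0 - h2_g - h2_gxe - e_var
    y = g + gxe + np.sqrt(e_var) * E + rng.normal(0.0, np.sqrt(noise_var), n)
    return y, G, E


def estimate_gxe_variance(y, G, E):
    """Gain in adjusted R^2 from adding SNP-by-environment product terms."""
    base = np.column_stack([G, E])
    full = np.column_stack([G, E, G * E[:, None]])
    return adjusted_r2(y, full) - adjusted_r2(y, base)


if __name__ == "__main__":
    y, G, E = simulate()
    print(f"estimated GxE variance: {estimate_gxe_variance(y, G, E):.3f}")
```

In this toy setting the estimate should land close to the simulated `h2_gxe`; real biobank data would additionally require handling linkage disequilibrium, covariates, and far larger SNP sets, none of which this sketch attempts.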

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10457310
DOI: http://dx.doi.org/10.1038/s41467-023-40913-7

Similar Publications

Fast Phenotype Simulation for Genotype Representation Graphs.

bioRxiv

August 2025

Department of Computational Biology, Cornell University, Ithaca, NY.

Aditya Syam, Chris Adonizio, Xinzhu Wei

Motivation: The Genotype Representation Graph (GRG) [DeHaas et al., 2025] is a graph representation of whole genome polymorphisms, designed to encode the variant hard-call information in phased whole genomes. It encodes the genotypes as an extremely compact graph that can be traversed efficiently, enabling dynamic programming-style algorithms on applications such as genome-wide association studies that run faster on biobank-scale data than existing alternatives.

Gut-brain nexus: Mapping multimodal links to neurodegeneration at biobank scale.

Sci Adv

August 2025

Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.

Mohammad Shafieinouri, Samantha Hong, Paul Suhwan Lee, Spencer M Grant, Marzieh Khani

Alzheimer's disease (AD) and Parkinson's disease (PD) are influenced by genetic and environmental factors. We conducted a biobank-scale study to (i) identify endocrine, nutritional, metabolic, and digestive disorders with potential causal or temporal associations with AD/PD risk before diagnosis; (ii) assess plasma biomarkers' specificity for AD/PD in the context of co-occurring gut related traits and disorders; and (iii) integrate multimodal datasets to enhance AD/PD prediction. Our findings show that several disorders were associated with increased AD/PD risk before diagnosis, with variation in the strength and timing of associations across conditions.

MutBERT: probabilistic genome representation improves genomics foundation models.

Bioinformatics

July 2025

Data Science and Analytics Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511453, China.

Weicai Long, Houcheng Su, Jiaqi Xiong, Yanlin Zhang

Motivation: Understanding the genomic foundation of human diversity and disease requires models that effectively capture sequence variation, such as single nucleotide polymorphisms (SNPs). While recent genomic foundation models have scaled to larger datasets and multi-species inputs, they often fail to account for the sparsity and redundancy inherent in human population data, such as those in the 1000 Genomes Project. SNPs are rare in humans, and current masked language models (MLMs) trained directly on whole-genome sequences may struggle to efficiently learn these variations.

Study design and the sampling of deleterious rare variants in biobank-scale datasets.

Proc Natl Acad Sci U S A

June 2025

Department of Human Genetics, University of Chicago, Chicago, IL 60637.

Margaret C Steiner, Daniel P Rice, Arjun Biddanda, Mariadaria K Ianni-Ravn, Christian Porras

One key component of study design in population genetics is the "geographic breadth" of a sample (i.e., how broad a region across which individuals are sampled).

Analysis-ready VCF at Biobank scale using Zarr.

Gigascience

January 2025

Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, OX3 7LF, UK.

Eric Czech, Will Tyler, Tom White, Ben Jeffery, Timothy R Millar

Background: Variant Call Format (VCF) is the standard file format for interchanging genetic variation data and associated quality control metrics. The usual row-wise encoding of the VCF data model (either as text or packed binary) emphasizes efficient retrieval of all data for a given variant, but accessing data on a field or sample basis is inefficient. The Biobank-scale datasets currently available consist of hundreds of thousands of whole genomes and hundreds of terabytes of compressed VCF.
