The broader application of polygenic risk score (PRS) is hindered by the limited transferability of PRS developed in Europeans to non-European populations. While many statistical methods have been developed to improve the performance of PRS in non-European populations, most of them focused on discrete genetic ancestry clusters and did not consider admixed individuals. Admixed individuals pose a unique challenge for PRS calculation due to the complexity of local ancestry and cross-ancestry effect sizes.
Methylome-wide association studies (MWASs) have identified many 5'-cytosine-phosphate-guanine-3' (CpG) sites associated with complex traits. Several methods have been developed to predict CpG methylation levels from genotypes when the direct measurements of methylation are unavailable. To date, the published methods have mostly used datasets from populations of European ancestry to train prediction models for methylations, which limits the generalizability of methylome-wide association study to non-European populations.
Many multi-population polygenic risk score (PRS) methods have been proposed to improve prediction accuracy in underrepresented populations; however, no single method outperforms other methods across all data scenarios. Although integrating PRS results across multiple methods and populations may lead to more accurate predictions, this approach may be limited by the availability of individual-level tuning data to calculate combination weights. In this manuscript, we introduce MIXPRS, a robust PRS integration framework based on data fission principles, to effectively combine multiple multi-population PRS methods using only genome-wide association study (GWAS) summary statistics from multiple populations.
Background: Polygenic scores (PGSs) have shown promise in predicting disease risk, but their predictive accuracy remains limited for many complex diseases. Leveraging the shared genetic architecture among correlated traits may improve prediction performance.
Methods: We developed a flexible framework for constructing multi-trait PGSs by integrating candidate PGSs (N=2,651) derived from publicly available GWAS summary statistics (N=51), using single-trait, MTAG-all, and MTAG-pairwise approaches.
Genetic risk prediction for non-European populations is hindered by limited Genome-Wide Association Study (GWAS) sample sizes and small tuning datasets. We propose JointPRS, a data-adaptive framework that leverages genetic correlations across multiple populations using GWAS summary statistics. It achieves accurate predictions without individual-level tuning data and remains effective in the presence of a small tuning set thanks to its data-adaptive approach.
Polygenic risk score has become increasingly popular for predicting the value of complex traits. In many settings, polygenic risk score is used as a covariate in regression analysis to study the association between different phenotypes. However, measurement error in polygenic risk score causes attenuation bias in the estimation of regression coefficients.
Transl Psychiatry
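The attenuation effect described above can be illustrated with a small simulation. This is a minimal sketch under the classical measurement-error model, not the method from the article: the function names, sample size, effect size, and error variance are all hypothetical choices made for illustration. When the error is independent of the true score, the regression slope on the noisy score is expected to shrink by the factor var(true) / (var(true) + var(error)).

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=0.5, error_sd=1.0, seed=0):
    """Return (slope on the true score, slope on the noisy score)."""
    rng = random.Random(seed)
    # Hypothetical "true" genetic score with unit variance.
    true_score = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Observed score = true score + independent measurement error.
    noisy_score = [s + rng.gauss(0.0, error_sd) for s in true_score]
    # The outcome depends on the true score, not on the noisy one.
    outcome = [beta * s + rng.gauss(0.0, 1.0) for s in true_score]
    return ols_slope(true_score, outcome), ols_slope(noisy_score, outcome)


slope_true, slope_noisy = simulate()
# Expected shrinkage factor: var(true) / (var(true) + var(error)) = 1 / 2.
reliability = 1.0 / (1.0 + 1.0 ** 2)
print(f"slope on true score:  {slope_true:.3f}")
print(f"slope on noisy score: {slope_noisy:.3f}")
print(f"beta * reliability:   {0.5 * reliability:.3f}")
```

With these settings the slope on the noisy score should land near half of the true coefficient, which is the bias that a correction method would need to undo.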
January 2025
Brain anatomy plays a key role in complex behaviors and mental disorders that are sexually divergent. However, our understanding of sex differences in brain anatomy remains relatively limited, particularly regarding the underlying genetic and molecular mechanisms that contribute to these differences. We performed the largest study of sex differences in brain volumes (N = 33,208) by examining sex differences both in the raw brain volumes and after controlling for whole-brain volume.
Hum Genomics
March 2024
With the development of next-generation sequencing technology, de novo variants (DNVs) with deleterious effects can be identified and investigated for their effects on birth defects such as congenital heart disease (CHD). However, statistical power is still limited for such studies because of the small sample size due to the high cost of recruiting and sequencing samples and the low occurrence of DNVs. DNV analysis is further complicated by genetic heterogeneity across diseased individuals.
Polygenic scores (PGSs) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single-nucleotide polymorphisms as predictors from their association statistics.
JNCI Cancer Spectr
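The idea of a sparse, summary-statistics-based score can be sketched as follows. This is not the Lassosum algorithm itself: it assumes standardized, mutually uncorrelated SNPs, a special case in which the Lasso solution reduces to soft-thresholding each marginal effect. The effect sizes, the penalty `lam`, and the genotype dosages are hypothetical values chosen for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0.0 if |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_weights(marginal_effects, lam):
    """Lasso-style weights, assuming uncorrelated standardized SNPs."""
    return [soft_threshold(b, lam) for b in marginal_effects]


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per SNP)."""
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical marginal GWAS effect sizes for four SNPs.
marginal = [0.30, -0.02, 0.01, -0.25]
weights = sparse_weights(marginal, lam=0.05)

# Hypothetical genotype for one individual.
score = polygenic_score([2, 1, 0, 1], weights)
print(weights)
print(score)
```

The two SNPs with small marginal effects receive a weight of exactly zero, so only a subset of SNPs contributes to the score. Real data require handling correlation between SNPs, which this sketch deliberately leaves out.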
February 2024
Background: Models with polygenic risk scores and clinical factors to predict risk of different cancers have been developed, but these models have been limited by the polygenic risk score-derivation methods and the incomplete selection of clinical variables.
Methods: We used UK Biobank to train the best polygenic risk scores for 8 cancers (bladder, breast, colorectal, kidney, lung, ovarian, pancreatic, and prostate cancers) and select relevant clinical variables from 733 baseline traits through extreme gradient boosting (XGBoost). Combining polygenic risk scores and clinical variables, we developed Cox proportional hazards models for risk prediction in these cancers.
The disparity in genetic risk prediction accuracy between European and non-European individuals highlights a critical challenge in health inequality. To bridge this gap, we introduce JointPRS, a novel method that models multiple populations jointly to improve genetic risk predictions for non-European individuals. JointPRS has three key features.
Polygenic risk score (PRS) has become increasingly popular for predicting the value of complex traits. In many settings, PRS is used as a covariate in regression analysis to study the association between different phenotypes. However, measurement error in PRS causes attenuation bias in the estimation of regression coefficients.
Genetic prediction accuracy for non-European populations is hindered by the limited sample size of genome-wide association study (GWAS) data in these populations. Additionally, it is challenging to tune model parameters with a small tuning dataset for methods that require tuning data, which is often the case for non-European samples. To address these challenges, we propose JointPRS, a novel, data-adaptive framework that simultaneously models multiple populations using GWAS summary statistics.
Background: A large proportion of pulmonary embolism (PE) heritability remains unexplained, particularly among the East Asian (EAS) population. Our study aims to expand the genetic architecture of PE and reveal more genetic determinants in Han Chinese.
Methods: We conducted the first genome-wide association study (GWAS) of PE in Han Chinese, then performed the GWAS meta-analysis based on the discovery and replication stages.
Most existing TWAS tools require individual-level eQTL reference data and thus are not applicable to summary-level reference eQTL datasets. The development of TWAS methods that can harness summary-level reference data is valuable to enable TWAS in broader settings and enhance power due to increased reference sample size. Thus, we develop a TWAS framework called OTTERS (Omnibus Transcriptome Test using Expression Reference Summary data) that adapts multiple polygenic risk score (PRS) methods to estimate eQTL weights from summary-level eQTL reference data and conducts an omnibus TWAS.
Am J Hum Genet
January 2023
Polygenic risk score (PRS) has demonstrated its great utility in biomedical research through identifying high-risk individuals for different diseases from their genotypes. However, the broader application of PRS to the general population is hindered by the limited transferability of PRS developed in Europeans to non-European populations. To improve PRS prediction accuracy in non-European populations, we develop a statistical method called SDPRX that can effectively integrate genome wide association study summary statistics from different populations.
PLoS Genet
October 2022
Genome-wide association studies (GWAS) can play an essential role in understanding the genetic basis of complex traits in plants and animals. Conventional SNP-based linear mixed models (LMM) that marginally test single nucleotide polymorphisms (SNPs) have successfully identified many loci with major and minor effects in many GWAS. In plants, the relatively small population size in GWAS and the high genetic diversity found in many plant species can impede mapping efforts on complex traits.
Although there are pronounced sex differences for psychiatric disorders, relatively little has been published on the heterogeneity of sex-specific genetic effects for these traits until very recently for adults. Much less is known about children because most psychiatric disorders will not manifest until later in life and existing studies for children on psychiatric traits such as cognitive functions are underpowered. We used results from publicly available genome-wide association studies for six psychiatric disorders and individual-level data from the Adolescent Brain Cognitive Development (ABCD) study and the UK Biobank (UKB) study to evaluate the associations between the predicted polygenic risk scores (PRS) of these six disorders and observed cognitive functions, behavioral and brain imaging traits.
Genetic prediction of complex traits has great promise for disease prevention, monitoring, and treatment. The development of accurate risk prediction models is hindered by the wide diversity of genetic architecture across different traits, limited access to individual-level data for training and parameter tuning, and the demand for computational resources. To overcome the limitations of most existing methods, which make explicit assumptions about the underlying genetic architecture and need a separate validation dataset for parameter tuning, we develop a summary statistics-based nonparametric method that does not rely on validation datasets to tune parameters.
Ring chromosomes occur when the ends of normally rod-shaped chromosomes fuse. In ring chromosome 20 (ring 20), intellectual disability and epilepsy are usually present, even if there is no deleted coding material; the mechanism by which individuals with complete ring chromosomes develop seizures and other phenotypic abnormalities is not understood. We investigated altered gene transcription as a contributing factor by performing RNA-sequencing (RNA-seq) analysis on blood from seven patients with ring 20, and 11 first-degree relatives (all parents).
PLoS Comput Biol
November 2020
To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed using gene expression as a mediating trait linking genetic variations and diseases. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on diseases by associating phenotypes with predicted expression levels. The success of these methods critically depends on the identification of eQTLs, which may not be functional in the corresponding tissue, due to linkage disequilibrium (LD) and the correlation of gene expression between tissues.
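The two-stage logic described above (predict expression from eQTL weights, then associate the predicted expression with a phenotype) can be sketched in a few lines. This is a toy illustration only, not any published TWAS tool: the genotype dosages, eQTL weights, and phenotype values are hypothetical, and a plain Pearson correlation stands in for a formal association test.

```python
import math


def predict_expression(genotypes, eqtl_weights):
    """Stage 1: predicted expression = weighted sum of SNP dosages."""
    return [sum(g * w for g, w in zip(row, eqtl_weights)) for row in genotypes]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical dosages for six individuals at two eQTL SNPs of one gene.
genotypes = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]]
# Hypothetical eQTL weights, assumed to come from a reference panel.
eqtl_weights = [0.5, -0.2]
# Hypothetical phenotype values for the same six individuals.
phenotype = [0.1, 0.6, 0.9, 0.2, -0.1, 0.7]

predicted = predict_expression(genotypes, eqtl_weights)
# Stage 2: associate predicted expression with the phenotype.
r = pearson(predicted, phenotype)
print(predicted)
print(round(r, 3))
```

Because the association is only as good as the stage 1 weights, misidentified eQTLs carry straight through to the gene-level result, which is the dependence the abstract points to.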
Background: We did a phase 2 trial of pembrolizumab in patients with non-small-cell lung cancer (NSCLC) or melanoma with untreated brain metastases to determine the activity of PD-1 blockade in the CNS. Interim results were previously published, and we now report an updated analysis of the full NSCLC cohort.
Methods: This was an open-label, phase 2 study of patients from the Yale Cancer Center (CT, USA).
Purpose: Limb-girdle muscular dystrophies (LGMD) are a genetically heterogeneous category of autosomal inherited muscle diseases. Many genes causing LGMD have been identified, and clinical trials are beginning for treatment of some genetic subtypes. However, even with the gene-level mechanisms known, it is still difficult to get a robust and generalizable prevalence estimation for each subtype due to the limited amount of epidemiology data and the low incidence of LGMDs.
Colorectal cancer (CRC) is among the most frequently occurring cancers worldwide. Baicalin is isolated from the roots of Scutellaria baicalensis and is its dominant flavonoid. Anticancer activity of baicalin has been evaluated in different types of cancers, especially in CRC.