Publications by authors named "Jonathan Marchini"

Understanding genetic differences between populations is essential for avoiding confounding in genome-wide association studies and improving polygenic score (PGS) portability. We developed a statistical pipeline to infer fine-scale Ancestry Components and applied it to UK Biobank data. Ancestry Components identify population structure not captured by widely used principal components, improving stratification correction for geographically correlated traits.


Gene-based burden tests are a popular and powerful approach for analysis of exome-wide association studies. These approaches combine sets of variants within a gene into a single burden score that is then tested for association. Typically, a range of burden scores are calculated and tested across a range of annotation classes and frequency bins.
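The collapsing step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method from the article: the function names `burden_scores` and `linear_association` are hypothetical, the burden score is a plain count of rare alternate alleles in one frequency bin, annotation classes are ignored, and the test is an unadjusted simple linear regression on a quantitative trait.

```python
import math

def burden_scores(genotypes, max_freq=0.01):
    """Collapse one gene's variants into a per-sample burden score.

    genotypes: list of per-sample lists of alternate-allele counts (0/1/2),
    one column per variant. Assumption: the alternate allele is the minor
    allele, so its frequency stands in for the minor allele frequency.
    Returns (scores, kept_variant_indices).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Keep only variants that are observed and fall in the chosen frequency bin.
    keep = [j for j in range(m) if 0 < freqs[j] <= max_freq]
    scores = [sum(row[j] for j in keep) for row in genotypes]
    return scores, keep

def linear_association(x, y):
    """Simple linear regression of y on x; returns (slope, t_statistic).

    Assumes x is not constant and the fit is not perfect (residual > 0).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

# Toy data: 8 samples, 3 variants; the third variant is common and is filtered out.
toy_genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1],
    [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1],
]
toy_phenotype = [2.0, 2.2, 1.0, 0.9, 1.1, 1.0, 0.8, 1.2]

toy_scores, toy_kept = burden_scores(toy_genotypes, max_freq=0.1)
toy_slope, toy_t = linear_association(toy_scores, toy_phenotype)
```

Repeating the call with different `max_freq` values (and, in a fuller version, different variant subsets per annotation class) would give the range of burden scores the abstract mentions.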


Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to enable the discovery of complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies.

Article Synopsis
  • The genetic factors contributing to stroke risk in South Asians remain largely unstudied; a recent study examined 75,000 Pakistanis using exome-wide sequencing.
  • A specific genetic variant, NOTCH3 p.Arg1231Cys, was found to be more common in South Asians (0.58%) compared to Western Europeans (0.019%) and was significantly linked to hemorrhagic and overall stroke risk.
  • This variant accounts for about 2.0% of hemorrhagic strokes and 1.1% of all strokes in South Asians, emphasizing the importance of including diverse populations in genetic research for better understanding and treatment of stroke.

We built a reference panel with 342 million autosomal variants using 78,195 individuals from the Genomics England (GEL) dataset, achieving a phasing switch error rate of 0.18% for European samples and imputation quality of r² = 0.75 for variants with minor allele frequencies as low as 2 × 10⁻⁴ in white British samples.

Article Synopsis
  • COVID-19 and influenza are respiratory illnesses caused by different viruses but share some symptoms and clinical risk factors, yet their genetic connections remain poorly understood.
  • A study involving over 18,000 influenza cases and nearly 276,000 control subjects found no shared genetic risk factors between COVID-19 and influenza, revealing specific gene variants linked only to influenza.
  • The research highlights the potential for targeting cell surface receptors involved in viral entry, showing that manipulating specific genes could lead to treatments that prevent both COVID-19 and influenza infections.
Article Synopsis
  • Researchers analyzed genetic data from nearly 1 million individuals to create a comprehensive catalogue of human protein-coding variations, shedding light on gene function and the frequency of rare coding variants.
  • The study identified over 10 million missense and 1.1 million loss-of-function variants, discovering 1,751 novel genes with rare biallelic loss-of-function variants and 3,988 genes intolerant to these variants.
  • They estimate that 3% of people carry a clinically significant genetic variant and provide public access to their data to enhance genetic interpretation and support precision medicine.
Article Synopsis
  • The Mexico City Prospective Study is a large-scale research initiative involving over 150,000 adults from urban areas in Mexico City, aimed at understanding genetic diversity and ancestry.
  • The study reveals a mix of Indigenous American, European, and African ancestries among participants, highlighting significant genetic differences and a unique genetic landscape within the Indigenous Mexican population.
  • Researchers created a valuable reference panel for genetic research, improving the accuracy of studying genetic variants in populations with high Indigenous ancestry, and providing essential resources for future genetic studies in both Mexico and the US.

Human genetic studies of smoking behavior have thus far been largely limited to common variants. Studying rare coding variants has the potential to identify drug targets. We performed an exome-wide association study of smoking phenotypes in up to 749,459 individuals and discovered a protective association in CHRNB2, encoding the β2 subunit of the α4β2 nicotinic acetylcholine receptor.


Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants.


Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP.


Background: There is large individual variation in both clinical presentation and progression between Parkinson's disease patients. Generation of deeply and longitudinally phenotyped patient cohorts has enormous potential to identify disease subtypes for prognosis and therapeutic targeting.

Methods: Replicating across three large Parkinson's cohorts (the Oxford Discovery cohort (n = 842), the Tracking UK Parkinson's study (n = 1807) and the Parkinson's Progression Markers Initiative (n = 472)) with clinical observational measures collected longitudinally over 5-10 years, we developed a Bayesian multiple phenotypes mixed model that incorporates genetic relationships between individuals and explains many diverse clinical measurements as a smaller number of continuous underlying factors ("phenotypic axes").


Body fat distribution is a major, heritable risk factor for cardiometabolic disease, independent of overall adiposity. Using exome sequencing in 618,375 individuals (including 160,058 non-Europeans) from the UK, Sweden and Mexico, we identify 16 genes associated with fat distribution at exome-wide significance. We show a 6-fold larger effect for fat-distribution-associated rare coding variants compared with fine-mapped common alleles, enrichment for genes expressed in adipose tissue and for causal genes for partial lipodystrophies, and evidence of sexual dimorphism.


Background: Exome sequencing in hundreds of thousands of persons may enable the identification of rare protein-coding genetic variants associated with protection from human diseases like liver cirrhosis, providing a strategy for the discovery of new therapeutic targets.

Methods: We performed a multistage exome sequencing and genetic association analysis to identify genes in which rare protein-coding variants were associated with liver phenotypes. We conducted in vitro experiments to further characterize associations.


To better understand the genetics of hearing loss, we performed a genome-wide association meta-analysis with 125,749 cases and 469,497 controls across five cohorts. We identified 53 loci affecting hearing loss risk, including common coding variants in COL9A3 and TMPRSS3. Through exome sequencing of 108,415 cases and 329,581 controls, we observed rare coding associations with 11 Mendelian hearing loss genes, including additive effects in known hearing loss genes GJB2 (Gly12fs; odds ratio [OR] = 1.

Article Synopsis
  • A genome-wide association study identified a genetic variant (rs190509934) that reduces ACE2 expression by 37% and lowers the risk of SARS-CoV-2 infection by 40%.
  • The study confirms six previously known genetic risk variants, with four linked to worse outcomes in COVID-19 infected individuals.
  • A risk score based on common variants was developed, which improves prediction of severe disease beyond just demographic and clinical factors.

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.


We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing interpretable findings while controlling the false discovery rate. In contrast with standard approaches, our method can leverage sophisticated multivariate algorithms but makes no parametric assumptions about the unknown relation between genotypes and phenotype. Instead, we recognize that genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations.

Article Synopsis
  • Large-scale sequencing of 645,626 individuals' exomes identified rare protein-coding variants linked to body mass index (BMI) and obesity.
  • Researchers found 16 significant genes associated with BMI, particularly noting certain G protein-coupled receptors.
  • The study revealed that variants in one gene correlated with lower BMI and reduced obesity risk, and experiments in mice showed that inhibiting this gene could prevent weight gain.

Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19), a respiratory illness that can result in hospitalization or death. We used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19. After accounting for multiple testing, we did not identify any clear associations with rare variants either exome-wide or when specifically focusing on (1) 13 interferon pathway genes in which rare deleterious variants have been reported in individuals with severe COVID-19, (2) 281 genes located in susceptibility loci identified by the COVID-19 Host Genetics Initiative, or (3) 32 additional genes of immunologic relevance and/or therapeutic potential.


Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory.
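The memory point above (only local segments of the genotype matrix are held at once) can be illustrated with a toy block-wise ridge scheme: each block of variants is read on its own and reduced to one ridge predictor, and only those per-block predictors are kept for a second ridge fit. This is a simplified sketch and not the REGENIE implementation; the function names, the single shrinkage value per level, and the absence of cross-validation, covariates and an intercept are all assumptions made for brevity.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ridge_predict(X, y, lam):
    """Ridge regression of y on the columns of X (no intercept); returns fitted values."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return [sum(b * v for b, v in zip(beta, row)) for row in X]

def blockwise_predictors(blocks, y, lam):
    """Consume genotype blocks one at a time; keep one predictor per block.

    blocks: iterable yielding samples-by-variants lists. Each block can be
    dropped after its predictor is computed, so the full matrix is never
    needed in memory. Returns a samples-by-blocks list.
    """
    preds = [ridge_predict(X_block, y, lam) for X_block in blocks]
    return [list(col) for col in zip(*preds)]

def two_level_fit(blocks, y, lam0, lam1):
    """Level 0: per-block ridge predictors. Level 1: ridge on those predictors."""
    Z = blockwise_predictors(blocks, y, lam0)
    return ridge_predict(Z, y, lam1)

def toy_blocks():
    """Generator standing in for reading genotype blocks from disk (hypothetical data)."""
    yield [[1, 0], [0, 1], [1, 1], [2, 1]]
    yield [[0, 1], [1, 0], [1, 1], [0, 2]]

toy_y = [2.0, 1.0, 3.0, 5.0]
toy_fitted = two_level_fit(toy_blocks(), toy_y, lam0=1.0, lam1=1.0)
```

Passing the blocks as a generator is the design choice that matters here: the second-level fit sees a matrix with one column per block rather than one column per variant.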
