Publications by Jared O'Connell | LitMetric

Publications by authors named "Jared O'Connell"

Phasing millions of samples achieves near perfect accuracy, enabling parent-of-origin analyses.

Cole M Williams, Jared O'Connell, Ethan Jewett, William A Freyman

HGG Adv

July 2025

Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for genetic analyses. Here, we benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million research-consented 23andMe, Inc. customers and the UK Biobank (UKB).

Phasing millions of samples achieves near perfect accuracy, enabling parent-of-origin classification of variants.

Cole M Williams, Jared O'Connell, William A Freyman, Christopher R Gignoux

bioRxiv

May 2024

Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for various genetic analyses. In this study, we first benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB).

An ensemble penalized regression method for multi-ancestry polygenic risk prediction.

Jingning Zhang, Jianan Zhan, Jin Jin, Cheng Ma, Ruzhang Zhao, Jared O'Connell

Nat Commun

April 2024

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER).

MUSSEL: Enhanced Bayesian polygenic risk prediction leveraging information across multiple ancestry groups.

Jin Jin, Jianan Zhan, Jingning Zhang, Ruzhang Zhao, Jared O'Connell

Cell Genom

April 2024

Polygenic risk scores (PRSs) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in summary statistics from genome-wide association studies (GWASs) across multiple ancestry groups via Bayesian hierarchical modeling and ensemble learning. In our simulation studies and data analyses across four distinct studies, totaling 5.

Influence of autozygosity on common disease risk across the phenotypic spectrum.

Daniel S Malawsky, Eva van Walree, Benjamin M Jacobs, Teng Hiang Heng, Qin Qin Huang, Jared O'Connell

Cell

October 2023

Autozygosity is associated with rare Mendelian disorders and clinically relevant quantitative traits. We investigated associations between the fraction of the genome in runs of homozygosity (F_ROH) and common diseases in Genes & Health (n = 23,978 British South Asians), UK Biobank (n = 397,184), and 23andMe. We show that restricting analysis to offspring of first cousins is an effective way of reducing confounding due to social/environmental correlates of F_ROH.

A new method for multiancestry polygenic prediction improves performance across diverse populations.

Haoyu Zhang, Jianan Zhan, Jin Jin, Jingning Zhang, Wenxuan Lu, Jared O'Connell

Nat Genet

October 2023

Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc.

MUSSEL: Enhanced Bayesian Polygenic Risk Prediction Leveraging Information across Multiple Ancestry Groups.

Jin Jin, Jianan Zhan, Jingning Zhang, Ruzhang Zhao, Jared O'Connell

bioRxiv

September 2023

Polygenic risk scores (PRS) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across different populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in the summary statistics from genome-wide association studies (GWAS) across multiple ancestry groups. MUSSEL conducts Bayesian hierarchical modeling under a MUltivariate Spike-and-Slab model for effect-size distribution and incorporates an Ensemble Learning step using super learner to combine information across different tuning parameter settings and ancestry groups.

An Ensemble Penalized Regression Method for Multi-ancestry Polygenic Risk Prediction.

Jingning Zhang, Jianan Zhan, Jin Jin, Cheng Ma, Ruzhang Zhao, Jared O'Connell

bioRxiv

April 2024

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER).

A population-specific reference panel for improved genotype imputation in African Americans.

Jared O'Connell, Taedong Yun, Meghan Moreno, Helen Li, Nadia Litterman

Commun Biol

November 2021

There is currently a dearth of accessible whole genome sequencing (WGS) data for individuals residing in the Americas with Sub-Saharan African ancestry. We generated whole genome sequencing data at intermediate (15×) coverage for 2,294 individuals with large amounts of Sub-Saharan African ancestry, predominantly Atlantic African admixed with varying amounts of European and American ancestry. We performed extensive comparisons of variant callers, phasing algorithms, and variant filtration on these data to construct a high quality imputation panel containing data from 2,269 unrelated individuals.

Nuclear genome-wide associations with mitochondrial heteroplasmy.

Priyanka Nandakumar, Chao Tian, Jared O'Connell, David Hinds

Sci Adv

March 2021

The role of the nuclear genome in maintaining the stability of the mitochondrial genome (mtDNA) is incompletely known. mtDNA sequence variants can exist in a state of heteroplasmy, which denotes the coexistence of organellar genomes with different sequences. Heteroplasmic variants that impair mitochondrial capacity cause disease, and the state of heteroplasmy itself is deleterious.

Tracking human population structure through time from whole genome sequences.

Ke Wang, Iain Mathieson, Jared O'Connell, Stephan Schiffels

PLoS Genet

March 2020

The genetic diversity of humans, like many species, has been shaped by a complex pattern of population separations followed by isolation and subsequent admixture. This pattern, reaching at least as far back as the appearance of our species in the paleontological record, has left its traces in our genomes. Reconstructing a population's history from these traces is a challenging problem.

The UK Biobank resource with deep phenotyping and genomic data.

Clare Bycroft, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T Elliott, Jared O'Connell

Nature

October 2018

The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain.

The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease.

William J Astle, Heather Elding, Tao Jiang, Dave Allen, Dace Ruklisa, Jared O'Connell

Cell

November 2016

Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants.

AKT: ancestry and kinship toolkit.

Rudy Arthur, Ole Schulz-Trieglaff, Anthony J Cox, Jared O'Connell

Bioinformatics

January 2017

Motivation: Ancestry and Kinship Toolkit (AKT) is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It can rapidly detect related samples, characterize sample ancestry, calculate correlation between variants, check Mendel consistency and perform data clustering. AKT brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface.

Haplotype estimation for biobank-scale data sets.

Jared O'Connell, Kevin Sharp, Nick Shrine, Louise Wain, Ian Hall

Nat Genet

July 2016

The UK Biobank (UKB) has recently released genotypes on 152,328 individuals together with extensive phenotypic and lifestyle information. We present a new phasing method, SHAPEIT3, that can handle such biobank-scale data sets and results in switch error rates as low as ∼0.3%.

Rapid genotype refinement for whole-genome sequencing data using multi-variate normal distributions.

Rudy Arthur, Jared O'Connell, Ole Schulz-Trieglaff, Anthony J Cox

Bioinformatics

August 2016

Motivation: Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD)-based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals. Most genotype refinement methods are based on hidden Markov models, which are accurate but computationally expensive. We introduce an algorithm that models LD using a simple multivariate Gaussian distribution.

Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): a genetic association study in UK Biobank.

Louise V Wain, Nick Shrine, Suzanne Miller, Victoria E Jackson, Ioanna Ntalla, Jared O'Connell

Lancet Respir Med

October 2015

Background: Understanding the genetic basis of airflow obstruction and smoking behaviour is key to determining the pathophysiology of chronic obstructive pulmonary disease (COPD). We used UK Biobank data to study the genetic causes of smoking behaviour and lung health.

Methods: We sampled individuals of European ancestry from UK Biobank, from the middle and extremes of the forced expiratory volume in 1 s (FEV1) distribution among heavy smokers (mean 35 pack-years) and never smokers.

Multicohort analysis of the maternal age effect on recombination.

Hilary C Martin, Ryan Christ, Julie G Hussin, Jared O'Connell, Scott Gordon

Nat Commun

August 2015

Several studies have reported that the number of crossovers increases with maternal age in humans, but others have found the opposite. Resolving the true effect has implications for understanding the maternal age effect on aneuploidies. Here, we revisit this question in the largest sample to date using single nucleotide polymorphism (SNP)-chip data, comprising over 6,000 meioses from nine cohorts.

NxRepair: error correction in de novo sequence assembly using Nextera mate pairs.

Rebecca R Murphy, Jared O'Connell, Anthony J Cox, Ole Schulz-Trieglaff

PeerJ

June 2015

Scaffolding errors and incorrect repeat disambiguation during de novo assembly can result in large scale misassemblies in draft genomes. Nextera mate pair sequencing data provide additional information to resolve assembly ambiguities during scaffolding. Here, we introduce NxRepair, an open source toolkit for error correction in de novo assemblies that uses Nextera mate pair libraries to identify and correct large-scale errors.

NxTrim: optimized trimming of Illumina mate pair reads.

Jared O'Connell, Ole Schulz-Trieglaff, Emma Carlson, Matthew M Hims, Niall A Gormley

Bioinformatics

June 2015

Motivation: Mate pair protocols add to the utility of paired-end sequencing by boosting the genomic distance spanned by each pair of reads, potentially allowing larger repeats to be bridged and resolved. The Illumina Nextera Mate Pair (NMP) protocol uses a circularization-based strategy that leaves behind 38-bp adapter sequences, which must be computationally removed from the data. While 'adapter trimming' is a well-studied area of bioinformatics, existing tools do not fully exploit the particular properties of NMP data and discard more data than is necessary.

A general approach for haplotype phasing across the full spectrum of relatedness.

Jared O'Connell, Deepti Gurdasani, Olivier Delaneau, Nicola Pirastu, Sheila Ulivi

PLoS Genet

April 2014

Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals.

Joint genotype calling with array and sequence data.

Jared O'Connell, Jonathan Marchini

Genet Epidemiol

September 2012

Analysis of rare variants is currently a major focus of genetic studies of human disease. Single-nucleotide polymorphism (SNP) genotypes can be assayed using microarray genotyping or by sequencing, but neither technology produces perfect genotype calls, especially at rare SNPs. Studies that collect both types of data are becoming increasingly common, so it may be possible to combine data types to increase accuracy.
