The UK Biobank (UKB) has recently released genotypes on 152,328 individuals together with extensive phenotypic and lifestyle information. We present a new phasing method, SHAPEIT3, that can handle such biobank-scale data sets and results in switch error rates as low as ∼0.3%. The method exhibits O(N log N) scaling with sample size N, enabling fast and accurate phasing of even larger cohorts.
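The switch error rate quoted in the abstract is a standard way to score phasing accuracy. As a minimal sketch (not SHAPEIT3 code), the function below computes it for a single individual, assuming haplotypes are given as 0/1 allele lists and that the inferred and true haplotypes agree on the unphased genotype at every site. The name `switch_error_rate` and the input layout are illustrative assumptions.

```python
from typing import Sequence, Tuple

# A phased individual: two haplotypes as equal-length lists of 0/1 alleles.
Haplotypes = Tuple[Sequence[int], Sequence[int]]


def switch_error_rate(truth: Haplotypes, inferred: Haplotypes) -> float:
    """Fraction of consecutive heterozygous-site pairs with wrong relative phase.

    Assumes (for illustration) that truth and inferred carry the same
    unphased genotype at every site.
    """
    t0, t1 = truth
    i0, _ = inferred

    # Only heterozygous sites carry phase information.
    het = [k for k in range(len(t0)) if t0[k] != t1[k]]
    if len(het) < 2:
        return 0.0

    # True where the inferred first haplotype matches the true first haplotype.
    orientation = [i0[k] == t0[k] for k in het]

    # A switch error is a change of orientation between neighbouring het sites.
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(het) - 1)


if __name__ == "__main__":
    truth = ([0, 1, 0, 1, 1], [1, 0, 1, 0, 1])
    inferred = ([0, 1, 1, 0, 1], [1, 0, 0, 1, 1])
    print(switch_error_rate(truth, inferred))
```

Swapping the two inferred haplotypes wholesale leaves the rate unchanged, since only changes in orientation between neighbouring heterozygous sites are counted.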
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4926957 | PMC |
| http://dx.doi.org/10.1038/ng.3583 | DOI Listing |
bioRxiv
August 2025
Department of Computational Biology, Cornell University, Ithaca, NY.
Motivation: The Genotype Representation Graph (GRG) [DeHaas et al., 2025] is a graph representation of whole genome polymorphisms, designed to encode the variant hard-call information in phased whole genomes. It encodes the genotypes as an extremely compact graph that can be traversed efficiently, enabling dynamic programming-style algorithms for applications such as genome-wide association studies that run faster on biobank-scale data than existing alternatives.
Nat Commun
August 2025
Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
Alzheimer's disease and related dementias (AD/ADRDs) pose a significant global public health challenge. To effectively implement personalized therapeutic interventions on a global scale, it is essential to identify disease-causing, risk, and resilience factors across diverse ancestral backgrounds. This study leveraged biobank-scale data to conduct a large multi-ancestry whole-genome sequencing characterization of AD/ADRDs.
White matter hyperintensities (WMH) are covert magnetic resonance imaging (MRI) markers of microvascular dysfunction and are primary vascular contributors to dementia, emphasizing their importance in prevention strategies. Here, we integrate gene expression and protein levels measured across plasma, cerebrospinal fluid (CSF), brain and multiple other tissues from population-based and biobank-scale data to triangulate druggable genes influencing WMH-burden and Alzheimer's disease (AD) and to map their spatial localization specifically in brain-cell types. Lowering the expression levels of and shows putative causal associations with reduced WMH-burden, and AD risk.
Trends Genet
September 2025
Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Genomics Preprint Club, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
The scale and granularity of biobanks are reshaping the future of immunogenetics, enabling breakthroughs at a pace never before possible. Here, we highlight two recent preprints that apply novel statistical methods to biobank data to understand how inherited variants shape immune traits and disease risk.
NPJ Digit Med
August 2025
Department of Biomedical Informatics, Columbia University, New York City, NY, USA.
Biobanks are a rich source of data for genome-wide association studies (GWAS). They store clinical data from electronic health records, with data domains such as laboratory measurements, conditions, and self-reported diagnoses. Traditionally, biobank GWAS utilize case-control cohorts built exclusively from conditions.