Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets.

Am J Hum Genet

Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA.

Published: February 2023



Article Abstract

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario: the labels needed to select AIMs are themselves the output of ancestry inference. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.
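To make "likelihood-based estimation of admixture proportions" concrete, the following is a minimal sketch of the standard binomial admixture log-likelihood used by ADMIXTURE-style models. It is not taken from OpenADMIXTURE. The function name `admixture_loglik`, the list-of-lists layout, and the `eps` clamp are illustrative assumptions. Here `G[i][j]` is the count (0, 1, or 2) of the reference allele for individual i at marker j, `Q[i][k]` is the proportion of individual i's ancestry from population k, and `P[k][j]` is the reference-allele frequency of marker j in population k.

```python
import math


def admixture_loglik(G, Q, P, eps=1e-12):
    """Binomial admixture log-likelihood, up to an additive constant.

    Hypothetical helper for illustration only (not the OpenADMIXTURE API).
    G: genotypes, G[i][j] in {0, 1, 2}, or None if missing
    Q: admixture proportions, Q[i][k], each row sums to 1
    P: allele frequencies, P[k][j] in [0, 1]
    """
    total = 0.0
    for g_row, q_row in zip(G, Q):
        for j, g in enumerate(g_row):
            if g is None:  # skip missing genotypes
                continue
            # Expected allele frequency for this individual at marker j
            f = sum(q * P[k][j] for k, q in enumerate(q_row))
            # Clamp away from 0 and 1 so the logarithms stay finite
            f = min(max(f, eps), 1.0 - eps)
            total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total


# Toy data: two populations whose allele frequencies differ sharply
# at both markers, which is the property that makes a marker an AIM.
P = [[0.9, 0.1],
     [0.1, 0.9]]
G = [[2, 0],
     [0, 2]]
Q_good = [[1.0, 0.0], [0.0, 1.0]]  # each individual assigned to the matching population
Q_bad = [[0.0, 1.0], [1.0, 0.0]]   # assignments swapped

loglik_good = admixture_loglik(G, Q_good, P)
loglik_bad = admixture_loglik(G, Q_bad, P)
```

In this toy case the matching assignment yields the larger log-likelihood. Each evaluation visits every individual-marker pair, so the cost grows with the product of sample size and marker count, which is why restricting the analysis to a small set of AIMs is attractive at biobank scale.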


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9943729
DOI: http://dx.doi.org/10.1016/j.ajhg.2022.12.008


Similar Publications

The significant global energy consumption strongly emphasizes the crucial role of net-zero or green structures in ensuring a sustainable future. Considering this aspect, incorporating thermal insulation materials into building components is a well-accepted method that helps to enhance thermal comfort in buildings. Furthermore, integrating architectural components made from solid refuse materials retrieved from the environment can have significant environmental benefits.


Feral pigs may serve as a valuable genetic resource for the future, offering a potentially interesting gene pool for adaptation to climate change and the preservation of biodiversity. The main objective of this study was to identify the genetic structure of feral pigs from the Caribbean island of Martinique, measure the inbreeding rate of a Creole population re-domesticated in 2016 from captured feral pigs, and evaluate its evolution to the present day. We hypothesized that feral pigs, like Creole breeds of the Americas, have been shaped by a unique cross-breeding process linked to the historical context of the Caribbean.


Background And Objectives: African American individuals have a higher risk of Alzheimer disease (AD) and related dementia (ADRD) than non-Hispanic White individuals. Some cross-sectional studies with self-reported race and ethnicity have reported racial differences in circulating ADRD biomarkers, including phosphorylated tau181 (p-Tau181), glial fibrillary acidic protein (GFAP), and neurofilament light (NfL). We aimed to examine the associations of genetically inferred African ancestry proportion with the longitudinal changes in these biomarkers and to evaluate the associations of previously identified ADRD-related genetic factors in European cohorts with these biomarkers in an African American cohort.


Background: Parenteral nutrition is a crucial clinical therapy, yet its inappropriate use is widespread. Pharmacists play an irreplaceable role in the rational application of drugs. This study assessed the impact of a pharmacist-led improvement project on optimizing the rational use of parenteral nutrition drugs.


Population genomic analyses rely on an accurate and unbiased characterization of the genetic composition of the studied population. For short-read, high-throughput sequencing data, mapping sequencing reads to a linear reference genome can bias population genetic inference due to mismatches in reads carrying non-reference alleles. In this study, we investigate the impact of mapping bias on allele frequency estimates from pseudohaploid data and genotype likelihoods, two approaches commonly used in ultra-low to medium coverage sequencing.
