Rapid genotype refinement for whole-genome sequencing data using multi-variate normal distributions.

Bioinformatics

Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Essex CB10 1XL, UK.

Published: August 2016


Article Abstract

Motivation: Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD)-based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals. Most genotype refinement methods are based on hidden Markov models, which are accurate but computationally expensive. We introduce an algorithm that models LD using a simple multivariate Gaussian distribution. The key feature of our algorithm is its speed.

Results: Our method is hundreds of times faster than other methods on the same data set and its scaling behaviour is linear in the number of samples. We demonstrate the performance of the method on both low- and high-coverage samples.

Availability and Implementation: The source code is available at https://github.com/illumina/marvin.

Contact: rarthur@illumina.com

Supplementary Information: Supplementary data are available at Bioinformatics online.
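
As a rough illustration of the idea stated in the Motivation, the sketch below treats genotype dosages (0, 1 or 2 copies of the alternate allele) at nearby sites as jointly Gaussian and refines one site from its neighbours using the standard conditional mean of a multivariate normal. It is not taken from the marvin source code; the function name `conditional_dosage`, the toy panel, and the `ridge` regularisation term are assumptions made only for this example.

```python
import numpy as np


def conditional_dosage(mu, sigma, target, observed_idx, observed_values, ridge=1e-6):
    """Conditional mean of the dosage at `target` given dosages at other sites.

    Uses the multivariate normal identity
        E[x_t | x_o] = mu_t + S_to S_oo^{-1} (x_o - mu_o).
    `ridge` is a hypothetical stabiliser added to the diagonal of S_oo.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    obs = np.asarray(observed_idx)

    s_oo = sigma[np.ix_(obs, obs)] + ridge * np.eye(len(obs))
    s_to = sigma[target, obs]
    resid = np.asarray(observed_values, dtype=float) - mu[obs]

    estimate = mu[target] + s_to @ np.linalg.solve(s_oo, resid)
    # Dosages are bounded, so clip the Gaussian estimate to [0, 2].
    return float(np.clip(estimate, 0.0, 2.0))


# Toy reference panel (assumed data): rows are samples, columns are sites.
# Sites 0 and 1 are in perfect LD; site 2 is only partly correlated with them.
panel = np.array(
    [[0, 0, 0],
     [1, 1, 0],
     [2, 2, 1],
     [1, 1, 1],
     [2, 2, 2],
     [0, 0, 1]],
    dtype=float,
)
mu = panel.mean(axis=0)
sigma = np.cov(panel, rowvar=False)

# Refine site 0 for a new sample whose dosages at sites 1 and 2 are 2 and 1.
refined = conditional_dosage(mu, sigma, target=0,
                             observed_idx=[1, 2], observed_values=[2.0, 1.0])
print(round(refined, 3))
```

Because the estimate needs only a mean vector, a covariance matrix and one linear solve per site, the work per sample is fixed once those are computed, which is one plausible reading of why a Gaussian model can be much cheaper than a hidden Markov model. The paper itself should be consulted for the actual algorithm.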

Source: http://dx.doi.org/10.1093/bioinformatics/btw097
