Varigraph: An accurate and widely applicable pangenome graph-based variant genotyper for diploid and polyploid genomes.

Mol Plant

National Key Laboratory for Germplasm Innovation and Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China; Hubei Hongshan Laboratory, Wuhan, China; Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China

Published: September 2025



Article Abstract

Accurate variant genotyping is crucial for genomics-assisted breeding. Graph pangenome references can address single-reference bias, thereby enhancing the performance of variant genotyping and empowering downstream applications in population genetics and quantitative genetics. However, existing pangenome-based genotyping methods are ineffective in handling large or complex pangenome graphs, particularly in polyploid genomes. Here, we introduce Varigraph, an algorithm that leverages the comparison of unique and repetitive k-mers between variant sites and short reads for genotyping both small and large variants. We evaluated Varigraph on a diverse set of representative plant genomes as well as human genomes. Varigraph outperforms current state-of-the-art linear and graph-based genotypers across non-human genomes while maintaining comparable genotyping performance in human genomes. By employing efficient data structures including counting Bloom filter and bitmap storage, as well as GPU models, Varigraph achieves improved precision and robustness in repetitive regions while managing computational costs for large datasets. Its wide applicability extends to highly repetitive or large genomes, such as those of maize and wheat. Significantly, Varigraph can handle extensive pangenome graphs, as demonstrated by its performance on a dataset containing 252 rice genomes, for which it achieved a precision exceeding 0.9 for both small and large variants. Notably, Varigraph is capable of effectively utilizing pangenome graphs for genotyping autopolyploids, enabling precise determination of allele dosage. In summary, this work provides a robust and accurate solution for genotyping plant genomes and will advance plant genomic studies and genomics-assisted breeding.
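The k-mer comparison described in the abstract can be illustrated with a minimal Python sketch. This is not Varigraph's implementation: the names `CountingBloomFilter`, `kmers`, and `allele_support` are hypothetical, and the slot count, hash function, and `k=5` are assumptions chosen for a toy example. The sketch counts read k-mers in a counting Bloom filter, then averages the counts of the k-mers unique to each allele at one variant site. It ignores reverse complements, sequencing errors, and repetitive k-mers, all of which a real genotyper has to handle.

```python
import hashlib


class CountingBloomFilter:
    """Approximate k-mer counter (illustrative only).

    Counts can be overestimated because of hash collisions, but are not
    underestimated as long as no 8-bit counter saturates at 255.
    """

    def __init__(self, num_slots=1 << 16, num_hashes=3):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.slots = bytearray(num_slots)  # one 8-bit counter per slot

    def _positions(self, kmer):
        # Derive one slot index per hash from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_slots

    def add(self, kmer):
        for pos in self._positions(kmer):
            if self.slots[pos] < 255:  # saturate instead of overflowing
                self.slots[pos] += 1

    def count(self, kmer):
        # The smallest counter is the least-inflated estimate.
        return min(self.slots[pos] for pos in self._positions(kmer))


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def allele_support(allele_seqs, reads, k=5):
    """Mean read count over the k-mers that are unique to each allele."""
    cbf = CountingBloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            cbf.add(kmer)

    support = {}
    for name, seq in allele_seqs.items():
        # k-mers shared with any other allele cannot discriminate.
        shared = set()
        for other_name, other_seq in allele_seqs.items():
            if other_name != name:
                shared.update(kmers(other_seq, k))
        unique = [km for km in set(kmers(seq, k)) if km not in shared]
        total = sum(cbf.count(km) for km in unique)
        support[name] = total / len(unique) if unique else 0.0
    return support


# Toy variant site: a single-base difference between two alleles.
alleles = {"ref": "ACGTACGGTCA", "alt": "ACGTATGGTCA"}
reads = ["ACGTACGGTCA", "ACGTATGGTCA"]  # one read supporting each allele
print(allele_support(alleles, reads))
```

In an autopolyploid, the relative support of the alleles could in principle be compared against the expected dosage ratios, but how Varigraph actually models dosage is not specified in the abstract.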


Source
http://dx.doi.org/10.1016/j.molp.2025.08.001


Similar Publications

Genome graphs provide a powerful reference structure for representing genetic diversity. Their structure emphasizes the polymorphic regions in a collection of genomes, enabling network-based comparisons of population-level variation. However, current tools are limited in their ability to quantify and compare structural features across large genome graphs.


Genome structural variants (SVs) comprise a sizable portion of functionally important genetic variation in all organisms; yet, many SVs evade discovery using short reads. While long-read sequencing can find the hidden SVs, the role of SVs in variation in organismal traits remains largely unclear. To address this gap, we investigate the molecular basis of 50 classical phenotypes in 11 strains using highly contiguous genome assemblies generated with Oxford Nanopore long reads.


Background: Water buffalo is a cornerstone livestock species in many low- and middle-income countries, yet major gaps persist in its genomic characterization, complicated by the divergent karyotypes of its two subspecies (swamp and river). Such genomic complexity makes water buffalo a particularly good candidate for the use of graph genomics, which can capture variation missed by linear reference approaches. However, the utility of this approach to improve water buffalo has been largely unexplored.


Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance.
