Citations

20

Article Abstract

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster than alignment-based workflows.
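
To make the idea of alignment-free, k-mer-based genotyping concrete, the following is a toy sketch only. It is not PanGenie's algorithm, which the abstract does not describe in detail. It assumes a single variant site with two candidate allele sequences (each including some flanking bases), counts k-mers in the reads without any alignment, and reports the mean read count of the k-mers that are unique to each allele. The function names, the example sequences, and the choice of k = 5 are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count k-mers across all reads; no alignment is performed."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def allele_support(alleles, read_counts, k):
    """Mean read count of the k-mers unique to each allele.

    alleles: dict mapping an allele name to its sequence (with flanks).
    read_counts: Counter of k-mers observed in the reads.
    """
    kmer_sets = {name: set(kmers(seq, k)) for name, seq in alleles.items()}
    support = {}
    for name, own in kmer_sets.items():
        # k-mers shared with any other allele carry no information here
        others = set().union(*(s for n, s in kmer_sets.items() if n != name))
        unique = own - others
        if unique:
            support[name] = sum(read_counts[km] for km in unique) / len(unique)
        else:
            support[name] = 0.0
    return support


# Hypothetical example: a single-base difference between two alleles.
alleles = {
    "ref": "ACGTACGTTTGACCA",
    "alt": "ACGTACGATTGACCA",
}
reads = ["ACGTACGATT", "CGATTGACCA"]  # both drawn from the "alt" sequence
support = allele_support(alleles, count_read_kmers(reads, 5), 5)
print(support)
```

In this toy setting, reads sampled from the "alt" sequence contribute counts only to k-mers unique to "alt", so that allele receives higher support. A real genotyper would additionally have to handle sequencing errors, coverage variation, diploid genotypes, and k-mers that recur elsewhere in the genome.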

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9005351
DOI: http://dx.doi.org/10.1038/s41588-022-01043-w
