Citations

20

Article Abstract

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234299
DOI: http://dx.doi.org/10.1101/gr.277334.122

Publication Analysis

Top Keywords

phased genome: 12
genome assembly: 8
trio-based approaches: 8
assembly gaps: 8
segmental duplications: 8
satellite dna: 8
protein-coding genes: 8
gaps: 5
assembly: 5
dna: 5

Similar Publications

Dormancy release and germination of the seed are two separate, but continuous phases controlled by both external (e.g., light and temperature) and internal (e.

A model-free method for genealogical inference without phasing and its application for topology weighting.

Genetics

September 2025

Institute of Ecology and Evolution, School of Biological Sciences, The University of Edinburgh, Edinburgh, EH9 3FL, United Kingdom.

Recent advances in methods to infer and analyse ancestral recombination graphs (ARGs) are providing powerful new insights in evolutionary biology and beyond. Existing inference approaches tend to be designed for use with fully-phased datasets, and some rely on model assumptions about demography and recombination rate. Here I describe a simple model-free approach for genealogical inference along the genome from unphased genotype data called Sequential Tree Inference by Collecting Compatible Sites (sticcs).

Here, we present a novel approach to estimate the degree to which the phenotypic effect of a DNA locus is attributable to four components: alleles in the child (direct genetic effects), alleles in the mother and the father (indirect genetic effects), or is dependent upon the parent from which it is inherited (parent-of-origin, PofO effects). Applying our model, JODIE, to 30,000 child-mother-father trios with phased DNA information from the Estonian Biobank (EstBB) and the Norwegian Mother, Father, Child Cohort (MoBa), we jointly estimate the phenotypic variance attributable to these four effects unbiased of assortative mating (AM) for height, body mass index (BMI) and childhood educational test score (EA). For all three traits, direct effects make the largest contribution to the genetic effect variance.

PhasiRNAs (phased small interfering RNAs) are a major class of plant small RNAs (sRNA) known to be key regulators in male reproductive development of maize (Zea mays) and rice (Oryza sativa), among other plants. Earlier research focused primarily on premeiotic 21-nucleotide (nt) phasiRNAs and meiotic 24-nt phasiRNAs, while new studies uncovered a premeiotic class of 24-nt phasiRNAs. The biogenesis and function of these phasiRNAs remain unclear.

New reference genomes and transcriptomes are increasingly available across the tree of life, opening new avenues to tackle exciting questions. However, there are still challenges associated with annotating genomes and inferring evolutionary processes and with a lack of methodological standardisation. Here, we propose a new workflow designed for evolutionary analyses to overcome these challenges, facilitating the detection of recombination suppression and its consequences in terms of rearrangements and transposable element accumulation.
