The field of genome assembly exists only because sequencers cannot yet yield chromosome-level, error-free sequencing reads for all species. It consists of reconstructing the original genome sequence from sequencing reads, with a final number of fragments matching the expected number of chromosomes. This process has been facilitated by the availability of longer and more accurate reads. At the outset of genome assembly, Sanger sequencing reads Sanger et al (Proc Natl Acad Sci 74(12):5463-5467, 1977) were already used to yield initial assemblies of different species, including the first human genome assembly International Human Genome Sequencing Consortium (Nature 409(6822):860-921, 2001). The higher throughput of second-generation sequencing, often called next-generation sequencing, democratized assemblies for a wider variety of species but brought assembly difficulties: the large datasets of short reads required long computational time and large memory resources, and yielded highly fragmented assemblies, with fragment numbers far exceeding the expected number of chromosomes Metzker (Nat Rev Genet 11(1):31-46, 2010). Third-generation sequencing introduced long reads through the technologies of Oxford Nanopore Deamer et al (Nat Biotechnol 34(5):518-524, 2016) and Pacific BioSciences (PacBio) Eid et al (Science 323(5910):133-138, 2009). Long reads can reach several tens of kilobases, and up to hundreds of thousands of base pairs. Although these reads initially had a low accuracy, recent developments that decreased the error rate below 1% Sereika et al (Oxford Nanopore R10.4 long-read sequencing enables near-perfect bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. bioRxiv, 2021); Wenger et al (Nat Biotechnol 37(October):1155-1162, 2019) have further reduced the complexity of genome assembly.

Chromosome-level assemblies have become a standard in genome assembly publications: they enable synteny analysis and the detection of chromosomal rearrangements, and they have more complete gene sets, a better resolution of repetitive content, and fewer contaminants. The availability of long reads and Hi-C and the decreasing cost of sequencing have brought about many high-quality insect genome assemblies, with currently 6,194 assemblies published on GenBank (accessed on 20.11.2024). Some remarkable efforts have been put toward Lepidoptera genomes, for which 188 chromosome-level assemblies were generated by the Darwin Tree of Life Wright et al (Nat Ecol Evol 8(4):777-790, 2024). At a smaller scale, a highly contiguous assembly was obtained for the ant Camponotus pennsylvanicus using Nanopore reads with a budget of only $1000.

Traditionally, genome assemblies are collapsed, meaning that sets of homologous chromosomes are represented by a single consensus sequence Guiglielmoni et al (BMC Bioinf 22(1):303, 2021). This approach is most adequate for low-heterozygosity genomes, and variants are documented a posteriori. With the advent of high-accuracy long reads, phased assemblies, in which all haplotypes are included, are becoming more common.

The assembly process typically involves multiple steps to tackle the challenges posed by the characteristics of the genome, such as ploidy, heterozygosity, and repetitiveness. Reads may need to be preprocessed to remove adapters, select the longest and/or most accurate reads, or correct them to further improve accuracy. The most essential part lies, of course, in the assembly step. Reads are overlapped to build an assembly graph, and ad hoc algorithms are applied to find a path giving the most faithful representation of the genome. Assemblers yield a set of contiguous sequences or contigs. The output should then be evaluated to decide whether the assembly has reached the highest contiguity, completeness, and correctness, and if not, which steps should be performed subsequently to increase quality.
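As an illustration of the overlap and evaluation steps described above, the following is a minimal toy sketch, not the method of any particular assembler. It assumes error-free reads from a single strand, merges reads greedily by longest suffix-prefix overlap instead of building a real assembly graph, and reports contiguity with N50, a common metric that the text does not name explicitly. The function names `overlap_length`, `greedy_assemble`, and `n50` are hypothetical and chosen for this example only.

```python
from itertools import permutations


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two sequences sharing the longest overlap.

    Toy assumption: reads are error-free and all on the same strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_pair = 0, None
        for a, b in permutations(range(len(contigs)), 2):
            size = overlap_length(contigs[a], contigs[b], min_overlap)
            if size > best_size:
                best_size, best_pair = size, (a, b)
        if best_pair is None:
            # No remaining overlap of at least min_overlap: stop merging.
            return contigs
        a, b = best_pair
        merged = contigs[a] + contigs[b][best_size:]
        contigs = [c for i, c in enumerate(contigs) if i not in (a, b)] + [merged]


def n50(contigs: list[str]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    toy_reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]  # made-up reads
    assembly = greedy_assemble(toy_reads)
    print(assembly, n50(assembly))
```

Greedy merging is used here only because it is short; it can misassemble repeats, which is one reason real assemblers rely on graph-based algorithms and why the output still needs to be evaluated for completeness and correctness as well as contiguity.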
DOI: http://dx.doi.org/10.1007/978-1-0716-4583-3_1
DNA Res
September 2025
Key Laboratory of National Forestry and Grassland Administration on Plant Conservation and Utilization in Southern China, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China.
Sauvagesia rhodoleuca is an endangered species endemic to southern China. Due to human activities, only six fragmented populations remain in Guangdong and Guangxi. Despite considerable conservation efforts, its demographic history and evolution remain poorly understood, particularly from a genomic perspective.
Plant Cell Environ
October 2025
Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China.
Vet Microbiol
September 2025
University of Kentucky Veterinary Diagnostic Laboratory, Lexington, KY 40511, United States of America.
Neorickettsia risticii (N. risticii) is an obligatory intracellular bacterium that causes Potomac horse fever (PHF), a disease clinically characterized by diarrhea, pyrexia, and laminitis in horses. Although sporadic reports of N.
Mol Genet Genomics
September 2025
Department of Biotechnology, School of Life Sciences, Central University of Rajasthan, Ajmer, Rajasthan, 305817, India.
Mosquito reproductive biology is an underexplored area with potential for developing novel vector control strategies. In this study, we investigated the role of the testis-specific serine/threonine-protein kinase (tssk) family, an essential regulator of spermiogenesis in mammals, in mosquitoes. We identified tssk homologues, As_tssk3 and Aea_tssk1, in Anopheles stephensi and Aedes aegypti, respectively and analyzed their expression across different developmental stages.
ISME J
September 2025
Institute of Molecular Biology, Academia Sinica, Taipei, 11529, Taiwan.
Mutualistic endosymbiosis is a cornerstone of evolutionary innovation, enabling organisms to exploit diverse niches unavailable to individual species. However, our knowledge about the early evolutionary stage of this relationship remains limited. The association between the ciliate Tetrahymena utriculariae and its algal endosymbiont Micractinium tetrahymenae indicates an incipient stage of photoendosymbiosis.