Background: Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors.
Results: We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of approximately 50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of approximately 1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
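Recall and precision for insertion locations depend on how a predicted site is matched to a simulated one. The sketch below is a hypothetical illustration, not McClintock's own evaluation code: it assumes each insertion is a (chromosome, position) pair and counts a prediction as a true positive when it falls within an assumed tolerance `window` of a not-yet-matched true insertion on the same chromosome (greedy, one-to-one matching). The names `Insertion` and `precision_recall` are illustrative only.

```python
from typing import NamedTuple


class Insertion(NamedTuple):
    """Hypothetical record for a non-reference TE insertion: chromosome and position."""
    chrom: str
    pos: int


def precision_recall(predicted, truth, window=0):
    """Greedy one-to-one matching of predicted to simulated (true) insertions.

    A prediction is a true positive if it lies on the same chromosome within
    `window` bp of a true insertion that has not already been matched.
    The tolerance rule is an assumption for illustration.
    """
    unmatched = list(truth)
    tp = 0
    for p in sorted(predicted):
        best = None  # (index into unmatched, distance) of the closest eligible true site
        for i, t in enumerate(unmatched):
            if t.chrom != p.chrom:
                continue
            d = abs(t.pos - p.pos)
            if d <= window and (best is None or d < best[1]):
                best = (i, d)
        if best is not None:
            unmatched.pop(best[0])  # each true insertion can be matched only once
            tp += 1
    fp = len(predicted) - tp
    fn = len(truth) - tp
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall
```

For example, with true insertions at chrI:1000 and chrII:5000 and predictions at chrI:1003 and chrIII:200, a 5 bp window gives precision and recall of 0.5 each, while an exact-position requirement (`window=0`) gives 0.0 for both. This shows why "precise location" accuracy can differ sharply from a looser, window-based evaluation.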
Conclusion: McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
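The dyad integration pattern upstream of tRNA genes described in the Results is detected by tabulating where insertions fall relative to each gene's 5' end. The sketch below is a hypothetical illustration, not the authors' analysis. It assumes genes are given as (chromosome, start, end, strand) with 1-based inclusive coordinates and insertions as (chromosome, position) pairs. It also assumes a `max_dist` cutoff and a `bin_size` chosen purely for illustration. Upstream distance is computed in a strand-aware way so that offsets from "+" and "-" strand genes can be pooled into one histogram.

```python
from collections import Counter
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record (e.g. a tRNA gene), 1-based inclusive coordinates."""
    chrom: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def upstream_offsets(insertions, genes, max_dist=1000):
    """Return strand-aware distances (bp) of insertions upstream of gene 5' ends.

    For a "+" strand gene the 5' end is `start`, so upstream sites are < start;
    for a "-" strand gene the 5' end is `end`, so upstream sites are > end.
    An insertion near several genes contributes one offset per gene.
    """
    offsets = []
    for chrom, pos in insertions:
        for g in genes:
            if g.chrom != chrom:
                continue
            d = g.start - pos if g.strand == "+" else pos - g.end
            if 0 < d <= max_dist:
                offsets.append(d)
    return offsets


def histogram(offsets, bin_size=10):
    """Count offsets in fixed-width bins, keyed by each bin's lower bound."""
    return Counter((d // bin_size) * bin_size for d in offsets)
```

Plotting such a histogram over many insertions would show whether offsets cluster at regularly spaced positions, which is the kind of signal a nucleosome-associated integration preference would leave. Using a simple linear scan keeps the sketch short; a real analysis over genome-scale data would more likely use an interval index.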
| Source | Type |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347736 | PMC |
| http://dx.doi.org/10.1186/s13100-023-00296-4 | DOI Listing |
bioRxiv
August 2025
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Structural variation causes some human haplotypes to align poorly with the linear reference genome, leading to 'reference bias'. A pangenome reference graph could ameliorate this bias by relating a sample to multiple reference assemblies. However, this approach requires a new definition of a 'genetic variant'.
Genome Biol
July 2025
Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076, Tübingen, Germany.
Background: Because transposable elements (TEs) can cause heritable genetic changes, past work on TE mobility in Arabidopsis thaliana has mostly focused on new TE insertions in the germline of hypomethylated plants. It is, however, well-known that TEs can also be active in the soma, although the high-confidence detection of somatic events has been challenging.
Results: Here, we leverage the high accuracy of PacBio HiFi long reads to evaluate the somatic mobility of TEs in individuals of an A.
J Anim Sci Biotechnol
July 2025
Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, Laboratory of Animal Fat Deposition & Muscle Development, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China.
Background: Traditional genomic analysis relies on a single reference genome, which struggles to effectively characterize the genetic diversity among populations. This is due to the substantial genetic differences between the genome of the studied species and the reference genome, potentially introducing reference bias.
Results: In this study, we focused on Guanzhong Black pigs (GZB), Danish Large White pigs (DLW), and their hybrid offspring, Qinchuan Black pigs (QCB).
Exp Biol Med (Maywood)
June 2025
Perron Institute for Neurological and Translational Science, Perth, WA, Australia.
The fatal neurodegenerative disease amyotrophic lateral sclerosis (ALS) leads to the degeneration of motor neurons in the brain and spinal cord. Many different genetic variants are known to increase the risk of developing ALS; however, much of the disease heritability is still to be identified. To identify novel genetic factors, we characterised SINE-VNTR-Alu (SVA) presence/absence variation in 4403 genomes from the New York Genome Center (NYGC) ALS consortium.
Genome Biol
March 2025
Institute of Plant Sciences Paris-Saclay (IPS2), Centre National de la Recherche Scientifique, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, Université Evry, Université Paris-Saclay, Gif Sur Yvette, 91190, France.
Background: Mobilization of transposable elements (TEs) can generate large effect mutations. However, due to the difficulty of detecting new TE insertions in genomes and the typically rare occurrence of transposition, the actual rate, distribution, and population dynamics of new insertions remain largely unexplored.
Results: We present a TE display sequencing approach that leverages target amplification of TE extremities to detect non-reference TE insertions with high specificity and sensitivity, enabling the detection of insertions at frequencies as low as 1 in 250,000 within a DNA sample.