Citations

20

Article Abstract

Background: Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors.

Results: We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ~50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ~1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.

Conclusion: McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
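The recall and precision reported for simulated data can be illustrated with a minimal sketch. This is not McClintock's own evaluation code: the function name `score_predictions`, the tuple layout, and the `window` tolerance are hypothetical choices made for illustration, and the matching rule (same chromosome, same TE family, position within the window) is an assumption.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, position, TE family).
Insertion = Tuple[str, int, str]


def score_predictions(
    truth: List[Insertion],
    predicted: List[Insertion],
    window: int = 0,
) -> Tuple[float, float]:
    """Return (recall, precision) for predicted insertions against simulated ones.

    A prediction counts as a true positive if it lies on the same chromosome,
    names the same TE family, and falls within `window` bp of a simulated
    insertion that has not already been matched (assumed matching rule).
    """
    matched = set()  # indices of simulated insertions already claimed
    true_positives = 0
    for chrom, pos, family in predicted:
        for i, (t_chrom, t_pos, t_family) in enumerate(truth):
            if i in matched:
                continue
            if chrom == t_chrom and family == t_family and abs(pos - t_pos) <= window:
                matched.add(i)
                true_positives += 1
                break
    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision


# Toy data, invented for this example only.
truth = [("chrI", 1000, "Ty1"), ("chrII", 5000, "Ty2")]
predicted = [("chrI", 1002, "Ty1"), ("chrII", 9000, "Ty2"), ("chrIV", 300, "Ty4")]

recall, precision = score_predictions(truth, predicted, window=5)
exact_recall, exact_precision = score_predictions(truth, predicted, window=0)
```

Tightening `window` to 0 requires an exact position match, which is one way to separate detectors that find the right region from those that find the precise insertion site.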

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347736
DOI: http://dx.doi.org/10.1186/s13100-023-00296-4

Publication Analysis

Top Keywords

non-reference insertions: 12
transposable element: 8
short-read detectors: 8
mcclintock: 7
detectors: 6
insertions: 6
yeast: 5
non-reference: 5
data: 5
reproducible evaluation: 4

Similar Publications

Structural variation causes some human haplotypes to align poorly with the linear reference genome, leading to 'reference bias'. A pangenome reference graph could ameliorate this bias by relating a sample to multiple reference assemblies. However, this approach requires a new definition of a 'genetic variant'.

Background: Because transposable elements (TEs) can cause heritable genetic changes, past work on TE mobility in Arabidopsis thaliana has mostly focused on new TE insertions in the germline of hypomethylated plants. It is, however, well-known that TEs can also be active in the soma, although the high-confidence detection of somatic events has been challenging.

Results: Here, we leverage the high accuracy of PacBio HiFi long reads to evaluate the somatic mobility of TEs in individuals of an A.

Integrating parental genomes to reduce reference bias and identify intramuscular fat genes in Qinchuan Black pigs.

J Anim Sci Biotechnol

July 2025

Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, Laboratory of Animal Fat Deposition & Muscle Development, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China.

Background: Traditional genomic analysis relies on a single reference genome, which struggles to effectively characterize the genetic diversity among populations. This is due to the substantial genetic differences between the genome of the studied species and the reference genome, potentially introducing reference bias.

Results: In this study, we focused on Guanzhong Black pigs (GZB), Danish Large White pigs (DLW), and their hybrid offspring, Qinchuan Black pigs (QCB).

The fatal neurodegenerative disease, amyotrophic lateral sclerosis (ALS), leads to the degeneration of motor neurons in the brain and spinal cord. Many different genetic variants are known to increase the risk of developing ALS, however much of the disease heritability is still to be identified. To identify novel genetic factors, we characterised SINE-VNTR-Alu (SVA) presence/absence variation in 4403 genomes from the New York Genome Center (NYGC) ALS consortium.

Ultra-sensitive detection of transposon insertions across multiple families by transposable element display sequencing.

Genome Biol

March 2025

Institute of Plant Sciences Paris-Saclay (IPS2), Centre National de la Recherche Scientifique, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, Université Evry, Université Paris-Saclay, Gif Sur Yvette, 91190, France.

Background: Mobilization of transposable elements (TEs) can generate large effect mutations. However, due to the difficulty of detecting new TE insertions in genomes and the typically rare occurrence of transposition, the actual rate, distribution, and population dynamics of new insertions remain largely unexplored.

Results: We present a TE display sequencing approach that leverages target amplification of TE extremities to detect non-reference TE insertions with high specificity and sensitivity, enabling the detection of insertions at frequencies as low as 1 in 250,000 within a DNA sample.