Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs.

BMC Bioinformatics

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany.

Published: June 2023


Article Abstract

Background: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of the genes, the breakpoints can be computed with little effort. Often, however, existing gene annotations are erroneous, or only nucleotide sequences are available. In mitochondrial genomes in particular, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task.
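To illustrate why breakpoints are easy to compute once gene locations are known, the following minimal sketch compares two signed gene orders and reports the adjacencies of the first that are missing from the second. It is not part of DeBBI; the function names, the integer gene encoding (sign denotes strand), and the default assumption of circular genomes are all hypothetical choices made for this example.

```python
def _neighbour_pairs(order, circular=True):
    """Return consecutive gene pairs, closing the circle if requested."""
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))
    return pairs


def adjacencies(order, circular=True):
    """Return the set of signed adjacencies of a gene order.

    Each adjacency (a, b) is stored together with its reverse-strand
    form (-b, -a), so a block read from the other strand still matches.
    """
    adj = set()
    for a, b in _neighbour_pairs(order, circular):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoints(order_a, order_b, circular=True):
    """List adjacencies of order_a that do not occur in order_b."""
    adj_b = adjacencies(order_b, circular)
    return [pair for pair in _neighbour_pairs(order_a, circular)
            if pair not in adj_b]


# Hypothetical example: genes 2 and 3 are inverted in the second genome.
genome_a = [1, 2, 3, 4, 5]
genome_b = [1, -3, -2, 4, 5]
print(breakpoints(genome_a, genome_b))
```

With annotated gene orders this is a simple set comparison; the difficulty addressed in the abstract arises when such annotations are missing or unreliable.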

Results: This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possibly high substitution rates. The method is implemented in the software package DeBBI. DeBBI can analyze transposition- and inversion-based breakpoints independently and uses a parallel program design that takes advantage of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While some multiple sequence alignment tools can also be used for the task at hand, we demonstrate that the proposed approach more frequently detects gene breaks between short, poorly conserved tRNA genes in particular.

Conclusion: The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm only requires a small number of graph traversal steps.
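The idea of a position-annotated, colored de Bruijn graph can be sketched as follows. This is a simplified illustration under stated assumptions, not DeBBI's implementation: each k-mer node records, per input sequence (its "color"), the positions at which it occurs, and a naive scan then reports places where the positions in a second sequence stop being collinear. The function names, the data layout, and the jump criterion are hypothetical, and the scan is a stand-in for the bulge search heuristic described above, which the abstract does not specify in detail.

```python
from collections import defaultdict


def build_graph(seqs, k):
    """Build a position-annotated colored de Bruijn graph.

    nodes: k-mer -> {color: [start positions in that sequence]}
    edges: k-mer -> set of k-mers that directly follow it in some sequence
    """
    nodes = defaultdict(lambda: defaultdict(list))
    edges = defaultdict(set)
    for color, seq in seqs.items():
        prev = None
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            nodes[kmer][color].append(i)
            if prev is not None:
                edges[prev].add(kmer)
            prev = kmer
    return nodes, edges


def position_jumps(nodes, seq_a, color_a, color_b, k):
    """Walk seq_a and report non-collinear steps relative to color_b.

    Only k-mers occurring exactly once in both sequences are used as
    anchors (a simplifying assumption). A jump is reported when the
    offset between consecutive anchors differs in the two sequences.
    """
    jumps = []
    last = None  # (position in a, position in b) of the previous anchor
    for i in range(len(seq_a) - k + 1):
        ann = nodes.get(seq_a[i:i + k], {})
        pos_a = ann.get(color_a, [])
        pos_b = ann.get(color_b, [])
        if len(pos_a) != 1 or len(pos_b) != 1:
            continue
        cur = (i, pos_b[0])
        if last is not None and cur[0] - last[0] != cur[1] - last[1]:
            jumps.append((last[0], cur[0], last[1], cur[1]))
        last = cur
    return jumps


# Hypothetical toy input: the second sequence has the leading block moved.
seqs = {"a": "AAACCCGGG", "b": "CCCGGGAAA"}
nodes, edges = build_graph(seqs, k=3)
print(position_jumps(nodes, seqs["a"], "a", "b", k=3))
```

Storing positions on the nodes is what lets a traversal relate graph structure back to sequence coordinates; a real tool would additionally need to tolerate substitutions, which exact k-mer matching as used here does not.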

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243065
DOI: http://dx.doi.org/10.1186/s12859-023-05371-4

Similar Publications

Acute lymphoblastic leukemia (ALL) is the most common hematologic malignancy in children. Current clinical diagnosis primarily relies on invasive detection methods, while molecular subtyping remains a complex and time-consuming process. This study employed silver nanoparticle-based surface-enhanced Raman spectroscopy (SERS) to systematically analyze 116 serum samples, including those with the breakpoint cluster region-Abelson (BCR-ABL) fusion genotype, the mixed-lineage leukemia (MLL, also known as lysine methyltransferase 2A, KMT2A) gene rearrangement subtype, T-lymphoblastic ALL, and healthy controls.

Background And Objective: Parental chromosomal structural variations (SVs) represent a primary genetic factor contributing to recurrent spontaneous abortion (RSA). Individuals carrying SVs with complex chromosomal rearrangements (CCRs) typically exhibit a normal phenotype but are at an increased risk of miscarriage. Current standard clinical detection methods are insufficient for the identification and interpretation of all SV types, particularly complex and occult SVs, thereby presenting a significant challenge for clinical genetic counseling.

Chronic myeloid leukaemia (CML) accounts for 2% of leukaemias in children and 9% in adolescents. While the BCR::ABL1 fusion gene remains a hallmark across all age groups, emerging evidence suggests that paediatric CML exhibits unique biological and clinical characteristics compared to its adult counterpart. Children often present with more aggressive clinical features and show distinct treatment response patterns.

Introduction: B-cell acute lymphoblastic leukemia (B-ALL) is genetically heterogeneous. We assessed the utility of FusionPlex ALL targeted RNA sequencing panel in detecting gene fusions and other genomic lesions in B-ALL.

Methods: The high-risk B-ALL, negative for common recurrent gene fusions (RGF), that is, BCR::ABL1, ETV6::RUNX1, TCF3::PBX1 and KMT2A::AFF1, were analysed with RNA-based targeted sequencing 81-gene-panel FusionPlex ALL (IDT, USA).

Background & Aims: HBV integration profiles in the natural history of chronic HBV infection (CHB) have not been well-defined. Hence, we aimed to determine HBV integration profiles across different CHB phases.

Methods: We delineated integration profiles from liver biopsies of 55 patients in different CHB phases (3 HBsAg-positive/HBeAg-positive infection; 13 HBsAg-positive/HBeAg-positive hepatitis; 7 HBsAg-positive/HBeAg-negative infection; 12 HBsAg-positive/HBeAg-negative hepatitis; 10 HBsAg seroclearance; 10 occult HBV).
