Motivation: RNA secondary structure is often essential to function. Recent work has produced high-throughput experimental probing methods for structure determination. Although structure is more conserved than primary sequence, many of the bioinformatics pipelines that connect RNA structure to function rely on nucleotide sequence alignments rather than structural similarity. Navigating the vast amounts of structural data requires secondary structure comparison methods that are fast and efficient. K-mer based similarity approaches are valued for their computational efficiency and have been applied to protein, DNA, and RNA primary sequences. However, these approaches have yet to be implemented for RNA secondary structure.
Results: Our method, bpRNA-CosMoS, fills this gap by using k-mers and length-weighted cosine similarity to compute similarity scores between RNA structures. bpRNA-CosMoS is built upon the bpRNA structure array, which represents the structural category of each nucleotide as a single-character structural code (e.g. hairpin=H). A structural comparison score is calculated as the cosine similarity of the k-mer count vectors generated from the structure arrays. A major challenge with k-mer based methods is that they often ignore the lengths of the sequences being compared. We overcome this with a length-weighted penalty that addresses cases where two RNAs differ vastly in length. In addition, optional "fuzzy counting" adds flexibility by reducing the negative impact that small structural variations have on the similarity score. The result is a robust and efficient way to identify structural similarities across large datasets.
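The scoring idea described above can be sketched in a few lines of Python. This is only an illustration and not the bpRNA-CosMoS implementation: the function names, the default k, the example structure arrays, and in particular the penalty (here assumed to be the ratio of the shorter to the longer length) are assumptions, since the abstract does not give the exact weighting. Fuzzy counting is not shown.

```python
from collections import Counter
from math import sqrt


def kmer_counts(structure_array: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a structure array (one code per nucleotide)."""
    return Counter(
        structure_array[i:i + k] for i in range(len(structure_array) - k + 1)
    )


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse k-mer count vectors."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one array is shorter than k
    return dot / (norm_a * norm_b)


def length_weighted_similarity(s1: str, s2: str, k: int = 3) -> float:
    """Cosine similarity scaled by a length penalty.

    ASSUMPTION: the penalty is min(len) / max(len); the published method
    may weight length differently.
    """
    if not s1 or not s2:
        return 0.0
    penalty = min(len(s1), len(s2)) / max(len(s1), len(s2))
    return cosine_similarity(kmer_counts(s1, k), kmer_counts(s2, k)) * penalty


if __name__ == "__main__":
    # Hypothetical structure arrays: S = stem, H = hairpin, E = end.
    a = "EESSSSHHHHSSSSEE"
    b = "EESSSHHHHHSSSEEE"
    print(round(length_weighted_similarity(a, b), 3))
```

Because the count vectors are stored sparsely as `Counter` objects, the cost of one comparison grows with the number of distinct k-mers rather than with the full alphabet-to-the-k vector size, which is what makes this style of comparison cheap on large datasets.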
Availability and Implementation: The code and application guidelines for bpRNA-CosMoS are available on GitHub (https://github.com/BLasher113/bpRNA-CosMoS) and Zenodo (10.5281/zenodo.14715285).
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12017588 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btaf108 | DOI Listing |
New Phytol
September 2025
State Key Laboratory of Plant Diversity and Specialty Crops/Key Laboratory of National Forestry and Grassland Administration on Plant Conservation and Utilization in Southern China, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China.
Heterostyly is a polymorphic floral adaptation controlled by supergenes. The molecular basis of distyly has been investigated in diploid species from several unrelated families, but information is lacking for polyploid systems. Here, we address this knowledge gap in Schizomussaenda henryi, a tetraploid distylous species of Rubiaceae, the family with the greatest number of heterostylous species.
The Siberian flying squirrel represents the only European Pteromyini species. Thus, it is biogeographically unique due to its specialised anatomy and biology as a volant rodent. As a result of habitat fragmentation and destruction, Siberian flying squirrels experience severe and ongoing population declines throughout most of their distribution.
Trop Med Int Health
September 2025
ImmunoCure - Center for Inflammatory Diseases, Karachi, Pakistan.
Background: Antigen cross-reactivity in infections may induce heterologous immunity, leading to immunological protection against widely divergent organisms. We hypothesised that this may be a factor in the varying intensity of COVID-19 infection globally.
Methods: During the COVID-19 pandemic, we tested 46 symptomatic patients for both COVID-19 antibodies and the Typhidot test.
Bioinform Adv
August 2025
Department of Biophysics, University of Delhi, New Delhi, Delhi, 110021, India.
Motivation: Prediction of antimicrobial resistance in using machine learning and genomic sequences holds the potential to serve as comparable alternatives to laboratory based detection if not better. Additionally, model interpretability can further enhance the potential of these models paving way for their reproducibility.
Results: We have developed a machine-learning based 2-tier pipeline to predict resistance phenotype in using only genomic sequences as input in the form of k-mers.
Taxonomic sequence classification is a computational problem central to the study of metagenomics and evolution. Advances in compressed indexing with the -index enable full-text pattern matching against large sequence collections. But the data structures that link pattern sequences to their clades of origin still do not scale well to large collections. Previous work proposed the document array profiles, which use () words of space where is the number of maximal-equal letter runs in the Burrows-Wheeler transform and is the number of distinct genomes.