bpRNA-CosMoS: a robust and efficient RNA structural comparison method using k-mer based cosine similarity.

Bioinformatics

Department of Biochemistry and Biophysics, Oregon State University, 2011 Agricultural and Life Sciences, 2750 SW Campus Way, Corvallis, Oregon 97331, USA.

Published: March 2025

Motivation: RNA secondary structure is often essential to function. Recent work has led to the development of high-throughput experimental probing methods for structure determination. Although structure is more conserved than primary sequence, many of the bioinformatics pipelines that connect RNA structure to function rely on nucleotide sequence alignments rather than structural similarity. Secondary structure comparison methods are needed that are also fast and efficient enough to navigate the vast amounts of structural data. K-mer based similarity approaches are valued for their computational efficiency and have been applied to protein, DNA, and RNA primary sequences. However, these approaches have yet to be implemented for RNA secondary structure.

Results: Our method, bpRNA-CosMoS, fills this gap by using k-mers and length-weighted cosine similarity to compute similarity scores between RNA structures. bpRNA-CosMoS is built upon the bpRNA structure array, which represents the structural category of each nucleotide as a single-character structural code (e.g. hairpin = H). A structural comparison score is calculated as the cosine similarity of the k-mer count vectors generated from the structure arrays. A major challenge with k-mer based methods is that they often ignore the lengths of the sequences being compared. We have overcome this with a length-weighted penalty that addresses cases in which two RNAs have vastly different lengths. In addition, optional "fuzzy counting" adds flexibility by reducing the negative impact that small structural variations have on the similarity score. The result is a robust and efficient way to compare structures across large datasets.
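
The scoring idea described above can be illustrated with a minimal Python sketch. This is not the bpRNA-CosMoS implementation: the function names, the default k = 3, the toy structure arrays, and in particular the length weight (shorter length divided by longer length) are assumptions chosen for illustration, since the abstract does not state the exact penalty. Fuzzy counting is left out for the same reason.

```python
from collections import Counter
from math import sqrt


def kmer_counts(structure_array: str, k: int) -> Counter:
    """Count every overlapping k-mer in a structure array string."""
    return Counter(
        structure_array[i:i + k] for i in range(len(structure_array) - k + 1)
    )


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one input is shorter than k
    return dot / (norm_a * norm_b)


def length_weight(len_a: int, len_b: int) -> float:
    """ASSUMPTION: a simple shorter/longer ratio, not the published penalty."""
    longer = max(len_a, len_b)
    return min(len_a, len_b) / longer if longer else 0.0


def structure_similarity(s1: str, s2: str, k: int = 3) -> float:
    """Length-weighted cosine similarity of two structure arrays."""
    cosine = cosine_similarity(kmer_counts(s1, k), kmer_counts(s2, k))
    return cosine * length_weight(len(s1), len(s2))


if __name__ == "__main__":
    # Toy structure arrays; only H (hairpin) is taken from the text above,
    # the other letters are illustrative structural codes.
    short_rna = "EESSSHHHHSSSEE"
    long_rna = short_rna * 2
    print(structure_similarity(short_rna, short_rna))
    print(structure_similarity(short_rna, long_rna))
```

Without the length weight, a short structure and a longer structure built from repeats of it would score close to 1, because their k-mer count vectors point in nearly the same direction. Multiplying by a length-dependent factor is one simple way to penalize that case.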

Availability and Implementation: The code and application guidelines for bpRNA-CosMoS are available on GitHub (https://github.com/BLasher113/bpRNA-CosMoS) and Zenodo (DOI: 10.5281/zenodo.14715285).

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12017588	PMC
http://dx.doi.org/10.1093/bioinformatics/btaf108	DOI Listing
