Citations

20

Article Abstract

k-mers are increasingly used to find sequence matches in many bioinformatic applications, including metagenomic sequence classification. The accuracy of these downstream applications relies on the density of the reference databases, which, luckily, are growing rapidly. While the increased density provides hope for dramatic improvements in accuracy, scalability is a concern. Reference k-mers are kept in memory at query time, and saving all k-mers of these ever-expanding databases is fast becoming impractical. Several strategies for subsampling have been proposed, including minimizers and finding taxon-specific k-mers. However, we contend that these strategies are inadequate, especially when reference sets are taxonomically imbalanced, as are most microbial libraries. In this paper, we explore approaches for selecting a fixed-size subset of the k-mers present in an ultra-large dataset to include in a library such that the classification of reads suffers the least. Our experiments demonstrate the limitations of existing approaches, especially for novel and poorly sampled groups. We propose a library construction algorithm called KRANK (K-mer RANKer) that combines several components, including a hierarchical selection strategy with adaptive size restrictions and an equitable coverage strategy. We implement KRANK in highly optimized code and combine it with the locality-sensitive-hashing classifier CONSULT-II to build a taxonomic classification and profiling method. On several benchmarks, KRANK k-mer selection dramatically reduces memory consumption with minimal loss in classification accuracy. We show in extensive analyses based on CAMI benchmarks that KRANK outperforms k-mer-based alternatives in terms of taxonomic profiling and comes close to the best marker-based methods in terms of accuracy.
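As a rough illustration of the minimizer subsampling mentioned in the abstract (one of the existing strategies, not KRANK's own selection algorithm), the sketch below keeps only the lexicographically smallest k-mer in each window of w consecutive k-mers. The function names, the toy sequence, and the choice of lexicographic ordering are assumptions made for this example; practical tools typically order k-mers by a hash and also handle reverse complements.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in left-to-right order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizers(seq, k, w):
    """Return the set of (position, k-mer) pairs chosen as minimizers.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is kept (the leftmost one on ties). Lexicographic
    order is an illustrative assumption, not a requirement.
    """
    km = kmers(seq, k)
    selected = set()
    for start in range(len(km) - w + 1):
        window = km[start:start + w]
        # min() returns the first minimal index, so ties go to the left.
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return selected


# Hypothetical toy sequence, used only to show the size reduction.
seq = "ACGTTGCATGACG"
all_kmers = kmers(seq, 4)
subset = minimizers(seq, 4, 4)
print(f"{len(all_kmers)} k-mers reduced to {len(subset)} minimizers")
```

Because every window contributes at least one selected k-mer, two sequences sharing a long enough exact match still share a minimizer, which is why this kind of subsampling can shrink a library without discarding all matching signal. It does not, by itself, address taxonomic imbalance in the reference set.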

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11257464
DOI: http://dx.doi.org/10.1101/2024.02.12.580015
