CONSULT-II: accurate taxonomic identification and profiling using locality-sensitive hashing.

Ali Osman Berk Şapcı, Eleonora Rachtman, Siavash Mirarab

Bioinformatics

Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, CA 92093, United States.

Published: March 2024

Motivation: Taxonomic classification of short reads and taxonomic profiling of metagenomic samples are well-studied yet challenging problems. The presence of species belonging to groups without close representation in a reference dataset is particularly challenging. While k-mer-based methods have performed well in terms of running time and accuracy, they tend to have reduced accuracy for such novel species. Thus, there is a growing need for methods that combine the scalability of k-mers with increased sensitivity.

Results: Here, we show that using locality-sensitive hashing (LSH) can increase the sensitivity of the k-mer-based search. Our method, which combines LSH with several heuristic techniques, including soft lowest common ancestor labeling and voting, is more accurate than alternatives in both taxonomic classification of individual reads and abundance profiling.
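
To illustrate why LSH can make a k-mer search more tolerant of mismatches, here is a minimal sketch. It is not CONSULT-II's implementation: it assumes a bit-sampling style of LSH under Hamming distance, and the class name `BitSamplingLSH`, the parameters `k`, `h`, `l`, and `max_dist`, and the toy sequences are all hypothetical choices made for illustration. Each of the `l` tables keys a k-mer by `h` randomly chosen positions, so a query k-mer that differs from a reference k-mer at a few positions can still collide with it in any table whose sampled positions avoid the mismatches.

```python
import random
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Toy LSH index: l hash tables, each keyed by h sampled k-mer positions.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(self, k, h, l, seed=0):
        rng = random.Random(seed)
        # One sorted list of h sampled positions per table.
        self.positions = [sorted(rng.sample(range(k), h)) for _ in range(l)]
        self.tables = [defaultdict(set) for _ in range(l)]

    def _key(self, kmer, pos):
        # The hash key is the k-mer restricted to the sampled positions.
        return "".join(kmer[i] for i in pos)

    def add(self, kmer):
        for pos, table in zip(self.positions, self.tables):
            table[self._key(kmer, pos)].add(kmer)

    def query(self, kmer, max_dist):
        """Return {reference k-mer: Hamming distance} for hits within max_dist."""
        candidates = set()
        for pos, table in zip(self.positions, self.tables):
            candidates |= table.get(self._key(kmer, pos), set())
        hits = {}
        for cand in candidates:
            d = hamming(kmer, cand)
            if d <= max_dist:  # verify candidates; LSH only proposes them
                hits[cand] = d
        return hits


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGT"  # toy reference sequence
    index = BitSamplingLSH(k=8, h=4, l=3)
    for km in kmers(reference, 8):
        index.add(km)
    # Query a k-mer that differs from the first reference k-mer at one position.
    print(index.query("ACGTTGCT", max_dist=2))
```

In this sketch, lowering `h` or raising `l` increases the chance that an inexact match collides in at least one table, at the cost of more candidates to verify; an exact-match k-mer lookup corresponds to the special case where every position is part of the key.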

Availability and implementation: CONSULT-II is implemented in C++, and the software, together with reference libraries, is publicly available on GitHub at https://github.com/bo1929/CONSULT-II.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10985673
DOI: http://dx.doi.org/10.1093/bioinformatics/btae150

Similar Publications

Multi-metric locality sensitive hashing enhances alignment accuracy of bisulfite sequencing reads: BisHash.

Bioinform Adv

July 2025

Department of Computer Engineering, Sharif University of Technology, Tehran, 1458889694, Iran.

Hassan Nikaein, Ali Sharifi-Zarchi

Motivation: Locality-Sensitive Hashing (LSH) is a widely used algorithm for estimating similarity between large datasets in bioinformatics, with applications in genome assembly, sequence alignment, and metagenomics. However, traditional single-metric LSH approaches often lead to inefficiencies, particularly when handling biological data where regions may have diverse evolutionary histories or structural properties. This limitation can reduce accuracy in sequence alignment, variant calling, and functional analysis.

Metagenomic sequence classification based on local sensitive hashing and Bi-LSTM.

J Bioinform Comput Biol

August 2025

School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, P. R. China.

Yan Qian, Lei Xiao, Yiding Zhou, Li Deng

Current metagenomic classification methods are limited by short k-mer lengths and database dependency, resulting in insufficient taxonomic resolution at the species and genus level. This study proposes the first method integrating Locality-Sensitive Hashing (LSH) and Bidirectional Long-Short Term Memory (Bi-LSTM) networks for metagenomic sequence classification. The approach reduces runtime reliance on reference databases by learning discriminative features directly from sequences, while supporting long k-mers.

TCR2HLA: calibrated inference of HLA genotypes from TCR repertoires enables identification of immunologically relevant metaclonotypes.

bioRxiv

July 2025

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA.

Koshlan Mayer-Blackwell, Anastasia Minervina, Mikhail Pogorelyy, Puneet Rawat, Melanie R Shapiro

T cell receptors (TCRs) recognize peptides presented by polymorphic human leukocyte antigen (HLA) molecules, but HLA genotype data are often missing from TCR repertoire sequencing studies. To address this, we developed TCR2HLA, an open-source tool that infers HLA genotypes from TCRβ repertoires. Expanding on work linking public TRBV-CDR3 sequences to HLA genotypes, we incorporated "quasi-public" metaclonotypes - composed of rarer TCRβ sequences with shared amino acid features - enriched by HLA genotypes.

A k-mer-based maximum likelihood method for estimating distances of reads to genomes enables genome-wide phylogenetic placement.

bioRxiv

July 2025

Ali Osman Berk Şapcı, Siavash Mirarab

Comparing each sequencing read in a sample to large databases of known genomes has become a fundamental tool with wide-ranging applications, including metagenomics. These comparisons can be based on read-to-genome alignment, which is relatively slow, especially if done with the high sensitivity needed to characterize queries without a close representation in the reference dataset. A more scalable alternative is assigning taxonomic labels to reads using signatures such as k-mer presence/absence.

Locality-Sensitive Hashing-Based Data Set Reduction for Deep Potential Training.

J Chem Theory Comput

June 2025

Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India.

Anmol, Anuj Kumar Sirohi, Neha, Jayadeva, Sandeep Kumar

Machine learning methods provide a great scope for developing ab initio quality potentials for diverse systems, ranging from simple fluids to complex solids. However, these methods typically require extensive data sets for effective model training, and the accuracy of the ML potential is highly dependent on data quality, necessitating expensive ab initio calculations. To address this challenge, we present a novel method based on locality-sensitive hashing, designed to minimize the data set size, thereby reducing the number of expensive quantum chemical calculations while preserving the data set's diversity and accuracy.
