Citations

20

Article Abstract

Recent advances in sequencing technology have considerably advanced genomics research by making high-throughput sequencing affordable, and the result is a huge amount of sequencing data. Clustering analysis is a powerful way to study and probe large-scale sequence data, and a number of clustering methods have been developed in the last decade. Although numerous comparison studies have been published, we noticed that they have two main limitations: only traditional alignment-based clustering methods are compared, and the evaluation metrics rely heavily on labeled sequence data. In this study, we present a comprehensive benchmark of sequence clustering methods. Specifically, i) alignment-based clustering algorithms, including classical methods (e.g., CD-HIT, UCLUST, VSEARCH) and recently proposed ones (e.g., MMseqs2, Linclust, edClust), are assessed; ii) two alignment-free methods (LZW-Kernel and Mash) are included for comparison with alignment-based methods; and iii) evaluation measures based on the true labels (supervised metrics) and on the input data itself (unsupervised metrics) are applied to quantify the clustering results. The aims of this study are to help biological analysts choose a suitable clustering algorithm for processing their sequences and to motivate algorithm designers to develop more efficient sequence clustering approaches.

Source
http://dx.doi.org/10.1109/TCBB.2023.3253138

Publication Analysis

Top Keywords

sequence clustering: 12
clustering methods: 12
clustering: 9
sequence data: 8
alignment-based clustering: 8
methods: 6
sequence: 5
comparison methods: 4
methods biological: 4
biological sequence: 4

Similar Publications

Distinct codon usage signatures reflecting evolutionary and pathogenic adaptation in the Acinetobacter baumannii complex.

Eur J Clin Microbiol Infect Dis

September 2025

School of Bioengineering and Biosciences, Department of Biochemistry, Lovely Professional University, Punjab, 144411, India.

Purpose: This study investigates codon usage and amino acid usage bias in the genus Acinetobacter to uncover the evolutionary forces shaping these patterns and their implications for pathogenicity and biotechnology.

Methods: Codon usage patterns were examined in representative genomes of the genus Acinetobacter using standard codon bias indices, including GC content, relative synonymous codon usage (RSCU), effective number of codons (ENC), and codon adaptation index (CAI). Neutrality and parity plots were employed to evaluate the relative influence of mutational pressure and natural selection on codon preferences.

Nisin-like biosynthetic gene clusters are widely distributed across microbiomes.

mBio

September 2025

APC Microbiome Ireland, Biosciences Institute, Biosciences Research Institute, University College, Cork, Ireland.

Bacteriocins are antimicrobial peptides/proteins that can have narrow or broad inhibitory spectra and remarkable potency against clinically relevant pathogens. One such bacteriocin that is extensively used in the food industry and with potential for biotherapeutic application is the post-translationally modified peptide, nisin. Recent studies have shown the impact of nisin on the gastrointestinal microbiome, but relatively little is known of how abundant nisin production is in nature, the breadth of existing variants, and their antimicrobial potency.

emerged in Chicago, IL, USA, in 2016 and has since become endemic. We used whole-genome sequencing (WGS) of 494 isolates, epidemiologic metadata and patient transfer data to describe the transmission of among Chicago healthcare facilities between 2016 and 2021. In total, 99% of isolates formed a single clade IV phylogenetic lineage, suggesting a single introduction.

Background: Parkinson's disease (PD) often presents with lateralized motor symptoms at onset, reflecting asymmetric degeneration of the substantia nigra (SN). Neuromelanin (NM) loss and iron accumulation are hallmarks of SN pathology in PD, but their spatial distribution and interrelationship in PD patients with right-sided (PDR) or left-sided (PDL) motor symptom onset remain unclear.

Purpose: To investigate the spatial vulnerability and interrelationship of NM and iron in the SN among PDR, PDL, and healthy controls (HCs) using MRI.

Background: Metabolic reprogramming is an important hallmark of cervical cancer (CC), and extensive studies have provided important information for translational and clinical oncology. Here we sought to determine metabolic association with molecular aberrations, telomere maintenance and outcomes in CC.

Methods: RNA sequencing data from the TCGA cohort of CC were analyzed for metabolic gene expression profiles, and consensus clustering was then performed to classify tumors into different groups/subtypes.