Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm.

BMC Bioinformatics

Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.

Published: July 2015


Article Abstract

Background: Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here.
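As a rough illustration of how three of the four metrics could be derived from tabular BLAST hits, the sketch below builds an undirected, weighted edge list in Python. It is not the authors' implementation: the tuple layout of `hits`, the function names, the `MAX_NLE` cap for E-values reported as 0, the BSR normalisation by the smaller self-hit bit score, and keeping the larger weight of two reciprocal hits are all assumptions made for this example. BAL is left out because the abstract does not define the anchored length.

```python
import math

# Assumed cap for hits whose E-value is reported as 0.0 (hypothetical choice).
MAX_NLE = 200.0


def nle(evalue, cap=MAX_NLE):
    """Negative common (base-10) log of the E-value, capped at `cap`."""
    if evalue <= 0.0:
        return cap
    return min(cap, -math.log10(evalue))


def bit_score_ratio(bits, self_bits_a, self_bits_b):
    """Bit score divided by the smaller self-hit bit score (assumed normalisation)."""
    return bits / min(self_bits_a, self_bits_b)


def build_graph(hits, metric="bits"):
    """Return {(node_a, node_b): weight} for an undirected graph.

    `hits` is an iterable of (query, subject, bit_score, evalue) tuples,
    including each sequence's hit against itself when metric == "bsr".
    """
    hits = list(hits)
    # Self-hit bit scores are only needed for the BSR.
    self_bits = {q: bits for q, s, bits, _ in hits if q == s}
    edges = {}
    for q, s, bits, evalue in hits:
        if q == s:
            continue  # self-hits are not edges
        if metric == "bits":
            weight = bits
        elif metric == "nle":
            weight = nle(evalue)
        elif metric == "bsr":
            weight = bit_score_ratio(bits, self_bits[q], self_bits[s])
        else:
            raise ValueError(f"unknown metric: {metric}")
        key = tuple(sorted((q, s)))  # one key per unordered pair
        # Assumption: reciprocal hits collapse to the larger weight.
        edges[key] = max(weight, edges.get(key, 0.0))
    return edges


hits = [
    ("a", "a", 200.0, 1e-100),
    ("b", "b", 100.0, 1e-50),
    ("a", "b", 80.0, 1e-20),
    ("b", "a", 78.0, 1e-19),
]
print(build_graph(hits, "bits"))
print(build_graph(hits, "nle"))
print(build_graph(hits, "bsr"))
```

Only the weighting function changes between metrics; the graph construction is otherwise identical, which matches the abstract's point that the approaches differ only in how edges are weighted.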

Results: All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios.
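To make the role of the inflation parameter concrete, here is a minimal dense-matrix sketch of the MCL iteration (expansion by matrix squaring, then inflation by element-wise powering and column renormalisation) in standard-library Python. It is an illustration under stated assumptions, not Porthos or the reference MCL program: the function names, the self-loop weight (the largest incident edge weight), the convergence tolerance, and the `1e-6` threshold used to read off clusters are all choices made for this example, and a dense matrix is only practical for small graphs.

```python
def normalize_columns(m):
    """Scale each column in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total


def expand(m):
    """Expansion step: square the matrix."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as {(a, b): weight}."""
    nodes = sorted({node for pair in edges for node in pair})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), weight in edges.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = weight
    for i in range(n):
        # Assumption: self-loop weight equals the largest incident edge weight.
        m[i][i] = max(m[i]) or 1.0
    normalize_columns(m)

    for _ in range(max_iter):
        # Inflation: raise every entry to the power `inflation`, renormalise.
        new = [[v ** inflation for v in row] for row in expand(m)]
        normalize_columns(new)
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break

    # Rows with a non-zero diagonal are attractors; the non-zero columns
    # of such a row are the members of that attractor's cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6))
    return sorted(sorted(cluster) for cluster in clusters)


edges = {("a", "b"): 100.0, ("b", "c"): 90.0, ("a", "c"): 95.0, ("x", "y"): 80.0}
print(mcl(edges, inflation=2.0))
```

Raising `inflation` sharpens the contrast between strong and weak entries in each column, which generally yields finer-grained clusters. That is why a metric that compresses correct scores toward the range of spurious ones becomes more fragile as the parameter increases.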

Conclusions: The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation, Porthos, which uses only standard libraries and can process a graph with more than 25 million edges connecting the more than 60,000 KOG sequences in half a minute using less than half a gigabyte of memory.

Source

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496851
DOI: http://dx.doi.org/10.1186/s12859-015-0625-x
