Background: Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here.
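As a rough illustration of how such edge weights might be derived from BLAST hits, the Python sketch below computes the bit score, NLE, and a BSR for each hit and folds the two directional hits into one undirected edge. The abstract does not define the metrics in detail, so several things here are assumptions: the hit tuple layout `(query, subject, bits, evalue)`, the cap of 200 applied when the E-value is reported as 0, the BSR normalization (dividing by the smaller self-alignment bit score), and keeping the larger of the two directional weights. BAL is omitted because its definition is not given here.

```python
import math


def nle(evalue, cap=200.0):
    """Negative common log of the E-value; the cap for E = 0 is an assumption."""
    if evalue <= 0.0:
        return cap
    return min(cap, -math.log10(evalue))


def bit_score_ratio(bits, self_bits_query, self_bits_subject):
    """Bit score over the smaller self-alignment bit score (assumed normalization)."""
    return bits / min(self_bits_query, self_bits_subject)


def build_graph(hits, metric="bits"):
    """Turn directional hits into undirected weighted edges.

    hits: iterable of (query, subject, bits, evalue) tuples (hypothetical layout).
    Returns {(node_a, node_b): weight} with node_a < node_b.
    """
    # Self-hits supply the reference scores needed by the BSR.
    self_bits = {q: b for q, s, b, _ in hits if q == s}
    edges = {}
    for query, subject, bits, evalue in hits:
        if query == subject:
            continue  # self-hits are not edges
        if metric == "bits":
            weight = bits
        elif metric == "nle":
            weight = nle(evalue)
        elif metric == "bsr":
            weight = bit_score_ratio(bits, self_bits[query], self_bits[subject])
        else:
            raise ValueError(f"unknown metric: {metric}")
        key = tuple(sorted((query, subject)))
        # Keep the stronger of the two directional hits (an assumption).
        edges[key] = max(weight, edges.get(key, 0.0))
    return edges


hits = [
    ("a", "a", 200.0, 1e-60),
    ("b", "b", 400.0, 1e-120),
    ("a", "b", 100.0, 1e-20),
    ("b", "a", 90.0, 1e-18),
]
bits_graph = build_graph(hits, "bits")
bsr_graph = build_graph(hits, "bsr")
nle_graph = build_graph(hits, "nle")
```

Swapping the `metric` argument changes only the edge weights, not the graph topology, which matches the abstract's point that the approaches differ only in how they weight the edges.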
Results: All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios.
Conclusions: The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation: Porthos, which uses only standard libraries and can process a graph with more than 25 million edges connecting the more than 60,000 KOG sequences in half a minute using less than half a gigabyte of memory.
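To make the role of the inflation parameter concrete, here is a minimal, standard-library-only sketch of the MCL iteration (expansion by matrix squaring, then inflation by element-wise exponentiation and column renormalization). It is not the Porthos code and makes several assumptions: edges arrive as a `{(u, v): weight}` dictionary, each node gets a self-loop equal to its heaviest edge, entries below a hypothetical `prune` threshold are dropped, and clusters are read off as connected components of the converged matrix.

```python
def normalize(col):
    """Scale a sparse column so its entries sum to 1."""
    total = sum(col.values())
    return {i: v / total for i, v in col.items()}


def mcl(edges, inflation=2.0, max_iter=100, tol=1e-6, prune=1e-9):
    """Cluster an undirected weighted graph given as {(u, v): weight}."""
    # Sparse column-oriented matrix: cols[j][i] is the entry in row i, column j.
    cols = {}
    for (u, v), w in edges.items():
        cols.setdefault(u, {})[v] = w
        cols.setdefault(v, {})[u] = w
    for node, col in cols.items():
        col[node] = max(col.values())  # self-loop weight (an assumption)
    cols = {j: normalize(col) for j, col in cols.items()}

    for _ in range(max_iter):
        new_cols = {}
        for j, col in cols.items():
            # Expansion: column j of M*M is a weighted sum of columns of M.
            acc = {}
            for k, w_kj in col.items():
                for i, w_ik in cols[k].items():
                    acc[i] = acc.get(i, 0.0) + w_ik * w_kj
            # Inflation: raise entries to a power, renormalize, prune.
            inflated = normalize({i: v ** inflation for i, v in acc.items()})
            kept = {i: v for i, v in inflated.items() if v >= prune}
            new_cols[j] = normalize(kept)
        delta = max(
            abs(new_cols[j].get(i, 0.0) - cols[j].get(i, 0.0))
            for j in cols
            for i in set(new_cols[j]) | set(cols[j])
        )
        cols = new_cols
        if delta < tol:
            break

    # Read clusters as connected components of the converged matrix.
    parent = {n: n for n in cols}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for j, col in cols.items():
        for i in col:
            parent[find(i)] = find(j)
    groups = {}
    for n in cols:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Two disjoint triangles with equal bit-score-like weights.
edges = {
    ("a", "b"): 100.0, ("a", "c"): 100.0, ("b", "c"): 100.0,
    ("d", "e"): 80.0, ("d", "f"): 80.0, ("e", "f"): 80.0,
}
clusters = mcl(edges, inflation=2.0)
```

Raising `inflation` sharpens each column toward its largest entries, so weak edges are cut sooner and clusters tend to become smaller; this is the knob the Results refer to when they note that higher inflation exacerbated the penalty for BSR and BAL.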
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496851 | PMC |
http://dx.doi.org/10.1186/s12859-015-0625-x | DOI Listing |