Boosting AlphaFold Protein Tertiary Structure Prediction through MSA Engineering and Extensive Model Sampling and Ranking in CASP16.

bioRxiv

Department of Electrical Engineering & Computer Science, NextGen Precision Health, University of Missouri, Columbia, Missouri, 65211, United States of America.

Published: June 2025



Article Abstract

AlphaFold2 and AlphaFold3 have revolutionized protein structure prediction by enabling high-accuracy tertiary structure predictions for most single-chain proteins. However, obtaining high-quality predictions remains challenging for hard protein targets with shallow or noisy multiple sequence alignments (MSAs) and complicated multi-domain architectures. Here, we present MULTICOM4, an integrative protein structure prediction system that uses diverse MSA generation, large-scale model sampling, and an ensemble model quality assessment (QA) strategy that combines individual QA methods to improve model generation and ranking for AlphaFold2 and AlphaFold3. In the 16th Critical Assessment of Techniques for Protein Structure Prediction (CASP16), our predictors built on MULTICOM4 ranked among the top performers out of 120 predictors in tertiary structure prediction and outperformed a standard AlphaFold3 predictor. The average TM-score of the top-1 predictions of our best-performing predictor, MULTICOM, for 84 CASP16 domains is 0.902. In terms of top-1 prediction, it achieved high accuracy (TM-score > 0.9) for 73.8% of the 84 domains and correct fold predictions (TM-score > 0.5) for 97.6% of the domains. In terms of best-of-top-5 prediction, it predicted correct folds for all the domains. The results show that MSA engineering (through the use of different protein sequence databases, alignment tools, and domain segmentation) and extensive model sampling are key to generating accurate and correct structural models. Additionally, using multiple complementary QA methods and model clustering can improve the robustness and reliability of model ranking.
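As a rough illustration of how the summary statistics in the abstract are defined, the Python sketch below computes the mean TM-score and the fraction of domains above the two thresholds (0.9 for high accuracy, 0.5 for a correct fold). The function names and the four example scores are hypothetical and are not CASP16 data. The last line only converts the reported percentages back into approximate domain counts out of 84.

```python
from statistics import mean


def summarize_tm_scores(scores, high=0.9, fold=0.5):
    """Return the mean TM-score and the fraction of domains above each threshold."""
    n = len(scores)
    return {
        "mean": mean(scores),
        "high_accuracy": sum(s > high for s in scores) / n,
        "correct_fold": sum(s > fold for s in scores) / n,
    }


def implied_count(percent, total):
    """Number of domains implied by a reported percentage of `total` (rounded)."""
    return round(percent / 100 * total)


# Hypothetical top-1 TM-scores for four domains (illustrative only, not CASP16 data).
example = [0.95, 0.91, 0.72, 0.43]
summary = summarize_tm_scores(example)
print(summary)

# Counts implied by the percentages reported for the 84 CASP16 domains.
print(implied_count(73.8, 84), implied_count(97.6, 84))
```

Under this reading, the reported 73.8% and 97.6% correspond to about 62 and 82 of the 84 domains, respectively.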


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12258999 (PMC)
http://dx.doi.org/10.1101/2025.06.06.658338 (DOI)

