Boosting AlphaFold Protein Tertiary Structure Prediction through MSA Engineering and Extensive Model Sampling and Ranking in CASP16.

Jian Liu, Pawan Neupane, Jianlin Cheng

bioRxiv

Department of Electrical Engineering & Computer Science, NextGen Precision Health, University of Missouri, Columbia, Missouri, 65211, United States of America.

Published: June 2025

AlphaFold2 and AlphaFold3 have revolutionized protein structure prediction by enabling high-accuracy tertiary structure predictions for most single-chain proteins. However, obtaining high-quality predictions for some hard protein targets with shallow or noisy multiple sequence alignments (MSAs) and complicated multi-domain architectures remains challenging. Here, we present MULTICOM4, an integrative protein structure prediction system that uses diverse MSA generation, large-scale model sampling, and an ensemble model quality assessment (QA) strategy that combines individual QA methods to improve the model generation and ranking of AlphaFold2 and AlphaFold3. In the 16th Critical Assessment of Techniques for Protein Structure Prediction (CASP16), our predictors built on MULTICOM4 ranked among the top performers out of 120 predictors in tertiary structure prediction and outperformed a standard AlphaFold3 predictor. The average TM-score of the top-1 predictions of our best-performing predictor, MULTICOM, for the 84 CASP16 domains is 0.902. In terms of top-1 prediction, it achieved high accuracy (TM-score > 0.9) for 73.8% of the 84 domains and correct fold predictions (TM-score > 0.5) for 97.6% of the domains. In terms of best-of-top-5 prediction, it predicted correct folds for all the domains. The results show that MSA engineering (through the use of different protein sequence databases, alignment tools, and domain segmentation) and extensive model sampling are key to generating accurate and correct structural models. Additionally, using multiple complementary QA methods and model clustering can improve the robustness and reliability of model ranking.
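
The evaluation metrics and the idea of combining several QA methods can be illustrated with a small Python sketch. It is not the MULTICOM4 implementation: the function names, the QA method names, and all numbers below are hypothetical toy values, and average-rank consensus is only one simple way to combine QA scores (the abstract does not specify the combination rule). The thresholds (TM-score > 0.9 for high accuracy, > 0.5 for a correct fold) are the ones stated in the abstract; the reported 73.8% and 97.6% are consistent with 62 and 82 of the 84 domains, respectively.

```python
from statistics import mean


def summarize_tm_scores(tm_scores, high=0.9, correct=0.5):
    """Return the mean TM-score and the fractions above the two thresholds."""
    n = len(tm_scores)
    return {
        "mean": mean(tm_scores),
        "high_accuracy": sum(s > high for s in tm_scores) / n,
        "correct_fold": sum(s > correct for s in tm_scores) / n,
    }


def consensus_rank(qa_scores):
    """Order models by the average of their per-method ranks (best first).

    qa_scores maps a (hypothetical) QA method name to {model_name: score},
    where a higher score means the method considers the model better.
    """
    models = sorted(next(iter(qa_scores.values())))
    avg_rank = {m: 0.0 for m in models}
    for scores in qa_scores.values():
        ordered = sorted(models, key=lambda m: scores[m], reverse=True)
        for rank, m in enumerate(ordered, start=1):
            avg_rank[m] += rank / len(qa_scores)
    return sorted(models, key=lambda m: avg_rank[m])


# Toy data invented for illustration only (not CASP16 results).
toy_tm = [0.95, 0.92, 0.88, 0.45]
summary = summarize_tm_scores(toy_tm)

toy_qa = {
    "method_a": {"model_1": 0.80, "model_2": 0.70, "model_3": 0.60},
    "method_b": {"model_1": 0.65, "model_2": 0.75, "model_3": 0.50},
    "method_c": {"model_1": 0.90, "model_2": 0.85, "model_3": 0.95},
}
ranking = consensus_rank(toy_qa)
print(summary)
print(ranking)
```

In the toy example no single method decides the outcome: model_1 is ranked first by only one of the three methods but comes out on top because it is never ranked last, which is the kind of robustness an ensemble ranking is meant to provide.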

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12258999
DOI: http://dx.doi.org/10.1101/2025.06.06.658338

Similar Publications

Systematic analyses uncover plasma proteins linked to incident cardiovascular diseases.

Protein Cell

August 2025

Department of Neurology and National Center for Neurological Disorders, Huashan Hospital, State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China.

Yi-Lin Chen, Ji-Jing Wang, Jia You, Ji-Yun Cheng, Ze-Yu Li

Cardiovascular disease (CVD) research is hindered by the lack of comprehensive analyses of the plasma proteome across disease subtypes. Here, we systematically investigated the associations between plasma proteins and cardiovascular outcomes in 53,026 UK Biobank participants over a 14-year follow-up. Association analyses identified 3,089 significant associations involving 892 unique protein analytes across 13 CVD outcomes.

Maximizing theoretical and practical storage capacity in single-layer feedforward neural networks.

Front Comput Neurosci

August 2025

Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States.

Zane Z Chou, Jean-Marie C Bouteiller

Artificial neural networks are limited in the number of patterns that they can store and accurately recall, with capacity constraints arising from factors such as network size, architectural structure, pattern sparsity, and pattern dissimilarity. Exceeding these limits leads to recall errors, eventually leading to catastrophic forgetting, which is a major challenge in continual learning. In this study, we characterize the theoretical maximum memory capacity of single-layer feedforward networks as a function of these parameters.

MEDT insights into the mechanism and selectivity of the (3 + 2) cycloaddition of (Z)-N-methyl-C-(2-furyl)-nitrone with but-2-ynedioic acid and the bioactivity of the reaction products.

RSC Adv

September 2025

Process and Environmental Engineering Laboratory (LIPE), Faculty of Chemistry, University of Science and Technology of Oran Mohamed Boudiaf, P. O. Box 1503, El Mnaouer, 31000 Oran, Algeria.

Mohamed Chellegui, Lakhdar Benhamed, Raad Nasrullah Salih, Ines Salhi, Sofiane Benmetir

In this contribution, Molecular Electron Density Theory (MEDT) is employed to investigate the (3 + 2) cycloaddition reaction between (Z)-N-methyl-C-(2-furyl)-nitrone 1 and but-2-ynedioic acid 2. DFT calculations at the M06-2X-D3/6-311+G(d,p) level of theory under solvent-free conditions at room temperature show that this reaction proceeds with CA3-Z diastereoselectivity, with the formation of the CA3-Z cycloadduct being both thermodynamically and kinetically more favoured than that of the CA4-Z one. Reactivity parameters obtained from CDFT calculations reveal that compound 1 predominantly behaves as a nucleophile with moderate electrophilic features, in contrast to compound 2, which demonstrates strong electrophilicity and limited nucleophilic ability.

Structure and function of the topsoil microbiome in Chinese terrestrial ecosystems.

Front Microbiol

August 2025

State Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, China.

Yuqiang Li, Yulong Duan, Junbiao Zhang, Evangelos Petropoulos, Jianhua Zhao

While soil microorganisms underpin terrestrial ecosystem functioning, how their functional potential adapts across environmental gradients remains poorly understood, particularly for ubiquitous taxa. Employing a comprehensive metagenomic approach across China's six major terrestrial ecosystems (41 topsoil samples, 0-20 cm depth), we reveal a counterintuitive pattern: oligotrophic environments (deserts, karst) harbor microbiomes with significantly greater metabolic pathway diversity (KEGG) compared to resource-rich ecosystems. We provide a systematic catalog of key functional genes governing biogeochemical cycles in these soils, identifying: 6 core CAZyme genes essential for soil organic carbon (SOC) decomposition and biosynthesis; 62 nitrogen (N)-cycling genes (KOs) across seven critical enzymatic clusters; 15 sulfur (S)-cycling genes (KOs) within three key enzymatic clusters.

Gene mutation estimations via mutual information and Ewens sampling based CNN & machine learning algorithms.

J Appl Stat

February 2025

Department of Mathematics and State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, People's Republic of China.

Wanyang Dai

We estimate gene mutation rates by developing convolutional neural network (CNN) and machine learning algorithms based on mutual information and Ewens sampling. More precisely, we develop a systematic methodology by constructing a CNN. Meanwhile, we develop two machine learning algorithms to study protein production with target gene sequences and protein structures.
