Integrating Reinforcement Learning and Monte Carlo Tree Search for enhanced neoantigen vaccine design.

Brief Bioinform

MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Shanghai Xuhui Central Hospital, Fudan University, 131 DongAn Road, Shanghai, 200032, China.

Published: March 2024



Article Abstract

Recent advances in cancer immunotherapy have highlighted the potential of neoantigen-based vaccines. However, the design of such vaccines is hindered by the possibility of weak binding affinity between the peptides and the patient's specific human leukocyte antigen (HLA) alleles, which may fail to elicit a robust adaptive immune response. Triggering cross-immunity with peptide mutations that have enhanced binding affinity to target HLA molecules, while preserving homology with the original peptide, is a promising avenue for neoantigen vaccine design. In this study, we introduced UltraMutate, a novel algorithm combining Reinforcement Learning and Monte Carlo Tree Search that identifies peptide mutations which not only exhibit enhanced binding affinities to target HLA molecules but also retain a high degree of homology with the original neoantigen. UltraMutate outperformed existing state-of-the-art methods in identifying affinity-enhancing mutations on an independent test set of 3660 peptide-HLA pairs. UltraMutate further showed its applicability in the design of peptide vaccines against Human Papillomavirus and Human Cytomegalovirus, demonstrating its potential as a promising tool for the advancement of personalized immunotherapy.
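The abstract does not describe UltraMutate's implementation. As a purely illustrative sketch (not the authors' method), a minimal Monte Carlo Tree Search over single-residue peptide mutations might look like the following, where `toy_affinity` is a hypothetical stand-in for a real peptide-HLA binding predictor and `homology` measures sequence identity with the original neoantigen:

```python
# Illustrative MCTS over single-residue peptide mutations.
# `toy_affinity` is a placeholder for a trained binding-affinity model;
# the reward balances predicted affinity against homology to the original.
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def homology(peptide, original):
    # Fraction of positions identical to the original peptide.
    return sum(a == b for a, b in zip(peptide, original)) / len(original)

def toy_affinity(peptide):
    # Deterministic toy score standing in for a real predictor.
    return sum(ord(c) % 7 for c in peptide) / (7 * len(peptide))

def reward(peptide, original, alpha=0.5):
    # Trade off affinity gain against homology preservation.
    return alpha * toy_affinity(peptide) + (1 - alpha) * homology(peptide, original)

class Node:
    def __init__(self, peptide, parent=None):
        self.peptide, self.parent = peptide, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        # Upper Confidence Bound: unvisited nodes are explored first.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def expand(node):
    # Children are all single-residue mutations of the current peptide.
    for i in range(len(node.peptide)):
        for aa in AMINO_ACIDS:
            if aa != node.peptide[i]:
                node.children.append(
                    Node(node.peptide[:i] + aa + node.peptide[i + 1:], node))

def mcts(original, iters=200):
    root = Node(original)
    expand(root)
    for _ in range(iters):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Evaluation and backpropagation.
        r = reward(node.peptide, original)
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Return the most-visited single mutation.
    return max(root.children, key=lambda n: n.visits).peptide
```

A reinforcement-learning policy could replace the uniform expansion here by proposing promising mutations first; this sketch keeps a single-mutation search depth for brevity.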


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11107383
DOI: http://dx.doi.org/10.1093/bib/bbae247

Publication Analysis

Top Keywords: reinforcement learning (8), learning monte (8), monte carlo (8), carlo tree (8), tree search (8), neoantigen vaccine (8), vaccine design (8), binding affinity (8), peptide mutations (8), enhanced binding (8)

Similar Publications

Reward delays are often associated with reduced probability of reward, although standard assessments of delay discounting do not specify the degree of reward certainty. Thus, the extent to which estimates of delay discounting are influenced by uncontrolled variance in perceived reward certainty remains unclear. Here we examine 370 participants who were randomly assigned to complete a delay discounting task in which reward certainty was either unspecified (n = 184) or specified as 100% (n = 186) in the task trials and task instructions.


The increasing dependence on cloud computing as a cornerstone of modern technological infrastructures has introduced significant challenges in resource management. Traditional load-balancing techniques often prove inadequate in addressing cloud environments' dynamic and complex nature, resulting in suboptimal resource utilization and heightened operational costs. This paper presents a novel smart load-balancing strategy incorporating advanced techniques to mitigate these limitations.


Rule following as choice: The role of reinforcement rate and rule accuracy on rule-following behavior.

J Exp Anal Behav

September 2025

Laboratorio de Análisis de la Conducta, Facultad de Estudios Superiores Iztacala, Universidad Nacional Autónoma de México.

Rules can control the listener's behavior, yet few studies have examined variables that quantitatively determine the extent of this control relative to other rules and contingencies. To explore these variables, we employed a novel procedure that required a choice between rules. Participants clicked two buttons on a computer screen to earn points exchangeable for money.


Multiagent Inductive Policy Optimization.

IEEE Trans Neural Netw Learn Syst

September 2025

Policy optimization methods are promising to tackle high-complexity reinforcement learning (RL) tasks with multiple agents. In this article, we derive a general trust region for policy optimization methods by considering the effect of subpolicy combinations among agents in multiagent environments. Based on this trust region, we propose an inductive objective to train the policy function, which can ensure agents learn monotonically improving policies.


In essence, reinforcement learning (RL) solves the optimal control problem (OCP) by employing a neural network (NN) to fit the optimal policy from state to action. The accuracy of policy approximation is often very low in complex control tasks, leading to unsatisfactory control performance compared with online optimal controllers. A primary reason is that the landscape of the value function is not only rugged in most areas but also flat at the bottom, which hinders convergence to the minimum point.
