MolEM: a unified generative framework for molecular graphs and sequential orders.

Brief Bioinform

College of Computer Science, Sichuan University, No.24 South Section 1, Yihuan Road, Chengdu 610065, China.

Published: March 2025


Article Abstract

Structure-based drug design aims to generate molecules that fill the cavity of a protein pocket with high binding affinity. Many contemporary studies employ sequential generative models, whose standard training method is to sequentialize molecular graphs into ordered sequences and then maximize the likelihood of the resulting sequences. However, the exact likelihood is computationally intractable because it involves a sum over all possible sequential orders: molecular graphs lack an inherent order, and the number of orders is factorial in the graph size. To avoid the intractable space of factorially many orders, existing works predefine a fixed node-ordering scheme, such as depth-first search, to sequentialize the 3D molecular graphs. In these cases, the training objectives are loose lower bounds of the exact likelihoods, which is suboptimal for generation. To address these challenges, we propose a unified generative framework named MolEM that jointly learns the 3D molecular graphs and their corresponding sequential orders. We derive a tight lower bound of the likelihood and maximize it via a variational expectation-maximization algorithm, opening a new line of research in learning-based ordering schemes for 3D molecular graph generation. In addition, we are the first to incorporate the molecular docking method QuickVina 2 to manipulate the binding poses, leading to accurate and flexible ligand conformations. Experimental results demonstrate that MolEM significantly outperforms baseline models in generating molecules with high binding affinities and realistic structures. Our approach efficiently approximates the true marginal graph likelihood and identifies reasonable orderings for 3D molecular graphs, aligning well with relevant chemical priors.
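
The relationship between the exact likelihood, a fixed-order objective, and a variational lower bound can be illustrated with a toy calculation. The sketch below is not the MolEM model: the three-atom graph, the joint probabilities in `toy_probs`, and the helper names `log_marginal` and `elbo` are all hypothetical, chosen only to show that log p(G) sums over every order, that training on one fixed order gives a looser bound, and that the bound E_q[log p(G, pi) - log q(pi)] becomes tight when q matches the posterior over orders.

```python
import itertools
import math


def log_marginal(joint):
    """Exact log-likelihood: log of the sum of p(G, pi) over all orders pi."""
    return math.log(sum(joint.values()))


def elbo(joint, q):
    """Variational lower bound E_q[log p(G, pi) - log q(pi)] on log p(G)."""
    return sum(
        weight * (math.log(joint[order]) - math.log(weight))
        for order, weight in q.items()
        if weight > 0
    )


# Hypothetical three-atom graph; real molecules have far more nodes,
# so the number of orders (n!) quickly becomes intractable to enumerate.
nodes = ("C", "N", "O")
orders = list(itertools.permutations(nodes))  # 3! = 6 sequential orders

# Made-up joint probabilities p(G, pi), one per order (illustration only).
toy_probs = [0.10, 0.05, 0.02, 0.01, 0.01, 0.01]
joint = dict(zip(orders, toy_probs))

exact = log_marginal(joint)

# Fixed ordering scheme: q places all of its mass on a single order.
fixed_order_bound = elbo(joint, {orders[0]: 1.0})

# Best possible q: the posterior over orders, p(pi | G).
total = sum(joint.values())
posterior = {order: p / total for order, p in joint.items()}
tight_bound = elbo(joint, posterior)

print(f"exact log-likelihood : {exact:.4f}")
print(f"fixed-order bound    : {fixed_order_bound:.4f}")
print(f"posterior-q bound    : {tight_bound:.4f}")
```

In this toy setting the gap between the fixed-order bound and the exact value is the probability mass carried by the orders that were never considered, which is the looseness the abstract attributes to predefined ordering schemes.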

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11957264
DOI: http://dx.doi.org/10.1093/bib/bbaf094
