Benchmarking Cross-Docking Strategies in Kinase Drug Discovery.

J Chem Inf Model

In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.

Published: December 2024

Article Abstract

In recent years, machine learning has transformed many aspects of the drug discovery process, including small molecule design, of which bioactivity prediction is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target holds great potential for downstream machine learning scoring approaches. However, this potential is fundamentally limited by how accurately protein-ligand complex structures can be predicted in a reliable, automated fashion. To find practical approaches for generating useful kinase-inhibitor complex geometries for such scoring approaches, we present a kinase-centric docking benchmark. It evaluates different classes of docking and pose selection strategies by how well they recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures cocrystallized with 423 ATP-competitive ligands. We find that docking methods biased by the cocrystallized ligand, which use shape overlap with or without maximum common substructure (MCS) matching, recover binding poses more successfully than standard physics-based docking alone. Furthermore, docking into multiple structures significantly increases the chance of generating a docking pose with a low root-mean-square deviation (RMSD). The most efficient way to reproduce binding poses was docking with Posit, an approach that combines all three methods, into the structures whose cocrystallized ligands are most similar according to MCS. This strategy achieved a success rate of 70.4% across all included systems.
The studied docking and pose selection strategies, which use the OpenEye Toolkits, were implemented as pipelines in the KinoML framework. These pipelines enable automated and reliable protein-ligand complex generation for future downstream machine learning tasks. Although this work focuses on protein kinases, we believe the general findings can also be transferred to other protein families.
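The success criterion and the multiple-structure effect described in the abstract can be made concrete with a small Python sketch. It computes a ligand RMSD, ranks candidate structures by MCS size, and scores a benchmark as the percentage of systems with at least one pose under an RMSD cutoff. This is a minimal illustration under stated assumptions, not the KinoML or OpenEye implementation. The 2.0 Å cutoff is a common docking convention, not a value given in the abstract. The RMSD assumes identical atom order in a shared frame and ignores molecular symmetry. MCS sizes are assumed to be precomputed. The helper `pick_structures` and all numbers in the usage are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Assumption: 2.0 Å is a common success cutoff in docking benchmarks;
# the abstract itself only speaks of a "low RMSD" pose.
RMSD_CUTOFF = 2.0


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between two poses of the same ligand, without superposition.

    Assumes identical atom order in a shared frame (e.g. pre-aligned
    structures) and ignores symmetry-equivalent atom mappings.
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty with matching atom counts")
    squared = sum(
        (a - b) ** 2
        for atom, ref in zip(pose, reference)
        for a, b in zip(atom, ref)
    )
    return math.sqrt(squared / len(pose))


def pick_structures(mcs_sizes: Dict[str, int], k: int) -> List[str]:
    """Hypothetical helper: the k structures whose cocrystallized ligand shares
    the largest MCS (atom count, assumed precomputed) with the query ligand.
    Ties are broken by structure ID so the result is deterministic."""
    return sorted(mcs_sizes, key=lambda sid: (-mcs_sizes[sid], sid))[:k]


def is_success(pose_rmsds: Sequence[float], cutoff: float = RMSD_CUTOFF) -> bool:
    """A system is solved if any of its poses, from any structure, is within cutoff."""
    return any(value <= cutoff for value in pose_rmsds)


def success_rate(systems: Dict[str, List[float]], cutoff: float = RMSD_CUTOFF) -> float:
    """Percentage of systems with at least one pose at or below the cutoff."""
    if not systems:
        return 0.0
    solved = sum(is_success(rmsds, cutoff) for rmsds in systems.values())
    return 100.0 * solved / len(systems)


# Toy usage; all values are invented for illustration, not benchmark data.
print(rmsd([(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]))  # 1.0
print(pick_structures({"s_a": 18, "s_b": 25, "s_c": 25, "s_d": 12}, k=2))  # ['s_b', 's_c']

# Best pose RMSD (Å) obtained in each docked structure, per ligand.
toy = {"lig1": [3.1, 1.4], "lig2": [2.5, 4.0], "lig3": [0.9], "lig4": [5.2, 2.2, 1.8]}
print(success_rate(toy))                                      # 75.0
print(success_rate({lig: r[:1] for lig, r in toy.items()}))  # 25.0, one structure each
```

On toy data, the last two lines show why docking into multiple structures helps. Under this best-of-all-poses criterion, a system counts as solved if any of its poses is close enough. Adding structures can therefore only keep or raise the success rate.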

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11661510
DOI: http://dx.doi.org/10.1021/acs.jcim.4c00905