To understand the benefits and drawbacks of 3D combinatorial and deep learning generators, a novel benchmark was created focusing on the recreation of important protein-ligand interactions and 3D ligand conformations. Using the BindingMOAD data set with a hold-out blind set, three classes of generator were evaluated: the sequential graph neural network generators Pocket2Mol and PocketFlow; the diffusion models DiffSBDD and MolSnapper; and the combinatorial genetic algorithms AutoGrow4 and LigBuilderV3. The deep learning methods were found to fail to generate structurally valid molecules and 3D conformations, whereas the combinatorial methods are slow and generate molecules that are prone to failing the 2D MOSES filters.
Predicting reaction yields in synthetic chemistry remains a significant challenge. This study systematically evaluates the impact of tokenization, molecular representation, pretraining data, and adversarial training on a BERT-based model for yield prediction of Buchwald-Hartwig and Suzuki-Miyaura coupling reactions using publicly available HTE data sets. We demonstrate that the choice of molecular representation (SMILES, DeepSMILES, SELFIES, Morgan fingerprint-based notation, IUPAC names) has minimal impact on model performance, while BPE and SentencePiece tokenization typically outperform other methods.
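The abstract reports that BPE tokenization tends to work well on these inputs but does not give the study's tokenizer settings. As a hedged illustration only, the minimal Python sketch below learns byte-pair-encoding merges over the characters of SMILES strings and replays them to tokenize a new string. It is not the study's implementation or the SentencePiece library, and the function names `merge_pair`, `learn_bpe` and `apply_bpe` are hypothetical.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with one joined symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of SMILES strings.

    Simplification (assumption): symbols start as single characters, so
    multi-character atoms such as 'Cl' or '[nH]' must be learned as merges.
    """
    seqs = [list(s) for s in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair; ties -> first seen
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def apply_bpe(smiles, merges):
    """Tokenize a SMILES string by replaying the learned merges in order."""
    seq = list(smiles)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


merges = learn_bpe(["CCO", "CCN", "CCC", "c1ccccc1"], num_merges=3)
print(merges)
print(apply_bpe("CCCO", merges))
```

Replaying the merges in the order they were learned is what makes the encoding deterministic; a production tokenizer would also pre-split bracket atoms and handle characters never seen in training.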
The typical way in which lead optimisation (LO) series are represented in the medicinal chemistry literature is as Markush structures and associated R-group tables. The Markush structure shows a central core or molecular scaffold that is common to the series with R groups that indicate the points of variability that have been explored in the series. The associated R-group table shows the substituent combinations that exist in individual molecules in the series together with properties of those compounds.
The design of compounds during hit-to-lead often seeks to explore a vector from a core scaffold to form additional interactions with the target protein. A rational approach to this is to probe the region of a protein accessed by a vector with a systematic placement of pharmacophore features in 3D, particularly when bound structures are not available. Herein, we present bbSelect, an open-source tool built to map the placements of pharmacophore features in 3D Euclidean space from a library of R-groups, employing partitioning to drive a diverse and systematic selection to a user-defined size.
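The abstract describes mapping pharmacophore-feature placements from an R-group library into 3D space and partitioning that space to drive a diverse selection of user-defined size, but it does not give the algorithm. The Python sketch below is a hypothetical stand-in, not bbSelect's code: it assumes each R-group is described by a list of (feature type, (x, y, z)) placements, partitions space into cubic cells of an assumed edge length, and greedily picks the R-group that covers the most not-yet-covered (feature type, cell) combinations. The R-group names and coordinates are invented.

```python
import math


def feature_cells(placements, cell_size):
    """Map (feature_type, (x, y, z)) placements to (feature_type, cell index) keys."""
    return {
        (ftype, tuple(math.floor(coord / cell_size) for coord in xyz))
        for ftype, xyz in placements
    }


def greedy_partition_select(library, size, cell_size=1.5):
    """Pick up to `size` R-groups, each time taking the one adding the most new cells.

    `library` maps a hypothetical R-group name to its pharmacophore placements;
    `cell_size` (in angstroms) is an assumed grid spacing, not a bbSelect default.
    """
    keyed = {name: feature_cells(p, cell_size) for name, p in library.items()}
    covered, chosen = set(), []
    while len(chosen) < min(size, len(keyed)):
        remaining = [name for name in keyed if name not in chosen]
        best = max(remaining, key=lambda name: len(keyed[name] - covered))
        chosen.append(best)
        covered |= keyed[best]
    return chosen


library = {
    "rg1": [("donor", (0.2, 0.1, 0.0)), ("aromatic", (2.0, 0.0, 0.0))],
    "rg2": [("donor", (0.4, 0.3, 0.1))],  # falls in the same donor cell as rg1
    "rg3": [("acceptor", (0.0, 3.1, 0.0))],
}
print(greedy_partition_select(library, size=2))  # ['rg1', 'rg3']
```

Greedy coverage is only one way to turn a partition into a selection; it skips rg2 here because its single donor placement lands in a cell that rg1 already occupies.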
J Chem Inf Model
February 2023
Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported.
J Chem Inf Model
March 2022
Accurate and rapid prediction of the binding affinity of a compound to a target is one of the ultimate goals of computer-aided drug design. Alchemical approaches to free energy estimation follow the path from an initial state of the system to the final state through alchemical changes of the energy function during a molecular dynamics simulation. Herein, we explore the accuracy and efficiency of two such techniques: relative free energy perturbation (FEP) and multisite lambda dynamics (MSλD).
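In its simplest textbook form, FEP rests on Zwanzig's exponential averaging, ΔG(A→B) = -kT ln⟨exp(-(U_B - U_A)/kT)⟩_A, usually applied across a series of intermediate λ windows whose contributions are summed. The Python sketch below implements only that estimator on precomputed energy differences; it is not the protocol used in the study, it does not cover MSλD, and the energy samples in the usage lines are made up for illustration.

```python
import math


def zwanzig_delta_g(delta_u, kT):
    """Free energy change A -> B by exponential averaging (Zwanzig).

    delta_u: samples of U_B - U_A evaluated on configurations drawn from state A.
    A log-sum-exp shift keeps exp() from overflowing when |delta_u / kT| is large.
    """
    x = [-du / kT for du in delta_u]
    shift = max(x)
    log_mean = shift + math.log(sum(math.exp(v - shift) for v in x) / len(x))
    return -kT * log_mean


def staged_delta_g(windows, kT):
    """Sum the per-window estimates along a lambda schedule (stratified FEP)."""
    return sum(zwanzig_delta_g(samples, kT) for samples in windows)


kT = 0.596  # kcal/mol, roughly k_B * 300 K
windows = [[0.10, 0.25, 0.05], [0.30, 0.20, 0.40]]  # illustrative dU samples only
print(round(staged_delta_g(windows, kT), 3))
```

The exponential average is dominated by the rare low-ΔU samples, which is why each window is kept small enough for neighboring states to overlap; by Jensen's inequality the estimate never exceeds the plain mean of ΔU.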
Org Biomol Chem
June 2021
The bromodomain-containing protein 4 (BRD4), a member of the bromodomain and extra-terminal domain (BET) family, plays a key role in several diseases, especially cancers. With increased interest in BRD4 as a therapeutic target, many X-ray crystal structures of the protein in complex with small-molecule inhibitors have become publicly available over the past decade. In this study, we use this structural information to investigate the conformations of the first bromodomain (BD1) of BRD4.
Machine learning approaches promise to accelerate and improve success rates in medicinal chemistry programs by more effectively leveraging available data to guide molecular design. A key step of an automated computational design algorithm is molecule generation, where the machine is required to design high-quality, drug-like molecules within the appropriate chemical space. Many algorithms have been proposed for molecular generation; however, a challenge is how to assess the validity of the resulting molecules.
Deep learning approaches have become popular in recent years in the field of molecular design. While a variety of different methods are available, it is still a challenge to assess and compare their performance. A particularly promising approach for automated drug design is to use recurrent neural networks (RNNs) as SMILES generators and train them with the learning procedure called "transfer learning".
A key component of automated molecular design is the generation of compound ideas for subsequent filtering and assessment. Recently, deep learning approaches have been explored as alternatives to traditional de novo molecular design techniques. Deep learning algorithms rely on learning from large pools of molecules represented as molecular graphs (generally encoded as SMILES strings), and several approaches can be used to tailor the generated molecules to defined regions of chemical space.
High-throughput screening (HTS) hits include compounds with undesirable properties. Many filters have been described to identify such hits. Notably, pan-assay interference compounds (PAINS) has been adopted by the community as the standard term to refer to such filters, and very useful guidelines have been adopted by the American Chemical Society (ACS) and subsequently triggered a healthy scientific debate about the pitfalls of draconian use of filters.
Fragment-based drug discovery (FBDD) is well suited for discovering both drug leads and chemical probes of protein function; it can cover broad swaths of chemical space and allows the use of creative chemistry. FBDD is widely implemented for lead discovery in industry but is sometimes used less systematically in academia. Design principles and implementation approaches for fragment libraries are continually evolving, and the lack of up-to-date guidance may prevent more effective application of FBDD in academia.
Inhibitors of mitochondrial branched-chain aminotransferase (BCATm), identified using fragment screening, are described. This was carried out using a combination of STD-NMR, thermal melt (Tm), and biochemical assays to identify compounds that bound to BCATm, which were subsequently progressed to X-ray crystallography, where a number of exemplars showed significant diversity in their binding modes. The hits identified were supplemented by searching and screening of additional analogues, which enabled the gathering of further X-ray data where the original hits had not produced liganded structures.
Nat Rev Drug Discov
July 2015
The pharmaceutical industry remains under huge pressure to address the high attrition rates in drug development. Attempts to reduce the number of efficacy- and safety-related failures by analysing possible links to the physicochemical properties of small-molecule drug candidates have been inconclusive because of the limited size of data sets from individual companies. Here, we describe the compilation and analysis of combined data on the attrition of drug candidates from AstraZeneca, Eli Lilly and Company, GlaxoSmithKline and Pfizer.
The hybridization of hits, identified by complementary fragment and high throughput screens, enabled the discovery of the first series of potent inhibitors of mitochondrial branched-chain aminotransferase (BCATm) based on a 2-benzylamino-pyrazolo[1,5-a]pyrimidinone-3-carbonitrile template. Structure-guided growth enabled rapid optimization of potency with maintenance of ligand efficiency, while the focus on physicochemical properties delivered compounds with excellent pharmacokinetic exposure that enabled a proof of concept experiment in mice. Oral administration of 2-((4-chloro-2,6-difluorobenzyl)amino)-7-oxo-5-propyl-4,7-dihydropyrazolo[1,5-a]pyrimidine-3-carbonitrile 61 significantly raised the circulating levels of the branched-chain amino acids leucine, isoleucine, and valine in this acute study.
J Comput Aided Mol Des
April 2013
We describe the QSAR Workbench, a system for the building and analysis of QSAR models. The system is built around the Pipeline Pilot workflow tool and provides access to a variety of model building algorithms for both continuous and categorical data. Traditionally, models are built one at a time, and fully exploring the model space of algorithms and descriptor subsets is time-consuming.
ACS Med Chem Lett
January 2011
Traditional lead optimization projects involve long synthesis and testing cycles, favoring extensive structure-activity relationship (SAR) analysis and molecular design steps, in an attempt to limit the number of cycles that a project must run to optimize a development candidate. Microfluidic-based chemistry and biology platforms, with cycle times of minutes rather than weeks, lend themselves to unattended autonomous operation. The bottleneck in the lead optimization process is therefore shifted from synthesis or test to SAR analysis and design.
The impact of carboaromatic, heteroaromatic, carboaliphatic and heteroaliphatic ring counts and fused aromatic ring count on several developability measures (solubility, lipophilicity, protein binding, P450 inhibition and hERG binding) is the topic for this review article. Recent results indicate that increasing ring counts have detrimental effects on developability in the order carboaromatics ≫ heteroaromatics > carboaliphatics > heteroaliphatics, with heteroaliphatics exerting a beneficial effect in many cases. Increasing aromatic ring count exerts effects on several developability parameters that are lipophilicity- and size-independent, and fused aromatic systems have a beneficial effect relative to their nonfused counterparts.
J Chem Inf Model
October 2010
Previous studies of the analysis of molecular matched pairs (MMPs) have often assumed that the effect of a substructural transformation on a molecular property is independent of the context (i.e., the local structural environment in which that transformation occurs).
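The point about context can be made concrete with a small grouping exercise. In the hypothetical Python sketch below, each matched pair is recorded as (transformation, local context label, property change). Averaging by transformation alone hides the opposite effects that the same transformation has in two contexts, whereas keying on (transformation, context) keeps them apart. The transformation string, context labels and values are invented, and how a "context" is actually defined is deliberately left abstract.

```python
from collections import defaultdict
from statistics import mean


def mean_delta(pairs, use_context):
    """Average property change per transformation, optionally split by context."""
    groups = defaultdict(list)
    for transformation, context, delta in pairs:
        key = (transformation, context) if use_context else transformation
        groups[key].append(delta)
    return {key: mean(deltas) for key, deltas in groups.items()}


# Hypothetical matched pairs: (transformation, local context, property change)
pairs = [
    ("H>>F", "context_A", 0.4),
    ("H>>F", "context_A", 0.2),
    ("H>>F", "context_B", -0.3),
    ("H>>F", "context_B", -0.1),
]
print(mean_delta(pairs, use_context=False))  # one small average effect
print(mean_delta(pairs, use_context=True))   # two effects of opposite sign
```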
J Chem Inf Model
February 2009
Neighborhood behavior describes the extent to which small structural changes defined by a molecular descriptor are likely to lead to small property changes. This study evaluates two methods for the quantification of neighborhood behavior: the optimal diagonal method of Patterson et al. and the optimality criterion method of Horvath and Jeandenans.
A multiobjective evolutionary algorithm (MOEA) is described for evolving multiple structure-activity relationships (SARs). The SARs are encoded in easy-to-interpret reduced graph queries which describe features that are preferentially present in active compounds compared to inactives. The MOEA addresses a limitation associated with many machine learning methods: the inherent tradeoff between recall and precision, which is usually handled by combining the two objectives into a single measure with a consequent loss of control.
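The alternative to collapsing recall and precision into one score is to keep both as separate objectives and retain every non-dominated solution. A minimal Python sketch of that Pareto-dominance test follows; it is a generic illustration of multiobjective selection, not the paper's MOEA, and the query names and scores are hypothetical.

```python
def recall_precision(tp, fp, fn):
    """Recall and precision of a query from true/false positive and false negative counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Names of the queries whose (recall, precision) no other query dominates."""
    return sorted(
        name for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    )


# Hypothetical (recall, precision) scores for five candidate queries
scores = {
    "q1": (0.90, 0.20),
    "q2": (0.50, 0.60),
    "q3": (0.40, 0.50),  # dominated by q2
    "q4": (0.10, 0.95),
    "q5": recall_precision(tp=8, fp=2, fn=2),
}
print(pareto_front(scores))  # ['q1', 'q4', 'q5']
```

A chemist can then pick from the front (a broad, low-precision query such as q1 or a narrow, high-precision one such as q4) instead of accepting whatever a single combined score happens to favor.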
J Chem Inf Model
August 2008
A new machine learning method is presented for extracting interpretable structure-activity relationships from screening data. The method is based on an evolutionary algorithm and reduced graphs and aims to evolve a reduced graph query (subgraph) that is present within the active compounds and absent from the inactives. The reduced graph representation enables heterogeneous compounds, such as those found in high-throughput screening data, to be captured in a single representation with the resulting query encoding structure-activity information in a form that is readily interpretable by a chemist.
We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm.
Data mining is a fast-growing field that is finding application across a wide range of industries. High-throughput screening (HTS) is a crucial part of the drug discovery process at most large pharmaceutical companies. Accurate analysis of HTS data is, therefore, vital to drug discovery.
J Chem Inf Model
September 2006
Reduced graph representations of chemical structures have been shown to be effective in similarity searching applications, where they offer retrieval performance comparable to other 2D descriptors in recall experiments. They have also been shown to complement existing descriptors and to offer the potential to scaffold-hop from one chemical series to another. Various methods have been developed for quantifying the similarity between reduced graphs, including fingerprint approaches, graph matching, and an edit distance method.
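Of the similarity methods listed, the fingerprint approach is the simplest to illustrate. The Python sketch below is a toy under stated assumptions rather than any published reduced-graph scheme: it treats a reduced graph as typed nodes plus edges, encodes it as the set of node types and bonded node-type pairs, and compares two such sets with the Tanimoto coefficient. The node-type labels and graphs are hypothetical.

```python
def reduced_graph_features(node_types, edges):
    """Binary 'fingerprint': the node types present plus each bonded node-type pair."""
    features = set(node_types.values())
    for u, v in edges:
        features.add(tuple(sorted((node_types[u], node_types[v]))))
    return features


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical reduced graphs: ring and linker nodes with invented type labels
g1 = reduced_graph_features({0: "aromatic_ring", 1: "linker", 2: "aromatic_ring"},
                            [(0, 1), (1, 2)])
g2 = reduced_graph_features({0: "aromatic_ring", 1: "linker", 2: "aliphatic_ring"},
                            [(0, 1), (1, 2)])
print(tanimoto(g1, g2))  # 0.6
```

Because both ring nodes in g1 share a type, the set encoding records that type only once; a count-based fingerprint would distinguish such cases.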