MoleculeNet: a benchmark for molecular machine learning.

Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, Vijay Pande

Chem Sci

Department of Chemistry, Stanford University, Stanford, CA 94305, USA.

Published: January 2018

Molecular machine learning has matured rapidly over the last few years. Improved methods and larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark for comparing the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge their quality. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.
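The abstract's points about evaluation metrics and highly imbalanced classification can be illustrated with a small sketch. The abstract itself does not say which metrics are used, so ROC-AUC is chosen here purely for illustration. The data are synthetic and made up, and the helper names `accuracy` and `roc_auc` are hypothetical, not part of MoleculeNet or DeepChem.

```python
def accuracy(labels, predictions):
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison: the probability that a random
    positive is scored above a random negative (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Synthetic, made-up data: 2 actives among 100 compounds (2% positive rate).
labels = [1, 1] + [0] * 98

# A trivial "model" that calls every compound inactive.
constant_scores = [0.0] * 100
constant_predictions = [0] * 100

print(accuracy(labels, constant_predictions))  # 0.98
print(roc_auc(labels, constant_scores))        # 0.5
```

On this toy set the trivial model reaches 98% accuracy while its ROC-AUC stays at chance level, which is one reason a benchmark with imbalanced tasks needs to fix its metrics explicitly rather than rely on accuracy.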

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5868307
DOI: http://dx.doi.org/10.1039/c7sc02664a

Similar Publications

A multi-omics recovery factor predicts long COVID in the IMPACC study.

J Clin Invest

September 2025

The University of Texas at Austin, Austin, United States of America.

Gisela Gabernet, Jessica Maciuch, Jeremy P Gygi, John F Moore, Annmarie Hoch

Background: Following SARS-CoV-2 infection, ~10-35% of COVID-19 patients experience long COVID (LC), in which debilitating symptoms persist for at least three months. Elucidating biologic underpinnings of LC could identify therapeutic opportunities.

Methods: We applied machine learning methods to biologic analytes provided over 12 months after hospital discharge from >500 COVID-19 patients in the IMPACC cohort to identify a multi-omics "recovery factor", trained on patient-reported physical function survey scores.

Deep reinforcement learning control unlocks enhanced heat transfer in turbulent convection.

Proc Natl Acad Sci U S A

September 2025

Max Planck Institute for Solar System Research, Göttingen 37077, Germany.

Zisong Zhou, Xiaojue Zhu

Turbulent convection governs heat transport in both natural and industrial settings, yet optimizing it under extreme conditions remains a significant challenge. Traditional control strategies, such as predefined temperature modulation, struggle to achieve substantial enhancement. Here, we introduce a deep reinforcement learning (DRL) framework that autonomously discovers optimal control policies to maximize heat transfer in turbulent Rayleigh-Bénard convection.

AI-enhanced predictive modeling for treatment duration and personalized treatment planning of cleft lip and palate therapy.

Int J Comput Assist Radiol Surg

September 2025

Division of Plastic and Reconstructive Surgery, Neonatal and Pediatric Craniofacial Airway Orthodontics, Department of Surgery, Stanford University School of Medicine, 770 Welch Road, Palo Alto, CA, 94394, USA.

Artur Aharonyan, Syed Anwar, HyeRan Choo

Background: Alveolar molding plate treatment (AMPT) plays a critical role in preparing neonates with cleft lip and palate (CLP) for the first reconstruction surgery (cleft lip repair). However, determining the number of adjustments to AMPT in near-normalizing cleft deformity prior to surgery is a challenging task, often affecting the treatment duration. This study explores the use of machine learning in predicting treatment duration based on three-dimensional (3D) assessments of the pre-treatment maxillary cleft deformity as part of individualized treatment planning.

Response to Cao et al.: Expanding machine learning-based non-invasive non-selective beta blockers monitoring.

Hepatol Int

September 2025

Department of Biomedical Informatics and Data Science, Yale School of Medicine, PO Box 208009, New Haven, CT, 06520-8009, USA.

Mauro Giuffrè

Machine learning for myocarditis diagnosis using cardiovascular magnetic resonance: a systematic review, diagnostic test accuracy meta-analysis, and comparison with human physicians.

Int J Cardiovasc Imaging

September 2025

Klinikum Fürth, Friedrich-Alexander-University Erlangen-Nürnberg, Fürth, Germany.

Paweł Łajczak, Oguz Kagan Sahin, Jakub Matyja, Luis Rene Puglla Sanchez, Iqbal Farhan Sayudo

Myocarditis is an inflammation of heart tissue. Cardiovascular magnetic resonance imaging (CMR) has emerged as an important non-invasive imaging tool for diagnosing myocarditis; however, interpretation remains a challenge for novice physicians. Advancements in machine learning (ML) models have further improved diagnostic accuracy, demonstrating good performance.
