Article Synopsis

  • The process of conducting chemical reactions is complex and relies heavily on years of lab experience or existing protocols.
  • Data-driven approaches like retrosynthetic models are useful but still require expert intervention to translate proposed synthetic routes into experimental procedures.
  • This study introduces models that predict the full sequence of synthesis steps from a textual representation of a chemical equation. Trained on a data set of 693,517 equations, the models produced action sequences that a trained chemist judged adequate for execution without human intervention in more than 50% of cases.

Citations

20

Article Abstract

The experimental execution of chemical reactions is a context-dependent and time-consuming process, often solved by drawing on experience collected over multiple decades of laboratory work or by searching for similar, already executed experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes into experimental procedures remains a burden on the shoulders of domain experts. In this work, we present data-driven models for predicting the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used this data set to train three different models: a nearest-neighbor model based on recently introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.
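
The nearest-neighbor idea mentioned in the abstract can be illustrated with a minimal sketch: look up the most similar known reaction and reuse its action sequence. Everything concrete here is an assumption for illustration only. The character n-gram "fingerprint" is a simple stand-in and not the reaction fingerprints used in the paper, and the function names, the two-entry training set, and the action labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def toy_fingerprint(reaction: str, n: int = 3) -> Counter:
    """Count character n-grams of a reaction string.

    Assumption: a crude stand-in for a learned reaction fingerprint.
    """
    return Counter(reaction[i:i + n] for i in range(len(reaction) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_actions(query: str, training_set: list[tuple[str, list[str]]]) -> list[str]:
    """Return the action sequence of the training reaction most similar to the query."""
    query_fp = toy_fingerprint(query)
    _, best_actions = max(
        training_set,
        key=lambda pair: cosine(query_fp, toy_fingerprint(pair[0])),
    )
    return best_actions


# Hypothetical training set: (reaction SMILES, action sequence) pairs.
TRAINING_SET = [
    ("CC(=O)O.OCC>>CC(=O)OCC", ["ADD ethanol", "ADD acetic acid", "STIR", "CONCENTRATE"]),
    ("c1ccccc1Br.OB(O)c1ccccc1>>c1ccccc1-c1ccccc1", ["ADD catalyst", "HEAT", "FILTER"]),
]

# Query: an esterification closely related to the first training entry.
predicted = predict_actions("CCC(=O)O.OCC>>CCC(=O)OCC", TRAINING_SET)
print(predicted)
```

A retrieval model of this kind can only return procedures it has already seen, which is one reason the study also trains sequence-to-sequence models that generate action sequences directly.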

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8102565
DOI: http://dx.doi.org/10.1038/s41467-021-22951-1

Publication Analysis

Top Keywords (with occurrence counts)

experimental procedures: 8
chemical reactions: 8
organic chemistry: 8
data set: 8
action sequences: 8
models: 5
inferring experimental: 4
procedures text-based: 4
text-based representations: 4
chemical: 4
