Citations

20

Article Abstract

Background: Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size.
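As a rough illustration of the screen-then-verify idea described above, here is a minimal Python sketch. It is a toy under loud assumptions: molecules are plain strings, "substructure" is approximated by substring containment, and the fingerprint hashes character bigrams into a 64-bit mask. The names `fingerprint`, `build_database`, and `screen_then_verify` are hypothetical and do not correspond to any cartridge's API or to the fingerprints used in the paper.

```python
from typing import List, Tuple


def fingerprint(s: str, n_bits: int = 64) -> int:
    """Toy fingerprint: set one bit per character bigram (assumption, not a real chemical fingerprint)."""
    fp = 0
    for i in range(len(s) - 1):
        # Deterministic toy hash; Python's built-in hash() of str is salted per process.
        bit = (ord(s[i]) * 31 + ord(s[i + 1])) % n_bits
        fp |= 1 << bit
    return fp


def build_database(molecules: List[str]) -> List[Tuple[str, int]]:
    """Precompute the fingerprint of every stored molecule once, at load time."""
    return [(mol, fingerprint(mol)) for mol in molecules]


def screen_then_verify(query: str, database: List[Tuple[str, int]]) -> Tuple[List[str], int]:
    """Return (hits, number of verification calls).

    A molecule can contain the query only if every bit set in the query
    fingerprint is also set in the molecule fingerprint, so the cheap bit
    test can discard candidates without ever producing a false negative.
    """
    q_fp = fingerprint(query)
    hits: List[str] = []
    verifications = 0
    for mol, fp in database:
        if q_fp & fp != q_fp:
            continue  # screened out: the expensive check is skipped
        verifications += 1
        if query in mol:  # stand-in for the real (expensive) substructure verification
            hits.append(mol)
    return hits, verifications
```

The fewer candidates survive the bit test, the fewer verification calls are made, which is the screen-out efficiency the abstract refers to. In a real cartridge the verification step would be a subgraph isomorphism test on molecular graphs rather than a substring check.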

Results: We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: The first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency.
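To make the notion of fingerprints stored in inverted indices more concrete, the following Python sketch shows the general data structure only. It is an assumption-laden toy, not the paper's method: features are character bigrams of plain strings, and `features`, `build_index`, and `candidates` are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def features(s: str) -> Set[str]:
    """Toy feature extraction: the set of character bigrams (assumption)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_index(molecules: List[str]) -> Dict[str, Set[int]]:
    """Inverted index: map each feature to the ids of the molecules containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for mol_id, mol in enumerate(molecules):
        for feat in features(mol):
            index[feat].add(mol_id)
    return index


def candidates(index: Dict[str, Set[int]], query: str, n_molecules: int) -> Set[int]:
    """Intersect the posting lists of all query features.

    Only molecules that contain every query feature survive, so the work
    depends on posting-list sizes rather than on a scan of the whole dataset.
    """
    feats = features(query)
    if not feats:
        return set(range(n_molecules))  # nothing to filter on
    # Start from the rarest feature so the running intersection stays small.
    postings = sorted((index.get(f, set()) for f in feats), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break
    return result
```

The returned ids are only candidates; each one would still need to pass the verification procedure. Starting the intersection from the shortest posting list is a common design choice because rare features eliminate most of the dataset immediately.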

Conclusions: The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5966370
DOI: http://dx.doi.org/10.1186/s13321-018-0282-y

Publication Analysis

Top Keywords

substructure search: 12
chemical cartridge: 8
search performance: 8
performance scaling: 8
substructure searches: 8
substructure: 7
search: 6
performance: 5
sachem: 4
sachem chemical: 4

Similar Publications

Sources for commercially available compounds have been experiencing continuous growth for several years, reaching their peak in billion- to trillion-sized combinatorial Chemical Spaces. To assess the quality of a compound collection to provide relevant chemistry, a benchmark set of pharmaceutically relevant structures is required that enables an unbiased comparison. For this purpose, the ChEMBL database was mined for molecules displaying biological activity, and three benchmark sets of successive orders of magnitude were created by systematic filtering and processing: Set ("large-sized," 379k), Set ("medium-sized," 25k), and Set ("small-sized," 3k).


Deep learning with leagues championship algorithm based intrusion detection on cybersecurity driven industrial IoT systems.

Sci Rep

August 2025

Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia.

The Internet of Things (IoT) offers significant advantages in day-to-day life across a wide range of application domains, including healthcare automation, transportation, and smart environments. However, owing to limited resources and computational capabilities, IoT networks are vulnerable to a variety of cyberattacks. Incorporating an intrusion detection system (IDS) into a cybersecurity-driven industrial IoT (IIoT) process requires careful deployment, planning, and ongoing management.


Adenine/adenine contacts defined in the G-quadruplex for highly selective Tb³⁺ binding.

Dalton Trans

September 2025

Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, College of Chemistry and Materials Science, Zhejiang Normal University, Jinhua 321004, Zhejiang, China.

The folding of nucleic acids is strongly dependent on metal ions. For example, G-quadruplex (G4) structures are usually determined by specific K⁺/Na⁺ coordination. However, G4 constructs that are highly selective for high-valent metal ions (for instance, trivalent metal ions) have not been found, because such ions interact strongly and nonspecifically with the phosphate backbone.


The ALICE Collaboration reports measurements of the large relative transverse momentum ($k_{T}$) component of jet substructure in pp and Pb-Pb collisions at center-of-mass energy per nucleon pair $\sqrt{s_{NN}} = 5.02$ TeV. Enhancement in the yield of such large-$k_{T}$ emissions in head-on Pb-Pb collisions is predicted to arise from partonic scattering with quasiparticles of the quark-gluon plasma.


Discovering Molecular Insights in Organic Optoelectronics with Knowledge-Informed Interpretable Deep Learning.

J Chem Theory Comput

July 2025

Key Laboratory of Organic Integrated Circuits, Ministry of Education and Tianjin Key Laboratory of Molecular Optoelectronic Sciences, Department of Chemistry, School of Science, Tianjin University, Tianjin 300072, China.

Deep learning holds significant promise for accelerating molecular screening and materials design. However, the black-box nature of current models limits their ability to generate fundamentally new chemical knowledge and insights. Here, we propose LUMIA (Learning and Understanding Molecular Insights with Artificial Intelligence), an innovative interpretable deep learning framework integrating chemistry-informed contrastive learning and Monte Carlo tree search (MCTS).
