Background: Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size.
Results: We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: The first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency.
Conclusions: The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
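The screen-then-verify idea described in the abstract can be illustrated with a minimal sketch. It is a toy stand-in, not Sachem's implementation: plain strings replace molecular graphs, character bigrams replace the fingerprint, and substring containment replaces the substructure verification procedure. The names `features`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict


def features(s):
    """Toy 'fingerprint' (assumption): the set of character bigrams in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def build_index(records):
    """Inverted index: map each feature to the ids of records containing it."""
    index = defaultdict(set)
    for rid, text in enumerate(records):
        for f in features(text):
            index[f].add(rid)
    return index


def search(query, records, index):
    """Screen with the inverted index, then verify only the survivors.

    Returns (verified hit ids, number of candidates that were verified).
    """
    q = features(query)
    if q:
        # A record can contain the query only if it has every query feature,
        # so intersecting the posting lists never discards a true hit.
        postings = sorted((index.get(f, set()) for f in q), key=len)
        candidates = set.intersection(*postings)
    else:
        # No features to screen on: every record must be verified.
        candidates = set(range(len(records)))
    # Stand-in for the expensive verification step (substring test here).
    hits = [rid for rid in sorted(candidates) if query in records[rid]]
    return hits, len(candidates)


if __name__ == "__main__":
    records = ["CCOC", "CCN", "OCCO", "CNCC", "COCC"]
    index = build_index(records)
    print(search("CCO", records, index))
```

In this toy data set, the record `"COCC"` passes the screen but fails verification, which is the kind of false positive that a better fingerprint is meant to reduce.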
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5966370 | PMC |
http://dx.doi.org/10.1186/s13321-018-0282-y | DOI Listing |
J Chem Inf Model
September 2025
BioSolveIT GmbH, Sankt Augustin 53757, Germany.
Sources for commercially available compounds have been experiencing continuous growth for several years, reaching their peak in billion- to trillion-sized combinatorial Chemical Spaces. To assess the quality of a compound collection to provide relevant chemistry, a benchmark set of pharmaceutically relevant structures is required that enables an unbiased comparison. For this purpose, the ChEMBL database was mined for molecules displaying biological activity, and three benchmark sets of successive orders of magnitude were created by systematic filtering and processing: Set ("large-sized," 379k), Set ("medium-sized," 25k), and Set ("small-sized," 3k).
Sci Rep
August 2025
Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia.
The Internet of Things (IoT) presents significant advantages to day-to-day life across a wide range of application domains, including healthcare automation, transportation, and smart environments. However, owing to limited resources and computational capacity, IoT networks are vulnerable to a variety of cyber-attacks. Incorporating IDS into the cybersecurity-driven IIoT process requires careful deployment, planning, and ongoing management.
Dalton Trans
September 2025
Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, College of Chemistry and Materials Science, Zhejiang Normal University, Jinhua 321004, Zhejiang, China.
The folding of nucleic acids is strongly dependent on metal ions. For example, G-quadruplex (G4) structures are usually determined by the specific K⁺/Na⁺ coordination. However, G4 constructs highly selective to high-valent metal ions (for instance, trivalent metal ions) have not been found due to their strong and nonspecific phosphate backbone interactions.
Phys Rev Lett
July 2025
INFN, Sezione di Pavia, Pavia, Italy.
The ALICE Collaboration reports measurements of the large relative transverse momentum ($k_{T}$) component of jet substructure in pp and Pb-Pb collisions at center-of-mass energy per nucleon pair $\sqrt{s_{NN}} = 5.02$ TeV. Enhancement in the yield of such large-$k_{T}$ emissions in head-on Pb-Pb collisions is predicted to arise from partonic scattering with quasiparticles of the quark-gluon plasma.
J Chem Theory Comput
July 2025
Key Laboratory of Organic Integrated Circuits, Ministry of Education and Tianjin Key Laboratory of Molecular Optoelectronic Sciences, Department of Chemistry, School of Science, Tianjin University, Tianjin 300072, China.
Deep learning holds significant promise for accelerating molecular screening and materials design. However, the black-box nature of current models limits their ability to generate fundamentally new chemical knowledge and insights. Here, we propose LUMIA (Learning and Understanding Molecular Insights with Artificial Intelligence), an innovative interpretable deep learning framework integrating chemistry-informed contrastive learning and Monte Carlo tree search (MCTS).