Article Abstract

Background: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP. To the best of our knowledge, using SMILES-based similarity functions, which are computationally more efficient than the 2D-based kernels, has not been investigated for this task before.
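To make the idea of a SMILES-based string similarity concrete, here is a minimal sketch that scores two SMILES strings by normalized edit distance. The function names and the normalization by the longer string's length are illustrative assumptions. The abstract does not specify which string similarity functions were evaluated or how they were normalized.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(smiles_a: str, smiles_b: str) -> float:
    """Similarity in [0, 1]; the normalization is an assumption of this sketch."""
    longest = max(len(smiles_a), len(smiles_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(smiles_a, smiles_b) / longest
```

For example, `edit_similarity("CCO", "CCN")` compares ethanol with ethylamine directly from their strings, with no 2D graph alignment involved.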

Results: In this study, we adapt and evaluate various SMILES-based similarity methods for drug-target interaction prediction. In addition, inspired by the vector space model of Information Retrieval, we propose cosine similarity based SMILES kernels that use the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting approaches. We also investigate composite kernels that combine our best SMILES-based similarity functions with the SIMCOMP kernel. In total, we compare 13 different ligand similarity functions, each of which operates on the SMILES string representation of a molecule.
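The vector space idea can be sketched as follows: treat each SMILES string as a document, split it into terms, weight the terms by TF-IDF, and compare molecules by cosine similarity. This is only a sketch under stated assumptions. The tokenization into overlapping character 2-grams, the smoothed IDF formula, and all function names are hypothetical choices. The abstract does not describe the tokenization or the exact weighting used.

```python
import math
from collections import Counter


def ngrams(smiles: str, n: int = 2) -> list:
    """Overlapping character n-grams (assumed tokenization, not from the source)."""
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def tfidf_vectors(smiles_list: list, n: int = 2) -> list:
    """One sparse TF-IDF vector (term -> weight) per SMILES string."""
    tfs = [Counter(ngrams(s, n)) for s in smiles_list]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # document frequency: count each term once per string
    total = len(smiles_list)
    # Assumed IDF variant: log(N / df) + 1, so ubiquitous terms keep a nonzero weight.
    idf = {term: math.log(total / count) + 1.0 for term, count in df.items()}
    return [{term: freq * idf[term] for term, freq in tf.items()} for tf in tfs]


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tfidf_kernel(smiles_list: list, n: int = 2) -> list:
    """Pairwise TF-IDF cosine similarity matrix over the given SMILES strings."""
    vecs = tfidf_vectors(smiles_list, n)
    return [[cosine(u, v) for v in vecs] for u in vecs]
```

Calling `tfidf_kernel(["CCO", "CCN", "c1ccccc1"])` yields a symmetric matrix in which the two short aliphatic strings share the term `CC`, while the aromatic ring string shares no 2-gram with either of them.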

Conclusion: The more efficient SMILES-based similarity functions performed similarly to the more complex 2D-based SIMCOMP kernel in terms of AUC-ROC scores. The TF-IDF based cosine similarity obtained a better AUC-PR score than the SIMCOMP kernel on the GPCR benchmark data set. The composite kernel of TF-IDF based cosine similarity and SIMCOMP achieved the best AUC-PR scores for all data sets.
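One simple way to build a composite kernel from two similarity matrices is a weighted average, sketched below. The abstract does not state how the TF-IDF and SIMCOMP kernels were combined, so the averaging rule, the `weight` parameter, and the function name are assumptions made purely for illustration.

```python
def combine_kernels(kernel_a: list, kernel_b: list, weight: float = 0.5) -> list:
    """Element-wise weighted average of two similarity matrices.

    Assumed combination rule (not taken from the source):
    weight * kernel_a + (1 - weight) * kernel_b.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(kernel_a) != len(kernel_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(kernel_a, kernel_b)
    ):
        raise ValueError("kernels must have the same shape")
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(kernel_a, kernel_b)
    ]
```

Both inputs must be computed over the same compounds in the same order, so that entry (i, j) refers to the same pair of molecules in each matrix.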

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797122
DOI: http://dx.doi.org/10.1186/s12859-016-0977-x
