SiteFerret: Beyond Simple Pocket Identification in Proteins.

J Chem Theory Comput

CONCEPT Lab, Istituto Italiano di Tecnologia, Via Melen - 83, B Block, 16152 Genova, Italy.

Published: August 2023


Citations: 20

Article Abstract

We present a novel method for the automatic detection of pockets on protein molecular surfaces. The algorithm is based on an ad hoc hierarchical clustering of virtual probe spheres obtained from the geometrical primitives that the NanoShaper software uses to build the solvent-excluded molecular surface. The final ranking of putative pockets is based on the Isolation Forest method, an unsupervised learning approach originally developed for anomaly detection. A detailed importance analysis of pocket features provides insight into which geometrical (clustering) and chemical (amino acid composition) properties characterize a good binding site. The method also segments pockets into smaller subpockets. We show that subpockets are a convenient representation for pinpointing the binding site with high precision. SiteFerret is notably versatile: it accurately predicts a wide range of binding sites, from those binding small molecules to those binding peptides, including difficult shallow sites.
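To make the ranking step more concrete, the following is a minimal, self-contained sketch of the general Isolation Forest idea, not SiteFerret's actual implementation. It assumes, for illustration only, that the forest is fitted on descriptors of known binding pockets and that candidate pockets are ranked from most to least typical. The two-dimensional feature vectors, the function names, and the parameter defaults are all hypothetical. In practice a library implementation such as scikit-learn's `IsolationForest` would normally be preferred.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # simplification: stop instead of trying another feature
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(node, x, depth=0):
    """Depth at which x lands in a leaf, corrected for points left unsplit there."""
    if "size" in node:
        return depth + avg_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(child, x, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of rows."""
    if len(rows) < 2:
        raise ValueError("need at least two training rows")
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(forest, x):
    """Score in (0, 1]; values near 1 mean x is isolated quickly (anomalous)."""
    trees, psi = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / avg_path_length(psi))


def rank_pockets(forest, candidates):
    """Assumed ranking rule: most typical (lowest anomaly score) first."""
    return sorted(candidates, key=lambda name: anomaly_score(forest, candidates[name]))


# Hypothetical data: 200 "known pocket" descriptors clustered around the origin.
data_rng = random.Random(1)
known = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
forest = fit_forest(known)
candidates = {"typical": [0.0, 0.1], "atypical": [8.0, 8.0]}
ranking = rank_pockets(forest, candidates)
print(ranking)
```

Points far from the training distribution are isolated after only a few random splits, so their mean path length is short and their anomaly score is high. Ranking by ascending score therefore places candidates that resemble the training pockets first, under the assumption stated above.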


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10413863 (PMC)
http://dx.doi.org/10.1021/acs.jctc.2c01306 (DOI)

