Natural language processing in text mining for structural modeling of protein complexes.

BMC Bioinformatics

Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047, USA.

Published: March 2018

Article Abstract

Background: Structural modeling of protein-protein interactions produces a large number of putative configurations of the protein complexes. Identifying the near-native models among them is a serious challenge. Publicly available results of biomedical research may provide constraints on the binding mode, which can be essential for docking. Our text-mining (TM) tool, which extracts binding site residues from PubMed abstracts, was successfully applied to protein docking (Badal et al., PLoS Comput Biol, 2015; 11: e1004630). Still, many of the extracted residues were not relevant to docking.
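
As a rough illustration of the basic TM step described above, the following minimal sketch pulls residue mentions such as "Arg123" out of abstract text with a regular expression. The pattern, the function name `extract_residues`, and the example sentence are assumptions made for illustration; the abstract does not describe how the authors' tool recognizes residues.

```python
import re

# Standard three-letter amino acid codes.
THREE_LETTER = (
    "Ala Arg Asn Asp Cys Gln Glu Gly His Ile "
    "Leu Lys Met Phe Pro Ser Thr Trp Tyr Val"
).split()

# Assumed pattern: a three-letter code, an optional hyphen or space,
# then a residue number of up to four digits (e.g. Arg123, Tyr-45, Gly 7).
RESIDUE_RE = re.compile(r"\b(" + "|".join(THREE_LETTER) + r")[- ]?(\d{1,4})\b")


def extract_residues(text):
    """Return (residue name, residue number) pairs in order of appearance."""
    return [(m.group(1), int(m.group(2))) for m in RESIDUE_RE.finditer(text)]


# Hypothetical sentence, not taken from any real abstract.
example = "Mutation of Arg123 and Tyr-45 abolished binding, whereas Gly 7 was dispensable."
print(extract_residues(example))
```

A pattern this simple returns every residue mention regardless of its role, which mirrors the limitation noted above: many extracted residues are not relevant to docking.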

Results: We present an extension of the TM tool that uses natural language processing (NLP) to analyze the context in which each residue occurs. The procedure was tested using generic and specialized dictionaries. The results showed that keyword dictionaries designed for identifying protein interactions are not adequate for TM prediction of the binding mode. However, our dictionary, designed to distinguish keywords relevant to protein binding sites, led to considerable improvement in TM performance. We investigated the utility of several methods of context analysis based on dissection of the sentence parse trees. The machine learning-based NLP filtered the pool of mined residues significantly more efficiently than the rule-based NLP. Constraints generated by NLP were tested in docking of unbound proteins from the DOCKGROUND X-ray benchmark set 4. The output of the global low-resolution docking scan was post-processed separately by constraints from the basic TM, constraints re-ranked by NLP, and the reference constraints. The quality of a match was assessed by the interface root-mean-square deviation. The results showed significant improvement of the docking output when using the constraints generated by the advanced TM with NLP.
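
The idea of filtering mined residues by their context can be sketched with a much simpler stand-in than the parse-tree analysis described above: keep a residue only when its sentence contains a binding-related keyword. This is not the authors' method. The keyword list, the sentence splitter, and the function names are all hypothetical, since the abstract does not list the dictionary contents.

```python
import re

THREE_LETTER = (
    "Ala Arg Asn Asp Cys Gln Glu Gly His Ile "
    "Leu Lys Met Phe Pro Ser Thr Trp Tyr Val"
).split()
RESIDUE_RE = re.compile(r"\b(" + "|".join(THREE_LETTER) + r")[- ]?(\d{1,4})\b")

# Hypothetical keyword dictionary; the authors' actual dictionary is not
# given in the abstract.
BINDING_KEYWORDS = {
    "bind", "binds", "binding", "interface",
    "interaction", "interacts", "contact", "contacts",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_by_context(text, keywords=BINDING_KEYWORDS):
    """Keep residues only from sentences that contain a binding keyword."""
    kept = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & keywords:
            kept.extend(
                (m.group(1), int(m.group(2)))
                for m in RESIDUE_RE.finditer(sentence)
            )
    return kept


# Hypothetical text, not taken from any real abstract.
text = (
    "Arg123 lies at the binding interface with the partner protein. "
    "Gly7 was mutated to improve crystal quality."
)
print(filter_by_context(text))
```

A sentence-level keyword match ignores how the residue and the keyword relate grammatically, which is the gap that parse-tree-based context analysis is meant to address.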

Conclusions: The basic TM procedure for extracting protein-protein binding site residues from PubMed abstracts was significantly improved by deep parsing (NLP techniques for contextual analysis), which purged irrelevant residues from the initial pool of extracted residues. Benchmarking showed a substantial increase in the docking success rate when using the constraints generated by the advanced TM with NLP.
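
To make the post-processing of docking output by constraints more concrete, here is a minimal sketch that re-ranks docking models by how many mined residues fall at each model's interface. The scoring rule, the model representation (a dict with an `interface` list of residue numbers), and the function name `rerank_models` are assumptions for illustration; the abstract does not specify the scoring function the authors used.

```python
def rerank_models(models, constraint_residues):
    """Sort models by the number of constraint residues at their interface.

    models: list of dicts, each with an "interface" list of residue numbers
            (a hypothetical representation).
    constraint_residues: residue numbers mined from the literature.
    Ties keep their original docking-scan order because sorted() is stable.
    """
    constraints = set(constraint_residues)

    def score(model):
        return len(constraints & set(model["interface"]))

    return sorted(models, key=score, reverse=True)


# Hypothetical docking models and mined residues.
models = [
    {"id": "m1", "interface": [10, 11, 12]},
    {"id": "m2", "interface": [45, 123, 200]},
    {"id": "m3", "interface": [123, 300]},
]
ranked = rerank_models(models, [123, 45])
print([m["id"] for m in ranked])
```

Under this assumed rule, a cleaner set of mined residues moves models with the correct interface toward the top, which is the effect the benchmarking result describes.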

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5838950
http://dx.doi.org/10.1186/s12859-018-2079-4
