Evaluating GPT and BERT models for protein-protein interaction identification in biomedical text.

Bioinform Adv

Department of Biomedical Sciences, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND 58202, United States.

Published: September 2024



Article Abstract

Motivation: Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. As biomedical literature continues to grow rapidly, there is an increasing need for automated and accurate extraction of these interactions to facilitate scientific discovery. Pretrained language models, such as generative pretrained transformers and bidirectional encoder representations from transformers, have shown promising results in natural language processing tasks.

Results: We evaluated the performance of PPI identification using multiple transformer-based models across three manually curated gold-standard corpora: Learning Language in Logic with 164 interactions in 77 sentences, Human Protein Reference Database with 163 interactions in 145 sentences, and Interaction Extraction Performance Assessment with 335 interactions in 486 sentences. Models based on bidirectional encoder representations achieved the best overall performance, with BioBERT achieving the highest recall of 91.95% and F1 score of 86.84% on the Learning Language in Logic dataset. Despite not being explicitly trained for biomedical texts, GPT-4 showed commendable performance, comparable to the bidirectional encoder models. Specifically, GPT-4 achieved the highest precision of 88.37%, a recall of 85.14%, and an F1 score of 86.49% on the same dataset. These results suggest that GPT-4 can effectively detect protein interactions from text, offering valuable applications in mining biomedical literature.
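
To make the encoder-based setup concrete, here is a minimal sketch of the general approach described above: treating PPI identification as binary sentence classification with a BioBERT checkpoint and scoring predictions with precision, recall, and F1. This is not the authors' pipeline; the checkpoint name, label convention, and toy sentences are assumptions, and the classification head would need fine-tuning on the gold-standard corpora before the scores mean anything.

```python
# Minimal sketch, NOT the authors' pipeline: sentence-level PPI identification
# framed as binary classification with a BioBERT checkpoint, scored with
# precision/recall/F1. Checkpoint name, label convention, and toy data are
# assumptions; the classification head is untrained until fine-tuned.
import torch
from sklearn.metrics import precision_recall_fscore_support
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "dmis-lab/biobert-base-cased-v1.1"  # assumed BioBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.eval()

# Toy examples: label 1 = the sentence asserts an interaction between proteins.
sentences = [
    "GerE binds to the promoter region of cotB.",
    "SpoIIID and sigma K are both expressed in the mother cell.",
]
gold = [1, 0]

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    preds = model(**batch).logits.argmax(dim=-1).tolist()

# Precision, recall, and F1 for the positive (interaction) class.
p, r, f1, _ = precision_recall_fscore_support(gold, preds, average="binary")
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```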

Availability And Implementation: The source code and datasets used in this study are available at https://github.com/hurlab/PPI-GPT-BERT.
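
The repository above contains the actual implementation. As a rough illustration only, the sketch below shows how a prompting-based GPT-4 check for a single candidate protein pair might be set up; the prompt wording, model name, and helper function are hypothetical and not taken from the paper or its repository.

```python
# Rough illustration only: a prompting-based GPT-4 check for one candidate
# protein pair. The prompt wording, model name, and helper are hypothetical.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ppi_answer(sentence: str, protein_a: str, protein_b: str) -> str:
    """Ask the model whether the sentence states that the two proteins interact."""
    prompt = (
        f"Sentence: {sentence}\n"
        f"According to this sentence, do the proteins {protein_a} and {protein_b} "
        "interact? Answer with a single word: yes or no."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

print(ppi_answer("GerE binds to the promoter region of cotB.", "GerE", "cotB"))
```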


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419952
DOI: http://dx.doi.org/10.1093/bioadv/vbae133

Publication Analysis

Top Keywords

bidirectional encoder (12), encoder representations (8), learning language (8), language logic (8), interactions (6), models (5), evaluating gpt (4), gpt bert (4), bert models (4), models protein-protein (4)

Similar Publications

Analyzing Reddit Social Media Content in the United States Related to H5N1: Sentiment and Topic Modeling Study.

J Med Internet Res

September 2025

Artificial Intelligence and Mathematical Modeling Lab, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Background: The H5N1 avian influenza A virus represents a serious threat to both animal and human health, with the potential to escalate into a global pandemic. Effective monitoring of social media during H5N1 avian influenza outbreaks could potentially offer critical insights to guide public health strategies. Social media platforms like Reddit, with their diverse and region-specific communities, provide a rich source of data that can reveal collective attitudes, concerns, and behavioral trends in real time.

View Article and Find Full Text PDF

The widespread dissemination of fake news presents a critical challenge to the integrity of digital information and erodes public trust. This urgent problem necessitates the development of sophisticated and reliable automated detection mechanisms. This study addresses this gap by proposing a robust fake news detection framework centred on a transformer-based architecture.

View Article and Find Full Text PDF

Knowledge tracing can reveal students' level of knowledge in relation to their learning performance. Recently, many machine learning algorithms have been proposed to implement knowledge tracing and have achieved promising outcomes. However, most previous approaches cannot cope with long-sequence time-series prediction, which is more valuable than the short-sequence prediction used extensively in current knowledge-tracing studies.

View Article and Find Full Text PDF

Purpose: Large language models (LLMs) can assist patients who seek medical knowledge online to guide their own glaucoma care. Understanding the differences in LLM performance on glaucoma-related questions can inform patients about the best resources to obtain relevant information.

Methods: This cross-sectional study evaluated the accuracy, comprehensiveness, quality, and readability of LLM-generated responses to glaucoma inquiries.

View Article and Find Full Text PDF

RNA-binding proteins play a pivotal role in the complex process of gene expression and regulation. Accurate prediction of RNA-protein binding sites can help researchers better understand RNA-binding proteins and their related mechanisms, and prediction techniques based on machine learning algorithms offer a cost-effective and efficient way to identify these binding sites.

View Article and Find Full Text PDF