Is ChatGPT Ready for Public Use in Organ-Specific Drug Toxicity Research?

Skylar Connor, Leihong Wu, Ruth A Roberts, Weida Tong

Drug Discov Today

National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA.

Published: February 2025

The growing impact of large language models (LLMs), such as ChatGPT, prompts questions about the reliability of their application in public health. We compared drug toxicity assessments by GPT-4 for liver, heart, and kidney against expert assessments using US Food and Drug Administration (FDA) drug-labeling documents. Two approaches were assessed: a 'General prompt', mimicking the conversational style used by the general public, and an 'Expert prompt', engineered to represent an expert's approach. The Expert prompt achieved higher accuracy (64-75%) than the General prompt (48-72%), but overall performance was moderate, indicating that caution is needed when using GPT-4 for public health. To improve reliability, an advanced framework, such as Retrieval Augmented Generation (RAG), might be required to leverage the knowledge embedded in GPT-4.

DOI: http://dx.doi.org/10.1016/j.drudis.2025.104297
