What is it like to be a bot? The world according to GPT-4.

Front Psychol

Trinity College, Hartford, CT, United States.

Published: August 2024


Category Ranking: 98%

Total Visits: 921

Avg Visit Duration: 2 minutes

Citations: 20
Article Abstract

The recent explosion of Large Language Models (LLMs) has provoked lively debate about "emergent" properties of the models, including intelligence, insight, creativity, and meaning. These debates are rocky for two main reasons: The emergent properties sought are not well-defined; and the grounds for their dismissal often rest on a fallacious appeal to extraneous factors, like the LLM training regime, or fallacious assumptions about processes within the model. The latter issue is a particular roadblock for LLMs because their internal processes are largely unknown - they are colossal black boxes. In this paper, I try to cut through these problems by, first, identifying one salient feature shared by systems we regard as intelligent/conscious/sentient/etc., namely, their responsiveness to environmental conditions that may not be near in space and time. They engage with subjective worlds ("s-worlds") which may or may not conform to the actual environment. Observers can infer s-worlds from behavior alone, enabling hypotheses about perception and cognition that do not require evidence from the internal operations of the systems in question. The reconstruction of s-worlds offers a framework for comparing cognition across species, affording new leverage on the possible sentience of LLMs. Here, we examine one prominent LLM, OpenAI's GPT-4. Inquiry into the emergence of a complex subjective world is facilitated with philosophical phenomenology and cognitive ethology, examining the pattern of errors made by GPT-4 and proposing their origin in the absence of an analogue of the human subjective awareness of time. This deficit suggests that GPT-4 ultimately lacks a capacity to construct a stable perceptual world; the temporal vacuum undermines any capacity for GPT-4 to construct a consistent, continuously updated, model of its environment. Accordingly, none of GPT-4's statements are epistemically secure. Because the anthropomorphic illusion is so strong, I conclude by suggesting that GPT-4 works with its users to construct improvised works of fiction.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339530
DOI: http://dx.doi.org/10.3389/fpsyg.2024.1292675

Publication Analysis

Top Keywords

gpt-4 (5), bot? gpt-4 (4), gpt-4 explosion (4), explosion large (4), large language (4), language models (4), models llms (4), llms provoked (4), provoked lively (4), lively debate (4)

Similar Publications

Designing Patient-Centered Communication Aids in Pediatric Surgery Using Large Language Models.

J Pediatr Surg

September 2025

Harvard Medical School, Boston, MA, United States; Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA, United States.

Introduction: Large language models (LLMs) have been shown to translate information from highly specific domains into lay-digestible terms. Pediatric surgery remains an area in which it is difficult to communicate clinical information in an age-appropriate manner, given the vast diversity in language comprehension levels across patient populations and the complexity of procedures performed. This study evaluates LLMs as tools for generating explanations of common pediatric surgeries to increase efficiency and quality of communication.


Introduction: Artificial intelligence tools show promise in supplementing traditional physician assistant education, particularly in developing clinical reasoning skills. However, limited research exists on custom Generative Pretrained Transformer (GPT) applications in physician assistant (PA) education. This study evaluated student experiences and perceptions of a custom GPT-based clinical reasoning tool.


The increasing complexity and volume of radiology reports present challenges for the timely communication of critical findings. This study evaluated the performance of two out-of-the-box LLMs in detecting and classifying critical findings in radiology reports using various prompt strategies. The analysis included 252 radiology reports of varying modalities and anatomic regions extracted from the MIMIC-III database, divided into a prompt engineering tuning set of 50 reports, a holdout test set of 125 reports, and a pool of 77 remaining reports used as examples for few-shot prompting.


Background: Recent studies suggest that large language models (LLMs) such as ChatGPT are useful tools for medical students or residents when preparing for examinations. These studies, especially those conducted with multiple-choice questions, emphasize that the level of knowledge and response consistency of the LLMs are generally acceptable; however, further optimization is needed in areas such as case discussion, interpretation, and language proficiency. Therefore, this study aimed to evaluate the performance of six distinct LLMs for Turkish and English neurosurgery multiple-choice questions and assess their accuracy and consistency in a specialized medical context.
