The recent explosion of Large Language Models (LLMs) has provoked lively debate about "emergent" properties of the models, including intelligence, insight, creativity, and meaning. These debates are rocky for two main reasons: the emergent properties sought are not well defined, and the grounds for their dismissal often rest on a fallacious appeal to extraneous factors, like the LLM training regime, or on fallacious assumptions about processes within the model. The latter issue is a particular roadblock for LLMs because their internal processes are largely unknown: they are colossal black boxes. In this paper, I try to cut through these problems by first identifying one salient feature shared by systems we regard as intelligent, conscious, or sentient: their responsiveness to environmental conditions that may not be near in space and time. Such systems engage with subjective worlds ("s-worlds") which may or may not conform to the actual environment. Observers can infer s-worlds from behavior alone, enabling hypotheses about perception and cognition that do not require evidence from the internal operations of the systems in question. The reconstruction of s-worlds offers a framework for comparing cognition across species, affording new leverage on the possible sentience of LLMs. Here, I examine one prominent LLM, OpenAI's GPT-4. Drawing on philosophical phenomenology and cognitive ethology, I examine the pattern of errors made by GPT-4 and propose that they originate in the absence of any analogue of the human subjective awareness of time. This deficit suggests that GPT-4 lacks the capacity to construct a stable perceptual world: the temporal vacuum undermines any capacity to maintain a consistent, continuously updated model of its environment. Accordingly, none of GPT-4's statements are epistemically secure. Because the anthropomorphic illusion is nonetheless so strong, I conclude by suggesting that GPT-4 works with its users to construct improvised works of fiction.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339530 | PMC |
| http://dx.doi.org/10.3389/fpsyg.2024.1292675 | DOI Listing |
J Pediatr Surg
September 2025
Harvard Medical School, Boston, MA, United States; Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, MA, United States.
Introduction: Large language models (LLMs) have been shown to translate information from highly specific domains into lay-digestible terms. Pediatric surgery remains an area in which it is difficult to communicate clinical information in an age-appropriate manner, given the vast diversity in language comprehension levels across patient populations and the complexity of procedures performed. This study evaluates LLMs as tools for generating explanations of common pediatric surgeries to increase efficiency and quality of communication.
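As a rough illustration of the task this study sets for LLMs, here is a minimal sketch of prompting a chat model for an age-appropriate explanation of a procedure. The model name, prompt wording, and grade-level target are illustrative assumptions; the study's actual prompts are not reproduced here.

```python
# Minimal sketch: generating a lay explanation of a pediatric surgery.
# Assumptions (not from the abstract): an OpenAI-style chat API, the
# model name, the prompt wording, and the grade-level target.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def explain_procedure(procedure: str, grade_level: int = 6) -> str:
    """Ask the model for an age-appropriate explanation of a surgery."""
    prompt = (
        f"Explain a {procedure} to a family at a US grade-{grade_level} "
        "reading level. Avoid jargon, and define any unavoidable medical terms."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(explain_procedure("laparoscopic appendectomy"))
```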
Int J Surg
September 2025
The Third Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China.
J Physician Assist Educ
September 2025
Andrew P. Chastain, DMS, PA-C, is an assistant professor at Butler University, Indianapolis, Indiana.
Introduction: Artificial intelligence tools show promise in supplementing traditional physician assistant (PA) education, particularly in developing clinical reasoning skills. However, limited research exists on custom Generative Pretrained Transformer (GPT) applications in PA education. This study evaluated student experiences and perceptions of a custom GPT-based clinical reasoning tool.
AJR Am J Roentgenol
September 2025
Department of Radiology, Stanford University, Stanford, CA, USA.
The increasing complexity and volume of radiology reports present challenges for the timely communication of critical findings. This study evaluated the performance of two out-of-the-box LLMs in detecting and classifying critical findings in radiology reports using various prompt strategies. The analysis included 252 radiology reports of varying modalities and anatomic regions extracted from the MIMIC-III database, divided into a prompt-engineering tuning set of 50 reports, a holdout test set of 125 reports, and a pool of 77 remaining reports used as examples for few-shot prompting.
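To make the few-shot setup concrete, here is a minimal sketch of folding labeled reports from an example pool into a classification prompt. The API style, model name, binary label set, and example reports are all illustrative assumptions; the study's actual prompts and models are not shown here.

```python
# Minimal sketch: few-shot classification of critical findings.
# Assumptions (not from the abstract): an OpenAI-style chat API, the
# model name, the binary label set, and the example reports below.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical labeled examples standing in for the 77-report example pool
FEW_SHOT = [
    ("CT head: acute subdural hematoma with midline shift.", "critical"),
    ("Chest radiograph: no acute cardiopulmonary abnormality.", "not critical"),
]

def classify_report(report_text: str) -> str:
    """Label one report by replaying few-shot examples as chat turns."""
    messages = [{
        "role": "system",
        "content": "Label each radiology report as 'critical' or 'not critical'.",
    }]
    for example, label in FEW_SHOT:
        messages.append({"role": "user", "content": example})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": report_text})
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content.strip()
```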
Acta Neurochir (Wien)
September 2025
Department of Neurosurgery, Istinye University, Istanbul, Turkey.
Background: Recent studies suggest that large language models (LLMs) such as ChatGPT are useful tools for medical students or residents when preparing for examinations. These studies, especially those conducted with multiple-choice questions, emphasize that the level of knowledge and response consistency of the LLMs are generally acceptable; however, further optimization is needed in areas such as case discussion, interpretation, and language proficiency. Therefore, this study aimed to evaluate the performance of six distinct LLMs on Turkish and English neurosurgery multiple-choice questions and to assess their accuracy and consistency in a specialized medical context.
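For readers unfamiliar with how accuracy and consistency are scored in repeated-query studies like this one, here is a minimal sketch of the bookkeeping. The `ask_model` callable and the scoring rules are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch: scoring MCQ accuracy and answer consistency.
# Assumption (not from the abstract): ask_model(model_name, stem) is a
# hypothetical callable returning an answer letter. The modal answer is
# scored for accuracy; consistency is the mean agreement with that mode.
from collections import Counter

def evaluate(ask_model, model_name: str, questions: list, runs: int = 3) -> dict:
    correct, agreement = 0, 0.0
    for q in questions:  # each q: {"stem": str, "answer": "A".."E"}
        answers = [ask_model(model_name, q["stem"]) for _ in range(runs)]
        mode, count = Counter(answers).most_common(1)[0]
        correct += mode == q["answer"]
        agreement += count / runs
    n = len(questions)
    return {"accuracy": correct / n, "consistency": agreement / n}
```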