Re-evaluating Theory of Mind evaluation in large language models.

Philos Trans R Soc Lond B Biol Sci

Department of Psychology, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA.

Published: August 2025


Citations

20

Article Abstract

The question of whether large language models (LLMs) possess Theory of Mind (ToM), often defined as the ability to reason about others' mental states, has sparked significant scientific and public interest. However, the evidence as to whether LLMs possess ToM is mixed, and the recent growth in evaluations has not resulted in a convergence. Here, we take inspiration from cognitive science to re-evaluate the state of ToM evaluation in LLMs. We argue that a major reason for the disagreement on whether LLMs have ToM is a lack of clarity on whether models should be expected to match human behaviours, or the computations underlying those behaviours. We also highlight ways in which current evaluations may be deviating from 'pure' measurements of ToM abilities, which also contributes to the confusion. We conclude by discussing several directions for future research, including the relationship between ToM and pragmatic communication, which could advance our understanding of artificial systems as well as human cognition. This article is part of the theme issue 'At the heart of human communication: new views on the complex relationship between pragmatics and Theory of Mind'.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12351311
DOI: http://dx.doi.org/10.1098/rstb.2023.0499


Similar Publications

The human mind constructs and updates models of events during comprehension. Event models are multidimensional, multi-timescale, and structured. They enable prediction, shape memory formation, and facilitate action control.


Applied Behavior Analysis in the Crosshairs: Neurodiversity, the Intact Mind, and Autism Politics.

Perspect Behav Sci

September 2025

History and Sociology of Science Department, University of Pennsylvania, 249 South 36th Street, Philadelphia, PA 19104 USA.

Recent attacks on applied behavior analysis (ABA) by neurodiversity advocates share a common theme with opposition to other supports, such as subminimum wage vocational programs and congregate residential settings: the intact mind assumption, which maintains that even profoundly autistic people have typical intelligence, even if they present as severely cognitively impaired. This article examines the history of the intact mind assumption, which was largely shaped by psychoanalytic theory in the mid-20th century, as well as its impact on contemporary disability policy and practice.


Purpose: This study evaluates the effectiveness of integrating case-based mind maps and reflective journals within Kolb's experiential learning framework in advanced nursing education.

Methods: The design compared 2023 (control group, n = 46) and 2024 (experimental group, n = 57) cohorts of nursing master's students. The experimental group received a Kolb-based intervention comprising: case analysis (concrete experience), reflective journals (reflective observation), mind maps (abstract conceptualization), and peer-led simulations (active experimentation).


Here, we will review the developmental literature on how infants and young children learn about emotions. We take a process-based perspective, highlighting how the protracted trajectory of emotional development unfolds concurrently with changes in children's cognitive abilities, and how variability based on context, culture, and experience shapes this trajectory over time. We will also emphasize the role of input into this development, a factor that has often been ignored.


As adults, we do not expect ignorant agents to behave randomly or always get things wrong. Instead, we expect them to act reasonably, guided by past experiences. We test whether 4-to-6-year-olds share this intuition and use it to infer others' knowledge, or whether they rely on a simple "ignorance = error" heuristic identified in past work.
