Evaluating the ability of large language models to predict human social decisions.

Sci Rep

Department of Applied Psychology, School of Humanities and Social Science, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Boulevard, 518172, Shenzhen, China.

Published: September 2025

Recent advances in large language models (LLMs) have highlighted their potential to predict human decisions. In two studies, we compared predictions by GPT-3.5 and GPT-4 across 51 scenarios (9,600 responses) against published data from 2,104 human participants within an evolutionary-psychology framework. We further examined our findings with GPT-4o across eight social-group and kinship conditions (1,600 responses). Our results revealed behavioral differences between human choices and LLM predictions: humans showed greater sensitivity to kinship and group size than the LLMs when making life-death decisions. LLMs aligned more closely with humans with a higher risk-seeking preference in financial domains. While human choices followed prospect theory's value function (risk-averse in gains, risk-seeking in losses), LLMs often predicted reversed patterns. GPT-3.5 matched the average level of human risk preference but showed reversed framing effects; GPT-4 was indiscriminately risk-averse across social contexts. While humans were more risk-seeking in small or kin groups than in large groups, GPT-4o made the opposite predictions. Our results suggest a set of criteria for a psychological version of the Turing Test, reflected in framing effects and social context-dependent risk preference involving kinship, group size, social relations, sense of fairness, self-age awareness, public vs. personal properties, and social group-dependent aspiration levels.
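
The pattern the abstract attributes to human choices (risk-averse in gains, risk-seeking in losses) can be illustrated with a minimal sketch of a prospect-theory value function. This is not the authors' analysis: the parameter values are the median estimates commonly cited from Tversky and Kahneman (1992), the function names are hypothetical, probability weighting is deliberately left out, and the 200 vs. 600 payoffs are made-up illustrative numbers.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value of an outcome x relative to a reference point of 0.

    Assumed parameters (not from the paper summarized here):
    alpha, beta < 1 give diminishing sensitivity; lam > 1 gives loss aversion.
    """
    if x >= 0:
        return x ** alpha            # concave for gains
    return -lam * ((-x) ** beta)     # convex and steeper for losses


def prefers_sure_option(sure, risky_outcome, p):
    """Compare a sure outcome with a gamble paying risky_outcome with
    probability p (and 0 otherwise). Simplification: probabilities are
    used as-is, with no probability-weighting function."""
    return value(sure) > p * value(risky_outcome)


# Both pairs have the same expected value (200 vs. 1/3 * 600).
gain_frame = prefers_sure_option(200, 600, 1 / 3)     # sure gain vs. risky gain
loss_frame = prefers_sure_option(-200, -600, 1 / 3)   # sure loss vs. risky loss
print("Gain frame, prefers sure option:", gain_frame)
print("Loss frame, prefers sure option:", loss_frame)
```

Under these assumed parameters the sure option wins in the gain frame and the gamble wins in the loss frame, which is the framing pattern the abstract describes for humans; a "reversed" pattern would flip both preferences.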

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405550
DOI: http://dx.doi.org/10.1038/s41598-025-17188-7

Similar Publications

Patient-reported outcomes after lobectomy vs. segmentectomy for early-stage non-small cell lung cancer.

Surg Endosc

September 2025

Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.

Qian Hong, Yan Wang, Fengyan Ma, Yinyan Gao, Guochao Zhang

Background: Surgical resection is the cornerstone for early-stage non-small cell lung cancer (NSCLC), with lobectomy historically standard. Evolving techniques have spurred debate comparing lobectomy and segmentectomy. This study analyzed early postoperative patient-reported symptoms and functional status in patients with early NSCLC undergoing either procedure.

The imitation game: large language models versus multidisciplinary tumor boards: benchmarking AI against 21 sarcoma centers from the ring trial.

J Cancer Res Clin Oncol

September 2025

Department of Surgery, Mannheim School of Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.

Cheng-Peng Li, Aimé Terence Kalisa, Siyer Roohani, Kamal Hummedah, Franka Menge

Purpose: The study aims to compare the treatment recommendations generated by four leading large language models (LLMs) with those from 21 sarcoma centers' multidisciplinary tumor boards (MTBs) of the sarcoma ring trial in managing complex soft tissue sarcoma (STS) cases.

Methods: We simulated STS-MTBs using four LLMs: Llama 3.2-vision:90b, Claude 3.

Association Between Conversational Multitasking and Clinician Work Behaviors at a Large US Health Care System: Cohort Study.

J Med Internet Res

September 2025

Washington University in St. Louis, 660 South Euclid Avenue, Campus Box 8054, St Louis, MO, United States, 1 3142737801.

Linlin Xia, Daphne Lew, Laura Baratta, Elise Eiden, Sunny Lou

Background: Clinical communication is central to the delivery of effective, timely, and safe patient care. The use of text-based tools for clinician-to-clinician communication, commonly referred to as secure messaging, has increased exponentially over the past decade. The use of secure messaging has a potential impact on clinician work behaviors, workload, and cognitive burden.

Artificial Intelligence in allergy and immunology: recent developments, implementation challenges, and the road towards clinical impact.

J Allergy Clin Immunol

September 2025

University of Groningen, University Medical Center Groningen, Beatrix Children's Hospital, Department of Pediatric Pulmonology and Pediatric Allergology, Groningen, the Netherlands; University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD (GRIAC)

Merlijn van Breugel, Matt Greenhawt, Ibon Eguiluz-Gracia, Maria Jose Torres Jaen, Aikaterini Anagnostou

Artificial intelligence (AI) is increasingly recognized for its capacity to transform medicine. While publications applying AI in allergy and immunology have increased, clinical implementation substantially lags behind other specialties. By mid-2024, over 1,000 FDA-approved AI-enabled medical devices existed, but none specifically addressed allergy and immunology.

[Artificial Intelligence Methods - a Perspective for Cardiovascular Telemedicine?].

Dtsch Med Wochenschr

September 2025

Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany.

Meike Hiddemann, Kerstin Köhler, Wilhelm Haverkamp, Juliane Köhler, Maximilian Bauser

Since 2022, an estimated 150,000 to 200,000 patients with heart failure (HF) in Germany have met the inclusion criteria for HF telemonitoring in accordance with the Federal Joint Committee's (G-BA) decision. Currently, only a few artificial intelligence (AI) applications are used in standard cardiovascular telemedicine care. However, AI applications could improve the predictive accuracy of existing telemedical sensor technology by recognising patterns across multiple data sources.
