Spin-glass model of in-context learning.

Phys Rev E

Sun Yat-sen University, PMI Lab, School of Physics, Guangzhou 510275, People's Republic of China.

Published: July 2025



Article Abstract

Large language models show a surprising in-context learning ability: they can use a prompt to form a prediction for a query without any additional training, in stark contrast to conventional supervised learning. Providing a mechanistic interpretation and linking this empirical phenomenon to physics are thus challenging and remain unsolved. We study a simple yet expressive transformer with linear attention and map this structure to a spin-glass model with real-valued spins, where the couplings and fields account for the intrinsic disorder in the data. The spin-glass model explains how the weight parameters interact with one another during pretraining, and further clarifies why an unseen function can be predicted from a prompt alone, without further training. Our theory reveals that for single-instance learning, increasing the task diversity leads to the emergence of in-context learning by allowing the Boltzmann distribution to converge to a unique correct solution of the weight parameters. The pretrained transformer therefore displays predictive power in a prompt setting. The proposed analytically tractable model thus offers a promising avenue for interpreting many intriguing but puzzling properties of large language models.
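As a rough illustration of the setting the abstract describes, the sketch below pretrains a single linear-attention readout on many randomly drawn linear-regression tasks and then evaluates it on unseen tasks from a prompt alone, with no further weight updates. The parametrization (a single trainable d x d matrix acting on a prompt moment), the Gaussian data, and all hyperparameters are simplifying assumptions for illustration, not the article's exact model.

# Minimal sketch (not the paper's exact setup): in-context learning of linear
# functions with a single linear-attention readout, pretrained across many tasks.
# All names and hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 5, 40                     # input dimension, prompt length

def make_tasks(n_tasks, n_ctx, d):
    """Each task: a random weight vector w, prompt pairs (x, w.x), and a query."""
    w = rng.standard_normal((n_tasks, d))
    X = rng.standard_normal((n_tasks, n_ctx, d))
    y = np.einsum('td,tcd->tc', w, X)
    xq = rng.standard_normal((n_tasks, d))
    yq = np.einsum('td,td->t', w, xq)
    return X, y, xq, yq

def predict(Gamma, X, y, xq):
    """Linear attention: y_hat = xq^T Gamma (1/n) sum_c y_c x_c."""
    moment = np.einsum('tc,tcd->td', y, X) / X.shape[1]
    return np.einsum('td,de,te->t', xq, Gamma, moment)

# Pretraining by SGD over many independent tasks (task diversity).
Gamma = np.zeros((d, d))
lr, n_steps, batch = 0.05, 2000, 64
for _ in range(n_steps):
    X, y, xq, yq = make_tasks(batch, n_ctx, d)
    err = predict(Gamma, X, y, xq) - yq
    moment = np.einsum('tc,tcd->td', y, X) / n_ctx
    grad = np.einsum('t,td,te->de', err, xq, moment) / batch
    Gamma -= lr * grad

# Test on unseen tasks: no weight update, only the prompt is provided.
X, y, xq, yq = make_tasks(1000, n_ctx, d)
mse = np.mean((predict(Gamma, X, y, xq) - yq) ** 2)
print(f"test MSE on unseen tasks: {mse:.3f}")   # much smaller than the task variance (= d) if ICL emerged

In this toy setup, pretraining over a sufficiently diverse set of tasks drives the trainable matrix toward the single solution that generalizes to new prompts, which loosely mirrors the convergence-to-a-unique-solution picture described in the abstract.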


Source
http://dx.doi.org/10.1103/5l5m-4nk5

Publication Analysis

Top Keywords

in-context learning: 12
large language: 8
language models: 8
spin glass: 8
glass model: 8
weight parameters: 8
learning: 5
spin-glass model: 4
model in-context: 4
learning large: 4

Similar Publications

An experiment using a predictive learning task with college students evaluated the impact of a stimulus associated with extinction in an AAB renewal design. In the first phase, four groups of participants learned specific relationships between two cues (X and Y) and two outcomes (O1 and O2) in Context A. Subsequently, both cues underwent extinction in the same Context A.


Biomedical named entity recognition (NER) is a high-utility natural language processing (NLP) task, and large language models (LLMs) show promise, particularly in few-shot settings (i.e., with limited training data).


Evaluating large language model-generated brain MRI protocols: performance of GPT4o, o3-mini, DeepSeek-R1 and Qwen2.5-72B.

Eur Radiol

September 2025

Institute of Diagnostic and Interventional Neuroradiology, TUM University Hospital, School of Medicine and Health, Technical University of Munich, Munich, Germany.

Objectives: To evaluate the potential of LLMs to generate sequence-level brain MRI protocols.

Materials And Methods: This retrospective study employed a dataset of 150 brain MRI cases derived from local imaging request forms. Reference protocols were established by two neuroradiologists.


Medical Entity Linking in Low-Resource Settings with Fine-Tuning-Free LLMs.

Stud Health Technol Inform

September 2025

Chair of Medical Informatics, Institute of AI and Informatics in Medicine (AIIM), TUM University Hospital, Technical University of Munich, Munich, Germany.

Introduction: Medical entity linking is an important task in biomedical natural language processing, aiming to align textual mentions of medical concepts with standardized concepts in ontologies. Most existing approaches rely on supervised models or domain-specific embeddings, which require large datasets and significant computational resources.

Objective: The objective of this work is (1) to investigate the effectiveness of large language models (LLMs) in improving both candidate generation and disambiguation for medical entity linking through synonym expansion and in-context learning, and (2) to evaluate this approach against traditional string-matching and supervised methods.
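As a rough illustration of the two stages this objective names, the sketch below pairs synonym-expanded fuzzy matching for candidate generation with an in-context prompt for disambiguation. The tiny ontology, synonym table, and prompt wording are hypothetical placeholders, not taken from the study, and the actual LLM call is left out.

# Hypothetical sketch: candidate generation via synonym expansion + fuzzy
# matching, then an in-context prompt for disambiguation (LLM call not shown).
from difflib import get_close_matches

ONTOLOGY = {"C0027051": "myocardial infarction", "C0020538": "hypertension"}   # illustrative only
SYNONYMS = {"heart attack": "myocardial infarction", "high blood pressure": "hypertension"}

def generate_candidates(mention, k=3):
    """Expand the mention with known synonyms, then fuzzy-match ontology terms."""
    queries = {mention.lower(), SYNONYMS.get(mention.lower(), mention.lower())}
    terms = list(ONTOLOGY.values())
    hits = set()
    for q in queries:
        hits.update(get_close_matches(q, terms, n=k, cutoff=0.6))
    return [(cid, term) for cid, term in ONTOLOGY.items() if term in hits]

def build_disambiguation_prompt(mention, context, candidates):
    """In-context prompt: a worked example plus the candidate list; an LLM
    would be asked to answer with a single concept ID."""
    lines = [
        "Link the mention to one ontology concept.",
        "Example: mention 'MI' in 'patient admitted with MI' -> C0027051",
        f"Mention: '{mention}' in context: '{context}'",
        "Candidates: " + ", ".join(f"{cid} ({term})" for cid, term in candidates),
        "Answer with the concept ID only:",
    ]
    return "\n".join(lines)

cands = generate_candidates("heart attack")
print(cands)
print(build_disambiguation_prompt("heart attack", "history of heart attack", cands))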


Performance and improvement strategies for adapting generative large language models for electronic health record applications: A systematic review.

Int J Med Inform

August 2025

Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, United States; Department of Medicine, Harvard Medical School, Boston, MA 02115, United States.

Purpose: To synthesize performance and improvement strategies for adapting generative LLMs in EHR analyses and applications.

Methods: We followed the PRISMA guidelines to conduct a systematic review of articles from PubMed and Web of Science published between January 1, 2023 and November 9, 2024. Multiple reviewers, including biomedical informaticians and a clinician, were involved in the article review process.
