Large-scale models trained on extensive datasets have become the standard due to their strong generalizability across diverse tasks. In-context learning (ICL), widely used in natural language processing, leverages these models by providing task-specific prompts without modifying their parameters. This paradigm is increasingly being adapted for computer vision, where models receive an input-output image pair, known as an in-context pair, alongside a query image to illustrate the desired output. However, the success of visual ICL largely hinges on the quality of these prompts. To address this, we propose Enhanced Instruct Me More (E-InMeMo), a novel approach that incorporates learnable perturbations into in-context pairs to optimize prompting. Through extensive experiments on standard vision tasks, E-InMeMo demonstrates superior performance over existing state-of-the-art methods. Notably, it improves mIoU scores by 7.99 for foreground segmentation and by 17.04 for single object detection compared to the baseline without learnable prompts. These results highlight E-InMeMo as a lightweight yet effective strategy for enhancing visual ICL.
Full text:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12295390
DOI: http://dx.doi.org/10.3390/jimaging11070232
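To make the learnable-prompt idea in the E-InMeMo abstract above concrete, here is a minimal PyTorch sketch, not the authors' implementation: the frozen inpainting model is replaced by a stand-in convolution, the 2x2 canvas layout and image size are assumptions, and names such as `LearnablePromptPerturbation` and `compose_canvas` are hypothetical.

```python
# Minimal sketch of a learnable prompt perturbation for visual ICL,
# in the spirit of the E-InMeMo description above. The frozen model,
# shapes, and loss are placeholders, not the authors' code.
import torch
import torch.nn as nn

class LearnablePromptPerturbation(nn.Module):
    """Adds a trainable border perturbation to in-context images."""
    def __init__(self, size: int = 111, pad: int = 10):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(3, size, size))
        mask = torch.ones(1, size, size)
        mask[:, pad:-pad, pad:-pad] = 0.0  # leave the image interior untouched
        self.register_buffer("mask", mask)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return img + self.delta * self.mask

def compose_canvas(src, tgt, query, blank):
    """2x2 grid: in-context (input, output) on top, (query, prediction slot) below."""
    top = torch.cat([src, tgt], dim=-1)
    bottom = torch.cat([query, blank], dim=-1)
    return torch.cat([top, bottom], dim=-2)

perturb = LearnablePromptPerturbation()
frozen_model = nn.Conv2d(3, 3, 3, padding=1).requires_grad_(False)  # stand-in for a frozen inpainting model
opt = torch.optim.Adam(perturb.parameters(), lr=1e-3)

src, tgt = torch.rand(3, 111, 111), torch.rand(3, 111, 111)
query, label = torch.rand(3, 111, 111), torch.rand(3, 111, 111)

canvas = compose_canvas(perturb(src), perturb(tgt), query, torch.zeros_like(query))
pred = frozen_model(canvas.unsqueeze(0))[0][:, 111:, 111:]  # bottom-right quadrant
loss = nn.functional.mse_loss(pred, label)
loss.backward()
opt.step()  # only the perturbation parameters are updated; the model stays frozen
```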
Int J Surg
September 2025
Digestive Endoscopy Center, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China.
Background: Patients with T1 colorectal cancer (CRC) often show poor adherence to guideline-recommended treatment strategies after endoscopic resection. To address this challenge and improve clinical decision-making, this study aims to compare the accuracy of surgical management recommendations between large language models (LLMs) and clinicians.
Methods: This retrospective study enrolled 202 patients with T1 CRC who underwent endoscopic resection at three hospitals.
Front Digit Health
August 2025
Department of Ophthalmology, Stanford University, Palo Alto, CA, United States.
Introduction: Vision language models (VLMs) combine image analysis capabilities with large language models (LLMs). Because of their multimodal capabilities, VLMs offer a clinical advantage over image classification models for the diagnosis of optic disc swelling by allowing a consideration of clinical context. In this study, we compare the performance of non-specialty-trained VLMs with different prompts in the classification of optic disc swelling on fundus photographs.
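The prompt comparison described above can be sketched in a few lines; this is an illustrative Python outline only, where `query_vlm` is a placeholder for whichever vision-language API is used, and the prompt wordings are invented rather than taken from the study's protocol.

```python
# Hypothetical sketch of comparing prompt variants for VLM-based
# classification of optic disc swelling. `query_vlm` is a placeholder.
from typing import Callable

PROMPTS = {
    "zero_shot": "Does this fundus photograph show optic disc swelling? Answer yes or no.",
    "role": ("You are an experienced neuro-ophthalmologist. "
             "Does this fundus photograph show optic disc swelling? Answer yes or no."),
    "context": ("The patient reports headache and transient visual obscurations. "
                "Does this fundus photograph show optic disc swelling? Answer yes or no."),
}

def classify(image_path: str, query_vlm: Callable[[str, str], str]) -> dict:
    """Run every prompt variant against one image and collect the answers."""
    return {name: query_vlm(image_path, prompt) for name, prompt in PROMPTS.items()}
```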
Learn Behav
September 2025
Universidad Nacional Autónoma de México, Mexico, Mexico.
An experiment using a predictive learning task with college students evaluated the impact of a stimulus associated with extinction in an AAB renewal design. Four groups of participants learned a specific relationship between two cues (X and Y) and two outcomes (O1 and O2) in Context A during the first phase. Subsequently, both cues underwent extinction in the same Context A.
Biomedical named entity recognition (NER) is a high-utility natural language processing (NLP) task, and large language models (LLMs) show promise particularly in few-shot settings (i.e., limited training data).
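As a rough illustration of few-shot NER prompting, the sketch below builds an in-context prompt from a handful of labelled examples; the tag set, example sentences, and markup convention are invented for demonstration and are not taken from the article.

```python
# Illustrative few-shot prompt construction for biomedical NER with an LLM.
# Examples and the DRUG/DISEASE tag set are invented placeholders.
FEW_SHOT = [
    ("Aspirin reduced fever in the cohort.",
     "[Aspirin](DRUG) reduced [fever](DISEASE) in the cohort."),
    ("Patients with asthma received salbutamol.",
     "Patients with [asthma](DISEASE) received [salbutamol](DRUG)."),
]

def build_prompt(sentence: str) -> str:
    """Prepend a few labelled examples, then ask the model to tag a new sentence."""
    lines = ["Tag every DRUG and DISEASE mention using [mention](TYPE) markup."]
    for text, tagged in FEW_SHOT:
        lines.append(f"Input: {text}\nOutput: {tagged}")
    lines.append(f"Input: {sentence}\nOutput:")
    return "\n\n".join(lines)

print(build_prompt("Metformin is first-line therapy for type 2 diabetes."))
```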
Eur Radiol
September 2025
Institute of Diagnostic and Interventional Neuroradiology, TUM University Hospital, School of Medicine and Health, Technical University of Munich, Munich, Germany.
Objectives: To evaluate the potential of LLMs to generate sequence-level brain MRI protocols.
Materials And Methods: This retrospective study employed a dataset of 150 brain MRI cases derived from local imaging request forms. Reference protocols were established by two neuroradiologists.