Large-scale models trained on extensive datasets have become the standard due to their strong generalizability across diverse tasks. In-context learning (ICL), widely used in natural language processing, leverages these models by providing task-specific prompts without modifying their parameters. This paradigm is increasingly being adapted for computer vision, where models receive an input-output image pair, known as an in-context pair, alongside a query image to illustrate the desired output. However, the success of visual ICL largely hinges on the quality of these prompts. To address this, we propose Enhanced Instruct Me More (E-InMeMo), a novel approach that incorporates learnable perturbations into in-context pairs to optimize prompting. Through extensive experiments on standard vision tasks, E-InMeMo demonstrates superior performance over existing state-of-the-art methods. Notably, it improves mIoU scores by 7.99 for foreground segmentation and by 17.04 for single object detection compared to the baseline without learnable prompts. These results highlight E-InMeMo as a lightweight yet effective strategy for enhancing visual ICL.
Full text:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12295390
DOI: http://dx.doi.org/10.3390/jimaging11070232
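To make the learnable-prompt idea in the E-InMeMo abstract above concrete, here is a minimal PyTorch sketch, not the authors' implementation: the frozen inpainting model is replaced by a stand-in convolution, the 2x2 canvas layout and image size are assumptions, and names such as `LearnablePromptPerturbation` and `compose_canvas` are hypothetical.

```python
# Minimal sketch of a learnable prompt perturbation for visual ICL,
# in the spirit of the E-InMeMo description above. The frozen model,
# shapes, and loss are placeholders, not the authors' code.
import torch
import torch.nn as nn

class LearnablePromptPerturbation(nn.Module):
    """Adds a trainable border perturbation to in-context images."""
    def __init__(self, size: int = 111, pad: int = 10):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(3, size, size))
        mask = torch.ones(1, size, size)
        mask[:, pad:-pad, pad:-pad] = 0.0  # leave the image interior untouched
        self.register_buffer("mask", mask)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return img + self.delta * self.mask

def compose_canvas(src, tgt, query, blank):
    """2x2 grid: in-context (input, output) on top, (query, prediction slot) below."""
    top = torch.cat([src, tgt], dim=-1)
    bottom = torch.cat([query, blank], dim=-1)
    return torch.cat([top, bottom], dim=-2)

perturb = LearnablePromptPerturbation()
frozen_model = nn.Conv2d(3, 3, 3, padding=1).requires_grad_(False)  # stand-in for a frozen inpainting model
opt = torch.optim.Adam(perturb.parameters(), lr=1e-3)

src, tgt = torch.rand(3, 111, 111), torch.rand(3, 111, 111)
query, label = torch.rand(3, 111, 111), torch.rand(3, 111, 111)

canvas = compose_canvas(perturb(src), perturb(tgt), query, torch.zeros_like(query))
pred = frozen_model(canvas.unsqueeze(0))[0][:, 111:, 111:]  # bottom-right quadrant
loss = nn.functional.mse_loss(pred, label)
loss.backward()
opt.step()  # only the perturbation parameters are updated; the model stays frozen
```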
Int J Surg
September 2025
Digestive Endoscopy Center, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China.
Background: Patients with T1 colorectal cancer (CRC) often show poor adherence to guideline-recommended treatment strategies after endoscopic resection. To address this challenge and improve clinical decision-making, this study aims to compare the accuracy of surgical management recommendations between large language models (LLMs) and clinicians.
Methods: This retrospective study enrolled 202 patients with T1 CRC who underwent endoscopic resection at three hospitals.
Front Digit Health
August 2025
Department of Ophthalmology, Stanford University, Palo Alto, CA, United States.
Introduction: Vision language models (VLMs) combine image analysis capabilities with large language models (LLMs). Because of their multimodal capabilities, VLMs offer a clinical advantage over image classification models for the diagnosis of optic disc swelling by allowing a consideration of clinical context. In this study, we compare the performance of non-specialty-trained VLMs with different prompts in the classification of optic disc swelling on fundus photographs.
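The prompt comparison described above can be sketched in a few lines; this is an illustrative Python outline only, where `query_vlm` is a placeholder for whichever vision-language API is used, and the prompt wordings are invented rather than taken from the study's protocol.

```python
# Hypothetical sketch of comparing prompt variants for VLM-based
# classification of optic disc swelling. `query_vlm` is a placeholder.
from typing import Callable

PROMPTS = {
    "zero_shot": "Does this fundus photograph show optic disc swelling? Answer yes or no.",
    "role": ("You are an experienced neuro-ophthalmologist. "
             "Does this fundus photograph show optic disc swelling? Answer yes or no."),
    "context": ("The patient reports headache and transient visual obscurations. "
                "Does this fundus photograph show optic disc swelling? Answer yes or no."),
}

def classify(image_path: str, query_vlm: Callable[[str, str], str]) -> dict:
    """Run every prompt variant against one image and collect the answers."""
    return {name: query_vlm(image_path, prompt) for name, prompt in PROMPTS.items()}
```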
Learn Behav
September 2025
Universidad Nacional Autónoma de México, Mexico, Mexico.
An experiment using a predictive learning task with college students evaluated the impact of a stimulus associated with extinction in an AAB renewal design. Four groups of participants learned a specific relationship between two cues (X and Y) and two outcomes (O1 and O2) in Context A during the first phase. Subsequently, both cues underwent extinction in the same Context A.
Biomedical named entity recognition (NER) is a high-utility natural language processing (NLP) task, and large language models (LLMs) show promise particularly in few-shot settings (i.e., limited training data).
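As a rough illustration of few-shot NER prompting, the sketch below builds an in-context prompt from a handful of labelled examples; the tag set, example sentences, and markup convention are invented for demonstration and are not taken from the article.

```python
# Illustrative few-shot prompt construction for biomedical NER with an LLM.
# Examples and the DRUG/DISEASE tag set are invented placeholders.
FEW_SHOT = [
    ("Aspirin reduced fever in the cohort.",
     "[Aspirin](DRUG) reduced [fever](DISEASE) in the cohort."),
    ("Patients with asthma received salbutamol.",
     "Patients with [asthma](DISEASE) received [salbutamol](DRUG)."),
]

def build_prompt(sentence: str) -> str:
    """Prepend a few labelled examples, then ask the model to tag a new sentence."""
    lines = ["Tag every DRUG and DISEASE mention using [mention](TYPE) markup."]
    for text, tagged in FEW_SHOT:
        lines.append(f"Input: {text}\nOutput: {tagged}")
    lines.append(f"Input: {sentence}\nOutput:")
    return "\n\n".join(lines)

print(build_prompt("Metformin is first-line therapy for type 2 diabetes."))
```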
Eur Radiol
September 2025
Institute of Diagnostic and Interventional Neuroradiology, TUM University Hospital, School of Medicine and Health, Technical University of Munich, Munich, Germany.
Objectives: To evaluate the potential of LLMs to generate sequence-level brain MRI protocols.
Materials And Methods: This retrospective study employed a dataset of 150 brain MRI cases derived from local imaging request forms. Reference protocols were established by two neuroradiologists.