Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning.

Tung M Luu, Thang Vu, Thanh Nguyen, Chang D Yoo

Sensors (Basel)

School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea.

Published: August 2022

To overcome the limitations of reward-driven representation learning in vision-based reinforcement learning (RL), we propose an unsupervised learning framework, visual pretraining via contrastive predictive model (VPCPM), which learns representations decoupled from policy learning. The method lets the convolutional encoder capture the underlying dynamics through a pair of forward and inverse models supervised by a contrastive loss, which yields better representations. In experiments on a diverse set of vision-based control tasks, initializing the encoders with VPCPM significantly boosts the performance of state-of-the-art vision-based RL algorithms, with 44% and 10% improvement for RAD and DrQ at 100 steps, respectively. Compared with prior unsupervised methods, VPCPM matches or outperforms all baselines. We further demonstrate that the learned representations generalize to new tasks that share a similar observation and action space.
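
As a rough illustration of the idea in the abstract, the sketch below computes an InfoNCE-style contrastive loss between a forward model's predicted next latent and the encoded next observation. It is not the authors' implementation: the random arrays standing in for encoder outputs, the linear forward model `W_f`, the temperature value, and all dimensions are assumptions made for this example, and the inverse model is omitted.

```python
import numpy as np

def info_nce_loss(pred_next, true_next, temperature=0.1):
    """InfoNCE loss: row i of pred_next should match row i of true_next."""
    # L2-normalise so the dot product becomes a cosine similarity.
    p = pred_next / np.linalg.norm(pred_next, axis=1, keepdims=True)
    t = true_next / np.linalg.norm(true_next, axis=1, keepdims=True)
    logits = p @ t.T / temperature                       # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; other batch entries act as negatives.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
batch, latent_dim, action_dim = 8, 16, 4

# Stand-ins for encoder outputs and actions (assumed, not from the paper).
z_t = rng.normal(size=(batch, latent_dim))     # encoder(o_t)
a_t = rng.normal(size=(batch, action_dim))     # action taken at step t
z_next = rng.normal(size=(batch, latent_dim))  # encoder(o_{t+1})

# Hypothetical linear forward model: predicts the next latent from (z_t, a_t).
W_f = rng.normal(scale=0.1, size=(latent_dim + action_dim, latent_dim))
z_pred = np.concatenate([z_t, a_t], axis=1) @ W_f

loss_random = info_nce_loss(z_pred, z_next)    # untrained prediction
loss_perfect = info_nce_loss(z_next, z_next)   # ideal prediction
print(loss_random, loss_perfect)
```

In a real pretraining loop the encoder and forward model would be trained jointly to minimise this loss, so that the loss for the untrained prediction moves toward the value obtained for the ideal prediction.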

Full text:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9460564
DOI: http://dx.doi.org/10.3390/s22176504

Similar Publications

Deep-learning based morphological segmentation of canine diffuse large B-cell lymphoma.

Front Vet Sci

August 2025

Pathobiology and Population Science, Royal Veterinary College, Hatfield, United Kingdom.

Kenneth Ancheta, Androniki Psifidi, Andrew D Yale, Sophie Le Calvez, Jonathan Williams

Diffuse large B-cell lymphoma is the most common type of non-Hodgkin lymphoma (NHL) in humans, accounting for about 30-40% of NHL cases worldwide. Canine diffuse large B-cell lymphoma (cDLBCL) is the most common lymphoma subtype in dogs and demonstrates an aggressive biologic behaviour. For tissue biopsies, current confirmatory diagnostic approaches for enlarged lymph nodes rely on expert histopathological assessment, which is time-consuming and requires specialist expertise.

Use of artificial intelligence for classification of fractures around the elbow in adults according to the 2018 AO/OTA classification system.

BMC Musculoskelet Disord

September 2025

Department of Clinical Sciences at Danderyds Hospital, Department of Orthopedic Surgery, Karolinska Institutet, Stockholm, 182 88, Sweden.

Annelie Pettersson, Michael Axenhus, Teo Stukan, Oscar Ljungberg, Hans Nåsell

Background: This study evaluates the accuracy of an Artificial Intelligence (AI) system, specifically a convolutional neural network (CNN), in classifying elbow fractures using the detailed 2018 AO/OTA fracture classification system.

Methods: A retrospective analysis of 5,367 radiograph exams visualizing the elbow from adult patients (2002-2016) was conducted using a deep neural network. Radiographs were manually categorized according to the 2018 AO/OTA system by orthopedic surgeons.

Temporal Modeling With Frozen Vision-Language Foundation Models for Parameter-Efficient Text-Video Retrieval.

IEEE Trans Neural Netw Learn Syst

September 2025

Leqi Shen, Tianxiang Hao, Tao He, Yifeng Zhang, Pengzhang Liu

Temporal modeling plays an important role in the effective adaption of the powerful pretrained text-image foundation model into text-video retrieval. However, existing methods often rely on additional heavy trainable modules, such as transformer or BiLSTM, which are inefficient. In contrast, we avoid introducing such heavy components by leveraging frozen foundation models.

Scvi-hub: an actionable repository for model-driven single-cell analysis.

Nat Methods

September 2025

Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.

Can Ergen, Valeh Valiollah Pour Amiri, Martin Kim, Ori Kronfeld, Aaron Streets

The growing availability of single-cell omics datasets presents new opportunities for reuse, while challenges in data transfer, normalization and integration remain a barrier. Here we present scvi-hub: a platform for efficiently sharing and accessing single-cell omics datasets using pretrained probabilistic models. It enables immediate execution of fundamental tasks like visualization, imputation, annotation and deconvolution on new query datasets using state-of-the-art methods, with massively reduced storage and compute requirements.

Deep learning-based embedding of functional connectivity profiles for precision functional mapping.

Imaging Neurosci (Camb)

September 2025

Mallinckrodt Institute of Radiology, Washington University in St. Louis, St. Louis, MO, United States.

Jiaxin Cindy Tu, Jung-Hoon Kim, Chenyan Lu, Patrick H Luckett, Babatunde Adeyemo

Spatial similarity of functional connectivity profiles across matching anatomical locations in individuals is often calculated to delineate individual differences in functional networks. Likewise, spatial similarity is assessed across average functional connectivity profiles of groups to evaluate the maturity of functional networks during development. Despite its widespread use, spatial similarity is limited to comparing two samples at a time.
