Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits.

Neural Comput

Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, Lugano 6962, Switzerland.

Published: October 2022


Citations

20

Article Abstract

An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical context may introduce spurious relationships or lack a convenient representation of crucial information. In order to address these issues, we propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment. This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling. Our experiments on a diverse selection of contextual and noncontextual nonstationary problems show that our recurrent approach consistently outperforms its feedforward counterpart, which requires handcrafted historical contexts, while being more widely applicable than conventional nonstationary bandit algorithms. Although it is very difficult to provide theoretical performance guarantees for our new approach, we also prove a novel regret bound for linear posterior sampling with measurement error that may serve as a foundation for future theoretical work.
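The combination described in the abstract, recurrent features fed into a linear bandit that acts by posterior sampling, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian prior with known noise variance for each arm's linear model, and it replaces the learned recurrent network with an untrained, randomly initialised recurrent map. All names here (`LinearPosteriorSampler`, `FixedRecurrentEncoder`, `run_toy`) and the toy two-armed environment are hypothetical.

```python
import numpy as np


class LinearPosteriorSampler:
    """Per-arm Bayesian linear regression (Gaussian prior, known noise variance: assumptions)."""

    def __init__(self, n_arms, dim, prior_precision=1.0, noise_var=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_var = noise_var
        # Posterior precision matrix and precision-weighted mean for each arm.
        self.precision = np.stack([prior_precision * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def select(self, x):
        """Sample one weight vector per arm from its posterior; play the best sampled arm."""
        scores = []
        for A, b in zip(self.precision, self.b):
            cov = np.linalg.inv(A)
            cov = (cov + cov.T) / 2.0  # symmetrise against round-off
            theta = self.rng.multivariate_normal(cov @ b, cov)
            scores.append(float(x @ theta))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Conjugate update of the chosen arm's posterior."""
        self.precision[arm] += np.outer(x, x) / self.noise_var
        self.b[arm] += reward * x / self.noise_var


class FixedRecurrentEncoder:
    """Untrained random recurrent map standing in for a learned RNN (assumption)."""

    def __init__(self, n_arms, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n_arms = n_arms
        self.W = rng.normal(scale=0.5, size=(hidden, hidden))
        self.U = rng.normal(scale=0.5, size=(hidden, n_arms + 1))
        self.h = np.zeros(hidden)

    def step(self, arm, reward):
        """Fold the latest (action, reward) pair into the hidden state."""
        inp = np.zeros(self.n_arms + 1)
        inp[arm] = 1.0
        inp[-1] = reward
        self.h = np.tanh(self.W @ self.h + self.U @ inp)
        return self.h


def run_toy(steps=200, period=20, seed=0):
    """Toy noncontextual problem: the rewarding arm switches every `period` steps."""
    rng = np.random.default_rng(seed)
    n_arms, hidden = 2, 8
    encoder = FixedRecurrentEncoder(n_arms, hidden=hidden, seed=seed)
    sampler = LinearPosteriorSampler(n_arms, dim=hidden + 1, seed=seed)
    x = np.append(encoder.h, 1.0)  # hidden state plus a bias feature
    actions, rewards = [], []
    for t in range(steps):
        arm = sampler.select(x)
        best = (t // period) % n_arms
        reward = float(rng.normal(loc=1.0 if arm == best else 0.0, scale=0.1))
        sampler.update(arm, x, reward)
        x = np.append(encoder.step(arm, reward), 1.0)
        actions.append(arm)
        rewards.append(reward)
    return actions, rewards


if __name__ == "__main__":
    actions, rewards = run_toy()
    print("mean reward:", sum(rewards) / len(rewards))
```

The sketch keeps the two roles separate: the encoder summarises the raw interaction history, and the sampler only ever sees the resulting feature vector. In the approach the abstract describes, the recurrent network is trained, so the features themselves adapt to the problem; the fixed encoder above only shows where such features would enter.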

Source
http://dx.doi.org/10.1162/neco_a_01539
