Trans-dimensional MCMC for Bayesian policy learning

Matt Hoffman, Arnaud Doucet, Nando De Freitas, Ajay Jasra

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Scopus citations


A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with a single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulation methods in situations where the reward is a rare event.
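The idea of sampling from a distribution proportional to the reward can be illustrated with a deliberately simplified sketch. This is not the paper's reversible jump algorithm: it is a fixed-dimension random-walk Metropolis sampler over a single hypothetical policy parameter `theta`, and the standard normal prior, the Gaussian-shaped reward centered at 3.0, and all function names are assumptions made for illustration only. It contrasts the chain, which concentrates where the reward is large, with naive draws from the prior, which rarely land there.

```python
import math
import random


def log_prior(theta):
    # Assumed standard normal prior over the policy parameter (up to a constant).
    return -0.5 * theta * theta


def log_reward(theta, target=3.0, width=0.2):
    # Hypothetical reward: close to 1 near `target`, negligible elsewhere,
    # so high reward is a rare event under the prior.
    return -0.5 * ((theta - target) / width) ** 2


def metropolis_reward_sampler(n_steps, step=0.5, seed=0):
    """Random-walk Metropolis chain targeting prior(theta) * reward(theta)."""
    rng = random.Random(seed)
    theta = 0.0
    log_p = log_prior(theta) + log_reward(theta)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_prior(proposal) + log_reward(proposal)
        # Accept with probability min(1, target(proposal) / target(theta)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            theta, log_p = proposal, log_p_new
        samples.append(theta)
    return samples


def naive_hit_fraction(n_draws, threshold=0.01, seed=0):
    """Fraction of prior draws whose reward exceeds `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)
        if math.exp(log_reward(theta)) > threshold:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    chain = metropolis_reward_sampler(20000)
    kept = chain[2000:]  # discard burn-in
    print("chain mean:", sum(kept) / len(kept))
    print("naive hit fraction:", naive_hit_fraction(20000))
```

In this toy setting the target is a product of two Gaussians, so the chain should settle near 75/26 (about 2.88), while only a small fraction of prior draws receive any appreciable reward. The paper's method additionally moves across spaces of different dimension, which this sketch does not attempt.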
Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
State: Published - Dec 1 2009
Externally published: Yes

