Hindsight policy gradients

Paulo Rauber, Filipe Mutz, Avinash Ummadisingu, Jürgen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

22 Scopus citations

Abstract

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enable sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
Original language: English (US)
Title of host publication: 7th International Conference on Learning Representations, ICLR 2019
Publisher: International Conference on Learning Representations, ICLR
State: Published - Jan 1 2019
Externally published: Yes
