Exploring through Random Curiosity with General Value Functions

Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Juergen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Exploration in reinforcement learning through intrinsic rewards has previously been addressed by approaches based on state novelty or artificial curiosity. In partially observable settings where observations look alike, state novelty can cause the intrinsic reward to vanish prematurely. Curiosity-based approaches, on the other hand, require modeling precise environment dynamics, which are potentially quite complex. Here we propose random curiosity with general value functions (RC-GVF), an intrinsic reward function that connects state novelty and artificial curiosity. Instead of predicting the entire environment dynamics, RC-GVF predicts temporally extended values through general value functions (GVFs) and uses the prediction error as an intrinsic reward. In this way, RC-GVF generalizes a popular method called random network distillation (RND) by encouraging behavioral diversity, and it reduces the need for additional maximum entropy regularization. Our experiments on four procedurally generated partially observable environments indicate that our approach is competitive with RND and could be beneficial in environments that require behavioral exploration.
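The mechanism summarized in the abstract can be illustrated with a minimal sketch. A fixed random network defines cumulants (pseudo-rewards), a GVF predictor learns their discounted sums, and the prediction error becomes the intrinsic reward. This is not the authors' implementation. The class names `RandomCumulant` and `GVFPredictor` are hypothetical, and linear function approximation, one-hot observations, TD(0) updates and the use of the squared TD error as the prediction error are all assumptions made for illustration; the paper targets partially observable environments, which this toy ignores. Setting `gamma=0` reduces the target to random features of the current observation, which is one way to read the statement that the approach generalizes RND.

```python
import random


class RandomCumulant:
    """Hypothetical fixed random linear map: observation -> k cumulants.

    The weights are drawn once and never trained, playing the role of a
    random target network.
    """

    def __init__(self, obs_dim, k, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(k)]

    def __call__(self, obs):
        return [sum(wi * oi for wi, oi in zip(row, obs)) for row in self.w]


class GVFPredictor:
    """Hypothetical linear GVF predictor: one value per cumulant, trained with TD(0)."""

    def __init__(self, obs_dim, k, gamma, lr):
        self.w = [[0.0] * obs_dim for _ in range(k)]
        self.gamma = gamma
        self.lr = lr

    def predict(self, obs):
        return [sum(wi * oi for wi, oi in zip(row, obs)) for row in self.w]

    def update(self, obs, cumulants, next_obs):
        """Take one TD(0) step and return the mean squared TD error.

        The squared TD error stands in for the GVF prediction error and is
        used as the intrinsic reward (an assumption of this sketch).
        """
        v = self.predict(obs)
        v_next = self.predict(next_obs)
        deltas = []
        for j, row in enumerate(self.w):
            # GVF target: cumulant of the current observation plus the discounted next value.
            delta = cumulants[j] + self.gamma * v_next[j] - v[j]
            deltas.append(delta)
            for i, oi in enumerate(obs):
                row[i] += self.lr * delta * oi
        return sum(d * d for d in deltas) / len(deltas)


def run_demo(gamma, steps=2000):
    """Replay a toy two-state cycle s0 -> s1 -> s0 and record the intrinsic reward."""
    phi = RandomCumulant(obs_dim=2, k=4, seed=0)
    gvf = GVFPredictor(obs_dim=2, k=4, gamma=gamma, lr=0.1)
    states = [[1.0, 0.0], [0.0, 1.0]]  # one-hot observations
    rewards = []
    for t in range(steps):
        obs, next_obs = states[t % 2], states[(t + 1) % 2]
        rewards.append(gvf.update(obs, phi(obs), next_obs))
    return rewards


rewards = run_demo(gamma=0.9)
print(f"first: {rewards[0]:.4f}, last: {rewards[-1]:.2e}")
```

When the same transitions are replayed, the reward decays toward zero as the predictor learns, which is the novelty-fading behavior an intrinsic reward of this kind relies on. With `gamma=0`, the first reward on a fresh predictor equals the mean squared cumulant, since the predictor is then simply regressing onto random features of the current observation.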
Original language: English (US)
Title of host publication: 35th Deep RL workshop, Conference on Neural Information Processing Systems (NeurIPS 2021)
Publisher: arXiv
State: Published - 2022
