An on-line algorithm for dynamic reinforcement learning and planning in reactive environments

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

47 Scopus citations

Abstract

An online learning algorithm for reinforcement learning with continually running recurrent networks in nonstationary reactive environments is described. Various kinds of reinforcement are considered as special types of input to an agent living in the environment. The agent's only goal is to maximize the amount of reinforcement received over time. Supervised learning techniques for recurrent networks serve to construct a differentiable model of the environmental dynamics which includes a model of future reinforcement. This model is used for learning goal-directed behavior in an online fashion. The possibility of using the system for planning future action sequences is investigated, and this approach is compared to approaches based on temporal difference methods. A connection to metalearning (learning how to learn) is noted.
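
The loop the abstract describes (learn a differentiable model of the environment by supervised learning, then improve behavior by differentiating predicted reinforcement through that model) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar environment `env_step`, the linear model with weights `w_x` and `w_a`, the linear controller with gain `k`, the normalized LMS update, the learning rates, and the choice of reinforcement as the negative squared next state. The paper works with continually running recurrent networks; this feedforward toy only mirrors the two interleaved online updates.

```python
import random


def env_step(x, a, rng):
    """Hypothetical reactive environment (unknown to the agent): x' = x + a.

    The state is re-drawn if it leaves (-2, 2) so the toy stays bounded.
    """
    x_next = x + a
    if abs(x_next) >= 2.0:
        x_next = rng.uniform(-1.0, 1.0)
        return x_next, True  # True marks a reset; the transition is not learnable
    return x_next, False


def run_online(steps=3000, model_lr=0.5, ctrl_lr=0.05, noise=0.5, seed=0):
    rng = random.Random(seed)
    w_x, w_a = 0.0, 0.0  # model: predicted next state = w_x * x + w_a * a
    k = 0.0              # controller: a = k * x
    x = rng.uniform(-1.0, 1.0)
    for _ in range(steps):
        # Act with exploration noise so the model sees varied (x, a) pairs.
        a = k * x + rng.gauss(0.0, noise)
        x_next, was_reset = env_step(x, a, rng)

        if not was_reset:
            # 1) Supervised model update (normalized LMS on the observed transition).
            err = (w_x * x + w_a * a) - x_next
            scale = model_lr / (1e-8 + x * x + a * a)
            w_x -= scale * err * x
            w_a -= scale * err * a

        # 2) Controller update through the model: predicted reinforcement is
        #    -(predicted next state)^2, so d/dk = -2 * pred * w_a * x.
        pred = w_x * x + w_a * (k * x)
        k += ctrl_lr * (-2.0 * pred * w_a * x)

        x = x_next
    return w_x, w_a, k


if __name__ == "__main__":
    print(run_online())
```

In this toy the model weights should approach the true dynamics (both close to 1) and the controller gain should approach -1, the action that cancels the current state. The controller never sees the gradient of the real environment, only that of the learned model, which is the point of the illustration.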
Original language: English (US)
Title of host publication: IJCNN. International Joint Conference on Neural Networks
Publisher: Publ by IEEE, Piscataway
Pages: 253-258
Number of pages: 6
State: Published - Jan 1 1990
Externally published: Yes
