This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for different subgoals. Subgoals are represented as desired abstract observations that cluster the raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space, but at a fine-grained level. An experiment shows that this method outperforms several flat reinforcement learning methods. A second experiment shows how problems of partial observability caused by the observation abstraction can be overcome using high-level policies with memory.
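The architecture the abstract describes — a coarse high-level policy that selects subgoals defined over clustered abstract observations, and per-subgoal low-level policies trained on an intrinsic reward for reaching their subgoal — can be sketched with tabular SMDP-style Q-learning. Everything below is an illustrative assumption, not the paper's actual algorithm: the 1-D chain environment, the fixed hand-coded partition standing in for the learned clustering, and all constants are made up for the sketch.

```python
import random

random.seed(0)

# Toy 1-D chain world (illustrative, not from the paper): states 0..11,
# actions move right/left, reward 1 for reaching the rightmost state.
N_STATES = 12
GOAL = N_STATES - 1
ACTIONS = (1, -1)            # right, left
N_CLUSTERS = 4               # number of abstract observations (assumed)

def cluster(s):
    # Stand-in for the paper's learned clustering of raw inputs:
    # a fixed partition of the chain, with the goal as its own cluster.
    return N_CLUSTERS - 1 if s == GOAL else s // 4

def env_step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

ALPHA, GAMMA, EPS = 0.2, 0.95, 0.25

# Coarse high-level Q over clusters; fine-grained low-level Q per subgoal,
# each low-level table effectively covering only part of the state space.
q_high = [[0.0] * N_CLUSTERS for _ in range(N_CLUSTERS)]
q_low = [[[0.0] * len(ACTIONS) for _ in range(N_STATES)]
         for _ in range(N_CLUSTERS)]

def eps_greedy(values, indices, eps):
    if random.random() < eps:
        return random.choice(indices)
    return max(indices, key=lambda i: values[i])

def run_option(s, g, eps):
    """Follow the low-level policy for subgoal cluster g until it is
    reached, the episode ends, or a step cap; return (s, ret, steps, done)."""
    ret, steps, done = 0.0, 0, False
    while not done and cluster(s) != g and steps < 30:
        a = eps_greedy(q_low[g][s], range(len(ACTIONS)), eps)
        s2, r, done = env_step(s, ACTIONS[a])
        reached = cluster(s2) == g
        # Intrinsic reward 1 for entering the subgoal cluster.
        target = (1.0 if reached else 0.0) + \
                 (0.0 if (reached or done) else GAMMA * max(q_low[g][s2]))
        q_low[g][s][a] += ALPHA * (target - q_low[g][s][a])
        ret += (GAMMA ** steps) * r
        s, steps = s2, steps + 1
    return s, ret, steps, done

for _ in range(800):
    s, done, total = 0, False, 0
    while not done and total < 400:
        c = cluster(s)
        # High level picks a subgoal (a different abstract observation).
        goals = [g for g in range(N_CLUSTERS) if g != c]
        g = eps_greedy(q_high[c], goals, EPS)
        s, ret, steps, done = run_option(s, g, EPS)
        # SMDP Q-learning update with the option's discounted return.
        boot = 0.0 if done else (GAMMA ** steps) * max(q_high[cluster(s)])
        q_high[c][g] += ALPHA * (ret + boot - q_high[c][g])
        total += steps

# Greedy evaluation of the learned two-level policy.
s, done, total = 0, False, 0
while not done and total < 300:
    c = cluster(s)
    g = eps_greedy(q_high[c], [g for g in range(N_CLUSTERS) if g != c], 0.0)
    s, _, steps, done = run_option(s, g, 0.0)
    total += steps
print("reached goal:", done, "in", total, "primitive steps")
```

Note that the high-level agent here sees only the current cluster, not the underlying state — a deliberate echo of the partial observability the abstract mentions; the paper addresses it with high-level policies with memory, which this memoryless sketch omits.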
Original language: English (US)
Title of host publication: Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence
Number of pages: 6
State: Published - Dec 1 2004