Abstract
A novel curious model-building control system is described that actively tries to provoke situations in which it has learned to expect to learn something about the environment. The system has been implemented as a four-network system based on Watkins' Q-learning algorithm, which can be used to maximize the expectation of the temporal derivative of the adaptive assumed reliability of future predictions. An experiment with an artificial nondeterministic environment demonstrates that the system can be superior to previous model-building control systems, which do not model the reliability of the world model's predictions in uncertain environments and which use ad hoc methods (such as random search) to train the world model.
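The idea in the abstract can be illustrated with a minimal tabular sketch. It is not the paper's four-network implementation: the lookup tables, the function name `curiosity_q_learning`, the `step` callback, and all hyperparameters are assumptions made for illustration. The sketch keeps a world model, a running estimate of how reliable that model's predictions are, and a Q-table whose reward is the change in that reliability estimate from one update to the next.

```python
import random
from collections import defaultdict


def curiosity_q_learning(step, n_actions, n_steps=2000, alpha=0.1,
                         gamma=0.9, epsilon=0.1, beta=0.1, seed=0):
    """Tabular stand-in (an assumption) for a curious model-building controller.

    `step(state, action, rng)` is a hypothetical environment callback that
    returns the next state.
    """
    rng = random.Random(seed)
    model = {}                        # world model: (s, a) -> last observed next state
    reliability = defaultdict(float)  # assumed reliability of the prediction at (s, a)
    q = defaultdict(float)            # Q-values over the curiosity reward
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy action selection on the curiosity Q-values.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: q[(s, b)])
        s_next = step(s, a, rng)

        # Score the world model's prediction, then move the reliability
        # estimate a small step toward the observed outcome.
        correct = 1.0 if model.get((s, a)) == s_next else 0.0
        old = reliability[(s, a)]
        reliability[(s, a)] = old + beta * (correct - old)

        # Curiosity reward: change in assumed reliability. It is positive
        # while the model is improving and near zero once a transition is
        # either fully learned or unpredictable.
        reward = reliability[(s, a)] - old
        model[(s, a)] = s_next

        # Standard one-step Q-learning update (Watkins).
        best_next = max(q[(s_next, b)] for b in range(n_actions))
        q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
        s = s_next
    return q, reliability
```

Under these assumptions, a transition that is already well predicted or that is purely random yields little expected reward, so the controller is drawn toward regions where the model can still improve, rather than relying on random search alone.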
Original language | English (US) |
---|---|
Title of host publication | 1991 IEEE International Joint Conference on Neural Networks - IJCNN '91 |
Publisher | Publ by IEEE, Piscataway |
Pages | 1458-1463 |
Number of pages | 6 |
ISBN (Print) | 0780302273 |
State | Published - Jan 1 1991 |
Externally published | Yes |