We study systems of multiple reinforcement learners. Each leads a single life lasting from birth to an unknown time of death. In between, it tries to accelerate reward intake. Its actions and learning algorithms consume part of its life, since computational resources are limited. The expected reward for a given behavior may change over time, partly because of other learners' actions and learning processes. For these reasons, previous approaches to multi-agent reinforcement learning are either limited or heuristic by nature. Using a simple backtracking method called the "success-story algorithm", however, each of our learners is able to establish success histories of behavior modifications at certain times called evaluation points: it simply undoes all previous modifications that were not empirically observed to trigger lifelong reward accelerations (computation time for learning and testing is taken into account). Then it continues to act and learn until the next evaluation point. Success histories can be enforced despite interference from other learners. The principle allows for plugging in a wide variety of learning algorithms. An experiment illustrates its feasibility. © Springer-Verlag 1997.
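The backtracking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a stack formulation in which each modification is stored with the lifetime and cumulative reward at which it was made, and a modification survives an evaluation point only if the reward per time since it was made exceeds the reward per time since the previous surviving modification (or since birth, for the oldest one). The names `Checkpoint`, `SuccessStoryLearner`, `record_modification`, and `evaluate` are hypothetical, and time is assumed to already include the cost of learning and testing.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Checkpoint:
    """Hypothetical record of one behavior modification."""
    time: float                # lifetime consumed when the modification was made
    reward: float              # cumulative reward collected up to that time
    undo: Callable[[], None]   # restores the behavior that preceded the modification


@dataclass
class SuccessStoryLearner:
    """Assumed stack-based sketch of success-story backtracking."""
    stack: List[Checkpoint] = field(default_factory=list)

    def record_modification(self, time: float, reward: float,
                            undo: Callable[[], None]) -> None:
        # Remember when the modification happened and how to reverse it.
        self.stack.append(Checkpoint(time, reward, undo))

    def evaluate(self, now: float, total_reward: float) -> int:
        """Evaluation point: undo modifications not followed by reward acceleration.

        Assumes `now` is strictly later than every recorded modification time.
        Returns the number of modifications that survive.
        """
        def rate_since(t: float, r: float) -> float:
            return (total_reward - r) / (now - t)

        while self.stack:
            top = self.stack[-1]
            if len(self.stack) > 1:
                prev = self.stack[-2]
                baseline = rate_since(prev.time, prev.reward)
            else:
                baseline = total_reward / now  # reward rate over the whole life
            if rate_since(top.time, top.reward) > baseline:
                break  # the most recent survivor is justified; stop backtracking
            self.stack.pop().undo()
        return len(self.stack)
```

For example, with modifications recorded at (time 10, reward 10) and (time 20, reward 30), an evaluation at time 30 with total reward 35 undoes the second modification, because the reward rate since it (0.5) is below the rate since the first (1.25), and keeps the first, because 1.25 exceeds the lifelong rate of 35/30.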
| Original language | English (US) |
| Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| Number of pages | 12 |
| State | Published - Dec 1 1997 |
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science(all)