TY - GEN
T1 - Goal-Conditioned Generators of Deep Policies
AU - Herrmann, Vincent
AU - Ramesh, Aditya
AU - Kirsch, Louis
AU - Schmidhuber, Juergen
N1 - KAUST Repository Item: Exported on 2023-09-08
Acknowledgements: We thank Mirek Strupl, Dylan Ashley, Robert Csordás, Aleksandar Stanić and Anand Gopalakrishnan for their feedback. This work was supported by the ERC Advanced Grant (no: 742870), the Swiss National Science Foundation grant (200021 192356), and by the Swiss National Supercomputing Centre (CSCS, projects: s1090, s1154). We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award and IBM for donating a Minsky machine.
PY - 2023/6/26
Y1 - 2023/6/26
AB - Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance.
UR - http://hdl.handle.net/10754/694216
UR - https://ojs.aaai.org/index.php/AAAI/article/view/25912
U2 - 10.1609/aaai.v37i6.25912
DO - 10.1609/aaai.v37i6.25912
M3 - Conference contribution
SP - 7503
EP - 7511
BT - Proceedings of the AAAI Conference on Artificial Intelligence
PB - Association for the Advancement of Artificial Intelligence (AAAI)
ER -