Self-imitation guided goal-conditioned reinforcement learning

Yao Li, Yu Hui Wang, Xiao Yang Tan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Goal-conditioned reinforcement learning (GCRL) aims to train agents to reach desired goals, which is challenging because configurations vary from task to task. Current GCRL methods, however, are sample-inefficient and require large amounts of training data. Existing self-imitation-based GCRL approaches improve sample efficiency, but they scale poorly to large-scale tasks. In this paper, we propose integrating self-imitation learning and goal-conditioned RL into a single, compatible framework. Specifically, we introduce a novel target action value function that combines self-imitation learning and goal-conditioned reinforcement learning, so that both policy-training mechanisms contribute to solving the task. Moreover, we show theoretically that our approach can learn a policy superior to those learned by self-imitation learning or goal-conditioned reinforcement learning alone. Experimental results on various challenging robotic control tasks further demonstrate the stability and effectiveness of our method compared with existing approaches.
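
The abstract does not give the form of the new target action value. As an illustration only, one simple way to let self-imitation guide a goal-conditioned critic is to take the larger of the one-step TD target and the discounted return the agent actually achieved in the episode, so that successful past behavior can raise the target. The sketch below assumes a DDPG-style target critic supplies `next_q` (the value of the next state under the target policy, for the same, possibly hindsight-relabeled, goal) and that each episode ends in a terminal step. The function names and this max-based rule are hypothetical and are not the authors' definition.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Monte Carlo return G_t = sum_k gamma^k * r_{t+k} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Accumulate backwards from the final (assumed terminal) step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_target(rewards, next_q, dones, gamma):
    """Hypothetical self-imitation-style target for a goal-conditioned critic.

    rewards: per-step rewards computed for one (possibly relabeled) goal.
    next_q:  assumed target-critic values Q'(s_{t+1}, pi'(s_{t+1}, g), g).
    dones:   1.0 where the episode terminates, else 0.0.
    Returns max(TD target, observed discounted return) element-wise.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Standard one-step bootstrapped target; no bootstrap past a terminal step.
    td_target = rewards + gamma * (1.0 - dones) * next_q
    # Return actually achieved by the behavior policy in this episode.
    mc_return = discounted_returns(rewards, gamma)
    # Self-imitation: never let the target fall below what was already achieved.
    return np.maximum(td_target, mc_return)


print(combined_target([-1, -1, 0], [-5, -5, 0], [0, 0, 1], 0.5))
```

With these toy inputs the result is `[-1.5, -1.0, 0.0]`: the pessimistic critic estimates (a TD target of -3.5 at the first two steps) are replaced by the returns the agent actually observed. The usual motivation for such a rule is that the expected return of the behavior policy never exceeds the optimal action value; in stochastic environments a single sampled return can still overshoot, which is one reason a published method may use a more careful combination.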

Original language: English (US)
Article number: 109845
Journal: Pattern Recognition
Volume: 144
State: Published - Dec 2023

Keywords

  • Behavior cloning
  • Deterministic policy gradient
  • Goal-conditioned reinforcement learning
  • Self-imitation learning

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
