Sequence generation with optimal-transport-enhanced reinforcement learning

Liqun Chen, Ke Bai, Chenyang Tao, Yizhe Zhang, Guoyin Wang, Wenlin Wang, Ricardo Henao, Lawrence Carin

Research output: Chapter in Book/Report/Conference proceedingConference contribution

7 Scopus citations


Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates the above issues. Our formulation emphasizes the preservation of semantic features, enabling end-to-end training instead of ad-hoc fine-tuning, and when combined with RL, it controls the exploration space for more efficient model updates. To validate the effectiveness of the proposed solution, we perform a comprehensive evaluation covering a wide variety of NLP tasks: machine translation, abstractive text summarization and image caption, with consistent improvements over competing solutions.
Original languageEnglish (US)
Title of host publicationAAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PublisherAAAI press
Number of pages9
ISBN (Print)9781577358350
StatePublished - Jan 1 2020
Externally publishedYes


Dive into the research topics of 'Sequence generation with optimal-transport-enhanced reinforcement learning'. Together they form a unique fingerprint.

Cite this