Improving Text Generation with Student-Forcing Optimal Transport

Guoyin Wang, Chunyuan Li, Jianqiao Li, Hao Fu, Yuh-Chen Lin, Liqun Chen, Yizhe Zhang, Chenyang Tao, Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang, Lawrence Carin

Research output: Contribution to journal › Article › peer-review

Abstract

Neural language models are often trained with maximum likelihood estimation (MLE), where the next word is generated conditioned on the ground-truth word tokens. During testing, however, the model is instead conditioned on previously generated tokens, resulting in what is termed exposure bias. To reduce this gap between training and testing, we propose using optimal transport (OT) to match the sequences generated in these two modes. An extension is further proposed to improve the OT learning, based on the structural and contextual information of the text sequences. The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
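
To make the idea of matching two sequences with OT concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the cost function or the OT solver, so this sketch assumes a cosine-distance cost between token embeddings, uniform weights over tokens, and an entropy-regularized Sinkhorn solver. The function names `cosine_cost` and `sinkhorn_ot` and the parameters `epsilon` and `n_iters` are hypothetical.

```python
import math


def cosine_cost(xs, ys):
    """Cost matrix C[i][j] = 1 - cosine similarity of xs[i] and ys[j].

    Assumption: cosine distance is used as the ground cost; the abstract
    does not state which cost the paper uses.
    """
    def norm(v):
        return math.sqrt(sum(a * a for a in v))

    cost = []
    for x in xs:
        row = []
        for y in ys:
            dot = sum(a * b for a, b in zip(x, y))
            row.append(1.0 - dot / (norm(x) * norm(y) + 1e-12))
        cost.append(row)
    return cost


def sinkhorn_ot(cost, epsilon=0.1, n_iters=200):
    """Approximate OT cost between two uniformly weighted sequences.

    Assumption: entropy-regularized Sinkhorn iterations stand in for
    whatever solver the paper actually uses.
    Returns (transport cost, transport plan).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform weight on each token of the first sequence
    b = [1.0 / m] * m  # uniform weight on each token of the second sequence
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan
```

In this sketch, `xs` would hold the token embeddings produced when the model is conditioned on ground-truth tokens and `ys` those produced when it is conditioned on its own previous outputs. The returned cost is small when the two sequences contain similar tokens and grows as they diverge, which is the kind of signal a training objective could penalize.
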
Original language: English (US)
Journal: arXiv preprint
State: Published - Oct 12 2020
Externally published: Yes

Keywords

  • cs.CL
  • cs.LG
