TY - JOUR
T1 - Syntactic Knowledge-Infused Transformer and BERT Models
AU - Sundararaman, Dhanasekar
AU - Subramanian, Vivek
AU - Wang, Guoyin
AU - Si, Shijing
AU - Shen, Dinghan
AU - Wang, Dong
AU - Carin, Lawrence
N1 - Publisher Copyright:
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
PY - 2021
Y1 - 2021
N2 - Attention-based deep learning models have demonstrated significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens input to an encoder based on their relationships to all tokens in a sequence. While recent studies have shown that such models are capable of learning syntactic features purely by seeing examples, we hypothesize that explicitly feeding this information to deep learning models can significantly enhance their performance in many cases. Leveraging syntactic information such as part of speech (POS) may be particularly beneficial in limited-training-data settings for complex models such as the Transformer. In this paper, we verify this hypothesis by infusing syntactic knowledge into the Transformer. We find that this syntax-infused Transformer achieves an improvement of 0.7 BLEU when trained on the full WMT'14 English-to-German translation dataset and a maximum improvement of 1.99 BLEU points when trained on a fraction of the dataset. In addition, we find that incorporating syntax into BERT fine-tuning outperforms BERT-Base on all downstream tasks from the GLUE benchmark, including an improvement of 0.8% on CoLA.
AB - Attention-based deep learning models have demonstrated significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens input to an encoder based on their relationships to all tokens in a sequence. While recent studies have shown that such models are capable of learning syntactic features purely by seeing examples, we hypothesize that explicitly feeding this information to deep learning models can significantly enhance their performance in many cases. Leveraging syntactic information such as part of speech (POS) may be particularly beneficial in limited-training-data settings for complex models such as the Transformer. In this paper, we verify this hypothesis by infusing syntactic knowledge into the Transformer. We find that this syntax-infused Transformer achieves an improvement of 0.7 BLEU when trained on the full WMT'14 English-to-German translation dataset and a maximum improvement of 1.99 BLEU points when trained on a fraction of the dataset. In addition, we find that incorporating syntax into BERT fine-tuning outperforms BERT-Base on all downstream tasks from the GLUE benchmark, including an improvement of 0.8% on CoLA.
KW - Semantics
KW - Syntax
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85122854973&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85122854973
SN - 1613-0073
VL - 3052
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2021 International Conference on Information and Knowledge Management Workshops, CIKMW 2021
Y2 - 1 November 2021 through 5 November 2021
ER -