TY - GEN
T1 - Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms
AU - Shen, Dinghan
AU - Wang, Guoyin
AU - Wang, Wenlin
AU - Min, Martin Renqiang
AU - Su, Qinliang
AU - Zhang, Yizhe
AU - Li, Chunyuan
AU - Henao, Ricardo
AU - Carin, Lawrence
N1 - Generated from Scopus record by KAUST IRTS on 2021-02-09
PY - 2018/1/1
Y1 - 2018/1/1
AB - Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, relative to word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
UR - http://aclweb.org/anthology/P18-1041
UR - http://www.scopus.com/inward/record.url?scp=85056881841&partnerID=8YFLogxK
U2 - 10.18653/v1/p18-1041
DO - 10.18653/v1/p18-1041
M3 - Conference contribution
SN - 9781948087322
SP - 440
EP - 450
BT - ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
PB - Association for Computational Linguistics (ACL)
ER -
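
For illustration, a minimal NumPy sketch of the parameter-free pooling operations summarized in the abstract; this is a hedged reconstruction, not code from the paper. Here emb is assumed to be a (sequence_length, embedding_dim) matrix of word embeddings, and the window size n in the hierarchical variant stands in for the paper's tunable n-gram window (5 is only a placeholder).

import numpy as np

def swem_aver(emb):
    # Average pooling: mean of the word embeddings over all time steps.
    return emb.mean(axis=0)

def swem_max(emb):
    # Max pooling: element-wise max over time steps
    # (the interpretability-oriented variant from the abstract).
    return emb.max(axis=0)

def swem_concat(emb):
    # Concatenation of the average- and max-pooled features.
    return np.concatenate([swem_aver(emb), swem_max(emb)])

def swem_hier(emb, n=5):
    # Hierarchical pooling: average within each length-n window (keeping
    # local n-gram order), then element-wise max over the windows.
    n = min(n, emb.shape[0])
    windows = np.stack([emb[i:i + n].mean(axis=0)
                        for i in range(emb.shape[0] - n + 1)])
    return windows.max(axis=0)

# Example: 12 tokens with hypothetical 300-dimensional embeddings.
emb = np.random.randn(12, 300)
print(swem_aver(emb).shape, swem_max(emb).shape,
      swem_concat(emb).shape, swem_hier(emb).shape)  # (300,) (300,) (600,) (300,)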