TY - GEN
T1 - A Topic-aware Summarization Framework with Different Modal Side Information
AU - Chen, Xiuying
AU - Li, Mingzhe
AU - Gao, Shen
AU - Cheng, Xin
AU - Qiang, Yang
AU - Zhang, Qishen
AU - Gao, Xin
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2023-07-21
Acknowledged KAUST grant number(s): FCC/1/1976-44-01, FCC/1/1976-45-01, REI/1/5234-01-01, RGC/3/4816-01-01
Acknowledgements: We would like to thank the anonymous reviewers for their constructive comments. The work was supported by King Abdullah University of Science and Technology (KAUST) through grant awards FCC/1/1976-44-01, FCC/1/1976-45-01, REI/1/5234-01-01, and RGC/3/4816-01-01.
PY - 2023/7/19
Y1 - 2023/7/19
N2 - Automatic summarization plays an important role in the exponential document growth on the Web. On content websites such as CNN.com and WikiHow.com, there often exist various kinds of side information along with the main document for attention attraction and easier understanding, such as videos, images, and queries. Such information can be used for better summarization, as they often explicitly or implicitly mention the essence of the article. However, most of the existing side-aware summarization methods are designed to incorporate either single-modal or multi-modal side information, and cannot effectively adapt to each other. In this paper, we propose a general summarization framework, which can flexibly incorporate various modalities of side information. The main challenges in designing a flexible summarization model with side information include: (1) the side information can be in textual or visual format, and the model needs to align and unify it with the document into the same semantic space, (2) the side inputs can contain information from various aspects, and the model should recognize the aspects useful for summarization. To address these two challenges, we first propose a unified topic encoder, which jointly discovers latent topics from the document and various kinds of side information. The learned topics flexibly bridge and guide the information flow between multiple inputs in a graph encoder through a topic-aware interaction. We secondly propose a triplet contrastive learning mechanism to align the single-modal or multi-modal information into a unified semantic space, where the summary quality is enhanced by better understanding the document and side information. Results show that our model significantly surpasses strong baselines on three public single-modal or multi-modal benchmark summarization datasets.
AB - Automatic summarization plays an important role in the exponential document growth on the Web. On content websites such as CNN.com and WikiHow.com, there often exist various kinds of side information along with the main document for attention attraction and easier understanding, such as videos, images, and queries. Such information can be used for better summarization, as they often explicitly or implicitly mention the essence of the article. However, most of the existing side-aware summarization methods are designed to incorporate either single-modal or multi-modal side information, and cannot effectively adapt to each other. In this paper, we propose a general summarization framework, which can flexibly incorporate various modalities of side information. The main challenges in designing a flexible summarization model with side information include: (1) the side information can be in textual or visual format, and the model needs to align and unify it with the document into the same semantic space, (2) the side inputs can contain information from various aspects, and the model should recognize the aspects useful for summarization. To address these two challenges, we first propose a unified topic encoder, which jointly discovers latent topics from the document and various kinds of side information. The learned topics flexibly bridge and guide the information flow between multiple inputs in a graph encoder through a topic-aware interaction. We secondly propose a triplet contrastive learning mechanism to align the single-modal or multi-modal information into a unified semantic space, where the summary quality is enhanced by better understanding the document and side information. Results show that our model significantly surpasses strong baselines on three public single-modal or multi-modal benchmark summarization datasets.
UR - http://hdl.handle.net/10754/693142
UR - https://dl.acm.org/doi/10.1145/3539618.3591630
U2 - 10.1145/3539618.3591630
DO - 10.1145/3539618.3591630
M3 - Conference contribution
BT - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - ACM
ER -