TY - GEN
T1 - Scientific Paper Extractive Summarization Enhanced by Citation Graphs
AU - Chen, Xiuying
AU - Li, Mingzhe
AU - Gao, Shen
AU - Yan, Rui
AU - Gao, Xin
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2023-09-20
Acknowledged KAUST grant number(s): BAS/1/1635-01-01, FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4663-01-01
Acknowledgements: We would like to thank the anonymous reviewers for their constructive comments. This work was supported by the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI). This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award Nos. FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4663-01-01, and BAS/1/1635-01-01.
PY - 2022
Y1 - 2022
N2 - In a citation graph, adjacent paper nodes share related scientific terms and topics. The graph thus conveys unique structural information about document-level relatedness that can be used in paper summarization to look beyond intra-document information. In this work, we focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings. We first propose a Multi-granularity Unsupervised Summarization model (MUS) as a simple and low-cost solution to the task. MUS fine-tunes a pre-trained encoder model on the citation graph through link prediction tasks. Then, the abstract sentences are extracted from the corresponding paper using multi-granularity information. Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework. Motivated by this, we next propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available. Apart from employing link prediction as an auxiliary task, GSS introduces a gated sentence encoder and a graph information fusion module that use the graph information to refine the sentence representation. Experiments on a public benchmark dataset show that MUS and GSS bring substantial improvements over the prior state-of-the-art model.
AB - In a citation graph, adjacent paper nodes share related scientific terms and topics. The graph thus conveys unique structural information about document-level relatedness that can be used in paper summarization to look beyond intra-document information. In this work, we focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings. We first propose a Multi-granularity Unsupervised Summarization model (MUS) as a simple and low-cost solution to the task. MUS fine-tunes a pre-trained encoder model on the citation graph through link prediction tasks. Then, the abstract sentences are extracted from the corresponding paper using multi-granularity information. Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework. Motivated by this, we next propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available. Apart from employing link prediction as an auxiliary task, GSS introduces a gated sentence encoder and a graph information fusion module that use the graph information to refine the sentence representation. Experiments on a public benchmark dataset show that MUS and GSS bring substantial improvements over the prior state-of-the-art model.
UR - http://hdl.handle.net/10754/686472
UR - https://aclanthology.org/2022.emnlp-main.270
UR - http://www.scopus.com/inward/record.url?scp=85149435072&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.emnlp-main.270
DO - 10.18653/v1/2022.emnlp-main.270
M3 - Conference contribution
SP - 4053
EP - 4062
BT - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
PB - Association for Computational Linguistics (ACL)
ER -