TY - GEN
T1 - Flexible and Adaptable Summarization via Expertise Separation
AU - Chen, Xiuying
AU - Li, Mingzhe
AU - Gao, Shen
AU - Cheng, Xin
AU - Zhu, Qingqing
AU - Yan, Rui
AU - Gao, Xin
AU - Zhang, Xiangliang
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/7/10
Y1 - 2024/7/10
AB - A proficient summarization model should exhibit both flexibility (the capacity to handle a range of in-domain summarization tasks) and adaptability (the competence to acquire new knowledge and adjust to unseen out-of-domain tasks). Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests on the principle that the general summarization ability to capture salient information can be shared across different tasks, while the domain-specific summarization abilities need to be distinct and tailored. Concretely, we propose MoeSumm, a Mixture-of-Expert Summarization architecture, which utilizes a main expert for gaining the general summarization capability and deputy experts that selectively collaborate to meet specific summarization task requirements. We further propose a max-margin loss to stimulate the separation of these abilities. Our model's distinct separation of general and domain-specific summarization abilities grants it notable flexibility and adaptability, all while maintaining parameter efficiency. MoeSumm achieves flexibility by managing summarization across multiple domains with a single model, utilizing a shared main expert and selected deputy experts. It exhibits adaptability by tailoring deputy experts to cater to out-of-domain few-shot and zero-shot scenarios. Experimental results on 11 datasets show the superiority of our model compared with recent baselines and LLMs. We also provide statistical and visual evidence of the distinct separation of the two abilities in MoeSumm (https://github.com/iriscxy/MoE_Summ).
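N1 - Illustrative note: the abstract describes an architecture with a shared main expert, selectable deputy experts, and a max-margin separation loss. The snippet below is a minimal sketch of that idea, assuming the experts are feed-forward sublayers, that one deputy is selected by a learned gate, and that the margin term pushes the winning gate score above the runner-up; all names, shapes, and hyperparameters are assumptions for illustration, not the authors' implementation (see https://github.com/iriscxy/MoE_Summ for the released code).

    # Hypothetical sketch of a main-expert + deputy-experts block with a
    # max-margin gating term; details are assumed, not taken from the paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoESummBlock(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, num_deputies=4, margin=0.5):
            super().__init__()
            # Main expert: shared across all summarization tasks.
            self.main_expert = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            # Deputy experts: one per (assumed) domain/task cluster.
            self.deputy_experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(num_deputies)])
            self.gate = nn.Linear(d_model, num_deputies)
            self.margin = margin

        def forward(self, h):                      # h: (batch, seq, d_model)
            pooled = h.mean(dim=1)                 # crude sequence summary for gating
            scores = self.gate(pooled)             # (batch, num_deputies)
            chosen = scores.argmax(dim=-1)         # hard selection of one deputy

            # Run every deputy, then pick the chosen one's output per example.
            deputy_out = torch.stack(
                [expert(h) for expert in self.deputy_experts], dim=1)  # (B, E, S, D)
            idx = chosen.view(-1, 1, 1, 1).expand(-1, 1, h.size(1), h.size(2))
            selected = deputy_out.gather(1, idx).squeeze(1)            # (B, S, D)

            # Max-margin term: the winning deputy's score should exceed the
            # runner-up by at least `margin`, encouraging specialization.
            top2 = scores.topk(2, dim=-1).values
            margin_loss = F.relu(self.margin - (top2[:, 0] - top2[:, 1])).mean()

            out = self.main_expert(h) + selected   # general + domain-specific
            return out, margin_loss

    # Usage sketch: plug the block into a summarizer layer and add margin_loss
    # to the training objective.
    block = MoESummBlock()
    hidden = torch.randn(2, 16, 512)
    out, aux = block(hidden)
    print(out.shape, aux.item())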
KW - large language model
KW - mixture of experts
KW - text summarization
UR - http://www.scopus.com/inward/record.url?scp=85200560651&partnerID=8YFLogxK
U2 - 10.1145/3626772.3657789
DO - 10.1145/3626772.3657789
M3 - Conference contribution
AN - SCOPUS:85200560651
T3 - SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 2018
EP - 2027
BT - SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
Y2 - 14 July 2024 through 18 July 2024
ER -