TY - GEN
T1 - Federated Domain Adaptation for Named Entity Recognition via Distilling with Heterogeneous Tag Sets
AU - Wang, Rui
AU - Yu, Tong
AU - Wu, Junda
AU - Zhao, Handong
AU - Kim, Sungchul
AU - Zhang, Ruiyi
AU - Mitra, Subrata
AU - Henao, Ricardo
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Federated learning enables collaborative training on private data from multiple platforms without violating data privacy. We study the problem of federated domain adaptation for Named Entity Recognition (NER), where we seek to transfer knowledge across platforms whose data span multiple domains. In addition, we consider a practical and challenging scenario in which the NER datasets of the different platforms are annotated with heterogeneous tag sets, i.e., different sets of entity types. The goal is to train a global model with federated learning that can predict over the complete tag set, i.e., all entity types occurring in the data across all platforms. To cope with heterogeneous tag sets in this multi-domain setting, we propose a distillation approach together with an instance-weighting mechanism to facilitate knowledge transfer across platforms. We also release two re-annotated clinical NER datasets for evaluating the proposed method in the clinical domain. Our method shows superior empirical performance on clinical NER with federated learning.
UR - http://www.scopus.com/inward/record.url?scp=85175489392&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85175489392
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 7449
EP - 7463
BT - Findings of the Association for Computational Linguistics: ACL 2023
PB - Association for Computational Linguistics (ACL)
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -