TY - GEN
T1 - Unsupervised Mitigation of Gender Bias by Character Components
T2 - 4th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2022
AU - Chen, Xiuying
AU - Li, Mingzhe
AU - Yan, Rui
AU - Gao, Xin
AU - Zhang, Xiangliang
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - Word embeddings learned from massive text collections have demonstrated significant levels of discriminative biases. However, debiasing for Chinese, one of the most widely spoken languages, has been less explored. Meanwhile, existing literature relies on manually created supplementary data, which is time- and energy-consuming. In this work, we propose the first Chinese Gender-neutral word Embedding model (CGE) based on Word2vec, which learns gender-neutral word embeddings without any labeled data. Concretely, CGE utilizes and emphasizes the rich feminine and masculine information contained in radicals, i.e., components of Chinese characters, during the training procedure. This consequently alleviates discriminative gender biases. Experimental results show that our unsupervised method outperforms state-of-the-art supervised debiased word embedding models without sacrificing the functionality of the embedding model.
AB - Word embeddings learned from massive text collections have demonstrated significant levels of discriminative biases. However, debiasing for Chinese, one of the most widely spoken languages, has been less explored. Meanwhile, existing literature relies on manually created supplementary data, which is time- and energy-consuming. In this work, we propose the first Chinese Gender-neutral word Embedding model (CGE) based on Word2vec, which learns gender-neutral word embeddings without any labeled data. Concretely, CGE utilizes and emphasizes the rich feminine and masculine information contained in radicals, i.e., components of Chinese characters, during the training procedure. This consequently alleviates discriminative gender biases. Experimental results show that our unsupervised method outperforms state-of-the-art supervised debiased word embedding models without sacrificing the functionality of the embedding model.
UR - http://www.scopus.com/inward/record.url?scp=85137567959&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85137567959
T3 - GeBNLP 2022 - 4th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
SP - 121
EP - 128
BT - GeBNLP 2022 - 4th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
A2 - Hardmeier, Christian
A2 - Basta, Christine
A2 - Costa-Jussa, Marta R.
A2 - Stanovsky, Gabriel
A2 - Gonen, Hila
PB - Association for Computational Linguistics (ACL)
Y2 - 15 July 2022
ER -