TY - JOUR
T1 - CMAL: Cost-effective Multi-label Active Learning by Querying Subexamples
AU - Yu, Guoxian
AU - Chen, Xia
AU - Domeniconi, Carlotta
AU - Wang, Jun
AU - Li, Zhao
AU - Zhang, Zili
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2020
Y1 - 2020
N2 - Multi-label active learning (MAL) aims to learn an accurate multi-label classifier by selecting which examples (or example-label pairs) will be annotated while reducing query effort. MAL is more complicated than single-label active learning, since one example can be associated with a set of non-exclusive labels and the annotator has to scrutinize the whole example and label space to provide correct annotations. Instead of scrutinizing the whole example for annotation, we may just examine some of its subexamples with respect to a label. In this way, we can not only save annotation cost but also speed up the annotation process. Given that, we propose a two-stage Cost-effective MAL strategy (CMAL) that queries subexamples. CMAL first selects the most informative example-label pairs by leveraging uncertainty, label correlation and label space sparsity. Next, CMAL greedily queries the most probable positive subexample-label pairs of the selected example-label pairs. In addition, we propose rCMAL, which accounts for the representativeness of examples to select example-label pairs more reliably. Extensive experiments on multi-label datasets show that the proposed CMAL and rCMAL save more query cost than state-of-the-art MAL methods. The contribution of leveraging label correlation, label sparsity and representativeness to saving cost is also confirmed.
AB - Multi-label active learning (MAL) aims to learn an accurate multi-label classifier by selecting which examples (or example-label pairs) will be annotated while reducing query effort. MAL is more complicated than single-label active learning, since one example can be associated with a set of non-exclusive labels and the annotator has to scrutinize the whole example and label space to provide correct annotations. Instead of scrutinizing the whole example for annotation, we may just examine some of its subexamples with respect to a label. In this way, we can not only save annotation cost but also speed up the annotation process. Given that, we propose a two-stage Cost-effective MAL strategy (CMAL) that queries subexamples. CMAL first selects the most informative example-label pairs by leveraging uncertainty, label correlation and label space sparsity. Next, CMAL greedily queries the most probable positive subexample-label pairs of the selected example-label pairs. In addition, we propose rCMAL, which accounts for the representativeness of examples to select example-label pairs more reliably. Extensive experiments on multi-label datasets show that the proposed CMAL and rCMAL save more query cost than state-of-the-art MAL methods. The contribution of leveraging label correlation, label sparsity and representativeness to saving cost is also confirmed.
UR - http://hdl.handle.net/10754/663804
UR - https://ieeexplore.ieee.org/document/9122440/
U2 - 10.1109/TKDE.2020.3003899
DO - 10.1109/TKDE.2020.3003899
M3 - Article
SN - 2326-3865
SP - 1
EP - 1
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
ER -