TY - CONF
T1 - What Do I Annotate Next? An Empirical Study of Active Learning for Action Localization
AU - Heilbron, Fabian Caba
AU - Lee, Joon-Young
AU - Jin, Hailin
AU - Ghanem, Bernard
N1 - Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
AB - Despite tremendous progress achieved in temporal action localization, state-of-the-art methods still struggle to train accurate models when annotated data is scarce. In this paper, we introduce a novel active learning framework for temporal localization that aims to mitigate this data dependency issue. We equip our framework with active selection functions that can reuse knowledge from previously annotated datasets. We study the performance of two state-of-the-art active selection functions as well as two widely used active learning baselines. To validate the effectiveness of each one of these selection functions, we conduct simulated experiments on ActivityNet. We find that using previously acquired knowledge as a bootstrapping source is crucial for active learners aiming to localize actions. When equipped with the right selection function, our proposed framework exhibits significantly better performance than standard active learning strategies, such as uncertainty sampling. Finally, we employ our framework to augment the newly compiled Kinetics action dataset with ground-truth temporal annotations. As a result, we collect Kinetics-Localization, a novel large-scale dataset for temporal action localization, which contains more than 15K YouTube videos.
KW - Active learning
KW - Temporal action localization
KW - Video annotation
KW - Video understanding
UR - http://www.scopus.com/inward/record.url?scp=85055137221&partnerID=8YFLogxK
DO - 10.1007/978-3-030-01252-6_13
M3 - Conference contribution
AN - SCOPUS:85055137221
SN - 9783030012519
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 212
EP - 229
BT - Computer Vision – ECCV 2018 – 15th European Conference, 2018, Proceedings
A2 - Ferrari, Vittorio
A2 - Sminchisescu, Cristian
A2 - Weiss, Yair
A2 - Hebert, Martial
PB - Springer-Verlag
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -