TY - GEN
T1 - PIVOT: Prompting for Video Continual Learning
T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
AU - Villa, Andrés
AU - Alcázar, Juan León
AU - Alfarra, Motasem
AU - Alhamoud, Kumail
AU - Hurtado, Julio
AU - Heilbron, Fabian Caba
AU - Soto, Alvaro
AU - Ghanem, Bernard
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Modern machine learning pipelines are limited by data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to train and update large-scale models on such dynamic annotated sets. Continual learning directly approaches this problem, with the ultimate goal of devising methods where a deep neural network effectively learns relevant patterns for new (unseen) classes without significantly altering its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages the extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT outperforms state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
AB - Modern machine learning pipelines are limited by data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to train and update large-scale models on such dynamic annotated sets. Continual learning directly approaches this problem, with the ultimate goal of devising methods where a deep neural network effectively learns relevant patterns for new (unseen) classes without significantly altering its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages the extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT outperforms state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
KW - Video: action and event understanding
UR - http://www.scopus.com/inward/record.url?scp=85167685523&partnerID=8YFLogxK
U2 - 10.1109/CVPR52729.2023.02319
DO - 10.1109/CVPR52729.2023.02319
M3 - Conference contribution
AN - SCOPUS:85167685523
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 24214
EP - 24223
BT - Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
PB - IEEE Computer Society
Y2 - 18 June 2023 through 22 June 2023
ER -