TY - GEN
T1 - MovieCuts
T2 - 17th European Conference on Computer Vision, ECCV 2022
AU - Pardo, Alejandro
AU - Heilbron, Fabian Caba
AU - Alcázar, Juan León
AU - Thabet, Ali
AU - Ghanem, Bernard
N1 - Funding Information:
Acknowledgements. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the Visual Computing Center (VCC) funding.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Understanding movies and their structural patterns is a crucial task in decoding the craft of video editing. While previous works have developed tools for general analysis, such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the Cut type recognition task, which requires modeling multi-modal information. To ignite research in this new task, we construct a large-scale dataset called MovieCuts, which contains 173,967 video clips labeled with ten cut types defined by professionals in the movie industry. We benchmark a set of audio-visual approaches, including some dealing with the problem’s multi-modal nature. Our best model achieves 47.7% mAP, which suggests that the task is challenging and that attaining highly accurate Cut type recognition is an open research problem. Advances in automatic Cut-type recognition can unleash new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, machine-assisted video editing, among others. Our data and code are publicly available: https://github.com/PardoAlejo/MovieCuts.
AB - Understanding movies and their structural patterns is a crucial task in decoding the craft of video editing. While previous works have developed tools for general analysis, such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the Cut type recognition task, which requires modeling multi-modal information. To ignite research in this new task, we construct a large-scale dataset called MovieCuts, which contains 173,967 video clips labeled with ten cut types defined by professionals in the movie industry. We benchmark a set of audio-visual approaches, including some dealing with the problem’s multi-modal nature. Our best model achieves 47.7% mAP, which suggests that the task is challenging and that attaining highly accurate Cut type recognition is an open research problem. Advances in automatic Cut-type recognition can unleash new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, machine-assisted video editing, among others. Our data and code are publicly available: https://github.com/PardoAlejo/MovieCuts.
KW - Cinematography
KW - Cut-types
KW - Movie understanding
KW - Recognition
KW - Shot transition
KW - Video editing
UR - http://www.scopus.com/inward/record.url?scp=85142733880&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-20071-7_39
DO - 10.1007/978-3-031-20071-7_39
M3 - Conference contribution
AN - SCOPUS:85142733880
SN - 9783031200700
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 668
EP - 685
BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 October 2022 through 27 October 2022
ER -