TY - GEN
T1 - Learning to Cut by Watching Movies
AU - Pardo, Alejandro
AU - Heilbron, Fabian Caba
AU - Alcázar, Juan León
AU - Thabet, Ali
AU - Ghanem, Bernard
N1 - Funding Information:
We introduced the task of cut plausibility ranking for computational video editing. We proposed a proxy task that aligns with the actual video editing process by leveraging knowledge from already edited scenes. Additionally, we collected more than 260K edited video clips. Using this edited footage, we created the first method capable of ranking cuts automatically, which learns in a data-driven fashion. We benchmarked our method with a set of proposed metrics that reflect the model’s level of precision at retrieval and expertise at providing tighter cuts. Finally, we used our method in a real-case scenario, where our model ranked cuts from non-edited videos. We conducted a user study in which editors picked our model’s cuts more often than those made by the baselines. Yet, there is still a long way to go to match editors’ expertise in selecting the smoothest cuts. This work aims to open the door to data-driven computational video editing for the research community. Future directions include the use of fine-grained features to learn more subtle patterns that better approximate the fine-grained process of cutting video. Additionally, other modalities such as speech and language could bring benefits for ranking video cuts. Acknowledgments: This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the Visual Computing Center (VCC) funding.
Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate primarily due to the lack of raw video materials. This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility. Our key idea is to leverage content that has already been edited to learn fine-grained audiovisual patterns that trigger cuts. To do this, we first collected a data source of more than 10K videos, from which we extract more than 255K cuts. We devise a model that learns to discriminate between real and artificial cuts via contrastive learning. We set up a new task and a set of baselines to benchmark video cut generation. We observe that our proposed model outperforms the baselines by large margins. To demonstrate our model in real-world applications, we conduct human studies on a collection of unedited videos. The results show that our model does a better job at cutting than random and alternative baselines.
AB - Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate primarily due to the lack of raw video materials. This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility. Our key idea is to leverage content that has already been edited to learn fine-grained audiovisual patterns that trigger cuts. To do this, we first collected a data source of more than 10K videos, from which we extract more than 255K cuts. We devise a model that learns to discriminate between real and artificial cuts via contrastive learning. We set up a new task and a set of baselines to benchmark video cut generation. We observe that our proposed model outperforms the baselines by large margins. To demonstrate our model in real-world applications, we conduct human studies on a collection of unedited videos. The results show that our model does a better job at cutting than random and alternative baselines.
UR - http://www.scopus.com/inward/record.url?scp=85121194143&partnerID=8YFLogxK
U2 - 10.1109/ICCV48922.2021.00678
DO - 10.1109/ICCV48922.2021.00678
M3 - Conference contribution
AN - SCOPUS:85121194143
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 6838
EP - 6848
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Y2 - 11 October 2021 through 17 October 2021
ER -