TY - GEN
T1 - Context-aware learning for automatic sports highlight recognition
AU - Ghanem, Bernard
AU - Kreidieh, Maya
AU - Farra, Marc
AU - Zhang, Tianzhu
N1 - KAUST Repository Item: Exported on 2020-04-23
Acknowledgements: This study is supported by the research grant for the Human Sixth Sense Program at the Advanced Digital Sciences Center from Singapore's Agency for Science, Technology and Research (A*STAR).
PY - 2012
Y1 - 2012
AB - Video highlight recognition is the procedure in which a long video sequence is summarized into a shorter video clip that depicts the most 'salient' parts of the sequence. It is an important technique for content delivery and search systems that create multimedia content tailored to their users' needs. This paper deals specifically with capturing highlights inherent to sports videos, particularly American football. Our proposed system exploits the multimodal nature of sports videos (i.e., visual, audio, and text cues) to detect their most important segments. The optimal combination of these cues is learned in a data-driven fashion using user preferences (expert input) as ground truth. Unlike most highlight recognition systems in the literature, which define a highlight to be salient only in its own right (globally salient), we also consider the context of each video segment w.r.t. the video sequence it belongs to (locally salient). To validate our method, we compile a large dataset of broadcast American football videos, acquire their ground truth highlights, and evaluate the performance of our learning approach.
UR - http://www.scopus.com/inward/record.url?scp=84874564687&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84874564687
SN - 9784990644109
T3 - Proceedings - International Conference on Pattern Recognition
SP - 1977
EP - 1980
BT - ICPR 2012 - 21st International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st International Conference on Pattern Recognition, ICPR 2012
Y2 - 11 November 2012 through 15 November 2012
ER -