Inferring what the videographer wanted to capture

Abstract

Detecting important regions in videos has been studied extensively over the past decades because of its wide range of applications, including video summarization and retargeting. Visual attention models, which find visually salient regions, have drawn much attention for this purpose. However, visual attention models ignore intentionally captured regions (ICRs), i.e., regions derived from videographers’ intentions: what the videographers wanted to capture in their videos. This paper proposes a Markov random field-based ICR model for finding such regions. Observing that a videographer’s intention is embedded in camera motion together with object motion, our ICR model uses point trajectory-based features to distinguish ICRs from non-ICRs. It also leverages the spatial and temporal consistency of ICRs to improve performance. We experimentally demonstrate our ICR model’s performance and the difference between ICRs and visually salient regions.
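The abstract does not spell out the energy function; as an illustrative sketch only, an MRF labeling model built on per-region point-trajectory features f_i with spatial and temporal neighborhoods N_s and N_t (all symbols here are assumptions for illustration, not taken from the paper) would typically minimize an energy of the form

    E(\mathbf{x}) = \sum_i \phi(x_i \mid f_i)
                  + \lambda_s \sum_{(i,j) \in N_s} \psi(x_i, x_j)
                  + \lambda_t \sum_{(i,k) \in N_t} \psi(x_i, x_k),

where x_i ∈ {ICR, non-ICR} is the label of region i, \phi is a unary term computed from the trajectory-based features, and the pairwise terms \psi encourage spatially and temporally neighboring regions to take consistent labels.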

Publication
Proc. 2013 IEEE International Conference on Image Processing (ICIP)