Human action recognition-based video summarization for RGB-D personal sports video


Automatic sports video summarization requires acquiring the semantics of the original video, and existing work leverages domain knowledge such as the structure of games and editing conventions. In this paper, we propose a personal sports video summarization method for self-recorded RGB-D videos, which have become available to the public through the commoditization of off-the-shelf RGB-D sensors. We focus on sports whose games consist of a succession of actions and, unlike previous research, we apply human action recognition to the depth sequences to acquire higher-level semantics of the video. The recognition results are used together with an entropy-based activity measure to train a hidden Markov model of the highlights of different games, which is then used to extract a summary from the original RGB-D video. We trained our novel highlights model with the subjective opinions of users with different levels of experience in the sport. We took Kendo, a martial art, as an example sport to evaluate our method, and investigated the accuracy and quality of the generated summaries both objectively and subjectively.
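
The abstract does not define the entropy-based activity measure. As a rough illustration only, one plausible reading is the Shannon entropy of the binned depth differences between two consecutive frames: a static scene gives zero entropy, while varied motion spreads the differences over more bins and raises it. The function name `frame_diff_entropy`, the flat-list frame representation, and the `bin_width` parameter are assumptions made for this sketch, not details taken from the paper.

```python
import math
from collections import Counter


def frame_diff_entropy(prev_frame, frame, bin_width=50):
    """Shannon entropy (in bits) of binned absolute depth differences.

    Hypothetical stand-in for the paper's activity measure. Frames are
    flat sequences of integer depth values (e.g. millimetres) of equal
    length; bin_width is an assumed quantization step.
    """
    # Quantize the per-pixel absolute difference into coarse bins.
    bins = [abs(a - b) // bin_width for a, b in zip(prev_frame, frame)]
    total = len(bins)
    counts = Counter(bins)
    # H = -sum(p * log2(p)) over the occupied bins.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


static = frame_diff_entropy([800, 800, 900, 900], [800, 800, 900, 900])
moving = frame_diff_entropy([800, 800, 900, 900], [800, 800, 1000, 1000])
print(static, moving)
```

In a pipeline like the one described, a per-frame value of this kind could be combined with the recognized action labels to form the observation sequence given to the hidden Markov model; how the paper actually combines them is not stated in the abstract.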

Proc. 2016 IEEE International Conference on Multimedia and Expo (ICME)