Computer Science - Computer Vision and Pattern Recognition

KnowIT VQA: Answering knowledge-based questions about videos

We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, …

BUDA.ART: A multimodal content-based analysis and retrieval system for Buddha statues

We introduce BUDA.ART, a system designed to assist researchers in Art History, to explore and analyze an archive of pictures of Buddha statues. The system combines different CBIR and classical retrieval techniques to assemble 2D pictures, 3D statue …

Context-aware embeddings for automatic art analysis

Automatic art analysis aims to classify and retrieve artistic representations from a collection of images by using computer vision and machine learning techniques. In this work, we propose to enhance visual representations from neural networks with …

Rethinking the evaluation of video summaries

Video summarization is a technique to create a short skim of the original video while preserving the main stories/content. There exists a substantial interest in automatizing this process due to the rapid growth of the available material. The recent …

Video summarization using deep semantic features

This paper presents a video summarization technique for an Internet video to provide a quick way to overview its content. This is a challenging problem because finding important or informative parts of the original video requires to understand its …

Learning joint representations of videos and sentences with web image search

Our objective is video retrieval based on natural language queries. In addition, we consider the analogous problem of retrieving sentences or generating descriptions given an input video. Recent work has addressed the problem by embedding visual and …