Knowledge VQA

Visual question answering (VQA) with knowledge is a task that requires knowledge to answer questions on images/video. This additional requirement of knowledge poses an interesting challenge on top of the classic VQA tasks. Specifically, a system needs to explore external knowledge sources to answer the questions correctly, as well as understanding the visual content.

We created a dedicated dataset for our knowledge VQA task and made it open to public, so that everyone can enjoy our new task. We have also publish several papers on this task.

Publications

  • Noa Garcia, Mayu Otani, Chenhui Chu, and Yuta Nakashima (2019). KnowIT VQA: Answering knowledge-based questions about videos. Proc. AAAI Conference on Artificial Intelligence, Feb. 2020.
  • Zekun Yang, Noa Garcia, Chenhui Chu, Mayu Otani, Yuta Nakashima, and Haruo Takemura (2020). BERT representations for video question answering. Proc.~IEEE Winter Conference on Applications of Computer Vision.
  • Noa Garcia, Chenhui Chu, Mayu Otani, and Yuta Nakashima (2019). Video meets knowledge in visual question answering. MIRU.
  • Zekun Yang, Noa Garcia, Chenhui Chu, Mayu Otani, Yuta Nakashima, and Haruo Takemura (2019). Video question answering with BERT. MIRU.
Avatar
Yuta Nakashima
Associate Professor

My research interests include computer vision, pattern recognition, natural langauge processing, and their applications.