Yuta Nakashima is an associate professor at the Institute for Datability Science, Osaka University. His research interests include computer vision, pattern recognition, natural language processing, and their applications.


  • Computer Vision
  • Pattern Recognition
  • Natural Language Processing


  • PhD in Engineering, 2012

    Osaka University

  • ME, 2008

    Osaka University

  • BE, 2006

    Osaka University



Associate Professor

Institute for Datability Science, Osaka University

Jan 2017 – Present Osaka, Japan

Visiting Scholar

Robotics Institute, Carnegie Mellon University

Apr 2015 – Mar 2016 Pennsylvania, US

Assistant Professor

Nara Institute of Science and Technology

Apr 2012 – Dec 2016 Nara, Japan

Visiting Scholar

University of North Carolina at Charlotte

Feb 2012 – Mar 2012 North Carolina, US

JSPS Research Fellow (PD)

Osaka University

Feb 2012 – Mar 2012 Osaka, Japan

JSPS Research Fellow (DC2)

Osaka University

Oct 2008 – Jan 2012 Osaka, Japan



Recognition as Excellent Research Work and Collaboration

Open Paper Award

Inferring what the videographer wanted to capture
Y. Nakashima and N. Yokoya


Video Summary

Video summarization is a research topic that requires a deep understanding of video content. We explore various methods for automatic video summarization, as well as the limitations of current datasets.

Knowledge VQA

Visual question answering (VQA) with knowledge is a task that requires external knowledge to answer questions about images or videos. This additional requirement poses an interesting challenge on top of classic VQA tasks.

Recent Publications

ContextNet: Representation and exploration for painting classification and retrieval in context

In automatic art analysis, models that besides the visual elements of an artwork represent the relationships between the different …

KnowIT VQA: Answering knowledge-based questions about videos

We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we introduce KnowIT VQA, a …

Speech-driven face reenactment for a video sequence

We present a system for reenacting a person’s face driven by speech. Given a video sequence with the corresponding audio track of …