Phrase localization-based visually grounded paraphrase identification

Type