Phrase localization-based visually grounded paraphrase identification