Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene

Editor: 映维 | Category: CV / XR | March 23, 2022

Note: We do not have the ability to review papers.

PubDate: Mar 2022

Teams: Beijing University of Posts and Telecommunications; Tencent; Chinese Academy of Sciences; University of Trento

Writers: Duo Zheng, Fandong Meng, Qingyi Si, Hairun Fan, Zipeng Xu, Jie Zhou, Fangxiang Feng, Xiaojie Wang

PDF: Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene


Abstract

Visual dialog has made great progress since various vision-oriented goals were introduced into the conversation, notably in GuessWhich and GuessWhat, where a single image is visible to only one of, or to both of, the questioner and the answerer, respectively. Researchers have mostly explored visual dialog tasks in such single- or perfectly co-observable visual scenes, while somewhat neglecting tasks in non-perfectly co-observable visual scenes, where the images accessed by the two agents may not be exactly the same, a situation that often occurs in practice. Although building common ground in a non-perfectly co-observable visual scene through conversation is significant for advanced dialog agents, the lack of such a dialog task and a corresponding large-scale dataset makes in-depth research impossible. To overcome this limitation, we propose an object-referring game in a non-perfectly co-observable visual scene, where the goal is to spot the difference between similar visual scenes by conversing in natural language. The task addresses the challenges of dialog strategy in non-perfectly co-observable visual scenes and the ability to categorize objects. Correspondingly, we construct a large-scale multimodal dataset, named SpotDiff, which contains 87k Virtual Reality images and 97k dialogs generated by self-play. Finally, we provide benchmark models for this task and conduct extensive experiments to evaluate their performance and analyze the main challenges.

Article link: https://paper.nweon.com/11907

