
Reasonable Perception: Connecting Vision and Language Systems for Validating Scene Descriptions


PubDate: March 2018

Affiliation: Massachusetts Institute of Technology

Authors: Leilani H. Gilpin; Cagri Zaman; Danielle Olson; Ben Z. Yuan


Abstract

Understanding explanations of machine perception is an important step towards developing accountable, trustworthy machines. Speech and vision are the primary modalities by which humans collect information about the world, yet linking the visual and natural language domains is a relatively new pursuit in computer vision, and it is difficult to test performance in a safe environment. To couple human visual understanding and machine perception, we present an explanatory system for creating a library of possible context-specific actions associated with 3D objects in immersive virtual worlds. We also contribute a novel scene description dataset, generated natively in virtual reality and containing speech, image, gaze, and acceleration data. We discuss the development of a hybrid machine learning algorithm that links vision data with environmental affordances expressed in natural language. Our findings demonstrate that it is possible to develop a model that generates interpretable verbal descriptions of the possible actions associated with recognized 3D objects in immersive VR environments.
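The abstract does not say how the library of context-specific actions is represented. As a rough illustration only, the sketch below assumes a plain lookup table keyed by a recognized object label and a scene context, and renders the stored actions as a short verbal description. The names `AffordanceLibrary`, `add`, and `describe` are hypothetical; object recognition is assumed to happen upstream, and this is not the authors' hybrid machine learning algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class AffordanceLibrary:
    """Hypothetical store: object label -> context -> list of action verbs."""

    actions: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def add(self, obj: str, context: str, verb: str) -> None:
        # Register one possible action for an object within a given context.
        self.actions.setdefault(obj, {}).setdefault(context, []).append(verb)

    def describe(self, obj: str, context: str) -> str:
        # Turn the stored actions for a recognized object into one sentence.
        verbs = self.actions.get(obj, {}).get(context, [])
        if not verbs:
            return f"I see a {obj}, but I know no actions for it in the {context}."
        if len(verbs) == 1:
            listed = verbs[0]
        else:
            listed = ", ".join(verbs[:-1]) + " or " + verbs[-1]
        return f"I see a {obj}; in the {context} you could {listed} it."


if __name__ == "__main__":
    lib = AffordanceLibrary()
    lib.add("cup", "kitchen", "fill")
    lib.add("cup", "kitchen", "drink from")
    print(lib.describe("cup", "kitchen"))
```

Keying on context as well as object label reflects the abstract's emphasis on *context-specific* actions: the same object can afford different actions in different scenes. A learned model would replace the hand-filled table.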

