Visual to Sound: Generating Natural Sound for Videos in the Wild

编辑：映维 | 分类：Perception / XR | 2020年7月29日

Note: We don't have the ability to review paper

PubDate: Jun 2018

Teams: University of North Carolina at Chapel Hill；Adobe Research

Writers: Yipin Zhou, Zhaowen Wang, Chen Fang, Trung Bui, Tamara L. Berg

PDF: Visual to Sound: Generating Natural Sound for Videos in the Wild

Visual to Sound: Generating Natural Sound for Videos in the Wild

Abstract

As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.

本文链接：https://paper.nweon.com/4344

Visual to Sound: Generating Natural Sound for Videos in the Wild

您可能还喜欢...

最新AR/VR行业分享

最新AR/VR专利

最新AR/VR行业招聘

Visual to Sound: Generating Natural Sound for Videos in the Wild

您可能还喜欢...

“You Better Eat to Survive”: Exploring Cooperative Eating in Virtual Reality Games

From 2D to 3D: Facilitating Single-Finger Mid-Air Typing on Virtual Keyboards with Probabilistic Touch Modeling

HyperGuider: Virtual Reality Framework for Interactive Path Planning of Quadruped Robot in Cluttered and Multi-Terrain Environments

最新AR/VR行业分享

最新AR/VR专利

最新AR/VR行业招聘