Visual Acoustic Matching

Note: We don't have the ability to review this paper.

PubDate: Jun 2022

Teams: UT Austin, Stanford University, Reality Labs at Meta, Meta AI

Writers: Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman

PDF: Visual Acoustic Matching

Project: Visual Acoustic Matching

Abstract

We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment. Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials. To address this novel task, we propose a cross-modal transformer model that uses audio-visual attention to inject visual properties into the audio and generate realistic audio output. In addition, we devise a self-supervised training objective that can learn acoustic matching from in-the-wild Web videos, despite their lack of acoustically mismatched audio. We demonstrate that our approach successfully translates human speech to a variety of real-world environments depicted in images, outperforming both traditional acoustic matching and more heavily supervised baselines.
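The abstract describes a cross-modal transformer in which audio-visual attention injects visual properties of the target environment into the audio representation. The snippet below is a minimal PyTorch sketch of that cross-attention idea only, not the authors' released model: the class name, feature dimensions, and block structure are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of audio-visual cross-attention:
# audio tokens act as queries that attend to image tokens from the target room,
# so room-acoustic cues suggested by the visible geometry/materials can be
# mixed into the audio features before re-synthesis.

import torch
import torch.nn as nn


class AudioVisualAttentionBlock(nn.Module):
    """One transformer block where audio tokens cross-attend to image tokens."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, audio_tokens: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
        # Queries come from the audio; keys/values come from the target-environment image.
        attended, _ = self.cross_attn(
            query=self.norm1(audio_tokens), key=image_tokens, value=image_tokens
        )
        audio_tokens = audio_tokens + attended
        audio_tokens = audio_tokens + self.ffn(self.norm2(audio_tokens))
        return audio_tokens


if __name__ == "__main__":
    # Toy shapes: 200 audio frames and 196 image patch tokens, each 256-dimensional.
    audio_feats = torch.randn(1, 200, 256)
    image_feats = torch.randn(1, 196, 256)
    block = AudioVisualAttentionBlock()
    print(block(audio_feats, image_feats).shape)  # torch.Size([1, 200, 256])
```

In the paper's setting, the output audio tokens would then be decoded back into a waveform that matches the target room acoustics; that decoder and the self-supervised training objective are not sketched here.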
