
Visual Acoustic Matching

Note: We do not have the ability to review this paper.

PubDate: Jun 2022

Teams: UT Austin, Stanford University, Reality Labs at Meta, Meta AI

Authors: Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman

PDF: Visual Acoustic Matching

Project: Visual Acoustic Matching

Abstract

We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment. Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials. To address this novel task, we propose a cross-modal transformer model that uses audio-visual attention to inject visual properties into the audio and generate realistic audio output. In addition, we devise a self-supervised training objective that can learn acoustic matching from in-the-wild Web videos, despite their lack of acoustically mismatched audio. We demonstrate that our approach successfully translates human speech to a variety of real-world environments depicted in images, outperforming both traditional acoustic matching and more heavily supervised baselines.
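The core idea in the abstract is a cross-modal transformer that lets the audio representation attend to visual features of the target room, so that cues about geometry and materials shape the re-synthesized sound. Below is a minimal, illustrative sketch of such an audio-visual cross-attention block in PyTorch; the module names, dimensions, and tokenization are assumptions for clarity and are not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Sketch of audio-visual cross-attention: audio frames query visual
    tokens, so room-acoustic cues from the image modulate each audio frame.
    Hypothetical dimensions and layout, not the paper's actual architecture."""

    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_tokens, visual_tokens):
        # audio_tokens:  (batch, T_audio, dim)   e.g. spectrogram frame embeddings
        # visual_tokens: (batch, N_patches, dim) e.g. image patch embeddings
        # Queries come from the audio; keys/values come from the image, so the
        # fused output injects visual (geometry/material) information per frame.
        fused, _ = self.attn(query=audio_tokens,
                             key=visual_tokens,
                             value=visual_tokens)
        return self.norm(audio_tokens + fused)

if __name__ == "__main__":
    # Dummy usage: 200 audio frames attend to a 14x14 grid of image patches.
    audio = torch.randn(2, 200, 512)
    image = torch.randn(2, 196, 512)
    out = CrossModalAttention()(audio, image)
    print(out.shape)  # torch.Size([2, 200, 512])
```

In a full system, a block like this would sit inside the transformer that re-synthesizes the waveform, with the attended audio representation decoded back into the target-room audio.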
