Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

Editor: 映维 (Nweon) | Category: Perception / XR | September 14, 2020

Note: We are not able to review papers.

PubDate: July 11, 2020

Teams: Facebook Reality Labs

Authors: Haytham M. Fayek, Anurag Kumar

PDF: Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

Abstract

Recognizing sounds is a key aspect of computational audio scene analysis and machine perception. In this paper, we advocate that sound recognition is inherently a multi-modal audiovisual task in that it is easier to differentiate sounds using both the audio and visual modalities as opposed to one or the other. We present an audiovisual fusion model that learns to recognize sounds from weakly labeled video recordings. The proposed fusion model utilizes an attention mechanism to dynamically combine the outputs of the individual audio and visual models. Experiments on the large-scale sound events dataset, AudioSet, demonstrate the efficacy of the proposed model, which outperforms the single-modal models, and state-of-the-art fusion and multi-modal models. We achieve a mean Average Precision (mAP) of 46.16 on AudioSet, outperforming prior state of the art by approximately +4.35 mAP (relative: 10.4%).
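
The abstract describes attention-based fusion only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes each model emits per-class probabilities and that a per-class pair of attention scores (hypothetical here, hard-coded in place of a learned layer) is normalized with a softmax to weight the audio and visual outputs. All function names, variable names, and numbers are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(audio_probs, visual_probs, audio_scores, visual_scores):
    """Combine per-class outputs of two models with per-class attention weights.

    audio_scores and visual_scores are hypothetical unnormalized attention
    scores; in a trained model they would be produced by learned parameters.
    """
    fused = []
    for p_a, p_v, s_a, s_v in zip(audio_probs, visual_probs,
                                  audio_scores, visual_scores):
        w_a, w_v = softmax([s_a, s_v])  # the two weights sum to 1 per class
        fused.append(w_a * p_a + w_v * p_v)
    return fused


# Toy inputs for three sound classes (made-up numbers).
audio_probs = [0.9, 0.2, 0.5]
visual_probs = [0.1, 0.8, 0.5]
audio_scores = [2.0, 0.0, 0.0]   # class 0: attention favors the audio model
visual_scores = [0.0, 2.0, 0.0]  # class 1: attention favors the visual model

fused = attention_fuse(audio_probs, visual_probs, audio_scores, visual_scores)
```

Because the weights for each class sum to 1, every fused value lies between the audio and visual predictions for that class, and the modality with the higher attention score pulls the result toward its own output.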

Article link: https://paper.nweon.com/6534
