Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images

Editor: 映维 | Category: Perception / XR | November 1, 2022

Note: We don't have the ability to review papers.

PubDate: Oct 2022

Teams: Tokuyama College; The University of Tokyo; Doshisha University; Ritsumeikan University

Writers: Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, Hiroshi Saruwatari

PDF: Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images

Abstract

We propose a method for synthesizing environmental sounds from visually represented onomatopoeias and sound sources. An onomatopoeia is a word that imitates a sound structure, i.e., the text representation of sound. From this perspective, onoma-to-wave has been proposed to synthesize environmental sounds from the desired onomatopoeia texts. Onomatopoeias have another representation: visual-text representations of sounds in comics, advertisements, and virtual reality. A visual onomatopoeia (visual text of onomatopoeia) contains rich information that is not present in the text, such as a long-short duration of the image, so the use of this representation is expected to synthesize diverse sounds. Therefore, we propose visual onoma-to-wave for environmental sound synthesis from visual onomatopoeia. The method can transfer visual concepts of the visual text and sound-source image to the synthesized sound. We also propose a data augmentation method focusing on the repetition of onomatopoeias to enhance the performance of our method. An experimental evaluation shows that the methods can synthesize diverse environmental sounds from visual text and sound-source images.

Article link: https://paper.nweon.com/13337
