Voice Conversion System Based on Deep Neural Network Capable of Parallel Computation

Editor: 映维 (Nweon) | Category: XR | September 22, 2021

Note: We do not have the ability to review papers.

PubDate: August 2018

Teams: The University of Tokyo

Writers: Kunihiko Sato; Jun Rekimoto

PDF: Voice Conversion System Based on Deep Neural Network Capable of Parallel Computation

Abstract

Voice conversion (VC) algorithms modify the speech of one speaker so that it resembles the speech of another speaker. Many existing virtual reality (VR) and augmented reality (AR) systems let users change their appearance; adding VC would let them change their voice as well. State-of-the-art VC methods use recurrent neural networks (RNNs), including long short-term memory (LSTM) networks, to generate converted speech. However, RNNs are difficult to parallelize because the computation at each timestep depends on the result of the previous timestep, which prevents them from operating in real time. We instead propose a novel VC approach based on a dilated convolutional neural network (Dilated CNN), a deep neural network model that allows parallel computation. We adapted the Dilated CNN model to perform convolutions in both the forward and reverse directions so that learning succeeds. In addition, so that the model can be parallelized in both the training and inference phases, we designed an architecture that predicts all output values from the input speech alone and does not feed its own predictions back as the next input. The results show that the proposed approach converts speech faster than state-of-the-art methods while slightly improving speech quality and maintaining speaker similarity.
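The abstract does not give layer counts, kernel sizes, dilation rates, or the exact meaning of "forward and reverse directions," so the following is only a minimal pure-Python sketch under assumed values. It reads "forward and reverse" as a non-causal (centered) dilated convolution, which is an assumption, and uses a hypothetical kernel size of 3 with hypothetical dilations 1, 2, 4. It illustrates the two properties the abstract relies on: every output frame is computed from the input alone, so all frames could be processed in parallel, whereas a toy RNN must process steps one after another. All function names are illustrative and this is not the authors' model.

```python
import math
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Non-causal dilated 1-D convolution with zero padding (odd kernel size assumed).

    Tap k of output frame t reads x[t + (k - center) * dilation], so each output
    sees earlier frames and later frames. Output t depends only on the input x,
    never on other outputs.
    """
    n = len(x)
    center = len(kernel) // 2
    out = []
    for t in range(n):  # iterations are independent, so they could run in parallel
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - center) * dilation
            if 0 <= idx < n:  # zero padding outside the sequence
                acc += w * x[idx]
        out.append(acc)
    return out


def dilated_stack(x: List[float], kernel: List[float], dilations: List[int]) -> List[float]:
    """Stack of dilated convolutions with ReLU; maps input frames to output frames in one pass."""
    h = x
    for d in dilations:
        h = [max(0.0, v) for v in dilated_conv1d(h, kernel, d)]
    return h


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input frames that can influence one output frame of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def rnn_forward(x: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Toy scalar RNN for contrast: step t needs h from step t-1, so steps are sequential."""
    h = 0.0
    hs = []
    for xt in x:
        h = math.tanh(a * h + b * xt)
        hs.append(h)
    return hs


def impulse(n: int, pos: int) -> List[float]:
    """Length-n signal that is 1.0 at index pos and 0.0 elsewhere."""
    return [1.0 if i == pos else 0.0 for i in range(n)]


if __name__ == "__main__":
    y = dilated_stack(impulse(31, 15), [1.0, 1.0, 1.0], [1, 2, 4])
    support = [i for i, v in enumerate(y) if v > 0]
    print(support[0], support[-1], receptive_field(3, [1, 2, 4]))  # 8 22 15
```

Feeding an impulse at frame 15 through the three layers spreads it to frames 8 through 22, i.e. seven frames in each direction and 15 frames in total, matching `receptive_field`. Doubling the dilation at each layer grows the context exponentially with depth while the cost per layer stays fixed, which is one common reason to prefer dilated convolutions over recurrence when low latency matters.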

Article link: https://paper.nweon.com/11232