LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization

Editor: 映维 (Nweon) | Category: CV / XR | June 23, 2021

Note: We are not able to review papers.

PubDate: Jun 2021

Teams: Google Research; Indian Institute of Technology Kharagpur

Writers: Avisek Lahiri, Vivek Kwatra, Christian Frueh, John Lewis, Chris Bregler

PDF: LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization

Abstract

In this paper, we present a video-based learning framework for animating personalized 3D talking faces from audio. We introduce two training-time data normalizations that significantly improve data sample efficiency. First, we isolate and represent faces in a normalized space that decouples 3D geometry, head pose, and texture. This decomposes the prediction problem into regressions over the 3D face shape and the corresponding 2D texture atlas. Second, we leverage facial symmetry and approximate albedo constancy of skin to isolate and remove spatio-temporal lighting variations. Together, these normalizations allow simple networks to generate high fidelity lip-sync videos under novel ambient illumination while training with just a single speaker-specific video. Further, to stabilize temporal dynamics, we introduce an auto-regressive approach that conditions the model on its previous visual state. Human ratings and objective metrics demonstrate that our method outperforms contemporary state-of-the-art audio-driven video reenactment benchmarks in terms of realism, lip-sync and visual quality scores. We illustrate several applications enabled by our framework.
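The abstract describes the first normalization only in outline: faces are moved into a space where head pose is separated from 3D geometry. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the head pose is already known as a rigid transform (a rotation matrix and a translation vector), which the abstract does not specify, and all function names are hypothetical.

```python
import math


def rotation_y(angle_rad):
    """Return a 3x3 rotation about the vertical (y) axis, e.g. a head turn."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def apply_pose(points, rotation, translation):
    """Pose canonical points: observed = R @ p + t for every point p."""
    return [
        tuple(sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
              for i in range(3))
        for p in points
    ]


def normalize_pose(points, rotation, translation):
    """Undo an assumed-known rigid pose: canonical = R^T @ (q - t)."""
    normalized = []
    for q in points:
        d = [q[k] - translation[k] for k in range(3)]
        # Indexing rotation[k][i] multiplies by the transpose of R,
        # which is the inverse of a rotation matrix.
        normalized.append(tuple(sum(rotation[k][i] * d[k] for k in range(3))
                                for i in range(3)))
    return normalized


# Three illustrative face points in a frontal, origin-centered frame.
canonical = [(0.0, 0.0, 1.0), (-0.3, -0.5, 0.8), (0.3, -0.5, 0.8)]
R = rotation_y(math.radians(25))   # hypothetical 25-degree head turn
t = (0.1, -0.2, 2.0)               # hypothetical head position

observed = apply_pose(canonical, R, t)
recovered = normalize_pose(observed, R, t)
```

After this step the recovered points no longer depend on how the head was turned or where it was placed, so a model regressing over them only has to account for shape changes such as lip motion.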

Link to this article: https://paper.nweon.com/10280
