Lip-Synchronized 3D Facial Animation Using Audio-Driven Graph Convolutional Autoencoder
Abstract
The majority of state-of-the-art audio-driven facial animation methods incorporate a differentiable rendering stage within their models, so their output is a 2D raster image. However, existing development pipelines for MR (Mixed Reality) applications rely on platform-specific render engines optimized for particular HMDs (head-mounted displays), which in turn necessitates a technique that operates directly on the facial mesh geometry. This work proposes a lip-synchronized, audio-driven 3D face animation method built on a graph convolutional autoencoder that learns detailed facial deformations of a talking subject while producing a compact latent representation of the 3D model. The representation is then conditioned on the processed audio data to achieve synchronized lip and jaw movement while retaining the subject's facial features. The audio processing involves the extraction of semantic features that correlate strongly with facial deformation and expression. Qualitative and quantitative experiments demonstrate the method's potential for use in MR applications and shed light on some of the disadvantages of current approaches.
Authors
- Ivaylo Bozhilov
- Krasimir Tonchev
- Nikolay Neshov
- Agata Manolova
Venue
2023 IEEE 12th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)
Links
https://ieeexplore.ieee.org/document/10348935
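To make the abstract's description more concrete, below is a minimal sketch of the general idea it outlines: a graph convolutional autoencoder over the face mesh whose pooled latent code is conditioned on per-frame audio features before decoding per-vertex deformations of a template face. This is not the authors' implementation; it assumes PyTorch and PyTorch Geometric, and all layer sizes, the audio feature dimension, and the toy mesh connectivity are illustrative placeholders.

```python
# Hypothetical sketch (not the paper's code): graph convolutional autoencoder
# whose latent code is conditioned on audio features before decoding
# per-vertex offsets applied to a neutral template mesh.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


class AudioConditionedGraphAE(nn.Module):
    def __init__(self, num_vertices, latent_dim=64, audio_dim=128):
        super().__init__()
        self.num_vertices = num_vertices
        # Encoder: graph convolutions over vertex positions, pooled to a latent code.
        self.enc1 = GCNConv(3, 32)
        self.enc2 = GCNConv(32, 64)
        self.to_latent = nn.Linear(64, latent_dim)
        # Decoder: latent code concatenated with audio features, expanded back
        # to per-vertex 3D offsets.
        self.decode = nn.Sequential(
            nn.Linear(latent_dim + audio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_vertices * 3),
        )

    def forward(self, verts, edge_index, audio_feat, template):
        # verts: (num_vertices, 3) current-frame geometry
        # edge_index: (2, num_edges) mesh connectivity
        # audio_feat: (audio_dim,) features for the corresponding audio frame
        # template: (num_vertices, 3) neutral face used as the deformation basis
        h = torch.relu(self.enc1(verts, edge_index))
        h = torch.relu(self.enc2(h, edge_index))
        z = self.to_latent(h.mean(dim=0))          # pool vertices -> latent code
        cond = torch.cat([z, audio_feat], dim=-1)  # condition latent on audio
        offsets = self.decode(cond).view(self.num_vertices, 3)
        return template + offsets                  # deformed (talking) face


if __name__ == "__main__":
    V = 500                                        # toy vertex count
    model = AudioConditionedGraphAE(num_vertices=V)
    verts = torch.randn(V, 3)
    edge_index = torch.randint(0, V, (2, 2000))    # random toy connectivity
    audio_feat = torch.randn(128)
    out = model(verts, edge_index, audio_feat, template=torch.zeros(V, 3))
    print(out.shape)                               # torch.Size([500, 3])
```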
