DOI: 10.1145/3811818 ISSN: 1551-6857

Speech-Driven 3D Facial Animation with Natural Head Movements

Kuiyuan Sun, Jichao Zhang, Wei Wang, Yao Zhao, Nicu Sebe

Speech-driven 3D facial animation has been studied for a long time, yet several challenges still prevent truly natural results. One major issue is that current methods often overlook head motion during speech. To address this, we conducted a preliminary investigation and identified two key obstacles: the scarcity of 3D facial animation datasets that include head motion, and the limitations of standard sequence prediction models (e.g., Transformer, LSTM, and GRU), which are designed for discrete dynamic sequences and do not align well with continuous 3D head motion sequences. To solve these issues, we propose a head motion prediction module that uses the audio and the initial motion state of the mesh to predict head motion. By employing ordinary differential equations (ODEs) to model continuous dynamic sequences, it predicts head movements that closely resemble real head motion, making the animation more realistic and natural. Additionally, because generating head motion from audio is not a one-to-one mapping, we introduce a noise module that helps the head motion prediction module generate varied head motions for the same audio input. We also observed that previous facial animation methods focus primarily on generating vertices for the mouth region yet use a single model to generate the entire face, which wastes part of the model’s fitting capacity on other regions. To solve this problem, we propose a cascaded mesh generation module that uses two modules to generate the vertices of the mouth region and of the other facial regions separately. Extensive experiments and a perceptual user study show that our approach outperforms existing methods and produces relatively natural head motion.
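The abstract does not specify the network architecture, ODE solver, audio features, or noise design, so the following is only a minimal sketch of the general idea under stated assumptions: head pose is treated as a continuous-time state h(t) evolving as dh/dt = f(h, a(t), z), where a(t) is the audio feature at time t and z is a sampled noise code. A fixed random tanh layer stands in for a trained network, fixed-step RK4 stands in for whatever solver the authors use, and the dimensions and all function names (`make_vector_field`, `audio_at`, `sample_noise`, `rollout`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical toy sizes (not from the paper): a 6-D head pose
# (3 rotation angles + 3 translations), 4-D per-frame audio features,
# and a 2-D noise code.
POSE_DIM = 6
AUDIO_DIM = 4
NOISE_DIM = 2

Vec = List[float]


def make_vector_field(seed: int = 0) -> Callable[[Vec, Vec, Vec], Vec]:
    """Return a fixed random f(h, a, z) standing in for a learned network (assumption).

    dh/dt = tanh(W h + U a + V z) - 0.5 * h   (the damping term keeps h bounded)
    """
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> List[Vec]:
        return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

    W = mat(POSE_DIM, POSE_DIM)
    U = mat(POSE_DIM, AUDIO_DIM)
    V = mat(POSE_DIM, NOISE_DIM)

    def f(h: Vec, a: Vec, z: Vec) -> Vec:
        out = []
        for i in range(POSE_DIM):
            s = sum(W[i][j] * h[j] for j in range(POSE_DIM))
            s += sum(U[i][j] * a[j] for j in range(AUDIO_DIM))
            s += sum(V[i][j] * z[j] for j in range(NOISE_DIM))
            out.append(math.tanh(s) - 0.5 * h[i])
        return out

    return f


def audio_at(audio: Sequence[Vec], t: float, fps: float) -> Vec:
    """Linearly interpolate per-frame audio features at continuous time t (seconds)."""
    x = t * fps
    i = min(int(x), len(audio) - 1)
    j = min(i + 1, len(audio) - 1)
    w = min(max(x - i, 0.0), 1.0)
    return [(1.0 - w) * p + w * q for p, q in zip(audio[i], audio[j])]


def sample_noise(seed: int) -> Vec:
    """Draw one noise code; different codes yield different motions for the same audio."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]


def rollout(f: Callable[[Vec, Vec, Vec], Vec], h0: Vec, audio: Sequence[Vec],
            z: Vec, fps: float = 25.0, substeps: int = 2) -> List[Vec]:
    """Integrate dh/dt = f(h, a(t), z) with fixed-step RK4; return one pose per audio frame."""
    dt = 1.0 / (fps * substeps)
    h = list(h0)
    traj = [list(h)]  # frame 0 is the given initial motion state

    def shift(x: Vec, k: Vec, c: float) -> Vec:
        return [xi + c * ki for xi, ki in zip(x, k)]

    for frame in range(1, len(audio)):
        for s in range(substeps):
            t = (frame - 1) / fps + s * dt
            k1 = f(h, audio_at(audio, t, fps), z)
            k2 = f(shift(h, k1, dt / 2), audio_at(audio, t + dt / 2, fps), z)
            k3 = f(shift(h, k2, dt / 2), audio_at(audio, t + dt / 2, fps), z)
            k4 = f(shift(h, k3, dt), audio_at(audio, t + dt, fps), z)
            h = [hi + dt / 6 * (k1i + 2 * k2i + 2 * k3i + k4i)
                 for hi, k1i, k2i, k3i, k4i in zip(h, k1, k2, k3, k4)]
        traj.append(list(h))
    return traj


# Usage: same audio and initial state, two noise codes -> two different trajectories.
f = make_vector_field(seed=0)
audio = [[math.sin(0.3 * n + k) for k in range(AUDIO_DIM)] for n in range(50)]
h0 = [0.0] * POSE_DIM
poses_a = rollout(f, h0, audio, sample_noise(1))
poses_b = rollout(f, h0, audio, sample_noise(2))
```

Modeling pose as a continuous-time state means the trajectory can be evaluated at any timestamp (here via audio interpolation and RK4 sub-steps) rather than only at discrete token positions, and conditioning the vector field on a noise code is one simple way to obtain several plausible motions for a single audio clip.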
