Radek Daněček · Kiran Chhatre · Shashank Tripathi · Yandong Wen · Michael J. Black · Timo Bolkart
This is the official implementation of EMOTE: Emotional Speech-Driven Animation with Content-Emotion Disentanglement. EMOTE takes speech audio together with an emotion and intensity label as input and produces a talking head avatar that correctly articulates the words spoken in the audio while expressing the specified emotion.
- Some people experience issues getting correct EMOTE results from the latest version of the repo. If the results show obvious jitter, please fall back to the EMOTE 2.0 release commit:
```bash
git checkout 076e4acd5f476dba4d741462760d4011d341c4ec
```
- (20th Dec 2023) Docker installation now available. Please go to the docker folder
- (13th Dec. 2023) EMOTE v2 is now out! It is trained on a newer iteration of the data and should give better results overall. EMOTE v1 is still available. Please see the demo script for details.
- Follow the steps at the root of this repo. If for some reason the environment from there is not valid, create one using a `.yml` file from `envs` (see the sketch after this list).
- In order to run the demos you will need to download and unzip a few assets. Run `download_assets.sh` to do that:
```bash
bash download_assets.sh
```
- (Optional for inference, required for training) Basel Face Model texture space adapted to FLAME. Unfortunately, we are not allowed to distribute the texture space, since the license does not permit it. Therefore, please go to the BFM page, sign up, and download BFM. Then use the tool from this repo to convert the texture space to FLAME. Put the resulting texture model file into `../../assets/FLAME/texture` as `FLAME_albedo_from_BFM.npz`.
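A minimal sketch of the environment creation step mentioned in the first item above, assuming you pick one of the provided environment files (the file name under `envs/` is a placeholder here):
```bash
# Placeholder file name: use the actual .yml from envs/ that matches your setup
conda env create -f envs/<environment>.yml
```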
Then activate your environment:
```bash
conda activate work38
```
If you want to run EMOTE on the demo audio file, run the following:
```bash
python demo/demo_eval_talking_head_on_audio.py
```
The script will save the output meshes and videos into `results/` for each of the 8 basic emotions.
To run the demo on any audio, run:
```bash
python demo/demo_eval_talking_head_on_audio.py --path_to_audio <your_wav_file> --output_folder <your_output_folder>
```
If you only want results for a particular emotion, specify `--emotion` followed by one of: `Neutral`, `Happy`, `Sad`, `Surprise`, `Fear`, `Disgust`, `Anger`, `Contempt`.
You may also specify several of them. For a concrete invocation, see the example below.
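For example, a run on your own audio restricted to a particular emotion might look like this (the audio path and output folder are placeholders):
```bash
# Hypothetical invocation; my_recording.wav and my_results/ are placeholders
python demo/demo_eval_talking_head_on_audio.py \
    --path_to_audio my_recording.wav \
    --output_folder my_results \
    --emotion Happy
```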
Training EMOTE consists of the following steps (a schematic loss sketch follows the list):
- MEAD data processing
  - including pseudo-GT extraction
  - for more information about this step, go to data processing
- Training the video emotion classifier on MEAD
  - this classifier predicts the emotion and intensity labels of MEAD videos
  - a sequence-aggregated emotion feature is a byproduct of the classifier and is used for the emotion loss when training EMOTE
  - for more information about this step, go to the Video Emotion Recognition project
- Training the FLINT motion prior
  - the motion prior is a critical component of EMOTE; without it, it is impossible to apply perceptual losses without producing uncanny artifacts
  - for more information, go to the Motion Prior project
- Training the first stage of EMOTE
  - training only with the vertex error loss
  - refer to `training/train_emote_stage_1.py`; please read the instructions in the comments and then run the script
- Training the second stage of EMOTE
  - finetuning the previous stage with neural rendering, perceptual losses, and the content-emotion disentanglement mechanism
  - refer to `training/train_emote_stage_2.py`; please read the instructions in the comments, replace the paths with the models you want to finetune, and then run the script
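For intuition only, here is a minimal sketch of the loss structure described above: stage 1 supervises the predicted FLAME vertices with a plain vertex error, and stage 2 additionally compares sequence-aggregated emotion features of the rendered and ground-truth videos, standing in for EMOTE's perceptual emotion loss. All names below are placeholders rather than the actual EMOTE API; the real training also includes a lip-reading (content) loss and the disentanglement mechanism, so refer to the training scripts and the paper for the full picture.
```python
import torch
import torch.nn.functional as F

def stage1_vertex_loss(pred_vertices, gt_vertices):
    # Stage 1 (sketch): plain L2 reconstruction error on the predicted mesh vertices.
    return F.mse_loss(pred_vertices, gt_vertices)

def stage2_emotion_loss(rendered_frames, gt_frames, video_emotion_net):
    # Stage 2 (sketch): match sequence-aggregated emotion features of the rendered and
    # ground-truth videos using the (frozen) video emotion classifier trained on MEAD.
    # `video_emotion_net` is a placeholder for that classifier.
    with torch.no_grad():
        gt_feat = video_emotion_net(gt_frames)        # (B, D) sequence-level emotion feature
    pred_feat = video_emotion_net(rendered_frames)    # gradients flow back through the renderer
    return 1.0 - F.cosine_similarity(pred_feat, gt_feat, dim=-1).mean()
```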
If you use this work in your publication, please cite the following:
```bibtex
@inproceedings{EMOTE,
  title = {Emotional Speech-Driven Animation with Content-Emotion Disentanglement},
  author = {Daněček, Radek and Chhatre, Kiran and Tripathi, Shashank and Wen, Yandong and Black, Michael and Bolkart, Timo},
  publisher = {ACM},
  month = dec,
  year = {2023},
  doi = {10.1145/3610548.3618183},
  url = {https://emote.is.tue.mpg.de/index.html},
  month_numeric = {12}
}
```