Say Anything with Any Style

This repository provides the official PyTorch implementation for the following paper:
Say Anything with Any Style
Shuai Tan, et al.
In AAAI, 2024.

Given a source image and a style reference clip, SAAS generates stylized talking faces driven by audio. The lip motions are synchronized with the audio, while the speaking style is controlled by the style clip. SAAS also supports video-driven style editing when a source video is given as input. The pipeline of SAAS is as follows:

Requirements

We train and test with Python 3.8 and PyTorch. To install the dependencies, first create and activate a conda environment:

conda create -n SAAS python=3.8
conda activate SAAS

Then install the Python packages:

pip install -r requirement.txt
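
After installation, a quick sanity check of the environment can save a failed run later. The snippet below is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper that only verifies the Python version stated above and that `torch` can be found; it does not check the other packages or any specific PyTorch version.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Return a list of problems found; an empty list means the basics look fine.

    Hypothetical helper for illustration only. The (3, 8) default mirrors the
    Python version named in this README.
    """
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = "{}.{}".format(*sys.version_info[:2])
        wanted = "{}.{}".format(*min_python)
        problems.append(f"Python {wanted}+ expected, found {found}")
    # find_spec locates the package without importing it.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch (torch) is not installed in this environment")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks OK")
```

Run it inside the activated SAAS environment; any warning it prints points at a step above that may need repeating.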

Inference

Run the demo in the audio-driven setting:

python audio_driven/train_test/inference.py --img_path path/to/image --wav_path path/to/audio --img_3DMM_path path/to/img_3DMM --style_path path/to/style --save_path path/to/save

The result will be stored in save_path.

Run the demo in the video-driven setting:

python video_driven/inference.py --img_path path/to/image --wav_path path/to/audio --video_3DMM_path path/to/video_3DMM --style_path path/to/style --save_path path/to/save

The result will be stored in save_path.
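
The two commands above differ only in the script path and in the name of the 3DMM flag. When scripting many runs, it can help to build the argument list in one place. The following is a sketch under that assumption: `build_inference_command` and `SETTINGS` are hypothetical names introduced here, and only the script paths and flags shown in the commands above are used.

```python
import sys

# Script path and 3DMM flag for each setting, copied from the commands above.
SETTINGS = {
    "audio": ("audio_driven/train_test/inference.py", "--img_3DMM_path"),
    "video": ("video_driven/inference.py", "--video_3DMM_path"),
}


def build_inference_command(setting, img_path, wav_path, coeff_path,
                            style_path, save_path):
    """Build the argument list for one inference run (hypothetical helper).

    `coeff_path` is passed to --img_3DMM_path in the audio-driven setting
    and to --video_3DMM_path in the video-driven setting.
    """
    if setting not in SETTINGS:
        raise ValueError(f"setting must be one of {sorted(SETTINGS)}")
    script, coeff_flag = SETTINGS[setting]
    return [
        sys.executable, script,
        "--img_path", str(img_path),
        "--wav_path", str(wav_path),
        coeff_flag, str(coeff_path),
        "--style_path", str(style_path),
        "--save_path", str(save_path),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Using `sys.executable` keeps the run inside the currently active conda environment.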

The image passed as img_path should first be cropped with the crop_image.py script.

Download the checkpoints for the video-driven setting and put them into ./checkpoints.
Our audio encoder can be viewed as the combination of SadTalker's audio encoder and our video encoder. To support the audio-driven setting, download the checkpoint of SadTalker's audio encoder together with the checkpoint of our video encoder.
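
Before running inference, it may be worth confirming that the downloaded files actually landed in ./checkpoints. The sketch below assumes nothing about checkpoint file names or extensions, since this README does not list them; `list_checkpoints` is a hypothetical helper that simply reports every file under the directory.

```python
from pathlib import Path


def list_checkpoints(ckpt_dir="./checkpoints"):
    """Return all files under ckpt_dir, sorted by path (hypothetical helper).

    Raises FileNotFoundError if the directory is missing or contains no files,
    which usually means the download step has not been completed.
    """
    root = Path(ckpt_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"checkpoint directory not found: {root}")
    files = sorted(p for p in root.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files found under {root}")
    return files
```

Calling `list_checkpoints()` from the repository root prints nothing by itself; iterate over the result to see which files are present, and compare them against the download pages.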

Acknowledgement

Some code is borrowed from the following projects:

Thanks for their contributions!

We would like to thank Xinya Ji, Yifeng Ma and Zhiyao Sun for their generous help.

Citation

If you find this codebase useful for your research, please cite it using the following entry.

@inproceedings{tan2024say,
  title={Say Anything with Any Style},
  author={Tan, Shuai and Ji, Bin and Ding, Yu and Pan, Ye},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={5},
  pages={5088--5096},
  year={2024}
}

