# Awesome talking face generation

## papers & codes

### 2023

| title | paper | code | dataset | keywords |
| --- | --- | --- | --- | --- |
| CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior CVPR(23) | paper | code | BIWI, VOCA | 3D |
| DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation CVPR(23) | paper | | HDTF | Diffusion |
| AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction CVPR(23) | paper | | Multiface | 3D |
| Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert CVPR(23) | paper | code | LRS2 | |
| LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook CVPR(23) | paper | | LRS2, FFHQ | |
| Parametric Implicit Face Representation for Audio-Driven Facial Reenactment CVPR(23) | paper | | HDTF | |
| Identity-Preserving Talking Face Generation with Landmark and Appearance Priors CVPR(23) | paper | code | LRS2, LRS3 | |
| High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning CVPR(23) | paper | | MEAD | emotion |
| Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks InterSpeech(23) | paper | | MEAD | emotion |
| EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation ICCV(23) | paper | code (not yet) | | emotion |
| Emotionally Enhanced Talking Face Generation | paper | code | CREMA-D | emotion |
| DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video AAAI(23) | paper | code | | |
| GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis ICLR(23) | paper | code | | NeRF |
| OPT: One-Shot Pose-Controllable Talking Head Generation | paper | | | |
| LipNeRF: What is the right feature space to lip-sync a NeRF? | paper | | | NeRF |
| Audio-Visual Face Reenactment WACV(23) | paper | code | | |
| Towards Generating Ultra-High Resolution Talking-Face Videos With Lip Synchronization WACV(23) | paper | | | |
| StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles AAAI(23) | paper | code | | |
| DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis | paper | proj | | Diffusion |
| Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | paper | proj | | Diffusion |
| Speech Driven Video Editing via an Audio-Conditioned Diffusion Model | paper | code | | Diffusion |
| TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles | paper | | Text-Annotated MEAD | Text |

### 2022

| title | paper | code | dataset | keywords |
| --- | --- | --- | --- | --- |
| Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors | paper | proj | | |
| SPACE: Speech-driven Portrait Animation with Controllable Expression ICCV(23) | paper | | | Pose, Emotion |
| SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation CVPR(23) | paper | code | | |
| Compressing Video Calls using Synthetic Talking Heads BMVC(22) | paper | | | application |
| EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model SIGGRAPH(22) | paper | | | emotion |
| Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis ECCV(22) | paper | code | | |
| Expressive Talking Head Generation with Granular Audio-Visual Control CVPR(22) | paper | | | |
| Talking Face Generation With Multilingual TTS CVPR(22) | paper | code | - | |
| Deep Learning for Visual Speech Analysis: A Survey | paper | | | survey |
| StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN | paper | code | | StyleGAN |
| Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation ECCV(22) | paper | code (coming soon) | | NeRF |
| Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation | paper | | | |
| SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory AAAI(22) | paper (temp) | | LRW, LRS2, BBC News | |
| DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering | paper | | | NeRF |
| Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos | paper | | | |
| Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions | paper | | | |
| DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation | paper | | | |
| Talking Head Generation Driven by Speech-Related Facial Action Units and Audio Based on Multimodal Representation Fusion | paper | | | |
| StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation | paper | - | | |
| AutoLV: Automatic Lecture Video Generator | paper | | | |
| Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement | paper | | | |

### 2021

| title | paper | code | dataset |
| --- | --- | --- | --- |
| Depth-Aware Generative Adversarial Network for Talking Head Video Generation | paper | code | |
| Parallel and High-Fidelity Text-to-Lip Generation | paper | | |
| [Survey] Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis | paper | | |
| FaceFormer: Speech-Driven 3D Facial Animation with Transformers CVPR(22) | paper | code | |
| Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices | paper | code | |
| FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning ICCV | paper | code | |
| Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | paper | code | |
| Audio-Driven Emotional Video Portraits CVPR | paper | code | MEAD, LRW |
| LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization CVPR | paper | | |
| Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation CVPR | paper | code | VoxCeleb2, LRW |
| Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset CVPR | paper | code | HDTF |
| MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement ICCV | paper | code (coming soon) | |
| AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis ICCV | paper | code | |
| Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation AAAI | paper | code (coming soon) | Mocap dataset |
| Visual Speech Enhancement Without A Real Visual Stream | paper | | |
| Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary | paper | code | |
| Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion IJCAI | paper | code | VoxCeleb, GRID, LRW |
| 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head | paper | | |
| AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | paper | | VoxCeleb2, Obama |

### 2020

| title | paper | code | dataset |
| --- | --- | --- | --- |
| [Survey] What comprises a good talking-head video generation?: A survey and benchmark | paper | code | |
| One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing CVPR(21) | paper | code | |
| Speech Driven Talking Face Generation from a Single Image and an Emotion Condition | paper | code | CREMA-D |
| A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild ACMMM | paper | code | LRS2 |
| Talking-head Generation with Rhythmic Head Motion ECCV | paper | code | CREMA, GRID, VoxCeleb, LRS3 |
| MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation ECCV | paper | code | VoxCeleb2, AffectNet |
| Neural Voice Puppetry: Audio-driven Facial Reenactment ECCV | paper | | |
| Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars ECCV | paper | code | |
| HeadGAN: Video-and-Audio-Driven Talking Head Synthesis | paper | | VoxCeleb2 |
| MakeItTalk: Speaker-Aware Talking Head Animation | paper | code, code | VoxCeleb2, VCTK |
| Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose | paper | code | ImageNet, FaceWarehouse, LRW |
| Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks | paper | | |
| Speech-Driven Facial Animation Using Polynomial Fusion of Features | paper | | LRW |
| Animating Face using Disentangled Audio Representations WACV | paper | | |
| Everybody’s Talkin’: Let Me Talk as You Want | paper | | |
| Multimodal Inputs Driven Talking Face Generation With Spatial-Temporal Dependency | paper | | |

### 2019

| title | paper | code | dataset |
| --- | --- | --- | --- |
| Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss CVPR | paper | code | VGG Face, LRW |

## datasets

## metrics

- PSNR (peak signal-to-noise ratio)
- SSIM (structural similarity index measure)
- LMD (landmark distance error)
- LRA (lip-reading accuracy)
- FID (Fréchet inception distance)
- LSE-D (Lip Sync Error - Distance)
- LSE-C (Lip Sync Error - Confidence)
- LPIPS (Learned Perceptual Image Patch Similarity)
- NIQE (Natural Image Quality Evaluator)
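
For reference, here is a minimal sketch (not code from any of the papers above) of how per-frame PSNR/SSIM and a simple landmark-distance (LMD) score are often computed, using scikit-image and NumPy. The exact preprocessing, crop region, and normalization differ from paper to paper, and the landmark layout in the example is an illustrative assumption.

```python
# Minimal evaluation sketch: per-frame PSNR/SSIM plus a simple landmark
# distance. Assumes uint8 RGB frames and precomputed 2-D landmarks; real
# benchmarks differ in cropping, alignment, and averaging details.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def frame_psnr_ssim(real, fake):
    """real, fake: uint8 RGB frames with identical shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(real, fake, data_range=255)
    # channel_axis=-1 marks the last axis as color channels (scikit-image >= 0.19).
    ssim = structural_similarity(real, fake, channel_axis=-1, data_range=255)
    return psnr, ssim


def landmark_distance(real_lmk, fake_lmk):
    """Mean Euclidean distance between matching landmarks.

    real_lmk, fake_lmk: float arrays of shape (T, K, 2) holding K 2-D
    landmarks (e.g. the mouth points of a 68-point detector, a common but
    not universal convention) over T aligned frames.
    """
    return float(np.linalg.norm(real_lmk - fake_lmk, axis=-1).mean())


if __name__ == "__main__":
    # Toy data only; a real evaluation compares generated frames to ground truth.
    rng = np.random.default_rng(0)
    real = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
    noise = rng.integers(-10, 11, size=real.shape)
    fake = np.clip(real.astype(np.int16) + noise, 0, 255).astype(np.uint8)
    print(frame_psnr_ssim(real, fake))
```

LSE-D/LSE-C are usually reported with the pretrained SyncNet shipped in the Wav2Lip evaluation code, and FID, LPIPS, and NIQE have widely used reference implementations, so they are not re-derived here.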