Model/Pipeline/Scheduler description

Existing methods for facial identity transfer in diffusion-based image generation models struggle to achieve high-fidelity, detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive ID-preservation strategy that fully accounts for intricate facial details. To address these limitations, the authors introduce ConsistentID, a method for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, using only a single reference image.
ConsistentID comprises three key components:

A fine-tuned IP-Adapter-FaceID-Plus module that captures the overall facial context from the reference image.

Expanded textual descriptions generated from the reference face image using LLaVA-1.5 to further refine facial features.

An ID-preservation network that injects Perceiver-remapped CLIP embeddings of separated facial regions into the embeddings of the expanded text prompt, optimized with a facial attention localization strategy to preserve ID consistency in facial regions.
Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions.
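The "Perceiver-remapped" step in the third component can be pictured as a small cross-attention resampler: a fixed set of learned latent queries attends to a variable number of per-region CLIP embeddings and emits a fixed number of ID tokens that can be concatenated with the text embeddings. The following is a minimal numpy sketch of that idea only, not ConsistentID's actual implementation; all dimensions, the function name, and the random weights stand in for learned parameters and are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_resample(region_embeds, num_queries=4, dim=64, seed=0):
    """Map a variable number of per-region CLIP embeddings to a fixed
    number of ID tokens via a single cross-attention step: latent
    queries (random here, learned in practice) attend to the regions."""
    rng = np.random.default_rng(seed)
    n, d = region_embeds.shape
    queries = rng.standard_normal((num_queries, d))   # stand-in for learned latents
    w_q = rng.standard_normal((d, dim)) / np.sqrt(d)  # stand-in projection weights
    w_k = rng.standard_normal((d, dim)) / np.sqrt(d)
    w_v = rng.standard_normal((d, dim)) / np.sqrt(d)
    q, k, v = queries @ w_q, region_embeds @ w_k, region_embeds @ w_v
    attn = softmax(q @ k.T / np.sqrt(dim))            # (num_queries, n)
    return attn @ v                                   # (num_queries, dim)

# e.g. five facial regions (eyes, nose, mouth, ears, whole face), 768-d CLIP features
regions = np.random.default_rng(1).standard_normal((5, 768))
id_tokens = perceiver_resample(regions)
print(id_tokens.shape)  # (4, 64)
```

The point of the resampler is that the downstream network always receives the same number of ID tokens regardless of how many facial regions were detected in the reference image.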
Open source status
The model implementation is available.
The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
arXiv: https://arxiv.org/pdf/2404.16771
Github: https://github.com/JackAILab/ConsistentID
Contact: @JackAILab