A Collection of Papers and Codes for CVPR2024 AIGC
整理汇总下今年CVPR AIGC相关的论文和代码,具体如下。
欢迎star,fork和PR~
Please feel free to star, fork or PR if helpful~
CVPR2024官网:https://cvpr.thecvf.com/Conferences/2024
CVPR完整论文列表:https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers
开会时间:2024年6月17日-6月21日
论文接收公布时间:2024年2月27日
【Contents】
- 1.图像生成(Image Generation/Image Synthesis)
- 2.图像编辑(Image Editing)
- 3.视频生成(Video Generation/Image Synthesis)
- 4.视频编辑(Video Editing)
- 5.3D生成(3D Generation/3D Synthesis)
- 6.3D编辑(3D Editing)
- 7.多模态大语言模型(Multi-Modal Large Language Model)
- 8.其他多任务(Others)
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
- Paper: https://arxiv.org/abs/2403.10255
- Code:
CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
- Paper: https://arxiv.org/abs/2404.00521
- Code:
- Paper: https://arxiv.org/abs/2311.15773
- Code:
- Paper: https://arxiv.org/abs/2312.15905
- Code:
- Paper:
- Code: https://github.com/haofengl/DragNoise
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
- Paper: https://arxiv.org/abs/2311.18822
- Code: https://github.com/MoayedHajiAli/ElasticDiffusion-official
- Paper:
- Code:
- Paper:
- Code:
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
- Paper: https://arxiv.org/abs/2403.06775
- Code:
- Paper:
- Code: https://github.com/aim-uofa/FreeCustom
- Paper: https://www.cs.jhu.edu/~alanlab/Pubs24/chen2024towards.pdf
- Code: https://github.com/MrGiovanni/DiffTumor
- Paper: https://arxiv.org/abs/2404.13353
- Code:
- Paper: https://arxiv.org/abs/2311.10329
- Code: https://github.com/CodeGoat24/Face-diffuser?tab=readme-ov-file
- Paper:
- Code: https://github.com/xiefan-guo/initno
- Paper: https://arxiv.org/abs/2401.01952
- Code:
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
- Paper:
- Code: https://github.com/PanchengZhao/LAKE-RED
- Paper: https://arxiv.org/abs/2312.07330
- Code: https://github.com/cvlab-stonybrook/Large-Image-Diffusion
- Paper: https://arxiv.org/abs/2402.08654
- Code: https://github.com/ttchengab/continuous_3d_words_code/
- Paper: https://arxiv.org/abs/2311.15841
- Code:
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
- Paper:
- Code: https://github.com/ewrfcas/LeftRefill
- Paper: https://arxiv.org/abs/2404.02883
- Code:
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
- Paper: https://arxiv.org/abs/2404.15081
- Code:
- Paper:
- Code: https://github.com/cszy98/PLACE
- Paper:
- Code: https://github.com/WUyinwei-hah/RRNet
- Paper: https://arxiv.org/abs/2401.09603
- Code: https://github.com/google-research/google-research/tree/master/cmmd
- Paper: https://arxiv.org/abs/2401.08053
- Code:
- Paper: https://arxiv.org/abs/2308.09972
- Code: https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2
- Paper: https://arxiv.org/abs/2402.17563
- Code:
- Paper: https://arxiv.org/abs/2403.18978
- Code:
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
- Paper: https://arxiv.org/abs/2404.00922
- Code:
- Paper:
- Code: https://github.com/Snowfallingplum/CSD-MT
- Paper: https://arxiv.org/abs/2311.18608
- Code: https://github.com/HyelinNAM/ContrastiveDenoisingScore
- Paper:
- Code: https://github.com/HansSunY/DiffAM
- Paper: https://arxiv.org/abs/2312.10113
- Code: https://github.com/guoqincode/Focus-on-Your-Instruction
- Paper: https://arxiv.org/abs/2403.09632
- Code: https://github.com/guoqincode/Focus-on-Your-Instruction
- Paper: hhttps://arxiv.org/abs/2312.04965
- Code: https://github.com/sled-group/InfEdit
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
- Paper: https://arxiv.org/abs/2303.17546
- Code: https://github.com/YangChangHee/CVPR2024_Person-In-Place_RELEASE?tab=readme-ov-file
- Paper: https://arxiv.org/abs/2403.00483
- Code:
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
- Paper: https://arxiv.org/abs/2402.18848
- Code:
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
- Paper: https://arxiv.org/abs/2404.00234
- Code: https://github.com/taegyeong-lee/Grid-Diffusion-Models-for-Text-to-Video-Generation
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives
- Paper: https://arxiv.org/abs/2311.17590
- Code:
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
- Paper:
- Code: https://github.com/zhangguiwei610/CAMEL
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
- Paper: https://arxiv.org/abs/2312.12274
- Code: https://github.com/Peter-Kocsis/IntrinsicImageDiffusion
- Paper: https://arxiv.org/abs/2312.00063
- Code: https://github.com/yifanlu0227/ChatSim?tab=readme-ov-file
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
- Paper: https://arxiv.org/abs/2403.07773
- Code: https://github.com/zoomin-lee/SemCity?tab=readme-ov-file
- Paper:
- Code: https://github.com/Zhiyuan-R/Tiger-Diffusion
- Paper:
- Code: https://github.com/hancyran/LiDAR-Diffusion
- Paper: https://arxiv.org/abs/2404.06244
- Code:
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
- Paper: https://arxiv.org/abs/2312.02974
- Code: https://github.com/Understanding-Visual-Datasets/VisDiff
- Paper: https://arxiv.org/abs/2404.11207
- Code: https://github.com/zycheiheihei/transferable-visual-prompting
- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
- Paper: https://arxiv.org/abs/2404.16123
- Code:
- Paper: https://arxiv.org/abs/2404.00909
- Code:
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
- Paper: https://arxiv.org/abs/2403.07839
- Code:
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
- Paper: https://arxiv.org/abs/2404.09011
- Code:
- Paper: https://arxiv.org/abs/2404.01156
- Code:
- Paper: https://arxiv.org/abs/2403.12532
- Code:
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
持续更新~