Towards Practical Plug-and-Play Diffusion Models (CVPR 2023)

Official PyTorch implementation of the paper "Towards Practical Plug-and-Play Diffusion Models". This repository contains the guidance code used in the paper for 1) models finetuned on forward-diffused data, 2) the multi-expert strategy, and 3) PPAP.

This repository is based on the following repositories, with some modifications:

Plan

Release code.
Make checkpoints available.
Make PPAP data available.

Requirements

For distributed training, MPICH should be installed. The following commands install MPICH and the CLIP package.

apt install mpich
pip install git+https://github.com/openai/CLIP.git --no-deps

To install the required Python packages, use this command.

pip install -r requirements.txt

ImageNet Class Guidance for ADM

A. Prepare pre-trained diffusion models.

For the pre-trained diffusion model, we use ADM trained on the ImageNet 256x256 dataset. The checkpoint of this model is available at 256x256_diffusion_uncond.pt.

Download it and save it to the path [diffusion_path].

B. Train

Our code supports training 1) a finetuned model, 2) multi-experts, and 3) PPAP. The commands for each are given below.

Finetune off-the-shelf models on forward diffused data.

export PYTHONPATH=$PYTHONPATH:$(pwd)
MODEL_FLAGS="--iterations 300000 --anneal_lr True --batch_size 256 --lr 1e-4 --weight_decay 0.05 --save_interval 10000"
CLASSIFIER_FLAGS="--image_size 256 --classifier_name [classifier name: ResNet18, ResNet50, ResNet152, DEIT]"
python python_scripts/classifier_train.py --log_path [directory for logging] --data_dir [ImageNet1k training dataset path] --method "finetune" $MODEL_FLAGS $CLASSIFIER_FLAGS --gpus 0
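As an illustration of what "forward diffused data" means, here is a minimal sketch of the standard DDPM forward process q(x_t | x_0) under a linear noise schedule. The schedule endpoints (1e-4 and 2e-2 at 1000 steps) are the commonly used defaults and are an assumption here, as are all function names; the training script in this repository may build its noisy inputs differently.

```python
import math
import random


def linear_betas(num_steps=1000):
    # Assumed linear schedule: 1e-4 .. 2e-2 at 1000 steps, rescaled for other lengths.
    scale = 1000 / num_steps
    start, end = scale * 1e-4, scale * 2e-2
    if num_steps == 1:
        return [start]
    return [start + (end - start) * i / (num_steps - 1) for i in range(num_steps)]


def alpha_bars(betas):
    # Cumulative product of (1 - beta_s) for s <= t.
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def forward_diffuse(x0, t, abar, rng):
    # x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I).
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


abar = alpha_bars(linear_betas(1000))
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]  # stand-in for flattened pixel values
x_t = forward_diffuse(x0, 500, abar, rng)
```

At small t the sample stays close to the clean input, while at t near 1000 it is almost pure noise, which is why a classifier trained only on clean images needs finetuning to be useful for guidance.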

Multi-experts trained in a supervised manner.

export PYTHONPATH=$PYTHONPATH:$(pwd)
MODEL_FLAGS="--iterations 300000 --anneal_lr True --batch_size 256 --lr 1e-4 --weight_decay 0.05 --save_interval 10000"
CLASSIFIER_FLAGS="--image_size 256 --classifier_name [classifier name: ResNet18, ResNet50, ResNet152, DEIT]"
python python_scripts/classifier_train.py --log_path [directory for logging] --data_dir [ImageNet1k training dataset path] $MODEL_FLAGS $CLASSIFIER_FLAGS --gpus 0 --n_experts [Number of experts] --method "multi_experts"

PPAP.

To finetune off-the-shelf models with the PPAP framework, we first need to generate synthetic images from an unconditional diffusion model.

The following command generates this data from the ADM unconditional 256x256 diffusion model:

SAMPLE_FLAGS="--batch_size 100 --num_samples 500000  --timestep_respacing ddim25 --use_ddim True"
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
mpiexec -n [number of gpus] python python_scripts/generate_dataset.py --log_path [path for saving dataset] $MODEL_FLAGS $SAMPLE_FLAGS --gpus [Gpu ids] --model_path [diffusion_path]
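To estimate how long dataset generation will take, it can help to count sampling rounds. The sketch below assumes that every process draws one batch of --batch_size images per round, so a round yields batch_size times the number of processes; this mirrors the usual guided-diffusion sampling loop but has not been checked against generate_dataset.py. The function name is hypothetical.

```python
def sampling_rounds(num_samples, batch_size, world_size):
    # Each round is assumed to produce batch_size images on every process.
    if num_samples < 0 or batch_size <= 0 or world_size <= 0:
        raise ValueError("num_samples must be >= 0; batch_size and world_size must be > 0")
    per_round = batch_size * world_size
    # Integer ceiling division: the last round may overshoot num_samples.
    return -(-num_samples // per_round)


rounds_single_gpu = sampling_rounds(500000, 100, 1)
rounds_four_gpus = sampling_rounds(500000, 100, 4)
```

With the flags above, one GPU would need 5000 rounds of 25 DDIM steps each, and four GPUs would need 1250.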

Alternatively, you can download the data generated from the ADM unconditional 256x256 diffusion model from link.

Then, the following command trains PPAP on the synthetic data.

export PYTHONPATH=$PYTHONPATH:$(pwd)
MODEL_FLAGS="--iterations 300000 --anneal_lr True --batch_size 256 --lr 1e-4 --weight_decay 0.05 --save_interval 10000"
CLASSIFIER_FLAGS="--image_size 256 --classifier_name [classifier name: ResNet18, ResNet50, ResNet152, DEIT] --lora_alpha 8 --gamma 16"
python python_scripts/classifier_train.py --log_path [directory for logging] --data_dir [Synthetic data path] $MODEL_FLAGS $CLASSIFIER_FLAGS  --gpus 0 --n_experts [Number of experts] --method "ppap"
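The --lora_alpha flag and the peft directory suggest that PPAP trains LoRA-style low-rank adapters rather than full model weights. The following is a minimal sketch of a generic LoRA linear layer, y = W x + (alpha / r) B A x, written with plain lists. The rank, the matrix shapes, and the function names are illustrative assumptions, and the role of --gamma is not covered; the repository's actual adapter code may differ.

```python
def matvec(matrix, vector):
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha):
    # W: (out x in) frozen weight, A: (r x in), B: (out x r) trainable factors.
    rank = len(A)
    scale = lora_alpha / rank
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # hypothetical rank r = 1
B_zero = [[0.0], [0.0]]  # LoRA usually starts with B = 0
x = [2.0, 3.0]
y_init = lora_forward(W, A, B_zero, x, lora_alpha=8)
y_adapted = lora_forward(W, A, [[1.0], [0.0]], x, lora_alpha=8)
```

Because B starts at zero, the adapted layer initially reproduces the frozen off-the-shelf model exactly, and only the small A and B factors need to be stored per expert.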

B.1 Enabling DDP for training

If MPICH is installed, distributed data parallel (DDP) can be enabled for training. For DDP with k GPUs, divide --batch_size by k, prepend mpiexec -n k to the python command, and set the --gpus option to the IDs of the GPUs to use.

For example, the finetuning command above can be run with DDP on GPUs 0, 1, 2, and 3 with the following commands:

 export PYTHONPATH=$PYTHONPATH:$(pwd)
 MODEL_FLAGS="--iterations 300000 --anneal_lr True --batch_size 64 --lr 1e-4 --weight_decay 0.05 --save_interval 10000"
 CLASSIFIER_FLAGS="--image_size 256 --classifier_name [classifier name: ResNet18, ResNet50, ResNet152, DEIT]"
 mpiexec -n 4 python python_scripts/classifier_train.py --log_path [directory for logging] --data_dir [ImageNet1k training dataset path] --method "finetune" $MODEL_FLAGS $CLASSIFIER_FLAGS --gpus 0 1 2 3
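The three adjustments described above can be sketched as a small helper that derives the per-process batch size, the mpiexec prefix, and the --gpus arguments from a global batch size and a list of GPU IDs. The helper name and its return layout are hypothetical; it only restates the rule from this section.

```python
def ddp_launch(global_batch_size, gpu_ids):
    # One process per GPU; each process gets an equal share of the batch.
    k = len(gpu_ids)
    if k == 0:
        raise ValueError("at least one GPU id is required")
    if global_batch_size % k != 0:
        raise ValueError("global batch size must be divisible by the number of GPUs")
    per_process_batch = global_batch_size // k
    prefix = ["mpiexec", "-n", str(k)]
    gpus_flag = ["--gpus"] + [str(g) for g in gpu_ids]
    return per_process_batch, prefix, gpus_flag


batch, prefix, gpus_flag = ddp_launch(256, [0, 1, 2, 3])
```

For the single-GPU setting of --batch_size 256, this gives --batch_size 64 per process on four GPUs, matching the example command.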

C. Trained checkpoints

Model	Finetune	Multi-experts-5	PPAP-5
ResNet50	Model	experts [0, 200], [200, 400], [400, 600], [600, 800], [800, 1000]	experts [0, 200], [200, 400], [400, 600], [600, 800], [800, 1000]
DeiT-S	Model	experts [0, 200], [200, 400], [400, 600], [600, 800], [800, 1000]	experts [0, 200], [200, 400], [400, 600], [600, 800], [800, 1000]
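The checkpoint names indicate that each of the five experts covers one 200-step slice of the 1000 diffusion timesteps. A minimal sketch of routing a timestep to its expert follows, assuming equal-width, half-open intervals [start, end); how the sampling script treats the shared boundary values (200, 400, and so on) is an assumption, and both function names are hypothetical.

```python
def expert_interval(index, n_experts=5, diffusion_steps=1000):
    # Equal-width slice of the timestep range handled by expert `index`.
    if diffusion_steps % n_experts != 0:
        raise ValueError("diffusion_steps must be divisible by n_experts")
    if not 0 <= index < n_experts:
        raise ValueError("expert index out of range")
    width = diffusion_steps // n_experts
    return index * width, (index + 1) * width


def expert_index(t, n_experts=5, diffusion_steps=1000):
    # Map timestep t in [0, diffusion_steps) to the expert whose slice contains it.
    if diffusion_steps % n_experts != 0:
        raise ValueError("diffusion_steps must be divisible by n_experts")
    if not 0 <= t < diffusion_steps:
        raise ValueError("timestep out of range")
    return t // (diffusion_steps // n_experts)


routed = [expert_index(t) for t in (0, 199, 200, 999)]
```

Under this assumption, the checkpoints passed to --classifier_path would be ordered from the lowest-noise interval to the highest-noise interval.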

D. Sampling with classifier guidance

Our code supports sampling with guidance from 1) a finetuned model, 2) multi-experts, and 3) PPAP.

Finetune

export PYTHONPATH=$PYTHONPATH:$(pwd)
SAMPLE_FLAGS="--batch_size 100 --num_samples 10000  --timestep_respacing ddim25 --use_ddim True"
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
MODEL_PATH_FLAGS="--model_path [diffusion_path] --classifier_path [ckpt_path]"
python python_scripts/classifier_sample.py --log_path [sampling_path] $MODEL_FLAGS $SAMPLE_FLAGS $MODEL_PATH_FLAGS --method "finetune" --gpus 0

Multi-experts

export PYTHONPATH=$PYTHONPATH:$(pwd)
SAMPLE_FLAGS="--batch_size 100 --num_samples 10000  --timestep_respacing ddim25 --use_ddim True"
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
MODEL_PATH_FLAGS="--model_path [diffusion_path] --classifier_path [ckpt_path_0] [ckpt_path_1] ... [ckpt_path_N]"
python python_scripts/classifier_sample.py --log_path [sampling_path] $MODEL_FLAGS $SAMPLE_FLAGS $MODEL_PATH_FLAGS --method "multi_experts" --gpus 0

PPAP

export PYTHONPATH=$PYTHONPATH:$(pwd)
SAMPLE_FLAGS="--batch_size 100 --num_samples 10000  --timestep_respacing ddim25 --use_ddim True"
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
MODEL_PATH_FLAGS="--model_path [diffusion_path] --classifier_path [ckpt_path_0] [ckpt_path_1] ... [ckpt_path_N]"
python python_scripts/classifier_sample.py --log_path [sampling_path] $MODEL_FLAGS $SAMPLE_FLAGS $MODEL_PATH_FLAGS --method "ppap" --gpus 0

D.1 Sampling configuration.

DDIM: To sample with DDIM using t steps, set --timestep_respacing to ddim followed by t (for example, ddim25 for 25 steps).
DDPM: To sample with DDPM using t steps, set --timestep_respacing to t (for example, 25).
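To make the two respacing formats concrete, the sketch below selects which of the original timesteps are kept. It is modeled on the respacing logic of the guided-diffusion codebase, reduced to the two cases described here: "ddimN" uses a fixed integer stride, and a plain integer spreads the steps evenly from the first to the last timestep. Whether this repository's copy behaves identically is an assumption, and the function name is hypothetical.

```python
def spaced_timesteps(num_timesteps, respacing):
    # Return the sorted list of original timesteps kept after respacing.
    if respacing.startswith("ddim"):
        count = int(respacing[len("ddim"):])
        # DDIM-style: find an integer stride that yields exactly `count` steps.
        for stride in range(1, num_timesteps):
            if len(range(0, num_timesteps, stride)) == count:
                return list(range(0, num_timesteps, stride))
        raise ValueError(f"cannot create exactly {count} steps with an integer stride")
    count = int(respacing)
    if not 1 <= count <= num_timesteps:
        raise ValueError("step count must be between 1 and num_timesteps")
    if count == 1:
        return [0]
    # DDPM-style: fractional stride so that the first and last timesteps are kept.
    frac_stride = (num_timesteps - 1) / (count - 1)
    return sorted({round(i * frac_stride) for i in range(count)})


ddim_steps = spaced_timesteps(1000, "ddim25")
ddpm_steps = spaced_timesteps(1000, "25")
```

With 1000 diffusion steps, ddim25 keeps every 40th timestep (0, 40, ..., 960), while 25 keeps 25 timesteps spread between 0 and 999.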

D.2 DDP for sampling.

Because sampling is slow, we recommend using DDP for sampling. To use DDP with k GPUs, prepend mpiexec -n k to the python command and set the --gpus option to the IDs of the GPUs to use.

E. Evaluation

Check evaluations/Readme.md.

PPAP with various models for DeepFloyd-IF.

[Figure columns: base image, depth map, PPAP depth-guided image, PPAP depth-guided image with the "Dog" prompt]

We provide code for depth guidance with Midas for DeepFloyd-IF. DeepFloyd-IF is similar to the GLIDE model used in our paper, but it can create 1024x1024 images of higher quality than GLIDE. For this reason, the released code uses DeepFloyd-IF as the target diffusion model in order to offer high-quality images.

A. Prepare pre-trained model weight of DeepFloyd-IF.

The first step is to prepare the pretrained checkpoint of DeepFloyd-IF. Please refer to the DeepFloyd-IF repository (Link) and obtain a Hugging Face access token.

Then, set the hf_token argument of the following Python commands to your access token.

B. Generate unconditional image dataset for PPAP.

To finetune off-the-shelf models with the PPAP framework, we first need to generate synthetic images from an unconditional diffusion model. The following command generates this data from DeepFloyd-IF:

export PYTHONPATH=$PYTHONPATH:$(pwd)
mpiexec -n [number_of_gpus] python python_scripts/generate_dataset_deepfloyd.py --stage 2 --num_samples 500000 --gpus ["gpu_ids"] --log_path ["Directory for saving the dataset"] --batch_size [batch_size] --hf_token ["your token"]

C. PPAP-finetune Midas

The following command finetunes Midas as the guidance model with the PPAP framework.

export PYTHONPATH=$PYTHONPATH:$(pwd)
mpiexec -n [number_of_gpus] python python_scripts/deepfloyd_guidance_ppap.py --iterations 300000 --batch_size 64 --gpus ["gpu_ids"] --log_path ["path for logging directory"]

D. Trained checkpoints

experts [0,200] [200,400] [400,600] [600,800] [800,1000]

E. Generating samples

Please refer to deepfloyd_guidance_ppap.ipynb, which contains examples of depth guidance with PPAP.

F. Dataset used in DeepFloyd-IF PPAP

The dataset generated in "B. Generate unconditional image dataset for PPAP." can be downloaded from link.

BibTex

@inproceedings{go2023towards,
  title={Towards Practical Plug-and-Play Diffusion Models},
  author={Go, Hyojun and Lee, Yunsung and Kim, Jin-Young and Lee, Seunghyun and Jeong, Myeongho and Lee, Hyun Seung and Choi, Seungtaek},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1962--1971},
  year={2023}
}
