
training example for instruct pix2pix doesn't zero out embeds #7920

Open
bghira opened this issue May 12, 2024 · 8 comments · May be fixed by #7976
Labels
bug Something isn't working

Comments

@bghira
Contributor

bghira commented May 12, 2024

Describe the bug

When running inference on SDXL, the pipeline config (force_zeros_for_empty_prompt) specifies zeroing out the prompt embeddings when the prompt is empty. The InstructPix2Pix training example instead encodes an empty string ("") to build its null conditioning, so training and inference disagree.

Reproduction

    # Get null conditioning
    def compute_null_conditioning():
        null_conditioning_list = []
        for a_tokenizer, a_text_encoder in zip(tokenizers, text_encoders):
            null_conditioning_list.append(
                a_text_encoder(
                    tokenize_captions([""], tokenizer=a_tokenizer).to(accelerator.device),
                    output_hidden_states=True,
                ).hidden_states[-2]
            )
        return torch.concat(null_conditioning_list, dim=-1)

    null_conditioning = compute_null_conditioning()

This could likely be replaced with a probabilistic call to torch.zeros_like() inside the training loop instead.
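
For illustration, a minimal sketch of that kind of dropout, assuming SDXL-style prompt_embeds / pooled_prompt_embeds tensors; the helper name and argument names are illustrative, not taken from the training script:

    import torch

    def drop_text_conditioning(prompt_embeds, pooled_prompt_embeds, dropout_prob, generator=None):
        # Randomly zero out the text conditioning per sample, mirroring what
        # force_zeros_for_empty_prompt does for empty prompts at inference time.
        bsz = prompt_embeds.shape[0]
        random_p = torch.rand(bsz, device=prompt_embeds.device, generator=generator)
        keep = random_p >= dropout_prob
        prompt_embeds = torch.where(
            keep.reshape(bsz, 1, 1), prompt_embeds, torch.zeros_like(prompt_embeds)
        )
        pooled_prompt_embeds = torch.where(
            keep.reshape(bsz, 1), pooled_prompt_embeds, torch.zeros_like(pooled_prompt_embeds)
        )
        return prompt_embeds, pooled_prompt_embeds

Called once per step on the batch's embeds, this would make the dropped samples see the same zeroed conditioning that the SDXL pipeline uses at inference time when no negative prompt is given.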

I've checked the values of the embeds, and classifier-free guidance at inference time definitely makes use of the zero embeds and not just "", which ends up producing very different results.

Other models, though, like DeepFloyd, just use "" from e.g. T5 and behave rather differently.

Logs

No response

System Info

N/A

Who can help?

@sayakpaul

bghira added the bug label May 12, 2024
@sayakpaul
Member

Thanks for bringing this up. Would it be possible for you to show a comparison between zeroing out the way you mentioned and the existing approach?

> I've checked the values of the embeds, and classifier-free guidance at inference time definitely makes use of the zero embeds and not just "", which ends up producing very different results.

The SD IP2P pipeline uses "", though, when a negative prompt is not provided:

However, it makes use of zeros_like for the unconditional image latents:

uncond_image_latents = torch.zeros_like(image_latents)

@bghira
Contributor Author

bghira commented May 12, 2024

The base model was trained using it, so I figured aligning with the base model's training and inference yields better results.

From my own tests, I can now reduce the step count required when running the default config on the SDXL pipelines, e.g. with force_zeros_for_empty_prompt set to True.

I also see much better learning. This model started from ptx0/terminus-xl-velocity-v1, which was unable to spell.

1000 steps of tuning later:

[image: sample output]

The base model was trained using "" and never really ended up with better CFG performance... but now it does!

@bghira
Contributor Author

bghira commented May 12, 2024

See the base SDXL pipeline:

        # get unconditional embeddings for classifier free guidance
        zero_out_negative_prompt = negative_prompt is None and self.config.force_zeros_for_empty_prompt
        if do_classifier_free_guidance and negative_prompt_embeds is None and zero_out_negative_prompt:
            negative_prompt_embeds = torch.zeros_like(prompt_embeds)
            negative_pooled_prompt_embeds = torch.zeros_like(pooled_prompt_embeds)

and the config:

{
   "_class_name": "StableDiffusionXLPipeline",
   "_diffusers_version": "0.19.0.dev0",
   "force_zeros_for_empty_prompt": true
}
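
As a quick sanity check (a minimal sketch; the model id below is the standard SDXL base checkpoint, substitute whichever checkpoint you train from), the flag can be read straight off the loaded pipeline's config:

    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0"
    )
    # True for the base checkpoint, so CFG at inference uses zeroed embeds
    # when no negative prompt is supplied.
    print(pipe.config.force_zeros_for_empty_prompt)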

@sayakpaul
Member

Ah, makes a ton of sense. Do you want to take a stab at opening a PR to fix this?

@bghira
Contributor Author

bghira commented May 12, 2024

Can I also open the pull request for all of the other training examples, to add general dropout capabilities to them?

@sayakpaul
Member

We can open that up for the community. This way everyone gets to participate.

@bghira
Contributor Author

bghira commented May 12, 2024

Like the ticket for updating the fp16 error? #6231

@sayakpaul
Member

#6552
