[Performance] Keep sigmas on CPU #15823

Draft: wants to merge 1 commit into base: dev

drhead
Contributor

@drhead drhead commented May 17, 2024

Description

  • Currently, the k-diffusion sampler code creates the sigmas tensor on the GPU. Because the sampler uses the sigma values for control flow, this forces a device sync on every step.
  • I changed the sigmas to stay on the CPU. Every operation that uses the sigma values still works when they are on the CPU (much as it would with a native Python list of floats). This also lets the CPU work ahead of the GPU, since control flow no longer depends on GPU values.
  • There are still other blocking ops, so this may not give an immediate performance benefit. I'll open a separate PR for the other one that I know of.
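
To illustrate the first two points, here is a minimal sketch. It is not the webui or k-diffusion code: `toy_sampler_loop` and the sigma values are made up for illustration. It branches on sigma values each step the way a sampler loop does. With `sigmas` kept on the CPU the branch is evaluated on the host, and the 0-dim CPU values still combine with `x` on whatever device it lives on.

```python
import torch


def toy_sampler_loop(x, sigmas):
    """Hypothetical stand-in for a sampler loop that branches on sigma values."""
    steps_taken = 0
    for i in range(len(sigmas) - 1):
        # Python-level branch on a tensor element. With `sigmas` on the GPU,
        # evaluating this condition needs the value on the host, so the CPU
        # has to wait for the GPU. With `sigmas` on the CPU it does not.
        if sigmas[i + 1] == 0:
            break
        dt = sigmas[i + 1] - sigmas[i]  # 0-dim CPU tensor
        # A 0-dim CPU tensor is treated like a scalar here, so this works
        # even when `x` is a CUDA tensor.
        x = x + x * dt
        steps_taken += 1
    return x, steps_taken


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.ones(4, device=device)
sigmas = torch.tensor([8.0, 4.0, 2.0, 1.0, 0.0])  # stays on the CPU

out, steps_taken = toy_sampler_loop(x, sigmas)
print(steps_taken, out.device)
```

The loop stops at the trailing zero sigma, and `out` stays on the same device as `x` even though `sigmas` never left the CPU.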


@Panchovix

Panchovix commented May 17, 2024

It seems it interferes with #15751?

I get

Traceback (most recent call last):
      File "G:\Stable difussion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 839, in process_images
        res = process_images_inner(p)
              ^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 975, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 1322, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 181, in sample
        sigmas = self.get_sigmas(p, steps)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 118, in get_sigmas
        sigmas = scheduler.function(n=steps, **sigmas_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: get_align_your_steps_sigmas() missing 1 required positional argument: 'device'

---

@drhead
Contributor Author

drhead commented May 17, 2024

> It seems it interferes with #15751?

That is a problem on their end, and can be resolved by not sending the sigmas to the device.

LoganBooker added a commit to LoganBooker/stable-diffusion-webui that referenced this pull request May 17, 2024
* Consistent with implementations in k-diffusion.
* Makes this compatible with AUTOMATIC1111#15823
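
The `TypeError` in the first traceback comes down to a scheduler function that requires `device` being called without it. As a minimal sketch of that kind of compatibility fix (the function names and the linear schedule below are hypothetical, not the code from #15751), giving `device` a default of `'cpu'`, which is how k-diffusion's own `get_sigmas_karras` declares it, lets a caller omit the argument:

```python
def strict_scheduler(n, sigma_min, sigma_max, device):
    """Hypothetical scheduler whose `device` parameter is required."""
    step = (sigma_max - sigma_min) / (n - 1)
    # A real scheduler would build a tensor on `device`; a plain list keeps
    # this sketch dependency-free.
    return [sigma_max - i * step for i in range(n)] + [0.0]


def tolerant_scheduler(n, sigma_min, sigma_max, device="cpu"):
    """Same schedule, but `device` defaults to 'cpu' so callers may omit it."""
    return strict_scheduler(n, sigma_min, sigma_max, device)


sigmas_kwargs = {"sigma_min": 1.0, "sigma_max": 5.0}  # note: no 'device' key

try:
    strict_scheduler(n=5, **sigmas_kwargs)
    strict_failed = False
except TypeError as err:
    strict_failed = True
    print(err)

sigmas = tolerant_scheduler(n=5, **sigmas_kwargs)
print(sigmas)
```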
@Panchovix

I was testing with hr-fix and it seems it doesn't work?

Traceback (most recent call last):
      File "G:\Stable difussion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 839, in process_images
        res = process_images_inner(p)
              ^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 975, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 1338, in sample
        return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 1423, in sample_hr_pass
        samples = self.sampler.sample_img2img(self, samples, noise, self.hr_c, self.hr_uc, steps=self.hr_second_pass_steps or self.steps, image_conditioning=image_conditioning)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 172, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_common.py", line 272, in launch_sampling
        return func()
               ^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 172, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_extra.py", line 71, in restart_sampler
        x = heun_step(x, old_sigma, new_sigma)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_extra.py", line 20, in heun_step
        d = to_d(x, old_sigma, denoised)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 48, in to_d
        return (x - denoised) / utils.append_dims(sigma, x.ndim)
               ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

---

It only happens when the hi-res pass starts generating (after the tiled upscale process).

@drhead
Contributor Author

drhead commented May 18, 2024

> I was testing with hr-fix and it seems it doesn't work?

That's not a hi-res fix issue; it's a k-diffusion issue specific to that sampler (Euler and Heun at least are affected; DPM++ 2M works fine). I'll have to work on an upstream patch for k-diffusion before this fix is viable. Marking as draft until then.
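
A rough sketch of why this fails, going by the last frame of the traceback: `to_d` passes sigma through `append_dims`, and the result is no longer 0-dim. PyTorch lets a 0-dim CPU tensor mix with CUDA tensors, but not a CPU tensor that has dimensions. The `append_dims` below is a re-implementation for illustration and is assumed, not verified, to behave like the k-diffusion helper of the same name.

```python
import torch


def append_dims(t, target_dims):
    """Append trailing singleton dims until `t` has `target_dims` dims.

    Illustrative re-implementation, not copied from k-diffusion.
    """
    dims_to_append = target_dims - t.ndim
    if dims_to_append < 0:
        raise ValueError("input has more dims than target_dims")
    return t[(...,) + (None,) * dims_to_append]


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.ones(1, 4, 8, 8, device=device)
sigma = torch.tensor(2.0)  # 0-dim, on the CPU

# Fine on any device: a 0-dim CPU tensor acts like a scalar.
scalar_result = x / sigma

# Shape (1, 1, 1, 1), still on the CPU.
expanded = append_dims(sigma, x.ndim)
try:
    # Raises when `x` is on CUDA, because the operands are on different devices.
    expanded_result = x / expanded
except RuntimeError as err:
    expanded_result = None
    print(err)
```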

@drhead drhead marked this pull request as draft May 18, 2024 19:36
@drhead
Contributor Author

drhead commented May 19, 2024

So, for this to be resolved, either the upstream PR (crowsonkb/k-diffusion#109) needs to be merged or I have to monkey-patch the function. This probably isn't going to be a merge candidate any time soon, so I'll give the upstream PR some time.
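
For reference, a monkey patch along these lines might look like the sketch below. It is an assumption about one possible shape of the workaround, not the contents of crowsonkb/k-diffusion#109: `to_d_patched` and the `sampling` stand-in namespace are hypothetical, `append_dims` is an illustrative re-implementation, and in the webui the patch target would be the real `k_diffusion.sampling` module. The patched function skips `append_dims` when sigma is 0-dim, so a CPU sigma stays a scalar.

```python
import types

import torch


def append_dims(t, target_dims):
    """Append trailing singleton dims (illustrative re-implementation)."""
    dims_to_append = target_dims - t.ndim
    if dims_to_append < 0:
        raise ValueError("input has more dims than target_dims")
    return t[(...,) + (None,) * dims_to_append]


def to_d_original(x, sigma, denoised):
    """Same expression as the `to_d` line in the traceback."""
    return (x - denoised) / append_dims(sigma, x.ndim)


def to_d_patched(x, sigma, denoised):
    """Hypothetical patch: leave 0-dim sigmas as scalars so they can stay on the CPU."""
    if sigma.ndim == 0:
        return (x - denoised) / sigma
    # Batched sigmas still need explicit dims and must sit on the same device as x.
    return (x - denoised) / append_dims(sigma, x.ndim).to(x.device)


# Stand-in for the module being patched (k_diffusion.sampling in the webui).
sampling = types.SimpleNamespace(to_d=to_d_original)
sampling.to_d = to_d_patched  # the monkey patch itself

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.full((1, 4, 8, 8), 3.0, device=device)
denoised = torch.ones_like(x)
sigma = torch.tensor(2.0)  # 0-dim, on the CPU

d = sampling.to_d(x, sigma, denoised)
print(d.shape, d.device)
```

Any code that ran `from k_diffusion.sampling import to_d` before the patch is applied keeps a reference to the old function, so a patch like this has to be installed before such imports execute.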
