[Performance] Keep sigmas on CPU #15823

drhead · 2024-05-17T14:40:50Z

Description

Currently, the k-diffusion sampler code creates the sigmas tensor on the GPU. Because the sigma values are used for control flow within the sampler code, this forces a device sync on every step.
I changed the sigmas to stay on the CPU. Every operation that uses the sigma values still works when they are on the CPU (much like it would if you had a native Python list of floats). This also lets the CPU work ahead of the GPU, since control flow no longer depends on values held on the GPU.
There are still other blocking ops, so this may not give an immediate performance benefit. I'll open a separate PR for the other one that I know of.
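
To illustrate the control-flow point, here is a minimal sketch (not the k-diffusion code) of a Heun-style loop over a scalar, with the sigmas held as plain Python floats. The names `heun_like_sample` and `toy_denoise` are hypothetical. The branch on `sigma_next == 0` is the kind of check that, with a CUDA tensor, has to copy its result back to the host and therefore waits for queued GPU work; with host-side values it is an ordinary comparison.

```python
def heun_like_sample(denoise, x, sigmas):
    """Toy Heun-style sampler on a scalar x; sigmas is a list of floats."""
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        d = (x - denoise(x, sigma)) / sigma
        dt = sigma_next - sigma
        if sigma_next == 0:
            # Host-side comparison: no device round trip is needed here.
            x = x + d * dt  # plain Euler step for the final sigma
        else:
            # Second-order correction using the slope at the next sigma.
            x_2 = x + d * dt
            d_2 = (x_2 - denoise(x_2, sigma_next)) / sigma_next
            x = x + (d + d_2) / 2 * dt
    return x


def toy_denoise(x, sigma):
    """Stand-in for the model: always predicts a clean value of 0."""
    return 0.0


sigmas = [14.6, 7.0, 3.0, 1.0, 0.0]
result = heun_like_sample(toy_denoise, 10.0, sigmas)
```

In the real sampler `x` is a GPU tensor and only the schedule values live on the host, so the loop can keep queueing GPU work without waiting on each branch.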

Checklist:

I have read contributing wiki page
I have performed a self-review of my own code
My code follows the style guidelines
My code passes tests

Panchovix · 2024-05-17T18:21:39Z

It seems it interferes with #15751?

I get

Traceback (most recent call last):
      File "G:\Stable difussion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 839, in process_images
        res = process_images_inner(p)
              ^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 975, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 1322, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 181, in sample
        sigmas = self.get_sigmas(p, steps)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 118, in get_sigmas
        sigmas = scheduler.function(n=steps, **sigmas_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: get_align_your_steps_sigmas() missing 1 required positional argument: 'device'

---

drhead · 2024-05-17T18:24:14Z

> It seems it interferes with #15751?

That is a problem on their end, and can be resolved by not sending the sigmas to the device.
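
The `TypeError` in the traceback comes from a scheduler function that requires a `device` argument being called without one. A minimal sketch of the kind of fix described here, assuming hypothetical function names and a simple linear schedule (this is not the code from #15751), is to give `device` a CPU default so callers that no longer pass it keep working:

```python
def linear_sigmas_required_device(n, sigma_min, sigma_max, device):
    """Hypothetical scheduler that insists on a device argument."""
    step = (sigma_max - sigma_min) / (n - 1)
    values = [sigma_max - i * step for i in range(n)] + [0.0]
    return {"device": device, "values": values}


def linear_sigmas(n, sigma_min, sigma_max, device="cpu"):
    """Same schedule, but the device defaults to the CPU."""
    return linear_sigmas_required_device(n, sigma_min, sigma_max, device)


def get_sigmas(scheduler, steps, **sigmas_kwargs):
    """Caller that does not pass a device, as in the traceback."""
    return scheduler(n=steps, **sigmas_kwargs)


try:
    get_sigmas(linear_sigmas_required_device, 4, sigma_min=0.1, sigma_max=10.0)
    raised = False
except TypeError:
    raised = True  # missing required argument 'device'

sigmas = get_sigmas(linear_sigmas, 4, sigma_min=0.1, sigma_max=10.0)
```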


Panchovix · 2024-05-18T18:35:31Z

I was testing with hr-fix and it seems it doesn't work?

Traceback (most recent call last):
      File "G:\Stable difussion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 839, in process_images
        res = process_images_inner(p)
              ^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 975, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 1338, in sample
        return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\processing.py", line 1423, in sample_hr_pass
        samples = self.sampler.sample_img2img(self, samples, noise, self.hr_c, self.hr_uc, steps=self.hr_second_pass_steps or self.steps, image_conditioning=image_conditioning)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 172, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_common.py", line 272, in launch_sampling
        return func()
               ^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 172, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_extra.py", line 71, in restart_sampler
        x = heun_step(x, old_sigma, new_sigma)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\modules\sd_samplers_extra.py", line 20, in heun_step
        d = to_d(x, old_sigma, denoised)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "G:\Stable difussion\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 48, in to_d
        return (x - denoised) / utils.append_dims(sigma, x.ndim)
               ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

---

It only happens when the hi-res pass starts to generate (after the tiled upscale process).

drhead · 2024-05-18T19:36:09Z

> I was testing with hr-fix and it seems it doesn't work?

That's not a hi-res fix issue; it's a k-diffusion issue specific to that sampler (Euler and Heun at least; DPM++ 2M works fine). I'll have to work on an upstream patch for k-diffusion before this fix is viable. Marking as draft until then.

drhead · 2024-05-19T07:13:36Z

So, for this to be resolved, either the upstream PR (crowsonkb/k-diffusion#109) needs to be merged or I have to monkey patch the function. This PR probably isn't going to be a merge candidate any time soon, so I'll give the upstream PR some time.

Keep sigmas on CPU

01491d3

drhead requested a review from AUTOMATIC1111 as a code owner May 17, 2024 14:40

LoganBooker added a commit to LoganBooker/stable-diffusion-webui that referenced this pull request May 17, 2024

Default device for sigma tensor to CPU

1d74482

* Consistent with implementations in k-diffusion.
* Makes this compatible with AUTOMATIC1111#15823

drhead marked this pull request as draft May 18, 2024 19:36
