
[Performance 6/6] Add --precision half option to avoid casting during inference #15820

Merged (3 commits) · Jun 8, 2024

Conversation

huchenlei
Contributor

Description

According to lllyasviel/stable-diffusion-webui-forge#716 (comment), casting during inference is a major source of performance overhead. ComfyUI and Forge by default do fp16 inference without any casting, i.e. all tensors are already fp16 before inference. The casting overhead is ~50 ms/it.

This PR adds a --precision half option to disable autocasting and run inference entirely in fp16.
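
For illustration, here is a minimal PyTorch sketch (not code from this PR) contrasting autocast inference with casting everything to fp16 up front; tiny_unet below is a stand-in model used only for this example:

import torch
import torch.nn as nn

# Stand-in model and input; the real unet is of course much larger.
tiny_unet = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.SiLU(), nn.Conv2d(8, 4, 3, padding=1))
x = torch.randn(1, 4, 64, 64)

if torch.cuda.is_available():
    tiny_unet, x = tiny_unet.cuda(), x.cuda()

    # Autocast path: weights keep their stored dtype and each op casts its
    # inputs on the fly, which adds per-op overhead on every sampling step.
    with torch.autocast("cuda", dtype=torch.float16):
        out = tiny_unet(x)

    # --precision half path: cast weights and inputs to fp16 once, then run
    # the forward pass with no casting at all.
    tiny_unet = tiny_unet.half()
    out = tiny_unet(x.half())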

Screenshots/videos:

[screenshot attached]


@SLAPaper

Will force-fp16 mode conflict with fp8 unet?

@AG-w

AG-w commented May 17, 2024

I'm not sure if this is related to using dynamic LoRA weights, but I got this error:

      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "H:\AItest\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 522, in network_Conv2d_forward
        return originals.Conv2d_forward(self, input)
      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same

Wonder if it's related to this: #12205

@feffy380

Enabling --precision half breaks SD1.5 with the error mentioned above.

@feffy380

feffy380 commented May 17, 2024

Found the offending line. In ldm's openaimodel.py, at L795 in the UNetModel class, we have:

        h = x.type(self.dtype)

while in sgm it is simply:

        # h = x.type(self.dtype)
        h = x

self.dtype is set when the model is constructed with use_fp16. When enabling force_fp16, we need to make sure the model's dtype is set to fp16. The fact that it works with SDXL is purely accidental, due to the missing cast.
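
For reference, here is a paraphrased sketch (not the actual source) of the relevant part of ldm's UNetModel:

import torch

class UNetModelSketch:
    def __init__(self, use_fp16: bool = False):
        # The dtype chosen here is what `h = x.type(self.dtype)` casts to in forward().
        self.dtype = torch.float16 if use_fp16 else torch.float32

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # In a model built with use_fp16=False, this cast turns fp16 inputs back
        # into fp32 while the weights stay half, producing the dtype mismatch errors reported here.
        h = x.type(self.dtype)
        return h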

I don't know if it's the appropriate place to put it, but setting use_fp16 in sd_models.repair_config fixed SD1.5 inference with this PR for me.

@AG-w

AG-w commented May 17, 2024

> I don't know if it's the appropriate place to put it, but setting use_fp16 in sd_models.repair_config fixed SD1.5 inference with this PR for me.

something like this?

def repair_config(sd_config):

    if not hasattr(sd_config.model.params, "use_ema"):
        sd_config.model.params.use_ema = False

    if hasattr(sd_config.model.params, 'unet_config'):
        if shared.cmd_opts.no_half:
            sd_config.model.params.unet_config.params.use_fp16 = False
        elif shared.cmd_opts.upcast_sampling or shared.cmd_opts.precision == "half":
            # build the unet with use_fp16 so self.dtype is fp16 and ldm's
            # `h = x.type(self.dtype)` no longer casts activations back to fp32
            sd_config.model.params.unet_config.params.use_fp16 = True

This does fix the dtype mismatch error.

@huchenlei
Contributor Author

Thanks for digging out the solution! Verified that the solution works.

@ThereforeGames
Contributor

I'm still getting the following runtime error with both SDXL and SD15 models:

      File "T:\code\python\automatic-stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 984, in forward
        emb = self.time_embed(t_emb)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
        input = module(input)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 508, in network_Linear_forward
        return originals.Linear_forward(self, input)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
        return F.linear(input, self.weight, self.bias)
    RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

Seems to be related to --precision half. Anyone else getting this?

@huchenlei
Contributor Author

Can you share which model you used? I am not sure whether weights are cast to fp16 before inference if you load a full-precision model. The models I tested are already half precision.
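
(If it helps, here is a quick way to check what precision a checkpoint is stored in, using the safetensors library and a hypothetical local path:)

from safetensors.torch import load_file

state_dict = load_file("v1-5-pruned.safetensors")  # hypothetical local path
# A full-precision checkpoint typically prints {torch.float32}; a pruned fp16 one {torch.float16}.
print({tensor.dtype for tensor in state_dict.values()})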

@ThereforeGames
Contributor

ThereforeGames commented May 17, 2024

> Can you share which model you used? I am not sure whether weights are cast to fp16 before inference if you load a full-precision model. The models I tested are already half precision.

Sure, I tried a few:

  • anyloraCheckpoint_bakedvaeBlessedFp16.safetensors [ef49fbb25f]
  • v1-5-pruned.safetensors [1a189f0be6]
  • cyberrealisticPony_v20a.safetensors [41e77f7657]

Same error regardless of checkpoint. It probably has something to do with my environment, although I'm not sure what yet. Here's a bit more context:

  • All extensions disabled aside from built-ins.
  • Not using any LoRAs or extra networks.
  • Tried a bunch of different samplers and schedulers.
  • Using commandline args: --precision half --ckpt-dir "S:/stable_diffusion/checkpoints" --lora-dir "S:/stable_diffusion/lora"
  • Installed via your bundle PR

I'll write back if I figure out the cause.

@Arvamer

Arvamer commented May 18, 2024

I've tested this on a 6700 XT and there is a performance improvement. However, I think this should not disallow setting --no-half-vae. On my card, running the VAE in fp16 always produces black images, so the only way to get correct images with --precision half is to enable NaN checks and rely on A1111's automatic fallback to fp32 VAE decoding, which negates some of the performance gains from this PR.
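
A rough sketch of the fallback pattern described here (not webui's actual implementation), assuming a callable VAE module and a latents tensor:

import torch

def decode_with_fallback(vae: torch.nn.Module, latents: torch.Tensor) -> torch.Tensor:
    # Try the fast fp16 decode first; if the output contains NaNs (a black image),
    # redo the decode in fp32, which is roughly what the automatic fallback amounts to.
    image = vae.half()(latents.half())
    if torch.isnan(image).any():
        image = vae.float()(latents.float())
    return image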

AG-w referenced this pull request in SLAPaper/sd-webui-negpip Jun 6, 2024
AUTOMATIC1111 merged commit 33b73c4 into AUTOMATIC1111:dev on Jun 8, 2024
3 checks passed