
[Performance 6/6] Add --precision half option to avoid casting during inference #15820

Merged (3 commits) · Jun 8, 2024

Conversation

huchenlei
Contributor

Description

According to lllyasviel/stable-diffusion-webui-forge#716 (comment), casting during inference is a major source of performance overhead. ComfyUI and Forge by default do fp16 inference without any casting, i.e. all tensors are already fp16 before inference. The casting overhead is ~50 ms/it.

This PR adds a --precision half option to disable autocasting and run inference entirely in fp16.
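
For illustration, here is a minimal PyTorch sketch (not code from this PR) contrasting autocast inference with casting everything to fp16 up front; tiny_unet below is a stand-in model used only for this example:

import torch
import torch.nn as nn

# Stand-in model and input; the real unet is of course much larger.
tiny_unet = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.SiLU(), nn.Conv2d(8, 4, 3, padding=1))
x = torch.randn(1, 4, 64, 64)

if torch.cuda.is_available():
    tiny_unet, x = tiny_unet.cuda(), x.cuda()

    # Autocast path: weights keep their stored dtype and each op casts its
    # inputs on the fly, which adds per-op overhead on every sampling step.
    with torch.autocast("cuda", dtype=torch.float16):
        out = tiny_unet(x)

    # --precision half path: cast weights and inputs to fp16 once, then run
    # the forward pass with no casting at all.
    tiny_unet = tiny_unet.half()
    out = tiny_unet(x.half())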

Screenshots/videos:

[screenshot attached]


@SLAPaper

Will force-fp16 mode conflict with fp8 unet?

@AG-w

AG-w commented May 17, 2024

I'm not sure if this is related to using dynamic LoRA weights, but I got this error:

      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "H:\AItest\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 522, in network_Conv2d_forward
        return originals.Conv2d_forward(self, input)
      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same

Wonder if it's related to this: #12205

@feffy380

Enabling --precision half breaks SD1.5 with the error mentioned above.

@feffy380

feffy380 commented May 17, 2024

Found the offending line. In ldm's openaimodel.py, at L795 in the UNetModel class, we have:

        h = x.type(self.dtype)

while in sgm it is simply:

        # h = x.type(self.dtype)
        h = x

self.dtype is set when the model is constructed with use_fp16. When enabling force_fp16, we need to make sure the model's dtype is set to fp16. The fact that it works with SDXL is purely accidental, due to the missing cast.
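
For reference, here is a paraphrased sketch (not the actual source) of the relevant part of ldm's UNetModel:

import torch

class UNetModelSketch:
    def __init__(self, use_fp16: bool = False):
        # The dtype chosen here is what `h = x.type(self.dtype)` casts to in forward().
        self.dtype = torch.float16 if use_fp16 else torch.float32

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # In a model built with use_fp16=False, this cast turns fp16 inputs back
        # into fp32 while the weights stay half, producing the dtype mismatch errors reported here.
        h = x.type(self.dtype)
        return h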

I don't know if it's the appropriate place to put it, but setting use_fp16 in sd_models.repair_config fixed SD1.5 inference with this PR for me.

@AG-w

AG-w commented May 17, 2024

> I don't know if it's the appropriate place to put it, but setting use_fp16 in sd_models.repair_config fixed SD1.5 inference with this PR for me.

something like this?

def repair_config(sd_config):

    if not hasattr(sd_config.model.params, "use_ema"):
        sd_config.model.params.use_ema = False

    if hasattr(sd_config.model.params, 'unet_config'):
        if shared.cmd_opts.no_half:
            sd_config.model.params.unet_config.params.use_fp16 = False
        elif shared.cmd_opts.upcast_sampling or shared.cmd_opts.precision == "half":
            # build the unet with use_fp16 so self.dtype is fp16 and ldm's
            # `h = x.type(self.dtype)` no longer casts activations back to fp32
            sd_config.model.params.unet_config.params.use_fp16 = True

This does fix the dtype mismatch error.

@huchenlei
Contributor Author

Thanks for digging out the solution! Verified that the solution works.

@ThereforeGames
Contributor

I'm still getting the following runtime error with both SDXL and SD15 models:

      File "T:\code\python\automatic-stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 984, in forward
        emb = self.time_embed(t_emb)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
        input = module(input)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 508, in network_Linear_forward
        return originals.Linear_forward(self, input)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
        return F.linear(input, self.weight, self.bias)
    RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

Seems to be related to --precision half. Anyone else getting this?

@huchenlei
Contributor Author

Can you share which model you used? I am not sure whether weights are cast to fp16 before inference if you load a full-precision model. The models I tested are already half precision.
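
(If it helps, here is a quick way to check what precision a checkpoint is stored in, using the safetensors library and a hypothetical local path:)

from safetensors.torch import load_file

state_dict = load_file("v1-5-pruned.safetensors")  # hypothetical local path
# A full-precision checkpoint typically prints {torch.float32}; a pruned fp16 one {torch.float16}.
print({tensor.dtype for tensor in state_dict.values()})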

@ThereforeGames
Contributor

ThereforeGames commented May 17, 2024

> Can you share which model you used? I am not sure whether weights are cast to fp16 before inference if you load a full-precision model. The models I tested are already half precision.

Sure, I tried a few:

  • anyloraCheckpoint_bakedvaeBlessedFp16.safetensors [ef49fbb25f]
  • v1-5-pruned.safetensors [1a189f0be6]
  • cyberrealisticPony_v20a.safetensors [41e77f7657]

Same error regardless of checkpoint. It probably has something to do with my environment, although I'm not sure what yet. Here's a bit more context:

  • All extensions disabled aside from built-ins.
  • Not using any LoRAs or extra networks.
  • Tried a bunch of different samplers and schedulers.
  • Using commandline args: --precision half --ckpt-dir "S:/stable_diffusion/checkpoints" --lora-dir "S:/stable_diffusion/lora"
  • Installed via your bundle PR

I'll write back if I figure out the cause.

@Arvamer

Arvamer commented May 18, 2024

I've tested this on a 6700 XT and there is a performance improvement. However, I think this should not disallow setting --no-half-vae. On my card, running the VAE in fp16 always produces black images, so the only way to get correct images with --precision half is to enable NaN checks and rely on A1111's automatic fallback to fp32 VAE decoding, which negates some of the performance gains from this PR.
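
A rough sketch of the fallback pattern described here (not webui's actual implementation), assuming a callable VAE module and a latents tensor:

import torch

def decode_with_fallback(vae: torch.nn.Module, latents: torch.Tensor) -> torch.Tensor:
    # Try the fast fp16 decode first; if the output contains NaNs (a black image),
    # redo the decode in fp32, which is roughly what the automatic fallback amounts to.
    image = vae.half()(latents.half())
    if torch.isnan(image).any():
        image = vae.float()(latents.float())
    return image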

AG-w referenced this pull request in SLAPaper/sd-webui-negpip Jun 6, 2024
AUTOMATIC1111 merged commit 33b73c4 into AUTOMATIC1111:dev on Jun 8, 2024
3 checks passed