Much higher RAM usage (2-3 times) compared to FastSDCPU when using the exact same models/settings #261

JohnAlcatraz · 2024-05-12T02:51:46Z

Currently, stable-diffusion.cpp uses much more RAM than https://github.com/rupeshs/fastsdcpu (written in Python) to produce the same result.

I compared the Dreamshaper LCM model + TAESD at 5 steps and a resolution of 512x512 on stable-diffusion.cpp vs FastSDCPU, running on the CPU.

The speed is identical in both projects: I get ~4.4 s/it with each.

But stable-diffusion.cpp peaks at 2 GB of RAM (1.6 GB with flash attention enabled), while FastSDCPU peaks at only 700 MB. So stable-diffusion.cpp needs 2-3x more RAM for the same result.
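To reproduce peak-RAM numbers like these on Linux or macOS, one option is to run each tool as a child process and read its peak resident set size. The sketch below is a hypothetical helper (`peak_rss_mib` is not part of either project); it relies only on Python's standard `resource` and `subprocess` modules, and you pass it the exact command line you use for each tool.

```python
import resource
import subprocess
import sys


def peak_rss_mib(cmd: list[str]) -> float:
    """Hypothetical helper: run `cmd` to completion and return peak RSS in MiB.

    RUSAGE_CHILDREN reports the ru_maxrss of the largest waited-for child,
    so measure each tool from a fresh interpreter to get a clean number.
    The `resource` module is Unix-only.
    """
    subprocess.run(cmd, check=True)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in KiB, macOS reports it in bytes.
    scale = 1 if sys.platform == "darwin" else 1024
    return maxrss * scale / (1024 * 1024)
```

Because `RUSAGE_CHILDREN` keeps the maximum over all finished children, a second measurement in the same interpreter can only report a value at least as large as the first. With the figures above, 2 GB / 700 MB ≈ 2.9x and 1.6 GB / 700 MB ≈ 2.3x, which is where the 2-3x range comes from.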

It looks like significant optimizations should be possible in stable-diffusion.cpp to make it much more memory-efficient.

FSSRepo · 2024-05-13T13:20:32Z

Currently, convolutions are implemented with im2col, which consumes a very large amount of RAM during the VAE phase.
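For context, im2col turns a convolution into one large matrix multiplication by copying every k x k input patch into a column of a temporary matrix with `C_in*k*k` rows and `H_out*W_out` columns. The pure-Python sketch below (all names are illustrative, not stable-diffusion.cpp or ggml code) shows the transform and how large that buffer gets.

```python
def im2col(x, k, pad=1, stride=1):
    """x is a nested list [C][H][W]. Returns (cols, H_out, W_out), where cols
    is a (C*k*k) x (H_out*W_out) matrix stored as a list of rows."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    H_out = (H + 2 * pad - k) // stride + 1
    W_out = (W + 2 * pad - k) // stride + 1
    cols = []
    for c in range(C):
        for ky in range(k):
            for kx in range(k):
                row = []
                for oy in range(H_out):
                    for ox in range(W_out):
                        iy = oy * stride + ky - pad
                        ix = ox * stride + kx - pad
                        inside = 0 <= iy < H and 0 <= ix < W
                        # Zero padding is materialized explicitly in the buffer.
                        row.append(x[c][iy][ix] if inside else 0.0)
                cols.append(row)
    return cols, H_out, W_out


def conv2d_im2col(x, w, pad=1, stride=1):
    """w is [C_out][C_in][k][k]. Computes W_flat (C_out x C_in*k*k) @ cols."""
    k = len(w[0][0])
    cols, H_out, W_out = im2col(x, k, pad, stride)
    out = []
    for oc in range(len(w)):
        # Flatten weights in the same (c, ky, kx) order as the rows of cols.
        w_flat = [v for ic in w[oc] for r in ic for v in r]
        flat = [sum(wv * col[j] for wv, col in zip(w_flat, cols))
                for j in range(H_out * W_out)]
        out.append([flat[oy * W_out:(oy + 1) * W_out] for oy in range(H_out)])
    return out


def im2col_bytes(c_in, k, h_out, w_out, bytes_per_elem):
    """Size of the temporary im2col buffer for one convolution."""
    return c_in * k * k * h_out * w_out * bytes_per_elem
```

Assuming, for illustration, a 3x3 convolution over 128 channels at 512x512 resolution (roughly the scale of the final stages of an SD VAE decoder), `im2col_bytes(128, 3, 512, 512, 4)` gives 1,207,959,552 bytes (about 1.1 GiB) in f32, or half that in f16. That is k² = 9 times the size of the input activations themselves.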

I have been working on a kernel that fuses im2col with the matrix multiplication to avoid materializing large amounts of data in memory, although this comes with a 40% performance reduction. So far, I am only doing this for CUDA; on CPU it will be more difficult and will likely hurt performance.
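A fused approach of this kind is commonly called implicit GEMM: the kernel runs the same GEMM loops but computes each element of the virtual im2col matrix from the input tensor on demand, so the `C_in*k*k` x `H_out*W_out` buffer never exists. The Python sketch below illustrates only the indexing; it is not the CUDA kernel from this thread, and `conv2d_implicit_gemm` is a hypothetical name.

```python
def conv2d_implicit_gemm(x, w, pad=1, stride=1):
    """x is [C_in][H][W], w is [C_out][C_in][k][k]. Same result as
    im2col + GEMM, but no im2col buffer is ever allocated."""
    C_in, H, W = len(x), len(x[0]), len(x[0][0])
    C_out, k = len(w), len(w[0][0])
    H_out = (H + 2 * pad - k) // stride + 1
    W_out = (W + 2 * pad - k) // stride + 1
    K = C_in * k * k  # GEMM reduction dimension (rows of the virtual matrix)
    out = [[[0.0] * W_out for _ in range(H_out)] for _ in range(C_out)]
    for oc in range(C_out):
        for oy in range(H_out):
            for ox in range(W_out):  # GEMM column j = oy * W_out + ox
                acc = 0.0
                for r in range(K):
                    # Decode row r of the virtual im2col matrix into (c, ky, kx).
                    c, rem = divmod(r, k * k)
                    ky, kx = divmod(rem, k)
                    iy = oy * stride + ky - pad
                    ix = ox * stride + kx - pad
                    # Padding is handled by skipping, not by storing zeros.
                    if 0 <= iy < H and 0 <= ix < W:
                        acc += w[oc][c][ky][kx] * x[c][iy][ix]
                out[oc][oy][ox] = acc
    return out
```

The per-element index decoding and bounds checks replace a contiguous read from a precomputed buffer, which is one plausible reason a fused kernel can be slower than im2col followed by a highly tuned GEMM. The 40% figure above is the author's measurement, not something this sketch reproduces.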

JohnAlcatraz · 2024-05-13T14:51:14Z

> Currently, im2col is being used for convolutions, which consumes a very high amount of RAM during the VAE phase.

But I ran my comparison with TAESD instead of the VAE, so doesn't that mean the VAE isn't used at all? TAESD is already super lightweight.
