You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently stable-diffusion.cpp seems to have a too high RAM usage compared to https://github.com/rupeshs/fastsdcpu (written in Python) for the same result.
I compared the Dreamshaper LCM model + TAESD at 5 steps and a resolution of 512x512 on stable-diffusion.cpp vs FastSDCPU, running on the CPU.
The speed is fully identical between both projects, I get ~4.4 s/it with both projects.
But stable-diffusion.cpp uses a peak of 2 GB RAM, or 1.6 GB with flash attention enabled, while FastSDCPU only uses a peak of 700 MB RAM. So stable-diffusion.cpp needs between 2-3x more RAM for the same result.
It looks like some significant optimizations would be possible in stable-diffusion.cpp that make it much more memory efficient.
The text was updated successfully, but these errors were encountered:
Currently, im2col is being used for convolutions, which consumes a very high amount of RAM during the VAE phase.
I have been working on a kernel that merges im2col and matrix multiplications to avoid materializing a lot of data in memory, although that entails a 40% performance reduction. So far, I am only doing this for CUDA; for CPU it will be more difficult and will likely have a negative impact on performance.
Currently stable-diffusion.cpp seems to have a too high RAM usage compared to https://github.com/rupeshs/fastsdcpu (written in Python) for the same result.
I compared the Dreamshaper LCM model + TAESD at 5 steps and a resolution of 512x512 on stable-diffusion.cpp vs FastSDCPU, running on the CPU.
The speed is fully identical between both projects, I get ~4.4 s/it with both projects.
But stable-diffusion.cpp uses a peak of 2 GB RAM, or 1.6 GB with flash attention enabled, while FastSDCPU only uses a peak of 700 MB RAM. So stable-diffusion.cpp needs between 2-3x more RAM for the same result.
It looks like some significant optimizations would be possible in stable-diffusion.cpp that make it much more memory efficient.
The text was updated successfully, but these errors were encountered: