Why is the loss of Diffusion model calculated between “RANDOM noise” and “model predicted noise”? #118

Open
egshkim opened this issue May 29, 2023 · 4 comments

Comments

@egshkim

egshkim commented May 29, 2023

Thanks for your many contributions and hard work.
Why is the diffusion model's loss calculated between “RANDOM noise” and “model predicted noise”?
Why not between “actual added noise” and “model predicted noise”?

@Krasner

Krasner commented May 29, 2023

@egshkim I'm working with this repo too, so I'll give my 2 cents.
The denoise_fn, which is the U-Net, is not actually reconstructing an image from noise; rather, it predicts the noise that was added at the given timestep. So x_recon is a misnomer - it really should be noise_pred.

In the inference step, p_sample_loop iteratively infers the noise at each timestep and removes it from the previous noisy image. The code is a bit hard to follow, but I think the actual subtraction happens in predict_start_from_noise:

def predict_start_from_noise(self, x_t, t, noise):

That function is called by p_mean_variance; in this case x_recon is actually correctly named:

x_recon = self.predict_start_from_noise(
                x, t=t, noise=self.denoise_fn(x, noise_level))

The U-Net (denoise_fn) predicts the noise at the given timestep, and then predict_start_from_noise removes that noise from x to give you x_recon.
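The "removal" step described above can be sketched without any framework. This is only an illustration under stated assumptions: the linear beta schedule, the flat list of pixel values, and the helper names `make_alpha_bar` and `q_sample` are hypothetical and not taken from the repo; only the algebra (inverting x_t = sqrt(a) * x0 + sqrt(1 - a) * noise for x0) follows the DDPM formulation.

```python
import math

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def q_sample(x0, alpha_bar_t, noise):
    """Forward process in closed form: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * n for x, n in zip(x0, noise)]

def predict_start_from_noise(x_t, alpha_bar_t, noise):
    """Invert the forward process for x0, given the (predicted) noise."""
    c1 = math.sqrt(1.0 / alpha_bar_t)
    c2 = math.sqrt(1.0 / alpha_bar_t - 1.0)
    return [c1 * x - c2 * n for x, n in zip(x_t, noise)]

# Usage: if the noise estimate is exact, x0 is recovered up to float error.
alpha_bar = make_alpha_bar(1000)
x0 = [0.1, -0.5, 0.9]
noise = [0.3, -1.2, 0.7]
x_t = q_sample(x0, alpha_bar[500], noise)
x_recon = predict_start_from_noise(x_t, alpha_bar[500], noise)
```

In real sampling the noise argument comes from the network, so x_recon is only an estimate of the clean image.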

@egshkim
Author

egshkim commented May 30, 2023

Thanks for your kind and detailed explanation. : )
But actually, my question is about why random noise is used for the loss calculation,
rather than where the denoising actually happens.

In my opinion, the noise actually added between step "t-1" and step "t" should be used for the loss calculation.
But almost all currently available diffusion training code uses random noise for the loss calculation.

@Krasner

Krasner commented May 30, 2023

Yes - that comes from the formulation in the DDPM paper: https://arxiv.org/pdf/2006.11239.pdf (specifically equation 14):
[image: equation 14 from the paper]

The full loss (Equations 3 and 5) does relate x_t-1 to x_t via a variational lower bound, but it is not optimized in that form. The authors go through a derivation / simplification of this loss, which results in this "simple" L2 loss form (eq 14).
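A minimal framework-free sketch of that simplified objective may make the sampling order explicit. It assumes a flat list of pixel values instead of a tensor, and `simple_loss`, `predict_noise`, `oracle`, and `zeros` are hypothetical names, not the repo's code. What it illustrates: the noise is drawn first and then used to build x_t directly from x_0 in closed form, so the "random" noise in the loss is the noise that was actually added to that sample.

```python
import math
import random

def q_sample(x0, alpha_bar_t, noise):
    """Closed-form forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * n for x, n in zip(x0, noise)]

def simple_loss(x0, alpha_bar_t, predict_noise, rng):
    # 1. Draw eps ~ N(0, I): this is the "random noise" in the loss.
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    # 2. Build x_t from x0 with that SAME eps, so eps is the added noise.
    x_t = q_sample(x0, alpha_bar_t, noise)
    # 3. Ask the model (any callable here) to estimate eps from x_t.
    pred = predict_noise(x_t, alpha_bar_t)
    # 4. Mean squared error between the added noise and the estimate.
    return sum((n - p) ** 2 for n, p in zip(noise, pred)) / len(x0)

x0 = [0.2, -0.4, 0.9, 0.0]
alpha_bar_t = 0.5

def oracle(x_t, a):
    """Cheating predictor that knows x0; it recovers eps exactly."""
    return [(xt - math.sqrt(a) * x) / math.sqrt(1.0 - a) for xt, x in zip(x_t, x0)]

def zeros(x_t, a):
    """Trivial predictor that always guesses zero noise."""
    return [0.0 for _ in x_t]

loss_oracle = simple_loss(x0, alpha_bar_t, oracle, random.Random(0))
loss_zeros = simple_loss(x0, alpha_bar_t, zeros, random.Random(0))
```

Because x_t can be sampled from x_0 in one step, training code does not need to walk from t-1 to t; the single draw of eps plays both roles.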

In the SR3 paper (https://arxiv.org/pdf/2104.07636.pdf), they also experiment with different loss norms (L1 vs L2) and find that L1 loss gives better results...

I'm no mathematician, so the derivation is a bit hard to follow; I also welcome further explanations :)

@whiteYi

whiteYi commented May 30, 2023

As far as I know, the 'default' function returns the supplied noise if one was passed in, and only generates random noise when none was supplied. I hope my answer helps.
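The behaviour described for 'default' can be sketched roughly as follows. This is an assumption about how such a helper typically looks, not a copy of the repo's code, and `training_noise` is a hypothetical wrapper added only for illustration.

```python
import random

def default(value, fallback):
    """Return value if it was provided; otherwise use the fallback.

    A callable fallback is invoked lazily, so random noise is only
    generated when no noise was passed in.
    """
    if value is not None:
        return value
    return fallback() if callable(fallback) else fallback

def training_noise(x_start, noise=None):
    # Hypothetical wrapper: reuse caller-supplied noise, else draw fresh noise.
    return default(noise, lambda: [random.gauss(0.0, 1.0) for _ in x_start])

given = [0.1, 0.2]
reused = training_noise([1.0, 2.0], noise=given)   # same object as `given`
fresh = training_noise([1.0, 2.0, 3.0])            # newly drawn, length 3
```

Either way, whichever noise comes out of this call is the one used both to build the noisy input and as the loss target, so the two always match.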
