Why does my train loss increase after introducing sync loss? #140

Open
Marskly opened this issue May 1, 2024 · 4 comments

Comments

Marskly commented May 1, 2024

image
After introducing the sync loss at step 250000, the L1 loss, VGG loss, and Percep loss are all increasing.
Is it because the sync loss is too large, so that it dominates the updates to the model weights?
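
One way to reason about this question: if the generator objective is a weighted sum in which the sync weight is taken out of the L1 weight, then switching the sync term on both adds a new gradient source and shrinks the reconstruction term. The sketch below assumes that form (sync weight `sw`, discriminator weight `dw`, the remainder on L1). Whether the training script in this repository combines its terms this way is an assumption, and the numbers are illustrative, not taken from the plot.

```python
def combined_loss(l1, sync, percep, sw, dw):
    """Weighted generator objective (assumed form, not confirmed by the thread)."""
    return sw * sync + dw * percep + (1.0 - sw - dw) * l1


def contributions(l1, sync, percep, sw, dw):
    """Return each term's share of the total, to see which term dominates."""
    parts = {
        "sync": sw * sync,
        "percep": dw * percep,
        "l1": (1.0 - sw - dw) * l1,
    }
    total = sum(parts.values())
    if total <= 0:
        return parts
    return {name: value / total for name, value in parts.items()}


# Illustrative values only: a large raw sync loss next to a small L1 loss.
shares = contributions(l1=0.09, sync=2.0, percep=0.0, sw=0.03, dw=0.0)
print(shares)
```

Under these assumptions, even a small `sw` gives the sync term a sizeable share of the total when the raw sync loss is much larger than the L1 loss, which would be consistent with the reconstruction losses rising for a while after the switch.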

see2run commented May 7, 2024

Hey, can you share what you do from dataset preparation to running the script train_syncnet_sam.py? Because I've been trying and the output result is just stuck like this without any progress:

(w2l_cek) vian:~/wav2lip_288x288$ python3 train_syncnet_sam.py
use_cuda: True
total trainable params 65054464
Training From Scratch !!!
Starting Epoch: 0

Marskly commented May 9, 2024

> Hey, can you share what you do from dataset preparation to running the script train_syncnet_sam.py? Because I've been trying and the output result is just stuck like this without any progress:
>
> (w2l_cek) vian:~/wav2lip_288x288$ python3 train_syncnet_sam.py
> use_cuda: True
> total trainable params 65054464
> Training From Scratch !!!
> Starting Epoch: 0

Maybe your CPU is loading data too slowly. You can monitor your CPU utilization and GPU memory while the script runs.
Try a smaller batch size.
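
To check whether data loading is the bottleneck, the wait for each batch can be timed separately from the training step. This is a minimal sketch using only the standard library; `time_loader`, `slow_batches`, and the no-op step are hypothetical stand-ins for the real data loader and training step, not code from this repository.

```python
import time


def time_loader(batches, step_fn, max_steps=20):
    """Measure average seconds spent waiting for data vs. running a step."""
    load_s, train_s, steps = 0.0, 0.0, 0
    it = iter(batches)
    while steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # time spent in the training step
        t2 = time.perf_counter()
        load_s += t1 - t0
        train_s += t2 - t1
        steps += 1
    if steps == 0:
        return 0.0, 0.0
    return load_s / steps, train_s / steps


def slow_batches(n, delay):
    """Hypothetical loader that takes `delay` seconds to produce each batch."""
    for i in range(n):
        time.sleep(delay)
        yield i


load, train = time_loader(slow_batches(5, 0.02), lambda batch: None)
print(f"Load: {load:.4f}s, Train: {train:.4f}s")
```

If the load time is consistently larger than the step time, the loader is the likely bottleneck. When the step runs on a GPU, the step timing is only meaningful if the device is synchronized before reading the clock (for example with `torch.cuda.synchronize()`), because CUDA kernels are launched asynchronously.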

@Liming-belief

Hello, I have encountered the same problem as you. Have you resolved it, @Marskly?

see2run commented May 20, 2024

> > Hey, can you share what you do from dataset preparation to running the script train_syncnet_sam.py? Because I've been trying and the output result is just stuck like this without any progress:
> >
> > (w2l_cek) vian:~/wav2lip_288x288$ python3 train_syncnet_sam.py
> > use_cuda: True
> > total trainable params 65054464
> > Training From Scratch !!!
> > Starting Epoch: 0
>
> Maybe your CPU loads data too slowly. You can monitor your CPU utilization and GPU memory. Try smaller batch size.

Okay, I have solved it, thank you. Now, during training, the output looks like this:

Step 259 | L1: 0.08976 | Vgg: 0.3718 | SW: 0.03 | Sync: 0.0 | DW: 0.0 | Percep: 0.0 | Fake: 0.0, Real: 0.0 | Load: 0.01096, Train: 1.225

Here Percep, Fake, and Real are always 0.0.
Can you provide any suggestions? I am training with 1725 videos.
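
In the log line above, `DW` is also 0.0. One possible reading is that the script skips the discriminator-related terms while that weight is zero; this is an assumption about the script, not something the thread confirms. The sketch below parses a progress line of this shape and lists the terms that are zero together with their assumed gating weight. The `gates` mapping is hypothetical.

```python
import re

NUMBER = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"


def parse_log_line(line):
    """Turn 'Step N | Key: value | Key: value, Key: value' into a dict of floats."""
    fields = {}
    step = re.search(r"Step\s+(\d+)", line)
    if step:
        fields["Step"] = float(step.group(1))
    for key, value in re.findall(r"([A-Za-z0-9]+):\s*(" + NUMBER + r")", line):
        fields[key] = float(value)
    return fields


def gated_terms(fields, gates=(("DW", ("Percep", "Fake", "Real")),)):
    """List loss terms that are zero while their assumed gating weight is zero."""
    flagged = []
    for weight, terms in gates:
        if fields.get(weight) == 0.0:
            flagged += [t for t in terms if fields.get(t) == 0.0]
    return flagged


line = ("Step 259 | L1: 0.08976 | Vgg: 0.3718 | SW: 0.03 | Sync: 0.0 | DW: 0.0 "
        "| Percep: 0.0 | Fake: 0.0, Real: 0.0 | Load: 0.01096, Train: 1.225")
fields = parse_log_line(line)
print(gated_terms(fields))
```

If the flagged terms become non-zero once `DW` is raised above zero, the zeros were a consequence of the weight rather than a training failure. This check does not explain `Sync: 0.0`, which is reported while `SW` is 0.03.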
