Is reward_fn equal to log_softmax #222

Closed
EganGu opened this issue May 14, 2024 · 2 comments
EganGu commented May 14, 2024

I noticed that the scores in reward_fn are actually computed as logits_i - logsumexp(logits).
This expression can be calculated directly with log_softmax, so why not use log_softmax?

def reward_fn(self, input_ids, gen_ids, inf_mask=None, output_pos=True):
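
For reference, the two forms agree numerically. A minimal check (plain PyTorch, no tensor parallelism; the shapes are just an example, not taken from the repo):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 16, 32000)  # (batch, seq, vocab), arbitrary example shape

# Form used in reward_fn: logits_i - logsumexp(logits)
manual = logits - torch.logsumexp(logits, dim=-1, keepdim=True)

# Direct form
direct = F.log_softmax(logits, dim=-1)

print(torch.allclose(manual, direct, atol=1e-6))  # True
```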

t1101675 (Contributor) commented

This is because for models with tensor parallelism, where the vocabulary dimension of the logits is sharded across devices, log_softmax has to be computed as logits_i - logsumexp(logits). To keep the results consistent and comparable, we also use logits_i - logsumexp(logits) in the normal (non-parallel) case.
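
A rough single-device sketch of the idea (simulating the vocabulary split with torch.chunk; not the actual tensor-parallel implementation):

```python
import torch

logits = torch.randn(2, 5, 1000)          # (batch, seq, vocab)
shards = torch.chunk(logits, 4, dim=-1)   # simulate the vocab dim split across 4 ranks

# Each rank only sees its own shard, so log_softmax over the full vocab is not
# directly available. Instead, each rank computes a local logsumexp ...
local_lse = torch.stack([torch.logsumexp(s, dim=-1) for s in shards], dim=0)

# ... and the partial results are combined (an all-reduce in the real setup)
global_lse = torch.logsumexp(local_lse, dim=0)

# Each rank then gets log-probs for its shard as logits_i - logsumexp(logits)
log_probs = torch.cat([s - global_lse.unsqueeze(-1) for s in shards], dim=-1)

# Matches the single-device log_softmax
print(torch.allclose(log_probs, torch.nn.functional.log_softmax(logits, dim=-1), atol=1e-6))
```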

EganGu (Author) commented May 22, 2024

Understood. Thanks for your reply.

EganGu closed this as completed May 22, 2024