I noticed that the `scores` in `reward_fn` (`LMOps/minillm/minillm/reward.py`, line 33 at commit `5fbf5bc`) are actually equal to `logits_i - logsumexp(logits)`. I think this expression can be computed directly with `log_softmax`. Why not use `log_softmax`?
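For context, the two formulations agree numerically. Here is a minimal PyTorch sketch of the equivalence (the tensor shapes and variable names are illustrative, not taken from `reward.py`):

```python
import torch

# Illustrative shapes only; reward.py operates on actual model outputs.
logits = torch.randn(4, 32000)          # (batch, vocab)
ids = torch.randint(0, 32000, (4, 1))   # selected token indices, (batch, 1)

# the expression the issue refers to: logits_i - logsumexp(logits)
manual = logits.gather(-1, ids).squeeze(-1) - torch.logsumexp(logits, dim=-1)

# the same quantity computed via log_softmax
via_log_softmax = torch.log_softmax(logits, dim=-1).gather(-1, ids).squeeze(-1)

torch.testing.assert_close(manual, via_log_softmax)
```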
This is because for models with tensor parallelism the logits are sharded along the vocabulary dimension, so `log_softmax` has to be computed manually as `logits_i - logsumexp(logits)`. To keep the two code paths consistent and make their results directly comparable, we use `logits_i - logsumexp(logits)` in the normal (non-parallel) case as well.
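To illustrate the point, here is a minimal single-process sketch of the tensor-parallel situation; the sharding is simulated with `chunk`, whereas a real vocab-parallel model would combine the partial results with an all-gather/all-reduce across ranks:

```python
import torch

# Simulate a vocab-parallel layout: each chunk stands in for the logits
# shard held by one tensor-parallel rank (4 "ranks" here, chosen arbitrarily).
logits = torch.randn(4, 32000)
shards = logits.chunk(4, dim=-1)

# log_softmax cannot be applied to a shard alone (it would normalize over
# the local vocabulary only), but logsumexp composes across shards:
local_lse = torch.stack([torch.logsumexp(s, dim=-1) for s in shards])
global_lse = torch.logsumexp(local_lse, dim=0)

# combining the per-shard partial results recovers the full logsumexp,
# which is what an all-reduce/all-gather achieves across real TP ranks
torch.testing.assert_close(global_lse, torch.logsumexp(logits, dim=-1))

# each rank then only needs its local logit logits_i (for ids falling in
# its shard) and the global logsumexp to form logits_i - logsumexp(logits).
```

The key property is that `logsumexp` composes across shards while `softmax` does not, so the decomposed form works identically in both the parallel and the normal path.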