Is reward_fn equal to log_softmax #222

Closed
EganGu opened this issue May 14, 2024 · 2 comments
EganGu commented May 14, 2024

I noticed that the scores in reward_fn are actually computed as logits_i - logsumexp(logits).
This expression can be calculated directly with log_softmax, so why not use log_softmax?

def reward_fn(self, input_ids, gen_ids, inf_mask=None, output_pos=True):
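
For reference, the two forms agree numerically. A minimal check (plain PyTorch, no tensor parallelism; the shapes are just an example, not taken from the repo):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 16, 32000)  # (batch, seq, vocab), arbitrary example shape

# Form used in reward_fn: logits_i - logsumexp(logits)
manual = logits - torch.logsumexp(logits, dim=-1, keepdim=True)

# Direct form
direct = F.log_softmax(logits, dim=-1)

print(torch.allclose(manual, direct, atol=1e-6))  # True
```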

t1101675 (Contributor) commented

This is because for models with tensor parallelism, where the vocabulary dimension of the logits is sharded across devices, log_softmax has to be computed as logits_i - logsumexp(logits). To keep the results consistent and comparable, we also use logits_i - logsumexp(logits) in the normal (non-parallel) case.
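
A rough single-device sketch of the idea (simulating the vocabulary split with torch.chunk; not the actual tensor-parallel implementation):

```python
import torch

logits = torch.randn(2, 5, 1000)          # (batch, seq, vocab)
shards = torch.chunk(logits, 4, dim=-1)   # simulate the vocab dim split across 4 ranks

# Each rank only sees its own shard, so log_softmax over the full vocab is not
# directly available. Instead, each rank computes a local logsumexp ...
local_lse = torch.stack([torch.logsumexp(s, dim=-1) for s in shards], dim=0)

# ... and the partial results are combined (an all-reduce in the real setup)
global_lse = torch.logsumexp(local_lse, dim=0)

# Each rank then gets log-probs for its shard as logits_i - logsumexp(logits)
log_probs = torch.cat([s - global_lse.unsqueeze(-1) for s in shards], dim=-1)

# Matches the single-device log_softmax
print(torch.allclose(log_probs, torch.nn.functional.log_softmax(logits, dim=-1), atol=1e-6))
```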

EganGu (Author) commented May 22, 2024

Understood. Thanks for your reply.

EganGu closed this as completed May 22, 2024