Is left-padding in PPO strictly necessary? #232
I noticed that RemoteExperienceMaker left-pads the input sequences even when using vllm for generation (OpenRLHF/openrlhf/trainer/ppo_utils/experience_maker.py, line 346 at dcd379a).

I can see that a few lines down, self.actor.process_sequences() assumes this left-padding, as it calculates an action mask in a way that hinges on all the inputs terminating at the same index. Other than that, are there any other parts of the code that assume that inputs are left-padded, or that inputs always terminate at the same index?

If not, I'd like to open a PR that skips the left-padding and calculates the action mask directly - the left-padding can be inefficient, and action-masking this way doesn't generalise to multi-turn conversations.
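To make the setup concrete, here is a minimal sketch of what left-padding the prompts to a common length looks like - a hand-written illustration with an assumed pad id of 0, not the actual code from experience_maker.py:

```python
import torch
import torch.nn.functional as F

pad_token_id = 0  # assumed pad id, for illustration only

# Two prompts of different lengths.
prompts = [torch.tensor([11, 12]), torch.tensor([13, 14, 15, 16])]

# Left-pad every prompt to the length of the longest one, so that all
# prompts end at the same index and generation starts at the same column.
max_len = max(p.size(0) for p in prompts)
batch = torch.stack(
    [F.pad(p, (max_len - p.size(0), 0), value=pad_token_id) for p in prompts]
)
# batch:
# tensor([[ 0,  0, 11, 12],
#         [13, 14, 15, 16]])
```

Because every prompt then terminates at index max_len - 1, everything generated afterwards starts at the same column for the whole batch, which is what lets process_sequences treat the prompt/response boundary as fixed.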
Comments

Yes, this is necessary to reduce the padding of the training samples. Prompt samples with left padding allow us to remove PAD tokens on both sides and then pad dynamically depending on the batch of training samples.
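If I read that right, the dynamic part is something like the following sketch - trim PAD from both ends of each stored sample, then re-pad only to the longest sample in the current training batch (the function name and pad id are assumptions, not the actual OpenRLHF code):

```python
import torch
import torch.nn.functional as F

pad_token_id = 0  # assumed pad id, for illustration only

def dynamic_repad(rows: torch.Tensor) -> torch.Tensor:
    # Strip PAD from both ends of each row (assumes the pad id does not
    # occur among the real tokens), then right-pad only to the longest
    # trimmed row in this batch.
    trimmed = []
    for row in rows:
        keep = (row != pad_token_id).nonzero(as_tuple=True)[0]
        start, end = keep[0].item(), keep[-1].item() + 1
        trimmed.append(row[start:end])
    max_len = max(t.size(0) for t in trimmed)
    return torch.stack(
        [F.pad(t, (0, max_len - t.size(0)), value=pad_token_id) for t in trimmed]
    )

rows = torch.tensor([
    [0, 0, 11, 12, 21, 22, 23, 0],
    [0, 0, 0, 13, 14, 24, 0, 0],
])
print(dynamic_repad(rows))
# tensor([[11, 12, 21, 22, 23],
#         [13, 14, 24,  0,  0]])
```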
Ah, you mean …
This will lead to a lot of pads in the middle.
Ah, no, to be clear, what I mean is the following. Currently the prompts are left-padded and the responses right-padded, so there are pads on both ends of every packed sequence. What I have in mind is instead to pack each prompt directly against its response and pad only on the right (see the sketch below). So, less padding overall, and no padding in the middle. The only thing that is now a little different is that the index where the prompt stops and the response starts isn't the same for each sequence - will that break anything?
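For concreteness, a sketch of the proposed layout and of how the action mask could be computed from per-sequence prompt lengths - all tensors and names here are made up for illustration:

```python
import torch

pad_token_id = 0  # assumed pad id, for illustration only

# Proposed layout: each prompt packed directly against its response,
# with padding only on the right - no pads in the middle.
sequences = torch.tensor([
    [11, 12, 21, 22, 23, 0, 0],   # prompt [11, 12], response [21, 22, 23]
    [13, 14, 15, 16, 24, 25, 0],  # prompt [13..16], response [24, 25]
])
prompt_lens = torch.tensor([2, 4])  # the boundary now differs per sequence

# Action mask: True exactly on response tokens, i.e. after the
# per-sequence prompt boundary and before the right padding.
positions = torch.arange(sequences.size(1)).unsqueeze(0)  # shape (1, T)
action_mask = (positions >= prompt_lens.unsqueeze(1)) & (sequences != pad_token_id)
# tensor([[False, False,  True,  True,  True, False, False],
#         [False, False, False, False,  True,  True, False]])
```

Anything that currently assumes a fixed boundary (like the fixed-slice logic in process_sequences) would then have to take prompt_lens as an input instead.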
they are …
That's not what I am proposing though! What I mean is, if I return it without the pads in the middle from …