
Is left-padding in PPO strictly necessary? #232

Open
mgerstgrasser opened this issue Mar 5, 2024 · 6 comments
@mgerstgrasser
Contributor

mgerstgrasser commented Mar 5, 2024

I noticed that RemoteExperienceMaker left-pads the input sequences even when using vllm for generation:

# NOTE: concat all outputs to following format:

I can see that a few lines down, self.actor.process_sequences() assumes this left-padding, as it calculates an action mask in a way that hinges on all the inputs terminating at the same index.

Other than that, are there any other parts of the code that assume that inputs are left-padded, or that inputs always terminate at the same index?

If not, I'd like to open a PR that skips the left-padding and calculates the action mask directly - the left-padding can be inefficient, and action-masking this way doesn't generalise to multi-turn conversations.
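To illustrate, the action mask could be computed directly from per-sequence prompt lengths rather than from a single shared prompt-end index. A minimal sketch (`make_action_mask` and its arguments are illustrative, not OpenRLHF's actual API):

```python
import torch

def make_action_mask(seq_lens, prompt_lens, max_len):
    """True at positions holding response (action) tokens.

    Each sequence may have a different prompt length, so the
    prompt/response boundary varies per row.
    """
    positions = torch.arange(max_len).unsqueeze(0)           # (1, max_len)
    prompt_lens = torch.as_tensor(prompt_lens).unsqueeze(1)  # (B, 1)
    seq_lens = torch.as_tensor(seq_lens).unsqueeze(1)        # (B, 1)
    # a position is an action if it is past the prompt but inside the sequence
    return (positions >= prompt_lens) & (positions < seq_lens)

# e.g. a 3-token prompt with a 3-token response (incl. EOS), padded to 8:
mask = make_action_mask(seq_lens=[6], prompt_lens=[3], max_len=8)
# mask[0] -> [False, False, False, True, True, True, False, False]
```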

@hijkzzz
Collaborator

hijkzzz commented Mar 5, 2024

Yes, this is necessary to reduce the padding in the training samples.

Left-padding the prompts allows us to remove PAD tokens on both sides and then dynamically re-pad based on the batch of training samples.
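A minimal sketch of that trim-and-re-pad idea (this helper is an illustration of the scheme, not OpenRLHF's implementation, and it assumes pad id 0): with left-padded prompts and right-padded responses, every pad sits at one of the two ends of a row, so trimming both ends and re-padding to the batch's longest sequence removes all the slack.

```python
import torch

PAD = 0  # assumed pad token id for this sketch

def trim_and_repad(sequences):
    """Drop pads at both ends of each row, then right-pad to the batch max."""
    trimmed = []
    for seq in sequences:
        nonpad = (seq != PAD).nonzero(as_tuple=True)[0]
        trimmed.append(seq[nonpad[0] : nonpad[-1] + 1])
    max_len = max(len(t) for t in trimmed)
    # dynamically right-pad to the longest sequence in this batch
    return torch.stack([
        torch.cat([t, torch.full((max_len - len(t),), PAD)]) for t in trimmed
    ])
```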

@mgerstgrasser
Contributor Author

mgerstgrasser commented Mar 5, 2024

Ah, you mean remove_padding_in_sequences()? Wouldn't that still work with only right-padding?

@hijkzzz
Collaborator

hijkzzz commented Mar 5, 2024

> Ah, you mean remove_padding_in_sequences()? Wouldn't that still work with only right-padding?

This will lead to a lot of pads in the middle.

@mgerstgrasser
Contributor Author

Ah, no, to be clear, what I mean is the following:
Right now, the padding is done like this ('promp' - a prompt token, 'respo' - a response token):

| [PAD]   [PAD]   promp   promp   promp | respo   respo   [EOS]   [PAD] |
| promp   promp   promp   promp   promp | respo   respo   [EOS]   [PAD] |
| [PAD]   [PAD]   [PAD]   promp   promp | respo   respo   respo   [EOS] |

What I have in mind is instead to do this:

| promp   promp   promp | respo   respo   [EOS]   [PAD]   [PAD] |
| promp   promp   promp   promp   promp | respo   respo   [EOS] |
| promp   promp | respo   respo   respo   [EOS]   [PAD]   [PAD] |

So, less padding overall, and no padding in the middle. The only thing that is now a little different is that the index where the prompt stops and the response starts isn't the same for each sequence - will that break anything?
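For the example rows above, the padding savings can be checked with a quick back-of-the-envelope calculation (plain Python; the lengths are taken from the diagrams, with responses counted including EOS):

```python
prompt_lens = [3, 5, 2]
resp_lens = [3, 3, 4]
totals = [p + r for p, r in zip(prompt_lens, resp_lens)]

# current layout: prompts left-padded to the max prompt length, responses
# right-padded to the max response length -> every row is their sum wide
width_split = max(prompt_lens) + max(resp_lens)   # 5 + 4 = 9
pads_split = sum(width_split - t for t in totals)

# proposed layout: prompt+response packed, then right-padded to the max total
width_packed = max(totals)                        # 8
pads_packed = sum(width_packed - t for t in totals)

print(pads_split, pads_packed)  # -> 7 4
```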

@hijkzzz
Collaborator

hijkzzz commented Mar 5, 2024

> Ah, no, to be clear, what I mean is the following: Right now, the padding is done like this ('promp' - a prompt token, 'respo' - a response token):
>
> | [PAD]   [PAD]   promp   promp   promp | respo   respo   [EOS]   [PAD] |
> | promp   promp   promp   promp   promp | respo   respo   [EOS]   [PAD] |
> | [PAD]   [PAD]   [PAD]   promp   promp | respo   respo   respo   [EOS] |
>
> What I have in mind is instead to do this:
>
> | promp   promp   promp | respo   respo   [EOS]   [PAD]   [PAD] |
> | promp   promp   promp   promp   promp | respo   respo   [EOS] |
> | promp   promp | respo   respo   respo   [EOS]   [PAD]   [PAD] |
>
> So, less padding overall, and no padding in the middle. The only thing that is now a little different is that the index where the prompt stops and the response starts isn't the same for each sequence - will that break anything?

They would become:

| promp   promp   promp   [PAD]   [PAD] | respo   respo   [EOS]   [PAD] |
| promp   promp   promp   promp   promp | respo   respo   [EOS]   [PAD] |
| promp   promp   [PAD]   [PAD]   [PAD] | respo   respo   respo   [EOS] |

@mgerstgrasser
Contributor Author

That's not what I am proposing though! What I mean is, if I return it without the pads in the middle from _generate_vllm(), would that break anything? (No worries if I'm not making sense though, I can just try it and see what happens.)
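A sketch of what that return could look like (`pack_vllm_outputs` is a hypothetical helper; the prompt and output token ids are assumed to have already been extracted from vLLM's results as plain lists, and pad id 0 is an assumption):

```python
import torch

PAD = 0  # assumed pad token id for this sketch

def pack_vllm_outputs(prompt_token_ids, output_token_ids):
    """Concatenate each prompt with its generated tokens, then right-pad
    once -- no pads between prompt and response."""
    packed = [torch.tensor(p + o)
              for p, o in zip(prompt_token_ids, output_token_ids)]
    max_len = max(len(s) for s in packed)
    sequences = torch.stack([
        torch.cat([s, torch.full((max_len - len(s),), PAD)]) for s in packed
    ])
    # the prompt/response boundary now varies per row, so return it explicitly
    prompt_lens = torch.tensor([len(p) for p in prompt_token_ids])
    return sequences, prompt_lens
```

The per-row prompt lengths returned alongside the sequences are exactly what a per-sequence action-mask calculation would need in place of a single shared boundary index.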
