Error in PPO when using resized model #204
-
@pratikkumar018 Currently, we disable resizing the model embeddings to the tokenizer vocab size, since it causes a mismatch with vLLM. As a quick fix, you can try:

```diff
diff --git a/openrlhf/utils/utils.py b/openrlhf/utils/utils.py
index 6fa62e1..801398b 100644
--- a/openrlhf/utils/utils.py
+++ b/openrlhf/utils/utils.py
@@ -32,6 +32,7 @@ def get_tokenizer(pretrain, model, padding_side="left", strategy=None, use_fast=
         tokenizer.pad_token = tokenizer.eos_token
         tokenizer.pad_token_id = tokenizer.eos_token_id
         model.config.pad_token_id = tokenizer.pad_token_id
+        model.resize_token_embeddings(len(tokenizer))
```
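For context, a minimal sketch of what this patch does, using the standard Hugging Face API (the checkpoint path is hypothetical, not from this repo):

```python
# Sketch (not OpenRLHF code) of why the patch above helps: after special
# tokens are added, len(tokenizer) can exceed the model's embedding rows,
# and any of the new token IDs then indexes out of range.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "path/to/finetuned-llama-7b"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

vocab_size = len(tokenizer)                                # e.g. 32002
embed_rows = model.get_input_embeddings().weight.shape[0]  # e.g. 32000

if embed_rows < vocab_size:
    # Pad the embedding table (and tied LM head) to cover the new IDs.
    model.resize_token_embeddings(vocab_size)
```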
-
Oh, I see where the problem is. You have a finetuned actor model with vocab size 32002 and use our reward model with vocab size 32000. Currently the prompts and sequences are tokenized with the actor model's tokenizer, and this is where the mismatch happens. You can configure the critic and reward model to use the pretrain's tokenizer at https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/train_ppo.py#L60-L63:

```diff
diff --git a/examples/train_ppo.py b/examples/train_ppo.py
index bcff074..a2e7f12 100644
--- a/examples/train_ppo.py
+++ b/examples/train_ppo.py
@@ -59,8 +59,8 @@ def train(args):
     # configure tokenizer
     tokenizer = get_tokenizer(args.pretrain, actor.model, "left", strategy)
-    get_tokenizer(args.reward_pretrain, critic, "left", strategy)
-    get_tokenizer(args.reward_pretrain, reward_model, "left", strategy)
+    get_tokenizer(args.pretrain, critic, "left", strategy)
+    get_tokenizer(args.pretrain, reward_model, "left", strategy)
     strategy.print(actor)
     strategy.print(critic)
```
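To make the failure mode concrete, here is a self-contained sketch (the sizes mirror this thread; the tensors are illustrative): token IDs produced under the actor's 32002-token vocabulary simply have no row in a 32000-row embedding table.

```python
# Illustrative only: reproduce the out-of-range lookup on CPU. On CUDA the
# same lookup trips the Indexing.cu `srcIndex < srcSelectDimSize` assert.
import torch

reward_embedding = torch.nn.Embedding(32000, 4096)  # reward model vocab: 32000

# A sequence tokenized by the actor's tokenizer may contain the new IDs
# 32000 or 32001, which the reward embedding cannot index.
sequence = torch.tensor([1, 32001, 2])

try:
    reward_embedding(sequence)
except IndexError as e:
    print("out-of-range token id:", e)  # "index out of range in self"
```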
-
Hi @wuxibin89
-
Hi Team,

I am trying to replicate this example: https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_ppo_llama.sh

When using the exact same example, I am able to run and get training started. But when I change --pretrain to one of my finetuned Llama 7B models with an increased vocab size (32002), I get the error below:

```
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [119,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```

I see the tokeniser is only initialised once in the train_ppo.py code (`tokenizer = get_tokenizer(args.pretrain, actor.model, "left", strategy)`). I guess the above error comes from a mismatch in the embedding size. Shouldn't we initialise the tokeniser for the reward model as well? My guess is that the reward model has a vocab size of 32000 and is trying to use the pretrain tokeniser of size 32002, hence the error. Not sure though. Please help.
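For anyone hitting this, a quick diagnostic along these lines can confirm the mismatch before training (the paths and the plain AutoModel load are illustrative, not OpenRLHF API):

```python
# Hypothetical diagnostic: check whether the actor's tokenizer emits token
# IDs that the reward model's embedding table cannot index, which is what
# the `srcIndex < srcSelectDimSize` assertion means on CUDA.
from transformers import AutoModel, AutoTokenizer

actor_tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-llama-7b")  # vocab 32002
reward_model = AutoModel.from_pretrained("path/to/reward-model")               # vocab 32000

ids = actor_tokenizer("a prompt that uses one of the added special tokens").input_ids
embed_rows = reward_model.get_input_embeddings().weight.shape[0]

bad = [i for i in ids if i >= embed_rows]
print(f"embedding rows: {embed_rows}, out-of-range token ids: {bad}")
```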