Error in PPO when using resized model #204
-
@pratikkumar018 Currently, we disable resizing the model embeddings to the tokenizer vocab size, since it causes a mismatch with vLLM. As a quick fix, you can try:

```diff
diff --git a/openrlhf/utils/utils.py b/openrlhf/utils/utils.py
index 6fa62e1..801398b 100644
--- a/openrlhf/utils/utils.py
+++ b/openrlhf/utils/utils.py
@@ -32,6 +32,7 @@ def get_tokenizer(pretrain, model, padding_side="left", strategy=None, use_fast=
         tokenizer.pad_token = tokenizer.eos_token
         tokenizer.pad_token_id = tokenizer.eos_token_id
         model.config.pad_token_id = tokenizer.pad_token_id
+        model.resize_token_embeddings(len(tokenizer))
```
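For context, a minimal sketch of what this patch does, using the standard Hugging Face API (the checkpoint path is hypothetical, not from this repo):

```python
# Sketch (not OpenRLHF code) of why the patch above helps: after special
# tokens are added, len(tokenizer) can exceed the model's embedding rows,
# and any of the new token IDs then indexes out of range.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "path/to/finetuned-llama-7b"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

vocab_size = len(tokenizer)                                # e.g. 32002
embed_rows = model.get_input_embeddings().weight.shape[0]  # e.g. 32000

if embed_rows < vocab_size:
    # Pad the embedding table (and tied LM head) to cover the new IDs.
    model.resize_token_embeddings(vocab_size)
```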
-
Oh, I see where the problem is. You have a finetuned actor model with vocab size 32002 and use our reward model with vocab size 32000. Currently the prompts and sequences are tokenized with the actor model's tokenizer, and this is where the mismatch happens. You can configure the critic and reward model to use the pretrain's tokenizer at https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/train_ppo.py#L60-L63:

```diff
diff --git a/examples/train_ppo.py b/examples/train_ppo.py
index bcff074..a2e7f12 100644
--- a/examples/train_ppo.py
+++ b/examples/train_ppo.py
@@ -59,8 +59,8 @@ def train(args):
     # configure tokenizer
     tokenizer = get_tokenizer(args.pretrain, actor.model, "left", strategy)
-    get_tokenizer(args.reward_pretrain, critic, "left", strategy)
-    get_tokenizer(args.reward_pretrain, reward_model, "left", strategy)
+    get_tokenizer(args.pretrain, critic, "left", strategy)
+    get_tokenizer(args.pretrain, reward_model, "left", strategy)
     strategy.print(actor)
     strategy.print(critic)
```
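To make the failure mode concrete, here is a self-contained sketch (the sizes mirror this thread; the tensors are illustrative): token IDs produced under the actor's 32002-token vocabulary simply have no row in a 32000-row embedding table.

```python
# Illustrative only: reproduce the out-of-range lookup on CPU. On CUDA the
# same lookup trips the Indexing.cu `srcIndex < srcSelectDimSize` assert.
import torch

reward_embedding = torch.nn.Embedding(32000, 4096)  # reward model vocab: 32000

# A sequence tokenized by the actor's tokenizer may contain the new IDs
# 32000 or 32001, which the reward embedding cannot index.
sequence = torch.tensor([1, 32001, 2])

try:
    reward_embedding(sequence)
except IndexError as e:
    print("out-of-range token id:", e)  # "index out of range in self"
```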
-
Hi @wuxibin89
-
Hi Team,

I am trying to replicate this example: https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_ppo_llama.sh

When using the exact same example, I am able to run and get training started. But when I change --pretrain to one of my finetuned Llama 7B models with an increased vocab size (32002), I get the error below:

```
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [119,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```

I see the tokeniser is only initialised once in the train_ppo.py code (`tokenizer = get_tokenizer(args.pretrain, actor.model, "left", strategy)`). I guess the above error comes from a mismatch in the embedding size. Shouldn't we initialise the tokeniser for the reward model as well? My guess is that the reward model has a vocab size of 32000 and is trying to use the pretrain tokeniser of size 32002, hence the error. Not sure though. Please help.
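For anyone hitting this, a quick diagnostic along these lines can confirm the mismatch before training (the paths and the plain AutoModel load are illustrative, not OpenRLHF API):

```python
# Hypothetical diagnostic: check whether the actor's tokenizer emits token
# IDs that the reward model's embedding table cannot index, which is what
# the `srcIndex < srcSelectDimSize` assertion means on CUDA.
from transformers import AutoModel, AutoTokenizer

actor_tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-llama-7b")  # vocab 32002
reward_model = AutoModel.from_pretrained("path/to/reward-model")               # vocab 32000

ids = actor_tokenizer("a prompt that uses one of the added special tokens").input_ids
embed_rows = reward_model.get_input_embeddings().weight.shape[0]

bad = [i for i in ids if i >= embed_rows]
print(f"embedding rows: {embed_rows}, out-of-range token ids: {bad}")
```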