
[BUG] rope scaling with phi3 models #406

Open
arunpatala opened this issue May 1, 2024 · 2 comments

Comments

@arunpatala

Hi,

I am trying to train the Phi-3 mini model with a longer context length (8192) than its default of 4096.
I understand that RoPE scaling is not supported for models with a sliding window. How can I proceed
from here to train a Phi-3 model with a longer context? Should I finetune the base model to extend its
context length? Which methods can I use? Is there a plan to support this in the future?

AlgorithmError: ExecuteUserScriptError: ExitCode 1
ErrorMessage "raise RuntimeError( RuntimeError: Unsloth: Unfortunately Mistral type models do not support RoPE scaling! The maximum sequence length supported is 4096."
Command "/opt/conda/bin/python3.10 run_unsloth.py --bf16 True --dataset_path /opt/ml/input/data/training --eval_steps 1000 --evaluation_strategy steps --fp16 False --gradient_accumulation_steps 2 --gradient_checkpointing True --learning_rate 0.0002 --load_in_4bit True --logging_dir /opt/ml/output/tensorboard --logging_steps 10 --lr_scheduler_type linear --max_seq_length 8192 --model_name unsloth/Phi-3-mini-4k-instruct-bnb-4bit --neftune_noise_alpha 5 --num_train_epochs 2 --optim adamw_8bit --output_dir /opt/ml/checkpoints --per_device_eval_batch_size 6 --per_device_train_batch_size 6 --report_to tensorboard --save_strategy epoch --seed 3407 --train_filename train.parquet --validation_filename val.parquet --warmup_steps 5 --weight_decay 0.01", exit code: 1
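
For context, a minimal sketch of the load call that appears to trigger this error, reconstructed from the --model_name and --max_seq_length arguments in the command above (the internals of run_unsloth.py are not shown in this thread, so this is an assumption about how it loads the model):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-3-mini-4k-instruct-bnb-4bit",
    max_seq_length=8192,   # exceeds the model's native 4096-token context
    dtype=None,            # auto-detect (bf16 on supported GPUs)
    load_in_4bit=True,
)
# Raises: RuntimeError: Unsloth: Unfortunately Mistral type models do not
# support RoPE scaling! The maximum sequence length supported is 4096.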

@danielhanchen
Contributor

@arunpatala Oh yep that is an issue - planning to support the 128K Phi later down the road
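
Until that support lands, one possible interim path is to load Microsoft's natively long-context release with plain transformers instead of RoPE-scaling the 4k model; a rough sketch, assuming microsoft/Phi-3-mini-128k-instruct (whose built-in LongRoPE scaling already covers 8192-token sequences):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"  # official long-context variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    trust_remote_code=True,  # Phi-3 initially shipped with remote code
)
# An 8192-token training sequence fits well inside this model's 131072-token window.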

@arunpatala
Author

thanks
