Continued pretraining
You can now do continued pretraining with Unsloth. See https://unsloth.ai/blog/contpretraining for more details!
Continued pretraining is 2x faster and uses 50% less VRAM than HF + FA2 QLoRA. We offload `embed_tokens` and `lm_head` to disk to save VRAM!
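The `model` passed to `get_peft_model` below is a base model loaded in 4-bit; here is a minimal loading sketch, where the model name and sequence length are only illustrative choices:

```python
from unsloth import FastLanguageModel

# Minimal sketch: load a 4-bit base model (the name and max_seq_length are examples).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)
```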
You can now simply include both in `target_modules` like below:
```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 128, # Choose any number > 0! Suggested: 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head",], # Add for continual pretraining
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,   # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
```
We also allow two learning rates, one for the embedding matrices and another for the LoRA adapters:
```python
from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    args = UnslothTrainingArguments(
        ....
        learning_rate = 5e-5,
        embedding_learning_rate = 5e-6,
    ),
)
```
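For reference, here is a rough sketch of how these pieces typically fit together for text-completion training. Everything beyond the two learning rates is an illustrative assumption: the batch sizes, step counts, and output directory are placeholders, `dataset` is assumed to be a Hugging Face dataset with a "text" column, and `is_bfloat16_supported` simply picks between fp16 and bf16:

```python
# Sketch only: values are placeholders; `model`, `tokenizer`, and `dataset`
# come from the loading and data-preparation steps.
trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,       # raw text examples in a "text" column
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        max_steps = 120,
        warmup_steps = 10,
        learning_rate = 5e-5,           # LoRA adapters
        embedding_learning_rate = 5e-6, # embed_tokens and lm_head
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer.train()
```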
We also share a free Colab notebook that finetunes Mistral v3 to learn Korean (you can choose any language you like) using Wikipedia and the Aya Dataset: https://colab.research.google.com/drive/1tEd1FrOXWMnCU9UIvdYhs61tkxdMuKZu?usp=sharing
And we're sharing our free Colab notebook for continued pretraining on raw text (text completion): https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing
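As a rough illustration of the kind of data preparation such a run involves, one common recipe is to load a raw-text corpus and append the tokenizer's EOS token to every example; the Korean Wikipedia dump and column name below are assumptions, not taken from the notebooks:

```python
from datasets import load_dataset

# Illustrative corpus: a Korean Wikipedia dump (any raw-text dataset works).
dataset = load_dataset("wikimedia/wikipedia", "20231101.ko", split = "train")

EOS_TOKEN = tokenizer.eos_token  # end every example with EOS so generation can stop
def add_eos(examples):
    return {"text": [text + EOS_TOKEN for text in examples["text"]]}

dataset = dataset.map(add_eos, batched = True)
```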
What's Changed
- Update _utils.py by @CoffeeVampir3 in #520
- Nightly by @danielhanchen in #548
- Fix Phi-3 by @danielhanchen in #556
- Fix #563 by @danielhanchen in #564
New Contributors
- @CoffeeVampir3 made their first contribution in #520
Full Changelog: May-2024...June-2024