prefix-tuning RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). #226

hhh12hhh opened this issue Sep 27, 2023 · 2 comments

@hhh12hhh

System Info

PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.27.0
Libc version: glibc-2.31

Python version: 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-71-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 495.29.05
Versions of relevant libraries:
mypy-extensions==1.0.0
numpy==1.23.5
torch==2.0.1
torchdata==0.6.1
torchtext==0.15.2
torchvision==0.15.2

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

I encountered the error in the title while fine-tuning the model with prefix tuning.
Here is my fine-tuning script:

CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 1 examples/finetuning.py \
    --use_peft \
    --peft_method prefix \
    --model_name ../model/llama-2-7b-chat-hf \
    --use_fp16 \
    --output_dir ./output \
    --dataset alpaca_dataset \
    --data_path ./data.json \
    --batch_size_training 16 \
    --num_epochs 3 \
    --quantization 

Error logs

Traceback (most recent call last):
  File "/home/zxy/llama2/llama2-lora-fine-tuning/llama-recipes-main/examples/finetuning.py", line 8, in <module>
    fire.Fire(main)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/llama_recipes/finetuning.py", line 237, in main
    results = train(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/llama_recipes/utils/train_utils.py", line 84, in train
    scaler.scale(loss).backward()
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Expected behavior

I would like to know whether I did something wrong or whether there is some other cause, and how to solve it.

@JunoLiusj

I encountered the same problem! When the fine-tuning method is switched to p-tuning or other methods, this problem does not occur. Is there anything wrong with peft.PrefixTuningConfig?
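
For reference, a minimal standalone sketch (not from this thread) of applying PrefixTuningConfig with PEFT outside llama-recipes, which can help isolate whether the config itself is at fault; the model path mirrors the command above, and the virtual-token count is an illustrative assumption:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, TaskType, get_peft_model

model_name = "../model/llama-2-7b-chat-hf"  # same path as in the command above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Illustrative prefix-tuning config; 30 virtual tokens is an assumed value
peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=30)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# One forward/backward pass on dummy data: if this runs cleanly, the error is
# more likely to come from the training loop (FSDP / activation checkpointing)
# than from PrefixTuningConfig itself.
batch = tokenizer("Hello world", return_tensors="pt").to("cuda")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()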

@HamidShojanazeri (Contributor)

@JunoLiusj if you are using it with FSDP, unfortunately it is not supported; see #482.
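
Since the comment above reports that other PEFT methods do not hit this error, one possible workaround is to switch the run to another method such as LoRA, e.g. via --peft_method lora in the command above (if the script supports it) or by configuring PEFT directly. A sketch only; the hyperparameter values below are illustrative, not llama-recipes defaults:

from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # LoRA rank (illustrative)
    lora_alpha=32,                        # scaling factor (illustrative)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA
)
# model = get_peft_model(base_model, lora_config)  # base_model: the loaded LLaMA model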
