prefix-tuning RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). #226

hhh12hhh opened this issue Sep 27, 2023 · 2 comments

@hhh12hhh

System Info

PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.27.0
Libc version: glibc-2.31

Python version: 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-71-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 495.29.05
Versions of relevant libraries:
mypy-extensions==1.0.0
numpy==1.23.5
torch==2.0.1
torchdata==0.6.1
torchtext==0.15.2
torchvision==0.15.2

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

I encountered the error in the title while fine-tuning the model with prefix tuning.
Here is my fine-tuning script:

CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 1 examples/finetuning.py \
    --use_peft \
    --peft_method prefix \
    --model_name ../model/llama-2-7b-chat-hf \
    --use_fp16 \
    --output_dir ./output \
    --dataset alpaca_dataset \
    --data_path ./data.json \
    --batch_size_training 16 \
    --num_epochs 3 \
    --quantization 

Error logs

Traceback (most recent call last):
  File "/home/zxy/llama2/llama2-lora-fine-tuning/llama-recipes-main/examples/finetuning.py", line 8, in <module>
    fire.Fire(main)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/llama_recipes/finetuning.py", line 237, in main
    results = train(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/llama_recipes/utils/train_utils.py", line 84, in train
    scaler.scale(loss).backward()
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Expected behavior

I would like to know whether I did something wrong or whether there is some other cause, and how to solve it.

@JunoLiusj

I encountered the same problem! When the fine-tuning method is switched to p-tuning or other methods, this problem does not occur. Is there anything wrong with peft.PrefixTuningConfig?
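
For reference, a minimal standalone sketch (not from this thread) of applying PrefixTuningConfig with PEFT outside llama-recipes, which can help isolate whether the config itself is at fault; the model path mirrors the command above, and the virtual-token count is an illustrative assumption:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, TaskType, get_peft_model

model_name = "../model/llama-2-7b-chat-hf"  # same path as in the command above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Illustrative prefix-tuning config; 30 virtual tokens is an assumed value
peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=30)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# One forward/backward pass on dummy data: if this runs cleanly, the error is
# more likely to come from the training loop (FSDP / activation checkpointing)
# than from PrefixTuningConfig itself.
batch = tokenizer("Hello world", return_tensors="pt").to("cuda")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()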

@HamidShojanazeri (Contributor)

@JunoLiusj if you are using it with FSDP, unfortunately it is not supported; see #482.
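
Since the comment above reports that other PEFT methods do not hit this error, one possible workaround is to switch the run to another method such as LoRA, e.g. via --peft_method lora in the command above (if the script supports it) or by configuring PEFT directly. A sketch only; the hyperparameter values below are illustrative, not llama-recipes defaults:

from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # LoRA rank (illustrative)
    lora_alpha=32,                        # scaling factor (illustrative)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA
)
# model = get_peft_model(base_model, lora_config)  # base_model: the loaded LLaMA model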
