prefix-tuning RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).
#226
Open
hhh12hhh opened this issue on Sep 27, 2023 · 2 comments
I encountered the same problem! When the fine-tuning method is switched to p-tuning or other methods, the problem does not occur. Is there anything wrong with peft.PrefixTuningConfig?
System Info
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.27.0
Libc version: glibc-2.31
Python version: 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-71-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
Nvidia driver version: 495.29.05
Versions of relevant libraries:
mypy-extensions==1.0.0
numpy==1.23.5
torch==2.0.1
torchdata==0.6.1
torchtext==0.15.2
torchvision==0.15.2
Information
🐛 Describe the bug
I encountered the above error while fine-tuning the model with prefix-tuning.
Here is my fine-tuning script:
Error logs
```
Traceback (most recent call last):
  File "/home/zxy/llama2/llama2-lora-fine-tuning/llama-recipes-main/examples/finetuning.py", line 8, in <module>
    fire.Fire(main)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/llama_recipes/finetuning.py", line 237, in main
    results = train(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/llama_recipes/utils/train_utils.py", line 84, in train
    scaler.scale(loss).backward()
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/root/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
```
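For reference, this RuntimeError can be reproduced in isolation, independent of llama-recipes: it fires whenever `backward()` runs a second time through a graph whose saved tensors the first `backward()` already freed. A minimal plain-PyTorch sketch:

```python
import torch

# A graph whose backward needs saved tensors: pow saves its input.
x = torch.ones(3, requires_grad=True)
loss = (x ** 2).sum()

loss.backward()          # first backward frees the saved intermediate tensors

second_failed = False
try:
    loss.backward()      # second backward through the same, already-freed graph
except RuntimeError:
    second_failed = True
print("second backward raised:", second_failed)

# retain_graph=True keeps saved tensors alive if repeated backward is intended;
# gradients then accumulate (2 + 2 = 4 per element here).
y = torch.ones(3, requires_grad=True)
loss2 = (y ** 2).sum()
loss2.backward(retain_graph=True)
loss2.backward()
print(y.grad)
```

`retain_graph=True` is rarely the right fix in a training loop, though: it usually just papers over the real issue, which is that some part of the graph is unintentionally being reused across backward calls.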
Expected behavior
I'd like to know whether I made a mistake in my script or whether there is another cause, and how to fix it.
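One common way to hit this error in a training loop (whether it is the cause here depends on the actual fine-tuning script, which is not shown) is reusing a tensor computed through trainable parameters, such as a learned prefix, across iterations: the second step then backpropagates through a graph the first step already freed. A hypothetical sketch with stand-in modules, not llama-recipes or PEFT code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
prefix_encoder = nn.Linear(4, 4)   # hypothetical stand-in for a prefix/prompt encoder
model = nn.Linear(4, 1)            # hypothetical stand-in for the base model
params = list(prefix_encoder.parameters()) + list(model.parameters())
opt = torch.optim.SGD(params, lr=0.1)
x = torch.randn(1, 4)

# Buggy pattern: the prefix is built once, outside the loop, so step 2
# backpropagates through a graph whose saved tensors step 1 already freed.
prefix = prefix_encoder(x)
reuse_failed = False
try:
    for step in range(2):
        loss = model(prefix).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
except RuntimeError:
    reuse_failed = True
print("reusing the prefix across steps raised:", reuse_failed)

# Fixed pattern: recompute the prefix each step so every backward() sees a
# fresh graph (or .detach() anything that is intentionally held constant).
for step in range(2):
    prefix = prefix_encoder(x)
    loss = model(prefix).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("recomputing the prefix each step completed both steps")
```

Separately, the traceback above passes through torch/utils/checkpoint.py, so an interaction between the prefix's graph and activation checkpointing's recomputation is another plausible suspect; running once with activation checkpointing disabled is a quick way to narrow that down.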