I duplicated the llama model, renamed it to llama_7b, and changed the model parameters to match the llama-7b specification, like this:
I skipped the CPU eager mode and only ran the CUDA model.
It reports the following issue when running with this command:
python userbenchmark/dynamo/dynamobench/torchbench.py -dcuda --float16 -n1 --inductor --performance --inference --filter "llama" --batch_size 1 --in_slen 32 --out_slen 3 --output-dir=torchbench_llama_test_logs
How should I fix this so I can run the model? My hardware is an A100-40G.
Thanks!
We only guarantee the runnability of models in PT eager mode on A100 40GB in our CI. It is possible that inductor uses more GPU memory than eager mode, causing OOM. Optimizing GPU memory usage with inductor is an open question.
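A rough back-of-the-envelope calculation shows why headroom on a 40 GB card is tight for a 7B model in float16: the weights alone take roughly a third of the card, and activations, the KV cache, and inductor's compilation workspace come on top of that. A minimal sketch of the arithmetic (the 7B parameter count is the nominal llama-7b size):

```python
# Back-of-the-envelope memory estimate for llama-7b inference in float16.
# Weights alone consume a large slice of an A100-40G; activations, the
# KV cache, and inductor's workspace are extra on top of this.
params = 7_000_000_000          # ~7B parameters (nominal llama-7b size)
bytes_per_param = 2             # float16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1024**3
print(f"fp16 weights alone: {weights_gb:.1f} GiB")  # ~13.0 GiB
```

If inductor's peak usage during compilation exceeds eager's peak, the remaining ~27 GiB can be exhausted even at batch size 1.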
cc @msaroufim
@xuzhao9 I tried using 4xA100-40G to avoid the OOM issue, but it looks like torchbench.py only uses one GPU's memory. I tried options like --device-index and --multiprocess, and both failed. Do you have any advice on multi-GPU support?
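For reference, what I would need is pipeline-style placement of the transformer blocks across the four cards. A minimal sketch of the layer-to-GPU assignment I have in mind (`assign_layers` is a hypothetical helper for illustration, not a torchbench API; llama-7b has 32 transformer blocks):

```python
# Sketch of contiguous layer-to-GPU assignment for pipeline-style sharding.
# `assign_layers` is a hypothetical helper, not part of torchbench.
def assign_layers(num_layers: int, num_gpus: int) -> dict[int, int]:
    """Map each layer index to a GPU index, in contiguous blocks."""
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    return {layer: layer // per_gpu for layer in range(num_layers)}

# llama-7b has 32 transformer blocks; spread them over 4 GPUs.
placement = assign_layers(32, 4)
print(placement[0], placement[31])  # layer 0 -> GPU 0, layer 31 -> GPU 3
```

Each contiguous block of layers would then be moved to its assigned device, with activations handed off at the boundaries; that is the part torchbench does not seem to do for me out of the box.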