
Redundant memory allocation may be the root cause of OOMs #2060

Open
jinsong-mao opened this issue Nov 27, 2023 · 1 comment

Comments

@jinsong-mao

Hi @xuzhao9,

While investigating the LLAMA_7b OOM issue, we found several redundant memory allocations that may not be necessary for the test.
1. There is a deepcopy for maybe_cast() and deepcopy_and_maybe_cast(), which duplicates the GPU memory allocated for this model:
https://github.com/pytorch/benchmark/blob/main/userbenchmark/dynamo/dynamobench/common.py#L2400
https://github.com/pytorch/benchmark/blob/main/userbenchmark/dynamo/dynamobench/common.py#L2403

It looks like we need to be stricter about when deepcopy is applied.
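
A minimal sketch (not dynamobench code, just an illustration) of why this matters: `copy.deepcopy` on a CUDA-resident module clones every parameter tensor on the same device, so the model's GPU footprint roughly doubles. The `nn.Linear` here is only a stand-in for the real model.

```python
import copy

import torch
import torch.nn as nn

# Stand-in for the real model: a single large Linear layer on the GPU.
model = nn.Linear(4096, 4096).cuda()
before = torch.cuda.memory_allocated()

# deepcopy clones every parameter tensor on the same device,
# so the GPU memory allocated for the model roughly doubles.
model_copy = copy.deepcopy(model)
after = torch.cuda.memory_allocated()

print(f"allocated before deepcopy: {before / 2**20:.0f} MiB, after: {after / 2**20:.0f} MiB")
```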

2. There is a deepcopy in validate_model() too:
https://github.com/pytorch/benchmark/blob/main/userbenchmark/dynamo/dynamobench/common.py#L1918

We can run the LLAMA_7b model (which previously hit an OOM, see #2051) on a single A100 40GB after commenting out the unnecessary deepcopy() calls.
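
Rather than commenting the copies out, one option would be to gate them behind a flag so memory-constrained runs can opt out. This is only a hypothetical sketch; `skip_deepcopy` and `validate_model_sketch` are made-up names, not existing dynamobench options.

```python
import copy

def validate_model_sketch(model, example_inputs, skip_deepcopy=False):
    # Hypothetical: `skip_deepcopy` is an assumed option, not a real dynamobench flag.
    if skip_deepcopy:
        # Reuse the original objects and avoid a second full copy of the weights
        # on the GPU, at the cost of the reference pass sharing state with the
        # model under test.
        ref_model, ref_inputs = model, example_inputs
    else:
        ref_model = copy.deepcopy(model)
        ref_inputs = copy.deepcopy(example_inputs)
    # ... run the eager reference pass with ref_model / ref_inputs ...
    return ref_model, ref_inputs
```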

Hope this information helps with fixing the OOM issues in this repo.

Thanks

@xuzhao9
Contributor

xuzhao9 commented Nov 27, 2023

dynamobench is owned by the PT2 team. In my understanding, the deepcopy is used for the accuracy check because some models are stateful.
cc @desertfire: is there a way to turn off deepcopy in dynamobench?
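
For context, a toy illustration (not dynamobench code) of why stateful models need a copy for the accuracy check: if a module mutates a buffer on every forward call, comparing a second run against the first without a fresh copy would diverge even in pure eager mode.

```python
import copy

import torch
import torch.nn as nn

class StatefulModule(nn.Module):
    """Toy module whose output depends on a buffer mutated by every forward call."""

    def __init__(self):
        super().__init__()
        self.register_buffer("calls", torch.zeros(1))

    def forward(self, x):
        self.calls += 1              # state changes on every call
        return x * self.calls

model = StatefulModule()
reference = copy.deepcopy(model)     # snapshot the state before any run

x = torch.ones(3)
out_test = model(x)                  # stands in for the run under test; mutates `model`
out_ref = reference(x)               # fresh copy starts from identical state

print(torch.allclose(out_test, out_ref))  # True; without the copy the outputs would differ
```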
