Hi,
I'm trying to run the validation code for the 7B model on 8× NVIDIA RTX A5000 GPUs. However, an out-of-memory error occurred. Does evaluation really need this much memory?
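A rough weight-only estimate helps frame the question. This sketch assumes about 7e9 parameters for the language model (taken from the "7B" name), about 0.64e9 for the SAM ViT-H encoder loaded via `--vision_pretrained` (an approximate figure), fp16 weights at 2 bytes each (matching `--precision=fp16`), and a nominal 24 GiB per RTX A5000. It also assumes each rank holds a full copy of the weights, as in plain data-parallel evaluation, so using 8 GPUs would not lower the per-GPU figure. Activations, the CLIP vision tower, and CUDA context overhead are not counted.

```python
# Back-of-the-envelope weight memory per GPU (assumptions, not measurements).

def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30

LLM_PARAMS = 7e9      # assumption: taken from the "7B" model name
SAM_PARAMS = 0.64e9   # assumption: approximate size of SAM ViT-H
FP16_BYTES = 2        # fp16 stores each parameter in 2 bytes
GPU_GIB = 24.0        # assumption: nominal RTX A5000 memory

llm_gib = weight_memory_gib(LLM_PARAMS, FP16_BYTES)
total_gib = weight_memory_gib(LLM_PARAMS + SAM_PARAMS, FP16_BYTES)
headroom_gib = GPU_GIB - total_gib  # left for activations, buffers, CUDA context

print(f"LLM weights (fp16):      {llm_gib:.1f} GiB")
print(f"LLM + SAM weights:       {total_gib:.1f} GiB")
print(f"Headroom per A5000:      {headroom_gib:.1f} GiB")
```

Under these assumptions the weights alone take roughly 14 GiB of each 24 GiB card, so whatever is left has to cover activations for the evaluation batch.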
Here is the log:
[2024-01-17 17:44:34,524] [WARNING] [runner.py:122:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-01-17 17:44:34,597] [INFO] [runner.py:360:main] cmd = /home/zhiling/anaconda3/envs/python310/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMywgNCwgNiwgNywgOCwgOV19 --master_addr=127.0.0.1 --master_port=24999 train_ds.py --version=xinlai/LISA-7B-v1 --dataset_dir=/data/zhiling/Dataset/LISA/dataset --vision_pretrained=/home/zhiling/LISA_old/checkpoints/sam/sam_vit_h_4b8939.pth --exp_name=lisa-7b --precision=fp16 --eval_only
[2024-01-17 17:44:35,956] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0, 1, 3, 4, 6, 7, 8, 9]}
[2024-01-17 17:44:35,956] [INFO] [launch.py:86:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-01-17 17:44:35,956] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-01-17 17:44:35,956] [INFO] [launch.py:102:main] dist_world_size=8
[2024-01-17 17:44:35,956] [INFO] [launch.py:104:main] Setting CUDA_VISIBLE_DEVICES=0,1,3,4,6,7,8,9
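The `--world_info` value in the launcher command appears to be base64-encoded JSON. The minimal sketch below (standard library only) decodes it and reproduces the `WORLD INFO DICT` line, which confirms that physical GPUs 2 and 5 are not used. The function name `decode_world_info` is hypothetical, chosen for illustration.

```python
import base64
import json

# Value of --world_info copied verbatim from the deepspeed.launcher.launch command above.
WORLD_INFO = "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMywgNCwgNiwgNywgOCwgOV19"

def decode_world_info(encoded: str) -> dict:
    """Decode a base64 string holding a JSON mapping of host -> GPU ids (hypothetical helper)."""
    return json.loads(base64.b64decode(encoded))

world = decode_world_info(WORLD_INFO)
print(world)
```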