Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeError: __init__() got an unexpected keyword argument 'no_args_is_help' #24

Open
SeekPoint opened this issue May 22, 2023 · 0 comments

Comments

@SeekPoint
Copy link

(gh_finetune-gpt2xl) r730ub20@r730ub20-M0:~/llm_dev/finetune-gpt2xl$ deepspeed --num_gpus=1 run_clm.py --deepspeed ds_config.json --model_name_or_path gpt2-xl --train_file train.csv --validation_file validation.csv --do_train --do_eval --fp16 --overwrite_cache --evaluation_strategy="steps" --output_dir finetuned --eval_steps 200 --num_train_epochs 1 --gradient_accumulation_steps 2 --per_device_train_batch_size 1
[2023-05-22 22:00:31,576] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-05-22 22:00:31,600] [INFO] [runner.py:541:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None run_clm.py --deepspeed ds_config.json --model_name_or_path gpt2-xl --train_file train.csv --validation_file validation.csv --do_train --do_eval --fp16 --overwrite_cache --evaluation_strategy=steps --output_dir finetuned --eval_steps 200 --num_train_epochs 1 --gradient_accumulation_steps 2 --per_device_train_batch_size 1
[2023-05-22 22:00:33,028] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-05-22 22:00:33,028] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-05-22 22:00:33,028] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-05-22 22:00:33,028] [INFO] [launch.py:247:main] dist_world_size=1
[2023-05-22 22:00:33,028] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-05-22 22:00:34,832] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
05/22/2023 22:00:34 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
05/22/2023 22:00:34 - INFO - main - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=ds_config.json,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=200,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=2,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_on_each_node=True,
logging_dir=runs/May22_22-00-34_r730ub20-M0,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
output_dir=finetuned,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=False,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=finetuned,
save_steps=500,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
05/22/2023 22:00:36 - WARNING - datasets.builder - Using custom data configuration default-3bfffae691dad1b0
05/22/2023 22:00:36 - WARNING - datasets.builder - Reusing dataset csv (/home/r730ub20/.cache/huggingface/datasets/csv/default-3bfffae691dad1b0/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0)
[INFO|configuration_utils.py:517] 2023-05-22 22:00:36,541 >> loading configuration file https://huggingface.co/gpt2-xl/resolve/main/config.json from cache at /home/r730ub20/.cache/huggingface/transformers/d2de8fec009fa9b9196047559bcac6c1f02a9c500718b4346bc516354965b1ca.d684cb2afa3f8c44c73bd67537d9aa5ff6044658793e077d7306ef2e37dd79bd
[INFO|configuration_utils.py:553] 2023-05-22 22:00:36,543 >> Model config GPT2Config {
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"gradient_checkpointing": false,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 1600,
"n_head": 25,
"n_inner": null,
"n_layer": 48,
"n_positions": 1024,
"output_past": true,
"resid_pdrop": 0.1,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"transformers_version": "4.7.0",
"use_cache": true,
"vocab_size": 50257
}

[INFO|configuration_utils.py:517] 2023-05-22 22:00:36,953 >> loading configuration file https://huggingface.co/gpt2-xl/resolve/main/config.json from cache at /home/r730ub20/.cache/huggingface/transformers/d2de8fec009fa9b9196047559bcac6c1f02a9c500718b4346bc516354965b1ca.d684cb2afa3f8c44c73bd67537d9aa5ff6044658793e077d7306ef2e37dd79bd
[INFO|configuration_utils.py:553] 2023-05-22 22:00:36,954 >> Model config GPT2Config {
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"gradient_checkpointing": false,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 1600,
"n_head": 25,
"n_inner": null,
"n_layer": 48,
"n_positions": 1024,
"output_past": true,
"resid_pdrop": 0.1,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"transformers_version": "4.7.0",
"use_cache": true,
"vocab_size": 50257
}

[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/vocab.json from cache at /home/r730ub20/.cache/huggingface/transformers/8560a2df03f812b276794ae6935255d0590522553a4c8103155472b07591a21b.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/merges.txt from cache at /home/r730ub20/.cache/huggingface/transformers/18fe27e0b70062b3e45fc4e827d5449d9fe85875937594da927e48cb657366d1.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/tokenizer.json from cache at /home/r730ub20/.cache/huggingface/transformers/aabb8839163cd911f810ab23f5ae8c966b9b9ea60622c429020611caa389b04b.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1717] 2023-05-22 22:00:39,950 >> loading file https://huggingface.co/gpt2-xl/resolve/main/tokenizer_config.json from cache at None
[INFO|modeling_utils.py:1152] 2023-05-22 22:00:40,482 >> loading weights file https://huggingface.co/gpt2-xl/resolve/main/pytorch_model.bin from cache at /home/r730ub20/.cache/huggingface/transformers/96569b907e56747ce3e593c6a13d8475b8c733a64aab8af8f602b90d94c4af71.8fbbcdf404c82c5967934d411f1462fa0574d639f2aa398aa3754fced1bb26c0
[INFO|modeling_utils.py:1336] 2023-05-22 22:00:58,095 >> All model checkpoint weights were used when initializing GPT2LMHeadModel.

[INFO|modeling_utils.py:1344] 2023-05-22 22:00:58,095 >> All the weights of GPT2LMHeadModel were initialized from the model checkpoint at gpt2-xl.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
05/22/2023 22:00:58 - WARNING - datasets.fingerprint - Parameter 'function'=<function main..tokenize_function at 0x7f2363e61af0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
0%| | 0/1 [00:00<?, ?ba/s][WARNING|tokenization_utils_base.py:3171] 2023-05-22 22:01:02,910 >> Token indices sequence length is longer than the specified maximum sequence length for this model (1462828 > 1024). Running this sequence through the model will result in indexing errors
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00, 5.10s/ba]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 61.16ba/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.46s/ba]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 194.43ba/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO|trainer.py:414] 2023-05-22 22:01:05,456 >> Using amp fp16 backend
[2023-05-22 22:01:05,461] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.2, git-hash=unknown, git-branch=unknown
[2023-05-22 22:01:05,462] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2023-05-22 22:01:10,928] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.7 does not match the version torch was compiled with 11.3 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Installed CUDA version 11.7 does not match the version torch was compiled with 11.3 but since the APIs are compatible, accepting this combination
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Using /home/r730ub20/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Detected CUDA files, patching ldflags
Emitting ninja build file /home/r730ub20/.cache/torch_extensions/py38_cu113/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 3.7361702919006348 seconds
Adam Optimizer #0 is created with AVX2 arithmetic capability.
Config: alpha=0.000050, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
[2023-05-22 22:01:17,581] [INFO] [logging.py:96:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2023-05-22 22:01:17,630] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
[2023-05-22 22:01:17,630] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2023-05-22 22:01:17,630] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-05-22 22:01:17,630] [INFO] [stage_1_and_2.py:133:init] Reduce bucket size 200000000
[2023-05-22 22:01:17,630] [INFO] [stage_1_and_2.py:134:init] Allgather bucket size 200000000
[2023-05-22 22:01:17,630] [INFO] [stage_1_and_2.py:135:init] CPU Offload: True
[2023-05-22 22:01:17,630] [INFO] [stage_1_and_2.py:136:init] Round robin gradient partitioning: False
Using /home/r730ub20/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Emitting ninja build file /home/r730ub20/.cache/torch_extensions/py38_cu113/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.6310455799102783 seconds
Rank: 0 partition count [1] and sizes[(1557611200, False)]
[2023-05-22 22:01:24,621] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-05-22 22:01:24,622] [INFO] [utils.py:786:see_memory_usage] MA 3.1 GB Max_MA 3.1 GB CA 3.1 GB Max_CA 3 GB
[2023-05-22 22:01:24,623] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 18.32 GB, percent = 7.3%
[2023-05-22 22:01:31,310] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-05-22 22:01:31,311] [INFO] [utils.py:786:see_memory_usage] MA 3.1 GB Max_MA 3.1 GB CA 3.1 GB Max_CA 3 GB
[2023-05-22 22:01:31,311] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 35.84 GB, percent = 14.2%
[2023-05-22 22:01:31,311] [INFO] [stage_1_and_2.py:489:init] optimizer state initialized
[2023-05-22 22:01:31,369] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-05-22 22:01:31,370] [INFO] [utils.py:786:see_memory_usage] MA 3.1 GB Max_MA 3.1 GB CA 3.1 GB Max_CA 3 GB
[2023-05-22 22:01:31,370] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 35.84 GB, percent = 14.2%
[2023-05-22 22:01:31,386] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw
[2023-05-22 22:01:31,386] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupLR
[2023-05-22 22:01:31,386] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <deepspeed.runtime.lr_schedules.WarmupLR object at 0x7f22265c1040>
[2023-05-22 22:01:31,386] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[5e-05], mom=[[0.9, 0.999]]
[2023-05-22 22:01:31,387] [INFO] [config.py:955:print] DeepSpeedEngine configuration:
[2023-05-22 22:01:31,387] [INFO] [config.py:959:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2023-05-22 22:01:31,387] [INFO] [config.py:959:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-05-22 22:01:31,387] [INFO] [config.py:959:print] amp_enabled .................. False
[2023-05-22 22:01:31,387] [INFO] [config.py:959:print] amp_params ................... False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] bfloat16_enabled ............. False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] checkpoint_parallel_write_pipeline False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] checkpoint_tag_validation_enabled True
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] checkpoint_tag_validation_fail False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f2032c4a580>
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] communication_data_type ...... None
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] curriculum_enabled_legacy .... False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] curriculum_params_legacy ..... False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] data_efficiency_enabled ...... False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] dataloader_drop_last ......... False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] disable_allgather ............ False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] dump_state ................... False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1}
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] eigenvalue_enabled ........... False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] eigenvalue_gas_boundary_resolution 1
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] eigenvalue_layer_num ......... 0
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] eigenvalue_max_iter .......... 100
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] eigenvalue_stability ......... 1e-06
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] eigenvalue_tol ............... 0.01
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] eigenvalue_verbose ........... False
[2023-05-22 22:01:31,388] [INFO] [config.py:959:print] elasticity_enabled ........... False
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] flops_profiler_config ........ {
"enabled": false,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] fp16_auto_cast ............... False
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] fp16_enabled ................. True
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] fp16_master_weights_and_gradients False
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] global_rank .................. 0
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] grad_accum_dtype ............. None
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] gradient_accumulation_steps .. 2
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] gradient_clipping ............ 1.0
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] gradient_predivide_factor .... 1.0
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] initial_dynamic_scale ........ 65536
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] load_universal_checkpoint .... False
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] loss_scale ................... 0
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] memory_breakdown ............. False
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] mics_hierarchial_params_gather False
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] mics_shard_size .............. -1
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] optimizer_legacy_fusion ...... False
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] optimizer_name ............... adamw
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] optimizer_params ............. {'lr': 5e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.0}
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-05-22 22:01:31,389] [INFO] [config.py:959:print] pld_enabled .................. False
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] pld_params ................... False
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] prescale_gradients ........... False
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] scheduler_name ............... WarmupLR
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] scheduler_params ............. {'warmup_min_lr': 0, 'warmup_max_lr': 5e-05, 'warmup_num_steps': 0}
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] sparse_attention ............. None
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] sparse_gradients_enabled ..... False
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] steps_per_print .............. 2000
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] train_batch_size ............. 2
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] train_micro_batch_size_per_gpu 1
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] use_node_local_storage ....... False
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] wall_clock_breakdown ......... False
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] world_size ................... 1
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] zero_allow_untested_optimizer False
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] zero_enabled ................. True
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] zero_force_ds_cpu_optimizer .. True
[2023-05-22 22:01:31,390] [INFO] [config.py:959:print] zero_optimization_stage ...... 2
[2023-05-22 22:01:31,390] [INFO] [config.py:945:print_user_config] json = {
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": 5e-05,
"betas": [0.9, 0.999],
"eps": 1e-08,
"weight_decay": 0.0
}
},
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 5e-05,
"warmup_num_steps": 0
}
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 2.000000e+08,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 2.000000e+08,
"contiguous_gradients": true,
"cpu_offload": true
},
"gradient_accumulation_steps": 2,
"gradient_clipping": 1.0,
"steps_per_print": 2.000000e+03,
"train_batch_size": 2,
"train_micro_batch_size_per_gpu": 1,
"wall_clock_breakdown": false
}
Using /home/r730ub20/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0004444122314453125 seconds
[INFO|trainer.py:1147] 2023-05-22 22:01:31,391 >> ***** Running training *****
[INFO|trainer.py:1148] 2023-05-22 22:01:31,391 >> Num examples = 11428
[INFO|trainer.py:1149] 2023-05-22 22:01:31,391 >> Num Epochs = 1
[INFO|trainer.py:1150] 2023-05-22 22:01:31,391 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1151] 2023-05-22 22:01:31,391 >> Total train batch size (w. parallel, distributed & accumulation) = 2
[INFO|trainer.py:1152] 2023-05-22 22:01:31,391 >> Gradient Accumulation steps = 2
[INFO|trainer.py:1153] 2023-05-22 22:01:31,391 >> Total optimization steps = 5714
[INFO|integrations.py:402] 2023-05-22 22:01:31,393 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/main.py", line 1, in
from wandb.cli import cli
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/cli/cli.py", line 933, in
def launch_sweep(
File "/usr/lib/python3/dist-packages/click/core.py", line 1234, in decorator
cmd = command(*args, **kwargs)(f)
File "/usr/lib/python3/dist-packages/click/decorators.py", line 115, in decorator
cmd = _make_command(f, name, attrs, cls)
File "/usr/lib/python3/dist-packages/click/decorators.py", line 88, in make_command
return cls(name=name or f.name.lower().replace('
', '-'),
TypeError: init() got an unexpected keyword argument 'no_args_is_help'
Traceback (most recent call last):
File "run_clm.py", line 478, in
main()
File "run_clm.py", line 441, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/r730ub20/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1207, in train
self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
File "/home/r730ub20/.local/lib/python3.8/site-packages/transformers/trainer_callback.py", line 340, in on_train_begin
return self.call_event("on_train_begin", args, state, control)
File "/home/r730ub20/.local/lib/python3.8/site-packages/transformers/trainer_callback.py", line 378, in call_event
result = getattr(callback, event)(
File "/home/r730ub20/.local/lib/python3.8/site-packages/transformers/integrations.py", line 446, in on_train_begin
self.setup(args, state, model, **kwargs)
File "/home/r730ub20/.local/lib/python3.8/site-packages/transformers/integrations.py", line 419, in setup
self._wandb.init(
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1169, in init
raise e
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 1146, in init
wi.setup(kwargs)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 172, in setup
self._wl = wandb_setup.setup(settings=setup_settings)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 327, in setup
ret = _setup(settings=settings)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 320, in _setup
wl = _WandbSetup(settings=settings)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 303, in init
_WandbSetup._instance = _WandbSetup__WandbSetup(settings=settings, pid=pid)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 114, in init
self._setup()
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 250, in _setup
self._setup_manager()
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_setup.py", line 277, in _setup_manager
self._manager = wandb_manager._Manager(settings=self._settings)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/wandb_manager.py", line 145, in init
self._service.start()
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 199, in start
self._launch_server()
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 193, in _launch_server
_sentry.reraise(e)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/analytics/sentry.py", line 146, in reraise
raise exc.with_traceback(sys.exc_info()[2])
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 191, in _launch_server
self._wait_for_ports(fname, proc=internal_proc)
File "/home/r730ub20/.local/lib/python3.8/site-packages/wandb/sdk/service/service.py", line 116, in _wait_for_ports
raise ServiceStartProcessError(
wandb.sdk.service.service.ServiceStartProcessError: The wandb service process exited with 1. Ensure that sys.executable is a valid python interpreter. You can override it with the _executable setting or with the WANDB__EXECUTABLE environment variable.
[2023-05-22 22:01:38,113] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 10431
[2023-05-22 22:01:38,114] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'run_clm.py', '--local_rank=0', '--deepspeed', 'ds_config.json', '--model_name_or_path', 'gpt2-xl', '--train_file', 'train.csv', '--validation_file', 'validation.csv', '--do_train', '--do_eval', '--fp16', '--overwrite_cache', '--evaluation_strategy=steps', '--output_dir', 'finetuned', '--eval_steps', '200', '--num_train_epochs', '1', '--gradient_accumulation_steps', '2', '--per_device_train_batch_size', '1'] exits with return code = 1
(gh_finetune-gpt2xl) r730ub20@r730ub20-M0:~/llm_dev/finetune-gpt2xl$

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant