You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
(lmdeploy) root@intern-studio:~# lmdeploy chat turbomind /share/temp/model_repos/internlm-chat-7b/ --model-name internlm-chat-7b
model_source: hf_model
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
model_config:
{
"model_name": "internlm-chat-7b",
"tensor_para_size": 1,
"head_num": 32,
"kv_head_num": 32,
"vocab_size": 103168,
"num_layer": 32,
"inter_size": 11008,
"norm_eps": 1e-06,
"attn_bias": 1,
"start_id": 1,
"end_id": 2,
"session_len": 2056,
"weight_type": "fp16",
"rotary_embedding": 128,
"rope_theta": 10000.0,
"size_per_head": 128,
"group_size": 0,
"max_batch_size": 64,
"max_context_token_num": 1,
"step_length": 1,
"cache_max_entry_count": 0.5,
"cache_block_seq_len": 128,
"cache_chunk_size": 1,
"use_context_fmha": 1,
"quant_policy": 0,
"max_position_embeddings": 2048,
"rope_scaling_factor": 0.0,
"use_logn_attn": 0
}
get 323 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1
double enter to end input >>> hello
<|System|>:You are an AI assistant whose name is InternLM (书生·浦语).
InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
<|User|>:hello
<|Bot|>: [AMP ERROR][CudaFrontend.cpp:94][1705496068:532304]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT
我按照教程配置了所有的环境内容,但是一运行就报错..
后面我离线转换的也是报错这个
(lmdeploy) root@intern-studio:~# lmdeploy chat turbomind /share/temp/model_repos/internlm-chat-7b/ --model-name internlm-chat-7b
model_source: hf_model
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
model_config:
{
"model_name": "internlm-chat-7b",
"tensor_para_size": 1,
"head_num": 32,
"kv_head_num": 32,
"vocab_size": 103168,
"num_layer": 32,
"inter_size": 11008,
"norm_eps": 1e-06,
"attn_bias": 1,
"start_id": 1,
"end_id": 2,
"session_len": 2056,
"weight_type": "fp16",
"rotary_embedding": 128,
"rope_theta": 10000.0,
"size_per_head": 128,
"group_size": 0,
"max_batch_size": 64,
"max_context_token_num": 1,
"step_length": 1,
"cache_max_entry_count": 0.5,
"cache_block_seq_len": 128,
"cache_chunk_size": 1,
"use_context_fmha": 1,
"quant_policy": 0,
"max_position_embeddings": 2048,
"rope_scaling_factor": 0.0,
"use_logn_attn": 0
}
get 323 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1
double enter to end input >>> hello
<|System|>:You are an AI assistant whose name is InternLM (书生·浦语).
<|User|>:hello
<|Bot|>: [AMP ERROR][CudaFrontend.cpp:94][1705496068:532304]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT
===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fc0d92cc302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7fc0d94fb471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
Aborted (core dumped)
The text was updated successfully, but these errors were encountered: