Why are the responses so slow when I run it? #39
Comments
Also, the console just prints: ` load model... `
The GPU is probably a bit underpowered; you could try the llama.cpp approach, running on the CPU with multiple threads.
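A minimal sketch of that suggestion, assuming llama.cpp is built locally and the model has been converted to a 4-bit GGUF quantization (the file name and layer count below are placeholders, not from this issue):

```shell
# Sketch: llama.cpp CLI with multi-threaded CPU inference plus partial GPU offload.
#   -t    CPU threads (the i7-11700K has 8 physical cores)
#   -ngl  number of layers to offload to the GPU (tune to fit the 8 GB GTX 1070)
./llama-cli -m ./models/llama3-Chinese-chat-8b-Q4_K_M.gguf -t 8 -ngl 20 -p "hello"
```

Lowering `-ngl` until the model fits in VRAM avoids spilling into system RAM, which is usually what makes generation crawl.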
RAM: 32 GB DDR4
CPU: i7-11700K
GPU: Nvidia GTX 1070
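The hardware above likely explains the slowdown. A back-of-the-envelope check (a rough estimate, assuming the standard ~8B parameter count and fp16 weights) shows the model cannot fit in this card's VRAM:

```python
# Rough VRAM estimate: fp16 weights need ~2 bytes per parameter,
# before even counting the KV cache and activations.
PARAMS = 8e9            # approximate llama3-8B parameter count
BYTES_PER_PARAM = 2     # fp16/bf16 weights
GTX_1070_VRAM_GB = 8    # GTX 1070 memory

weights_gb = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"fp16 weights: ~{weights_gb:.1f} GB vs {GTX_1070_VRAM_GB} GB VRAM")
```

Since the weights alone are roughly double the available VRAM, inference spills into system RAM (or falls back to CPU), which matches the very slow responses reported here.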
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:30:10_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
python: 3.11
torch==2.3.0+cu121
torchaudio==2.3.0+cu121
torchvision==0.18.0+cu121
Launched with: streamlit run deploy/web_streamlit_for_instruct.py .\models\llama3-Chinese-chat-8b --theme.base="dark"
With all parameters at their defaults, responses are extremely slow.
How can I fix this? Thanks.