
Why are responses extremely slow when I run the demo? #39

Closed
condywl opened this issue May 6, 2024 · 2 comments

Comments


condywl commented May 6, 2024

RAM: 32 GB DDR4
CPU: i7 11700K
GPU: Nvidia GTX 1070

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:30:10_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

python: 3.11
torch==2.3.0+cu121
torchaudio==2.3.0+cu121
torchvision==0.18.0+cu121

Launch command: streamlit run deploy/web_streamlit_for_instruct.py .\models\llama3-Chinese-chat-8b --theme.base="dark"
With default parameters, responses are extremely slow.
Any advice on how to fix this? Thanks.
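One diagnostic worth running first (a sketch, not from the original thread): a GTX 1070 has 8 GB of VRAM, while an 8B model in fp16 needs roughly 16 GB, so generation may be running partly or wholly on CPU. A minimal check that PyTorch actually sees the GPU and how much memory it has:

```python
import torch

# If this prints False, generation is running entirely on CPU,
# which would explain the very slow responses.
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # Compare total VRAM against what the model needs (~16 GB for 8B fp16).
    print(props.name, f"{props.total_memory / 1024**3:.1f} GiB VRAM")
```

If CUDA is unavailable or VRAM is insufficient, loading the model quantized (e.g. 4-bit) or via llama.cpp is the usual workaround.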


condywl commented May 6, 2024

The console also reports an error:

```
load model...
load model end.
D:\Development-Projects\Python-Projects\ShareAI\deploy\web_streamlit_for_instruct.py:80: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(  # pylint: disable=W4902
--- Logging error ---
Traceback (most recent call last):
  File "D:\Soft\Python3\Lib\logging\__init__.py", line 1110, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "D:\Soft\Python3\Lib\logging\__init__.py", line 953, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "D:\Soft\Python3\Lib\logging\__init__.py", line 687, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "D:\Soft\Python3\Lib\logging\__init__.py", line 377, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
TypeError: not all arguments converted during string formatting
Call stack:
  File "D:\Soft\Python3\Lib\threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "D:\Soft\Python3\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "D:\Soft\Python3\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Development-Projects\Python-Projects\ShareAI\.venv\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 307, in _run_script_thread
    self._run_script(request.rerun_data)
  File "D:\Development-Projects\Python-Projects\ShareAI\.venv\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
  File "D:\Development-Projects\Python-Projects\ShareAI\deploy\web_streamlit_for_instruct.py", line 314, in <module>
    main(model_name_or_path, adapter_name_or_path)
  File "D:\Development-Projects\Python-Projects\ShareAI\deploy\web_streamlit_for_instruct.py", line 288, in main
    for cur_response in generate_interactive(
  File "D:\Development-Projects\Python-Projects\ShareAI\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "D:\Development-Projects\Python-Projects\ShareAI\deploy\web_streamlit_for_instruct.py", line 80, in generate_interactive
    logger.warn(  # pylint: disable=W4902
Message: "Both 'max_new_tokens' (=660) and 'max_length'(=862) seem to have been set. 'max_new_tokens' will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)"
Arguments: (<class 'UserWarning'>,)
```
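Incidentally, this `--- Logging error ---` is unrelated to the slow responses: `logger.warn` at line 80 of `web_streamlit_for_instruct.py` is being passed an extra positional argument (`UserWarning`), which Python's `logging` module tries to consume as a `%`-format argument for a message that has no format specifiers. A minimal reproduction and fix, assuming a stock `logging` setup:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("demo")

# Bug: the second positional argument is interpreted as a %-format
# argument, but the message contains no % specifiers, so logging raises
# (and swallows) "TypeError: not all arguments converted during string
# formatting" -- exactly the traceback above.
logger.warning("'max_new_tokens' will take precedence.", UserWarning)

# Fix: pass only the message. If a genuine Python warning is intended,
# use warnings.warn(msg, UserWarning) instead of the logger.
logger.warning("'max_new_tokens' will take precedence.")
```

The `DeprecationWarning` on the same line is fixed the same way: call `logger.warning(...)` rather than the deprecated `logger.warn(...)`.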

CrazyBoyM (Owner) commented:

The GPU is probably too underpowered. You could try running the model with llama.cpp using multi-threaded CPU inference instead.
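A rough sketch of that suggestion (the `.gguf` file name and thread count are assumptions; the model has to be converted and quantized to GGUF first with llama.cpp's `convert_hf_to_gguf.py` and `llama-quantize`):

```shell
# Run the model on CPU with llama.cpp (llama-cli from a recent build).
# -t sets the number of CPU threads; match it to your physical cores.
./llama-cli \
  -m ./models/llama3-chinese-chat-8b.Q4_K_M.gguf \
  -t 8 \
  -p "你好"
```

At 4-bit quantization an 8B model fits in roughly 5 GB of RAM, so it runs comfortably on this machine's 32 GB even with no GPU at all.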
