Why are the responses so slow when I run it? #39
Comments
Also, the console just prints: ` load model... `
The GPU is probably a bit underpowered; you could try the llama.cpp approach, running on the CPU with multiple threads.
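A minimal sketch of that suggestion, assuming llama.cpp is built locally and the model has been converted to a 4-bit GGUF quantization (the file name and layer count below are placeholders, not from this issue):

```shell
# Sketch: llama.cpp CLI with multi-threaded CPU inference plus partial GPU offload.
#   -t    CPU threads (the i7-11700K has 8 physical cores)
#   -ngl  number of layers to offload to the GPU (tune to fit the 8 GB GTX 1070)
./llama-cli -m ./models/llama3-Chinese-chat-8b-Q4_K_M.gguf -t 8 -ngl 20 -p "hello"
```

Lowering `-ngl` until the model fits in VRAM avoids spilling into system RAM, which is usually what makes generation crawl.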
RAM: 32 GB DDR4
CPU: i7-11700K
GPU: Nvidia GTX 1070
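The hardware above likely explains the slowdown. A back-of-the-envelope check (a rough estimate, assuming the standard ~8B parameter count and fp16 weights) shows the model cannot fit in this card's VRAM:

```python
# Rough VRAM estimate: fp16 weights need ~2 bytes per parameter,
# before even counting the KV cache and activations.
PARAMS = 8e9            # approximate llama3-8B parameter count
BYTES_PER_PARAM = 2     # fp16/bf16 weights
GTX_1070_VRAM_GB = 8    # GTX 1070 memory

weights_gb = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"fp16 weights: ~{weights_gb:.1f} GB vs {GTX_1070_VRAM_GB} GB VRAM")
```

Since the weights alone are roughly double the available VRAM, inference spills into system RAM (or falls back to CPU), which matches the very slow responses reported here.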
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:30:10_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
python: 3.11
torch==2.3.0+cu121
torchaudio==2.3.0+cu121
torchvision==0.18.0+cu121
Launched with: streamlit run deploy/web_streamlit_for_instruct.py .\models\llama3-Chinese-chat-8b --theme.base="dark"
With all parameters at their defaults, responses are extremely slow.
How can I fix this? Thanks.