using nightly wheels. i can serve just fine with --speculative-mode disable, but all the other options give me this:
Exception in thread Thread-11 (_background_loop):
Traceback (most recent call last):
  File "C:\Users\ANON\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\ANON\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ANON\repos\AI_Grotto\mlcvenv\Lib\site-packages\mlc_llm\serve\engine_base.py", line 482, in _background_loop
    self._ffi["run_background_loop"]()
  File "C:\Users\ANON\repos\AI_Grotto\mlcvenv\Lib\site-packages\tvm\_ffi\_ctypes\packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "C:\Users\ANON\repos\AI_Grotto\mlcvenv\Lib\site-packages\tvm\_ffi\base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "D:\a\package\package\mlc-llm\cpp\serve\engine.cc", line 145
InternalError: Check failed: n->models_.size() > 1U (1 vs. 1) :
does speculative-mode have other requirements?
OS: Windows 11, HW: Intel Arc A770
thanks for the great project, btw.
Hi @0xDEADFED5, sorry for the late reply. Speculative decoding needs two models, so changing only --speculative-mode to small_model won't work: the failed check in your traceback (n->models_.size() > 1U, 1 vs. 1) means the engine was given just one model. Thanks for bringing this up; we'll improve the error message to avoid the confusion here.
Here's an example command you could use to enable speculative decoding, which uses the 4-bit quantized Llama3 8B model to speculate the unquantized 8B model.
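Separately from the serve command, here is a toy Python sketch of why a second model is required. It is not MLC LLM's implementation: `speculative_generate`, the `NextToken` type, and the convention that `models[0]` is the target and `models[1]` is the draft are all assumptions made up for illustration. The draft model cheaply proposes a few tokens and the target model checks them, so with only one model there is nothing to do the speculating.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_generate(
    models: List[NextToken],
    prompt: List[int],
    max_new_tokens: int,
    draft_length: int = 4,
) -> List[int]:
    """Toy greedy speculative decoding; models[0] is the target, models[1] the draft."""
    # Analogue of the engine's `models_.size() > 1` check, with a clearer message.
    if len(models) < 2:
        raise ValueError(
            "speculative decoding needs a target model and a draft model, "
            f"but got {len(models)} model(s)"
        )
    target, draft = models[0], models[1]
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens

    while len(tokens) < goal:
        # 1. The draft model proposes a short run of tokens.
        budget = min(draft_length, goal - len(tokens))
        proposal: List[int] = []
        for _ in range(budget):
            proposal.append(draft(tokens + proposal))

        # 2. The target model verifies the run. A real engine would score all
        #    proposed positions in one batched pass; this loop is sequential
        #    only to keep the sketch short.
        for proposed in proposal:
            expected = target(tokens)
            tokens.append(expected)  # the target's token is always the one kept
            if expected != proposed:
                break  # first mismatch: discard the rest of the draft run

    return tokens
```

Because the target's token is always the one kept, the output matches plain greedy decoding with the target alone; the draft model only affects how many tokens can be accepted per verification round.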