[Question] mlc_llm serve fails with --speculative-mode, does it require certain hardware? #2350

Closed
0xDEADFED5 opened this issue May 16, 2024 · 2 comments
Labels
question Question about the usage

Comments

0xDEADFED5 commented May 16, 2024

Using the nightly wheels. I can serve just fine with --speculative-mode disable, but all the other options give me this:

Exception in thread Thread-11 (_background_loop):
Traceback (most recent call last):
  File "C:\Users\ANON\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "C:\Users\ANON\AppData\Local\Programs\Python\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\ANON\repos\AI_Grotto\mlcvenv\Lib\site-packages\mlc_llm\serve\engine_base.py", line 482, in _background_loop
    self._ffi["run_background_loop"]()
  File "C:\Users\ANON\repos\AI_Grotto\mlcvenv\Lib\site-packages\tvm\_ffi\_ctypes\packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "C:\Users\ANON\repos\AI_Grotto\mlcvenv\Lib\site-packages\tvm\_ffi\base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
  File "D:\a\package\package\mlc-llm\cpp\serve\engine.cc", line 145
InternalError: Check failed: n->models_.size() > 1U (1 vs. 1) :

Does speculative-mode have other requirements?
OS: Windows 11, HW: Intel Arc A770
Thanks for the great project, btw.

0xDEADFED5 added the question label May 16, 2024
MasterJH5574 (Collaborator) commented May 28, 2024

Hi @0xDEADFED5, sorry for the late reply. Speculative decoding works with two models, so only changing --speculative-mode to small_draft won't work. Thanks for bringing this up, and we'll improve the error message to avoid the confusion here.

Here's an example command you could use to enable speculative decoding; it uses the 4-bit-quantized Llama 3 8B model as the draft model for the unquantized 8B target model.

mlc_llm serve "HF://mlc-ai/Llama-3-8B-Instruct-q0f16-MLC" \
  --additional-models "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC" \
  --speculative-mode "small_draft"
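
Once the server is up, requests go through the usual OpenAI-compatible endpoint regardless of the speculative mode, so a minimal sanity check from Python might look like the sketch below. The host/port (127.0.0.1:8000) and the model string are assumptions based on the default serve settings; adjust them to your own invocation.

# Minimal sketch, assuming the server runs on the default 127.0.0.1:8000;
# the model string should match the main model passed to `mlc_llm serve`.
import requests

payload = {
    "model": "HF://mlc-ai/Llama-3-8B-Instruct-q0f16-MLC",
    "messages": [{"role": "user", "content": "What is speculative decoding?"}],
    "stream": False,
}
resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions", json=payload, timeout=120
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])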

0xDEADFED5 (Author)

Interesting! Thanks for the reply.
