Previous thread: #771
## Context

## Solution
I envision an architecture in Jan that has the following (see the interface sketch after this list):

- **Models Extension**: handles the `/models` API endpoint
- **Inference Extension**: handles `/chat/completions` (and later `/audio/speech`), routing each request to the right Inference Engine based on the model's `model.json`
- **Extension for each Inference Engine**: implements the `/chat/completions` endpoint
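To make the division of responsibilities concrete, here is a minimal TypeScript sketch of how these three roles could be expressed. Every name here (`ModelsExtension`, `InferenceExtension`, `InferenceEngineExtension`, the method signatures, and the request/chunk shapes) is an assumption for illustration, not Jan's actual extension API:

```typescript
// Hypothetical interfaces -- names and shapes are illustrative, not Jan's real API.

// Minimal OpenAI-style request/response types for /chat/completions.
interface ChatCompletionRequest {
  model: string; // e.g. "llama2-70b-intel-bigdl"
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  stream?: boolean; // streamed back over SSE when true
}

interface ChatCompletionChunk {
  choices: { delta: { content?: string } }[];
}

// Models Extension: owns the /models endpoint.
interface ModelsExtension {
  listModels(): Promise<{ id: string; engine: string }[]>;
}

// One extension per inference engine (e.g. nitro, intel-bigdl);
// each implements /chat/completions for its engine.
interface InferenceEngineExtension {
  engineId: string;
  chatCompletions(req: ChatCompletionRequest): AsyncIterable<ChatCompletionChunk>;
}

// Inference Extension: owns /chat/completions (later /audio/speech) and
// routes each request to the right engine based on the model's model.json.
interface InferenceExtension {
  chatCompletions(req: ChatCompletionRequest): AsyncIterable<ChatCompletionChunk>;
}
```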
## Example
### File Tree
- `model.json` for `gpt4-32k-1603`
- `engine.json` example for Nitro
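The contents of these files did not survive in the post, so the following are hypothetical sketches only. Every field name, the `openai` engine id, and the URL/port are assumptions for illustration, not Jan's actual `model.json`/`engine.json` schema.

A possible `model.json` for `gpt4-32k-1603`, where `engine` names the Inference Engine Extension that should serve the model:

```json
{
  "id": "gpt4-32k-1603",
  "engine": "openai",
  "parameters": {
    "max_tokens": 32768,
    "temperature": 0.7
  }
}
```

And a possible `engine.json` for Nitro, declaring how the Inference Extension reaches that engine:

```json
{
  "id": "nitro",
  "type": "local",
  "url": "http://localhost:3928"
}
```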
### Execution Path
1. User requests `llama2-70b-intel-bigdl`.
2. The Inference Extension loads the `model.json` for `llama2-70b-intel-bigdl` and sees that the engine is `intel-bigdl`.
3. The Inference Extension routes the request to the `intel-bigdl` Inference Engine Extension.
4. The `intel-bigdl` Inference Engine Extension takes in the `/chat/completions` request, runs inference, and returns the result through SSE.
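A sketch of that execution path in TypeScript, assuming a hypothetical engine registry and `model.json` loader (none of these helpers exist in Jan under these names; they only illustrate the routing step):

```typescript
import { readFile } from "node:fs/promises";
import { join } from "node:path";

type ChatRequest = { model: string; messages: { role: string; content: string }[] };
type ChatChunk = { choices: { delta: { content?: string } }[] };
type EngineHandler = (req: ChatRequest) => AsyncIterable<ChatChunk>;

// Hypothetical registry of Inference Engine Extensions, keyed by engine id.
const engines = new Map<string, EngineHandler>();

// Load the model.json for the requested model to find its engine.
async function loadModelJson(modelId: string): Promise<{ id: string; engine: string }> {
  const raw = await readFile(join("models", modelId, "model.json"), "utf-8");
  return JSON.parse(raw);
}

// The Inference Extension's /chat/completions handler (steps 1-4 above).
async function* chatCompletions(req: ChatRequest): AsyncIterable<ChatChunk> {
  // 1. User requests e.g. "llama2-70b-intel-bigdl".
  // 2. model.json says the engine is "intel-bigdl".
  const model = await loadModelJson(req.model);

  // 3. Route to that Inference Engine Extension.
  const engine = engines.get(model.engine);
  if (!engine) throw new Error(`No Inference Engine Extension registered for "${model.engine}"`);

  // 4. The engine runs inference; chunks stream back (over SSE at the HTTP layer).
  yield* engine(req);
}
```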