At the moment, calling load_model a second time overwrites the previous model. Instead, load_model should return a pointer to an opaque handle struct, and this handle would then be passed to the other functions to invoke actions on that specific model.
If the goal is to maintain backwards compatibility, the existing functions could reference a static handle address while the core logic supports any number of active handles.
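A minimal sketch of this handle-based design, in Python for illustration. All names here (`ModelHandle`, `load_model`, `generate`, and the `_legacy` wrappers) are hypothetical, not the project's actual API; the point is only the shape: handle-aware core functions, plus old-style functions delegating to one static handle.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_id = count(1)

@dataclass
class ModelHandle:
    """Opaque handle identifying one loaded model."""
    path: str
    model_id: int = field(default_factory=lambda: next(_next_id))

# Core logic: every operation takes an explicit handle, so any
# number of models can be active at once.
def load_model(path: str) -> ModelHandle:
    # ...allocate weights, KV cache, etc. for this handle only...
    return ModelHandle(path)

def generate(handle: ModelHandle, prompt: str) -> str:
    # ...run inference against the model owned by `handle`...
    return f"[model #{handle.model_id} @ {handle.path}] {prompt}"

# Backwards compatibility: the existing single-model functions keep
# their signatures but delegate to one static (module-level) handle.
_default_handle: Optional[ModelHandle] = None

def load_model_legacy(path: str) -> None:
    global _default_handle
    _default_handle = load_model(path)  # overwrites, as before

def generate_legacy(prompt: str) -> str:
    assert _default_handle is not None, "no model loaded"
    return generate(_default_handle, prompt)
```

With this split, new callers can hold several handles concurrently, while legacy callers see no change in behavior.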
Actually, the python and C++ methods are not intended to be used by other programs directly, as they may change without warning; the API is the preferred way to integrate.
But regarding that point: the problem is that there is no way to effectively free the resources held by a model, which may be partially offloaded across devices (CPU, GPU, etc.). Even attempting to unload the DLL does not fully release the allocated resources, and the existing deallocation code in GGML leaks memory. The only surefire way to do it right now would be to launch the backend as a separate subprocess, which brings with it the issue of inter-process communication.
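The subprocess workaround can be sketched as follows, under stated assumptions: `_backend_worker`, `run_backend`, and `unload_backend` are hypothetical names, and the `bytearray` stands in for the real model allocations. The idea is that terminating the child process lets the OS reclaim everything it held, sidestepping any in-process deallocation leaks, at the cost of queue-based IPC.

```python
import multiprocessing as mp

def _backend_worker(model_path, requests, replies):
    # Stand-in for loading a model; in reality this would allocate
    # CPU/GPU buffers that are hard to free from within the process.
    model = {"path": model_path, "weights": bytearray(1024)}
    while True:
        prompt = requests.get()
        if prompt is None:  # shutdown sentinel
            break
        replies.put(f"[{model['path']}] {prompt}")

def run_backend(model_path):
    requests = mp.Queue()
    replies = mp.Queue()
    proc = mp.Process(target=_backend_worker,
                      args=(model_path, requests, replies))
    proc.start()
    return proc, requests, replies

def unload_backend(proc, requests):
    # Ending the process releases *all* its resources at the OS level,
    # regardless of leaks in the library's own cleanup paths.
    requests.put(None)
    proc.join(timeout=5)
    if proc.is_alive():
        proc.terminate()
        proc.join()
```

The trade-off the comment above describes is exactly the one mentioned in the reply: unloading becomes reliable, but every request and response must now cross a process boundary.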