Support loading multiple models #846

Open
Martmists-GH opened this issue May 13, 2024 · 1 comment

Comments

@Martmists-GH

At the moment, calling load_model a second time causes it to overwrite the previous model. Instead, load_model should return a pointer to some handle struct, and this handle could then be passed around to the other functions in order to invoke actions on it.

If the goal is to maintain backwards compatibility, the existing functions could reference a static handle address while the core logic supports any number of active handles.

@LostRuins
Owner

Actually, the Python and C++ methods are not intended to be called directly by other programs, as they may change without warning. The API is the preferred way to integrate.

But regarding that point: the problem is that I don't have a way to effectively free the resources taken by a model, which may be partially offloaded to different devices (CPU, GPU, etc.). Even unloading the DLL does not fully release the allocated resources, and the existing deallocation code in GGML leaks memory. The only surefire way to do it right now would be to launch the backend as a separate subprocess, which brings with it the issue of inter-process communication.
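To make the subprocess idea concrete, here is a rough sketch using only the Python standard library. It assumes one worker process per model, talking over stdin/stdout with one JSON object per line. The `ModelProcess` class, the message format, and the echo worker are all hypothetical stand-ins; a real worker would load the model and run inference instead of echoing the prompt.

```python
import json
import subprocess
import sys

# Hypothetical worker: a real one would load the model named in argv[1].
# This stand-in just echoes each request so the example stays runnable.
WORKER_SOURCE = r'''
import json, sys
model_path = sys.argv[1]
for line in sys.stdin:
    request = json.loads(line)
    reply = {"model": model_path, "echo": request["prompt"]}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''


class ModelProcess:
    """One model per child process; exiting the child releases its resources."""

    def __init__(self, model_path: str) -> None:
        self._proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, model_path],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def request(self, prompt: str) -> dict:
        # Send one JSON line, then block until the worker answers with one.
        self._proc.stdin.write(json.dumps({"prompt": prompt}) + "\n")
        self._proc.stdin.flush()
        return json.loads(self._proc.stdout.readline())

    def unload(self) -> int:
        # Closing stdin ends the worker's read loop; the OS then reclaims
        # everything the process allocated when it exits.
        self._proc.stdin.close()
        returncode = self._proc.wait(timeout=10)
        self._proc.stdout.close()
        return returncode
```

Each `ModelProcess` instance is independent, so several models can be loaded side by side, and `unload()` sidesteps in-process deallocation entirely. The cost is the one already noted: every call pays for serialization and a round trip between processes.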
