At the moment, calling load_model a second time overwrites the previous model. Instead, load_model should return a pointer to an opaque handle struct, and this handle would then be passed to the other functions to invoke actions on that specific model.
If the goal is to maintain backwards compatibility, the existing functions could reference a static handle address while the core logic supports any number of active handles.
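A minimal sketch of this handle-based design, in Python for illustration. All names here (`ModelHandle`, `load_model`, `generate`, and the `_legacy` wrappers) are hypothetical, not the project's actual API; the point is only the shape: handle-aware core functions, plus old-style functions delegating to one static handle.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_id = count(1)

@dataclass
class ModelHandle:
    """Opaque handle identifying one loaded model."""
    path: str
    model_id: int = field(default_factory=lambda: next(_next_id))

# Core logic: every operation takes an explicit handle, so any
# number of models can be active at once.
def load_model(path: str) -> ModelHandle:
    # ...allocate weights, KV cache, etc. for this handle only...
    return ModelHandle(path)

def generate(handle: ModelHandle, prompt: str) -> str:
    # ...run inference against the model owned by `handle`...
    return f"[model #{handle.model_id} @ {handle.path}] {prompt}"

# Backwards compatibility: the existing single-model functions keep
# their signatures but delegate to one static (module-level) handle.
_default_handle: Optional[ModelHandle] = None

def load_model_legacy(path: str) -> None:
    global _default_handle
    _default_handle = load_model(path)  # overwrites, as before

def generate_legacy(prompt: str) -> str:
    assert _default_handle is not None, "no model loaded"
    return generate(_default_handle, prompt)
```

With this split, new callers can hold several handles concurrently, while legacy callers see no change in behavior.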
Actually, the python and C++ methods are not intended to be used by other programs directly, as they may change without warning; the API is the preferred way to integrate.
But regarding that point: the problem is that there is no way to effectively free the resources held by a model, which may be partially offloaded across devices (CPU, GPU, etc.). Even attempting to unload the DLL does not fully release the allocated resources, and the existing deallocation code in GGML leaks memory. The only surefire way to do it right now would be to launch the backend as a separate subprocess, which brings with it the issue of inter-process communication.
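The subprocess workaround can be sketched as follows, under stated assumptions: `_backend_worker`, `run_backend`, and `unload_backend` are hypothetical names, and the `bytearray` stands in for the real model allocations. The idea is that terminating the child process lets the OS reclaim everything it held, sidestepping any in-process deallocation leaks, at the cost of queue-based IPC.

```python
import multiprocessing as mp

def _backend_worker(model_path, requests, replies):
    # Stand-in for loading a model; in reality this would allocate
    # CPU/GPU buffers that are hard to free from within the process.
    model = {"path": model_path, "weights": bytearray(1024)}
    while True:
        prompt = requests.get()
        if prompt is None:  # shutdown sentinel
            break
        replies.put(f"[{model['path']}] {prompt}")

def run_backend(model_path):
    requests = mp.Queue()
    replies = mp.Queue()
    proc = mp.Process(target=_backend_worker,
                      args=(model_path, requests, replies))
    proc.start()
    return proc, requests, replies

def unload_backend(proc, requests):
    # Ending the process releases *all* its resources at the OS level,
    # regardless of leaks in the library's own cleanup paths.
    requests.put(None)
    proc.join(timeout=5)
    if proc.is_alive():
        proc.terminate()
        proc.join()
```

The trade-off the comment above describes is exactly the one mentioned in the reply: unloading becomes reliable, but every request and response must now cross a process boundary.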