The runner serving ollama_mistral will sometimes start failing like this, bringing down our production service:
```
❌ there was a session error https://app.tryhelix.ai/session/XXX failed to get response from inference API: Post "http://localhost:41533/v1/chat/completions": dial tcp 127.0.0.1:41533: connect: connection refused
```
We need to track this down properly and fix the root cause. I've put in a temporary fix that simply exits and restarts the runner when we detect this condition, but we need a proper fix: #241
Presumably the ollama server process sometimes exits, and we handle that neither by restarting it nor by cleaning up our record of the model instance.
One issue is that the only call to `m.Stop()` is in `pkg/runner/controller.go`, and that call site is also the only place that removes the model instance from `r.activeModelInstances`:
```go
err := m.Stop()
if err != nil {
	log.Error().Msgf("error stopping model instance %s: %s", m.ID(), err.Error())
}
r.activeModelInstances.Delete(m.ID())
```
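One way to close this cleanup gap is to have the model instance remove itself from the controller's map when its process exits, instead of relying on the controller's `m.Stop()` call site. This is a minimal sketch: `modelInstance`, `runner`, `onExit` and `processExited` are hypothetical names, and it assumes `activeModelInstances` behaves like a `sync.Map` (its real type isn't shown in the issue).

```go
package main

import (
	"fmt"
	"sync"
)

// modelInstance is a hypothetical stand-in for the runner's model instance.
// onExit runs exactly once when the underlying process goes away, whether
// it was stopped deliberately or crashed.
type modelInstance struct {
	id       string
	onExit   func(id string)
	exitOnce sync.Once
}

// processExited would be called by whatever goroutine waits on the ollama
// process. sync.Once makes repeated calls harmless.
func (m *modelInstance) processExited() {
	m.exitOnce.Do(func() { m.onExit(m.id) })
}

type runner struct {
	activeModelInstances sync.Map // id -> *modelInstance
}

func (r *runner) addModelInstance(id string) *modelInstance {
	m := &modelInstance{
		id: id,
		// The instance deregisters itself, so the controller no longer has
		// to be the one that notices the exit.
		onExit: func(id string) { r.activeModelInstances.Delete(id) },
	}
	r.activeModelInstances.Store(id, m)
	return m
}

// count returns how many instances are currently registered.
func (r *runner) count() int {
	n := 0
	r.activeModelInstances.Range(func(_, _ any) bool { n++; return true })
	return n
}

func main() {
	r := &runner{}
	m := r.addModelInstance("ollama_mistral-1")
	fmt.Println(r.count()) // 1
	m.processExited()      // simulate the ollama process dying
	m.processExited()      // a second call is a no-op
	fmt.Println(r.count()) // 0
}
```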
But nothing cleans up `r.activeModelInstances` if the ollama process exits on its own. That might not be the root cause, though: the `OllamaModelInstance` itself keeps running even after the ollama process has exited:
```go
if err := cmd.Wait(); err != nil {
	log.Error().Msgf("Ollama model instance exited with error: %s", err.Error())
	errMsg := string(stderrBuf.Bytes())
	if i.currentSession != nil {
		i.errorSession(i.currentSession, fmt.Errorf("%s from cmd - %s", err.Error(), errMsg))
	}
	return
}
log.Info().Msgf("🟢 Ollama model instance stopped, exit code=%d", cmd.ProcessState.ExitCode())
```
Nothing in this code causes the goroutine started later in `Start` to exit, or closes `workCh`:

```go
go func() {
	for {
		select { // ...
```
So, to summarize, if the ollama process exits or is killed:
- nothing stops the `OllamaModelInstance` or closes its `workCh`
- nothing deletes the model instance from the controller's `activeModelInstances`
The goal of this issue is to handle this case properly, ideally without surfacing any errors to the user: for example, by restarting the ollama process when it exits or when we get "connection refused", rather than restarting the entire runner.
For now, the only thing that handles this condition is the exit I added in #241.