Autoscaling Llama.cpp server clusters in GPU spot instances. #6764
jboero started this conversation in Show and tell
- Great, please have a look at: I believe the Helm approach is more flexible, but we could also introduce a Terraform example.
- How does the K8s sample work with GPUs, though? Each autoscaled VM here adds another GPU (or set of GPUs) to the cluster. As I understand it, most cloud-managed Kubernetes options don't support GPUs or customizing the kernel/drivers either.
- I wrote up a Terraform module PoC for autoscaling clusters of Llama.cpp server on GCP spot instances with GPUs. The autoscaling is a bit overzealous, but it tends to work pretty well. If anyone is curious to try it, I'd love to hear feedback.
https://github.com/jboero/terraform-google-llama-autoscale
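For readers unfamiliar with the pattern, a minimal sketch of what a GCP spot-GPU autoscaling setup looks like in Terraform is below. This is not the linked module's actual code: the resource names, machine type, GPU type, image, zone, and scaling thresholds are all illustrative assumptions.

```hcl
# Hedged sketch only: an instance template on spot capacity with a GPU,
# managed by a MIG and a CPU-based autoscaler. All values are assumptions.

resource "google_compute_instance_template" "llama" {
  name_prefix  = "llama-server-"
  machine_type = "n1-standard-8" # assumption: any GPU-capable machine type

  disk {
    boot         = true
    source_image = "projects/ml-images/global/images/family/common-gpu" # assumption
  }

  guest_accelerator {
    type  = "nvidia-tesla-t4" # assumption
    count = 1
  }

  scheduling {
    provisioning_model  = "SPOT" # spot capacity for cheap GPUs
    preemptible         = true
    automatic_restart   = false
    on_host_maintenance = "TERMINATE" # required for GPU instances
  }

  network_interface {
    network = "default"
    access_config {} # ephemeral external IP
  }
}

resource "google_compute_instance_group_manager" "llama" {
  name               = "llama-mig"
  base_instance_name = "llama"
  zone               = "us-central1-a" # assumption

  version {
    instance_template = google_compute_instance_template.llama.self_link
  }
}

resource "google_compute_autoscaler" "llama" {
  name   = "llama-autoscaler"
  zone   = "us-central1-a"
  target = google_compute_instance_group_manager.llama.self_link

  autoscaling_policy {
    min_replicas = 0
    max_replicas = 4
    cpu_utilization {
      target = 0.6 # CPU-based scaling; GPU-load scaling would need custom metrics
    }
  }
}
```

Note that CPU utilization is only a rough proxy for inference load on a GPU server, which may explain the "overzealous" scaling behavior mentioned above; a custom metric (e.g. request queue depth) would be a tighter signal.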