Using Dial function with client-go kubernetes.NewForConfig triggers a memory leak #118703
/sig apimachinery
@dmitry-irtegov you will need to update your code to use a shared DialHolder, as in the #117258 approach. We need the caching behavior to be the default, otherwise clients like kubectl break by creating too many clientsets, which then run out of file descriptors. We can try to make this easier in the future.
I do not really understand how the #117258 approach can be applied to our situation. It might not be clear from my mockup, but our application has multiple instances of "MyK8sApiWrapper" for access to different clusters, and they are accessed from different contexts and threads. For the same reason, keeping a single clientset instance is not a realistic option. To avoid the leak, the transport.Config and DialHolder instances must be tied to the MyK8sApiWrapper instance. I thought about this, but I am not sure it is a good idea, though I cannot formulate why. Maybe you can dispel my doubts.

Also, most of our components are long-running services, so they will recycle "MyK8sApiWrapper" instances from time to time. Because every instance will have its own cache entry, the cache entries would still leak, just not as fast as now.

What really makes me wonder is that my mockup (and our applications) do not seem to leak connections. I have never seen it open a second TCP socket (let alone multiple connections or running out of sockets), and in pprof I did not see any sign of multiple clientset or net/http objects being created. So the connection pooling in client-go seems to do its job: with repeated NewForConfig calls it returns the same object or, at least, objects pointing to the same HTTP socket. It is only the TLS cache that is leaking, and this leak does not break the connection pooling it is supposed to enhance. Which makes me wonder whether we need the TLS cache in its existing form at all.
This is what I would expect you to do (though not exactly as mentioned below).
This is true of the API server code in #117258 as well (basically, a known limitation). You would solve this by holding onto the DialHolder and reusing it instead of creating a fresh one each time.
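A minimal sketch of that direction, using the `MyK8sApiWrapper` shape described earlier in the thread (`rest.HTTPClientFor` and `kubernetes.NewForConfigAndClient` are real client-go functions; the wrapper layout is illustrative):

```go
package mycluster

import (
	"net/http"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// MyK8sApiWrapper builds its HTTP client (and the single DialHolder buried
// inside its transport) once, then reuses it for every clientset, so
// repeated clientset creation no longer mints a new TLS cache entry.
type MyK8sApiWrapper struct {
	config     *rest.Config
	httpClient *http.Client
}

func NewMyK8sApiWrapper(config *rest.Config) (*MyK8sApiWrapper, error) {
	// HTTPClientFor builds the transport, and thus the DialHolder, exactly once.
	httpClient, err := rest.HTTPClientFor(config)
	if err != nil {
		return nil, err
	}
	return &MyK8sApiWrapper{config: config, httpClient: httpClient}, nil
}

func (w *MyK8sApiWrapper) Clientset() (*kubernetes.Clientset, error) {
	// Passing the shared client in means no new transport is constructed.
	return kubernetes.NewForConfigAndClient(w.config, w.httpClient)
}
```

Note this still yields one cache entry per wrapper, so a long-running process that recycles wrappers for the same cluster should hand the old wrapper's `http.Client` to its replacement rather than building a new one.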
It does leak; I had to fix the same thing before: kubernetes/kubectl#1128 (comment)

For now, I would probably just set the Proxy func to utilnet.NewProxierWithNoProxyCIDR(http.ProxyFromEnvironment) so that the TLS cache is skipped for your code. This isn't guaranteed behavior and may stop working at any time, but it will at least solve the issue for you for now (by restoring the old behavior).

For long-term improvements to client-go, there are a few options I can see.
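In code, that workaround looks roughly like this (the function name is mine; `utilnet.NewProxierWithNoProxyCIDR` and `rest.CopyConfig` are existing helpers):

```go
package mycluster

import (
	"net/http"

	utilnet "k8s.io/apimachinery/pkg/util/net"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newUncachedClientset sets a non-nil Proxy func, which marks the config
// as non-cacheable so the TLS transport cache is bypassed. As noted above,
// this is not guaranteed behavior and may stop working in a future release.
func newUncachedClientset(base *rest.Config) (*kubernetes.Clientset, error) {
	config := rest.CopyConfig(base)
	// Same proxying behavior as the default, but an explicit func pointer.
	config.Proxy = utilnet.NewProxierWithNoProxyCIDR(http.ProxyFromEnvironment)
	return kubernetes.NewForConfig(config)
}
```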
Well, this is why I thought the approach of #117258 is not directly applicable to us. Our MyK8sApiWrapper instances are created and destroyed over the lifetime of the process, so per-instance cache entries would still accumulate. I even managed to display this behavior in my mockup, though there it might not seem logical. Am I correct that, to do this properly, I should create the dialer and transport once and reuse them for every clientset I build?
Yes.
Thank you for the answer. But now I have another question. I guess I need to use NewForConfigAndClient for that?
Yes, see NewForConfigAndClient:

kubernetes/staging/src/k8s.io/client-go/kubernetes/clientset.go, lines 460 to 471 (at 2e93c65)

together with:

kubernetes/staging/src/k8s.io/client-go/rest/transport.go, lines 32 to 42 (at 2e93c65)

Alternatively, you could copy the subset of the transport building logic from:

kubernetes/staging/src/k8s.io/client-go/transport/cache.go, lines 74 to 145 (at 2e93c65)

and then set that transport on:

kubernetes/staging/src/k8s.io/client-go/transport/transport.go, lines 51 to 52 (at 2e93c65)
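A hedged sketch of that alternative (the function name and parameters are mine; the 10-second handshake timeout and 25 idle connections per host mirror the defaults in cache.go):

```go
package mycluster

import (
	"context"
	"net"
	"net/http"
	"time"

	utilnet "k8s.io/apimachinery/pkg/util/net"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newClientsetWithOwnTransport hand-builds the transport (a simplified
// version of what transport/cache.go does) and sets it on the config,
// which bypasses the TLS cache entirely.
func newClientsetWithOwnTransport(
	base *rest.Config,
	dial func(ctx context.Context, network, addr string) (net.Conn, error),
) (*kubernetes.Clientset, error) {
	tlsCfg, err := rest.TLSConfigFor(base)
	if err != nil {
		return nil, err
	}
	rt := utilnet.SetTransportDefaults(&http.Transport{
		TLSClientConfig:     tlsCfg,
		TLSHandshakeTimeout: 10 * time.Second, // mirrors cache.go defaults
		MaxIdleConnsPerHost: 25,
		DialContext:         dial,
	})
	config := rest.CopyConfig(base)
	// A custom Transport is mutually exclusive with TLS and Dial settings
	// on the config, so clear them before handing the config to client-go.
	config.Dial = nil
	config.TLSClientConfig = rest.TLSClientConfig{}
	config.Transport = rt
	return kubernetes.NewForConfig(config)
}
```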
/assign
@dmitry-irtegov Thank you for this, we're hitting the exact same issue in hive. Any chance you can share your implementation of the workaround so I can follow your example?
When ClusterDeployment.Spec.ControlPlaneConfig.APIServerIPOverride is in use, we experience the memory leak described by kubernetes/kubernetes#118703. That issue has not been resolved at the time of this writing, but it describes a couple of workarounds. Try the one described in [this comment](kubernetes/kubernetes#118703 (comment)):

> For now, I would probably just set the Proxy func to
> utilnet.NewProxierWithNoProxyCIDR(http.ProxyFromEnvironment) so that
> the TLS cache is skipped for your code. This isn't guaranteed behavior
> and may stop working at any time, but it will at least solve the issue
> for you for now (by restoring the old behavior).

HIVE-2272
Hello,

It seems that in the current code, entries are never evicted from the TLS transport cache. Does it maybe make sense to implement a time-limited cache, so that entries that are not accessed for a certain period of time are removed? If I am not mistaken, the same issue is discussed in #109289.
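A minimal sketch of that idea (illustrative only; this is not how client-go's tlsTransportCache actually works): entries record their last access, and a periodic sweep evicts anything idle longer than the TTL.

```go
package transportcache

import (
	"net/http"
	"sync"
	"time"
)

type ttlCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]*ttlEntry
}

type ttlEntry struct {
	rt         http.RoundTripper
	lastAccess time.Time
}

func (c *ttlCache) get(key string) (http.RoundTripper, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok {
		return nil, false
	}
	e.lastAccess = time.Now() // touching an entry keeps it alive
	return e.rt, true
}

// sweep removes entries that have not been accessed within the TTL;
// it would be run periodically by a janitor goroutine.
func (c *ttlCache) sweep() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k, e := range c.entries {
		if time.Since(e.lastAccess) > c.ttl {
			delete(c.entries, k)
		}
	}
}
```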
I can corroborate this. When we deployed code with the workaround, it fixed envs using DialContext -- but we also saw a long-standing "sawtooth" memory pattern disappear in the envs that weren't.
Correct, though this has been true long before any of the dial holder changes.
In general we need an approach that makes it clear to client-go that consumers are done with a particular cache entry. It is unrelated to time, because a client could be in-use but idle.
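One hypothetical shape of that idea (no such API exists in client-go today): get hands back a release func alongside the transport, and the entry is evicted once the last consumer has called it.

```go
package transportcache

import (
	"net/http"
	"sync"
)

type refCountedCache struct {
	mu      sync.Mutex
	entries map[string]*countedEntry
}

type countedEntry struct {
	rt   http.RoundTripper
	refs int
}

// get returns the cached transport plus a release func; the caller invokes
// release when it is done, and the last release evicts the entry.
func (c *refCountedCache) get(key string) (http.RoundTripper, func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e := c.entries[key] // entry construction on miss elided for brevity
	e.refs++
	release := func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		e.refs--
		if e.refs == 0 {
			delete(c.entries, key)
		}
	}
	return e.rt, release
}
```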
I'm sure we're shooting for something clever, but I would be happy with either

a supported way to opt out of the cache entirely,

or

some form of automatic eviction.

Again, my use case is in the context of Reconcile loops in controllers that manage many spoke clusters that come and go. Each invocation is idempotent and isolated, and we can't (reliably) inject logic that knows when a specific spoke is deleted. Thus telling client-go that we are done with a particular cache entry is not really possible for us. We would rather simply not use the cache, and accept whatever performance hit that entails.
You can already do the former by setting the proxy func or constructing the transport directly.
Right, cool, I'm doing that and it's working for me, but I'm concerned about this:

> This isn't guaranteed behavior and may stop working at any time
If we can declare this a permanent/supported path, great 👍 That said, it would be neat if there were a way to spell it that was more intuitive. For networking dunces like me, "set a proxy function to utilnet.NewProxierWithNoProxyCIDR(http.ProxyFromEnvironment)" doesn't naturally map to "don't use a cache for an internal thing I can't see". Similar for constructing the transport directly (which I didn't even try to understand, tbf).
Yes, this is true, but it is probably the easiest fix that would remove the leak and at the same time keep most of the benefits of the cache.
What happened?
After we upgraded the k8s.io/client-go package to v0.26.0, we noticed memory leaks in our components that access the Kubernetes API. For some components we managed to remove the leak by rolling back client-go to v0.25.4, but for other components that creates dependency hell.
After some profiling and debugging I managed to write mockup code that is roughly based on our production code and reliably triggers the leak in client-go 0.26.0 through 0.27.3. The code is uploaded to https://github.com/kublr/memory_leak_demo . It includes a pprof endpoint so you can verify the leak. The memory allocations happen in the transport.(*tlsTransportCache).get method.
The leak is triggered by the presence of a custom Dial function in the rest.Config object. As I infer from commits like #117311, this usage is now considered incorrect, but we need some guidance or advice on how to adapt our production code.
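For reference, a minimal sketch of the kind of code that triggers it (the host and dialer here are placeholders, not the actual mockup code; see the linked repo for the full demo):

```go
package main

import (
	"context"
	"net"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config := &rest.Config{
		Host: "https://my-cluster.example.com:6443", // placeholder
		Dial: func(ctx context.Context, network, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, network, addr)
		},
	}
	for i := 0; i < 1000; i++ {
		// Each call converts Dial into a brand-new DialHolder internally,
		// adding one more never-hit entry to the TLS transport cache.
		_, _ = kubernetes.NewForConfig(config)
	}
}
```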
What did you expect to happen?
We expected to use public interfaces without triggering memory leaks in the package.
How can we reproduce it (as minimally and precisely as possible)?
Compile and run the sample code in https://github.com/kublr/memory_leak_demo . Using pprof you can observe a growing amount of memory allocated in transport.(*tlsTransportCache).get(), and using a debugger you can observe that every call to kubernetes.NewForConfig() creates a new entry in the TLS cache.
Anything else we need to know?
As I understand it, the real culprit is this line:

kubernetes/staging/src/k8s.io/client-go/rest/transport.go, line 115 (at fa78f28)
Because the DialHolder pointer is added to a copy of the original config (at least in the code path originating from kubernetes.NewForConfig), each subsequent call to kubernetes.NewForConfig produces a new DialHolder instance.
Because the pointer to the DialHolder instance is part of the TLS cache key, this creates a sequence of unique map keys that produce useless cache entries (they will never be cache hits), plus a memory leak.
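A simplified, standalone illustration of why the keys never repeat (the real cache key is a struct whose fields include this pointer, among others):

```go
package main

import (
	"context"
	"fmt"
	"net"

	"k8s.io/client-go/transport"
)

func dial(ctx context.Context, network, addr string) (net.Conn, error) {
	return (&net.Dialer{}).DialContext(ctx, network, addr)
}

func main() {
	// Each iteration stands in for one kubernetes.NewForConfig call.
	seen := map[*transport.DialHolder]bool{}
	for i := 0; i < 3; i++ {
		holder := &transport.DialHolder{Dial: dial} // fresh pointer every call
		seen[holder] = true
	}
	fmt.Println(len(seen)) // 3 -- every lookup is a miss, so the cache only grows
}
```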
I reiterate that the problem is not the memory leak itself, but a broken cache with unique cache keys. I think it would be better to mark configs with a custom Dial hook as non-cacheable.
Kubernetes version
We also observed the leaks with k8s versions from 1.22 to 1.26