Hub mode performance improve: use watch-list to replace regular query all service one by one #3393

Open
kylinsoong opened this issue Apr 26, 2024 · 5 comments

Comments

@kylinsoong

Hub mode performance improvement

Currently, Hub mode needs more than 30 seconds to synchronize a pod IP change to BIG-IP. This needs to be faster.

Description

Currently, CIS Hub mode queries every service configured in the ConfigMap, one by one, every 30 seconds, so a service change can take a long time to synchronize to BIG-IP.

        if appMgr.hubMode {
                // Leaving the old way for hubMode for now.
                svcListOptions := metav1.ListOptions{
                        LabelSelector: selector,
                }
        
                var err error
                // Identify services that matches the given label
                var services *v1.ServiceList
        
                services, err = appMgr.kubeClient.CoreV1().Services(v1.NamespaceAll).List(context.TODO(), svcListOptions)
        
                if err != nil {
                        log.Errorf("[CORE] Error getting service list. %v", err)
                        return nil, err
                }
                svcItems = services.Items

Too many API queries can cause request throttling and reduce the reliability of the whole system. The Kubernetes client-go watch-list and cache SDK are widely used to improve reliability, and CIS in non-hub mode already uses this mechanism.

So I request using watch-list to improve hub mode performance. My initial thoughts are:

  • create an informer that watches services per namespace (Fortinet has a similar controller, and keeps the firewall policy in the hub namespace)
  • create a namespace informer to watch services in a fine-grained way, watching only the namespaces that are referenced in the hub ConfigMap
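
A minimal sketch of the idea behind these two points, written against the standard library only so that it runs anywhere. It is not the CIS implementation and does not use client-go: `ServiceEvent`, `ServiceCache`, `Apply`, and `Endpoints` are hypothetical names that model how an event-driven local cache, limited to the namespaces referenced in the hub ConfigMap, could replace the periodic list call.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType mirrors the add/update/delete notifications a watch delivers.
type EventType int

const (
	Added EventType = iota
	Updated
	Deleted
)

// ServiceEvent is a hypothetical, simplified stand-in for a watch event.
type ServiceEvent struct {
	Type      EventType
	Namespace string
	Name      string
	Endpoints []string // pod IPs backing the service
}

// ServiceCache is a local store kept current by events instead of polling.
type ServiceCache struct {
	mu      sync.RWMutex
	watched map[string]bool     // namespaces referenced by the hub ConfigMap
	byKey   map[string][]string // "namespace/name" -> endpoints
}

// NewServiceCache builds a cache that only accepts events from the given namespaces.
func NewServiceCache(namespaces ...string) *ServiceCache {
	c := &ServiceCache{watched: map[string]bool{}, byKey: map[string][]string{}}
	for _, ns := range namespaces {
		c.watched[ns] = true
	}
	return c
}

// Apply updates the cache from one event and reports whether the event
// belonged to a watched namespace.
func (c *ServiceCache) Apply(ev ServiceEvent) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.watched[ev.Namespace] {
		return false
	}
	key := ev.Namespace + "/" + ev.Name
	if ev.Type == Deleted {
		delete(c.byKey, key)
	} else {
		c.byKey[key] = append([]string(nil), ev.Endpoints...)
	}
	return true
}

// Endpoints answers from memory, with no round trip to the API server.
func (c *ServiceCache) Endpoints(namespace, name string) ([]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	eps, ok := c.byKey[namespace+"/"+name]
	return append([]string(nil), eps...), ok
}

func main() {
	cache := NewServiceCache("app-ns-1")
	events := make(chan ServiceEvent, 2)
	events <- ServiceEvent{Type: Added, Namespace: "app-ns-1", Name: "web", Endpoints: []string{"10.0.0.1"}}
	events <- ServiceEvent{Type: Updated, Namespace: "app-ns-1", Name: "web", Endpoints: []string{"10.0.0.1", "10.0.0.2"}}
	close(events)
	for ev := range events {
		cache.Apply(ev)
	}
	eps, _ := cache.Endpoints("app-ns-1", "web")
	fmt.Println(eps)
}
```

In a real controller the events would come from a client-go informer rather than a channel filled by hand. The point of the sketch is only that a pod IP change reaches the cache as soon as the event arrives, instead of waiting for the next 30-second poll.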
@trinaths trinaths added In Review and removed untriaged no JIRA created labels Apr 29, 2024
@trinaths
Contributor

trinaths commented May 7, 2024

Created [CONTCNTR-4719] for internal tracking.

@trinaths trinaths added JIRA and removed In Review labels May 7, 2024
@kylinsoong
Author

Hi @trinaths, after some research, I found that this performance issue affects not only Hub mode but all ConfigMap modes.

The appManager has an agentCfgMap that contains all ConfigMaps in all namespaces. Each time the appManager deploys resources, all ConfigMaps are sent to the channel:

for _, cm := range appMgr.agentCfgMap {
	agentCfgMapLst = append(agentCfgMapLst, cm)
}

After the asManager receives the request message from the channel, it iterates over all tenants from all ConfigMaps. For each tenant's related services, the asManager executes an endpoints query. This is why syncing pod IP changes to BIG-IP is so slow once the number of ConfigMap-related services is large.

For non-hub mode ConfigMaps, the time spent to sync a pod change is:

Time_as3_update + Time_query_endpoint * Number_of_services

For hub mode ConfigMaps, the time spent to sync a pod change is:

Time_as3_update + Time_query_endpoint * Number_of_services + RAND(30)

NOTE: each endpoints query is a request from CIS to the API server, so if the number of services is large, the total time is large.
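
To make the two formulas concrete, here is a small worked example. The per-step timings (2 s for the AS3 update, 100 ms per endpoints query) are assumed values chosen for illustration, not measurements from CIS, and `RAND(30)` is read here as a polling delay of up to 30 seconds. The function name `syncMillis` is hypothetical.

```go
package main

import "fmt"

// syncMillis evaluates
//   Time_as3_update + Time_query_endpoint * Number_of_services + poll delay
// with all durations in milliseconds. Pass pollDelayMs = 0 for non-hub mode.
func syncMillis(as3UpdateMs, queryEndpointMs, services, pollDelayMs int) int {
	return as3UpdateMs + queryEndpointMs*services + pollDelayMs
}

func main() {
	// Assumed, illustrative timings; not measured values.
	const as3UpdateMs = 2000
	const queryEndpointMs = 100
	const services = 200

	fmt.Println(syncMillis(as3UpdateMs, queryEndpointMs, services, 0))     // non-hub mode: 22000
	fmt.Println(syncMillis(as3UpdateMs, queryEndpointMs, services, 30000)) // hub mode, worst-case poll delay: 52000
}
```

Under these assumptions the per-service query term (20 s) dominates the AS3 update itself, which is why the total grows with the number of services.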

I also have another finding: each service endpoint change (pod IP change) triggers two kinds of queries, one for the related service and another for all services.

I would like to request that improving CIS ConfigMap mode performance be given high priority.

@trinaths
Contributor

@kylinsoong Thanks for sharing your findings. Can you also look into migrating from ConfigMaps to CRDs and share your findings?

@kylinsoong
Author

@trinaths Thanks for the update. I have tried to convince the customer, but they insist on improving ConfigMap mode performance. I will involve senior managers from both the F5 side and the customer's side to decide on a direction for addressing this problem.

@kylinsoong
Author

I have made a small source code change that uses watch-list to replace the REST API query in hub mode, and found that the time to sync dropped by about 95%.

TEST ENV:

  • Total number of namespaces: 20
  • Total number of applications: 200
  • Total number of VS in BIG-IP: 200

TEST PROCEDURE:
Execute kubectl scale deploy to change the replicas of one of the 200 apps several times. The results are below:

| Run | 1 | 2 | 3 | 4 | 5 | Avg |
|---|---|---|---|---|---|---|
| POD Scale (query API server), seconds | 124 | 106 | 72 | 71 | 104 | 95 |
| POD Scale (watch-list), seconds | 6 | 4 | 5 | 6 | 5 | 5 |

Note that syncing a pod scale takes 5 seconds on average with watch-list, but 95 seconds on average when querying the API server.

The test demonstrates that switching hub mode to watch-list is worthwhile.
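
The averages and the roughly 95% figure can be rechecked from the five measurements reported above. This is only arithmetic on those numbers; the helper names `mean` and `reductionPercent` are mine.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// reductionPercent returns how much smaller after is than before, in percent.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Sync times in seconds, copied from the test results.
	queryAPIServer := []float64{124, 106, 72, 71, 104}
	watchList := []float64{6, 4, 5, 6, 5}

	q, w := mean(queryAPIServer), mean(watchList)
	fmt.Printf("query avg %.1f s, watch-list avg %.1f s, reduction %.1f%%\n",
		q, w, reductionPercent(q, w))
}
```

The unrounded averages are 95.4 s and 5.2 s, which gives a reduction of about 94.5%, consistent with the rounded 95 s, 5 s, and 95% quoted in this comment.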

kylinsoong added a commit to kylinsoong/k8s-bigip-ctlr that referenced this issue May 23, 2024