Hub mode performance improvement: use watch-list instead of regularly querying all services one by one #3393

kylinsoong · 2024-04-26T03:17:14Z

Hub mode performance improvement

Currently, Hub mode needs more than 30 seconds to synchronize a Pod IP change to BIG-IP. Performance needs to be improved.

Description

Currently, CIS in Hub mode queries every service configured in the ConfigMap one by one, every 30 seconds. As a result, a service change can take a long time to reach BIG-IP.

        if appMgr.hubMode {
                // Leaving the old way for hubMode for now.
                svcListOptions := metav1.ListOptions{
                        LabelSelector: selector,
                }

                var err error
                // Identify services that match the given label
                var services *v1.ServiceList

                services, err = appMgr.kubeClient.CoreV1().Services(v1.NamespaceAll).List(context.TODO(), svcListOptions)
                if err != nil {
                        log.Errorf("[CORE] Error getting service list. %v", err)
                        return nil, err
                }
                svcItems = services.Items
        }

Too many API queries can cause request throttling and reduce the reliability of the whole system. The client-go watch-list and cache (informer) mechanism is widely used to improve reliability, and CIS already uses this mechanism when it is not in hub mode.

So I request using watch-list to improve hub mode performance. My initial thoughts are:

Create an informer that watches Services per namespace (Fortinet has a similar controller that keeps firewall policies in a hub namespace).
Create a namespace informer to watch Services in a fine-grained way, watching only the namespaces referenced in the hub ConfigMap.


trinaths · 2024-05-07T07:29:33Z

Created [CONTCNTR-4719] for internal tracking.

kylinsoong · 2024-05-22T08:55:01Z

Hi @trinaths, after some research I found that this performance issue affects not only Hub mode but all ConfigMap modes.

The appManager has an agentCfgMap that contains every ConfigMap in every namespace. Each time the appManager deploys resources, all of these ConfigMaps are sent to the channel:

for _, cm := range appMgr.agentCfgMap {
	agentCfgMapLst = append(agentCfgMapLst, cm)
}

After the asManager receives the request message from the channel, it iterates over all tenants in all ConfigMaps, and for each service related to a tenant it executes an endpoints query against the API server. This is why syncing a Pod IP change to BIG-IP becomes so slow once the ConfigMaps reference a large number of services.

For non-hub-mode ConfigMaps, the time to sync a Pod change is:

Time_as3_update + Time_query_endpoint × Number_of_services

For hub-mode ConfigMaps, the time to sync a Pod change is:

Time_as3_update + Time_query_endpoint × Number_of_services + RAND(30)

where RAND(30) is the random wait of 0 to 30 seconds until the next periodic hub-mode query.

NOTE: each endpoints query is a request to the API server, so when the number of services is large, the total time is large.

I also found that each Service endpoint change (Pod IP change) triggers two kinds of queries: one for the related Service and one for all Services.

I would like to request that improving CIS ConfigMap mode performance be given high priority.

trinaths · 2024-05-22T09:04:09Z

@kylinsoong Thanks for sharing your findings. Can you also look into migrating from ConfigMaps to CRDs and share your findings?

kylinsoong · 2024-05-22T10:13:48Z

@trinaths Thanks for the update. I have tried to convince the customer, but they insist on improving ConfigMap mode performance. I will involve senior managers from both the F5 side and the customer's side to agree on a direction for addressing this problem.

kylinsoong · 2024-05-22T11:37:34Z

I made a small source code change that uses watch-list instead of REST API queries in hub mode, and found that the sync time dropped by about 95%.

TEST ENV:

Total number of namespaces: 20
Total number of applications: 200
Total number of VS in BIG-IP: 200

TEST PROCEDURE:
Run kubectl scale deploy to change the replica count of one of the 200 applications several times. Time to sync the Pod change to BIG-IP, in seconds:

Run                              1     2     3     4     5     Avg
Pod scale, API server query (s)  124   106   72    71    104   95
Pod scale, watch-list (s)        6     4     5     6     5     5

Note that with watch-list a Pod scale change takes on average 5 seconds to sync, compared with 95 seconds when querying the API server.

This test demonstrates that switching hub mode to watch-list is worthwhile.
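The averages and the roughly 95% figure can be rechecked from the five runs in the table; this small Go program recomputes them from those values only.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs (xs must be non-empty).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// reductionPercent is how much shorter "after" is than "before", in percent.
func reductionPercent(before, after float64) float64 {
	return 100 * (before - after) / before
}

func main() {
	apiQuery := []float64{124, 106, 72, 71, 104} // seconds, runs 1-5 from the table
	watchList := []float64{6, 4, 5, 6, 5}
	q, w := mean(apiQuery), mean(watchList)
	fmt.Printf("avg %.1fs vs %.1fs, %.1f%% shorter, %.1fx faster\n", q, w, reductionPercent(q, w), q/w)
	// prints: avg 95.4s vs 5.2s, 94.5% shorter, 18.3x faster
}
```

The unrounded averages give a reduction of about 94.5%, consistent with the ~95% quoted above.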

kylinsoong added feature-request untriaged no JIRA created labels Apr 26, 2024

trinaths added In Review and removed untriaged no JIRA created labels Apr 29, 2024

trinaths added JIRA and removed In Review labels May 7, 2024

kylinsoong added a commit to kylinsoong/k8s-bigip-ctlr that referenced this issue May 23, 2024

F5Networks#3393: hub mode support watch-list

45fff77
