Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Active connection tracking in ebpf #32584

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

AwesomePatrol
Copy link
Contributor

@AwesomePatrol AwesomePatrol commented May 16, 2024

As described in cilium/design-cfps#31, this is part of #31752

Example configuration (in cilium-config) for KIND cluster:

  fixed-zone-mapping: zone-a=123,zone-b=129
  enable-active-connection-tracking: "true"

where topology labels were set on the nodes:

kubectl label node kind-control-plane topology.kubernetes.io/zone=zone-a
kubectl label node kind-worker topology.kubernetes.io/zone=zone-b

Zone mapping can be verified with cilium map get cilium_lb4_backends_v3:

Key   Value                        State   Error
16    ANY://10.244.1.74[zone-b]    sync
17    ANY://10.244.1.74[zone-b]    sync
18    ANY://10.244.1.202[zone-b]   sync
19    ANY://10.244.1.202[zone-b]   sync
13    ANY://192.168.11.2[zone-b]   sync
14    ANY://10.244.0.178[zone-a]   sync
15    ANY://10.244.0.178[zone-a]   sync

LRS accounting can be verified with cilium map get cilium_lb_act:

Key                       Value     State   Error
10.96.138.80:80[zone-b]   +72 -72
10.96.138.80:80[zone-a]   +28 -28
Add cilium_lb_act BPF map with counters of opened and closed connections

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 16, 2024
@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from e8f0244 to beecf87 Compare May 16, 2024 14:13
@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 16, 2024
@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from c7a9677 to 58f239e Compare May 16, 2024 17:02
@maintainer-s-little-helper

This comment was marked as resolved.

@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from 58f239e to 6bd2077 Compare May 17, 2024 07:03
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 17, 2024
@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from 6bd2077 to 66e1d80 Compare May 17, 2024 07:12
@AwesomePatrol

This comment was marked as outdated.

@AwesomePatrol AwesomePatrol marked this pull request as ready for review May 17, 2024 08:08
@AwesomePatrol AwesomePatrol requested review from a team as code owners May 17, 2024 08:08
@AwesomePatrol

This comment was marked as outdated.

Copy link
Contributor

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loader changes are fine. As a rando reviewer, it's not at all clear to me what lrs stands for however.

@tommyp1ckles tommyp1ckles requested review from julianwiedmann and removed request for danehans May 20, 2024 04:20
@tommyp1ckles
Copy link
Contributor

@julianwiedmann would you be able to review this for @cilium/sig-datapath, I'm assigned sig-cli but my review would also count towards datapath - and I'm not familiar enough with this code to provide a thorough review.

@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from 76f4e16 to 27af564 Compare May 21, 2024 16:25
@AwesomePatrol
Copy link
Contributor Author

Loader changes are fine. As a rando reviewer, it's not at all clear to me what lrs stands for however.

LRS stands for Load Reporting Service from Envoy. Please see: https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/load_stats/v3/lrs.proto. I am bad at naming stuff, so I am open to suggestions

@tommyp1ckles
Copy link
Contributor

@AwesomePatrol Does the TODO list in the second commits message represent work that's still to be done?

pkg/maps/lrs/lrs.go Outdated Show resolved Hide resolved
pkg/option/config.go Outdated Show resolved Hide resolved
@julianwiedmann
Copy link
Member

Based on the TODOs and the missing description in the second patch, this looks like it still needs some more work. Flipping to draft for now to get it off my queue, please flip back when you're ready! 🙏

@julianwiedmann julianwiedmann marked this pull request as draft May 22, 2024 10:32
@pchaigno pchaigno added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label May 23, 2024
@AwesomePatrol AwesomePatrol marked this pull request as draft May 23, 2024 10:45
@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch 2 times, most recently from 323a58c to 36a789c Compare May 23, 2024 12:08
@AwesomePatrol

This comment was marked as outdated.

@pchaigno
Copy link
Member

I removed TODOs from the commit message as to not confuse any other reviewers. What descriptions are missing?

I'm guessing Julian means the commit descriptions. They should describe what the commit is doing, but more importantly why and any subtleties to help reviewers understand what is going on. See examples in the bpf/ commit history. Of course, depending on the changes in the commit, that can be very short (if what and why are obvious with no subtleties). With hundreds of LoC, I'm going to venture your changes aren't trivial enough to have empty commits 😁

@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from 36a789c to 4571040 Compare May 27, 2024 06:06
@AwesomePatrol

This comment was marked as outdated.

@AwesomePatrol AwesomePatrol marked this pull request as ready for review May 27, 2024 07:21
pkg/maps/act/act.go Outdated Show resolved Hide resolved
pkg/maps/act/act.go Outdated Show resolved Hide resolved
pkg/maps/act/act.go Outdated Show resolved Hide resolved
Copy link
Contributor

@joamaki joamaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as I'm going on vacation and mostly had nits about unused methods.

@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from a0a23e9 to 31d3ec5 Compare May 31, 2024 07:00
@julianwiedmann julianwiedmann added dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. dont-merge/waiting-for-review labels Jun 3, 2024
@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from 31d3ec5 to 01bf7ab Compare June 3, 2024 08:47
@pchaigno pchaigno removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jun 3, 2024
@pchaigno
Copy link
Member

pchaigno commented Jun 3, 2024

/test

@pchaigno pchaigno enabled auto-merge June 3, 2024 10:07
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 thank you! I'd want to address the integration into the BPF codepath below.

Besides that, could you please squash the last two patches in the series into the initial series? I don't think we need to merge code that immediately gets refactored again.

Comment on lines 932 to 940
#ifdef ENABLE_ACTIVE_CONNECTION_TRACKING
_lb_act_conn_open(state->rev_nat_index, state->backend_id);
#endif
} else if (dir == CT_INGRESS || dir == CT_EGRESS) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picking on this part in particular, as it's clearly misplaced. ct_create_fill_entry() is meant to fill the CT entry (from the info in the ct_state representation), but here we don't even touch the CT entry. So at best this part should just live in ct_create{4,6}() instead.

But as more general question - why are these changes intertwined with the actual CT lookup? If we call the feature Active connection counting instead, would it still fit into conntrack.h? We're not actually tracking any connection state, after all ...

My expectation would be to have this logic live here, where it would be included into the BPF program only once. What's missing to make that happen? Do we just need to pass back a few bits of connection-internal state to detect reopened / closed connections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's clearly misplaced. ct_create_fill_entry() is meant to fill the CT entry

Agreed. I moved all calls to a single function, __ct_lookup.

If we call the feature Active connection counting instead, would it still fit into conntrack.h?

Yes, the decision to place it in conntrack.h is a technical requirement of what information is available in each function. Please, feel free to suggest other places that are suitable for this, if there are any.

My expectation would be to have this logic live here

This handles counting new connections only, where do you want to handle closing connections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what information is available in each function. Please, feel free to suggest other places that are suitable for this, if there are any.

Let's use the ct_state then to provide this detailed level of information to callers :).

This handles counting new connections only, where do you want to handle closing connections?

How about the following?

ct_result = ct_lazy_lookup4(...);
if (ct_result == CT_NEW || ct_state->reopened)
	// account for opened connection
else if (ct_state->closing)
	// account for closed connection

This is needed to break a cyclic dependency on LB{4,6}_map from lb.h in
lrs.h (next patch) imported by conntrack.h.

Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>
auto-merge was automatically disabled June 5, 2024 11:08

Head branch was pushed to by a user without write access

@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from 01bf7ab to 8aa2cc8 Compare June 5, 2024 11:08
Add new map - LB_ACT_MAP - behind ENABLE_ACTIVE_CONNECTION_TRACKING flag
with counters of opened and closed connections. Behavior of eBPF remains
completely unchanged when ENABLE_ACTIVE_CONNECTION_TRACKING flag is not
set.

When an entry, to conntrack table (with dir == CT_SERVICE) is created,
an entry in LB_ACT_MAP.opened is incremented by one. The same occurs
when a connection is reopened. When connection is closed (action ==
ACTION_CLOSE and dir == CT_SERVICE), the related LB_ACT_MAP.closed is
incremented by one.

LB_ACT_MAP is keyed by svc_id (also known as rev_nat_index) and zone.
Backend ID is used to fetch the backend definition from
LB{4,6}_BACKEND_MAP which also contains zone field. This field is
populated only when EndpointSlice contains a reference to zone in
FixedZoneMapping (so it is possible to convert between uint8 ID and
string).

Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>
@AwesomePatrol AwesomePatrol force-pushed the active-connection-tracking-in-ebpf branch from 8aa2cc8 to 3cc35fe Compare June 5, 2024 11:09
Comment on lines +370 to +374
#ifdef ENABLE_ACTIVE_CONNECTION_TRACKING
if (dir == CT_SERVICE) {
if (ct_state)
_lb_act_conn_open(ct_state->rev_nat_index, ct_state->backend_id);
else if (entry)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this would happen too early in the code flow now. For a new connection, we haven't selected a backend yet.

@julianwiedmann
Copy link
Member

Note that you'll probably need a rebase to pick up the changes from #32653.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/conntrack release-note/major This PR introduces major new functionality to Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants