Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling publishNotReadyAddresses causes proxy to direct traffic to NotReady pods. #124914

Closed
the-redshift opened this issue May 16, 2024 · 6 comments
Labels
kind/documentation Categorizes issue or PR as related to documentation. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@the-redshift
Copy link

the-redshift commented May 16, 2024

What happened?

Documentation states that enabling publishNotReadyAddresses will cause DNS records to appear for pods, even if NotReady. Documentation does not state that enabling this flag will cause incoming traffic to be directed to pods that are NotReady.

What did you expect to happen?

I expected DNS records to be created, nothing beyond that.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create HTTP Server deployment with initContainer -> sleep 30
  2. Create service for that deployment with publishNotReadyAddresses set to true
  3. Continiously call HTTP Server
  4. Scale HTTP Server to 10 Replicas
  5. Observe incoming errors

Anything else we need to know?

It can be easily fixed by changing condition for building topology from isReady to isServing && !isTerminating.

Kubernetes version

$ kubectl version
v1.29.1

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@the-redshift the-redshift added the kind/bug Categorizes issue or PR as related to a bug. label May 16, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 16, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@the-redshift
Copy link
Author

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 16, 2024
@aojea
Copy link
Member

aojea commented May 16, 2024

Documentation does not state that enabling this flag will cause incoming traffic to be directed to pods that are NotReady.

https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#ready

maybe is not clear, but is the way it works

/remove-kind bug
/kind documentation

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 16, 2024
@the-redshift
Copy link
Author

@aojea - could you elaborate why does it work the way it does? What's the benefit of routing traffic to a Pod that doesn't even has its containers up?

@aojea
Copy link
Member

aojea commented May 17, 2024

What's the benefit of routing traffic to a Pod that doesn't even has its containers up?

and the DNS? :)

You are focusing on the behaviors and is important to understand the internals, Services provide discovery via DNS and ClusterIP (L4 load balancers).

Services depend on the EndpointSlices, that represent the set of pods that belong to that Service and their state

These Services/EndpointSlices objects have a 1to1 relation and are used by the DNS implementation (commonly CoreDNS) and the Services implementations (commonly kube-proxy) , so both the DNS records and and proxy will do whatever the EndpointSlice says, if it says is ready then it will trust the object.

If we go to the API you have the explanation of this field

// publishNotReadyAddresses indicates that any agent which deals with endpoints for this
// Service should disregard any indications of ready/not-ready.
// The primary use case for setting this field is for a StatefulSet's Headless Service to
// propagate SRV DNS records for its Pods for the purpose of peer discovery.
// The Kubernetes controllers that generate Endpoints and EndpointSlice resources for
// Services interpret this to mean that all endpoints are considered "ready" even if the
// Pods themselves are not. Agents which consume only Kubernetes generated endpoints
// through the Endpoints or EndpointSlice resources can safely assume this behavior.
// +optional
PublishNotReadyAddresses bool

This is an opt-in feature to deal with very specific behaviors that has side effects as the one you are describing, so the user that wants to use it need to understand this (put a side we could have done differently, this is an v1 API and we need to respect it, changing behaviors at this point will be a backwards compatibility break)... in your description you describe a scenario that will fail as WAI, why any user will choose to do that?

@the-redshift
Copy link
Author

Hi @aojea,
I appreciate you for taking the time to provide more context in that regard.
I can now see how changing that field would break things.

That comment you've quoted indeed states things clear as day - couldn't find that within the docs though.

Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/documentation Categorizes issue or PR as related to documentation. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network.
Projects
None yet
Development

No branches or pull requests

3 participants