
Roles with paths do not work when the path is included in their ARN in the aws-auth configmap #268

Open
jceresini opened this issue Sep 12, 2019 · 49 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@jceresini

I have a role with an ARN that looks like this: arn:aws:iam::XXXXXXXXXXXX:role/gitlab-ci/gitlab-runner. My aws-auth configmap was as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/EKSWorkerNode
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/EKSServiceWorker
      username: kubernetes-admin
      groups:
        - system:masters
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/gitlab-ci/gitlab-runner
      username: gitlab-admin
      groups:
        - system:masters

I repeatedly got unauthorized errors from the cluster until I updated the rolearn to arn:aws:iam::XXXXXXXXXXXX:role/gitlab-runner. After that change, my access worked as expected.

If it makes a difference, I'm using assume-role on our gitlab-runner, and using aws eks update-kubeconfig --region=us-east-1 --name=my-cluster to get kubectl configured.
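For context, aws eks update-kubeconfig writes an exec credential entry roughly like the sketch below (exact fields vary by CLI version). The token it mints is a pre-signed sts get-caller-identity request, so whatever assumed-role identity the CLI's credentials resolve to, with the path already stripped by STS, is what the authenticator sees:

users:
- name: arn:aws:eks:us-east-1:XXXXXXXXXXXX:cluster/my-cluster
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: aws
      args:
        - --region
        - us-east-1
        - eks
        - get-token
        - --cluster-name
        - my-cluster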

@beetahnator

Running into the same issue here on EKS 1.14.6.

@casey-robertson

casey-robertson commented Sep 19, 2019

Ahh... this explains our issue when testing with AWS SSO-created roles too. See the issue referenced in this document. This has been a problem for quite a while (at least 14 months).

https://aws.amazon.com/blogs/opensource/integrating-ldap-ad-users-kubernetes-rbac-aws-iam-authenticator-project/

Pertinent passage:
For the rolearn be sure to remove the /aws-reserved/sso.amazonaws.com/ from the rolearn url, otherwise the arn will not be able to authorize as a valid user.

When we stumbled across this I assumed it was something about the SSO role but based on this issue it's probably the path.

@rlangfordBV

rlangfordBV commented Oct 15, 2019

We don't use EKS, but have had this issue with 1.12 and 1.14.6 with aws-iam-authenticator. If you edit the configmap to remove the /gitlab-ci portion, and restart the pods, you will likely find that access works.

My co-worker and I suspect this is because of the way STS formats assumed-role session ARNs.

We have a role arn:aws:iam::000000000000:role/bosun/bosun_deploy that we use for cluster administration of our kops-created clusters.

If we assume the role and run aws sts get-caller-identity, we get the following:

{
    "UserId": "<redacted-AKID>:<redacted-userid>",
    "Account": "000000000000",
    "Arn": "arn:aws:sts::000000000000:assumed-role/bosun_deploy/<redacted-userid>"
}

I wish this were fixed. As of now, I'm not sure what to do other than create a role with a shortened path and switch to it.

I suppose one can also just edit the role ARN that goes into the configmap itself.

@jceresini
Author

Yeah, removing the path is how I identified it as the cause of the issue.

The field name is rolearn and the path is part of the ARN for a given role.

I opened this so others running into the issue might find it, and also because I think something needs to address it, whether that's documentation (though I don't think docs alone are sufficient without changing the name of the field in the configmap) or a bugfix.

@nckturner nckturner added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Oct 23, 2019
@jangrewe

We just discovered the same thing by using

$ curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
$ TOKEN=$(aws-iam-authenticator token -i fooCluster --token-only)
$ aws-iam-authenticator verify -i fooCluster -t ${TOKEN}

and comparing the role that the Pod uses (containing a path) with the one set in the token (path missing).

For now, our workaround is also adding a role mapping for an IAM role that "doesn't actually exist", as sketched below.
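A sketch of that workaround, reusing the ARNs from the original report; the second entry maps the path-stripped "phantom" role that the STS token actually presents:

    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/gitlab-ci/gitlab-runner
      username: gitlab-admin
      groups:
        - system:masters
    # duplicate mapping for the path-stripped ARN; no such IAM role exists
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/gitlab-runner
      username: gitlab-admin
      groups:
        - system:masters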

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2020
@jceresini
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 27, 2020
@arhea

arhea commented Apr 20, 2020

I was able to reproduce this issue. I created two roles, K8s-Admin and K8s-Admin-WithPath, using the following commands:

  aws iam create-role \
  --role-name K8s-Admin \
  --description "Kubernetes administrator role (for AWS IAM Authenticator for Kubernetes)." \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::<account id>:root"},"Action":"sts:AssumeRole","Condition":{}}]}' \
  --output text \
  --query 'Role.Arn'

  aws iam create-role \
  --role-name K8s-Admin-WithPath \
  --path "/kubernetes/" \
  --description "Kubernetes administrator role (for AWS IAM Authenticator for Kubernetes)." \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::<account id>:root"},"Action":"sts:AssumeRole","Condition":{}}]}' \
  --output text \
  --query 'Role.Arn'

Mapped them to the cluster with:

eksctl create iamidentitymapping --cluster basic-demo --arn arn:aws:iam::<account id>:role/K8s-Admin --group system:masters --username iam-admin

eksctl create iamidentitymapping --cluster basic-demo --arn arn:aws:iam::<account id>:role/kubernetes/K8s-Admin-WithPath --group system:masters --username iam-admin-withpath

Then I attached the AWS ReadOnly policy to both roles. Next, I created two AWS CLI profiles, sandbox-k8s-admin and sandbox-k8s-admin-withpath, specifying the role_arn option to trigger an assume-role (sketched at the end of this comment). After creating the profiles, I updated my local kubeconfig:

eksctl utils write-kubeconfig --cluster=basic-demo --profile=sandbox-k8s-admin --set-kubeconfig-context --region=us-east-2

kubectl get nodes
# returned list of nodes, expected

Then switched over to the role with the path

eksctl utils write-kubeconfig --cluster=basic-demo --profile=sandbox-k8s-admin-withpath --set-kubeconfig-context --region=us-east-2

kubectl get nodes
# error: You must be logged in to the server (Unauthorized)
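For reference, the two CLI profiles used above might look like this in ~/.aws/config (a sketch; source_profile = default is an assumption):

[profile sandbox-k8s-admin]
role_arn = arn:aws:iam::<account id>:role/K8s-Admin
source_profile = default

[profile sandbox-k8s-admin-withpath]
role_arn = arn:aws:iam::<account id>:role/kubernetes/K8s-Admin-WithPath
source_profile = default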

@Comradin

Comradin commented May 4, 2020

Any news on this? This is quite weird behavior, and hard to detect as an error.

@JeremyProffitt

We are seeing this issue as well, any word on resolution?

@gaochundong

+1

@sidewinder12s

I've enjoyed my 6+ hours lost to this.

@fred-vogt

fred-vogt commented Aug 12, 2020

terraform workaround:

join("/", values(regex("(?P<prefix>arn:aws:iam::[0-9]+:role)/[^/]+/(?P<role>.*)", <role-arn>)))

I'm not sure this is still needed with v0.5.1.

@deadanon

terraform workaround:

join("/", values(regex("(?P<prefix>arn:aws:iam::[0-9]+:role)/[^/]+/(?P<role>.*)", <role-arn>)))

I'm not sure this is still needed with v0.5.1.

This was a very easy work-around for us, thank you

@nxtof

nxtof commented Nov 2, 2020

Any update? Seems that this is still an issue.

@othmane399

Hello, I'm having the same issue with aws-iam-authenticator version 0.5.2

@mattjamesaus

This caught me too today, what a PIA indeed. I can confirm that an instance role with a path will not be able to auth against the cluster; hopefully this gets fixed soon.

Jan 28 05:05:01 ip-10-31-8-66.us-west-1.compute.internal kubelet[3907]: E0128 05:05:01.251418    3907 kubelet_node_status.go:92] Unable to register node "ip-10-31-8-66.us-west-1.compute.internal" with API server: Unauthorized

Adding this in the hope it saves someone else a few hours of their life.

@fred-vogt

fred-vogt commented Jan 29, 2021

A fix could be to require iam:GetRole permissions and look up the full role info by the "short" role name.

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/get-role.html

I could create a sample PR if that helps.
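A minimal sketch of that lookup with aws-sdk-go (not the project's actual code; the role name is borrowed from the original report):

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/iam"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := iam.New(sess)

	// STS only reports the bare role name; GetRole returns the full
	// ARN, path included, at the cost of an extra API call and the
	// iam:GetRole permission.
	out, err := svc.GetRole(&iam.GetRoleInput{
		RoleName: aws.String("gitlab-runner"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(aws.StringValue(out.Role.Arn))
	// e.g. arn:aws:iam::XXXXXXXXXXXX:role/gitlab-ci/gitlab-runner
}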

@billinghamj

Between #333, #268, #153, and #98, it would be good to get the duplicates closed and this tracked in one place.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2022
@bgshacklett

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2022
@mconigliaro

mconigliaro commented Aug 3, 2022

terraform workaround:

join("/", values(regex("(?P<prefix>arn:aws:iam::[0-9]+:role)/[^/]+/(?P<role>.*)", <role-arn>)))

This didn't work for us on ARNs that contain nested "directories" in the path (e.g. arn:aws:iam::123456789012:role/with/nested/directories). Here's what did work:

replace(<role-arn>, "//.*//", "/")
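A quick worked example of the difference on a nested-path ARN (a sketch; values are illustrative):

locals {
  role_arn = "arn:aws:iam::123456789012:role/with/nested/directories"

  # The regex workaround strips exactly one path segment:
  #   => "arn:aws:iam::123456789012:role/nested/directories" (still wrong)

  # replace() with a regex collapses everything between the first
  # and last "/" in one go:
  stripped = replace(local.role_arn, "//.*//", "/")
  #   => "arn:aws:iam::123456789012:role/directories"
}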

@trungdinh98

[quotes the original report in full]

Excuse me, can you show me what username: gitlab-admin is? Thanks.

@alhucave

Same problem... thank you very much.

@gillg

gillg commented Jan 19, 2023

@nckturner, as you added the "important-soon" label more than 2 years ago, what is the reason this issue is still present?
Moreover, I think @gothrek22 found the root cause:

role := strings.Join(parts[1:len(parts)-1], "/")

but the code explains nothing, so do you have an explanation on your side?

If using paths in IAM is a "bad practice", that should be said; if not, this bug could be a real blocker if you have two roles with the same name in different paths. It also makes any automation very tricky.
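To illustrate the problem with the ARNs from the original report (a standalone sketch, not the project's actual code): the STS session ARN never carries the IAM path, so no string manipulation can reconstruct a role ARN that includes one.

package main

import (
	"fmt"
	"strings"
)

func main() {
	// What STS reports for an assumed role: name only, path already gone.
	sessionARN := "arn:aws:sts::111122223333:assumed-role/gitlab-runner/my-session"

	resource := strings.SplitN(sessionARN, ":", 6)[5] // "assumed-role/gitlab-runner/my-session"
	parts := strings.Split(resource, "/")

	// The quoted line: keeps everything between the resource type and
	// the session name, i.e. just "gitlab-runner".
	role := strings.Join(parts[1:len(parts)-1], "/")

	fmt.Printf("arn:aws:iam::111122223333:role/%s\n", role)
	// Prints ...:role/gitlab-runner, which can never match the
	// configured ...:role/gitlab-ci/gitlab-runner mapping.
}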

@sftim

sftim commented Jan 20, 2023

This is an important bug to fix. However, so far no contributor has provided a fix that has been merged.

Anyone who is willing to follow the Kubernetes code of conduct is welcome to work on this. Related to that: if you'd like (i.e., if anyone would like) this bug fixed, and are willing to offer a bounty, that offer might help move things forward.


If people want to highlight this issue to the vendor, AWS, then please visit aws/containers-roadmap#573 and add a thumbs-up reaction.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2023
@bgshacklett

/remove-lifecycle stale

@mwgamble

mwgamble commented Oct 9, 2023

My team just lost a few hours to this issue today. It'd be great to see it resolved.

@tylersatre

My team just lost a few hours to this issue today. It'd be great to see it resolved.

Same thing happened to my team today...

@bgshacklett

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Oct 27, 2023
@MalibuKoKo

Any update? Is using paths in IAM a "bad practice" or not?

@emcay

emcay commented Jan 10, 2024

Lost two days on this; it probably should be fixed...

@bgshacklett

The following PR was merged and appears to address the problem: #670. It's unclear to me what the current effective status is, though, as I don't see any documentation updated as part of the pull request.

@andrew-aiken

Looks like it was merged, but there has not been a release since then.
