rgl/terramate-aws-eks-example
This creates an example Kubernetes cluster hosted in the AWS Elastic Kubernetes Service (EKS) using a Terramate project with Terraform.

This will:

The main components are:

(components diagram)

For an equivalent example, see:

Usage (on an Ubuntu Desktop)

Install the dependencies:

Set the AWS Account credentials using SSO:

# set the environment variables to use a specific profile.
# e.g. use the pattern <aws-sso-session-name>-<aws-account-name>-<aws-account-role>-<aws-account-id>
export AWS_PROFILE=example-dev-AdministratorAccess-123456
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_DEFAULT_REGION
# set the account credentials.
# see https://docs.aws.amazon.com/cli/latest/userguide/sso-configure-profile-token.html#sso-configure-profile-token-auto-sso
aws configure sso
# dump the configured profile and sso-session.
cat ~/.aws/config
# show the user, user amazon resource name (arn), and the account id, of the
# profile set in the AWS_PROFILE environment variable.
aws sts get-caller-identity

Or, set the AWS Account credentials using an Access Key:

# set the account credentials.
# NB get these from your aws account iam console.
#    see Managing access keys (console) at
#        https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey
export AWS_ACCESS_KEY_ID='TODO'
export AWS_SECRET_ACCESS_KEY='TODO'
unset AWS_PROFILE
# set the default region.
export AWS_DEFAULT_REGION='eu-west-1'
# show the user, user amazon resource name (arn), and the account id.
aws sts get-caller-identity

Review the config.tm.hcl file.

At the very least, modify the ingress_domain global to a DNS Zone that is a child of a DNS Zone that you control. The ingress_domain DNS Zone will be created by this example and hosted in the Amazon Route 53 DNS name servers.
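
As a hypothetical example, if you control the example.com DNS Zone, you could set ingress_domain to a child zone such as eks.example.com; before applying, it can help to confirm the parent zone is actually served by your DNS provider:

# NB example.com and eks.example.com are placeholders; use your own domains.
# show the ingress_domain global that will be modified.
grep -n 'ingress_domain' config.tm.hcl
# show the parent zone name servers (the delegation records will be added there later).
dig ns example.com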

Generate the project configuration:

terramate generate

Commit the changes to the git repository.
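
For example (the commit message below is only illustrative):

git add .
git commit -m 'generate the terramate project configuration'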

Initialize the project:

terramate run terraform init -lockfile=readonly
terramate run terraform validate

Launch the example:

terramate run terraform apply

The first launch will fail while trying to create the aws_acm_certificate resource. You must delegate the DNS Zone, as described below, and then launch the example again to finish the provisioning.

Show the ingress domain and the ingress DNS Zone name servers:

ingress_domain="$(terramate run -C stacks/eks-aws-load-balancer-controller \
  terraform output -raw ingress_domain)"
ingress_domain_name_servers="$(terramate run -C stacks/eks-aws-load-balancer-controller \
  terraform output -json ingress_domain_name_servers \
  | jq -r '.[]')"
printf "ingress_domain:\n\n$ingress_domain\n\n"
printf "ingress_domain_name_servers:\n\n$ingress_domain_name_servers\n\n"

Using your parent ingress domain DNS Registrar or DNS Hosting provider, delegate the ingress_domain DNS Zone to the returned ingress_domain_name_servers DNS name servers. For example, at the parent DNS Zone, add:

example NS ns-123.awsdns-11.com.
example NS ns-321.awsdns-34.net.
example NS ns-456.awsdns-56.org.
example NS ns-948.awsdns-65.co.uk.

Verify the delegation:

ingress_domain="$(terramate run -C stacks/eks-aws-load-balancer-controller \
  terraform output -raw ingress_domain)"
ingress_domain_name_server="$(terramate run -C stacks/eks-aws-load-balancer-controller \
  terraform output -json ingress_domain_name_servers | jq -r '.[0]')"
dig ns "$ingress_domain" "@$ingress_domain_name_server" # verify with amazon route 53 dns.
dig ns "$ingress_domain"                                # verify with your local resolver.

Launch the example again; this time, no error is expected:

terramate run terraform apply

Show the terraform state:

terramate run terraform state list
terramate run terraform show

Show the OpenID Connect Discovery Document (aka OpenID Connect Configuration):

wget -qO- "$(
  terramate run -C stacks/eks-workloads \
    terraform output -raw cluster_oidc_configuration_url)" \
  | jq

NB The Kubernetes Service Account tokens are JSON Web Tokens (JWT) signed by the cluster OIDC provider. They can be validated using the metadata at the cluster_oidc_configuration_url endpoint. You can view a Service Account token at the installed kubernetes-hello service endpoint.
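
As a minimal sketch (run it after getting the kubeconfig below), you can also request a Service Account token with the TokenRequest API and decode its JWT payload locally, without verifying the signature:

# request a short-lived token for the default service account in the default namespace.
token="$(kubectl create token default)"
# decode the jwt payload. jwts use the base64url alphabet, so translate it to the
# standard alphabet and re-pad it before feeding it to base64 -d.
echo "$token" \
  | cut -d. -f2 \
  | tr '_-' '/+' \
  | awk '{ while (length($0) % 4) { $0 = $0 "=" }; print }' \
  | base64 -d \
  | jq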

Get the cluster kubeconfig.yml configuration file:

export KUBECONFIG="$PWD/kubeconfig.yml"
rm "$KUBECONFIG"
aws eks update-kubeconfig \
  --region "$(terramate run -C stacks/eks-workloads terraform output -raw region)" \
  --name "$(terramate run -C stacks/eks-workloads terraform output -raw cluster_name)"

Access the EKS cluster:

export KUBECONFIG="$PWD/kubeconfig.yml"
kubectl cluster-info
kubectl get nodes -o wide
kubectl get ingressclass
kubectl get storageclass
# NB notice that the ReclaimPolicy is Delete. this means that, when we delete a
#    PersistentVolumeClaim or PersistentVolume, the volume will be deleted from
#    the AWS account.
kubectl describe storageclass/gp2

List the installed Helm chart releases:

helm list --all-namespaces

Show a Helm release status, the user-supplied values, all the values, and the chart-managed Kubernetes resources:

helm -n external-dns status external-dns
helm -n external-dns get values external-dns
helm -n external-dns get values external-dns --all
helm -n external-dns get manifest external-dns

Show the adot OpenTelemetryCollector instance:

kubectl get -n opentelemetry-operator-system opentelemetrycollector/adot -o yaml

Access the otel-example ClusterIP Service from a kubectl port-forward local port:

kubectl port-forward service/otel-example 6789:80 &
sleep 3 && printf '\n\n'
wget -qO- http://localhost:6789/quote | jq
kill %1 && sleep 3

Wait for the otel-example Ingress to be available:

otel_example_host="$(kubectl get ingress/otel-example -o jsonpath='{.spec.rules[0].host}')"
otel_example_url="https://$otel_example_host"
echo "otel-example ingress url: $otel_example_url"
# wait for the host to resolve at the first route 53 name server.
ingress_domain_name_server="$(terramate run -C stacks/eks-aws-load-balancer-controller terraform output -json ingress_domain_name_servers | jq -r '.[0]')"
while [ -z "$(dig +short "$otel_example_host" "@$ingress_domain_name_server")" ]; do sleep 5; done && dig "$otel_example_host" "@$ingress_domain_name_server"
# wait for the host to resolve at the public internet (from the viewpoint
# of our local dns resolver).
while [ -z "$(dig +short "$otel_example_host")" ]; do sleep 5; done && dig "$otel_example_host"

Access the otel-example Ingress from the Internet:

wget -qO- "$otel_example_url/quote" | jq

Audit the otel-example Ingress TLS implementation:

otel_example_host="$(kubectl get ingress/otel-example -o jsonpath='{.spec.rules[0].host}')"
echo "otel-example ingress host: $otel_example_host"
xdg-open https://www.ssllabs.com/ssltest/
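
Optionally, as a quick local complement to the SSL Labs test, show the served certificate subject, issuer, and validity dates:

openssl s_client \
  -connect "$otel_example_host:443" \
  -servername "$otel_example_host" \
  </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates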

Access the kubernetes-hello ClusterIP Service from a kubectl port-forward local port:

kubectl port-forward service/kubernetes-hello 6789:80 &
sleep 3 && printf '\n\n'
wget -qO- http://localhost:6789
kill %1 && sleep 3

Access the kubernetes-hello Ingress from the Internet:

kubernetes_hello_host="$(kubectl get ingress/kubernetes-hello -o jsonpath='{.spec.rules[0].host}')"
kubernetes_hello_url="https://$kubernetes_hello_host"
echo "kubernetes-hello ingress url: $kubernetes_hello_url"
# wait for the host to resolve at the first route 53 name server.
ingress_domain_name_server="$(terramate run -C stacks/eks-aws-load-balancer-controller terraform output -json ingress_domain_name_servers | jq -r '.[0]')"
while [ -z "$(dig +short "$kubernetes_hello_host" "@$ingress_domain_name_server")" ]; do sleep 5; done && dig "$kubernetes_hello_host" "@$ingress_domain_name_server"
# wait for the host to resolve at the public internet (from the viewpoint
# of our local dns resolver).
while [ -z "$(dig +short "$kubernetes_hello_host")" ]; do sleep 5; done && dig "$kubernetes_hello_host"
# finally, access the service.
wget -qO- "$kubernetes_hello_url"

Audit the kubernetes-hello Ingress TLS implementation:

kubernetes_hello_host="$(kubectl get ingress/kubernetes-hello -o jsonpath='{.spec.rules[0].host}')"
echo "kubernetes-hello ingress host: $kubernetes_hello_host"
xdg-open https://www.ssllabs.com/ssltest/

Deploy the example hello-etcd stateful application:

install -d tmp/hello-etcd
pushd tmp/hello-etcd
wget -qO- https://raw.githubusercontent.com/rgl/hello-etcd/v0.0.2/manifest.yml \
  | perl -pe 's,(storageClassName:).+,$1 gp2,g' \
  | perl -pe 's,(storage:).+,$1 100Mi,g' \
  > manifest.yml
kubectl apply -f manifest.yml
kubectl rollout status deployment hello-etcd
kubectl rollout status statefulset hello-etcd-etcd
kubectl get service,statefulset,pod,pvc,pv,sc

Access the hello-etcd service from a kubectl port-forward local port:

kubectl port-forward service/hello-etcd 6789:web &
sleep 3 && printf '\n\n'
wget -qO- http://localhost:6789 # Hello World #1!
wget -qO- http://localhost:6789 # Hello World #2!
wget -qO- http://localhost:6789 # Hello World #3!

Delete the etcd pod:

# NB the used gp2 StorageClass is configured with ReclaimPolicy set to Delete.
#    this means that, when we delete the application PersistentVolumeClaim, the
#    volume will be deleted from the AWS account. this also means that, to play
#    with this, we cannot delete all the application resources. we have to keep
#    the persistent volume around by only deleting the etcd pod.
# NB although we delete the pod, the StatefulSet will create a fresh pod to
#    replace it, using the same persistent volume as the old one.
kubectl delete pod/hello-etcd-etcd-0
kubectl get pod/hello-etcd-etcd-0 # NB its age should be in the seconds range.
kubectl get pvc,pv

Access the application, and notice that the counter continues from the previously returned value; this means that, although the etcd instance is different, it picked up the same persistent volume:

wget -qO- http://localhost:6789 # Hello World #4!
wget -qO- http://localhost:6789 # Hello World #5!
wget -qO- http://localhost:6789 # Hello World #6!

Delete everything:

kubectl delete -f manifest.yml
kill %1 # kill the kubectl port-forward background command execution.
# NB the persistent volume will linger for a bit, until it is eventually
#    reclaimed and deleted (because the StorageClass is configured with
#    ReclaimPolicy set to Delete).
kubectl get pvc,pv
# force the persistent volume deletion.
# NB if you do not do this (or wait until the persistent volume is actually
#    deleted), the associated AWS EBS volume will be left in your AWS
#    account, and you will have to manually delete it from there.
kubectl delete pvc/etcd-data-hello-etcd-etcd-0
# NB you should wait until it is actually deleted.
kubectl get pvc,pv
popd

Access the docdb-example ClusterIP Service from a kubectl port-forward local port:

kubectl port-forward service/docdb-example 6789:80 &
sleep 3 && printf '\n\n'
wget -qO- http://localhost:6789
kill %1 && sleep 3

Access the docdb-example Ingress from the Internet:

docdb_example_host="$(kubectl get ingress/docdb-example -o jsonpath='{.spec.rules[0].host}')"
docdb_example_url="https://$docdb_example_host"
echo "docdb-example ingress url: $docdb_example_url"
# wait for the host to resolve at the first route 53 name server.
ingress_domain_name_server="$(terramate run -C stacks/eks-aws-load-balancer-controller terraform output -json ingress_domain_name_servers | jq -r '.[0]')"
while [ -z "$(dig +short "$docdb_example_host" "@$ingress_domain_name_server")" ]; do sleep 5; done && dig "$docdb_example_host" "@$ingress_domain_name_server"
# wait for the host to resolve at the public internet (from the viewpoint
# of our local dns resolver).
while [ -z "$(dig +short "$docdb_example_host")" ]; do sleep 5; done && dig "$docdb_example_host"
# finally, access the service.
wget -qO- "$docdb_example_url"

Verify the trusted CA certificates; these should include the Amazon RDS CA certificates (e.g. Amazon RDS eu-west-1 Root CA RSA2048 G1):

kubectl exec --stdin deployment/docdb-example -- bash <<'EOF'
openssl crl2pkcs7 -nocrl -certfile /etc/ssl/certs/ca-certificates.crt \
  | openssl pkcs7 -print_certs -text -noout
EOF

List all the used container images:

# see https://kubernetes.io/docs/tasks/access-application-cluster/list-all-running-container-images/
kubectl get pods --all-namespaces \
  -o jsonpath="{.items[*].spec['initContainers','containers'][*].image}" \
  | tr -s '[[:space:]]' '\n' \
  | sort --unique

Log in to the container registry:

NB You are logging in at the registry level. You are not logging in at the repository level.

aws ecr get-login-password \
  --region "$(terramate run -C stacks/ecr terraform output -raw registry_region)" \
  | docker login \
      --username AWS \
      --password-stdin \
      "$(terramate run -C stacks/ecr terraform output -raw registry_domain)"

NB This saves the credentials in the ~/.docker/config.json local file.
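
You can peek at the stored entry (NB when no docker credential helper is configured, the auth value is only base64-encoded, not encrypted):

jq '.auths' ~/.docker/config.json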

Inspect the created example container image:

image="$(terramate run -C stacks/ecr terraform output -json images | jq -r '."otel-example"')"
echo "image: $image"
crane manifest "$image" | jq .

Download the created example container image from its container image repository, and execute it locally:

docker run --rm "$image"

Delete the local copy of the created container image:

docker rmi "$image"

Log out of the container registry:

docker logout \
  "$(terramate run -C stacks/ecr terraform output -raw registry_domain)"

Delete the example image resource:

terramate run -C stacks/ecr \
  terraform destroy -target='terraform_data.ecr_image["otel-example"]'

At the ECR AWS Management Console, verify that the example image no longer exists (actually, it's the image index/tag that no longer exists).
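
Alternatively, check from the command line. NB the --repository-name value below is an assumption; adjust it to the repository name created by the stacks/ecr stack:

aws ecr list-images \
  --region "$(terramate run -C stacks/ecr terraform output -raw registry_region)" \
  --repository-name otel-example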

Do a terraform apply to verify that it recreates the example image:

terramate run terraform apply

Destroy the example:

terramate run --reverse terraform destroy

NB For some unknown reason, terraform shows the following Warning message. If you know how to fix it, please let me know!

╷
│ Warning: EC2 Default Network ACL (acl-004fd900909c20039) not deleted, removing from state
│
│
╵

List this repository's dependencies (and which ones have newer versions):

GITHUB_COM_TOKEN='YOUR_GITHUB_PERSONAL_TOKEN' ./renovate.sh

Caveats

  • After terraform destroy, the following resources will still remain in AWS:

    • KMS Kubernetes cluster encryption key.
      • It will be automatically deleted after 30 days (the default value of the kms_key_deletion_window_in_days eks module property).
    • CloudWatch log groups.
      • These will be automatically deleted after 90 days (the default value of the cloudwatch_log_group_retention_in_days eks module property)
  • When running terraform destroy, the current user (aka the cluster creator) is eagerly removed from the cluster. This means that, when there are problems, we cannot continue or troubleshoot without manually granting our role the AmazonEKSClusterAdminPolicy access policy (a CLI sketch of doing this follows below). For example, when using SSO roles, we need to add an IAM access entry like:

    Property           Value
    IAM principal ARN  arn:aws:iam::123456:role/aws-reserved/sso.amazonaws.com/eu-west-1/AWSReservedSSO_AdministratorAccess_0000000000000000
    Type               Standard
    Username           arn:aws:sts::123456:assumed-role/AWSReservedSSO_AdministratorAccess_0000000000000000/{{SessionName}}
    Access policies    AmazonEKSClusterAdminPolicy

    You can list the current access entries with:

    aws eks list-access-entries \
      --cluster-name "$(
        terramate run -C stacks/eks-workloads \
          terraform output -raw cluster_name)"

    The output should include the above IAM principal ARN value.
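
    If you get locked out this way, a sketch of granting that access from the CLI follows. NB the principal ARN below is the placeholder from the table above; replace it with your own role ARN:

    # NB replace the principal arn placeholder with your own role arn.
    principal_arn='arn:aws:iam::123456:role/aws-reserved/sso.amazonaws.com/eu-west-1/AWSReservedSSO_AdministratorAccess_0000000000000000'
    cluster_name="$(
      terramate run -C stacks/eks-workloads \
        terraform output -raw cluster_name)"
    # create the access entry for the principal.
    aws eks create-access-entry \
      --cluster-name "$cluster_name" \
      --principal-arn "$principal_arn" \
      --type STANDARD
    # associate the cluster admin access policy with the principal.
    aws eks associate-access-policy \
      --cluster-name "$cluster_name" \
      --principal-arn "$principal_arn" \
      --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
      --access-scope type=cluster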

Notes

References