Kubernetes with kubeadm
Learn how to deploy Kubernetes with kubeadm using userdata
This is a guide to deploying Kubernetes on Equinix Metal, using kubeadm via userdata and cloud-config. This guide is accompanied by a repository that includes automation around this userdata with Terraform and Pulumi.
Warning
This guide relies ONLY on userdata to provision a highly available kubeadm Kubernetes cluster. It does so by using Equinix Metal's customdata fields on the API to pass through secret information, namely the certificates used by Kubernetes. Anyone with access to these certificates has root on your cluster. As such, you MUST disable access to the metadata server from within your Kubernetes cluster. You can do this via network policies, or use the cloud-config supplied in the repositories to block the IP with iptables (see Step 15 below).
The User Data
The user data is provided as multi-part cloud-config. You can pick and choose the components you require, allowing you to configure your own CRI, CSI, etc.
The different components described below are available in their canonical form at https://github.com/equinix-labs/kubernetes-cloud-init.
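The scripts below expect a handful of fields in the device's customdata. As a rough illustration only (the field names are the ones consumed by the scripts in this guide; every value shown is a placeholder, and in practice the accompanying Terraform or Pulumi automation populates them), the customdata might look something like this:
{
  "kubernetesVersion": "v1.21.2",
  "joinToken": "abcdef.0123456789abcdef",
  "controlPlaneIp": "203.0.113.10",
  "ingressIp": "203.0.113.20",
  "metalAuthToken": "<your Equinix Metal API token>",
  "metalProjectId": "<your project UUID>",
  "certificateAuthorityKey": "<PEM>",
  "certificateAuthorityCert": "<PEM>",
  "serviceAccountKey": "<PEM>",
  "serviceAccountCert": "<PEM>",
  "frontProxyKey": "<PEM>",
  "frontProxyCert": "<PEM>",
  "etcdKey": "<PEM>",
  "etcdCert": "<PEM>"
}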
Step 1. Waiting for BGP Metadata
We are able to offer a highly available control plane via BGP. BGP is enabled per-device after the device is created. As such, there's a slight race condition: we need to wait for the BGP peering information to become available.
#!/usr/bin/env sh
until jq -r -e ".bgp_neighbors" /tmp/metadata.json
do
sleep 10
curl -o /tmp/metadata.json -fsSL https://metadata.platformequinix.com/metadata
done
Step 2. Downloading Metadata
As this "zero-touch" user data provisioning uses only our metadata APIs, we're going to cache the metadata to the /tmp directory for subsequent parts to consume. Here, we cache the complete metadata to /tmp/metadata.json, but also extract the customdata to /tmp/customdata.json.
#!/usr/bin/env sh
set -e
curl -o /tmp/metadata.json -fsSL https://metadata.platformequinix.com/metadata
jq -r ".customdata" /tmp/metadata.json > /tmp/customdata.json
Step 3. Add BGP Routes
In order for kube-vip to advertise our BGP routes correctly, we need to add routing information for the peer IPs.
#!/usr/bin/env sh
set -e
GATEWAY_IP=$(jq -r ".network.addresses[] | select(.public == false) | .gateway" /tmp/metadata.json)
ip route add 169.254.255.1 via $GATEWAY_IP
ip route add 169.254.255.2 via $GATEWAY_IP
Step 4. Package Repositories
In order to install the Kubernetes components, we're going to use the official Debian repositories provided by Google Cloud.
#!/usr/bin/env bash
set -e
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update -y
DEBIAN_FRONTEND=noninteractive apt-get install -y apt-transport-https ca-certificates
Step 5. Container Runtime
Currently, we only provide cloud-config for containerd. However, you can use this step to install Docker, CRI-O, or any other container runtime.
#!/usr/bin/env bash
set -e
cat << EOF > /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
apt-get install -y socat ebtables cloud-utils prips containerd
systemctl daemon-reload
systemctl enable containerd
systemctl start containerd
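The Debian package ships a usable default configuration for containerd. If you need to customize the runtime (for example, to change the cgroup driver or add registry mirrors), one optional approach, shown here only as a sketch and not required for this guide, is to render the default configuration, edit it, and restart the service:
#!/usr/bin/env bash
set -e
# Optional customization point: render containerd's default config for editing.
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
# ...edit /etc/containerd/config.toml as required...
systemctl restart containerd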
Step 6. Kube-vip for Control Plane
We're going to deploy kube-vip via a static pod manifest to provide BGP advertisements for our control plane EIP. kube-vip provides a manifest function that generates this manifest, so we're going to use ctr from containerd to run it.
Kubernetes static pod manifests live in /etc/kubernetes/manifests, and we've not actually installed any Kubernetes components yet, so we also need to ensure this directory exists.
#!/usr/bin/env sh
set -e
CONTROL_PLANE_IP=$(jq -r ".controlPlaneIp" /tmp/customdata.json)
METAL_AUTH_TOKEN=$(jq -r ".metalAuthToken" /tmp/customdata.json)
METAL_PROJECT_ID=$(jq -r ".metalProjectId" /tmp/customdata.json)
mkdir -p /etc/kubernetes/manifests
ctr image pull docker.io/plndr/kube-vip:0.3.1
ctr run \
--rm \
--net-host \
docker.io/plndr/kube-vip:0.3.1 \
vip /kube-vip manifest pod \
--interface lo \
--vip $CONTROL_PLANE_IP \
--controlplane \
--metal \
--metalKey $METAL_AUTH_TOKEN \
--metalProjectID $METAL_PROJECT_ID \
--bgp \
| sudo tee /etc/kubernetes/manifests/vip.yaml
Step 7. Kubernetes Prerequisites
Kubernetes requires the host to perform a few actions before we install the components and ready our control plane. These are covered in more detail in the Kubernetes documentation.
#!/usr/bin/env bash
set -e
sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab
swapoff -a
mount -a
cat << EOF > /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system
Step 8. Kubernetes Packages
Finally, we can install the Kubernetes components. We're also going to "mark" them as "hold". This will prevent them from being upgraded during apt commands. Upgrading the Kubernetes components needs to be a very explicit process, and we'll document cluster upgrades in a subsequent guide.
#!/usr/bin/env bash
set -e
KUBERNETES_VERSION=$(jq -r ".kubernetesVersion" /tmp/customdata.json)
TRIMMED_KUBERNETES_VERSION=$(echo ${KUBERNETES_VERSION} | sed 's/\./\\./g' | sed 's/^v//')
RESOLVED_KUBERNETES_VERSION=$(apt-cache policy kubelet | awk -v VERSION=${TRIMMED_KUBERNETES_VERSION} '$1~ VERSION { print $1 }' | head -n1)
apt-get install -y kubelet=${RESOLVED_KUBERNETES_VERSION} kubeadm=${RESOLVED_KUBERNETES_VERSION} kubectl=${RESOLVED_KUBERNETES_VERSION}
apt-mark hold kubelet kubeadm kubectl
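For example, with a hypothetical kubernetesVersion of v1.21.2 in the customdata, the version string gets transformed and resolved roughly as follows:
# KUBERNETES_VERSION           v1.21.2     (as supplied in customdata)
# TRIMMED_KUBERNETES_VERSION   1\.21\.2    (leading "v" removed, dots escaped for the awk match)
# RESOLVED_KUBERNETES_VERSION  1.21.2-00   (the matching package revision reported by apt-cache policy)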
Step 9. Kubeadm Configuration
As we're using kubeadm to handle the installation of Kubernetes, we need to generate a couple of configuration files. The control plane nodes all use the same user data, so we're going to provision both an init.yaml and a join.yaml.
#!/usr/bin/env bash
set -e
KUBERNETES_VERSION=$(jq -r ".kubernetesVersion" /tmp/customdata.json)
JOIN_TOKEN=$(jq -r ".joinToken" /tmp/customdata.json)
CONTROL_PLANE_IP=$(jq -r ".controlPlaneIp" /tmp/customdata.json)
CERTIFICATE_PRIVATE_KEY=$(jq -r ".certificatePrivateKey" /tmp/customdata.json)
CERTIFICATE_CERT=$(jq -r ".certificateCert" /tmp/customdata.json)
PRIVATE_IPv4=$(curl -s https://metadata.platformequinix.com/metadata | jq -r '.network.addresses | map(select(.public==false and .management==true)) | first | .address')
echo "KUBELET_EXTRA_ARGS=--node-ip=${PRIVATE_IPv4}" > /etc/default/kubelet
cat > /etc/kubernetes/init.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controlPlaneEndpoint: ${CONTROL_PLANE_IP}:6443
kubernetesVersion: ${KUBERNETES_VERSION}
apiServer:
  timeoutForControlPlane: 4m0s
certificatesDir: /etc/kubernetes/pki
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: ${PRIVATE_IPv4}
  bindPort: 6443
bootstrapTokens:
  - groups:
      - system:bootstrappers:kubeadm:default-node-token
    token: ${JOIN_TOKEN}
    ttl: "0"
    usages:
      - signing
      - authentication
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: "external"
  taints: null
EOF
cat > /etc/kubernetes/join.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controlPlaneEndpoint: ${CONTROL_PLANE_IP}:6443
kubernetesVersion: ${KUBERNETES_VERSION}
apiServer:
  timeoutForControlPlane: 4m0s
certificatesDir: /etc/kubernetes/pki
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
controlPlane:
  localAPIEndpoint:
    advertiseAddress: ${PRIVATE_IPv4}
    bindPort: 6443
discovery:
  bootstrapToken:
    apiServerEndpoint: ${CONTROL_PLANE_IP}
    token: ${JOIN_TOKEN}
    unsafeSkipCAVerification: true
  timeout: 5m0s
nodeRegistration:
  taints: null
EOF
Step 10. Kubernetes Certificates
Before we can run kubeadm to do its magic, we must ensure that the required certificates are already on the machine for kubeadm to use; otherwise it'll generate its own. While kubeadm is great at generating certificates, if we don't control them, we'll lose the ability to add nodes in the future without manual interaction.
#!/usr/bin/env bash
set -e
CERTIFICATE_AUTHORITY_KEY=$(jq -r ".certificateAuthorityKey" /tmp/customdata.json)
CERTIFICATE_AUTHORITY_CERT=$(jq -r ".certificateAuthorityCert" /tmp/customdata.json)
SERVICE_ACCOUNT_KEY=$(jq -r ".serviceAccountKey" /tmp/customdata.json)
SERVICE_ACCOUNT_CERT=$(jq -r ".serviceAccountCert" /tmp/customdata.json)
FRONT_PROXY_KEY=$(jq -r ".frontProxyKey" /tmp/customdata.json)
FRONT_PROXY_CERT=$(jq -r ".frontProxyCert" /tmp/customdata.json)
ETCD_KEY=$(jq -r ".etcdKey" /tmp/customdata.json)
ETCD_CERT=$(jq -r ".etcdCert" /tmp/customdata.json)
mkdir -p /etc/kubernetes/pki/etcd
echo "${CERTIFICATE_AUTHORITY_KEY}" > /etc/kubernetes/pki/ca.key
echo "${CERTIFICATE_AUTHORITY_CERT}" > /etc/kubernetes/pki/ca.crt
echo "${SERVICE_ACCOUNT_KEY}" > /etc/kubernetes/pki/sa.key
echo "${SERVICE_ACCOUNT_CERT}" > /etc/kubernetes/pki/sa.crt
echo "${FRONT_PROXY_KEY}" > /etc/kubernetes/pki/front-proxy-ca.key
echo "${FRONT_PROXY_CERT}" > /etc/kubernetes/pki/front-proxy-ca.crt
echo "${ETCD_KEY}" > /etc/kubernetes/pki/etcd/ca.key
echo "${ETCD_CERT}" > /etc/kubernetes/pki/etcd/ca.crt
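If you're assembling the customdata by hand rather than using the accompanying Terraform or Pulumi automation, a minimal sketch of producing one CA pair with openssl (an assumed workflow; adjust subjects and lifetimes to your needs) looks like this:
#!/usr/bin/env bash
set -e
# Hypothetical helper, run on a workstation: create a CA key and self-signed
# certificate, then paste the two PEM strings into the relevant customdata
# fields (e.g. certificateAuthorityKey and certificateAuthorityCert).
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=kubernetes" -days 3650 -out ca.crt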
Step 11. Kubeadm Exec
Everything is now in place and we can run kubeadm init or kubeadm join. We use ping to check for a response on the control plane EIP. If we get a response, we can assume a first control plane node is already running and subsequently run kubeadm join. If we get no response, we assume this is the first node and run kubeadm init.
We need to ignore some preflight errors because of the kube-vip manifest generated in the earlier step and our seeded certificates.
#!/usr/bin/env bash
CONTROL_PLANE_IP=$(jq -r ".controlPlaneIp" /tmp/customdata.json)
if ping -c 1 -w 30 ${CONTROL_PLANE_IP};
then
kubeadm join --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,FileAvailable--etc-kubernetes-pki-ca.crt \
--config=/etc/kubernetes/join.yaml
else
kubeadm init --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests,FileAvailable--etc-kubernetes-pki-ca.crt \
--skip-phases=addon/kube-proxy --config=/etc/kubernetes/init.yaml
fi
rm /etc/kubernetes/{init,join}.yaml
cat >> /etc/network/interfaces << EOF
auto lo:0
iface lo:0 inet static
  address ${CONTROL_PLANE_IP}
  netmask 255.255.255.255
EOF
ifup lo:0
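Once kubeadm has finished, you can sanity-check the control plane directly from the node using the admin kubeconfig that kubeadm writes:
kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes -o wide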
Step 12. Deploying the Cloud Controller Manager
We need to deploy the cloud controller manager, which adds the annotations needed by kube-vip to handle service load balancers within our cluster, allowing ingress controllers to work.
Warning: We're deploying a specific hash of this image for the time being due to some regressions in the latest tagged release.
#!/usr/bin/env bash
set -e
METAL_AUTH_TOKEN=$(jq -r ".metalAuthToken" /tmp/customdata.json)
METAL_PROJECT_ID=$(jq -r ".metalProjectId" /tmp/customdata.json)
cat << EOF | kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: metal-cloud-config
  namespace: kube-system
stringData:
  cloud-sa.json: |
    {
      "apiKey": "${METAL_AUTH_TOKEN}",
      "projectID": "${METAL_PROJECT_ID}"
    }
EOF
curl -fsSL https://github.com/equinix/cloud-provider-equinix-metal/releases/download/v3.1.0/deployment.yaml \
| sed -E 's/v3.1.0/7e3189de8abd08dcae35cf052b45326c29f79b7b/g' \
| kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -
Step 13. Container Networking (CNI)
In this guide, we're using Cilium as the CNI. The easiest way to install Cilium is via Helm.
#!/usr/bin/env bash
set -e
CONTROL_PLANE_IP=$(jq -r ".controlPlaneIp" /tmp/customdata.json)
curl -fsSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
helm repo add cilium https://helm.cilium.io/
helm template cilium/cilium \
--version 1.10.2 \
--namespace kube-system \
--set image.repository=docker.io/cilium/cilium \
--set global.ipam.mode=cluster-pool \
--set global.ipam.operator.clusterPoolIPv4PodCIDR=192.168.0.0/16 \
--set global.ipam.operator.clusterPoolIPv4MaskSize=23 \
--set global.nativeRoutingCIDR=192.168.0.0/16 \
--set global.endpointRoutes.enabled=true \
--set global.hubble.relay.enabled=true \
--set global.hubble.enabled=true \
--set global.hubble.listenAddress=":4244" \
--set global.hubble.ui.enabled=true \
--set kubeProxyReplacement=probe \
--set k8sServiceHost=${CONTROL_PLANE_IP} \
--set k8sServicePort=6443 \
> /tmp/cilium.yaml
kubectl --kubeconfig=/etc/kubernetes/admin.conf apply --wait -f /tmp/cilium.yaml
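Once the manifests are applied, the Cilium agent pods should appear on every node. A quick check, again using the admin kubeconfig:
kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system get pods -l k8s-app=cilium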
Step 14. Deploying kube-vip for Services
Now we can deploy kube-vip again, this time as a DaemonSet. This will provide BGP advertisements from all of our worker nodes, allowing our ingress traffic to be load balanced.
#!/usr/bin/env bash
set -e
INGRESS_IP=$(jq -r ".ingressIp" /tmp/customdata.json)
cat << EOF | kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-vip
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  name: kube-vip
rules:
  - apiGroups: [""]
    resources: ["services", "services/status", "nodes"]
    verbs: ["list", "get", "watch", "update"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["list", "get", "watch", "update", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-vip
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-vip
subjects:
  - kind: ServiceAccount
    name: kube-vip
    namespace: kube-system
EOF
ctr image pull ghcr.io/kube-vip/kube-vip:latest
ctr run \
--rm \
--net-host \
ghcr.io/kube-vip/kube-vip:latest \
vip /kube-vip manifest daemonset \
--interface lo \
--inCluster \
--services \
--annotations deploy.equinix.com \
--bgp | kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -
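With the DaemonSet in place, Services of type LoadBalancer can be advertised over BGP. As a hypothetical example only (the Service name, selector, and port are placeholders, and using the ingressIp from customdata as the load balancer IP is an assumption for illustration), exposing a workload might look like this:
#!/usr/bin/env bash
set -e
# Illustrative only: expose a workload on the ingress EIP held in customdata.
INGRESS_IP=$(jq -r ".ingressIp" /tmp/customdata.json)
cat << EOF | kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -
apiVersion: v1
kind: Service
metadata:
  name: ingress-example
  namespace: default
spec:
  type: LoadBalancer
  loadBalancerIP: ${INGRESS_IP}
  selector:
    app: ingress-example
  ports:
    - port: 80
      targetPort: 80
EOF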
Step 15. Block Metadata Access
As our certificates are passed through the metadata server's customdata, we need to block access to it from within the cluster. We can do this with iptables, resolving the metadata.platformequinix.com hostname to find the IP to block.
We do this immediately with iptables and then replicate the command into a per-boot script to ensure it survives a reboot.
#!/usr/bin/env bash
set -e
METADATA_IP=$(dig +short metadata.platformequinix.com)
iptables -A OUTPUT -d ${METADATA_IP} -j DROP
mkdir -p /var/lib/cloud/scripts/per-boot/
cat << EOF > /var/lib/cloud/scripts/per-boot/deny-egress-metadata.sh
#!/bin/bash
iptables -A OUTPUT -d ${METADATA_IP} -j DROP
EOF
chmod +x /var/lib/cloud/scripts/per-boot/deny-egress-metadata.sh
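The warning at the top of this guide also mentions network policies as an alternative to iptables. A minimal sketch of that approach with a standard NetworkPolicy follows (the namespace is an assumption, a policy like this is needed in every namespace that runs untrusted workloads, and your CNI must enforce NetworkPolicy, which Cilium does):
#!/usr/bin/env bash
set -e
# Hypothetical alternative: deny pod egress to the metadata server via NetworkPolicy.
METADATA_IP=$(dig +short metadata.platformequinix.com | head -n1)
cat << EOF | kubectl --kubeconfig=/etc/kubernetes/admin.conf apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-egress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - ${METADATA_IP}/32
EOF
Either way, you can verify the block from the node itself; a request to the metadata service should now time out rather than return data:
curl --max-time 5 -fsSL https://metadata.platformequinix.com/metadata || echo "metadata access blocked"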