Considerations for a Highly Available Control Plane on Self-Managed Kubernetes
In this article, we look at how to make your self-managed Kubernetes control plane highly available using technologies like kube-vip and BGP.
Running your own Kubernetes cluster on Equinix Metal can be a fun but challenging endeavour. The first step to getting a Kubernetes cluster online is building a strong foundation with your control plane. You want your control plane to be highly available and resilient to failures. Best practices within the Cloud Native world encourage us to use Infrastructure as Code (IaC) and automate any manual processes, but this presents us with a chicken and egg problem.
The Problem
Running a single-node control plane is rather trivial: you run kubeadm init and you’re good to go. No further configuration is required because everything the Kubernetes control plane needs is available over localhost, so there’s no "discovery" involved. Unfortunately, for a highly available control plane, each of our control plane nodes needs to be able to communicate with the others, because each node runs an etcd member. Running etcd in a clustered, highly available setup requires consensus, using Raft. This means that each etcd node must be able to communicate directly with every other node, which in turn means that each etcd node needs to know the IP address of every other node within the cluster.
So if we’re adopting best practice and using IaC to create and bootstrap these nodes as an atomic action, the IPs won’t be known until execution time; more concretely, we can’t load the three IP addresses of the nodes into the user data of every other node.
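To make the problem concrete, here is a rough sketch of the kind of static etcd bootstrap configuration we would otherwise have to render into each node’s user data; the names and addresses are hypothetical placeholders, and kubeadm’s stacked etcd hides this flag from us, but the underlying requirement is the same:
# Hypothetical static bootstrap: every member must already know every other
# member's peer IP before any of the nodes exist.
etcd \
  --name cp1 \
  --initial-advertise-peer-urls https://10.0.0.1:2380 \
  --initial-cluster "cp1=https://10.0.0.1:2380,cp2=https://10.0.0.2:2380,cp3=https://10.0.0.3:2380" \
  --initial-cluster-state new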
Fortunately, etcd and the kube-apiserver can handle this for us if we solve one simple thing: provide a single endpoint that resolves to the API server.
So what can we do?
Using Border Gateway Protocol (BGP), we can advertise the same IP address from each of our control plane nodes and allow the API server and etcd to do their jobs.
Equinix Metal offers BGP services per project, so we can use our IaC to request a global IP address and advertise it from any device within the project. We can use an open-source CNCF project, kube-vip, to handle the BGP advertisements for us.
Let’s take a look.
Note: We’ll use Terraform for any IaC and cloud-init for the user-data / shell scripts. You can adapt and use whatever methods you wish, but these examples should serve as a good base.
Enabling BGP
The first thing we need to do is enable BGP and request a global IP.
resource "equinix_metal_project" "our_project" {
name = "our-project"
bgp_config {
deployment_type = "local"
asn = 65000
}
}
The bgp_config block requires a deployment_type, which can be "local" or "global". We’re using "local" because we’re keeping everything inside Equinix Metal. If your needs are more bespoke (bring your own IP or ASN), you can discuss your global BGP requirements with the Equinix Metal Solution Engineers. You also need to set the asn to 65000. This is pretty much fixed, but it could potentially change in the future; check the Terraform docs as required.
Next we need to request an IP to use for the BGP advertisement.
resource "equinix_metal_reserved_ip_block" "bgp_ip" {
project_id = equinix_metal_project.our_project.id
type = "global_ipv4"
quantity = 1
}
Lastly, we need to enable the BGP session on our device.
resource "equinix_metal_bgp_session" "bgp_session" {
device_id = equinix_metal_device.our_device.id
address_family = "ipv4"
}
Preparing the Devices
To broadcast the BGP advertisements, we need to ensure the device is prepared. What do I mean by prepared? Well, we can use the metadata APIs to ensure the BGP configuration is enabled and that routes are available to speak to the peers within the metro. This just takes a couple of commands within your cloud-config.
First, let’s fetch and store the metadata response. When using IaC, there’s a small chicken-and-egg problem with the BGP enablement on the project: sometimes your device can spin up a little quicker than the BGP configuration is applied. To handle that, do a quick loop until the BGP neighbor data appears:
curl -o /tmp/metadata.json -fsSL https://metadata.platformequinix.com/metadata
until jq -r -e ".bgp_neighbors" /tmp/metadata.json
do
  sleep 2
  curl -o /tmp/metadata.json -fsSL https://metadata.platformequinix.com/metadata
done
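Once the neighbor data shows up, it’s worth a quick look at the fields we’ll rely on later; the command below is a small sketch and the output values are illustrative only:
# Show the BGP neighbor details we'll use for routing and for kube-vip.
jq '.bgp_neighbors[0] | {peer_as, peer_ips, customer_as, customer_ip}' /tmp/metadata.json
# Illustrative output (your values will differ):
# {
#   "peer_as": 65530,
#   "peer_ips": ["169.254.255.1", "169.254.255.2"],
#   "customer_as": 65000,
#   "customer_ip": "10.67.15.131"
# }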
Next, we want to ensure that the routes are correct to communicate with the BGP peers. We can grab the peer and gateway information from the downloaded metadata.
GATEWAY_IP=$(jq -r ".network.addresses[] | select(.public == false) | .gateway" /tmp/metadata.json)
for i in $(jq -r '.bgp_neighbors[0].peer_ips[]' /tmp/metadata.json); do
  ip route add "$i" via "$GATEWAY_IP"
done
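As a quick, optional sanity check, you can ask the kernel which route each peer would take; every peer should resolve via the private gateway we just configured:
# Each peer should be reachable via the private gateway.
for i in $(jq -r '.bgp_neighbors[0].peer_ips[]' /tmp/metadata.json); do
  ip route get "$i"
done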
BGP Advertisements
Now that everything is in place, we can ask kube-vip to start advertising our BGP IP.
We’ll be using Kubernetes static Pod manifests to run kube-vip as part of our Kubernetes control plane. Kube-vip, as one could infer from its name, is intended to run in this fashion and provides convenience functions to make this easy.
So let’s pull down the image and generate our static manifest.
ctr image pull ghcr.io/kube-vip/kube-vip:v0.6.0
ctr run \
  --rm \
  --net-host \
  ghcr.io/kube-vip/kube-vip:v0.6.0 \
  vip /kube-vip manifest pod \
    --interface lo \
    --address $(jq -r '.network.addresses | map(select(.public==true and .management==true)) | first | .address' /tmp/metadata.json) \
    --controlplane \
    --bgp \
    --peerAS $(jq -r '.bgp_neighbors[0].peer_as' /tmp/metadata.json) \
    --peerAddress $(jq -r '.bgp_neighbors[0].peer_ips[0]' /tmp/metadata.json) \
    --localAS $(jq -r '.bgp_neighbors[0].customer_as' /tmp/metadata.json) \
    --bgpRouterID $(jq -r '.bgp_neighbors[0].customer_ip' /tmp/metadata.json) | tee /etc/kubernetes/manifests/kube-vip.yaml

# This is needed to avoid a port conflict
sed -ri 's#- manager#- manager\n    - --prometheusHTTPServer=:2113#g' /etc/kubernetes/manifests/kube-vip.yaml
This runs kube-vip on each control plane node in its --controlplane mode. This allows each control plane node to discover the first API server, as it takes ownership of the lease, and in turn discover the etcd nodes for consensus.
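With the manifest in place, the rest of the bootstrap follows the usual kubeadm flow, pointed at the shared BGP address. The following is only a sketch; the address, token, hash, and certificate key are placeholders you would substitute with your own values:
# BGP_IP is the global IP we reserved earlier (placeholder value shown).
BGP_IP="147.75.40.30"

# On the first control plane node:
kubeadm init \
  --control-plane-endpoint "${BGP_IP}:6443" \
  --upload-certs

# On each additional control plane node, using the values printed by init:
kubeadm join "${BGP_IP}:6443" \
  --control-plane \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --certificate-key <key>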
Conclusion
Running Kubernetes in a highly available fashion on bare metal doesn’t need to be complicated. You just need to understand how BGP can open up new patterns, and get to know the tools emerging in this space that make traditional networking patterns more Cloud Native friendly.
Kube-vip is a fantastic new tool that enables BGP for Cloud Native and Kubernetes organisations.