You've already made your decision … you're going all-in on Kubernetes. This is a smart decision! Companies across the world are adopting Kubernetes to run their workloads, and it makes sense to adopt the orchestration platform that has millions of developers deploying to it daily. The next question that usually pops up is "Bare metal or virtualised?" and I'm happy to tell you - you don't actually need to choose. We're in a good place in the world where we have the ability to run heterogeneous clusters, mixing bare metal and hypervisor, ARM and x86, Red Hat and Ubuntu; Kubernetes abstracts away these concerns and provides a single interface, regardless of substrate.
Now you may think that bare metal also means losing access to "Managed" Kubernetes, such as AKS, EKS, or GKE, but I'm happy to tell you that's not the case either. We'll dive into that a little bit later.
So really the question should be "Should I be adding bare metal to my infrastructure?"
Short answer: most likely.
Let's learn why.
Reasons to Add Bare Metal
Let's start with the core fact: bringing a hypervisor into your architecture will increase your costs.
Firstly, some hypervisors, such as VMware, come with a licensing fee. While there are open-source hypervisors, platforms like VMware allow you to build a platform on top of the virtual machines and include essential functionality such as live migrations.
Secondly, by adding a hypervisor we're increasing the compute demands on our hardware. Not only does the hypervisor itself run alongside our workloads, but each of those virtual machines also runs its own kernel and operating system.
[Research from Ericsson] in 2020 estimated that hypervisors alone could in fact add 30% in overhead costs. In line with that, Equinix customers have also found that using bare metal can [reduce their costs by upwards of 50%].
The performance story is much like the cost story. The moment you add a hypervisor to your architecture, you're allocating CPU cycles, RAM, and IOPS to that hypervisor and taking them away from your workloads.
It's estimated that hypervisors can consume around 10% of your compute performance, and that's before counting the guest operating systems.
Of course, as with everything in technology, it's about trade-offs, and these small-ish overheads may make sense for the benefits of virtualisation; but there are workloads that are particularly sensitive to these trade-offs, such as AI and blockchains.
These workloads require access to TPUs, GPUs, and they eat up IOPS for breakfast, lunch, dinner, and snacks. If you want to bring AI into your product or company, you're going to need some bare metal capacity.
Your Own Virtualisation
Let's also clarify something else. As we stated earlier on, bare metal and virtualisation aren't mutually exclusive. When you have bare metal capacity, you can take charge of when to provide and accept workloads on virtualised node pools. Kubernetes provides all the features you need to slice and dice those node pools and steer scheduling with affinities and taints.
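As a concrete sketch of that slicing and dicing: you might taint your bare metal nodes so that only workloads that explicitly opt in can land on them, and pair that with a node affinity so those workloads land nowhere else. The label key, node names, and image below are hypothetical - adapt them to your own naming scheme.

```yaml
# Hypothetical example: dedicate a bare metal node pool to a latency-sensitive app.
#
# First, taint and label the bare metal nodes (names are illustrative):
#   kubectl taint nodes metal-node-1 substrate=bare-metal:NoSchedule
#   kubectl label nodes metal-node-1 substrate=bare-metal
apiVersion: apps/v1
kind: Deployment
metadata:
  name: latency-sensitive-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: latency-sensitive-app
  template:
    metadata:
      labels:
        app: latency-sensitive-app
    spec:
      # Tolerate the taint so the pods are *allowed* onto the bare metal pool...
      tolerations:
        - key: substrate
          operator: Equal
          value: bare-metal
          effect: NoSchedule
      # ...and use a required node affinity so they *only* run there.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: substrate
                    operator: In
                    values: ["bare-metal"]
      containers:
        - name: app
          image: example.com/latency-sensitive-app:latest
```

The taint keeps general-purpose workloads off the metal; the affinity keeps the sensitive workload on it. Untainted virtualised node pools remain available for everything else.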
Weaveworks has been working on a pattern called "Liquid Metal" which uses this type of hybrid architecture. The idea is that your control plane runs on virtual machines and the worker nodes run on bare metal. The benefit here is that the workloads on the worker nodes run as close to the metal as possible and can take advantage of your hardware, while the control plane (which doesn't tend to run many workloads) runs on virtual machines and can take advantage of their ephemerality, making upgrades and remediation a lot simpler in the event something goes wrong.
So, Should I?
From a cost perspective, bringing bare metal capacity to most Kubernetes clusters is a smart idea; however, it's not all about cost.
If your organisation does any machine learning/AI, High Performance Computing (HPC), financial transactions, or telecommunications - there are likely performance benefits that will not only lead to better cost management, but better efficiency and effectiveness of the underlying hardware.
Challenges of Bare Metal
Now, this isn't all cherry blossoms and bells. There are some challenges, but fortunately there are also solutions.
The elephant in the room … how do we even get ourselves some bare metal Kubernetes?
We need to include kubeadm in this list, because it's as close to an official deployment vehicle as there is. However, it relies on strong infrastructure as code and configuration management. If you go this route, you'll need Terraform or Pulumi to manage the hardware, or Tinkerbell for on-premises, and Chef, Ansible, Salt, or Puppet to ensure that the hardware coming online can be provisioned into a Kubernetes node.
This is usually a fair amount of effort and comes down to the experience of your team and weighing the balance of CapEx and OpEx; but it can be a pretty great route if you've already got some of these tools in-place.
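If you're already running Ansible, the node-preparation side of that pipeline can look something like the sketch below. The package list, host group, and variables are all placeholders; in practice the join token and CA hash would come from your infrastructure-as-code layer (e.g. a Terraform output), not be hard-coded.

```yaml
# Hypothetical Ansible sketch: prepare a freshly provisioned machine and
# join it to an existing cluster with kubeadm. Names and versions are
# illustrative, not prescriptive.
- hosts: new_metal_nodes
  become: true
  tasks:
    - name: Install container runtime and Kubernetes tooling
      ansible.builtin.apt:
        name:
          - containerd
          - kubeadm
          - kubelet
          - kubectl
        state: present
        update_cache: true

    - name: Join the node to the cluster
      # join_token and discovery_hash are assumed to be supplied by your
      # provisioning pipeline (e.g. Terraform or Tinkerbell metadata).
      ansible.builtin.command: >
        kubeadm join {{ control_plane_endpoint }}
        --token {{ join_token }}
        --discovery-token-ca-cert-hash {{ discovery_hash }}
      args:
        creates: /etc/kubernetes/kubelet.conf
```

The `creates` guard makes the join idempotent: if the node has already joined, the kubelet config exists and the command is skipped on subsequent runs.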
Cluster API is a Kubernetes subproject from sig-cluster-lifecycle. Cluster API solves the management of the lifecycle (create, scale, upgrade, destroy) of Kubernetes-conformant clusters using a declarative API. By running the Cluster API controllers on a Kubernetes cluster, you can use this declarative API to then subsequently provision another Kubernetes cluster, or 12.
The nice thing about this approach is that the initial Kubernetes cluster can be rather simple, such as minikube, k3s, and so forth. You can even bring the newly provisioned clusters under management of themselves, meaning they then run the Cluster API controllers on the new cluster.
There are many providers for Cluster API, which means you can use it for bare metal and virtualised clusters alike; but of course we need to give special mention to the [Equinix Metal] and [Tinkerbell] providers.
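To make the declarative API concrete, here's a minimal sketch of what a Cluster API cluster definition looks like. The names are made up, and the exact infrastructure kind depends on which provider you install - `PacketCluster` shown here is the kind used by the Equinix Metal provider, but check your provider's documentation for the current API versions.

```yaml
# Hypothetical sketch of a Cluster API cluster definition. Applying this to a
# management cluster running the Cluster API controllers asks them to
# provision a brand new workload cluster.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: metal-cluster-1
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  # The control plane is described declaratively too...
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: metal-cluster-1-control-plane
  # ...and the infrastructureRef points at a provider-specific resource
  # describing the underlying machines.
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: PacketCluster
    name: metal-cluster-1
```

Scaling, upgrading, or deleting the cluster then becomes a matter of editing or removing these resources, and the controllers reconcile the real hardware to match.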
Managed Kubernetes services exist and they're pretty great. You can use these services and enrich them with bare metal infrastructure; it's a powerful pattern, and AWS, Azure, and GCP all provide the tools to make it happen.
Arc (Azure), EKS Anywhere (AWS), and Anthos (GCP) allow you to bring a piece of the cloud to your bare metal Kubernetes. They often work in two ways:
- Providing a control plane for your bare metal Kubernetes clusters
- Allowing those bare metal Kubernetes clusters to work directly with cloud resources
This approach opens up a whole new world of hybrid architectures that allow you to have your cake and eat it too.
Node failures with bare metal clusters can be catastrophic and 9 times out of 10, it's because of storage. If your workloads don't store any state, then you're off the hook. However, if you have no state - we're going to assume you have no revenue either; so let's assume there's some state.
State and storage have always been a huge burden for on-premises bare metal, because bits have gravity: moving GBs may be easy enough, but moving TBs, PBs, or EBs can be challenging.
Kubernetes has a storage spec, the CSI, that provides the fundamentals for storage on Kubernetes clusters. While these volumes are backed by disks, the read and write requests are proxied through the CSI implementation, allowing for replication and redundancy. One exciting implementation here is Mayastor from OpenEBS. It's a Rust-based implementation that leverages SPDK, io_uring, and NVMe-oF to provide best-in-class, near-metal performance for distributed and replicated storage on Kubernetes.
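For a sense of what that looks like in practice, here's a sketch of a replicated StorageClass and a claim against it. The parameter names follow the OpenEBS Mayastor documentation at the time of writing, but they have changed between versions, so treat this as illustrative and check the docs for your installed release.

```yaml
# Hypothetical sketch: a Mayastor-backed StorageClass requesting three
# replicas, so a single node failure doesn't take the volume offline.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-3-replica
provisioner: io.openebs.csi-mayastor
parameters:
  repl: "3"        # number of synchronous replicas
  protocol: nvmf   # expose volumes over NVMe-oF
---
# A workload then simply claims storage from that class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: mayastor-3-replica
  resources:
    requests:
      storage: 10Gi
```

With the data replicated across three nodes, a pod rescheduled after a node failure can reattach to a surviving replica rather than waiting for the failed node to return.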
This allows node failures to be less catastrophic and more of a small burden. You'll still need to get that node back online, but your workloads will likely continue on another node.
Bare metal Kubernetes can be a huge accelerator for your team and organisation by reducing costs and maximising performance, but you need to be aware of the trade-offs you're making. We hope this guide helps you minimise the challenges and OpEx of such an adoption.
Go forth and have some fun, and remember to reach out if we can help.
Last updated January 30, 2024
Ready to kick the tires?
Sign up and get going today, or request a demo to get a tour from an expert.