
How We Built a Cloud-Like OS Experience On Bare Metal

It took a village, but our lineup of “officially” supported operating systems now has more options and, importantly, is far easier to update and grow.

Gema Gomez, Senior Engineering Manager, Systems Engineering
Mike Mason, Senior Software Engineer
Jacob Smith, VP, Digital Services Strategy & Marketing

The first time Dave Cottlehuber heard of any variant of BSD, the widely deployed open source Unix derivative, was in the late 90s, from a very excited colleague. But that exposure was limited. He simply learned of its existence and didn’t really get acquainted with it until about a decade later, when he was doing work for a client that required FreeBSD. Cottlehuber installed it on his MacBook. The OS, it turned out, didn’t support Apple’s video hardware at the time, so he spent most of his time working in a very small terminal window, in the process learning a lot about FreeBSD.

He found that he liked working with the operating system, later bringing it to other projects his small firm, SkunkWerks, did for clients. Cottlehuber thought FreeBSD was a lot easier to work with than Linux distros, which at the time were switching to systemd. Getting things done for customers didn’t require so much tinkering with the server OS. Later, in 2015, SkunkWerks saw its bet on FreeBSD really pay off, when OpenSSL, the open source cryptography library project, disclosed a number of serious security vulnerabilities. Cottlehuber and team had to patch A LOT of things repeatedly, but because they were running FreeBSD, they could switch their applications over to LibreSSL (the OpenSSL fork maintained by the OpenBSD project), avoiding a lot of patchwork as a result.

You may not know it, but FreeBSD is a heavyweight in both consumer and enterprise OS leagues. The Open Connect appliances that stream video to Netflix customers, for example, are based on FreeBSD. Junos, which powers many Juniper switches, routers and other appliances, is based on it. Apple’s operating systems “borrow” heavily from the project’s code base. The operating systems on multiple gaming consoles, such as the Sony PlayStation 4 and Nintendo Switch, use FreeBSD code. Storage vendors Dell and NetApp have entire product lines based on the OS, and WhatsApp, famously, handled 2 million concurrent connections per FreeBSD server before being gobbled up by Facebook. Many of the companies that use it contribute improvements upstream to the FreeBSD project.

Over the past year or so, a group of engineers at Equinix expanded the list of operating systems validated to run on Equinix Metal, our bare metal cloud service, and FreeBSD is one of the recent additions to that list. In our effort to bring the bare metal user experience closer to that of traditional virtualized cloud platforms, we wanted to give users a hassle-free way to boot a server OS of their choice on any of the hardware configurations we offer. We relied on several contributors to the various open source projects to help us make them “just work” on Equinix Metal. Cottlehuber, who over the years has become a contributor to FreeBSD, was instrumental in making that happen, working through this unusual but powerful arrangement as our expert on the OS and our liaison with the project’s community.

Today, a year or so later, we can safely say that most of the OSes we originally set out to add to our lineup just work, and receive regular and timely updates. A user can easily boot any of them on any available bare metal server config in any of our data centers around the world. We have learned a lot over this period, both about how differently various operating systems are put together and about how open source projects work. We have completely redesigned our process for building Metal-compatible server OS images and validating them across our hardware fleet, fully automating both the addition of new operating systems to the lineup and updates to existing ones, and making the whole process much faster!

Why the Server OS Experience Is Important

In the early days of Packet (which was acquired by Equinix in 2020, eventually becoming Equinix Metal), we knew that getting our operating system support right was important. It was very hard to do, but important. The first thing anyone does with a server is install and boot an OS, something virtual machines make relatively simple on traditional, virtualized cloud platforms. We knew that if we were to reach parity with the cloud user experience on bare metal, we had better figure this one out.

We validated several initial operating systems on our platform. Some of our customers used those, while others used our Custom iPXE feature to boot their own server OS on the platform via iPXE scripts. One of our early successes was CoreOS. We knew founder Alex Polvi and the team, and they helped us get CoreOS to boot smoothly on our hardware. That helped us get noticed by a lot of new customers who used CoreOS (which was blowing up at the time thanks to the rise of Linux containers).
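For readers unfamiliar with Custom iPXE, the sketch below shows the general shape of such a script, generated from Python. It assumes a Linux-style kernel plus initramfs served over HTTP; the URLs and console argument are placeholders, and `render_ipxe_script` is a hypothetical helper, not part of any Equinix Metal tooling. Only the `#!ipxe` header and the `kernel`, `initrd` and `boot` commands come from iPXE itself.

```python
from typing import List, Optional


def render_ipxe_script(kernel_url: str, initrd_url: str,
                       kernel_args: Optional[List[str]] = None) -> str:
    """Return a minimal iPXE script that fetches a kernel and initramfs, then boots."""
    # Hypothetical helper: URLs and kernel arguments are supplied by the caller.
    args = " ".join(kernel_args or [])
    lines = [
        "#!ipxe",                                # required first line of an iPXE script
        f"kernel {kernel_url} {args}".rstrip(),  # download the kernel and set its command line
        f"initrd {initrd_url}",                  # download the initial ramdisk
        "boot",                                  # boot the downloaded kernel
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ipxe_script(
        "http://example.com/vmlinuz",        # placeholder URL
        "http://example.com/initrd.img",     # placeholder URL
        ["console=ttyS1,115200n8"],          # example serial console argument
    ), end="")
```

A customer would host the resulting text at a URL and point the server's Custom iPXE setting at it; the exact kernel arguments depend on the OS being booted.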

This past year we decided to grow our server OS lineup, and evolve our internal systems from manual to automated. Parity with the cloud experience requires more than supporting just the things most people use. A diverse set of customers had to be able to bring the tools they used elsewhere to what was now called Equinix Metal—even if those tools didn’t have the greatest market share. We decided to make as many operating systems as we possibly could work “out of the box.” After all, why should Equinix be an arbiter of which operating systems are important and which aren’t? Our job is to make sure our platform works with whatever server OS a customer chooses.

So, we listened to our community and users, made a list of all the operating systems to add to the lineup, and got to work. Besides making Metal useful to more people, we saw this initiative as an opportunity to engage more deeply with the open source community and the vendors within it. We have fantastic engineers, but we aren’t experts in each and every operating system. Instead of randomly popping into open source project chats whenever we needed something, we approached it as an experiment: could we formalize our engagements with these communities? So, we reached out to several of the key contributors to the projects and asked them to help us, as paid contractors, get the operating systems they are so passionate about to boot smoothly on Metal.

Automation to the Rescue

As any engineer at a fast-growing business will tell you, there’s never a “good time” to add a big new project to the engineering team’s to-do list. Our engineers were already busy working on Metal’s third-generation hardware launch. We were deploying a batch of brand new server platforms in our data centers and had to make sure all the things that worked on the previous generation of hardware worked just as well on the new stuff. That included operating systems, which all had to be tested and validated on each new config. And now we were suddenly talking about adding the testing and validation of a batch of new operating systems to the already mounting pile of work in front of us. It felt overwhelming… (Deep breaths.)

As any engineer at a fast-growing business can also tell you, automation is often the answer to a mismatch between the size of a task and the available resources. As sensible engineers, we asked ourselves how much of the process could be automated. The answer was: quite a bit! 

Previously, to get a server OS gelling with our platform we would hardcode OS-specific and vendor-specific installer logic into our stack. This made both adding support for new operating systems and updating existing ones a lengthy, cumbersome process. Glaringly, writing custom code for each OS installer is about as far from “automation” as one can get.

Our solution was Vendor Services, a piece of software we wrote to handle all the vendor-specific logic (including Metal’s) and ensure that each of the various OS installers has everything it needs. And their needs vary widely from one to another—a big part of the reason we had been writing custom code for our stack to handle the installers in the first place. NixOS and Talos, for example, each needs only a couple of files to install (the kernel and initramfs). FreeBSD only needs the disk image. Meanwhile, Flatcar and Nutanix AOS both need custom config files that are quite a bit heavier (in addition to the basic kernel and initramfs files).
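One way to picture those differing needs is as data rather than code. The sketch below is a hypothetical requirements table built from the examples above; the artifact names, OS keys and the `missing_artifacts` helper are illustrative assumptions, not the actual Vendor Services schema.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set

# Hypothetical artifact kinds, named after the files mentioned in the article.
KERNEL = "kernel"
INITRAMFS = "initramfs"
DISK_IMAGE = "disk_image"
CONFIG = "config"


@dataclass(frozen=True)
class InstallerRequirements:
    display_name: str
    artifacts: FrozenSet[str]


# Illustrative table; keys and structure are assumptions, not the real schema.
REQUIREMENTS: Dict[str, InstallerRequirements] = {
    "nixos": InstallerRequirements("NixOS", frozenset({KERNEL, INITRAMFS})),
    "talos": InstallerRequirements("Talos", frozenset({KERNEL, INITRAMFS})),
    "freebsd": InstallerRequirements("FreeBSD", frozenset({DISK_IMAGE})),
    "flatcar": InstallerRequirements("Flatcar", frozenset({KERNEL, INITRAMFS, CONFIG})),
    "nutanix-aos": InstallerRequirements("Nutanix AOS", frozenset({KERNEL, INITRAMFS, CONFIG})),
}


def missing_artifacts(os_key: str, available: Set[str]) -> Set[str]:
    """Return the artifacts an OS installer still needs before it can run."""
    return set(REQUIREMENTS[os_key].artifacts) - set(available)


if __name__ == "__main__":
    print(sorted(missing_artifacts("flatcar", {KERNEL, INITRAMFS})))  # ['config']
```

With requirements expressed this way, supporting another OS means adding a table entry instead of writing new installer logic.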

Vendor Services is a framework that gives us a way to a) express to the OS engineers what Metal needs for their images to work, while abstracting away Metal-specific information they don’t need, and b) get each OS installer the files it requires. Once someone writes a Vendor Services implementation for an operating system, mapping each file endpoint the installer requests to the right file path in our storage, the software automatically looks up and fetches all the necessary specifications and files.
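The endpoint-to-path mapping can be sketched in a few lines. Everything here (`VendorImplementation`, `VendorServices`, the endpoint names and the storage paths) is hypothetical and only illustrates the idea of a per-OS mapping plus a generic lookup-and-fetch step; the storage backend is injected as a plain function so the example runs against an in-memory dictionary.

```python
from typing import Callable, Dict


class VendorImplementation:
    """Hypothetical per-OS mapping from installer-facing endpoints to storage paths."""

    def __init__(self, os_key: str, endpoint_to_path: Dict[str, str]) -> None:
        self.os_key = os_key
        self.endpoint_to_path = dict(endpoint_to_path)

    def resolve(self, endpoint: str) -> str:
        if endpoint not in self.endpoint_to_path:
            raise KeyError(f"{self.os_key}: no storage path for endpoint {endpoint!r}")
        return self.endpoint_to_path[endpoint]


class VendorServices:
    """Generic lookup-and-fetch layer shared by every OS implementation."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # storage backend is injected (HTTP, object storage, a dict in tests)
        self._impls: Dict[str, VendorImplementation] = {}

    def register(self, impl: VendorImplementation) -> None:
        self._impls[impl.os_key] = impl

    def serve(self, os_key: str, endpoint: str) -> bytes:
        """Resolve the endpoint to a storage path, then fetch the file's bytes."""
        path = self._impls[os_key].resolve(endpoint)
        return self._fetch(path)


if __name__ == "__main__":
    # In-memory stand-in for an artifact store; paths are made up.
    store = {
        "images/talos/latest/vmlinuz": b"<kernel bytes>",
        "images/talos/latest/initramfs.xz": b"<initramfs bytes>",
    }
    services = VendorServices(fetch=store.__getitem__)
    services.register(VendorImplementation("talos", {
        "kernel": "images/talos/latest/vmlinuz",
        "initramfs": "images/talos/latest/initramfs.xz",
    }))
    print(services.serve("talos", "kernel"))  # b'<kernel bytes>'
```

The design choice this illustrates is that only the mapping is OS-specific; the lookup and fetch logic is written once and shared.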

To automate regular OS updates, we built Vendor Updater, which periodically checks whether upstream updates are available. If there’s a Talos update, for example, the Updater will automatically test it on all the Metal server configs and update our image. So, the next time a customer wants to boot Talos, they’ll get the latest version, and Vendor Services will know where to get all the files the Talos installer needs. We created an automated CI pipeline (naming it Bob the Builder) to validate new server OS images. It now takes about 30 minutes to test and validate an OS update across our entire global fleet of servers!
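The update flow described above might look roughly like the following polling step. It is a sketch under several assumptions: the upstream version check, the per-config validation and the publish step are passed in as hypothetical callables, versions are compared by simple inequality rather than proper version ordering, and validation runs on a thread pool to mirror testing many server configs in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def check_and_update(
    current_version: str,
    latest_upstream: Callable[[], str],
    server_configs: Iterable[str],
    validate: Callable[[str, str], bool],
    publish: Callable[[str], None],
    max_workers: int = 8,
) -> Optional[str]:
    """Run one hypothetical Vendor Updater cycle.

    Returns the newly published version, or None if nothing was published.
    """
    candidate = latest_upstream()
    if candidate == current_version:
        return None  # already up to date; simple equality, not semantic version ordering
    configs = list(server_configs)
    if not configs:
        raise ValueError("refusing to publish an image validated on zero server configs")
    # Validate the candidate on every server config concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda cfg: validate(candidate, cfg), configs))
    if not all(results):
        return None  # at least one config failed: keep serving the last validated image
    publish(candidate)
    return candidate


if __name__ == "__main__":
    published = []
    new = check_and_update(
        current_version="1.0",
        latest_upstream=lambda: "1.1",          # stand-in for an upstream release check
        server_configs=["c3.small.x86", "m3.large.x86", "c3.large.arm64"],  # example plan names
        validate=lambda version, cfg: True,     # stand-in for a CI boot test
        publish=published.append,
    )
    print(new, published)  # 1.1 ['1.1']
```

Publishing only after every config passes means a failed validation leaves customers on the last known-good image rather than a partially tested one.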

Going to the Source

The first implementation of the Vendor Services framework was for NixOS, the Linux distribution built on the open source Nix package manager, popular for its declarative configuration and reproducible packages. Metal already supported NixOS, thanks largely to Graham Christensen, a passionate Nix and NixOS maintainer and founder of Determinate Systems.

Determinate is a startup that helps clients use Nix, and Christensen had already done a lot of work deploying NixOS on Metal for a customer, netbooting it via our Custom iPXE feature. So, Metal contracted Determinate to develop a server OS image that would be validated “out of the box.” But that image was built the old, hardcoded way, which meant the Metal NixOS image wasn’t updated anywhere near as often as the upstream operating system. Now, with a Vendor Services implementation for the OS, NixOS updates on Metal are applied as soon as they become available upstream.

Similarly, Cottlehuber, our FreeBSD expert, was familiar with the Metal platform and had gotten FreeBSD running on it before we approached him to help us with our official image. Several other folks we worked with in this way had also been contributing to their respective open source OS projects (AlmaLinux, Alpine Linux, Talos Linux), doing a lot of work for free, driven by passion for the projects and by their businesses’ commercial interests. Not all of them wanted a contract with Equinix, but some did, which meant that top experts in the operating systems we wanted to add did much of the technical work of building the right images.

As Cottlehuber puts it, the work done by the maintainers of the open source projects you use is easy to take for granted. The typical ways big businesses support open source projects are contributing their own engineers’ time and donating to the foundations established around some of them. Both are great, but not all projects have foundations or are set up to handle donations and corporate support. We found that our approach (compensating maintainers for open source work that specifically benefits our business) is effective from a technical perspective, doesn’t feel exploitative, and keeps us close to the community.

We look forward to seeing the improvements made to these operating systems through this process benefit others who deploy on bare metal. NixOS, for example, now has very robust support for a wide variety of bare metal hardware (including Ampere Arm servers). This would have been much more difficult had one of its core maintainers not been engaged so closely with Equinix.

Overall, we consider our experiment with this model of engagement with the maintainers a success. We get the benefit of folks who know exactly how these operating systems work making them gel with our platform. Meanwhile, they don’t feel that they’re being asked for free stuff and get to improve the projects in ways that make them more robust in the long run.

Published on

12 January 2023