How Tinkerbell Got Its Wings
Join Gianluca and the Equinix Metal team on their adventures with Tinkerbell and bare metal.
When I joined Equinix Metal (Packet at the time) earlier this year, I jumped right into what I knew best, starting with our Kubernetes Cluster API provider. While I was working on the Cluster API, another project caught my eye: Tinkerbell. Growing up in the cloud, I didn't have much experience with physical hardware, but I was immediately intrigued. It was a new challenge, and with Equinix's commitment to open source and staying "foundational" I knew we could build something impactful.
Gathering The Pixie Dust
With Tinkerbell, our mission is to automate the lowest part of the stack: taking inanimate servers (no matter what they are, where they live, or who owns them) and bringing them to life to be consumed by software.
Getting started was an adventure in some very fundamental and, err... mature technologies like PXE and DHCP. Let's just say these have been around for a long time! Coming from the cloud native world, I'm used to working with technologies that change regularly, but at least the infrastructure underneath them was consistent. With Tinkerbell, it's quite the opposite: we are taking well-established tools and technologies and bringing them into the cloud native era.
Tinkerbell was built by the original Packet team way before I arrived. It was a key part of their stack, helping to automate the provisioning of thousands of heterogeneous servers, tens of thousands of times per month, across a footprint of dozens of data centers. It worked so well that the team thought it could be useful to others, so we open sourced it. Having a community behind the project, with people and teams working on provisioning different hardware types, means that Tinkerbell is becoming useful for an increasing number of use cases beyond our internal ones.
The open source model makes perfect sense here: we can give something to the world, and we know the experience of others will make Tinkerbell stronger for everyone.
The Repos are Public, Now What?
One of the first steps we took with Tinkerbell was opening up the project to some great early adopters like Alex Ellis, Graham Christensen, Kinvolk, and Container Solutions. They contributed to the code and helped us create an initial roadmap for the project. They also became advocates and helped spread the word about Tinkerbell.
Alex Ellis talked about Tinkerbell in his Bare Metal in a Cloud Native World article, and then shared his Awesome Bare Metal repo. Iago López Galeiras at Kinvolk demonstrated Provisioning Flatcar Container Linux with Tinkerbell. Adam Otto at Container Solutions wrote a 3-part series on Bootstrap as Code, covering bootstrapping bare metal, putting that into practice in a homelab, and applying the previous two articles to a hybrid cloud model, all using Tinkerbell.
Oh Crap, We Need a Process!
Of course, with this initial success came issues. Getting an open source project ready for the world isn't easy, and we learned that the hard way. We made mistakes early on, like accidentally breaking Tinkerbell for the community (twice!) with updates we made to the project for our internal use (sorry!). Fortunately, everyone was understanding, and it caused us to take a step back and look at how we were maintaining Tinkerbell. This highlighted the need to develop processes to ensure our budding community's needs were always front of mind.
An initial issue for contributors was experimentation. Not everyone has physical servers at home, and we didn't want to force people to provision Equinix Metal servers if they didn't want to. We created a Vagrant setup for Tinkerbell that allows contributors to experiment on their local machine (as long as they have enough RAM!).
As more people began engaging with Tinkerbell, issues started landing on our repos faster than we could process them, and they often lacked key information. We started by introducing issue templates, which made refinement easier and helped to signal which issues would be suitable for contributors with different experience levels. With the deluge of issues under control, we turned our attention to the roadmap, adopting a proposals system that makes it easier to understand and to add to.
With the rapid pace of change across the repositories, it quickly became clear that we needed to insulate contributors from the bleeding edge. To do this, we created a sandbox project, through which contributors can clone and deploy known, tested versions of Tinkerbell's various microservices. This also made getting started more intuitive, because people no longer have to understand the various microservices upfront.
Another challenge with a growing community is communication. The issue templates and proposal system streamlined our asynchronous communication, but chatting in real-time about emerging ideas and issues was key to our success. We solved this in two ways:
- Through the #tinkerbell channel in the Equinix Metal community Slack, which helped us to move all of our internal conversations into the public eye.
- By introducing public bi-weekly "Tinkerbell Community Meetings" in which contributors can bring their questions and ideas to the Equinix Metal team or to other contributors. You can find out about future community meetings via the mailing list, and view all previous community meetings on the Tinkerbell YouTube channel.
Another issue was CI/CD. Although the initial version of Tinkerbell had tests, the increased rate of change highlighted areas where further testing would help us catch problems earlier and point contributors in the right direction when debugging. We implemented broader CI/CD using GitHub Actions, which runs end-to-end tests, unit tests, standard tools like prettier, gofmt, and golangci-lint, and various other checks.
Although implementing all this looks like a lot of work, the results are clear. Contributions from our partners are up, and introducing new features (like the recent multi-architecture builds for ARM, AMD64, and Intel X64 chips) has become much easier.
An extra special shout out also needs to go to our friends at Infracloud: Gaurav Gahlot, Aman Parauliya and Chitrabasu Khare. They've been instrumental in making Tinkerbell happen, and we couldn't have come so far, so soon, without their help.
Tinkerbell Takes Flight
Last month, Tinkerbell was accepted as a CNCF sandbox project. This is important to me; contributing Tinkerbell to the CNCF guarantees it will grow as a truly open source project. Tinkerbell isn't only for Equinix Metal anymore–it's part of a larger movement.
In addition to important elements like governance, the CNCF also helps us with another issue OSS projects face–building a community. For Tinkerbell to survive in the wild, it needs a robust and active community of people and companies committed to developing a reinforcing loop that keeps the project running, growing, and in line with community goals. As a CNCF project, others see that we have been vetted and will be more likely to get involved. Plus, we have access to help from the CNCF and their huge cloud native community.
Exciting Times Ahead!
In the 16 years since AWS started their incredible journey as the first cloud provider, API-driven infrastructure has changed how software is built. Tinkerbell helps to bring this API-driven approach to physical infrastructure. As the most exciting developments in API-driven infrastructure are clustered around the CNCF's member companies and projects, it's a natural home for Tinkerbell too.
We have big plans for the future, and we're looking forward to sharing some of the bigger news items in January 2021, but the sneak preview is this: more features and more end users! Personally, I'm excited about the next steps we're taking to make Tinkerbell more fully featured and easier to contribute to. As a maintainer, I like this part best.
Would you like to join Gianluca and the team on this incredible adventure into bare metal? Join us in the Tinkerbell Slack, or give us some feedback by creating an issue, or submitting a proposal.