Kubernetes, today's go-to container orchestration system, is hosted by the Cloud Native Computing Foundation.
Since its launch in 2014, Kubernetes has grown into one of the highest-velocity projects in the history of open source. The container orchestration system was started at Google and donated to the Linux Foundation as a seed technology for the Cloud Native Computing Foundation. And in the past year, “about 3,000 separate folks made a specific commit, a contribution,” says CNCF Executive Director Dan Kohn. “The total number of folks learning about it is probably in the high thousands or even ten thousand. And it requires an insane level of cooperation and coordination to engage all those people.”
Building Tools for CNCF on Equinix Metal
To help with this daunting task, CNCF Senior Developer Lukasz Gryglicki built DevStats, which Kohn calls “a congestion diagnosis system for one of the highest throughput communities in almost any measure of cooperation.” The app provides real-time analysis of activity on Kubernetes’ GitHub: for example, the number of contributions, the level of engagement of contributors, how long it takes to get a response after an issue is opened, and which special interest groups (SIGs) are the most responsive. “We had experience using some other tools with a similar goal, but none of them had been developed with the CNCF and the Kubernetes community in mind,” says Ihor Dvoretskyi, Developer Advocate for CNCF. “We decided to move forward with our own solution where we can implement almost everything that we made.”
Data, Lots of Data
The data was easy enough to access: Every change that’s made to any public repo in GitHub is housed in GitHub Archive. But Gryglicki needed to find what was relevant to Kubernetes among the several terabytes of data in the Archive.
“GitHub now has about 77 million known public repos, of which we care about 40,” says Kohn. “So Lukasz throws away all the information except for the 40 that we care about, and goes through and processes it, one commit, one change at a time, for the four-year history of Kubernetes.” The pertinent data is then placed in a Postgres database, where the time series are computed, and then moved into an InfluxDB database and displayed with a Grafana frontend. GitHub Archive publishes new data hourly, so DevStats has to crunch the new data every hour as well.
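The filtering step Kohn describes can be sketched roughly as follows. This is a minimal illustration, not code from DevStats: it assumes the hourly dumps are gzipped files with one JSON event per line and a `repo.name` field of the form `org/repo`, and the names `TRACKED_REPOS`, `read_hourly_dump`, and `filter_events` are hypothetical. The two repositories listed are only a sample, since the article does not give the full list of 40.

```python
import gzip
import io
import json
from typing import Iterable, Iterator, Set

# Hypothetical sample of tracked repositories; the real list is not given here.
TRACKED_REPOS: Set[str] = {"kubernetes/kubernetes", "kubernetes/test-infra"}


def read_hourly_dump(raw: bytes) -> Iterator[str]:
    """Decompress one gzipped hourly dump and yield its lines as text."""
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as fh:
        yield from fh


def filter_events(lines: Iterable[str], tracked: Set[str]) -> Iterator[dict]:
    """Yield only the events whose repository is in the tracked set."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Assumed event layout: {"type": ..., "repo": {"name": "org/repo"}, ...}
        repo_name = (event.get("repo") or {}).get("name")
        if repo_name in tracked:
            yield event


if __name__ == "__main__":
    # Build a tiny in-memory dump instead of downloading a real archive file.
    sample_events = [
        {"type": "PushEvent", "repo": {"name": "kubernetes/kubernetes"}},
        {"type": "WatchEvent", "repo": {"name": "someone/unrelated"}},
    ]
    payload = "\n".join(json.dumps(e) for e in sample_events).encode("utf-8")
    raw_dump = gzip.compress(payload)

    for kept in filter_events(read_hourly_dump(raw_dump), TRACKED_REPOS):
        print(kept["type"], kept["repo"]["name"])
```

Streaming line by line keeps memory use flat regardless of dump size, which matters when nearly every event is discarded.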
A Use Case Calling for Equinix Metal
And that’s where Equinix Metal—and especially its m1.xlarge bare metal server config—came into the picture. “Our database size and the hardware performance requirements are pretty high,” says Dvoretskyi. “We retrieve all archived data from the entire history of every repo on GitHub, and then extract only the data that we need for the projects we care about.”
Processing this data quickly benefits from the combination of a fast CPU (in this case, dual 12-core Intel Xeon processors, for a total of 24 cores or 48 threads), a large amount of ultra-fast NVMe flash storage, and plenty of RAM. “For the quickest performance, we wanted to work with the database locally, and that’s why we decided to take advantage of Equinix Metal’s bare metal offering instead of the virtual machines that are more common in the public cloud,” adds Dvoretskyi.
Gryglicki initially wrote the application in Ruby, but “it’s very complex to get Ruby to efficiently make use of multiple threads, to be able to concurrently share the work,” says Kohn. “He rewrote it in Go in a couple of weeks, and was able to get something like a 20x speedup in performance, which is just another way of saying he was making full use of the server. The total time to process all of the source data is only a few hours, and for each hourly update, it only takes a few minutes.”
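Making full use of the server means spreading independent chunks of work across its cores. The sketch below shows only the shape of that pattern: hourly batches are fanned out to a worker pool and the partial results are merged. It is written in Python rather than Go, it is not taken from the DevStats source, and `summarize_hour` and `summarize_all` are hypothetical names. Threads in standard CPython share a global interpreter lock, so this illustrates the structure of the work split rather than the speedup itself.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def summarize_hour(events: List[dict]) -> Counter:
    """Count events per repository for one hourly batch (stand-in for real processing)."""
    return Counter(event["repo"] for event in events)


def summarize_all(batches: Iterable[List[dict]], workers: int = 4) -> Counter:
    """Process hourly batches on a worker pool and merge the partial counts."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each batch is independent, so workers never need to coordinate;
        # only the merge below runs in the calling thread.
        for partial in pool.map(summarize_hour, batches):
            total.update(partial)
    return total


if __name__ == "__main__":
    # Two made-up hourly batches of already-filtered events.
    hours = [
        [{"repo": "kubernetes/kubernetes"}, {"repo": "kubernetes/test-infra"}],
        [{"repo": "kubernetes/kubernetes"}],
    ]
    print(summarize_all(hours))
```

Because each hour of data can be processed without reference to any other, the same split applies both to the full historical run and to each hourly update.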
Helping CNCF Contributors
The stories that the DevStats graphs tell are of the utmost importance to the ContribX group, a part of the Kubernetes community that is dedicated to improving the experience of contributors. So that team is working with Gryglicki to fine-tune the graphs delivered by the app, including providing data grouped by release cycle. “Right now the issues we have are around information overload, trying to do more concise graphs or telling better stories,” says Kohn. “We’re trying to focus on the ones that actually have useful information for the community.”
DevStats started with Kubernetes, but now tracks all 25 of CNCF’s projects, plus the CNCF projects aggregated together. “It was just a couple weeks’ work to port it over,” says Kohn. “What’s amazing is the whole pipeline is open source, and all the code that Lukasz has written is open source, so anyone can recreate this.”
One impressive statistic that’s being tracked by DevStats is the total number of developers who have contributed to CNCF projects, which stands at 38,571 as of August 2018—and counting. “When you look at the companies they come from, it’s all the big public clouds, all the biggest enterprise software companies, hundreds of the most innovative startups,” says Kohn.
More Development on Equinix Metal
That’s not the only CNCF development Equinix Metal has enabled.
Launched earlier this year, the CrossCloud dashboard displays the status of both the latest release and the latest development version, with daily updates on how CNCF projects are running on seven different public clouds, including Equinix Metal. The dashboard currently covers five of the CNCF technologies, plus the ONAP networking project, with one new project added about every month. In the future, more clouds will be included as well. “I think it’s going to blow away networking folks in particular because they know in principle it would be nice to run the same software on multiple clouds, but to actually see in practice, hit deploy and it goes out to all of them, is really cool,” says Kohn.
Additionally, Equinix Metal is being used as the sandbox infrastructure for different projects within CNCF. “Developers can use bare metal servers that can be provisioned with an API as easily as virtual cloud servers. With bare metal, they don’t have some specific limitations that other public cloud resources may have, which can be especially important while developing or testing software,” says Dvoretskyi.
And that’s the promise of Equinix Metal’s bare metal cloud. “DevStats and CrossCloud are definitely just two examples of the things you can do,” says Kohn, “but the whole point of bare metal is there’s no limit to what you can do with it.”
Members of CNCF and the broader open source community can request access to Equinix Metal infrastructure using the CNCF Community Infrastructure Lab.
Note: Former CNCF Executive Director Dan Kohn passed away Nov. 1, 2020, at age 47 due to complications of colon cancer. Based in New York City, Dan was a friend to many, and his contributions to open source and to conformance standards for Kubernetes, along with his work as CEO of several start-ups, have made a lasting impact on the tech industry.