
The Offload: A Bare Metal Cloud Use Case for SmartNICs

How having a Linux computer for a network card opens up a world of possibilities for network configuration.

Levon Tarver, Staff Software Engineer

Last year, when we learned of the opportunity to participate in the NVIDIA DPU hackathon, we’d already been working on a way to leverage SmartNICs to make network management more flexible for us and our customers. We thought the NVIDIA® BlueField® DPU could be the right hardware for the job, but the card had only recently been unveiled by NVIDIA, and there were no DPUs to be had to test our theory. We had been using virtual machines to build the functionality we needed. The hackathon presented a chance to run it on real hardware.

We went on to win second place in the hackathon. Besides confirming that our solution worked as designed, the experience taught us something new about the nature of innovation.

NVIDIA data processing units are sometimes referred to as SmartNICs, a term for a broad category of hardware designed to offload network processing from the server’s CPU.

Making Tools for Network Management Much More Capable

NVIDIA BlueField DPUs are valuable to a bare metal cloud platform for a couple of reasons. One: they enable us to designate a network device as being exclusively associated with a specific customer server. Because the customer- and server-specific configuration can take place on the DPU instead of the switch, this greatly reduces the complexity of managing multi-tenant switch configurations and better isolates the impact of customer network configuration changes.

Two: they improve operational efficiency by enabling us to separate management of a server’s network connectivity from the server’s operating system. Deploying and managing advanced network infrastructure for servers is a common challenge in bare metal orchestration systems. Any seriously customizable L2/L3 configuration involves the difficult challenge of keeping the network infrastructure’s configuration and the customer’s server network configuration in sync. The three typical options for solving this are:

  1. Deploying a static network configuration from which the server should not deviate
  2. Using software to manage configuration changes on the network infrastructure and requiring the customer to make the same changes on the server
  3. Installing an agent on the customer’s server to synchronize network configuration changes as they occur on the network infrastructure

Option one seriously impacts our customers’ ability to deploy bare metal infrastructure in whatever arrangement their workloads need. Option two creates additional work for the customer when automating and deploying their infrastructure. Option three is often untenable, because it requires us to maintain a variant of the agent for every operating system we offer (plus manage the necessary changes for different versions of those operating systems, kernel updates, feature updates, etc.). And that’s glossing over the fact that customers may not be comfortable running third-party management agents on their servers.

Currently, our network orchestration service provides the second option to our customers. The bare metal servers are deployed with a working, highly available Layer 3 network configuration. If the customer needs a more advanced network topology, they can change the configuration of the network fabric via the web console or our API. Unfortunately, depending on the configuration they choose, they may then be required to use out-of-band connectivity to manually change the server’s network configuration to match the fabric’s network configuration. Once they do this, the server will work properly on their new topology. It’s possible to use one-shot automation to work around this, but it can be brittle and tedious to diagnose when it doesn’t work correctly.

Using DPUs for network management gives us the capabilities of option three without its drawbacks. It allows us to extend the network fabric we manage to the NIC without requiring access to the customer’s operating system. NVIDIA BlueField-3 DPUs in particular are capable of running an entirely separate Linux operating system, which enables us to make L2 and L3 configuration changes on the NIC itself. They’re also powerful enough to run Network Function Virtualization directly on the card, increasing our ability to scale out flexible topologies to meet our customers’ network requirements.
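To make that concrete, here’s a minimal sketch, in Go (the language we later used for our agent), of applying an L2 change from the DPU’s own Linux. It assumes Open vSwitch is installed on the card; the bridge name is our invention, and `pf0hpf` stands in for the DPU’s host-facing representor port, which may differ by model and configuration.

```go
// A minimal sketch of applying an L2 change from the DPU's own Linux.
// Assumes Open vSwitch is installed on the card; "br-cust" is a
// hypothetical bridge name and "pf0hpf" a typical (but not guaranteed)
// name for the host-facing representor port.
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// run executes a command on the DPU's Linux, folding any output into the error.
func run(name string, args ...string) error {
	if out, err := exec.Command(name, args...).CombinedOutput(); err != nil {
		return fmt.Errorf("%s %v: %w: %s", name, args, err, out)
	}
	return nil
}

func main() {
	// Create a customer bridge on the DPU, then attach the port that
	// faces the host server so its traffic flows through our config.
	if err := run("ovs-vsctl", "--may-exist", "add-br", "br-cust"); err != nil {
		log.Fatal(err)
	}
	if err := run("ovs-vsctl", "--may-exist", "add-port", "br-cust", "pf0hpf"); err != nil {
		log.Fatal(err)
	}
	log.Println("bridge br-cust configured on the DPU")
}
```

The key point is that none of this touches the customer’s operating system; everything happens on the card’s own Linux.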

This gives us more flexibility in designing, selecting, and deploying our physical network infrastructure. We can move more of the customer-specific configuration to the DPU, including BGP peering, VXLAN/EVPN services, more advanced network security capabilities, and the ability to present multiple virtual interfaces to the customer’s operating system. With this, our customers can replicate much more complex topologies with greater ease, while being less limited by the number of physical interfaces on the server or by the capabilities of the top-of-rack switch.
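As a taste of what moving such services onto the card can look like, here’s a hedged sketch of attaching a VXLAN tunnel port to that same OVS bridge from the DPU, so the card (rather than the top-of-rack switch) terminates the overlay. The VNI and the remote VTEP address are placeholders, not values from our fabric.

```go
// Hedged sketch: terminate a VXLAN overlay on the DPU by adding a tunnel
// port to the customer bridge. The VNI (100) and the remote VTEP address
// (192.0.2.10, a documentation address) are illustrative placeholders.
package main

import (
	"log"
	"os/exec"
)

func main() {
	out, err := exec.Command("ovs-vsctl",
		"--may-exist", "add-port", "br-cust", "vxlan100",
		"--", "set", "interface", "vxlan100", "type=vxlan",
		"options:key=100",              // VNI, placeholder
		"options:remote_ip=192.0.2.10", // remote VTEP, placeholder
	).CombinedOutput()
	if err != nil {
		log.Fatalf("ovs-vsctl: %v: %s", err, out)
	}
	log.Println("vxlan100 attached to br-cust")
}
```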

Test Driving the NVIDIA BlueField DPU

Clearly, this is exciting technology for a bare metal cloud platform operator, but one of the first things we needed to consider was how to tie it into our platform in a scalable way. By managing network configuration at the server’s NIC instead of the switch, we multiply the number of endpoints we have to manage by a significant factor. For example, in a data center with 100 cabinets, each with 40 servers and two TOR switches, the number of network devices in your orchestration system goes from 200 to 4,000 (or 4,200 if you continue to manage the TORs).
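The arithmetic, spelled out as a trivial Go snippet (using the hypothetical data center numbers above):

```go
// Back-of-the-envelope device counts for the hypothetical data center above.
package main

import "fmt"

func main() {
	cabinets, serversPerCabinet, torsPerCabinet := 100, 40, 2

	tors := cabinets * torsPerCabinet    // 200: the switch-managed world
	dpus := cabinets * serversPerCabinet // 4,000: one DPU per server
	fmt.Println("TOR switches only:", tors)
	fmt.Println("DPUs only:", dpus)
	fmt.Println("DPUs plus TORs:", dpus+tors) // 4,200
}
```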

We needed an agent that we could run on the DPUs, and it needed to be fast, lightweight, and simple to deploy and update across thousands of endpoints. We started prototyping an agent that could perform the necessary configuration for us; however, we had to do most of our testing in virtual machines, since DPUs were not widely available at the time. Towards the end of 2021, we were invited to participate in NVIDIA’s North America DPU Hackathon. When we found out that we’d get a chance to test our concepts on real hardware, we jumped at the opportunity.

Our team members were all excited to get a chance to interact with the capabilities of the DPU. NVIDIA provided a few bootcamp sessions beforehand and gave us access to the DPU API docs and the NVIDIA® DOCA™ SDK and runtime environment for writing applications that take advantage of all the things a DPU can do. We didn’t have a lot of time to get familiar with the many DOCA SDK features and capabilities, so we each read through a chunk of it and reported our findings in a team huddle. We needed to get an example of our agent compiled for the DPU’s embedded Arm CPU architecture and begin testing functionality. It turned out that our initial concept, a gRPC agent configuring OVS on Linux, mapped really well to the features DOCA has out of the box. Because we had built it in Go, getting a static binary running on the DPUs was simple.
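To give a flavor of that concept (this is our own illustrative sketch, not the hackathon code), here’s what the skeleton of such a gRPC agent can look like in Go. It assumes a hypothetical NetConfig service whose protobuf stubs were generated into a package imported as `pb`; the service, message, and module names are all made up for this example.

```go
// Hedged sketch of a gRPC agent running on the DPU. Assumes protobuf/gRPC
// stubs for a hypothetical NetConfig service generated into package "pb";
// all names here are illustrative, not from the original project.
package main

import (
	"context"
	"fmt"
	"log"
	"net"
	"os/exec"

	"google.golang.org/grpc"

	pb "example.com/dpu-agent/gen/netconfig" // hypothetical generated stubs
)

// server implements the hypothetical NetConfig service; each RPC maps a
// desired state from the orchestrator onto OVS commands on the DPU.
type server struct {
	pb.UnimplementedNetConfigServer
}

// ApplyBridge idempotently creates the OVS bridge named in the request.
func (s *server) ApplyBridge(ctx context.Context, req *pb.BridgeRequest) (*pb.BridgeReply, error) {
	out, err := exec.CommandContext(ctx,
		"ovs-vsctl", "--may-exist", "add-br", req.GetName()).CombinedOutput()
	if err != nil {
		return nil, fmt.Errorf("ovs-vsctl: %w: %s", err, out)
	}
	return &pb.BridgeReply{Ok: true}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":9443") // hypothetical agent port
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	pb.RegisterNetConfigServer(s, &server{})
	log.Println("DPU agent listening on", lis.Addr())
	if err := s.Serve(lis); err != nil {
		log.Fatal(err)
	}
}
```

Because Go cross-compiles out of the box, building for the DPU’s embedded Arm cores is a one-liner (`CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build`), which produces exactly the kind of static binary that made deployment simple.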

We were quite fortunate to have a concept that aligned well with the DPU’s design. The proof of concept we had worked on earlier that year ran in a virtual machine, but as the hackathon showed, it also ran flawlessly on real hardware, giving us greater confidence in our plans going forward. Our prototype was awarded second place—a huge bonus on top of just having the chance to test our concepts on the real thing. We ended the year on quite a high note.

Innovation Is a Grind

We started on this journey with the goal of improving our platform and our customer experience. When providing networking for virtual machines, there’s a lot you can handle at the hypervisor, but things become more difficult when you’re giving the customer full control of the server. However, that power is a big part of the reason customers come to us, and we can’t compromise on that. We had to find another way.

Working on this project with our team, I realized that innovation doesn’t necessarily look like it does in the movies. It’s not usually a bunch of super geniuses having a “eureka” moment. More often than not, it’s a result of relentlessly solving the problems at hand, one by one, day by day, until at some point you look up and realize that you’ve created something pretty cool, and others gather around the creation and become just as excited about it as you are.

Published on 14 March 2022