How We Navigated a Topsy-Turvy Supply Chain
Armed with robust data, careful planning, tight relationships and a lot of creative problem solving, we ensure that we have the servers and switches when and where we need them.
Earlier this month, when news outlets reported an assessment that global supply chains had “returned to normal,” it was welcome news. Our operations team had already felt this return to normal, following a long period of working with all hands on deck. Around this time three years ago we were facing something of a perfect storm: spiking demand combined with a chaotic IT supply chain.
Much like it strained the systems that organizations of all types and sizes relied on, the COVID-19 pandemic tested our ability to manage our supply chain—part of our core value proposition. A foundational promise Equinix Metal makes is that we’ll have bare metal servers ready to fire up when and where you need them. We couldn’t afford to let disarray in the global system for manufacturing and shipping products cause us to break that promise. Thanks to the flexibility and creativity of our team and the strong relationships we’ve built with people throughout this tight-knit industry, we weathered the storm and can now reflect on how we did it.
Let’s rewind to 2020. In January of that year Equinix agreed to acquire Packet (which later became Equinix Metal) and closed the transaction a couple of months later—just before things started shutting down around the world in order to “flatten the curve.” Demand for digital infrastructure was skyrocketing, as companies were fast-tracking their digital transformation initiatives and switching to remote work. In addition, we needed more gear than ever to take our platform to new markets now that we could leverage the global Equinix data center footprint.
Meanwhile, IT hardware supply chains (like supply chains for everything else) were increasingly chaotic. Manufacturers were working at partial capacity, and so was the shipping and logistics industry. Different countries had different (often fluid) rules about crossing borders as the COVID-19 pandemic was gathering steam. As a result, suppliers were unable to fulfill orders nearly as quickly as they had been able to in the “before” times. Whereas earlier we had been able to order a switch or a server and have it shipped out the next day, lead times of six, 12, or even 18 months were now not uncommon.
Obviously, longer lead times meant we had to forecast our capacity needs much further into the future. Around this time we also switched from capacity planning at the individual server level to planning at the rack level. In other words, a full rack of servers replaced a single server as the smallest unit of capacity. As a result, our hardware order volumes grew substantially. We’d been working in this way for a few months when the pandemic hit, which was a small stroke of good luck.
Pandemic Meant Staying Closer to the Supply Chain Than Ever
As the supply chain started to get wobbly, our operations team switched to a regime of constant contact with vendors and other stakeholders in the community. This was crucial because lots of decisions needed to be made on the fly to respond in a fluid situation.
A server or a switch that shows up at the loading dock of one of our data centers is a finished product by one of the manufacturers who themselves rely on a complex supply chain made up of a multitude of component makers, sellers and transporters. Getting the hardware up and running in our data centers also depends on us having other components on hand, such as power cables, network cables (both fiber and copper), port optics and so on. We use multiple vendors for everything, from network switches to cables, and managing our supply chain successfully means taking all of them into account.
We needed to understand our suppliers’ processes to make sure we placed orders far enough in advance and to see where we could help them shorten their lead times. While we couldn’t speed up delivery of steel to make server chassis, for example, we could pay to fly a finished order of servers instead of having it shipped by boat. With shipping ports so backed up, that cut the wait time from six months to two. If a vendor couldn’t fulfill our order fast enough because they couldn’t get enough flash memory on time, we would use our own flash manufacturer contacts to source the components. It was a collaboration with suppliers and, really, all stakeholders who participated in the supply chain to find solutions that benefitted everybody. This collaboration was instrumental to getting the gear we needed to serve customers during the COVID-19 pandemic.
Digital infrastructure may be a global industry, but the community that supplies much of the equipment is relatively small. In most cases people wanted to help each other out and did. Relationships had always been important to what we do, but over the last three years they were the glue that held everything together.
Combining Lead and Lag Strategies
Deciding how much equipment to order and when became an ongoing exercise in data gathering and analysis in order to place well-informed bets. Of course, having reliable data was, and still is, table stakes. We needed it to make our forecasts, but also to explain to our business leaders why we were making bigger purchases than before.
We track how we consume resources (individual server configurations, rack unit utilization, power availability and so on) and use the data in our forecasting model. Analyzing historical data from the last nine months on a rolling basis, we extrapolate and make a projection for one year into the future. For this we use the “match strategy,” a combination of the “lead” and “lag” strategies used in capacity planning, which gives us maximum flexibility to adjust forecasts as new data comes in.
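To make the idea concrete, here is a minimal sketch of a rolling forecast of this kind. It is not our production model. The straight-line trend fit, the function names (`fit_trend`, `project_demand`, `plan_increments`) and the sample numbers are all assumptions made for illustration. It fits a trend to the last nine months of consumption, projects it 12 months ahead, and then adds capacity in small steps as the projection catches up with what is installed, which is one simple reading of a match strategy.

```python
from typing import List, Tuple


def fit_trend(history: List[float]) -> Tuple[float, float]:
    """Least-squares straight line through monthly consumption figures.

    Returns (slope, intercept), with month 0 being the first entry.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two months of history")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_demand(history: List[float], window: int = 9,
                   horizon: int = 12) -> List[float]:
    """Extrapolate the trend of the last `window` months `horizon` months out."""
    recent = history[-window:]
    slope, intercept = fit_trend(recent)
    last_month = len(recent) - 1
    return [intercept + slope * (last_month + k) for k in range(1, horizon + 1)]


def plan_increments(forecast: List[float], installed: float,
                    increment: float) -> List[int]:
    """Number of capacity increments (e.g. racks) to add in each month.

    Assumed rule: add an increment only when projected demand would
    otherwise exceed capacity, so capacity tracks demand in small steps.
    """
    if increment <= 0:
        raise ValueError("increment must be positive")
    plan, capacity = [], installed
    for demand in forecast:
        added = 0
        while capacity < demand:
            capacity += increment
            added += 1
        plan.append(added)
    return plan


# Hypothetical consumption, in arbitrary units, for the last 12 months.
history = [10 + 2 * month for month in range(12)]
forecast = project_demand(history)  # uses only the most recent 9 months
plan = plan_increments(forecast, installed=40, increment=10)
```

Because the window rolls forward each month, rerunning the projection with fresh data shifts the plan without any other change, which is where the flexibility comes from.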
Rethinking Units of Capacity
A server rack is a useful abstraction in data center capacity planning, since our racks hold a standard set of IT gear and have uniform space and power requirements. It isn't the only unit of capacity we use, but it’s a central one, and everything else revolves around it.
We look at a rack’s life cycle in phases. The first phase is when an empty rack is deployed in a colocation cage, fitted out with structured cabling and powered. The second is when a network switch is installed in the rack, making it ready to have servers installed. The final stage is when it’s populated with servers and ready to accept customer workloads.
We use historical data on the take-up rate of empty racks in a cage as a lagging indicator and the number of empty racks that are deployed there as a leading indicator to forecast the number of switches we will need in that cage (and whether it’s time to add another cage at that site). In a similar manner, we use the number of racks that are already outfitted with switches together with historical data to forecast the number of servers we will need. For example, if we see that the number of empty racks in a cage is below a certain threshold, we order more racks for that site. If the number of racks outfitted with switches only is running low, we’ll order more switches. If the server utilization rate at that site has been high, we’ll order servers to populate those racks.
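The three ordering rules in that example can be sketched in a few lines of code. This is an illustration only: the class names, the field names and every threshold value below are hypothetical. In practice the thresholds would come from the forecasting data for each site.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CageState:
    """Snapshot of one cage (all field names are illustrative)."""
    empty_racks: int           # deployed, cabled and powered, no switch yet
    switched_racks: int        # switch installed, no servers yet
    server_utilization: float  # share of installed servers in use, 0.0 to 1.0


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical trigger levels; real values would be set per site."""
    min_empty_racks: int = 2
    min_switched_racks: int = 2
    high_utilization: float = 0.8


def recommend_orders(state: CageState,
                     limits: Thresholds = Thresholds()) -> List[str]:
    """Return which kinds of hardware to order for this cage."""
    orders = []
    if state.empty_racks < limits.min_empty_racks:
        orders.append("racks")      # too few empty racks left in the cage
    if state.switched_racks < limits.min_switched_racks:
        orders.append("switches")   # too few racks ready to take servers
    if state.server_utilization >= limits.high_utilization:
        orders.append("servers")    # existing servers are heavily used
    return orders


# Example: plenty of empty racks, but few switched racks and busy servers.
print(recommend_orders(CageState(empty_racks=3, switched_racks=1,
                                 server_utilization=0.9)))
```

Each rule looks at one phase of the rack life cycle, so a shortage in an earlier phase shows up as an order before it can starve the phases after it.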
This approach has not only improved our ability to manage capacity and deploy capital efficiently, it has also helped us secure orders from hardware vendors. It allows us to order in higher volumes, giving suppliers a strong incentive to put our orders closer to the “top of the pile.”
The news reports that supply chains are back to normal ring true overall. While there is the occasional supply hiccup, our team doesn’t spend half the amount of time it used to spend talking with vendors, negotiating, trading and figuring out how to move goods from A to B faster. But the practices we relied on to get through the COVID-19 pandemic's most challenging initial period will continue carrying us forward.
As we continue to scale the Metal platform across the global Equinix data center footprint, having the right data is only going to grow in importance. The same goes for managing capacity at the rack level: the more you scale, the bigger your units of capacity should be. Perhaps the most important lesson from the last few years, however, was about the importance of relationships. Without having built and maintained close relationships with our suppliers and peers in the industry, no amount of data or sophisticated analytics would be enough for us to secure the hardware we needed, when and where we needed it to serve our customers. The pandemic put all those relationships to the test, and we feel lucky to say that they withstood it!