If you’re a cloud user, you have probably never heard of BMC (baseboard management controller), IPMI (Intelligent Platform Management Interface) or SOL (Serial over LAN). What about ILOM (Integrated Lights Out Manager) or iDRAC (Integrated Dell Remote Access Controller)?
However, if you’ve operated your own physical servers in a datacenter or private colocation environment, I’m sure some of these technologies have saved your butt during a late-night maintenance emergency!
These acronyms (and many others) represent standard or proprietary vendor tools for interacting with a server’s baseboard management controller -- essentially a chip on the motherboard that can perform certain actions or access information without going through the operating system.
The most common platform is IPMI. Currently at version 2.0, IPMI gives “out-of-band” access over Ethernet to functions like rebooting a server, measuring temperature or fan speed, or reaching a console interface such as IP-KVM or Serial over LAN.
So, as a cloud user, why should you care? Read on, young hardware jedi!
Why do I need Out of Band Anyways?
Imagine this scene, or simply recall your last 3am emergency ops ‘situation’:
- PagerDuty sends you an alert at 3am, letting you know about an issue on one of your core application servers.
- You try to SSH to the machine but you can’t.
- The server is responding to ICMP tests, but your app appears to be offline.
- You are sitting there scratching your head / rubbing your eyes wondering what has happened to your server and what you can do about it.
- It’s time to tap into one of these acronyms.
Out-of-band access gives you the privilege to connect to your machine even if key services like SSH stop functioning so you can figure out why your server can’t get past GRUB or why OOM killer got rid of your SSH service. Pretty handy, eh?
Now, in a virtual cloud environment, OOB access (usually a virtual console) is provided by the hypervisor itself. But in a bare metal cloud platform like Equinix Metal where we give our clients direct access to each server with its own operating system and kernel (read: no underlying hypervisor layer), we have to rely on something in the hardware itself. In comes IPMI.
In the Equinix Metal platform, we leverage IPMI for several key functions, including power cycling our machines, reading critical health information on server instances and providing that oh-so-useful “serial console” to customers.
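To make those three functions concrete, here is a minimal sketch of how they might map onto `ipmitool` invocations. The helper names (`ipmitool_cmd`, `power_cycle`, `read_temperatures`, `serial_console`, `run`) are hypothetical and the host and user values are placeholders; this is an illustration, not the tooling Equinix Metal actually runs.

```python
import os
import subprocess


def ipmitool_cmd(host, user, *args):
    """Build an ipmitool command line for an IPMI 2.0 (lanplus) session."""
    # -E tells ipmitool to read the password from the IPMI_PASSWORD
    # environment variable, so it never appears in the process list.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


def power_cycle(host, user):
    # Power cycling a machine through its BMC.
    return ipmitool_cmd(host, user, "chassis", "power", "cycle")


def read_temperatures(host, user):
    # Reading health information: temperature sensor records.
    return ipmitool_cmd(host, user, "sdr", "type", "Temperature")


def serial_console(host, user):
    # Serial over LAN session. This one is interactive, so it should be
    # attached to a terminal rather than captured like the others.
    return ipmitool_cmd(host, user, "sol", "activate")


def run(cmd, password):
    """Run a non-interactive ipmitool command and return its stdout."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout
```

Building the argument list separately from running it keeps the command construction easy to inspect and test without a BMC on hand.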
Given the maturity of IPMI, nearly every major server lineup supports it. We’re mainly a Supermicro and Quanta shop, whose machines support IPMI 2.0 almost across the board, but other vendors have their own proprietary implementations (e.g. Dell with iDRAC). Some platforms, like Open Compute, implement an ultra-lean BMC. I’d argue that is one of the key reasons why you won’t find many enterprises using OCP gear -- the lack of a common BMC changes the paradigm for lights out management dramatically.
Limitations of IPMI and RedFish 1.0
There are several good tools to interact with IPMI. A short list is provided below:
|Tool|Supported platforms|
|IPMITool|Linux, Solaris, BSD, Cygwin*1|
|IPMIUtils|Linux, Windows, Solaris, BSD, EFI|
|FreeIPMI|Linux, BSD, Solaris, Cygwin*1|
|SMC IPMI Tool|Windows, Linux|
However, IPMI itself is a pretty old protocol and can’t really be thought of as API-friendly. For most users, who only occasionally access the BMC, this isn’t an issue. But at Equinix Metal, we leverage the BMC via IPMI a lot, performing thousands of IPMI interactions an hour and pushing the limits of what IPMI and the tiny BMC chips were designed for.
Scripting against IPMI is done via line-by-line Bash scripts (or something similar), which leaves much to be desired for modern systems automation.
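To give a feel for what that scripting involves, the sketch below parses the pipe-separated text that `ipmitool sensor` prints. The `SAMPLE` text is illustrative rather than captured from a real machine, the exact columns can differ between BMCs, and `parse_sensors` is a hypothetical helper.

```python
# Illustrative output only: sensor names, values and thresholds are made up.
SAMPLE = """\
CPU Temp         | 45.000     | degrees C  | ok    | 0.000     | 0.000     | 0.000     | 95.000    | 100.000   | 100.000
FAN1             | 4200.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000 | 25400.000 | 25500.000
FAN2             | na         |            | na    | na        | na        | na        | na        | na        | na
"""


def parse_sensors(text):
    """Turn pipe-separated sensor rows into a dict keyed by sensor name."""
    readings = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # skip blank lines or anything that is not a sensor row
        name, value, unit, status = fields[:4]
        try:
            number = float(value)
        except ValueError:
            number = None  # sensors without a reading report "na"
        readings[name] = {"value": number, "unit": unit, "status": status}
    return readings


if __name__ == "__main__":
    for name, reading in parse_sensors(SAMPLE).items():
        print(name, reading)
```

Every consumer of this output has to agree on column order and on how to treat "na", which is exactly the kind of implicit contract a structured API removes.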
Moving and improving standards like IPMI in an ecosystem as complicated as the server hardware business is no small task. But over the past few years, there has been momentum in the Distributed Management Task Force and its RedFish project to bring enhanced functionality, security and reliability to BMC management. Version 1.0 of the specification finally passed in August 2015 and already we’re seeing adoption among hardware vendors (including Supermicro!) that are anxious to meet the demands of automation-hungry customers like Equinix Metal.
Give me REST(ful) or Give me Death
So we know why the industry went about creating RedFish -- even sys admins need modern, API-friendly tools. But what’s actually included in RedFish 1.0? What’s missing?
RedFish’s most immediate achievement is that you can interact with the BMC using well-structured data, specifically JSON payloads. I can’t overstate how awesome this is for a guy who is used to parsing character strings with expect! You can browse what’s available in a typical RedFish implementation via the DMTF developer’s site.
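As a rough sketch of what working with those JSON payloads can look like, the example below fetches a resource over HTTPS with basic authentication and pulls a few fields out of a computer system resource. The helper names (`fetch_json`, `summarize_system`) are hypothetical, and `SAMPLE_SYSTEM` is a trimmed, made-up payload; a real BMC returns many more properties.

```python
import base64
import json
import ssl
import urllib.request


def fetch_json(base_url, path, user, password, verify_tls=True):
    """GET a RedFish resource and decode its JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url + path,
        headers={"Authorization": "Basic " + token, "Accept": "application/json"},
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: lab BMCs often ship self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)


def summarize_system(system):
    """Pick out the fields an operator usually wants first."""
    status = system.get("Status", {})
    return {
        "model": system.get("Model"),
        "power": system.get("PowerState"),
        "health": status.get("Health"),
    }


# Illustrative payload, not taken from a real server.
SAMPLE_SYSTEM = json.loads("""
{
  "@odata.id": "/redfish/v1/Systems/1",
  "Id": "1",
  "Model": "ExampleBoard",
  "PowerState": "On",
  "Status": {"State": "Enabled", "Health": "OK"}
}
""")

if __name__ == "__main__":
    print(summarize_system(SAMPLE_SYSTEM))
```

Against real hardware, something like `fetch_json("https://bmc.example", "/redfish/v1/Systems/1", user, password)` would supply the dictionary, where the host name and system ID are placeholders.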
Other highlights include:
- Straightforward, common API - Most BMCs already come with a web UI but very few have had any API support. RedFish gives us a reusable API standard across compatible servers. And, as a developer, I can now expect to get an HTTP error code back versus an obscure error message (or worse, just a process exit code that is caught, sometimes, a few layers up in the stack!). That is huge!
- Upgrading the core tech - IPMI is old. Like 15 years old. And IPMI 2.0 wasn’t really an upgrade from 1.5 as much as a necessary patch due to security and authentication deficiencies. It’s worth keeping in mind that IPMI is implemented mostly over UDP / RMCP. RedFish fixes a lot of this, giving standard HTTPS API support and "human readable" information (using JSON) which can be more easily interpreted and automated.
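Both highlights show up in something as simple as a reboot. The sketch below builds a RedFish reset action as an HTTPS POST and maps HTTP status codes to readable outcomes. The function names are hypothetical, authentication is left out for brevity, and the action path follows the common RedFish layout; a careful client would read the action target from the system resource instead of assuming it.

```python
import json
import urllib.error
import urllib.request


def build_reset_request(base_url, system_id, reset_type="ForceRestart"):
    """Build the POST request for a ComputerSystem.Reset action."""
    url = f"{base_url}/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    body = json.dumps({"ResetType": reset_type}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def describe_failure(status):
    """Translate an HTTP status code into something an operator can act on."""
    known = {
        400: "bad request: check the ResetType value",
        401: "unauthorized: check BMC credentials",
        404: "not found: check the system ID or action path",
        503: "service unavailable: the BMC is busy or restarting",
    }
    return known.get(status, f"unexpected HTTP status {status}")


def send(request):
    """Send the request and return (status_code, message)."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, "ok"
    except urllib.error.HTTPError as err:
        return err.code, describe_failure(err.code)
```

Compared with checking a process exit code, the caller gets a status it can branch on directly, for example retrying on 503 and alerting on 401.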
It’s also important to note that higher machine density from cloud providers, multi-chassis systems and web-scale architectures are driving the demand for easier management via the BMC. Frankly the more machines you need to manage, the more you have to rely on tooling. The BMCs of yesterday simply won’t cut it anymore.
Challenges and What’s Next
Vendors often implement IPMI differently, and the same goes for the VNC or console-over-LAN features they use for graphical console access. Adopting any standard industry-wide is hard, and RedFish will be no exception.
One of the biggest boosts for standardizing on RedFish was Open Compute’s desire to incorporate RedFish into its hardware management projects. This is an awesome move forward for both RedFish and OCP -- in that there is significant interest and momentum in OCP, but it has lacked a usable out-of-band mechanism. RedFish could be the answer that paves the way for OCP gear in more datacenters, while helping to drive wider adoption of the RedFish standard.
With hardware innovation coming from all levels of the industry (think Facebook and the Open Compute Project), it is no surprise that cutting-edge consumers are pushing for adoption. A great example is Fidelity Labs, which is advocating for a more modular approach to RedFish. The idea is that as RedFish grows, feature bloat could become a security and usability concern. There are proposals for a modular design with a “core” set of functions plus modules that could be functionality driven or vendor specific. At Equinix Metal, we believe in keeping things as lean as possible, so having the ability to leave out excess features while still following the industry specification would be ideal.
We’re currently testing RedFish on our Supermicro lab gear -- so far things are looking good and our dev team is pleased with the functional and scalability improvements we’re seeing. If all goes well, we’ll have RedFish in the production Equinix Metal bare metal cloud in the very near future.