On or Off the Cloud: a Checklist
What basic questions to ask when faced with the daunting task of an application infrastructure rethink.
As they look for ways to optimize their public-cloud spend, teams responsible for companies’ tech stacks ask themselves whether they should rethink the infrastructure strategy for some of their applications. They wonder whether they could save by running some of their workloads elsewhere. But how do you go about making that decision, often referred to as “cloud repatriation”? There is so much to consider! The answer is almost never straightforward and varies from one workload to the next. Some workloads will have to stay in place, and some will likely be better off if moved. Where do you even start? To help, we’ve put together a checklist of the most critical areas to consider.
It All Begins With Cost
This is the big one. There is a tremendous amount of value in using public cloud services, but that value comes at a cost, one that many teams eventually find staggering and that usually gets them thinking about cloud repatriation in the first place.
There are multiple components to the overall cost of using cloud services, the most basic ones being of the raw-infrastructure variety. Compute, storage and network can each have a significant impact on the total. Compute usually constitutes the biggest portion of the bill. This includes the cost of the virtual machines or containers used to run the workload and sometimes also accelerators, such as GPUs or FPGAs.
Cloud storage is also a big line item, and the more data you store, the more you pay—which in most cases means it’s an expense that will continue growing indefinitely. It includes both the cost of storage capacity itself and the cost of services used to manage and access your data.
An element of cloud cost that is often overlooked is the cost of networking. This includes the cost of data transfer and the cost of any network services used to connect to a cloud or manage network traffic within the cloud. A notorious contributor in this department is data egress. Cloud providers charge customers for moving data out of their clouds and for moving it between availability zones, or even between services that are hosted in different locations. Larger cloud providers tend to charge more for egress than smaller ones, and there is a lot of nuance to how these costs are calculated from one provider to another.
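To get a feel for how these components add up for a given workload, a rough back-of-the-envelope estimate is a good starting point. Here’s a minimal sketch; the unit prices are illustrative placeholders, not any particular provider’s actual rates.

```python
# Rough monthly cost estimate for a single workload.
# All prices below are illustrative placeholders, not real provider rates.

PRICE_PER_VCPU_HOUR = 0.05      # compute: $/vCPU-hour (placeholder)
PRICE_PER_GB_STORED = 0.023     # storage: $/GB-month (placeholder)
PRICE_PER_GB_EGRESS = 0.09      # network: $/GB transferred out (placeholder)

def monthly_cost(vcpus, hours, stored_gb, egress_gb):
    compute = vcpus * hours * PRICE_PER_VCPU_HOUR
    storage = stored_gb * PRICE_PER_GB_STORED
    egress = egress_gb * PRICE_PER_GB_EGRESS
    return {"compute": compute, "storage": storage, "egress": egress,
            "total": compute + storage + egress}

# Example: 16 vCPUs running 24x7, 20 TB stored, 5 TB of egress per month.
print(monthly_cost(vcpus=16, hours=730, stored_gb=20_000, egress_gb=5_000))
```

Even a crude model like this makes it obvious which of the three line items dominates for a particular workload, which is exactly what you need to know before comparing it against an alternative.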
Dependencies Can Make Cloud Repatriation a Nonstarter
Often the single deciding factor in whether a workload can or should be repatriated is the extent of its integration with specific cloud services. The more dependent it is on a cloud provider’s services beyond raw cloud infrastructure, the harder it is to move. Examples include certain cloud-based machine learning services, serverless functions and specific database services. Moving a workload that relies on these would mean standing up equivalent capabilities on premises or redesigning the application so that it can do without them, both of which could prove impractical and cost prohibitive, if not altogether infeasible.
Even if it is possible to replicate these dependencies on premises, consider whether the move is worth giving up the benefit of having the cloud provider keep them automatically updated, enhanced with the latest features and easily scalable.
Performance and Resource Utilization
It’s important to closely examine the performance of your application and get a granular understanding of how it uses specific resources, such as CPU, memory and bandwidth. If it’s consistently close to maxing out one or two of these in a lopsided fashion, perhaps it could use a hardware configuration that’s fine-tuned for its specific needs. That custom configuration may or may not be available on the cloud platform you’re using.
This, by the way, could affect cost as well as performance. The fewer resources are left idle, the lower the overall infrastructure cost.
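A minimal sketch of that kind of analysis, assuming you’ve already exported utilization samples (as percentages) from your monitoring system, might look like this; the numbers are made up for illustration:

```python
# Summarize utilization samples (0-100%) exported from your monitoring system.
# The sample data here is hypothetical, for illustration only.
import statistics

def summarize(samples):
    ordered = sorted(samples)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return {"avg": statistics.mean(samples), "p95": p95, "max": max(ordered)}

cpu_samples = [35, 40, 38, 90, 42, 37, 41, 39, 36, 88]   # hypothetical CPU %
mem_samples = [70, 72, 71, 74, 73, 75, 72, 71, 70, 74]   # hypothetical memory %

print("cpu:", summarize(cpu_samples))
print("mem:", summarize(mem_samples))
# A high, steady memory profile next to a spiky, mostly idle CPU profile
# suggests a memory-heavy hardware configuration would be a better fit.
```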
The Degree of Burstiness
Cloud environments are designed for seamless scalability and elasticity, so it’s important to consider how consistently a workload is using cloud resources. Is it humming along at the same level of capacity utilization all the time? Is it growing overall but the growth rate is fairly consistent and predictable? If the answer to either of these questions is yes, an on-prem alternative could save you a lot of money.
On the other hand, if the workload’s demand spikes in unpredictable ways, or spikes predictably but dramatically (like e-commerce applications do during holiday shopping seasons or live-streaming platforms during big popular events), it is probably not a good candidate for cloud repatriation; it’s better off staying where it can be scaled up or down as needed using cloud autoscaling tools.
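One rough way to quantify that burstiness is the peak-to-average ratio of demand over a representative period. A minimal sketch, using made-up traffic numbers:

```python
# Peak-to-average ratio as a rough burstiness indicator.
# hourly_requests is hypothetical traffic data, for illustration only.
from statistics import mean

hourly_requests = [1200, 1100, 1300, 1250, 1180, 9500, 1220, 1190]  # one spike

ratio = max(hourly_requests) / mean(hourly_requests)
print(f"peak-to-average ratio: {ratio:.1f}")

# A ratio near 1 means steady, predictable demand (a candidate for fixed,
# right-sized infrastructure); a high ratio means you'd have to buy hardware
# sized for the peak, or keep the workload where it can autoscale.
```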
How Does It All Connect?
A workload’s connectivity requirements and the network architecture that’s in place to meet them are a crucial consideration. Cloud networking provides a great degree of configuration flexibility, since it’s all software defined and changes don’t require tinkering with hardware switches and routers.
The key areas to examine are how the workload connects to other systems and services, its bandwidth requirements and the limit of tolerable latency. A high-bandwidth and low-latency application may benefit from using private network connections and being distributed to edge locations across the globe instead of being hosted in centralized cloud data centers and relying on the public internet for connectivity. On the other hand, if its bandwidth and latency requirements aren’t as stringent, the scale tilts further in the public cloud’s direction.
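If you want a quick, rough read on the latency side of that question, something like the sketch below measures TCP connection round-trip time to a dependency; the hostname and port are hypothetical stand-ins for whatever the workload actually talks to.

```python
# Rough TCP connect round-trip time to a dependency.
# "db.example.internal" and port 5432 are hypothetical; substitute your own.
import socket
import time

def connect_rtt_ms(host, port, attempts=5):
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2):
            pass
        samples.append((time.perf_counter() - start) * 1000)
    return min(samples), sum(samples) / len(samples)

best, avg = connect_rtt_ms("db.example.internal", 5432)
print(f"best: {best:.1f} ms, average: {avg:.1f} ms")
```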
It’s important to map the workload’s communication dependencies. Which systems and services does it need to communicate with? These could be other applications, databases, external APIs, user interfaces and so on. If the dependencies are mostly within the same network, it might make sense to keep the workload in that network. But if it’s frequently pinging external systems, it may do so more efficiently from a network-rich data center provider.
Overall, how well is the current network architecture serving the workload’s needs? If the answer is “it could do better,” it’s worth considering an alternative.
Laws and Regulations
If you operate in heavily regulated industries, such as healthcare, finance, or government, compliance greatly influences where and how your applications and data are hosted. The same is true if you operate outside of those industries but store and process private data belonging to customers who live in countries with data sovereignty regulations. If a country requires that its citizens’ data be stored within its borders and you provide services to those citizens, you need to have infrastructure in that country, cloud or otherwise.
Data protection and privacy requirements can vary greatly depending on the industry and the jurisdiction in question. Cloud providers offer various industry-tailored compliance services—for a fee—so it’s important to consider whether cloud repatriation will make compliance more difficult and/or costly.
The Fine Print
Your cloud provider’s uptime SLA (Service Level Agreement) and their historical ability to meet it are important factors to consider. And they need to be weighed against your ability to meet the workload’s uptime requirements using whatever alternative infrastructure you may be thinking about.
It is important to thoroughly review an SLA to understand what it does and doesn’t cover, and how it aligns with your workload's uptime and reliability requirements. You should also be clear about how the cloud provider commits to compensating you in case they fail to meet the SLA. (Often, the compensation comes in the form of service credits.)
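It also helps to translate SLA percentages into concrete downtime budgets you can compare against your own capabilities. A quick sketch:

```python
# Translate an uptime SLA percentage into an allowed-downtime budget.
MINUTES_PER_MONTH = 30 * 24 * 60   # ~43,200, using a 30-day month
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_budget(sla_percent):
    fraction_down = 1 - sla_percent / 100
    return {
        "per month (min)": MINUTES_PER_MONTH * fraction_down,
        "per year (hours)": MINUTES_PER_YEAR * fraction_down / 60,
    }

for sla in (99.0, 99.9, 99.99):
    print(sla, downtime_budget(sla))
# 99.9% still allows roughly 43 minutes of downtime per month; the question is
# whether your alternative infrastructure and team can realistically beat that.
```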
How Much of Your Stack Can You Secure?
Remember: you share responsibility for the security of your cloud workloads with your cloud provider. The provider is responsible for securing the underlying infrastructure, while you are responsible for securing your data and applications. Moving a workload to a different data center may mean taking responsibility for securing the entire stack, unless that data center is operated by a provider that also uses the shared-responsibility model.
Is Cloud Repatriation Even Possible?
Perhaps the most important factor—and, really, one that has a bearing on all of the above—is whether or not cloud repatriation is technically and operationally feasible for a workload.
How complex would the migration be? The answer will depend on the size and nature of the workload. It could involve moving huge volumes of data, reconfiguring systems and ensuring compatibility with new environments. Data security during the migration could be challenging, and so could minimizing downtime or preventing it altogether if necessary.
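For the data-movement piece alone, a back-of-the-envelope transfer-time estimate is a useful sanity check; the dataset size and link speed below are placeholders.

```python
# Rough estimate of how long a bulk data migration takes over a given link.
# Dataset size and link speed are illustrative placeholders.
def transfer_days(dataset_tb, link_gbps, efficiency=0.7):
    # efficiency accounts for protocol overhead and for the link not being
    # dedicated to the migration 100% of the time
    dataset_bits = dataset_tb * 1e12 * 8
    effective_bps = link_gbps * 1e9 * efficiency
    return dataset_bits / effective_bps / 86_400

print(f"{transfer_days(dataset_tb=200, link_gbps=10):.1f} days")  # ~2.6 days
```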
If the alternative is an in-house data center or a colocation facility, you will need a team that knows how to deploy and operate its own infrastructure. Is the team prepared to adopt a new set of operating procedures, such as new backup and recovery practices, system maintenance routines and security management? If an application needs to deliver a low-latency user experience at scale, do you have a way to deploy it in multiple locations, close to end users?
There are alternatives that make all of the above achievable without doing everything in house. Using a dedicated cloud service like Equinix Metal is a way to get all the performance and control benefits of an on-prem bare metal infrastructure while retaining the ability to deploy globally on demand, provision and manage infrastructure as code and pay as you go, like you do with typical public cloud services. This option also gives you the ability to privately interconnect your non-cloud workloads and data with public clouds so that they can be used in concert with the workloads best suited to run on a traditional hyperscale cloud platform.
The Decision
With so many complex, interlocking factors at play, each carrying major technological and staffing implications, the decision whether to move a cloud workload elsewhere can feel daunting. You have to view the potential cost savings in the context of the overall impact of the migration, including technical capabilities, scale and operational requirements of the alternative platform being considered. In other words, will the non-public-cloud infrastructure provide the same or better performance, availability and feature set for the application as the public cloud without creating an unreasonable amount of disruption for your teams and application users—all while saving enough money to make it worth the trouble of moving it?
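One simple way to frame that last question is a payback-period calculation: how long the monthly savings take to cover the one-time cost of the migration. A sketch with placeholder figures:

```python
# Payback period: how long until the savings cover the one-time migration cost?
# All figures are placeholders, for illustration only.
def payback_months(migration_cost, monthly_cloud_cost, monthly_alt_cost):
    monthly_savings = monthly_cloud_cost - monthly_alt_cost
    if monthly_savings <= 0:
        return None  # the move never pays for itself
    return migration_cost / monthly_savings

months = payback_months(migration_cost=250_000,
                        monthly_cloud_cost=60_000,
                        monthly_alt_cost=35_000)
print(f"payback in {months:.0f} months")  # 10 months in this example
```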
Breaking the decision down into the individual factors highlighted above and taking the time to thoroughly understand each one is a good way to tackle it and make it less overwhelming.