The ABCs of Cloud Network Design

What to consider when designing a cloud network and common mistakes to watch out for

Melissa Palmer, Independent Technology Analyst

In on-premises data centers, a full network is rarely designed from scratch. Cloud networks, by contrast, are redesigned all the time to fit changing application and business needs. The role of the network and the importance of that role, however, remain the same: to connect users, applications and data.

Every time an organization adopts a new cloud, it designs a new network. It’s also not uncommon to design a new cloud network for every application an organization deploys. Whatever the model, cloud networks are designed and redesigned frequently.

If you are about to undertake a new cloud network design, a few things done upfront will help ensure a smooth deployment. The most important is to take the time to design your network before you start deploying, rather than trying to figure things out as you go.

This article is an overview of the most important components to consider during cloud network design and the common issues that can be addressed in the design phase.

What Drives Cloud Networking Design Decisions

Your business requirements are, of course, the main driver behind your cloud network design decisions. You may have an overall design for your cloud environment and then also additional considerations for each specific application. This doesn’t negate the importance of plotting it all out upfront, but things are inherently more flexible in cloud networking.

Creating a logical diagram is a good starting point. It gets you moving in the right direction and helps make sure you don’t miss anything. The diagram can also be a useful reference point in the future, when you deploy additional applications and workloads in the cloud.

Speaking of future additions, those should also start with a logical design. Simply listing the components of your deployment and plotting how they will connect to each other puts you ahead of many issues that may otherwise surface later.

Be sure to include all of your application services, be they cloud services or cloud compute instances, and what cloud region they are located in. It is important to decide how these items will connect to each other, so the diagram should include the connections themselves. It is also a good idea to document which IP address ranges you plan to use for the application or deployment.
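One lightweight way to keep the logical design checkable is to record it as data alongside the diagram. The sketch below is only an illustration: the component names, regions and address ranges are made up, and `Component`, `Connection` and `undocumented_endpoints` are hypothetical names rather than part of any cloud provider's tooling. It lists anything a connection refers to that the design has not yet described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str      # e.g. "compute" or "managed-service"
    region: str    # cloud region the component lives in
    cidr: str      # IP address range planned for it


@dataclass(frozen=True)
class Connection:
    source: str
    target: str


# Hypothetical design for one application.
COMPONENTS = [
    Component("web", "compute", "region-a", "10.20.0.0/24"),
    Component("orders-db", "managed-service", "region-a", "10.20.1.0/24"),
]
CONNECTIONS = [
    Connection("web", "orders-db"),
    Connection("web", "on-prem-erp"),  # dependency outside the cloud
]


def undocumented_endpoints(components, connections):
    """Return names used in connections that have no component entry."""
    known = {c.name for c in components}
    ends = {end for conn in connections for end in (conn.source, conn.target)}
    return sorted(ends - known)


print(undocumented_endpoints(COMPONENTS, CONNECTIONS))
```

In this made-up design, the on-premises system is referenced by a connection but never described, which is exactly the kind of gap a diagram review is meant to catch.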

Generally, four overarching considerations affect design decisions in cloud networking, and your design should account for all of them: external communication dependencies, DR and availability, application communication requirements and networking costs.

External Communication Dependencies

Your cloud deployment will often have external communication dependencies—especially if you are beginning to operate a hybrid cloud. Those include things like connections back to your data center and connections to other clouds or services that are outside of the cloud you are using, such as a Content Delivery Network (CDN).

Mapping these out ahead of time is helpful in the implementation phase, but also when applications don’t work as expected and your team needs to troubleshoot.

Availability and Disaster Recovery

Remember: public cloud services assume a shared responsibility model. The cloud is just someone else's data center, but that someone else isn’t responsible for designing your deployments and applications for availability or for creating your disaster recovery architecture.

The networking part of this brings us back to the diagram of communication dependencies you’ve hopefully created. If your deployment depends on anything outside of your cloud provider, for example, you will often want redundant connections back to your own data center for greater resiliency.

The physical location of your cloud data and applications matters a lot. You would not deploy a single data center for an entire business. Likewise, you should not deploy apps and data critical to the business in only a single cloud region.

Which components of your cloud network should be redundant is dictated by your applications’ availability requirements, as well as their RPO and RTO (Recovery Point and Recovery Time Objectives). Thinking in terms of "failure domains" provides a useful framework for making these decisions.
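As a rough illustration of thinking in failure domains, the sketch below treats each region as one failure domain and flags components whose replicas all sit in the same one. The placement data and the `single_failure_domain` function are hypothetical, and a real design might also count zones, accounts or network links as failure domains.

```python
from collections import defaultdict

# Hypothetical placement: one (component, region) pair per deployed replica.
REPLICAS = [
    ("checkout-api", "region-a"),
    ("checkout-api", "region-b"),
    ("orders-db", "region-a"),
    ("orders-db", "region-a"),
]


def single_failure_domain(replicas):
    """Return components whose replicas all live in one region."""
    regions_by_component = defaultdict(set)
    for component, region in replicas:
        regions_by_component[component].add(region)
    return sorted(
        name
        for name, regions in regions_by_component.items()
        if len(regions) == 1
    )


print(single_failure_domain(REPLICAS))
```

Whether a flagged component actually needs a second region still depends on its availability, RPO and RTO requirements.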

Cloud networking makes designing applications with their availability requirements in mind a lot easier than traditional on-prem data center networking, where it’s often cost-prohibitive or simply unattainable. The cloud offers a blank slate for application design.

Application Communication Requirements

Some applications in your environment may have specific communication requirements that are above the baseline of your initial cloud deployment. Those could be things like a certain CDN service or a specific load-balancing technology that isn’t available from the cloud provider. An application may also need to communicate with on-premises components or systems.

This may call for an alternative network design for some of your applications or changes to your model down the road. Being mindful of any such requirements ahead of time will prove helpful in the future, when the time comes to implement them or, again, when troubleshooting.

Cloud Networking Costs

Cloud network design decisions greatly impact your monthly cloud bill. Networking constructs within the cloud, such as VPCs, may be free, but the bandwidth they use often isn’t. The same is true for cloud services that you pay for, such as load balancers or advanced SDN components: each can drive up your bill simply by moving bits and bytes between A and B.

Different cloud providers have different approaches to billing for bandwidth. Many of them charge customers for moving data out of their cloud network (to a different service provider or an on-prem data center) and for moving data between regions.

Decisions about application availability have a big role to play here. A test application that’s only used occasionally doesn’t necessarily need redundancy across multiple regions, which would drive up cloud bandwidth costs. On the other hand, if an application is mission critical, the cost of a fault-tolerant architecture may be justified. (Here’s a good deep dive into why those cloud egress fees can be so high, what can be done about them and the tradeoffs involved in all these decisions.)
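To make the tradeoff concrete, here is a back-of-the-envelope sketch comparing a single-region deployment with one that replicates data to a second region. The per-gigabyte rates and traffic volumes are placeholders invented for this example, not any provider's actual pricing, so substitute the figures from your own provider's price list.

```python
# Placeholder rates in dollars per GB; NOT real pricing from any provider.
HYPOTHETICAL_RATES_PER_GB = {
    "intra_region": 0.00,
    "inter_region": 0.02,
    "internet_egress": 0.09,
}


def estimate_monthly_cost(traffic_gb):
    """Sum GB moved per traffic type multiplied by its per-GB rate."""
    total = sum(
        gb * HYPOTHETICAL_RATES_PER_GB[kind]
        for kind, gb in traffic_gb.items()
    )
    return round(total, 2)


# Made-up monthly traffic for one application.
single_region = {"intra_region": 5000, "internet_egress": 1000}
multi_region = {
    "intra_region": 5000,
    "inter_region": 5000,   # replication to a second region
    "internet_egress": 1000,
}

print(estimate_monthly_cost(single_region))
print(estimate_monthly_cost(multi_region))
```

With these placeholder numbers, replication roughly doubles the bandwidth portion of the bill, which may be easy to justify for a mission-critical application and hard to justify for an occasional test environment.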

Load Balancing, a Cornerstone of Cloud Networking

Load balancing is often overlooked in the early stages of a cloud deployment, mostly because of how quick and simple it is to get applications up and running in the cloud. Yet it is a core concept in cloud networking and part of cloud network design best practices. A load balancer distributes workloads evenly across servers, preventing failures that result from driving a single server to the limit of its capacity. Load balancers also enable applications to scale in and out, which is the elasticity that makes cloud infrastructure services so valuable.

Load balancing in the cloud can be implemented at different levels—within a single region and/or across multiple regions—depending on what your applications call for. For example, you may need to ensure that user traffic in a certain country is directed only to servers within that country. In another example, you may be designing for maximum performance and directing every user to the instance of the application hosted on servers that are closest to them. Or, you could be directing traffic to a certain region because it’s simply less expensive to use than the others. (Organizations often use the less expensive regions as their disaster recovery sites or for general application scalability.)
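The three examples above amount to three different region-selection policies. The sketch below shows how they could differ in principle; the `Region` fields, the policy names and the sample numbers are all assumptions made for illustration, and real load balancers expose these choices through their own configuration rather than code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    country: str
    latency_ms: float     # latency as measured from the user
    relative_cost: float  # cost relative to the most expensive region


# Hypothetical regions and measurements.
REGIONS = [
    Region("region-a", "DE", 18.0, 1.00),
    Region("region-b", "US", 95.0, 0.80),
    Region("region-c", "DE", 25.0, 0.90),
]


def pick_region(regions, user_country, policy):
    """Choose a region for one user under one of three policies."""
    if policy == "data-residency":
        # Traffic must stay inside the user's country.
        candidates = [r for r in regions if r.country == user_country]
        if not candidates:
            raise LookupError(f"no region available in {user_country}")
        return min(candidates, key=lambda r: r.latency_ms)
    if policy == "lowest-latency":
        return min(regions, key=lambda r: r.latency_ms)
    if policy == "lowest-cost":
        return min(regions, key=lambda r: r.relative_cost)
    raise ValueError(f"unknown policy: {policy}")


for policy in ("data-residency", "lowest-latency", "lowest-cost"):
    print(policy, pick_region(REGIONS, "DE", policy).name)
```

Note that the cheapest region in this sample is outside the user's country, which is why the policy has to be chosen per application rather than once for the whole environment.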

Cloud providers offer native load balancing services, and third-party options are available too. Many vendors known for on-prem data center solutions also make load balancers for cloud environments.

Common Cloud Networking Errors

Whether on-premises or in the cloud, networking is complex, and there are countless potential issues one can run into. However, some of the avoidable ones that I tend to come across more often than others are: IP address confusion, high latency, lack of observability, things left publicly accessible on the internet and the already mentioned inter-region communication costs.

IP Address Confusion

In any cloud deployment, IP addresses should be chosen wisely and kept track of with an IP Address Management (IPAM) solution, which many public cloud providers offer natively.

This is especially important for hybrid cloud architectures, where re-using private IP address spaces in the cloud and on premises can lead to serious IP address conflicts. It is surprisingly easy to have a conflicting IP address range when an application is using both cloud and on-prem components.

This common cloud networking problem causes issues like decreased performance or packet loss; it can affect applications that are configured to use specific IP addresses or address ranges.
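Overlapping ranges are easy to check for before anything is deployed. The following is a minimal sketch using Python's standard `ipaddress` module; the network names and CIDR blocks are invented for the example, and `find_overlaps` is a hypothetical helper, not a replacement for a proper IPAM solution.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan for a hybrid deployment.
PLANNED_RANGES = {
    "on-prem-dc": "10.0.0.0/16",
    "cloud-vpc-app": "10.0.128.0/20",   # falls inside the on-prem range
    "cloud-vpc-data": "10.1.0.0/16",
}


def find_overlaps(ranges):
    """Return every pair of named ranges whose address space overlaps."""
    networks = {
        name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()
    }
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


for name_a, name_b in find_overlaps(PLANNED_RANGES):
    print(f"conflict: {name_a} overlaps {name_b}")
```

Running a check like this against the address ranges documented in the logical design flags the conflict between the on-premises range and the application VPC before any traffic is affected.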

High Latency

Performance issues can manifest in many forms in the cloud, but one common concern is latency. Unacceptably high latency often results from poor understanding of an application’s network communication requirements.

Going back to the inter-region communication example, having a server in one cloud region and its main database in another one that’s thousands of miles away might cause serious performance issues due to high latency.
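A rough lower bound shows why distance matters. Light travels through optical fiber at roughly 200,000 km per second, so distance alone sets a floor on round-trip time before any routing, queuing or processing delay is added. The sketch below uses that approximation; the 4,000 km distance (about 2,500 miles) and the count of 50 sequential database queries are assumptions chosen for illustration.

```python
# Approximate speed of light in optical fiber: ~200,000 km/s = 200 km/ms.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km):
    """Best-case round trip time imposed by distance alone."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Hypothetical scenario: app server and database 4,000 km apart,
# and a page that issues 50 database queries one after another.
DISTANCE_KM = 4000
SEQUENTIAL_QUERIES = 50

per_query_ms = min_round_trip_ms(DISTANCE_KM)
page_ms = per_query_ms * SEQUENTIAL_QUERIES

print(f"best-case round trip: {per_query_ms:.0f} ms")
print(f"best-case wait for {SEQUENTIAL_QUERIES} sequential queries: {page_ms:.0f} ms")
```

Real-world latency will be higher than this floor, since fiber paths are rarely straight lines and every hop adds delay, so an application that is chatty with its database suffers most from being placed far away from it.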

Lack of Observability

Your ability to identify causes of performance issues—including latency—ties directly to how robust your observability tools are.

Picking an observability tool for the cloud can be challenging, since there are so many options out there. A good place to start is to see whether the observability solution you use in your data center is compatible with your cloud of choice. If it is, you can gain some operational synergies by using the same software in both places.

Many major cloud providers offer their own cloud native observability tools, such as AWS Cloud Ops, Google Cloud Operations Suite or Azure Monitor. Other popular tools include Dynatrace, New Relic, Splunk and Datadog. Under the hood, many observability solutions use OpenTelemetry, the popular open source collection of observability tools, APIs and SDKs.

Making Everything Publicly Available on the Internet

One of the things cloud platforms are great at is enabling you to do things quickly. Unfortunately, this ability can also lead to mistakes in design, simply because engineers are moving too fast. One such mistake that is fairly common is leaving a component of your architecture publicly accessible over the internet. People often do this to quickly test an idea or troubleshoot a networking issue but then forget to secure the component.

This is one of the easiest ways to invite a malicious actor into your cloud environment. They can easily scan the internet for vulnerable devices and exploit a number of known vulnerabilities left unpatched. Opening up access to just one of your cloud instances in this way can be enough to compromise your whole environment.
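A periodic audit of inbound rules can catch a forgotten test opening. The sketch below assumes firewall rules have already been exported into a simple list of dictionaries; that rule format, the rule names and the `publicly_exposed` helper are hypothetical, and each cloud provider has its own rule schema and tooling for this. Only the `ipaddress` calls come from Python's standard library.

```python
import ipaddress

# Hypothetical export of inbound firewall rules.
RULES = [
    {"name": "public-https", "source": "0.0.0.0/0", "port": 443},
    {"name": "debug-ssh", "source": "0.0.0.0/0", "port": 22},
    {"name": "office-ssh", "source": "203.0.113.0/24", "port": 22},
    {"name": "test-db-v6", "source": "::/0", "port": 5432},
]


def is_open_to_internet(cidr):
    """A /0 prefix (IPv4 or IPv6) matches every address."""
    return ipaddress.ip_network(cidr).prefixlen == 0


def publicly_exposed(rules, allowed_public_ports=frozenset({443})):
    """Return names of rules open to everyone on ports not meant to be public."""
    return [
        rule["name"]
        for rule in rules
        if is_open_to_internet(rule["source"])
        and rule["port"] not in allowed_public_ports
    ]


print(publicly_exposed(RULES))
```

The set of ports that are allowed to be public is itself a design decision, so it is passed in as a parameter rather than hard-coded into the check.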

Racking Up Inter-Region Communication Costs

Having a lot of data moving between regions is a quick way to drive up a cloud bill. Remember, inter-region bandwidth is a lot more expensive than intra-region bandwidth.

Inter-region communication costs are often racked up as a result of a simple misconfiguration, such as connecting a server on the East Coast to a storage volume on the West Coast. Folks often don’t realize they’ve done this until they get their cloud bill.

Not every cloud service is available in every region a cloud provider operates, so sometimes such inter-region connections are made to meet a business requirement (an application needs a service in region A that’s connected to a service only available in region B). It’s important to consider inter-region dependencies in the initial cloud network design phase.
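Cross-region links can be listed from the same information a logical diagram already holds. In the sketch below, the resource names, region labels and the `cross_region_links` function are hypothetical. The check cannot tell a misconfiguration from a deliberate business requirement; it only surfaces every link that crosses a region boundary so that someone can decide.

```python
# Hypothetical placement of resources and the links between them.
PLACEMENT = {
    "app-server": "us-east",
    "storage-volume": "us-west",
    "cache": "us-east",
}
LINKS = [
    ("app-server", "storage-volume"),
    ("app-server", "cache"),
]


def cross_region_links(placement, links):
    """Return (source, target, source_region, target_region) for links that cross regions."""
    return [
        (source, target, placement[source], placement[target])
        for source, target in links
        if placement[source] != placement[target]
    ]


for source, target, src_region, dst_region in cross_region_links(PLACEMENT, LINKS):
    print(f"{source} ({src_region}) -> {target} ({dst_region})")
```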

In conclusion, while it is easy to start deploying an application in the cloud, taking the time for some upfront planning can go a long way, which is doubly true for your cloud network design. Understanding your applications and their requirements is critical at this stage. All that said, it’s great to know that, as those requirements change along with the needs of the business, the cloud makes it possible to change your network design as much as you need to!

Published on 11 April 2023