
How to Design Applications That Scale Out

Scaling an app by adding new instances of identical resources (as opposed to beefing up existing instances) helps maintain uninterrupted customer experience, availability and operational flexibility.

James Walker, Software Engineer

We call an application scalable when it can handle load variations without getting slow or failing altogether. There are some common patterns for designing apps and services to run undisturbed through gradual ramps or sudden spikes in usage.

One strategy is horizontal scaling, or automatically adding new resources to the system, such as additional containers or compute nodes. This increases availability by balancing the load between a greater pool of resources, but you need to correctly design your application to support the technique.

In this article you will learn about best practices for designing cloud applications that scale horizontally in order to improve deployment resilience.

Horizontal vs. Vertical Scalability

Software systems can be scaled in two main ways: horizontally or vertically. While the former approach is to scale “out,” or add more identical instances of the resources, the latter is to scale “up,” or add physical resources (such as CPU and memory) to the existing instances.

Horizontal Scalability

The new instances added in a horizontally scalable system can be app replicas or compute nodes. In addition to reliability, this approach enhances operational flexibility, as you can easily provision and remove instances on demand without downtime.

Horizontally scalable systems provide high availability (HA) because individual instance failures don't affect their neighbors. However, this approach requires your application to be specifically designed so that multiple instances can run concurrently.

Vertical Scalability

Vertical scalability adds or resizes the physical resources available to existing instances (such as CPU, memory or network bandwidth). It does not affect your instance count, so there's no improvement to redundancy or high availability.

Setup costs are usually lower than for horizontal scaling, and vertical scaling doesn't require any modifications to your app's architecture, but scaling changes can require downtime.

This guide focuses on horizontal scalability because it's generally the best option for web apps and services that need comparatively few resources on a per-request basis. Scaling horizontally allows you to increase capacity and availability without downtime; if you use vertical scaling, then you don't gain any extra redundancy. Nonetheless, vertical scaling can be a better fit for processes that consume a lot of physical resources.

Design Principles for Horizontal Scalability

Having learned what horizontal scalability is, it's time to examine the design patterns that let you enable it in your apps. The following techniques make your system scalable by ensuring its data structures, communication patterns and deployment methods are all compatible with the distributed computing paradigm that horizontal scaling produces.

Implement a Stateless Architecture

When you use a stateless architecture, each of your app replicas operates completely independently. The instances don't need to know anything about their neighbors or the requests they've handled, so any replica can handle any request.

Compare this with a stateful system, where information such as a user's logged-in status is stored by the service: if a user first interacts with Replica A but is subsequently directed to Replica B, then Replica B must be informed that the user has already logged in.

Making your apps stateless improves scalability, load distribution and fault tolerance, as there's no persistent data to synchronize between replicas. You can freely modify your replica counts without having to maintain a central data store that must be accessible to every replica. This simplifies configuration and reduces maintenance overheads.
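
As a minimal sketch of this idea, the example below keeps the user's logged-in status in a signed token that travels with each request, so no replica has to remember anything. The names `SECRET_KEY`, `issue_token`, `verify_token` and `handle_request` are hypothetical and chosen for illustration; a production system would more likely use an established token format and library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical secret shared by every replica through configuration.
SECRET_KEY = b"example-secret-shared-by-all-replicas"


def issue_token(user_id: str) -> str:
    """Create a signed token that encodes the user's logged-in state."""
    payload = base64.urlsafe_b64encode(json.dumps({"user_id": user_id}).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_token(token: str) -> Optional[str]:
    """Return the user ID if the signature is valid, otherwise None."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # the token has no signature part
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking information through timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))["user_id"]


def handle_request(replica_name: str, token: str) -> str:
    """Any replica can serve the request because the token carries the state."""
    user_id = verify_token(token)
    if user_id is None:
        return f"{replica_name}: 401 Unauthorized"
    return f"{replica_name}: 200 OK for {user_id}"
```

A token issued after the user logs in through Replica A is accepted unchanged by Replica B, because both replicas derive the same signature from the shared secret rather than from local memory.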

Select an Appropriate Database Type

Different database types can affect the horizontal scalability of your systems based on how they store, shard and replicate data. Many apps continue to use SQL-based relational databases, but these are generally trickier to scale than document-oriented NoSQL engines that are often designed for simple distributed operations.

Structured SQL databases provide useful features such as referential integrity, table locks and transactions, but these can cause problems when the database is scaled across multiple replicas. If a table is locked for a transaction, then the lock needs to be reflected across all the replicas to prevent consistency errors from occurring.

Databases can be horizontally scaled using sharding. This allows you to divide up the data set so that each replica is responsible for maintaining its own portion. Although sharding of SQL databases is technically possible, it's not natively supported by popular engines like PostgreSQL or MySQL. Conversely, leading NoSQL systems like MongoDB have integrated sharding and therefore benefit from simpler horizontal scaling options.
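
To illustrate how sharding divides a data set, here is a small sketch of hash-based shard routing. It is a toy in-memory model, not how any particular database implements sharding; `shard_for_key` and `ShardedStore` are hypothetical names.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for_key(key: str, shard_count: int) -> int:
    """Map a record key to a shard index using a stable hash."""
    # hashlib is used instead of the built-in hash(), whose output for
    # strings can differ between Python processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy store in which each shard holds only its own portion of the data."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]

    def put(self, key: str, value: Any) -> None:
        self.shards[shard_for_key(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[Any]:
        return self.shards[shard_for_key(key, len(self.shards))].get(key)
```

One caveat of this simple modulo scheme is that changing the shard count remaps most keys. Systems that expect frequent resharding often use consistent hashing or range-based partitioning instead.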

Use Microservices

Selecting a microservices architecture makes it easier to embrace horizontal scalability. Instead of running your entire app as a single monolith, building microservices lets you split the system into several small components that you can independently operate and scale. For example, if you see a spike in user authentication activity, you could scale out the authentication service while leaving other low-demand components unaltered.

Monoliths can be horizontally scaled, but this often causes resource wastage. Every replica of a monolith must have enough resources to run all the functions the system provides, even ones that are infrequently used. But when you adopt microservices, you can scale just the services that are experiencing capacity problems, enabling more efficient load distribution at a lower cost.

Microservices are also an important part of high availability. A failure in one service won't affect any of your other components, improving your fault tolerance. By contrast, when a replica of a monolithic app becomes unhealthy, the availability of every component is impacted.

Adopt Asynchronous Patterns

Microservices and stateless design result in a loosely coupled system that's positioned to support horizontal scalability. However, you still need to plan how your decoupled services will communicate and interact with each other.

Using asynchronous patterns, such as message queues and event-based processing, helps maintain the decoupled architecture. Message queues allow work to be batched up as a list of operations that are pending processing. Different components can then handle the messages once capacity is available. Similarly, firing an event (such as by using a webhook) is an ideal way to notify another service that an activity has occurred without making your system wait for an immediate outcome.

Asynchronicity makes it easier to horizontally scale services by reducing blocking activity between them. For example, you could have a gateway service that issues events to initiate downstream processes, allowing those processes to be independently scaled based on the volume of events being generated.
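
The sketch below models this pattern in a single process, assuming Python's standard `queue` and `threading` modules stand in for a real message broker. The function name `run_workers` is hypothetical. A producer enqueues events without waiting for them to be handled, and the number of consumers can be changed independently.

```python
import queue
import threading
from typing import List


def run_workers(events: List[str], worker_count: int) -> List[str]:
    """Process events with a pool of workers that pull from a shared queue."""
    pending: queue.Queue = queue.Queue()
    results: List[str] = []
    results_lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            event = pending.get()
            if event is None:  # sentinel value: no more work for this worker
                pending.task_done()
                return
            with results_lock:
                results.append(f"{name} processed {event}")
            pending.task_done()

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()

    # The producer (for example, a gateway service) only enqueues work.
    for event in events:
        pending.put(event)
    # One sentinel per worker tells each of them to stop.
    for _ in threads:
        pending.put(None)

    pending.join()
    for thread in threads:
        thread.join()
    return results
```

Raising `worker_count` adds consumers without any change to the producer, which is the property that lets downstream processes scale on their own.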

Prioritize API-First Development

An API-first development approach is another way to make systems more horizontally scalable. It positions APIs as the fundamental unit of your software, requiring that all components use well-defined interfaces to communicate with each other.

An API is a contract between two systems or pieces of code, and an API endpoint is a function that a consumer can call with specific data to obtain a response. As long as the two components always exchange data in the agreed format, then internal changes can be made to either one of the components without affecting the other.

Restricting intercomponent communication to specially built APIs helps improve system consistency and modularity. In turn, this enhances your horizontal scalability options because you can scale out APIs individually without having to modify the consumers that call them. Using load balancers and service meshes ensures services can reach APIs via reliable network addresses that persist after scaling changes.
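
As a rough illustration, the following sketch puts a round-robin balancer in front of replicas that all honor the same request and response shape. The `Handler` contract, `make_replica` and `RoundRobinBalancer` are hypothetical names, and real load balancers and service meshes do far more than this.

```python
from typing import Callable, Dict, List

# Hypothetical contract: a replica accepts a request dict with a "user" key
# and returns a response dict with "status", "body" and "served_by" keys.
Handler = Callable[[Dict[str, str]], Dict[str, str]]


def make_replica(name: str) -> Handler:
    """Build a replica that honors the contract above."""

    def handle(request: Dict[str, str]) -> Dict[str, str]:
        return {"status": "200", "body": f"hello {request['user']}", "served_by": name}

    return handle


class RoundRobinBalancer:
    """Stable entry point that spreads calls across the current replicas."""

    def __init__(self, replicas: List[Handler]) -> None:
        self._replicas = list(replicas)
        self._next = 0

    def scale_to(self, replicas: List[Handler]) -> None:
        """Swap in a new replica set; callers keep using the same balancer."""
        self._replicas = list(replicas)
        self._next = 0

    def call(self, request: Dict[str, str]) -> Dict[str, str]:
        replica = self._replicas[self._next % len(self._replicas)]
        self._next += 1
        return replica(request)
```

Consumers only ever hold a reference to the balancer, so going from one replica to three requires no change on their side.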

Build with CI/CD and Automation in Mind

Continuous integration and continuous delivery (CI/CD) pipelines are a critical part of modern DevOps. They can be used to automate your deployment processes, including scaling operations. While you don't have to use CI/CD to benefit from horizontal scalability, a pipeline-based deployment approach makes it easier to apply and revert changes.

CI/CD facilitates consistent scaling of your applications, including across multiple environments and cloud platforms. For example, you could use the same CI/CD pipeline to deploy to developer environments in your internal infrastructure, a staging instance that resides in a single cloud and your multicloud production environment that demands high availability with zero-downtime updates. By using environment-level configuration within your pipelines, you can ensure each scenario always applies the correct scaling during the deployment.

CI/CD is complemented by infrastructure as code (IaC) and GitOps methodologies. These practices describe the use of declarative versioned files to define your infrastructure's state, with the actual infrastructure changes made by automated tools that consume your config files. IaC allows you to scale your app by modifying files in a Git repository, committing the change and then running your CI/CD pipeline to apply the updated replica count to your environments. If there's a problem, you can easily revert the commit and rerun the pipeline, providing enhanced operational safety.
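
The declarative idea can be sketched as a comparison between declared and actual state. In this example, the `DESIRED_REPLICAS` dictionary is a hypothetical stand-in for a versioned config file in Git, and `plan_changes` stands in for the automated tool that consumes it; the environment names, service names and counts are invented for illustration.

```python
from typing import Dict, List

# Hypothetical desired state, standing in for a versioned config file.
DESIRED_REPLICAS: Dict[str, Dict[str, int]] = {
    "development": {"api": 1, "worker": 1},
    "staging": {"api": 2, "worker": 1},
    "production": {"api": 6, "worker": 4},
}


def plan_changes(environment: str, actual: Dict[str, int]) -> List[str]:
    """List the actions needed to make the running state match the declaration."""
    desired = DESIRED_REPLICAS[environment]
    actions: List[str] = []
    for service in sorted(set(desired) | set(actual)):
        want = desired.get(service, 0)  # undeclared services should not run
        have = actual.get(service, 0)
        if want > have:
            actions.append(f"start {want - have} x {service}")
        elif want < have:
            actions.append(f"stop {have - want} x {service}")
    return actions
```

Scaling then becomes a one-line change to the declared count, and reverting the commit produces a plan that undoes it.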

Use Real-Time Observability

Building observability into your apps permits real-time monitoring for problems, including capacity constraints or overprovisioning. This helps enable more consistent performance and reliability as your app's utilization changes.

Good observability is also vital so you can troubleshoot any bigger issues, such as pinpointing the cause of a slowdown. Easy access to metrics, logs and traces provides a clear analytical pathway that you can follow to find why a process is running slowly, such as an API call that's impacted by network latency or an I/O-bound operation that's causing disk thrashing.

Of course, it's technically possible to scale horizontally without observability, but this is akin to flying blind. Proper observability allows you to make informed, data-driven decisions about app scalability and performance, resulting in more efficient operations that will usually incur lower costs.
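
As a simple illustration of turning raw measurements into a scaling signal, the sketch below records request latencies and reports which operations exceed a threshold at a given percentile. `LatencyRecorder` is a hypothetical helper; a real deployment would normally rely on a dedicated metrics system rather than in-process lists.

```python
import math
from typing import Dict, List, Optional


class LatencyRecorder:
    """Collect latency samples per operation and summarize them."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = {}

    def observe(self, operation: str, seconds: float) -> None:
        self._samples.setdefault(operation, []).append(seconds)

    def percentile(self, operation: str, pct: int) -> Optional[float]:
        """Nearest-rank percentile, or None if nothing was recorded."""
        samples = sorted(self._samples.get(operation, []))
        if not samples:
            return None
        rank = max(1, math.ceil(pct * len(samples) / 100))
        return samples[rank - 1]

    def slow_operations(self, threshold_seconds: float, pct: int = 95) -> List[str]:
        """Return operations whose chosen percentile exceeds the threshold."""
        return sorted(
            operation
            for operation in self._samples
            if self.percentile(operation, pct) > threshold_seconds
        )
```

Percentiles are used here instead of averages because a small share of very slow requests can go unnoticed in a mean.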

Optimize Your Design for Data Caching

Correct use of caching improves the performance and scalability of your applications by allowing you to reduce the number of database fetches and network calls required. Caching frequently retrieved data in memory means it's already available when the app requires it, ensuring quick and reliable access.

This benefit also leads to scalability advantages. As discussed, stateless apps are easier to scale than stateful ones, but most real-world applications include stateful components like a database. Because a database may be trickier to scale than the other microservices in your architecture, it makes sense to lessen the load it experiences.

Placing a distributed caching system in front of a database is one of the easiest ways you can achieve this. While it can feel as though adding a cache is just another component to scale and maintain, it's much less likely to be a bottleneck than a similarly scaled database deployment. Memory caches are fast and can handle large traffic volumes even when provisioned as a relatively small instance, so by using them, you can often avoid the complexity of horizontally scaling your main stateful database service.
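
Here is a minimal cache-aside sketch with time-based expiry. It runs in a single process, so it only approximates a distributed cache; `TTLCache` and its `loader` parameter (the function that performs the real database fetch) are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Serve repeated reads from memory and fall back to the loader on a miss."""

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # e.g. a function that queries the database
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self._loader(key)  # cache miss or expired entry
        self._entries[key] = (now + self._ttl, value)
        return value
```

The expiry time bounds how stale a cached value can become, which is the main trade-off to tune: a longer TTL removes more database load but delays the visibility of updates.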

Include Robust Fault Tolerance

Fault tolerance is a key concern for any app running in production. It's also closely related to horizontal scalability, with the two characteristics directly influencing each other.

On the one hand, the ability to scale your service horizontally improves your fault tolerance. Using autoscaling, for example, you can automatically start new replicas of your service as load increases to prevent outages due to capacity being reached. But on the other hand, your use of horizontal scaling doesn't obviate the need to maintain other forms of fault tolerance as well.
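
An autoscaling decision can be sketched as a proportional rule: scale the replica count by the ratio of observed to target utilization, then clamp it to configured bounds. This is similar in spirit to the rule used by the Kubernetes Horizontal Pod Autoscaler, but the function name, parameters and default limits below are assumptions made for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return the replica count that would bring utilization near the target."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    # Clamp so that a metric spike or an idle period cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))
```

For example, four replicas running at 90% utilization against a 60% target yields six replicas, while the same four replicas at 30% would be reduced to two.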

Even distributed systems can fail, so you need to be prepared. Having hot standby systems ready to fail over to (perhaps hosted using a different cloud provider or region) ensures you've always got options if a large-scale failure occurs.

Similarly, you must configure full backups for your entire service catalog, including components that benefit from horizontal scaling and replication. Creating regular backups at the infrastructure or platform level ensures your resources, data and cloud configurations will be recoverable should a disaster occur.

Evaluate Hosting Options for Horizontal Scalability

Traditional public cloud providers offer services that make horizontal scalability relatively simple, with new virtual compute nodes or containers available in seconds at massive global scale. If you're using Platform-as-a-Service products or a managed app or container engine, check if the provider offers autoscaling so you can dynamically provision extra capacity in response to changing utilization.

For applications that cannot run in a traditional public cloud (because they need reliably high performance, control of the underlying physical infrastructure and networking configuration, bare-metal compute without a virtualization layer or private connectivity that bypasses the public internet), horizontal scaling can be achieved by using dedicated cloud services.

A dedicated cloud provider like Equinix provides bare metal infrastructure access that enables complete control over hardware and the software stack—the same level of control you have on premises but at global scale, on demand, automated and consumed via an API. It's built for secure private networking with low latency, high throughput and direct Layer 2 and 3 connections—including to external infrastructure like your own networks or all the public clouds. You are in full control of how your packets travel and where and how data egresses to the internet.


Horizontal scalability lets you increase system capacity and fault tolerance by deploying additional instances to serve your traffic. Compared to scaling up vertically, horizontal scaling offers high availability and improved operational flexibility but requires your app to support a distributed compute architecture.

In this guide, you learned how stateless components, microservices and asynchronous API-driven interaction patterns allow you to successfully scale systems horizontally while avoiding unexpected roadblocks. Selecting the right hosting partner is a crucial part of scalability too; your provider needs to support rapid resource provisioning, autoscaling and seamless cross-region transfers so you can effortlessly deploy new replicas when you need them. Try Equinix Deploy to access a global data center network that provides bare metal infrastructure, dependable and automated scalability, and private interconnection with public clouds, network service providers and your own enterprise network.

Published on

28 May 2024

