Understanding the Role of Cluster Autoscaler in Kubernetes

How to use the autoscaling solution to ensure applications have just the right amount of resources at any time.

Hrittik Roy, Software Engineer

How you manage scaling is a crucial part of ensuring that applications continue serving users as demand fluctuates. Kubernetes, the popular platform for deploying containerized applications, abstracts scaling logic away and can be a useful tool for scale management. But scaling with Kubernetes manually can be difficult and time-consuming, especially when scaling calls for additional hardware resources.

Cluster Autoscaler (CA) is a component from the Kubernetes project that automatically adjusts the size of a cluster based on current demand for resources. In this article, you’ll learn how to use Cluster Autoscaler to optimize resource utilization by scaling a cluster up or down as needed and ensure the cluster can handle incoming workloads. We’ll explain how to configure CA, what scaling components you have in your arsenal and best practices for using it. We’ll also highlight a few alternatives to CA, so you can make an informed choice of tools for managing workload scaling in Kubernetes.

Using Cluster Autoscaler

Cluster Autoscaler studies resource requests from pods to make scaling decisions for each cluster: when to scale up, when to scale down and how to balance across node groups.

Scaling Up

Perhaps the most critical decision is when to scale up, to maintain performance and ensure optimal user experience. Based on pod resource requests, CA adds nodes to your cluster to maintain a standard of performance that you set.

By default, CA checks a cluster every 10 seconds for any pods in a pending state (you can adjust this using the --scan-interval flag). A pending pod means the scheduler couldn’t assign it to a node because there were not enough resources for it in the cluster.

When there are unschedulable pods, CA generates a template node for each node group and determines whether any of the unschedulable pods could be accommodated on a new node of that group. Nodes are provisioned with the help of the cloud provider and connected to the cluster via the control plane.
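
As a rough illustration of that check, the following Python sketch tests which node groups could host each pending pod on a fresh node. The Resources class, the group names and the numbers are hypothetical. The real CA runs a full scheduler simulation against each template node and then picks a group using its configured expander, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Resources:
    """CPU and memory, either requested by a pod or offered by a node."""
    cpu_millicores: int
    memory_mib: int


def fits(pod: Resources, node: Resources) -> bool:
    """True if the pod's requests fit on an otherwise empty node."""
    return (pod.cpu_millicores <= node.cpu_millicores
            and pod.memory_mib <= node.memory_mib)


def groups_that_help(pending_pods: Dict[str, Resources],
                     template_nodes: Dict[str, Resources]) -> Dict[str, List[str]]:
    """Map each node group to the pending pods one new node of it could host."""
    result: Dict[str, List[str]] = {}
    for group, template in template_nodes.items():
        helped = [name for name, req in pending_pods.items() if fits(req, template)]
        if helped:
            result[group] = helped
    return result


# Hypothetical pending pods and node-group templates.
pending = {"api": Resources(500, 512), "batch": Resources(6000, 8192)}
templates = {"small": Resources(2000, 4096), "large": Resources(8000, 16384)}
print(groups_that_help(pending, templates))
```

In this example only the "large" group can host the "batch" pod, so scaling up "small" alone would leave that pod pending.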

Scaling Down

If scaling up is critical for user experience, scaling down is critical for controlling infrastructure costs when demand is low. CA checks the cluster every 10 seconds for overprovisioned nodes as well.

It considers a node overprovisioned when the CPU and memory requests of all pods running on it add up to less than 50 percent of what the node has allocatable (except for default pods). If this state persists longer than 10 minutes (adjustable with the --scale-down-unneeded-time flag) and the pods can be moved to other nodes, CA marks the node for removal. It then drains the node so the scheduler can place the evicted pods on other nodes, and terminates the node.
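
The scale-down rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not CA's implementation: NodeState and update_and_check are hypothetical names, the 50 percent threshold and 10 minute window are the defaults mentioned above, and taking the larger of the CPU and memory ratios is this sketch's way of requiring both to be below the threshold.

```python
from dataclasses import dataclass
from typing import Optional

UTILIZATION_THRESHOLD = 0.5   # 50 percent of allocatable
UNNEEDED_SECONDS = 10 * 60    # 10 minutes


@dataclass
class NodeState:
    cpu_requested: int      # millicores requested by pods on the node
    cpu_allocatable: int
    mem_requested: int      # MiB requested by pods on the node
    mem_allocatable: int
    low_since: Optional[float] = None  # when utilization first dropped


def utilization(node: NodeState) -> float:
    """Larger of the CPU and memory request ratios."""
    return max(node.cpu_requested / node.cpu_allocatable,
               node.mem_requested / node.mem_allocatable)


def update_and_check(node: NodeState, now: float, pods_can_move: bool) -> bool:
    """Record one observation; return True if the node may be removed."""
    if utilization(node) >= UTILIZATION_THRESHOLD:
        node.low_since = None          # busy again, so the timer resets
        return False
    if node.low_since is None:
        node.low_since = now           # start of the low-utilization window
    return pods_can_move and (now - node.low_since) >= UNNEEDED_SECONDS
```

Calling update_and_check on every scan keeps the timer per node; a single busy observation resets it, which is why short dips in load do not trigger a scale-down.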

Balancing Across Node Groups

Since the introduction of the --balance-similar-node-groups flag in version 0.6, CA supports balancing across node groups. When this flag is set to true, CA automatically identifies node groups with the same instance type and the same set of labels (excluding the automatically added zone label) and attempts to keep their sizes balanced.

This helps you allocate specific workloads or pods to different node groups based on their individual requirements in order to optimize resources. As CA supports autoscaling multiple node groups, you can configure it to support the same set of pending pods and balance your workloads efficiently across node groups.
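
To make the idea of keeping similar node groups balanced concrete, here is a small Python sketch that hands out new nodes one at a time to whichever group is currently smallest. The function name, the group names and the single shared max_size are assumptions made for illustration; CA's own balancing logic handles per-group limits and other details not shown here.

```python
from typing import Dict


def balance_scale_up(sizes: Dict[str, int], new_nodes: int, max_size: int) -> Dict[str, int]:
    """Distribute new_nodes across similar groups, smallest group first."""
    sizes = dict(sizes)  # work on a copy so the caller's dict is untouched
    for _ in range(new_nodes):
        # Only groups that still have room below max_size are candidates.
        candidates = [g for g in sorted(sizes) if sizes[g] < max_size]
        if not candidates:
            break
        smallest = min(candidates, key=lambda g: sizes[g])
        sizes[smallest] += 1
    return sizes


# Hypothetical groups, one per zone, that share an instance type and labels.
print(balance_scale_up({"zone-a": 3, "zone-b": 1, "zone-c": 2}, new_nodes=4, max_size=10))
# prints {'zone-a': 4, 'zone-b': 3, 'zone-c': 3}
```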

Configuring Cluster Autoscaler

To utilize CA’s full potential, you have three tasks to manage:

  • Setting scaling limits
  • Customizing scaling policies
  • Monitoring and logging

Setting Scaling Limits

Scaling limits are the minimum and maximum number of nodes your cluster can have. This is an important parameter; you don’t want to exceed quotas or your budget.

For example, in Google Kubernetes Engine (GKE), you can set scaling limits by configuring the node pool’s MIN_NODES and MAX_NODES:

gcloud container clusters create CLUSTER_NAME \
    --enable-autoscaling \
    --num-nodes NUM_NODES \
    --min-nodes MIN_NODES \
    --max-nodes MAX_NODES
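
One way to picture what these limits do: whatever node count the autoscaler would like to reach, the result is clamped to the configured range. A minimal Python sketch, where clamp_node_count is a hypothetical helper and not part of gcloud or CA:

```python
def clamp_node_count(desired: int, min_nodes: int, max_nodes: int) -> int:
    """Keep a desired node count within the configured scaling limits."""
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    return max(min_nodes, min(desired, max_nodes))


# With limits of 1 and 5, a demand for 8 nodes is capped at 5.
print(clamp_node_count(8, min_nodes=1, max_nodes=5))  # prints 5
```

Pods that still do not fit once the maximum is reached stay pending, so the maximum should leave room for expected peaks while staying within quota and budget.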

Customizing Scaling Policies

You can configure your scaling policies through the flags set in the CA deployment manifest to achieve specific results. For example, you can enable auto discovery for node groups with tags using the --node-group-auto-discovery flag.

Monitoring and Logging

CA has metrics and livenessProbe endpoints that can be accessed on port 8085 (you can change that using the --address flag). These endpoints are located under /metrics and /health-check and provide valuable information about the state of your cluster during scaling operations.
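
As a starting point for such monitoring, the Python sketch below reads the /metrics endpoint and turns the Prometheus text format into a dictionary. It assumes the endpoint is reachable at http://localhost:8085 (for example through a port-forward), which is an assumption about your setup. The parser is deliberately minimal and ignores optional timestamps, and the metric names in the sample are illustrative.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # skip lines this simple parser cannot read
    return samples


def fetch_metrics(base_url: str = "http://localhost:8085") -> Dict[str, float]:
    """Fetch and parse the metrics endpoint (base_url is an assumption)."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Illustrative sample in Prometheus text format.
sample = (
    "# TYPE cluster_autoscaler_unschedulable_pods_count gauge\n"
    "cluster_autoscaler_unschedulable_pods_count 3\n"
    'cluster_autoscaler_nodes_count{state="ready"} 5\n'
)
print(parse_metrics(sample))
```

In practice a Prometheus server would scrape this endpoint directly; the sketch is only meant to show what the data looks like before it reaches a dashboard.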

You can use this information to create a dashboard for a clearer overview of your cluster's state. The quick visibility a dashboard provides can help you measure cost and other metrics to make sure the required nodes (say, the most expensive ones) are scaled down appropriately.

Best Practices

Here are three best practices to follow as you perform the three essential tasks outlined above.

Managing Resource Requests and Limits

CA bases its scaling decisions on the resource requests of pods rather than on their actual usage, so accurate requests are what make the autoscaling process work well. Specify pod requests that closely match actual resource utilization and include these requests in the manifest.

Properly Labeling Nodes

Assign labels and annotations to ensure that important pods aren’t evicted and nodes aren’t deleted accidentally. For example, adding the safe-to-evict annotation with a value of false to a pod tells CA not to evict that pod, which also keeps the node it runs on from being removed during scale-down.

"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"

Assigning meaningful labels is also helpful for scheduling pods on specific hardware nodes that might be required to perform specific hardware-related operations.

Handling Stateful Applications

The use of Kubernetes to run stateful applications is on the rise. If your stateful application uses local storage for pods, constantly scaling up and down can be harmful, because data on a node's local disk is lost when the node is removed. The best practice here is to use volume mounts and persistent storage to make sure you don’t lose business-critical data and maintain data integrity.

Alternatives to Cluster Autoscaler

CA helps you scale infrastructure to accommodate your pods once they’re modified or created and need to be scheduled. There are also other Kubernetes scaling components that can help you manage scale using metrics other than resource limits or requests.

Horizontal Pod Autoscaler

Horizontal Pod Autoscaler (HPA) adjusts the number of replicas of your deployment. This ensures that your requests are handled to serve your business logic without failure while saving costs. You can think of it as a pod-level autoscaler that scales the number of pods up and down based on measured metrics such as CPU utilization, just as CA performs scaling operations at the cluster level.

You can use the built-in API resource called HorizontalPodAutoscaler; a sample manifest should look something like this:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 30
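
To see how a target like this turns into a replica count, the Python sketch below applies the ratio-based rule described in the Kubernetes HPA documentation: scale the current replica count by the ratio of observed to target utilization, round up, then clamp to the configured bounds. The function name is hypothetical, and the sketch leaves out the tolerance band and stabilization behavior of the real controller.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_percent: float,
                     target_cpu_percent: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Replica count suggested by the ratio of observed to target CPU."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(raw, max_replicas))


# Using the sample manifest's values: target 30 percent, 1 to 10 replicas.
print(desired_replicas(4, 60, 30, 1, 10))  # prints 8
print(desired_replicas(4, 90, 30, 1, 10))  # prints 10 (capped at maxReplicas)
```

If the extra replicas do not fit on existing nodes, they become pending pods, which is the signal CA uses to add nodes. This is how the two autoscalers work together.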

Vertical Pod Autoscaler

Vertical Pod Autoscaler (VPA) is a set of components that can adjust resource requests and limits of containers. It can downscale pods that are over-requesting resources and upscale pods that are under-requesting resources based on their usage over time.

The custom resource definition object used to achieve this is VerticalPodAutoscaler. The sample manifest looks something like this:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
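
The idea of deriving requests from usage over time can be sketched as follows. This is a simplified illustration, not the VPA recommender's actual algorithm: recommend_request is a hypothetical function, and the percentile and safety margin are illustrative parameters chosen for the example.

```python
import math
from typing import Sequence


def recommend_request(usage_samples: Sequence[float],
                      percentile: int = 90,
                      margin: float = 0.15) -> float:
    """Suggest a request: a high percentile of observed usage plus a margin."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: the smallest value covering `percentile` percent.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] * (1 + margin)


# Hypothetical CPU usage samples in millicores, including one spike.
samples = [100, 110, 120, 130, 140, 150, 160, 170, 180, 400]
print(round(recommend_request(samples)))  # prints 207
```

Using a percentile instead of the maximum keeps a single spike (400 in the example) from inflating the request, while the margin leaves some headroom above typical usage.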

Custom Autoscaling Solutions

A variety of other autoscaling solutions is available to meet different needs. One such project is KEDA, an event-driven autoscaler based on Kubernetes. KEDA can scale any container in Kubernetes based on the number of events that need to be processed. It works alongside other Kubernetes components like HPA.

Other solutions, like StormForge Optimize Live, can use machine learning to assist Kubernetes with autoscaling.

Conclusion


CA in Kubernetes is crucial for scaling nodes efficiently based on workload demands. Scaling and balancing your nodes ensures optimal resource utilization, improved performance and cost efficiency for your application.

CA is popular (despite not being available for local bootstrap engines) for a couple of reasons: most cloud providers support it, and it’s capable of aggressive upscaling and conservative downscaling with fewer configurations and less complexity than VPA and HPA. It gives Kubernetes administrators the assurance that their applications are provisioned optimally and cost-effectively throughout the cluster lifecycle.

Learn more about CA in the official repository and the FAQ on GitHub.

Published on

05 July 2023

