Managing scale is a crucial part of ensuring applications continue serving users as demand fluctuates. Kubernetes, the popular platform for deploying containerized applications, abstracts scaling logic away and can be a useful tool for scale management. But scaling Kubernetes clusters manually is difficult and time consuming, especially when scaling calls for additional hardware resources.
Cluster Autoscaler (CA) is a component of the Kubernetes project that automatically adjusts the size of a cluster based on current demand for resources. In this article, you’ll learn how to use Cluster Autoscaler to optimize resource utilization by scaling a cluster up or down as needed and to ensure the cluster can handle incoming workloads. We’ll explain how to configure CA, what scaling components you have in your arsenal and best practices for using it. We’ll also highlight a few alternatives to CA, so you can make an informed choice of tools for managing workload scaling in Kubernetes.
More on using Kubernetes:
- Understanding Kubernetes Network Policies
- Installing and Deploying Kubernetes on Ubuntu
- Unleashing the Power of Multi-Cloud Kubernetes—a Tutorial
- So You Want to Run Kubernetes On Bare Metal
Using Cluster Autoscaler
Cluster Autoscaler studies resource requests from pods to make scaling decisions for each cluster: when to scale up, when to scale down and how to balance across node groups.
Perhaps the most critical decision is when to scale up, to maintain performance and ensure optimal user experience. Based on pod resource requests, CA adds nodes to your cluster to maintain a standard of performance that you set.
By default, CA checks the cluster every 10 seconds for any pods in a pending state (you can adjust this interval using the --scan-interval flag). A pending pod means the scheduler couldn’t assign it to a node because there weren’t enough resources for it in the cluster.
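For instance, a pod whose resource requests exceed the free capacity of every existing node stays Pending until CA adds a node. A hypothetical manifest that could trigger a scale-up might look like this (the pod name, image and request sizes are illustrative):

```yaml
# Hypothetical pod whose requests may exceed the free capacity of
# every current node, leaving it Pending and prompting CA to scale up.
apiVersion: v1
kind: Pod
metadata:
  name: big-workload
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "3"       # more CPU than any current node has free
          memory: 8Gi
```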
When there are unscheduled pods, CA generates template nodes for each node group and determines if any of the unschedulable pods can be accommodated on a new node. Nodes are provisioned with the help of the cloud provider and connected to the cluster via the control plane.
If scaling up is critical for user experience, scaling down is critical for controlling infrastructure costs when demand is low. CA checks the cluster every 10 seconds for overprovisioned nodes as well.
It considers a node underutilized when the CPU and memory requested by the pods running on it (not counting DaemonSet pods) total less than 50 percent of the node’s allocatable capacity. If this state persists longer than 10 minutes (adjustable with the --scale-down-unneeded-time flag) and the pods can be moved to other nodes, the node is considered unneeded. CA drains the pods, schedules them on other nodes and terminates the overprovisioned node.
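These intervals are ordinary command-line flags on the autoscaler itself; in a typical installation they appear in the container args of the cluster-autoscaler deployment. A minimal sketch, with illustrative version and values:

```yaml
# Excerpt from a cluster-autoscaler Deployment spec; values are examples.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
    command:
      - ./cluster-autoscaler
      - --scan-interval=10s              # how often CA looks for pending pods
      - --scale-down-unneeded-time=10m   # how long a node must stay underutilized
```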
Balancing Across Node Groups
Since the introduction of the --balance-similar-node-groups flag in version 0.6, CA supports balancing across node groups. When this flag is set to true, CA automatically identifies node groups with the same instance type and the same set of labels (excluding the automatically added zone label) and attempts to keep their sizes balanced.
This helps you allocate specific workloads or pods to different node groups based on their individual requirements in order to optimize resources. Because CA supports autoscaling multiple node groups, you can configure several groups that can each accommodate the same pending pods and let CA balance your workloads efficiently across them.
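As a sketch, enabling balancing is just another flag on the autoscaler, alongside the node groups it manages (the group names below are hypothetical):

```yaml
# Excerpt from a cluster-autoscaler container's args; group names are examples.
command:
  - ./cluster-autoscaler
  - --balance-similar-node-groups=true
  - --nodes=1:10:worker-group-a   # min:max:name of a node group
  - --nodes=1:10:worker-group-b
```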
Configuring Cluster Autoscaler
To utilize CA’s full potential, you have three tasks to manage:
- Setting scaling limits
- Customizing scaling policies
- Monitoring and logging
Setting Scaling Limits
Scaling limits are the minimum and maximum number of nodes your cluster can have. This is an important parameter; you don’t want to exceed quotas or your budget.
For example, in Google Kubernetes Engine (GKE), you can set scaling limits when creating a cluster by configuring the node pool’s minimum and maximum node counts:

```shell
gcloud container clusters create CLUSTER_NAME \
    --enable-autoscaling \
    --num-nodes NUM_NODES \
    --min-nodes MIN_NODES \
    --max-nodes MAX_NODES \
    --region=COMPUTE_REGION
```
Customizing Scaling Policies
You can configure your scaling policies via CA’s deployment manifest to achieve specific results. For example, you can enable auto discovery of node groups with specific tags using the --node-group-auto-discovery flag.
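On AWS, for example, auto discovery is commonly driven by Auto Scaling group tags. A hedged sketch (the tag keys follow the project’s documented convention; the cluster name is hypothetical):

```yaml
# Excerpt from a cluster-autoscaler container's args on AWS.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```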
Monitoring and Logging
CA provides livenessProbe and metrics endpoints that can be accessed on port 8085 (you can change that using the --address flag). These endpoints are located under /health-check and /metrics and provide valuable information about the state of your cluster during scaling operations.
You can use this information to create a dashboard for a clearer overview of your cluster's state. The quick visibility a dashboard provides can help you measure cost and other metrics to make sure the required nodes (say, the most expensive ones) are scaled down appropriately.
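If you build that dashboard on Prometheus, a minimal scrape job for CA’s metrics endpoint might look like this (the job name and target address are assumptions about your setup):

```yaml
# Prometheus scrape config sketch; adjust the target to your service address.
scrape_configs:
  - job_name: cluster-autoscaler
    metrics_path: /metrics
    static_configs:
      - targets: ["cluster-autoscaler.kube-system.svc:8085"]
```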
Best Practices for Cluster Autoscaler
Here are three best practices to follow as you perform the three essential tasks outlined above.
Managing Resource Requests and Limits
Pod resource requests are what CA uses to make scheduling and scaling decisions. Specify requests that closely match actual resource utilization and include them in the pod manifest.
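A sketch of such requests in a container spec (the image and numbers are illustrative and should reflect measured usage):

```yaml
# Container spec excerpt; requests mirror observed steady-state usage.
containers:
  - name: api
    image: my-org/api:1.0   # hypothetical image
    resources:
      requests:
        cpu: 250m           # close to measured CPU usage
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```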
Properly Labeling Nodes
Assign labels and annotations to ensure that important pods aren’t evicted and nodes aren’t deleted accidentally. For example, annotating a pod with cluster-autoscaler.kubernetes.io/safe-to-evict: "false" tells CA not to evict it, which keeps the node running that pod from being scaled down.
Assigning meaningful labels is also helpful for scheduling pods on specific hardware nodes that might be required to perform specific hardware-related operations.
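Both ideas can be sketched in a single pod spec: the safe-to-evict annotation keeps CA from evicting the pod, and a nodeSelector pins it to labeled hardware (the pod name, image and node label are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-critical-job    # hypothetical pod
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"  # CA won't evict this pod
spec:
  nodeSelector:
    hardware: gpu           # hypothetical node label applied to GPU nodes
  containers:
    - name: worker
      image: my-org/trainer:1.0
```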
Handling Stateful Applications
The use of Kubernetes to run stateful applications is on the rise. If your stateful application uses local storage for pods, constantly scaling up and down can be harmful. The best practice here is to use volume mounts backed by persistent storage to make sure you don’t lose business-critical data and to maintain data integrity.
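A minimal sketch of persistent storage surviving node churn (the names, image and size are assumptions):

```yaml
# A PVC plus a pod that mounts it; data outlives any single node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: stateful-app
spec:
  containers:
    - name: db
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```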
Alternatives to Cluster Autoscaler
CA helps you scale infrastructure to accommodate your pods once they’re modified or created and need to be scheduled. There are also other Kubernetes scaling components that can help you manage scale using metrics other than resource limits or requests.
Horizontal Pod Autoscaler
Horizontal Pod Autoscaler (HPA) adjusts the number of replicas of your deployment based on observed metrics such as CPU utilization. This ensures that your requests are handled without failure while saving costs. You can think of it as a pod-level autoscaler that scales replicas up and down, just as CA performs cluster-level scaling of nodes.
You use an API object called HorizontalPodAutoscaler; a sample manifest looks something like this:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 30
```
Vertical Pod Autoscaler
Vertical Pod Autoscaler (VPA) is a set of components that can adjust resource requests and limits of containers. It can downscale pods that are over-requesting resources and upscale pods that are under-requesting resources based on their usage over time.
The custom resource definition object used to achieve this is VerticalPodAutoscaler. The sample manifest looks something like this:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```
Custom Autoscaling Solutions
A variety of other autoscaling solutions are available to meet different needs. One such project is KEDA, a Kubernetes-based event-driven autoscaler. KEDA can scale any container in Kubernetes based on the number of events that need to be processed. It works alongside other Kubernetes components like HPA.
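For example, a KEDA ScaledObject that scales a deployment on queue depth might look like this (the deployment name and trigger details are assumptions; KEDA ships many other scalers):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-consumer-scaler
spec:
  scaleTargetRef:
    name: queue-consumer        # hypothetical Deployment to scale
  minReplicaCount: 0            # KEDA can scale to zero, unlike plain HPA
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq            # one of KEDA's event-source scalers
      metadata:
        queueName: jobs
        mode: QueueLength
        value: "10"             # target events per replica
```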
Other solutions, like StormForge Optimize Live, can use machine learning to assist Kubernetes with autoscaling.
CA in Kubernetes is crucial for scaling nodes efficiently based on workload demands. Scaling and balancing your nodes ensures optimal resource utilization, improved performance and cost efficiency for your application.
CA is popular (despite not being available for local bootstrap engines) for a couple of reasons: most cloud providers support it, and it’s capable of aggressive upscaling and conservative downscaling with less configuration and complexity than VPA and HPA. It gives Kubernetes administrators the assurance that their applications are provisioned optimally and cost-effectively throughout the cluster lifecycle.
Ready to kick the tires?
Sign up and get going today, or request a demo to get a tour from an expert.