AKS patching costs
a true story…
The story starts with an AKS cluster used for a dev environment. The cluster was backed by a single user node pool, and the chosen node size was Standard_B4ms.
The de facto approach for the worker node pool was simple: whenever the quota was reached, another worker node was added to the node pool 🤑🤑🤑.
If you’re not familiar with AKS, you might want to read "AKS behind the scenes" and "Manage multiple node pools for a cluster". AKS uses Virtual Machine Scale Sets for its node pools. In practical terms, this means the VM size of an existing node pool CANNOT be changed in place (no vertical scaling), but AKS ALLOWS you to create additional node pools so that specific workloads can be matched to the nodes running in each pool.
When it comes to addressing costs, here are some easy-to-adopt solutions and their use cases:
- Enable the AKS cluster autoscaler: useful when workloads have small, frequent spikes, e.g. when demand differs between business hours and off-hours, or between workdays and weekends. This can be a quick fix, especially when Pods can’t be scheduled on nodes because of resource constraints.
- Use AKS spot node pools: Since AKS allows multiple node pools within the same cluster, you can create a spot node pool alongside the regular (pay-as-you-go) node pool. This approach is well suited to non-production environments and low-priority compute requirements.
- Downsize the node pool: Even though AKS is a managed service, patching or upgrading can sometimes cause unpleasant service disruption or, even worse, downtime. You can keep the same compute at about the same price with fewer, larger nodes, e.g. by going from 10 B4ms to 5 B8ms.
Just by resizing the node pool, there are fewer nodes to patch and reboot, so the risk of service disruption is reduced. This directly improves the dev experience and indirectly lowers costs.
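To make the 10 B4ms versus 5 B8ms comparison concrete, here is a minimal sketch of the arithmetic. It assumes the published sizes of 4 vCPU / 16 GiB for Standard_B4ms and 8 vCPU / 32 GiB for Standard_B8ms. The `NodePool` class is a hypothetical helper for illustration, not part of any Azure SDK. The sketch ignores the resources AKS reserves on each node, so real allocatable capacity will be somewhat lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    """Hypothetical helper describing a node pool's raw capacity."""
    vm_size: str
    node_count: int
    vcpus_per_node: int
    memory_gib_per_node: int

    @property
    def total_vcpus(self) -> int:
        return self.node_count * self.vcpus_per_node

    @property
    def total_memory_gib(self) -> int:
        return self.node_count * self.memory_gib_per_node


# Assumed VM specs: Standard_B4ms = 4 vCPU / 16 GiB, Standard_B8ms = 8 vCPU / 32 GiB
before = NodePool("Standard_B4ms", node_count=10, vcpus_per_node=4, memory_gib_per_node=16)
after = NodePool("Standard_B8ms", node_count=5, vcpus_per_node=8, memory_gib_per_node=32)

for pool in (before, after):
    # Each node is patched or upgraded individually, so node_count is also
    # the number of nodes that have to be drained during an upgrade.
    print(
        f"{pool.node_count} x {pool.vm_size}: "
        f"{pool.total_vcpus} vCPU, {pool.total_memory_gib} GiB, "
        f"{pool.node_count} nodes to drain per upgrade"
    )
```

Both pools expose the same raw compute, but the second one has half as many nodes to go through a patching cycle.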
Important Notes
- Autoscaling in AKS has two parts. The Kubernetes Horizontal Pod Autoscaler (HPA) monitors resource demand and automatically scales the number of workload replicas; however, it can only place Pods on the nodes that already exist in the cluster's node pools. The cluster autoscaler watches for Pods that can’t be scheduled because of resource constraints and automatically increases the number of nodes if needed.
- Regular node pools consist of standard pay-as-you-go instances, as opposed to spot node pools, which use low-cost instances that can be interrupted at short notice.
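The interplay between the HPA and the cluster autoscaler can be sketched roughly as follows. The replica calculation uses the formula documented for the Kubernetes HPA, `ceil(currentReplicas * currentMetric / targetMetric)`. The rest is a deliberately simplified model: `pods_per_node` is a hypothetical fixed packing density, and the real HPA tolerance, stabilization windows, and scheduler behavior are not modeled.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count the HPA aims for, per the documented HPA formula."""
    return math.ceil(current_replicas * current_metric / target_metric)


def unschedulable_pods(replicas: int, pods_per_node: int, node_count: int) -> int:
    """Pods that do not fit on the existing nodes (simplified: fixed pods per node)."""
    return max(0, replicas - pods_per_node * node_count)


def nodes_to_add(pending_pods: int, pods_per_node: int) -> int:
    """Extra nodes a cluster autoscaler would need in this simplified model."""
    return math.ceil(pending_pods / pods_per_node)


# Illustrative numbers only: 4 replicas at 90% CPU with a 60% target,
# on 2 nodes that are assumed to fit 2 such Pods each.
replicas = desired_replicas(current_replicas=4, current_metric=90, target_metric=60)
pending = unschedulable_pods(replicas, pods_per_node=2, node_count=2)
extra_nodes = nodes_to_add(pending, pods_per_node=2)

print(f"HPA wants {replicas} replicas, {pending} stay pending, "
      f"cluster autoscaler adds {extra_nodes} node(s)")
```

With these numbers the HPA asks for 6 replicas, 2 of them cannot be scheduled on the existing nodes, and one extra node is enough to place them.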
These are just a few quick ways to improve AKS cost efficiency, and they can yield savings swiftly. For more advanced strategies, it is worth looking into the nature of the workloads and implementing node taints and tolerations, and SKU policies.