Cost optimization techniques for AWS Elastic Kubernetes Service

Over the past few years, the microservices architecture has become the de facto standard for software development. The focus has shifted to building independent, loosely coupled services, packaging them in containers (mostly Docker), and orchestrating those containers to form a distributed system.

Kubernetes, the market leader in container orchestration, is offered by many cloud providers as a managed service to mitigate the pain of setting up the Kubernetes control plane (master). However, as with many managed cloud offerings, if not architected properly it can result in huge costs in no time.

In this in-depth blog post – written for cloud architects and engineers – I’ll share some insights into creating a cost-optimized and performant cloud architecture, including a few key decisions that will help you achieve that goal.


As a starting point, observability – monitoring all your components – is key to running (cost-)optimized systems on any public cloud infrastructure. For pod- and cluster-level resource monitoring, the Kubernetes Metrics Server already provides basic CPU and memory metrics for your pods and nodes. For visualization, projects like Grafana powered by Prometheus offer extremely efficient monitoring. Continuously refining resource requirements such as CPU and memory prevents over- or under-provisioning the cluster.

Never Predict, ‘Autoscale’

Your initial resource allocation is loosely based on assumptions and predictions, not (yet) on actual data. As a result, this approach may lead to over- or under-provisioning of costly cloud resources. A reactive autoscaling strategy goes a long way in optimizing costs by scaling up and down based on the actual resource needs of your containers.

AWS autoscaling does not work out of the box for EKS worker nodes, because there are no pod-level CloudWatch alarms to autoscale the nodes (EC2). That’s where the Cluster Autoscaler and the Horizontal Pod Autoscaler (HPA) come into the picture.

Cluster Autoscaler: this Kubernetes controller monitors the cluster for pods that cannot be scheduled due to insufficient resources. Whenever this occurs, the Cluster Autoscaler updates the Auto Scaling group, resulting in additional nodes in the cluster. It also detects underutilized nodes, reschedules their pods onto other nodes, and then decreases the desired count of the Auto Scaling group to scale the number of nodes down.
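As a minimal sketch, the relevant Cluster Autoscaler settings on EKS could look like the fragment below, assuming your Auto Scaling groups carry the auto-discovery tags and the cluster name is `my-cluster` (both the tags and the name are assumptions for illustration):

```yaml
# Fragment of the cluster-autoscaler Deployment spec (runs in kube-system)
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --balance-similar-node-groups
      - --skip-nodes-with-system-pods=false
      # Discover ASGs carrying these tags; "my-cluster" is a placeholder
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```

With auto-discovery enabled, you never hard-code min/max node counts in the controller; those boundaries stay on the Auto Scaling groups themselves.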

Horizontal Pod Autoscaler: the HPA is a built-in Kubernetes controller that scales the number of pod replicas based on metrics from the Kubernetes Metrics Server.

The combination of Cluster Autoscaler and Horizontal Pod Autoscaler is an effective way to keep your cloud resources tied as close as possible to the actual utilization of the workloads running in the cluster.
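To make this concrete, here is a sketch of an HPA that keeps a hypothetical `api` Deployment between 2 and 10 replicas at roughly 70% average CPU utilization (the Deployment name and the targets are assumptions you would tune from your own metrics):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api              # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above ~70% average CPU
```

When the HPA adds replicas that no longer fit on the existing nodes, those pending pods are exactly what triggers the Cluster Autoscaler to add capacity.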


Resource requests, limits and quotas

Kubernetes pods use the underlying node’s resources to fulfil their CPU and memory requirements. On public cloud infrastructure, it is essential to size pods properly and to restrict the resource consumption of each pod. Without such restrictions, a simple memory leak in one pod can force your resources to scale indefinitely: AWS will keep fulfilling the demand with on-demand instances and, as a result, you will be paying a huge bill at the end of the month. Pod requests and limits help us keep resource consumption in check!

Pod Requests and Limits
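As an illustration, a container with modest requests and hard limits might be declared like this (the image and the numbers are placeholders; derive the real values from your monitoring data):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp             # hypothetical pod name
spec:
  containers:
    - name: webapp
      image: nginx:1.25
      resources:
        requests:          # what the scheduler reserves on a node
          cpu: 250m
          memory: 256Mi
        limits:            # hard cap; exceeding it means CPU throttling or an OOM kill
          cpu: 500m
          memory: 512Mi
```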

If you are working with multiple namespaces in a single cluster, it is wise to configure a resource quota for each namespace. This approach ensures that every team has a fixed budget of resources and can never drain them from another team.
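A per-namespace budget could be sketched as follows (the namespace name and the quota values are assumptions):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a        # hypothetical team namespace
spec:
  hard:
    requests.cpu: "4"      # total CPU the namespace may request
    requests.memory: 8Gi
    limits.cpu: "8"        # total CPU limit across all pods
    limits.memory: 16Gi
    pods: "20"
```

Note that once a quota sets CPU or memory constraints on a namespace, every pod in it must declare requests and limits (or inherit them from a LimitRange), which nicely enforces the discipline described above.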

It is also essential to continuously optimize your container resource requests with the observability tools discussed above, and to make sure the requested resources are not wasted. The rule of thumb here is to assign the minimum required resources to a pod and let the autoscalers take care of any increase in demand.

Combination of On-Demand and Spot Instances

Spot instances are typically 60–75% cheaper than their on-demand counterparts, and effectively leveraging a Spot Fleet can prove to be a game changer for lowering EKS worker node costs.


Some important lessons learned while using spot instances:

  1. Cluster-critical components need to be independent of any disruption caused by spot instance interruptions, so run them in EC2 Auto Scaling groups (ASGs) of on-demand instances. Place application containers in a fleet of EC2 spot instances in a separate Auto Scaling group. A Kubernetes node selector can then ensure that each container is placed on the proper worker node, spot or on-demand.
  2. Maximize the number of spot pools (EC2 instance types) in your Spot Fleet to maintain redundancy when the price of one instance type rises above your bid.
  3. AWS terminates spot instances with a lead time of only two minutes, so you have to gracefully drain the pods from a node that is about to be terminated and reschedule them on a new spot instance at the new price. AWS provides the Spot Interrupt Handler (since superseded by the AWS Node Termination Handler) to achieve this.
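The node selector approach from point 1 can be sketched as below. It assumes the worker nodes were labelled at bootstrap with something like `lifecycle=Ec2Spot` or `lifecycle=OnDemand`; that label key and its values are a common convention, not built into Kubernetes, and the workload name is hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-api      # hypothetical interruption-tolerant workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: stateless-api
  template:
    metadata:
      labels:
        app: stateless-api
    spec:
      nodeSelector:
        lifecycle: Ec2Spot # schedule this workload on spot capacity only
      containers:
        - name: api
          image: nginx:1.25
```

Cluster-critical workloads would carry the mirror-image selector (`lifecycle: OnDemand`) so an interruption never takes them down.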


Kubernetes is here to stay, and mastering the art of cost optimization in the Kubernetes ecosystem will be one of the foremost requirements for any organization. By applying the strategies described above, you can potentially reduce the cost of running a production EKS cluster by up to 70%.

If you want to know more about cost optimization techniques for AWS Elastic Kubernetes Service, or just have questions, please let me know by sending me an email.