Cost Optimization: How We Slashed $500K Using IaC, Containers, and More
Controlling cloud infrastructure costs is critical for any company, but especially in the early stages of a startup. As you scale, cloud costs can easily spiral out of control if not properly managed. We learned this lesson early on when our AWS bill unexpectedly jumped 20% one month due to inefficient resource utilization.
After some research, we realized that using Infrastructure as Code (IaC) tools like Terraform could help abstract away the underlying cloud provider and allow us to seamlessly transition across AWS, Azure, and GCP. The key benefit was avoiding vendor lock-in – if one provider’s costs increased, we could easily switch to another without rewriting all our infrastructure code.
In this blog, we'll walk through how we leveraged Terraform and related tooling to set up easy portability across cloud providers. This has been one of our best return-on-investment (ROI) decisions, already saving us over $500K in costs while retaining full flexibility to switch providers if needed.
Some key benefits we realized by using Terraform were:
Increased efficiency
Minimized configuration drift
Cost optimization
Collaboration
Reproducibility
Overall, adopting Terraform allowed us to implement IaC principles for provisioning and managing infrastructure efficiently. This increased our agility while reducing risks and costs associated with configuration drift.
Containerization with Kubernetes
By containerizing our application and adopting Kubernetes orchestration, we gained granular control over resource allocation and scalability. Kubernetes’ efficient resource management lets us pack more workloads onto fewer nodes, maximizing utilization and minimizing idle capacity.
Some key benefits we saw from using Kubernetes:
Dynamic resource allocation – We can specify resource requests and limits for each container, and Kubernetes will allocate cluster resources accordingly. This ensures containers get the resources they need without overprovisioning (see the sketch after this list).
Horizontal scaling – We can easily scale our applications up and down by changing the number of pod replicas. Need to handle more users? Simply increase the number of pods.
Optimized hardware utilization – Packing containers densely onto nodes keeps each node busy, so we pay for less idle hardware.
Automatic bin packing – The scheduler places containers based on their resource requirements and current availability, fitting them efficiently onto the nodes we already have.
Auto-scaling – Kubernetes allows scaling up or down based on metrics like CPU usage. This ensures we have just enough resources to meet demand.
Service discovery – Containers can easily find and talk to each other using Kubernetes services for discovery. This simplifies things and avoids manual IP address management.
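As a concrete sketch of the dynamic resource allocation point, here is what requests and limits look like in a Deployment manifest. The service name, image, and numbers are placeholders, not our actual values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                  # hypothetical service name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: example.com/api:1.0    # placeholder image
          resources:
            requests:                   # what the scheduler reserves when placing the pod
              cpu: "250m"
              memory: "256Mi"
            limits:                     # hard ceiling enforced at runtime
              cpu: "500m"
              memory: "512Mi"
```

Requests drive scheduling and bin packing, while limits cap what a runaway container can consume; keeping the two close together is what prevents overprovisioning.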
Overall, Kubernetes gave us the flexibility and control we needed to maximize resource efficiency, optimize hardware utilization, and achieve scalability on demand. This in turn minimized idle capacity and helped us reduce infrastructure costs.
Cluster Autoscaler (CA)
The CA takes autoscaling a step further by managing the number of nodes in your Kubernetes cluster. If your cluster is running out of resources due to increased demand, the CA can automatically add new nodes to the cluster. Similarly, if the demand drops and there are unused nodes, the CA can remove them, reducing infrastructure costs.
The key benefits of the CA are:
Improved application availability – By automatically adding nodes when the cluster is resource-constrained, the CA helps prevent application downtime.
Optimized costs – Only running the number of nodes required to meet demand reduces infrastructure costs.
Automated management – No need for manual intervention to scale the node pool up and down.
The CA lets a Kubernetes cluster grow and shrink its node pool automatically as scheduling demand changes. This ensures high application availability while optimizing infrastructure costs.
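The CA itself runs as a Deployment inside the cluster. Below is a trimmed, illustrative excerpt of the container args we might set on AWS; the cluster name, thresholds, and image tag are placeholders, and node groups are discovered via ASG tags rather than listed explicitly:

```yaml
# Excerpt from a cluster-autoscaler Deployment spec (illustrative values)
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0   # match the tag to your Kubernetes minor version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --balance-similar-node-groups=true
      - --skip-nodes-with-system-pods=false
      - --scale-down-utilization-threshold=0.5   # consider a node for removal below 50% utilization
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/example-cluster
```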
Horizontal Pod Autoscaler (HPA)
The HPA is responsible for automatically adjusting the number of pods in a deployment or replica set based on the observed CPU usage or custom metrics. This means that if your application is experiencing a sudden spike in demand, the HPA can scale up the number of pods to handle the increased load. Conversely, if the demand drops, the HPA can scale down the number of pods, reducing resource consumption and cost.
The HPA operates by periodically checking current resource usage against the target utilization. If the observed utilization deviates from the target, the HPA adjusts the number of replicas accordingly. You can also configure the HPA to scale on custom metrics, giving you even more control over your application’s scalability.
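A minimal HPA manifest along those lines might look like this; the target Deployment name, bounds, and threshold are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                    # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU exceeds 70% of requests
```

The controller compares observed average utilization against the 70% target and adjusts the Deployment’s replica count within the min/max bounds.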
Vertical Pod Autoscaler (VPA)
While the HPA focuses on scaling the number of pods, the VPA is all about adjusting the resource limits for individual containers within a pod. This means that if a container is running out of memory or CPU, the VPA can automatically increase the resource limits, allowing the container to continue functioning without disruption.
The VPA operates by monitoring the resource usage of containers and comparing it to the current resource limits. If the observed usage is consistently higher or lower than the limits, the VPA recommends new resource limits for the containers. In some cases, the VPA can also automatically apply these recommendations, ensuring your application always has the right amount of resources.
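Assuming the VPA components are installed in the cluster, the corresponding object looks roughly like this; the workload name and bounds are placeholders:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                    # hypothetical workload to right-size
  updatePolicy:
    updateMode: "Auto"           # "Off" only records recommendations; "Auto" applies them
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Note that in Auto mode the VPA recreates pods to apply new values, so it pairs well with a PodDisruptionBudget.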
Spot Instances: For non-critical workloads and batch processing tasks, we utilized Spot Instances to take advantage of unused capacity at a fraction of the cost. This flexible approach helped us optimize costs while maximizing resource utilization.
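As one way this can look on EKS: spot-backed managed node groups carry the eks.amazonaws.com/capacityType label, and we can add our own taint to the spot nodes so that only tolerant batch workloads land there. The taint key below is a hypothetical one we define ourselves, not an EKS default:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report                       # hypothetical batch workload
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        eks.amazonaws.com/capacityType: SPOT # label on EKS spot-backed managed node groups
      tolerations:
        - key: "spot"                        # custom taint applied to the spot node group
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: report
          image: example.com/report:1.0      # placeholder image
```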
KEDA Implementation
Our Kubernetes pods already performed Horizontal Pod Autoscaling (HPA) based on CPU and memory utilization, but we also needed to scale pods on the volume of requests coming in for a deployment, similar to the native AWS auto-scaling behavior driven by “requests per target group.” KEDA (Kubernetes Event-Driven Autoscaling) filled that gap:
- Event-Driven Scaling: KEDA automatically scales Kubernetes pods based on the number of events waiting in event sources such as message queues (e.g., AWS SQS, Kafka).
- Scaling to Zero: KEDA can scale your pods all the way down to zero when there are no incoming events. This capability is crucial for serverless and event-driven architectures, where resources should only be allocated when there is actual work to be done. Scaling to zero saves costs and resources during idle periods.
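Concretely, a KEDA ScaledObject wired to an SQS queue looks roughly like this; the queue URL, region, and thresholds are placeholders, and authentication configuration is omitted for brevity:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker                     # hypothetical Deployment that consumes the queue
  minReplicaCount: 0                 # scale all the way to zero when the queue is empty
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/example-queue   # placeholder queue
        queueLength: "5"             # target messages per replica
        awsRegion: us-east-1
```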
Kubernetes Kustomize
Kubernetes Kustomize allows you to define and manage variations of Kubernetes configurations for different environments, such as development, staging, and production. This flexibility ensures that resources are provisioned appropriately for each environment, avoiding over-provisioning and unnecessary costs.
For example, you can use Kustomize to create customized Kubernetes manifests for your dev, test, and prod environments. The dev manifests may specify fewer compute resources, while the prod ones provision more to handle production workloads.
Kustomize lets you reuse common configuration components across environments while varying environment-specific settings like replica counts, memory/CPU limits, etc. You don’t have to maintain separate YAML files for each environment.
Instead, you can create a common base and overlay customizations for each environment. Kustomize will then generate the final manifests for you by merging the base and overlays.
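For instance, a prod overlay along these lines keeps one shared base and patches only what differs; the file layout, app name, and values are illustrative:

```yaml
# overlays/prod/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                 # shared Deployment/Service manifests
patches:
  - target:
      kind: Deployment
      name: api                # hypothetical app defined in the base
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 10              # prod runs more replicas than dev
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 1Gi
```

Running kubectl apply -k overlays/prod (or kustomize build overlays/prod) merges the base with the overlay and emits the final manifests.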
This approach streamlines configuration management and ensures consistency across environments. By tailoring configurations to the specific needs of each environment, Kustomize helps optimize resource utilization and ultimately contributes to cost management in Kubernetes deployments.
ArgoCD
ArgoCD plays a significant role in cost optimization within Kubernetes environments. By automating the deployment and management of applications, it reduces reliance on manual intervention, minimizing human error and the cost of troubleshooting and fixing it. ArgoCD also facilitates continuous delivery, keeping deployments efficient, reliable, and consistent across environments. This automation saves time and optimizes resource utilization by enabling faster rollouts and updates. In short, ArgoCD streamlines the deployment process, reduces operational overhead, and improves overall resource efficiency.
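A minimal ArgoCD Application that continuously syncs a Kustomize overlay from Git might look like this; the repository URL, paths, and namespaces are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/infra.git   # placeholder repository
    targetRevision: main
    path: overlays/prod                                  # the Kustomize overlay to deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: prod
  syncPolicy:
    automated:
      prune: true       # remove resources that were deleted from Git
      selfHeal: true    # revert manual drift back to the Git state
```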
Overall, Terraform and the Kubernetes tooling around it have been a huge win in terms of flexibility, productivity, and cost management. The key takeaway for readers is to invest time in learning Infrastructure as Code (IaC) practices and tools like Terraform. The long-term benefits are well worth it, especially as complexity and scale increase.