Autoscaling Kubernetes Deployments with Horizontal Pod Autoscaler
Published on July 23, 2022
When you run a web application, a concern that many people underestimate is whether the available resources are enough to serve the expected load. Recently, a government website launched in my country and became unresponsive due to the traffic spike it received. The performance aspect of a web application must be considered carefully.
We created a Raspberry Pi cluster in the previous blog posts. You are expected to have that set up, or to have a Kubernetes cluster configured from a cloud provider. I will explain the content here based on the Raspberry Pi cluster setup.
In Kubernetes, a deployment can be configured for autoscaling using a
HorizontalPodAutoscaler (HPA). It takes control of the number of replicas running, given the resource constraints defined for the deployment.
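Note that for a CPU-percentage target to be meaningful, the pods in the deployment must declare CPU requests, since the HPA computes utilization relative to the requested amount. A minimal sketch of the relevant part of a deployment's pod spec (the container name, image, and request values here are illustrative assumptions, not from the actual dev-diary deployment):

```yaml
# Fragment of a Deployment's pod template. The HPA's CPU
# target percentage is computed against resources.requests.cpu.
spec:
  containers:
  - name: web             # hypothetical container name
    image: nginx:alpine   # hypothetical image
    resources:
      requests:
        cpu: 100m         # HPA utilization is measured against this
        memory: 64Mi
      limits:
        cpu: 250m
        memory: 128Mi
```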
Setting up metrics server
The first thing you need is to set up the metrics server, which provides container resource metrics for Kubernetes' built-in autoscaling pipelines. Go to the kubernetes-sigs/metrics-server releases page and download
component.yaml from the latest release.
You need to add the following two lines to the spec section, which will allow the metrics server to work with self-signed certificates:
spec:
  hostNetwork: true
  containers:
  - args:
    ...
    ...
    - --kubelet-insecure-tls=true
Now run the following command to apply the resources to the cluster:
kubectl apply -f component.yaml
This will take a few minutes to complete. You can view the progress in the Kubernetes Dashboard we set up earlier.
Testing the Metrics API
First, we can check the resource utilization of our actual nodes, i.e., the Raspberry Pi boards.
Run the following command:
kubectl top nodes
You might get output similar to the below:
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   413m         10%    903Mi           24%
node-1   206m         5%     648Mi           17%
node-2   207m         5%     536Mi           14%
node-3   216m         5%     708Mi           19%
According to the output above, we are good with the existing resources. :)
Now you can check the resource usage of the pods (CPU and memory, reported as absolute amounts). Run the following command to see them:
kubectl top pods
NAME                        CPU(cores)   MEMORY(bytes)
dev-diary-f49dcbd5c-8hsxb   13m          6Mi
dev-diary-f49dcbd5c-c7j9z   17m          5Mi
dev-diary-f49dcbd5c-tfh5b   17m          6Mi
Starting the Horizontal Pod Autoscaler for a deployment
kubectl autoscale deployment dev-diary --cpu-percent=10 --min=3 --max=10
We have a deployment called dev-diary with 3 replicas, so I set the HPA minimum to 3 as well, and the maximum to 10 pods. The target CPU percentage is set to 10% so that the HPA triggers easily.
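If you prefer a declarative setup, the same autoscaler can be expressed as a manifest and applied with kubectl apply -f. A sketch using the autoscaling/v2 API (assuming the deployment is named dev-diary, as above):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dev-diary
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dev-diary
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10   # the 10% target used in this post
```

Keeping the HPA as a manifest lets you version it alongside the deployment instead of recreating it imperatively.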
To test the autoscaler, we next need to generate some load. I will be using hey, a tiny program that sends load to a web application.
Now run the following watch command to continuously check the HPA for the dev-diary deployment:
kubectl get hpa dev-diary --watch
In another terminal, run the following command directed at your application to generate some load:
hey -c 1000 -n 2000 https://map-app.com
The above generates 2,000 requests from 1,000 concurrent workers.
Now you might get an output similar to the below:
NAME        REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
dev-diary   Deployment/dev-diary   1%/10%    3         10        3          16s
dev-diary   Deployment/dev-diary   11%/10%   3         10        3          45s
dev-diary   Deployment/dev-diary   16%/10%   3         10        4          60s
dev-diary   Deployment/dev-diary   4%/10%    3         10        5          76s
dev-diary   Deployment/dev-diary   0%/10%    3         10        5          91s
dev-diary   Deployment/dev-diary   0%/10%    3         10        3          6m32s
We can observe that when the load pushes utilization past the target percentage, new pods are spun up by the autoscaler to achieve the desired state we declared. Now you might be thinking that this is all we have to consider, and that we can simply increase the maximum number of pods as we get more load. But pods run on our nodes, and all of their resources will be fully utilized at some point. At that point, you need to scale the cluster itself out by adding new Raspberry Pi nodes.
Hope you enjoy the post! Thanks for reading!
If you like it, share it!