
Amila Senadheera

Tech enthusiast

Autoscaling Kubernetes Deployments with Horizontal Pod Autoscaler


Published on July 23, 2022

When you have a web application running, a major concern that most people underestimate is whether the resources are enough to serve the expected load. Recently, a government website launched in my country and became unresponsive due to the traffic spike it got. The performance aspect of a web application must be considered carefully.

We created a Raspberry Pi cluster in the previous blog posts. It is expected that you have that setup, or a Kubernetes cluster configured with a cloud provider. I will explain the content here based on the Raspberry Pi cluster setup.

In Kubernetes, a deployment can be configured for autoscaling using a HorizontalPodAutoscaler (HPA). It takes control of the number of replicas running, given the resource constraints set for the deployment.

Setting up metrics server

The first thing you need to do is set up the metrics server, which provides container resource metrics for Kubernetes' built-in autoscaling pipelines. Go to the kubernetes-sigs/metrics-server releases page and download the component.yaml under the latest release.

You need to add the following two lines to the spec section of the metrics-server deployment, which will allow the metrics server to work with self-signed certificates:

spec:
    hostNetwork: true
    containers:
    - args:
    ...
    ...
    - --kubelet-insecure-tls=true

Now run the following command to apply the resources to the cluster:

kubectl apply -f component.yaml

This will take a few minutes to complete. You can view the progress in the Kubernetes Dashboard we set up earlier.

Testing the Metrics API

First, we can check the resource utilization of our actual nodes, aka the Raspberry Pi boards.

Run the following command:

kubectl top nodes

You might get output similar to the below:

NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
master   413m         10%    903Mi           24%       
node-1   206m         5%     648Mi           17%       
node-2   207m         5%     536Mi           14%       
node-3   216m         5%     708Mi           19% 

According to the output above, we are good with the existing resources. :)
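If you want to spot a node that is nearing saturation, the kubectl top nodes output can be filtered with awk. Below is a minimal sketch using the sample output above; the 8% threshold is arbitrarily low so that the sample data produces a match, and in real usage you would pipe kubectl top nodes straight into awk instead of the heredoc:

```shell
# Print nodes whose CPU% column (field 3) exceeds a threshold (here 8%).
# Real usage: kubectl top nodes | awk '...'
awk 'NR > 1 { cpu = $3; sub(/%/, "", cpu); if (cpu + 0 > 8) print $1 " is above 8% CPU (" $3 ")" }' <<'EOF'
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   413m         10%    903Mi           24%
node-1   206m         5%     648Mi           17%
node-2   207m         5%     536Mi           14%
node-3   216m         5%     708Mi           19%
EOF
# prints: master is above 8% CPU (10%)
```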

Now you can check the resource usage of the pods (CPU in millicores and memory in mebibytes). Run the following command to see them:

kubectl top pods
NAME                        CPU(cores)   MEMORY(bytes)   
dev-diary-f49dcbd5c-8hsxb   13m          6Mi             
dev-diary-f49dcbd5c-c7j9z   17m          5Mi             
dev-diary-f49dcbd5c-tfh5b   17m          6Mi

Start Horizontal Pod Autoscaler for a deployment

Create the HorizontalPodAutoscaler:

kubectl autoscale deployment dev-diary --cpu-percent=10 --min=3 --max=10

We have a deployment called dev-diary with 3 replicas, so I set the min in the HPA to 3 as well, and the max to 10 pods. The target CPU percentage is set to 10% so that the HPA triggers easily.
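The same autoscaler can also be declared as a manifest and applied with kubectl apply, which is easier to keep in version control. A sketch using the autoscaling/v2 API, with the same min, max, and target values as above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dev-diary
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dev-diary
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10
```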

To test the autoscaler, we next need to generate some load. I will be using hey, a tiny program that sends load to a web application.

Now run the following watch command to continuously check the HPA for dev-diary deployment:

kubectl get hpa dev-diary --watch 

In another terminal, run the following command directed at your application to generate some load:

hey -c 1000 -n 2000 https://map-app.com

The above generates 2,000 requests from 1,000 concurrent workers.

Now you might get an output similar to the below:

NAME        REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
dev-diary   Deployment/dev-diary   1%/10%    3         10        3          16s
dev-diary   Deployment/dev-diary   11%/10%   3         10        3          45s
dev-diary   Deployment/dev-diary   16%/10%   3         10        4          60s
dev-diary   Deployment/dev-diary   4%/10%    3         10        5          76s
dev-diary   Deployment/dev-diary   0%/10%    3         10        5          91s
dev-diary   Deployment/dev-diary   0%/10%    3         10        3          6m32s
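The replica jumps in the output above follow the scaling formula from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). A quick sketch with shell integer arithmetic, using the 11% sample from the watch output:

```shell
current_replicas=3
current_utilization=11   # percent, from the watch output above
target_utilization=10    # percent, the --cpu-percent we configured

# ceil(a / b) with integers is (a + b - 1) / b
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "desired replicas: $desired"
# prints: desired replicas: 4
```

This matches the table: at 11%/10% the HPA scales from 3 to 4 replicas. (The live numbers will not always line up exactly, since metrics collection lags behind the actual load.)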

We can observe that when the load pushes past the target percentage, new pods are spun up by the autoscaler to achieve the desired state we declared. Now you might be thinking that is all we have to consider, and that we can keep increasing the max number of pods as we get more load. But pods run on our nodes, and all of their resources will be used up at some point. At that point, you need to scale your cluster horizontally by adding new Raspberry Pi nodes.

Hope you enjoy the post! Thanks for reading!

If you like it, share it!


© 2022, All Rights Reserved.

Made with ❤ using Gatsby, served to your browser from a home-grown Raspberry Pi cluster