Autoscaling lets you worry less about capacity planning and keeps your services up during load peaks, while you pay only for the resources that are needed at any given moment.
With autoscaling configured, GKE automatically adds nodes to your cluster when you create new Pods that don’t fit on the existing nodes; conversely, if a node in your cluster is underutilized and its Pods can run on other nodes, GKE can delete that node.
Keep in mind that when resources are deleted or moved in the course of autoscaling your cluster, your services can experience some disruption. For example, if your service consists of a controller with a single replica, that replica’s Pod might be restarted on a different node if its current node is deleted. Before enabling autoscaling, ensure that your services can tolerate potential disruption or that they are designed and configured so that downscaling does not disrupt Pods that cannot be interrupted.
All new Managed GKE clusters come with cluster autoscaling enabled. But a few things have to be configured before your workload scales automatically.
Let us know your maximum node count
By default we won’t scale your cluster to an unlimited number of nodes; this guards you against unexpected costs. We have defined a minimum of 3 nodes and a configurable maximum. Let us know your preferred maximum node count and we will set it for your cluster.
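For reference, on a GKE cluster you administer yourself, the same limits would be set with `gcloud`. The cluster name, zone, and maximum of 10 nodes below are placeholders:

```shell
# Enable cluster autoscaling on the default node pool with a
# minimum of 3 and a maximum of 10 nodes.
# CLUSTER_NAME and --zone are placeholders; adjust to your environment.
gcloud container clusters update CLUSTER_NAME \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=10 \
  --zone=europe-west1-b
```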
Set CPU requests on your Pods

The cluster autoscaler uses the CPU requests you define on your containers to determine how much capacity each node has left. Without CPU requests the cluster autoscaler cannot function. Setting requests is also good practice regardless of whether you use the autoscaler.
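A minimal sketch of what CPU requests look like in a Deployment manifest; the name, image, and request values are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                   # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25   # illustrative image
          resources:
            requests:
              cpu: "250m"     # the scheduler and cluster autoscaler plan capacity from this
              memory: "256Mi"
```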
Set up a Horizontal Pod Autoscaler

To scale your Pods with the incoming load, you can set up a Horizontal Pod Autoscaler (HPA) that scales the Pods based on CPU utilization. As soon as your nodes are full, this in turn triggers the cluster autoscaler to add more nodes. The Kubernetes documentation has a great walkthrough to help you set up an HPA.
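As a sketch, an HPA targeting 70% average CPU utilization could look like this; the target Deployment name and the replica bounds are placeholders you would adapt to your workload:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                   # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # must match your Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU use exceeds 70% of requests
```

Note that utilization is measured relative to the CPU requests set on the Pods, which is another reason the requests from the previous section are required.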