A Guide to Resource Quotas and Horizontal Pod Autoscaling
Let's explore how resource quotas and horizontal pod autoscaling (HPA) are implemented in Kubernetes, along with their key concepts, configurations, and interactions.
Resource Quotas
Resource quotas in Kubernetes manage and limit resource usage within a namespace, ensuring fair resource allocation among different namespaces and preventing resource exhaustion. They are defined using a YAML configuration file and applied to a specific namespace.
Key Components of Resource Quotas
Pods: Limits the number of pods that can be created in a namespace.
CPU and Memory Requests: Caps the combined CPU and memory requests of all pods in the namespace.
CPU and Memory Limits: Caps the combined CPU and memory limits of all pods in the namespace.
Persistent Volume Claims (PVCs): Limits the number of PVCs and the total storage they request.
Example Resource Quota Configuration
apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: example-namespace
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    persistentvolumeclaims: "5"
    requests.storage: "100Gi"
In this example:
The namespace example-namespace can have up to 10 pods.
The total CPU requests can be up to 4 CPUs, and memory requests can be up to 8Gi.
The total CPU limits can be up to 8 CPUs, and memory limits can be up to 16Gi.
Up to 5 PVCs can be created, and the total storage requested can be up to 100Gi.
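To make the enforcement concrete, here is a minimal Python sketch of the check a quota implies: a new pod is admitted only if current usage plus the pod's own requests stays at or under each hard value. The function names (parse_cpu, parse_memory, fits_quota) and the usage figures are hypothetical illustrations, not Kubernetes APIs, and the parser handles only the few quantity suffixes used in this guide.

```python
from fractions import Fraction

# Binary memory suffixes used in this guide; real Kubernetes quantities support more.
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity: str) -> Fraction:
    """Convert a CPU quantity such as "4" or "500m" to cores."""
    if quantity.endswith("m"):
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "8Gi" to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def fits_quota(hard: dict, used: dict, pod: dict) -> bool:
    """Return True if adding `pod` keeps every tracked total within `hard`."""
    checks = [
        (used["pods"] + 1, int(hard["pods"])),
        (parse_cpu(used["requests.cpu"]) + parse_cpu(pod["requests.cpu"]),
         parse_cpu(hard["requests.cpu"])),
        (parse_memory(used["requests.memory"]) + parse_memory(pod["requests.memory"]),
         parse_memory(hard["requests.memory"])),
    ]
    return all(total <= limit for total, limit in checks)


# Hard values copied from example-quota; the usage and pod figures are made up.
hard = {"pods": "10", "requests.cpu": "4", "requests.memory": "8Gi"}
used = {"pods": 7, "requests.cpu": "3500m", "requests.memory": "6Gi"}

small_pod = {"requests.cpu": "500m", "requests.memory": "1Gi"}
large_pod = {"requests.cpu": "1", "requests.memory": "1Gi"}

print(fits_quota(hard, used, small_pod))  # True: 3.5 + 0.5 = 4 CPUs, exactly at the cap
print(fits_quota(hard, used, large_pod))  # False: 3.5 + 1 = 4.5 CPUs exceeds the cap
```

The sketch covers only the pod count and the request totals. The same comparison applies to limits.cpu, limits.memory, and the storage entries.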
Horizontal Pod Autoscaler (HPA)
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a deployment, replica set, or stateful set based on observed metrics like CPU usage, memory usage, or custom metrics. This helps applications handle different loads efficiently.
Key Components of HPA
Target Resource: The deployment, replica set, or stateful set that the HPA will scale.
Metrics: Metrics used to decide scaling actions, such as CPU usage, memory usage, or custom metrics.
Scaling Policy: Defines the minimum and maximum number of replicas and the target metric value.
Example HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
  namespace: example-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
In this example:
The HPA targets a deployment named example-deployment.
It maintains between 1 and 10 replicas.
It scales based on CPU utilization, aiming for an average CPU utilization of 50%.
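The Kubernetes documentation describes the core scaling calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with the result kept between minReplicas and maxReplicas. The sketch below applies that ratio to the example above. The function name desired_replicas is a hypothetical illustration, and the sketch deliberately ignores the tolerance band, stabilization window, and handling of unready pods that the real controller applies.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Apply the documented HPA ratio, then clamp to the configured bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Bounds and target taken from example-hpa (1 to 10 replicas, 50% CPU).
print(desired_replicas(4, 90, 50, 1, 10))   # 8  (ceil(4 * 90 / 50) = ceil(7.2))
print(desired_replicas(4, 20, 50, 1, 10))   # 2  (ceil(4 * 20 / 50) = ceil(1.6))
print(desired_replicas(8, 100, 50, 1, 10))  # 10 (16 is clamped to maxReplicas)
```

For a Utilization target, the percentage is measured against the pods' CPU requests, so the target deployment needs CPU requests set for this metric to work.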
Interaction Between Resource Quotas and HPA
When using resource quotas and HPA together, it's important to ensure that the resource quota limits are compatible with the scaling requirements of the HPA. Here are some considerations:
Quota Limits and Scaling: Ensure that the resource quota limits are high enough to accommodate the maximum number of replicas that the HPA may scale to. If the quota is too restrictive, the HPA may not be able to scale up as needed.
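Resource Requests and Limits: Set resource requests and limits on every container. When a quota tracks a value such as requests.cpu or limits.memory, pods that omit it are rejected at creation (unless a LimitRange supplies defaults), and the HPA computes CPU utilization as a percentage of the pods' CPU requests. The quota is charged against these declared requests and limits, not against actual usage.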
Namespace Constraints: Resource quotas apply at the namespace level, so if multiple HPA-enabled deployments exist within the same namespace, their combined requests and limits at full scale must not exceed the quota.
Preventing Resource Starvation: A properly sized quota stops the workloads in one namespace from consuming the whole cluster, which keeps allocation fair across namespaces. Within a namespace, all workloads draw on the same quota, so an aggressively scaling deployment can still crowd out its neighbors.
Practical Example
Imagine you have a namespace named production with a resource quota and a deployment with an HPA:
Resource Quota for the production Namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    pods: "20"
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    persistentvolumeclaims: "10"
    requests.storage: "200Gi"
HPA for a Deployment in the production Namespace
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
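Whether these two manifests are compatible depends on what each webapp pod requests, which the manifests do not state. The Python sketch below checks the quota against the HPA's maxReplicas under assumed per-pod values (500m CPU and 1Gi memory requested, 1 CPU and 2Gi memory as limits). Those figures and the helper names are hypothetical, and the sketch assumes webapp is the only workload in the namespace. CPU is expressed in millicores and memory in MiB to keep the arithmetic in integers.

```python
# Hard values from production-quota, in millicores and MiB.
QUOTA = {
    "pods": 20,
    "requests.cpu": 10_000,
    "requests.memory": 20 * 1024,
    "limits.cpu": 20_000,
    "limits.memory": 40 * 1024,
}

# Assumed per-pod resources for webapp; the manifests above do not specify them.
PER_POD = {
    "pods": 1,
    "requests.cpu": 500,
    "requests.memory": 1024,
    "limits.cpu": 1_000,
    "limits.memory": 2 * 1024,
}


def max_replicas_within_quota(quota: dict, per_pod: dict) -> int:
    """Largest replica count whose totals fit every quota entry."""
    return min(quota[key] // per_pod[key] for key in quota)


def quota_shortfalls(quota: dict, per_pod: dict, replicas: int) -> dict:
    """Entries that `replicas` pods would exceed, mapped to the overshoot."""
    return {
        key: per_pod[key] * replicas - quota[key]
        for key in quota
        if per_pod[key] * replicas > quota[key]
    }


hpa_max = 15  # maxReplicas from webapp-hpa

print(max_replicas_within_quota(QUOTA, PER_POD))  # 20
print(quota_shortfalls(QUOTA, PER_POD, hpa_max))  # {} (15 replicas fit)

# Doubling the assumed CPU request shows the failure mode.
heavier = {**PER_POD, "requests.cpu": 1_000}
print(max_replicas_within_quota(QUOTA, heavier))  # 10
print(quota_shortfalls(QUOTA, heavier, hpa_max))  # {'requests.cpu': 5000}
```

Under the first set of assumptions the quota leaves room for all 15 replicas. With the heavier request, the HPA could still ask for 15 replicas, but pods beyond the tenth would be rejected by the quota, so either the quota or maxReplicas would need adjusting.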
Summary
By using resource quotas and horizontal pod autoscaling effectively, Kubernetes ensures that applications can scale dynamically to meet demand while respecting resource limits set at the namespace level. This combination helps maintain a balanced and efficient use of cluster resources, preventing any single application from overusing resources and ensuring fair distribution among all applications.