Version: 1.0
Last Updated: April 2026
Target Audience: DevOps Engineers, SREs, Platform Engineers
- Overview
- Horizontal Pod Autoscaling (HPA)
- Vertical Pod Autoscaling (VPA)
- Load Balancer Integration
- GPU-Aware Scaling
- Best Practices
- Troubleshooting
ThemisDB supports multiple auto-scaling strategies to optimize resource utilization and maintain performance under varying loads.
| Strategy | Use Case | Pros | Cons |
|---|---|---|---|
| HPA | Variable request load | Handles traffic spikes, cost-effective | Slower for GPU workloads |
| VPA | Optimize resource allocation | Right-sizes pods automatically | Requires pod restart |
| Manual | Predictable workloads | Full control, no overhead | Requires manual management |
| Scheduled | Known traffic patterns | Proactive scaling | Less flexible |
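The Scheduled strategy from the table can be implemented with a CronJob that patches the HPA floor before a known traffic peak. A minimal sketch; the schedule, namespace, and `hpa-patcher` ServiceAccount are illustrative assumptions (the ServiceAccount needs RBAC permission to patch HorizontalPodAutoscalers):

```yaml
# Raise the HPA floor ahead of a known weekday peak (illustrative)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: themisdb-scale-up
spec:
  schedule: "0 8 * * 1-5"  # weekdays, one hour before the expected peak
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-patcher  # assumed SA with patch rights on HPAs
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - patch
                - hpa
                - themisdb-hpa
                - --patch
                - '{"spec": {"minReplicas": 6}}'
```

A second CronJob with a later schedule can patch `minReplicas` back down after the peak window passes.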
- Kubernetes 1.23+
- Metrics Server installed
- Prometheus (for custom metrics)
- NVIDIA Device Plugin (for GPU workloads)
Install Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Deploy ThemisDB with HPA (Helm):
helm install themisdb ./helm/themisdb \
--set autoscaling.enabled=true \
--set autoscaling.minReplicas=2 \
--set autoscaling.maxReplicas=10 \
--set autoscaling.targetCPUUtilizationPercentage=70
Deploy ThemisDB with HPA (kubectl):
# Apply deployment
kubectl apply -f deploy/kubernetes/examples/themisdb-cluster.yaml
# Apply HPA
kubectl apply -f deploy/kubernetes/examples/hpa-basic.yaml
# hpa-basic.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: themisdb-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: themisdb
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # CPU-based scaling
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Memory-based scaling
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 min
      policies:
        - type: Percent
          value: 50                    # Remove at most 50% of pods
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0    # Immediate
      policies:
        - type: Percent
          value: 100                   # Double pods
          periodSeconds: 15
Prerequisites:
- Prometheus Adapter installed
- Prometheus scraping ThemisDB metrics
Install Prometheus Adapter:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--set prometheus.url=http://prometheus-server.monitoring.svc
Configure Custom Metrics HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: themisdb-custom-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: themisdb
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Scale based on requests per second
    - type: Pods
      pods:
        metric:
          name: themisdb_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
    # Scale based on queue depth
    - type: Pods
      pods:
        metric:
          name: themisdb_queue_depth
        target:
          type: AverageValue
          averageValue: "50"
    # Scale based on P95 latency
    - type: Pods
      pods:
        metric:
          name: themisdb_p95_latency_milliseconds
        target:
          type: AverageValue
          averageValue: "100"  # Scale if P95 > 100ms
# Check HPA status
kubectl get hpa themisdb-hpa
# Watch HPA in real-time
kubectl get hpa themisdb-hpa --watch
# Detailed HPA information
kubectl describe hpa themisdb-hpa
# View HPA events
kubectl get events --field-selector involvedObject.name=themisdb-hpa
Expected Output:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
themisdb-hpa Deployment/themisdb 45%/70%, 60%/80% 2 10 3
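The Pods-type metrics used above (such as `themisdb_requests_per_second`) only resolve if the Prometheus Adapter maps ThemisDB's raw Prometheus series onto per-pod custom metrics. A minimal rules sketch for the adapter's Helm values; the source series name `themisdb_requests_total` and its labels are assumptions to adjust to the metrics ThemisDB actually exposes:

```yaml
# prometheus-adapter values.yaml fragment (series name is an assumption)
rules:
  custom:
    - seriesQuery: 'themisdb_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^themisdb_requests_total$"
        as: "themisdb_requests_per_second"
      metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
```

After applying, `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1` should list the renamed metric.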
Install VPA:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Deploy VPA for ThemisDB (Helm):
helm install themisdb ./helm/themisdb \
--set vpa.enabled=true \
--set vpa.updateMode=Auto
Deploy VPA for ThemisDB (kubectl):
kubectl apply -f deploy/kubernetes/examples/vpa.yaml
# vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: themisdb-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: themisdb
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
      - containerName: themisdb
        minAllowed:
          cpu: 500m
          memory: 1Gi
        maxAllowed:
          cpu: 8
          memory: 32Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
| Mode | Description | When to Use |
|---|---|---|
| Off | Only recommendations, no updates | Testing VPA recommendations |
| Initial | Apply recommendations on pod creation only | Conservative approach |
| Recreate | Evict and recreate pods with new resources | Acceptable downtime |
| Auto | Like Recreate, but respects PDBs | Production (with PDB) |
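Before enabling Auto in production, it is common to deploy the same VPA in Off mode first and review its recommendations without any pod disruption. A minimal sketch (the `themisdb-vpa-dryrun` name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: themisdb-vpa-dryrun
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: themisdb
  updatePolicy:
    updateMode: "Off"  # recommendations only, no pod evictions
```

Once the recommendations look sane, switch `updateMode` to `"Auto"` (with a PodDisruptionBudget in place).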
# View VPA recommendations
kubectl describe vpa themisdb-vpa
# Get recommendations in JSON
kubectl get vpa themisdb-vpa -o jsonpath='{.status.recommendation}'
Example Output:
status:
  recommendation:
    containerRecommendations:
      - containerName: themisdb
        lowerBound:
          cpu: 1
          memory: 2Gi
        target:
          cpu: 2
          memory: 4Gi
        upperBound:
          cpu: 4
          memory: 8Gi
Install NGINX Ingress Controller:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install nginx-ingress ingress-nginx/ingress-nginx \
--set controller.metrics.enabled=true \
--set controller.podAnnotations."prometheus\.io/scrape"=true
Deploy with Load Balancer:
kubectl apply -f deploy/kubernetes/examples/load-balancer.yaml
Configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: themisdb-ingress
  annotations:
    # Load balancing algorithm
    nginx.ingress.kubernetes.io/load-balance: "least_conn"
    # Session affinity
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "themisdb-session"
    # Rate limiting
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "50"
spec:
  ingressClassName: nginx
  rules:
    - host: themisdb.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: themisdb
                port:
                  number: 8080
AWS Network Load Balancer:
apiVersion: v1
kind: Service
metadata:
  name: themisdb-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: themisdb
GCP Load Balancer:
apiVersion: v1
kind: Service
metadata:
  name: themisdb-lb
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  loadBalancerIP: 10.0.0.100
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: themisdb
- GPUs are expensive and scarce
- GPU pod startup time is significant (1-2 minutes)
- GPU memory is fixed per device
- GPU workloads benefit from batching
Use a combination of:
- Request queue monitoring - Scale before saturation
- Conservative scale-down - Avoid thrashing
- Pod Disruption Budgets - Maintain availability
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: themisdb-gpu-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: themisdb-gpu
  minReplicas: 1
  maxReplicas: 5  # Limited by GPU availability
  metrics:
    # GPU utilization
    - type: Pods
      pods:
        metric:
          name: themisdb_gpu_utilization_percent
        target:
          type: AverageValue
          averageValue: "85"
    # Request queue depth (leading indicator)
    - type: Pods
      pods:
        metric:
          name: themisdb_request_queue_depth
        target:
          type: AverageValue
          averageValue: "20"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # 10 min (conservative)
      policies:
        - type: Pods
          value: 1            # Remove only 1 GPU pod at a time
          periodSeconds: 120
    scaleUp:
      stabilizationWindowSeconds: 30   # Quick scale up
      policies:
        - type: Pods
          value: 1            # Add 1 GPU pod at a time
          periodSeconds: 30
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: themisdb-pdb
spec:
  minAvailable: 1  # Keep at least 1 pod running
  selector:
    matchLabels:
      app: themisdb
GKE Autopilot / EKS with Karpenter:
# Karpenter Provisioner for GPU nodes
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: gpu-provisioner
spec:
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["p3.2xlarge", "p3.8xlarge"]  # AWS GPU instances
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
  limits:
    resources:
      nvidia.com/gpu: 32  # Max 32 GPUs
  ttlSecondsAfterEmpty: 600  # Wait 10 min before removing empty nodes
- Start Conservative
  - Begin with higher min replicas
  - Use longer stabilization windows
  - Monitor for a week before adjusting
- Set Appropriate Thresholds
  - CPU: 60-70% for stateless apps
  - Memory: 70-80% (avoid OOM)
  - Custom metrics: based on SLA targets
- Use Multiple Metrics
  - Combine CPU/memory with custom metrics
  - Use leading indicators (queue depth) for faster response
- Implement Pod Disruption Budgets
  - Prevent too many pods from being evicted
  - Maintain service availability during scaling
Scale-Up:
- Be aggressive (quick response to increased load)
- Use short stabilization windows (0-30s)
- Allow large percentage increases (100%+)
Scale-Down:
- Be conservative (avoid thrashing)
- Use long stabilization windows (5-10 min)
- Limit rate of decrease (25-50%)
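Taken together, these guidelines map directly onto an HPA `behavior` stanza; a sketch (the specific windows and percentages are illustrative starting points, not tuned values):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0      # react immediately to load
    policies:
      - type: Percent
        value: 100                     # allow doubling
        periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300    # require 5 min of calm before shrinking
    policies:
      - type: Percent
        value: 25                      # shed at most 25% of pods per minute
        periodSeconds: 60
```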
resources:
  requests:
    cpu: "1"       # HPA uses this for % calculation
    memory: "2Gi"
  limits:
    cpu: "2"       # Allow bursting
    memory: "4Gi"  # Prevent OOM
Guidelines:
- Set requests based on typical usage
- Set limits 1.5-2x requests
- VPA will adjust these over time
Check metrics availability:
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl top nodes
kubectl top pods -n themisdb
Check HPA status:
kubectl describe hpa themisdb-hpa
# Look for errors in conditions:
# - AbleToScale: False - HPA can't scale
# - ScalingActive: False - Metrics not available
# - ScalingLimited: True - At min/max replicas
Common Issues:
- Metrics Server not installed
  kubectl get deployment metrics-server -n kube-system
- Invalid resource requests
  - HPA requires CPU/memory requests to be set
  - Check the pod spec for resource requests
- Custom metrics not available
  kubectl get apiservice v1beta1.custom.metrics.k8s.io
  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
Symptoms:
- Frequent scale up/down cycles
- Pods constantly being created/terminated
Solutions:
- Increase stabilization windows:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # Increase to 10 min
- Adjust thresholds:
  - Lower the scale-up threshold
  - Raise the scale-down threshold
  - The gap between them creates hysteresis
- Use a min replicas buffer:
  - Set minReplicas higher than typical load
  - Reduces the frequency of scaling events
Check GPU availability:
kubectl describe nodes | grep -A 5 "nvidia.com/gpu"
Check resource requests:
resources:
  limits:
    nvidia.com/gpu: 1
Check node taints:
kubectl get nodes -o json | jq '.items[].spec.taints'
# Current replicas
kube_deployment_status_replicas{deployment="themisdb"}
# Desired replicas (HPA target)
kube_hpa_status_desired_replicas{hpa="themisdb-hpa"}
# Scaling events
rate(kube_hpa_status_condition{condition="ScalingLimited"}[5m])
# Resource utilization vs target
kube_hpa_status_current_metrics_average_utilization /
kube_hpa_spec_target_metric
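A Prometheus alert on the ScalingLimited condition catches an HPA that has been pinned at maxReplicas; a sketch using the kube-state-metrics series referenced above (the `status` label and the 15-minute window are assumptions to adjust for your kube-state-metrics version):

```yaml
groups:
  - name: themisdb-autoscaling
    rules:
      - alert: ThemisDBHPAScalingLimited
        # HPA stuck at a scaling limit for 15 minutes
        expr: kube_hpa_status_condition{hpa="themisdb-hpa", condition="ScalingLimited", status="true"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "themisdb-hpa has been scaling-limited for 15m (check maxReplicas headroom)"
```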
Create dashboard with panels:
- Current vs Desired Replicas
- Resource Utilization (CPU/Memory)
- Custom Metrics Trends
- Scaling Events Timeline
- Cost Impact (replicas × cost per pod)
# 1. Deploy ThemisDB
helm install themisdb ./helm/themisdb \
--set autoscaling.enabled=true \
--set autoscaling.minReplicas=2 \
--set autoscaling.maxReplicas=10
# 2. Verify deployment
kubectl get deployment themisdb
kubectl get hpa themisdb
# 3. Generate load to test scaling
kubectl run -it --rm load-generator --image=busybox --restart=Never -- /bin/sh
# Inside pod:
while true; do wget -q -O- http://themisdb:8080/v1/inference; done
# 4. Watch scaling in action
kubectl get hpa themisdb --watch
# 5. Check scaling events
kubectl get events --sort-by='.lastTimestamp' | grep themisdb-hpa
Document Version: 1.0
Last Updated: April 2026
Next Review: April 2026