Have you ever wondered how the HPA scales down?
Here are some additional details on how the Horizontal Pod Autoscaler (HPA) handles pod terminations when scaling down:
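- The HPA controller gets metric values from the resource metrics API (for CPU/memory), the custom metrics API (for custom metrics on cluster objects), or the external metrics API (for metrics from outside the cluster), and compares them to the target values.
- The controller re-evaluates metrics on a sync period, 15 seconds by default. Scale-up and scale-down decisions are additionally smoothed by a stabilization window, configurable per direction via `behavior.scaleUp.stabilizationWindowSeconds` and `behavior.scaleDown.stabilizationWindowSeconds`.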
- The HPA increments or decrements the replica count on the target resource (Deployment, ReplicaSet, etc) via the scale subresource.
- The controller for that resource handles terminating pods to reach the new replica count. It uses the standard Kubernetes pod termination process.
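- To decide which pods to terminate, the ReplicaSet controller ranks the candidates: pending or unschedulable pods go first, then pods with a lower `controller.kubernetes.io/pod-deletion-cost` annotation, then pods on nodes that host more replicas of the same set, and finally newer pods before older ones.
- The pod's graceful termination period (`terminationGracePeriodSeconds`) defines the maximum time Kubernetes will wait for a pod to exit normally after a SIGTERM. Default is 30s. Time spent in a `preStop` hook counts against the same budget.
- If a container is still running when the grace period expires, the kubelet sends SIGKILL to terminate it forcibly. With the default setting, that happens 30s after termination begins.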
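- The downscale stabilization window acts as a cooldown that restricts how quickly the HPA can scale down. It defaults to 5 minutes: the HPA acts on the highest replica recommendation seen during that window, so a brief dip in load does not remove pods immediately.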
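
To make the scaling decision above more concrete, here is a rough Python sketch of the calculation. It is not the controller's actual code: the names `desired_replicas` and `DownscaleStabilizer` are made up for illustration, the formula and the 0.1 tolerance follow the documented defaults as I understand them, and the stabilizer only models the scale-down side (take the highest recommendation seen within the window).

```python
import math
from collections import deque


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Raw recommendation: ceil(currentReplicas * currentMetric / targetMetric).

    If the ratio is within `tolerance` of 1.0, no change is recommended.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


class DownscaleStabilizer:
    """Hypothetical helper: remembers recent recommendations and returns the
    highest one seen within the window, which delays scale-down."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommendation)

    def stabilize(self, now, recommendation):
        self._history.append((now, recommendation))
        # Drop recommendations that are older than the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        return max(rec for _, rec in self._history)
```

For example, with 10 replicas at half the target utilization the raw recommendation is 5, but the stabilizer keeps returning 10 until the older, higher recommendation has aged out of the 5-minute window.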
So in summary, the HPA adjusts replica counts, the pod controllers gracefully terminate pods to reach desired counts, using configurable delays and grace periods to avoid disruptions.
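
On the application side, the grace period only helps if the process actually reacts to SIGTERM. Below is a minimal Python sketch of that pattern, assuming a simple worker loop. The names `install_sigterm_handler`, `run_worker`, and `process_one_item` are hypothetical, and a real service would also need to drain in-flight requests and close connections before exiting.

```python
import signal
import threading

# Set when Kubernetes (via the kubelet) sends SIGTERM to the container.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Only flag the shutdown here; the main loop does the actual cleanup.
    shutdown_requested.set()


def install_sigterm_handler():
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(process_one_item, poll_interval=1.0):
    """Process items until shutdown is requested, then return the count.

    The current item always finishes before the loop exits, so the work
    should fit comfortably inside the pod's grace period.
    """
    processed = 0
    while not shutdown_requested.is_set():
        process_one_item()
        processed += 1
        # Sleep between items, but wake up at once if SIGTERM arrives.
        shutdown_requested.wait(poll_interval)
    return processed
```

In a real entrypoint you would call `install_sigterm_handler()` once at startup and then `run_worker(...)`. If a single item can take longer than 30s, raising `terminationGracePeriodSeconds` is probably safer than hoping the work finishes before SIGKILL.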
You can refer to this article:
Kubernetes best practices: terminating with grace