Kubernetes/Resources and Limits
Pod CPU Throttling
- https://stackoverflow.com/questions/54099425/pod-cpu-throttling - Pod CPU Throttling
- https://github.com/kubernetes/kubernetes/issues/67577 - CFS quotas can lead to unnecessary throttling · Issue #67577 · kubernetes/kubernetes
- https://github.com/kubernetes/kubernetes/issues/51135#issuecomment-373454012 - Avoid setting CPU limits for Guaranteed pods · Issue #51135 · kubernetes/kubernetes
- https://github.com/libero/reviewer/issues/1023 - Default CPU limit leads to pods getting throttled · Issue #1023 · libero/reviewer
- https://medium.com/omio-engineering/cpu-limits-and-aggressive-throttling-in-kubernetes-c5b20bd8a718 - CPU limits and aggressive throttling in Kubernetes
From briefly perusing the above links and others, there are a few conclusions I've come to:
- CPU limits are more complicated and nuanced than memory limits under the hood
- It seems that there was a CFS (Completely Fair Scheduler) bug in the Linux kernel. These posts describe it in more detail:
- https://engineering.indeedblog.com/blog/2019/12/unthrottled-fixing-cpu-limits-in-the-cloud/ - CPU Throttling - Unthrottled: Fixing CPU Limits in the Cloud
- https://engineering.indeedblog.com/blog/2019/12/cpu-throttling-regression-fix/ - CPU Throttling - Unthrottled: How a Valid Fix Becomes a Regression
This has supposedly been patched in the EKS node AMIs:
- https://github.com/aws/containers-roadmap/issues/175 - Use kernel 4.18 in EKS and ECS Amazon Linux AMIs to solve CFS throttling issues. · Issue #175 · aws/containers-roadmap
- Fix is in Amazon Linux 2 with Kernel 4.14.154
- We're running Kernel 4.14.232+ (see the command below for checking what kernel each node actually reports)
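To confirm what the nodes are actually running, the kernel version each kubelet reports can be listed directly. A quick sketch, assuming kubectl access to the cluster (the fields come from the standard node status, so no extra tooling is needed):

```sh
# Kernel version and OS image reported in each node's status
kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion,OS:.status.nodeInfo.osImage
```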
However, even with this fix, unexpected CPU throttling still seems to be a common issue, e.g.:
- https://github.com/kubernetes/kubernetes/issues/97445 - CPU Throttling on Linux kernel 5.4.0-1029-aws · Issue #97445 · kubernetes/kubernetes
This article led to a Hacker News thread with quite a varied range of opinions on whether CPU limits are worth setting at all, including this interesting one:
<quote>
The core principle most readers miss is that CPU limits are tied to CPU throttling, which is markedly different than CPU time sharing. I would argue that in 99% of cases, you truly do not need or want limits.
limits cause CPU throttling, which is like running your process in a strobe light. If your quota period is 100ms, you might only be able to make progress for 10ms out of every 100ms period, regardless of whether or not there is CPU contention, just because you've exceeded your limit.
requests -> CFS time sharing. This ensures that out of a given period of time, CPU time is scheduled fairly and according to the request as a proportion of total request (it just so happens that the Kube scheduler won't schedule such that sum[requests] > capacity, but theoretically it could because requests are truly relative when it comes to how they are represented in cgroups)
</quote>
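The mechanics in that comment map directly onto cgroup files that can be read from inside a container. A minimal sketch, assuming cgroup v1 and a container with a CPU request of 250m and a CPU limit of 500m (the numbers are illustrative; on cgroup v2 the equivalents are cpu.weight, cpu.max and cpu.stat):

```sh
# Read from inside the container (cgroup v1 paths)
cat /sys/fs/cgroup/cpu/cpu.shares         # request 250m -> 256 shares (250 * 1024 / 1000): relative weight for CFS time sharing
cat /sys/fs/cgroup/cpu/cpu.cfs_period_us  # 100000 (100ms) by default: the quota period
cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us   # limit 500m -> 50000: at most 50ms of CPU time per 100ms period; -1 means no limit
cat /sys/fs/cgroup/cpu/cpu.stat           # nr_periods, nr_throttled, throttled_time: how often the quota was actually hit
```

The same counters are exported by cAdvisor/kubelet as container_cpu_cfs_throttled_periods_total and container_cpu_cfs_throttled_seconds_total, which is usually the easier way to spot throttled workloads across a cluster.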
Detect pods without resources set
```sh
# List containers in the csi-drivers namespace whose resources block is empty;
# the -B1 keeps the preceding "name:" line so you can see which container it is.
# The leading spaces in the patterns must match the YAML indentation of the
# container fields in the `kubectl get -o yaml` output.
kubectl get po -n csi-drivers -o yaml \
  | grep -e '^      resources: {}' -e '^      name:' \
  | grep '^      resources: {}' -B1
```
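A less indentation-sensitive variant of the same check, as a sketch assuming jq is available (the -A flag just widens it to all namespaces):

```sh
# Print namespace/pod: container for every container with an empty resources block
kubectl get pods -A -o json \
  | jq -r '.items[]
      | . as $pod
      | .spec.containers[]
      | select(.resources == {})
      | "\($pod.metadata.namespace)/\($pod.metadata.name): \(.name)"'
```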