Kubernetes Resource Requests and Limits: The Difference Between Crashing and Thriving
Why your pods get OOM-killed, throttled, or evicted — and how to stop it

Most Kubernetes pain that gets blamed on “the cluster being flaky” is actually a
resource-management problem. A pod that vanishes at 3am, a service that goes
sluggish under load for no obvious reason, a node that grinds to a halt — nine
times out of ten the root cause is a requests value someone copied from a
tutorial and a limits value nobody thought about. So let me walk through how
this actually works, because the scheduler’s behaviour is not intuitive until
you’ve been burned by it.
1 Requests are for scheduling, limits are for policing
These two fields look symmetric but do completely different jobs. A request is a reservation: the scheduler uses it to decide which node has room for your pod. It does not cap anything at runtime, although the CPU request also sets the container’s relative share of CPU when the node is contended. A limit is a ceiling enforced by the kernel: exceed it and you get punished, with the punishment depending on the resource.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:                  # a Deployment needs a selector matching the pod labels
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.net/api:1.8.2
          resources:
            requests:
              cpu: "250m"      # reserved at schedule time
              memory: "256Mi"
            limits:
              cpu: "1000m"     # CPU is throttled, not killed
              memory: "512Mi"  # memory over this = OOM-kill

The asymmetry is the part that bites people. CPU is compressible: blow past your CPU limit and the kernel’s CFS scheduler simply throttles you; your process runs slower, but it keeps running. Memory is incompressible: blow past your memory limit and the container is killed outright with an OOMKilled status. There’s no graceful degradation. This is why an over-tight memory limit is far more dangerous than an over-tight CPU limit.
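To confirm that a restart was a memory kill rather than an application crash, the container's last state in the pod status is the place to look. The fragment below is an illustrative sketch of what kubectl get pod api-7d9f -o yaml reports after an OOM-kill. The field names are the standard pod status fields; the restart count is an assumed value, not real output.

```yaml
# Illustrative excerpt of a pod's status after an OOM-kill (values are assumed)
status:
  containerStatuses:
    - name: api
      restartCount: 3          # assumed value for illustration
      lastState:
        terminated:
          reason: OOMKilled    # the container was killed by the kernel OOM killer
          exitCode: 137        # 128 + 9, i.e. terminated by SIGKILL
```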
2 Quality of Service: the tier you didn’t know you were in
Kubernetes silently sorts every pod into one of three QoS classes based purely on how you set requests and limits, and that class decides who dies first when a node runs out of memory.
$ kubectl get pod api-7d9f -o jsonpath='{.status.qosClass}'
Burstable

- Guaranteed: every container sets both CPU and memory requests and limits, and each request equals its limit. Last to be evicted. This is what you want for databases and anything stateful.
- Burstable: at least one container sets a request or limit, but the pod does not meet the Guaranteed criteria. The common middle ground.
- BestEffort: no requests or limits at all. First against the wall when the node is under memory pressure. Never run anything you care about as BestEffort.
When a node’s memory fills, the kubelet evicts BestEffort pods first, then Burstable pods that have exceeded their requests, and only touches Guaranteed pods as a last resort. Setting requests well is therefore a survival strategy, not just a scheduling hint.
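As a sketch of the Guaranteed case, the manifest below sets requests equal to limits for both CPU and memory on its only container. The pod name, image, and sizes are hypothetical, chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: db                                   # hypothetical name
spec:
  containers:
    - name: db
      image: registry.example.net/db:1.0.0   # hypothetical image
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"        # equal to the CPU request
          memory: "2Gi"   # equal to the memory request
```

Querying .status.qosClass on this pod, as in the command above, should report Guaranteed. In a multi-container pod, every container needs matching requests and limits for the class to apply; one container without them drops the whole pod to Burstable.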
3 The CPU-limits controversy
Here is an opinion that will annoy half the people reading: in most clusters you should set CPU requests and skip CPU limits entirely. CFS throttling is brutal — it operates in 100ms accounting periods, and a bursty latency-sensitive service can get throttled even when the node has idle CPU to spare, because it exhausted its quota inside one window. The symptom is maddening: p99 latency spikes, flat CPU graphs, no obvious cause.

# the smoking gun: high throttling despite low average usage
# (cgroup v2 path; on cgroup v1 nodes read /sys/fs/cgroup/cpu/cpu.stat instead)
$ kubectl exec api-7d9f -- cat /sys/fs/cgroup/cpu.stat
nr_periods 41022
nr_throttled 8137        # ~20% of periods throttled
throttled_usec 9931204

If nr_throttled is a meaningful fraction of nr_periods, your CPU limit is
hurting you. Memory limits, on the other hand, you should almost always keep —
they’re your protection against one leaky pod taking down a whole node. The
guidance “always set both limits” is cargo-culting; treat the two resources
differently because the kernel does.
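A minimal sketch of that approach, reusing the api container from the earlier Deployment: keep the CPU request and both memory values, and drop the CPU limit. The numbers are carried over from that example and are not tuned recommendations.

```yaml
# Fragment of a container spec: CPU request only, memory request and limit kept
resources:
  requests:
    cpu: "250m"       # still reserves capacity and sets the CPU share under contention
    memory: "256Mi"
  limits:
    memory: "512Mi"   # kept: contains a leak before it can hurt the node
    # no cpu limit: the container may use idle CPU without CFS quota throttling
```

One trade-off worth stating: without a CPU limit the pod cannot be Guaranteed, so it lands in the Burstable QoS class. A namespace LimitRange that defines a default CPU limit would also add one back to any container that omits it.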
4 Right-sizing without guessing
Don’t pull numbers from the air. Observe real usage first.
$ kubectl top pods --containers
POD        NAME   CPU(cores)   MEMORY(bytes)
api-7d9f   api    180m         310Mi
api-2k1c   api    210m         298Mi

Set the memory request near steady-state usage plus headroom, the memory limit
high enough to absorb spikes (so a transient bump doesn’t trigger an OOM-kill),
and the CPU request to typical load. For anything serious, run the Vertical Pod
Autoscaler in recommendation mode and let it watch for a week before you
commit to values. Also set a LimitRange per namespace so a forgotten manifest
can’t ship a BestEffort pod into production by accident.
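The last two suggestions can be sketched as manifests. The first is a LimitRange that gives every container in a namespace default requests and a default memory limit when its manifest omits them. The second is a VerticalPodAutoscaler with updateMode "Off", which only records recommendations. The namespace, object names, and default sizes are assumptions for illustration, and the second object only works if the Vertical Pod Autoscaler components and CRDs are installed, since they are not part of core Kubernetes.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults   # hypothetical name
  namespace: production      # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no request
        cpu: "100m"
        memory: "128Mi"
      default:               # applied when a container sets no limit
        memory: "256Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-recommender      # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"        # recommend only; never evict or resize pods
```

Once the autoscaler has observed traffic for a while, kubectl describe vpa api-recommender lists the recommended values in the object's status. With the LimitRange in place, a container that ships without a resources block gets the defaults and so is no longer BestEffort.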
5 The verdict
Is fiddling with millicores and mebibytes worth your time? If you run anything beyond a toy cluster, absolutely: this is the single highest-leverage knob in Kubernetes reliability, and it costs nothing but attention. The mental model to keep: requests buy you a seat, limits are the bouncer, CPU overuse gets throttled, memory overuse gets killed, and QoS class decides who survives a bad night. Get those five facts straight and most of your mysterious 3am pages disappear. This is for anyone running real workloads; if you’re just kicking the tyres on minikube, set sensible requests and move on.




