Kubernetes Resource Requests and Limits: The Difference Between Crashing and Thriving

Why your pods get OOM-killed, throttled, or evicted — and how to stop it

Most Kubernetes pain that gets blamed on “the cluster being flaky” is actually a resource-management problem. A pod that vanishes at 3am, a service that goes sluggish under load for no obvious reason, a node that grinds to a halt — nine times out of ten the root cause is a requests value someone copied from a tutorial and a limits value nobody thought about. So let me walk through how this actually works, because the scheduler’s behaviour is not intuitive until you’ve been burned by it.

Requests and limits look symmetric but do completely different jobs. A request is a reservation: the scheduler uses it to decide which node has room for your pod. It does not cap anything at runtime, although a CPU request also sets the container’s relative share of CPU when the node is contended. A limit is a ceiling enforced by the kernel through cgroups: exceed it and you get punished, with the punishment depending on the resource.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:                # required in apps/v1; must match the template labels
    matchLabels:
      app: api             # illustrative label
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.net/api:1.8.2
          resources:
            requests:
              cpu: "250m"      # reserved at schedule time
              memory: "256Mi"
            limits:
              cpu: "1000m"     # CPU is throttled, not killed
              memory: "512Mi"  # memory over this = OOM-kill
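
To make the reservation idea concrete, here is a minimal Python sketch of the fit check. It is a simplification under stated assumptions, not the scheduler’s actual code: the real scheduler also weighs affinity, taints, and scoring. The function name `fits` and the plain-integer units (millicores and MiB) are my own choices for illustration.

```python
def fits(allocatable, reserved, pod_request):
    """Return True if the pod's requests fit in what the node has left to reserve.

    All arguments are dicts of resource name -> integer amount
    (hypothetical units: CPU in millicores, memory in MiB).
    """
    return all(
        reserved.get(res, 0) + amount <= allocatable.get(res, 0)
        for res, amount in pod_request.items()
    )


node = {"cpu": 4000, "memory": 16384}          # allocatable on an imaginary node
pod = {"cpu": 250, "memory": 256}              # the requests from the manifest above

print(fits(node, {"cpu": 1000, "memory": 4096}, pod))  # True
print(fits(node, {"cpu": 3800, "memory": 4096}, pod))  # False: 3800 + 250 > 4000
```

Notice that actual usage never appears in the calculation. A node whose pods have reserved 3800m is full as far as this check is concerned, even if those pods are idle.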

The asymmetry is the part that bites people. CPU is compressible: blow past your CPU limit and the kernel’s CFS scheduler simply throttles you — your process runs slower, but it keeps running. Memory is incompressible: blow past your memory limit and the container is killed outright with an OOMKilled status. There’s no graceful degradation. This is why an over-tight memory limit is far more dangerous than an over-tight CPU limit.

Kubernetes silently sorts every pod into one of three QoS classes based purely on how you set requests and limits, and that class decides who dies first when a node runs out of memory.

$ kubectl get pod api-7d9f -o jsonpath='{.status.qosClass}'
Burstable

  • Guaranteed: every container sets both CPU and memory limits, and its requests equal those limits. Last to be evicted. This is what you want for databases and anything stateful.
  • Burstable: at least one container sets a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria. The common middle ground.
  • BestEffort: no requests or limits on any container. First against the wall when the node is under memory pressure. Never run anything you care about as BestEffort.

When a node’s memory fills, the kubelet evicts BestEffort pods first, then Burstable pods that have exceeded their requests, and only touches Guaranteed pods as a last resort. Setting requests well is therefore a survival strategy, not just a scheduling hint.
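
The classification rules are simple enough to sketch in a few lines of Python. This is my own approximation, not Kubernetes source code: `qos_class` is a hypothetical helper, it only looks at CPU and memory, and it compares quantity strings literally, so it assumes you write equal values the same way ("1000m" and "1" would not match here).

```python
def qos_class(containers):
    """Approximate a pod's QoS class.

    containers: list of dicts shaped like a container's `resources` field,
    e.g. {"requests": {"cpu": "250m"}, "limits": {"memory": "512Mi"}}.
    """
    any_set = False
    guaranteed = True
    for c in containers:
        requests = dict(c.get("requests") or {})
        limits = c.get("limits") or {}
        for res in ("cpu", "memory"):
            # A limit with no request is defaulted to request == limit.
            if res in limits and res not in requests:
                requests[res] = limits[res]
            if res in requests or res in limits:
                any_set = True
            # Guaranteed needs a limit for this resource and a matching request.
            if res not in limits or requests.get(res) != limits[res]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


api = {"requests": {"cpu": "250m", "memory": "256Mi"},
       "limits": {"cpu": "1000m", "memory": "512Mi"}}
print(qos_class([api]))                                            # Burstable
print(qos_class([{"limits": {"cpu": "500m", "memory": "512Mi"}}]))  # Guaranteed
print(qos_class([{}]))                                             # BestEffort
```

One consequence worth seeing in code: a pod with requests but no limits at all comes out as Burstable, never Guaranteed.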

Here is an opinion that will annoy half the people reading: in most clusters you should set CPU requests and skip CPU limits entirely. CFS throttling is brutal. It enforces the quota in accounting periods of 100ms by default, and a bursty latency-sensitive service can get throttled even when the node has idle CPU to spare, because it exhausted its quota inside one window. The symptom is maddening: p99 latency spikes, flat CPU graphs, no obvious cause.
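
If the quota arithmetic feels abstract, a toy model helps. The Python sketch below is a deliberately simplified picture of CFS bandwidth control, not the kernel’s algorithm: it assumes the burst starts exactly on a period boundary, the work splits evenly across threads, and the node has idle cores for all of them. The function name and parameters are hypothetical.

```python
def burst_wall_time_ms(cpu_work_ms, threads, limit_millicores, period_ms=100.0):
    """Wall-clock time for a burst of CPU work under a CFS quota (toy model)."""
    if threads <= 0 or limit_millicores <= 0:
        raise ValueError("threads and limit_millicores must be positive")
    quota_ms = limit_millicores / 1000.0 * period_ms  # CPU time allowed per period
    remaining = float(cpu_work_ms)
    elapsed = 0.0
    while True:
        # CPU time usable in this period: capped by the quota and by what
        # `threads` cores can physically burn in one period.
        budget = min(remaining, quota_ms, threads * period_ms)
        remaining -= budget
        if remaining <= 1e-9:
            return elapsed + budget / threads
        elapsed += period_ms  # quota spent: throttled until the next period


unthrottled = 200 / 4  # 4 threads, 200 ms of CPU work, no limit
throttled = burst_wall_time_ms(200, threads=4, limit_millicores=1000)
print(unthrottled, throttled)  # 50.0 125.0
```

In this model a request that needs 200ms of CPU across four threads finishes in 50ms without a limit, but takes 125ms under a 1000m limit: the threads burn the whole 100ms quota in the first 25ms, then sit idle for 75ms. Averaged over a second, the container still looks like it is using a fraction of a core.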

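# the smoking gun: high throttling despite low average usage
# (cgroup v2 path; on cgroup v1 read /sys/fs/cgroup/cpu/cpu.stat, where the last field is throttled_time)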
$ kubectl exec api-7d9f -- cat /sys/fs/cgroup/cpu.stat
nr_periods 41022
nr_throttled 8137      # ~20% of periods throttled
throttled_usec 9931204

If nr_throttled is a meaningful fraction of nr_periods, your CPU limit is hurting you. Memory limits, on the other hand, you should almost always keep — they’re your protection against one leaky pod taking down a whole node. The guidance “always set both limits” is cargo-culting; treat the two resources differently because the kernel does.
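
To turn that check into a number, you can parse the file and divide. Here is a small Python sketch, assuming the cgroup v2 field names shown above; `throttle_ratio` is a hypothetical helper, and what counts as "meaningful" is left to you.

```python
def throttle_ratio(cpu_stat_text):
    """Fraction of CFS periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        # Keep "name value" pairs; anything after the value is ignored.
        if len(parts) >= 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods recorded yet
    return stats.get("nr_throttled", 0) / periods


sample = """\
nr_periods 41022
nr_throttled 8137
throttled_usec 9931204
"""
print(f"{throttle_ratio(sample):.1%}")  # 19.8%
```

Pipe the output of the `kubectl exec` command into something like this on a schedule and you have a crude throttling alert.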

Don’t pull numbers from the air. Observe real usage first.

$ kubectl top pods --containers
POD        NAME   CPU(cores)   MEMORY(bytes)
api-7d9f   api    180m         310Mi
api-2k1c   api    210m         298Mi

Set the memory request near steady-state usage plus headroom, the memory limit high enough to absorb spikes (so a transient bump doesn’t trigger an OOM-kill), and the CPU request to typical load. For anything serious, run the Vertical Pod Autoscaler in recommendation mode (updateMode: "Off") and let it watch for a week before you commit to values. Also set a LimitRange with default requests and limits in each namespace, so a forgotten manifest can’t ship a BestEffort pod into production by accident.
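
As a worked example of that sizing rule, here is a rough Python sketch that turns observed samples into starting values. The 25% headroom and the 2x limit factor are placeholder assumptions of mine, not recommendations from Kubernetes, and `suggest_resources` is a hypothetical helper. Treat the output as a first guess to refine against real traffic.

```python
import math


def suggest_resources(samples, mem_headroom=0.25, mem_limit_factor=2.0):
    """Suggest starting values from observed usage.

    samples: list of (cpu_millicores, memory_mib) tuples, e.g. from `kubectl top`.
    """
    if not samples:
        raise ValueError("need at least one sample")
    cpus = [cpu for cpu, _ in samples]
    mems = [mem for _, mem in samples]
    peak_mem = max(mems)
    cpu_request = math.ceil(sum(cpus) / len(cpus))            # typical load
    mem_request = math.ceil(peak_mem * (1 + mem_headroom))    # steady state + headroom
    mem_limit = math.ceil(peak_mem * mem_limit_factor)        # room for spikes
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{mem_request}Mi"},
        "limits": {"memory": f"{mem_limit}Mi"},  # no CPU limit, on purpose
    }


# The two samples from the `kubectl top` output above.
print(suggest_resources([(180, 310), (210, 298)]))
# {'requests': {'cpu': '195m', 'memory': '388Mi'}, 'limits': {'memory': '620Mi'}}
```

Two samples are nowhere near enough in practice; the point is only to show the arithmetic. Feed it a week of data, or let the VPA do the same job with better statistics.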

Is fiddling with millicores and mebibytes worth your time? If you run anything beyond a toy cluster, absolutely: this is the single highest-leverage knob in Kubernetes reliability, and it costs nothing but attention. The mental model to keep: requests buy you a seat, limits are the bouncer, CPU gets throttled, memory gets killed, and QoS class decides who survives a bad night. Get those five facts straight and most of your mysterious 3am pages disappear. This is for anyone running real workloads; if you’re just kicking the tyres on minikube, set sensible requests and move on.

Written by Smarc

Founder and editor of vo.rs. A lifelong tinkerer who self-hosts far more than is sensible, hardens Linux boxes for fun, and prods the latest AI tools to see what they can really do. The how-to guides here are the notes Smarc wishes had existed the first time round.