Kubernetes Resource Requests and Limits: The Difference Between Crashing and Thriving

Why your pods get OOM-killed, throttled, or evicted — and how to stop it

Smarc

08-09-2025 1850 words 9 min read

I once spent the better part of a Saturday chasing a pod that vanished every few hours on my home cluster. No crash logs, no panic, nothing in the application output — the container simply wasn’t there when I looked, and its restart count kept ticking up. The describe output eventually gave it away: OOMKilled, exit code 137. The pod had a 256Mi memory limit copied from a blog post, and a nightly job inside it briefly allocated 400Mi. The kernel did exactly what it was told and shot the process. Nothing was flaky. I had told Kubernetes to kill it.

Most Kubernetes pain that gets blamed on “the cluster being flaky” is actually a resource-management problem. A pod that vanishes at 3am, a service that goes sluggish under load for no obvious reason, a node that grinds to a halt — nine times out of ten the root cause is a requests value someone copied from a tutorial and a limits value nobody thought about. So let me walk through how this actually works, because the scheduler’s behaviour is not intuitive until you’ve been burned by it, and the failure modes are silent until they aren’t.

Requests are for scheduling, limits are for policing

These two fields look symmetric but do completely different jobs. A request is a reservation: the scheduler uses it to decide which node has room for your pod. It does not cap anything at runtime. A limit is a ceiling enforced by the kernel — exceed it and you get punished, the punishment depending on the resource.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:                # required by apps/v1; must match the template labels
    matchLabels:
      app: api             # illustrative label, pick your own
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.net/api:1.8.2
          resources:
            requests:
              cpu: "250m"      # reserved at schedule time
              memory: "256Mi"
            limits:
              cpu: "1000m"     # CPU is throttled, not killed
              memory: "512Mi"  # memory over this = OOM-kill

The asymmetry is the part that bites people. CPU is compressible: blow past your CPU limit and the kernel’s CFS scheduler simply throttles you — your process runs slower, but it keeps running. Memory is incompressible: blow past your memory limit and the container is killed outright with an OOMKilled status. There’s no graceful degradation. This is why an over-tight memory limit is far more dangerous than an over-tight CPU limit.

Under the hood on a modern cgroup v2 node, these fields map onto kernel knobs. The memory limit becomes memory.max, the hard ceiling the OOM killer enforces. The CPU limit becomes a CFS quota (cpu.max), a bucket of run-time refilled every 100ms. The request, by contrast, is mostly a scheduling number. The CPU request does reach the kernel as a relative weight (cpu.weight), but that only decides who gets what share when the node’s CPUs are contended; it reserves nothing and caps nothing. The memory request isn’t translated into any cgroup memory setting unless you’ve enabled the MemoryQoS feature gate, so the kernel has no idea your container “requested” 256Mi. It’s a promise the scheduler makes on your behalf, not a floor the kernel defends. That distinction, request as scheduler bookkeeping and limit as kernel enforcement, is the whole model, and once it clicks the rest follows.

# inspect the enforced ceilings the kernel actually sees (cgroup v2)
$ kubectl exec api-7d9f -- sh -c 'cat /sys/fs/cgroup/memory.max /sys/fs/cgroup/cpu.max'
536870912          # 512Mi, matches the manifest
100000 100000      # 1 full CPU per 100ms period
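To make the mapping concrete, here is a minimal Python sketch of the arithmetic behind those two files. It assumes the default 100ms CFS period and takes the limits as plain integers; the function names are made up for illustration and are not anything the kubelet exposes.

```python
CFS_PERIOD_USEC = 100_000  # default CFS period: 100ms, expressed in microseconds


def memory_max_bytes(limit_mib: int) -> int:
    """Memory limit in Mi -> the byte count written to memory.max."""
    return limit_mib * 1024 * 1024


def cpu_max(limit_millicores: int, period_usec: int = CFS_PERIOD_USEC) -> str:
    """CPU limit in millicores -> the 'quota period' pair found in cpu.max."""
    quota_usec = limit_millicores * period_usec // 1000
    return f"{quota_usec} {period_usec}"


print(memory_max_bytes(512))  # 536870912
print(cpu_max(1000))          # 100000 100000
print(cpu_max(250))           # 25000 100000
```

A 250m limit, in other words, is 25ms of CPU time per 100ms window, shared by every thread in the container.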

Quality of Service: the tier you didn’t know you were in

Kubernetes silently sorts every pod into one of three QoS classes based purely on how you set requests and limits, and that class decides who dies first when a node runs out of memory.

$ kubectl get pod api-7d9f -o jsonpath='{.status.qosClass}'
Burstable

Guaranteed — requests equal limits for every resource on every container. Last to be evicted. This is what you want for databases and anything stateful.
Burstable — at least one request set, but not matching limits. The common middle ground.
BestEffort — no requests or limits at all. First against the wall when the node is under memory pressure. Never run anything you care about as BestEffort.

When a node’s memory fills, the kubelet evicts BestEffort pods first, then Burstable pods that have exceeded their requests, and only touches Guaranteed pods as a last resort. Setting requests well is therefore a survival strategy, not just a scheduling hint.
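The three-way classification can be sketched in a few lines of Python. This is a simplified model, not the kubelet’s code: it takes each container’s resources block as a plain dict, compares quantities as literal strings, and relies on the fact that an omitted request defaults to the limit. The function name is hypothetical.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' `resources` dicts.

    Simplified: quantities are compared as literal strings, so "1" and
    "1000m" would count as different here even though Kubernetes treats
    them as equal.
    """
    any_set = False
    all_guaranteed = True
    for resources in containers:
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # An omitted request defaults to the limit, so only an explicit
            # mismatch or a missing limit breaks Guaranteed.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


api = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "1000m", "memory": "512Mi"},
}
print(qos_class([api]))                                        # Burstable
print(qos_class([{}]))                                         # BestEffort
print(qos_class([{"limits": {"cpu": "500m", "memory": "1Gi"}}]))  # Guaranteed
```

Note that one sloppy sidecar is enough to drag an otherwise Guaranteed pod down to Burstable, because the rule applies to every container.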

There’s a subtlety worth internalising here, because it explains a lot of “why did that pod die?” confusion. Eviction (the kubelet’s doing, node-level, graceful-ish) and OOM-kill (the kernel’s doing, container-level, instant) are two different mechanisms that both react to memory pressure. The kubelet watches node memory and proactively evicts whole pods by QoS class and by how far each has overshot its request. The kernel’s OOM killer fires when a single cgroup hits its own memory.max, and it kills the offending process without consulting Kubernetes at all. A Guaranteed pod is last in line for kubelet eviction but will still be OOM-killed the instant its own container exceeds its limit, because that limit is its memory.max. QoS protects you from your noisy neighbours, not from yourself.
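For the kubelet half of that story, a rough Python model of the eviction ranking looks like this. It is a sketch under assumptions: it follows the ordering described in the node-pressure eviction documentation as I understand it (pods over their request first, then lower priority, then biggest overshoot), it only looks at memory, and the PodMemory type and the sample pods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodMemory:
    name: str
    request_mib: int   # 0 for a BestEffort pod
    usage_mib: int
    priority: int = 0


def eviction_order(pods: list[PodMemory]) -> list[str]:
    """Names in the order a kubelet-like ranking would evict them."""
    def rank(pod: PodMemory):
        overshoot = pod.usage_mib - pod.request_mib
        return (
            overshoot <= 0,   # pods over their request go first
            pod.priority,     # then lower priority first
            -overshoot,       # then the biggest overshoot first
        )
    return [pod.name for pod in sorted(pods, key=rank)]


pods = [
    PodMemory("db", request_mib=1024, usage_mib=900),
    PodMemory("api", request_mib=256, usage_mib=310),
    PodMemory("batch", request_mib=0, usage_mib=400),
]
print(eviction_order(pods))  # ['batch', 'api', 'db']
```

A BestEffort pod has a request of zero, so every byte it uses counts as overshoot, which is why it tends to land at the front of the queue.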

The CPU-limits controversy

Here is an opinion that will annoy half the people reading: in most clusters you should set CPU requests and skip CPU limits entirely. CFS throttling is brutal — it operates in 100ms accounting periods, and a bursty latency-sensitive service can get throttled even when the node has idle CPU to spare, because it exhausted its quota inside one window. The symptom is maddening: p99 latency spikes, flat CPU graphs, no obvious cause.
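The arithmetic is worth doing once. Below is a toy Python model, not a simulation of the real scheduler: it assumes a burst that starts exactly at a period boundary, threads that each get their own idle core, and a quota shared by all of them. The function name and the numbers are illustrative.

```python
def burst_latency_ms(threads: int, busy_ms_per_thread: float,
                     quota_ms: float = 100.0, period_ms: float = 100.0) -> float:
    """Wall-clock time to finish a parallel burst that starts at a period boundary.

    Toy model: every thread runs flat out on its own core until the shared
    quota for the period is spent, then all of them wait for the next refill.
    """
    remaining_cpu = threads * busy_ms_per_thread
    elapsed = 0.0
    while remaining_cpu > 0:
        # CPU time the threads could burn in one period, capped by the quota.
        usable = min(remaining_cpu, quota_ms, threads * period_ms)
        wall = usable / threads
        remaining_cpu -= usable
        # Work left over means the quota ran out: wait for the next period.
        elapsed += period_ms if remaining_cpu > 0 else wall
    return elapsed


# Four threads, 40ms of work each, under a 1000m limit (100ms quota per period).
print(burst_latency_ms(4, 40))                           # 115.0
print(burst_latency_ms(4, 40, quota_ms=float("inf")))    # 40.0
```

With the limit in place the four threads burn the whole 100ms quota in 25ms of wall-clock time, sit idle for the remaining 75ms, and finish 15ms into the next period. If that burst arrives once a second, the container averages 160ms of CPU per second, about 16% of one core, while the request takes nearly three times as long as it needed to.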

# the smoking gun — high throttling despite low average usage
$ kubectl exec api-7d9f -- cat /sys/fs/cgroup/cpu.stat
nr_periods 41022
nr_throttled 8137      # ~20% of periods throttled
throttled_usec 9931204

If nr_throttled is a meaningful fraction of nr_periods, your CPU limit is hurting you. Memory limits, on the other hand, you should almost always keep — they’re your protection against one leaky pod taking down a whole node. The guidance “always set both limits” is cargo-culting; treat the two resources differently because the kernel does.
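If you would rather compute that fraction than eyeball it, a small Python helper does the job. This is a sketch that assumes you have already captured the contents of cpu.stat as text (for example from the kubectl exec shown earlier); the helper name is made up, and the sample is the same output as above.

```python
def throttle_ratio(cpu_stat: str) -> float:
    """Fraction of CFS periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


sample = """\
nr_periods 41022
nr_throttled 8137
throttled_usec 9931204
"""
print(f"{throttle_ratio(sample):.1%}")  # 19.8%
```

These counters are cumulative since the cgroup was created, so for a live service it is more telling to sample twice and compare the deltas.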

The counter-argument, which I take seriously, is that CPU limits give you predictability: a workload that can never burst is a workload whose behaviour you can reason about, and in a shared multi-tenant cluster with hostile neighbours that’s worth something. On my own kit, where I know every workload and nobody is adversarial, I’d rather have the spare cycles used than sitting idle behind a quota. Pick according to your blast radius. If you do keep CPU limits, at least set them generously — a limit of 2 or 4 cores on a service that averages 200m still catches a runaway loop without throttling normal bursts. This is the same tension that shows up when you’re deciding what crashes your cluster at 2am: the defaults are conservative because they have to survive strangers.

Right-sizing without guessing

Don’t pull numbers from the air. Observe real usage first.

$ kubectl top pods --containers
POD        NAME   CPU(cores)   MEMORY(bytes)
api-7d9f   api    180m         310Mi
api-2k1c   api    210m         298Mi

A word on units, because they trip everyone up once. Memory suffixes come in two flavours: Mi is a mebibyte (1024²) and M is a megabyte (1000²), and mixing them is how a limit ends up 5% off from what you meant. Stick to the binary Mi and Gi throughout. CPU is measured in millicores: 1000m is one full core, 250m is a quarter of one, and fractional cores are perfectly normal — a web service that spends most of its life idle might genuinely request 50m and be right.
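A short Python sketch shows how far apart the two suffix families land. It handles only the suffixes mentioned here plus their Ki/Gi/k/G siblings, not the full Kubernetes quantity grammar, and the function names are hypothetical.

```python
from decimal import Decimal

# Suffix multipliers for the memory units discussed above (a subset of
# what Kubernetes accepts; exponent forms like 1e3 are not handled).
MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}


def memory_bytes(quantity: str) -> int:
    """'512Mi' -> 536870912, '512M' -> 512000000, '1024' -> 1024."""
    for suffix in ("Ki", "Mi", "Gi", "k", "M", "G"):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * MEMORY_SUFFIXES[suffix])
    return int(quantity)


def cpu_millicores(quantity: str) -> int:
    """'250m' -> 250, '1' -> 1000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


print(memory_bytes("512Mi") - memory_bytes("512M"))  # 24870912
print(cpu_millicores("0.5"))                         # 500
```

That 24,870,912-byte gap is nearly 24Mi of headroom you either have or don’t, depending on one missing letter.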

Set the memory request near steady-state usage plus headroom, the memory limit high enough to absorb spikes (so a transient bump doesn’t trigger an OOM-kill), and the CPU request to typical load. For anything serious, run the Vertical Pod Autoscaler in recommendation mode and let it watch for a week before you commit to values. Also set a LimitRange per namespace so a forgotten manifest can’t ship a BestEffort pod into production by accident:

apiVersion: v1
kind: LimitRange
metadata:
  name: sane-defaults
  namespace: apps
spec:
  limits:
    - type: Container
      default:            # applied if a container sets no limit
        memory: "512Mi"
      defaultRequest:     # applied if it sets no request
        cpu: "100m"
        memory: "128Mi"
      max:
        memory: "2Gi"     # nobody ships a 16Gi limit by accident

That single object turns “someone forgot to set anything” from a BestEffort landmine into a Burstable pod with sane defaults — cheap insurance for a shared namespace.
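Here is roughly what that defaulting does to a container, as a Python sketch. It is a simplified model rather than the admission plugin itself: it ignores the max check, treats the resources block as a plain dict, and assumes the usual rule that a limit set without a request becomes the request. The function and constant names are hypothetical.

```python
import copy

# The Container entry from the `sane-defaults` LimitRange above.
LIMIT_RANGE = {
    "default": {"memory": "512Mi"},
    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
}


def apply_defaults(resources: dict, limit_range: dict = LIMIT_RANGE) -> dict:
    """Return a copy of a container's `resources` with missing values filled in."""
    out = copy.deepcopy(resources)
    limits = out.setdefault("limits", {})
    requests = out.setdefault("requests", {})
    # A limit the author set without a request becomes the request too.
    for name, value in resources.get("limits", {}).items():
        requests.setdefault(name, value)
    for name, value in limit_range.get("default", {}).items():
        limits.setdefault(name, value)
    for name, value in limit_range.get("defaultRequest", {}).items():
        requests.setdefault(name, value)
    return out


print(apply_defaults({}))
# {'limits': {'memory': '512Mi'}, 'requests': {'cpu': '100m', 'memory': '128Mi'}}
```

The forgotten manifest ends up with a memory limit, a memory request and a CPU request, and no CPU limit, which is exactly the Burstable shape argued for earlier.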

Troubleshooting: reading the wreckage

When a pod misbehaves, the diagnosis is almost always in three places, in this order.

A pod keeps restarting. Check the last termination, not the current state: kubectl describe pod shows Last State: Terminated, Reason: OOMKilled with exit code 137 when memory was the culprit. Exit 137 is 128 + 9 (SIGKILL); that 9 tells you the process was killed rather than exiting on its own, and the OOMKilled reason tells you why. If you see both, the container almost certainly hit its memory limit. Raise the limit, or find the leak. Don’t just bump the limit forever, because a limit you keep raising is usually a leak you keep feeding.

$ kubectl describe pod api-7d9f | grep -A3 'Last State'
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
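The 128 + signal convention is easy to decode mechanically. A minimal Python sketch, assuming a Unix-like host where the standard signal module knows the usual signal numbers; the helper name is made up:

```python
import signal


def describe_exit(code: int) -> str:
    """Explain a container exit code using the shell's 128 + signal convention."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited on its own with status {code}"


print(describe_exit(137))  # killed by SIGKILL (signal 9)
print(describe_exit(143))  # killed by SIGTERM (signal 15)
```

The exit code only says which signal arrived. It is the Reason field that says the OOM killer sent it.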

A service is mysteriously slow with flat CPU graphs. That’s throttling, not load. Check cpu.stat as shown above; if nr_throttled is climbing, either raise or remove the CPU limit. The classic false lead here is scaling out replicas to “add capacity” when each replica is being throttled individually — you’ve multiplied the problem, not solved it.

A pod won’t schedule and sits Pending. Run kubectl describe pod and read the events: 0/3 nodes are available: Insufficient memory means your request (not usage) doesn’t fit anywhere. This is the request doing its job — it’s refusing to overcommit the node. Either lower the request to reality or add capacity. A request wildly larger than actual usage silently wastes a chunk of every node it lands on, which is how a cluster ends up “full” at 40% real utilisation.
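The scheduler’s memory check boils down to a sum of requests against the node’s allocatable memory, and usage never enters into it. A Python sketch with a hypothetical node and made-up numbers:

```python
def fits(allocatable_mib: int, scheduled_requests_mib: list[int], new_request_mib: int) -> bool:
    """True if the node still has unreserved memory for the new pod's request."""
    return sum(scheduled_requests_mib) + new_request_mib <= allocatable_mib


# Hypothetical node: 8Gi allocatable, four pods that each request 2Gi
# but really use about 800Mi.
allocatable = 8 * 1024
requests = [2048] * 4
usage = [800] * 4

print(fits(allocatable, requests, 256))        # False
print(f"{sum(usage) / allocatable:.0%} used")  # 39% used
```

The node refuses a 256Mi pod while more than half its memory sits idle. Trim those requests to 1Gi each and the same node has room for four more pods of that size.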

The verdict

Is fiddling with millicores and mebibytes worth your time? If you run anything beyond a toy cluster, absolutely: this is the single highest-leverage knob in Kubernetes reliability, and it costs nothing but attention. The mental model to keep: requests buy you a seat, limits are the bouncer, CPU gets throttled, memory gets killed, and QoS class decides who survives a bad night. Get those five facts straight and most of your mysterious 3am pages disappear. This is for anyone running real workloads; if you’re just kicking the tyres on minikube, set sensible requests and move on.

The place this bites hardest is GPU and inference work, where a single model can swallow all the memory on a node and evict everything around it — I go deeper on that in running AI inference on Kubernetes, where getting requests and limits right is the difference between a shared GPU and a node that thrashes itself to death. Get the fundamentals here first; the exotic scheduling problems are the same problem wearing a bigger hat.

Written by Smarc

Founder and editor of vo.rs. A lifelong tinkerer who self-hosts far more than is sensible, hardens Linux boxes for fun, and prods the latest AI tools to see what they can really do. The how-to guides here are the notes Smarc wishes had existed the first time round.

Tagged: #kubernetes #containers #performance #reliability
