CoreDNS and Kubernetes DNS: What Actually Happens When a Pod Looks Up a Name

Following a single DNS query from a pod's resolv.conf to the answer

For something so fundamental, Kubernetes DNS is astonishingly easy to take for granted. You write http://my-service in your code, it resolves, traffic flows, everyone goes home. Then one day a pod can’t reach another service, nslookup returns SERVFAIL, and you discover you have no idea what was happening under the hood the entire time. I have been that person at 1am, and I would rather you weren’t.

So let’s follow a single DNS lookup end to end. No magic, just a chain of unremarkable Linux mechanics that happen to be wired together rather cleverly.


When the kubelet starts a pod, it writes a /etc/resolv.conf into it. Exec into almost any pod and you’ll see something like this:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Three lines, and every one of them matters. The nameserver is the cluster DNS service — a ClusterIP that fronts the CoreDNS pods. The search list is built from the pod’s namespace, so a pod in default gets default.svc.cluster.local first. And ndots:5 is the line that trips everyone up, which we’ll come back to because it deserves its own paragraph.
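To make those three lines concrete, here is a minimal sketch of how a resolver might read them. It is not glibc's actual parser: the ResolvConf class and parse_resolv_conf function are names invented for this example, and it only handles the three keywords shown above.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvConf:
    """Hypothetical container for the handful of settings discussed here."""
    nameservers: list = field(default_factory=list)
    search: list = field(default_factory=list)
    ndots: int = 1  # the resolver's default when no ndots option is given


def parse_resolv_conf(text: str) -> ResolvConf:
    """Parse nameserver, search and options ndots:N lines; ignore the rest."""
    conf = ResolvConf()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            conf.nameservers.append(values[0])
        elif keyword == "search":
            conf.search = values  # a later search line replaces an earlier one
        elif keyword == "options":
            for opt in values:
                if opt.startswith("ndots:"):
                    conf.ndots = int(opt.split(":", 1)[1])
    return conf


SAMPLE = """\
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

conf = parse_resolv_conf(SAMPLE)
print(conf.nameservers, conf.search, conf.ndots)
```

Point it at the contents of a real pod's /etc/resolv.conf and you get the same three facts the rest of this walk-through leans on: where queries go, which suffixes get tried, and how many dots a name needs before it is treated as absolute.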

When your app asks for my-service, the C library’s resolver (glibc in most images) doesn’t send my-service straight to the nameserver. The name has fewer than five dots, so ndots:5 tells the resolver to treat it as a relative name and try the search domains first. In order, it queries:

  • my-service.default.svc.cluster.local
  • my-service.svc.cluster.local
  • my-service.cluster.local
  • and only then my-service as an absolute name

The first one hits, so you never notice the rest. But ask for google.com (one dot, still under five) and the resolver dutifully tries google.com.default.svc.cluster.local, google.com.svc.cluster.local, and google.com.cluster.local, all guaranteed to fail, before finally querying google.com itself. That’s three wasted lookups for every external name, and often six queries on the wire, because the resolver typically asks for A and AAAA records separately. This is one of the most common causes of “why is my DNS slow” in Kubernetes, and the fix is often just appending a trailing dot to fully-qualified external names (google.com.), or lowering ndots per-pod via dnsConfig.
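The ordering is easier to reason about once it is written down. The sketch below models the glibc-style rule as I understand it: a trailing dot skips the search list, a name with fewer than ndots dots tries the search list first, and anything else is tried as-is first. candidate_names is a made-up helper, not a real resolver API, and other C libraries may order things differently.

```python
def candidate_names(name: str, search: list, ndots: int) -> list:
    """Return the fully-qualified names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute: the search list is skipped entirely
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as written first
    return expanded + [absolute]      # too few dots: search domains first


SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query, ndots in [("my-service", 5), ("google.com", 5),
                     ("google.com.", 5), ("google.com", 1)]:
    names = candidate_names(query, SEARCH, ndots)
    print(f"{query!r} ndots={ndots}: {len(names)} candidate(s), first = {names[0]}")
```

With ndots at 5, google.com only reaches its real name on the fourth attempt. With a trailing dot, or with ndots lowered to 1, the real name goes out first. The per-pod knob for this lives under spec.dnsConfig.options in the pod spec, as an entry with name ndots and a string value.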

The query lands on CoreDNS, which is a small Go DNS server built almost entirely from plugins. Its behaviour lives in a ConfigMap called coredns in kube-system, which is mounted into the CoreDNS pods as the Corefile:

.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

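Each line inside the block enables a plugin, and together they form a chain that every query passes through. One caveat: the order in which plugins run is fixed when CoreDNS is compiled (in its plugin.cfg), not by the order of lines in the Corefile, so reading top to bottom tells you what is enabled rather than what runs first. The kubernetes plugin is the star: it watches the API server for Services and their endpoints, holds them in memory, and answers queries under cluster.local directly. For my-service.default.svc.cluster.local it returns the Service’s ClusterIP. For a headless service it returns the individual pod IPs instead.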
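As a rough mental model of that lookup (and only that: this is not CoreDNS code), picture a dictionary keyed by namespace and service name. Everything below is hypothetical, including the ServiceRecord class, the answer function and every IP address; a cluster_ip of None stands in for a headless Service.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServiceRecord:
    cluster_ip: Optional[str]  # None models a headless Service
    endpoint_ips: list = field(default_factory=list)


# Hypothetical in-memory view of two Services in the default namespace.
SERVICES = {
    ("default", "my-service"): ServiceRecord("10.96.45.12", ["10.244.1.7", "10.244.2.9"]),
    ("default", "my-db"): ServiceRecord(None, ["10.244.1.20", "10.244.3.4"]),
}


def answer(qname: str) -> list:
    """Answer an A query of the form <service>.<namespace>.svc.cluster.local."""
    labels = qname.rstrip(".").lower().split(".")
    if len(labels) != 5 or labels[2:] != ["svc", "cluster", "local"]:
        return []  # not a Service name this sketch understands
    service, namespace = labels[0], labels[1]
    record = SERVICES.get((namespace, service))
    if record is None:
        return []  # a real server would reply NXDOMAIN here
    if record.cluster_ip is not None:
        return [record.cluster_ip]   # normal Service: one virtual IP
    return list(record.endpoint_ips)  # headless Service: the pod IPs themselves


print(answer("my-service.default.svc.cluster.local"))
print(answer("my-db.default.svc.cluster.local"))
```

The real plugin handles far more (SRV records, pod records, reverse lookups), but the core of it is this kind of map lookup, which is why answers for cluster names come back so quickly.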

Anything that isn’t a cluster name is passed on to forward . /etc/resolv.conf, which hands the query to the upstream resolvers listed in the CoreDNS pod’s own /etc/resolv.conf, normally the ones the node uses, so external names go out to your real DNS. The cache plugin keeps answers for up to 30 seconds so repeated lookups don’t hammer the upstreams, and loadbalance shuffles A-record order for crude round-robin.
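Two ideas from that Corefile can be sketched in a few lines: deciding which plugin owns a name, and capping how long an answer is remembered. This is an illustration under simplifying assumptions, not how CoreDNS is implemented. route and TTLCache are invented names, the cache is keyed by name only, and the reverse zones list both plugins because the Corefile's fallthrough line lets unanswered reverse lookups continue to forward.

```python
import time

CLUSTER_ZONE = "cluster.local"
REVERSE_ZONES = ("in-addr.arpa", "ip6.arpa")


def in_zone(qname: str, zone: str) -> bool:
    """True if qname is the zone itself or sits underneath it."""
    name = qname.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


def route(qname: str) -> list:
    """Return the plugins that get a chance to answer, in order."""
    if in_zone(qname, CLUSTER_ZONE):
        return ["kubernetes"]
    if any(in_zone(qname, zone) for zone in REVERSE_ZONES):
        return ["kubernetes", "forward"]  # fallthrough applies to these zones
    return ["forward"]


class TTLCache:
    """Remember answers for at most ttl seconds (a toy stand-in for cache 30)."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}

    def put(self, qname: str, answer: list) -> None:
        self._entries[qname] = (answer, self.clock() + self.ttl)

    def get(self, qname: str):
        entry = self._entries.get(qname)
        if entry is None:
            return None
        answer, expires = entry
        if self.clock() >= expires:
            del self._entries[qname]  # expired: the next lookup goes upstream again
            return None
        return answer


print(route("my-service.default.svc.cluster.local"))
print(route("google.com"))
print(route("10.0.96.10.in-addr.arpa"))
```

The clock is injectable so expiry can be exercised without sleeping for 30 seconds, which is handy if you want to poke at the behaviour in a test.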

CoreDNS doesn’t talk to etcd or scrape the network. It maintains an informer against the Kubernetes API. Create a Service and within a second or two its name resolves, because CoreDNS got a watch event and updated its in-memory map. This is why DNS breaks in interesting ways when the API server is unhealthy, and why a CrashLoopBackOff on CoreDNS takes the whole cluster’s name resolution down with it. There are usually two replicas for exactly this reason.
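The watch-and-update loop is simple enough to sketch. The events below mimic the shape of Kubernetes watch events (a type of ADDED, MODIFIED or DELETED plus the object), trimmed to the fields this example needs. apply_event, the service helper, and the names and IPs are all made up for illustration.

```python
def apply_event(index: dict, event: dict) -> None:
    """Fold one watch event into an in-memory {(namespace, name): clusterIP} map."""
    obj = event["object"]
    key = (obj["metadata"]["namespace"], obj["metadata"]["name"])
    if event["type"] in ("ADDED", "MODIFIED"):
        index[key] = obj["spec"]["clusterIP"]
    elif event["type"] == "DELETED":
        index.pop(key, None)  # the name stops resolving once the Service is gone


def service(namespace: str, name: str, cluster_ip: str) -> dict:
    """Build a trimmed-down Service object (hypothetical helper)."""
    return {
        "metadata": {"namespace": namespace, "name": name},
        "spec": {"clusterIP": cluster_ip},
    }


index = {}
events = [
    {"type": "ADDED", "object": service("default", "my-service", "10.96.45.12")},
    {"type": "ADDED", "object": service("default", "old-api", "10.96.77.3")},
    {"type": "DELETED", "object": service("default", "old-api", "10.96.77.3")},
]

for event in events:
    apply_event(index, event)
    print(event["type"], sorted(index))
```

If the stream of events stops, the map simply stops changing. That is the failure mode to keep in mind when the API server is unhealthy: names that already exist may keep resolving from memory while new or changed Services go unnoticed.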

A quick sanity check from inside the cluster, when things go sideways:

$ kubectl run -it --rm dnstest --image=busybox:1.36 --restart=Never -- \
    nslookup kubernetes.default
Server:    10.96.0.10
Address:   10.96.0.10:53
Name:      kubernetes.default.svc.cluster.local
Address:   10.96.0.1

If that resolves, your DNS plumbing is fundamentally sound and the problem is elsewhere. If it doesn’t, check the CoreDNS pods, the kube-dns Service endpoints, and whether a NetworkPolicy is quietly eating traffic to port 53 (UDP, plus TCP for large or retried responses).
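When the debug image has Python but no nslookup, a raw UDP probe can tell a timeout (something is dropping packets) apart from an error reply (CoreDNS is reachable but unhappy). This is a bare-bones sketch using only the standard library. build_query and probe are names I made up, the probe sends a single A query with no retries, and because it bypasses the search list the name has to be fully qualified.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe(server: str, name: str, timeout: float = 2.0) -> str:
    """Send one query over UDP port 53 and report the response code or a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
    if len(reply) < 4:
        return "short reply"
    rcode = reply[3] & 0x0F  # low four bits of the flags field
    return {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}.get(rcode, f"rcode {rcode}")


if __name__ == "__main__":
    # 10.96.0.10 is the nameserver from the resolv.conf shown earlier; yours may differ.
    print(probe("10.96.0.10", "kubernetes.default.svc.cluster.local"))
```

A "timeout" from inside a pod, while the same probe works from another namespace or from the node, points towards a NetworkPolicy or firewall rather than CoreDNS itself.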

If you only ever run other people’s manifests on a managed cluster, you can probably coast on faith for years. But the moment you self-host — and especially the moment you debug intermittent timeouts that turn out to be the ndots:5 tax — this knowledge pays for itself in one sitting. DNS is the layer everything else assumes works. Spend an afternoon reading your Corefile and tracing one lookup by hand, and you’ll never be the person staring blankly at SERVFAIL again. That alone is worth the price of admission.

Written by Smarc

Founder and editor of vo.rs. A lifelong tinkerer who self-hosts far more than is sensible, hardens Linux boxes for fun, and prods the latest AI tools to see what they can really do. The how-to guides here are the notes Smarc wishes had existed the first time round.