CoreDNS - How it solved the service discovery problem in k8s
Published on March 11, 2023
DNS is one of the world's largest distributed databases. There are a bunch of RFCs explaining how the DNS protocol actually works and how it evolved. Think of a Kubernetes cluster as a small world that needs some way to do internal service discovery. There were difficulties integrating existing DNS servers like BIND into the k8s environment, as Miek Gieben describes in this CNCF talk. So they built a new DNS server called CoreDNS, which is now the default DNS server when you bootstrap a k8s cluster using kubeadm.
The Role of CoreDNS in the K8s environment
In Kubernetes networking, every Pod gets an IP address (held by its pause container), and Pods can communicate with each other directly using these Pod IPs. This is handled by the cluster's network plugin. If that is already possible, why is a service discovery mechanism needed? The truth about a Pod is that it is ephemeral. A Pod can get killed for many reasons: a node can fail and the Pod will be reborn on a new node in the cluster, or the Pod exceeds its memory limits and gets restarted. Now it has a new IP, and any Pod that was communicating with it needs to learn this new IP. CoreDNS is the solution for the dynamic nature of k8s environments.
CoreDNS implements the Kubernetes DNS spec in a plugin called kubernetes (CoreDNS has a plugin architecture), which defines which DNS records CoreDNS serves and how.
All the plugins can be configured in the Corefile. You can see its content in a ConfigMap. Run the following command:
kubectl get cm coredns -n kube-system -o yaml
You will get an output similar to this:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
...
.:53 means all DNS requests arriving on port 53 are handled by the plugins within the block. Even though you can write the plugins in any order in the Corefile, their execution order is fixed in plugin.cfg and compiled into the CoreDNS binary. If you want to change it, you need to build your own binary after reordering the plugins there. I will describe more on this in another article.
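For reference, plugin.cfg in the CoreDNS source tree is a plain list of name:package entries whose position determines execution order. An abridged sketch (the exact entries and their order vary between CoreDNS releases, so treat this as illustrative):

# plugin.cfg (abridged; position in this file decides execution order)
...
ready:ready
health:health
prometheus:metrics
errors:errors
loadbalance:loadbalance
cache:cache
kubernetes:kubernetes
loop:loop
forward:forward
...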
CoreDNS generates on-the-fly DNS responses
Back to our cluster: with the kubernetes plugin enabled, CoreDNS works like a k8s operator and watches all the Services and Endpoints in the cluster via the k8s API server. It could therefore create all the DNS records defined in the spec up front and cache them. But it does not do that; instead, it generates DNS responses on the fly and caches the generated responses for a short while (in a public DNS server scenario, cache lifetimes are hours or days). Why is that? For the same reason as before: the k8s environment is dynamic, and Service and Pod IPs can change. You might be wondering why it needs Endpoints. That is because there are two types of Services, ClusterIP and Headless (a Headless Service has no ClusterIP; you create one by setting clusterIP to None in the Service spec).
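As a quick sketch, here is what a minimal Headless Service manifest could look like (the name headless, the default namespace, and the app: demo selector are placeholders chosen to match the record examples below):

apiVersion: v1
kind: Service
metadata:
  name: headless        # placeholder name, matching the examples below
  namespace: default
spec:
  clusterIP: None       # no ClusterIP: this is what makes the Service headless
  selector:
    app: demo           # placeholder label selector
  ports:
  - port: 80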
DNS A record sample for a ClusterIP Service
Record Format:
<service>.<ns>.svc.<zone>. <ttl> IN A <cluster-ip>
Question Example:
kubernetes.default.svc.cluster.local. IN A
Answer Example:
kubernetes.default.svc.cluster.local. 4 IN A 10.3.0.1
DNS A records for a Headless Service
Record Format:
<service>.<ns>.svc.<zone>. <ttl> IN A <endpoint-ip>
Question Example:
headless.default.svc.cluster.local. IN A
Answer Example:
headless.default.svc.cluster.local. 4 IN A 10.3.0.1
headless.default.svc.cluster.local. 4 IN A 10.3.0.2
headless.default.svc.cluster.local. 4 IN A 10.3.0.3
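You can reproduce answers like these from inside the cluster, for example by launching a throwaway pod with DNS tools (this is the dnsutils image used in the Kubernetes DNS debugging docs; the tag may differ):

kubectl run -it --rm dnsutils \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --restart=Never -- nslookup headless.default.svc.cluster.local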
How a Pod knows to talk to CoreDNS for DNS resolution
The answer to this question becomes clear if you exec into one of the non-CoreDNS pods in the cluster and run:
cat /etc/resolv.conf
Sample output will be:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local domain.local
options ndots:5
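The last two lines explain why short names work inside a pod: with ndots:5, any name containing fewer than five dots is first tried with each search suffix appended, so a lookup for just headless becomes a query for headless.default.svc.cluster.local. sent to the nameserver above. A quick way to see this (reusing the placeholder Service name from earlier):

# inside any pod in the default namespace
nslookup headless
# the resolver expands the short name using the search list, effectively
# asking CoreDNS for headless.default.svc.cluster.local.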
The nameserver IP address on the first line is the ClusterIP address of the kube-dns Service. You can confirm this by running:
kubectl get svc kube-dns -n kube-system
which outputs:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 355d
Who configured it? That is one of the responsibilities of the kubelet. How did it know the Pod wants CoreDNS as its nameserver? That is given in pod.spec.dnsPolicy, with the value ClusterFirst. For the CoreDNS pods themselves, the policy is Default. The Default policy means using the host node's DNS configuration. CoreDNS uses this policy so that it can resolve external names (for example, if your Service type is ExternalName) using the upstream name servers configured for the node.
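For completeness, here is where the policy sits in a Pod manifest; you rarely set it explicitly, since ClusterFirst is what pods get by default (the names and image below are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: demo                  # placeholder pod name
spec:
  dnsPolicy: ClusterFirst     # resolve via the cluster DNS (the kube-dns Service)
  containers:
  - name: app
    image: nginx              # placeholder image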
Now it's clear how DNS resolution happens: a Pod gets a DNS response containing endpoint IPs or a ClusterIP depending on the Service type, as described above.
DNS load balancing
When a client gets a DNS response as described above, there are two load balancing scenarios.
Server-side load balancing
If you use a ClusterIP Service to communicate with the underlying Pods, load balancing is done by Linux iptables rules configured by kube-proxy, a component that runs on every node. It routes traffic to the endpoints randomly.
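If you are curious, you can see this on a node when kube-proxy runs in its iptables mode (it can also run in IPVS mode, where this looks different):

# list the NAT rules kube-proxy programs; ClusterIP traffic jumps from the
# KUBE-SERVICES chain to a per-Service KUBE-SVC-* chain, which then picks a
# KUBE-SEP-* endpoint chain using "statistic mode random" rules
sudo iptables -t nat -L KUBE-SERVICES -n | head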
Client-side load balancing
If you used a Headless Service, you receive a set of endpoint IP addresses, so it's the client Pod's responsibility to decide which endpoint to send traffic to. To be fair to all endpoints, CoreDNS shuffles the order of the A records in each response; that is what the loadbalance plugin in the Corefile above does.
I hope this cleared up a few things on this topic. Refer to the k8s docs for more information.
See you soon with a new post, Good Bye!!
If you like it, share it!