[CoreDNS] How to improve the Coredns performance.

Contents

NodeLocal DNSCache

NodeLocal DNSCache enhances Kubernetes cluster DNS performance by deploying a DNS caching agent on each node as a DaemonSet. This setup allows Pods to query a local DNS cache on the same node, reducing latency and avoiding potential bottlenecks associated with centralized DNS services.

curl -sL https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml \
  | sed -e 's/__PILLAR__DNS__DOMAIN__/cluster.local/g' \
  | sed -e "s/__PILLAR__DNS__SERVER__/$(kubectl get service --namespace kube-system kube-dns -o jsonpath='{.spec.clusterIP}')/g" \
  | sed -e 's/__PILLAR__LOCAL__DNS__/169.254.20.10/g' \
  | kubectl apply -f -

PILLAR__DNS__DOMAIN is “cluster.local” by default.
PILLAR__LOCAL__DNS is the local listen IP address chosen for NodeLocal DNSCache.

Note: The local listen IP address for NodeLocal DNSCache can be any address that can be guaranteed to not collide with any existing IP in your cluster. It’s recommended to use an address with a local scope, for example, from the ‘link-local’ range ‘169.254.0.0/16’ for IPv4 or from the ‘Unique Local Address’ range in IPv6 ‘fd00::/8’.

Use a Link-Local IP Address

It is highly recommended to use a link-local IP address from the reserved range 169.254.0.0/16. These addresses:

Are valid only on the local node.
Cannot be routed externally or across nodes.
Are ideal for ensuring traffic is confined to the node itself.

Examples of valid link-local IPs:

169.254.20.10 (default)
169.254.1.1
Any other address in the 169.254.0.0/16 range.

Để đánh giá được node-local-dns bạn có thể quan sát metrics:

curl node-local-dns.kube-system:9253/metrics

# HELP coredns_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which CoreDNS was built.
# TYPE coredns_build_info gauge
coredns_build_info{goversion="go1.22.3",revision="",version="1.11.3"} 1
# HELP coredns_cache_entries The number of elements in the cache.
# TYPE coredns_cache_entries gauge
coredns_cache_entries{server="dns://169.254.20.10:53",type="denial",view="",zones="."} 1
coredns_cache_entries{server="dns://169.254.20.10:53",type="denial",view="",zones="cluster.local."} 1
coredns_cache_entries{server="dns://169.254.20.10:53",type="denial",view="",zones="in-addr.arpa."} 1
coredns_cache_entries{server="dns://169.254.20.10:53",type="denial",view="",zones="ip6.arpa."} 1
coredns_cache_entries{server="dns://169.254.20.10:53",type="success",view="",zones="."} 0
coredns_cache_entries{server="dns://169.254.20.10:53",type="success",view="",zones="cluster.local."} 0
coredns_cache_entries{server="dns://169.254.20.10:53",type="success",view="",zones="in-addr.arpa."} 0

Cluster Proportional Autoscaler

Đây là 1 work load được sử dụng riêng cho việc scale CoreDns

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-dns-autoscaler-dev
  namespace: argocd
spec:
  destination:
    namespace: kube-system
    name: 'arn:aws:eks:us-west-2:043701111869:cluster/dev-mdcl-mdaas-engines'
  project: meta-structure
  source:
    repoURL: https://kubernetes-sigs.github.io/cluster-proportional-autoscaler
    targetRevision: "1.1.0"
    chart: cluster-proportional-autoscaler
    helm:
      values: |
        fullnameOverride: "kube-dns-autoscaler"
        nodeSelector: 
          kubernetes.io/os: linux
        resources:
          requests:
            cpu: 20m
            memory: 100Mi
        config:
          linear:
            coresPerReplica: 100
            nodesPerReplica: 8
            min: 2
            max: 20
            preventSinglePointFailure: true
            includeUnschedulableNodes: true
        options:
          namespace: kube-system
          target: "deployment/coredns"

Helm Chart:
- We’re using the Cluster Proportional Autoscaler Helm chart from the official repository: https://kubernetes-sigs.github.io/cluster-proportional-autoscaler
- chart: cluster-proportional-autoscaler
- The chart version is pinned to 1.1.0.
Custom Values:
- fullnameOverride: The autoscaler is named kube-dns-autoscaler.
- nodeSelector: Ensures the autoscaler runs only on Linux nodes.
- resources: Resource requests are set to 20m CPU and 100Mi memory.
- config.linear:
  - coresPerReplica: 1 replica per 100 cores.
  - nodesPerReplica: 1 replica per 8 nodes.
  - min: Minimum of 2 replicas.
  - max: Maximum of 20 replicas.
  - preventSinglePointFailure: Ensures at least 2 replicas are always running. (Nó sẽ ngăn chặn chỉ có 1 pod coredns nếu con đó ngỏm thì cả cluster ngỏm nên nó sẽ luôn là 2)
  - includeUnschedulableNodes: Includes unschedulable nodes in scaling calculations. (nó sẽ đếm luôn các node Unschedude và các node bình thường để thực khi tính toán tổng số node của cluster)
- options:
  - The autoscaler targets the coredns deployment in the kube-system namespace.

Auto Scale CoreDNS by EKS – AWS Configuration.

Trong khi bạn provision 1 eks cluster bằng terraform bạn có thể enable autoscale coredns trong cluster_addons

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.31"

  cluster_name    = "example"
  cluster_version = "1.31"

  # Optional
  cluster_endpoint_public_access = true

  # Optional: Adds the current caller identity as an administrator via cluster access entry
  enable_cluster_creator_admin_permissions = true

  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose"]
  }

  vpc_id     = "vpc-1234556abcdef"
  subnet_ids = ["subnet-abcde012", "subnet-bcde012a", "subnet-fghi345a"]

  tags = {
    Environment = "dev"
    Terraform   = "true"
  }

  cluster_addons = {
    coredns = {
      configuration_values = jsonencode({
        autoScaling = {
          enabled     = true
          minReplicas = 2    # Optional: Set the minimum number of replicas
          maxReplicas = 10   # Optional: Set the maximum number of replicas
        }
      })
    }
  }
}

và cấu hình sẽ được apply như sau: