Network Policy Blocked DNS and the Pods Couldn't Tell You Why
External API calls from the backend started failing with EAI_AGAIN after adding Network Policies. The fix required allowing both CoreDNS and Azure DNS in the policy's egress rules.
Two days after deploying Network Policies to the
{"status":"unhealthy","service":"safaricom-api","error":"getaddrinfo EAI_AGAIN sandbox.safaricom.co.ke"}
The pods were running. The service was reachable. Only outbound DNS resolution to external hosts was broken.
Environment
| Component | Detail |
|---|---|
| Cluster | AKS (<cluster>), West Europe |
| Backend | Node.js, port 3000 |
| DNS policy on pods | dnsPolicy: Default (Azure DNS, 168.63.129.16) |
| CoreDNS namespace | kube-system |
What the Network Policy said
The backend’s egress rule for DNS allowed UDP 53 to the kube-system namespace only:
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
This works when pods use dnsPolicy: ClusterFirst — which routes DNS through CoreDNS, running in kube-system. But the backend pods had dnsPolicy: Default, which bypasses CoreDNS entirely and sends DNS queries directly to the node’s DNS server — Azure’s virtual DNS at 168.63.129.16, outside the cluster.
The Network Policy allowed DNS egress to kube-system, but 168.63.129.16 is not in kube-system, so those queries were silently dropped. EAI_AGAIN is getaddrinfo's "temporary failure in name resolution" code, which is what Node.js surfaces when a DNS lookup gets no answer and times out.
Why Default instead of ClusterFirst
dnsPolicy: Default was set intentionally in an earlier session to resolve a different DNS issue — Node.js was caching stale DNS responses from CoreDNS and failing to pick up updated service endpoints. At the time it fixed the problem. The downstream effect on the Network Policy wasn’t noticed until a day later.
The fix
Two options:
Option A: revert to ClusterFirst DNS
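Change pods back to dnsPolicy: ClusterFirst and add ndots: 2 so the resolver only walks the cluster search domains for names with fewer than two dots, rather than for nearly every external hostname (the Kubernetes default is ndots: 5):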
dnsConfig:
  options:
    - name: ndots
      value: "2"
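For context, here is a minimal sketch of where those fields would sit in a Deployment's pod template. The Deployment name, labels, and image are hypothetical placeholders, not taken from the real manifests; only dnsPolicy, the ndots option, and port 3000 come from the text above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend               # hypothetical label
  template:
    metadata:
      labels:
        app: backend
    spec:
      dnsPolicy: ClusterFirst    # route DNS through CoreDNS again
      dnsConfig:
        options:
          - name: ndots
            value: "2"           # the API expects a string here, not an integer
      containers:
        - name: backend
          image: example.invalid/backend:latest   # placeholder image
          ports:
            - containerPort: 3000
```

With ndots: 2, a name such as sandbox.safaricom.co.ke (three dots) is tried as an absolute name first, while short in-cluster names still go through the search list.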
Option B: keep Default DNS and extend the egress rules
Keep dnsPolicy: Default but add a second egress rule allowing DNS to the Azure resolver:
egress:
  # CoreDNS (dnsPolicy: ClusterFirst)
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
  # Azure DNS (dnsPolicy: Default)
  - to:
      - ipBlock:
          cidr: 168.63.129.16/32
    ports:
      - protocol: UDP
        port: 53
I went with Option B to avoid touching the pod spec. 168.63.129.16 is an Azure-reserved virtual IP: it is the platform endpoint that serves DNS (along with DHCP and load balancer health probes) to every Azure VM, and it is the same address in all regions.
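As a rough sketch, the two DNS rules could sit in a standalone manifest like the one below. The policy name and pod label are hypothetical. The TCP 53 entries are an assumption on my part and were not in the fix described here: DNS can retry over TCP when a UDP response is truncated, so some setups allow both protocols.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress-dns       # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: backend               # hypothetical label
  policyTypes:
    - Egress
  egress:
    # CoreDNS (dnsPolicy: ClusterFirst)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP          # assumption: not part of the original fix
          port: 53
    # Azure DNS (dnsPolicy: Default)
    - to:
        - ipBlock:
            cidr: 168.63.129.16/32
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP          # assumption: not part of the original fix
          port: 53
```

Network Policies are additive, so a DNS-only policy like this can live alongside the policy that allows the application traffic itself.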
Applied the updated Network Policy:
kubectl apply -f k8s/network-policy-backend.yaml --context <cluster>
Health check immediately returned:
{"status":"healthy","service":"safaricom-api"}
The same rule in the wrong namespace
While fixing the backend, I also noticed the frontend Network Policy had the same issue, plus a second problem. The frontend’s egress was restricted to kube-system for DNS and the backend pod for application traffic. But the ingress rule meant to let the NGINX controller namespace reach the frontend did not actually match that namespace.
The NGINX Ingress controller runs in the ingress-nginx namespace. The frontend’s ingress rule was:
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            name: ingress-nginx
The label name: ingress-nginx wasn’t actually on the namespace. The label that Kubernetes 1.21 and later sets automatically on every namespace is kubernetes.io/metadata.name. The traffic was getting through anyway because there was a broad fallback rule, but the intent wasn’t being enforced.
Fixed the label selector and tightened the egress DNS rule to use 168.63.129.16/32 instead of the open 0.0.0.0/0 that had been used as a temporary workaround.
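A sketch of what the corrected frontend policy might look like, under a few assumptions: the policy name and the app: frontend / app: backend labels are hypothetical, and the backend is assumed to run in the same namespace as the frontend and listen on port 3000.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend                 # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: frontend              # hypothetical label
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Match the namespace by its system-set label
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
  egress:
    # DNS to the Azure resolver only, instead of 0.0.0.0/0
    - to:
        - ipBlock:
            cidr: 168.63.129.16/32
      ports:
        - protocol: UDP
          port: 53
    # Application traffic to the backend pods (same namespace assumed)
    - to:
        - podSelector:
            matchLabels:
              app: backend       # hypothetical label
      ports:
        - protocol: TCP
          port: 3000
```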
What to watch for
When you write a Network Policy DNS egress rule, the destination depends entirely on dnsPolicy:
| dnsPolicy | DNS goes to | Network Policy needs |
|---|---|---|
| ClusterFirst (default) | CoreDNS in kube-system | namespaceSelector: kube-system |
| Default | Node resolver (Azure: 168.63.129.16) | ipBlock: 168.63.129.16/32 |
| None | Whatever dnsConfig.nameservers says | Match accordingly |
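To make the None row concrete, here is a minimal sketch pairing a pod that names its own resolver with an egress rule that mirrors it. Every name here is hypothetical, and 192.0.2.53 is a placeholder from the documentation address range, not a real resolver.

```yaml
# Pod side: dnsPolicy None requires at least one nameserver in dnsConfig
apiVersion: v1
kind: Pod
metadata:
  name: dns-none-example         # hypothetical name
  labels:
    app: dns-none-example
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 192.0.2.53               # placeholder resolver address
  containers:
    - name: app
      image: example.invalid/app:latest   # placeholder image
---
# Policy side: the ipBlock mirrors the nameserver above
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dns-none-example-egress  # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: dns-none-example
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 192.0.2.53/32
      ports:
        - protocol: UDP
          port: 53
```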
If you change dnsPolicy on a pod after the Network Policy is already in place, DNS will silently break unless the policy also allows the new resolver. EAI_AGAIN is the symptom. The pod logs won’t tell you why.
Namespace selector labels are not always what you expect. name: is a user-applied convention; kubernetes.io/metadata.name: is the system-guaranteed label on every namespace since Kubernetes 1.21. Use the latter.