Four Things That Broke on the First AKS Deployment

The app — a Next.js frontend with a Node.js backend — had been running on local Docker Compose. Moving it to AKS was supposed to be the easy part. The Terraform-provisioned cluster was ready, the images were built, the manifests were written. First deployment: four things broke simultaneously.

Environment

Component	Detail
Cluster	AKS (`<cluster>`), West Europe
Frontend	Next.js 15, Node.js 18
Backend	Node.js (Express), port 3000
Database	Azure Database for MySQL Flexible Server
Ingress	NGINX Ingress Controller
Namespaces	`<project>-dev`, `<project>-prod`

Blocker 1 — Backend can’t reach MySQL

The backend pod came up but every database call timed out. The logs showed:

Error: connect ETIMEDOUT
    at TCPConnectWrap.afterConnect [as oncomplete]

The MySQL server was an Azure Managed MySQL instance — external to the cluster. The initial Network Policy had been written with a specific ipBlock for internal traffic but nothing covering the external database host.

The egress rule allowing MySQL was scoped to the internal VNET CIDR only:

egress:
  - to:
      - ipBlock:
          cidr: 10.50.0.0/16
    ports:
      - protocol: TCP
        port: 3306

The Azure Database for MySQL instance is reached through a public endpoint (<project>-dev.mysql.database.azure.com) that resolves outside the VNET range, so the rule never matched and the connection attempts were dropped. The fix was to open port 3306 without a destination IP restriction, relying on the ssl_mode = require connection parameter to enforce encryption at the database layer:

egress:
  - ports:
      - protocol: TCP
        port: 3306

That resolved the timeouts. SSL was already required on the connection (the connection string had ssl_mode = require), so dropping the IP restriction from the Network Policy rule didn’t open plain-text access. The trade-off is that the selected pods can now reach any host on port 3306, not only the database.
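To see why the original rule never matched, it helps to test an address against the rule's CIDR by hand. The following is a minimal sketch and not part of the deployment: the helper names ipv4ToInt and ipInCidr are hypothetical, and 203.0.113.10 is a documentation-range placeholder standing in for whatever public address the MySQL hostname actually resolves to.

```typescript
// Converts a dotted-quad IPv4 address into an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const octets = ip.split(".").map(Number);
  const valid =
    octets.length === 4 &&
    octets.every((o) => Math.floor(o) === o && o >= 0 && o <= 255);
  if (!valid) throw new Error(`invalid IPv4 address: ${ip}`);
  return octets.reduce((acc, o) => ((acc << 8) | o) >>> 0, 0);
}

// Returns true when `ip` falls inside the `cidr` block (e.g. "10.50.0.0/16").
function ipInCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash < 0) throw new Error(`invalid CIDR: ${cidr}`);
  const prefix = Number(cidr.slice(slash + 1));
  if (Math.floor(prefix) !== prefix || prefix < 0 || prefix > 32) {
    throw new Error(`invalid prefix length in: ${cidr}`);
  }
  // Build the network mask; a /0 prefix matches every address.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  const network = (ipv4ToInt(cidr.slice(0, slash)) & mask) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === network;
}

// An in-VNET address matches the original egress rule.
const internalAllowed = ipInCidr("10.50.3.7", "10.50.0.0/16"); // true
// A public address (placeholder value) does not, so the traffic was dropped.
const publicAllowed = ipInCidr("203.0.113.10", "10.50.0.0/16"); // false
```

Resolving the real hostname from inside a pod and running the result through a check like this shows immediately whether an ipBlock rule can ever apply.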

Blocker 2 — Frontend hardcoded to localhost

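The frontend was calling http://localhost:8000 for every API request. This worked in Docker Compose, where the backend was reachable at localhost on the development machine. In Kubernetes nothing answers at that address: each pod has its own localhost, and code running in the browser resolves localhost to the user's own machine.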

The issue was in the Axios client config:

// lib/client.ts
const client = axios.create({
  baseURL: process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000',
});

NEXT_PUBLIC_API_URL was set at runtime via a Kubernetes env var — but that doesn’t work with Next.js. Variables prefixed NEXT_PUBLIC_ are inlined at build time into the JavaScript bundle. By the time the container is running, the env var is irrelevant; the hardcoded fallback was what got baked in.
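The effect of build-time inlining can be illustrated with a toy model. This is a sketch for intuition only and not how Next.js implements it internally; the function inlinePublicEnv and the Env type are hypothetical names introduced here.

```typescript
// Toy model of build-time inlining. NOT the actual Next.js implementation.
type Env = Record<string, string | undefined>;

function inlinePublicEnv(source: string, buildEnv: Env): string {
  return source.replace(
    /process\.env\.(NEXT_PUBLIC_[A-Z0-9_]+)/g,
    (match: string, name: string) => {
      const value = buildEnv[name];
      // Known at build time: the reference becomes a string literal.
      // Unknown at build time: the reference is left alone; in the browser
      // it evaluates to undefined, because the pod's environment is not
      // visible to already-bundled client code.
      return value === undefined ? match : JSON.stringify(value);
    }
  );
}

const clientSource =
  "baseURL: process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000'";

// Built without the variable: the localhost fallback is what ships.
const builtWithoutArg = inlinePublicEnv(clientSource, {});

// Built with the variable set: "/api" is frozen into the bundle.
const builtWithArg = inlinePublicEnv(clientSource, {
  NEXT_PUBLIC_API_URL: "/api",
});
```

In this model, changing the environment after the call to inlinePublicEnv has no effect on the output, which mirrors why a Kubernetes env var set on the running container cannot change what the browser receives.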

The fix was to pass the variable as a Docker build argument:

ARG NEXT_PUBLIC_API_URL=/api
ENV NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL}

RUN npm run build

And pass it during the image build:

docker build \
  --build-arg NEXT_PUBLIC_API_URL=/api \
  -t <acr-registry>.azurecr.io/<project>-frontend:20251024-113044 \
  -f Dockerfile .

The value /api works because NGINX Ingress routes /api paths to the backend service on the same cluster. No absolute URL needed.
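A relative base URL works because the browser resolves it against the origin of the page that issued the request. The sketch below shows that resolution with the standard URL constructor; the helper resolveRequestUrl and the origin https://app.example.com are hypothetical and only stand in for the real public hostname.

```typescript
// Joins a base URL and a path, then resolves the result against the page
// origin the way a browser resolves a relative request URL.
function resolveRequestUrl(
  pageOrigin: string,
  baseURL: string,
  path: string
): string {
  // Ensure exactly one slash between base and path.
  const joined = `${baseURL.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
  // An absolute `joined` ignores pageOrigin; a relative one inherits it.
  return new URL(joined, pageOrigin).href;
}

// Relative base: the request goes to whatever host served the page.
const relative = resolveRequestUrl("https://app.example.com", "/api", "/users");
// → "https://app.example.com/api/users"

// Hardcoded absolute base: the request goes to the user's own machine.
const hardcoded = resolveRequestUrl(
  "https://app.example.com",
  "http://localhost:8000",
  "/users"
);
// → "http://localhost:8000/users"
```

The same image can then be promoted between the dev and prod namespaces without a rebuild, since the baked-in value carries no hostname.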

Blocker 3 — Ingress stripping the /api prefix

With the frontend correctly calling /api/endpoint, requests started reaching NGINX — then returning 404 from the backend.

The ingress had a rewrite-target annotation:

annotations:
  nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
    - http:
        paths:
          - path: /api(/|$)(.*)
            pathType: Prefix

This stripped /api before forwarding to the backend. The backend expected the full path including /api: it registered its routes as /api/users, /api/msisdn/health, etc. After the rewrite, requests arrived as /users, /msisdn/health, which matched nothing.
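The effect of the rewrite can be reproduced with the same regular expression. This is an illustrative sketch of the path transformation only, not of how NGINX evaluates it; applyRewrite and routeExists are hypothetical helpers, and the route list contains just the two routes named above.

```typescript
// Mimics the ingress path regex /api(/|$)(.*) combined with
// rewrite-target: /$2. Paths that do not match pass through unchanged.
function applyRewrite(requestPath: string): string {
  const match = /^\/api(\/|$)(.*)/.exec(requestPath);
  if (match === null) return requestPath;
  // $2 is everything after the "/api/" prefix.
  return `/${match[2] ?? ""}`;
}

// Routes as the backend registered them, including the /api prefix.
const backendRoutes = ["/api/users", "/api/msisdn/health"];

function routeExists(path: string): boolean {
  return backendRoutes.indexOf(path) !== -1;
}

const rewrittenUsers = applyRewrite("/api/users"); // "/users"
const rewrittenHealth = applyRewrite("/api/msisdn/health"); // "/msisdn/health"

// With the rewrite, the forwarded path matches no backend route (404).
const foundWithRewrite = routeExists(rewrittenUsers); // false
// Without it, the original path is forwarded and matches.
const foundWithoutRewrite = routeExists("/api/users"); // true
```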

The fix was to remove the rewrite annotation and change the path to a simple prefix match:

spec:
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix

NGINX forwards the full path as-is. Backend routes resolved.

Blocker 4 — LimitRange rejecting the backend pod

The backend pod was never created. The ReplicaSet reported this event:

Error creating: pods "<project>-backend-xxx" is forbidden:
[maximum cpu usage per Container is 1500m, but limit is 2000m]

The namespace had a LimitRange object (created by the cluster provisioning templates) capping container CPU at 1500m. The initial backend manifest had:

resources:
  limits:
    cpu: 2000m
    memory: 2Gi

Adjusted to stay within the LimitRange:

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 1400m
    memory: 2Gi

Pod scheduled. Both namespaces came up with backend 20251024-113044 and frontend 20251024-131541.
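The comparison that rejected the original manifest is simple to reproduce before applying anything. The sketch below is a simplified stand-in for the LimitRange admission check, handling only plain core counts and the "m" suffix; parseCpuMillis and cpuLimitViolation are hypothetical names, and the message format merely echoes the event quoted above.

```typescript
// Parses a Kubernetes CPU quantity into millicores.
// Supports "2000m" (millicores) and "2" or "1.5" (cores) only.
function parseCpuMillis(quantity: string): number {
  const milli = /^(\d+)m$/.exec(quantity);
  if (milli !== null) return Number(milli[1]);
  if (/^\d+(\.\d+)?$/.test(quantity)) {
    return Math.round(Number(quantity) * 1000);
  }
  throw new Error(`unsupported CPU quantity: ${quantity}`);
}

// Returns a violation message when `limit` exceeds the LimitRange `max`,
// or null when the container spec is within bounds.
function cpuLimitViolation(limit: string, max: string): string | null {
  return parseCpuMillis(limit) > parseCpuMillis(max)
    ? `maximum cpu usage per Container is ${max}, but limit is ${limit}`
    : null;
}

// Original manifest: 2000m against a 1500m cap is rejected.
const original = cpuLimitViolation("2000m", "1500m");
// Adjusted manifest: 1400m is within the cap, so there is no violation.
const adjusted = cpuLimitViolation("1400m", "1500m"); // null
```

A check like this could run in CI against rendered manifests, with the max value read from the namespace's LimitRange rather than hardcoded.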

What landed

Backend reaching Azure MySQL over SSL
Frontend calling /api — no localhost, no absolute URL
Ingress routing /api/* to backend with the full path preserved
Pods within LimitRange bounds
Public IP: <public-ip>

The session took longer than expected, but each failure was discrete. None of them required changes to application logic — only to the deployment configuration.

What to watch for

NEXT_PUBLIC_ variables are a build-time concern, not a runtime one. Setting them as Kubernetes env vars silently does nothing for anything the browser renders. If the value isn’t in the Docker build args, it isn’t in the app.

LimitRange objects are namespace-scoped and often created by cluster provisioning automation. Running kubectl describe limitrange -n <namespace> before writing resource specs avoids having pods rejected at admission.

Ingress rewrite-target is only appropriate if the backend doesn’t share the URL path the browser uses. If backend routes mirror what the ingress exposes, don’t rewrite.