Four Things That Broke on the First AKS Deployment
Deployed a Node.js/Next.js app to AKS for the first time and hit MySQL timeouts, a hardcoded localhost URL, an ingress rewrite stripping the API prefix, and a LimitRange wall — all in the same session.
The app — a Next.js frontend with a Node.js backend — had been running on local Docker Compose. Moving it to AKS was supposed to be the easy part. The Terraform-provisioned cluster was ready, the images were built, the manifests were written. First deployment: four things broke simultaneously.
Environment
| Component | Detail |
|---|---|
| Cluster | AKS (<cluster>), West Europe |
| Frontend | Next.js 15, Node.js 18 |
| Backend | Node.js (Express), port 3000 |
| Database | Azure Database for MySQL Flexible Server |
| Ingress | NGINX Ingress Controller |
| Namespaces | <project>-dev, <project>-prod |
Blocker 1 — Backend can’t reach MySQL
The backend pod came up but every database call timed out. The logs showed:
Error: connect ETIMEDOUT
at TCPConnectWrap.afterConnect [as oncomplete]
The MySQL server was an Azure Database for MySQL Flexible Server instance, external to the cluster. The initial NetworkPolicy had an ipBlock covering internal traffic but nothing covering the external database host.
The egress rule allowing MySQL was scoped to the internal VNet CIDR only:
egress:
  - to:
      - ipBlock:
          cidr: 10.50.0.0/16
    ports:
      - protocol: TCP
        port: 3306
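In this setup the database was reached through its public endpoint (<project>-dev.mysql.database.azure.com), which resolves to an address outside the VNet range. The ipBlock rule therefore never matched and the connection attempts were silently dropped, which is why the symptom was a timeout rather than a refusal. The fix was to open port 3306 without a destination IP restriction, relying on the ssl_mode = require connection parameter to keep the traffic encrypted: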
egress:
  - ports:
      - protocol: TCP
        port: 3306
That resolved the timeouts. SSL was already enforced (the connection string had ssl_mode = require), so dropping the IP restriction from the NetworkPolicy rule didn't open plain-text access. The trade-off is that the pod can now reach any host on port 3306, not only the database.
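To see why the original ipBlock rule never matched, it can help to check whether the database's resolved address falls inside the CIDR at all. The following is a minimal IPv4-only sketch; the function names are invented for illustration, and 203.0.113.10 is a documentation-range placeholder rather than the server's real address.

```typescript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipv4ToNumber(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  // Multiply instead of bit-shifting so the result never goes negative.
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// True when `ip` lies inside `cidr` (for example "10.50.0.0/16").
function ipInCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const prefixText = slash < 0 ? "" : cidr.slice(slash + 1);
  if (!/^\d+$/.test(prefixText) || Number(prefixText) > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  // Two addresses share a block when they agree after dropping the host bits.
  const blockSize = 2 ** (32 - Number(prefixText));
  const network = ipv4ToNumber(cidr.slice(0, slash));
  return Math.floor(ipv4ToNumber(ip) / blockSize) === Math.floor(network / blockSize);
}

console.log(ipInCidr("10.50.12.7", "10.50.0.0/16"));   // true: an in-VNet address
console.log(ipInCidr("203.0.113.10", "10.50.0.0/16")); // false: placeholder public address
```

Feeding in the address that the database hostname actually resolves to from inside a pod shows directly whether an ipBlock rule can ever match it.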
Blocker 2 — Frontend hardcoded to localhost
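The frontend was calling http://localhost:8000 for every API request. This worked under Docker Compose on a single machine, where localhost:8000 resolved to the backend's published port. In Kubernetes each pod has its own localhost, and browser-side requests resolve localhost on the user's machine, so nothing was listening there.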
The issue was in the Axios client config:
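// lib/client.ts
import axios from 'axios';

// Falls back to localhost when NEXT_PUBLIC_API_URL is unset at build time.
const client = axios.create({
  baseURL: process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000',
});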
NEXT_PUBLIC_API_URL was set at runtime via a Kubernetes env var, but that doesn't work for client-side code in Next.js. Variables prefixed NEXT_PUBLIC_ are inlined into the JavaScript bundle at build time. By the time the container is running, the env var is irrelevant to the browser bundle; because the variable was unset during the image build, the localhost fallback was what got baked in.
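A toy model of build-time inlining makes the failure easier to picture. The sketch below is not how Next.js is implemented; `inlinePublicEnv` is a hypothetical helper that only mimics the visible effect, replacing `process.env.NEXT_PUBLIC_*` references in source text with whatever the build environment contained.

```typescript
type BuildEnv = Record<string, string | undefined>;

// Replace each process.env.NEXT_PUBLIC_* reference with a literal taken from
// the environment that existed when the "build" ran.
function inlinePublicEnv(source: string, buildEnv: BuildEnv): string {
  return source.replace(/process\.env\.(NEXT_PUBLIC_[A-Z0-9_]+)/g, (_ref, name: string) => {
    const value = buildEnv[name];
    return value === undefined ? "undefined" : JSON.stringify(value);
  });
}

const clientSource = "baseURL: process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000'";

// Built without the variable: only the localhost fallback can ever win.
console.log(inlinePublicEnv(clientSource, {}));
// Built with the variable: the value is frozen into the output.
console.log(inlinePublicEnv(clientSource, { NEXT_PUBLIC_API_URL: "/api" }));
```

In both cases the output no longer mentions `process.env`, so an env var supplied to the running container has nothing left to influence.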
The fix was to pass the variable as a Docker build argument:
ARG NEXT_PUBLIC_API_URL=/api
ENV NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL}
RUN npm run build
And pass it during the image build:
docker build \
  --build-arg NEXT_PUBLIC_API_URL=/api \
  -t <acr-registry>.azurecr.io/<project>-frontend:20251024-113044 \
  -f Dockerfile .
The value /api works because the browser resolves the relative path against the page's own origin, and NGINX Ingress routes /api on that same host to the backend service. No absolute URL needed.
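Since a missing build argument fails silently, one option is a guard that aborts the build when a required NEXT_PUBLIC_ variable is absent. This is a sketch under assumptions: `missingBuildVars` and `assertBuildEnv` are hypothetical helpers, and where to call them (a prebuild script, for instance) is left to the project.

```typescript
type EnvMap = Record<string, string | undefined>;

// Names from `required` that are unset or blank in `env`.
function missingBuildVars(env: EnvMap, required: string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throw early so the fallback URL can never be baked into an image unnoticed.
function assertBuildEnv(env: EnvMap, required: string[]): void {
  const missing = missingBuildVars(env, required);
  if (missing.length > 0) {
    throw new Error(`missing build-time variables: ${missing.join(", ")}`);
  }
}

// Example with literal maps; a real prebuild script would pass process.env.
assertBuildEnv({ NEXT_PUBLIC_API_URL: "/api" }, ["NEXT_PUBLIC_API_URL"]); // passes
console.log(missingBuildVars({}, ["NEXT_PUBLIC_API_URL"]));
```

With a guard like this, forgetting the --build-arg would stop the image build instead of producing a frontend that quietly calls localhost.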
Blocker 3 — Ingress stripping the /api prefix
With the frontend correctly calling /api/endpoint, requests started reaching NGINX — then returning 404 from the backend.
The ingress had a rewrite-target annotation:
annotations:
  nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
    - http:
        paths:
          - path: /api(/|$)(.*)
            pathType: Prefix
This stripped /api before forwarding to the backend. The backend expected the full path including /api: it registered its routes as /api/users, /api/msisdn/health, etc. After the rewrite, requests arrived as /users, /msisdn/health, which matched nothing.
Removed the rewrite annotation and changed the path to a simple prefix match:
spec:
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix
NGINX forwards the full path as-is. Backend routes resolved.
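The effect of the old annotation can be reproduced with a few lines of string handling. The sketch below is a simplified model of a regex path combined with rewrite-target, not the controller's actual code; `applyRewriteTarget` is a hypothetical helper.

```typescript
// If the path matches the pattern, replace it with the target, substituting
// $1, $2, ... with the corresponding capture groups. Otherwise pass it through.
function applyRewriteTarget(path: string, pattern: RegExp, target: string): string {
  const match = pattern.exec(path);
  if (match === null) return path;
  return target.replace(/\$(\d)/g, (_placeholder, index: string) => match[Number(index)] ?? "");
}

// Same expression as the original ingress path, anchored at the start.
const apiPattern = /^\/api(\/|$)(.*)/;

console.log(applyRewriteTarget("/api/users", apiPattern, "/$2"));         // /users
console.log(applyRewriteTarget("/api/msisdn/health", apiPattern, "/$2")); // /msisdn/health
console.log(applyRewriteTarget("/api", apiPattern, "/$2"));               // /
```

Every result has lost the /api prefix that the backend's routes were registered under, which is exactly the 404 described above.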
Blocker 4 — LimitRange rejecting the backend pod
The backend Deployment never produced a pod. Its ReplicaSet reported:
Error creating: pods "<project>-backend-xxx" is forbidden:
  [maximum cpu usage per Container is 1500m, but limit is 2000m]
The namespace had a LimitRange object (created by the cluster provisioning templates) capping container CPU limits at 1500m. A LimitRange is enforced at admission, so the pod was rejected before it ever reached the scheduler. The initial backend manifest had:
resources:
  limits:
    cpu: 2000m
    memory: 2Gi
Adjusted to stay within the LimitRange:
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 1400m
    memory: 2Gi
Pod admitted and scheduled. Both namespaces came up with backend 20251024-113044 and frontend 20251024-131541.
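The comparison the admission check performs is easy to mirror when reviewing manifests. The sketch below assumes CPU quantities are written either as millicores ("1400m") or as plain decimals ("1.5"), which covers the values in this post; other Kubernetes quantity suffixes are deliberately not handled, and the function names are hypothetical.

```typescript
// Parse a CPU quantity such as "1400m" or "1.5" into millicores.
function cpuToMillicores(quantity: string): number {
  const text = quantity.trim();
  if (/^\d+m$/.test(text)) return Number(text.slice(0, -1));
  if (/^\d+(\.\d+)?$/.test(text)) return Math.round(Number(text) * 1000);
  throw new Error(`unsupported CPU quantity: ${quantity}`);
}

// True when a container's CPU limit is above the LimitRange maximum.
// A limit exactly equal to the maximum is allowed.
function exceedsCpuMax(limit: string, max: string): boolean {
  return cpuToMillicores(limit) > cpuToMillicores(max);
}

console.log(exceedsCpuMax("2000m", "1500m")); // true: the original manifest
console.log(exceedsCpuMax("1400m", "1500m")); // false: the adjusted manifest
```

Running the same check over every container in a manifest before applying it would surface a rejection like this one ahead of the deploy.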
What landed
- Backend reaching Azure MySQL over SSL
- Frontend calling /api: no localhost, no absolute URL
- Ingress routing /api/* to backend with the full path preserved
- Pods within LimitRange bounds
- Public IP: <public-ip>
The session took longer than expected, but each failure was discrete. None of them required changes to application logic — only to the deployment configuration.
What to watch for
NEXT_PUBLIC_ variables are a build-time concern, not a runtime one. Setting them as Kubernetes env vars silently does nothing for anything the browser renders. If the value isn’t in the Docker build args, it isn’t in the app.
LimitRange objects are namespace-scoped and often created by cluster provisioning automation. Running kubectl describe limitrange -n <namespace> before writing resource specs avoids having pods rejected at admission.
Ingress rewrite-target is only appropriate if the backend doesn’t share the URL path the browser uses. If backend routes mirror what the ingress exposes, don’t rewrite.