Platform · #kubernetes #docker #azure #aks #git #cicd

Manual Kubernetes Deployment When the Pipeline Breaks

When Azure DevOps went down with unshipped PRs in flight, I rebuilt frontend and backend images manually using git worktrees, pushed to ACR, and rolled out to two namespaces without downtime.

· Gideon Warui

The Azure DevOps pipeline API was down. There were two merged PRs on the backend branch and one on the frontend that hadn’t shipped. Waiting wasn’t an option. The manual path exists for exactly this situation.


Repository structure

The codebase has a single git repository with two active branches:

  • <core-system>-frontend-dev — Next.js frontend
  • <core-system>-backend-dev — Node.js backend (separate package.json, separate Dockerfile)

Both branches deploy to <namespace-uat> (UAT) and <namespace-prod> (production) in the same AKS cluster. A standard checkout only gets one branch at a time, which means building both would normally require juggling git checkout calls.


Using git worktrees for the second branch

git worktree lets you check out a second branch into a separate directory without cloning again. The working copy at /tmp/<core-system>-backend is the backend branch:

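# Remove old worktree if present, and drop stale worktree records
git worktree remove /tmp/<core-system>-backend --force 2>/dev/null || true
git worktree prune

# Refresh remote-tracking refs so origin/<core-system>-backend-dev is current
git fetch origin

# Check out the backend branch into a temp directory (detached HEAD)
git worktree add /tmp/<core-system>-backend origin/<core-system>-backend-dev

The frontend stays in the main working directory. The backend builds from /tmp/<core-system>-backend, at the commit origin/<core-system>-backend-dev pointed to as of the last fetch.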
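To see the two-directory layout in isolation, here is a minimal sketch that runs against a throwaway repository. The branch names frontend-dev and backend-dev, the file app.txt, and the temp paths are all hypothetical stand-ins, not the real project.

```shell
set -eu

# Hypothetical throwaway repo and paths; nothing here touches a real project.
DEMO=$(mktemp -d)
REPO="${DEMO}/repo"
WT="${DEMO}/backend-worktree"

git init -q "${REPO}"
cd "${REPO}"
git config user.name "demo"
git config user.email "demo@example.invalid"
git config commit.gpgsign false

# One commit on a frontend branch, then a diverging commit on a backend branch
git checkout -q -b frontend-dev
echo "frontend" > app.txt
git add app.txt
git commit -q -m "frontend commit"

git checkout -q -b backend-dev
echo "backend" > app.txt
git commit -q -am "backend commit"

# The main working directory goes back to the frontend branch
git checkout -q frontend-dev

# Second working copy for the backend branch, detached at its tip
git worktree add --detach "${WT}" backend-dev >/dev/null 2>&1

FRONT_CONTENT=$(cat "${REPO}/app.txt")
BACK_CONTENT=$(cat "${WT}/app.txt")
echo "main dir: ${FRONT_CONTENT}, worktree: ${BACK_CONTENT}"

# Unregister the worktree and delete the temp files
git worktree remove --force "${WT}"
cd /
rm -rf "${DEMO}"
```

Both directories share one object store, so the second checkout costs a working tree rather than a full clone.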


Pulling the latest

# Frontend
cd /home/byteslinger/projects/<client>/<core-system>
git stash          # preserve any local k8s manifest edits
git pull origin <core-system>-frontend-dev
git stash pop

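git stash sets aside uncommitted manifest changes so the pull applies cleanly, and git stash pop restores them afterwards. Because they are back in the working tree before the build, the stash does not keep them out of the build context; that is a job for .dockerignore. The backend worktree needs no pull: it is created directly from origin/<core-system>-backend-dev, so it is as current as the last git fetch.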
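One caveat with the bare stash/pull/pop sequence: when the working tree is clean, git stash saves nothing, and the following git stash pop then applies whatever older entry happens to be on top of the stash list. A guarded variant might look like the sketch below. The function name pull_with_stash and the stash message are illustrative, not part of the original script.

```shell
# Stash only when there is something to stash, pull, then restore.
# Assumes it runs inside the repository; the branch name is the only argument.
pull_with_stash() {
  local branch="$1"
  local stashed=0
  local rc=0

  # --porcelain prints one line per changed tracked path, nothing when clean
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    git stash push -q -m "pre-pull manual deploy"
    stashed=1
  fi

  # Remember the pull result but still restore local edits afterwards
  git pull -q origin "${branch}" || rc=$?

  # Pop only the entry created above, never an older unrelated stash
  if [ "${stashed}" -eq 1 ]; then
    git stash pop -q
  fi

  return "${rc}"
}

# Usage, from the frontend working directory:
#   pull_with_stash <core-system>-frontend-dev
```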


Building the images

Image tags use a YYYYMMDD-HHMMSS format — timestamp of the build, not the commit. This makes the tag sortable and makes it obvious when a given image was built.

TAG=$(date +"%Y%m%d-%H%M%S")
ACR="<acr-registry>.azurecr.io"

az acr login --name <acr-registry>

# Frontend — NEXT_PUBLIC_API_URL must be passed at build time
docker build \
  --build-arg NEXT_PUBLIC_API_URL=/api \
  -t ${ACR}/<core-system>-frontend:${TAG} \
  -t ${ACR}/<core-system>-frontend:latest \
  -f Dockerfile .

# Backend — from the worktree directory
docker build \
  -t ${ACR}/<core-system>-backend:${TAG} \
  -t ${ACR}/<core-system>-backend:latest \
  -f /tmp/<core-system>-backend/Dockerfile \
  /tmp/<core-system>-backend/

docker push ${ACR}/<core-system>-frontend:${TAG}
docker push ${ACR}/<core-system>-frontend:latest
docker push ${ACR}/<core-system>-backend:${TAG}
docker push ${ACR}/<core-system>-backend:latest

NEXT_PUBLIC_API_URL is a Next.js build-time environment variable: it gets inlined into the JavaScript bundle during npm run build. Passing it as a Docker --build-arg sets it for that build, provided the Dockerfile declares a matching ARG and exposes it to the build step. Changing the value means rebuilding the image; setting it as a Kubernetes env var at runtime has no effect on what the browser receives.
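Docker only hands a --build-arg to the build when the Dockerfile declares it with ARG; otherwise the value is dropped with a warning that is easy to miss in build output. A small pre-flight check, sketched here under the assumption that the declaration sits on its own line, can catch that before a long build. The function name dockerfile_declares_arg is hypothetical, and the check does not verify that the ARG is declared in the right build stage.

```shell
# Return success if the Dockerfile declares the given build argument.
# Matches "ARG NAME" and "ARG NAME=default"; the ARG keyword may be any case.
dockerfile_declares_arg() {
  local dockerfile="$1"
  local name="$2"
  grep -Eq "^[[:space:]]*[Aa][Rr][Gg][[:space:]]+${name}([[:space:]=]|\$)" "${dockerfile}"
}

# Usage before the frontend build:
#   dockerfile_declares_arg Dockerfile NEXT_PUBLIC_API_URL \
#     || echo "Dockerfile does not declare ARG NEXT_PUBLIC_API_URL" >&2
```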


Rolling out

kubectl config use-context <cluster>

for NS in <namespace-uat> <namespace-prod>; do
  kubectl set image deployment/<core-system>-frontend \
    <core-system>-frontend=${ACR}/<core-system>-frontend:${TAG} -n ${NS}

  kubectl set image deployment/<core-system>-backend \
    <core-system>-backend=${ACR}/<core-system>-backend:${TAG} -n ${NS}
done

# Wait for both namespaces to stabilise
for NS in <namespace-uat> <namespace-prod>; do
  kubectl rollout status deployment/<core-system>-frontend -n ${NS} --timeout=5m
  kubectl rollout status deployment/<core-system>-backend  -n ${NS} --timeout=5m
done

kubectl set image patches the deployment spec in place and triggers a rolling update. With maxUnavailable: 1 and maxSurge: 0, the old pod is terminated before the new one starts, so a single-replica deployment has a brief gap. For zero downtime with one replica, maxUnavailable: 0 and maxSurge: 1 is the correct setting: the new pod must become ready before the old one is removed. That guarantee is only as good as the readiness probe; without one, a pod counts as ready as soon as its containers start.
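If a deployment is still on the terminate-first settings, the strategy can be changed ahead of the image update. The helper below is a sketch, with a hypothetical function name, that applies a merge patch to the deployment's strategy. Changing the strategy alone does not touch the pod template, so it should not trigger a rollout by itself.

```shell
# Patch a deployment so a new pod is surged in before the old one is removed.
# Arguments: deployment name, namespace.
set_surge_first_strategy() {
  local deployment="$1"
  local namespace="$2"
  kubectl patch deployment "${deployment}" -n "${namespace}" --type merge -p \
    '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}}}'
}

# Usage, before running kubectl set image:
#   set_surge_first_strategy <core-system>-backend <namespace-prod>
```

Like the image tag, the strategy should also be written back to the manifest, or the next kubectl apply -f will restore the old values.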


Updating the manifest

After a successful rollout, update the deployment YAML in both environments to reflect the actual running image (the frontend manifests need the same treatment with the frontend tag):

# Update the backend image tag in both environment manifests
for DIR in prod uat; do
  sed -i "s|<core-system>-backend:.*|<core-system>-backend:${TAG}|" "k8s/${DIR}/deployment-backend.yaml"
done

If the manifest stays at the old tag, the next kubectl apply -f will roll back to that image. Keeping manifests in sync with what’s running prevents surprises.
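The substitution is easy to rehearse on a throwaway file before touching the real manifests. The sketch below uses a made-up registry and image name (example.azurecr.io/app-backend) and a slightly stricter pattern anchored on the image: key, so that a container name containing the same string is left alone. The -i.bak form is used because it is accepted by both GNU and BSD sed.

```shell
TAG="20260313-082326"   # example tag taken from this post
MANIFEST=$(mktemp)

# Minimal stand-in for a deployment manifest; registry and names are hypothetical
cat > "${MANIFEST}" <<'EOF'
spec:
  template:
    spec:
      containers:
        - name: app-backend
          image: example.azurecr.io/app-backend:20260301-120000
EOF

# Anchor the match on "image:" so only the image line is rewritten
sed -i.bak -E "s|(image:[[:space:]]*[^[:space:]]*/app-backend):.*|\1:${TAG}|" "${MANIFEST}"

NEW_IMAGE=$(grep 'image:' "${MANIFEST}" | awk '{print $2}')
echo "${NEW_IMAGE}"
rm -f "${MANIFEST}.bak"
```

Running git diff on the real manifests afterwards is a cheap way to confirm that exactly one line changed per file.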


What went wrong the first time

On the first attempt, the backend worktree step failed because /tmp/<core-system>-backend still existed from a previous session, with a different commit checked out. git worktree add refuses to reuse a path that already exists and is not empty. The --force removal at the top of the script handles this.
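A leftover directory can be in one of two states: still registered as a worktree, or an orphaned directory git no longer knows about. The helper below is one way to cover both, sketched with a hypothetical function name. It deletes the target path outright, so it is only appropriate for disposable locations such as a directory under /tmp.

```shell
# Recreate a worktree at a fixed path, whatever state the path was left in.
# Arguments: worktree path, commit-ish to check out. Run inside the main repo.
fresh_worktree() {
  local path="$1"
  local ref="$2"

  # Refuse an empty path so the rm below can never run against nothing
  [ -n "${path}" ] || return 1

  # Unregister the worktree if git still knows about it
  git worktree remove --force "${path}" 2>/dev/null || true

  # Remove leftovers git no longer tracks, then drop stale records
  rm -rf "${path}"
  git worktree prune

  git worktree add --detach "${path}" "${ref}" >/dev/null 2>&1
}

# Usage:
#   fresh_worktree /tmp/<core-system>-backend origin/<core-system>-backend-dev
```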

Docker build caching was also an issue: an earlier failed build had cached an intermediate layer with a stale npm install. I added --no-cache on the second attempt for the backend:

docker build --no-cache \
  -t ${ACR}/<core-system>-backend:${TAG} \
  -f /tmp/<core-system>-backend/Dockerfile \
  /tmp/<core-system>-backend/

After the pipeline came back

The manual builds (20260313-081305 for frontend, 20260313-082326 for backend) ran in production for three days, until the next pipeline-triggered build superseded them. The timestamps in the tags made it easy to confirm which build was running:

kubectl get deployment <core-system>-backend -n <namespace-prod> \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
# <acr-registry>.azurecr.io/<core-system>-backend:20260313-082326
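Because the tag is a fixed-width timestamp, ordinary string operations are enough to work with it. The sketch below pulls the tag off an image reference and shows that a plain lexicographic sort puts tags in build order. The registry name, image name, and the third tag are made up for illustration, and is_timestamp_tag is a hypothetical helper.

```shell
# Image reference in the shape the jsonpath query returns (hypothetical registry)
IMAGE="example.azurecr.io/app-backend:20260313-082326"

# Everything after the last colon is the tag
RUNNING_TAG="${IMAGE##*:}"

# The two tags from this post plus a hypothetical earlier one
TAGS="20260313-082326
20260310-174500
20260313-081305"

# Lexicographic order equals chronological order for this format
NEWEST=$(printf '%s\n' "${TAGS}" | LC_ALL=C sort | tail -n 1)
echo "running: ${RUNNING_TAG}, newest: ${NEWEST}"

# Check a tag's shape before trusting it as a timestamp
is_timestamp_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{8}-[0-9]{6}$'
}
```

A tag such as latest fails the shape check, which is one reason the rollout pins the timestamped tag instead.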