Debug #kubernetes#argocd#gitops#gateway-api#traefik#sync-waves

ArgoCD Sync Waves and Gateway API: HTTPRoute Order Is Not Optional

Diagnosed an ArgoCD sync loop where Gateway API HTTPRoutes failed on missing backend Services, and fixed it by annotating resources with explicit sync wave numbers.

· Gideon Warui

The Problem

An ArgoCD application was stuck in a sync loop. Every sync attempt would partially succeed, then fail, then retry. The ArgoCD UI showed the application as “Progressing” indefinitely, never reaching a healthy state.

Examining the Traefik logs (the Gateway API controller) revealed this error repeating:

Parent traefik-gateway: Cannot load HTTPBackendRef namespace/service-web:
getting service: service "service-web" not found

The HTTPRoute was trying to reference a Service that did not exist yet. ArgoCD was applying resources, but the HTTPRoute validation was failing because it ran before the Service was created. Each sync cycle, the same pattern: Service not ready, HTTPRoute fails, ArgoCD retries, forever.

This was confusing because similar applications had been deployed using Kubernetes Ingress without this problem. Why was Gateway API different?


The Investigation

The first step was examining the ArgoCD application status to understand which resources were failing:

kubectl get application service-sandbox -n argocd \
  -o jsonpath='{range .status.resources[*]}{.kind}/{.name}: {.status}{"\n"}{end}'

Output:

Namespace/service-sandbox: Synced
ExternalSecret/service-secrets: Synced
Certificate/service-tls: Synced
HTTPRoute/service-route: OutOfSync     # <-- Problem
ConfigMap/service-config: OutOfSync
Service/service-web: OutOfSync
Deployment/service-web: OutOfSync

The HTTPRoute showed OutOfSync while other resources were still pending. Checking the events in the namespace confirmed that the HTTPRoute was being created and immediately failing validation:

kubectl get events -n service-sandbox --sort-by='.lastTimestamp' | tail -20

The Gateway controller was rejecting the HTTPRoute because the backend Service reference could not be resolved.

Looking at the ArgoCD application’s sync operation details:

kubectl get application service-sandbox -n argocd -o json \
  | jq '.status.operationState.syncResult.resources'

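This showed that all resources were being applied in the same sync phase and the same wave, with no explicit ordering between them. ArgoCD was sending them to the cluster in one batch, without waiting for the Service to exist before applying the HTTPRoute, so whether the Gateway controller saw the Service or the HTTPRoute first came down to timing.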


The Root Cause

ArgoCD applies resources using sync waves. Each resource can be annotated with a wave number, and ArgoCD processes waves in numerical order, waiting for all resources in a wave to become healthy before proceeding to the next.

Without explicit wave annotations, all resources default to wave 0. Within a single wave, ArgoCD sorts resources by kind and then by name, but it applies them as one batch and does not wait for one resource to become healthy before applying the next. The controllers watching those resources react as events arrive, so the effective ordering depends on network timing, API server load, and other unpredictable factors.

This usually works because most Kubernetes resources are eventually consistent. A Deployment can reference a ConfigMap that does not exist yet; its pods simply fail to start until the ConfigMap appears. Kubernetes keeps retrying, and eventually everything converges.

Gateway API HTTPRoutes are different. They perform eager validation.

When creating an HTTPRoute that references a backend Service, the Gateway controller immediately tries to resolve that Service. Unlike Ingress controllers that typically use lazy resolution and tolerate missing backends, Gateway API controllers validate references at creation time. If the Service does not exist, the HTTPRoute is marked as failed:

status:
  parents:
    - parentRef:
        name: traefik-gateway
        namespace: traefik
      conditions:
        - type: Accepted
          status: "False"
          reason: BackendNotFound
          message: "service service-web not found"

This strict validation is a feature, not a bug. Gateway API was designed to provide better error feedback than Ingress. But it means relying on eventual consistency does not work. Resources must exist in the correct order.


The Fix

The fix is sync wave annotations to control deployment order. Resources with lower wave numbers are applied first, and ArgoCD waits for them to become healthy before processing higher waves.

Updated Service manifest:

# service-web.yaml
apiVersion: v1
kind: Service
metadata:
  name: service-web
  annotations:
    argocd.argoproj.io/sync-wave: "10"
spec:
  selector:
    app: service-web
  ports:
    - name: http
      port: 80
      targetPort: 8000
  type: ClusterIP

Updated HTTPRoute manifest (must come after the Service):

# httproute.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: service-route
  annotations:
    argocd.argoproj.io/sync-wave: "20"
spec:
  parentRefs:
    - name: traefik-gateway
      namespace: traefik
      sectionName: websecure-service
  hostnames:
    - service.domain.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: service-web
          port: 80

With these annotations, ArgoCD will:

  1. Apply wave 10 resources (Service, Deployment)
  2. Wait for them to become healthy
  3. Apply wave 20 resources (HTTPRoute)

The HTTPRoute creation now happens after the Service exists, so the Gateway controller resolves the backend reference successfully.
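
The list above places the Deployment in wave 10 alongside the Service, although only the Service manifest is shown. A minimal sketch of a matching Deployment might look like the following. The image name and replica count are assumptions for illustration; the `app: service-web` label and container port 8000 mirror the Service's selector and targetPort.

```yaml
# deployment.yaml (illustrative sketch; image and replica count are assumptions)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-web
  annotations:
    # Same wave as the Service, so both exist before the HTTPRoute in wave 20
    argocd.argoproj.io/sync-wave: "10"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-web
  template:
    metadata:
      labels:
        app: service-web   # must match the Service selector
    spec:
      containers:
        - name: web
          image: registry.example.com/service-web:1.0.0   # hypothetical image
          ports:
            - name: http
              containerPort: 8000   # matches the Service targetPort
```

Because ArgoCD waits for every resource in wave 10 to become healthy, the HTTPRoute is applied only after the Deployment's rollout is healthy, not merely after the Service object exists.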


Standard Wave Ordering

After hitting this, I settled on a standard wave ordering for all applications:

Wave | Resources                            | Rationale
-10  | Custom Resource Definitions          | Must exist before any CRs
-5   | Namespaces                           | Must exist before namespaced resources
0    | Secrets, ConfigMaps, ExternalSecrets | Referenced by pods, need to exist first
0    | ServiceAccounts                      | Referenced by deployments
5    | PersistentVolumeClaims               | Storage must be ready before pods mount it
10   | Deployments, StatefulSets, Services  | Core workloads
15   | Jobs                                 | Post-deployment tasks like migrations
20   | HTTPRoutes, Ingresses                | Need backend services to exist
25   | NetworkPolicies                      | Can be applied after workloads are running

If resource B references resource A, put A in an earlier wave.
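
As a sketch of how the early waves in this ordering look in a manifest, the Namespace and ConfigMap from the earlier status output could be annotated as follows. The ConfigMap key and value are placeholders, not taken from the real application.

```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: service-sandbox
  annotations:
    argocd.argoproj.io/sync-wave: "-5"   # negative waves run before wave 0
---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: service-config
  namespace: service-sandbox
  annotations:
    argocd.argoproj.io/sync-wave: "0"    # explicit, though 0 is already the default
data:
  LOG_LEVEL: info                        # placeholder value
```

The wave number is always written as a quoted string, since Kubernetes annotation values must be strings.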


Where to Define Sync Wave Annotations

Option 1: In Base Manifests

Put the annotation directly in the base resource:

# base/httproute.yaml
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "20"

Pros: applies to all environments automatically. Cons: cannot vary ordering per environment.

Option 2: In Overlay Patches

Create an overlay patch that adds the annotation:

# overlays/sandbox/patch-sync-waves.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: service-route
  annotations:
    argocd.argoproj.io/sync-wave: "20"

Pros: can customize per environment. Cons: more files to maintain.
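
The patch file only takes effect once the overlay's kustomization references it. A minimal sketch, assuming the base directory sits at `../../base` relative to the overlay (the layout is an assumption, not something the original setup confirms):

```yaml
# overlays/sandbox/kustomization.yaml (sketch; assumes the base lives at ../../base)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  # Merges the sync-wave annotation into the HTTPRoute named service-route
  - path: patch-sync-waves.yaml
```

Kustomize matches the patch to its target by apiVersion, kind, and name, so the patch file needs nothing beyond those identifying fields and the annotation itself.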

Option 3: In Kustomization commonAnnotations

# overlays/sandbox/kustomization.yaml
commonAnnotations:
  argocd.argoproj.io/sync-wave: "0"

Pros: single place to define. Cons: applies the same wave to everything, defeating the purpose.
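
If a single kustomization-level definition is still the goal, one possible alternative is a patch with a `target` selector, which annotates only one kind instead of everything. The sketch below is an assumption about how that could look, not part of the original setup; the `../../base` path and the placeholder `name` are illustrative.

```yaml
# overlays/sandbox/kustomization.yaml (sketch; assumes the base lives at ../../base)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      group: gateway.networking.k8s.io
      kind: HTTPRoute          # applies to every HTTPRoute in this overlay
    patch: |-
      apiVersion: gateway.networking.k8s.io/v1
      kind: HTTPRoute
      metadata:
        name: ignored          # placeholder; the target selector decides what is patched
        annotations:
          argocd.argoproj.io/sync-wave: "20"
```

This keeps one definition per kind rather than one per resource, at the cost of hiding the ordering away from the manifests it affects.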

Option 1 works best for most cases. Sync wave ordering is typically consistent across environments.


Sync Waves vs Sync Phases

ArgoCD has two ordering mechanisms that are often confused.

Sync Waves order resources within a sync operation. All resources in wave 0 are applied, then wave 1, and so on.

Sync Phases are broader: PreSync, Sync, PostSync, and SyncFail. These are used for hooks that run at different stages of the sync lifecycle.

For example:

  • PreSync phase for database migration jobs
  • Sync phase for normal resources (with waves for ordering)
  • PostSync phase for notification or cleanup jobs

# Database migration job - runs before main sync
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:latest
          command: ["python", "manage.py", "migrate"]
      restartPolicy: Never

This job runs in the PreSync phase, before any Sync-phase resources are applied in any wave, ensuring the database schema is ready before the application starts.
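
The PostSync phase works the same way at the other end of the lifecycle. A minimal sketch of a PostSync hook follows; the job name, the `busybox` image, and the `echo` command are stand-ins for a real notification or cleanup step.

```yaml
# Notification job (sketch) - runs after all Sync-phase resources are healthy
apiVersion: batch/v1
kind: Job
metadata:
  name: notify-deployed        # hypothetical name
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
    # Waves also order hooks within the same phase
    argocd.argoproj.io/sync-wave: "1"
spec:
  backoffLimit: 1
  template:
    spec:
      containers:
        - name: notify
          image: busybox:1.36
          command: ["sh", "-c", "echo 'sync finished'"]   # stand-in for a real notification
      restartPolicy: Never
```

Phases and waves combine: the phase decides when in the lifecycle the hook runs, and the wave orders it against other hooks in that same phase.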


Gateway API vs Ingress: Validation Differences

Ingress Controllers (typically):

  • Accept Ingress resources even if backends are missing
  • Continuously reconcile, adding backends when they appear
  • Show warnings in logs but do not reject the resource
  • Route errors result in 502/503 errors, not admission failures

Gateway API Controllers:

  • Validate backend references at creation time
  • Reject HTTPRoutes if referenced Services do not exist
  • Provide explicit status conditions indicating the problem
  • Designed for earlier failure feedback

The Gateway API approach is more correct from a configuration management perspective. Configuration errors surface immediately rather than at runtime. But it requires thinking about resource ordering.
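
For comparison, a rough Ingress equivalent of the HTTPRoute from the fix is sketched below. The resource name and the `traefik` ingress class are assumptions; the host, path, and backend mirror the HTTPRoute shown earlier.

```yaml
# ingress.yaml (sketch for comparison; name and ingressClassName are assumptions)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-ingress
  annotations:
    # Same wave as HTTPRoutes in the standard ordering
    argocd.argoproj.io/sync-wave: "20"
spec:
  ingressClassName: traefik
  rules:
    - host: service.domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-web
                port:
                  number: 80
```

Both resources point at the same `service-web` backend on port 80, which is why a migration that swaps one for the other can surface ordering problems that never showed up before.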


Debugging Sync Wave Issues

When ArgoCD sync is stuck or looping:

  1. Check application sync status:

    kubectl get application myapp -n argocd -o yaml | grep -A 50 status:

  2. Look for resources stuck in OutOfSync or Progressing state.

  3. Check events in the target namespace:

    kubectl get events -n myapp --sort-by='.lastTimestamp'

  4. Examine HTTPRoute status conditions:

    kubectl get httproute myroute -n myapp -o yaml | grep -A 20 status:

  5. Verify sync wave annotations are present:

    kubectl get httproute myroute -n myapp -o jsonpath='{.metadata.annotations}'

  6. If stuck, try a hard refresh:

    kubectl patch application myapp -n argocd --type merge \
      -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'


Production Rule

Annotate every Gateway API resource with a sync wave higher than its backend dependencies. The eventual-consistency model that makes Ingress controllers forgiving does not apply to Gateway API. HTTPRoutes fail at creation time if their Services are missing: no graceful degradation, no automatic retry from the controller, just a BackendNotFound condition and an ArgoCD sync loop until someone adds the annotation.

If you are migrating from Ingress to Gateway API, auditing sync ordering is not optional. Patterns that worked with Ingress for years will produce sync loops with Gateway API the first time a dependency is missing at apply time. Put wave annotations in base manifests, not overlays, so the ordering is consistent across environments without per-environment patches.
