ArgoCD Sync Waves and Gateway API: HTTPRoute Order Is Not Optional
Diagnosed an ArgoCD sync loop where Gateway API HTTPRoutes failed on missing backend Services, and fixed it by annotating resources with explicit sync wave numbers.
The Problem
An ArgoCD application was stuck in a sync loop. Every sync attempt would partially succeed, then fail, then retry. The ArgoCD UI showed the application as “Progressing” indefinitely, never reaching a healthy state.
Examining the Traefik logs (the Gateway API controller) revealed this error repeating:
Parent traefik-gateway: Cannot load HTTPBackendRef namespace/service-web:
getting service: service "service-web" not found
The HTTPRoute was trying to reference a Service that did not exist yet. ArgoCD was applying resources, but the HTTPRoute validation was failing because it ran before the Service was created. Each sync cycle, the same pattern: Service not ready, HTTPRoute fails, ArgoCD retries, forever.
This was confusing because similar applications had been deployed using Kubernetes Ingress without this problem. Why was Gateway API different?
The Investigation
The first step was examining the ArgoCD application status to understand which resources were failing:
kubectl get application service-sandbox -n argocd \
-o jsonpath='{range .status.resources[*]}{.kind}/{.name}: {.status}{"\n"}{end}'
Output:
Namespace/service-sandbox: Synced
ExternalSecret/service-secrets: Synced
Certificate/service-tls: Synced
HTTPRoute/service-route: OutOfSync # <-- Problem
ConfigMap/service-config: OutOfSync
Service/service-web: OutOfSync
Deployment/service-web: OutOfSync
The HTTPRoute showed OutOfSync while other resources were still pending. Checking the events in the namespace confirmed that the HTTPRoute was being created and immediately failing validation:
kubectl get events -n service-sandbox --sort-by='.lastTimestamp' | tail -20
The Gateway controller was rejecting the HTTPRoute because the backend Service reference could not be resolved.
Looking at the ArgoCD application’s sync operation details:
kubectl get application service-sandbox -n argocd \
-o jsonpath='{.status.operationState.syncResult.resources[*]}' | jq .
This showed that all resources were being applied in the same sync phase and the same wave, with nothing forcing the Service to exist before the HTTPRoute. ArgoCD was sending them to the cluster together, so whether the Service or the HTTPRoute was processed first was a race.
The Root Cause
ArgoCD applies resources using sync waves. Each resource can be annotated with a wave number, and ArgoCD processes waves in numerical order, waiting for all resources in a wave to become healthy before proceeding to the next.
Without explicit wave annotations, all resources default to wave 0. Within a single wave, ArgoCD orders resources only by kind and then by name, and it does not wait for one resource to become healthy before applying the next. Everything in the wave reaches the API server in quick succession, so what a controller sees first depends on network timing, API server load, and other unpredictable factors.
This usually works because most Kubernetes resources have eventual consistency. A Deployment can reference a ConfigMap that does not exist yet, and it will just fail to schedule pods until the ConfigMap appears. Kubernetes keeps retrying, and eventually everything converges.
Gateway API HTTPRoutes are different. They perform eager validation.
When creating an HTTPRoute that references a backend Service, the Gateway controller immediately tries to resolve that Service. Unlike Ingress controllers that typically use lazy resolution and tolerate missing backends, Gateway API controllers validate references at creation time. If the Service does not exist, the HTTPRoute is marked as failed:
status:
  parents:
    - parentRef:
        name: traefik-gateway
        namespace: traefik
      conditions:
        - type: Accepted
          status: "False"
          reason: BackendNotFound
          message: "service service-web not found"
This strict validation is a feature, not a bug. Gateway API was designed to provide better error feedback than Ingress. But it means relying on eventual consistency does not work. Resources must exist in the correct order.
The Fix
The fix is sync wave annotations to control deployment order. Resources with lower wave numbers are applied first, and ArgoCD waits for them to become healthy before processing higher waves.
Updated Service manifest:
# service-web.yaml
apiVersion: v1
kind: Service
metadata:
  name: service-web
  annotations:
    argocd.argoproj.io/sync-wave: "10"
spec:
  selector:
    app: service-web
  ports:
    - name: http
      port: 80
      targetPort: 8000
  type: ClusterIP
Updated HTTPRoute manifest (must come after the Service):
# httproute.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: service-route
  annotations:
    argocd.argoproj.io/sync-wave: "20"
spec:
  parentRefs:
    - name: traefik-gateway
      namespace: traefik
      sectionName: websecure-service
  hostnames:
    - service.domain.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: service-web
          port: 80
With these annotations, ArgoCD will:
- Apply wave 10 resources (Service, Deployment)
- Wait for them to become healthy
- Apply wave 20 resources (HTTPRoute)
The HTTPRoute creation now happens after the Service exists, so the Gateway controller resolves the backend reference successfully.
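The Deployment manifest is not shown above, but the first step of that sequence assumes it carries the same wave as the Service. A minimal sketch of what that could look like, assuming a container that listens on port 8000 (matching the Service's `targetPort`); the image, container name, and replica count are placeholders, not values from the real application:

```yaml
# deployment.yaml (illustrative sketch; image, container name, and replicas are assumed)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-web
  annotations:
    # Same wave as the Service, so both are healthy before wave 20 starts
    argocd.argoproj.io/sync-wave: "10"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-web
  template:
    metadata:
      labels:
        app: service-web   # must match the Service selector
    spec:
      containers:
        - name: web
          image: registry.example.com/service-web:1.0.0   # placeholder image
          ports:
            - name: http
              containerPort: 8000   # the Service's targetPort
```

The annotation belongs on the Deployment's own `metadata`, not on the pod template, because ArgoCD reads it from the resource it applies.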
Standard Wave Ordering
After hitting this, I settled on a standard wave ordering for all applications:
| Wave | Resources | Rationale |
|---|---|---|
| -10 | Custom Resource Definitions | Must exist before any CRs |
| -5 | Namespaces | Must exist before namespaced resources |
| 0 | Secrets, ConfigMaps, ExternalSecrets | Referenced by pods, need to exist first |
| 0 | ServiceAccounts | Referenced by deployments |
| 5 | PersistentVolumeClaims | Storage must be ready before pods mount it |
| 10 | Deployments, StatefulSets, Services | Core workloads |
| 15 | Jobs | Post-deployment tasks like migrations |
| 20 | HTTPRoutes, Ingresses | Need backend services to exist |
| 25 | NetworkPolicies | Can be applied after workloads are running |
If resource B references resource A, put A in an earlier wave.
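As a rough illustration of the early rows of the table, the following multi-document sketch annotates a Namespace, a ConfigMap, and a PersistentVolumeClaim. The `service-sandbox` and `service-config` names come from the example application; the PVC name, the ConfigMap key, and the storage size are made up for the example:

```yaml
# Illustrative resources annotated per the table (PVC name, data key, and size are assumed)
apiVersion: v1
kind: Namespace
metadata:
  name: service-sandbox
  annotations:
    argocd.argoproj.io/sync-wave: "-5"   # negative waves run before wave 0
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: service-config
  namespace: service-sandbox
  annotations:
    argocd.argoproj.io/sync-wave: "0"    # explicit, although 0 is the default
data:
  LOG_LEVEL: info
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: service-data
  namespace: service-sandbox
  annotations:
    argocd.argoproj.io/sync-wave: "5"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Annotation values must be strings, so the wave numbers are quoted, including the negative one.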
Where to Define Sync Wave Annotations
Option 1: In Base Manifests (Recommended)
Put the annotation directly in the base resource:
# base/httproute.yaml
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "20"
Pros: applies to all environments automatically. Cons: cannot vary ordering per environment.
Option 2: In Overlay Patches
Create an overlay patch that adds the annotation:
# overlays/sandbox/patch-sync-waves.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: service-route
  annotations:
    argocd.argoproj.io/sync-wave: "20"
Pros: can customize per environment. Cons: more files to maintain.
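The patch file only takes effect if the overlay's kustomization references it. A minimal sketch of that wiring, assuming the base lives at `../../base` relative to the overlay (the directory layout is an assumption, not something stated above):

```yaml
# overlays/sandbox/kustomization.yaml (sketch; the ../../base path is assumed)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  # Matched to the base HTTPRoute by apiVersion, kind, and name
  - path: patch-sync-waves.yaml
```

Running `kustomize build overlays/sandbox` should then show the `sync-wave` annotation merged into the rendered HTTPRoute.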
Option 3: In Kustomization commonAnnotations
# overlays/sandbox/kustomization.yaml
commonAnnotations:
  argocd.argoproj.io/sync-wave: "0"
Pros: single place to define. Cons: applies the same wave to everything, defeating the purpose.
Option 1 works best for most cases. Sync wave ordering is typically consistent across environments.
Sync Waves vs Sync Phases
ArgoCD has two ordering mechanisms that are often confused.
Sync Waves order resources within a phase of a sync operation. The lowest wave is applied first, and ArgoCD waits for it to become healthy before moving on: wave 0, then wave 1, and so on.
Sync Phases are broader: PreSync, Sync, PostSync, and SyncFail. These are used for hooks that run at different stages of the sync lifecycle.
For example:
- PreSync phase for database migration jobs
- Sync phase for normal resources (with waves for ordering)
- PostSync phase for notification or cleanup jobs
# Database migration job - runs before main sync
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:latest
          command: ["python", "manage.py", "migrate"]
      restartPolicy: Never
This job runs in the PreSync phase, before any Sync-phase resources are applied regardless of their wave, so the database schema is ready before the application starts.
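The PostSync phase from the list above works the same way, with only the hook annotation changing. A hedged sketch of a notification job, where the job name, image, and webhook URL are all placeholders invented for the example:

```yaml
# Notification job - runs after the Sync phase has completed and resources are healthy
# (sketch; job name, image, and webhook URL are placeholders)
apiVersion: batch/v1
kind: Job
metadata:
  name: notify-deployed
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 1
  template:
    spec:
      containers:
        - name: notify
          image: curlimages/curl:latest
          command: ["curl", "-fsS", "-X", "POST", "https://hooks.example.com/deployed"]
      restartPolicy: Never
```

Hooks can also carry a `sync-wave` annotation if several of them need to run in a particular order within the same phase.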
Gateway API vs Ingress: Validation Differences
Ingress Controllers (typically):
- Accept Ingress resources even if backends are missing
- Continuously reconcile, adding backends when they appear
- Show warnings in logs but do not reject the resource
- Route errors result in 502/503 errors, not admission failures
Gateway API Controllers:
- Validate backend references at creation time
- Reject HTTPRoutes if referenced Services do not exist
- Provide explicit status conditions indicating the problem
- Designed for earlier failure feedback
The Gateway API approach is more correct from a configuration management perspective. Configuration errors surface immediately rather than at runtime. But it requires thinking about resource ordering.
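For comparison, this is roughly what the Ingress counterpart of the HTTPRoute from the fix would look like. It is a sketch rather than a manifest from the real application: the resource name and `ingressClassName` are assumed, and TLS configuration is left out.

```yaml
# ingress.yaml (illustrative counterpart of service-route; name and ingressClassName are assumed)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-ingress
  annotations:
    argocd.argoproj.io/sync-wave: "20"   # same wave the table assigns to HTTPRoutes
spec:
  ingressClassName: traefik
  rules:
    - host: service.domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-web
                port:
                  number: 80
```

The backend reference is the same Service on the same port; what differs is how the controller reacts when that Service is missing at apply time.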
Debugging Sync Wave Issues
When ArgoCD sync is stuck or looping:
- Check application sync status:
  kubectl get application myapp -n argocd -o yaml | grep -A 50 status:
  Look for resources stuck in OutOfSync or Progressing state.
- Check events in the target namespace:
  kubectl get events -n myapp --sort-by='.lastTimestamp'
- Examine HTTPRoute status conditions:
  kubectl get httproute myroute -n myapp -o yaml | grep -A 20 status:
- Verify sync wave annotations are present:
  kubectl get httproute myroute -n myapp -o jsonpath='{.metadata.annotations}'
- If stuck, try a hard refresh:
  kubectl patch application myapp -n argocd --type merge \
    -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
Production Rule
Annotate every Gateway API resource with a sync wave higher than its backend dependencies. The eventual-consistency model that makes Ingress controllers forgiving does not apply to Gateway API. HTTPRoutes fail at creation time if their Services are missing — no graceful degradation, no automatic retry from the controller, just a BackendNotFound condition and an ArgoCD sync loop until someone adds the annotation.
If you are migrating from Ingress to Gateway API, auditing sync ordering is not optional. Patterns that worked with Ingress for years will produce sync loops with Gateway API the first time a dependency is missing at apply time. Put wave annotations in base manifests, not overlays, so the ordering is consistent across environments without per-environment patches.