Debug #kubernetes #rabbitmq #external-secrets #operators #secrets-management

RabbitMQ Cluster Operator: The Secret Format Nobody Documents

Traced a RabbitMQ init container mount failure to undocumented secret key requirements and resolved it with External Secrets Operator templating.

Gideon Warui
The Problem

RabbitMQ was running in a Kubernetes cluster using the RabbitMQ Cluster Operator. Everything worked fine until a pattern emerged: every time the RabbitMQ pod restarted, connected applications would lose their connections. Not because RabbitMQ was down, but because the password had changed.

By default, the RabbitMQ Cluster Operator auto-generates credentials and stores them in a Kubernetes Secret. This is convenient for getting started, but it becomes a problem when you need stable, predictable credentials that integrate with an enterprise secret management system.

I was using AWS Secrets Manager with the External Secrets Operator for all other services. The natural solution was to configure RabbitMQ to use external secrets too. The documentation mentioned secretBackend.externalSecret but was light on details about the exact format required.

After adding the configuration, the RabbitMQ pod went into an endless init loop:

MountVolume.SetUp failed for volume "rabbitmq-confd" : references non-existent secret key: default_user.conf

The Investigation

The error pointed to a missing default_user.conf key in the secret. But the external secret only had the keys that seemed logical:

spec:
  data:
    - secretKey: username
      remoteRef:
        key: project-shared/rabbitmq
        property: RABBITMQ_USERNAME
    - secretKey: password
      remoteRef:
        key: project-shared/rabbitmq
        property: RABBITMQ_PASSWORD
    - secretKey: erlang-cookie
      remoteRef:
        key: project-shared/rabbitmq
        property: ERLANG_COOKIE

Describing the pod revealed the init container was trying to mount several volumes from the secret:

kubectl describe pod rabbitmq-server-0 -n queue
Init Containers:
  setup-container:
    Command:
      sh
      -c
      cp /tmp/erlang-cookie-secret/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie && chmod 600 /var/lib/rabbitmq/.erlang.cookie ;
      cp /tmp/rabbitmq-plugins/enabled_plugins /operator/enabled_plugins ;
      echo '[default]' > /var/lib/rabbitmq/.rabbitmqadmin.conf &&
      sed -e 's/default_user/username/' -e 's/default_pass/password/' /tmp/default_user.conf >> /var/lib/rabbitmq/.rabbitmqadmin.conf
    Mounts:
      /tmp/default_user.conf from rabbitmq-confd (rw,path="default_user.conf")
      /tmp/erlang-cookie-secret/ from erlang-cookie-secret (rw)

The operator expected four specific keys in the secret:

  1. username - the RabbitMQ admin username
  2. password - the RabbitMQ admin password
  3. .erlang.cookie - note the leading dot
  4. default_user.conf - a config file with specific format

The secret had erlang-cookie (without the dot) and was completely missing default_user.conf.
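The gap between the two key sets is easy to miss by eye. As a minimal sketch, the comparison can be written as a set difference; `REQUIRED_KEYS` and `missing_keys` are hypothetical names used only for illustration, and the required set is taken from the list above rather than from the operator's source.

```python
# Keys the init container and operator expect, per the list above.
REQUIRED_KEYS = {"username", "password", ".erlang.cookie", "default_user.conf"}


def missing_keys(present_keys):
    """Return the required keys absent from a secret, sorted for stable output."""
    return sorted(REQUIRED_KEYS - set(present_keys))


# Keys produced by the first ExternalSecret attempt.
first_attempt = ["username", "password", "erlang-cookie"]
print(missing_keys(first_attempt))
```

Run against the first attempt, this reports both the missing config file and the cookie key whose name did not match.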

The Root Cause

The RabbitMQ Cluster Operator was designed with auto-generated secrets in mind. When it creates its own secret, it generates all four keys in the exact format the init container expects. The default_user.conf file contains:

default_user = rabbitmq_admin
default_pass = auto_generated_password_here

When switching to secretBackend.externalSecret, the operator no longer generates this secret. It expects a secret that matches the exact format it would have generated. The documentation does not clearly specify this requirement.

The init container script reads from /tmp/default_user.conf, which is mounted from the secret. If that key does not exist, the mount fails and the pod cannot start. Similarly, the erlang cookie must be named .erlang.cookie with the leading dot because that is the filename RabbitMQ expects in the data directory.
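To make the file format concrete, here is a rough Python sketch of what the init container's sed call does with default_user.conf. It is a model of the shell command shown in the pod description, not the operator's code, and the function names and the example password are made up for illustration.

```python
def render_default_user_conf(username, password):
    """Build the two-line default_user.conf body in the format shown above."""
    return f"default_user = {username}\ndefault_pass = {password}\n"


def to_rabbitmqadmin_conf(default_user_conf):
    """Mimic the init container's sed call: rename the first match on each line."""
    lines = ["[default]"]
    for line in default_user_conf.splitlines():
        line = line.replace("default_user", "username", 1)
        line = line.replace("default_pass", "password", 1)
        lines.append(line)
    return "\n".join(lines) + "\n"


conf = render_default_user_conf("rabbitmq_admin", "example-password")
print(to_rabbitmqadmin_conf(conf))
```

The point of the sketch is that the init container consumes the file as text, so the key = value layout matters as much as the key name.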

This is a classic case of an operator assuming its own conventions without documenting them for external integration.

The Solution

The fix required using External Secrets Operator templating to construct the secret in the exact format RabbitMQ expects:

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: rabbitmq-secrets
  namespace: queue
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: rabbitmq-secrets
    creationPolicy: Owner
    template:
      engineVersion: v2
      data:
        username: "{{ .username }}"
        password: "{{ .password }}"
        .erlang.cookie: "{{ .erlangcookie }}"
        default_user.conf: |
          default_user = {{ .username }}
          default_pass = {{ .password }}
  data:
    - secretKey: username
      remoteRef:
        key: project-shared/rabbitmq
        property: RABBITMQ_USERNAME
    - secretKey: password
      remoteRef:
        key: project-shared/rabbitmq
        property: RABBITMQ_PASSWORD
    - secretKey: erlangcookie
      remoteRef:
        key: project-shared/rabbitmq
        property: ERLANG_COOKIE

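The data section uses erlangcookie (no dot, no hyphen) as the secret key name because ESO templates are Go templates, where a dot means field access: {{ .erlang.cookie }} would be read as a lookup of cookie nested inside erlang rather than as a single key, and a hyphen is not valid in that dotted form either. The dot is added back in the template section, where .erlang.cookie is an ordinary key of the generated secret. The argocd.argoproj.io/sync-wave: "-1" annotation tells Argo CD to sync the ExternalSecret before resources in the default wave, so the secret exists before the RabbitMQ cluster starts.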
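The dotted-path behaviour can be illustrated with a toy resolver. This is only a model of how a dot descends one level at a time; it is not ESO's template engine, and `resolve` and the sample values are hypothetical.

```python
def resolve(path, data):
    """Toy model of dotted lookup: each dot descends one level into the data."""
    value = data
    for part in path.lstrip(".").split("."):
        value = value[part]  # KeyError if that level does not exist
    return value


fetched = {"username": "admin", "erlangcookie": "cookie-value"}

print(resolve(".erlangcookie", fetched))  # flat key resolves

try:
    resolve(".erlang.cookie", fetched)  # read as fetched["erlang"]["cookie"]
except KeyError as missing:
    print("lookup failed at", missing)
```

A flat name like erlangcookie resolves in one step, while the dotted form fails at the first segment because no erlang entry exists.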

Verifying the Fix

After applying the updated external secret:

kubectl describe secret rabbitmq-secrets -n queue
Name:         rabbitmq-secrets
Namespace:    queue

Data
====
.erlang.cookie:     64 bytes
default_user.conf:  96 bytes
password:           32 bytes
username:           32 bytes

All four keys present. The RabbitMQ pod started successfully:

kubectl exec -n queue rabbitmq-server-0 -- rabbitmqctl status | head -10
Status of node rabbit@rabbitmq-server-0.rabbitmq-nodes.queue ...
Runtime

OS PID: 1
OS: Linux
Uptime (seconds): 108
Is under maintenance?: false
RabbitMQ version: 3.13.7
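A running pod shows the keys were mountable, but not that default_user.conf agrees with the username and password keys. Here is a minimal sketch of that check, assuming the secret values have already been base64-decoded into a plain dict. `parse_conf` and `credentials_consistent` are hypothetical helpers, not part of any tool.

```python
def parse_conf(text):
    """Parse 'key = value' lines into a dict, skipping lines without '='."""
    pairs = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs


def credentials_consistent(secret):
    """True when default_user.conf agrees with the username and password keys."""
    conf = parse_conf(secret["default_user.conf"])
    return (conf.get("default_user") == secret["username"]
            and conf.get("default_pass") == secret["password"])


decoded = {
    "username": "rabbitmq_admin",
    "password": "example-password",
    "default_user.conf": "default_user = rabbitmq_admin\n"
                         "default_pass = example-password\n",
}
print(credentials_consistent(decoded))
```

Because the ESO template fills all three keys from the same fetched values, a mismatch here would usually point to a hand-edited secret or a stale template.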

Cleaning Up Old Secrets

When transitioning from auto-generated to external secrets, delete the old secrets the operator created:

kubectl delete secret rabbitmq-default-user rabbitmq-erlang-cookie -n queue

Then delete the RabbitMQ pod to force it to restart with the new credentials:

kubectl delete pod rabbitmq-server-0 -n queue

The RabbitMQ Cluster Configuration

The RabbitMQ cluster configuration that uses this secret:

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq
  namespace: queue
spec:
  replicas: 1
  image: rabbitmq:3.13-management
  secretBackend:
    externalSecret:
      name: rabbitmq-secrets

secretBackend.externalSecret.name must match the Kubernetes Secret name that the ExternalSecret creates, not the ExternalSecret resource name itself. In this case both happen to be rabbitmq-secrets, but they could differ if target.name is set differently.
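The name relationship can be checked mechanically once both manifests are loaded as dicts. The sketch below assumes, as I understand ESO's default, that the Secret takes the ExternalSecret's metadata.name when target.name is omitted. The helper names are hypothetical.

```python
def target_secret_name(external_secret):
    """Name of the Secret an ExternalSecret writes: spec.target.name,
    falling back to metadata.name when target.name is omitted."""
    target = external_secret.get("spec", {}).get("target", {})
    return target.get("name") or external_secret["metadata"]["name"]


def names_match(rabbitmq_cluster, external_secret):
    """Check the cluster's externalSecret.name against the produced Secret."""
    wanted = rabbitmq_cluster["spec"]["secretBackend"]["externalSecret"]["name"]
    return wanted == target_secret_name(external_secret)


cluster = {"spec": {"secretBackend": {"externalSecret": {"name": "rabbitmq-secrets"}}}}
ext = {"metadata": {"name": "rabbitmq-eso"},
       "spec": {"target": {"name": "rabbitmq-secrets"}}}
print(names_match(cluster, ext))
```

In the example the ExternalSecret resource is deliberately named differently from its target, to show that only target.name has to line up.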

Pattern: Operator Secret Format Discovery

This applies to any Kubernetes operator that manages its own secrets, not just RabbitMQ. The steps for integrating one with external secret management:

Step 1: Deploy with auto-generated secrets. Let the operator create its own secrets first, then list them:

kubectl get secret -n queue -l app.kubernetes.io/component=rabbitmq

Step 2: Examine the secret structure.

kubectl get secret rabbitmq-default-user -n queue -o yaml

Note every key, including:

  • Key names — exact spelling, case, special characters like leading dots
  • Value formats — plain text, base64, structured data
  • Any files with specific formats

Step 3: Replicate the structure in External Secrets. Use ESO templating to construct the exact same structure from centralized secrets.

Step 4: Test in non-production first. Secret format errors often result in pods that cannot start at all.
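For Step 2, keep in mind that the values in -o yaml or -o json output are base64-encoded, so sizes and formats are only visible after decoding. Here is a small sketch that turns the JSON form of a Secret into a key-to-size inventory. `inventory` is a hypothetical helper, and the sample secret is built inline rather than taken from a real cluster.

```python
import base64
import json


def inventory(secret_json):
    """Map each key in a Secret's .data to the byte length of its decoded value."""
    data = json.loads(secret_json).get("data", {})
    return {key: len(base64.b64decode(value)) for key, value in data.items()}


# Stand-in for the output of: kubectl get secret <name> -o json
sample = json.dumps({"data": {
    "username": base64.b64encode(b"admin").decode(),
    ".erlang.cookie": base64.b64encode(b"c" * 64).decode(),
}})

for key, size in sorted(inventory(sample).items()):
    print(f"{key}: {size} bytes")
```

Comparing this inventory for the operator-generated secret and for the ESO-generated one gives a quick diff of key names and value sizes.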

Additional Considerations

The Erlang cookie is used for inter-node communication in RabbitMQ clusters. It must:

  • Be a string of alphanumeric characters, at most 255 long (this setup uses 64 hex characters)
  • Remain constant across pod restarts
  • Be identical across all nodes in a cluster

Generating a random cookie:

openssl rand -hex 32
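The openssl command above prints 32 random bytes as 64 hex characters. Where openssl is not available, the same shape of value can be produced with Python's standard library. This is a sketch, and `generate_cookie` is a hypothetical name.

```python
import secrets


def generate_cookie(num_bytes=32):
    """Return 2 * num_bytes hex characters, like `openssl rand -hex num_bytes`."""
    return secrets.token_hex(num_bytes)


cookie = generate_cookie()
print(len(cookie))
```

Generate the value once and store it in the secret manager. Regenerating it per deployment would break the requirement that it stay constant.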

Secret Rotation

When rotating RabbitMQ credentials:

  1. Update the secret in AWS Secrets Manager
  2. Wait for ESO to refresh (based on refreshInterval)
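  3. Restart the RabbitMQ pods to pick up new credentials. RabbitMQ reads default_user and default_pass only when a node initializes an empty database, so on a cluster with persistent data the existing user's password may also need changing with rabbitmqctl change_password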
  4. Update all connected applications

Consider using a longer refreshInterval (e.g., 24h) for credentials that rarely change to reduce API calls to Secrets Manager.
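The trade-off is simple arithmetic: a longer interval means fewer refresh cycles but a longer wait before a rotated value shows up. The sketch below counts refresh cycles only. How many Secrets Manager calls each cycle makes depends on the provider and the number of remote references, which is not modelled here. The parser handles single-unit values such as 1h or 30m only, and the function names are hypothetical.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def interval_seconds(interval):
    """Parse a single-unit interval such as '1h' or '30m' (not compound values)."""
    return int(interval[:-1]) * UNIT_SECONDS[interval[-1]]


def refreshes_per_day(interval):
    """How many times one ExternalSecret is refreshed in 24 hours."""
    return 86400 / interval_seconds(interval)


for interval in ("1h", "24h"):
    print(interval, refreshes_per_day(interval))
```

The worst-case delay before ESO picks up a rotated value equals the interval itself, so 24h suits credentials that are rotated on a planned schedule.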

Production Rule

When integrating any Kubernetes operator with external secret management: inspect the operator-generated secret before writing the ExternalSecret. Key names are the contract. If the init container expects .erlang.cookie, the secret must have exactly that key — not erlang-cookie, not erlangcookie. ESO’s template engine exists precisely to bridge the gap between what secret managers store and what operators consume. The 30 seconds spent running kubectl get secret -o yaml on the auto-generated secret saves hours of init container debugging later.
