RabbitMQ Cluster Operator: The Secret Format Nobody Documents
Traced a RabbitMQ init container mount failure to undocumented secret key requirements and resolved it with External Secrets Operator templating.
The Problem
RabbitMQ was running in a Kubernetes cluster using the RabbitMQ Cluster Operator. Everything worked fine until a pattern emerged: every time the RabbitMQ pod restarted, connected applications would lose their connections. Not because RabbitMQ was down, but because the password had changed.
The RabbitMQ Cluster Operator, by default, auto-generates credentials and stores them in a Kubernetes Secret. This is convenient for getting started, but problematic when credentials must be stable, predictable, and managed by an enterprise secret management system.
I was using AWS Secrets Manager with the External Secrets Operator for all other services. The natural solution was to configure RabbitMQ to use external secrets too. The documentation mentioned secretBackend.externalSecret but was light on details about the exact format required.
After adding the configuration, the RabbitMQ pod went into an endless init loop:
MountVolume.SetUp failed for volume "rabbitmq-confd" : references non-existent secret key: default_user.conf
The Investigation
The error pointed to a missing default_user.conf key in the secret. But the external secret only had the keys that seemed logical:
spec:
  data:
    - secretKey: username
      remoteRef:
        key: project-shared/rabbitmq
        property: RABBITMQ_USERNAME
    - secretKey: password
      remoteRef:
        key: project-shared/rabbitmq
        property: RABBITMQ_PASSWORD
    - secretKey: erlang-cookie
      remoteRef:
        key: project-shared/rabbitmq
        property: ERLANG_COOKIE
Describing the pod revealed the init container was trying to mount several volumes from the secret:
kubectl describe pod rabbitmq-server-0 -n queue
Init Containers:
  setup-container:
    Command:
      sh
      -c
      cp /tmp/erlang-cookie-secret/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie && chmod 600 /var/lib/rabbitmq/.erlang.cookie ;
      cp /tmp/rabbitmq-plugins/enabled_plugins /operator/enabled_plugins ;
      echo '[default]' > /var/lib/rabbitmq/.rabbitmqadmin.conf &&
      sed -e 's/default_user/username/' -e 's/default_pass/password/' /tmp/default_user.conf >> /var/lib/rabbitmq/.rabbitmqadmin.conf
    Mounts:
      /tmp/default_user.conf from rabbitmq-confd (rw,path="default_user.conf")
      /tmp/erlang-cookie-secret/ from erlang-cookie-secret (rw)
The operator expected four specific keys in the secret:
- username: the RabbitMQ admin username
- password: the RabbitMQ admin password
- .erlang.cookie: note the leading dot
- default_user.conf: a config file with a specific format
The secret had erlang-cookie (without the dot) and was completely missing default_user.conf.
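The mismatch can be checked mechanically before a pod ever tries to mount the secret. The following is a minimal sketch, assuming the four key names listed above are the full contract; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced here for illustration, not part of the operator or of kubectl.

```shell
# Keys the init container was found to need (taken from the list above).
REQUIRED_KEYS='username
password
.erlang.cookie
default_user.conf'

# missing_keys: read the key names a Secret actually has (one per line)
# on stdin and print every required key that is absent.
missing_keys() {
  present=$(cat)
  printf '%s\n' "$REQUIRED_KEYS" | while IFS= read -r key; do
    # -x matches whole lines only, -F treats the dots literally.
    printf '%s\n' "$present" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done
}

# Demo with the key names from the first ExternalSecret attempt.
printf 'username\npassword\nerlang-cookie\n' | missing_keys

# Against a live cluster, the key names could be fed in with something like:
#   kubectl get secret rabbitmq-secrets -n queue \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' | missing_keys
```

For the sample input, the function reports .erlang.cookie and default_user.conf as missing, which matches what the mount error was pointing at.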
The Root Cause
The RabbitMQ Cluster Operator was designed with auto-generated secrets in mind. When it creates its own secret, it generates all four keys in the exact format the init container expects. The default_user.conf file contains:
default_user = rabbitmq_admin
default_pass = auto_generated_password_here
When switching to secretBackend.externalSecret, the operator no longer generates this secret. It expects a secret that matches the exact format it would have generated. The documentation does not clearly specify this requirement.
The init container script reads from /tmp/default_user.conf, which is mounted from the secret. If that key does not exist, the mount fails and the pod cannot start. Similarly, the erlang cookie must be named .erlang.cookie with the leading dot because that is the filename RabbitMQ expects in the data directory.
This is a classic case of an operator assuming its own conventions without documenting them for external integration.
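To see why the file format matters, the init container's sed step can be replayed locally. This sketch reuses the two sed expressions from the pod description; the function name `render_rabbitmqadmin_conf` and the sample password are placeholders made up for the example.

```shell
# render_rabbitmqadmin_conf: read default_user.conf content on stdin and
# print what the init container would write to .rabbitmqadmin.conf.
render_rabbitmqadmin_conf() {
  echo '[default]'
  # Same substitutions as in the init container command.
  sed -e 's/default_user/username/' -e 's/default_pass/password/'
}

# Sample input in the format the operator generates (placeholder values).
printf 'default_user = rabbitmq_admin\ndefault_pass = example-password\n' |
  render_rabbitmqadmin_conf
```

The output is a [default] section with username and password lines, so a default_user.conf that lacks the default_user and default_pass lines would produce a .rabbitmqadmin.conf without credentials even if the mount itself succeeded.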
The Solution
The fix required using External Secrets Operator templating to construct the secret in the exact format RabbitMQ expects:
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: rabbitmq-secrets
  namespace: queue
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: rabbitmq-secrets
    creationPolicy: Owner
    template:
      engineVersion: v2
      data:
        username: "{{ .username }}"
        password: "{{ .password }}"
        .erlang.cookie: "{{ .erlangcookie }}"
        default_user.conf: |
          default_user = {{ .username }}
          default_pass = {{ .password }}
  data:
    - secretKey: username
      remoteRef:
        key: project-shared/rabbitmq
        property: RABBITMQ_USERNAME
    - secretKey: password
      remoteRef:
        key: project-shared/rabbitmq
        property: RABBITMQ_PASSWORD
    - secretKey: erlangcookie
      remoteRef:
        key: project-shared/rabbitmq
        property: ERLANG_COOKIE
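The data section uses erlangcookie (no dot, no hyphen) as the secretKey because ESO templates are Go templates, where the dot is the field-access operator. A key named .erlang.cookie would be read as a nested path, and a hyphenated name such as erlang-cookie cannot be written as a plain {{ .name }} reference either. The dotted name appears only as an output key in the template section, where it is an ordinary YAML map key. The sync-wave: "-1" annotation tells Argo CD to apply the ExternalSecret before the RabbitMQ cluster, so the secret exists by the time the pod starts.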
Verifying the Fix
After applying the updated external secret:
kubectl describe secret rabbitmq-secrets -n queue
Name: rabbitmq-secrets
Namespace: queue
Data
====
.erlang.cookie: 64 bytes
default_user.conf: 96 bytes
password: 32 bytes
username: 32 bytes
All four keys present. The RabbitMQ pod started successfully:
kubectl exec -n queue rabbitmq-server-0 -- rabbitmqctl status | head -10
Status of node rabbit@rabbitmq-server-0.rabbitmq-nodes.queue ...
Runtime
OS PID: 1
OS: Linux
Uptime (seconds): 108
Is under maintenance?: false
RabbitMQ version: 3.13.7
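The byte counts confirm that the keys exist, but not that default_user.conf has the expected contents. A minimal sketch of a content check follows, assuming the two-line format shown earlier; `check_default_user_conf` is a hypothetical helper name, not a tool that ships with RabbitMQ or ESO.

```shell
# check_default_user_conf: read a decoded default_user.conf on stdin and
# verify it has non-empty default_user and default_pass lines.
check_default_user_conf() {
  conf=$(cat)
  printf '%s\n' "$conf" | grep -q '^default_user = .' ||
    { echo 'missing default_user line'; return 1; }
  printf '%s\n' "$conf" | grep -q '^default_pass = .' ||
    { echo 'missing default_pass line'; return 1; }
  echo 'default_user.conf looks well-formed'
}

# Demo with placeholder values.
printf 'default_user = rabbitmq_admin\ndefault_pass = example-password\n' |
  check_default_user_conf

# Against a live cluster, the decoded file could be piped in with something like:
#   kubectl get secret rabbitmq-secrets -n queue \
#     -o jsonpath='{.data.default_user\.conf}' | base64 --decode | check_default_user_conf
```

The backslash in the jsonpath expression escapes the dot inside the key name, for the same reason the template needed a dot-free variable.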
Cleaning Up Old Secrets
When transitioning from auto-generated to external secrets, delete the old secrets the operator created:
kubectl delete secret rabbitmq-default-user rabbitmq-erlang-cookie -n queue
Then delete the RabbitMQ pod to force it to restart with the new credentials:
kubectl delete pod rabbitmq-server-0 -n queue
The RabbitMQ Cluster Configuration
The RabbitMQ cluster configuration that uses this secret:
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq
  namespace: queue
spec:
  replicas: 1
  image: rabbitmq:3.13-management
  secretBackend:
    externalSecret:
      name: rabbitmq-secrets
secretBackend.externalSecret.name must match the Kubernetes Secret name that the ExternalSecret creates, not the ExternalSecret resource name itself. In this case both happen to be rabbitmq-secrets, but they could differ if target.name is set differently.
Pattern: Operator Secret Format Discovery
This applies to any Kubernetes operator, not just RabbitMQ. When integrating any operator with external secret management:
Step 1: Deploy with auto-generated secrets. Let the operator create its own secrets first, then list them:
kubectl get secret -n queue -l app.kubernetes.io/component=rabbitmq
Step 2: Examine the secret structure.
kubectl get secret rabbitmq-default-user -n queue -o yaml
Note every key, including:
- Key names: exact spelling, case, special characters like leading dots
- Value formats: plain text, base64, structured data (values under data in the -o yaml output are always base64-encoded, so decode them before comparing)
- Any files with specific formats
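Because kubectl prints Secret values base64-encoded, the real formats only become visible after decoding. The sketch below decodes lines of the form "key: value"; `decode_data_lines` is a hypothetical helper, and the sample values are placeholders encoded on the fly rather than real credentials.

```shell
# decode_data_lines: read "key: <base64>" lines on stdin and print
# "key = <decoded value>" for each line that has a value.
decode_data_lines() {
  while IFS=': ' read -r key value; do
    [ -n "$value" ] || continue
    printf '%s = %s\n' "$key" "$(printf '%s' "$value" | base64 --decode)"
  done
}

# Demo: encode two placeholder values, then decode them again.
printf 'username: %s\npassword: %s\n' \
  "$(printf 'rabbitmq_admin' | base64)" \
  "$(printf 'example-password' | base64)" | decode_data_lines

# Against a live cluster, clean "key: value" lines could come from something like:
#   kubectl get secret rabbitmq-default-user -n queue \
#     -o go-template='{{range $k, $v := .data}}{{$k}}: {{$v}}{{"\n"}}{{end}}' | decode_data_lines
```

Decoded multi-line values such as default_user.conf print across several lines, which makes the expected file format easy to copy into a template.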
Step 3: Replicate the structure in External Secrets. Use ESO templating to construct the exact same structure from centralized secrets.
Step 4: Test in non-production first. Secret format errors often result in pods that cannot start at all.
Additional Considerations
Erlang Cookie Persistence
The Erlang cookie is used for inter-node communication in RabbitMQ clusters. It must:
- Be an alphanumeric string of at most 255 characters (the command below produces 64)
- Remain constant across pod restarts
- Be identical across all nodes in a cluster
Generating a random cookie:
openssl rand -hex 32
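A generated cookie can be sanity-checked before it is stored in Secrets Manager. This is a sketch under the assumption that a cookie should be a non-empty alphanumeric string of no more than 255 characters, as the RabbitMQ clustering guide describes; `valid_cookie` is a hypothetical helper name.

```shell
# valid_cookie: succeed if the argument is non-empty, purely alphanumeric,
# and at most 255 characters long.
valid_cookie() {
  cookie=$1
  [ -n "$cookie" ] || return 1
  [ "${#cookie}" -le 255 ] || return 1
  case $cookie in
    *[!A-Za-z0-9]*) return 1 ;;  # any other character disqualifies it
  esac
  return 0
}

# 32 random bytes rendered as hex give a 64-character cookie.
cookie=$(openssl rand -hex 32)
valid_cookie "$cookie" && echo "cookie ok: ${#cookie} characters"
```

Hex output is a convenient choice because it contains only the characters 0-9 and a-f, so no escaping is needed when the value passes through JSON, YAML, and the template.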
Secret Rotation
When rotating RabbitMQ credentials:
- Update the secret in AWS Secrets Manager
- Wait for ESO to refresh (based on refreshInterval)
- Restart the RabbitMQ pods to pick up new credentials
- Update all connected applications
Consider using a longer refreshInterval (e.g., 24h) for credentials that rarely change to reduce API calls to Secrets Manager.
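The middle two rotation steps can be scripted. The sketch below is a dry run by default and only prints the commands. It assumes the StatefulSet is named rabbitmq-server (inferred from the pod name rabbitmq-server-0) and uses the force-sync annotation that the ESO documentation describes for triggering an immediate refresh. `run` and `rotate_rabbitmq` are hypothetical helper names, and the sketch does not verify that the running broker adopts a changed password.

```shell
# run: print the command when DRY_RUN is 1 (the default here), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# rotate_rabbitmq: arguments are namespace, ExternalSecret name, StatefulSet name.
rotate_rabbitmq() {
  ns=$1 es=$2 sts=$3
  # Ask ESO to re-sync now instead of waiting for refreshInterval.
  run kubectl annotate externalsecret "$es" -n "$ns" "force-sync=$(date +%s)" --overwrite
  # Restart the pods so the init container copies the refreshed files.
  run kubectl rollout restart "statefulset/$sts" -n "$ns"
  run kubectl rollout status "statefulset/$sts" -n "$ns"
}

# Dry run with the names used in this article.
rotate_rabbitmq queue rabbitmq-secrets rabbitmq-server
```

Setting DRY_RUN=0 executes the commands instead of printing them. Updating the value in AWS Secrets Manager and updating the connected applications remain separate steps.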
Production Rule
When integrating any Kubernetes operator with external secret management: inspect the operator-generated secret before writing the ExternalSecret. Key names are the contract. If the init container expects .erlang.cookie, the secret must have exactly that key — not erlang-cookie, not erlangcookie. ESO’s template engine exists precisely to bridge the gap between what secret managers store and what operators consume. The 30 seconds spent running kubectl get secret -o yaml on the auto-generated secret saves hours of init container debugging later.