Karpenter GC Controller Failing With AccessDenied: Missing iam:ListInstanceProfiles
Traced recurring AccessDenied errors in Karpenter's instance profile garbage collection controller to a missing iam:ListInstanceProfiles action and patched the controller IAM policy to fix it.
After resolving a startup panic in Karpenter (covered in a separate post), a second issue surfaced: the controller was running but continuously logging AccessDenied errors on iam:ListInstanceProfiles. Core node provisioning worked fine. The background garbage collection controller was failing on every reconciliation loop.
Environment
| Component | Detail |
|---|---|
| Kubernetes | v1.34 (EKS) |
| Karpenter | v1.8.0 |
| Auth model | IRSA (IAM Roles for Service Accounts) |
| IAM policy type | Customer-managed attached policy |
Step 1 — Observing the Error
I checked the Karpenter logs:
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=10
{
"level": "ERROR",
"time": "2026-03-23T08:39:11Z",
"logger": "controller",
"message": "Reconciler error",
"controller": "instanceprofile.garbagecollection",
"aws-error-code": "AccessDenied",
"aws-operation-name": "ListInstanceProfiles",
"aws-status-code": 403,
"error": "listing instance profiles, operation error IAM: ListInstanceProfiles,
api error AccessDenied: User: arn:aws:sts::<account-id>:assumed-role/karpenter-controller-role/<session>
is not authorized to perform: iam:ListInstanceProfiles on resource:
arn:aws:iam::<account-id>:instance-profile/karpenter/<region>/<cluster-name>/"
}
Consistent, recurring, every reconciliation cycle of instanceprofile.garbagecollection. The IAM role lacked permission to call iam:ListInstanceProfiles.
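To confirm that the failures are confined to one controller and one API call, the JSON log lines can be tallied by controller and AWS operation. This is a minimal sketch, assuming each log line is a single JSON object with the field names shown in the entry above. `SAMPLE_LOGS` is illustrative data and `tally_access_denied` is a hypothetical helper; in practice the text would come from the `kubectl logs` output.

```python
import json
from collections import Counter

# Illustrative sample; field names follow the log entry shown above.
SAMPLE_LOGS = """\
{"level":"ERROR","controller":"instanceprofile.garbagecollection","aws-error-code":"AccessDenied","aws-operation-name":"ListInstanceProfiles","aws-status-code":403}
{"level":"INFO","controller":"nodepool.counter","message":"ok"}
{"level":"ERROR","controller":"instanceprofile.garbagecollection","aws-error-code":"AccessDenied","aws-operation-name":"ListInstanceProfiles","aws-status-code":403}
not-json line that should be skipped
"""


def tally_access_denied(log_text: str) -> Counter:
    """Count AccessDenied errors per (controller, AWS operation)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        if entry.get("aws-error-code") == "AccessDenied":
            key = (entry.get("controller"), entry.get("aws-operation-name"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (controller, operation), n in tally_access_denied(SAMPLE_LOGS).items():
        print(f"{controller} {operation}: {n}")
```

A single key in the result, as in this sample, points at one missing permission rather than a broader credentials problem.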
Step 2 — What the Instance Profile GC Controller Does
Karpenter manages EC2 instance profiles as part of the node provisioning lifecycle. When using EC2NodeClass, Karpenter:
- Creates instance profiles tagged with cluster-specific metadata during node class setup
- Associates roles with those instance profiles
- Garbage collects orphaned instance profiles no longer referenced by any active EC2NodeClass
The GC controller uses iam:ListInstanceProfiles to enumerate existing instance profiles (filtered by Karpenter’s path prefix) and identify which ones to delete. Without it, the controller cannot enumerate profiles and fails every loop.
This does not affect active provisioning (new nodes still come up), but orphaned instance profiles accumulate indefinitely and can eventually hit the default IAM quota of 1,000 instance profiles per account.
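Conceptually, the garbage collection pass is a set difference between the profiles that exist and the node classes still in use. The following is a simplified sketch and not Karpenter's actual implementation: `find_orphaned_profiles`, the shape of the profile dictionaries, and the use of the `karpenter.k8s.aws/ec2nodeclass` tag to identify the owner are all illustrative assumptions, and `list_profiles` merely stands in for the `iam:ListInstanceProfiles` call.

```python
from typing import Callable, Iterable, List, Set

# Tag key taken from the policy conditions on this page; using it to find
# the owning EC2NodeClass is an assumption made for this sketch.
NODECLASS_TAG = "karpenter.k8s.aws/ec2nodeclass"


def find_orphaned_profiles(
    list_profiles: Callable[[], Iterable[dict]],
    active_nodeclasses: Set[str],
) -> List[str]:
    """Return names of profiles whose owning node class is no longer active."""
    orphans = []
    for profile in list_profiles():  # raises if the list call is denied
        owner = profile.get("tags", {}).get(NODECLASS_TAG)
        if owner is not None and owner not in active_nodeclasses:
            orphans.append(profile["name"])
    return orphans


def denied() -> Iterable[dict]:
    """Stand-in for a list call rejected with a 403."""
    raise PermissionError("not authorized to perform: iam:ListInstanceProfiles")


if __name__ == "__main__":
    profiles = [
        {"name": "profile-a", "tags": {NODECLASS_TAG: "default"}},
        {"name": "profile-b", "tags": {NODECLASS_TAG: "retired"}},
    ]
    print(find_orphaned_profiles(lambda: profiles, {"default"}))
    try:
        find_orphaned_profiles(denied, {"default"})
    except PermissionError as exc:
        print(f"reconcile failed: {exc}")
```

The second call shows why the whole loop fails: without the ability to enumerate, there is nothing to compare, so no orphan is ever identified or deleted.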
Step 3 — Inspecting the Existing IAM Policy
I listed the policies attached to the controller role:
aws iam list-attached-role-policies \
--role-name karpenter-controller-role \
--region us-east-1
{
"AttachedPolicies": [
{
"PolicyName": "karpenter-controller-policy",
"PolicyArn": "arn:aws:iam::<account-id>:policy/karpenter-controller-policy"
},
{
"PolicyName": "karpenter-sqs-policy",
"PolicyArn": "arn:aws:iam::<account-id>:policy/karpenter-sqs-policy"
}
]
}
I retrieved the current policy document:
aws iam get-policy-version \
--policy-arn arn:aws:iam::<account-id>:policy/karpenter-controller-policy \
--version-id $(aws iam get-policy \
--policy-arn arn:aws:iam::<account-id>:policy/karpenter-controller-policy \
--query 'Policy.DefaultVersionId' --output text) \
--query 'PolicyVersion.Document' \
--output json
The relevant section:
{
"Sid": "AllowInstanceProfileReadActions",
"Effect": "Allow",
"Action": [
"iam:GetInstanceProfile"
],
"Resource": "*"
}
Only iam:GetInstanceProfile was present. iam:ListInstanceProfiles was missing entirely.
The other instance profile statements were correctly configured:
{
"Sid": "AllowScopedInstanceProfileCreationActions",
"Effect": "Allow",
"Action": ["iam:CreateInstanceProfile"],
"Condition": {
"StringEquals": {
"aws:RequestTag/kubernetes.io/cluster/<cluster-name>": "owned",
"aws:RequestTag/topology.kubernetes.io/region": "<region>"
},
"StringLike": {
"aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
}
},
"Resource": "*"
},
{
"Sid": "AllowScopedInstanceProfileActions",
"Effect": "Allow",
"Action": [
"iam:AddRoleToInstanceProfile",
"iam:RemoveRoleFromInstanceProfile",
"iam:DeleteInstanceProfile"
],
"Condition": {
"StringEquals": {
"aws:ResourceTag/kubernetes.io/cluster/<cluster-name>": "owned",
"aws:ResourceTag/topology.kubernetes.io/region": "<region>"
}
},
"Resource": "*"
}
Create, tag, modify, and delete were all present. List was missing.
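The same gap can be found mechanically by comparing the actions a policy document allows against a list of required actions. This is a rough sketch under simplifying assumptions: it only reads `Allow` statements and their `Action` entries (including `*` wildcards) and ignores `Deny`, `NotAction`, `Resource`, and `Condition`, so it is a quick lint rather than a full policy evaluation. `REQUIRED` is an illustrative list drawn from the statements on this page, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

REQUIRED = [
    "iam:CreateInstanceProfile",
    "iam:AddRoleToInstanceProfile",
    "iam:RemoveRoleFromInstanceProfile",
    "iam:DeleteInstanceProfile",
    "iam:GetInstanceProfile",
    "iam:ListInstanceProfiles",
]


def allowed_patterns(policy: dict) -> list:
    """Collect lower-cased Action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    patterns = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        patterns.extend(a.lower() for a in actions)
    return patterns


def missing_actions(policy: dict, required: list) -> list:
    """Return required actions not matched by any allowed pattern."""
    patterns = allowed_patterns(policy)
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


if __name__ == "__main__":
    old_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["iam:GetInstanceProfile"], "Resource": "*"},
            {"Effect": "Allow", "Action": ["iam:CreateInstanceProfile"], "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": [
                    "iam:AddRoleToInstanceProfile",
                    "iam:RemoveRoleFromInstanceProfile",
                    "iam:DeleteInstanceProfile",
                ],
                "Resource": "*",
            },
        ],
    }
    print(missing_actions(old_policy, REQUIRED))  # ['iam:ListInstanceProfiles']
```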
Step 4 — Why This Permission Was Missing
The Karpenter IAM requirements have evolved across versions. In earlier releases, instance profile garbage collection didn’t exist as a controller, so iam:ListInstanceProfiles was not required. When GC was added, the official documentation and CloudFormation/Terraform templates were updated — but existing deployments not rebuilt from the updated templates retained the older, incomplete policy.
This is the standard IAM drift pattern in long-lived clusters: policies provisioned at initial deployment go stale as the software evolves.
Step 5 — Checking Existing Policy Versions
IAM customer-managed policies support up to 5 versions. I checked the current count before creating a new one:
aws iam list-policy-versions \
--policy-arn arn:aws:iam::<account-id>:policy/karpenter-controller-policy \
--query 'Versions[*].{VersionId:VersionId,IsDefault:IsDefault,CreateDate:CreateDate}' \
--output table
+----------------------------+------------+-------------+
| CreateDate | IsDefault | VersionId |
+----------------------------+------------+-------------+
| 2026-01-17T14:39:44+00:00 | True | v1 |
+----------------------------+------------+-------------+
One version. A clean slot for v2.
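When a policy already holds five versions, a non-default version has to be deleted before a new one can be created. A small sketch of that decision, assuming version records shaped like the `list-policy-versions` output above (`VersionId`, `IsDefault`, `CreateDate`); `version_to_prune` is a hypothetical helper name, and choosing the oldest non-default version is just one reasonable policy.

```python
from datetime import datetime
from typing import List, Optional

MAX_VERSIONS = 5  # version limit for a customer-managed policy


def version_to_prune(versions: List[dict]) -> Optional[str]:
    """Return the oldest non-default VersionId if the policy is full, else None."""
    if len(versions) < MAX_VERSIONS:
        return None  # a free slot exists, nothing to delete
    candidates = [v for v in versions if not v["IsDefault"]]
    oldest = min(candidates, key=lambda v: datetime.fromisoformat(v["CreateDate"]))
    return oldest["VersionId"]


if __name__ == "__main__":
    current = [
        {"VersionId": "v1", "IsDefault": True, "CreateDate": "2026-01-17T14:39:44+00:00"},
    ]
    print(version_to_prune(current))  # None
```

The returned ID, if any, is what would be passed to `aws iam delete-policy-version` before creating the new version.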
Step 6 — Creating the Updated Policy Version
I added iam:ListInstanceProfiles to the AllowInstanceProfileReadActions statement:
{
"Sid": "AllowInstanceProfileReadActions",
"Effect": "Allow",
"Action": [
"iam:GetInstanceProfile",
"iam:ListInstanceProfiles"
],
"Resource": "*"
}
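Editing the document by hand works, but the change can also be scripted so it is repeatable. A minimal sketch, assuming the policy document retrieved in Step 3 has been loaded as a Python dictionary; `add_action` is a hypothetical helper, and its JSON output would be saved to the file passed to `create-policy-version`.

```python
import copy
import json


def add_action(policy: dict, sid: str, action: str) -> dict:
    """Return a copy of the policy with `action` added to the statement `sid`."""
    updated = copy.deepcopy(policy)  # leave the caller's document untouched
    for stmt in updated["Statement"]:
        if stmt.get("Sid") == sid:
            actions = stmt.get("Action", [])
            if isinstance(actions, str):  # Action may be a string or a list
                actions = [actions]
            if action not in actions:
                actions = actions + [action]
            stmt["Action"] = actions
            return updated
    raise KeyError(f"no statement with Sid {sid!r}")


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowInstanceProfileReadActions",
                "Effect": "Allow",
                "Action": ["iam:GetInstanceProfile"],
                "Resource": "*",
            }
        ],
    }
    patched = add_action(
        policy, "AllowInstanceProfileReadActions", "iam:ListInstanceProfiles"
    )
    print(json.dumps(patched, indent=2))
```

Raising on an unknown `Sid` is deliberate: silently appending a new statement would hide a typo in the statement name.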
I created the new version and set it as default in a single command:
aws iam create-policy-version \
--policy-arn arn:aws:iam::<account-id>:policy/karpenter-controller-policy \
--policy-document file:///tmp/karpenter-policy.json \
--set-as-default
I confirmed the active version:
aws iam get-policy \
--policy-arn arn:aws:iam::<account-id>:policy/karpenter-controller-policy \
--query 'Policy.{DefaultVersionId:DefaultVersionId,UpdateDate:UpdateDate}' \
--output table
+-------------------+-----------------------------+
| DefaultVersionId | UpdateDate |
+-------------------+-----------------------------+
| v2 | 2026-03-23T08:44:55+00:00 |
+-------------------+-----------------------------+
Step 7 — Verifying the Fix
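IAM policy changes usually propagate within seconds, although IAM is eventually consistent and a change can occasionally take longer to become visible. After a short wait: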
sleep 15 && kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=10
{"level":"INFO","time":"2026-03-23T08:45:13Z","logger":"controller","message":"unknown field \"status.nodes\"","controller":"nodepool.counter","NodePool":{"name":"default"}}
{"level":"INFO","time":"2026-03-23T08:45:18Z","logger":"controller","message":"unknown field \"status.nodes\"","controller":"nodepool.counter","NodePool":{"name":"production"}}
No AccessDenied errors. The instanceprofile.garbagecollection controller was clean. The remaining unknown field "status.nodes" INFO messages are unrelated and non-critical.
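Because IAM is eventually consistent, a fixed `sleep` can occasionally be too short. A hedged sketch of a polling check instead: `wait_until_clean` is a hypothetical helper, `fetch_logs` is a placeholder for whatever returns recent controller log text (for example, a wrapper around the `kubectl logs` command above, ideally limited with `--since` so that errors from before the policy change are excluded), and the attempt count and delay are arbitrary.

```python
import time
from typing import Callable


def wait_until_clean(
    fetch_logs: Callable[[], str],
    needle: str = "AccessDenied",
    attempts: int = 6,
    delay_seconds: float = 5.0,
) -> bool:
    """Poll until the fetched logs no longer contain `needle`.

    Returns True as soon as one fetch is clean, False if every attempt
    still shows the error.
    """
    for attempt in range(attempts):
        if needle not in fetch_logs():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # give the change time to propagate
    return False


if __name__ == "__main__":
    # Simulated log fetches: two polls still show the error, the third is clean.
    responses = iter(["AccessDenied", "AccessDenied", "reconciled"])
    print(wait_until_clean(lambda: next(responses), delay_seconds=0))
```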
Complete Karpenter IAM Policy for v1.8+
The minimum permissions required for all current controller functionality, with placeholder values:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Karpenter",
"Effect": "Allow",
"Action": [
"ssm:GetParameter",
"ec2:DescribeImages",
"ec2:RunInstances",
"ec2:DescribeSubnets",
"ec2:DescribeSecurityGroups",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeInstances",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInstanceTypeOfferings",
"ec2:DescribeAvailabilityZones",
"ec2:DeleteLaunchTemplate",
"ec2:CreateTags",
"ec2:CreateLaunchTemplate",
"ec2:CreateFleet",
"ec2:DescribeSpotPriceHistory",
"pricing:GetProducts"
],
"Resource": "*"
},
{
"Sid": "ConditionalEC2Termination",
"Effect": "Allow",
"Action": "ec2:TerminateInstances",
"Condition": {
"StringLike": {
"ec2:ResourceTag/karpenter.sh/nodepool": "*"
}
},
"Resource": "*"
},
{
"Sid": "PassNodeIAMRole",
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<account-id>:role/karpenter-node-role"
},
{
"Sid": "EKSClusterEndpointLookup",
"Effect": "Allow",
"Action": "eks:DescribeCluster",
"Resource": "arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>"
},
{
"Sid": "AllowScopedInstanceProfileCreationActions",
"Effect": "Allow",
"Action": ["iam:CreateInstanceProfile"],
"Condition": {
"StringEquals": {
"aws:RequestTag/kubernetes.io/cluster/<cluster-name>": "owned",
"aws:RequestTag/topology.kubernetes.io/region": "<region>"
},
"StringLike": {
"aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
}
},
"Resource": "*"
},
{
"Sid": "AllowScopedInstanceProfileTagActions",
"Effect": "Allow",
"Action": ["iam:TagInstanceProfile"],
"Condition": {
"StringEquals": {
"aws:ResourceTag/kubernetes.io/cluster/<cluster-name>": "owned",
"aws:ResourceTag/topology.kubernetes.io/region": "<region>"
},
"StringLike": {
"aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
}
},
"Resource": "*"
},
{
"Sid": "AllowScopedInstanceProfileActions",
"Effect": "Allow",
"Action": [
"iam:AddRoleToInstanceProfile",
"iam:RemoveRoleFromInstanceProfile",
"iam:DeleteInstanceProfile"
],
"Condition": {
"StringEquals": {
"aws:ResourceTag/kubernetes.io/cluster/<cluster-name>": "owned",
"aws:ResourceTag/topology.kubernetes.io/region": "<region>"
},
"StringLike": {
"aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
}
},
"Resource": "*"
},
{
"Sid": "AllowInstanceProfileReadActions",
"Effect": "Allow",
"Action": [
"iam:GetInstanceProfile",
"iam:ListInstanceProfiles"
],
"Resource": "*"
}
]
}
IAM Drift and the Production Rule
This is the standard failure mode for long-lived clusters: IAM policies go stale as software versions advance. Three things prevent it:
- Read the IAM changelog in release notes before upgrading. Karpenter documents IAM changes in its migration guides. Checking them before an upgrade catches permission gaps before they hit production.
- Manage IAM policies as code. When the policy lives in Terraform or CloudFormation, updating it is a reviewable code change with a paper trail. Ad-hoc console edits are invisible in post-incident analysis.
- Alert on 403s in controller logs. A log-based alert on aws-status-code: 403 in Karpenter logs catches this class of failure within minutes, not during a node-count audit. A metric-based rule on the controller's reconcile errors works as well:
# Example Prometheus alert (VictoriaMetrics compatible)
- alert: KarpenterAccessDenied
  expr: |
    sum(rate(controller_runtime_reconcile_errors_total{
      controller="instanceprofile.garbagecollection"
    }[5m])) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Karpenter GC controller is failing (possible IAM permission issue)"
One IAM mechanic to keep straight: iam:GetInstanceProfile and iam:ListInstanceProfiles are separate actions. Get reads a single resource by name. List enumerates a collection. Neither implies the other. A policy that grants only Get cannot list — which is exactly what happened here.
GC controller failures are silent from a workload perspective: nodes still provision, pods still schedule. The impact (orphaned instance profiles accumulating toward the 1,000-per-account limit) is long-term, which is why it goes undetected without proactive monitoring.
Commands Reference
# List policies attached to an IAM role
aws iam list-attached-role-policies --role-name <role-name>
# Get current default policy version
aws iam get-policy --policy-arn <arn> --query 'Policy.DefaultVersionId'
# Retrieve policy document
aws iam get-policy-version --policy-arn <arn> --version-id <version-id> \
--query 'PolicyVersion.Document' --output json
# List policy versions
aws iam list-policy-versions --policy-arn <arn>
# Create new policy version and set as default
aws iam create-policy-version \
--policy-arn <arn> \
--policy-document file://policy.json \
--set-as-default
# Verify active version
aws iam get-policy --policy-arn <arn> \
--query 'Policy.DefaultVersionId' --output text
# Delete an old policy version (to free up version slots)
aws iam delete-policy-version --policy-arn <arn> --version-id <version-id>