AKS Was Running But the Site Was Unreachable: an NSG Story
The cluster was healthy and the pods were running, but requests from outside the corporate network timed out. An NSG rule allowed only two source CIDRs. The fix was a Terraform boolean toggle.
Two weeks after the initial AKS deployment, the site was inaccessible from outside the corporate network. From <internal-ip> (on-prem) or <internal-cidr> (internal VNET), everything worked. From a mobile phone or a home network, the connection timed out at the TCP layer. No HTTP response, no TLS handshake — just silence.
Environment
| Component | Detail |
|---|---|
| Cluster | AKS (<cluster>), West Europe |
| Load Balancer IP | <internal-ip> |
| NSG | Applied to the AKS node subnet |
| Ingress | NGINX Ingress Controller |
Diagnosing from the outside
curl -v --connect-timeout 10 http://<internal-ip>/
# * Trying <internal-ip>:80...
# * Connection timed out after 10001 milliseconds
# curl: (28) Connection timed out after 10001 milliseconds
Not a 403, not a 404, not a TLS error — a timeout at the TCP connection stage. That ruled out application-layer problems and pointed to the network boundary: either the Azure Load Balancer wasn’t forwarding the traffic, or something was dropping it after the load balancer.
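AKS load balancers in Azure forward traffic to the node subnet, and the NSG on that subnet controls what reaches the nodes. Azure Load Balancer is pass-through and does not rewrite the client source address on inbound flows, so the NSG evaluates its rules against the original client IP.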
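The reasoning above (timeout versus refusal versus TLS or HTTP error) can be sketched as a small bash helper that maps curl's exit code to the layer that most likely failed. The exit codes are the ones documented in the curl man page; the function names and the layer labels are illustrative assumptions, not part of the original troubleshooting session.

```shell
#!/usr/bin/env bash
# Map a curl exit code to the layer that most likely failed.
# The labels are this sketch's own shorthand, not curl output.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok: got an HTTP response (any status, including 403/404)" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "tcp: connection refused or host unreachable" ;;
    28) echo "timeout: no answer in time (check NSG / firewall)" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "tls: certificate verification failed" ;;
    *)  echo "other: curl exit code $1" ;;
  esac
}

# Hypothetical helper: probe a URL and print the classification.
probe() {
  local url="$1"
  curl -sS -o /dev/null --connect-timeout 10 "$url" 2>/dev/null
  classify_curl_exit "$?"
}
```

Calling probe with the load balancer URL from an external network would have printed the timeout line in this incident, which is the signal to look at packet filtering rather than at the ingress controller or the application.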
The NSG rule
In Terraform, the relevant rule:
resource "azurerm_network_security_rule" "aks_allow_http_https" {
name = "Allow-HTTP-HTTPS"
priority = 110
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_ranges = ["80", "443"]
source_address_prefixes = [
"<internal-cidr>", # Corporate VNET
"<internal-ip>" # On-prem router
]
destination_address_prefix = "*"
}
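Both source prefixes are internal. Traffic from any other address matched no Allow rule and fell through to the NSG's default DenyAllInBound rule, which drops packets silently. That is why clients saw a timeout rather than a refusal.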
The security intent was correct: the site should eventually be restricted to known IP ranges. But at this stage of testing, with external stakeholders needing access, the restriction was premature.
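To make the source-prefix matching concrete, here is a minimal bash sketch that checks whether a client address falls inside any prefix on an allow list. It only mimics matching against literal IPv4 prefixes (not service tags such as Internet), and every address in it is a hypothetical stand-in from documentation ranges, since the real prefixes are redacted in this post.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 falls inside prefix $2, else 1.
# A prefix without "/len" is treated as a single host (/32).
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits=32
  if [[ "$2" == */* ]]; then bits="${2#*/}"; fi
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Return 0 if $1 matches any of the remaining prefix arguments.
allowed_by_rule() {
  local ip="$1" cidr
  shift
  for cidr in "$@"; do
    ip_in_cidr "$ip" "$cidr" && return 0
  done
  return 1
}

# Hypothetical values: 10.0.0.0/16 stands in for the corporate VNET,
# 192.0.2.10 for the on-prem router, 203.0.113.25 for a mobile client.
allowed_by_rule 10.0.4.20    10.0.0.0/16 192.0.2.10 && echo "allow"   # prints "allow"
allowed_by_rule 203.0.113.25 10.0.0.0/16 192.0.2.10 || echo "deny"    # prints "deny"
```

The second call is the situation external users were in: no prefix matched, so the packet never reached a node.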
The fix: Terraform boolean toggle
Rather than deleting the rule or manually editing the CIDR list for each access change, I added a boolean variable to switch between public and corporate-only modes:
# variables.tf
variable "allow_public_http_https" {
description = "Allow public HTTP/HTTPS access. Set to false to restrict to corporate network only."
type = bool
default = false
}
# main.tf (or network module)
resource "azurerm_network_security_rule" "aks_allow_http_https_public" {
count = var.allow_public_http_https ? 1 : 0
name = "Allow-HTTP-HTTPS-Public"
priority = 110
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_ranges = ["80", "443"]
source_address_prefix = "Internet"
destination_address_prefix = "*"
# ...
}
resource "azurerm_network_security_rule" "aks_allow_http_https_restricted" {
count = var.allow_public_http_https ? 0 : 1
name = "Allow-HTTP-HTTPS-Corporate"
priority = 110
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_ranges = ["80", "443"]
source_address_prefixes = [
"<internal-cidr>",
"<internal-ip>"
]
destination_address_prefix = "*"
# ...
}
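The expression count = var.allow_public_http_https ? 1 : 0 is the Terraform idiom for conditional resource creation: a count of 0 creates nothing and a count of 1 creates a single instance. Because the two resources use complementary conditions, exactly one of the rules exists after a successful apply, never both and never neither.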
In terraform.tfvars:
allow_public_http_https = true
Apply:
cd infra/
terraform apply -auto-approve
Verify:
curl -I http://<internal-ip>
# HTTP/1.1 200 OK
The AKS API server is separate
Changing the node subnet NSG only affects application traffic (ports 80/443). It has no effect on kubectl access to the API server, which has its own authorized IP range list:
# Set the API server authorized IP list.
# This replaces the whole list, so include every range that should keep access.
az aks update \
--name <cluster> \
--resource-group <resource-group> \
--api-server-authorized-ip-ranges "<internal-ip>/32,<internal-ip>/32"
These are two separate controls. The NSG governs what reaches the workload nodes. The apiServerAccessProfile.authorizedIpRanges setting governs what can talk to the Kubernetes control plane. On a locked-down cluster, both have to be configured before you can run kubectl and serve traffic.
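Because the flag sets the complete list rather than appending to it, a small helper for building the comma-separated value can reduce the risk of dropping an existing range. The sketch below is an assumption-laden illustration: join_ranges is a hypothetical helper, the addresses are documentation examples, and the az invocation is shown only as a comment because the cluster and resource group names are placeholders.

```shell
#!/usr/bin/env bash
# Join prefix arguments into one comma-separated string, skipping duplicates.
join_ranges() {
  local seen="," out="" r
  for r in "$@"; do
    case "$seen" in *",$r,"*) continue ;; esac   # already added
    seen+="$r,"
    out+="${out:+,}$r"
  done
  echo "$out"
}

# Possible usage (not executed here; names are placeholders):
#   current=$(az aks show --name <cluster> --resource-group <resource-group> \
#       --query "apiServerAccessProfile.authorizedIpRanges" -o tsv)
#   az aks update --name <cluster> --resource-group <resource-group> \
#       --api-server-authorized-ip-ranges "$(join_ranges $current 203.0.113.25/32)"
```

Leaving $current unquoted in the usage comment is deliberate, so that each existing range becomes its own argument.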
Production rule
With allow_public_http_https = true in terraform.tfvars, the site became accessible from anywhere. The Kubernetes API server remained restricted to known CIDRs. The database and internal services were unaffected — their NSG rules were separate and unchanged.
The toggle makes it easy to flip back:
# In terraform.tfvars
allow_public_http_https = false
terraform apply -auto-approve
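That removes the public rule and recreates the corporate-only one in a single apply, with no manual NSG edits and no state drift. One caveat: both rules use priority 110, and Azure rejects two rules with the same priority and direction in one NSG. The configuration declares no dependency between the two resources, so the swap relies on the old rule being deleted before the new one is created. If an apply ever fails on a priority conflict, re-running it or giving the two rules distinct priorities (for example 110 and 111) should resolve it.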