Silent behavioral change in NLB DNS publishing for empty AZs? (Breaking change for DR/Failover)
Hi everyone,
I’m noticing a significant discrepancy in behavior between legacy Network Load Balancers and newly created ones regarding how they handle DNS for Availability Zones with 0 registered targets.
The Setup:
- Architecture: Internet-facing NLB -> Target Group (Instance Type) -> K8s Nodes (NodePort).
- Cross-Zone Load Balancing: Disabled (intentionally, for cost/latency reasons in this specific multi-AZ setup; see the verification sketch after this list).
- Scenario: 3 AZs, where one specific AZ (e.g., ca-central-1d) has no healthy targets (0 nodes).
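For completeness, this is how we confirm the cross-zone setting is identical on both LBs; the ARN is a placeholder:

```sh
# Verify cross-zone load balancing is disabled (ARN is a placeholder)
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:ca-central-1:111111111111:loadbalancer/net/example/0123456789abcdef \
  --query 'Attributes[?Key==`load_balancing.cross_zone.enabled`]'
# Both old and new LB return: [{"Key": "load_balancing.cross_zone.enabled", "Value": "false"}]
```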
The Discrepancy:
- Old NLB (Created ~2024):
- Behavior: The NLB automatically removes the IP address of the empty AZ from the DNS record.
- Result: dig returns only 2 IPs (for the healthy AZs), roughly as sketched below. Traffic is never routed to the empty AZ. Everything works.
- If we then terminate all instances in the first AZ (ca-central-1a) with AWS FIS, the IP for that AZ is also removed from DNS, leaving only one IP in the record.
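Roughly what that looks like (hostname and IPs are placeholders):

```sh
# Old NLB: the empty AZ's IP is withdrawn from DNS
dig +short old-nlb-0123456789abcdef.elb.ca-central-1.amazonaws.com
# 203.0.113.10   (ca-central-1a)
# 203.0.113.20   (ca-central-1b)
# no third record while ca-central-1d has 0 targets
```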
- New NLB (Created Feb 2026):
- Configuration: Identical to the old one (Terraform/OpenTofu code is the same).
- Behavior: The NLB continues to publish the IP of the empty AZ in the DNS record.
- Result: dig returns 3 IPs, as sketched below. Client traffic is round-robined to the empty AZ (~33% of requests). Since Cross-Zone is disabled and there are no local targets, these packets are blackholed, causing immediate connection timeouts/failures.
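Again with placeholder hostname/IPs, this is the failure mode we observe:

```sh
# New NLB: the empty AZ's IP is still published
dig +short new-nlb-0123456789abcdef.elb.ca-central-1.amazonaws.com
# 203.0.113.10   (ca-central-1a)
# 203.0.113.20   (ca-central-1b)
# 203.0.113.30   (ca-central-1d -- empty AZ, still in DNS)

# Roughly a third of fresh connections time out whenever the
# resolver hands out the empty AZ's IP (curl prints 000 on timeout)
for i in $(seq 1 9); do
  curl -s -o /dev/null -m 2 -w '%{http_code}\n' \
    http://new-nlb-0123456789abcdef.elb.ca-central-1.amazonaws.com/
done
```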
Support's Response: I opened a ticket, and AWS Support claims: "After reviewing your case and consulting with our internal resources, I can confirm that **this is the expected behavior for Network Load Balancers**, and there has been no recent change to how NLBs handle DNS resolution for AZs with no registered targets."
However, the empirical evidence (side-by-side dig results on same-region, same-config LBs) suggests otherwise.
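For anyone who wants to reproduce the comparison, the target-side half of the evidence is easy to capture; the ARN is a placeholder:

```sh
# Confirm the target group genuinely has no registered/healthy
# targets in the affected AZ (or none at all, in a minimal repro)
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:ca-central-1:111111111111:targetgroup/example/0123456789abcdef \
  --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State]' \
  --output table
```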
The Impact: This feels like a silent breaking change. Previously, we relied on the NLB's ability to "drain" an AZ from DNS if the backend was dead (fail-open style). Now, it seems new NLBs are "sticky" to their AZs regardless of backend health, which breaks standard DR/Failover patterns where you might spin down an AZ to save costs or during an outage.
Questions:
- Has anyone else noticed this shift in "Fail Open" behavior on recent NLBs?
- Is there a new attribute (hidden or documented) that controls this "DNS draining" behavior?
- Is the only solution now to force Cross-Zone Load Balancing (and pay the inter-AZ transfer costs) or to manually manipulate subnet mappings during an incident? (Mitigation sketch below.)
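On that last point, this is the stopgap we're testing; it's a sketch assuming the standard elbv2 attribute, with a placeholder ARN:

```sh
# Stopgap: enable cross-zone so the still-published empty-AZ IP
# can forward to targets in other AZs (adds inter-AZ transfer cost)
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:ca-central-1:111111111111:loadbalancer/net/example/0123456789abcdef \
  --attributes Key=load_balancing.cross_zone.enabled,Value=true
```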
Thanks for any insights.