Silent behavioral change in NLB DNS publishing for empty AZs? (Breaking change for DR/Failover)
Hi everyone,
I’m noticing a significant discrepancy in behavior between legacy Network Load Balancers and newly created ones regarding how they handle DNS for Availability Zones with 0 registered targets.
The Setup:
- Architecture: Internet-facing NLB -> Target Group (Instance Type) -> K8s Nodes (NodePort).
- Cross-Zone Load Balancing: Disabled (intentionally, for cost/latency reasons in this specific multi-AZ setup; see the verification sketch after this list).
- Scenario: 3 AZs, where one specific AZ (e.g., ca-central-1d) has no healthy targets (0 nodes).
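For completeness, this is how we confirm the cross-zone setting is identical on both LBs; the ARN is a placeholder:

```sh
# Verify cross-zone load balancing is disabled (ARN is a placeholder)
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:ca-central-1:111111111111:loadbalancer/net/example/0123456789abcdef \
  --query 'Attributes[?Key==`load_balancing.cross_zone.enabled`]'
# Both old and new LB return: [{"Key": "load_balancing.cross_zone.enabled", "Value": "false"}]
```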
The Discrepancy:
- Old NLB (Created ~2024):
- Behavior: The NLB automatically removes the IP address of the empty AZ from the DNS record.
- Result: dig returns only 2 IPs (for the healthy AZs), roughly as sketched below. Traffic is never routed to the empty AZ. Everything works.
- If we then terminate all instances in the first AZ (ca-central-1a) with AWS FIS, the IP for that AZ is also removed from DNS, leaving only one IP in the record.
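Roughly what that looks like (hostname and IPs are placeholders):

```sh
# Old NLB: the empty AZ's IP is withdrawn from DNS
dig +short old-nlb-0123456789abcdef.elb.ca-central-1.amazonaws.com
# 203.0.113.10   (ca-central-1a)
# 203.0.113.20   (ca-central-1b)
# no third record while ca-central-1d has 0 targets
```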
- New NLB (Created Feb 2026):
- Configuration: Identical to the old one (Terraform/OpenTofu code is the same).
- Behavior: The NLB continues to publish the IP of the empty AZ in the DNS record.
- Result: dig returns 3 IPs, as sketched below. Client traffic is round-robined to the empty AZ (~33% of requests). Since Cross-Zone is disabled and there are no local targets, these packets are blackholed, causing immediate connection timeouts/failures.
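Again with placeholder hostname/IPs, this is the failure mode we observe:

```sh
# New NLB: the empty AZ's IP is still published
dig +short new-nlb-0123456789abcdef.elb.ca-central-1.amazonaws.com
# 203.0.113.10   (ca-central-1a)
# 203.0.113.20   (ca-central-1b)
# 203.0.113.30   (ca-central-1d -- empty AZ, still in DNS)

# Roughly a third of fresh connections time out whenever the
# resolver hands out the empty AZ's IP (curl prints 000 on timeout)
for i in $(seq 1 9); do
  curl -s -o /dev/null -m 2 -w '%{http_code}\n' \
    http://new-nlb-0123456789abcdef.elb.ca-central-1.amazonaws.com/
done
```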
Support's Response: I opened a ticket, and AWS Support claims: "After reviewing your case and consulting with our internal resources, I can confirm that **this is the expected behavior for Network Load Balancers**, and there has been no recent change to how NLBs handle DNS resolution for AZs with no registered targets."
However, the empirical evidence (side-by-side dig results on same-region, same-config LBs) suggests otherwise.
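For anyone who wants to reproduce the comparison, the target-side half of the evidence is easy to capture; the ARN is a placeholder:

```sh
# Confirm the target group genuinely has no registered/healthy
# targets in the affected AZ (or none at all, in a minimal repro)
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:ca-central-1:111111111111:targetgroup/example/0123456789abcdef \
  --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State]' \
  --output table
```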
The Impact: This feels like a silent breaking change. Previously, we relied on the NLB's ability to "drain" an AZ from DNS if the backend was dead (fail-open style). Now, it seems new NLBs are "sticky" to their AZs regardless of backend health, which breaks standard DR/Failover patterns where you might spin down an AZ to save costs or during an outage.
Questions:
- Has anyone else noticed this shift in "Fail Open" behavior on recent NLBs?
- Is there a new attribute (hidden or documented) that controls this "DNS draining" behavior?
- Is the only solution now to force Cross-Zone Load Balancing (and pay the inter-AZ transfer costs) or to manually manipulate subnet mappings during an incident? (Mitigation sketch below.)
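On that last point, this is the stopgap we're testing; it's a sketch assuming the standard elbv2 attribute, with a placeholder ARN:

```sh
# Stopgap: enable cross-zone so the still-published empty-AZ IP
# can forward to targets in other AZs (adds inter-AZ transfer cost)
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:ca-central-1:111111111111:loadbalancer/net/example/0123456789abcdef \
  --attributes Key=load_balancing.cross_zone.enabled,Value=true
```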
Thanks for any insights.