AWS Interview Questions

Nail your next AWS interview with our extensive collection of questions on EC2, S3, VPC, IAM, Lambda, Serverless architectures, and AWS best practices.



Master your concepts with 54 hand-picked questions


On-Demand Instances let you pay for compute capacity by the hour or second with no long-term commitments. They are ideal for applications with short-term, spiky, or unpredictable workloads that cannot be interrupted.

Reserved Instances provide significant discounts (up to 75%) compared to On-Demand pricing. They require a commitment of 1 or 3 years and are best for steady-state workloads with predictable usage.

Spot Instances let you use spare Amazon EC2 capacity at up to a 90% discount compared to On-Demand pricing (you simply pay the current Spot price; the old bidding model has been retired). However, AWS can reclaim the instance with a two-minute warning when it needs that capacity back.

Best used for: batch processing jobs, big data analytics, CI/CD build agents, stateless web servers, and any workload that is fault-tolerant and can be interrupted.
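As a rough illustration of why the pricing model matters, here is a back-of-the-envelope comparison. The $0.096/hour figure and the discount percentages are assumptions for the sketch; real prices vary by instance type, Region, and commitment terms.

```python
# Hypothetical monthly cost for one instance under each pricing model.
# Rates below are ILLUSTRATIVE ONLY -- check the AWS pricing pages for real numbers.
HOURS_PER_MONTH = 730

on_demand_rate = 0.096                   # $/hour, hypothetical On-Demand price
reserved_rate = on_demand_rate * 0.40    # ~60% discount for a long-term commitment
spot_rate = on_demand_rate * 0.10        # ~90% discount, but interruptible

for name, rate in [("On-Demand", on_demand_rate),
                   ("Reserved", reserved_rate),
                   ("Spot", spot_rate)]:
    print(f"{name:>10}: ${rate * HOURS_PER_MONTH:,.2f}/month")
```

The gap compounds quickly across a fleet, which is why interruption-tolerant workloads belong on Spot and steady-state workloads on Reserved Instances or Savings Plans.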

You can secure an S3 bucket using multiple methods:

1. Use IAM Policies to grant specific users/roles access.

2. Apply Bucket Policies to restrict access across the entire bucket.

3. Enable Block Public Access to prevent accidental exposure.

4. Use S3 Access Control Lists (ACLs) for granular object-level permissions, though AWS now recommends keeping ACLs disabled and relying on policies instead.

5. Enable Server-Side Encryption (SSE) using AWS KMS or S3-managed keys to encrypt data at rest.

6. Enforce HTTPS via Bucket Policies to encrypt data in transit.
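Method 6 is typically enforced with a deny statement on insecure transport. A minimal sketch, assuming a placeholder bucket name `example-bucket`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ],
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    }
  ]
}
```

Because an explicit Deny overrides any Allow, no identity can reach the bucket over plain HTTP even if its IAM policy would otherwise permit it.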

S3 Versioning is a feature that keeps multiple variants of an object in the same S3 bucket. When enabled, every time you overwrite or delete an object, S3 keeps the previous version(s).

You would enable it to:

1. Protect against accidental deletion or overwrites (act as a trash can).

2. Recover from unintended user actions or application failures.

3. Satisfy compliance requirements that mandate data retention.

Note: Once enabled, versioning cannot be disabled, only suspended.
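In CloudFormation, enabling versioning is a one-property change. A minimal sketch (the logical ID is a placeholder):

```yaml
Resources:
  ExampleBucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled   # the only other valid value is Suspended
```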

Security Groups operate at the instance level (EC2). They are stateful, meaning if you allow traffic in, the return traffic is automatically allowed out. They only support 'allow' rules.

Network ACLs operate at the subnet level. They are stateless, meaning you must explicitly allow both inbound and outbound traffic. They support both 'allow' and 'deny' rules, making them useful for blocking specific rogue IP addresses.

VPC Peering is a networking connection between two VPCs that enables routing traffic between them using private IPv4 or IPv6 addresses. Instances in either VPC can communicate with each other as if they are within the same network.

Key limitations:

1. No transitive peering — if VPC A is peered with B, and B is peered with C, A cannot communicate with C through B.

2. No overlapping CIDR blocks are allowed between the peered VPCs.

3. VPC Peering is region-specific by default (though cross-region peering is supported).

4. Does not support edge-to-edge routing (e.g., through a VPN or Direct Connect).

An IAM User is a permanent identity for a person or application that needs long-term access to AWS resources. It has permanent credentials (password or access key).

An IAM Role is a temporary identity meant to be assumed by trusted entities (EC2 instances, Lambda functions, other AWS services, or users from another account). Roles do not have permanent credentials — they issue temporary security tokens via AWS STS (Security Token Service).

Best practice: Always prefer Roles for granting access to AWS services or cross-account access, and avoid using long-lived access keys for IAM Users.
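What makes a Role assumable is its trust policy. A minimal sketch of a trust policy allowing the EC2 service to assume the role (the permissions the role grants are attached separately as a permissions policy):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```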

The Principle of Least Privilege (PoLP) dictates that any user, application, or service should only be granted the minimum permissions required to perform its work — nothing more.

In AWS IAM, you implement this by:

1. Starting with no permissions (deny all by default) and granting only what is needed.

2. Using AWS Managed Policies as a baseline and creating Customer Managed Policies for fine-grained control.

3. Regularly reviewing and removing unused permissions using IAM Access Analyzer and IAM Credentials Report.

4. Using Service Control Policies (SCPs) in AWS Organizations to set permission guardrails.
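A least-privilege customer managed policy names exact actions and exact resources. A minimal sketch, assuming a placeholder bucket and prefix:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyReportsPrefix",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/reports/*"
    }
  ]
}
```

Note what is absent: no `s3:*`, no `"Resource": "*"`, and no write access. The identity can read objects under one prefix and nothing else.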

A 'cold start' occurs when AWS Lambda spins up a new execution environment to handle an invocation. This takes time, causing latency. It happens mostly when the function hasn't been invoked recently, or when scaling out to handle concurrent requests.

You can mitigate it by:

1. Using Provisioned Concurrency, which keeps a set number of execution environments initialized and ready to respond in double-digit milliseconds.

2. Optimizing package size and minimizing dependencies.

3. Using lightweight runtimes like Node.js or Python instead of JVM-based languages like Java; for Java specifically, Lambda SnapStart can significantly reduce cold-start latency.
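A further mitigation is structural: do expensive initialization at module scope so it runs once per cold start, not on every invocation. A runnable sketch of the pattern; the "connection" is simulated here so the snippet executes anywhere.

```python
# Standard Lambda pattern: expensive setup happens at module load
# (i.e. once per cold start), not inside the handler.
INIT_COUNT = 0

def expensive_init():
    """Stands in for creating SDK clients, opening DB connections, loading config."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"connected": True}

# Module scope: executed once per execution environment (per cold start).
connection = expensive_init()

def handler(event, context):
    # Warm invocations reuse `connection` without paying the init cost again.
    return {"statusCode": 200, "reused": connection["connected"]}

# Two invocations in the same environment still trigger only one init.
handler({}, None)
handler({}, None)
```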

Amazon RDS (Relational Database Service) is a managed service for relational databases (MySQL, PostgreSQL, Oracle, SQL Server, etc.). It is ideal for structured data with complex queries and relationships. It supports SQL and ACID transactions.

Amazon DynamoDB is a fully managed NoSQL key-value and document database. It is ideal for applications requiring single-digit millisecond performance at any scale, such as gaming leaderboards, shopping carts, and IoT data.

Choose RDS for structured, relational data with complex queries. Choose DynamoDB for high-throughput, low-latency access patterns with a simple data model.

RTO (Recovery Time Objective) is the maximum acceptable time that a system can be offline after a disaster. It answers: 'How quickly must we recover?'

RPO (Recovery Point Objective) is the maximum acceptable amount of data loss measured in time. It answers: 'How much data can we afford to lose?'

For example, an RPO of 1 hour means your system must be backed up frequently enough that you never lose more than 1 hour's worth of data. Together, RTO and RPO drive your choice of disaster recovery strategy (Backup & Restore, Pilot Light, Warm Standby, or Multi-Site Active/Active).
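The relationship between backup timing and RPO can be sketched in a few lines; the dates and the 1-hour RPO target below are hypothetical.

```python
from datetime import datetime, timedelta

def data_at_risk(last_backup: datetime, failure: datetime) -> timedelta:
    """Data written after the last backup is lost; this window must stay <= RPO."""
    return failure - last_backup

rpo = timedelta(hours=1)                  # hypothetical 1-hour RPO target
last_backup = datetime(2024, 1, 1, 12, 0)

# Failure 40 minutes after the last backup: within the 1-hour RPO.
ok_40min = data_at_risk(last_backup, datetime(2024, 1, 1, 12, 40)) <= rpo

# Failure 3 hours after the last backup: RPO violated -- back up more often.
ok_3h = data_at_risk(last_backup, datetime(2024, 1, 1, 15, 0)) <= rpo
```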

AWS CloudFormation is an Infrastructure as Code (IaC) service that allows you to define your AWS infrastructure in a declarative template file (YAML or JSON). CloudFormation reads the template and provisions the resources in the correct order, handling dependencies automatically.

A Stack is a single unit of related AWS resources, all managed together as one template. If you need to delete your environment, you can delete the stack and all associated resources are cleaned up automatically. Stacks can be nested for complex architectures.
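A minimal illustrative template; the logical ID is a placeholder, and real templates would add properties, parameters, and more resources:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal illustrative stack -- one bucket and one output.
Resources:
  AssetsBucket:
    Type: AWS::S3::Bucket
Outputs:
  BucketName:
    Value: !Ref AssetsBucket   # resolves to the generated bucket name
```

Deleting the stack deletes the bucket with it, which is exactly the "managed as one unit" behavior described above.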

ALB (Layer 7): Operates at the HTTP/HTTPS level. It can route traffic based on URL path, hostname, HTTP headers, and query strings. Best for microservices, container-based apps, and HTTP/HTTPS traffic.

NLB (Layer 4): Operates at the TCP/UDP/TLS level. It passes traffic through very quickly with ultra-low latency (handles millions of requests per second). Best for extreme performance requirements, gaming applications, and non-HTTP TCP protocols.

A systematic cost optimization approach:

1. Use AWS Cost Explorer and AWS Trusted Advisor to identify idle/underutilized resources.

2. Find unattached EBS volumes and Elastic IP addresses (they still cost money when idle).

3. Right-size EC2 instances using CloudWatch metrics (CPU, memory utilization).

4. Convert On-Demand instances for predictable workloads to Reserved Instances or Savings Plans.

5. Use S3 Intelligent-Tiering or Lifecycle Policies to move infrequently accessed data to cheaper storage tiers.

6. Delete unused snapshots, old AMIs, and unused load balancers.

A Region is a separate geographic area (e.g., us-east-1) containing multiple, isolated Availability Zones. An Availability Zone (AZ) is one or more discrete data centers within a Region. Regions provide global distribution, while AZs provide high availability within that geographic area.

An IAM User is an entity that represents a person or application with long-term credentials (password or access keys). A Role has short-term, temporary credentials and is designed to be assumed by trusted entities, such as an EC2 instance needing access to S3, without hardcoding keys.

Amazon S3 is object storage used for unstructured data, backups, and static website hosting because it is highly scalable and accessible via API over the internet. EBS is block storage representing a virtual hard drive that must be attached to a running EC2 instance to store its operating system and active file systems.

An Elastic Load Balancer (ELB) automatically distributes incoming application traffic across multiple targets, such as EC2 instances, containers, or Lambda functions. It ensures no single instance is overwhelmed, providing fault tolerance.

A Security Group acts as a stateful, virtual firewall for your EC2 instances to control incoming and outgoing traffic. You must explicitly allow inbound traffic (e.g., allow port 443 for HTTPS); all inbound traffic is blocked by default.

Route 53 is AWS's highly available and scalable cloud Domain Name System (DNS) web service. It translates human-readable domain names (like www.example.com) into IP addresses and handles global traffic routing.

EC2 Auto Scaling monitors your applications and automatically adds or removes EC2 instances dynamically based on predefined conditions (like CPU utilization) to maintain steady, predictable performance at the lowest possible cost.

The AWS Free Tier lets customers explore and try out AWS services free of charge up to specified limits, either for 12 months, as short-term trials, or indefinitely (the Always Free tier).

Amazon RDS is a managed service that makes it easy to set up, operate, and scale a relational database (like MySQL, PostgreSQL) in the cloud by automating patching, backups, and hardware provisioning.

AWS Lambda is a serverless compute service that lets you run back-end code without provisioning or managing any underlying servers. You only pay for the exact compute time consumed while your code is running.

First, verify the RDS Security Group allows inbound traffic on the database port (e.g., 3306) specifically from the EC2 instance's Security Group ID. Second, verify the EC2 instance is deployed in a subnet that has routing access to the database subnet. Third, ensure NACLs are not blocking the traffic.

I would attach an IAM Role to the EC2 instance granting `s3:GetObject` permissions to that specific bucket's ARN. Then, to strictly enforce it, I would add a Bucket Policy to the S3 bucket explicitly denying all access unless the request originates from that specific IAM Role's ARN.

I would use Amazon EventBridge (formerly CloudWatch Events) to schedule a cron expression that triggers an AWS Lambda function. The Lambda function would contain the script, executing instantly and scaling transparently.
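A sketch of that wiring in CloudFormation, assuming a hypothetical `CleanupFunction` Lambda defined elsewhere in the same template (an `AWS::Lambda::Permission` granting `events.amazonaws.com` invoke rights is also needed):

```yaml
Resources:
  NightlyCleanupRule:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: cron(0 2 * * ? *)   # every day at 02:00 UTC
      State: ENABLED
      Targets:
        - Arn: !GetAtt CleanupFunction.Arn    # hypothetical Lambda resource
          Id: nightly-cleanup-target
```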

An ALB operates at Layer 7, making routing decisions based on HTTP/HTTPS headers, paths, or query strings, ideal for microservices. An NLB operates at Layer 4 (TCP/UDP), handling millions of requests per second with ultra-low latency, ideal for pure connection-based routing.

I would use AWS Systems Manager (SSM) Session Manager. It allows secure, auditable, browser-based or CLI interactive shell access to the instance without opening inbound port 22 or managing SSH key pairs.

Standard is designed for frequently accessed data. Intelligent-Tiering automatically monitors access patterns and moves objects between frequent, infrequent, and archive access tiers without operational overhead or retrieval fees, saving money on data with unknown access patterns.

Immediately deactivate or delete the compromised IAM access keys in the AWS Console. Rotate the keys, have the developer update their local configurations, review AWS CloudTrail for API calls made with the compromised keys to identify the blast radius, and purge the secret from the GitHub history.

An IGW is attached to a Public Subnet allowing two-way internet routing, enabling the instance to receive inbound public traffic. A NAT Gateway is placed IN the Public Subnet but used by instances in a Private Subnet strictly to initiate outbound internet requests (like downloading patches) without exposing themselves inbound.

I would use Amazon Data Lifecycle Manager (DLM) or AWS Backup to create automated snapshot policies. These policies would define the daily schedule and retention rules (e.g., keep the last 7 days of snapshots), ensuring compliance without manual cron jobs.

I would upload the static assets to an S3 bucket configured for web hosting. Then, I would provision an Amazon CloudFront distribution, point the origin to the S3 bucket, configure an Origin Access Control (OAC) to secure the bucket, and attach a free ACM SSL certificate to CloudFront.

First, I use RDS Performance Insights and enable Enhanced Monitoring to identify the exact SQL query causing the CPU spike. Often, it's an unindexed table scan. If the query cannot be optimized or indexed, and it is overwhelmingly read-heavy, I would provision an RDS Read Replica and update the application code to route SELECT queries to the replica, offloading the Primary.

I would implement a VPC Gateway Endpoint for Amazon S3 and update the VPC route tables to send all S3 traffic through it. Finally, I would implement a strict S3 Bucket Policy using the `aws:SourceVpce` condition key, denying any access that does not flow through that specific VPC Endpoint ID.
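A sketch of that bucket policy, with placeholder bucket and endpoint IDs. Be careful: once applied, even console access from outside the endpoint is denied.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideVpcEndpoint",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ],
      "Condition": { "StringNotEquals": { "aws:SourceVpce": "vpce-0123456789abcdef0" } }
    }
  ]
}
```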

I would use AWS Cloud Map for service discovery. As Fargate tasks spin up dynamically, they register their ephemeral IPs with Cloud Map. Other services can then resolve them via internal DNS namespaces (e.g., `backend.local.net`). To secure communication, I would employ AWS App Mesh to handle mTLS encryption and intelligent retries between the containers.

Lambda functions can scale concurrently to thousands of instances, directly exhausting Postgres/MySQL connection limits. I would implement Amazon RDS Proxy. The proxy sits between Lambda and RDS, pooling and multiplexing the connections securely, preserving the database memory and preventing connection exhaustion.

I would use the Elastic Volumes feature. Via the AWS Console or CLI, I can dynamically modify the EBS volume from `gp2` to `gp3` and provision higher IOPS/throughput explicitly, or move to `io2` for the most demanding requirements. This modification works while the volume is in use, without impacting the running OS.

I would route all IoT messages directly into an Amazon Kinesis Data Stream, which natively buffers massive throughput. A fleet of AWS Lambda functions would consume batches of messages from the Kinesis stream. The Lambdas would format the data and perform batch inserts into an Amazon DynamoDB table, preventing database throttling while guaranteeing data durability.
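One detail worth knowing here: DynamoDB's `BatchWriteItem` accepts at most 25 items per request, so the Lambda must re-chunk large Kinesis batches before writing. A pure-Python sketch of that chunking (the real consumer would hand each batch to boto3, which is omitted so the snippet runs anywhere):

```python
from typing import Iterable, Iterator, List

# DynamoDB's BatchWriteItem limit is 25 items per request.
DYNAMODB_BATCH_LIMIT = 25

def chunk(items: Iterable[dict], size: int = DYNAMODB_BATCH_LIMIT) -> Iterator[List[dict]]:
    """Re-chunk an arbitrarily large Kinesis batch into DynamoDB-sized writes."""
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

records = [{"device_id": i, "reading": i * 0.5} for i in range(60)]  # 60 fake IoT messages
batches = list(chunk(records))  # 25 + 25 + 10
```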

I would enable VPC Flow Logs to trace the specific IP addresses generating cross-AZ traffic. Usually, it's chattiness between a web tier in AZ-A and a log aggregation service or database in AZ-B. I would mitigate this by ensuring services heavily utilize intra-AZ routing logically, enabling ALB cross-zone load balancing optimizations, or deploying caching layers localized strictly within the same AZ.

I should have applied a `DeletionPolicy: Retain` or `Snapshot` directly on the RDS resource within the CloudFormation template. This ensures that even if the stack is deleted, the database instance is left intact or a final snapshot is natively taken before deletion.
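An abbreviated sketch of where the attribute lives. Note that `DeletionPolicy` sits at the resource level, beside `Properties`, not inside it; the properties shown are placeholders, not a complete instance definition.

```yaml
Resources:
  AppDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot   # take a final snapshot if the stack is deleted
    # DeletionPolicy: Retain   # alternative: leave the instance running
    Properties:
      Engine: postgres
      DBInstanceClass: db.t3.micro
      AllocatedStorage: "20"
```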

I would upload the file to S3, triggering an S3 Event Notification to AWS Step Functions or an AWS Batch job. AWS Batch would dynamically spin up an EC2 Spot instance with sufficient RAM, stream the file process block-by-block, write the results to a target datastore, and terminate the instance.

I would use AWS Organizations Service Control Policies (SCPs) to explicitly deny the `ec2:RunInstances` or `ec2:CreateVolume` actions if the `ec2:Encrypted` condition boolean is false. Alternatively, I would simply enable the 'EBS Encryption by Default' toggle globally at the regional account level.
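A minimal sketch of such an SCP; the statement can be extended to cover `ec2:RunInstances` with the volume-related condition keys as needed:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedEBS",
      "Effect": "Deny",
      "Action": "ec2:CreateVolume",
      "Resource": "*",
      "Condition": { "Bool": { "ec2:Encrypted": "false" } }
    }
  ]
}
```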

I would use an Amazon Aurora Global Database, which replicates asynchronously at the storage layer with typically sub-second lag, as the core state. The compute tier would run stateless microservices in EKS clusters across us-east-1 and eu-west-1. Route 53 latency-based routing combined with AWS Global Accelerator anycast IPs would route users to the healthiest, nearest region. DynamoDB Global Tables could handle fast user-session caching.

I would implement AWS PrivateLink. In the destination VPC, I would create an internal Network Load Balancer (NLB) fronting the API and expose it as a VPC Endpoint Service. In the source VPC, I would instantiate an Interface VPC Endpoint. This securely maps the destination API to private IP addresses entirely valid within the source's CIDR, completely neutralizing the IP overlap issue without complex NAT masquerading.

I would leverage AWS Systems Manager (SSM) across the fleet via AWS Organizations. First, I would use AWS Config to identify the exact non-compliant, drifted AMIs. Then, I would use SSM Run Command or SSM Automation documents to target the fleet concurrently via Resource Groups, executing the shell script to upgrade the package without distributing SSH keys or opening firewall ports.

Multitenancy on EKS requires isolation at several layers. I would assign dedicated Kubernetes Namespaces per tenant. For compute isolation, I would use Taints/Tolerations to pin tenant Pods to specific Node Groups. For network isolation, I would enforce strict NetworkPolicies (e.g., via Calico) denying cross-namespace traffic. For IAM, I would use IAM Roles for Service Accounts (IRSA), mapping each tenant's Kubernetes Service Account to an AWS IAM Role with least-privilege KMS and S3 policies scoped to that tenant's data.

DynamoDB Global Tables are replicated multi-master across regions, fundamentally prioritizing Availability and Partition tolerance (AP). Under global partitions, they rely on eventual consistency and 'last-writer-wins' conflict resolution. Aurora Multi-Master operates synchronously within a single region, leaning towards Consistency and Partition tolerance (CP), providing stronger transactional guarantees but sacrificing availability if the synchronous quorum layer experiences extreme latency.

I would rapidly deploy AWS WAF (Web Application Firewall) attached to the ALB and enable AWS Managed Rules to catch known botnets. For the zero-day HTTP flood, I would configure rate-based rules in WAF that trigger when requests from a single IP exceed a strict threshold. I would also engage AWS Shield Advanced to gain proactive traffic engineering from the AWS Shield Response Team.

Standard SQS guarantees at-least-once delivery, which can duplicate events. To guarantee ordering, I must utilize an SQS FIFO queue utilizing strict MessageGroupId routing. To guarantee Exactly-Once processing, AWS SQS FIFO handles deduplication based on a 5-minute deduplication ID window. However, because Lambdas can internally fail post-processing but pre-deletion, the true guarantee relies entirely on writing the Lambda logic to execute idempotently against a transactional datastore like DynamoDB.
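The idempotency requirement boils down to: record the message ID atomically before acting, and treat a duplicate as a no-op. A runnable sketch where a dict stands in for the DynamoDB table that a real consumer would write with a conditional put (`attribute_not_exists`):

```python
# Idempotent consumer sketch: a redelivered message becomes a no-op.
processed: dict = {}     # stands in for a DynamoDB dedup table
side_effects: list = []  # records what the business logic actually did

def handle_message(message_id: str, body: str) -> bool:
    """Return True if processed, False if recognized as a duplicate."""
    if message_id in processed:   # the conditional-write check in DynamoDB
        return False
    processed[message_id] = True
    side_effects.append(body)     # the actual business logic, run exactly once
    return True

handle_message("m-1", "charge card")
handle_message("m-1", "charge card")  # SQS redelivery: skipped
```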

Transferring 500 TB over a 1 Gbps link would theoretically take over 45 days, so the only workable solution is physical offline transfer. I would order multiple AWS Snowball Edge Storage Optimized devices, encrypt the data (with keys managed by on-premises HSMs where required) while loading it onto the appliances via the native S3 adapter, and ship them to AWS, where the data is ingested into S3 securely, meeting the stringent 14-day timeline.
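The 45-day figure is easy to verify with back-of-the-envelope math, assuming decimal terabytes and a perfectly utilized link (which real transfers never achieve):

```python
# Why 500 TB cannot go over a 1 Gbps wire in time.
TB = 10**12                # 1 terabyte in bytes (decimal)
data_bits = 500 * TB * 8   # 500 TB expressed in bits
link_bps = 10**9           # 1 Gbps

seconds = data_bits / link_bps
days = seconds / 86_400    # ~46.3 days even on a perfect link
```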

A hot partition occurs when a poor partition key concentrates read/write operations on a single physical partition (e.g., querying strictly by a rapidly updating 'Status' attribute). I would re-architect the data model with partition-key strategies such as adding artificial suffixes (write sharding) to distribute heavily accessed items. If the workload is overwhelmingly read-heavy on specific hot keys, I would place DynamoDB Accelerator (DAX) in front of the table to serve cached reads at microsecond latency.
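A sketch of the write-sharding idea: derive a deterministic suffix from the item ID so writes spread over N partition keys, while readers fan out over all N suffixes and merge results. The shard count of 10 is an arbitrary tuning choice for the example.

```python
import hashlib

SHARD_COUNT = 10  # tuning knob: more shards = wider writes, wider read fan-out

def sharded_key(base_key: str, item_id: str) -> str:
    """Deterministic suffix so the same item always maps to the same shard."""
    shard = int(hashlib.sha256(item_id.encode()).hexdigest(), 16) % SHARD_COUNT
    return f"{base_key}#{shard}"

def all_shards(base_key: str) -> list:
    """Readers must query every suffix and merge the results."""
    return [f"{base_key}#{i}" for i in range(SHARD_COUNT)]

key = sharded_key("STATUS#PENDING", "order-1234")  # e.g. "STATUS#PENDING#<n>"
```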

I would establish a centralized AWS Organizations CloudTrail logging to a dedicated, highly restricted 'Security Tooling' AWS account. The target S3 bucket would enforce S3 Object Lock in Compliance Mode with a 7-year retention period. Compliance Mode prevents any user, including the root user of the security account, from altering, deleting, or overwriting the CloudTrail log files for the duration of the retention period.