AWS EC2 Instance Management with Boto3: Start, Stop, and Query Instances

Mohammad Abu Mattar 14 Apr, 2026 10 Mins read

If you’ve ever spent 20 minutes clicking through the AWS Console just to stop a handful of dev instances, you already know the pain. It’s tedious, it doesn’t scale, and one wrong click can ruin your afternoon.

That’s where Boto3 comes in. It’s the official AWS SDK for Python, and it lets you talk to AWS services, including EC2, directly from your code. Instead of pointing and clicking, you write a script once and run it whenever you need it. Your code doesn’t get tired, doesn’t misclick, and works the same at 3 AM as it does at 3 PM.

This guide covers everything you need to get comfortable with EC2 automation: setting up authentication, starting and stopping instances, waiting for state changes, filtering by tags, running bulk operations, and handling errors like a pro.

Why Automate EC2 Management with Boto3?

Why bother writing code when the AWS Console already does the job?

Fair question. The console is fine when you’ve got two or three instances. But once you’re managing a real environment, manual work starts costing you more than time.

Think about it: every action you take through the console depends on a human doing it right, every single time. With Boto3, you write the logic once, and it runs consistently regardless of who’s on call or how tired they are.

Here’s where automation pays off most:

Scheduled shutdowns: Stop dev and staging instances overnight or on weekends. There’s no reason to pay for servers that nobody’s using at 2 AM.
Consistent tagging: Every instance your script touches gets tagged correctly. No more missing Environment or Owner tags because someone forgot.
Incident response: When something goes wrong, your runbook can stop a compromised instance or spin up a replacement without waiting for a human to log in.
Audit trails: Log every action with timestamps, so you’ve got a clear record of what happened, when, and why.
Fewer mistakes: Repetitive manual work breeds errors. Scripts don’t forget steps or click the wrong button.

Once your automation is in place, managing 200 instances takes roughly the same effort as managing two.

Setting Up Boto3 and Authenticating with AWS

How do you connect Boto3 to your AWS account securely?

Start with the install:

pip install boto3

Now, authentication. This is where a lot of people make mistakes, so it’s worth doing right from the start. The method you use should depend on where your code runs.

Option 1: IAM Roles (Best for EC2 and Lambda)

If your script runs on an EC2 instance or a Lambda function, attach an IAM role to that resource. Boto3 picks up temporary credentials automatically: from the instance metadata service on EC2, and from environment variables that the runtime sets on Lambda. You don’t need to touch a key file or manage any credentials yourself.

import boto3

# No keys needed here: Boto3 reads the IAM role attached to this instance
ec2_client = boto3.client('ec2', region_name='us-east-1')

# Quick check to confirm the connection works
response = ec2_client.describe_instances()
print('Connected via IAM role, good to go.')

This is the cleanest approach for production. There are no credentials to rotate, leak, or accidentally commit to Git.

Option 2: Named Profiles (Best for Local Development)

When you’re working locally, use a named AWS CLI profile. It keeps credentials in ~/.aws/credentials where they belong, not in your source code.

# Run this once in your terminal to set up a profile
aws configure --profile myproject

import boto3

# Load credentials from the 'myproject' profile in ~/.aws/credentials
session = boto3.Session(profile_name='myproject')
ec2_client = session.client('ec2', region_name='us-east-1')

print(f'Using profile: {session.profile_name}')

You can have multiple profiles for different accounts or environments, which makes switching between dev and prod much easier.

A word on hardcoded credentials: Don’t do it. Not in scripts, not in config files checked into source control, not anywhere. Use IAM roles in production and named profiles locally. If you’re using access keys, rotate them regularly and give them only the permissions they actually need.
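One way to keep that split between production roles and local profiles inside a single script is to derive the session arguments from the environment. The sketch below is only an illustration: the helper name session_kwargs and the variable EC2_TOOLS_PROFILE are hypothetical, not part of Boto3. When no profile is set, the empty result leaves Boto3 to use its default credential chain, which is what picks up an attached IAM role.

```python
import os


def session_kwargs(environ=None):
    """Build keyword arguments for boto3.Session from environment variables.

    EC2_TOOLS_PROFILE is a hypothetical variable name chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    kwargs = {}

    profile = environ.get('EC2_TOOLS_PROFILE')
    if profile:
        # Local development: use a named profile from ~/.aws/credentials
        kwargs['profile_name'] = profile

    # With no profile set, Boto3 falls back to its default credential chain,
    # which picks up an attached IAM role on EC2 or Lambda.
    return kwargs


# Usage (requires boto3):
#   import boto3
#   session = boto3.Session(**session_kwargs())
#   ec2_client = session.client('ec2', region_name='us-east-1')
```

The same script then runs unchanged on a laptop (with the variable set) and on an instance with a role (with it unset).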

Starting and Stopping EC2 Instances

How do you start and stop instances programmatically with Boto3?

This is probably why you’re here, so let’s get into it. Boto3 uses start_instances and stop_instances for these operations. Both are simple calls, but there are a few details worth knowing.

Stopping an Instance

import boto3
from botocore.exceptions import ClientError

def stop_instance(instance_id: str, region: str = 'us-east-1') -> dict:
    """Stop a running EC2 instance and return the state transition."""
    ec2 = boto3.client('ec2', region_name=region)

    try:
        response = ec2.stop_instances(InstanceIds=[instance_id])

        # AWS tells us both the old state and the new state
        state_info = response['StoppingInstances'][0]
        previous = state_info['PreviousState']['Name']
        current = state_info['CurrentState']['Name']

        print(f'{instance_id}: {previous} -> {current}')
        return response

    except ClientError as e:
        print(f'Could not stop instance: {e.response["Error"]["Code"]} - {e}')
        raise

# Usage
stop_instance('i-0abcd1234ef567890')

Starting an Instance

import boto3
from botocore.exceptions import ClientError

def start_instance(instance_id: str, region: str = 'us-east-1') -> dict:
    """Start a stopped EC2 instance and return the state transition."""
    ec2 = boto3.client('ec2', region_name=region)

    try:
        response = ec2.start_instances(InstanceIds=[instance_id])

        state_info = response['StartingInstances'][0]
        previous = state_info['PreviousState']['Name']
        current = state_info['CurrentState']['Name']

        print(f'{instance_id}: {previous} -> {current}')
        return response

    except ClientError as e:
        code = e.response['Error']['Code']

        # Don't crash if the instance can't be started from its current
        # state (for example, it is still stopping)
        if code == 'IncorrectInstanceState':
            print(f'{instance_id} cannot be started from its current state.')
            return {}
        else:
            raise

# Usage
start_instance('i-0abcd1234ef567890')

One thing worth noting: both methods accept a list of instance IDs, so you can act on multiple instances in a single API call. If you’re managing more than one instance, that’s a lot cleaner than looping and calling separately.

import boto3

# Stop multiple instances at once, no loop needed
ec2 = boto3.client('ec2', region_name='us-east-1')
ec2.stop_instances(InstanceIds=[
    'i-0abcd1234ef567890',
    'i-0efgh5678ij901234',
    'i-0klmn9012op345678',
])

The response tells you the previous and current state for each one, which makes logging and auditing easy.
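As a rough sketch of what that logging could look like, the helper below turns a start_instances or stop_instances response into one timestamped line per instance. The function name transition_lines and the line format are illustrative choices, not anything Boto3 provides; only the response keys (StartingInstances, StoppingInstances, PreviousState, CurrentState) come from the API.

```python
from datetime import datetime, timezone


def transition_lines(response, key, now=None):
    """Format one audit line per instance from a start/stop response.

    key is 'StoppingInstances' for stop_instances and
    'StartingInstances' for start_instances.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime('%Y-%m-%dT%H:%M:%SZ')

    lines = []
    for item in response.get(key, []):
        previous = item['PreviousState']['Name']
        current = item['CurrentState']['Name']
        lines.append(f"{stamp} {item['InstanceId']}: {previous} -> {current}")
    return lines


# Usage (requires boto3 and a client named ec2):
#   response = ec2.stop_instances(InstanceIds=['i-0abcd1234ef567890'])
#   for line in transition_lines(response, 'StoppingInstances'):
#       print(line)
```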

Using Boto3 Waiters to Block Until a State is Reached

How do you know when an instance has actually finished starting or stopping?

Here’s a common mistake: you call start_instances, assume it’s done, and immediately try to SSH in. But the instance is still booting. Your script fails, and now you’re debugging something that wasn’t actually broken.

The fix is waiters. A Boto3 waiter is a built-in polling loop that keeps checking the AWS API until your instance reaches the state you want. It handles the timing for you and raises an error if something goes wrong or takes too long.

import boto3
from botocore.exceptions import WaiterError

def start_and_wait(instance_id: str, region: str = 'us-east-1'):
    """Start an instance and block until it's fully running."""
    ec2 = boto3.client('ec2', region_name=region)

    ec2.start_instances(InstanceIds=[instance_id])
    print(f'Starting {instance_id}...')

    waiter = ec2.get_waiter('instance_running')

    try:
        waiter.wait(
            InstanceIds=[instance_id],
            WaiterConfig={
                'Delay': 15,       # Check every 15 seconds
                'MaxAttempts': 40  # Give up after ~10 minutes
            }
        )
        print(f'{instance_id} is running.')

    except WaiterError as e:
        print(f'Waiter timed out: {e}')
        raise


def stop_and_wait(instance_id: str, region: str = 'us-east-1'):
    """Stop an instance and block until it's fully stopped."""
    ec2 = boto3.client('ec2', region_name=region)

    ec2.stop_instances(InstanceIds=[instance_id])
    print(f'Stopping {instance_id}...')

    waiter = ec2.get_waiter('instance_stopped')

    try:
        waiter.wait(
            InstanceIds=[instance_id],
            WaiterConfig={'Delay': 15, 'MaxAttempts': 40}
        )
        print(f'{instance_id} is stopped.')

    except WaiterError as e:
        print(f'Waiter failed: {e}')
        raise

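The four most useful EC2 waiters are instance_running, instance_stopped, instance_terminated, and instance_exists. Each one polls describe_instances under the hood and checks the state automatically, so you don’t have to. Keep in mind that instance_running only confirms the instance state. If you need the OS to be reachable (for SSH, say), the instance_status_ok waiter, which polls describe_instance_status, is a closer fit.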
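To make the polling idea concrete, here is a simplified loop that does roughly what a waiter does. It is a teaching sketch, not botocore’s implementation: real waiters also recognize failure states and differ in other details. The function name wait_for_state and the injectable sleep parameter are assumptions made for this example.

```python
import time


def wait_for_state(ec2, instance_ids, target, delay=15, max_attempts=40,
                   sleep=time.sleep):
    """Poll describe_instances until every instance reports the target state.

    Returns the number of polls it took, or raises TimeoutError.
    """
    for attempt in range(1, max_attempts + 1):
        response = ec2.describe_instances(InstanceIds=instance_ids)
        states = [
            inst['State']['Name']
            for reservation in response['Reservations']
            for inst in reservation['Instances']
        ]

        if states and all(state == target for state in states):
            return attempt

        # Only sleep if another check is still coming
        if attempt < max_attempts:
            sleep(delay)

    raise TimeoutError(
        f'{instance_ids} did not reach {target!r} after {max_attempts} checks'
    )


# Usage (requires boto3):
#   import boto3
#   ec2 = boto3.client('ec2', region_name='us-east-1')
#   wait_for_state(ec2, ['i-0abc123'], 'stopped')
```

In real scripts the built-in waiters are the better choice; the loop is here only to show what they save you from writing.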

If you want to wait on multiple instances at once, just pass all the IDs together:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Wait for several instances to stop; one waiter call handles all of them
waiter = ec2.get_waiter('instance_stopped')
waiter.wait(
    InstanceIds=['i-0abc123', 'i-0def456', 'i-0ghi789'],
    WaiterConfig={'Delay': 15, 'MaxAttempts': 40}
)
print('All instances are stopped.')

Tip: Tune Delay and MaxAttempts based on your actual instances. A small instance with a simple AMI might be running in under a minute. A larger one running a heavy bootstrap script could take five or ten. If your waiter times out too often, bump up MaxAttempts before adding more complex logic.
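The arithmetic behind that tuning is simple: the worst-case wait is roughly Delay multiplied by MaxAttempts. A small helper can turn a time budget into a WaiterConfig so you think in seconds rather than attempts. The name waiter_config is made up for this sketch.

```python
import math


def waiter_config(timeout_seconds, delay=15):
    """Return a WaiterConfig dict whose total budget covers timeout_seconds."""
    if timeout_seconds <= 0 or delay <= 0:
        raise ValueError('timeout_seconds and delay must be positive')

    # Round up so the budget is never shorter than requested
    return {'Delay': delay, 'MaxAttempts': math.ceil(timeout_seconds / delay)}


# Usage (requires a waiter from a boto3 client):
#   waiter.wait(InstanceIds=['i-0abc123'], WaiterConfig=waiter_config(600))
```

With the defaults used in this guide, a 600-second budget works out to 40 attempts at 15 seconds each.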

Filtering and Querying Instances

How do you find the right instances without knowing their IDs ahead of time?

Hardcoding instance IDs is a trap. Instances get replaced, IDs change, and suddenly your script is managing the wrong thing, or nothing at all. A better approach is to query by attributes you control, like tags or state.

Querying by State

import boto3

def get_instances_by_state(state: str, region: str = 'us-east-1') -> list:
    """
    Return all instances in a given state.
    Valid states: 'pending', 'running', 'stopping', 'stopped',
    'shutting-down', 'terminated'
    """
    ec2 = boto3.client('ec2', region_name=region)

    response = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': [state]}]
    )

    # describe_instances groups results into Reservations, so we flatten them
    instances = []
    for reservation in response['Reservations']:
        instances.extend(reservation['Instances'])

    return instances


# Find everything that's currently running
running = get_instances_by_state('running')
print(f'{len(running)} running instance(s) found.')

for inst in running:
    print(f"  {inst['InstanceId']}  {inst['InstanceType']}  {inst['Placement']['AvailabilityZone']}")

Querying by Tags

import boto3

def get_instances_by_tag(tag_key: str, tag_value: str, region: str = 'us-east-1') -> list:
    """Find instances that have a specific tag key/value pair."""
    ec2 = boto3.client('ec2', region_name=region)

    # AWS filter syntax for tags is 'tag:<key>'
    response = ec2.describe_instances(
        Filters=[{'Name': f'tag:{tag_key}', 'Values': [tag_value]}]
    )

    instances = []
    for reservation in response['Reservations']:
        instances.extend(reservation['Instances'])

    return instances


# Find all production instances
prod = get_instances_by_tag('Environment', 'production')

for inst in prod:
    # Pull the Name tag if it exists, otherwise label it 'unnamed'
    name = next(
        (t['Value'] for t in inst.get('Tags', []) if t['Key'] == 'Name'),
        'unnamed'
    )
    print(f"  {inst['InstanceId']}  {name}  {inst['State']['Name']}")

You can stack filters together, too. AWS applies them with AND logic, so you’ll only get instances that match all of them:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Only running instances in the production environment; both filters must match
response = ec2.describe_instances(
    Filters=[
        {'Name': 'tag:Environment', 'Values': ['production']},
        {'Name': 'instance-state-name', 'Values': ['running']}
    ]
)

That single call replaces what would otherwise be a manual filter in the console or a clunky post-processing loop in code.
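If you build these filter lists in more than one place, a small helper keeps the syntax consistent. The sketch below assumes a hypothetical function named build_filters. It relies on two documented behaviors of EC2 filters: separate filters are combined with AND, and multiple values inside one filter are combined with OR.

```python
def build_filters(tags=None, states=None):
    """Build a Filters list for describe_instances.

    tags maps a tag key to a list of accepted values.
    states is a list of accepted instance-state-name values.
    """
    filters = []

    # One filter per tag key; the values inside a filter are OR'ed together
    for key, values in sorted((tags or {}).items()):
        filters.append({'Name': f'tag:{key}', 'Values': list(values)})

    # Separate filters are AND'ed, so state narrows the tag matches further
    if states:
        filters.append({'Name': 'instance-state-name', 'Values': list(states)})

    return filters


# Usage (requires boto3 and a client named ec2):
#   ec2.describe_instances(
#       Filters=build_filters({'Environment': ['dev', 'staging']}, ['running'])
#   )
```

The usage line would match running instances tagged either dev or staging.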

Bulk Operations: Stopping Instances by Tag or VPC

How do you stop an entire group of instances at once without listing each ID manually?

This is where scripting really earns its keep. Instead of hunting down individual instance IDs, you describe what you want, collect the IDs, and stop them all in one shot.

Stop All Instances in a VPC

import boto3

def stop_instances_in_vpc(vpc_id: str, region: str = 'us-east-1') -> list:
    """Stop all running instances in a given VPC."""
    ec2 = boto3.client('ec2', region_name=region)

    # Find every running instance in this VPC
    response = ec2.describe_instances(
        Filters=[
            {'Name': 'vpc-id', 'Values': [vpc_id]},
            {'Name': 'instance-state-name', 'Values': ['running']}
        ]
    )

    instance_ids = [
        inst['InstanceId']
        for r in response['Reservations']
        for inst in r['Instances']
    ]

    if not instance_ids:
        print(f'Nothing running in {vpc_id}.')
        return []

    print(f'Stopping {len(instance_ids)} instance(s) in {vpc_id}...')
    ec2.stop_instances(InstanceIds=instance_ids)

    return instance_ids


stopped = stop_instances_in_vpc('vpc-0abc12345def67890')
print(f'Stopped: {stopped}')

Stop All Instances with a Specific Tag

import boto3

def stop_instances_by_tag(
    tag_key: str,
    tag_value: str,
    region: str = 'us-east-1',
    dry_run: bool = True
) -> list:
    """
    Stop all running instances that match a tag.
    Set dry_run=True to preview what would be stopped without actually doing it.
    """
    ec2 = boto3.client('ec2', region_name=region)

    response = ec2.describe_instances(
        Filters=[
            {'Name': f'tag:{tag_key}', 'Values': [tag_value]},
            {'Name': 'instance-state-name', 'Values': ['running']}
        ]
    )

    instance_ids = [
        inst['InstanceId']
        for r in response['Reservations']
        for inst in r['Instances']
    ]

    if not instance_ids:
        print(f'No running instances tagged {tag_key}={tag_value}.')
        return []

    if dry_run:
        print(f'[DRY RUN] Would stop {len(instance_ids)} instance(s):')
        for iid in instance_ids:
            print(f'  - {iid}')
        return instance_ids

    ec2.stop_instances(InstanceIds=instance_ids)
    print(f'Stopped {len(instance_ids)} instance(s) tagged {tag_key}={tag_value}.')

    return instance_ids


# Always preview before committing
stop_instances_by_tag('Environment', 'dev', dry_run=True)

# When you're confident, flip the flag
# stop_instances_by_tag('Environment', 'dev', dry_run=False)

Always build a dry_run mode into your bulk scripts. It takes two minutes to add and can save you from a very bad day. Preview first, execute second.

Error Handling: Throttling, Permissions, and Invalid States

What happens when your Boto3 script hits an error, and how do you handle it cleanly?

AWS APIs fail. Not often, but enough that your production scripts need to be ready for it. The three most common issues you’ll run into are rate limiting, missing permissions, and trying to do something that doesn’t make sense for the instance’s current state.

Here’s a function that handles all of them gracefully:

import boto3
import time
from botocore.exceptions import ClientError

def robust_stop_instance(
    instance_id: str,
    region: str = 'us-east-1',
    max_retries: int = 3
) -> bool:
    """
    Stop an instance with sensible error handling and retry logic.
    Returns True on success, False if the operation can't be completed.
    """
    ec2 = boto3.client('ec2', region_name=region)

    for attempt in range(1, max_retries + 1):
        try:
            response = ec2.stop_instances(InstanceIds=[instance_id])
            state = response['StoppingInstances'][0]['CurrentState']['Name']
            print(f'Stop requested for {instance_id}. Current state: {state}')
            return True

        except ClientError as e:
            code = e.response['Error']['Code']

            if code in ('RequestLimitExceeded', 'Throttling'):
                # Back off exponentially and try again
                wait_time = 2 ** attempt
                print(f'Rate limited. Waiting {wait_time}s before retry {attempt}/{max_retries}...')
                time.sleep(wait_time)

            elif code == 'UnauthorizedOperation':
                # No point retrying: this is a permissions issue
                print(f'Permission denied: {e.response["Error"]["Message"]}')
                print('Check the IAM policy on your role or profile.')
                return False

            elif code == 'IncorrectInstanceState':
                # The instance can't be stopped from its current state
                print(f'{instance_id} is in an unexpected state. Skipping.')
                return False

            elif code == 'InvalidInstanceID.NotFound':
                # Wrong region, or the instance no longer exists
                print(f'{instance_id} not found in {region}.')
                return False

            else:
                # Something unexpected: surface it clearly
                print(f'Unexpected error [{code}]: {e}')
                raise

    print(f'Gave up after {max_retries} attempts.')
    return False

The key error codes to know:

RequestLimitExceeded / Throttling: You’re hitting the AWS API rate limit. Use exponential backoff and retry.
UnauthorizedOperation: The IAM role or user doesn’t have the right permission. No amount of retrying will fix this; you need to update the policy.
IncorrectInstanceState: You’re trying to do something the instance’s current state doesn’t allow, like starting an instance that’s still stopping.
InvalidInstanceID.NotFound: The instance ID doesn’t exist in that region. Double-check both the ID and the region.
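The decision logic in that list can be pulled out into two small pure functions, which makes it easy to unit test without touching AWS. This is a sketch under stated assumptions: the names classify_error and backoff_delay are hypothetical, and the random jitter is an addition to the plain 2 ** attempt backoff used above, included because spreading retries out helps when many scripts are throttled at once.

```python
import random

RETRYABLE = {'RequestLimitExceeded', 'Throttling'}
SKIPPABLE = {
    'UnauthorizedOperation',
    'IncorrectInstanceState',
    'InvalidInstanceID.NotFound',
}


def classify_error(code):
    """Map an AWS error code to 'retry', 'skip', or 'raise'."""
    if code in RETRYABLE:
        return 'retry'
    if code in SKIPPABLE:
        return 'skip'
    return 'raise'


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Exponential backoff with jitter for a 1-based attempt number.

    The ceiling doubles each attempt up to cap; the actual delay is a
    random fraction of that ceiling.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


# Usage inside an except ClientError block:
#   action = classify_error(e.response['Error']['Code'])
#   if action == 'retry':
#       time.sleep(backoff_delay(attempt))
```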

Frequently Asked Questions

Can you manage EC2 instances across multiple regions with Boto3?

Yes, and it’s pretty easy to do. Just create a separate client for each region you need. You can loop over a list of region names and run the same operations in each one. If you need it to go fast, concurrent.futures lets you run those calls in parallel.

import boto3
from concurrent.futures import ThreadPoolExecutor

regions = ['us-east-1', 'eu-west-1', 'ap-southeast-1']

def count_running(region):
    # Use a separate session per thread rather than sharing the default one
    ec2 = boto3.Session().client('ec2', region_name=region)
    resp = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    count = sum(len(r['Instances']) for r in resp['Reservations'])
    print(f'{region}: {count} running instance(s)')

# Query all regions at the same time instead of one by one.
# list() consumes the results so an exception in any worker surfaces here.
with ThreadPoolExecutor() as pool:
    list(pool.map(count_running, regions))

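What IAM permissions does a script like this need?

For the scripts in this guide you’ll need ec2:StartInstances, ec2:StopInstances, and ec2:DescribeInstances; the instance state waiters and the tag filters used here both go through DescribeInstances. Add ec2:DescribeInstanceStatus if you call describe_instance_status or use the instance_status_ok waiter, and ec2:DescribeTags only if you call describe_tags directly. Scope the start and stop permissions as tightly as you can using IAM condition keys, especially for production environments.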
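As an illustration of scoping with condition keys, the sketch below builds a policy document that allows describing any instance but starting and stopping only instances carrying a given tag. Treat it as a starting point under assumptions, not a vetted production policy: the function name ec2_automation_policy, the tag key and value, and the wildcard region and account in the ARN are all placeholders to adapt.

```python
import json


def ec2_automation_policy(tag_key='Environment', tag_value='dev'):
    """Return an IAM policy document (as a dict) for tag-scoped start/stop."""
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                # Describe calls cannot be limited to specific resources
                'Effect': 'Allow',
                'Action': ['ec2:DescribeInstances'],
                'Resource': '*',
            },
            {
                # Start/stop only instances carrying the given tag
                'Effect': 'Allow',
                'Action': ['ec2:StartInstances', 'ec2:StopInstances'],
                'Resource': 'arn:aws:ec2:*:*:instance/*',
                'Condition': {
                    'StringEquals': {f'aws:ResourceTag/{tag_key}': tag_value}
                },
            },
        ],
    }


# Print the JSON you would paste into an IAM policy
print(json.dumps(ec2_automation_policy(), indent=2))
```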

Why use a waiter instead of time.sleep()?

A fixed sleep is a guess. A waiter actually checks. Waiters poll the AWS API and return as soon as a poll sees the target state, which means they’re faster when things go smoothly and more informative when they don’t. If the timeout is exceeded, you get a WaiterError you can handle, rather than a script that silently moves on assuming everything is fine.

Is there a difference between boto3.client and boto3.resource?

Yes. boto3.client('ec2') is the low-level interface that maps directly to API calls and returns raw dictionaries. boto3.resource('ec2') gives you a higher-level, object-oriented interface with things like ec2.Instance('i-0abc123'). Both work, but the client is more flexible for filtering and bulk operations, which is why most automation scripts use it.
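For comparison, here is roughly what stopping a single instance looks like through the resource interface. The wrapper function stop_with_resource is a hypothetical name for this sketch; Instance, stop, wait_until_stopped, reload, and state are the resource-side counterparts of the client calls used elsewhere in this guide.

```python
def stop_with_resource(ec2_resource, instance_id):
    """Stop one instance through the resource interface and return its state."""
    instance = ec2_resource.Instance(instance_id)

    instance.stop()                # same API call as client.stop_instances
    instance.wait_until_stopped()  # resource-level wrapper around the waiter
    instance.reload()              # refresh the cached attributes

    return instance.state['Name']


# Usage (requires boto3):
#   import boto3
#   ec2_resource = boto3.resource('ec2', region_name='us-east-1')
#   print(stop_with_resource(ec2_resource, 'i-0abc123'))
```

The object style reads nicely for one instance; for tag-based queries and bulk actions the client calls shown earlier tend to be more direct.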

Do these operations work with Spot Instances?

Mostly yes. start_instances, stop_instances, and describe_instances all work with Spot Instances, though stopping and starting is only available for instances launched from a persistent Spot request. The main difference is that Spot Instances can be interrupted by AWS when capacity is reclaimed, so your automation should be ready for unexpected state changes. Use describe_spot_instance_requests to check Spot-specific status and watch for interruption notices.

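How do you handle environments with more instances than one call returns?

describe_instances can split its results across pages, with at most 1,000 results per page. For bigger environments, use a paginator instead of calling directly: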

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
paginator = ec2.get_paginator('describe_instances')

instances = []
# paginate() handles NextToken automatically, no manual looping needed
for page in paginator.paginate(Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]):
    for reservation in page['Reservations']:
        instances.extend(reservation['Instances'])

print(f'Total running instances: {len(instances)}')

The paginator handles NextToken automatically, so you’ll always get everything without extra logic.

How do you test these scripts without affecting production?

Use the dry_run pattern shown in the bulk operations section. Run with dry_run=True first to see exactly which instances would be affected before you commit. Beyond that, test in a staging account or a non-production VPC. Many EC2 calls also support a DryRun=True parameter at the API level, which validates your permissions without making any changes.
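The API-level check behaves a little unusually: with DryRun=True the call is expected to raise a ClientError either way, with the code DryRunOperation when the real call would have been allowed and UnauthorizedOperation when it would not. A minimal sketch of a permission probe built on that follows; the function name can_stop is an assumption for this example.

```python
def can_stop(ec2, instance_id):
    """Ask AWS whether stop_instances would be allowed, without stopping anything."""
    try:
        ec2.stop_instances(InstanceIds=[instance_id], DryRun=True)
    except ec2.exceptions.ClientError as err:
        code = err.response['Error']['Code']
        if code == 'DryRunOperation':
            return True   # the real call would have succeeded
        if code == 'UnauthorizedOperation':
            return False  # the caller lacks permission
        raise             # anything else is a genuine problem

    # A dry run is expected to raise; reaching here means nothing was checked
    return False


# Usage (requires boto3):
#   import boto3
#   ec2 = boto3.client('ec2', region_name='us-east-1')
#   print(can_stop(ec2, 'i-0abcd1234ef567890'))
```

Note that this only checks permissions and request validity. It does not tell you which instances a tag filter would match, which is what the script-level dry_run flag is for.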
