Codesnippets

PostgreSQL Query Optimization: Indexes, EXPLAIN ANALYZE & Execution Plans

Mohammad Abu Mattar | Published: 29 Mar, 2026 | 5 min read

Need to optimize slow PostgreSQL queries? Here’s how with EXPLAIN ANALYZE and strategic indexing.

Slow database queries kill application performance. But most developers don’t know where the actual bottleneck is, so they guess at fixes. EXPLAIN ANALYZE reveals exactly what’s expensive and where to optimize.

The Problem

Why Queries Get Slow

In development, with small datasets, a missing index rarely matters. Move to production with millions of rows, and sequential table scans become devastating: queries that ran in 50ms now take 5+ seconds. You need to know why.

Common Performance Issues

Missing indexes: Forcing full table scans on every query
N+1 queries: Fetching data in loops instead of bulk operations
Bad query plans: Using inefficient joins or sorts
Poorly chosen indexes: Indexes on the wrong columns, or columns in the wrong order
Connection pool exhaustion: Running out of available connections

The Solution

EXPLAIN ANALYZE: Your Debugging Superpower

EXPLAIN ANALYZE shows exactly how PostgreSQL executes your query, including:

Which operations are most expensive (by cost)
Actual vs. estimated row counts
Time spent in each step
Seq Scans vs. Index Scans
Sort and Hash operations
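
A large gap between estimated and actual row counts is often the first clue that the planner is working from stale statistics. The sketch below walks a plan in the shape produced by EXPLAIN (ANALYZE, FORMAT JSON) and flags nodes whose estimate is off by a chosen factor. The sample plan is hand-written for illustration, and the function names and the 10x threshold are assumptions, not part of PostgreSQL.

```python
# Hypothetical plan, hand-written in the shape of EXPLAIN (ANALYZE, FORMAT JSON) output.
SAMPLE_PLAN = {
    "Node Type": "Hash Join",
    "Plan Rows": 5000,
    "Actual Rows": 4800,
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders",
         "Plan Rows": 500, "Actual Rows": 50000},
        {"Node Type": "Index Scan", "Relation Name": "users",
         "Plan Rows": 5000, "Actual Rows": 5100},
    ],
}


def walk(node):
    """Yield a plan node and then all of its descendants."""
    yield node
    for child in node.get("Plans", []):
        yield from walk(child)


def misestimates(plan, factor=10.0):
    """Return (label, estimated, actual) for nodes off by at least `factor`."""
    flagged = []
    for node in walk(plan):
        # Clamp to 1 so a zero-row node does not divide by zero.
        estimated = max(node["Plan Rows"], 1)
        actual = max(node["Actual Rows"], 1)
        ratio = max(estimated / actual, actual / estimated)
        if ratio >= factor:
            label = node["Node Type"]
            if "Relation Name" in node:
                label += " on " + node["Relation Name"]
            flagged.append((label, node["Plan Rows"], node["Actual Rows"]))
    return flagged


print(misestimates(SAMPLE_PLAN))
```

With the sample plan, only the scan on orders is flagged, which would usually be a prompt to run ANALYZE on that table before touching indexes.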

TL;DR

Run EXPLAIN ANALYZE to identify expensive operations
Add indexes where sequential scans happen on large tables
Use pg_stat_statements to automatically find slow queries
Fix N+1 query patterns before rewriting complex queries
Set up connection pooling for better resource utilization

Understanding EXPLAIN ANALYZE

Basic Query Analysis

-- Inspect the plan for a date-range filter
EXPLAIN ANALYZE
SELECT * FROM users WHERE created_at > '2025-01-01';

-- Output example (abridged; real output also reports actual time and rows for each node):
-- Seq Scan on users  (cost=0.00..35.50 rows=1000 width=100)
--   Filter: (created_at > '2025-01-01')
-- Planning Time: 0.123 ms
-- Execution Time: 45.234 ms

What this tells you:

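Seq Scan: PostgreSQL reads every row in the table and applies the filter (slow for large tables)
cost=0.00..35.50: The planner's estimated startup and total cost, in arbitrary planner units (not milliseconds)
rows=1000: The planner's estimate of how many rows the node returns
Execution Time: 45.234 ms: Actual wall-clock time for the whole query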

Reading the Cost

EXPLAIN ANALYZE
SELECT u.id, u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';

-- Output (abridged; shown as a Right join because users, the preserved table, is the hashed input):
-- Hash Right Join  (cost=2534.50..5892.33 rows=5000 width=50)
--   Hash Cond: (o.user_id = u.id)
--   ->  Seq Scan on orders o  (cost=0.00..1234.50 rows=50000 width=8)
--   ->  Hash  (cost=2500.00..2500.00 rows=5000 width=42)
--         ->  Index Scan using idx_users_status on users u  (cost=10.00..2500.00 rows=5000 width=42)

Cost interpretation:

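Lower estimated cost usually means faster execution, but cost is a planner estimate in arbitrary units, not a time
cost=A..B: A = startup cost (before the first row is returned), B = total cost (all rows returned)
A parent node's cost includes its children, so focus first on the nodes that add the most cost themselves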
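
Because each node's total cost includes the cost of its inputs, the top line of a plan is always the largest number and says little on its own. A rough way to see where cost is added is to subtract the children's totals from each node. The sketch below does that for a plan dict using the "Total Cost" and "Plans" keys from EXPLAIN (FORMAT JSON); the numbers reuse the join example in this section, and the subtraction is a simplification (nodes such as Limit may not run their input to completion).

```python
# Costs taken from the join example in this section, arranged in the
# nested shape that EXPLAIN (FORMAT JSON) uses.
PLAN = {
    "Node Type": "Hash Join", "Total Cost": 5892.33,
    "Plans": [
        {"Node Type": "Seq Scan", "Total Cost": 1234.50},
        {"Node Type": "Hash", "Total Cost": 2500.00,
         "Plans": [{"Node Type": "Index Scan", "Total Cost": 2500.00}]},
    ],
}


def exclusive_costs(node, out=None):
    """Return (node type, cost added by that node alone) in plan order."""
    if out is None:
        out = []
    children = node.get("Plans", [])
    # A node's own contribution is its total minus what its inputs already cost.
    own = node["Total Cost"] - sum(child["Total Cost"] for child in children)
    out.append((node["Node Type"], round(own, 2)))
    for child in children:
        exclusive_costs(child, out)
    return out


for node_type, own_cost in exclusive_costs(PLAN):
    print(f"{node_type:<12} {own_cost:>10.2f}")
```

Read this way, the Hash node itself adds nothing; the cost sits in the index scan feeding it, the join, and the sequential scan.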

Creating Effective Indexes

Basic Index Creation

-- Simple B-tree index (most common)
CREATE INDEX idx_users_email ON users(email);

-- Multi-column index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Partial index (only index active users)
CREATE INDEX idx_users_active ON users(id) WHERE status = 'active';

-- Unique index (constraint + performance); use this instead of the plain
-- idx_users_email above, not alongside it
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

Index Types: When to Use Each

-- B-tree (default, best for most queries: equality, ranges, sorting)
CREATE INDEX idx_standard ON table_name(column_name);

-- Hash (equality only, rarely faster than B-tree)
CREATE INDEX idx_hash ON table_name USING hash(column_name);

-- GiST (geometric data, ranges, full-text search)
CREATE INDEX idx_fulltext ON articles USING gist(search_vector);

-- GIN (arrays, jsonb, full-text; usually faster to search than GiST but slower to update)
CREATE INDEX idx_json ON logs USING gin(metadata);

Partial Indexes for Common Cases

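-- Don't index inactive records
CREATE INDEX idx_active_orders ON orders(id) WHERE status != 'cancelled';

-- Only index recent data. The predicate must use a constant: NOW() is not
-- IMMUTABLE, so PostgreSQL rejects it in an index predicate. Recreate the
-- index periodically to move the cutoff forward.
CREATE INDEX idx_recent_events ON events(id) WHERE created_at > '2026-01-01';

-- Saves space and speeds up queries whose WHERE clause implies the index predicate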
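
PostgreSQL only accepts immutable expressions in an index predicate, so a "last 90 days" partial index needs a fixed cutoff that is refreshed from outside the database. One way to do that is to generate the DDL from a scheduled job. The sketch below is a minimal version of that idea; the function name, the dated index name, and the 90-day default are assumptions for illustration.

```python
from datetime import date, timedelta


def recent_events_index_ddl(today, days=90):
    """Build a partial-index statement with a constant cutoff date."""
    cutoff = today - timedelta(days=days)
    # Put the cutoff in the index name so old and new indexes can coexist
    # while the replacement is being built.
    name = f"idx_recent_events_{cutoff:%Y%m%d}"
    return (
        f"CREATE INDEX {name} ON events(id) "
        f"WHERE created_at > '{cutoff.isoformat()}';"
    )


print(recent_events_index_ddl(date(2026, 3, 29)))
```

A job like this would typically create the new index, then drop the previous one. Queries need a predicate the planner can prove is covered by the cutoff, for example a literal date at or after it.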

Multi-Column Index Strategy

-- For WHERE + ORDER BY combinations
-- Query: WHERE user_id = ? ORDER BY created_at DESC
CREATE INDEX idx_user_date ON orders(user_id, created_at DESC);

-- Column order matters: lead with the columns your queries always filter on
-- Good for WHERE status = ? and for WHERE status = ? AND type = ?
CREATE INDEX idx_good ON events(status, type);

-- Bad for WHERE status = ? alone: status is not the leading column,
-- so the index cannot narrow the search efficiently
CREATE INDEX idx_bad ON events(type, status);  -- Wrong order for status-only filters
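
Why the leading column matters is easier to see with a toy model. A multi-column B-tree keeps entries sorted by the first column, then the second, so a filter on the first column can jump to a contiguous range, while a filter on the second column alone has matches scattered throughout. The sketch below models an index on (status, type) as a sorted list of tuples; it is an illustration of the ordering idea only, not how PostgreSQL stores or scans indexes.

```python
from bisect import bisect_left

# Table rows as (status, type); the list position stands in for the row id.
rows = [
    ("active", "click"),
    ("active", "view"),
    ("archived", "click"),
    ("pending", "view"),
    ("active", "click"),
]

# Toy "index on (status, type)": entries sorted by status, then type.
index = sorted((status, kind, pos) for pos, (status, kind) in enumerate(rows))


def scan_by_status(index, status):
    """Leading column: binary-search to the start, then read a contiguous run."""
    start = bisect_left(index, (status,))
    matches = []
    i = start
    while i < len(index) and index[i][0] == status:
        matches.append(index[i][2])
        i += 1
    return matches, i - start  # (row ids, index entries visited)


def scan_by_type(index, kind):
    """Second column only: matches are scattered, so every entry is checked."""
    matches = [pos for (_, k, pos) in index if k == kind]
    return matches, len(index)


print(scan_by_status(index, "active"))
print(scan_by_type(index, "click"))
```

Both lookups return three rows, but the status lookup touches three index entries and the type lookup touches all five. On a table with millions of rows that difference is what separates an index range scan from reading the whole index.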

Detecting and Fixing N+1 Queries

The N+1 Problem

# BAD: N+1 queries (1 + N queries)
users = db.query(User).all()  # Query 1
for user in users:
    orders = db.query(Order).filter(Order.user_id == user.id).all()  # Queries 2 to N+1
    print(f"{user.name}: {len(orders)} orders")

# Result with 1000 users = 1001 queries

The Fix: Eager Loading

# GOOD: Eager load with join
from sqlalchemy.orm import joinedload

users = db.query(User).options(joinedload(User.orders)).all()  # Single query with join

for user in users:
    print(f"{user.name}: {len(user.orders)} orders")

SQL Equivalent

-- BAD (N+1):
SELECT * FROM users;
SELECT * FROM orders WHERE user_id = ?;  -- Repeated once per user: 1000 queries...

-- GOOD (Single query):
SELECT u.id, u.name, o.id AS order_id, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
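
The difference in round trips is easy to measure. The sketch below counts queries for both approaches against an in-memory SQLite database, used here only as a stand-in so the example runs without a PostgreSQL server; the CountingConnection wrapper and the table contents are made up for illustration.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many queries are sent."""

    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0

    def query(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(num_users=5):
    """Create users with two orders each in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
    )
    ids = range(1, num_users + 1)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in ids])
    conn.executemany(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)",
        [(i, 10.0 * i) for i in ids for _ in range(2)],
    )
    return CountingConnection(conn)


def order_counts_n_plus_one(db):
    """One query for users, then one more per user."""
    counts = {}
    for user_id, name in db.query("SELECT id, name FROM users"):
        orders = db.query("SELECT id FROM orders WHERE user_id = ?", (user_id,))
        counts[name] = len(orders)
    return counts


def order_counts_single_query(db):
    """Same result from a single joined, grouped query."""
    rows = db.query(
        "SELECT u.name, COUNT(o.id) FROM users u "
        "LEFT JOIN orders o ON o.user_id = u.id GROUP BY u.id, u.name"
    )
    return dict(rows)


slow_db, fast_db = make_db(), make_db()
order_counts_n_plus_one(slow_db)
order_counts_single_query(fast_db)
print(slow_db.query_count, fast_db.query_count)
```

The same counting trick works in a test suite: assert on the number of queries a code path issues, and an accidental N+1 shows up as a failing test instead of a production slowdown.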

Finding Slow Queries Automatically

Enable Query Logging

-- Enable logging of slow queries (500ms+); ALTER SYSTEM requires superuser
ALTER SYSTEM SET log_min_duration_statement = 500;
SELECT pg_reload_conf();

-- View current setting
SHOW log_min_duration_statement;
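
Once slow-statement logging is on, the log fills with lines of the form "duration: N ms  statement: ...". A small script can pull those out and rank them. The sketch below assumes single-line statements and a default-style log prefix; the sample log lines are invented for illustration, and multi-line statements would need extra handling.

```python
import re

# Matches both simple-protocol ("statement:") and extended-protocol
# ("execute <name>:") duration lines.
LINE_RE = re.compile(
    r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+"
    r"(?:statement|execute [^:]+): (?P<sql>.*)"
)

# Hypothetical log excerpt.
SAMPLE_LOG = """\
2026-03-29 10:15:02.123 UTC [4242] LOG:  duration: 812.402 ms  statement: SELECT * FROM orders WHERE customer_id = 42
2026-03-29 10:15:03.500 UTC [4243] LOG:  duration: 5230.118 ms  statement: SELECT * FROM events ORDER BY created_at DESC LIMIT 10
2026-03-29 10:15:04.001 UTC [4244] LOG:  checkpoint starting: time
2026-03-29 10:15:05.250 UTC [4242] LOG:  duration: 640.000 ms  execute <unnamed>: SELECT id FROM users WHERE email = $1
"""


def slowest(log_text, top=3):
    """Return up to `top` (milliseconds, sql) pairs, slowest first."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            found.append((float(match.group("ms")), match.group("sql")))
    return sorted(found, reverse=True)[:top]


for ms, sql in slowest(SAMPLE_LOG):
    print(f"{ms:>10.3f} ms  {sql}")
```

For anything beyond a quick look, pg_stat_statements gives aggregated numbers without log parsing.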

Use pg_stat_statements

-- Install extension (pg_stat_statements must also be listed in
-- shared_preload_libraries, which requires a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slowest queries (PostgreSQL 13+; older versions name these
-- columns mean_time and max_time)
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Find queries that were called most
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Clear stats to get fresh baseline
SELECT pg_stat_statements_reset();
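
The timing columns in pg_stat_statements were renamed in PostgreSQL 13 (mean_time became mean_exec_time, max_time became max_exec_time), so a monitoring script that runs against mixed server versions has to pick the right names. The sketch below builds the query from the integer reported by SHOW server_version_num; the function name and the mean_ms/max_ms aliases are assumptions for illustration.

```python
def slowest_statements_sql(server_version_num, limit=10):
    """Build a 'slowest queries' statement for the given server version.

    server_version_num is the integer from SHOW server_version_num,
    for example 130000 for PostgreSQL 13.0.
    """
    if server_version_num >= 130000:
        mean_col, max_col = "mean_exec_time", "max_exec_time"
    else:
        mean_col, max_col = "mean_time", "max_time"
    return (
        f"SELECT query, calls, {mean_col} AS mean_ms, {max_col} AS max_ms "
        f"FROM pg_stat_statements ORDER BY {mean_col} DESC LIMIT {int(limit)}"
    )


print(slowest_statements_sql(160002))
print(slowest_statements_sql(120005, limit=5))
```

Aliasing the columns keeps the calling code identical across versions: it always reads mean_ms and max_ms.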

Query Optimization Patterns

Pattern 1: Missing Index on WHERE Clause

-- Before: Seq Scan (slow)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- Fix: Add index
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- After: Index Scan (fast)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

Pattern 2: Inefficient Subqueries

-- CAN BE SLOW: if EXPLAIN shows the subquery as a per-row SubPlan
-- (PostgreSQL usually plans an uncorrelated IN as a semi-join, which is fine)
SELECT * FROM orders
WHERE customer_id IN (
  SELECT id FROM customers WHERE status = 'premium'
);

-- ALTERNATIVE: Use JOIN instead (no DISTINCT needed while customers.id is unique)
SELECT o.*
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.status = 'premium';

Pattern 3: Using Functions in WHERE Clauses

-- SLOW: Function applied to the column, so a plain index on email cannot be used
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- FAST: Use expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- OR: Normalize data on write (store email lowercased) and compare directly
SELECT * FROM users WHERE email = '[email protected]';

Pattern 4: Missing ORDER BY Index

-- SLOW: Sort step required
EXPLAIN ANALYZE
SELECT * FROM events ORDER BY created_at DESC LIMIT 10;

-- Fix: Add index for sort (a plain index on created_at also works,
-- since B-tree indexes can be scanned backward)
CREATE INDEX idx_events_date_desc ON events(created_at DESC);

-- Now uses Index Scan + Limit, no Sort step

Connection Pooling

Why Connection Pooling Matters

-- Without pooling: Each request opens/closes connection (expensive)
-- With pooling: Reuse existing connections (cheap)

PgBouncer Configuration

[databases]
myapp = host=localhost port=5432 dbname=myapp_prod

[pgbouncer]
listen_port = 6432
listen_addr = 127.0.0.1
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Connection pooling settings
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
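
It helps to sanity-check what these numbers imply. In PgBouncer, default_pool_size and reserve_pool_size apply per database/user pair, so the configuration above lets up to 1000 clients share at most 30 server connections for each pair. The sketch below reads a trimmed copy of the config with Python's configparser and reports that ratio; the function name and the summary keys are assumptions for illustration.

```python
import configparser

# Trimmed copy of the configuration shown in this section.
PGBOUNCER_INI = """\
[databases]
myapp = host=localhost port=5432 dbname=myapp_prod

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
"""


def pool_summary(ini_text):
    """Summarize how many clients share each pooled server connection."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    settings = parser["pgbouncer"]
    pool = settings.getint("default_pool_size")
    reserve = settings.getint("reserve_pool_size", fallback=0)
    clients = settings.getint("max_client_conn")
    # Upper bound on server connections for one database/user pair.
    max_server = pool + reserve
    return {
        "databases": list(parser["databases"]),
        "pool_mode": settings.get("pool_mode"),
        "max_server_conns_per_pool": max_server,
        "clients_per_server_conn": round(clients / max_server, 1),
    }


print(pool_summary(PGBOUNCER_INI))
```

The per-pool maximum, multiplied by the number of database/user pairs, should stay comfortably below PostgreSQL's max_connections.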

Application Connection

# Before pooling: Direct to PostgreSQL (default port 5432)
import psycopg2
conn = psycopg2.connect("dbname=myapp_prod user=postgres host=localhost")

# After pooling: Connect to PgBouncer on port 6432, using the alias from [databases]
conn = psycopg2.connect("dbname=myapp user=postgres host=localhost port=6432")

Real-World Optimization Workflow

Step 1: Identify the Slow Query

# From application logs or pg_stat_statements
# Example: SELECT query is taking 8000ms

Step 2: Run EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2025-01-01'
ORDER BY o.total DESC;

Step 3: Look for Expensive Operations

Sort (cost=9234.50..9244.50)  ← Look here
  ->  Hash Join (cost=2500.00..9234.00)  ← And here
        ->  Seq Scan on orders o (cost=0.00..5000.00)  ← Sequential scan on large table
        ->  Hash (cost=100.00..100.00 rows=1000)
              ->  Seq Scan on customers c (cost=0.00..100.00)
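
Spotting the expensive sequential scan by eye works for a five-line plan, less so for a fifty-line one. The sketch below scans text-format plan output for Seq Scan nodes above a cost threshold. The regular expression covers only the simple node lines shown in this article, and the function name and the threshold of 1000 are assumptions for illustration.

```python
import re

# Matches lines such as "->  Seq Scan on orders o (cost=0.00..5000.00".
NODE_RE = re.compile(
    r"(?:->\s+)?(?P<node>[A-Za-z][\w ]*?)\s+"
    r"\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+)"
)

# The plan from this step, without the annotations.
PLAN_TEXT = """\
Sort (cost=9234.50..9244.50)
  ->  Hash Join (cost=2500.00..9234.00)
        ->  Seq Scan on orders o (cost=0.00..5000.00)
        ->  Hash (cost=100.00..100.00 rows=1000)
              ->  Seq Scan on customers c (cost=0.00..100.00)
"""


def costly_seq_scans(plan_text, min_total_cost=1000.0):
    """Return (table, total cost) for Seq Scan nodes at or above the threshold."""
    hits = []
    for line in plan_text.splitlines():
        match = NODE_RE.match(line.strip())
        if not match:
            continue
        node = match.group("node")
        total = float(match.group("total"))
        if node.startswith("Seq Scan on ") and total >= min_total_cost:
            table = node.split()[3]  # "Seq Scan on <table> [alias]"
            hits.append((table, total))
    return hits


print(costly_seq_scans(PLAN_TEXT))
```

Here the scan on customers is ignored because a sequential scan of a small table is usually the right choice; only orders is reported as a candidate for an index.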

Step 4: Add Indexes

-- Index for WHERE clause
CREATE INDEX idx_orders_date ON orders(created_at);

-- Index for JOIN condition
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Index for ORDER BY
CREATE INDEX idx_orders_total ON orders(total DESC);

Step 5: Rerun EXPLAIN ANALYZE

-- Expect Index Scans in place of Seq Scans where the filter is selective;
-- the planner may still prefer a Seq Scan if most rows match
-- No Sort step if the planner can read rows in index order

Best Practices

1. Index Naming Convention

-- Consistent naming helps find related indexes
CREATE INDEX idx_table_column_type ON table_name(column_name);

-- Examples:
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at);
CREATE INDEX idx_products_active ON products(id) WHERE active = true;

2. Monitor Index Usage

-- List all user indexes
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

-- Check if each index is being used; idx_scan = 0 since the last
-- stats reset marks a candidate for removal
SELECT
  indexrelname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

3. Regular Maintenance

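-- Analyze table stats (for query planner)
ANALYZE users;

-- Vacuum to clean up dead rows. Plain VACUUM runs alongside normal traffic;
-- VACUUM FULL rewrites the table under an exclusive lock, so reserve it for
-- reclaiming disk space in a maintenance window
VACUUM users;

-- Reindex if bloated (heavy write workloads); CONCURRENTLY (PostgreSQL 12+)
-- avoids blocking writes
REINDEX INDEX CONCURRENTLY idx_users_email;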
