Blog post image for PostgreSQL Query Optimization: Indexes, EXPLAIN ANALYZE & Execution Plans - Master PostgreSQL query optimization with practical examples. Learn EXPLAIN ANALYZE interpretation, effective index strategies, query rewriting, and connection pooling to identify and fix slow queries in production microservices.
Codesnippets

PostgreSQL Query Optimization: Indexes, EXPLAIN ANALYZE & Execution Plans

PostgreSQL Query Optimization: Indexes, EXPLAIN ANALYZE & Execution Plans

05 Mins read

Need to optimize slow PostgreSQL queries? Here’s how with EXPLAIN ANALYZE and strategic indexing.

Slow database queries kill application performance. But most developers don’t know where the actual bottleneck is, so they guess at fixes. EXPLAIN ANALYZE reveals exactly what’s expensive and where to optimize.

The Problem

Why Queries Get Slow

In development with small datasets, missing indexes don’t matter. Switch to production with millions of rows, and suddenly sequential table scans become devastating. Queries that ran in 50ms take 5+ seconds. You need to know why.

Common Performance Issues

  • Missing indexes: Forcing full table scans on every query
  • N+1 queries: Fetching data in loops instead of bulk operations
  • Bad query plans: Using inefficient joins or sorts
  • Undersized indices: Index on wrong columns
  • Connection pool exhaustion: Running out of available connections

The Solution

EXPLAIN ANALYZE: Your Debugging Superpower

EXPLAIN ANALYZE shows exactly how PostgreSQL executes your query, including:

  • Which operations are most expensive (by cost)
  • Actual vs. estimated row counts
  • Time spent in each step
  • Seq Scans vs. Index Scans
  • Sort and Hash operations

TL;DR

  • Run EXPLAIN ANALYZE to identify expensive operations
  • Add indexes where sequential scans happen on large tables
  • Use pg_stat_statements to automatically find slow queries
  • Fix N+1 query patterns before rewriting complex queries
  • Setup connection pooling for better resource utilization

Understanding EXPLAIN ANALYZE

Basic Query Analysis

-- Compare these two queries
EXPLAIN ANALYZE
SELECT * FROM users WHERE created_at > '2025-01-01';
-- Output example:
-- Seq Scan on users (cost=0.00..35.50 rows=1000 width=100)
-- Filter: (created_at > '2025-01-01')
-- Planning Time: 0.123 ms
-- Execution Time: 45.234 ms

What this tells you:

  • Seq Scan: Reading entire table (slow for large tables)
  • cost=0.00..35.50: PostgreSQL’s estimated cost
  • rows=1000: Expected to return 1000 rows
  • Execution Time: 45.234 ms: Actual time taken

Reading the Cost

EXPLAIN ANALYZE
SELECT u.id, u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';
-- Output:
-- Hash Left Join (cost=2534.50..5892.33 rows=5000 width=50)
-- Hash Cond: (o.user_id = u.id)
-- -> Seq Scan on orders o (cost=0.00..1234.50 rows=50000 width=8)
-- -> Hash (cost=2500.00..2500.00 rows=5000 width=42)
-- -> Index Scan using idx_users_status on users u (cost=10.00..2500.00 rows=5000 width=42)

Cost interpretation:

  • Lower cost = faster execution
  • cost=A..B: A = startup cost, B = total cost
  • Focus on the highest-cost operations first

Creating Effective Indexes

Basic Index Creation

-- Simple B-tree index (most common)
CREATE INDEX idx_users_email ON users(email);
-- Multi-column index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Partial index (only index active users)
CREATE INDEX idx_users_active ON users(id) WHERE status = 'active';
-- Unique index (constraint + performance)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

Index Types: When to Use Each

-- B-tree (default, best for most queries)
CREATE INDEX idx_standard ON table_name(column);
-- Hash (equality only, rarely faster than B-tree)
CREATE INDEX idx_hash ON table_name USING hash(column);
-- GiST (geometric, full-text search)
CREATE INDEX idx_fulltext ON articles USING gist(search_vector);
-- GIN (array, JSON, full-text - faster for large result sets)
CREATE INDEX idx_json ON logs USING gin(metadata);

Partial Indexes for Common Cases

-- Don't index inactive records
CREATE INDEX idx_active_orders ON orders(id) WHERE status != 'cancelled';
-- Only index recent data
CREATE INDEX idx_recent_events ON events(id) WHERE created_at > NOW() - INTERVAL '90 days';
-- Saves space and speeds up queries on active data

Multi-Column Index Strategy

-- For WHERE + ORDER BY combinations
-- Query: WHERE user_id = ? ORDER BY created_at DESC
CREATE INDEX idx_user_date ON orders(user_id, created_at DESC);
-- Column order matters: Put filtered columns first
-- Good: WHERE status = ? AND type = ?
CREATE INDEX idx_good ON events(status, type);
-- Bad: WHERE status = ? AND type = ?
CREATE INDEX idx_bad ON events(type, status); -- Wrong order

Detecting and Fixing N+1 Queries

The N+1 Problem

# BAD: N+1 queries (1 + N queries)
users = db.query(User).all() # Query 1
for user in users:
orders = db.query(Order).filter(Order.user_id == user.id).all() # Queries 2 to N+1
print(f"{user.name}: {len(orders)} orders")
# Result with 1000 users = 1001 queries 🔥

The Fix: Eager Loading

# GOOD: Eager load with join
from sqlalchemy import joinedload
users = db.query(User).options(joinedload(User.orders)).all() # Single query with join
for user in users:
print(f"{user.name}: {len(user.orders)} orders")

SQL Equivalent

-- BAD (N+1):
SELECT * FROM users; -- 1000 queries...
SELECT * FROM orders WHERE user_id = ?;
-- GOOD (Single query):
SELECT u.id, u.name, o.id, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

Finding Slow Queries Automatically

Enable Query Logging

-- Enable logging of slow queries (500ms+)
ALTER SYSTEM SET log_min_duration_statement = 500;
SELECT pg_reload_conf();
-- View current setting
SHOW log_min_duration_statement;

Use pg_stat_statements

-- Install extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slowest queries
SELECT query, calls, mean_time, max_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
-- Find queries that were called most
SELECT query, calls, mean_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
-- Clear stats to get fresh baseline
SELECT pg_stat_statements_reset();

Query Optimization Patterns

Pattern 1: Missing Index on WHERE Clause

-- Before: Seq Scan (slow)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- Fix: Add index
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- After: Index Scan (fast)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

Pattern 2: Inefficient Subqueries

-- SLOW: Subquery evaluated for each row
SELECT * FROM orders
WHERE customer_id IN (
SELECT id FROM customers WHERE status = 'premium'
);
-- FAST: Use JOIN instead
SELECT DISTINCT o.*
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.status = 'premium';

Pattern 3: Using Functions in WHERE Clauses

-- SLOW: Function applied to indexed column
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- FAST: Use expression index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- OR: Normalize incoming data instead
SELECT * FROM users WHERE email = '[email protected]';

Pattern 4: Missing ORDER BY Index

-- SLOW: Sort step required
EXPLAIN ANALYZE
SELECT * FROM events ORDER BY created_at DESC LIMIT 10;
-- Fix: Add index for sort
CREATE INDEX idx_events_date_desc ON events(created_at DESC);
-- Now uses Index Scan + Limit, no Sort step

Connection Pooling

Why Connection Pooling Matters

-- Without pooling: Each request opens/closes connection (expensive)
-- With pooling: Reuse existing connections (cheap)

PgBouncer Configuration

pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp_prod
[pgbouncer]
listen_port = 6432
listen_addr = 127.0.0.1
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
# Connection pooling settings
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3

Application Connection

# Before pooling: Direct to PostgreSQL
import psycopg2
conn = psycopg2.connect("dbname=myapp user=postgres host=localhost")
# After pooling: Connect to PgBouncer on port 6432
import psycopg2
conn = psycopg2.connect("dbname=myapp user=postgres host=localhost port=6432")

Real-World Optimization Workflow

Step 1: Identify the Slow Query

Terminal window
# From application logs or pg_stat_statements
# Example: SELECT query is taking 8000ms

Step 2: Run EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2025-01-01'
ORDER BY o.total DESC;

Step 3: Look for Expensive Operations

Sort (cost=9234.50..9244.50) ← Look here
-> Hash Join (cost=2500.00..9234.00) ← And here
-> Seq Scan on orders o (cost=0.00..5000.00) ← Sequential scan on large table
-> Hash (cost=100.00..100.00 rows=1000)
-> Seq Scan on customers c (cost=0.00..100.00)

Step 4: Add Indexes

-- Index for WHERE clause
CREATE INDEX idx_orders_date ON orders(created_at);
-- Index for JOIN condition
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Index for ORDER BY
CREATE INDEX idx_orders_total ON orders(total DESC);

Step 5: Rerun EXPLAIN ANALYZE

-- Should now use Index Scans instead of Seq Scans
-- No Sort step if you have the right index

Best Practices

1. Index Naming Convention

-- Consistent naming helps find related indexes
CREATE INDEX idx_table_column_type ON table_name(column);
-- Examples:
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at);
CREATE INDEX idx_products_active ON products(id) WHERE active = true;

2. Monitor Index Usage

-- Find unused indexes (bloat)
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
-- Check if index is being used
SELECT
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

3. Regular Maintenance

-- Analyze table stats (for query planner)
ANALYZE users;
-- Vacuum to clean up dead rows
VACUUM FULL users;
-- Reindex if fragmented (heavy write workloads)
REINDEX INDEX idx_users_email;

Resources

Related Posts

You might also enjoy

Check out some of our other posts on similar topics

Redis Caching Patterns: Cache-Aside, Write-Through & Cache Invalidation

Redis Caching Patterns: Cache-Aside, Write-Through & Cache Invalidation

Need to scale your backend without throwing money at servers? Redis caching patterns are your answer. Most databases can handle hundreds of queries per second, but thousands? Your app slows to a

AWS EC2 Instance Management with Boto3: Start, Stop, and Query Instances

AWS EC2 Instance Management with Boto3: Start, Stop, and Query Instances

If you've ever spent 20 minutes clicking through the AWS Console just to stop a handful of dev instances, you already know the pain. It's tedious, it doesn't scale, and one wrong click can ruin your a

Optimizing your python code with __slots__?

Optimizing your python code with __slots__?

Memory Optimization with slots Understanding the Problem Dev Tip: Optimizing Data Models in Big Data Workflows with slots In big data and MLOps workflows, you often work with

Multi-Environment Secret Management with HashiCorp Vault

Multi-Environment Secret Management with HashiCorp Vault

Need to manage secrets safely across multiple environments? Here's how with HashiCorp Vault. Storing secrets in .env files, hardcoding them, or even using separate secret managers per environme

Top 7 Open Source OCR Models for Document Processing

Top 7 Open Source OCR Models for Document Processing

AI Tool Turn your documents into perfect digital copies with these powerful open source OCR models. No more dealing with messy text extraction get clean, accurate markdown from PDFs, images, and

Why printf Beats echo in Linux Scripts

Why printf Beats echo in Linux Scripts

Scripting Tip You know that feeling when a script works perfectly on your machine but fails miserably somewhere else? That's probably because you're using echo for output. Let me show you why pri

6 related posts