Database Partitioning

Partitioning is splitting one logical table across multiple physical tables. Same database; same schema; different storage and indexing per partition. Different from sharding (which splits across machines).

In Postgres, declarative partitioning since PG10 makes this a manageable operational technique. It's the right answer for some specific patterns and the wrong answer for many.

What partitioning actually solves

Time-series data that grows forever. Drop old partitions instead of DELETE (which is expensive and produces bloat).
Very large tables (10s-100s of GB). Index maintenance and vacuum become per-partition; scales better than one huge table.
Multi-tenant workloads with per-tenant access patterns. Partition by tenant; queries hit only the relevant partition.
Data with natural physical locality that maps to partitions cleanly.

What partitioning doesn't solve:

Single-row latency. Hitting one partition is similar to hitting one table; no magic speed-up.
Scaling to multiple machines. That's sharding; see DatabaseSharding.
Bad query patterns. A bad query is bad on a partitioned table too.

The three partitioning strategies

Range partitioning

CREATE TABLE events (
    id BIGINT,
    occurred_at TIMESTAMPTZ NOT NULL,
    payload JSONB
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2026_q1 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

CREATE TABLE events_2026_q2 PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');

Most common. Time-series workloads.

List partitioning

CREATE TABLE users (...) PARTITION BY LIST (country_code);

CREATE TABLE users_us PARTITION OF users FOR VALUES IN ('US');
CREATE TABLE users_eu PARTITION OF users FOR VALUES IN ('DE','FR','IT','ES');

Useful for known small-cardinality categorical splits.

Hash partitioning

CREATE TABLE messages (...) PARTITION BY HASH (conversation_id);

CREATE TABLE messages_p0 PARTITION OF messages FOR VALUES WITH (MODULUS 8, REMAINDER 0);
-- ... seven more

For when you want even distribution but no natural key. Rarely the right pick — if you have a natural key, range / list usually works better.

Picking a partition key

Like sharding, the partition key is hard to change once you've committed. Choose carefully.

Rules:

The key should appear in most query WHERE clauses. Otherwise the planner scans every partition.
The key shouldn't have skew. A partition with 80% of the data isn't really partitioned.
For range partitioning, the key should be monotonically increasing or otherwise predictable so you can plan partition creation.
For multi-tenant, partition key = tenant_id, with sub-partitioning if needed.

For time-series: partition by month / quarter / year depending on data volume. Monthly is common for high-volume; quarterly for moderate.

Constraint exclusion / partition pruning

The win is partition pruning. A query like:

SELECT * FROM events WHERE occurred_at >= '2026-04-01' AND occurred_at < '2026-04-30';

Hits only the events_2026_q2 partition. Postgres skips the others.

Verify with EXPLAIN:

Append
  -> Seq Scan on events_2026_q2  (only this one)

If the plan shows scans on partitions that shouldn't match, your query doesn't use partition keys appropriately, or your queries aren't constraint-exclusion-compatible.

Indexes on partitioned tables

Indexes are per-partition. Creating an index on the parent declares it for all partitions:

CREATE INDEX ON events (user_id);
-- Postgres creates per-partition indexes

This avoids having to remember to add indexes per new partition. Cost: more total index storage; some maintenance overhead.

Constraints (UNIQUE, FK) are also per-partition. A unique constraint must include the partition key; otherwise it can't be enforced across partitions.

Operational concerns

Adding partitions

For range partitioning, schedule new-partition creation:

-- Monthly job
CREATE TABLE events_2026_05 PARTITION OF events
    FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

pg_partman automates this. Worth installing for any production time-series workload.

Forget to create a future partition; inserts that don't match any partition error out. Always have N+1 partitions ready.

Dropping partitions

For time-series with retention, drop old partitions:

DROP TABLE events_2024_q1;

Fast; no DELETE-and-vacuum overhead. The biggest practical reason to partition by time.

Detaching partitions

Take a partition out of the parent without dropping it:

ALTER TABLE events DETACH PARTITION events_2024_q1;
-- events_2024_q1 is now an independent table

Useful for archival or moving partitions to a separate database.

Vacuum

Vacuum runs per-partition. For very large parents, this is a win — autovacuum can keep up better with smaller partitions than one giant table.

Be careful: indexes on partitions also need their own vacuum. Default autovacuum settings sometimes don't cover all partitions equally well; tune per-partition for hot ones.

Index maintenance

REINDEX per partition. pg_repack works per partition.

Partition-wise joins / aggregations

Postgres can sometimes execute joins per-partition (each partition joined with the other table; results unioned). Faster for some queries; needs the partition key on both sides.

Things people get wrong

Forgetting the partition key in queries. "Why is this slow?" Because the planner has to scan all partitions. Always include the partition key.

Partitioning too granularly. 1000 partitions of 1MB each = overhead exceeds benefit. Partition into chunks that are big enough to matter (10 GB+ each is typical).

Partitioning prematurely. A 5 GB table doesn't need partitioning. Reach for partitioning above ~50 GB or where retention-driven drops are needed.

Using hash partitioning when range / list would do. Hash hides the meaning of the partition; debugging is harder.

Cross-partition unique constraints. Can only enforce uniqueness within one partition unless the partition key is in the constraint.

Cross-partition foreign keys. Limited support. Plan accordingly.

When to skip partitioning

Tables under ~50 GB — vacuum and indexing are fine without partitioning.
Workloads without natural partition keys — forcing one creates pain.
Cross-partition queries dominate — partitioning adds cost without benefit.
You're considering sharding instead — if you're going to need multiple machines anyway, look at sharding now.

Comparison to sharding

Partitioning and sharding sound similar but solve different problems:

	Partitioning	Sharding
Scope	Single database	Multiple databases / machines
Use case	Big table; retention	Beyond single-machine capacity
Operational complexity	Moderate	High
Query routing	Database planner	Application or proxy layer
Cross-shard / cross-partition queries	Native (one DB)	Application-level fan-out

Partitioning is the cheaper of the two; reach for it first. Move to sharding only when partitioning doesn't keep up.

A concrete example: time-series partitioning

For an events table with 50M rows/month, retain 12 months:

CREATE TABLE events (
    id BIGINT,
    occurred_at TIMESTAMPTZ NOT NULL,
    user_id BIGINT,
    type TEXT,
    payload JSONB,
    PRIMARY KEY (id, occurred_at)  -- include partition key
) PARTITION BY RANGE (occurred_at);

-- Indexes
CREATE INDEX ON events (user_id, occurred_at);
CREATE INDEX ON events USING BRIN (occurred_at);

-- Use pg_partman for partition lifecycle
SELECT partman.create_parent('public.events', 'occurred_at', 'native', 'monthly');
SELECT partman.config_parent('public.events', retention => '12 months');

-- Daily partman maintenance creates new partitions and drops old

This handles 600M rows/year, queries hit one or two partitions, retention is cheap.