Time-Series Databases

Time-series data: measurements indexed by time. Sensor readings, metrics, financial ticks, log events. Each point has a timestamp; you usually query ranges of time.

General-purpose databases handle this poorly at scale. Time-series databases are optimized for the specific access patterns.

This page covers when they fit and the major options.

What makes time-series different

Data shape

Mostly inserts, append-only. Updates rare; deletes mostly via retention.

```

metric_name | timestamp | value | tags

cpu_usage | t1 | 0.45 | host=A region=us

cpu_usage | t2 | 0.47 | host=A region=us

cpu_usage | t3 | 0.52 | host=A region=us

```

Each insert is small; the volume is enormous.

Queries

- Aggregations over time windows: "average CPU per minute over last hour"

- Recent data dominates: "last 5 minutes" is queried more than "last 5 years"

- Downsampling: convert high-resolution data to lower resolution for older data

Lifecycle

Data has retention: keep raw for 30 days, downsampled for 1 year, monthly aggregates for 5 years. Older data automatically deleted.

Why general-purpose databases struggle

Index size

A typical PostgreSQL setup with B-tree indexes on timestamp + tags: indexes get huge as data grows. Inserts slow.

Aggregations

Computing "average over 1 hour" requires scanning many rows. Without specialized storage, slow.

Compression

Time-series data compresses extremely well (similar values close in time). General DBs don't optimize for this.

Retention

Deleting old data is expensive in MVCC databases. Time-series DBs handle this efficiently via partitioning.

The major options

InfluxDB

Purpose-built time-series database. Tag-based; SQL-like query language (Flux); good performance.

InfluxDB 1.x vs. 2.x: significant changes; the v2 API is different. Pick deliberately.

TimescaleDB

PostgreSQL extension. Time-series features (chunked storage, automatic partitioning) on top of PostgreSQL.

For organizations that want PostgreSQL ecosystem + time-series performance.

Prometheus

Metrics-focused. Pull-based (scrapes targets). Built-in query language (PromQL). Limited durability — typically used with long-term storage backend (Thanos, Cortex, Mimir).

Standard for Kubernetes/cloud-native monitoring.

Apache Druid

Real-time analytics on time-series. Heavyweight; for large-scale analytics.

ClickHouse

Columnar OLAP database. Very fast aggregations; not strictly time-series but used for it.

TimestreamDB (AWS)

Managed time-series on AWS. Serverless; pay-per-use.

For most use cases:

- **Metrics monitoring**: Prometheus + long-term storage

- **General time-series application data**: TimescaleDB

- **High volume custom metrics**: InfluxDB or ClickHouse

Specific patterns

Downsampling

High-resolution data is expensive to store long-term. Periodically aggregate:

```

Raw 1-second data → 1-minute averages (kept 90 days) → 1-hour averages (kept 1 year)

```

Most time-series DBs have automated downsampling.

Continuous aggregates

TimescaleDB feature: precomputed aggregates that update incrementally. Queries hit the precomputed view instead of raw data.

Tags / labels

Time-series points have tags ("host=A", "region=us"). Tags index the data; queries filter and group by tags.

Cardinality matters: too many distinct tag combinations explode storage.

Retention policies

Built-in: "keep raw 30 days; aggregated 1 year; monthly summaries forever."

Without retention: data grows until storage fills.

Compression

Time-series compresses 10-100x. The DB handles this; you don't manage manually.

When time-series DBs are right

- **Metrics**: CPU, memory, request latency, business metrics

- **IoT data**: sensor readings

- **Financial ticks**: high-frequency trading data

- **Application telemetry**: per-request timings, custom counters

- **Log events** (sometimes): when grouped by time

When they're not

Transactional data

Order created at time T isn't really "time-series" — it's an order. Use a relational DB.

Heavy updates to existing points

Time-series DBs assume append-mostly. Update-heavy workloads don't fit.

Ad-hoc analytical queries

Some time-series DBs are limited in query expressiveness. For complex analysis, OLAP or warehouse.

Small data

A few thousand points per day fit in any database. Don't introduce time-series infrastructure for small needs.

Cardinality management

The biggest scaling concern. Each unique combination of tags is a "series."

Bad:

```

tags: user_id=user-123 ← creates a new series for every user

```

Good:

```

tags: country=US, plan=premium ← bounded set

```

High cardinality (millions of series) crushes most time-series DBs. Plan tag schemas accordingly.

Common failure patterns

- **Using time-series for non-time-series.** Wrong tool.

- **High cardinality.** Performance collapse.

- **No retention.** Storage explodes.

- **No downsampling.** Hot data old; expensive to query.

- **Custom dashboards reading raw data.** Slow; should use precomputed.

- **Single time-series DB for both metrics and logs.** Different access patterns; usually want different tools.

A reasonable starter

For monitoring needs: Prometheus + Grafana for current; long-term storage (Thanos/Mimir) if retention matters.

For application time-series: TimescaleDB. Familiar SQL; strong performance.

For very high-cardinality or specialized needs: evaluate InfluxDB, ClickHouse.

Further Reading

- [CloudDatabases](CloudDatabases) — Database options

- [ElasticsearchFundamentals](ElasticsearchFundamentals) — Adjacent for log data

- [CloudMonitoring](CloudMonitoring) — Where metrics fit