ETL vs. ELT

The traditional pattern: Extract, Transform, Load (ETL). Pull data from sources, transform in a separate process, load the cleaned data into the warehouse.

The modern pattern: Extract, Load, Transform (ELT). Pull raw data, load it as-is into the warehouse, transform inside the warehouse using SQL.

The shift is real and recent. ELT is the dominant paradigm in modern data stacks.

Why ETL was the default

Old data warehouses (Teradata, Oracle, on-prem) were:

Expensive (licensed by capacity)
Constrained (limited storage and compute)
Slow at heavy transformation

Loading raw data and transforming inside was expensive. The natural pattern: clean and reduce data before loading; warehouse only sees the final shape.

Tools: Informatica, Talend, custom Python/Java jobs running outside the warehouse.

What changed

Modern cloud data warehouses (Snowflake, BigQuery, Redshift, Databricks) are:

Storage and compute decoupled
Storage cheap; compute scales elastically
SQL-on-anything capable

Loading raw data is cheap. Transformation in SQL is fast. The warehouse can do what dedicated ETL tools used to do, but in SQL that analysts can read and write.

The natural pattern flipped: load raw; transform in warehouse.

ELT in practice

Sources → Ingestion (Fivetran, Airbyte, custom) → Raw warehouse layer → dbt → Analytics tables

The warehouse holds:

Raw: as it came from the source
Staging: light cleaning, naming
Intermediate: business logic
Marts: shaped for analysis

Each layer is SQL transforms over the previous. dbt is the dominant tool for managing these transforms. See DbtAndAnalyticsEngineering.

Pros of ELT

Simplicity

Most transformations are SQL. Analysts can read and modify them. No separate ETL skill required.

Reproducibility

Raw data preserved. Transformations re-run produce the same result. Bug found? Fix the SQL; re-run.

Discoverability

Data is in the warehouse already. Analysts can explore raw and transformed data freely.

Speed

Modern warehouses are fast. Transformations in SQL on Snowflake/BigQuery often beat external ETL.

Backfills are easier

Raw data is there; just re-run transforms. No need to re-extract from sources.

Pros of ETL (still)

Some transformations don't fit SQL

Complex JSON parsing, ML feature engineering, image processing. Sometimes the transform belongs in code, not SQL.

Privacy / compliance

Sometimes you can't load raw data. PII redaction must happen before warehouse.

Source system load

Heavy raw data load can stress source systems. Lightweight extracts may be necessary.

Cost at extreme scale

ELT in cloud warehouses is cheap until it isn't. At extreme scale, dedicated processing (Spark, Flink) can beat warehouse compute.

When ETL still wins

PII or compliance requires pre-load redaction
Transformation is non-SQL-shaped (ML, complex parsing)
Source data is enormous and warehouse loading is impractical
Specific cost optimization at scale

For most modern data stacks, ELT is right. ETL is for specific cases.

The "ELTL" hybrid

Some modern stacks do:

Extract → Load (raw) → Transform (in warehouse) → Load (to operational stores)

The transformations happen in the warehouse; the result is loaded to operational stores (search indexes, caches, etc.) for low-latency access.

This is functional ELT plus a "publish" step.

Tooling

Ingestion (the "EL" part)

Fivetran, Stitch: managed connectors; pay for convenience
Airbyte: open-source equivalent
Custom: when no managed connector exists

Warehouse (the "T" target)

Snowflake: dominant; multi-cloud
BigQuery: GCP-native; very fast
Redshift: AWS-native; older
Databricks: lakehouse architecture

Transformation (the "T" engine)

dbt: dominant for SQL transforms
SQL: for simpler cases
Spark / PySpark: for non-SQL transformations

Orchestration

Airflow: heavy; classic
Prefect, Dagster: modern alternatives
dbt Cloud: integrated with dbt

Common failure patterns

Treating ELT as "load everything raw" without thinking about what's actually needed. Massive raw data with no usage is just cost.
No transformation testing. SQL transforms can have bugs; tests are needed.
Heavy lift-and-shift of old ETL. Often the right answer is to redesign around ELT, not port.
Privacy data loaded raw. PII in the warehouse without redaction.
dbt without lineage tracking. Can't tell what depends on what.