Cloud Native Application Design

"Cloud native" is a label often misapplied to mean "runs in a cloud." The useful definition: an application designed to take advantage of cloud-platform properties — elasticity, managed services, ephemerality, distributed-by-default — rather than fighting them.

This page is about what cloud-native actually requires and the patterns that make applications work well in cloud environments.

The 12-factor principles

Heroku's 12-factor app is the canonical statement. The principles that have aged well:

Codebase: one codebase per service; many deploys.
Dependencies: explicit; no system-wide assumptions.
Config: environment variables, not in code.
Backing services: databases, queues are attached resources; swappable.
Build/release/run: strict separation of build artifacts and deployment config.
Processes: stateless; share-nothing.
Port binding: app exposes HTTP via port; doesn't depend on a specific server.
Concurrency: scale via process model.
Disposability: fast startup, graceful shutdown.
Dev/prod parity: minimize environment differences.
Logs: write to stdout/stderr; aggregator handles routing.
Admin processes: one-off tasks run in the same environment as the app.

Many of these are obvious now. They weren't in 2011 when the manifesto was written. The cloud-native baseline is roughly "12-factor plus container."

What cloud-native gives you

When the application follows these patterns:

Horizontal scaling works: any instance can serve any request
Replacement works: instances can be killed and replaced without ceremony
Deployment is fast: build once, run anywhere
Observability is uniform: logs, metrics, traces flow through standard channels
Multi-region is feasible: no hidden state ties to a specific server

What cloud-native costs

The principles aren't free:

State management is harder: nothing on disk, everything in databases or cache
Sessions need a backing store: in-memory sessions break when an instance dies
File uploads need object storage: local disk is gone after restart
Background jobs need queues: not in-process work
Configuration management gets more complex: environment variables for everything

Some applications fight these constraints. Some workloads — long-running stateful services, batch jobs with local-disk needs, applications with hard-coded paths — don't fit.

Patterns that work in cloud-native

Stateless services

Application instances hold no state. State lives in databases, caches, queues. Any instance handles any request. Scaling means adding instances.

Managed backing services

Use the cloud's managed databases, queues, caches rather than self-hosting. The cost is real but operational simplicity is large.

Externalized configuration

Environment variables, config services (AWS Parameter Store, GCP Secret Manager), or sidecars for config delivery.

Health checks

/health endpoints (or similar) that the orchestrator polls. Unhealthy instances get replaced automatically.

Graceful shutdown

Receive SIGTERM, stop accepting new work, drain in-flight requests, exit. Containers and serverless platforms expect this.

Circuit breakers and retries

Distributed systems fail in distributed ways. Resilience patterns — circuit breakers, exponential backoff, bulkheads — are not optional at scale.

Observability

Structured logs, metrics on all the things, distributed traces. The ecosystem (OpenTelemetry, Prometheus, Grafana, Datadog) is standardized.

Lift-and-shift vs. re-architect

Two migration paths:

Lift-and-shift

Run the existing application in the cloud with minimal changes. Pros: fast, cheap, low risk. Cons: doesn't capture cloud benefits; often costs more than on-prem because cloud pricing assumes scaling.

Re-architect

Redesign for cloud-native. Pros: captures the benefits. Cons: long, expensive, risky.

Most successful migrations do both: lift-and-shift first to escape the data center, then re-architect incrementally. Pure rewrites usually fail.

Common patterns to avoid

Chasing cloud-native for its own sake. If your app works fine on three EC2 instances, microservices won't make it better.
Microservices when you have one team. Service boundaries should match team boundaries.
Disregarding cost. Cloud-native = many small things billed independently. Costs explode if not measured.
Treating cloud as "someone else's data center." It is, but the economics and operations are different.

Common failure patterns

Stateful local state. Disk, in-memory sessions, locally-cached files.
Manual deployment. Cloud-native expects automation.
Single-region thinking. Multi-AZ and multi-region need explicit design.
Vendor lock-in surprises. Some cloud services have no migration path.
Network as a free lunch. Cross-region traffic is expensive; design accordingly.