Object Storage Patterns
Object storage (S3, GCS, Azure Blob) is the universal cloud storage primitive. Cheap; durable; infinitely scalable; HTTP-accessible. Modern cloud architectures rely heavily on it.
This page covers patterns for using object storage effectively.
The basics
Object storage characteristics:
- Files identified by key (path-like string)
- Each file is fully replaced or deleted; no in-place edit
- Highly durable: 99.999999999% (11 nines) for S3 Standard
- Massively scalable: petabytes; no per-bucket limits
- HTTP-accessible: GET, PUT, DELETE
- Eventually consistent in older versions; strongly consistent in modern S3
The model differs from filesystems (no directories really), block storage (fully replaced not edited), databases (key-value, not query).
What object storage is good for
Large files
Videos, images, documents, backups. Files where size matters.
Static assets
CSS, JS, images for web apps. Combined with CDN, fastest possible serving.
Data lake / lakehouse
Parquet files, JSON, CSV. Queried by Athena, BigQuery, Spark.
Backups
Database backups, log archives, snapshots. Cheap long-term storage.
Object archive
Compliance retention, legal hold. Retention rules and lifecycle to cold storage.
Event-driven workflows
S3 upload triggers Lambda; ETL pipeline begins.
What object storage is not
Database
No queries, no transactions, no joins. Don't use as a key-value DB; the access pattern doesn't fit (eventual consistency historically; per-request cost).
Filesystem
No real directories; "directory" is just a prefix. No POSIX semantics; no in-place writes. Don't mount as a filesystem for application access.
Streaming media (raw)
You can store videos in S3, but for streaming you typically want a CDN or specialized streaming infrastructure.
Cost optimization
Storage classes change cost dramatically. See [CloudStorageOptions](CloudStorageOptions) for the full breakdown.
Lifecycle rules
Move objects through storage classes based on age:
```
After 30 days → Standard-IA
After 90 days → Glacier Flexible Retrieval
After 1 year → Glacier Deep Archive
After 7 years → Delete
```
Lifecycle rules are free; the savings are real.
Compression
Compress before storing. Gzip text; specialized compression for binary.
Deduplication
Don't store the same object twice. Hash content; use hash as key.
Inventory and analyze
S3 Inventory; S3 Storage Lens. Tools to see what's actually stored, where, what's hot vs. cold.
Multi-part upload abort
Incomplete multi-part uploads accumulate. Lifecycle rule to abort after 7 days.
Specific patterns
Pre-signed URLs
For uploads from clients (mobile, web), sign a URL; client uploads directly to S3.
```
Client → app server: "I want to upload"
App server → S3: generate signed URL
App server → client: signed URL
Client → S3: PUT to signed URL with file
```
App server doesn't proxy the file. See [FileUploadPatterns](FileUploadPatterns).
Versioning
Bucket versioning preserves old versions. Useful for:
- Accidental delete protection
- Object Lock for compliance
- Audit trail
Costs extra storage; old versions stay around.
Replication
Cross-region replication for DR. Cross-account for security/compliance.
Asynchronous; some lag. For critical data, both regions accessible.
Object Lock
WORM (Write Once, Read Many). Compliance feature. Object can't be deleted or modified for retention period.
Event notifications
S3 event → Lambda, SQS, or EventBridge. The event-driven pattern.
```
S3 PUT object → Lambda triggered → Process the new file
```
For thumbnail generation, indexing, validation, etc.
Server-side encryption
SSE-S3 (S3-managed keys) or SSE-KMS (your KMS keys). Encrypt at rest. Default for many compliance regimes.
Object metadata
Each object can have user-defined metadata. Useful for tagging, lifecycle, application context.
Performance considerations
Hot keys / partition
S3 partitions buckets internally. Sequential keys (timestamps, auto-incrementing IDs) hot-spot.
For high-throughput, randomize the prefix:
```
Bad: 2026-04-26-001, 2026-04-26-002, ...
Good: a1f3-2026-04-26-001, b8e2-2026-04-26-002, ...
```
S3's internal partitioning was improved but the pattern still helps.
Multi-part upload
For large files (>100 MB), multi-part upload is faster and more resilient. The client uploads parts in parallel; failed parts retry.
Range requests
GET part of an object via Range header. Useful for partial reads of large files.
Transfer acceleration
S3 Transfer Acceleration uses CloudFront edges for faster uploads from far-away clients. Costs more; useful for global apps.
CloudFront in front
For frequent reads of public objects, CloudFront caches at edge. Massively reduces S3 cost and latency.
Specific architectural patterns
Static website hosting
S3 + CloudFront serves static sites. Fast; cheap; scales infinitely.
Data lake
Raw, processed, curated layers in S3. Athena/BigQuery for queries.
Backup destination
Primary databases dump to S3. Encrypted; lifecycle to Glacier; cross-region replication.
Build artifact storage
CI produces JARs, container images (via ECR which uses S3); stored versioned in S3.
Log archive
Application logs ship to S3 for long-term retention. SIEM ingests for active analysis.
Common failure patterns
- **Public bucket exposing data.** Periodic problem; use S3 Block Public Access.
- **No lifecycle.** Storage cost grows; Standard tier for cold data.
- **Sequential keys at scale.** Performance ceiling.
- **No encryption.** Compliance gap.
- **Forgotten multi-part uploads.** Storage cost from incomplete uploads.
- **Treating S3 as filesystem.** POSIX expectations don't apply.
Further Reading
- [CloudStorageOptions](CloudStorageOptions) — Storage class details
- [FileUploadPatterns](FileUploadPatterns) — Upload via S3
- [BatchProcessingPatterns](BatchProcessingPatterns) — Often reads from S3
- [CdnArchitecture](CdnArchitecture) — S3 + CDN pattern