DevOps and SRE Hub

This cluster covers the operational discipline of running software in production: automated delivery, deployment patterns, observability, on-call practice, and the core SRE principles. The focus is concrete, on the practices that separate a stable production system from an unstable one.

Delivery

- [DevOps Fundamentals](DevOpsFundamentals) — What DevOps actually changed; what it did not

- [CI/CD Pipelines](CiCdPipelines) — Pipeline design, stages, and the patterns that scale

- [Trunk-Based Development](TrunkBasedDevelopment) — Trunk vs. GitFlow; the case for trunk

- [Git Workflows](GitWorkflows) — Branch strategies, merge vs. rebase, commit hygiene

- [Monorepo vs. Polyrepo](MonorepoVsPolyrepo) — The trade-offs at scale

- [Feature Toggle Management](FeatureToggleManagement) — Flag types, lifecycle, retirement

- [Release Engineering](ReleaseEngineering) — Release artifacts, signing, rollback

- [Release Planning](ReleasePlanning) — Sequencing, dependencies, communication

Operations and Resiliency

- [On-Call Practices](OnCallPractices) — Rotation, escalation, blameless postmortems

- [Runbook Automation](RunbookAutomation) — Runbooks that work; automating the recoverable

- [Status Page Best Practices](StatusPageBestPractices) — Public status pages, customer communication

- [Toil Reduction Strategies](ToilReductionStrategies) — Identifying and eliminating operational toil

- [Scheduled Task Management](ScheduledTaskManagement) — Cron, scheduled jobs, and the patterns that survive

- [Auto Scaling Strategies](AutoScalingStrategies) — Horizontal vs. vertical scaling, predictive scaling, and cost control

- [Health Check Patterns](HealthCheckPatterns) — Liveness, readiness, and deep-health checks in distributed systems

Observability Implementation

Technical standards for monitoring and insight across the project ecosystem.

- [Observability and Monitoring Blueprint](ObservabilityAndMonitoringBlueprint) — Unified standard for OTel, Prometheus, and Grafana

- [Monitoring and Alerting](MonitoringAndAlerting) — Metrics, logs, and traces, and the alerting built on them

- [AI Observability in Production](AiObservabilityInProduction) — Monitoring LLM drift, safety, and evaluation metrics

Infrastructure and Tooling

- [Kubernetes Basics](KubernetesBasics) — Pods, Deployments, Services, and the K8s object model

- [Docker Deployment](DockerDeployment) — Containerizing applications for portable production deployments

- [Secrets Management](SecretsManagement) — Storing and rotating credentials in a secure pipeline

- [Rate Limiting and Throttling](RateLimitingAndThrottling) — Protecting services from resource exhaustion

- [Service Mesh Architecture](ServiceMeshArchitecture) — When the mesh is worth the complexity

- [Container Security](ContainerSecurity) — Hardening the runtime and the image supply chain

Adjacent clusters

- [Cloud Platforms Hub](CloudPlatformsHub) — Where DevOps practices land in cloud

- [Software Engineering Practices Hub](SoftwareEngineeringPracticesHub) — Code-side disciplines

- [Web Services and APIs Hub](WebServicesAndApisHub) — Service-level concerns