CircuitBreakerPattern

The Circuit Breaker pattern is a stability mechanism that prevents a service from repeatedly attempting an operation that is likely to fail. It acts as a stateful proxy between a caller and a failing dependency, protecting local resources (threads, memory) from being exhausted by slow or dead downstream services.

The Finite State Machine (FSM)

A circuit breaker implements three primary states:

1. **CLOSED (Success):** Requests pass through normally. The breaker tracks failure rates over a **Sliding Window** (e.g., last 100 requests).

2. **OPEN (Failure):** The threshold (e.g., 50% failure rate) is exceeded. All requests are rejected immediately with a `CallNotPermittedException`. This gives the dependency time to recover.

3. **HALF-OPEN (Probing):** After a "Wait Duration," the breaker allows a limited number of trial requests.

- If they succeed $\rightarrow$ **CLOSED**.

- If they fail $\rightarrow$ **OPEN** (resets wait duration).

Library Comparison

| Feature | Resilience4j | Hystrix (Netflix) | Sentinel (Alibaba) |

|---|---|---|---|

| **Status** | Active / Recommended | Maintenance Mode | Active |

| **Threading** | Functional / Decoupled | Thread-pool Isolation | Adaptive Throttling |

| **State Storage** | In-Memory (Atomic) | RxJava Observables | Slot-based Bucket |

| **Complexity** | Low | High | Medium |

Advanced Strategies

1. Adaptive Timeouts

Static timeouts ($2\text{s}$) are often either too long (exhausting threads) or too short (causing false failures). Adaptive timeouts use the $P99$ latency of the last window plus a safety margin to set dynamic thresholds.

2. Predictive Tripping

Instead of waiting for a 50% failure rate, predictive breakers monitor the **Derivative of Latency**. If latency is increasing exponentially, the breaker trips early to preempt a total outage.

3. Bulkhead Integration

Circuit breakers should be paired with **[BulkheadPattern](BulkheadPattern)** to ensure that a tripped circuit for `Service A` does not starve the thread pool used for `Service B`.

Implementation Checklist

- **Don't wrap internal calls:** Only wrap calls that cross a network boundary or a process boundary.

- **Log State Transitions:** Alert when a circuit moves to **OPEN**. This is a leading indicator of a downstream incident.

- **Fail Fast, but return Fallbacks:** Where possible, return a cached value or a default response instead of throwing an error to the end user.

- **Test via Chaos Engineering:** Use tools like Gremlin or Chaos Mesh to inject latency and verify the breaker trips as expected.

Further Reading

- [DistributedSystemsHub](DistributedSystemsHub) — Resilience foundations.

- [MicroservicesArchitecture](MicroservicesArchitecture) — Service mesh pattern context.

- [MonitoringAndAlerting](MonitoringAndAlerting) — Telemetry for state visibility.