Microservices Architecture

Microservices achieve organizational scalability by decoupling service boundaries. However, this decoupling introduces one of the hardest problems in distributed systems: **Cross-Service Consistency**.

The "Final Boss": Distributed Transactions

In a monolith, a transaction either commits or rolls back as a unit against a single database. In microservices, each service owns its database. If a business process spans three services (e.g., Order → Payment → Inventory), holding a global lock across all three (as two-phase commit effectively does) ties availability and latency to the slowest or least available participant.

The Saga Pattern

A Saga is a sequence of local transactions. Each local transaction updates its own service's database and publishes an event to trigger the next local transaction. If a step fails, the Saga executes **compensating transactions** to undo the steps that already committed, in reverse order.
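
The sequence-plus-compensation idea can be sketched in a few lines. This is a minimal illustration, not a production design: `run_saga`, the `Step` alias, and the in-memory `log` standing in for three services are all hypothetical names introduced here.

```python
from typing import Callable, List, Tuple

# Each step pairs a local transaction with the compensation that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run each action in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Undo only the steps that actually committed, newest first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


# Hypothetical in-memory "services", used only for this illustration.
log: List[str] = []


def fail_payment() -> None:
    raise RuntimeError("card declined")


ok = run_saga([
    (lambda: log.append("order created"), lambda: log.append("order cancelled")),
    (fail_payment, lambda: log.append("payment refunded")),
    (lambda: log.append("stock reserved"), lambda: log.append("stock released")),
])
```

Because the payment step fails before it commits, only the order step is compensated and the inventory step never runs.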

1. Choreography (Event-Based)

Services exchange events without a central coordinator.

- **Flow:** Order Service (Success) → `OrderCreated` → Payment Service (Success) → `PaymentAuthorized` → Inventory Service.

- **Pros:** Highly decoupled, simple to start.

- **Cons:** No single place holds the overall Saga state, so progress is hard to track; cyclic event dependencies between services can emerge and are hard to spot.
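
To make the event flow concrete, here is a rough sketch using a toy synchronous bus in place of a real message broker. The `EventBus` class and the `PaymentDeclined`, `InventoryReserved`, and `OrderCancelled` events are assumptions added for illustration; only `OrderCreated` and `PaymentAuthorized` come from the flow above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Toy synchronous bus; a real system would use a message broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []  # every event type, in publish order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.history.append(event_type)
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()


def on_order_created(order: Dict) -> None:
    # Payment Service: run the local transaction, then emit the next event.
    if order["amount"] <= order["credit_limit"]:
        bus.publish("PaymentAuthorized", order)
    else:
        bus.publish("PaymentDeclined", order)


def on_payment_authorized(order: Dict) -> None:
    # Inventory Service: reserve stock for the paid order.
    bus.publish("InventoryReserved", order)


def on_payment_declined(order: Dict) -> None:
    # Order Service: compensating transaction for the original order.
    bus.publish("OrderCancelled", order)


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("PaymentAuthorized", on_payment_authorized)
bus.subscribe("PaymentDeclined", on_payment_declined)

bus.publish("OrderCreated", {"id": 1, "amount": 50, "credit_limit": 100})
```

Note that no component knows the whole flow: the sequence only exists as the sum of the subscriptions, which is exactly why the overall state is hard to track.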

2. Orchestration (Command-Based)

A central "Saga Orchestrator" manages the state machine and tells each service what to do.

- **Flow:** Orchestrator → `AuthorizePayment` → Payment Service (Success) → Orchestrator → `ReserveInventory` → Inventory Service.

- **Pros:** Centralized visibility, easier to debug, no cyclic dependencies.

- **Cons:** The Orchestrator itself is a single point of failure (requires [StateManagementPatterns](StateManagementPatterns) for durability).
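
One way to picture the orchestrator is as an explicit state machine keyed by Saga ID. The sketch below is an assumption-laden illustration: the `State` values, the `TRANSITIONS` table, and the reply and command names other than `AuthorizePayment` and `ReserveInventory` are hypothetical, and a plain dict stands in for a durable saga-state table.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class State(Enum):
    AWAITING_PAYMENT = "awaiting_payment"
    AWAITING_INVENTORY = "awaiting_inventory"
    REFUNDING = "refunding"
    COMPLETED = "completed"
    FAILED = "failed"


# (current state, reply received) -> (next state, command to send or None)
TRANSITIONS: Dict[Tuple[State, str], Tuple[State, Optional[str]]] = {
    (State.AWAITING_PAYMENT, "PaymentAuthorized"): (State.AWAITING_INVENTORY, "ReserveInventory"),
    (State.AWAITING_PAYMENT, "PaymentDeclined"): (State.FAILED, None),
    (State.AWAITING_INVENTORY, "InventoryReserved"): (State.COMPLETED, None),
    (State.AWAITING_INVENTORY, "InventoryUnavailable"): (State.REFUNDING, "RefundPayment"),
    (State.REFUNDING, "PaymentRefunded"): (State.FAILED, None),
}


class SagaOrchestrator:
    def __init__(self, store: Dict[str, State]) -> None:
        self.store = store  # stand-in for a durable saga-state table

    def start(self, saga_id: str) -> str:
        self.store[saga_id] = State.AWAITING_PAYMENT
        return "AuthorizePayment"

    def on_reply(self, saga_id: str, reply: str) -> Optional[str]:
        """Advance the saga and return the next command, if any."""
        next_state, command = TRANSITIONS[(self.store[saga_id], reply)]
        self.store[saga_id] = next_state  # record the new state before returning the command
        return command


store: Dict[str, State] = {}
first = SagaOrchestrator(store).start("saga-1")
second = SagaOrchestrator(store).on_reply("saga-1", "PaymentAuthorized")
# A fresh orchestrator instance can resume because the state lives in the store.
third = SagaOrchestrator(store).on_reply("saga-1", "InventoryUnavailable")
last = SagaOrchestrator(store).on_reply("saga-1", "PaymentRefunded")
```

Keeping the state outside the orchestrator process is what lets a replacement instance pick up an in-flight Saga after a crash.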

Concrete Example: Travel Booking Saga

A travel booking requires a Hotel and a Flight. If the Flight fails, the Hotel must be cancelled.

| Step | Service | Transaction | Compensation |
|---|---|---|---|
| 1 | Hotel | `bookHotel()` | `cancelHotel()` |
| 2 | Flight | `bookFlight()` | `cancelFlight()` |
| 3 | Payment | `chargeCard()` | `refundCard()` |

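**Orchestrator Logic (Python sketch):**

```python
class HotelError(Exception): ...
class FlightError(Exception): ...
class PaymentError(Exception): ...


def travel_saga(request, hotel_service, flight_service, payment_service):
    """Book hotel, flight, and payment; compensate in reverse order on failure."""
    try:
        hotel_id = hotel_service.book(request)
    except HotelError:
        return "Booking Failed"  # nothing has committed yet, nothing to undo

    try:
        flight_id = flight_service.book(request)
    except FlightError:
        hotel_service.cancel(hotel_id)  # undo step 1
        return "Booking Failed"

    try:
        payment_service.charge(request)
    except PaymentError:
        flight_service.cancel(flight_id)  # undo step 2
        hotel_service.cancel(hotel_id)    # undo step 1
        return "Booking Failed"

    return "Booking Confirmed"
```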

Isolation Challenges (ACD Without the I)

Sagas lack the "Isolation" of ACID. While a Saga is running, other transactions might see the "Intermediate State" (e.g., the Hotel is booked but the Flight isn't yet).

**Mitigation Strategies:**

- **Semantic Lock:** Use an application-level lock (e.g., set `status = PENDING`) to prevent other processes from modifying the same data until the Saga finishes.

- **Commutative Updates:** Design operations so the order doesn't matter (e.g., increments/decrements).

- **Pessimistic View:** Order the steps so that a read of intermediate state does the least damage, and show users "Pending" states instead of "Success" until the entire Saga completes.
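
The first two strategies can be sketched as follows. This is a minimal illustration under stated assumptions: the `Booking` record, `BookingLocked` exception, and helper functions are hypothetical, and a real service would enforce the status check inside its own database transaction.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Booking:
    status: str = "PENDING"  # semantic lock: set by the Saga's first step
    seats: int = 0


class BookingLocked(Exception):
    """Raised when another process touches a record a Saga still owns."""


def change_seats(booking: Booking, delta: int) -> None:
    # Callers outside the Saga must wait or fail fast while it is in flight.
    if booking.status == "PENDING":
        raise BookingLocked("saga in progress")
    booking.seats += delta


def complete_saga(booking: Booking) -> None:
    booking.status = "CONFIRMED"  # the final step releases the semantic lock


def apply_deltas(stock: int, deltas: Iterable[int]) -> int:
    # Commutative update: addition gives the same total in any arrival order.
    return stock + sum(deltas)


booking = Booking()
complete_saga(booking)
change_seats(booking, 2)  # allowed only after the lock is released
```

Failing fast on `PENDING` is one choice; blocking or retrying until the status changes is another, with the usual trade-off between latency and the risk of deadlock.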

Observability and the "Golden Signal"

When a Saga spans 10 services, finding the point of failure is impractical without **Distributed Tracing**.

- **Trace Context:** Every request must carry a `trace_id` and `span_id`.

- **Log Correlation:** All service logs must include the `trace_id` to allow reconstructing the "Story" of a failed transaction across the entire cluster.
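
A rough sketch of both ideas using only the Python standard library is shown below. The helper names (`new_trace_context`, `child_context`, `make_logger`) and the dict-based context are assumptions for illustration; production systems typically rely on a tracing library and a standard header format rather than hand-rolled propagation.

```python
import io
import logging
import secrets
from typing import Dict


def new_trace_context() -> Dict[str, str]:
    """Start a trace at the edge of the system."""
    return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8)}


def child_context(parent: Dict[str, str]) -> Dict[str, str]:
    """Keep the trace_id and mint a new span_id for the downstream call."""
    return {
        "trace_id": parent["trace_id"],
        "span_id": secrets.token_hex(8),
        "parent_span_id": parent["span_id"],
    }


def make_logger(service: str, stream: io.StringIO) -> logging.Logger:
    """Build a logger whose every line carries the trace and span IDs."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(name)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
    logger.handlers = [handler]
    logger.propagate = False
    return logger


output = io.StringIO()  # stands in for a shared log aggregator

order_ctx = new_trace_context()
make_logger("order", output).info("order created", extra=order_ctx)

# The context travels with the request; the callee logs under the same trace_id.
payment_ctx = child_context(order_ctx)
make_logger("payment", output).info("charge failed", extra=payment_ctx)
```

Filtering the aggregated output by a single `trace_id` then yields every line that belongs to one Saga, regardless of which service wrote it.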

Further Reading

- [SagaPattern](SagaPattern)

- [OutboxPattern](OutboxPattern)

- [EventSourcing](EventSourcing)

- [DistributedTracing](DistributedTracing)