Heartbeat and Lease Patterns

In distributed systems, the lack of a shared clock and a shared memory space makes it difficult to know if a remote node is functioning correctly. **Heartbeats** and **Leases** are two complementary patterns used to manage node presence and resource authority.

1. The Heartbeat Pattern (Liveness)

A **Heartbeat** is a periodic signal sent from one node to another (usually a leader or monitor) to indicate that it is still operational.

The Mechanism

1. **Interval:** A worker node sends a lightweight message every $T$ seconds.

2. **Timeout:** If the monitor receives no heartbeat for $N \times T$ seconds, where $N$ is the number of consecutive missed intervals it tolerates (for example, $N = 3$), it assumes the node has failed or been partitioned.

3. **Action:** The monitor triggers a recovery process, such as reassigning the worker's tasks to another node.
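The three steps above can be sketched as a small in-memory monitor. This is a minimal illustration, not a production detector: the class name `HeartbeatMonitor`, its methods, and the injectable `clock` parameter are all hypothetical, and the network transport is left out entirely.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Flags nodes that have been silent for more than N * T seconds."""

    def __init__(self, interval_s: float, missed_limit: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Threshold N * T from the description above.
        self.timeout_s = interval_s * missed_limit
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node_id: str) -> None:
        # Called whenever a heartbeat message arrives from node_id.
        self._last_seen[node_id] = self._clock()

    def suspected_nodes(self) -> List[str]:
        # Nodes whose last heartbeat is older than the threshold.
        now = self._clock()
        return sorted(node for node, seen in self._last_seen.items()
                      if now - seen > self.timeout_s)


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(interval_s=1.0, missed_limit=3,
                           clock=lambda: fake_now[0])
monitor.record_heartbeat("worker-a")
monitor.record_heartbeat("worker-b")
fake_now[0] = 2.5
monitor.record_heartbeat("worker-a")   # worker-b stays silent
fake_now[0] = 4.0
suspects = monitor.suspected_nodes()
print(suspects)  # ['worker-b']
```

The recovery action (reassigning tasks) would be triggered for each entry in `suspects`. A monotonic clock is used by default so that wall-clock adjustments on the monitor do not cause spurious timeouts.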

Weakness: The False Positive

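A fixed timeout is prone to false positives: network jitter or heavy CPU load can delay heartbeats from a healthy node long enough for the monitor to declare it dead. Some systems instead use the [Phi Accrual Failure Detector](PhiAccrualFailureDetector), which outputs a continuous suspicion level derived from observed heartbeat arrival times rather than a binary Up/Down status.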
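To make the idea of a suspicion level concrete, here is a simplified sketch. It assumes heartbeat inter-arrival times are exponentially distributed, which is a simplification and not necessarily the distribution a real Phi Accrual implementation fits. Under that assumption, the probability that a heartbeat is still pending after `elapsed` seconds is `exp(-elapsed / mean)`, so the suspicion level is `phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10)`. The class name `SimplePhiDetector` and its methods are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Optional


class SimplePhiDetector:
    """Suspicion level from a sliding window of heartbeat intervals."""

    def __init__(self, window: int = 100) -> None:
        self._intervals: Deque[float] = deque(maxlen=window)
        self._last_arrival: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        # Record the gap since the previous heartbeat, if any.
        if self._last_arrival is not None:
            self._intervals.append(now - self._last_arrival)
        self._last_arrival = now

    def phi(self, now: float) -> float:
        # No history yet: no basis for suspicion.
        if self._last_arrival is None or not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last_arrival
        # Assumed exponential model: phi = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = SimplePhiDetector()
for arrival in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(arrival)

phi_soon = detector.phi(3.5)    # half an interval late: low suspicion
phi_late = detector.phi(13.0)   # ten intervals late: high suspicion
print(round(phi_soon, 3), round(phi_late, 3))
```

The caller picks a threshold on `phi` instead of a fixed timeout. A higher threshold means fewer false positives at the cost of slower detection.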

2. The Lease Pattern (Authority)

A **Lease** is a time-bound grant of authority over a shared resource. It is essentially a "lock with an expiration date."

The Problem: The Crash Deadlock

A lock with no expiration is dangerous in a distributed system. If a node acquires a lock on a database row and then crashes before releasing it, the row stays locked until someone intervenes manually.

The Solution

1. **Contract:** The lock manager grants a lease for a fixed duration (e.g., 60 seconds).

2. **Holder Maintenance:** The lease holder must explicitly **renew** the lease before it expires.

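3. **Automatic Release:** If the holder crashes, it stops renewing. Once the TTL (Time to Live) expires, the lock manager can grant the resource to another node. This is safe only if the old holder has really stopped acting; clock drift or a long process pause can violate that assumption, which is why section 4 adds a fencing step.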
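The contract, renewal, and automatic release steps can be sketched as a single-process lease table. This is an illustrative sketch only: `LeaseManager`, `Lease`, and the resource name `row-42` are hypothetical, and a real lock manager would also need replication and persistence.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseManager:
    """Grants time-bound, exclusive leases on named resources."""

    def __init__(self, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self._clock = clock
        self._leases: Dict[str, Lease] = {}

    def acquire(self, resource: str, holder: str) -> bool:
        now = self._clock()
        current = self._leases.get(resource)
        # Refuse while another holder's lease is still live.
        if (current is not None and current.expires_at > now
                and current.holder != holder):
            return False
        # Free, expired, or already ours: grant a fresh TTL.
        self._leases[resource] = Lease(holder, now + self.ttl_s)
        return True

    def renew(self, resource: str, holder: str) -> bool:
        now = self._clock()
        current = self._leases.get(resource)
        # Only the current holder of an unexpired lease may renew.
        if (current is None or current.holder != holder
                or current.expires_at <= now):
            return False
        current.expires_at = now + self.ttl_s
        return True


fake_now = [0.0]
manager = LeaseManager(ttl_s=60.0, clock=lambda: fake_now[0])
got_a = manager.acquire("row-42", "node-a")         # granted until t=60
fake_now[0] = 30.0
got_b_early = manager.acquire("row-42", "node-b")   # refused, lease is live
fake_now[0] = 50.0
renewed_a = manager.renew("row-42", "node-a")       # extended until t=110
fake_now[0] = 111.0                                 # node-a crashed, no renewal
got_b_late = manager.acquire("row-42", "node-b")    # granted after expiry
renewed_a_late = manager.renew("row-42", "node-a")  # refused, no longer holder
print(got_a, got_b_early, renewed_a, got_b_late, renewed_a_late)
```

Note that no explicit release call is needed for crash recovery: the passage of time alone frees the resource.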

3. Comparison: Heartbeat vs. Lease

| Feature | Heartbeat | Lease |
| :--- | :--- | :--- |
| **Primary Goal** | Failure detection. | Resource coordination (Mutual Exclusion). |
| **Mechanism** | "I am alive" signal. | "I have the right to act" contract. |
| **Authority** | No rights granted. | Exclusive rights granted for a window. |
| **Direction** | Node $\to$ Monitor. | Client $\leftrightarrow$ Lock Manager. |

4. Integration: The Robust Distributed Lock

Coordination services such as etcd and ZooKeeper combine these patterns:

1. **Acquisition:** A node requests a **Lease** on a resource.

2. **Renewal:** The node uses a background **Heartbeat** thread to keep the lease alive.

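3. **Protection:** The node attaches a [Generation Clock](GenerationClock) value (a fencing token) to every request it makes against the resource. The token increases with each lease grant, so if the heartbeat fails and the lease expires, the resource can reject the old holder's stale actions because they carry a lower token than one it has already seen.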
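The protection step can be illustrated with a resource that remembers the highest token it has accepted. This is a sketch under stated assumptions: `TokenIssuer` stands in for the lock manager's grant counter, `FencedStore` stands in for the protected resource, and both names are hypothetical rather than any library's API.

```python
import itertools
from typing import Optional


class TokenIssuer:
    """Hands out a strictly increasing token with each lease grant."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_token(self) -> int:
        return next(self._counter)


class FencedStore:
    """Rejects writes that carry a token older than one already seen."""

    def __init__(self) -> None:
        self._highest_token = 0
        self.value: Optional[str] = None

    def write(self, token: int, value: str) -> bool:
        if token < self._highest_token:
            return False  # stale holder: its lease was superseded
        self._highest_token = token
        self.value = value
        return True


issuer = TokenIssuer()
store = FencedStore()

token_a = issuer.next_token()   # node-a is granted the lease
# node-a stalls, its lease expires, and node-b is granted a new lease.
token_b = issuer.next_token()

wrote_b = store.write(token_b, "from node-b")   # accepted
wrote_a = store.write(token_a, "from node-a")   # rejected as stale
print(wrote_b, wrote_a, store.value)  # True False from node-b
```

The check lives in the resource, not in the client, because a paused client cannot be trusted to notice that its own lease has expired.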

See Also

* [Distributed Systems Hub](DistributedSystemsHub) — Pattern catalog.

* [Phi Accrual Failure Detector](PhiAccrualFailureDetector) — Advanced heartbeat analysis.

* [Generation Clock (Epoch)](GenerationClock) — Fencing expired lease holders.