Leader Election Algorithms

Leader election ensures that exactly one node in a cluster holds the authority to coordinate operations (e.g., write-serialization, task scheduling). Failure to maintain a unique leader results in Split-Brain, where multiple nodes claim authority, leading to state divergence and data corruption.

Why Leaders Matter

Write Serialization: Single-master systems (PostgreSQL, MySQL) require a leader to order transactions.
Coordination: Schedulers (Kubernetes, Nomad) need one source of truth for task placement.
Efficiency: It is faster to delegate to a leader than to run a full consensus round (Paxos/Raft) for every single read or write.

Consensus Algorithms: The Modern Standard

Modern systems use consensus-based election to prevent split-brain via a Quorum ( $N/2 + 1$ ).

1. Raft

Raft is designed for understandability and implements a strict leader hierarchy.

Election Timeout: A follower becomes a Candidate if it hasn't heard a leader heartbeat.
Term: A monotonically increasing counter. A candidate with a higher term wins.
Log Matching: A candidate cannot be elected unless its log is as up-to-date as the majority.

2. Paxos

The foundation of distributed consensus. Mathematically exhaustive but complex to implement.

Proposer/Acceptor: Nodes negotiate a proposal number.
Zab (Zookeeper): A Paxos variant optimized for high-throughput broadcast.

Simple Election Patterns (Non-Consensus)

Algorithm	Mechanism	Pros	Cons
Bully	Highest Node ID wins.	Simple.	Flapping if highest ID is unstable.
Ring	Nodes pass election tokens in a circle.	Deterministic.	High latency in large rings.
Leases	Leader holds a TTL-based lock in a store (etcd/Consul).	Easy to integrate.	Dependency on the lock store.

Guarding Against Split-Brain: Fencing

When a leader is suspected dead and a new one is elected, the old leader might still be alive (e.g., during a long GC pause). Fencing prevents the zombie leader from causing harm.

Fencing Tokens: Every election increments a token. The backend (Storage/DB) only accepts writes from the highest known token.
STONITH (Shoot The Other Node In The Head): Physically power cycling the suspected failed node.

Comparison: Leader Election vs. Gossip

Metric	Leader Election	Gossip Protocol
Consistency	Strong (Linearizable)	Eventual
Coordination	High (Stop-the-world)	Low (Peer-to-peer)
Network Cost	$O(N^2)$ during election	$O(N)$ constant
Use Case	Shared State, Transactions	Membership, Metrics

Implementation Strategy

Don't implement Raft from scratch. Use proven libraries (etcd raft, Hashicorp raft).
Use external coordinators (etcd, Zookeeper) for application-level leadership rather than building it into every microservice.
Set aggressive heartbeats but conservative election timeouts to minimize flapping.