Metaheuristic Optimization: The Math of SA and GA
In [Operations Research](OperationsResearch), we frequently encounter "NP-hard" landscapes where global optima are obscured by a thicket of local minima. When exact methods (like Branch and Bound) become computationally intractable, we turn to **Metaheuristics**. This article dissects the mathematical machinery of the two most prominent paradigms: Simulated Annealing (SA) and Genetic Algorithms (GA).
---
1. Simulated Annealing: The Thermodynamics of Search
Simulated Annealing maps the physical process of cooling a metal to the search for a global minimum of a cost function $f(x)$.
1.1 The Metropolis Criterion
The heart of SA is the **Acceptance Probability**, which allows the algorithm to escape local traps by occasionally accepting "uphill" moves (moves that increase the cost).
Given a current state$x$and a candidate neighbor$x'$, let$\Delta E = f(x') - f(x)$. The probability of accepting$x'$is:$$P(\text{accept}) =
\begin{cases}
1 & \text{if } \Delta E \le 0 \\
\exp\left(-\frac{\Delta E}{T}\right) & \text{if } \Delta E > 0
\end{cases}$$Where$T$is the "temperature" parameter.
**The Math of Exploration:** At high temperatures ($T \to \infty$),$P \to 1$for all$\Delta E$, making the search essentially a random walk. As$T \to 0$,$P \to 0$for all$\Delta E > 0$, and the algorithm behaves like a greedy Hill Climber.
1.2 Cooling Schedules: Logarithmic vs. Geometric
The effectiveness of SA depends on how$T$is reduced over time$k$.
* **Geometric Cooling:**$T_{k+1} = \alpha T_k$, where$0.8 \le \alpha < 1$. This is the standard in engineering for its speed.
* **Logarithmic Cooling:**$T_k = \frac{C}{\log(k+d)}$. This schedule is mathematically guaranteed to find the global optimum given infinite time, but it is too slow for most practical applications.
---
2. Genetic Algorithms: The Algebra of Evolution
Genetic Algorithms (GA) are population-based searches that use operators inspired by molecular biology: Selection, Crossover, and Mutation.
2.1 Selection Pressure: Tournament vs. Roulette
Selection is the mechanism that enforces "survival of the fittest."
* **Roulette Wheel Selection:** The probability of selecting individual$i$is$P_i = \frac{f_i}{\sum f_j}$. This is prone to "stagnation" if one individual is significantly fitter than the rest.
* **Tournament Selection:** Pick$k$individuals at random and select the best among them. This is more robust as it depends only on the *relative* rank, not the absolute magnitude of fitness.
2.2 The Schemata Theorem
The mathematical justification for GAs is John Holland’s **Schemata Theorem**. It states that "building blocks" (short, high-fitness strings called schemata) increase their presence in the population exponentially over generations.
Let$m(H, t)$be the number of strings matching schema$H$at generation$t$.$$m(H, t+1) \ge m(H, t) \frac{f(H)}{\bar{f}} [1 - \text{Loss}_{crossover} - \text{Loss}_{mutation}]$$Where$f(H)$is the average fitness of schema$H$and$\bar{f}$is the average fitness of the whole population. This confirms that GA is not a random search but a structured information-processing engine.
---
3. Comparison and Convergence
| Feature | Simulated Annealing (SA) | Genetic Algorithm (GA) |
| :--- | :--- | :--- |
| **Search Mode** | Single trajectory (Trajectory-based) | Population-based (Parallel) |
| **Global Guarantee** | Yes (given infinite time/log-cooling) | No (prone to premature convergence) |
| **Key Math** | Boltzmann/Metropolis Distribution | Schemata Theorem / Parity Laws |
| **Best For** | Continuous or very large discrete spaces | Combinatorial problems (e.g. TSP) |
3.1 The Exploitation vs. Exploration Trade-off
The performance of any metaheuristic is a balance between **Exploration** (visiting new regions) and **Exploitation** (refining the current best).
* In SA, this is controlled by the **Temperature**$T$.
* In GA, this is controlled by the **Mutation Rate**$\mu$and **Crossover Rate**$P_c$.
Researchers should prioritize SA when the neighborhood structure is well-defined and the "energy" landscape is relatively smooth. Choose GA when the problem can be decomposed into independent sub-components that can be recombined through crossover.