Statistical Inference: The Science of Decision Making

Statistical inference is the formal process of inferring the properties of an underlying distribution from observed data. It bridges the gap between a finite, noisy sample and the unknown population parameters. Inference is broadly divided into two schools: Frequentist (probability as long-run frequency under repeated sampling) and Bayesian (probability as a degree of belief that is updated by data).

1. The Frequentist Paradigm

Frequentist inference treats parameters as fixed constants. We assess hypotheses by calculating how unlikely our observed data would be if a specific "null" hypothesis were true.

1.1 Hypothesis Testing and the p-value

We start with a Null Hypothesis ( $H_0$ ) and an Alternative Hypothesis ( $H_1$ ). The p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that $H_0$ is correct.

1.1.1 Geometric Interpretation: Tail Areas

Geometrically, if the test statistic (e.g., a $z$-score or $t$-score) follows a known distribution under $H_0$ (such as the standard normal, the familiar bell curve), the p-value is the area under the density curve in the "tails" beyond the observed value.

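One-tailed test (right-tailed): Area to the right of $z_{obs}$. For a left-tailed test, use the area to the left of $z_{obs}$ instead.
Two-tailed test: Area to the right of $|z_{obs}|$ plus the area to the left of $-|z_{obs}|$, assuming a distribution that is symmetric about zero.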

Visualizing this as a physical space: the p-value is the fraction of the total "probability mass" that lies in the extreme regions of the distribution's support.
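The tail areas described above can be computed directly. The following is a minimal sketch that assumes the test statistic is standard normal under $H_0$; the function names and the example value 1.96 are illustrative and do not come from the text.

```python
import math


def normal_upper_tail(z: float) -> float:
    """Area under the standard normal density to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_value_right_tailed(z_obs: float) -> float:
    """One-tailed p-value: area to the right of the observed statistic."""
    return normal_upper_tail(z_obs)


def p_value_two_tailed(z_obs: float) -> float:
    """Two-tailed p-value: area beyond |z_obs| in both tails (by symmetry)."""
    return 2.0 * normal_upper_tail(abs(z_obs))


z_obs = 1.96  # hypothetical observed z-score
print("right-tailed p:", p_value_right_tailed(z_obs))
print("two-tailed p:  ", p_value_two_tailed(z_obs))
```

For a statistic that follows a $t$ distribution, the same logic applies, but the tail area would have to come from the $t$ distribution rather than from the normal.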

1.2 Confidence Intervals (CI)

A 95% Confidence Interval for a parameter $\theta$ is a range $[L, U]$ such that if the experiment were repeated infinitely many times, 95% of the calculated intervals would contain the true $\theta$ .

Frequentist Caveat: For any single calculated interval, the probability that it contains the parameter is either 0 or 1. The "95%" refers to the process, not the specific result.
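The "process, not result" reading can be checked by simulation. The sketch below assumes normally distributed data with known $\sigma$ and uses the textbook interval $\bar{x} \pm z_{0.975}\,\sigma/\sqrt{n}$; the true mean, sample size, and number of repetitions are arbitrary illustrative choices.

```python
import random
import statistics


def coverage_rate(mu=10.0, sigma=2.0, n=30, trials=2000, seed=0):
    """Fraction of repeated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sigma / n ** 0.5           # known-sigma interval
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Each individual interval either contains mu or it does not.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


print("empirical coverage:", coverage_rate())
```

The printed fraction should land close to 0.95, even though every single interval in the loop is simply a hit or a miss.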

2. Decision Theory and Bayesian Inference

In contrast to Frequentist "significance," Decision Theory focuses on the consequences of each possible decision: the utility gained by being right and the loss incurred by being wrong.

2.1 Loss Functions and Risk

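A decision maker chooses an action $a$ to minimize the expected value of a Loss Function $L(\theta, a)$. In the Bayesian setting, the expectation is taken over the posterior distribution of $\theta$.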

Squared Error Loss: $L(\theta, a) = (\theta - a)^2$ . Leads to the posterior mean as the optimal estimate.
Absolute Error Loss: $L(\theta, a) = |\theta - a|$ . Leads to the posterior median.
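The link between loss functions and optimal estimates can be verified numerically. This sketch assumes we already have draws from some posterior; here a skewed Gamma(2, 1) distribution stands in for it, purely as an illustrative assumption. It then searches a grid of candidate actions for the one with the smallest average loss.

```python
import random
import statistics

rng = random.Random(1)
# ASSUMPTION: stand-in posterior draws; any skewed distribution would do.
posterior_draws = [rng.gammavariate(2.0, 1.0) for _ in range(2000)]


def squared_loss(theta, a):
    return (theta - a) ** 2


def absolute_loss(theta, a):
    return abs(theta - a)


def expected_loss(a, draws, loss):
    """Monte Carlo estimate of the posterior expected loss of action a."""
    return statistics.fmean(loss(theta, a) for theta in draws)


# Candidate actions from 0.00 to 5.00 in steps of 0.02.
grid = [i / 50 for i in range(251)]
best_sq = min(grid, key=lambda a: expected_loss(a, posterior_draws, squared_loss))
best_abs = min(grid, key=lambda a: expected_loss(a, posterior_draws, absolute_loss))

print("squared-loss optimum:", best_sq, "vs mean:", statistics.fmean(posterior_draws))
print("absolute-loss optimum:", best_abs, "vs median:", statistics.median(posterior_draws))
```

Because the stand-in posterior is skewed, the mean and median differ, so the two loss functions recommend visibly different estimates.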

2.2 Bayesian Credible Intervals

Unlike a Frequentist CI, a 95% Credible Interval is a range that contains the parameter with 95% posterior probability, given the prior and the observed data. It is obtained directly from the posterior distribution $p(\theta | \mathcal{D})$: any range over which the posterior density integrates to 0.95 qualifies.
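As a rough illustration, an equal-tailed credible interval can be read off sorted posterior draws. The example assumes a conversion-rate parameter with a uniform Beta(1, 1) prior and hypothetical data of 18 successes in 50 trials, which gives a Beta(19, 33) posterior; none of these numbers come from the text.

```python
import random


def equal_tailed_interval(draws, level=0.95):
    """Cut (1 - level) / 2 of the posterior draws off each tail."""
    ordered = sorted(draws)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


rng = random.Random(7)
successes, trials = 18, 50  # hypothetical observed data
# Beta(1, 1) prior + binomial likelihood -> Beta(1 + s, 1 + n - s) posterior.
posterior_draws = [
    rng.betavariate(1 + successes, 1 + trials - successes) for _ in range(20000)
]

lower, upper = equal_tailed_interval(posterior_draws)
print("95% credible interval:", (lower, upper))
```

The equal-tailed interval is only one choice; a highest-density interval would also integrate to 0.95 but can differ for skewed posteriors.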

3. Quantitative Foundations: Power and Errors

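Statistical tests are subject to two types of errors. For a fixed sample size and effect size, they trade off against each other: shrinking the rejection region lowers the Type I error rate but raises the Type II error rate.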

3.1 The Error Matrix

Truth \ Decision	Fail to Reject $H_0$	Reject $H_0$
$H_0$ is True	Correct Decision	Type I Error ( $\alpha$ )
$H_1$ is True	Type II Error ( $\beta$ )	Correct Decision (Power)

Significance Level ( $\alpha$ ): The probability of a False Positive. Geometrically, this is the area of the rejection region under the $H_0$ curve.
Power ( $1-\beta$ ): The probability of a True Positive. Geometrically, this is the area of the rejection region under the $H_1$ curve.

3.2 The Power Formula (Simple Case)

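For a two-sided $z$-test of a mean $\mu$ with known $\sigma$, null value $\mu_0$, true mean $\mu_a$, and sample size $n$, the power is approximately (ignoring the small chance of rejecting in the tail opposite to the true effect):

$$\text{Power} \approx \Phi\left( \frac{|\mu_a - \mu_0|\sqrt{n}}{\sigma} - z_{1-\alpha/2} \right)$$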

Where $\Phi$ is the standard normal CDF. This shows that power increases with sample size ( $n$ ) and the "Effect Size" $|\mu_a - \mu_0|$ .
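The formula translates almost line for line into code. This is a minimal sketch using the standard library's normal distribution; the function name and the example numbers (a 5-point shift with $\sigma = 15$) are illustrative assumptions.

```python
from statistics import NormalDist


def approx_power(mu_0, mu_a, sigma, n, alpha=0.05):
    """Power of a two-sided z-test with known sigma, per the formula above.

    Only the rejection tail on the side of the true effect is counted.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # z_{1 - alpha/2}
    shift = abs(mu_a - mu_0) * n ** 0.5 / sigma      # standardized effect
    return std_normal.cdf(shift - z_crit)


# Hypothetical scenario: power grows with the sample size.
for n in (10, 50, 200):
    print(n, approx_power(mu_0=100.0, mu_a=105.0, sigma=15.0, n=n))
```

Note that with zero effect size the expression evaluates to $\alpha/2$ rather than $\alpha$, which is a reminder that it counts only one of the two rejection tails.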

4. Real-World Applications

4.1 A/B Testing in Software Engineering

Modern tech companies use statistical inference to validate feature changes. By splitting traffic between "Control" and "Treatment," engineers use a $t$ -test to determine if a 20ms reduction in latency is "statistically significant." However, Practical Significance must also be considered: is the 20ms gain worth the complexity of the new code?
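A sketch of such a comparison is shown below. It uses Welch's unequal-variance form of the $t$-test on simulated latency samples. Everything here is an assumption for illustration: the latency distributions, the sample sizes, the 10ms "worth it" threshold, and the use of a normal approximation for the p-value (reasonable only because the samples are large; the standard library has no $t$ distribution).

```python
import math
import random
import statistics


def welch_t_test(control, treatment):
    """Return (t, df, approx_p) for the mean difference treatment - control."""
    n_c, n_t = len(control), len(treatment)
    # Variance of each sample mean.
    var_c = statistics.variance(control) / n_c
    var_t = statistics.variance(treatment) / n_t
    diff = statistics.fmean(treatment) - statistics.fmean(control)
    t_stat = diff / math.sqrt(var_c + var_t)
    # Welch-Satterthwaite degrees of freedom.
    df = (var_c + var_t) ** 2 / (var_c ** 2 / (n_c - 1) + var_t ** 2 / (n_t - 1))
    # ASSUMPTION: large samples, so a normal tail approximates the t tail.
    approx_p = math.erfc(abs(t_stat) / math.sqrt(2.0))  # two-tailed
    return t_stat, df, approx_p


rng = random.Random(42)
control = [rng.gauss(200.0, 50.0) for _ in range(2000)]    # simulated latency, ms
treatment = [rng.gauss(180.0, 50.0) for _ in range(2000)]  # simulated latency, ms

t_stat, df, approx_p = welch_t_test(control, treatment)
observed_gain_ms = statistics.fmean(control) - statistics.fmean(treatment)
MIN_WORTHWHILE_GAIN_MS = 10.0  # hypothetical practical-significance threshold

print("t =", t_stat, "df =", df, "approx p =", approx_p)
print("statistically significant:", approx_p < 0.05)
print("practically significant:  ", observed_gain_ms >= MIN_WORTHWHILE_GAIN_MS)
```

Keeping the two checks separate mirrors the distinction in the text: a tiny p-value says the gain is real, while the threshold says whether it is large enough to matter.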

4.2 Clinical Trials and Medicine

In Phase III clinical trials, inference is used to determine if a drug is more effective than a placebo. Due to the high cost of Type I errors (approving a useless drug), $\alpha$ is set to a small value that is fixed before the trial begins. Bayesian Adaptive Trials allow a trial to be stopped early if the posterior probability of efficacy becomes overwhelmingly high, saving time and lives.
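The early-stopping idea can be sketched with a toy model. The code below assumes binary outcomes (response or no response), independent uniform Beta(1, 1) priors on each arm's response rate, and a hypothetical stopping threshold of 0.99; real adaptive designs are considerably more involved, and the function names and counts are illustrative only.

```python
import random


def prob_treatment_better(succ_t, n_t, succ_c, n_c, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_treatment > rate_control | data)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Beta(1, 1) prior + binomial data -> Beta posterior for each arm.
        p_t = rng.betavariate(1 + succ_t, 1 + n_t - succ_t)
        p_c = rng.betavariate(1 + succ_c, 1 + n_c - succ_c)
        if p_t > p_c:
            wins += 1
    return wins / draws


def should_stop_early(succ_t, n_t, succ_c, n_c, threshold=0.99):
    """Stop for efficacy once the posterior probability clears the threshold."""
    return prob_treatment_better(succ_t, n_t, succ_c, n_c) >= threshold


# Hypothetical interim look: 30/40 responders on the drug vs 10/40 on placebo.
print(prob_treatment_better(30, 40, 10, 40))
print(should_stop_early(30, 40, 10, 40))
```

In practice this check would be repeated at each pre-planned interim analysis as new patient outcomes arrive.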