Hash Functions and Cryptographic Hashing

A Hash Function is a mathematical algorithm that maps data of arbitrary size to a fixed-size bit string (a hash value, digest, or simply a hash). They are the fundamental building blocks of modern computer science, powering everything from hash tables to blockchain ledgers.

1. Core Properties

A robust hash function must satisfy several critical properties:

Determinism: The same input must always produce the same output.
Uniformity: Outputs should be evenly distributed across the available hash space to minimize collisions.
Avalanche Effect: A tiny change in the input (even a single bit) should radically change the output digest. This ensures the output is unpredictable.
Speed: It should be computationally efficient to generate a hash.

2. Cryptographic vs. Non-Cryptographic

Hash functions bifurcate into two distinct categories based on their security guarantees.

Cryptographic Hash Functions

Designed to withstand adversarial attacks. They prioritize security over raw speed and enforce:

Pre-image Resistance: Given a hash h, it is computationally infeasible to find an input m such that hash(m) = h.
Collision Resistance: It is infeasible to find two different inputs m1 and m2 that hash to the same output.
Examples: SHA-256, SHA-3, BLAKE3.

Non-Cryptographic Hash Functions

Designed for raw speed and excellent distribution, but not security. They are vulnerable to intentional collision generation (HashDoS attacks).

Use Cases: Hash tables, caching, Bloom Filters.
Examples: MurmurHash3, xxHash, CityHash.

3. Application: Bloom Filters

One of the most elegant applications of non-cryptographic hash functions is the Bloom Filter—a space-efficient probabilistic data structure.

By passing an element through $k$ different hash functions and setting the corresponding bits in a bit array, a Bloom Filter can quickly determine set membership. It guarantees no false negatives (if it says an item is not present, it definitely isn't) but allows for a tunable rate of false positives. This makes them ideal for database query optimization and caching layers.