Fresh Food Demand Forecasting
Predicting demand for short-shelf-life products is critical because the penalty for error is immediate spoilage or lost sales.
Why Perishable Demand is Harder
Unlike durable goods, fresh food demand is highly volatile due to:
- Weather Sensitivity: Ice cream and salad sales spike on hot days; soup on cold days.
- Substitution Effects: If a store is out of strawberries, a customer might buy raspberries instead.
- Promotions: Discounts drive massive, localized demand spikes.
Classical vs. Machine Learning Methods
- Classical: SARIMAX (Seasonal ARIMA with eXogenous variables) and Holt-Winters provide solid baselines, effectively capturing weekly seasonality.
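\Phi(B)\Phi_s(B^s)(1-B)^d(1-B^s)^D y_t = \Theta(B)\Theta_s(B^s) \epsilon_t + \beta X_t
where B is the backshift operator, s is the seasonal period (7 for weekly seasonality in daily data), \Phi and \Phi_s are the non-seasonal and seasonal autoregressive polynomials, \Theta and \Theta_s are the corresponding moving-average polynomials, d and D are the non-seasonal and seasonal differencing orders, and X_t holds the exogenous regressors.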
- Machine Learning: Tree ensembles, including gradient boosted trees (LightGBM, XGBoost) and Random Forest, excel at capturing nonlinear interactions (e.g., a promotion on a Tuesday during a rainstorm). Despite the rise of deep learning, tree-based models often still outperform Transformers for short-horizon, daily fresh food ordering, because the inputs are structured, tabular features (calendar, weather, price) on which trees are robust and need comparatively little data and tuning.
- Deep Learning and Probabilistic Forecasting: Temporal Fusion Transformers (TFT) and other advanced architectures are increasingly used for multi-horizon forecasting. Many of these models output quantiles or full predictive distributions rather than a single point forecast, so prediction intervals can be linked directly to operational risk and inventory decisions.
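To make the classical baseline above concrete, here is a minimal pure-Python sketch of additive Holt-Winters with a weekly season. The function name, the initialization scheme, and the smoothing weights (alpha, beta, gamma) are illustrative assumptions rather than tuned values, and the sales history is made up; in practice a library implementation would normally be preferred.

```python
def holt_winters_additive(y, season_len=7, alpha=0.3, beta=0.05, gamma=0.2, horizon=7):
    """Additive Holt-Winters point forecasts (illustrative sketch).

    y          : historical daily demand, oldest first
    season_len : seasonal period (7 = weekly pattern in daily data)
    alpha, beta, gamma : smoothing weights for level, trend, and seasonality
                         (assumed values, not tuned)
    """
    if len(y) < 2 * season_len:
        raise ValueError("need at least two full seasons of history")

    # Initialize level, trend, and seasonal offsets from the first two seasons.
    first, second = y[:season_len], y[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [v - level for v in first]

    # Standard additive update equations, applied to the remaining history.
    for t in range(season_len, len(y)):
        s = seasonal[t % season_len]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (y[t] - level) + (1 - gamma) * s

    # h-step-ahead forecast: level + h * trend + matching seasonal offset.
    n = len(y)
    return [level + h * trend + seasonal[(n + h - 1) % season_len]
            for h in range(1, horizon + 1)]


# Hypothetical four weeks of daily unit sales with a weekend peak.
weekly_pattern = [80, 75, 78, 85, 110, 150, 140]
history = weekly_pattern * 4
forecast = holt_winters_additive(history)
```

On a perfectly repeating history like this one, the sketch simply reproduces the weekly pattern, which is a convenient sanity check before moving to real, noisy data.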
Feature Engineering for Fresh Food
Crucial features for an ML model predicting daily SKU-level demand:
- Temporal: Day-of-week, day-of-month, distance to next major holiday.
- Exogenous: Temperature, precipitation, local events.
- Pricing: Current price, competitor pricing, cannibalization features (prices of substitute fresh goods).
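The feature groups above can be sketched as a single row-building function. This is a minimal illustration: the function name, argument names, and the choice of a price ratio against the average substitute price as the cannibalization feature are assumptions, and the example values are made up.

```python
from datetime import date


def build_features(day, price, substitute_prices, holidays, temp_c, precip_mm):
    """Build one feature row for a store-SKU-day (hypothetical schema)."""
    # Temporal: distance to the next holiday on or after `day`.
    upcoming = [h for h in holidays if h >= day]
    days_to_holiday = (min(upcoming) - day).days if upcoming else None

    # Pricing: own price relative to the average substitute price.
    avg_sub = sum(substitute_prices) / len(substitute_prices) if substitute_prices else None

    return {
        "day_of_week": day.weekday(),          # Monday = 0 ... Sunday = 6
        "day_of_month": day.day,
        "days_to_next_holiday": days_to_holiday,
        "temp_c": temp_c,                      # exogenous weather inputs
        "precip_mm": precip_mm,
        "price": price,
        "price_ratio_vs_substitutes": price / avg_sub if avg_sub else None,
    }


# Example: strawberries at 3.00, with raspberries and blueberries as substitutes.
row = build_features(
    day=date(2024, 12, 20),
    price=3.00,
    substitute_prices=[4.00, 5.00],
    holidays=[date(2024, 12, 25), date(2025, 1, 1)],
    temp_c=2.5,
    precip_mm=0.0,
)
```

A ratio below 1 means the SKU is cheaper than its substitutes, which lets a tree model learn cannibalization effects without seeing every substitute price separately.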
Forecast Reconciliation
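Predictions are often generated at the store-SKU level, but purchasing decisions happen at the distribution center (DC) and category level. Hierarchical reconciliation adjusts the forecasts so that they are coherent, i.e. the store-level forecasts sum to the regional forecast. Coherent numbers keep every level of the network planning against the same demand signal, which helps limit bullwhip effects, although reconciliation alone does not eliminate them.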
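Two of the simplest reconciliation schemes can be sketched as follows: bottom-up (the regional number is defined as the sum of store forecasts) and proportional top-down (store forecasts are rescaled to match a separately produced regional forecast). The function names and figures are illustrative assumptions; more sophisticated schemes that weight levels by forecast error also exist and are not shown here.

```python
def reconcile_bottom_up(store_forecasts):
    """Regional forecast defined as the sum of store forecasts."""
    return sum(store_forecasts.values())


def reconcile_top_down(store_forecasts, regional_forecast):
    """Rescale store forecasts proportionally so they sum to the regional forecast."""
    total = sum(store_forecasts.values())
    if total == 0:
        # No information on the split: spread the regional forecast evenly.
        share = regional_forecast / len(store_forecasts)
        return {store: share for store in store_forecasts}
    scale = regional_forecast / total
    return {store: f * scale for store, f in store_forecasts.items()}


# Hypothetical store-level forecasts and an independent regional forecast.
stores = {"store_A": 40.0, "store_B": 60.0}
regional = 110.0

bottom_up_total = reconcile_bottom_up(stores)
adjusted = reconcile_top_down(stores, regional)
```

Bottom-up preserves store-level detail but inherits its noise; top-down trusts the smoother aggregate but can misallocate demand across stores.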
The Asymmetric Cost of Over- vs Under-Forecasting
As discussed in PerishableInventoryTheory, the cost of over-forecasting (c_o, leading to waste) is often higher than the cost of under-forecasting (c_u, lost margin).
Therefore, models are trained using an asymmetric loss function, such as the pinball loss for quantile regression:
L_\tau(y, \hat{y}) = \begin{cases} \tau (y - \hat{y}) & \text{if } y \geq \hat{y} \\ (1 - \tau) (\hat{y} - y) & \text{if } y < \hat{y} \end{cases}
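The cost-optimal quantile is the critical ratio \tau = c_u / (c_u + c_o), so \tau < 0.5 exactly when c_o > c_u. Setting \tau < 0.5 biases the model to under-forecast, explicitly trading off out-of-stocks against food waste (see FreshFoodWasteScience); setting \tau > 0.5 biases it to over-forecast.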
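The pinball loss defined above translates directly into code. The sketch below, using made-up demand values and illustrative function names, also shows the bias effect: with \tau = 0.25, the constant forecast that minimizes the average loss sits below the median of the sample.

```python
def pinball_loss(y, y_hat, tau):
    """Pinball (quantile) loss for a single observation."""
    diff = y - y_hat
    if diff >= 0:
        return tau * diff          # under-forecast: penalized by tau
    return (1 - tau) * (-diff)     # over-forecast: penalized by (1 - tau)


def mean_pinball_loss(ys, y_hat, tau):
    """Average pinball loss of a constant forecast y_hat over observations ys."""
    return sum(pinball_loss(y, y_hat, tau) for y in ys) / len(ys)


# Hypothetical daily demand observations (median = 100).
demand = [80, 90, 100, 110, 120]
tau = 0.25

# Search the observed values for the constant forecast with the lowest loss.
best = min(demand, key=lambda candidate: mean_pinball_loss(demand, candidate, tau))
```

Here `best` lands below the sample median, which is the under-forecasting bias that \tau < 0.5 is meant to produce.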
Worked Example:
A supermarket sells fresh baked croissants.
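- c_o = $1.00 per unit (waste cost)
- c_u = $1.50 per unit (lost profit margin)
The critical quantile is \tau = c_u / (c_u + c_o) = 1.50 / (1.50 + 1.00) = 0.60. Because c_u > c_o here, \tau is above 0.5 and the order is biased upward rather than downward.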
Using LightGBM with pinball loss at \tau = 0.60, the model forecasts the 60th percentile of demand, here 120 units, which serves directly as the order quantity. A model trained on MSE (Mean Squared Error) targets mean demand instead and might forecast 105 units. By integrating the inventory decision into the forecast objective, the store orders 120, balancing the risk of waste against capturing the upside profit.
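The arithmetic of the croissant example can be checked with a short sketch. The cost figures come from the example; the demand history is invented purely so that its mean is 105 and its 0.60 quantile is 120, and the empirical quantile stands in for the trained model's output. The function names are illustrative.

```python
import math


def critical_quantile(c_u, c_o):
    """Newsvendor critical ratio: the cost-optimal demand quantile."""
    return c_u / (c_u + c_o)


def empirical_quantile(samples, tau):
    """Smallest sample value with at least a tau share of samples at or below it."""
    ordered = sorted(samples)
    # Small epsilon guards against floating-point noise in tau * n.
    k = max(math.ceil(tau * len(ordered) - 1e-9) - 1, 0)
    return ordered[k]


c_o = 1.00  # waste cost per unsold croissant
c_u = 1.50  # lost margin per unmet unit of demand
tau = critical_quantile(c_u, c_o)

# Made-up demand history, chosen to match the figures in the example.
demand_history = [60, 70, 80, 90, 100, 120, 125, 130, 135, 140]

mean_demand = sum(demand_history) / len(demand_history)   # what an MSE model targets
order_quantity = empirical_quantile(demand_history, tau)  # what a pinball-loss model targets
```

The gap between `mean_demand` and `order_quantity` is the extra stock the store carries because a lost sale costs more than a wasted unit.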
Industry Comparisons
References