Global Cocoa Price Forecasting with Hybrid ML Framework | Henry Wei (魏炫麒)

Overview

Forecasting commodity prices requires models that can handle both long-term trends and sudden volatility shifts. This senior capstone project tackled global cocoa price prediction using over 6,000 daily observations spanning 1994 to 2024, developing a hybrid framework that prioritizes interpretability and statistical rigor over raw algorithmic complexity.

Motivation

My interest in this problem grew from a broader curiosity about how uncertainty impacts temporal dynamics. Commodity markets are a natural testbed: prices are driven by overlapping factors — weather, geopolitics, supply chain shocks — and any single modeling paradigm tends to fail under regime changes. The goal was not just to forecast prices, but to build a pipeline where each modeling decision was justified by residual diagnostics rather than arbitrary complexity.

Approach

Cocoa price forecasting approach

Phase 1 — ARIMA Baseline

Initial ARIMA-based models captured long-term trends effectively. However, during periods of high volatility — such as supply disruptions or speculative surges — the model consistently overpredicted, revealing a fundamental limitation of linear autoregressive assumptions.

Phase 2 — XGBoost

To address this, I transitioned to XGBoost for improved flexibility in capturing nonlinear relationships. While predictive accuracy improved, residual diagnostics showed that the model was still ignoring critical temporal dependencies — the structured autocorrelation in the residuals indicated that simply adding algorithmic complexity wasn’t enough.

Phase 3 — Hybrid Framework

Moving beyond raw algorithmic complexity, I designed a hybrid framework that integrated time-series structure (to respect the temporal ordering and autocorrelation) with external drivers (macroeconomic indicators, supply-side signals). The design prioritized interpretability: each component’s contribution could be isolated and validated, making the model’s behavior transparent to domain experts.

Diagnostics-Driven Iteration

Every model transition was motivated by careful residual analysis — checking for autocorrelation, heteroscedasticity, and distributional assumptions — rather than simply stacking more complex architectures. This approach ensured that improvements were statistically grounded.

Results

Achieved a final MAPE of 5.39%, significantly improving over both ARIMA and standalone XGBoost baselines
Successfully captured volatility regimes that pure autoregressive models missed
Demonstrated that model utility lies in interpretability and diagnostic-driven design, not just predictive accuracy

Key Takeaway

This project reinforced a core principle from my studies in nonlinear optimization and statistical inference: true model utility lies not in raw predictive performance, but in designing architectures that ensure reliability and interpretability. The diagnostics-driven workflow became a template for how I approach all subsequent modeling problems.