A Tutorial on Tree-based Time Series Forecasting

Chris Kuo/Dr. Dataman
Jan 19, 2024

Around 2020, a series of prize competitions for time series forecasting, known as the M competitions, was hosted by the International Institute of Forecasters in affiliation with the prestigious International Journal of Forecasting. The winning models were awarded prizes ranging from $2,000 to $25,000. The competitions have drawn researchers from around the world to invent new time series models.

Who were the winners? Surprisingly, the top-ranking models were dominated by tree-based machine learning methods. In particular, gradient-boosting models such as LightGBM (Ke et al., 2017) prevailed in the competitions, as documented in Januschowski et al. (2022). It is for this reason that I include tree-based time series forecasting in this book. Although I have introduced handy tools like Prophet and NeuralProphet, it is extremely valuable to understand tree-based models.

Now we venture into territory different from classical ARIMA-style time series modeling. A supervised learning model requires a target variable and features, yet a univariate time series seems to carry limited information. Can it supply enough features for a supervised learning model? A straightforward approach is to create many lagged variables. Next, consider the target: we use past values to predict future values, but is the forecast one-step-ahead or multi-step? It is precisely around these design questions that academic researchers and practitioners have built many successful models.
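To make this concrete, below is a minimal sketch of turning a univariate series into a supervised learning table: each row carries a few lagged values as features and the current value as the one-step-ahead target. The function name, the column names, and the choice of three lags are illustrative assumptions, not fixed conventions.

```python
import pandas as pd

def make_lag_features(series: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build a supervised learning table from a univariate series:
    the target y is the current value; lag_k is the value k steps back."""
    df = pd.DataFrame({"y": series})
    for k in range(1, n_lags + 1):
        df[f"lag_{k}"] = series.shift(k)  # value k steps in the past
    return df.dropna()  # the first n_lags rows lack a full set of lags

# Toy example; a real series would be longer and less regular.
s = pd.Series(range(100), index=pd.date_range("2024-01-01", periods=100, freq="D"))
supervised = make_lag_features(s, n_lags=3)
X, y = supervised.drop(columns="y"), supervised["y"]
```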

In this post, we will cover the fundamental steps:

  • Creating features from a univariate time series
  • Framing the supervised learning problem for a one-step-ahead forecast
  • Building LightGBM forecasting models (see the sketch after this list)
  • Providing model explainability (see the SHAP sketch after this list)

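As a preview of the modeling step, here is a minimal sketch of fitting a LightGBM regressor on a lag-feature table like the one built above. The X and y names, the 80/20 split, and the hyperparameters are illustrative assumptions, not prescriptions; the key point is that the holdout must be the most recent slice of the data, because shuffling would leak the future into training.

```python
import lightgbm as lgb

# Hold out the most recent 20% of rows; never shuffle a time series.
split = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Illustrative hyperparameters; tune these on a real dataset.
model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)
pred = model.predict(X_test)  # one-step-ahead forecasts on the holdout
```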
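For the explainability step, one widely used option for tree ensembles is SHAP. The snippet below is a sketch assuming the fitted model and X_test from the previous block; LightGBM's built-in feature importances are a lighter-weight alternative.

```python
import shap

# TreeExplainer is SHAP's fast explainer for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)  # which lags drive the forecasts
```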
These fundamental steps will prepare you for multi-step forecasting. We focus on LightGBM, but the procedure applies to other regressors such as Random Forest (RF), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). The Python notebook is available here for you to download.

This post and the next complete the tutorials for tree-based time series…


Chris Kuo/Dr. Dataman

The Dataman articles are my reflections on data science and teaching notes at Columbia University: https://sps.columbia.edu/faculty/chris-kuo