Chris Kuo/Dr. Dataman
2 min read · Jul 11, 2024


Great question. Indeed, it feels wasteful not to use the most recent data for model training. Let me answer with four related points (I usually bring up related points in my class Q&A).

(1) Time series modeling splits the data into an in-time window and an out-of-time (OOT) window. The in-time window is earlier and the OOT window is later. We want a stable model that captures the patterns in the in-time window so that it can predict the OOT window.

(2) Model stability means the model captures the important patterns that exist in all periods. If a new pattern appears only in the OOT window, the model does not capture it. The same is true of any ML model: a pattern that appears only in a particular test set is not captured. That is why there are data sampling procedures such as K-fold cross-validation, which mitigate the risk of a model relying on one particular dataset. See Chapter 2 for "model diagnostics for all periods".

(3) Now comes the most important question: "why don't we use the OOT window to build the model?" The answer is model stability. If a new pattern exists only in the OOT window, we cannot be sure it will continue in the future. If we use the OOT window for training, we may build a model that is too sensitive to the data in that window.

(4) What's the remedy? There are two. The first is to lengthen the in-time window and shorten the OOT window so that the model is trained on more of the recent data, provided enough data remain in the OOT window for validation. The second is to build a stable model with the in-time and OOT procedure and, once you are satisfied with it, re-fit the model on the entire data. We then need to compare the re-fitted model with the originally built model to see how the captured patterns differ. We do not want the re-fitted model to capture small new patterns that exist only in the OOT window; we simply re-estimate the parameters for the patterns that are already captured. Between the two remedies, my suggestion is the first one.
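To make the in-time/OOT procedure and the re-fit comparison concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the data are synthetic, a simple linear trend stands in for whatever forecasting model you actually use, and the helper names (`split_in_time_oot`, `fit_linear_trend`, `predict`) are hypothetical, not taken from the book or from any library.

```python
import numpy as np

def split_in_time_oot(y, oot_size):
    """Chronological split: earlier points are in-time, the last oot_size points are OOT."""
    if not 0 < oot_size < len(y):
        raise ValueError("oot_size must be between 1 and len(y) - 1")
    return y[:-oot_size], y[-oot_size:]

def fit_linear_trend(y):
    """Least-squares fit of y = intercept + slope * t with t = 0, 1, 2, ..."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)  # highest-degree coefficient first
    return slope, intercept

def predict(params, t):
    """Evaluate the fitted trend at time indices t."""
    slope, intercept = params
    return intercept + slope * t

# Synthetic series (an assumption for illustration): linear trend plus noise.
rng = np.random.default_rng(0)
t_all = np.arange(120)
y_all = 10.0 + 0.5 * t_all + rng.normal(0.0, 1.0, size=120)

# Build the model on the in-time window, validate on the OOT window.
in_time, oot = split_in_time_oot(y_all, oot_size=24)
built = fit_linear_trend(in_time)
oot_t = np.arange(len(in_time), len(y_all))
oot_mae = np.mean(np.abs(oot - predict(built, oot_t)))

# Once satisfied, re-fit on the entire data and compare with the built model.
refit = fit_linear_trend(y_all)
slope_shift = abs(refit[0] - built[0])

print(f"OOT mean absolute error: {oot_mae:.3f}")
print(f"built slope: {built[0]:.4f}, re-fitted slope: {refit[0]:.4f}")
print(f"slope shift after re-fit: {slope_shift:.4f}")
```

A small slope shift suggests the re-fit only re-estimated a pattern the built model had already captured. A large shift would be a hint that the re-fitted model is reacting to something that exists only in the OOT window. Changing `oot_size` is the knob for the first remedy: a smaller value gives the model more recent data while leaving fewer points for validation.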
