layout: true

---
class: inverse, center, middle
background-image: url(figs/titlepage16-9.png)
background-size: cover

<br>

# Forecast with forecasts:
# Diversity matters

<img src="figs/slides.png" width="150px"/>

#### *Yanfei Kang | ISF2020 | 26 Oct 2020*

---
class: inverse, center, middle

> "In combining the results of these two methods, one can obtain a result whose probability law of error will be more rapidly decreasing."

<img src="figs/laplace.jpg" width="200px"/>

> — *Pierre-Simon Laplace*

---
# Model combination

Model combination helps improve the overall performance in

- **Regression** (e.g., Mendes-Moreira, Soares, Jorge, and Sousa, 2012)
- **Classification** (e.g., Kuncheva and Whitaker, 2003)
- **Anomaly detection** (e.g., Perdisci, Gu, and Lee, 2006)
- **Forecasting** (e.g., Thomson, Pollock, Önkal, and Gönül, 2019)

---
class: inverse, center, middle

# A look back at forecast combination

---
# A look back at forecast combination

- The seminal work: Bates and Granger (1969)
    - Forecast combination can improve forecasting accuracy, provided that the sets of forecasts contain some independent information.
- Thereafter, a variety of weighting methods
    - Error-based approaches (e.g., Winkler and Makridakis, 1983)
    - Regression-based approaches (e.g., Granger and Ramanathan, 1984)
    - Bayesian averaging
    - etc.

---
# Feature-based, **OF COURSE!**

- Pioneering studies: rule-based methods (e.g., Wang, Smith-Miles, and Hyndman, 2009)
- More recently,
    - `FFORMA` (Montero-Manso, Athanasopoulos, Hyndman, and Talagala, 2020): 42 features + `XGBoost`
    - `gratis` (Kang, Hyndman, and Li, 2020): time series generation + 26 features + nonlinear regression models
    - Forecasting with imaging (Li, Kang, and Li, 2020): imaging features + `XGBoost`
    - `fuma` (Wang, Kang, Petropoulos, and Li, 2019): 42 features + `gam` + **interval forecasting**

---
# Challenges

- The choice and estimation of time series features.
    - The number of candidate time series features ranges from tens to thousands (Fulcher and Jones, 2014).
- Features are extracted from the historical, observed data of each time series, yet that history is doomed to *change* as new observations arrive.
- Feature extraction may not be robust when historical data are limited.
- A large number of chosen features increases the computational time.

---
class: inverse, center, middle

# Historical data `\(\Rightarrow\)` Produced forecasts

--

# Forecast with forecasts

--

# Diversity matters

---
# Diversity matters?

.pull-left[
<img src="figure/unnamed-chunk-3-1.svg" width="576" style="display: block; margin: auto;" />
]

.pull-right[

```
#> # A tibble: 9 x 2
#>   Method            MASE
#>   <chr>            <dbl>
#> 1 auto_arima_forec 0.363
#> 2 nnetar_forec     0.4
#> 3 tbats_forec      0.408
#> 4 snaive_forec     0.412
#> 5 ets_forec        0.435
#> 6 naive_forec      0.53
#> 7 rw_drift_forec   0.603
#> 8 thetaf_forec     0.744
#> 9 stlm_ar_forec    0.844
```
]

---
# Combining two methods

.pull-left[
]

.pull-right[

```
#> # A tibble: 9 x 2
#>   Method            MASE
#>   <chr>            <dbl>
#> 1 auto_arima_forec 0.363
#> 2 nnetar_forec     0.4
#> 3 tbats_forec      0.408
#> 4 snaive_forec     0.412
#> 5 ets_forec        0.435
#> 6 naive_forec      0.53
#> 7 rw_drift_forec   0.603
#> 8 thetaf_forec     0.744
#> 9 stlm_ar_forec    0.844
```
]

---
# Combining two methods

.pull-left[
]

.pull-right[

```
#> # A tibble: 9 x 2
#>   Method            MASE
#>   <chr>            <dbl>
#> 1 auto_arima_forec 0.363
#> 2 nnetar_forec     0.4
#> 3 tbats_forec      0.408
#> 4 snaive_forec     0.412
#> 5 ets_forec        0.435
#> 6 naive_forec      0.53
#> 7 rw_drift_forec   0.603
#> 8 thetaf_forec     0.744
#> 9 stlm_ar_forec    0.844
```
]

---
# Combining two methods

.pull-left[
]

.pull-right[

```
#> # A tibble: 9 x 2
#>   Method            MASE
#>   <chr>            <dbl>
#> 1 auto_arima_forec 0.363
#> 2 nnetar_forec     0.4
#> 3 tbats_forec      0.408
#> 4 snaive_forec     0.412
#> 5 ets_forec        0.435
#> 6 naive_forec      0.53
#> 7 rw_drift_forec   0.603
#> 8 thetaf_forec     0.744
#> 9 stlm_ar_forec    0.844
```
]

---
# Yes, diversity matters!

<img src="figure/unnamed-chunk-11-1.svg" width="864" style="display: block; margin: auto;" />

---
# Yes, diversity matters!

<img src="figure/unnamed-chunk-12-1.svg" width="864" style="display: block; margin: auto;" />

---
class: inverse, center, middle

# How to "forecast with forecasts"?

---
# How to use diversity to forecast?

- Input: the forecasts from a pool of models.
- Measure their diversity, a feature that has been identified as a decisive factor in forecast combination (Thomson, Pollock, Önkal, et al., 2019; Lichtendahl and Winkler, 2020).
- Through meta-learning, we link the diversity of the forecasts with their out-of-sample performance, and fit combination models based on diversity alone.

---
# Measuring diversity

.small[
$$
`\begin{aligned}
MSE_{comb} & = \frac{1}{H}\sum_{h = 1}^H MSE_{ch} \\
& = \frac{1}{H}\sum_{h = 1}^H \left(\frac{1}{M}\sum_{i = 1}^M f_{ih} - y_{T+h}\right)^2\\
& = \frac{1}{H} \frac{1}{M} \sum_{h=1}^{H} \sum_{i=1}^{M}(f_{ih}-y_{T+h})^2 - \frac{1}{H} \frac{1}{M^2} \sum_{h=1}^{H} \sum_{i=1}^{M-1} \sum_{j=1,j>i}^{M}(f_{ih}-f_{jh})^2 \\
& = \frac{1}{M} \sum_{i=1}^{M} MSE_i - \frac{1}{M^2} \sum_{i=1}^{M-1} \sum_{j=1,j>i}^{M}Div_{i,j},
\end{aligned}`
$$

`\(H\)` is the forecasting horizon, `\(M\)` is the number of forecasting methods, `\(T\)` is the historical length (Thomson, Pollock, Önkal, et al., 2019).
]

---
# Measuring diversity

$$
`\begin{aligned}
MSE_i &= \frac{1}{H} \sum_{h=1}^{H} (f_{ih}-y_{T+h})^2,\\
Div_{i,j} &= \frac{1}{H} \sum_{h=1}^{H} (f_{ih}-f_{jh})^2, \\
\\
\\
sDiv_{i,j} &= \frac{\frac{1}{H}\sum\limits_{h=1}^H(f_{ih}-f_{jh})^2}{\left(\frac{1}{T}\sum\limits_{t=1}^T|y_t|\right)^2}.
\end{aligned}`
$$

A short R sketch of these quantities is given in the appendix.

---
# Diversity for forecast combination

<img src="figs/diversity-extraction.png"/>

---
# The forecasting framework

<center>
<img src="figs/framework.png" width="800px"/>
</center>

---
class: inverse, center, middle

# Does it work?

---
# Data used: M4

<center>
<img src="figs/M4.png" width="900px"/>
</center>

---
# Methods used

<center>
<img src="figs/methods.png" width="900px"/>
</center>

---
# Forecasting with logistic regression

<center>
<img src="figs/MNL.png" width="650px"/>
</center>

---
# Forecasting with XGBoost

<center>
<img src="figs/XGBoost.png" width="500px"/>
</center>

---
# Does it work?

- We empirically show that this single feature, diversity, is enough.
- Moreover, expanding the list of other time series features to include diversity further enhances the forecasting performance.
- Computational time: 1 minute for the diversity features, versus 44 minutes to extract the 42 features for M4 (32 cores).
- We find that the performance of the proposed method relies on the length of the forecasting horizon.

---
# Conclusion

- Forecasters face the challenge of selecting an appropriate set of time series features.
- Features are unreliable when historical data are limited.
- We propose to forecast with forecasts, without manually choosing time series features, and achieve comparable performance.
- The proposed diversity-based forecast combination automatically controls the combination by measuring the pairwise diversity between forecasts and linking it, via a meta-learner, to the accuracy on a test set (see the R sketches in the appendix).
- Our approach does not rely on particular families of forecasting models; it can equally be applied to both statistical and judgmental forecasts.
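---
# Appendix: computing `sDiv` in R

A minimal sketch of the pairwise scaled diversity `sDiv`. The names `forecasts` (an `H x M` matrix holding the pool's forecasts) and `y` (the historical series of length `T`) are illustrative, not taken from the paper's code.

```r
# Pairwise scaled diversity: sDiv[i, j] = mean((f_i - f_j)^2) / mean(|y_t|)^2
sdiv_matrix <- function(forecasts, y) {
  M <- ncol(forecasts)
  scaling <- mean(abs(y))^2            # (1/T * sum |y_t|)^2
  sdiv <- matrix(0, M, M)
  for (i in seq_len(M - 1)) {
    for (j in (i + 1):M) {
      d <- mean((forecasts[, i] - forecasts[, j])^2) / scaling
      sdiv[i, j] <- sdiv[j, i] <- d    # symmetric by construction
    }
  }
  sdiv
}
```

The `M(M-1)/2` upper-triangular entries form the feature vector fed to the meta-learner.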
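---
# Appendix: checking the decomposition

A numerical check of the accuracy-diversity identity on simulated data, assuming an equal-weight combination; all object names are illustrative.

```r
set.seed(42)
H <- 12; M <- 5
y <- rnorm(H)                        # "future" values y_{T+1}, ..., y_{T+H}
f <- matrix(rnorm(H * M), H, M)      # forecasts f_{ih}, one column per method

mse_comb <- mean((rowMeans(f) - y)^2)        # MSE of the simple average
mse_i <- colMeans((f - y)^2)                 # per-method MSE_i
div <- outer(seq_len(M), seq_len(M),
             Vectorize(function(i, j) mean((f[, i] - f[, j])^2)))
rhs <- mean(mse_i) - sum(div[upper.tri(div)]) / M^2

all.equal(mse_comb, rhs)             # TRUE: average accuracy minus diversity
```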
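---
# Appendix: a meta-learning sketch

A hypothetical sketch of the XGBoost meta-learner: per-series `sDiv` features as predictors, the best-performing pool member as the label, and predicted class probabilities as combination weights. The data here are simulated placeholders, not the paper's pipeline.

```r
library(xgboost)

set.seed(1)
N <- 200; M <- 4                       # training series, pool size
P <- M * (M - 1) / 2                   # number of pairwise sDiv features
X_div <- matrix(runif(N * P), N, P)    # placeholder sDiv feature matrix
best  <- sample(0:(M - 1), N, TRUE)    # placeholder best-method labels

dtrain <- xgb.DMatrix(X_div, label = best)
fit <- xgb.train(params = list(objective = "multi:softprob", num_class = M),
                 data = dtrain, nrounds = 50)

# Class probabilities, one row per series, act as combination weights.
w <- matrix(predict(fit, dtrain), ncol = M, byrow = TRUE)
rowSums(w)[1:3]                        # each row of weights sums to 1
```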
---
class: center, middle

<br>

# Thanks!

<img src="figs/wei.jpg" width="200px"/> <img src="figs/feng.jpg" width="200px"/> <img src="figs/fotios.jpg" width="200px"/>

## `Web`: *http://yanfei.site*
## `Lab`: *http://kllab.org*