layout: true --- class: split-10 hide-slide-number background-image: url(figs/titlepage_buaa.png) background-size: cover .column[ ] .column[ .sliderbox.vtop.shade_main.right[ .font2[Feature-based time series forecasting] .font1[*Yanfei Kang | ICDM | 7 Dec 2021*] ]] --- class: center, split-60 hide-slide-number background-image: url(figs/generation.jpg) background-size: cover .column[ .sliderbox.vmiddle.shade_main.center[ .font2[Modern time series data]]] .column[ ] --- class: center, hide-slide-number background-image: url(figs/retail.jpeg) background-size: cover # Retail --- class: center, hide-slide-number background-image: url("figs/data-center.jpeg") background-size: cover # Web traffic --- class: center, hide-slide-number background-image: url("figs/smart-meter.jpeg") background-size: cover # Smart meter --- class: inverse, left, middle *"If we know that learning algorithm `\(A\)` is superior to `\(B\)` averaged over some set of targets `\(F\)`, then the No Free Lunch theorems tell us that `\(B\)` must be superior to `\(A\)` if one averages over all targets not in `\(F\)`. This is true even if algorithm `\(B\)` is the algorithm of purely random guessing." * .left[-- Wolpert (1996)] --- class: inverse, left, middle *“The No Free Lunch Theorem argues that, without having substantive information about the modeling problem, there is no single model that will always do better than any other model.”* .left[-- Kuhn and Johnson (2013)] --- # Algorithm selection problem - Using measurable features of the problem instances to **predict which algorithm is likely to perform best**. - Applied to e.g., classification, regression, constraint satisfaction, forecasting and optimization (Smith-Miles, 2009). --- # Forecasting - One way to forecast: manually inspect the time series `\(\rightarrow\)` understand its characteristics `\(\rightarrow\)` manually select an appropriate method according to the forecaster’s experience. - Not scalable for **large collections of series**. 
- **An automatic framework** is needed!
- Pioneering studies: rule-based methods (e.g., Collopy and Armstrong, 1992; Wang, Smith-Miles, and Hyndman, 2009).
- More recently: **feature-based forecasting** via meta-learning.

---
class: center, split-60 hide-slide-number
background-image: url(figs/generation.jpg)
background-size: cover
.column[
.sliderbox.vmiddle.shade_main.center[
.font2[Feature-based forecasting]
.white.left[*Framework*]]]
.column[
]

---
class: split-two white
.column.bg-indigo[.content.vmiddle.center[
### .white[Raw data]
<img src="figure/unnamed-chunk-2-1.svg" width="504" style="display: block; margin: auto;" />
]]
.column[.content.vmiddle.center[
### Feature representation
<img src="figure/unnamed-chunk-3-1.svg" width="504" style="display: block; margin: auto;" />
]]

---
class: split-two white
.column.bg-indigo[.content.vmiddle.center[
### .white[Raw data]
<img src="figure/unnamed-chunk-4-1.svg" width="504" style="display: block; margin: auto;" />
]]
.column[.content.vmiddle.center[
### Feature representation
<img src="figure/unnamed-chunk-5-1.svg" width="504" style="display: block; margin: auto;" />
]]

---
# More features
.content-box-gray[
### STL-decomposition-based features
By STL, `\(x_t = S_t + T_t + R_t\)`.
1. Strength of trend: `\(F_1 = 1 - \frac{\text{var}(R_t)}{\text{var}(x_t - S_t)}.\)`
2. Strength of seasonality: `\(F_2 = 1 - \frac{\text{var}(R_t)}{\text{var}(x_t - T_t)}.\)`
3. etc.
]
.content-box-green[
### More available at
- [`tsfeatures`](https://cran.r-project.org/web/packages/tsfeatures/index.html) (Hyndman, Kang, Montero-Manso, Talagala, Wang, Yang, and O'Hara-Wild, 2020)
- [`feasts`](https://feasts.tidyverts.org/) (O'Hara-Wild, Hyndman, and Wang, 2021)
]

---
class: split-33 white
.column.bg-indigo[.content.vmiddle.center[
### .white[Raw data]
<img src="figure/unnamed-chunk-6-1.svg" width="504" style="display: block; margin: auto;" />
.tiny[From M3.]
]]
.column[.content.vmiddle[
### Feature extraction

```r
library(tsfeatures)
M3.selected %>%
  tsfeatures(c("frequency", "stl_features", "entropy", "acf_features",
               "pacf_features", "heterogeneity", "hw_parameters", "lumpiness",
               "stability", "max_level_shift", "max_var_shift", "unitroot_pp",
               "unitroot_kpss", "hurst", "crossing_points"))
#> # A tibble: 100 × 41
#> frequency nperiods seasonal_period trend spike linearity curvature e_acf1 e_acf10 seasonal_strength peak trough
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 1 4 0.973 2.70e- 7 -6.42 0.118 -0.0248 0.326 0.140 1 3
#> 2 4 1 4 0.535 7.58e- 4 -0.338 -1.58 -0.198 0.352 0.148 4 1
#> 3 4 1 4 0.979 3.86e- 7 6.36 0.731 -0.504 0.670 0.0657 1 4
#> 4 12 1 12 0.158 2.47e- 4 -1.60 -1.60 -0.127 0.250 0.316 10 2
#> 5 4 1 4 1.00 2.37e-10 -6.96 1.26 0.196 0.148 0.0975 3 2
#> 6 4 1 4 0.949 4.63e- 6 -1.70 -2.64 -0.164 0.511 0.0996 4 4
#> 7 4 1 4 0.998 6.41e- 9 6.11 1.43 0.174 0.176 0.0713 1 2
#> 8 4 1 4 0.967 1.29e- 6 6.28 1.21 -0.381 0.654 0.198 2 3
#> 9 4 1 4 0.609 2.05e- 4 1.99 0.160 -0.418 0.456 0.707 1 4
#> 10 4 1 4 0.950 3.63e- 6 3.08 -3.72 -0.135 0.366 0.569 1 4
#> # … with 90 more rows, and 29 more variables: entropy <dbl>, x_acf1 <dbl>, x_acf10 <dbl>, diff1_acf1 <dbl>,
#> #   diff1_acf10 <dbl>, diff2_acf1 <dbl>, diff2_acf10 <dbl>, seas_acf1 <dbl>, x_pacf5 <dbl>, diff1x_pacf5 <dbl>,
#> #   diff2x_pacf5 <dbl>, seas_pacf <dbl>, arch_acf <dbl>, garch_acf <dbl>, arch_r2 <dbl>, garch_r2 <dbl>, alpha <dbl>,
#> #   beta <dbl>, gamma <dbl>, lumpiness <dbl>, stability <dbl>, max_level_shift <dbl>, time_level_shift <dbl>,
#> #   max_var_shift <dbl>, time_var_shift <dbl>, unitroot_pp <dbl>, unitroot_kpss <dbl>, hurst <dbl>,
#> #   crossing_points <dbl>
```
]]

---
# PCA `\(\Rightarrow\)` 2d
<center>
<img src="figs/instancespace.png" height="550px"/>
</center>

---
# The framework for feature-based forecasting
<center>
<img src="figs/forecastdiag.png" height="550px"/>
</center>

---
class: split-40
# Challenge 1: Training data
.column[.content.vmiddle[
We need diverse training data.
<center>
<img src="figs/forbes.png" height="350px"/>
</center>
]]
--
.column[.content.vmiddle[
.content-box-gray[
- Yanfei Kang, Rob J. Hyndman, Kate Smith-Miles (2017). Visualising Forecasting Algorithm Performance using Time Series Instance Space, *International Journal of Forecasting* 33(2): 345–358.
- Yanfei Kang, Rob J. Hyndman, Feng Li (2020). GRATIS: GeneRAting TIme Series with diverse and controllable characteristics, *Statistical Analysis and Data Mining* 13(4): 354–376.
]]]

---
# Challenge 2: Features
- Features are manually chosen and estimated, and candidate sets range from tens to thousands (Fulcher and Jones, 2014).
- Extracted from historical data, which are bound to *change*.
- Not robust when historical data are limited.
--

.content-box-gray[
- Xixi Li, Yanfei Kang, Feng Li (2020). Forecasting with time series imaging, *Expert Systems with Applications* 160: 113680.
- Yanfei Kang, Wei Cao, Fotios Petropoulos, Feng Li (2021). Forecast with forecasts: Diversity matters, *European Journal of Operational Research*.
]

---
# Other challenges
.content-box-gray[Meta-learners]
.content-box-green[Intermittent data]
.content-box-red[Uncertainty estimation]
--

.content-box-gray.small[
- Thiyanga S. Talagala, Feng Li, Yanfei Kang (2021). FFORMPP: Feature-based forecast model performance prediction, *International Journal of Forecasting*.
- Evangelos Theodorou, Shengjie Wang, Yanfei Kang, Evangelos Spiliotis, Spyros Makridakis, Vassilios Assimakopoulos (2021). Exploring the representativeness of the M5 competition data, *International Journal of Forecasting*.
- Xiaoqian Wang, Yanfei Kang, Fotios Petropoulos, Feng Li (2021). The uncertainty estimation of feature-based forecast combinations, *Journal of the Operational Research Society*.
- Li Li, Yanfei Kang, Feng Li (2021). Bayesian forecast combination using time-varying features. [Working paper](https://arxiv.org/abs/2108.02082).
]

---
class: center, split-60 hide-slide-number
background-image: url(figs/generation.jpg)
background-size: cover
.column[
.sliderbox.vmiddle.shade_main.center[
.font2[Training data generation]]]
.column[
]

---
# Gaussian Mixture Autoregressive (MAR) models
.content-box-gray[
### `\(\text{MAR}(K;p_1, \cdots, p_K)\)` model
`\(x_t=\phi_{k0}+\phi_{k1}x_{t-1}+\cdots+\phi_{kp_k}x_{t-p_k}+\epsilon_t, \epsilon_t \sim N(0, \sigma_k^2)\)` <br> with probability `\(\alpha_k\)`, where `\(\sum_{k=1}^K \alpha_k = 1\)`.
]
.content-box-green[
### Merits of MAR models ✨
- Consist of multiple stationary or non-stationary autoregressive components.
- Can capture many (or any) time series features, e.g., non-stationarity, nonlinearity, non-Gaussianity, cycles and heteroskedasticity.
]

---
class: split-two
# What do they look like?
.column[.content.vmiddle[
### Yearly

```r
*library(gratis)
library(feasts)
set.seed(2021)
*mar_model(seasonal_periods=1) %>%
  generate(length=30, nseries=2) %>%
  autoplot(value) +
  theme(legend.position="none")
```
]]
.column[.content.vmiddle[
<img src="figure/unnamed-chunk-8-1.svg" width="504" style="display: block; margin: auto;" />
]]

---
class: split-two
# What do they look like?
.column[.content.vmiddle[
### Quarterly

```r
library(gratis)
library(feasts)
set.seed(2021)
*mar_model(seasonal_periods=4) %>%
  generate(length=60, nseries=2) %>%
  autoplot(value) +
  theme(legend.position="none")
```
]]
.column[.content.vmiddle[
<img src="figure/unnamed-chunk-9-1.svg" width="504" style="display: block; margin: auto;" />
]]

---
class: split-two
# What do they look like?
.column[.content.vmiddle[
### Monthly

```r
library(gratis)
library(feasts)
set.seed(2021)
*mar_model(seasonal_periods=12) %>%
  generate(length=120, nseries=2) %>%
  autoplot(value) +
  theme(legend.position="none")
```
]]
.column[.content.vmiddle[
<img src="figure/unnamed-chunk-10-1.svg" width="504" style="display: block; margin: auto;" />
]]

---
# Visualisation in 2D space
<center>
<img src="figs/coverage.png" height="550px"/>
</center>

---
class: split-two
# Efficient time series generation with target features
.column[.content.vmiddle[
Practitioners in certain areas may be interested in **only a subset of features**.
- Heteroskedasticity and volatility in financial time series.
- Peaks and spikes in energy time series.
]]
--
.column[.content.vmiddle[
How?
- A Genetic Algorithm (GA) to evolve time series of length `\(n\)`.
- A GA to tune the MAR model parameters `\(\Theta = (\alpha_k, \phi_i)\)`.
]]

---
class: split-two
# Time series generation with target features
.column[.content.vmiddle[

```r
set.seed(2021)
generate_ts_with_target(
  n = 1,
  ts.length = 120,
  freq = 12,
  seasonal = 1,
  features = c('stl_features'),
* selected.features = c('seasonal_strength', 'trend'),
* target = c(0.9, 0.9)) %>%
  autoplot()
```
.small[Web application available at: https://ebsmonash.shinyapps.io/tsgeneration/.]
]]
.column[.content.vmiddle[
<img src="figure/unnamed-chunk-11-1.svg" width="504" style="display: block; margin: auto;" />
]]

---
class: center, split-66 hide-slide-number
background-image: url(figs/generation.jpg)
background-size: cover
.column[
.sliderbox.vmiddle.shade_main.center[
.font2[Automatic feature extraction]
.white.left[*Time series imaging*]
]]
.column[
]

---
class: split-two
# Forecasting with time series imaging
.column[.content.vmiddle[
<center>
<img src="figs/imaging.png" height="500px"/>
</center>
]]
.column[.content.vmiddle[
- First, transform the time series into recurrence plots, from which local features can be extracted using computer vision algorithms.
- An SBoF (spatial bag-of-features) model and deep neural networks are used to extract the features.
- The extracted features are used for forecast combination.
- We obtain performance highly comparable to the top methods in the M4 and tourism competitions.
]]

---
class: center, split-66 hide-slide-number
background-image: url(figs/generation.jpg)
background-size: cover
.column[
.sliderbox.vmiddle.shade_main.center[
.font2[Automatic feature extraction]
.white.left[*Forecast diversity*]
]]
.column[
]

---
# How to use diversity to forecast?
- Input: the forecasts from a pool of models.
- Measure their diversity, a **feature** that has been identified as a decisive factor in forecast combination (Thomson, Pollock, Onkal, and Gonul, 2019; Lichtendahl and Winkler, 2020).
- Through meta-learning, we link the diversity of the forecasts with their out-of-sample performance to fit combination models based on diversity.

---
# Diversity matters?
.pull-left[
<img src="figure/unnamed-chunk-13-1.svg" width="576" style="display: block; margin: auto;" />
]
.pull-right[
```
#> # A tibble: 9 × 2
#>   Method            MASE
#>   <chr>            <dbl>
#> 1 auto_arima_forec 0.363
#> 2 nnetar_forec     0.396
#> 3 tbats_forec      0.408
#> 4 snaive_forec     0.412
#> 5 ets_forec        0.435
#> 6 naive_forec      0.53
#> 7 rw_drift_forec   0.603
#> 8 thetaf_forec     0.744
#> 9 stlm_ar_forec    0.822
```
]

---
# Combining two methods
.pull-left[
]
.pull-right[
```
#> # A tibble: 9 × 2
#>   Method            MASE
#>   <chr>            <dbl>
#> 1 auto_arima_forec 0.363
#> 2 nnetar_forec     0.396
#> 3 tbats_forec      0.408
#> 4 snaive_forec     0.412
#> 5 ets_forec        0.435
#> 6 naive_forec      0.53
#> 7 rw_drift_forec   0.603
#> 8 thetaf_forec     0.744
#> 9 stlm_ar_forec    0.822
```
]

---
# Combining two methods
.pull-left[
]
.pull-right[
```
#> # A tibble: 9 × 2
#>   Method            MASE
#>   <chr>            <dbl>
#> 1 auto_arima_forec 0.363
#> 2 nnetar_forec     0.396
#> 3 tbats_forec      0.408
#> 4 snaive_forec     0.412
#> 5 ets_forec        0.435
#> 6 naive_forec      0.53
#> 7 rw_drift_forec   0.603
#> 8 thetaf_forec     0.744
#> 9 stlm_ar_forec    0.822
```
]

---
# Combining two methods
.pull-left[
]
.pull-right[
```
#> # A tibble: 9 × 2
#>   Method            MASE
#>   <chr>            <dbl>
#> 1 auto_arima_forec 0.363
#> 2 nnetar_forec     0.396
#> 3 tbats_forec      0.408
#> 4 snaive_forec     0.412
#> 5 ets_forec        0.435
#> 6 naive_forec      0.53
#> 7 rw_drift_forec   0.603
#> 8 thetaf_forec     0.744
#> 9 stlm_ar_forec    0.822
```
]

---
# Yes, diversity matters!
<img src="figure/unnamed-chunk-21-1.svg" width="864" style="display: block; margin: auto;" />

---
# Yes, diversity matters!
<img src="figure/unnamed-chunk-22-1.svg" width="864" style="display: block; margin: auto;" />

---
# Measuring diversity
.small[
$$
`\begin{aligned}
MSE_{comb} & = \frac{1}{H} \sum_{h=1}^{H}\left( \sum_{i=1}^{M}w_if_{ih} - y_{T+h}\right)^2 \\
& = \frac{1}{H}\sum_{h=1}^{H}\left[ \sum_{i=1}^{M}w_i(f_{ih} - y_{T+h})^2 - \sum_{i=1}^{M}w_i(f_{ih} - f_{ch})^2\right] \\
& = \frac{1}{H}\sum_{h=1}^{H}\left[\sum_{i=1}^{M}w_i(f_{ih} - y_{T+h})^2 - \sum_{i=1}^{M-1} \sum_{j=i+1}^{M}w_iw_j(f_{ih}-f_{jh})^2\right] \\
& = \sum_{i=1}^{M}w_i MSE_i - \sum_{i=1}^{M-1} \sum_{j=i+1}^{M}w_iw_jDiv_{i,j},
\end{aligned}`
$$
`\(H\)` is the forecasting horizon, `\(M\)` is the number of forecasting methods, `\(T\)` is the historical length, `\(f_{ch} = \sum_{i=1}^{M}w_if_{ih}\)` is the combined forecast, and the weights satisfy `\(\sum_{i=1}^{M}w_i = 1\)`.
]

---
# Measuring diversity
$$
`\begin{aligned}
Div_{i,j} & = \frac{1}{H} \sum_{h=1}^{H} (f_{ih}-f_{jh})^2, \\
sDiv_{i,j} &= \frac{\sum\limits_{h=1}^H(f_{ih}-f_{jh})^2}{\sum\limits_{i=1}^{M-1}\sum\limits_{j=i+1}^M\left[\sum\limits_{h=1}^H(f_{ih}-f_{jh})^2\right]}.
\end{aligned}`
$$

---
# Diversity for forecast combination
<img src="figs/diversity-extraction.jpg"/>

---
# Method pool
<center>
<img src="figs/methods.png" height="500px"/>
</center>

---
# Forecasting M4
100,000 series with various frequencies.
<center>
<img src="figs/XGBoost.png" width="800px"/>
</center>

---
# Forecasting FMCG
Sales of fast-moving consumer goods (FMCG) from a major North American food manufacturer.
- Monthly data.
- April 2013 to June 2017.
<center> <img src="figs/FMCG.png" width="800px"/> </center> --- # Other challenges .content-box-gray[Meta-learners] .content-box-green[Intermittent data] .content-box-red[Uncertainty estimation] .content-box-gray.small[ - Thiyanga S. Talagala, Feng Li, Yanfei Kang (2021). FFORMPP: Feature-based forecast model performance prediction, *International Journal of Forecasting*. - Evangelos Theodorou, Shengjie Wang, Yanfei Kang, Evangelos Spiliotis, Spyros Makridakis, Vassilios Assimakopoulos (2021). Exploring the representativeness of the M5 competition data, *International Journal of Forecasting*. - Xiaoqian Wang, Yanfei Kang, Fotios Petropoulos, Feng Li (2021). The uncertainty estimation of feature-based forecast combinations, *Journal of the Operational Research Society*. - Li Li, Yanfei Kang, Feng Li (2021). Bayesian forecast combination using time-varying features. [Working paper](https://arxiv.org/abs/2108.02082). ] --- class: center, middle # Thanks! ### `Web`: *http://yanfei.site* ### `Lab`: *http://kllab.org* <center> <img src="figs/rob.png" width="200px"/> <img src="figs/kate.jpeg" width="200px"/> <img src="figs/fotios.jpg" width="200px"/> <img src="figs/feng.jpg" width="200px"/> <img src="figs/vangelis.jpeg" width="200px"/> <br> <img src="figs/thiyanga.jpg" width="200px"/> <img src="figs/xixi.png" width="200px"/> <img src="figs/wei.jpg" width="200px"/> <img src="figs/xiaoqian.png" width="200px"/> <img src="figs/lili.jpeg" width="200px"/> </center>
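
---
# Appendix: strength of trend and seasonality, by hand

The feature extraction in these slides uses the R package `tsfeatures`. As a rough standalone sketch of the two STL-based strengths `\(F_1\)` and `\(F_2\)` defined earlier — not the `tsfeatures` implementation, and with a simple moving-average decomposition standing in for STL — the idea fits in a few lines of Python/numpy:

```python
import numpy as np

def decompose_additive(x, period):
    """Crude additive decomposition x_t = S_t + T_t + R_t (a stand-in for STL):
    trend from a centred moving average, seasonal from per-period means."""
    kernel = np.ones(period) / period
    if period % 2 == 0:                      # use a 2 x m moving average for even periods
        kernel = np.convolve(kernel, [0.5, 0.5])
    k = len(kernel)
    trend = np.convolve(x, kernel, mode="valid")           # interior estimates only
    xi = np.asarray(x, dtype=float)[(k - 1) // 2:(k - 1) // 2 + len(trend)]
    detrended = xi - trend
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.resize(seasonal - seasonal.mean(), len(xi))  # recycle one cycle
    return xi, seasonal, trend, xi - trend - seasonal      # x, S, T, R

def strengths(x, period):
    xi, S, T, R = decompose_additive(x, period)
    F1 = max(0.0, 1.0 - np.var(R) / np.var(xi - S))        # strength of trend
    F2 = max(0.0, 1.0 - np.var(R) / np.var(xi - T))        # strength of seasonality
    return F1, F2

# A synthetic monthly series with a clear trend and seasonality.
t = np.arange(120)
x = 0.05 * t + np.sin(2 * np.pi * t / 12) \
    + 0.01 * np.random.default_rng(1).standard_normal(120)
F1, F2 = strengths(x, 12)    # both close to 1 for this trending, seasonal series
```

For a trending, seasonal series both strengths come out close to 1; for white noise both drop towards 0, which is what makes them useful coordinates for the 2D instance spaces shown above.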
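
---
# Appendix: checking the diversity decomposition

The decomposition on the "Measuring diversity" slide can be verified numerically. A minimal sketch (Python/numpy, with made-up forecasts and weights — the deck's own code is in R):

```python
import numpy as np

rng = np.random.default_rng(0)
H, M = 6, 4                                   # horizon and number of methods
y = rng.normal(size=H)                        # pseudo out-of-sample actuals
f = y + rng.normal(size=(M, H))               # pseudo forecasts from M methods
w = np.array([0.4, 0.3, 0.2, 0.1])            # combination weights, summing to 1

mse_comb = np.mean((w @ f - y) ** 2)          # MSE of the combined forecast
mse_i = np.mean((f - y) ** 2, axis=1)         # individual MSE_i
div = np.mean((f[:, None, :] - f[None, :, :]) ** 2, axis=2)   # pairwise Div_{i,j}
i, j = np.triu_indices(M, k=1)                # all pairs with j > i
rhs = w @ mse_i - np.sum(w[i] * w[j] * div[i, j])

# MSE_comb = sum_i w_i MSE_i - sum_{i<j} w_i w_j Div_{i,j}
# holds exactly whenever the weights sum to one.
assert np.isclose(mse_comb, rhs)
```

Since the diversity term is non-negative for non-negative weights, the combination can never do worse than the weighted average of the individual MSEs — the algebraic reason diversity matters.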