New paper: Exploring the representativeness of the M5 competition data

Authors: Evangelos Theodorou, Shengjie Wang, Yanfei Kang, Evangelos Spiliotis, Spyros Makridakis, Vassilios Assimakopoulos 

Abstract: The main objective of the M5 competition, which focused on forecasting the hierarchical unit sales of Walmart, was to evaluate the accuracy and uncertainty of forecasting methods in the field in order to identify best practices and highlight their practical implications. However, whether the findings of the M5 competition can be generalized and exploited by retail firms to better support their decisions and operations depends on the extent to which the M5 data is representative of reality, i.e., sufficiently represents the unit sales data of retailers that operate in different regions, sell different types of products, and consider different marketing strategies. To answer this question, we analyze the characteristics of the M5 time series and compare them with those of two grocery retailers, namely Corporación Favorita and a major Greek supermarket chain, using feature spaces. Our results suggest that there are only small discrepancies between the examined data sets, supporting the representativeness of the M5 data.

Links: Working paper

New paper: Exploring the social influence of Kaggle virtual community on the M5 competition

Authors: Xixi Li, Yun Bai, Yanfei Kang

Abstract: One of the most significant differences between M5 and previous forecasting competitions is that it was held on Kaggle, an online community of data scientists and machine learning practitioners. On the Kaggle platform, people can form virtual communities, such as online notebooks and discussions, to discuss their models, choice of features, loss functions, etc. This paper aims to study the social influence of virtual communities on the competition. We first study the content of the M5 virtual community by topic modeling and trend analysis. Further, we perform social media analysis to identify the potential relationship network of the virtual community. We identify some key roles in the network and study how they contribute to spreading LightGBM-related information within the network. Overall, this study provides in-depth insights into the dynamic mechanism of the virtual community’s influence on the participants and has potential implications for future online competitions.

Links: Working paper

New paper: Improving forecasting with sub-seasonal time series patterns

Authors: Xixi Li, Fotios Petropoulos, Yanfei Kang

Abstract: Time series forecasting plays an increasingly important role in modern business decisions. In today’s data-rich environment, people often aim to choose the optimal forecasting model for their data. However, identifying the optimal model often requires professional knowledge and experience, making accurate forecasting a challenging task. To mitigate the importance of model selection, we propose a simple and reliable algorithm that improves forecasting performance. Specifically, we construct multiple time series with different sub-seasons from the original time series. These derived series highlight different sub-seasonal patterns of the original series, making it possible for the forecasting methods to capture diverse patterns and components of the data. Subsequently, we make forecasts for these multiple series separately with classical statistical models (ETS or ARIMA). Finally, the forecasts of these multiple series are combined with equal weights. We evaluate our approach on the widely used forecasting competition datasets (M1, M3, and M4), in terms of both point forecasts and prediction intervals. We observe improvements in performance compared with the benchmarks. Our approach is particularly suitable and robust for datasets with higher frequencies. To demonstrate the practical value of our proposition, we showcase the performance improvements from our approach on hourly load data.
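
The three steps named in the abstract (derive several series, forecast each one separately, combine with equal weights) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the sub-seasonal series are constructed, so `derive_sub_seasonal` below is a hypothetical placeholder, and a seasonal-naive forecaster stands in for ETS or ARIMA so that the example needs only the Python standard library. All function names are assumptions made for illustration.

```python
from statistics import fmean


def derive_sub_seasonal(series, period):
    """Hypothetical placeholder for the paper's construction step.

    Assumption: keep only the most recent whole cycles of length `period`.
    The real construction is not described in the abstract.
    """
    usable = len(series) - len(series) % period
    if usable == 0:
        raise ValueError("series is shorter than one full cycle")
    return series[len(series) - usable:]


def seasonal_naive(series, period, horizon):
    """Stand-in forecaster (NOT ETS or ARIMA): repeat the last full cycle."""
    last_cycle = series[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


def combine_equal_weights(forecasts):
    """Average the forecasts step by step, giving every series the same weight."""
    return [fmean(step) for step in zip(*forecasts)]


def forecast_with_sub_seasons(series, periods, horizon):
    """Derive one series per candidate period, forecast each, then combine."""
    forecasts = [
        seasonal_naive(derive_sub_seasonal(series, p), p, horizon)
        for p in periods
    ]
    return combine_equal_weights(forecasts)


if __name__ == "__main__":
    toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    # Two candidate sub-seasonal periods (2 and 3), two steps ahead.
    print(forecast_with_sub_seasons(toy, periods=[2, 3], horizon=2))
```

With the seasonal-naive stand-in the trimming in `derive_sub_seasonal` does not change the forecasts; the function is kept only to mark where the paper's construction step would go. Swapping in a real ETS or ARIMA fit would only change `seasonal_naive`, while the equal-weight combination stays as written.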

Links: Working paper