Nonnegative Matrix Factorization Based Decomposition for Time Series Modelling

Abstract. We propose a novel method of time series decomposition based on the nonnegative factorization of the Hankel matrix of a time series and apply this method to time series modelling and prediction. An interim (surrogate) model of the time series is built from the components of the time series using random cointegration, while the best cointegration is selected using a nature-inspired optimization method (Artificial Bee Colony). For modelling of the cointegrated time series we use the ARX (AutoRegressive with eXogenous inputs) model. The results of modelling using historical data (daily highest price) of S&P 500 stocks from 2009 are presented and compared against stand-alone ARX models. The results are evaluated using a variety of metrics (RMSE, MAE, MAPE, Pearson correlation, Nash-Sutcliffe efficiency coefficient, etc.) and are also illustrated graphically using Taylor and target diagrams. The results show a 51-98% improvement in prediction accuracy (depending on the accuracy metric used). The proposed time series modelling method can be used for a variety of applications (time series denoising, prediction, etc.).


INTRODUCTION
Modelling of time series (including the prediction of future values) is an important problem with many areas of application, including problems in bioinformatics [1], medical diagnosis [2], air pollution forecasting [3], industrial machine condition monitoring [4], environmental modelling [5], financial investment [6], production planning [7], sales forecasting [8] and stock portfolio management [9]. In the domain of financial investment management, accurate forecasting of the financial environment is essential for making sound investment decisions.
Financial time series originate from real-world systems (such as stock exchanges) and represent the outcome of complex multi-layer dynamic interactions between multiple agents [10]. Real-world systems mostly have nonlinear and non-stationary behaviour. A nonlinear time series is a signal generated by a nonlinear dynamic process, in which the output is not directly proportional to the input, i.e. even a small change in input may lead to a large change in output. Moreover, real-world systems often operate under transient non-stationary conditions, characterized by time-changing statistics. These properties hinder the acquisition of an effective predictive model and its reliable application. Therefore, researchers are motivated to explore new data modelling methods.
Time series can exhibit different patterns of behaviour. Sometimes it is useful to decompose a time series into several components, each representing some pattern or class of behaviour. It is common to decompose economic time series data into trend (long-term variation), cyclical (repeated but non-periodic fluctuations), seasonal and irregular (or noise) components [11]. Predictability can be used as a criterion for decomposing a time series into deterministic and non-deterministic components (Wold decomposition [12]). STL (Seasonal Trend decomposition using Loess) [13] can be used to estimate seasonal effects in time series and to predict future seasonally adjusted values. For time series with time-varying properties, the Holt-Winters method [14] uses exponential smoothing to derive the decomposition model, which also includes error corrections in forecasting future values of the time series. Wavelet decomposition [15] decomposes a time series using the scaling function and wavelet functions (i.e. the mother wavelet). Second generation wavelets can be used to generate wavelet functions in the spatial domain and to deal with complex structure, arbitrary boundary conditions, and irregular sampling intervals of time series [16].
Empirical Mode Decomposition (EMD) [17] was initially developed for the natural and engineering sciences for analysing nonlinear and non-stationary data such as sea wave data and earthquake signals, but has been applied to financial data as well [18]. EMD decomposes any time series into a finite number of intrinsic mode functions (IMFs). Since the decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and non-stationary processes, and can also extract variability on different time scales. By performing clustering [19] or remixing [20] of IMFs one can identify different structural patterns of a time series and further analyze them with respect to their orthogonality and cross-correlation properties. However, the application of EMD suffers from several important problems, such as the end effect [21] and the IMF stopping criterion [22]. Extensions of EMD such as BoostEMD [23] try to alleviate these problems and improve the characteristics of the IMFs.
Despite a long history and the availability of many time series decomposition algorithms and models, in many cases they are unable to account for the complex structure (such as multiple or fractional seasonality) of a time series.
We propose a novel method of time series decomposition based on the nonnegative factorization of the Hankel matrix of a time series in Section 2. In Section 3 we apply the method to time series prediction using the random cointegration approach. We describe the results of experiments using historical stock data in Section 4. Finally, conclusions and a discussion of future work are presented in Section 5.

DECOMPOSITION METHOD
We propose a new EMD-like signal decomposition method, called Nonnegative Hankel Matrix Factorization based Decomposition (NHMFD). The task is to decompose a signal x(t) into long-term (trend), cyclical, and irregular components:

x(t) = Σ_j c_j(t) + r_T(t) + r_R(t),   (1)

where c_j is a cyclical component (or Intrinsic Mode Function, IMF), r_T is the trend, and r_R is the irregular (random) component.
The steps of the proposed decomposition are as follows:

1. Normalization: the signal x(t) is represented as a time series X and normalized to [0, 1] as follows:

X_norm = (X − X_min) / (X_max − X_min),

where X_min and X_max are the smallest and largest values of the time series.
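The normalization step (and the inverse transform needed later for denormalization in step 8) can be sketched as follows; this is a minimal NumPy sketch, and the function names are ours:

```python
import numpy as np

def normalize(x):
    """Min-max normalization of a series to [0, 1].
    Returns the normalized series and the (min, max) needed to invert it."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min()), x.min(), x.max()

def denormalize(x_norm, x_min, x_max):
    """Inverse of the min-max normalization (used in step 8, Denormalization)."""
    return x_norm * (x_max - x_min) + x_min
```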

2. Construction of Hankel matrix: the Hankel matrix H of the normalized time series X_norm = (x_1, x_2, ⋯, x_l) is a square matrix with constant anti-diagonals, H(i, j) = x_{i+j−1}. The size of H is n × n with n = (l + 1)/2, where l is the (odd) length of X.
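The construction of the square Hankel matrix can be sketched as follows (a minimal NumPy sketch using 0-based indexing, so H[i, j] = x[i + j]; the function name is ours):

```python
import numpy as np

def hankel_matrix(x):
    """Build the square Hankel matrix H of an odd-length series x,
    with H[i, j] = x[i + j] (constant anti-diagonals)."""
    l = len(x)
    assert l % 2 == 1, "series length must be odd so that H is square"
    n = (l + 1) // 2
    return np.array([[x[i + j] for j in range(n)] for i in range(n)])
```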

4. Reconstruction of Hankel matrix: using the factors obtained by nonnegative matrix factorization (step 3), the matrix is represented as

H = H′ + E,

where H′ = V × W is the reconstructed Hankel matrix and E is the reconstruction error.

5. Reconstruction of signal: the signal is reconstructed from the reconstructed Hankel matrix H′ by taking the means of the matrix elements along its minor (secondary) diagonals:

x′(k) = mean{ H′(i, j) : i + j − 1 = k }.

The reconstruction of the signal is not exact, i.e. X = X′ + ε, where ε is the reconstruction error. Linear regression is performed to find the fitting coefficients α and β that minimize the error ||X − (αX′ + β)||.

6. Calculation of intrinsic mode: the mode of the signal is calculated as follows. Let the fitted reconstructed signal be X̃ = αX′ + β. Then the first component of the signal is defined by centering the fitted reconstructed signal, c_1 = X̃ − mean(X̃), and the decomposition residue is r = X − c_1.
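The anti-diagonal averaging and mode extraction described above can be sketched as follows (a minimal NumPy sketch under our reading of these steps; function names are ours):

```python
import numpy as np

def diagonal_average(H):
    """Reconstruct a series from matrix H by averaging the elements along
    its minor (secondary) diagonals: x'[k] = mean{ H[i, j] : i + j == k }."""
    n = H.shape[0]
    return np.array([np.mean([H[i, k - i]
                              for i in range(max(0, k - n + 1), min(k, n - 1) + 1)])
                     for k in range(2 * n - 1)])

def first_mode(x, x_rec):
    """Fit x_rec to x by linear regression (alpha * x_rec + beta), then
    center the fitted series to obtain the first component c1 and the
    decomposition residue r = x - c1."""
    alpha, beta = np.polyfit(x_rec, x, 1)
    fitted = alpha * x_rec + beta
    c1 = fitted - fitted.mean()
    return c1, x - c1
```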
7. Iterative decomposition of residue: the decomposition is continued on the residue until the desired number of extracted modes is reached.

8. Denormalization: the extracted modes and the residue are denormalized by inverting the normalization of step 1, i.e. X = X_norm (X_max − X_min) + X_min.

Fig. 1 shows an example of the decomposition of a time series (1st subplot). The series was decomposed into two cyclical components, an irregular residue and a trend residue.

MODELLING METHODOLOGY
To derive a model of a time series, we use a combination of several methods:
1) Nonnegative Hankel Matrix Factorization based Decomposition (NHMFD), proposed in Section 2, to derive Intrinsic Mode Functions (IMFs).
2) Random cointegration of IMFs. A random vector P is generated and multiplied by the IMF component matrix C to obtain a surrogate time series S. Cointegration is defined as the existence of a stationary linear combination of nonstationary time series [24]. This indicates that there is a long-run equilibrium relationship between the series. While stationarity in a strict statistical sense may not always be achieved, cointegration is considered to improve the statistical characteristics of time series [25], which is good for predictability. The cointegrated series can be considered as surrogate data series [26], which inherit the statistical properties of the original data but may have some desired properties such as predictability. The original time series can be restored as a sum of IMFs with unitary weights, as given by Eq. 1. The cointegrated (surrogate) time series S is generated by summing the time series components c_j with random weights p_j as follows:

S = Σ_j p_j c_j.

3) Selection of the best fitting random cointegration vector P using a nature-inspired optimization method. We use the Artificial Bee Colony (ABC) [27] algorithm to find a cointegration vector that ensures the lowest RMSE of 1-step-ahead prediction using the ARX model on surrogate testing data. ABC is based on a model of the foraging behaviour of a honeybee colony. This model consists of three essential components: food sources, employed foragers, and unemployed foragers, and defines two leading modes of honeybee colony behaviour: recruitment to a food source and abandonment of a source. The ABC algorithm implements a population-based search in which artificial bees change their positions aiming to discover food sources with high nectar amounts and, finally, the one with the highest nectar.

4) Modelling of the cointegrated time series using the ARX (AutoRegressive with eXogenous inputs) model [28], with the surrogate time series Ŝ = PC as input and the original time series as output. The input-output relationship of an ARX model is:

y(t) + a_1 y(t−1) + … + a_na y(t−na) = b_1 u(t−1) + … + b_nb u(t−nb) + e(t),

where y is the output, u is the input, and e(t) is the error term.
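The surrogate construction and the selection of the cointegration vector can be sketched as follows. This is a minimal NumPy sketch: the ARX fit is a plain least-squares regression standing in for a full system-identification toolbox, and plain random search stands in for the ABC optimizer used in the paper; all function names are ours:

```python
import numpy as np

def surrogate(C, p):
    """Cointegrate IMF components (rows of C) with weights p: S = p @ C."""
    return p @ C

def one_step_rmse(s, y, na=2, nb=2):
    """Fit an ARX model y[t] ~ past y values and past input values s by
    least squares, and return the RMSE of its one-step-ahead predictions."""
    k = max(na, nb)
    rows = [np.concatenate([y[t - na:t][::-1], s[t - nb:t][::-1]])
            for t in range(k, len(y))]
    Phi, target = np.array(rows), y[k:]
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return float(np.sqrt(np.mean((Phi @ theta - target) ** 2)))

def best_cointegration(C, y, trials=200, seed=0):
    """Random search over weight vectors (a simple stand-in for the
    Artificial Bee Colony optimizer used in the paper)."""
    rng = np.random.default_rng(seed)
    best_p, best_err = None, np.inf
    for _ in range(trials):
        p = rng.uniform(-1, 1, C.shape[0])
        err = one_step_rmse(surrogate(C, p), y)
        if err < best_err:
            best_p, best_err = p, err
    return best_p, best_err
```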

EXPERIMENTS & RESULTS
We use the historical data (2009) of S&P 500 stocks (245 days, 379 stocks with non-zero value). Here we analyze the daily highest price value. The dataset is separated into 3 subsets: train (40%), test (40%) and validation (20%). The train subset is used for decomposition, the test subset for finding the best cointegration vector, and the validation subset for calculating prediction accuracy. We decompose each time series into 4 components (2 cyclical components, a trend residue and an irregular residue). For the ABC algorithm, we use a colony size of 100 bees and a maximum of 200 iterations. The algorithm is run 20 times and the best solution is retained.
We model the 1-day ahead value of the time series and evaluate model accuracy using a variety of metrics (RMSE (Root Mean Square Error), MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), MSE (Mean Squared Error), Pearson correlation, Nash-Sutcliffe (NS) efficiency coefficient, MSPE (Mean Squared Prediction Error), RMSPE (Root Mean Square Percentage Error)), and also illustrate the results graphically using a Taylor diagram and a target diagram.
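For reference, a few of these metrics follow standard definitions, which can be sketched as follows (a minimal NumPy sketch; function names are ours):

```python
import numpy as np

def rmse(y, yhat):
    """Root Mean Square Error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    """Mean Absolute Percentage Error (in percent; assumes y has no zeros)."""
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)

def nash_sutcliffe(y, yhat):
    """Nash-Sutcliffe efficiency: 1 is a perfect model,
    0 means the model is no better than the mean of y."""
    return float(1.0 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2))
```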
Fig. 2 shows the mean values of RMSE, MAPE, MAE and MSE. In all cases, the model of a time series derived using the proposed method is better (its error is smaller).
Fig. 3 shows the mean values of the Pearson correlation and the NS coefficient, as well as the mean rank. The mean rank was calculated by ranking the modelling results of each stock (better = 1, worse = 2) based on the RMSE value. The mean rank of the proposed method (1.21) is better (lower) than the mean rank of the ARX model (1.79). The difference is significant with p = 3·10^(−67) (using a paired t-test). Fig. 4 presents the probability that the proposed method will score better than the ARX model on a given accuracy metric. The probability value was calculated by bootstrapping (random sampling with replacement) using 10000 samples. The probability that the proposed method yields better accuracy ranges from 0.709 (using MAE) to 0.916 (using Pearson correlation). The results are summarized in Table 1. Mean values of the accuracy metrics are given, as well as the relative improvement of the proposed method over ARX. Overall, the proposed method improved the prediction accuracy for 78.6% of the stocks (by RMSE).

CONCLUSION
We have introduced a new signal decomposition method based on the nonnegative factorization of the Hankel matrix of a time series. The method offers an advantage over its conceptually similar counterpart, Empirical Mode Decomposition (EMD), by allowing one to obtain the desired number of Intrinsic Mode Functions (IMFs). The results of the decomposition have been used to generate surrogate time series by prediction-error-minimization-driven random cointegration of IMFs. To our knowledge, this is the first application of cointegration to IMFs. Using the surrogate time series as input allows for more accurate prediction of a time series than predicting directly from the original time series. The results obtained using historical stock data and the ARX prediction model show a mean improvement in accuracy of 51-98% depending on the considered accuracy metric, while the probability of achieving higher accuracy is 70-91% when compared to ARX prediction using the original time series only.
Future work will focus on the application of the proposed method on other datasets as well as on comparison with other time series modelling and prediction methods.

3. Nonnegative matrix factorization: performs a factorization of the form H ≈ V × W, where V and W are nonnegative matrices of sizes n × k and k × n, respectively, and k is the number of factors. The factorization is not exact: the matrices V and W are chosen to minimize the residual D between H and V × W.
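The factorization step can be sketched with the classical Lee-Seung multiplicative updates; the paper does not specify which NMF algorithm is used, so this choice is an assumption (a minimal NumPy sketch, minimizing the Frobenius-norm residual):

```python
import numpy as np

def nmf(H, k, iters=1000, seed=0, eps=1e-9):
    """Nonnegative matrix factorization H ~ V @ W using Lee-Seung
    multiplicative updates that minimize the Frobenius residual."""
    rng = np.random.default_rng(seed)
    n, m = H.shape
    V = rng.random((n, k)) + eps   # n x k nonnegative factor
    W = rng.random((k, m)) + eps   # k x m nonnegative factor
    for _ in range(iters):
        W *= (V.T @ H) / (V.T @ V @ W + eps)
        V *= (H @ W.T) / (V @ W @ W.T + eps)
    return V, W
```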

9. Extraction of trend and irregular components: finally, the residue is decomposed into a trend and an irregular (random or noise) component by computing the least-squares fit of a straight line to the residue and subtracting the resulting linear function from it:

r_T(t) = a t + b,  r_R(t) = r(t) − r_T(t).
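The trend/irregular split via a straight-line least-squares fit can be sketched as follows (a minimal NumPy sketch; the function name is ours):

```python
import numpy as np

def split_trend(residue):
    """Fit a straight line r_T(t) = a*t + b to the residue by least squares
    and return (trend, irregular) with irregular = residue - trend."""
    residue = np.asarray(residue, dtype=float)
    t = np.arange(len(residue))
    a, b = np.polyfit(t, residue, 1)
    trend = a * t + b
    return trend, residue - trend
```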

Fig. 4. Probability that the proposed method scores better than ARX.

The Taylor diagram (Fig. 5, left) is used to perform a comparative assessment of several different models and to quantify the degree of correspondence between the modelled and observed behaviour in terms of three statistics: Pearson correlation, RMSD (Root Mean Square Deviation), and the standard deviation [29]. A model located lower on the diagram is considered to represent reality better. Note that the proposed model (Prop) is located lower than the ARX model.

Table 1. Summary of results.