Fuzzy Time Series Forecasting Model Based on Automatic Clustering Techniques and Generalized Fuzzy Logical Relationship

Mathematical Problems in Engineering


Introduction
Over nearly two decades, the fuzzy time series approach introduced by Song and Chissom has been widely used because of its strength in dealing with imprecise knowledge, such as linguistic variables, in decision making. In the literature, many studies have proposed new methods or improved the accuracy of fuzzy time series forecasting. To simplify the computational process, Chen [1] improved Song and Chissom's methods and presented a simplified forecasting model in 1996. Since the lengths of intervals greatly affect forecasting accuracy in fuzzy time series, Yu and many others [2][3][4][5] adjusted the lengths of intervals using distribution-based or optimization techniques. To achieve higher accuracy, weighted forecasting models that account for recurrence and chronological order have also been improved by several researchers [6][7][8]. In addition, many original models based on conventional fuzzy time series have been presented and combined with novel algorithms or technologies. For example, Singh [9,10] proposed fuzzy forecasting methods for crop production based on a computational method with different parameters. Lee et al. [11][12][13][14] presented several fuzzy forecasting models based on fuzzy time series, genetic algorithms, simulated annealing, and type-2 fuzzy sets to forecast temperature and the TAIFEX. Kuo et al. [15] first introduced particle swarm optimization (PSO) into fuzzy time series models for forecasting the TAIFEX. Song's model [16] and Aladag's models [17,18] obtained more accurate forecasts by employing artificial neural networks to determine fuzzy relationships.
Since first-order fuzzy time series models have a simple structure, they have difficulty explaining more complex relationships, and they cannot meet the demands of prediction involving multiple factors or long-term time series. Compared with alternative forecasting models such as ARIMA, hidden Markov, and ARCH models, fuzzy time series models still leave much room for improving forecasting accuracy. For these reasons, Chen et al. [19][20][21][22][23] proposed several methods based on high-order fuzzy time series forecasting models to deal with the enrollment forecasting problem. Aladag et al. [17] introduced a high-order model based on a feed-forward neural network. Lee et al. [12,24] also presented several high-order models based on two factors and on genetic simulated annealing techniques. Many time series researchers [14][25][26][27][28] have shown interest in high-order fuzzy time series forecasting models.
In forecasting with fuzzy time series models, the fuzzy logical relationship (FLR) is one of the most critical factors influencing forecasting accuracy. In terms of the techniques used to partition the universe of discourse and to construct FLRs effectively, the above high-order models fall into three classes. The first class mines FLRs by applying advanced algorithms such as genetic algorithms, rough sets, neural networks, type-2 fuzzy sets, and simulated annealing [12,14,17,18,20,21,25,27]. The second class, represented by Singh [9,10], consists of models based on a computational method. The last but not least class consists of models that group FLRs, represented by [19, 22-24, 26, 28]. The first class of hybrid models can achieve higher forecasting accuracy than the other two classes. However, the forecasting process of these algorithms works like a black box and is not easy to understand; unlike fuzzy set theory, their procedures and forecasts are neither understandable nor accountable to most decision makers. Although the second class of models has been applied to real-life problems such as crop and rice production in addition to enrollment forecasting, these models make little use of FLRs in the forecasting procedure; they obtain high accuracy mainly by dividing the intervals so as to localize the forecast values accurately. In the third class, the procedures for mining FLRs and the forecasting principles are based solely on FLR sets, that is, on conventional fuzzy time series. Their forecasting procedures and principles have been clearly expressed for fuzzy time series researchers and are easy to understand even for decision makers who know nothing about fuzzy set theory or advanced algorithms.
For these reasons, this study proposes a high-order fuzzy time series model based on automatic clustering [28][29][30] and generalized fuzzy logical relationships [31]. The relationship matrices among time series are abstracted, and the patterns of time series fluctuations are identified, on the basis of understandable fuzzy rules. Of the three classes above, the proposed method belongs to the third. Within this class, the models of [19,26] are similar in finding the most appropriate forecasting principle with state-transition analysis and a backtracking scheme, the models of [24,26] target multifactor forecasting problems, and the model of [28] is improved by finding an optimal interval length. Therefore, [19,22,23] are chosen as the counterparts for comparing single-factor forecasting results with determined interval lengths. Two data sets are used for the empirical analysis: the enrollments of the University of Alabama and the closing prices of the Shanghai Stock Exchange Composite Index. In terms of three evaluation criteria, the root mean squared error, the mean absolute error, and the mean absolute percentage error, the proposed method achieves higher forecasting accuracy than the counterparts.

Preliminaries
2.1. Some Definitions.
To make our exposition self-contained, this section provides some definitions and the framework of fuzzy time series models. Following some related definitions of fuzzy time series, the framework of fuzzy time series models [16][32][33][34] is summarized in the second part of this section. At the end of this section, the definition of and an operation for the generalized fuzzy logical relationship are presented to prepare for the proposed model in the next section.
Definition 3. Let $F(t-1) = A_i$ and $F(t) = A_j$. The relationship between two consecutive observations, $F(t)$ and $F(t-1)$, referred to as a fuzzy logical relationship (FLR), is denoted by $A_i \to A_j$, where $A_i$ is called the left-hand side (LHS) and $A_j$ the right-hand side (RHS) of the FLR.
is called the th principal fuzzy logical relationship.
This operation is not the only one for combining the knowledge of all principal fuzzy logical relationships; other operations can be defined for this purpose. Based on these definitions, this paper presents a high-order fuzzy time series model in the following section.
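As a small illustration of Definition 3, the Python sketch below extracts first-order FLRs $A_i \to A_j$ from a series that has already been fuzzified. Representing each fuzzy set by its integer index, and the function name `fuzzy_logical_relationships`, are assumptions made for illustration rather than part of the original method.

```python
from typing import List, Tuple


def fuzzy_logical_relationships(labels: List[int]) -> List[Tuple[int, int]]:
    """Pair each fuzzified observation F(t-1) = A_i with F(t) = A_j.

    `labels[t]` is the index of the fuzzy set that the observation at time t
    was fuzzified to, so the FLR A_i -> A_j is stored as the tuple (i, j).
    Illustrative sketch only; the function name is hypothetical.
    """
    # Zip the sequence with itself shifted by one step to get consecutive pairs.
    return list(zip(labels[:-1], labels[1:]))


# Example: the fuzzified series A1, A2, A2, A3 (indices only).
print(fuzzy_logical_relationships([1, 2, 2, 3]))  # [(1, 2), (2, 2), (2, 3)]
```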

Automatic Clustering Algorithm.
In this section, we review the automatic clustering algorithm for clustering numerical data. The algorithm is essentially a modification and improvement of the one presented in [29,30], and it is summarized as follows.
Step 1. Sort the numerical data in ascending order. Assume that there are $n$ different numerical values in the sorted data sequence and that the ascending sequence is
$$d_{1,1}, \ldots, d_{1,k_1}, d_{2,1}, \ldots, d_{2,k_2}, \ldots, d_{n,1}, \ldots, d_{n,k_n},$$
where $d_{i,1}, d_{i,2}, \ldots, d_{i,k_i}$ denote the numerical data with the same value $d_i$, $1 \le i \le n$ and $k_i \ge 1$. Then the value of ave_dif, which denotes the average of the differences between any two adjacent data, is defined by
$$\text{ave\_dif} = \frac{\sum_{i=1}^{n-1} (d_{i+1} - d_i)}{n-1}.$$
Step 2. Based on the value of ave_dif, determine whether two adjacent numerical data $d_i$ and $d_j$ in the data sequence can be put into the same cluster by the following three rules.
(1) If $d_i$ is the only element in the first cluster and $d_j$ is the datum following $d_i$, shown as $\{d_i\}, d_j, \ldots$, and $d_j - d_i \le \text{ave\_dif}$, then put $d_j$ into the cluster to which $d_i$ belongs.
(2) Given that $d_i$ is the last element in some cluster and $d_j$ is the datum following $d_i$, shown as $\ldots, \{\ldots, d_i\}, d_j, \ldots$, let clu_dif denote the average difference between every pair of adjacent data in the cluster, calculated as
$$\text{clu\_dif} = \frac{\sum_{k=1}^{p-1} (c_{k+1} - c_k)}{p-1},$$
where $c_k$ denotes the $k$th element in the cluster and $1 \le k \le p$. If $d_j - d_i \le \text{ave\_dif}$ and $d_j - d_i \le \text{clu\_dif}$, then put $d_j$ into the cluster to which $d_i$ belongs.
(c) Repeatedly check the current interval and the current cluster until all the clusters have been checked.
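The clustering rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it covers Step 1 and rules (1) and (2) only, collapses repeated values into one distinct value, and, because rule (3) is not reproduced here, it assumes that a datum failing the tests starts a new cluster and that singleton clusters other than the first use the same ave_dif test as rule (1). The function name `automatic_clustering` is hypothetical.

```python
from typing import List


def automatic_clustering(data: List[float]) -> List[List[float]]:
    """Cluster numerical data with the ave_dif / clu_dif rules (sketch)."""
    # Step 1: sort and keep the distinct values d_1 < d_2 < ... < d_n;
    # repeated values d_{i,1}, ..., d_{i,k_i} collapse into one d_i.
    values = sorted(set(data))
    if len(values) < 2:
        return [values] if values else []
    n = len(values)
    # ave_dif: average difference between adjacent distinct values.
    ave_dif = sum(values[i + 1] - values[i] for i in range(n - 1)) / (n - 1)

    clusters: List[List[float]] = [[values[0]]]
    for d_j in values[1:]:
        current = clusters[-1]
        d_i = current[-1]  # last element of the current cluster
        gap = d_j - d_i
        if len(current) == 1:
            # Rule (1) for the first cluster; for later singleton clusters this
            # sketch ASSUMES the same ave_dif test (rule (3) is not reproduced).
            join = gap <= ave_dif
        else:
            # Rule (2): clu_dif = mean adjacent difference inside the cluster.
            p = len(current)
            clu_dif = sum(current[k + 1] - current[k] for k in range(p - 1)) / (p - 1)
            join = gap <= ave_dif and gap <= clu_dif
        if join:
            current.append(d_j)
        else:
            clusters.append([d_j])  # ASSUMPTION: otherwise start a new cluster
    return clusters
```

For example, `automatic_clustering([3, 1, 2, 2, 11, 10, 30, 1])` computes ave_dif = 5.8 over the distinct values and returns `[[1, 2, 3], [10, 11], [30]]`.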

Proposed Model
In this section, we present a novel multivariable forecasting model based on the automatic clustering algorithm and generalized fuzzy logical relationships. Since the proposed method is a fuzzy time series model parameterized by the number of factors, denoted $m$, and the number of principal fuzzy logical relationships, denoted $n$, we name it GTS($m$, $n$). In other words, GTS($m$, $n$) denotes a multivariable fuzzy time series model based on $m$ factors and $n$ principal fuzzy logical relationships. As with conventional fuzzy time series forecasting models, the proposed algorithm is introduced step by step as follows.
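Step 2 (define fuzzy sets based on the universe of discourse and fuzzify the historical data for each factor). For a given factor, the fuzzy set $A_i$ is expressed as
$$A_i = f_{i1}/u_1 + f_{i2}/u_2 + \cdots + f_{ik}/u_k,$$
where $f_{ij} \in [0, 1]$ is the membership degree of interval $u_j$ in $A_i$ and $k$ is the number of intervals for the given factor. The membership degree of the value $x_t$ at time $t$ in $A_j$ ($j = 1, 2, \ldots, k$) is defined by a triangular membership function, where $x_t$ is the value at time $t$ and $l_j$ is the length of interval $u_j$.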
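Because the exact membership formula is not reproduced here, the following Python sketch only assumes one common triangular form: the membership in the fuzzy set built on interval $u_j$ peaks at 1 at the interval midpoint and falls linearly to 0 at a distance of $l_j$ from it. Fuzzification then picks the fuzzy set with the maximal membership degree. The function names are hypothetical.

```python
from typing import List, Tuple


def triangular_membership(x: float, interval: Tuple[float, float]) -> float:
    """ASSUMED triangular membership of x in the fuzzy set built on `interval`.

    Peak 1 at the interval midpoint, falling linearly to 0 at a distance of
    one interval length l_j from the midpoint.
    """
    lo, hi = interval
    length = hi - lo  # l_j, the length of interval u_j
    center = (lo + hi) / 2.0
    return max(0.0, 1.0 - abs(x - center) / length)


def fuzzify(x: float, intervals: List[Tuple[float, float]]) -> int:
    """Return the 0-based index j of the fuzzy set with maximal membership.

    Ties are resolved in favor of the lower index.
    """
    degrees = [triangular_membership(x, u) for u in intervals]
    return max(range(len(degrees)), key=degrees.__getitem__)
```

For instance, `fuzzify(12.0, [(0, 10), (10, 20), (20, 30)])` returns 1, since the three membership degrees are about 0.3, 0.7, and 0.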
Step 3 (establish the fuzzy logical relationships based on the number of factors and the principal relationships). Given the sample data set and the definition of fuzzy sets, all fuzzy logical relationships between two consecutive data are created. To forecast the time series, fuzzy logical relationship matrices must be constructed from these relationships in this step. Among the many possible methods for this task, this paper applies the method proposed by Lee et al. in [8]. For example, the fuzzy logical relationships of a GTS($m$, $n$) model can be grouped into $m \times n$ relationship matrices denoted $R^{(i,j)}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$).
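The exact grouping method of Lee et al. [8] is not reproduced here. As a hedged illustration of what a relationship matrix can look like, the Python sketch below builds one common variant: a row-normalized count matrix in which the entry in row $p$ and column $q$ is the share of FLRs with LHS $A_p$ whose RHS is $A_q$. This choice and the function name are assumptions.

```python
from typing import List, Tuple


def relationship_matrix(flrs: List[Tuple[int, int]], k: int) -> List[List[float]]:
    """Row-normalized count matrix of FLRs A_p -> A_q over k fuzzy sets.

    ASSUMED representation (not necessarily the method of [8]). Indices are
    0-based; rows whose LHS never occurs are left as zeros.
    """
    counts = [[0] * k for _ in range(k)]
    for p, q in flrs:
        counts[p][q] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Normalizing each row turns the counts into transition shares, so repeated relationships weigh more heavily than rare ones when the matrix is later used for forecasting.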
Then, for a given factor $i$, there are $n$ forecasts for time $t$. The conclusive forecasting value for time $t$ is obtained by combining these forecasts, where $w_j$ ($j = 1, 2, \ldots, n$) is the adjustment parameter for the $j$th forecast; it can be obtained by minimizing the RMSE on the training data set. With these adjustments, the conclusive forecasting value can be adapted to follow the fluctuation pattern of the training data.
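A minimal sketch of fitting the adjustment parameters, assuming (since the combining formula is not reproduced here) that the conclusive forecast is the weighted sum $\sum_j w_j F_j(t)$ of the $n$ candidate forecasts $F_j(t)$, with no intercept. Under that assumption, minimizing the RMSE on the training set is an ordinary least-squares problem, solved below through the normal equations with Gaussian elimination. The function names are hypothetical.

```python
from typing import List


def fit_adjustment_weights(forecasts: List[List[float]], actual: List[float]) -> List[float]:
    """Least-squares weights w_1..w_n for a weighted sum of n forecasts.

    forecasts[t][j] is the j-th candidate forecast for training time t.
    Minimizing the sum of squared errors also minimizes the RMSE.
    """
    n = len(forecasts[0])
    # Normal equations (X^T X) w = X^T y.
    a = [[sum(row[p] * row[q] for row in forecasts) for q in range(n)] for p in range(n)]
    b = [sum(row[p] * y for row, y in zip(forecasts, actual)) for p in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, n))) / a[r][r]
    return w


def conclusive_forecast(candidates: List[float], weights: List[float]) -> float:
    """ASSUMED form of the conclusive forecast: a weighted sum of the candidates."""
    return sum(w * f for w, f in zip(weights, candidates))
```

Minimizing the RMSE and minimizing the sum of squared errors give the same weights because the square root and the division by the sample size are monotone. If the candidate forecasts are linearly dependent, the elimination may hit a zero pivot and raise `ZeroDivisionError`.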
For the detailed process of the model, please refer to [34].

Data Description.
To demonstrate the effectiveness of the proposed models, large and multiple data sets are needed. Since the fuzzy time series forecasting model was proposed, the enrollment data have been widely used to test improved methods. Furthermore, financial data are among the most popular research targets, and typical financial data such as stock prices have also been widely studied. Thus, the enrollments and the Shanghai Stock Exchange Composite Index closing prices are used as the illustration data sets for the empirical analysis. The financial data used in this paper are the daily Shanghai Stock Exchange Composite Index (SSECI) closing prices covering the period from 1997 to 2006.

Criteria of Evaluation.
In statistics, the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are three ways to quantify the difference between the values implied by an estimator and the true values of the quantity being estimated. The mean squared error (MSE) is a risk function that measures the average of the squared differences. For an unbiased estimator, the MSE is the variance, and the RMSE is its square root, known as the standard error. In addition, RMSE has the advantage over MSE of being on the same scale as the forecast data. Thus, we take RMSE as the first representative of the size of an "average" error. MAE is also measured in the same units as the original data and is usually similar in magnitude to, but never larger than, the RMSE. Finally, because percentage errors are scale-independent, and because MAPE, the average of the absolute percentage errors, is simple to calculate and easy to understand (which accounts for its popularity), we also use it as a criterion for comparing forecasting results in this paper.
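For reference, the three criteria can be computed as below. This is a straightforward Python sketch of the standard definitions $\text{RMSE} = \sqrt{\frac{1}{N}\sum_t (y_t - \hat{y}_t)^2}$, $\text{MAE} = \frac{1}{N}\sum_t |y_t - \hat{y}_t|$, and $\text{MAPE} = \frac{100\%}{N}\sum_t |(y_t - \hat{y}_t)/y_t|$; the function names and the choice to report MAPE in percent are assumptions.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```

For actual values [100, 200] and forecasts [110, 190], RMSE and MAE are both 10, and MAPE is about 7.5%.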

Performance Evaluation.
Table 1 lists the results of GTS(1, $n$) on the enrollment prediction. Compared with Hwang's method [23] and Chen's methods [19,22] in the enrollment experiment, the proposed model gives more accurate predictions. We also apply the proposed method to forecast the closing price of the Shanghai stock index. The comparison of the four criteria for the 1997 SSECI is listed in Table 2. From these two tables, we can see that the proposed GTS($m$, $n$) gives more accurate predictions as $n$ increases, and the trend is the same for all evaluation criteria on the 1997 SSECI data. Figure 1 depicts the monthly forecasting results for the 1997 SSECI, and Figure 2 depicts the last 41 predicted values for the 1997 SSECI. From these figures, we can easily draw the same conclusion: the higher the $n$ value, the more accurate the prediction.
In fact, we can arrive at the same conclusion from the results for other years. The mean prediction results of the SSECI for the ten years from 1997 to 2006 are listed in Table 3, which again shows that the higher the $m$ or $n$ values, the more accurate the prediction. From Tables 2 and 3, it is clear that higher $n$ values yield a smaller RMSE and that higher-order models perform better than lower-order ones. Figure 2 also supports this conclusion and conveys another important message: a shorter interval length can result in more stable forecast errors. These conclusions are important for applying the proposed model to other data sets or areas.

Conclusion
After discussing high-order fuzzy time series models and presenting the definition of and an operation for the generalized fuzzy logical relationship, we have proposed a novel high-order fuzzy time series model based on this new relationship and automatic clustering. The work is driven by three main motivations: first, to generalize the fuzzy logical relationship; second, to abstract the relationship matrices among time series and find the patterns of time series fluctuations based on understandable fuzzy rules; and third, to enable the fuzzy time series model to explain more complex relationships.
For future research, we offer some suggestions for improving this work. First, the relation between the principal fuzzy relationships and the conventional fuzzy relationships needs to be discussed in more depth; for example, how do the definition of the membership function and the operations on principal fuzzy logical relationships affect the forecasting results? Second, since the proposed model is based on fuzzy logical relationships, it is worth improving the model by hybridizing it with advanced algorithms, both to broaden its application and to obtain higher forecasting accuracy.
Figure legend: the proposed model with m = 3 and n = 1, 3, and 5.
$A_i = f_{i1}/u_1 + f_{i2}/u_2 + \cdots + f_{ik}/u_k$, or $A_i = (f_{i1}, f_{i2}, \ldots, f_{ik})$, where $f_{ij} \in [0, 1]$, $1 \le i, j \le k$, and $k$ is the number of intervals for the given factor. The value of $f_{ij}$ indicates the membership degree of $u_j$ in $A_i$. The historical and observed data are fuzzified according to the definition of the fuzzy sets. For example, a datum is fuzzified to $A_j$ when its maximal degree of membership is in $A_j$; in other words, if its membership degree in $A_j$ is the largest among $A_1, A_2, \ldots, A_k$ ($1 \le j \le k$), then the datum at time $t$ is classified into the $j$th class. In this paper, the fuzzy sets are defined with a triangular membership function given by the following formula: