Sales Growth Rate Forecasting Using Improved PSO and SVM

An accurate forecast of the sales growth rate plays a decisive role in determining the amount of advertising investment. In this study, we present a method based on preclassification followed by regression, optimized by improved particle swarm optimization (IPSO), for sales growth rate forecasting. We use a support vector machine (SVM) as the classification model. The SVM efficiently represents the nonlinear relationship in sales growth rate forecasting, while IPSO optimizes the training parameters of the SVM. IPSO addresses weaknesses of traditional PSO, such as relapsing into local optima, slow convergence speed, and low convergence precision in the later stages of evolution. We performed two experiments. In the first, three classic benchmark functions are used to verify the validity of the IPSO algorithm against PSO. Having shown that IPSO outperforms PSO in convergence speed, precision, and escaping local optima, in the second experiment we apply IPSO to the proposed model. Sales growth rate forecasting cases are used to test the forecasting performance of the proposed model. According to the requirements and industry knowledge, the sample data were first classified to obtain the types of the test samples. Next, the values of the test samples were forecast using the SVM regression algorithm. The experimental results demonstrate that the proposed model has good forecasting performance.


Introduction
Advertising investment and sales growth rate are interrelated. Understanding the relationship between the two, and forecasting the sales growth rate correctly, is very important for efficient and effective advertising investment in a market economy. Developing a sales growth rate forecasting model is nontrivial because the sales growth rate is uncertain, nonlinear, dynamic, and complicated. Some recent and commonly used forecasting models are the neural network based prediction model [1], the multiple linear regression analysis model [2], and the grey forecasting model [3]. However, each of these models has its own weaknesses. For example, the neural network based model can converge to locally optimal solutions, which has a negative influence on the forecasting results. Multiple linear regression analysis requires correct premises and assumptions and the simultaneous examination of multiple dependent variables, which is not trivial. Although the grey forecasting model can be constructed from only a few samples, it only depicts a monotonically increasing or decreasing process [4], which is not how sales growth behaves. To overcome these problems, a new method for forecasting the sales growth rate is needed.
Support vector machine (SVM) is a novel machine learning method based on statistical learning theory, which has a good generalization capability for small training samples and yields higher accuracy [5,6]. SVM has been successfully applied in different fields such as real estate price forecasting [7], face recognition [8], business failure prediction [9], face detection [10], EMG signal classification for the diagnosis of neuromuscular disorders [11], prediction of blasting-vibration damage to residential houses near open-pit mines [12], detection of top management fraud [13], microarray data classification [14], default prediction for small and medium enterprises [15], and traffic flow prediction [16].
In this paper, the forecasting technique of preclassification and later regression is presented, which is effective and feasible for small-sample regression and short-term prediction of time series. Because the choice of parameters heavily influences the forecasting accuracy, obtaining an optimal SVM forecasting model requires choosing a good kernel function, tuning the kernel parameters, and determining the soft margin constant $C$ and the $\varepsilon$-insensitive loss parameter [17,18]. Currently, techniques such as grid search (GS), genetic algorithms (GA), and particle swarm optimization (PSO) are used for parameter optimization [19,20]. Compared with GA, particle swarm optimization was found to have the capability of global optimization, simplicity, and ease of implementation [21]; however, the standard PSO has some demerits, such as relapsing into local optima, slow convergence speed, and low convergence precision in the later stages of evolution. To overcome these shortcomings, we propose an improved PSO (IPSO) technique, in which an evolution speed factor and an aggregation degree factor of the swarm are introduced to improve the convergence speed, and a position-extreme strategy is used to avoid plunging into local optima. In each iteration, the inertia weight is changed dynamically based on the current evolution speed factor and aggregation degree factor, which gives the algorithm effective dynamic adaptability. Considering these advantages, this study introduces IPSO as an optimization technique to simultaneously optimize the SVM parameters.
Mathematical Problems in Engineering
Furthermore, to achieve better forecasting performance for the sales growth rate, we combine IPSO with the "forecasting technique of preclassification and later regression" and propose the "regression model based on SVM classification" optimized by IPSO. In summary, the main contributions of this paper are as follows.
(1) We design and implement a growth rate forecast model that runs on the sales growth rate forecasting cases. The model is extensible and is able to combine the knowledge discovered by SVM with industry knowledge.
(2) We propose IPSO to optimize the kernel function parameters for classification and regression. During each iteration, the inertia weight is changed dynamically based on the current evolution speed factor and aggregation degree factor, which provides the algorithm with effective dynamic adaptability. IPSO performs better than the standard PSO in searching for the global optimum while resolving the conflict between convergence and global search, which improves forecasting accuracy.
(3) We first classify the sample data to decide the types of the testing samples, and then predict the values of the testing samples using the SVM regression algorithm. This limits the forecast samples to the range of a single type, which reduces the forecasting range and enhances the forecast accuracy.
(4) After the classification, the range of the samples is narrowed down and the trend of the forecast is obtained from these narrowed samples. Samples of the same type resulting from classification contain similar trends and help the model make full use of the data trend. Regression without preclassification does not have this capability.
The rest of this paper is organized as follows. Section 2 introduces the regression and classification theory of SVM and the PSO algorithm, with emphasis on a detailed analysis of PSO. IPSO is introduced in Section 3 to overcome the premature convergence and the local optima of PSO. The regression method based on SVM classification optimized by IPSO is presented in Section 4. In Section 5, we first verify the validity of the proposed IPSO algorithm on three classic benchmark functions; then IPSO is applied to the regression model based on SVM classification and compared with three other models. The results indicate that IPSO has far superior performance to PSO in global optimization and convergence speed and that the proposed model performs better than the other three models. Finally, the conclusions and suggestions for future research are given in Section 6.

The Regression and Classification Theory of Support Vector Machine (SVM).
The support vector machine was originally used for classification, but its principle was extended to the tasks of regression and forecasting as well. The SVM classification model can be described as follows.
Let the training data set be $\{(x_i, y_i)\}$, where $x_i \in R^n$ are the input values and $y_i \in \{-1, +1\}$, $i = 1, 2, \ldots, l$, are the forecasting values; the generalized linear SVM finds an optimal separating hyperplane
$$ w \cdot x + b = 0 \tag{1} $$
by solving the following optimization problem:
$$ \min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ i = 1, 2, \ldots, l. \tag{2} $$
This optimization model can be solved by introducing the Lagrange multipliers $\alpha_i$ for its dual optimization model. After the optimal solution $\alpha_i^*$ is obtained, the optimal hyperplane parameters $w^*$ and $b^*$ can be determined, and the separating hyperplane can be described as follows:
$$ f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} \alpha_i^* y_i (x_i \cdot x) + b^* \right). \tag{3} $$
For nonlinear classification, assume that there is a transform $\varphi : R^n \to H$, $x \to \varphi(x)$, making $K(x, x_i) = \varphi(x) \cdot \varphi(x_i)$, where $(\cdot)$ denotes the inner product operation. According to functional theory, as long as a kernel function meets the Mercer condition, it corresponds to the inner product of a transform space [22]. Therefore, the nonlinear classification function can be determined as follows:
$$ f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} \alpha_i^* y_i K(x_i, x) + b^* \right). \tag{4} $$
The SVM regression model can be described as follows.
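Let the training set be $\{(x_1, y_1), \ldots, (x_l, y_l)\} \in R^n \times R$, where $x_i$ is the input vector, $y_i$ is the output value, and $l$ is the total number of the data points.
The function $f(x_i)$ is represented by using a linear function in the feature space:
$$ f(x_i) = w \cdot x_i + b, \tag{5} $$
where $w$ is the weight vector, $x_i$ is the input vector, and $b$ is the threshold. In addition, the coefficients $w$ and $b$ are estimated by the following optimization problem:
$$ \min \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad y_i - w \cdot x_i - b \le \varepsilon + \xi_i, \quad w \cdot x_i + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \tag{6} $$
where $\xi_i$ and $\xi_i^*$ are nonnegative slack variables which measure the deviation $(y_i - f(x_i))$, $C$ is the punishment coefficient, and $\varepsilon$ is the parameter of the insensitive loss function. The slack variables $\xi_i$ and $\xi_i^*$ guarantee that the constraint conditions can be satisfied; $C$ controls the balance between the complexity of the model and the training error; $\varepsilon$ is a preset constant that controls the tube size.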
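The role of the tube size and the punishment coefficient can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `eps_insensitive_loss` and `svr_objective` are hypothetical, and the sketch relies on the standard fact that, at the optimum, the total slack of a sample equals its epsilon-insensitive loss.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Loss is zero inside the tube |y - f(x)| <= eps and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_objective(w, targets_and_predictions, C, eps):
    """Regularised SVR objective: 0.5 * ||w||^2 + C * (sum of tube violations).

    `targets_and_predictions` is a list of (y_i, f(x_i)) pairs (hypothetical layout).
    """
    regulariser = 0.5 * sum(wi * wi for wi in w)
    violations = sum(eps_insensitive_loss(y, f, eps) for y, f in targets_and_predictions)
    return regulariser + C * violations
```

A larger `C` penalises samples outside the tube more heavily, while a larger `eps` widens the tube so that fewer samples contribute to the objective at all.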
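For nonlinear regression, a kernel function is introduced, and the nonlinear regression function can then be determined:
$$ f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) + b, \tag{7} $$
where $\alpha_i$ and $\alpha_i^*$ here denote the Lagrange multipliers of the two inequality constraints of the regression problem. The kernel function $K(x, x_i)$ typically has the following alternatives.
(1) Linear kernel function:
$$ K(x, x_i) = x \cdot x_i. \tag{8} $$
(2) Polynomial kernel of degree $d$:
$$ K(x, x_i) = (x \cdot x_i + 1)^d. \tag{9} $$
(3) Gauss kernel:
$$ K(x, x_i) = \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right). \tag{10} $$
(4) Sigmoid kernel function:
$$ K(x, x_i) = \tanh(\kappa (x \cdot x_i) + \theta), \tag{11} $$
where $\kappa$ and $\theta$ are constants and $\sigma$ is the kernel parameter, which denotes the width of the Gauss kernel function and affects the complexity of the sample data distribution in the high-dimensional space.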
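The four kernel alternatives can be written out in plain Python for small vectors. The sketch below uses common textbook parametrisations; the parameter names (`degree`, `coef0`, `sigma`, `gamma`) and their default values are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=3, coef0=1.0):
    # Polynomial kernel of the given degree (offset coef0 is an assumed convention).
    return (dot(x, z) + coef0) ** degree


def gaussian_kernel(x, z, sigma=1.0):
    # sigma is the kernel width; a larger sigma gives a smoother decision surface.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)
```

For example, `gaussian_kernel(x, x)` is always 1 because the squared distance of a point to itself is zero, and the value decays toward 0 as two points move apart.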
In this paper, we choose the Gauss kernel function as the kernel function of the classification model and the regression model. The objective functions are formulas (4) and (7), and the corresponding restrictions are described as above. In the classification model, the samples $x_i$ are the independent variables and $y_i$ are the decision variables, representing the forecast type label. In the regression model, the samples $x_i$ are the independent variables and $y_i$ are the decision variables, representing the forecast values. In order to improve the performance of the classification model and the regression model, IPSO is used to optimize the parameters $(C, \sigma)$. For convenience, we use $(c, g)$ to denote $(C, \sigma)$.

Particle Swarm Optimization (PSO).
PSO is a metaheuristic based on evolutionary computation, which was developed by Kennedy and Eberhart [23]. As described by Eberhart and Kennedy, the PSO algorithm is an adaptive algorithm based on a social-psychological metaphor; a population of individuals (referred to as particles) adapts by returning stochastically toward previously successful regions [24,25]. Below we provide a brief outline of how PSO works.
In PSO, the swarm consists of $m$ particles, $m \in N^*$; each particle has a position vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and a velocity vector $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$ [26,27], where $i = 1, 2, \ldots, m$. Particles representing potential problem solutions move through a $D$-dimensional search space. During each generation, each particle is accelerated toward the particle's previous best position and the global best position. The best previously visited position of particle $i$ is denoted by $P_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and the best previously visited position of the swarm is denoted by $P_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$. The new velocity value is then used to calculate the next position of the particle in the search space. This process is iterated until the maximum number of iterations is reached or a minimum error is achieved. The updating of velocity and particle position can be obtained by using the following formulas:
$$ v_{id}^{k+1} = \omega \times v_{id}^{k} + c_1 \times r_1^k \times (p_{id}^k - x_{id}^k) + c_2 \times r_2^k \times (p_{gd}^k - x_{id}^k), \tag{12} $$
$$ x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}, \tag{13} $$
where $i = 1, 2, \ldots, m$, $d = 1, 2, \ldots, D$; $\omega$ denotes the inertial weight coefficient; $c_1$, $c_2$ are learning factors; $r_1^k$ and $r_2^k$ are positive random numbers in the range $[0, 1]$; $k$ denotes the $k$th iteration; $x_{id}^k$ is the position of particle $i$ in the $d$th dimension of the search space; $v_{id}^k \in [v_{\min}, v_{\max}]$ denotes the velocity of particle $i$ in the $d$th dimension; $x_{id}^{k+1}$ represents the position of particle $i$ at the $(k+1)$th iteration; and $v_{id}^{k+1}$ denotes the movement vector of particle $i$ at the $(k+1)$th iteration. Moreover, in formula (12), the first term $\omega \times v_{id}^k$ denotes the particle's inertia; the second term $c_1 \times r_1^k \times (p_{id}^k - x_{id}^k)$ stands for the particle's cognition-only model, and the third term $c_2 \times r_2^k \times (p_{gd}^k - x_{id}^k)$ stands for the particle's social-only model.
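A single velocity and position update for one particle can be sketched as follows. This is an illustrative sketch only: the function name `pso_step` is hypothetical, the random numbers are passed in as arguments so the step is reproducible, and one pair of random numbers is shared across all dimensions, which is a simplification.

```python
def pso_step(x, v, pbest, gbest, w, c1, c2, r1, r2, v_min, v_max):
    """Return the new position and velocity of one particle.

    x, v   : current position and velocity (lists of floats)
    pbest  : the particle's own best position so far
    gbest  : the swarm's best position so far
    r1, r2 : random numbers in [0, 1], supplied by the caller
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        inertia = w * v[d]
        cognitive = c1 * r1 * (pbest[d] - x[d])   # pull toward the particle's own best
        social = c2 * r2 * (gbest[d] - x[d])      # pull toward the swarm's best
        velocity = max(v_min, min(v_max, inertia + cognitive + social))  # clamp
        new_v.append(velocity)
        new_x.append(x[d] + velocity)
    return new_x, new_v
```

With `x = [0.0]`, `v = [1.0]`, `pbest = [1.0]`, `gbest = [2.0]`, `w = 0.5`, `c1 = c2 = 2.0`, and `r1 = r2 = 0.5`, the three terms are 0.5, 1.0, and 2.0, so the unclamped velocity is 3.5.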
More specifically, the training procedure for the PSO algorithm is briefly described as follows.
Step 1. Initialize all particles; initialize the parameters of the PSO algorithm, including the velocity $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$ and position $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ of each particle. Set the acceleration coefficients $c_1$ and $c_2$, the particle dimension, and the fitness threshold Acc. $r_1^k$ and $r_2^k$ are two random numbers in the range from 0 to 1.
Step 2. Calculate the fitness values of all particles and store $P_i$ and $P_g$ at the current iteration.
Step 3. If the maximum number of iterations is reached or the accuracy is satisfied, then output the $P_i$ and $P_g$ positions, and the algorithm terminates. Otherwise, go to Step 4.
Step 4. For each particle, compare the current position with the individual optimum and update the individual optimum if the current position is better. For each particle, compare the current position with the global optimum and update the global optimum if the current position is better.
Step 5. Calculate the velocity vectors in formula (12) for all particles.
Step 6. Modify the positions of all particles utilizing formula (13) and then go to Step 2.
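The six steps above can be sketched as a compact minimizer. This is a simplified sketch under stated assumptions rather than the procedure used in the paper: the function name `pso_minimize`, the search bounds, the velocity limit of 20% of the search range, and the default parameter values are all illustrative choices, and the global best is updated as soon as a particle improves on it.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, max_iter=200, w=0.7, c1=1.5, c2=1.5,
                 lo=-5.0, hi=5.0, acc=1e-8, seed=0):
    """Minimise `fitness` over [lo, hi]^dim with a basic PSO; returns (best_x, best_f)."""
    rng = random.Random(seed)
    v_max = 0.2 * (hi - lo)  # assumed velocity limit
    # Step 1: initialise positions and velocities.
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    # Step 2: evaluate fitness and store individual and global bests.
    pbest = [x[:] for x in xs]
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(max_iter):
        # Step 3: stop once the accuracy threshold is met.
        if gbest_f < acc:
            break
        for i in range(n_particles):
            # Steps 5 and 6: update velocity, clamp it, then move the particle.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel = (w * vs[i][d]
                       + c1 * r1 * (pbest[i][d] - xs[i][d])
                       + c2 * r2 * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-v_max, min(v_max, vel))
                xs[i][d] += vs[i][d]
            # Step 4: update the individual and global optimum if improved.
            f = fitness(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f
```

A call such as `pso_minimize(lambda x: sum(t * t for t in x), dim=2)` minimizes a two-dimensional quadratic bowl; because the random generator is seeded, repeated calls return the same result.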
In PSO, the inertia weight $\omega$ is employed to control the impact of the previous history of velocities on the current velocity. A larger $\omega$ favors searching for the global optimal solution over an expansive area and fast convergence, but its precision is poor because the search is coarse. A smaller $\omega$ improves the precision of the optimal solution, but the algorithm may be trapped in a local optimum. So, the balance between exploration and exploitation in PSO is dictated by $\omega$. Thus, proper control of $\omega$ is very important for finding the optimal solution accurately and efficiently. To balance the global exploration and local exploitation capability, some researchers adopt a linearly decreasing inertia weight [28,29]. Typically, $\omega_{line}(k)$ is reduced linearly with each iteration, from $\omega_{start}$ to $\omega_{end}$. It can be described as follows:
$$ \omega_{line}(k) = \omega_{start} - (\omega_{start} - \omega_{end}) \times \frac{k}{k_{\max}}, \tag{14} $$
where $k$ is the current iteration number, $k_{\max}$ is the maximum number of iterations, $\omega_{start}$ is the maximum value of the inertia weight, and $\omega_{end}$ is the minimum value of the inertia weight.
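The linearly decreasing schedule is a one-line function. In the sketch below, the function name `linear_inertia` is hypothetical, the starting value 0.9 matches the experimental setting reported later in the paper, and the end value 0.4 is an assumed, commonly used default that the paper does not state.

```python
def linear_inertia(k, k_max, w_start=0.9, w_end=0.4):
    """Inertia weight that falls linearly from w_start (k = 0) to w_end (k = k_max)."""
    return w_start - (w_start - w_end) * k / k_max
```

Halfway through a 300-iteration run, `linear_inertia(150, 300)` lies midway between the two end points, regardless of how well the swarm is actually progressing; this insensitivity to the search state is what the adaptive strategy in the next section addresses.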

Evolution Speed-Aggregation Degree Strategy.
An analysis of the linearly decreasing inertia weight reveals some problems with this method. Firstly, if a better solution is detected early in the evolution, the chances of quick convergence to the optimal solution increase; however, the linearly decreasing inertia weight makes the algorithm converge slowly. Secondly, in the later evolution, as $\omega$ decreases, the global search capability of the algorithm declines, the diversity weakens, and the algorithm is easily trapped in a local optimum. In order to overcome the deficiencies of the linear weight, this paper adopts a nonlinearly descending inertia weight for PSO to balance the global and local exploration capability.
Let $F(P_g^k)$ be the fitness function value corresponding to the best global position of the $k$th generation and $F(P_g^{k-1})$ the fitness function value corresponding to the best global position of the $(k-1)$th generation.
Definition 1 (evolution speed $h$). Consider
$$ h = \frac{\min\left(F(P_g^{k-1}), F(P_g^{k})\right)}{\max\left(F(P_g^{k-1}), F(P_g^{k})\right)}, \tag{15} $$
where $\min()$ represents the minimum value function and $\max()$ represents the maximum value function.
According to the above assumptions and definition, it can be found that $0 < h \le 1$. The parameter not only considers the iteration history of the algorithm but also reflects the evolution speed of the particle swarm; that is, the smaller the value of $h$, the faster the evolution speed. After a certain number of iterations, the value of $h$ remains 1, which indicates that the algorithm has found the optimal solution.
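The evolution speed can be computed from two consecutive global-best fitness values. The sketch below assumes positive fitness values so that the ratio lies in (0, 1]; the function name `evolution_speed` and the handling of the all-zero case are illustrative assumptions.

```python
def evolution_speed(f_gbest_prev, f_gbest_curr):
    """Ratio of the smaller to the larger of two consecutive global-best fitness values.

    A value of 1 means the global best did not change; values near 0 mean it
    changed a lot between the two generations. Assumes positive fitness values.
    """
    smaller = min(f_gbest_prev, f_gbest_curr)
    larger = max(f_gbest_prev, f_gbest_curr)
    if larger == 0:
        return 1.0  # both values are zero: no change (assumed convention)
    return smaller / larger
```

For instance, if the global-best fitness halves from 4.0 to 2.0 in one generation, the ratio is 0.5, whereas an unchanged global best gives exactly 1.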
Whether the PSO algorithm converges prematurely or globally, the particles of the swarm exhibit a "gathering" phenomenon: either all particles gather at one particular position, or they gather at a few specific positions. Therefore, another factor affecting the performance of the algorithm is the aggregation degree of the particles.
In the iteration process, the fitness function value of the best global position, $F(P_g^k)$, is always at least as good as the fitness function value of the current best position of each particle, $F(P_i^k)$, because in each iteration the current best position is compared with the best global position, and the best global position is updated if the current position is better. In particular, if the two fitness function values are equal, we consider $F(P_g^k)$ to be better than $F(P_i^k)$, and the best global position does not need to be updated. Let $F(P_g^k)$ be the fitness function value corresponding to the best global position of the $k$th generation; the average fitness function value of the $k$th generation is described as follows:
$$ \bar{F}^k = \frac{1}{m} \sum_{i=1}^{m} F(X_i^k). \tag{16} $$
Definition 2 (aggregation degree $s$). In our case, the smaller the value of the fitness function, the better. The aggregation degree is defined as follows:
$$ s = \frac{F(P_g^k)}{\bar{F}^k}. \tag{17} $$
Obviously, $0 < s \le 1$; $s$ reflects the current level of aggregation of all particles and, to some extent, also reflects the diversity of the particles. The larger the value of $s$, the higher the aggregation degree of the swarm and the lower the particle variability. In particular, when $s = 1$, all particles of the swarm are identical in properties. If the algorithm has fallen into a local optimum in this case, it will not be easy for the swarm to escape the local extreme point.
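The aggregation degree compares the global-best fitness with the swarm's average fitness. The following sketch assumes a minimization problem with positive fitness values; the function name `aggregation_degree` is hypothetical, and the min/max guard is an assumed safeguard that keeps the result in (0, 1] even if the inputs are passed in an unexpected order.

```python
def aggregation_degree(f_gbest, particle_fitness):
    """Ratio between the global-best fitness and the swarm's average fitness.

    Returns 1.0 when every particle has the same fitness as the global best,
    and a value closer to 0 when the swarm is widely spread. Assumes positive
    fitness values and that smaller fitness is better.
    """
    average = sum(particle_fitness) / len(particle_fitness)
    smaller, larger = min(f_gbest, average), max(f_gbest, average)
    if larger == 0:
        return 1.0  # every value is zero: fully aggregated (assumed convention)
    return smaller / larger
```

A swarm whose particles all share the global-best fitness yields 1, which is exactly the situation in which escaping a local extreme point becomes hard.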
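Based on the above discussion, we can obtain the nonlinear inertia weight expression, formula (18), which adjusts the inertia weight according to the evolution speed and the aggregation degree, each scaled by its own weight: the weight of evolution speed and the weight of aggregation degree.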
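One way such an adaptive inertia weight can be put together is sketched below. The functional form is an assumption: it is one plausible combination in which faster evolution lowers the inertia weight and stronger aggregation raises it, and the exact expression of formula (18) may differ. The function name `adaptive_inertia` and the argument names are hypothetical; the default values 0.9, 0.6, and 0.05 are the settings reported in Experiment 1.

```python
def adaptive_inertia(speed, aggregation, w_start=0.9, weight_speed=0.6, weight_aggregation=0.05):
    """Assumed adaptive inertia weight (not necessarily the paper's formula (18)).

    speed       : evolution speed in (0, 1]; smaller means faster evolution
    aggregation : aggregation degree in (0, 1]; larger means a more clustered swarm
    """
    # Fast evolution (small speed) reduces the weight to refine the search locally;
    # high aggregation increases it to push particles apart again.
    return w_start - weight_speed * (1.0 - speed) + weight_aggregation * aggregation
```

Under this assumed form, a stalled but not yet clustered swarm (`speed = 1`, `aggregation` near 0) keeps the inertia weight close to its starting value, whereas a rapidly improving swarm gets a smaller weight.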

Position-Extreme Strategy.
To make the algorithm escape a local optimum, we set a judgment condition that monitors changes of the global optimal value during the evolution process.
If the number of consecutive iterations in which the global optimal value does not improve exceeds limit, then we consider the algorithm trapped in a local optimum. In such a case, the search strategy of the particles changes so that the particles escape from the local optimum and start exploring new positions. When the particles reach a new local optimum, the algorithm chooses the smaller of the two local optimum values, the one before and the one after the escape, following the principle that the smaller fitness function value has priority, and then enters the next update. This means that the particle's current local optimum value is compared with the previously obtained local optimum value during each iteration to obtain a new local optimum value. The corresponding update equations are given as formula (19), in which rand() is a random function and rand(0, 1) is a random number between 0 and 1.
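The stagnation test and the "keep the smaller optimum" rule can be sketched independently of the exact repositioning formula. In the sketch below, `StagnationMonitor`, `rescatter`, and `keep_smaller` are hypothetical names, and the uniform re-sampling of positions inside the search bounds is an assumed stand-in for the update equations of formula (19), which may differ.

```python
import random


class StagnationMonitor:
    """Counts consecutive iterations without improvement of the global optimum."""

    def __init__(self, limit):
        self.limit = limit
        self.best = None
        self.count = 0

    def update(self, f_gbest):
        """Record the latest global-best fitness; return True if the swarm looks trapped."""
        if self.best is None or f_gbest < self.best:
            self.best = f_gbest
            self.count = 0
        else:
            self.count += 1
        return self.count > self.limit


def rescatter(positions, lo, hi, rng):
    """Assumed escape move: draw a fresh uniform position for every particle."""
    return [[lo + rng.random() * (hi - lo) for _ in p] for p in positions]


def keep_smaller(optimum_before, optimum_after):
    """Each optimum is a (position, fitness) pair; the smaller fitness has priority."""
    return optimum_before if optimum_before[1] <= optimum_after[1] else optimum_after
```

In a full optimizer, `update` would be called once per iteration, `rescatter` would be applied when it returns `True`, and `keep_smaller` would decide which of the two local optima enters the next update.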

The Improved PSO Algorithm.
According to the strategy mentioned above, the improved PSO algorithm can be summarized as follows.
Step 1 (initialize IPSO). Initialize all particles; initialize the parameters of the IPSO algorithm, including the velocity $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$ and position $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ of each particle. Set the acceleration coefficients $c_1$ and $c_2$, the particle dimension, the maximum number of iterations $k_{\max}$, the maximum number of consecutive times limit, the weight of evolution speed, the weight of aggregation degree, the maximum value of inertia weight $\omega_{start}$, the minimum value of inertia weight $\omega_{end}$, and the fitness threshold Acc. $r_1^k$ and $r_2^k$ are two random numbers ranging between 0 and 1. $k$ is the current number of iterations.
Step 3 (define and evaluate fitness function). For classification problems, Acc is defined as the classification accuracy; that is,
$$ \mathrm{Acc} = \frac{\text{The number of correctly classified samples}}{\text{The total number of samples}}. \tag{20} $$
For regression problems, Acc is defined as the regression error (RMSE); that is,
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \tag{21} $$
where $n$ is the number of the samples, $y_i$ are the original values, and $\hat{y}_i$ are the forecasting values.
Step 4 (update velocity and position of each particle). Search for better kernel parameters according to formulas (12) and (13). The inertia weight is changed dynamically based on the current evolution speed factor and aggregation degree factor, as formulated in formula (18).
Step 6 (check stop condition). If $k > k_{\max}$ or the fitness function value < Acc, then stop the iteration; $P_g$ is the optimal solution, which represents the best parameters for SVM. Otherwise, go to Step 7.
Step 7 (judge whether the global optimum value has remained unchanged for consecutive iterations). If the number of consecutive iterations without change exceeds limit, then go to Step 8; otherwise, go to Step 3.
Step 8 (update the position according to the new position formulas (19)).
In this paper, the SVM parameter combination $(c, g)$ is the optimized object, which is taken as the input of IPSO. When the stop condition is met, IPSO outputs the optimal parameter combination.
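The two fitness measures used in Step 3 are straightforward to compute. The sketch below is illustrative; the function names `classification_accuracy` and `rmse` are hypothetical.

```python
import math


def classification_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def rmse(true_values, predicted_values):
    """Root mean square error between original and forecast values."""
    squared = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    return math.sqrt(squared / len(true_values))
```

Note that the two measures point in opposite directions: a higher accuracy is better for classification, while a lower RMSE is better for regression, so an optimizer that minimizes its fitness would need to negate or invert the accuracy.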

Sales Growth Rate Forecasting Model Based on SVM Classification Optimized by IPSO
Sales growth rate forecasting is a time series forecasting problem. The future sales growth rates are predicted based on the historical sales data. Firstly, the experimental data are preprocessed to improve the efficiency and precision of the forecasting model; the preprocessing includes the selection of attributes and data normalization. Secondly, the classification model is established by training on the sample data, and the type labels of the sample data are obtained. The sales growth rate forecasting model is then constructed and trained on the sample data with the same type label. Thirdly, the model is evaluated using the root mean square error (RMSE) to validate the forecasting performance of the "regression method based on SVM classification" on the sample data. The overall procedure is shown in Figure 1.

The Preprocessing of Sales Data.
In the preprocessing phase, the data is first normalized. The main purpose of normalization is to prevent attributes with greater numerical ranges from dominating those with smaller numerical ranges. In addition, normalization avoids numerical difficulties during the later calculation stages. The data is normalized according to the following formula:
$$ x_i' = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}}, \tag{22} $$
where $x_i'$ are the scaled values, $x_i$ are the original values, $x_i^{\max}$ is the maximum value of attribute $i$ in the data set, and $x_i^{\min}$ is the minimum value of attribute $i$ in the data set. Then, the training sample sets are constructed; each sample pairs an input vector of fixed dimension with an output vector.
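Min-max normalization of each attribute can be sketched as follows. This sketch assumes scaling to the range [0, 1]; the function names `min_max_scale` and `scale_rows` are hypothetical, and mapping a constant attribute to zeros is an assumed convention to avoid division by zero.

```python
def min_max_scale(column):
    """Scale one attribute (a list of numbers) to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant attribute (assumed convention)
    return [(value - lo) / (hi - lo) for value in column]


def scale_rows(rows):
    """Scale every attribute of a data set given as a list of sample rows."""
    scaled_columns = [min_max_scale(list(column)) for column in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]
```

In practice the minimum and maximum would be taken from the training data and reused unchanged for the testing data, so that information from the forecast period does not leak into the model.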

Regression Method Based on SVM Classification Optimized by IPSO.
In solving the nonlinear regression problem, to make full use of the advantages of SVM classification, we adopt the "regression method based on SVM classification." Because IPSO has far superior performance in global optimization and convergence speed, IPSO is applied to determine the parameters of SVM, as shown in Figure 1. Let the training data set be $S = \{(x_i, y_i) \mid i = 1, 2, \ldots, l\}$ and the testing data set be $T = \{(x_j, y_j) \mid j = 1, 2, \ldots, t\}$, where $x_i \in R^n$ are input attributes and $y_i$ are decision attributes. The basic steps of the "regression model based on SVM classification" optimized by IPSO are as follows.
Step 2.2. Select the kernel function, and adopt IPSO to optimize the parameters.
Step 2.3. Train on the normalized data to obtain the SVM classification model.
Step 3. Using this classification model, classify the testing samples and obtain the type label of each testing sample $(x_j, y_j)$.
Step 4. For each testing sample $x_j$ of a given type, take the training samples $(x_i, y_i)$ of the same type as the training set, and adopt the SVM regression algorithm to forecast the $y_j$ value of each testing sample.
Step 4.2. Select the kernel function, and adopt the IPSO algorithm to optimize the parameters.
Step 4.3. Train on the normalized training set, and establish the SVM regression model.
Step 4.4. Utilize the trained model to forecast the $y_j$ value of each testing sample.
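The control flow of preclassification followed by per-type regression can be sketched with pluggable models. In the sketch below, the classifier and regressor are deliberately simple stand-ins (a nearest-centroid classifier and a per-type mean) and are not SVMs; in the method described above, both would be SVM models whose parameters are tuned by IPSO. All function names and the data layout are hypothetical.

```python
def fit_predict_by_type(train, test_inputs, classify, fit_regressor):
    """train: list of (x, y, type_label); test_inputs: list of x vectors.

    For each test input, predict its type, fit a regressor on the training
    samples of that type only, and return (type_label, forecast) pairs.
    """
    results = []
    for x in test_inputs:
        label = classify(x)
        same_type = [(xi, yi) for xi, yi, ti in train if ti == label]
        model = fit_regressor(same_type)
        results.append((label, model(x)))
    return results


def make_nearest_centroid(train):
    """Stand-in classifier: assign the type whose mean input vector is closest."""
    groups = {}
    for x, _, t in train:
        groups.setdefault(t, []).append(x)
    centroids = {t: [sum(col) / len(xs) for col in zip(*xs)] for t, xs in groups.items()}

    def classify(x):
        return min(centroids,
                   key=lambda t: sum((a - b) ** 2 for a, b in zip(x, centroids[t])))
    return classify


def fit_mean_regressor(samples):
    """Stand-in regressor: always forecast the mean target of its training samples."""
    mean_y = sum(y for _, y in samples) / len(samples)
    return lambda x: mean_y
```

Because the regressor only ever sees training samples of the predicted type, its forecasts stay within the value range of that type, which is the range-narrowing effect the method relies on; it also means that a misclassified test sample is forecast from the wrong group.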

Experimental Analysis
To evaluate the performance of the proposed IPSO algorithm and the regression method based on SVM classification, we conduct two numerical experiments.Firstly, experiment 1 validates the proposed IPSO algorithm based on three classic benchmark functions.Secondly, we verify the effectiveness and feasibility of the regression method based on SVM classification in experiment 2. We use the sales growth rate data set for this purpose.In experiment 2, Gaussian kernel function was selected as kernel function of SVM.

Experiment 1 (IPSO versus PSO)
5.1.1. The Classic Benchmark Functions.
In order to compare the performance of IPSO and standard PSO, three classic benchmark functions are considered in our experiment, namely, the Sphere function, the Rosenbrock function, and the Rastrigin function. These functions were selected because they are sufficiently diverse to avoid bias in the comparison. The Sphere function is a unimodal quadratic function; the Rosenbrock function is a unimodal function that is difficult to minimize; and the Rastrigin function is a multimodal function with a large number of local optima. The three classic benchmark functions are detailed as follows.
$$ f_1(x) = \sum_{i=1}^{n} x_i^2, $$
$$ f_2(x) = \sum_{i=1}^{n-1} \left[ 100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \right], $$
$$ f_3(x) = \sum_{i=1}^{n} \left[ x_i^2 - 10 \cos(2 \pi x_i) + 10 \right]. $$
According to the characteristics of the functions, $f_1(x)$ and $f_2(x)$ are unimodal functions with only one optimum in their domain, and they are used to test the optimization precision and execution performance of the IPSO algorithm; $f_3(x)$ is a multimodal function with many local optima in its domain, and it is used to test the global search ability of the IPSO algorithm and its ability to avoid premature convergence.
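The three benchmark functions have well-known standard definitions, which can be written in a few lines. The sketch below uses those standard forms; the search domains and dimensions used in the experiment are not restated here, so none are assumed.

```python
import math


def sphere(x):
    """Unimodal quadratic bowl; global minimum 0 at the origin."""
    return sum(t * t for t in x)


def rosenbrock(x):
    """Narrow curved valley; global minimum 0 at (1, 1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def rastrigin(x):
    """Highly multimodal; global minimum 0 at the origin, many local minima elsewhere."""
    return sum(t * t - 10.0 * math.cos(2.0 * math.pi * t) + 10.0 for t in x)
```

Any of these functions can be passed directly as the fitness function of a PSO routine that minimizes its objective.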

Experiment 1 (Comparative Analysis of Algorithm Performance).
In our first experiment, the classic benchmark functions are used as the fitness functions of the particles. To reduce the effect of chance, each function optimization experiment is run 10 times and the results are averaged. The algorithm parameters are set as follows: swarm size $m = 20$, the maximum number of iterations $k_{\max} = 300$, the initial inertia weight $\omega_{start} = 0.9$, the learning factors $c_1 = c_2 = 1.5$, the maximum number of consecutive iterations with an unchanged optimal value limit = 145, the evolution speed factor set to 0.6, and the aggregation degree factor set to 0.05. During the iteration process, the inertia weight is adaptively adjusted depending on the fitness function values. The iteration terminates when the fitness function value meets the convergence condition or the maximum number of iterations is reached.
Table 1 shows that, for all functions, the optimization results of the IPSO algorithm are significantly better than those of the standard PSO algorithm, and the average iteration time is significantly reduced; that is, the IPSO algorithm can significantly improve the convergence speed of the particles. We observed that, for the unimodal functions, the standard PSO algorithm can also reach the theoretical optimum, but, as a whole, the robustness of the algorithm is poor.
Figures 2, 3, and 4 illustrate the above experiments. In Figure 2, to allow a better comparison between IPSO and PSO, the horizontal axis uses a log scale, and the maximum number of iterations is also set to 300. The figures show that, when solving $f_1(x)$, the performance of the two algorithms is similar, and both can converge to the global optimum, but the IPSO algorithm converges faster, requires fewer iterations, and is more efficient; when solving $f_2(x)$, both algorithms can converge to the global optimum, but the IPSO algorithm has an obvious advantage in convergence rate; when solving $f_3(x)$, the PSO algorithm is trapped in a local optimum and has difficulty finding the global minimum point, whereas the IPSO algorithm converges to the global optimum in a short time and shows strong optimization capability. Overall, IPSO outperformed the traditional PSO algorithm on the selected functions, which provides a basis for using it to optimize the SVM parameters.
We further analyze the solution of the function $f_3(x)$ because of its characteristics. The PSO algorithm easily falls into a local optimum, resulting in slow convergence and even stagnation. Figure 4 shows that the PSO algorithm is trapped in a local optimum when the number of evolution generations is approximately 25, after which it stagnates. However, the IPSO algorithm can jump out of the local extreme point and quickly find the optimal solution. Because of the evolution speed factor and the aggregation degree factor, the inertia weight $\omega$ adjusts adaptively to the actual state of the PSO iteration, which improves the search capability and convergence speed of the algorithm. In addition, the position-extreme strategy keeps the algorithm from plunging into a local optimum. Therefore, the evolution speed factor, the aggregation degree factor, and the position-extreme strategy can effectively improve the performance of the PSO algorithm.

Experiment 2 (Validate the Effectiveness and Feasibility of the "Regression Method Based on SVM Classification" Optimized by IPSO)

Data Set.
To study the relationship between the advertising investment and the sales growth rate, we chose historical data on advertising investment and sales growth rate. We then applied the regression model to forecast the trend of the sales growth rate. The forecasting dates are from 2012(Q1) to 2012(Q4), with 4 groups of data. Because we provide short-term forecasting, data far from the forecasting date provide less useful information for the forecast value; therefore, we select 16 groups of data from 2008(Q1) to 2011(Q4) as input to construct and train the forecast model. Then, we carry out forecasting and compare the results with the actual data. The sales growth rate is subject to many influences, such as TV advertising, Loushu, radio advertising (RA), folding, poster, leaflets, direct mail (DM), newspaper advertisements, and panels, and there may exist correlation between these attributes. So, based on needs and attribute correlation analysis, we select the input attributes as follows: Loushu, folding, poster, leaflets, DM, newspaper advertisements, panels, walled packaging, sales offices packaging, slogan, SMS, and TV advertising. The sales growth rate is the decision attribute. The sales growth rate data used for modeling is shown in Figure 5.
As Figure 5 illustrates, the sales growth rate changes with the seasons, and the extent of the change is large. So, preclassification according to the sales growth rate followed by regression can improve the forecasting accuracy, because the classification narrows the range of the sample data and the regression is based on similar, homogeneous samples. In the following sections, we build the "regression method based on SVM classification" to forecast the unknown trends.

Establishing the Classification Model.
According to the requirement analysis, the data from 2008(Q1) to 2011(Q4) are adopted as the training data, and the data from 2012(Q1) to 2012(Q4) are adopted as the testing data. The samples are divided into three types in accordance with the sales growth rate $y$ and the industry knowledge: type I, low growth: $0 \le y < 30.0\%$; type II, normal growth: $30.0\% \le y < 60.0\%$; type III, high growth: $y \ge 60.0\%$. The types are described in Table 2.
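The three growth types translate directly into a labeling function. In the sketch below, the function name `growth_type` is hypothetical, the growth rate is assumed to be given as a fraction (0.30 for 30.0%), and raising an error for negative rates is an assumption, since the stated types do not cover them.

```python
def growth_type(rate):
    """Map a sales growth rate (as a fraction, e.g. 0.45 for 45%) to its type label."""
    if rate < 0:
        # The three types start at 0; negative growth is outside their range.
        raise ValueError("negative growth rates are not covered by the three types")
    if rate < 0.30:
        return "I"    # low growth
    if rate < 0.60:
        return "II"   # normal growth
    return "III"      # high growth
```

The boundaries are half-open, so a rate of exactly 30.0% is labeled type II and a rate of exactly 60.0% is labeled type III.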
In Table 2, the first column represents the time when the statistics were collected; for example, 2008(Q1) represents the first quarter of 2008; the second column represents the original types corresponding to statistical time; the third column represents the forecast types, which are obtained by using the SVM classification model.The normal type indicates the results of correct classification.The bold type indicates the results of misclassification.
From Table 2, it can be seen clearly that the overall forecasting accuracy of the sample is 90%, that is, 2 misjudgments out of 20, where the accuracy of the training samples is 93.75% and the accuracy of the testing samples is 75%.
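The three accuracy figures are mutually consistent, as a short check shows. In this sketch the correct-classification counts (15 of 16 training samples, 3 of 4 testing samples) are inferred from the reported percentages and are not read from Table 2 directly.

```python
from fractions import Fraction

# Counts inferred from the reported percentages (assumption):
# 93.75% of 16 training samples and 75% of 4 testing samples.
train_correct, train_total = 15, 16
test_correct, test_total = 3, 4

train_acc = Fraction(train_correct, train_total)
test_acc = Fraction(test_correct, test_total)
overall_acc = Fraction(train_correct + test_correct, train_total + test_total)
misjudged = (train_total + test_total) - (train_correct + test_correct)

print(f"training: {float(train_acc) * 100:.2f}%")   # training: 93.75%
print(f"testing:  {float(test_acc) * 100:.2f}%")    # testing:  75.00%
print(f"overall:  {float(overall_acc) * 100:.2f}% ({misjudged} misjudged)")
```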
For forecasting the sales growth rate, the Gaussian kernel function is chosen as the kernel function in both the classification and regression stages. First, we construct the classification model, train it on the training set, and keep the model with the minimum cross-validation (CV) classification error. Second, we evaluate the trained classification model on the test data set; the classification results are shown in Table 2 and Figure 6. Third, we build and analyze the regression model based on preclassification. Finally, we assess the validity of the regression results. Figure 6 shows the type labels forecast by the SVM classification model from 2012(Q1) to 2012(Q4). The "red box" represents the forecast classification label, and the "blue star" represents the actual classification label. Where the forecast label agrees with the actual label, the classification is correct. Figure 6 shows that one of the four test samples is misclassified: its "red box" does not overlap the corresponding "blue star."
In the training stage, we adopt the classification accuracy rate as the fitness function. Because cross-validation (CV) is the preferred procedure for testing out-of-sample classification capability when the data set is small [30], CV is adopted in this paper to avoid experimental bias.
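A cross-validated accuracy of this kind can be sketched as follows. This is not the authors' implementation: the contiguous fold layout, the function names, and the `fit_predict` callable (which would wrap an SVM trained with one candidate parameter pair) are assumptions, and a majority-class stand-in replaces the SVM so that the sketch runs without external libraries.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds.

    Returns one (train_indices, validation_indices) pair per fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    splits, start = [], 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, val))
        start += size
    return splits


def cv_accuracy(fit_predict: Callable, X: Sequence, y: Sequence, k: int) -> float:
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for train, val in k_fold_indices(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in val])
        correct += sum(p == y[i] for p, i in zip(preds, val))
    return correct / len(y)


def majority_classifier(X_train, y_train, X_val):
    """Stand-in for the SVM: always predicts the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_val)


# Illustrative labels only, not the paper's data.
labels = ["II", "I", "II", "II", "III", "II", "I", "II"]
print(cv_accuracy(majority_classifier, list(range(len(labels))), labels, k=4))
```

An optimizer such as IPSO would call `cv_accuracy` once per candidate parameter pair and keep the pair with the highest value.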

Comparison with Other Models.
On the basis of the classification, we set up the SVM regression model (i.e., the "regression model based on classification") and perform regression analysis using the sample data. In regression based on preclassification, IPSO is adopted to optimize the kernel parameters in two phases, first during classification and later during regression analysis. The parameter optimization results obtained by IPSO are shown in Figure 7. Figure 7 shows that after 100 iterations IPSO obtains the optimal parameter combination (2.9439, 14.8097); it also shows the best fitness value and the average fitness value of the particles in each iteration.
For comparison, we also use genetic algorithms (GA) and grid search (GS) to select the optimal SVM parameters on the same sample data. The parameter optimization result obtained by GA is shown in Figure 8. The figure shows that after 100 iterations GA obtains the optimal parameter combination (7.1426, 2.1765), together with the best fitness value and the average fitness value of the individuals in each iteration. The parameter optimization result obtained by GS is shown in Figure 9. Figure 9 shows that after 100 iterations GS obtains the optimal parameter combination (64, 0.17678), and that the fitness value changes across the different parameter combinations.
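The grid-search optimum reported above, 64 and 0.17678, equals 2^6 and approximately 2^-2.5, which is consistent with a power-of-two grid stepped in half-integer exponents; this is an inference, not something the text states. The sketch below shows such a search under that assumption. The names `grid_search`, `p1`, `p2`, and `toy_error` are hypothetical, and the toy objective merely stands in for the cross-validated SVM error.

```python
import itertools
import math
from typing import Callable, Iterable, Tuple


def grid_search(objective: Callable[[float, float], float],
                exponents_1: Iterable[float],
                exponents_2: Iterable[float]) -> Tuple[float, float, float]:
    """Evaluate objective(2**e1, 2**e2) on the full grid; return (error, p1, p2)."""
    best = None
    for e1, e2 in itertools.product(exponents_1, exponents_2):
        p1, p2 = 2.0 ** e1, 2.0 ** e2
        error = objective(p1, p2)
        if best is None or error < best[0]:
            best = (error, p1, p2)
    if best is None:
        raise ValueError("the grid is empty")
    return best


def toy_error(p1: float, p2: float) -> float:
    """Hypothetical error surface with its minimum at (2**6, 2**-2.5)."""
    return (math.log2(p1) - 6.0) ** 2 + (math.log2(p2) + 2.5) ** 2


# Exponents from -10 to 10 in steps of 0.5 (assumed grid resolution).
exponents = [i / 2 for i in range(-20, 21)]
error, p1, p2 = grid_search(toy_error, exponents, exponents)
print(p1, round(p2, 5))   # 64.0 0.17678
```

Unlike IPSO and GA, this search can only return points on the grid, so its resolution is fixed by the exponent step.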
In these experiments, we adopt k-fold cross-validation (k-CV) in IPSO, GA, and GS, with k as a tuning parameter. Different values of k yield different fitness values, which are reported in Table 3. In Table 3, the bold values indicate the best fitness among the three methods for the same value of k. For example, when k = 2, that is, 2-CV, the RMSE of IPSO is 1.25, the RMSE of GA is 3.92, and the RMSE of GS is 4.63. In particular, IPSO attains its minimum error in parameter optimization when k = 4, and GA attains its minimum error when k = 5; the error of GS is on the whole larger than that of the other two methods, because there is no attribute selection. The last row of Table 3 reports the average fitness values over k = 2, 3, 4, 5. IPSO has the minimum average fitness value; that is, IPSO generally obtains the best fitness among the three methods on the given data set.
Table 3 and Figures 7-9 show the performance of the three approaches in determining the optimal parameters. We conclude that IPSO is superior to the other two approaches in terms of overall fitness. To validate the proposed model and provide a comparison, we also used three other forecasting models: the SVM direct regression model, the multiple linear regression model, and the BP neural network model. The forecast values and RMSE of the regression model based on classification, the SVM direct regression model, the BP neural network, and multiple linear regression are shown in Figures 10 and 11. Figure 10 shows the forecasts of the sales growth rate from 2012(Q1) to 2012(Q4) produced by the four models. Figure 11 illustrates the error between predicted and actual values and the average error. The figures show that the regression model based on classification outperforms its competitors in forecasting the sales growth rate.
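The RMSE used as the fitness value here is the standard root-mean-square error. The sketch below computes it and picks the best method for the 2-CV case; the `actual` and `predicted` series are illustrative, and only the three 2-CV RMSE values are taken from the text.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equally long series."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Illustrative series, not the paper's data: one forecast is off by 4.
print(rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0]))   # 2.0

# RMSE values reported in the text for 2-CV; lower is better.
rmse_2cv = {"IPSO": 1.25, "GA": 3.92, "GS": 4.63}
best_method = min(rmse_2cv, key=rmse_2cv.get)
print(best_method)   # IPSO
```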

Analysis and Discussion
We proposed a forecasting technique of preclassification and later regression based on the SVM model. The above experimental results demonstrate that the proposed method is generally effective and feasible, and that it performs particularly well in small-sample regression and short-term prediction of time series. The general findings are as follows.
(1) SVM maps input vectors nonlinearly into a high-dimensional feature space and constructs the optimum separating hyperplane to realize the classification. Exploiting this characteristic, the "regression model based on SVM classification" can obtain higher forecast accuracy even though the classification is not error-free, because SVM has high classification accuracy and the forecast is defined on samples with the same class label, which have the same or a similar change trend.
(2) According to the change characteristics of the sample data, the actual requirements, and industry knowledge, the samples were classified first to decide the types of the test samples, which fall into three types: type I, low growth; type II, normal growth; type III, high growth. This restricts the forecast samples to the same type range and makes full use of the samples' change trend, reducing the forecasting range (or number) and thereby raising the forecast accuracy.
(3) After the classification, the range (or number) of the samples is narrowed, and the overall trend captured by the forecast is weaker than in overall regression. The method therefore has some limitations for medium-term or long-term prediction of time series. To compensate for this shortcoming, new training samples can be added as the time series advances, improving the gradient and the timeliness of the training samples.
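The classify-then-regress idea behind these findings can be sketched in a few lines. This is an illustration under loud assumptions, not the paper's model: a nearest-neighbour rule stands in for the SVM classifier, a class mean stands in for the SVM regression, the single input feature is hypothetical, and all numbers are invented for the example.

```python
from typing import Callable, List, Sequence, Tuple


def forecast_by_type(train_X: Sequence[float], train_y: Sequence[float],
                     train_types: Sequence[str], test_x: float,
                     classify: Callable[[float], str],
                     fit_regressor: Callable) -> Tuple[str, float]:
    """Predict the type of test_x, then regress only on training samples of that type."""
    predicted_type = classify(test_x)
    subset = [(x, y) for x, y, t in zip(train_X, train_y, train_types)
              if t == predicted_type]
    if not subset:
        # Fall back to all samples if the predicted type has no training data.
        subset = list(zip(train_X, train_y))
    model = fit_regressor([x for x, _ in subset], [y for _, y in subset])
    return predicted_type, model(test_x)


def make_nearest_neighbour(train_X: Sequence[float], train_types: Sequence[str]):
    """Stand-in classifier (assumption): label of the closest training input."""
    def classify(x: float) -> str:
        nearest = min(range(len(train_X)), key=lambda j: abs(train_X[j] - x))
        return train_types[nearest]
    return classify


def mean_regressor(X: List[float], y: List[float]):
    """Stand-in regressor (assumption): predicts the mean target of its subset."""
    average = sum(y) / len(y)
    return lambda x: average


# Invented data: one hypothetical input feature and growth rates in percent.
X = [1.0, 2.0, 5.0, 6.0, 9.0, 10.0]
y = [10.0, 20.0, 40.0, 50.0, 70.0, 80.0]
types = ["I", "I", "II", "II", "III", "III"]

result = forecast_by_type(X, y, types, 5.4,
                          make_nearest_neighbour(X, types), mean_regressor)
print(result)   # ('II', 45.0)
```

Because the regressor only sees samples of the predicted type, a misclassified test sample is forecast from the wrong subset, which is why the classification accuracy matters for the final forecast.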

Conclusions
In this study, a forecasting technique of preclassification and later regression, based on SVM classification, is proposed for sales growth rate forecasting. We propose IPSO to optimize the parameters of SVM. We observed that the inertia weight has a great impact on the convergence speed and accuracy of the PSO algorithm. Because the linearly decreasing weight strategy does not make good use of the information in the particles' iterations, which is a disadvantage, we propose the IPSO algorithm to improve performance. We introduce two factors in this algorithm, the evolution speed factor and the aggregation degree factor, to balance global and local search.

Figure 1: Regression model based on SVM classification optimized by IPSO.

Figure 5: The sales growth rate (%) in the quarter of a year from 2008 to 2012.

Figure 6: The classification results of testing data set.

Figure 7: The fitness of selecting the optimal parameters by IPSO.

Figure 8: The fitness of selecting the optimal parameters by GA.

Figure 10: The forecasting results of the growth rate by four kinds of models.

Figure 11: Comparison of absolute error forecasting among four kinds of models.

Table 1: Comparison of IPSO and PSO on the three classic benchmark functions.

Table 2: Classification results of the sample data.