Self-Adaptive Prediction of Cloud Resource Demands Using Ensemble Model and Subtractive-Fuzzy Clustering Based Fuzzy Neural Network

In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources to users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We first analyze the characteristics of user preferences and demands and construct the architecture of the prediction model. We adopt several base predictors to compose the ensemble model and then study the structure and learning algorithm of the fuzzy neural network. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper combines fuzzy c-means with subtractive clustering, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting resource demands.


Introduction
In cloud computing [1,2], accurate and efficient resource provisioning is important for maximizing utility. In IaaS-mode cloud computing [3], resources are allocated in the form of virtual machines, which are composed of virtual hardware virtualized by a hypervisor [4]. Users send requests to the cloud center and try to obtain resources in accordance with their demands. However, before the cloud center provisions these resources, some time may be needed to prepare and initialize the instances, i.e., the VMs. In addition, while a VM is running, resources must be adjusted dynamically to guarantee the QoS (quality of service). Some schemes do not consider customer-driven management, where resources have to be dynamically rearranged based on customers' demands [5]. The rearrangement of resources cannot take effect instantly, and the time it requires limits the elasticity of resource management [6]. Moreover, if the resources are not allocated properly, the performance of VMs may be restricted, or resources may be idle and wasted. This severely decreases both the utility and the QoS of cloud computing.
It stands to reason that resource provisioning in the cloud environment is influenced directly by performance predictions [7]. In order to know how to allocate resources beforehand, it is important to characterize users' demands and preferences accurately. To make an accurate prediction, this paper analyses the main factors that affect the prediction performance and proposes a prediction method that proves to be more accurate and effective.
The key contributions of this paper are listed as follows:
(1) We analyze the characteristics of user demands and preferences and study the corresponding models and solutions.
(2) A self-adaptive cloud resource demands prediction algorithm using a fuzzy neural network (FNN) is proposed. Besides historical data, the base predictors' outputs are adopted by the FNN with different weights.
(3) Fuzzy c-means combined with subtractive clustering (i.e., subtractive-fuzzy clustering) is adopted to optimize the convergence features and learning speed.
(4) The learning algorithm of the fuzzy neural network is optimized with a self-adjusting learning rate and momentum weight, which improves the robustness and the real-time performance.
(5) To evaluate the prediction algorithm, several statistical indexes are introduced for comparison with other algorithms: MSE (mean squared error), MAE (mean absolute error), and PRED.

Cloud Resource Demands Prediction.
Research on resource demand prediction mainly focuses on how to save energy [8], improve performance [7,9], increase profit [5,10,11], and so on. To optimize resource management and task scheduling, Ramezani et al. [12] introduce a method that predicts VM workload patterns and VM migration time using a fuzzy expert system. However, only a simple prediction model is depicted and the details are not explained. By contrast, a fuzzy prediction method is given in [13] to model the uncertain workload and the vague availability of virtualized server nodes, using type-1 and type-2 fuzzy logic systems. With a fuzzy algorithm, the prediction method is more robust, but its accuracy decreases, so it needs to be combined with another prediction method to achieve high performance. Some methods predict resource demands based on duration. Reference [14] proposes an approach for long-term trend prediction using the moving average method. To keep jitter within a small range, it further improves the conventional moving average method using standard deviations. This method mainly aims at long-term prediction, and short-term prediction is not addressed.
To balance performance and system cost, some studies try to maximize the system utility. In [15], a fast-up and slow-down algorithm is introduced to maximize the performance while maintaining stability.
As workload has an obvious nonlinear feature, many machine-learning algorithms have also been used to support its prediction. Neural networks (NN) have been introduced as a prediction method [16][17][18][19]. Combined with typical prediction methods such as the sliding window method [20], the autoregression model [21], and the exponential smoothing model [22], they have worked well in predicting the workload. To improve the inference ability, the fuzzy neural network is introduced for prediction.

Fuzzy Neural Network.
Fuzzy neural network is a combination of a fuzzy logic system and a neural network. It keeps the merit of each [23]. The algorithm is widely used in various applications such as pattern recognition, prediction, and system control [24][25][26]. In prediction area, it has been discovered to be more accurate than other conventional or soft computing techniques. In [27], neurofuzzy and neural network techniques are adopted to forecast the sea level in Darwin Harbor. It is proved that adaptive neurofuzzy inference system (ANFIS) is more effective in predicting than autoregressive moving average (ARMA) algorithm. References [28,29] introduce FNN approach into energy consumption demands predicting. Results of [28] even show that the hybrid ANFIS model has better performance than ANN in terms of prediction accuracy. Some works [30,31] use FNN algorithm in hydrological time series prediction. In the machine condition maintenance area, FNN algorithm is used to predict condition of the machine or the components [32,33].
According to the literature reviewed above, FNN is used for prediction in many areas and performs well. However, in the area of cloud resource demands prediction, few studies have adopted FNN as the prediction method. In this paper, we adopt an FNN with self-adjusting learning rate and momentum weight as the core of the prediction system. To improve the performance, we use an ensemble model and a clustering algorithm to optimize the FNN prediction system. Before the data is sent to the predictor, some preparation work needs to be done.

System Overview.
Before predicting user demands, we first analyze the user requests, including the utilization data structure, content, and number of historical resources. By analyzing the historical data, we may draw conclusions about user preference, demand description, and so forth. Short-term versus long-term prediction and fluctuation-period versus flat-period prediction are treated separately. A fluctuation-threshold ($th_{\mathrm{fluc}}$) and a flat-threshold ($th_{\mathrm{flat}}$) are defined to distinguish the flat period from the fluctuation period. In different periods, different base prediction methods are adopted according to the characteristics of the period, such as the second moving average model (SMA), the exponential moving average (EMA), the autoregression model (ARM), and the trend seasonality model (TSM). The output of the base predictors is sent to the fuzzy neural network as input. The fuzzy neural network uses the historical data and the base prediction values as training data, which improves the accuracy of the results. The output of the fuzzy neural network is used to guide the resource allocation in the IaaS cloud center. Prediction results and the actual resource demands are evaluated using statistical analysis and different criteria. The evaluation results are fed back to the historical database to improve the prediction performance. The overview of the resource demands prediction system is depicted in Figure 1.

Overall Situations of User Demands.
As depicted in Figure 1, there are some opposing circumstances to consider, such as long-term versus short-term demands and fluctuation versus nonfluctuation periods. A fluctuation period is an abnormally violent variation in the demand for a cloud resource over a period of time.
In long-term user resource requirements, we summarize the following characteristics: (1) the regularity is more obvious than in the short term, since long-term users may show some repetitive regularity; (2) over the long running of the system, there may be some fluctuation periods and some flat periods. For short-term requirements, the regularity may be less obvious, and the fluctuation feature is more noticeable. Therefore, the long-term versus short-term distinction does not conflict with the fluctuation versus flat distinction. For long-term data, we can on the one hand summarize the regularity; on the other hand, the flat and fluctuation periods should be distinguished. For short-term data, the regularity is not easy to identify and the fluctuation should be processed. The prediction speed should be ensured, as short-term resource provisioning puts quick response ahead of other performance measures. The difference between short-term and long-term processing mainly lies in the fluctuation period processing. We therefore discuss the fluctuation period and the flat period separately.

Figure 1: The overview of the prediction system.
[Figure labels: window size, prediction time interval, prediction value.]

Flat Period Processing.
Based on the characteristics of the flat period, the second moving average (SMA) [34] is adopted, since it can effectively reduce the lag deviation between the prediction value and the actual value. In this method, we define a sliding window whose input size is $n$, that is, $X = [x_1, x_2, \ldots, x_n]$. The predicted output value $y_{t+T}$ after a prediction time interval $T$ is the dependent variable of the input set $X$. The relationship can be abstracted as follows:
$$y_{t+T} = f(X). \tag{1}$$
The $i$th user's resource requirement prediction at time $t+T$ can be expressed as follows:
$$y_{t+T}(i) = a_t(i) + b_t(i)\,T. \tag{2}$$
In (2), $y_{t+T}(i)$ is the prediction value at time $t+T$, and $T$ is the number of time steps to be predicted ahead. $a_t(i)$ and $b_t(i)$ satisfy the following constraint equations:
$$a_t(i) = 2M_t^{(1)}(i) - M_t^{(2)}(i), \qquad b_t(i) = \frac{2}{N-1}\left(M_t^{(1)}(i) - M_t^{(2)}(i)\right). \tag{3}$$
In (3), $M_t^{(1)}(i)$ and $M_t^{(2)}(i)$ represent the first moving average value and the second moving average value, respectively, at the $t$th time of the resource the $i$th user requested, and $N$ is the period of the moving average. In addition, $M_t^{(1)}(i)$ and $M_t^{(2)}(i)$ can be expressed as follows:
$$M_t^{(1)}(i) = \frac{1}{N}\sum_{k=0}^{N-1} x_{t-k}(i), \tag{4}$$
$$M_t^{(2)}(i) = \frac{1}{N}\sum_{k=0}^{N-1} M_{t-k}^{(1)}(i). \tag{5}$$
Then the total number of resources requested by all the users is
$$Y_{t+T} = \sum_{i=1}^{N_u} y_{t+T}(i), \tag{6}$$
where $N_u$ is the total number of users. From the analysis above, we can see that the prediction value at time $t+T$ is decided only by the values of the $N$ most recent periods up to time $t$ and by the total number of users, so the prediction result can be calculated at time $t$.
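The second moving average forecast described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`moving_average`, `sma_forecast`, `total_forecast`) are hypothetical, and the intercept and slope use the standard double moving average coefficients, which may differ in detail from the authors' implementation.

```python
def moving_average(series, period):
    """Simple moving averages; entry k covers series[k : k + period]."""
    return [sum(series[k:k + period]) / period
            for k in range(len(series) - period + 1)]


def sma_forecast(history, period, horizon):
    """Second (double) moving average forecast `horizon` steps ahead."""
    if period < 2 or len(history) < 2 * period - 1:
        raise ValueError("need period >= 2 and at least 2 * period - 1 samples")
    first = moving_average(history, period)   # first moving averages
    second = moving_average(first, period)    # moving averages of the first averages
    m1, m2 = first[-1], second[-1]
    intercept = 2 * m1 - m2                   # level estimate at the current time
    slope = 2 * (m1 - m2) / (period - 1)      # trend estimate per time step
    return intercept + slope * horizon


def total_forecast(histories, period, horizon):
    """Total demand: sum of the per-user forecasts."""
    return sum(sma_forecast(h, period, horizon) for h in histories)
```

For a perfectly linear history the double moving average removes the lag of a single moving average, so the one-step forecast continues the line exactly.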

Fluctuation Period Processing.
The exponential moving average (EMA) is an effective method for short-term prediction and is particularly suitable for nonseasonal time series, owing to its quick responsiveness and to weights that decrease as observations age. Predicted values are calculated using a smoothing constant $\alpha$. The exponential moving average is expressed as follows:
$$\mathrm{EMA}(t, n) = \alpha\,x_t + (1-\alpha)\,\mathrm{EMA}(t-1, n). \tag{7}$$
In (7), $\mathrm{EMA}(t, n)$ is the moving average value between the past time $t-(n-1)$ and the current time $t$, so the time interval is $n$. $\alpha$ is the smoothing constant, which can be calculated by $\alpha = 2/(n+1)$. We can see that $\alpha$ is confined to $[0, 1]$.
The EMA method gives a higher weight to later measured values and a lower weight to earlier ones. The EMA method is therefore able to respond rather quickly to fluctuations in short-term demand and workload conditions [35]. However, some delay is introduced as the window size increases. According to Andreolini and Casolari [36], in nonlinear load trackers the polynomial order should be properly selected. If the order is low (degree ≤ 2), the function will not react quickly enough to load changes. If the order is high (degree ≥ 4), the function will be unnecessarily complex and some undesirable spikes will be introduced. The cost may be too high for a run-time context.
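The recursive EMA with smoothing constant 2/(n + 1) can be illustrated as follows. This is a minimal sketch: the function name `ema` is hypothetical, and seeding the recursion with the first observation is an assumption, since the text does not state how the average is initialized.

```python
def ema(series, window):
    """Exponential moving average with alpha = 2 / (window + 1)."""
    if not series:
        raise ValueError("series must not be empty")
    alpha = 2.0 / (window + 1)
    value = series[0]  # assumption: seed the recursion with the first observation
    for x in series[1:]:
        # newer samples receive weight alpha, the running average keeps 1 - alpha
        value = alpha * x + (1 - alpha) * value
    return value
```

A smaller window gives a larger smoothing constant and therefore a faster reaction to recent changes, at the cost of less smoothing.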

The Identification of Flat Period and Fluctuation Period.
Although we apply different prediction methods according to the fluctuation level, it is difficult to identify the boundary between fluctuation levels in the overall situation. In this section, we define the "fluctuation-threshold ($th_{\mathrm{fluc}}$)" and the "flat-threshold ($th_{\mathrm{flat}}$)" to distinguish the fluctuation and flat periods. The fluctuation-threshold is defined as the upper limit of the degree of variation in demand for a cloud resource, while the flat-threshold is the lower limit. In the last period of time, if the difference between the prediction values $x_t$ and $x_{t-1}$ in the series $\{x_t, x_{t-1}, \ldots, x_{t-(n-1)}\}$ is greater than a certain value, then $th_{\mathrm{fluc}}$ is reached. If the difference between $x_t$ and $x_{t-1}$ is less than a certain value, then $th_{\mathrm{flat}}$ is reached. For resource type $j$, the demands are experiencing a fluctuation period if the demand data in the last period satisfies the condition $\mathrm{Deg} \ge th_{\mathrm{fluc}}$, where $\mathrm{Deg}$ is the fluctuation degree of the prediction trend and $th_{\mathrm{fluc}}$ is the upper limit value. Type $j$ resource demands are experiencing a flat period if the demand data in the last period satisfies $\mathrm{Deg} \le th_{\mathrm{flat}}$. If $th_{\mathrm{flat}} < \mathrm{Deg} < th_{\mathrm{fluc}}$, the demands for the resource are intermediate between flatness and fluctuation. The above procedure can be illustrated by Pseudocode 1.
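The identification procedure can be sketched in Python as below. The text does not give a formula for the fluctuation degree, so `fluctuation_degree` is a hypothetical stand-in that uses the largest absolute change between consecutive samples in the window; the function names, parameter names, and threshold values are illustrative assumptions as well.

```python
def fluctuation_degree(window):
    """Hypothetical Deg: largest absolute change between consecutive samples."""
    return max(abs(b - a) for a, b in zip(window, window[1:]))


def classify_period(history, n, th_flat, th_fluc):
    """Label the last n samples as 'flat', 'fluctuation', or 'intermediate'."""
    if th_flat > th_fluc:
        raise ValueError("th_flat must not exceed th_fluc")
    if n < 2 or len(history) < n:
        raise ValueError("need n >= 2 and at least n samples")
    deg = fluctuation_degree(history[-n:])
    if deg >= th_fluc:
        return "fluctuation"   # upper limit reached
    if deg <= th_flat:
        return "flat"          # lower limit reached
    return "intermediate"      # between the two thresholds
```

Following the earlier sections, a "flat" label would select the second moving average and a "fluctuation" label the exponential moving average; the text does not say which base predictor handles the intermediate case.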

Resource Demands Prediction Using Optimized Fuzzy Neural Network
In order to predict the resource demands accurately and effectively, this section proposes a prediction method that combines an ensemble of individual base prediction models with a fuzzy neural network. With different base prediction models, different demand situations can be estimated accordingly, and the most likely future outcome can be predicted. With the results of the base predictors, the fuzzy neural network tends to present better prediction performance. To improve the learning ability, we use a self-adjusting learning rate and momentum weight to optimize the learning procedure. A clustering algorithm is adopted to initialize the fuzzy inference rules of the FNN. The introduction of the fuzzy neural network promises robustness and accuracy of the prediction system.
The core of the prediction model adopts a two-level structure, as shown in Figure 3. The first level is an ensemble model that contains different base predictor models. The output of the first level is sent to the second level, fuzzy neural network, which is responsible for optimizing the precision and the robustness of the prediction results.

Base Prediction Models.
As we know, diversity is necessary for the survival and evolution of species. Likewise, for the performance of the prediction models, it is important to introduce diversity into the prediction ensemble. To guarantee the prediction performance, the base prediction models should first be selected. Besides the prediction models mentioned in Section 3, some other models are introduced. The guideline for choosing them is based on capacity and overhead.

Autoregression Model. The autoregression model (ARM) is one of the linear models used for estimating the relationship between one dependent output variable and one or more independent variables. It represents how the dependent value changes as each independent variable changes. The fundamental idea of the method is to treat the historical measurement data as a stochastic process that can be modeled as a filter driven by white noise. It has proved effective for predicting host load. The form of an AR model is
$$x_t = \sum_{i=1}^{p} \varphi_i\,x_{t-i} + \varepsilon_t,$$
where $\varphi_i$ are the model coefficients, $p$ is the model order, and $\varepsilon_t$ is a white noise signal that contains all the unpredictable information in the past.
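As a small illustration of the AR form, the sketch below fits a first-order model by least squares and produces a one-step forecast from given coefficients. The function names are hypothetical, the fit is restricted to order 1 on a zero-mean series for brevity, and nothing here is claimed to match the authors' estimation procedure.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (zero-mean series)."""
    num = sum(cur * prev for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


def ar_predict(history, coeffs):
    """One-step AR(p) forecast from coefficients phi_1..phi_p."""
    p = len(coeffs)
    recent = history[-p:][::-1]  # most recent value first, matching phi_1, phi_2, ...
    return sum(phi * x for phi, x in zip(coeffs, recent))
```

For a higher model order the coefficients would normally come from a multivariate least-squares or Yule-Walker fit, which is omitted here.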

Fuzzy Neural Network.
In the fuzzy neural network, we use a neural network to evolve the fuzzy inference rules. We consider a multi-input single-output (MISO) fuzzy model which consists of $m$ rules. The $j$th if-then rule of the fuzzy inference system can be expressed as follows.
Rule $j$. If $x_1$ is $A_1^{(j)}$ and $x_2$ is $A_2^{(j)}$ and $\ldots$ and $x_n$ is $A_n^{(j)}$, then $y$ is $B^{(j)}$.
In the rules, $A_i^{(j)}$ and $B^{(j)}$ are the linguistic labels (e.g., high or low) associated with the node functions. The "if part" (antecedent) is fuzzy in nature, while the "then part" (consequent) is a crisp function of an antecedent variable.
Layer 1 is known as the input layer. Nodes in this layer transmit the input data to the next layer directly. Input data is abstracted as $X = \{x_1, x_2, \ldots, x_n\}$. The relationship between input and output can be expressed as follows:
$$O_i^{(1)} = x_i, \qquad i = 1, 2, \ldots, n.$$
Layer 2 is known as the fuzzification layer. In this layer, every node performs the calculation of a Gaussian membership function and specifies the degree to which the given input $x_i$ satisfies the quantifier $A_i^{(j)}$. Consider
$$O_{ij}^{(2)} = \mu_{A_i^{(j)}}(x_i) = \exp\left(\mathrm{net}_{ij}^{(2)}\right), \qquad \mathrm{net}_{ij}^{(2)} = -\frac{(x_i - c_{ij})^2}{\sigma_{ij}^2}.$$
Here, $c_{ij}$ and $\sigma_{ij}$ are the center and the width of the Gaussian function of the $j$th term of the $i$th input variable, respectively. Both $c_{ij}$ and $\sigma_{ij}$ are adjustable parameters.
Layer 3 is the fuzzy inference layer. The fuzzified results of the individual scalar functions of each input are aggregated. All potential rules of the input data are formulated by applying fuzzy intersection, which here means the product of the memberships. Thus, a product operation denoted as $\prod$ is performed to obtain the output of this layer. Consider
$$O_j^{(3)} = \prod_{i=1}^{n} O_{ij}^{(2)}.$$
The output of layer 3 represents the firing strength of the rules.
Layer 4 is the normalization layer. Each node in layer 4 is labeled with $N$, which denotes "normalization." The ratio of the $j$th rule's firing strength to the total firing strength is calculated in this layer. The relationship between input data and output data is expressed as follows:
$$O_j^{(4)} = \frac{O_j^{(3)}}{\sum_{k=1}^{m} O_k^{(3)}}.$$
The output of this layer is known as the normalized firing strength.
Layer 5 and layer 6 are known as the defuzzification layers. In layer 5, the node function is as follows:
$$O_j^{(5)} = w_j\,O_j^{(4)}.$$
Here $w_j$ is the adjustable weight parameter. Parameters in this layer are known as consequent parameters.
In layer 6, there is only one node, labeled with $\sum$. It is used to compute the output of the fuzzy neural network. The output is the summation of all the incoming data from layer 5. Consider
$$y = O^{(6)} = \sum_{j=1}^{m} O_j^{(5)}.$$
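The six layers can be traced in a short forward-pass sketch. It assumes one Gaussian membership function per input and per rule, a Gaussian exponent of the form −(x − c)²/σ², and a single scalar weight per rule in layer 5; the function name `fnn_forward` and the rule-major layout of `centers` and `widths` are choices made for this example only.

```python
import math


def fnn_forward(x, centers, widths, weights):
    """Forward pass for one input vector x.

    centers[j][i] and widths[j][i] hold the Gaussian center and width of
    input i in rule j; weights[j] is the consequent weight of rule j.
    """
    # Layers 1-2: pass the inputs through and fuzzify them
    memberships = [[math.exp(-((xi - c) ** 2) / (s ** 2))
                    for xi, c, s in zip(x, c_row, s_row)]
                   for c_row, s_row in zip(centers, widths)]
    # Layer 3: firing strength of each rule (product of its memberships)
    firing = [math.prod(row) for row in memberships]
    # Layer 4: normalized firing strengths
    total = sum(firing)
    normalized = [f / total for f in firing]
    # Layers 5-6: weight each normalized strength and sum
    output = sum(w * nf for w, nf in zip(weights, normalized))
    return output, normalized
```

With two rules centered at 0 and 1 and an input of 0.5, both rules fire equally, so the output is the plain average of the two weights.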

The Learning Procedure of FNN.
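After the structure of the FNN is constructed, we use the error back propagation method to adjust the parameters $c_{ij}$, $\sigma_{ij}$, and $w_j$. The objective function is defined as
$$E = \frac{1}{2}\left(\hat{y} - y\right)^2,$$
where $\hat{y}$ is the instructor signal and $y$ is the network output. The error propagates from the output layer back to the input layer. To clearly analyze the back propagation procedure, we combine layer 5 and layer 6 as one. In the following, $O_j^{(3)}$ and $O_j^{(4)}$ denote the outputs of the $j$th node in layers 3 and 4, $m$ is the number of rules, and $\mathrm{net}_{ij}^{(2)}$ is the exponent of the Gaussian membership function in layer 2.
In the defuzzification layer (layer 5), the evolution of error is expressed as follows:
$$\delta^{(5)} = -\frac{\partial E}{\partial y} = \hat{y} - y.$$
In layer 4, the differential error of node $j$ is
$$\delta_j^{(4)} = -\frac{\partial E}{\partial O_j^{(4)}} = \delta^{(5)}\,w_j.$$
In layer 3, $\delta_j^{(3)}$ is
$$\delta_j^{(3)} = -\frac{\partial E}{\partial O_j^{(3)}} = \frac{\delta_j^{(4)} \sum_{k=1}^{m} O_k^{(3)} - \sum_{k=1}^{m} \delta_k^{(4)}\,O_k^{(3)}}{\left(\sum_{k=1}^{m} O_k^{(3)}\right)^2}.$$
In layer 2, $\delta_{ij}^{(2)}$ is
$$\delta_{ij}^{(2)} = -\frac{\partial E}{\partial\,\mathrm{net}_{ij}^{(2)}} = \delta_j^{(3)}\,O_j^{(3)}.$$
The parameters are updated as
$$w_j(k+1) = w_j(k) + \eta_1 \cdot \Delta w_j(k),$$
where $\Delta w_j(k) = -\partial E / \partial w_j$ is the negative gradient at learning step $k$ and $\eta_1$ is the learning rate, with corresponding updates for $c_{ij}$ and $\sigma_{ij}$.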

Self-Adjusting Learning Method and Momentum Weight.
In (17), the learning rate $\eta$ determines the convergence speed of the neural network. If $\eta$ is small, the changes of the synaptic weights in the iterative computation will be small and the trajectory in weight space becomes smooth. However, the learning speed is decreased. If $\eta$ is too large, the learning speed will increase, but the network may become unstable and the weights may oscillate. To optimize the convergence speed and stability of the neural network, a momentum term can be included in (17), and it is expressed as follows:
$$\Delta w(k) = -\frac{\partial E}{\partial w(k)} + \alpha\,\Delta w(k-1), \tag{22}$$
where $\alpha$ is the momentum constant and $\Delta w(k)$ is the update direction that is scaled by the learning rate. Equation (22) introduces the preceding $\Delta w(k-1)$ into the procedure of calculating $\Delta w(k)$.
The use of the momentum constant is a minor revision of the weight update. However, it benefits the learning speed of the algorithm. In addition, we adopt a "progressive-increase" and "conservative-decrease" method to adjust the learning rate $\eta$. If the error declines in the training procedure, we consider the modification direction to be right, and a larger value of $\eta$ is used. If the error grows, we regard the modification as excessive: the adjusting step needs to be slowed down, so a smaller value is assigned to $\eta$, and the former modification is abandoned. The method is shown in the following function:
$$\eta(k+1) = \begin{cases} k_{\mathrm{inc}}\,\eta(k), & E(k) < E(k-1), \\ k_{\mathrm{dec}}\,\eta(k), & E(k) > E(k-1). \end{cases} \tag{23}$$
The variable $k$ denotes the learning step. $k_{\mathrm{inc}} > 1$ and $0 < k_{\mathrm{dec}} < 1$ are, respectively, the increase factor and the decrease factor.
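The interplay of momentum and the self-adjusting learning rate can be illustrated on a one-dimensional toy problem. This is a sketch under assumptions, not the authors' training loop: the function name `train`, the default factor values, and the choice to reset the momentum when a step is abandoned are all illustrative.

```python
def train(grad, loss, w, eta=0.1, alpha=0.5, k_inc=1.05, k_dec=0.7, steps=50):
    """Gradient descent with momentum and a self-adjusting learning rate."""
    velocity = 0.0
    prev_loss = loss(w)
    history = [prev_loss]
    for _ in range(steps):
        # momentum: blend the previous update direction into the new one
        direction = -grad(w) + alpha * velocity
        candidate = w + eta * direction
        new_loss = loss(candidate)
        if new_loss < prev_loss:
            # error declined: keep the step and raise the learning rate
            w, velocity, prev_loss = candidate, direction, new_loss
            eta *= k_inc
        else:
            # error grew: abandon the step and lower the learning rate
            velocity = 0.0  # assumption: momentum is reset after a rejected step
            eta *= k_dec
        history.append(prev_loss)
    return w, eta, history
```

Because rejected steps are discarded, the recorded error never increases, which mirrors the "conservative-decrease" behaviour described in the text.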

Fuzzy c-Means Clustering.
In FNN, the construction of fuzzy if-then rules is difficult. The improper rule set may result in bad prediction results. Recently, a number of different approaches have been used for designing fuzzy if-then rules based on clustering, gradient algorithms [37], genetic algorithms [38], fuzzy c-means clustering [39], and subtractive clustering [40]. Fuzzy clustering is an efficient technique for constructing the antecedent structures. The aim of clustering methods is to identify a certain group of data from a large data set, such that a concise representation of the behavior of the system is produced. Each cluster center can be translated into a fuzzy rule for identifying the class. In this paper, the fuzzy c-means clustering technique is used for structuring the premise part of the fuzzy system.
By analyzing the membership degree of the sample data, the fuzzy c-means algorithm partitions a data set into different classes. Consider $n$ objects $X = \{x_1, x_2, \ldots, x_n\}$. Fuzzy c-means partitions them into $c$ fuzzy clusters, where $c$ is confined to $1 < c < n$. The centroids of the clusters are $V = \{v_1, v_2, \ldots, v_c\}$. The result of fuzzy clustering of the objects is a fuzzy rule set matrix $U$ with $n$ rows and $c$ columns, where $n$ and $c$ are the total number of data objects and the number of clusters, respectively.
$u_{ij}$ indicates the degree of association, or membership, of the $i$th object with the $j$th cluster. The properties of $u_{ij}$ are shown in the following:
$$u_{ij} \in [0, 1], \qquad \sum_{j=1}^{c} u_{ij} = 1 \;\; \forall i, \qquad 0 < \sum_{i=1}^{n} u_{ij} < n \;\; \forall j. \tag{24}$$
The optimization objective function of the FCM algorithm is
$$J_m(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m}\,d_{ij}^{2}. \tag{25}$$
In the above equation, $m$ is the exponent weight that controls the fuzziness of the clusters, and $d_{ij} = \|x_i - v_j\|$ is the Euclidean distance between object $x_i$ and centroid $v_j$. By minimizing $J_m(U, V)$, the centroid of the $j$th cluster can be calculated using the following equation:
$$v_j = \frac{\sum_{i=1}^{n} u_{ij}^{m}\,x_i}{\sum_{i=1}^{n} u_{ij}^{m}}. \tag{26}$$
The membership degree matrix can be calculated by the following equation:
$$u_{ij} = \left[\sum_{k=1}^{c} \left(\frac{d_{ij}}{d_{ik}}\right)^{2/(m-1)}\right]^{-1}. \tag{27}$$
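A compact pure-Python sketch of the fuzzy c-means iteration is given below. It alternates the standard membership and centroid updates from caller-supplied initial centroids; the function name `fcm`, the default tolerance, the use of the largest membership change as the stopping test, and the small floor on distances (to avoid division by zero) are assumptions made for this example.

```python
import math


def fcm(points, centroids, m=2.0, tol=1e-6, max_iter=100):
    """Fuzzy c-means from given initial centroids; returns (centroids, U)."""
    n, c, dims = len(points), len(centroids), len(points[0])
    prev = None
    for _ in range(max_iter):
        # membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        u = []
        for x in points:
            d = [max(math.dist(x, v), 1e-12) for v in centroids]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                for k in range(c))
                      for j in range(c)])
        # centroid update: mean of the points weighted by u_ij^m
        centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append([sum(w[i] * points[i][k] for i in range(n)) / total
                              for k in range(dims)])
        # stop when the memberships barely change between iterations
        if prev is not None and max(abs(u[i][j] - prev[i][j])
                                    for i in range(n) for j in range(c)) < tol:
            break
        prev = u
    return centroids, u
```

Taking the initial centroids as an argument, rather than drawing them at random, makes the dependence on initialization explicit.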

Subtractive Clustering.
From the discussion above, we can see that FCM is sensitive to isolated data. As the sum of the membership degrees has to be 1, the results may be poor if the sample data is not ideal. Besides, the cluster centroids of FCM are initialized stochastically. If the initial values are not properly selected, the convergence may be affected and convergence to a local optimum may occur. Thus, FCM relies greatly on the initial centroids. Moreover, the diversity of the membership function may lower the convergence speed.
To improve the FCM algorithm, we introduce subtractive clustering as a complement. Subtractive clustering is an unsupervised clustering method in which the number of clusters for the input data points is determined by the clustering algorithm itself, so the number of clusters does not need to be defined in advance. The results may be used for initializing the centroids of the FCM algorithm. The method assumes that each data point is a potential cluster centroid. Based on the density index of the potential centroids, we select the data point that has the highest density as the centroid. The procedure is summarized as follows.
(1) Calculate the density index of each data point $x_i$:
$$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\|x_i - x_j\|^2}{(r_a/2)^2}\right). \tag{28}$$
Here $r_a$ is the clustering radius. Data beyond the radius have little effect on the density index. We first choose the data point $x_{c_1}$ that has the highest density index $D_{c_1}$ as the first cluster centroid. Then the data within the radius are removed from the potential centroid data set.
(2) We use the following equation to modify the density index of each data point:
$$D_i \leftarrow D_i - D_{c_1} \exp\left(-\frac{\|x_i - x_{c_1}\|^2}{(r_b/2)^2}\right), \qquad \text{s.t. } r_b > r_a. \tag{30}$$
A neighborhood with a lower density index is defined by the above equation. The aim is to keep one centroid away from the others so that the clusters are distinct from each other. As the density index of data close to the first cluster centroid $x_{c_1}$ becomes much lower, their potential to be a centroid is also decreased.
(3) By calculating the density index of the remaining data, the next centroid $x_{c_k}$ is obtained. If the constraint (31) is satisfied, we regard $x_{c_k}$ as the centroid of cluster $k$:
$$\frac{D_{c_k}}{D_{c_1}} \ge \delta. \tag{31}$$
Here, $0 < \delta < 1$ is the constraint parameter that decides the number of cluster centroids. From (31), we can see that the number of clusters decreases as $\delta$ increases. Furthermore, the identification sequence of the centroids is decided by the density index. The higher the density index is, the earlier the centroid emerges, and the greater the probability that it is a proper centroid.
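Steps (1) to (3) can be sketched as follows, in the style of the usual subtractive clustering procedure. The function and parameter names (`subtractive_clustering`, `r_a`, `r_b`, `delta`) are chosen for this example, and the explicit removal of points inside the radius is left out because the density reduction in step (2) already suppresses them; both are assumptions rather than details confirmed by the text.

```python
import math


def subtractive_clustering(points, r_a, r_b, delta):
    """Pick cluster centroids by density index; returns a list of points."""
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Step (1): density index of every point within neighbourhood radius r_a
    density = [sum(math.exp(-sqdist(p, q) / (r_a / 2) ** 2) for q in points)
               for p in points]
    first_peak = max(density)
    centroids = []
    while True:
        k = max(range(len(points)), key=lambda i: density[i])
        # Step (3): accept only if the density is high enough relative to the first peak
        if density[k] / first_peak < delta:
            break
        centroids.append(points[k])
        peak = density[k]
        # Step (2): lower the density index around the newly chosen centroid
        density = [d - peak * math.exp(-sqdist(p, points[k]) / (r_b / 2) ** 2)
                   for d, p in zip(density, points)]
    return centroids
```

The number of returned centroids is what would fix the number of fuzzy rules, and the centroids themselves can seed a subsequent fuzzy c-means run.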

The Combination of Fuzzy c-Means and Subtractive Clustering.
This section introduces a clustering algorithm that combines FCM with the subtractive clustering method. We obtain the number of clusters and the cluster centroids through subtractive clustering. This effectively improves the convergence speed of FCM and decreases the probability of convergence to a local optimum.
The procedure can be described as follows.
Step 1. Set the parameters, including the neighborhood radii $r_a$ and $r_b$, the fuzzy exponent weight $m$, and the comparison parameter $\delta$.
Step 2. Calculate the number and centroids of the clusters through the subtractive method.
Step 3. Use (25) and (27) to calculate the objective function and the membership degree.
Step 4. Verify whether the termination constraints are satisfied. If $\|U^{(k+1)} - U^{(k)}\| < \varepsilon$ or the maximum number of iterations is reached, the operation process terminates. Otherwise, turn to Step 5.
Step 5. Update $k$ with $k + 1$, use (26) and (27) to calculate the new cluster centroids and membership degree, and turn to Step 3.

Validation Criteria.
To evaluate the performance of the prediction system, we use a series of metrics [20] including MAE (mean absolute error), MSE (mean squared error), and PRED.

MAE and MSE.
MAE is the criterion that measures the mean deviation between the prediction output and the actual output. MAE can be calculated by the following function:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|\hat{y}_i - y_i\right|, \tag{32}$$
where $\hat{y}_i$ is the actual output and $y_i$ is the prediction value. $n$ is the number of data points in the series. The smaller the value of MAE is, the more accurate the prediction method is.
MSE represents the energy of the mean error. MSE can be expressed as in the following:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2. \tag{33}$$
The relative error of the $i$th output is
$$\tilde{e}_i = \frac{\hat{y}_i - y_i}{\hat{y}_i}, \tag{34}$$
where $i$ represents the series number of the output data series. The number of all the relative errors that meet the condition $-5\% \le \tilde{e}_i \le 5\%$ is denoted as $n_{(5)}$. The whole number is $n$. Then PRED(5) is defined as
$$\mathrm{PRED}(5) = \frac{n_{(5)}}{n}. \tag{35}$$
The index PRED(5) represents the fitness of the prediction model. If the value is close to 1.0, it indicates a good fit of the prediction model.
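The three criteria are straightforward to compute. The sketch below assumes that the relative error is taken with respect to the actual value and that the tolerance is given in percent; the function names are illustrative.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pred(actual, predicted, percent=5.0):
    """Fraction of samples whose relative error is within +/- percent."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(a - p) <= percent / 100.0 * abs(a))
    return hits / len(actual)
```

For example, with actual values 100, 200, 300, 400 and predictions 102, 192, 330, 400, three of the four relative errors are within 5%, so `pred` returns 0.75.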

Feedback Control.
To optimize the performance of the resource demands prediction system, we introduce feedback control [41] into the system. In each prediction cycle, the feedback controller sends the actual resource demands and the prediction results to the historical database. Specifically, the demand values are specified in fine-grained form, including the elements of the data structure vector. In addition, the validation indexes MAE, MSE, PRED (10), and so forth are also processed in the controller. The feedback controller sends the corresponding values to the historical database.

Experimental Evaluation
In this section, experiments are conducted to validate the proposed prediction method. When predicting fine-grained resource demands, the method for each kind of resource is similar. Here we do not distinguish resource types, and we use network traffic as the representative. From [42], we sample 400 days of network visit traffic data. We use the first 350 days of traffic data as training data and the last 50 days as test data. The training effect is shown in Figure 5. In Figure 5(a), the blue curve represents the prediction value and the red curve represents the actual value. The two curves fit each other closely. In Figure 5(b), we can see that the overall effect is promising. The maximum error is controlled within −1.1%∼+1%. The training results are accurate.
The test data is used to verify the prediction method. The test results are shown in Figure 6. In Figure 6(a), the blue curve is the prediction value and the red curve is the actual value. We can see that the two curves almost overlap for most of their length. Figure 6(b) shows the prediction error. We can see that the maximum normalized error is 8%, so the normalized error data falls within −8%∼+8%. The prediction results are acceptable.
The prediction error using different methods is compared with each other and the results are depicted in Figure 7. In Figure 7, the performances of EMA, SMA, AR, and ESFCFNN methods are evaluated. The difference between different predicting methods is shown intuitively. We can see that the prediction error using ESFCFNN prediction method is apparently decreased. The results of ESFCFNN method are more accurate than other methods. The performance of ESFCFNN is optimized greatly.
According to the main criteria defined in formulas (32) to (35), we test the prediction methods SMA, EMA, AR, and ESFCFNN. The results are shown in Figure 8. In Figure 8(a), we can see that both the mean absolute error and the maximum error of the ESFCFNN method are small. The performances of EMA and SMA are close to that of ESFCFNN. The same holds for the MSE and SSE: ESFCFNN takes advantage of the other predictors' merits and achieves self-adaptation and robustness. Finally, Figure 8(d) shows the number of errors that fall within 5%. The red bar in Figure 8(d) is the total number of test points, which is 50. The blue bar is the number that falls within 5%. From the above analysis, we can conclude that the prediction results of the ESFCFNN method are more accurate and that this method is promising for predicting users' resource demands.
Figure 9(a) depicts the training procedure without the self-adjusting learning rate. After 100 training cycles, the error of the FNN is approximately 0.048, which is very large. Figure 9(b) depicts the performance of the FNN with self-adjusting learning rate and momentum weight. After 100 training cycles, the error reaches 0.0015. The ratio of the two errors is 0.048/0.0015 = 32.
The performance is much better after using self-adjusting learning rate and momentum weight. Moreover, the convergence speed after using self-adjusting learning rate is improved. The training error falls down to 0.05 within 10 steps when using self-adjusting learning rate. However, if the self-adjusting learning rate algorithm is not adopted, the convergence speed is slowed down. More than 85 steps are needed before error falls down to 0.05. From above analysis, we can see that the performance of FNN with self-adjusting learning rate and momentum weight is improved greatly.
In Figure 10, the performance of the FNN without the clustering algorithm is depicted. Figure 10(a) shows the convergence procedure. We can see that the convergence is slower: more than 30 steps are needed before the error falls to 0.05, whereas Figure 9(b) shows that, with the clustering algorithm, fewer than 10 steps are needed. Figure 10(b) shows the training error of the FNN without the clustering algorithm. Compared with Figure 5, the training error of the FNN without the clustering algorithm is greater. From the above comparison, the performance is improved by using the clustering algorithm.

Conclusions
To improve the performance of resource provisioning and resource utilization, this paper proposed a self-adaptive prediction method for cloud resource demands using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network, called ESFCFNN for short. We discussed the structure of the prediction system. Users' preferences are analyzed to reduce the amount of calculation. The base prediction models are then introduced into the system, and their results are sent as inputs to an FNN with self-adjusting learning rate and momentum weight. To optimize the convergence performance of the FNN, the subtractive-fuzzy clustering algorithm is proposed, which is composed of the fuzzy c-means clustering algorithm and the subtractive clustering algorithm. We evaluated the prediction system using statistical criteria including MAE, MSE, SSE, and PRED (5). The results show that ESFCFNN can effectively improve the prediction performance. Although the proposed method is promising in improving the performance, the system is complex. Because there are two prediction layers, the time delay may increase. In future work, improving the efficiency will be the main focus of the research. We will also test the method in a real cloud computing system.