Applying Neural Networks to Prices Prediction of Crude Oil Futures

The global economy has experienced turbulence and unease over the past five years owing to large increases in oil prices and terrorist attacks. Although accurate prediction of oil prices is important, it is extremely difficult. This study attempts to forecast prices of crude oil futures accurately by adopting three popular neural network methods: the multilayer perceptron (MLP), the Elman recurrent neural network (ERNN), and the recurrent fuzzy neural network (RFNN). Experimental results indicate that the use of neural networks to forecast crude oil futures prices is appropriate and that consistent learning is achieved with different training times. Our results further demonstrate that, in most situations, learning performance can be improved by increasing the training time. Moreover, the RFNN has the best predictive power and the MLP the worst among the three networks. Under the ERNN and the RFNN, predictive power improves as the training time increases. The exception is the MLP (a back-propagation network, BPN), whose predictive power improves when the training time is reduced. To sum up, we conclude that the RFNN outperforms the other two neural networks in forecasting crude oil futures prices.


Introduction
During the past three years, the global economy has experienced dramatic turbulence and unease because of terrorist attacks and rapidly rising oil prices. For example, the US light crude oil futures price recently climbed to an all-time peak of about US$80. Simultaneously, the US Federal Reserve raised its benchmark short-term interest rate seventeen times, through August 2006, to prevent inflation. Consequently, many governments and corporate managers have sought a method of accurately forecasting crude oil prices.

Figure 1: Structure of the MLP (layer labels recovered from the diagram: hidden layer, output layer).

Multilayer Perceptron (MLP)
As a promising generation of information processing systems with the ability to learn, recall, and generalize from training patterns or data, artificial neural networks (ANNs) are interconnected assemblies of simple processing nodes whose functionality is similar to that of human neurons. ANNs have become popular during the last two decades for diverse applications, ranging from financial prediction to machine vision. According to Refenes et al. [3], the main strengths of ANNs include the following: (1) handling complex nonlinear functions; (2) learning from training data; (3) adapting to changing environments; (4) handling incomplete, noisy, and fuzzy information; and (5) performing high-speed information processing. The most popular and widespread method used to train the multilayer perceptron (MLP) is the back-propagation algorithm. The MLP can be interpreted as a universal approximator, and its parameter values are estimated via a gradient descent algorithm in problems involving nonlinear regression. The popularity of the MLP rests on the simplicity and power of the underlying algorithm. Figure 1 shows the structure of the MLP.
A back-propagation network (BPN), that is, an MLP trained by back propagation, involves two steps. The first step generates a forward flow of activation from the input layer to the output layer via the hidden layer. The sigmoid function usually serves as the activation function:

$$f(x) = \frac{1}{1 + e^{-\alpha x}},$$

where $\alpha \in \mathbb{R}$. The activation function must be differentiable, since the steepest descent method is employed to derive the weight updating rule. The response of the hidden layer is the input of the output layer. In the second step, an overall error, $E_T$, which measures the difference between the actual and the desired outputs, is minimized through the supervised learning task performed by the MLP:

$$E_T = \sum_{p} E_p = \frac{1}{2} \sum_{p} \sum_{k} \left( d_{pk} - y_{pk} \right)^2,$$

where $E_T$ denotes the total error for a neural network across the entire training set, $E_p$ represents the network error for the pth pattern, $d_{pk}$ denotes the desired output of the kth unit in the output layer for pattern p, and $y_{pk}$ is the actual output of the kth unit in the output layer for the pth pattern.
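The forward step and the total error can be illustrated with a minimal Python sketch. It assumes a logistic sigmoid with slope α, a single hidden layer without bias terms, and the common 1/2 scaling of the squared error. The function names, weights, and toy patterns are hypothetical and are not taken from the study's Matlab implementation.

```python
import math

def sigmoid(x, alpha=1.0):
    """Logistic activation with slope parameter alpha (assumed form)."""
    return 1.0 / (1.0 + math.exp(-alpha * x))

def forward(x, w_hidden, w_out, alpha=1.0):
    """Forward flow: input layer -> hidden layer -> output layer (no biases)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)), alpha)
              for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)), alpha)
            for row in w_out]

def total_error(patterns, w_hidden, w_out):
    """E_T = sum_p E_p, with E_p = 1/2 * sum_k (d_pk - y_pk)^2."""
    e_total = 0.0
    for x, d in patterns:
        y = forward(x, w_hidden, w_out)
        e_total += 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))
    return e_total

# Hypothetical toy data: two patterns, two inputs, one output.
patterns = [([0.0, 1.0], [1.0]), ([1.0, 0.0], [0.0])]
w_hidden = [[0.5, -0.5], [0.3, 0.8]]  # 2 hidden units x 2 inputs
w_out = [[1.0, -1.0]]                 # 1 output unit x 2 hidden units
print(total_error(patterns, w_hidden, w_out))
```

Because every output passes through the sigmoid, each $y_{pk}$ lies strictly between 0 and 1, so targets would normally be scaled to that range before training.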
Then the gradient method is applied to optimize the weight vector so as to minimize the summed square error between the actual and the desired network outputs throughout the training period. The network weights are adjusted whenever a training pattern is presented. The size of the adjustment is positively related to the sensitivity of the error function to the connection weight. The general weight updating rule for the connection weight $w_{ji}$ between the ith input node and the jth output node is as follows:

$$\Delta w_{ji} = -\eta \frac{\partial E_T}{\partial w_{ji}},$$

where $\eta$ is the learning rate.
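The updating rule can be sketched for the simplest case, a single sigmoid output unit updated after each pattern. This is an illustration under stated assumptions (logistic activation, error $\frac{1}{2}(d - y)^2$ per pattern, hypothetical starting weights and learning rate), not the training procedure used in the study, which relies on Matlab's Levenberg-Marquardt routine.

```python
import math

def sigmoid(x, alpha=1.0):
    return 1.0 / (1.0 + math.exp(-alpha * x))

def pattern_error(w, x, d, alpha=1.0):
    """E = 1/2 * (d - y)^2 for one pattern and one output unit."""
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)), alpha)
    return 0.5 * (d - y) ** 2

def update_weights(w, x, d, eta=0.5, alpha=1.0):
    """One step of w_i <- w_i - eta * dE/dw_i.

    By the chain rule, dE/dw_i = -(d - y) * alpha * y * (1 - y) * x_i,
    so the step adds eta * (d - y) * alpha * y * (1 - y) * x_i to w_i.
    """
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)), alpha)
    delta = (d - y) * alpha * y * (1.0 - y)
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]

# Hypothetical starting weights and a single training pattern.
w = [0.1, -0.2]
x, d = [1.0, 0.5], 1.0
before = pattern_error(w, x, d)
for _ in range(100):
    w = update_weights(w, x, d)
after = pattern_error(w, x, d)
print(before, after)
```

The adjustment is proportional to the error sensitivity: inputs with larger $x_i$ and outputs far from saturation receive larger corrections.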
BPNs can be widely applied to sample identification, pattern matching, compression, classification, diagnosis, credit rating, stock price trend forecasting, adaptive control, functional link, optimization, and data clustering. They can also be trained via a supervised learning task to reduce the difference between the desired and the actual outputs, and they achieve high learning accuracy. Yet BPNs have the following weaknesses: (1) slow learning speed, (2) long execution time, (3) very slow convergence, (4) a tendency to fall into a local minimum of the error function, (5) lack of systematic methods in the network dynamics, and (6) inability to use past experience to forecast future behavior. This study therefore also uses two dynamic neural networks to predict crude oil futures prices.

Recurrent Neural Networks
Recurrent neural networks (RNNs) were first developed by Hopfield [28] in a single-layer form and were later developed using multilayer perceptrons comprising concatenated input-output, processing, and output layers. An RNN is a dynamic neural network that permits self-loops and backward connections, so that the neurons have recurrent actions and provide local memory functions. The feedback within an RNN can be achieved either locally or globally. Feedback with delay provides memory to the network and is appropriate for prediction. According to Haykin [29], an RNN can summarize all required information on past system behavior to forecast future behavior. The ability of an RNN to dynamically incorporate past experience through internal recurrence makes it more powerful than a BPN. The structure of an RNN generally comprises the following: (a) recurring information from the output or hidden layer to the input layer and (b) mutual connection of neurons within the same layer. The advantages of RNNs are as follows: (1) fast learning speed, (2) short execution time, and (3) fast convergence to a stable state.
Refenes et al. [3] argued that RNNs exhibit full connection between each node and all other nodes in the network, whereas partially recurrent networks contain a number of specific feedback loops. They quoted Hertz et al. [30], who assigned the name Elman architecture to the case in which the feedback to the network input comes from one or more hidden layers. This name originated from Elman [31], who designed a neural network with recurrent links to provide networks with dynamic memory. The Elman recurrent neural network (ERNN) is a simple temporal recurrent network with a hidden layer, assuming that the network operates in discrete time steps. The activations of the hidden units at time t are fed backwards and serve as inputs to the "context layer" at time t + 1, and thus represent a form of short-term memory that enables limited recurrence. Moreover, the feedback links run from the hidden layer to the context layer and produce both temporal and spatial patterns. As depicted in Figure 2, this network is a two-layer network involving feedback in the first layer [32].

Elman Recurrent Neural Network
According to Hammer and Nørskov [33], the ERNN is a special case of the RNN, differing mainly in that the learning algorithm is simply a truncated gradient descent method and training is less efficient than the standard method employed for RNNs. However, the descriptive equations of the ERNN can be considered a nonlinear state-space model in which all weight values are constant following initialization:

$$x_i(k) = f\left( \sum_{j=1}^{n} w^{x}_{i,j}\, x_j(k-1) + w^{u}_{i}\, u(k-1) \right), \qquad y(k) = \sum_{i=1}^{n} w^{y}_{i}\, x_i(k),$$

where $x_i(k)$ is the output of the ith hidden-layer neuron at step k, $w^{x}_{i,j}$ is the weight linking the ith hidden-layer neuron and the jth context-layer neuron, $w^{u}_{i}$ is the weight linking the input neuron $u(k-1)$ and the ith hidden-layer neuron, $w^{y}_{i}$ is the weight linking the output neuron $y(k)$ and the ith hidden-layer neuron, $f(\cdot)$ is the nonlinear activation function in the hidden-layer nodes, and n is the number of hidden-layer nodes. Since it is difficult to interpret the network functions of RNNs, this study further incorporates fuzzy logic into RNNs.
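A minimal Python sketch of one ERNN time step may help make the role of the context layer concrete. It assumes a single input, a single output, and tanh as the hidden activation $f(\cdot)$; the study does not state which activation it uses. The function name and weight values are hypothetical.

```python
import math

def elman_step(u_prev, context, w_x, w_u, w_y):
    """One discrete time step of a single-input, single-output Elman network.

    u_prev  : input u(k-1)
    context : hidden activations x_j(k-1), held in the context layer
    w_x     : w_x[i][j] links context neuron j to hidden neuron i
    w_u     : w_u[i] links the input to hidden neuron i
    w_y     : w_y[i] links hidden neuron i to the output
    Returns (y(k), hidden activations x(k)).
    """
    n = len(context)
    hidden = [
        math.tanh(sum(w_x[i][j] * context[j] for j in range(n)) + w_u[i] * u_prev)
        for i in range(n)
    ]
    y = sum(w_y[i] * hidden[i] for i in range(n))
    return y, hidden  # hidden is copied into the context layer for the next step

# Hypothetical weights for n = 2 hidden units.
w_x = [[0.5, -0.3], [0.2, 0.4]]
w_u = [0.7, -0.6]
w_y = [1.0, 0.5]

context = [0.0, 0.0]
outputs = []
for u in [1.0, 0.0, 0.0]:  # an impulse followed by zero input
    y, context = elman_step(u, context, w_x, w_u, w_y)
    outputs.append(y)
print(outputs)
```

With this input sequence the network still produces nonzero outputs after the input has returned to zero, because the context layer carries the earlier hidden activations forward. This is the short-term memory that a static MLP lacks.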

Figure 3: A structure of the recurrent fuzzy neural networks. Layers labeled in the diagram: membership layer (j), fuzzy rule layer (k), output layer (o).

Recurrent Fuzzy Neural Networks
Fuzzy set theory was first introduced by Zadeh [35, 36]. Besides the fact that the theory underlying RNNs is complicated and RNNs are difficult to interpret, Hu and Chang [39] also found that both BPNs and RNNs are limited in producing accurate valuations over long-term periods. This study thus uses a recurrent fuzzy neural network (RFNN) model in addition to the BPN and the ERNN. According to Lee and Teng [6], the RFNN has several key aspects: dynamic mapping capability, temporal information storage, universal approximation, and a fuzzy inference system. Li et al. [34] also argued that the RFNN has the same dynamic and robust advantages as the RNN. In addition, the network function can be interpreted using the fuzzy inference mechanism. Therefore, long-term prediction fuzzy models can be easily implemented using RFNNs. The network output in an RFNN is fed back to the network input using one or more time delay units. As depicted in Figure 3, the general structure of the RFNN consists of four layers: an input layer, a membership layer, a fuzzy rule layer, and an output layer.
The information transmission process and basic functions of each layer are as follows.

(1) Input layer: [...] conducted in this layer. From (2.5), the connection weight at the input layer, $w^{1}_{i}$, is unity:

$$w^{1}_{i} = 1. \qquad (2.7)$$

(2) Membership layer: the membership layer is also known as the fuzzification layer and contains several different types of neurons, each of which performs a membership function. The membership nodes in this layer correspond to the linguistic labels of the input variables in the input layer and serve as units of memory. Each input variable is transformed into several fuzzy sets in the membership layer, where each neuron corresponds to a particular fuzzy set and the neuron output gives the actual membership degree. Each neuron in this layer represents the characteristics of one membership function, and the Gaussian function serves as the membership function. The jth neuron in this layer has the following input and output:

$$g^{2}_{ij}(t) = \exp\left( -\frac{\left( x^{2}_{ij}(t) - m_{ij} \right)^2}{\sigma_{ij}^{2}} \right),$$

where

$$x^{2}_{ij}(t) = g^{1}_{i}(t) + \theta_{ij}\, g^{2}_{ij}(t-1), \qquad (2.9)$$

$g^{1}_{i}(t)$ is the output of the ith input-layer node, $m_{ij}$ denotes the mean value of the Gaussian membership function of the jth term with respect to the ith input variable, $\sigma_{ij}$ represents the standard deviation of the Gaussian membership function of the jth term with respect to the ith input, $x^{2}_{ij}(t)$ is the input of this layer at the discrete time t, $g^{2}_{ij}(t-1)$ denotes the feedback unit of memory, which stores the past network information and represents the main difference between the FNN and the RFNN, and $\theta_{ij}$ indicates the connection weight of the feedback unit.

Each node in the membership layer possesses three adjustable parameters: $m_{ij}$, $\sigma_{ij}$, and $\theta_{ij}$.
(3) Fuzzy rule layer: the fuzzy rule layer comprises numerous nodes, each corresponding to a fuzzy operating region of the process being modeled. This layer constructs the entire fuzzy rule set. The number of nodes in this layer equals the number of fuzzy sets corresponding to each external linguistic input variable, and each node receives the one-dimensional membership degrees of the associated rule from a set of nodes in the membership layer. The output of each neuron in the fuzzy rule layer is obtained by multiplication. The input and output for the kth neuron in the fuzzy rule layer are as follows:

$$g^{3}_{k} = \prod_{j} x^{3}_{j},$$

where $x^{3}_{j}$ is the jth input to the neuron of the fuzzy rule layer, and $g^{3}_{k}$ denotes the output of a fuzzy rule node, representing the "firing strength" of its corresponding rule. Links entering the fuzzy rule layer indicate the preconditions of the rules, and links leaving it indicate the consequences of the fuzzy rule nodes.
(4) Output layer: the output layer performs the defuzzification operation. Nodes in this layer are called output linguistic nodes, where each node corresponds to an individual output of the system. The links between the fuzzy rule layer and the output layer are connected by the weighting values $w_{jk}$. For the kth neuron in the output layer,

$$y_{k} = g^{4}_{k} = \sum_{j} w_{jk}\, g^{3}_{j},$$

where $g^{3}_{j}$ is the firing strength of the jth fuzzy rule, $w_{jk}$ is the output action strength of the kth output associated with the jth fuzzy rule and serves as the tuning factor of this layer, $g^{4}_{k}$ is the final inferred result, and $y_{k}$ represents the kth output of the RFNN.
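The four layers can be tied together in a short Python sketch of one RFNN forward pass. Several details are assumptions of this illustration rather than facts from the study: the input layer passes inputs through unchanged, the Gaussian exponent is divided by $\sigma^2$ (some formulations use $2\sigma^2$), rule j multiplies the jth membership degree of every input, and all names and parameter values are hypothetical.

```python
import math
from math import prod

def rfnn_step(x, state, m, sigma, theta, w):
    """One forward pass of a small RFNN at a single time step.

    x      : list of crisp inputs (input layer passes them through)
    state  : state[i][j] = membership output of neuron (i, j) at time t-1
    m, sigma, theta : mean, spread, and feedback weight, indexed [i][j]
    w      : w[j][k] = action strength of output k for rule j
    Returns (outputs, new_state).
    """
    n_inputs, n_sets = len(x), len(m[0])
    # Membership layer: Gaussian of the input plus weighted feedback memory.
    g2 = [[0.0] * n_sets for _ in range(n_inputs)]
    for i in range(n_inputs):
        for j in range(n_sets):
            net = x[i] + theta[i][j] * state[i][j]
            g2[i][j] = math.exp(-((net - m[i][j]) ** 2) / sigma[i][j] ** 2)
    # Fuzzy rule layer: firing strength is the product of membership degrees.
    g3 = [prod(g2[i][j] for i in range(n_inputs)) for j in range(n_sets)]
    # Output layer: defuzzification as a weighted sum of firing strengths.
    y = [sum(w[j][k] * g3[j] for j in range(n_sets)) for k in range(len(w[0]))]
    return y, g2

# Hypothetical network: one input, two fuzzy sets (two rules), one output.
m = [[0.0, 1.0]]
sigma = [[1.0, 1.0]]
theta = [[0.5, 0.5]]
w = [[1.0], [2.0]]

state = [[0.0, 0.0]]
history = []
for x_t in [0.0, 0.0]:  # the same input at two consecutive time steps
    y, state = rfnn_step([x_t], state, m, sigma, theta, w)
    history.append(y[0])
print(history)
```

The two outputs differ even though the input is identical at both steps, because the membership layer feeds its previous output back through $\theta_{ij}$. Setting every $\theta_{ij}$ to zero removes this memory and reduces the sketch to a static FNN.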

Valuation Performance
This work uses the mean square error (MSE) to assess the performance of the three neural networks. The MSE is the average of the squared errors, where each error is the difference between the actual and the desired output. The MSE is thus computed as

$$\mathrm{MSE} = \frac{1}{T - T_1} \sum_{t = T_1 + 1}^{T} \left( \hat{r}_t - r_t \right)^2,$$

where T indicates the total number of samples, $T_1$ refers to the number of estimated samples, $\hat{r}_t$ represents the actual output, and $r_t$ denotes the desired output.
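The calculation can be sketched in a few lines of Python. The sketch assumes that the first $T_1$ samples serve as estimation samples and that the error is averaged over the remaining $T - T_1$ samples; this reading of $T_1$, the function name, and the toy values are assumptions made for illustration.

```python
def mse(actual, desired, t1=0):
    """Mean square error over samples t1, ..., T-1 (0-based indexing).

    actual  : network outputs, one per sample
    desired : target values, one per sample
    t1      : number of leading samples treated as estimation samples
    """
    if len(actual) != len(desired):
        raise ValueError("actual and desired must have the same length")
    if not 0 <= t1 < len(actual):
        raise ValueError("t1 must leave at least one sample to evaluate")
    errors = [(a - d) ** 2 for a, d in zip(actual[t1:], desired[t1:])]
    return sum(errors) / len(errors)

# Hypothetical normalized prices: network outputs versus observed values.
actual = [0.50, 0.52, 0.55, 0.61]
desired = [0.50, 0.50, 0.57, 0.60]
print(mse(actual, desired))        # all four samples
print(mse(actual, desired, t1=2))  # only the last two samples
```

Because the errors are squared, a few days with large misses can dominate the MSE, which matters when a short forecasting window contains an abrupt price move.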

Data Description
This study focuses on near-month energy futures. Daily oil prices for Brent, WTI, Dubai, and IPE are used in this investigation. The data are obtained from the Energy Bureau of the USA and the International Petroleum Exchange (IPE) of Great Britain. To explore the influence of training time on prediction performance, this work divides the training period from January 1, 1990 to April 30, 2005 into three five-year sections. Table 1 shows the different training periods.

Comparison among Various Neural Networks
This work divides the training period into three parts and uses Matlab to perform training and testing. To compare the predictive power of the three artificial neural networks (ANNs), the Levenberg-Marquardt method is employed as the training function, and the number of iterations over the data set is arbitrarily set to 1000 for training each network. Table 2 shows the notation and training times of the multilayer perceptron (MLP), the Elman recurrent neural network (ERNN), and the recurrent fuzzy neural network (RFNN).

(1) The Comparison of Learning Performance
Following 1000 training iterations, Table 3 gives the following ranking for the learning performance of the three ANNs: the RFNN ranks first, followed by the ERNN, and finally the MLP (BPN). Table 3 also shows that the learning performance of the ANNs improves with increasing training time. The one exception is the MSE at part 2 under the RFNN, which is less than that obtained from part 3.

(2) The Comparison of Predictive Power
The empirical results indicate that the predictive power of the three ANNs is ranked as follows: the RFNN ranks first, followed by the ERNN, and finally the MLP. Table 4 shows that, under the ERNN and the RFNN, predictive power improves with increasing training time. The MLP differs: its predictive power deteriorates with increasing training time. One possible explanation for this phenomenon is the large difference between the forecast value and the actual value from March 20, 2005 to March 28, 2005. The MLP is not a dynamic network and cannot apply past experience to forecasting future behavior.

Conclusion
This study uses the multilayer perceptron (MLP), the Elman recurrent neural network (ERNN), and the recurrent fuzzy neural network (RFNN) to forecast crude oil prices and compares the predictive power of these three neural network models. The results of this work are summarized as follows. All of the MSE values obtained under different training times with the MLP, the ERNN, and the RFNN are below 0.0026768, suggesting that the use of neural networks to forecast crude oil futures prices is appropriate and that consistent learning ability can be obtained with different training times. This investigation confirms that, under most circumstances, the longer the training time, the more the learning performance of the neural networks improves. The only exception occurs at part 2 under the RFNN model, where the MSE is slightly less than that obtained from part 3.
Regarding the predictive power of the three neural networks, this study finds that the RFNN has the best predictive power and the MLP the least. This work also finds that, under the ERNN and the RFNN, predictive power improves as the training time increases. However, the results differ for the MLP, whose predictive power improves when the training time decreases. A possible explanation for this phenomenon is a large difference between the predicted value and the actual value during a 9-day period. To summarize, this study concludes that the recurrent fuzzy neural network is the best among the three neural networks.

Figure 2: Structure of the simple recurrent neural network.
Zimmermann [37] argued that fuzzy set theory can be adapted to different circumstances and contexts. Buckley and Hayashi [38] stated that a fuzzy neural network (FNN) is a layered, feedforward neural net that has fuzzy set signals and weights. Neural networks may utilize the data bank to train and learn, while the solution obtained by fuzzy logic may be verified by empirical study and optimization. Omlin et al. [5] noted that the FNN combines clear physical meaning with good training ability. However, the FNN applies only to static problems.

Table 1: The periods of the training patterns.

Table 2: The training times of three parts of various ANNs.

Table 3: The comparison of the learning ability (MSE) for the three neural networks.

Table 4: The comparison of the predictive power of the three neural networks.