Review on Methods to Fix Number of Hidden Neurons in Neural Networks

This paper reviews methods proposed over the past 20 years for fixing the number of hidden neurons in neural networks. It also proposes a new method to fix the number of hidden neurons in Elman networks for wind speed prediction in renewable energy systems. Random selection of the number of hidden neurons may cause either overfitting or underfitting, and this paper proposes a solution to these problems. To fix the hidden neurons, 101 different criteria are tested based on their statistical errors. The results show that the proposed model improves accuracy and yields minimal error. The design of the neural network based on the selection criteria is substantiated using the convergence theorem. To verify the effectiveness of the model, simulations were conducted on real-time wind data. The experimental results show that the proposed approach can be used for wind speed prediction with minimal error. The proposed model is simple and efficient for fixing the number of hidden neurons in Elman networks.


Introduction
One of the major problems facing researchers is the selection of the number of hidden neurons in neural networks (NN). This is very important because a neural network trained to a very small training error may still not respond properly in wind speed prediction. An overtraining issue exists in the NN training process. Overtraining is akin to overfitting the data: the network matches the training data so closely that it loses its ability to generalize to the test data.
An artificial neural network (ANN) is an information processing system inspired by models of biological neural networks [1]. It is an adaptive system that changes its structure or internal information as data flows through the network during the training phase. ANNs are widely used in many areas because of features such as a strong capacity for nonlinear mapping, high learning accuracy, and good robustness. ANNs can be classified into feedforward and feedback networks. The back propagation network and the radial basis function network are examples of feedforward networks, and the Elman network is an example of a feedback network. Feedback has a profound impact on the learning capacity of a network and on its performance in modeling nonlinear dynamical phenomena.
In developing an NN model for a particular application, the researcher faces the following questions [2].
(i) How many hidden neurons should be used?
(ii) How many training pairs should be used?
(iii) Which training algorithm should be used?
(iv) What neural network architecture should be used?
A hidden neuron can influence the error at the nodes to which its output is connected. The stability of a neural network is estimated by its error: minimal error reflects better stability, and higher error reflects worse stability. Excessive hidden neurons cause overfitting; that is, the neural network overestimates the complexity of the target problem [3]. This greatly degrades the generalization capability and leads to significant deviation in prediction. In this sense, determining the proper number of hidden neurons to prevent overfitting is critical in prediction problems. The modeling process involves creating a model and developing the model with the proper values [4]. One of the major challenges in the design of a neural network is fixing the number of hidden neurons so as to obtain minimal error and the highest accuracy.
The training-set error and the generalization error are likely to be high before learning begins. During training, the network adapts to decrease the error on the training patterns. The accuracy of training is determined by the parameters under consideration, which include the NN architecture, the number of hidden neurons in the hidden layer, the activation function, the inputs, and the updating of weights.
Prediction plays a major role in planning in today's competitive environment, especially in areas characterized by a high concentration of wind generation. Owing to the fluctuating and intermittent nature of wind, prediction results vary rapidly, which increases the importance of accurate wind speed prediction. The proposed model is implemented in an Elman network for accurate wind speed prediction. Wind speed prediction is needed to assist with the operational control of wind farms and with planning the development of power stations. The quality of the prediction made by the network is measured in terms of error. Generalization performance varies over time as the network adapts during training.
Various criteria for fixing the number of hidden neurons have therefore been proposed by researchers during the last couple of decades. Most researchers have fixed the number of hidden neurons by a trial-and-error rule. In this paper, a new method is proposed and applied to an Elman network for wind speed prediction, and a survey is made of methods for fixing the number of hidden neurons in neural networks over the past 20 years. All proposed criteria are tested using the convergence theorem, which converges infinite sequences into finite sequences. The main objective is to minimize error and to improve the accuracy and stability of the network. This review is intended to help researchers working in this field select the proper number of hidden neurons in neural networks.

Literature Survey
Several researchers have proposed methodologies to fix the number of hidden neurons. The survey of methods for finding the number of hidden neurons in neural networks is described here in chronological order. In 1991, Sartori and Antsaklis [5] proposed a method to find the number of hidden neurons in a multilayer neural network for an arbitrary training set with P training patterns. Several existing methods are optimized to find the selection of hidden neurons in neural networks. In 1993, Arai [6] proposed two parallel hyperplane methods for finding the number of hidden neurons; $2^n/3$ hidden neurons are sufficient for this design of the network.
In 1995, Li et al. [7] investigated estimation theory to find the number of hidden units in higher order feedforward neural networks. This theory is applied to time series prediction. The optimal number of hidden neurons is determined under the assumption that a sufficient number of hidden neurons is available. According to the estimation theory, the sufficient numbers of hidden units in the second-order and first-order neural networks are 4 and 7, respectively. The simulation results show that the second-order neural network is better than the first-order network in training convergence. According to this work, a network with few nodes in the hidden layer will not be powerful enough for most applications. The drawback is the long training and testing time.
In 1996, Hagiwara [8] presented another method to find an optimal number of hidden units. The drawback is that there is no guarantee that a network with a given number of hidden units will find the correct weights. According to the statistical behavior of the outputs of the hidden units, if a network has a large number of hidden nodes, a linear relationship is obtained among the hidden nodes.
In 1997, Tamura and Tateishi [9] developed a method to fix the number of hidden neurons with negligible error based on Akaike's information criterion. The number of hidden neurons is $N - 1$ in a three-layer neural network and $N/2 + 3$ in a four-layer neural network, where $N$ is the number of input-target relations. In 1998, Fujita [10] proposed a statistical estimation of the number of hidden neurons. Its merit is fast learning. The number of hidden neurons mainly depends on the output error. The estimation theory is constructed by adding hidden neurons one by one. The number of hidden neurons $N_h$ is formulated as a ratio of logarithms that involves the total number of candidates randomly searched for the optimum hidden unit and the allowable error.
In 1999, Keeni et al. [11] presented a method to determine the number of hidden units, which was applied to the prediction of cancer cells. Normally, training starts with many hidden units, and the network is pruned once it has been trained. However, pruning does not always improve generalization. The initial weights from the input to the hidden layer and the number of hidden units are determined automatically. The demerit is that no optimal solution is obtained.
In 2001, Onoda [12] presented a statistical approach to find the optimal number of hidden units in prediction applications. Minimal errors are obtained by increasing the number of hidden units. Md. Islam and Murase [13] proposed a large number of hidden nodes in weight freezing of single hidden layer networks. The generalization ability of the network may be degraded when the number of hidden nodes ($N_h$) is large because the $N_h$th hidden node may have some spurious connections.
In 2003, Zhang et al. [14] implemented a set covering algorithm (SCA) in a three-layer neural network. The SCA is based on unit sphere covering (USC) of Hamming space. This methodology is based on the number of inputs. Theoretically, the number of hidden neurons is estimated by random search. The output error decreases as $N_h$ is increased. $N_h$ is significant in characterizing the performance of the network. The number of hidden neurons should not be too large for a heuristic learning system. The $N_h$ found by the set covering algorithm is $3L/2$ hidden neurons, where $L$ is the number of unit spheres contained in the $N$-dimensional hidden space. In the same year, Huang [15] developed a model for the learning and storage capacity of two-hidden-layer feedforward networks and formulated the following: $N_h = \sqrt{(m + 2)N} + 2\sqrt{N/(m + 2)}$ in the first hidden layer and $N_h = m\sqrt{N/(m + 2)}$ in the second hidden layer. The main features are the following.
(i) It has a large first hidden layer and a small second hidden layer.
(ii) The weights connecting the input to the first hidden layer can be prefixed, and most of the weights connecting the first hidden layer and the second hidden layer can be determined analytically.
(iii) It may be trained only by adjusting weights and quantization factors to optimize the generalization performance.
(iv) It may be able to overfit the sample with any arbitrarily small error.
In 2006, Choi et al. [16] developed a separate learning algorithm which includes a deterministic and a heuristic approach. In this algorithm, hidden-to-output and input-to-hidden nodes are trained separately. It solved the local minima problem in two-layered feedforward networks. Its achievement is the best convergence speed. In 2008, Jiang et al. [17] presented a lower bound on the number of hidden neurons, in which $N_h$ is expressed through a valued upper bound function. The calculated values show that the lower bound is tighter than those previously existing. The lower and upper bounds on the number of hidden neurons help in designing constructive learning algorithms: the lower bound can accelerate the learning speed, and the upper bound gives the stopping condition of constructive learning algorithms. They can be applied to the design of constructive learning algorithms with a training set of a given size. In the same year, Jinchuan and Xinzhe [3] investigated a formula tested on 40 cases: $N_h = (N_{in} + \sqrt{N_p})/L$, where $L$ is the number of hidden layers, $N_{in}$ is the number of input neurons, and $N_p$ is the number of input samples. The optimum number of hidden layers and hidden units depends on the complexity of the network architecture, the number of input and output units, the number of training samples, the degree of noise in the sample data set, and the training algorithm.
The quality of the prediction made by the network is measured in terms of the generalization error. Generalization performance varies over time as the network adapts during training. The necessary number of hidden neurons in the hidden layer of a multilayer perceptron (MLP) was found by Trenn [18]. The key points are simplicity, scalability, and adaptivity. The number of hidden neurons is $N_h = n + n_0 - 1/2$, where $n$ is the number of inputs and $n_0$ is the number of outputs. In 2008, Xu and Chen [19] developed a novel approach for determining the optimum number of hidden neurons in data mining. The best number of hidden neurons leads to the minimum root mean squared error. The implemented formula is $N_h = C_f (N/(d \log N))^{1/2}$, where $N$ is the number of training pairs, $d$ is the input dimension, and $C_f$ is the first absolute moment of the Fourier magnitude distribution of the target function.
In 2009, Shibata and Ikeda [20] investigated the effect of learning stability and hidden neurons in neural networks. The simulation results show that the hidden-output connection weights become small as the number of hidden neurons $N_h$ becomes large. This is implemented in random number mapping problems. The formula for hidden nodes is $N_h = \sqrt{N_i N_0}$, where $N_i$ is the number of input neurons and $N_0$ is the number of output neurons. Several assumptions about the neural network are given before the discussion to prevent divergence. In unstable models, the number of hidden neurons becomes too large or too small. A tradeoff is formed: if the number of hidden neurons becomes too large, the output neurons become unstable, and if the number of hidden neurons becomes too small, the hidden neurons become unstable.
In 2010, Doukim et al. [21] proposed a technique to find the number of hidden neurons in an MLP network using a coarse-to-fine search technique, which is applied in skin detection. This technique includes a binary search and a sequential search. In this implementation, 30 networks are trained and searched for the lowest mean squared error. The sequential search is performed in order to find the best number of hidden neurons. Yuan et al. [22] proposed a method for estimating the number of hidden neurons based on information entropy. This method is based on a decision tree algorithm. The goal is to avoid the overlearning problem caused by an excessive number of hidden neurons and to avoid the shortage of capacity caused by too few hidden neurons. The number of hidden neurons of a feedforward neural network is generally decided on the basis of experience. In 2010, Wu and Hong [23] proposed learning algorithms for determining the number of hidden neurons. In 2011, Panchal et al. [24] proposed a methodology to analyze the behavior of MLP. The number of hidden layers is inversely proportional to the minimal error.
In 2012, Hunter et al. [2] developed a method for selecting proper NN architectures. The advantages are the absence of a trial-and-error method and the preservation of the generalization ability. Three networks are used: MLP, bridged MLP, and the fully connected cascade network. The implemented formulas are as follows: $N_h = N + 1$ for the MLP network, $N_h = 2N + 1$ for the bridged MLP network, and $N_h = 2^N - 1$ for the fully connected cascade network. The experimental results show that the success rate decreases with increasing parity number and increases with the number of neurons used. The result is obtained with 85% accuracy.
The other algorithm used to fix the number of hidden neurons is the data structure preserving (DSP) algorithm [25]. It is an unsupervised neuron selection algorithm. The data structure denotes the relative location of samples in high dimensional space. The key point is retaining the separating margin underlying the full set of neurons. The optimum number of hidden nodes can also be found by a trial-and-error approach [26]. The advantages are improved learning and classification of cavitation signals using an Elman neural network. The simulation results show that the error gradient and the $N_h$ selection scheme work well. Another approach is to fix the hidden neurons based on information entropy using a decision tree algorithm [27]. $N_h$ is generally decided based on experience. Initially, a network with $N_h$ hidden neurons should be trained. The activation values of the hidden neurons should then be calculated by inputting the training samples. Finally, the information is calculated. To select the hidden neurons, an SVM stepwise algorithm is used, in which a linear programming SVM is employed to preselect the number of hidden neurons [28]. The performance is evaluated by RMSE (root mean square error). The advantage is improved computation time. The number of hidden neurons has also been selected empirically, such as 2, 4, 6, 12, and 24, and applied to a sonar target classification problem [29]. It is close to a Bayes classifier. An analysis of variance is done on the results of the aspect angle dependent test experiments. The improvement in performance is 10%.
Another approach to fix the hidden neurons is the sequential orthogonal approach (SOA). This approach [30] adds hidden neurons one by one: $N_h$ is increased sequentially until the error is sufficiently small. This selection problem can be approached statistically by generalizing Akaike's information criterion (AIC) so that it applies to unrealizable models under a general loss function including regularization. Other existing methods are trial and error, the thumb rule, and so forth. In the thumb rule, $N_h$ is between the number of input neurons and the number of output neurons. Another rule sets $N_h$ to 2/3 of the size of the input layer and output layer [19]. A further rule is that $N_h$ is less than twice the size of the input layer. The number of hidden neurons depends on the number of inputs, outputs, architectures, activations, training sets, algorithms, and noise. The demerit is higher training time. Other existing techniques are network growing and network pruning [31,32]. The growing algorithm allows the adaptation of the network structure: it starts with an undersized $N_h$ and adds neurons to the hidden layer. The disadvantages are that it is time consuming and there is no guarantee of fixing the hidden neurons.
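Several of the closed-form rules surveyed above are easy to evaluate side by side. The following Python sketch does so for the Shibata and Ikeda rule, the Jinchuan and Xinzhe rule, and two of the thumb rules. The function names are hypothetical, and two readings are assumptions of this sketch rather than statements from the cited works: the 2/3 rule is read as two thirds of the input layer size plus the output layer size, and the Jinchuan and Xinzhe formula is read as (N_in + sqrt(N_p)) / L.

```python
import math


def shibata_ikeda(n_inputs: int, n_outputs: int) -> float:
    """N_h = sqrt(N_i * N_0): geometric mean of input and output neuron counts."""
    return math.sqrt(n_inputs * n_outputs)


def jinchuan_xinzhe(n_inputs: int, n_samples: int, n_hidden_layers: int = 1) -> float:
    """N_h = (N_in + sqrt(N_p)) / L (assumed reading of the surveyed formula)."""
    if n_hidden_layers < 1:
        raise ValueError("n_hidden_layers must be at least 1")
    return (n_inputs + math.sqrt(n_samples)) / n_hidden_layers


def two_thirds_rule(n_inputs: int, n_outputs: int) -> float:
    """Thumb rule, read here as 2/3 of the input layer size plus the output layer size."""
    return 2.0 * n_inputs / 3.0 + n_outputs


def twice_input_bound(n_inputs: int) -> int:
    """Thumb rule: N_h should stay below twice the input layer size."""
    return 2 * n_inputs


# Example with 3 inputs, 1 output, and 10000 samples.
print(shibata_ikeda(3, 1))
print(jinchuan_xinzhe(3, 10000))
print(two_thirds_rule(3, 1))
print(twice_input_bound(3))
```

For the same problem size these rules give very different answers, which illustrates why the survey treats the choice of $N_h$ as an open problem rather than a settled formula.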
Researchers have implemented various methods for selecting the number of hidden neurons. They aim at improving factors such as faster computation, greater efficiency and accuracy, and fewer errors. The proper selection of hidden neurons is important for the design of a neural network.

Problem Description
The proper selection of the number of hidden neurons has been analyzed for the Elman neural network. Selecting the hidden neurons needed to solve a specific task has been an important problem. With few hidden neurons, the network may not be powerful enough to meet the desired requirements, including capacity and error precision. In the design of neural networks, an issue called overtraining occurs. Overtraining is akin to the problem of overfitting data. Fixing the number of hidden neurons is therefore important for a given problem. An important but difficult task is to determine the optimal number of parameters; in other words, one needs to measure the discrepancy between the neural network and the actual system. To tackle this, most research has focused on improving performance. There is no way to find the number of hidden neurons in a neural network without trying and testing during training and computing the generalization error. The hidden-output connection weights become small as the number of hidden neurons becomes large, and a tradeoff in stability exists between the input-hidden and hidden-output connections: if $N_h$ becomes too large, the output neurons become unstable, and if the number of hidden neurons becomes too small, the hidden neurons become unstable. The problems in fixing the number of hidden neurons therefore still exist. The convergence and stability properties of neural networks are to be verified by performance analysis. The problem of wind speed prediction is closely linked to the intermittent nature of wind, whose characteristics involve uncertainty.
The input and output neurons are to be modeled, while $N_h$ should be fixed properly in order to provide good generalization capabilities for the prediction. An ANN comprising an insufficient number of hidden neurons may not be feasible for a dynamic system. During the last couple of decades, various methods were developed to fix the number of hidden neurons. Most prediction research to date has been heuristic in nature. There is no generally accepted theory to determine how many hidden neurons are needed to approximate any given function in a single hidden layer. If the network has too few hidden neurons, it may have a large training error due to underfitting. If it has too many hidden neurons, it may have a large generalization error due to overfitting. An excessive number of hidden neurons deepens the local minima problem [30]. The proposed method shows stable training performance despite the large number of hidden neurons in the Elman network. The objective is to select the hidden neurons used to design the Elman network and to minimize the error for wind speed prediction in renewable energy systems. The optimal number of hidden neurons is chosen based on the following error criteria: mean square error (MSE), mean relative error (MRE), and mean absolute error (MAE), which are used to assess the performance of the proposed model. The fixing of the number of hidden neurons in the Elman network is based on minimal error performance. The error criteria are defined in terms of the predicted output, the actual output, the average actual output, and the number of samples. The process of designing the network plays an important role in the performance of the network.
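The three error criteria can be computed as in the sketch below. MSE and MAE follow their standard definitions; the MRE form used here, which divides each absolute error by the average actual output, is an assumption based on the quantities listed above, and the function name `error_criteria` is hypothetical.

```python
def error_criteria(predicted, actual):
    """Return MSE, MRE, and MAE for two equal-length sequences of numbers."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    if mean_actual == 0:
        raise ValueError("MRE is undefined when the average actual output is zero")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mse = sum(d * d for d in diffs) / n                  # mean of squared errors
    mae = sum(abs(d) for d in diffs) / n                 # mean of absolute errors
    mre = sum(abs(d / mean_actual) for d in diffs) / n   # assumed MRE form
    return {"MSE": mse, "MRE": mre, "MAE": mae}


# Toy values for illustration only, not data from the wind farm.
print(error_criteria([2.0, 4.0, 6.0], [1.0, 4.0, 7.0]))
```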

Proposed Architecture
Various heuristics exist in the literature, amalgamating the knowledge gained from previous experiments in which a near optimal topology might exist [33][34][35]. The objective is to devise criteria that estimate the number of hidden neurons as a function of the number of input neurons ($n$) and to develop the model for wind speed prediction in renewable energy systems. The estimate can take the form of a single exact topology to be adopted.

Overview of Elman Network.
The Elman network has been successfully applied in many fields for prediction, modeling, and control. The Elman network is a recurrent neural network (RNN) that adds recurrent links into the hidden layer as a feedback connection [36][37][38]. It consists of an input layer, a hidden layer, a recurrent link layer, and an output layer. The recurrent layer copies the hidden layer with a one-step delay. It reflects the information of both the input and output layers by inserting feedback between the output of the input layer and the hidden layer. The feedback is taken from the output of the hidden layer and stored in another layer, called the recurrent link layer, which retains the memory. The hidden layer is chosen because it is wider than the output layer; this wider layer allows more values to be fed back to the input, making more information available to the network. The hidden layer has the hyperbolic tangent sigmoid activation function, and the output layer has the purelin (linear) activation function.
For the considered wind speed prediction model, the inputs are temperature, wind direction, and wind speed. As a result, three input neurons were built in the input layer. The wind speed to be predicted forms the single output neuron in the output layer. The proposed approach aims to fix the number of hidden neurons so as to achieve better accuracy and faster convergence. From Figure 1, the input and the output target vector pairs are as follows.
Let $W_c$ be the weight between context layer and input layer.
Let $W_1$ be the weight between input and hidden layer.
Let $W_2$ be the weight between hidden and recurrent link layer.
The neurons are connected one to one between the hidden and recurrent link layers; $f(\cdot)$ is the purelin activation function, and $h(\cdot)$ is the hyperbolic sigmoid activation function.
From Figure 1, it can be observed that the layers make independent computations on the data that they receive and pass the results to another layer, which finally determines the output of the network. The input $U(K - 1)$ is transmitted through the hidden layer, which multiplies it by $W_1$ and applies the hyperbolic sigmoid function. The network learns the function based on the current input $W_1 U(K - 1)$ plus the record of the previous state output $W_c X_c(K)$. Further, the value $X(K)$ is transmitted through the second connection, multiplied by $W_2$, and passed through the purelin function. As a result of training the network, past information is reflected in the Elman network. The number of hidden neurons is fixed based on the new criteria. The proposed model is used for estimation and prediction. The key of the proposed method is to select the number of neurons in the hidden layer. The proposed architecture for fixing the number of hidden neurons in the Elman network is shown in Figure 1. The input of the recurrent link layer is $X_c(K) = X(K - 1)$.
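The forward computation described above can be sketched in a few lines of Python. This is a minimal illustration under the standard Elman formulation, which is assumed here: the hidden state is tanh(W1·u + Wc·x_prev), the output is a linear (purelin) map W2·x, and the recurrent link layer holds a copy of the previous hidden state. The names `elman_step` and `run_sequence`, the weight shapes, and the random initial weights are assumptions of this sketch; no training is performed.

```python
import math
import random


def elman_step(u, x_prev, w1, wc, w2):
    """One time step: returns (new hidden state, output vector).

    u      : input vector, length n_in
    x_prev : previous hidden state held by the recurrent link layer, length n_h
    w1     : n_h x n_in input-to-hidden weights
    wc     : n_h x n_h recurrent-link-to-hidden weights
    w2     : n_out x n_h hidden-to-output weights
    """
    x = [
        math.tanh(
            sum(w1[j][i] * u[i] for i in range(len(u)))
            + sum(wc[j][k] * x_prev[k] for k in range(len(x_prev)))
        )
        for j in range(len(w1))
    ]
    # Linear (purelin) output layer.
    y = [sum(w2[o][j] * x[j] for j in range(len(x))) for o in range(len(w2))]
    return x, y


def run_sequence(inputs, w1, wc, w2):
    """Feed a sequence of input vectors through the network, starting from a zero state."""
    x = [0.0] * len(w1)
    outputs = []
    for u in inputs:
        x, y = elman_step(u, x, w1, wc, w2)  # x becomes the next step's context
        outputs.append(y)
    return outputs


# Three inputs, 39 hidden neurons, one output, with small random weights.
random.seed(0)
n_in, n_h, n_out = 3, 39, 1
w1 = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_h)]
wc = [[random.uniform(-0.1, 0.1) for _ in range(n_h)] for _ in range(n_h)]
w2 = [[random.uniform(-0.1, 0.1) for _ in range(n_h)] for _ in range(n_out)]
outputs = run_sequence([[0.2, 0.5, 0.1], [0.3, 0.4, 0.2]], w1, wc, w2)
print(len(outputs), len(outputs[0]))
```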

Proposed Methodology.
Generally, a neural network involves the processes of training, testing, and developing a model at the end stage in wind farms. A well-designed NN model is important for outperforming less accurate models. The data required as inputs are wind speed, wind direction, and temperature. Variables with larger collected values tend to suppress the influence of smaller variables during training. To overcome this problem, the min-max normalization technique, which enhances the accuracy of the ANN model, is used; the data are therefore scaled to the range [0, 1]. The scaling is carried out to improve the accuracy of subsequent numeric computation. The selection criteria used to fix the hidden neurons are important in the prediction of wind speed. The design of the NN model based on the selection criteria is substantiated using the convergence theorem. The network is trained on past data after normalization. The performance of the trained network is evaluated in two ways: first, by comparing the actual and predicted wind speeds, and second, by computing the statistical errors of the network. Finally, the wind speed is predicted, which is the output of the proposed NN model.

Data Collection.
The real-time data were collected from the Suzlon Energy Ltd., India wind farm for the period from April 2011 to December 2012. The inputs are temperature, wind vane direction from true north, and wind speed measured by anemometer. The height of the wind farm tower is 65 m. The predicted wind speed is the output of the model. The number of samples taken to develop the proposed model is 10000.
The parameters considered as inputs to the NN model are shown in Table 1. Sample inputs collected from the wind farm are shown in Table 2.

Data Normalization.
Normalization of the data is essential because the variables have different units. The data are scaled to the range 0 to 1. The scaling is carried out to improve the accuracy of subsequent numeric computation and to obtain better output. The min-max technique is used. Its advantage is that it preserves exactly all relationships in the data and does not introduce bias. The normalization of data is obtained by the following transformation (4).
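Normalized input,
$$X_i' = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \left(X'_{\max} - X'_{\min}\right) + X'_{\min}, \tag{4}$$
where $X_i$, $X_{\min}$, and $X_{\max}$ are the actual, minimum, and maximum input data, and $X'_{\min}$ and $X'_{\max}$ are the minimum and maximum target values.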
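A minimal sketch of min-max scaling in Python is given below. It assumes the usual form of the transformation, in which each value is mapped linearly from the observed range of the data onto a target range; the function name `min_max_normalize` and its default target range of 0 to 1 are choices of this sketch.

```python
def min_max_normalize(values, target_min=0.0, target_max=1.0):
    """Linearly rescale values so that min(values) -> target_min and max(values) -> target_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    scale = (target_max - target_min) / (hi - lo)
    return [(v - lo) * scale + target_min for v in values]


# Toy values for illustration only.
print(min_max_normalize([2.0, 4.0, 6.0]))
```

In practice each input variable (temperature, wind direction, wind speed) would be scaled separately, since the purpose is to stop variables with large numeric ranges from dominating those with small ranges.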

Designing the Network.
Set-up parameters include epochs and dimensions. The network is trained on past data after normalization. The dimensions, namely the numbers of input, hidden, and output neurons, are to be designed. The three input parameters are temperature, wind direction, and wind speed. The number of hidden layers is one. The number of hidden neurons is to be fixed based on the proposed criteria. The input is transmitted through the hidden layer, which multiplies it by a weight and applies the hyperbolic sigmoid function. The network learns the function based on the current input plus a record of the previous state. Further, this output is transmitted through the second connection, multiplied by a weight, and passed through the purelin function. As a result of training the network, past information is reflected in the Elman network. Training continues until the stopping criteria are reached. According to the convergence theorem, the selected criterion converges to a finite value: $\lim_{n \to \infty} (4n^2 + 3)/(n^2 - 8) = 4$. Here 4 is the limit of the sequence as $n \to \infty$.
If a sequence has a limit, then it is a convergent sequence. Here $n$ is the number of input parameters.
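As a worked check of the stated limit, assume the sequence in question is the selected criterion $(4n^2 + 3)/(n^2 - 8)$, with $n$ the number of input parameters. Dividing the numerator and denominator by $n^2$ gives:

```latex
\[
\lim_{n \to \infty} \frac{4n^2 + 3}{n^2 - 8}
  = \lim_{n \to \infty} \frac{4 + 3/n^2}{1 - 8/n^2}
  = \frac{4 + 0}{1 - 0}
  = 4 .
\]
```

Both $3/n^2$ and $8/n^2$ vanish as $n$ grows, so the sequence has the finite limit 4 and is therefore convergent.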
The 101 criteria considered for fixing the number of hidden neurons, with their statistical errors, are presented in Table 4. The selected criterion for the NN model is $(4n^2 + 3)/(n^2 - 8)$, for which the error values are observed to be lower than for the other criteria. This proposed criterion is therefore very effective for wind speed prediction in renewable energy systems.
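The selected criterion is simple to evaluate. The sketch below assumes it reads (4n^2 + 3)/(n^2 - 8), with n the number of input parameters; the function name `proposed_criterion` and the guard for small n are choices of this sketch, not part of the original method.

```python
def proposed_criterion(n: int) -> float:
    """Evaluate (4n^2 + 3) / (n^2 - 8) for n input parameters."""
    if n < 3:
        # For n = 1 or 2 the denominator is negative, so the value cannot be a neuron count.
        raise ValueError("the criterion is only positive for n >= 3")
    return (4 * n * n + 3) / (n * n - 8)


print(proposed_criterion(3))  # 39.0

# The value approaches 4 as n grows, matching the limit used in the convergence argument.
for n in (3, 10, 100, 1000):
    print(n, proposed_criterion(n))
```

With the three inputs used in this paper (temperature, wind direction, and wind speed), the criterion gives 39 hidden neurons.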
The actual and predicted wind speeds obtained with the proposed model are shown in Figure 2. The advantages of the proposed approach are minimal error, effectiveness, and easy implementation for wind speed prediction. In simulation, the proposed algorithm obtained a minimal MSE of 0.018, MRE of 0.0049, and MAE of 0.04.

Discussion and Results
Several researchers have proposed approaches to fix the number of hidden neurons in neural networks. The approaches can be classified into constructive and pruning approaches. The constructive approach starts with an undersized network and then adds hidden neurons [7,39]. The pruning approach starts with an oversized network and then prunes the less relevant neurons and weights to find the smallest size. The proper number of hidden neurons for a particular problem remains to be fixed. The existing method to determine the number of hidden neurons is the trial-and-error rule, which starts with an undersized number of hidden neurons and adds neurons to $N_h$. Its disadvantage is that it is time consuming and there is no guarantee of fixing the hidden neurons. The selected criterion for the NN model is $(4n^2 + 3)/(n^2 - 8)$, which gives 39 hidden neurons for $n = 3$ inputs and obtained a minimal MSE value of 0.018 in comparison with the other criteria. The result with minimum error is taken as the best solution for fixing the hidden neurons. Simulation results show that the predicted wind speed is in good agreement with the experimentally measured values. Initially, the real-time data are divided into a training set and a testing set. The training set is used for neural network learning, and the testing set is used to estimate the error. When the testing performance stops improving as $N_h$ continues to increase, training has begun to fit the noise in the training data, and overfitting occurs. From the results, it is observed that the proposed methodology gives better results than the other approaches. In this paper, the proposed criteria are considered for designing three-layer neural networks. The proposed models were run on a Lenovo laptop computer with a Pentium III processor running at 1.3 GHz with 240 MB of RAM. The statistical errors are calculated to evaluate the performance of the network. It is known that certain approaches produce an unnecessarily large network, whereas others are expensive.
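The selection step described above, in which each candidate number of hidden neurons is scored on the testing set and the one with minimum error is kept, can be sketched as follows. The function `select_hidden_neurons` is hypothetical, and `toy_test_error` is a synthetic U-shaped curve that stands in for "train the network, then measure the test error"; it is not data from the experiments.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_hidden_neurons(
    candidates: Iterable[int],
    test_error: Callable[[int], float],
) -> Tuple[int, Dict[int, float]]:
    """Score every candidate N_h and return the one with the lowest test error."""
    errors = {n_h: test_error(n_h) for n_h in candidates}
    if not errors:
        raise ValueError("at least one candidate is required")
    best = min(errors, key=errors.get)
    return best, errors


def toy_test_error(n_h: int) -> float:
    """Synthetic stand-in: error falls, then rises again as N_h grows (overfitting)."""
    return (n_h - 12) ** 2 / 100.0 + 0.05


best, errors = select_hidden_neurons(range(2, 31, 2), toy_test_error)
print(best)
```

In a real experiment the scoring function would train an Elman network with the given number of hidden neurons on the training set and return a statistical error such as MSE on the testing set.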
The analysis of wind speed prediction is carried out using the proposed new criteria. Table 5 shows that the proposed model gives lower statistical errors than other existing models.

Conclusion
In this paper, a survey has been made of methods for fixing the number of hidden neurons in the design of neural networks. The proposed model was introduced and tested with real-time wind data. The results are compared using various statistical errors. The proposed approach aims at selecting the proper number of hidden neurons in an Elman network for wind speed prediction in renewable energy systems. The improved performance is also analyzed using statistical errors. The following conclusions were obtained.
(1) Reviewing the methods to fix hidden neurons in neural networks for the past 20 years.
(2) Selecting the number of hidden neurons, thus providing a better framework for designing the proposed Elman network.
(3) Reduction of errors made by Elman network.
(4) Predicting accurate wind speed in renewable energy systems.
(5) Improving stability and accuracy of network.

Figure 1: Architecture of the proposed model for fixing the number of hidden neurons in Elman Network.

Table 2: Collected sample inputs from wind farm.

Table 3: Designed parameters of Elman network.

Table 4: Statistical analysis of various criteria for fixing number of hidden neurons in Elman network.

Selection of Proposed Criteria.
For the proposed model, 101 criteria were examined to estimate the training process and errors in the Elman network. The number of input neurons is taken into account in all criteria. Each criterion is tested against the convergence theorem, where convergence means changing an infinite sequence into a finite one. All chosen criteria satisfy the convergence theorem. First, the chosen criteria are applied to the Elman network for the development of the proposed model. Then, the neural network is trained and the statistical errors are computed.

Proof for the Chosen Proposed Strategy.
Based on the discussion of the convergence theorem in the Appendix, the proof for the selection criteria is established henceforth.

Table 5: Performance analysis of various approaches in existing and proposed models.