Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms

On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, such models perform well and include almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms that resist outliers in interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the parameters they require depend on whether the collected data contain outliers. Since it is difficult to prespecify the degree of contamination, this paper uses a multilayer perceptron to construct a robust nonlinear interval regression model by means of the genetic algorithm. Outliers beyond or beneath the data interval have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.


Introduction
In many practical applications, since the available information is often derived from uncertain assessments, real intervals can be employed to represent uncertain and imprecise observations [1]. Interval regression analysis, which provides interval estimation of individual dependent variables, is an important tool for dealing with uncertain data [1][2][3]. Interval regression analysis was developed on the basis of an important tool, namely, fuzzy regression analysis introduced by Tanaka et al. [4], whose objective is to build a model that contains all observed output data in terms of fuzzy numbers [4,5].
Among computational intelligence models, neural-network-based approaches have been employed to deal with nonlinear fuzzy or interval regression analysis (e.g., [6][7][8][9][10]) to extend the usefulness of fuzzy regression analysis. It is known that neural-network-based approaches can overcome the main difficulty in nonlinear fuzzy regression with LP, which is to choose a nonlinear model from an infinite number of alternatives [2]. When training data are not contaminated by outliers, which can be simply interpreted as data with a large deviation from their designated location [3,11], these methods perform well and the estimated data interval includes almost all given training data. Nevertheless, since training data are often corrupted by outliers, the data interval obtained by these methods can be influenced by outliers. Statistical data preprocessing is helpful for detecting outliers, but robust interval regression analysis can both resist outliers and provide interval estimation of individual dependent variables.
To overcome the corruption arising from outliers, robust learning approaches involving computational intelligence (e.g., neural networks and support vector machines) have been employed to determine the upper and lower bounds of the data interval. For instance, Huang et al. [3] employed two MLPs to determine nonlinear interval models using a new cost function, in which the cost function introduced in [6] and the robust BP algorithm for function approximation [12] were taken into account. Jeng et al. [2] employed two radial basis function networks to determine the upper and lower bounds after applying the support vector regression approach to determine the initial structure and parameter values of the neural networks. Moreover, Hwang et al. [1] proposed a robust method by combining the possibility estimation formulation integrating the property of central tendency with the principle of support vector interval regression.

The Scientific World Journal
To determine the maximum and minimum limits of the interval regression, Jeng et al. [2] estimated the trimmed standard deviation of the residuals after excluding the highest and lowest $s$% of the training data, where $s$ can be specified as zero when the data contain no outliers and as 5-10 when they do. To prevent the estimated upper and lower bounds from going beyond the maximum and minimum limits, the value of a constant in the robust learning algorithms introduced in [1,2], which modifies the desired output of each pattern, must be carefully specified for both uncontaminated and contaminated data. Moreover, in [3,13], the upper limits on the percentage of outliers beyond and beneath the true interval model are prespecified to determine the degree of influence of each training pattern. Although the above-mentioned learning algorithms are robust against outliers, the parameters they require depend on whether the collected data contain outliers. However, since collected data are always contaminated to some degree, it is not easy to prespecify the degree of contamination. This motivates us to develop a robust method that does not require the degree of contamination to be considered.
Given the effectiveness of using two MLPs to independently identify the upper and lower bounds of the data interval for interval regression analysis, this paper proposes robust learning algorithms for MLP that combine the weighting schemes in [6] with the genetic algorithm (GA). The GA, which is a powerful search and optimization method [14,15], is used to identify outliers automatically when determining the robust nonlinear interval regression model. The proposed method has the feature that outliers beyond or beneath the data interval have only a slight effect on the determination of the upper and lower bounds. In practice, only parameter specifications for the GA are required for the proposed learning approach. To sum up, the main contribution of this paper is to use MLP to construct a robust nonlinear interval model by making use of the GA.
The rest of this paper is organized as follows. Section 2 briefly reviews MLP and introduces the MLP-based approach proposed by Ishibuchi and Tanaka [6] for nonlinear interval regression analysis. Section 3 describes the proposed robust learning algorithms in detail. In Section 4, several examples and real data sets are used to examine the effectiveness and applicability of the proposed method for determining a nonlinear interval regression model. The experimental results show that the nonlinear interval models obtained by the proposed learning algorithms can include almost all regular data. This paper is concluded in Section 5.

Multilayer Perceptron for Nonlinear Interval Regression Analysis
Since the weighting schemes proposed by Ishibuchi and Tanaka [6] are incorporated into the proposed robust learning algorithms, the MLP-based approach employed to determine the estimated data interval is described in this section. MLP and the MLP-based approach are briefly reviewed in Sections 2.1 and 2.2, respectively.

Multilayer Perceptron.
MLP is usually used as a tool for function approximation tasks such as regression [16]. A three-layer perceptron with $n$ input nodes and a single hidden layer is taken into account. Let us denote the given nonfuzzy input-output pairs by $(\mathbf{x}_p, d_p)$, $p = 1, 2, \ldots, m$, where $\mathbf{x}_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$ and $d_p$ are the input vector and the corresponding desired output value, respectively. The sigmoid function, whose output ranges from 0 to 1, is commonly used as the transfer function for each hidden node and the output node.

When $\mathbf{x}_p$ is presented to MLP, the output value from the $j$th hidden node is computed as

$$o_{pj} = f\Big(\sum_{i=1}^{n} w_{ji} x_{pi} + \theta_j\Big), \tag{1}$$

where $f$ represents the sigmoid function, $\theta_j$ is the bias to the $j$th hidden node, and $w_{ji}$ is the connection weight from the $i$th input node to the $j$th hidden node. Then, the final output value from the output node is computed as

$$y_p = f\Big(\sum_{j=1}^{h} w_j o_{pj} + \theta\Big), \tag{2}$$

where $h$ is the number of hidden nodes, $\theta$ is the bias to the output node, and $w_j$ is the connection weight from the $j$th hidden node to the output node. Thus, there are $h(n + 2) + 1$ synaptic connections. The following cost function can be employed to train MLP:

$$E = \frac{1}{2}\sum_{p=1}^{m} (d_p - y_p)^2, \tag{3}$$

where $m$ represents the number of training patterns. Since it was shown that MLP with a single hidden layer and any fixed continuous sigmoid function is sufficient to approximate any continuous function [17], a three-layer model is taken into account for our study.
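The forward pass and squared-error cost described above can be sketched in a few lines of Python. This is only an illustration: the function names, the list-based weight layout, and the factor of 1/2 in the cost are assumptions made for the sketch, not the paper's implementation.

```python
import math


def sigmoid(z):
    """Logistic transfer function with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Output of a three-layer perceptron for one input vector x.

    hidden_w[j][i] is the weight from input i to hidden node j,
    hidden_b[j] is the bias of hidden node j, out_w[j] is the weight
    from hidden node j to the output node, and out_b is the output bias.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return sigmoid(sum(w * o for w, o in zip(out_w, hidden)) + out_b)


def count_connections(n, h):
    """n*h input weights + h hidden biases + h output weights + 1 output bias."""
    return h * (n + 2) + 1


def squared_error_cost(outputs, targets):
    """Half the sum of squared differences (the 1/2 factor is an assumed convention)."""
    return 0.5 * sum((d - y) ** 2 for y, d in zip(outputs, targets))
```

For one input node and five hidden nodes, `count_connections(1, 5)` gives 16 weights and biases to be learned.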

MLP-Based Approach.
Ishibuchi and Tanaka [6] employed two feed-forward MLPs, $\text{MLP}^*$ and $\text{MLP}_*$, to determine a nonlinear interval model effectively. Each of the two networks has only one hidden layer. Let $g^*(\mathbf{x})$ and $g_*(\mathbf{x})$ denote the output functions realized by $\text{MLP}^*$ and $\text{MLP}_*$, respectively. In practice, $g^*(\mathbf{x})$ and $g_*(\mathbf{x})$ represent the upper and lower bounds of a nonlinear interval model, respectively. A nonlinear optimization problem is formulated to determine the nonlinear interval regression model as follows:

$$\text{Minimize } \sum_{p=1}^{m} \big(g^*(\mathbf{x}_p) - g_*(\mathbf{x}_p)\big) \quad \text{subject to } g_*(\mathbf{x}_p) \le d_p \le g^*(\mathbf{x}_p),\ p = 1, 2, \ldots, m, \tag{4}$$

where $(g^*(\mathbf{x}_p) - g_*(\mathbf{x}_p))$ represents the width of the estimated data interval for $\mathbf{x}_p$. The objective of the above formulation is to determine the nonlinear interval model with the least sum of widths of the predicted intervals for the respective inputs, subject to the estimated data interval determined by the two MLPs including all the given input-output pairs. Instead of deriving a learning algorithm directly from the above nonlinear optimization problem, Ishibuchi and Tanaka derived learning algorithms for $g^*(\mathbf{x})$ and $g_*(\mathbf{x})$ from the following cost function with weighting scheme $\omega_p$:

$$E = \frac{1}{2}\sum_{p=1}^{m} \omega_p (d_p - y_p)^2, \tag{5}$$

where $y_p$ is the output of the network being trained and $\omega$ is a small positive value in the interval (0, 1). To determine the upper bound $g^*(\mathbf{x})$, $\omega_p$ is defined as

$$\omega_p = \begin{cases} 1, & \text{if } d_p > g^*(\mathbf{x}_p), \\ \omega, & \text{if } d_p \le g^*(\mathbf{x}_p). \end{cases} \tag{6}$$

As for determining the lower bound $g_*(\mathbf{x})$, $\omega_p$ is defined as

$$\omega_p = \begin{cases} 1, & \text{if } d_p < g_*(\mathbf{x}_p), \\ \omega, & \text{if } d_p \ge g_*(\mathbf{x}_p). \end{cases} \tag{7}$$

It can be seen that the squared error of a pattern is multiplied by a small positive value in the interval (0, 1) (i.e., $\omega$) depending on whether the desired output of the input pattern is less than or greater than the network-estimated value. Weight updating rules in the MLP-based approach can be derived by gradient descent from the above cost function with the weighting scheme. The two learning algorithms for determining $g^*(\mathbf{x})$ and $g_*(\mathbf{x})$ are the same except for the weighting schemes. Ishibuchi and Tanaka suggested that a smaller value of $\omega$ could lead to better satisfaction of the constraint condition.
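To make the asymmetric weighting concrete, the following sketch evaluates the weighted cost for either bound. It is a hedged illustration: the names `pattern_weight`, `weighted_cost`, and the `bound` argument are hypothetical, and the treatment of ties (desired output equal to the network output) is an assumption.

```python
def pattern_weight(desired, output, omega, bound):
    """Weight of one pattern under the asymmetric weighting scheme.

    For the upper bound, a pattern lying above the network output keeps
    full weight 1; a pattern on or below it is down-weighted by omega.
    For the lower bound the roles are mirrored.
    """
    if bound == "upper":
        return 1.0 if desired > output else omega
    if bound == "lower":
        return 1.0 if desired < output else omega
    raise ValueError("bound must be 'upper' or 'lower'")


def weighted_cost(outputs, targets, omega, bound):
    """Half the weighted sum of squared errors over all patterns."""
    return 0.5 * sum(
        pattern_weight(d, y, omega, bound) * (d - y) ** 2
        for y, d in zip(outputs, targets)
    )
```

With `omega = 0.2`, network outputs `[0.5, 0.5]`, and desired outputs `[0.9, 0.4]`, the upper-bound cost is dominated by the first pattern (which lies above the network output), whereas the lower-bound cost is dominated by the second. This asymmetry is what pushes the two networks toward opposite sides of the data.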
Nevertheless, the nonlinear interval model derived by the MLP-based approach tends to include all training data, outliers included [3]. That is, the nonlinear interval model obtained by the MLP-based approach is sensitive to contaminated training data. To overcome this problem, Huang et al. [3] pointed out that we should only pay attention to the quality of the training data whose desired outputs are greater than the actual outputs for determining $g^*(\mathbf{x})$ and that of the training data whose desired outputs are less than the actual outputs for determining $g_*(\mathbf{x})$. In practice, the more strongly a pattern beyond or beneath the estimated data interval is suspected of being an outlier, the slighter its effect on the determination of the data interval should be.

The Proposed Robust Learning Algorithms
Based on the cost function with the weighting schemes introduced in [6], Section 3.1 presents the formulation of the optimization problems and describes the fitness functions of the GA. Subsequently, the coding scheme and genetic operations are described in detail in Sections 3.2 and 3.3, respectively. Finally, the proposed robust learning algorithms for determining the nonlinear interval model are presented.

Problem Formulation.
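Two neural networks, $\text{MLP}^*$ and $\text{MLP}_*$, are again employed to determine the upper and lower bounds of the data interval. To determine the upper bound of the nonlinear interval regression model (i.e., $g^*(\mathbf{x})$), the single-objective optimization problem of constructing $\text{MLP}^*$ is formulated as

$$\text{Minimize } E(\text{MLP}^*) \quad \text{subject to } d_p \le g^*(\mathbf{x}_p),\ p = 1, 2, \ldots, m. \tag{8}$$

In a similar manner, the single-objective optimization problem of constructing $\text{MLP}_*$ for the lower bound (i.e., $g_*(\mathbf{x})$) is formulated as

$$\text{Minimize } E(\text{MLP}_*) \quad \text{subject to } d_p \ge g_*(\mathbf{x}_p),\ p = 1, 2, \ldots, m, \tag{9}$$

where $E(\text{MLP})$ is defined as

$$E(\text{MLP}) = \frac{1}{2}\sum_{p=1}^{m} \Psi_p\, \omega_p (d_p - y_p)^2. \tag{10}$$

It can be seen that $\Psi_p$ is incorporated into the above-mentioned cost function $E$. In the determination of $g^*(\mathbf{x})$ at the $t$th generation during evolution, $\Psi_p$ for $\mathbf{x}_p$ is set to a very small positive value, say $\Psi$, if $\mathbf{x}_p$ beyond $g^*(\mathbf{x})$ is an outlier. Otherwise, $\Psi_p$ is set to 1. In a similar manner for determining $g_*(\mathbf{x})$, $\Psi_p$ for $\mathbf{x}_p$ is set to a very small positive value if $\mathbf{x}_p$ beneath $g_*(\mathbf{x})$ is an outlier. Otherwise, $\Psi_p$ is set to 1. In other words, outliers beyond or beneath the data interval have only a slight effect on the determination of the data interval. The single-objective GA, which is a general-purpose optimization technique, can be applied to the above problems by introducing a reward into the fitness function when the constraint condition is satisfied:

$$\text{fitness}(\text{MLP}) = -w_E\, E(\text{MLP}) + w_{\text{num}}\, R(\text{MLP}), \tag{11}$$

where $w_E$ and $w_{\text{num}}$ are constant positive weights of the objective and reward, respectively, and $R(\text{MLP})$ is the ratio of input-output pairs that satisfy the constraint condition:

$$R(\text{MLP}) = \frac{1}{m}\sum_{p=1}^{m} r_p, \tag{12}$$

where $r_p$ is defined for determining $g^*(\mathbf{x})$ as

$$r_p = \begin{cases} 1, & \text{if } d_p \le g^*(\mathbf{x}_p), \\ 0, & \text{otherwise}, \end{cases} \tag{13}$$

whereas for determining $g_*(\mathbf{x})$, $r_p$ is defined as

$$r_p = \begin{cases} 1, & \text{if } d_p \ge g_*(\mathbf{x}_p), \\ 0, & \text{otherwise}. \end{cases} \tag{14}$$

The reward $R(\text{MLP})$ is incorporated into the fitness function (i.e., fitness(MLP)) in order to encourage the estimated data interval to include all regular data.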
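The interplay between the outlier indicator, the weighting scheme, and the reward can be illustrated with a short sketch. Several details here are assumptions rather than statements of the paper's method: the fitness is taken to be the weighted reward minus the weighted cost (to be maximized), the reward is taken as the fraction of all patterns that satisfy the constraint, and the names `robust_fitness`, `outlier_bits`, `w_e`, and `w_num` are hypothetical. The default values mirror the parameter settings reported later in the simulations.

```python
def robust_fitness(outputs, targets, outlier_bits, bound,
                   omega=0.2, psi=1e-5, w_e=2.0, w_num=1.0):
    """Fitness of one candidate network for the 'upper' or 'lower' bound.

    outlier_bits[p] == 1 means the chromosome flags pattern p as an outlier.
    A flagged pattern that also violates the bound has its squared error
    damped by psi, so it barely influences the cost.
    """
    if bound not in ("upper", "lower"):
        raise ValueError("bound must be 'upper' or 'lower'")
    cost = 0.0
    satisfied = 0
    for y, d, bit in zip(outputs, targets, outlier_bits):
        # A pattern violates the constraint if it lies outside the bound.
        violates = d > y if bound == "upper" else d < y
        weight = 1.0 if violates else omega
        damping = psi if (bit == 1 and violates) else 1.0
        cost += 0.5 * damping * weight * (d - y) ** 2
        if not violates:
            satisfied += 1
    reward = satisfied / len(targets)
    # Assumed form: maximize reward while minimizing the damped cost.
    return w_num * reward - w_e * cost
```

For example, with outputs `[0.5, 0.5, 0.5]` and desired outputs `[0.4, 0.45, 0.95]`, flagging the third pattern as an outlier removes almost all of its contribution to the cost, so the upper bound is no longer pulled toward it.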

Coding.
As in [13], the transformation of the network output with respect to the $p$th pattern, say $y_p$, to the domain interval of the desired output is performed as follows:

$$y_p' = y_p \big(d^{\max} - d^{\min}\big) + d^{\min}, \tag{15}$$

where $d^{\max}$ and $d^{\min}$ equal $(\max\{d_p \mid p = 1, 2, \ldots, m\} + \varepsilon_1)$ and $(\min\{d_p \mid p = 1, 2, \ldots, m\} - \varepsilon_2)$, respectively. The unknown parameters, including $\varepsilon_1$, $\varepsilon_2$, and the $h(n + 2) + 1$ connection weights and biases of an MLP, are automatically determined by the GA. Thus, there are $h(n + 2) + 3$ substrings in a binary string. To identify outliers, a substring consisting of $m$ bits is employed to determine the set of outliers, and each bit indicates whether the corresponding training pattern is an outlier or not. In practice, a bit of 0 indicates that the corresponding training pattern is not an outlier; otherwise, that pattern is an outlier. In an initial population, each bit in the string is randomly assigned as either 1 or 0 with a probability of 0.5. A substring, except for those determining outliers, $\varepsilon_1$, and $\varepsilon_2$, can be directly decoded as a real value ranging from −5 to 5. Such a range should be acceptable when the sigmoid function is used as the activation function, since it is known that a node tends to suffer from premature saturation resulting from excessively large positive or negative weights.
The length of a substring depends mainly on the length of the domain of the corresponding variable and the required precision [18]. That is, if the domain of a variable has the length of $l_1$, where $2^{l_3 - 1} < l_1 \times 10^{l_2} < 2^{l_3}$, and the corresponding required precision is $l_2$ decimal places, then $l_3$ bits are required to code such a variable.
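As a worked illustration of the bit-length rule, the sketch below computes the number of bits for a given domain length and precision and decodes a binary substring into a real value. The linear decoding onto the range and the function names are assumptions made for the example; the paper only states that a substring is decoded as a real value between −5 and 5.

```python
def bits_required(domain_length, decimals):
    """Smallest number of bits b such that domain_length * 10**decimals < 2**b."""
    target = domain_length * 10 ** decimals
    bits = 1
    while 2 ** bits <= target:
        bits += 1
    return bits


def decode_substring(bits, low=-5.0, high=5.0):
    """Map a list of 0/1 bits linearly onto the interval [low, high].

    Assumed decoding: the all-zero string maps to low and the
    all-one string maps to high.
    """
    value = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * value / (2 ** len(bits) - 1)
```

For a weight in the range from −5 to 5 (domain length 10) with a precision of two decimal places, the rule gives $2^{9} = 512 < 10 \times 10^{2} = 1000 < 2^{10} = 1024$, so `bits_required(10, 2)` is 10.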

Genetic Operations.
Let $N_{\text{pop}}$ denote the population size. When the fitness of each chromosome in the current population is obtained, genetic operators including selection, crossover, and mutation [14,15,19] are employed to generate the $N_{\text{pop}}$ new strings in the next population. Using binary tournament selection with replacement, two strings are randomly selected from the current population, and the one with the larger fitness is placed in the mating pool. This process is repeated $N_{\text{pop}}$ times until there are $N_{\text{pop}}$ strings in the mating pool. In other words, $0.5 N_{\text{pop}}$ pairs of chromosomes are randomly selected from the current population.
After tournament selection, crossover and mutation are applied to each selected pair of parents to reproduce children by altering the chromosomal makeup of the two parents. In practice, the one-point crossover operation with the crossover probability $\text{Pr}_c$ is used for exchanging partial information between two substrings in the selected pair of strings, and two new strings are generated to replace their parent strings. Each crossover point in a substring is chosen randomly. That is, we use a substring-wise one-point crossover operation where the total number of crossover points is the same as the number of substrings in each string. The mutation operation with the mutation probability $\text{Pr}_m$ is performed on each bit of the strings generated by the crossover operation.
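The three operators described above can be sketched as follows. This is a minimal illustration under stated assumptions: a chromosome is represented as a list of substrings (each a list of bits), the crossover probability is applied once per pair of parents rather than once per substring, and all function names are hypothetical.

```python
import random


def tournament_select(population, fitnesses, rng):
    """Binary tournament with replacement: fill a mating pool of equal size."""
    pool = []
    for _ in range(len(population)):
        a = rng.randrange(len(population))
        b = rng.randrange(len(population))
        winner = a if fitnesses[a] >= fitnesses[b] else b
        pool.append(population[winner])
    return pool


def substring_crossover(parent_a, parent_b, pr_c, rng):
    """One-point crossover applied separately inside every substring."""
    child_a = [sub[:] for sub in parent_a]
    child_b = [sub[:] for sub in parent_b]
    if rng.random() < pr_c:  # assumption: one probability test per pair
        for k, (sub_a, sub_b) in enumerate(zip(parent_a, parent_b)):
            if len(sub_a) < 2:
                continue  # no interior cut point exists
            point = rng.randrange(1, len(sub_a))
            child_a[k] = sub_a[:point] + sub_b[point:]
            child_b[k] = sub_b[:point] + sub_a[point:]
    return child_a, child_b


def mutate(chromosome, pr_m, rng):
    """Flip each bit independently with probability pr_m."""
    return [
        [1 - bit if rng.random() < pr_m else bit for bit in sub]
        for sub in chromosome
    ]
```

Crossing over within each substring, rather than once across the whole string, lets every encoded weight exchange information between the two parents in a single mating.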

Learning Algorithm Implementation.
The MLP robust learning algorithm for determining the upper bound of the nonlinear interval regression model (i.e., $g^*(\mathbf{x})$) is the same as that for determining the lower bound (i.e., $g_*(\mathbf{x})$), except for the weighting scheme $\omega_p$, the reward $R(\text{MLP})$, and $\Psi_p$ in the fitness function. The two MLPs are trained independently of each other. That is, two MLPs are independently employed to determine the upper and lower bounds of the data interval. The learning algorithm for determining $g^*(\mathbf{x})$ is as follows.

Output. The upper bound of a nonlinear interval model.

Method
Step 1: Initialization. A population containing $N_{\text{pop}}$ binary strings is generated randomly.
Step 3: Compute Fitness Values. Compute the fitness value of each string in the current population by (11).
Step 4: Termination Test. $N_{\text{con}}$ generations are used as the stopping condition. If the stopping condition is not satisfied, then proceed to the next step. That is, the genetic operations are iterated again to generate the new strings in the next population.
Step 5: Generate New Strings. Genetic operators are employed to generate the $N_{\text{pop}}$ new strings in the next population from the current population.
Step 6: Perform Elitist Strategy. $N_{\text{del}}$ strings are randomly removed from the newly generated $N_{\text{pop}}$ strings. Then, the $N_{\text{del}}$ best strings in the current population are added to form the next population.
The best string among the successive generations is taken as the desired solution.
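The overall loop, including the elitist replacement, can be sketched as below. This is an illustrative skeleton rather than the paper's implementation: the fitness is passed in as an arbitrary function, chromosomes are flat bit lists with an ordinary one-point crossover instead of the substring-wise version, and the names `run_ga`, `make_offspring`, and `elitist_replace` are hypothetical.

```python
import random


def elitist_replace(current, fitnesses, offspring, n_del, rng):
    """Drop n_del random offspring and add the n_del best current strings."""
    survivors = [s[:] for s in offspring]
    for _ in range(n_del):
        survivors.pop(rng.randrange(len(survivors)))
    ranked = sorted(range(len(current)), key=lambda i: fitnesses[i], reverse=True)
    return survivors + [current[i][:] for i in ranked[:n_del]]


def make_offspring(population, fitnesses, pr_c, pr_m, rng):
    """Binary tournament selection, one-point crossover, bitwise mutation."""
    def pick():
        a, b = rng.randrange(len(population)), rng.randrange(len(population))
        return population[a] if fitnesses[a] >= fitnesses[b] else population[b]

    offspring = []
    while len(offspring) < len(population):
        p1, p2 = pick()[:], pick()[:]
        if rng.random() < pr_c and len(p1) > 1:
            cut = rng.randrange(1, len(p1))
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        for child in (p1, p2):
            offspring.append([1 - b if rng.random() < pr_m else b for b in child])
    return offspring[:len(population)]


def run_ga(fitness_fn, string_length, n_pop=50, n_con=5000, n_del=2,
           pr_c=0.95, pr_m=0.001, seed=0):
    """Return the best string found over n_con generations and its fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(string_length)]
                  for _ in range(n_pop)]
    best, best_fit = None, float("-inf")
    for _ in range(n_con):
        fitnesses = [fitness_fn(s) for s in population]
        for s, f in zip(population, fitnesses):
            if f > best_fit:  # track the best string across generations
                best, best_fit = s[:], f
        offspring = make_offspring(population, fitnesses, pr_c, pr_m, rng)
        population = elitist_replace(population, fitnesses, offspring, n_del, rng)
    return best, best_fit
```

As a toy usage, `run_ga(sum, 16, n_pop=10, n_con=20)` maximizes the number of ones in a 16-bit string; in the setting of this paper, `fitness_fn` would instead decode the string into an MLP and an outlier mask and evaluate the fitness function.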

Computer Simulations
Several data sets are employed to examine the effectiveness of the proposed method for determining a nonlinear interval regression model. Each data set involves learning a function of one variable (i.e., $n = 1$). Since there is no best set of GA parameter specifications, the parameter specifications of the proposed robust learning algorithms are prespecified for each simulation according to the principles introduced in [18], as follows.
(1) $N_{\text{pop}} = 50$: the most common population size varies from 50 to 500 individuals. Hence, 50 individuals is an acceptable minimum size. In an initial population, each bit in a binary string is randomly assigned as either 1 or 0 with a probability of 0.5.
(2) $N_{\text{con}} = 5000$: the stopping condition is specified according to the available computation time. However, a sufficient evolution of the GA is necessary.
(3) $N_{\text{del}} = 2$: to avoid generating too much perturbation in the next generation, a small number of elite chromosomes are taken into account.
(5) $\text{Pr}_c = 0.95$ and $\text{Pr}_m = 0.001$: since a larger $\text{Pr}_c$ allows the exploration of more of the solution space, a larger $\text{Pr}_c$ is usually taken into account. Furthermore, in order not to generate excessive perturbation, $\text{Pr}_m$ should be specified as a lower value.
(6) $w_E = 2$, $w_{\text{num}} = 1$: since the minimization of the cost function is considered the primary objective of the regression analysis, the larger value is assigned to $w_E$.
(7) $\omega = 0.2$, $\Psi = 10^{-5}$: as mentioned above, $\omega$ is suggested to be specified as a small value. $\Psi$ is set to a very small positive value approaching zero.
Besides, as in [6], an MLP with five hidden nodes (i.e., $h = 5$) is taken into account. Therefore, there are 18 substrings in a binary string. First, uncontaminated data are employed to verify the effectiveness of the proposed learning algorithms. The uncontaminated training data of the first example generated by [...] The simulation result obtained using the proposed method for the above uncontaminated data is depicted in Figure 2. As can be seen, the data interval includes almost all given training data.

Subsequently, the simulation results for two real data sets from National Institute of Standards and Technology (NIST) studies, which involve ultrasonic calibration consisting of 54 observations [20] and quantum defects in iodine atoms consisting of 25 observations [21], are shown in Figures 3 and 4, respectively. The ultrasonic calibration data, whose response variable is ultrasonic response and whose predictor variable is metal distance, are often used to illustrate the construction of a nonlinear regression model. The data on quantum defects in iodine atoms, whose response variable is the number of quantum defects and whose predictor variable is the excited energy state, can also be employed to construct a nonlinear least squares regression model. It can be seen that the proposed learning algorithms work well for these two real data sets.

Contaminated data are further employed to examine the data intervals obtained by the proposed learning algorithms. For the first example, 5 out of the same 51 input-output pairs are randomly selected as outliers, leaving 46 pieces of regular data. The simulation result shown in Figure 5 indicates that the proposed learning algorithms can resist outliers. The regular data are approximately included in the robust nonlinear interval model. Furthermore, comparison with Figure 1 shows that both results are approximately identical.
For the second example, 3 out of the same 51 input-output pairs are randomly selected as outliers. Figure 6 shows that the estimated upper and lower bounds obtained by the proposed learning algorithms are not influenced by outliers. It can be seen that the simulation results shown in Figures 2 and 6 are similar. To simplify the presentation, the results of the traditional MLP-based approach, which are shown in [3], are not depicted in Figures 5 and 6. From [3], it can be seen that the nonlinear interval model obtained by the MLP-based approach is sensitive to contaminated training data.

Conclusions
As mentioned above, since the available information is often derived from uncertain assessments, it is reasonable to use real intervals to deal with imprecise observations. In addition to the traditional regression function determined merely by minimizing the least squared error, computational intelligence models have been employed to determine the nonlinear interval regression model. Since the available data often contain outliers, the development of robust algorithms is necessary. Because collected data are always contaminated to some degree, it is not easy to estimate the degree of contamination without performing statistical data preprocessing. In comparison with the computational intelligence models presented in [1][2][3][13], the proposed robust learning algorithms have the advantage of not requiring the degree of contamination of the collected data to be considered. This paper proposes MLP learning algorithms with the weighting schemes in [6] for determining the robust nonlinear interval regression model. Outliers beyond or beneath the data interval, which are identified by the GA, have only a slight effect on the determination of the data interval. The experimental results show that the nonlinear interval models obtained by the proposed learning algorithms can include almost all regular data. That is, the proposed learning algorithms are robust against outliers for contaminated data. Thus, it seems that incorporating into the fitness function the ratio of training data that are included in the interval model can facilitate the inclusion of regular data in the robust nonlinear interval model.
The nonlinear interval models shown in the previous section are satisfactory for both uncontaminated and contaminated data, even though no customized parameter tuning was performed for the proposed learning algorithms in the computer simulations. The same GA parameter specifications are applied to each experiment. This suggests that the proposed learning algorithms are not sensitive to GA parameter specifications. The experimental results show that a common setting of the GA parameter specifications is acceptable for the proposed approach.
Several studies on robust interval regression models have been published previously. For instance, Fagundes et al. [22] dealt with cases that have interval-valued outliers in the input data set; Chuang and Lee [23] used data preprocessing to filter out outliers in the training data and then constructed a regression model by using the filtered data to train support vector regression networks; D'Urso et al. [24] proposed a robust fuzzy linear regression model based on the least median squares-weighted least squares estimation procedure for highly skewed data; Huang [25] proposed a reduced support vector machine approach for evaluating interval regression models with nonfuzzy inputs and interval outputs, but the models do not seem to pay much attention to outlier resistance. In comparison with these methods, the proposed approach uses MLP together with the GA to construct robust interval regression models with crisp inputs and crisp outputs. The use of data preprocessing to detect outliers and the consideration of interval-valued data sets remain to be studied in future work.

Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.