Application of the Intuitionistic Fuzzy InterCriteria Analysis Method with Triples to a Neural Network Preprocessing Procedure

InterCriteria Analysis (ICA) is applied to reduce the set of input variables of a neural network, since a large number of inputs increases the number of neurons in the network and can make it unusable for hardware implementation. Here, for the first time, the ICA method is used to obtain correlations between triples of the input parameters used for training the neural networks. We use ICA for data preprocessing, which may reduce the total time for training the neural networks and, hence, the time the network needs to process data and images.


Introduction
Working with neural networks presents many difficulties; for example, the number of neurons needed to perceive the individual values can be too large, and since a proportionally larger amount of memory and computing power is then necessary to train the networks, training takes longer. Therefore, researchers are forced to look for better methods for training neural networks. Backpropagation is the most widely applied such method; in it, a neural network (typically a Multilayer Perceptron) is trained by propagating the output error backward through the layers. There are, however, many other methods that accelerate the training of neural networks [1][2][3] by reducing memory usage, which in turn lowers the needed amount of computing power.
In the preprocessing stage, a constant threshold value can be applied to the data at the input of the neural network to distinguish static from dynamic activities, as was done in [4]. This way, the number of incidental values due to unforeseen circumstances is reduced.
Another approach is to use a wavelet-based neural network classifier to reduce the effect of power interference or of random erroneous measurements on the training of the neural network [5]. Here the discrete wavelet transform (DWT) technique is integrated with the neural network to build a classifier.
Particle Swarm Optimization (PSO) is an established method for parameter optimization. It represents a population-based adaptive optimization technique that is influenced by several "strategy parameters." Choosing reasonable parameter values for PSO is crucial for its convergence behavior and depends on the optimization task. In [6] a method is presented for parameter metaoptimization based on PSO and it is applied to neural network training. The idea of Optimized Particle Swarm Optimization (OPSO) is to optimize the free parameters of PSO by having swarms within a swarm.
Computational Intelligence and Neuroscience
When working with neural networks it is essential to reduce the number of neurons in the hidden layer, which also reduces the number of weight coefficients of the neural network as a whole. This leads to a smaller dimension of the weight matrices, and hence to a smaller amount of memory used. An additional consequence is the decreased usage of computing power and the shortened training time [7].
Multilayer Perceptrons are often used to model complex relationships between sets of data. The removal of nonessential components of the data can lead to smaller neural networks and, respectively, to lower requirements for the input data. In [8] it is described that this can be achieved by analyzing the common interference of the network outputs, which is caused by distortions in the data passed to the neural network's inputs. The attempt to find superfluous data is based on the concept of sensitivity of linear neural networks. In [9] a neural network is developed in which the outputs of the neurons of part of the layers are not connected to the next layer. The structure thus created is called a "Network in a Network." In this way part of the inputs of the neural network are reduced, which removes part of the information, and along with it part of the error accumulated during training and data transfer. The improved local connection method given in [9] applies global average pooling over the feature maps in the classification layer. This layer is easier to interpret and less prone to overfitting than the traditional fully connected layers.
In this paper, we apply the intuitionistic fuzzy sets-based method of InterCriteria Analysis to reduce the number of input parameters of a Multilayer Perceptron. This will allow the reduction of the weight matrices, as well as the implementation of the neural network in limited hardware, and will save time and resources in training.
The neural network is tested after reducing the data (effectively the number of inputs), so as to obtain an acceptable relation between the input and output values, as well as the average deviation (or match) of the result.

Presentation of the InterCriteria Analysis
The InterCriteria Analysis (ICA) method is introduced in [10] by Atanassov et al. It can be applied to multiobject multicriteria problems, where measurements according to some of the criteria are slower or more expensive, which results in delaying or raising the cost of the overall process of decision-making. When solving such problems it may be necessary to adopt an approach for reasonable elimination of these criteria, in order to achieve economy and efficiency.
The ICA method is based on two fundamental concepts: intuitionistic fuzzy sets and index matrices. Intuitionistic fuzzy sets were first defined by Atanassov [11][12][13] as an extension of the concept of fuzzy sets defined by Zadeh [14]. The second concept on which the proposed method relies is the concept of index matrix, a matrix which features two index sets. The theory behind the index matrices is described in [15].
According to the ICA method, a set of objects is evaluated or measured against a set of criteria, and the table with these evaluations is the input for the method. The number of criteria can be reduced by calculating the correlations (differentiated in ICA to: positive consonance, negative consonance, and dissonance) in each pair of criteria in the form of intuitionistic fuzzy pairs of values, that is, a pair of numbers in the interval [0, 1], whose sum is also a number in this interval. If some (slow, expensive, etc.) criteria exhibit positive consonance with some of the rest of the criteria (that are faster, cheaper, etc.), and this degree of consonance is considered high enough with respect to some predefined thresholds, with this degree of precision the decision maker may decide to omit them in the further decision-making process. The higher the number of objects involved in the measurement, the more precise the evaluation of the intercriteria consonances (correlations). This makes the approach completely data-driven and ongoing approbations over various application problems and datasets are helping us better perceive its reliability and practical applicability.
Let us consider a number $C_q$ of criteria, $q = 1, \ldots, n$, and a number $O_p$ of objects, $p = 1, \ldots, m$; that is, we use the following sets: a set of criteria $C_q = \{C_1, \ldots, C_n\}$ and a set of objects $O_p = \{O_1, \ldots, O_m\}$. We obtain an index matrix $M$ that contains two sets of indices, one for rows and another for columns. For every $p, q$ ($1 \le p \le m$, $1 \le q \le n$), $O_p$ is an evaluated object, $C_q$ is an evaluation criterion, and $a_{O_p,C_q}$ is the evaluation of the $p$th object against the $q$th criterion, defined as a real number or another object that is comparable according to a relation with all the other elements of the index matrix $M$.
The next step is to apply the InterCriteria Analysis for calculating the evaluations. The result is a new index matrix $M^*$ with intuitionistic fuzzy pairs $\langle \mu_{C_k,C_l}, \nu_{C_k,C_l} \rangle$ that represents an intuitionistic fuzzy evaluation of the relations between every pair of criteria $C_k$ and $C_l$. In this way the index matrix $M$ that relates the evaluated objects with the evaluating criteria can be transformed to another index matrix $M^*$ that gives the relations among the criteria.
The last step of the algorithm is to determine the degrees of correlation between groups of indicators depending on the thresholds for $\mu$ and $\nu$ chosen by the user. The correlations between the criteria are called "positive consonance," "negative consonance," or "dissonance." Here we use one of the possible approaches to defining these thresholds, namely, the scale shown in Box 1 [16].
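To make the pairwise step concrete, the following is a minimal sketch of how the intuitionistic fuzzy pairs could be computed from the object-by-criterion table. It assumes the counting formulation commonly used in the ICA literature, which is not spelled out in this section: for each pair of criteria, the membership part is the share of object pairs that both criteria order the same way, and the nonmembership part is the share that they order in opposite ways; ties contribute to neither. The names `ica_pairs` and `toy` are illustrative only.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def ica_pairs(table):
    """Compute membership (mu) and nonmembership (nu) matrices.

    table: list of m rows (objects), each a list of n criterion values.
    Returns two n x n nested lists (mu, nu).
    """
    m, n = len(table), len(table[0])
    object_pairs = list(combinations(range(m), 2))
    total = len(object_pairs)  # m(m-1)/2 comparisons per criterion
    mu = [[0.0] * n for _ in range(n)]
    nu = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            agree = disagree = 0
            for a, b in object_pairs:
                s = (sign(table[a][k] - table[b][k])
                     * sign(table[a][l] - table[b][l]))
                if s > 0:
                    agree += 1      # both criteria order the objects alike
                elif s < 0:
                    disagree += 1   # the criteria order them oppositely
            mu[k][l] = agree / total
            nu[k][l] = disagree / total
    return mu, nu


# Toy table (hypothetical data): 4 objects, 3 criteria.
# Criterion 2 rises with criterion 1; criterion 3 falls as criterion 1 rises.
toy = [[1, 2, 9], [2, 4, 7], [3, 6, 5], [4, 8, 3]]
mu, nu = ica_pairs(toy)
```

With this toy table, criteria 1 and 2 come out in full positive consonance and criteria 1 and 3 in full negative consonance, while each criterion paired with itself yields ⟨1; 0⟩ because the data contain no ties.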

InterCriteria Analysis with Triples
The algorithm for identifying intercriteria triples is introduced in [17] by Atanassova et al.
Step 1. Starting from the input dataset of $m$ objects measured against $n$ criteria, we calculate the total number of $n(n-1)/2$ intuitionistic fuzzy pairs standing for the intercriteria consonances and plot these pairs as points onto the intuitionistic fuzzy triangle. Instead of maintaining a pair of two numbers for each pair of criteria $C_i$-$C_j$, namely, $\langle \mu_{ij}, \nu_{ij} \rangle$, we calculate (see [18]) for each pair the number $d_{ij}$:

$$d_{ij} = \sqrt{(1 - \mu_{ij})^2 + \nu_{ij}^2},$$

giving its distance from the (1; 0) point, that is, the image of the complete Truth onto the intuitionistic fuzzy triangle. Our aim is to identify top-down all the $n(n-1)/2$ calculated values that are closest to the (1; 0) point and, at the same time, closest to each other; hence we sort them in ascending order by their distance to (1; 0); see the example in Table 2.
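Step 1 can be sketched as follows, assuming the distance is the ordinary Euclidean distance between the point with coordinates (membership, nonmembership) and the point (1; 0). The function names and the small matrices are hypothetical and serve only as an illustration.

```python
import math


def distance_to_truth(mu, nu):
    """Euclidean distance of the point (mu, nu) from the (1; 0) point."""
    return math.hypot(1.0 - mu, nu)


def rank_pairs(mu, nu):
    """Return (distance, i, j) for every pair i < j, closest to (1; 0) first."""
    n = len(mu)
    ranked = [(distance_to_truth(mu[i][j], nu[i][j]), i, j)
              for i in range(n) for j in range(i + 1, n)]
    ranked.sort()
    return ranked


# Hypothetical membership and nonmembership matrices for 3 criteria.
mu_example = [[1.0, 0.9, 0.5], [0.9, 1.0, 0.7], [0.5, 0.7, 1.0]]
nu_example = [[0.0, 0.05, 0.4], [0.05, 0.0, 0.2], [0.4, 0.2, 0.0]]
ranked_example = rank_pairs(mu_example, nu_example)
```

For 3 criteria this yields 3 · 2 / 2 = 3 ranked pairs; the distance is 0 at the (1; 0) point itself and reaches its maximum, √2, at the (0; 1) point.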
Step 2. Let us denote by Σ the subset of the triples of criteria closest to (1; 0). The way we construct the subset Σ may slightly differ per user preference or external requirement, with at least three possible alternatives, as listed below (see Figure 1):
(2.1) Select a top number or the top $q$% of the $n(n-1)/2$ ICA pairs (predefined number of elements of the subset Σ).
(2.2) Select all ICA pairs whose corresponding points are within a given radius from the (1; 0) point.
(2.3) Select all ICA pairs whose corresponding points fall within the trapezoid formed between the abscissa, the hypotenuse, and the two lines corresponding to $\mu = \alpha$ and $\nu = \beta$ for two predefined numbers $\alpha, \beta \in [0; 1]$.
Step 3. Check if there are triples of criteria, each pair of which corresponds to a point belonging to the subset Σ. If not, then no triples of criteria conform with the stipulated requirements. However, if triples are to be found, then we extend the subset Σ accordingly, by taking either a larger number or percentage of pairs (Substep (2.1)), or a larger radius (Substep (2.2)), or a smaller membership threshold α and/or a larger nonmembership threshold β (Substep (2.3)). If the subset Σ now contains triples of criteria that simultaneously fulfil the requirements, then go to Step 4.
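Steps 2 and 3 can be illustrated with the sketch below. It assumes the ranked pairs are available as a list of `(distance, i, j)` tuples sorted in ascending order, and that the trapezoid of Substep (2.3) keeps points whose membership part is at least α and whose nonmembership part is at most β; whether the bounds are inclusive is an assumption, and all names and numbers are hypothetical.

```python
from itertools import combinations


def select_top(ranked, count):
    """(2.1) Keep the `count` pairs closest to the (1; 0) point."""
    return {(i, j) for _, i, j in ranked[:count]}


def select_radius(ranked, radius):
    """(2.2) Keep the pairs lying within `radius` of the (1; 0) point."""
    return {(i, j) for d, i, j in ranked if d <= radius}


def select_trapezoid(points, alpha, beta):
    """(2.3) Keep pairs with membership >= alpha and nonmembership <= beta.

    points: dict mapping (i, j) to the pair (mu, nu).
    """
    return {pair for pair, (mu, nu) in points.items()
            if mu >= alpha and nu <= beta}


def find_triples(sigma, n):
    """Step 3: triples of criteria whose three pairs all belong to sigma."""
    return [t for t in combinations(range(n), 3)
            if all(p in sigma for p in combinations(t, 2))]


def smallest_top_with_triple(ranked, n):
    """Extend the subset (2.1) one pair at a time until a triple appears."""
    for count in range(3, len(ranked) + 1):
        triples = find_triples(select_top(ranked, count), n)
        if triples:
            return count, triples
    return None, []


# Hypothetical ranked distances for 4 criteria (6 pairs).
ranked = [(0.11, 0, 1), (0.20, 1, 2), (0.25, 0, 2),
          (0.60, 0, 3), (0.70, 1, 3), (0.90, 2, 3)]
```

In this toy ranking the two closest pairs alone do not form a triple, so the subset has to be extended to three pairs before the triple of criteria 0, 1, and 2 is found.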
Step 4. We start top-down with the first pair of criteria, let it be $C_i$-$C_j$, that is, the pair with the smallest $d_{ij}$, thus ensuring maximal proximity of the corresponding point to the (1; 0) point. We may pick the third criterion in the triple either as $C_k$, which is the next highest correlating criterion with $C_i$, that is, with $d_{ik}$ ($> d_{ij}$), or as $C_l$, which is the next highest correlating criterion with $C_j$, that is, with $d_{jl}$ ($> d_{ij}$, noting that it is possible to have $k = l$). Then, we check the distances to (1; 0) of the respective third points, taking that triple of criteria $C_i$-$C_j$-$C_k$ or $C_i$-$C_j$-$C_l$ that has the $\min(d_{ij} + d_{ik} + d_{jk},\ d_{ij} + d_{jl} + d_{il})$.
Then for each triple of criteria $C_i$-$C_j$-$C_x$ (where $x \in \{k, l\}$), we calculate the median point of the so formed triangle, which is a point plotted in the intuitionistic fuzzy triangle with coordinates:

$$\langle \tilde{\mu}, \tilde{\nu} \rangle = \left\langle \frac{\mu_{ij} + \mu_{ix} + \mu_{jx}}{3},\ \frac{\nu_{ij} + \nu_{ix} + \nu_{jx}}{3} \right\rangle.$$
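Step 4 might be sketched as below. The sketch assumes a symmetric matrix of distances to the (1; 0) point for at least three criteria, reads "next highest correlating criterion" as the criterion with the smallest distance among the remaining ones, and takes the median point as the arithmetic mean of the three points of the triple. The function names and the example matrices are hypothetical.

```python
def pick_triple(d):
    """Pick a triple of criteria around the closest pair.

    d: symmetric n x n matrix of distances to (1; 0), n >= 3; diagonal ignored.
    """
    n = len(d)
    # Pair C_i-C_j with the smallest distance to the (1; 0) point.
    i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: d[p[0]][p[1]])
    others = [c for c in range(n) if c not in (i, j)]
    k = min(others, key=lambda c: d[i][c])  # next best partner of C_i
    l = min(others, key=lambda c: d[j][c])  # next best partner of C_j
    sum_k = d[i][j] + d[i][k] + d[j][k]
    sum_l = d[i][j] + d[j][l] + d[i][l]
    return (i, j, k) if sum_k <= sum_l else (i, j, l)


def median_point(mu, nu, triple):
    """Mean of the three intuitionistic fuzzy pairs of a triple of criteria."""
    i, j, x = triple
    return ((mu[i][j] + mu[i][x] + mu[j][x]) / 3.0,
            (nu[i][j] + nu[i][x] + nu[j][x]) / 3.0)


# Hypothetical distance matrix for 4 criteria.
d_example = [[0.0, 0.10, 0.30, 0.20],
             [0.10, 0.0, 0.15, 0.50],
             [0.30, 0.15, 0.0, 0.40],
             [0.20, 0.50, 0.40, 0.0]]
triple_example = pick_triple(d_example)
```

In the example the closest pair is formed by criteria 0 and 1; the candidate third criteria are 3 (closest to criterion 0) and 2 (closest to criterion 1), and the triple with criterion 2 is kept because its three distances have the smaller sum.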

Artificial Neural Networks
The artificial neural networks [4,19] are one of the tools that can be used for object recognition and identification. In the first step, the network has to be trained; after that, it can be used for recognition and for prediction of the properties of materials. Figure 2 shows, in abbreviated notation, a classic two-layered neural network.
In the two-layered neural networks, one layer's outputs become inputs for the next one. The equations describing this operation are

$$a^1 = f^1(w^1 p + b^1), \qquad a^2 = f^2(w^2 a^1 + b^2),$$

where $p$ is the vector of external inputs and (i) $a^m$ is the output of the $m$th layer of the neural network for $m = 1, 2$; (ii) $w^m$ is a matrix of the weight coefficients of each of the inputs of the $m$th layer; (iii) $b^m$ is the neuron's input bias; (iv) $f^1$ is the transfer function of the 1st layer; (v) $f^2$ is the transfer function of the 2nd layer.
The neurons in the first layer receive the external inputs. The outputs of the neurons in the last layer determine the neural network's outputs as $a^2$.
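The layer-by-layer operation described above can be sketched in a few lines. This is only an illustration, not the MATLAB implementation used in the study: the choice of a hyperbolic tangent in the first layer and a linear function in the second is an assumption, as the text does not name the transfer functions, and the weights in the example are arbitrary.

```python
import math


def layer(weights, biases, inputs, transfer):
    """Output of one layer: transfer(W * inputs + b), one value per neuron."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def two_layer_forward(w1, b1, w2, b2, p):
    """Forward pass of a two-layered network for the input vector p."""
    a1 = layer(w1, b1, p, math.tanh)      # first layer (tanh is an assumption)
    a2 = layer(w2, b2, a1, lambda s: s)   # second layer (linear, an assumption)
    return a2


# Arbitrary small network: 2 inputs, 2 hidden neurons, 1 output.
w1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 1.0]]
b2 = [0.5]
output_example = two_layer_forward(w1, b1, w2, b2, [0.0, 0.0])
```

The outputs of the first layer are passed unchanged as inputs to the second layer, which is the only coupling between the two layers.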
The "backpropagation" algorithm [20] is used for training the neural networks. When the multilayer neural network is trained, usually the available data has to be divided into three subsets. The first subset, named "Training set," is used for computing the gradient and updating the network weights and biases. The second subset is named "Validation set." The error of the validation set is monitored during the training process. The validation error normally decreases during the initial phase of training, as does the training set error. When the network begins to overfit the data, however, the error of the validation set typically begins to rise. When the validation error increases for a specified number of iterations, the training stops and the weights and biases at the minimum of the validation error are returned [4]. The last subset is named "test set." Together, these three sets have to make up 100% of the training pairs.
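The data division and the stopping rule can be illustrated as follows. The 70/15/15 split and the patience value are assumptions chosen for the example, since the text does not state the proportions or the number of iterations used, and the stopping rule is approximated here by counting consecutive iterations without a new minimum of the validation error.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into training, validation, test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def early_stopping_epoch(val_errors, patience):
    """Return the iteration whose weights would be kept.

    Training stops once the validation error has failed to improve for
    `patience` consecutive iterations; the best iteration so far is returned.
    """
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


train_idx, val_idx, test_idx = split_indices(140)
```

The three index lists are disjoint and together cover all 140 samples, which corresponds to the requirement that the subsets sum to 100% of the data.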
For this investigation we use MATLAB and the neural network structure 8:45:1 (8 inputs, 45 neurons in the hidden layer, and one output) (Figure 2). The number of weight coefficients is 9 × 45 = 405, that is, 8 × 45 weights between the inputs and the hidden layer plus 45 weights between the hidden layer and the output.
The proposed method is focused on removing part of the neurons (and weight coefficients) and thus does not reduce the average deviation of the samples used for training, testing, and validating the neural network.
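The count of weight coefficients follows directly from the layer sizes. The helper below is a hypothetical illustration that, like the figure of 405 in the text, counts only the weights and not the biases.

```python
def weight_count(n_inputs, n_hidden=45, n_outputs=1):
    """Weights between inputs and hidden layer plus hidden layer and output."""
    return n_inputs * n_hidden + n_hidden * n_outputs


# Number of weight coefficients for 8 down to 4 inputs.
counts = {n: weight_count(n) for n in range(8, 3, -1)}
# counts == {8: 405, 7: 360, 6: 315, 5: 270, 4: 225}
```

Each removed input therefore saves 45 weight coefficients in a network with 45 hidden neurons.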
So we work with a 140×8 table, and a software application that implements the ICA algorithm returns the results in the form of two index matrices (see Tables 1 and 2), containing, respectively, the membership and the nonmembership parts of the intuitionistic fuzzy correlations detected between each pair of criteria (28 pairs). The values in the matrix are colored in red-yellow-green color scale for the varying degrees of consonance and dissonance from green (highest values) to yellow. Naturally, each criterion best correlates with itself, which gives the respective intuitionistic fuzzy pairs ⟨1; 0⟩, or 1s and 0s, along the main diagonals of Tables 1 and 2.
In Table 3 the relations between the pairs of criteria obtained by applying the ICA method are shown.
The calculated distance $d_{ij}$ for each pair of criteria $C_i$-$C_j$ from the (1; 0) point in the intuitionistic fuzzy triangle is shown in Table 4 (note that $d_{ij} \in [0, \sqrt{2}]$). The next step is to choose the pair $C_i$-$C_j$ with the smallest $d_{ij}$, thus ensuring maximal proximity of the corresponding point to the (1; 0) point. We pick the third criterion in the triple either as $C_k$, which is the next highest correlating criterion with $C_i$, or as $C_l$, which is the next highest correlating criterion with $C_j$, taking that triple of criteria $C_i$-$C_j$-$C_k$ or $C_i$-$C_j$-$C_l$ that has the $\min(d_{ij} + d_{ik} + d_{jk},\ d_{ij} + d_{jl} + d_{il})$. In Table 5 the pairs of criteria $C_i$-$C_j$ in "strong positive consonance," "positive consonance," and "weak positive consonance" are shown.
On the input of the neural network we put the experimental data for obtaining the cetane number of crude oil. Testing is done as follows. At the first step, all the measurements of the 140 crude oil probes against the 8 criteria are analyzed in order to make a comparison of the obtained results thereafter. For this comparison to be possible, the predefined weight coefficients ...
At the first step of the testing process, we use all the 8 criteria listed above, in order to train the neural network.
After the training process all input values are simulated by the neural network.
The average deviation of all the 140 samples is 1.8134. The coefficient R (regression R values measure the correlation between outputs and targets) obtained from the MATLAB program is 0.97434 (see Table 6).
At the next step of the testing process, we branch the experiment and independently try removing one of the columns, experimenting with the data from the remaining seven columns. We compare the results in the next section, "Discussion." First, we remove column 1 (based on Table 5) and put the data on the input of the neural network.
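The two quality measures can be illustrated as below. The text does not define "average deviation," so the sketch assumes it is the mean absolute difference between the simulated outputs and the targets; the coefficient R is computed as the Pearson correlation between outputs and targets, in line with the description above. The function names are hypothetical.

```python
import math


def average_deviation(outputs, targets):
    """Mean absolute difference between outputs and targets (assumed meaning)."""
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(targets)


def regression_r(outputs, targets):
    """Pearson correlation between network outputs and targets."""
    n = len(targets)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in outputs))
    dev_t = math.sqrt(sum((t - mean_t) ** 2 for t in targets))
    return cov / (dev_o * dev_t)
```

A value of R close to 1 means the outputs follow the targets almost linearly, while the average deviation expresses the typical size of the error in the units of the target.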
After the training process all input values are simulated. The average deviation of all the 140 samples is 1.63 and the coefficient is 0.9772.
At the next step, we alternatively perform reduction of column 3 (according to Table 5), and put the data on the input of the neural network.
After the training process all input values are simulated. The average deviation of all the 140 samples is 1.8525 and the coefficient is 0.97256. After that we can proceed with columns 5, 2, 8, and 4. Now, at the next step, we feed the neural network with 6 inputs, removing both columns 3 and 5, according to the data from Table 5. The average deviation of all the 140 samples is 1.7644 and the coefficient is 0.97089. In the same way we can simultaneously remove the inputs 1 and 5, 1 and 3, 2 and 3, 3 and 8, 3 and 4, and 4 and 8.
At the next step, we reduce the number of inputs by one more; that is, we put on the input of the neural network experimental data from 5 inputs, with columns 1, 3, and 5 removed. The average deviation of all the 140 samples is 1.857 and the coefficient is 0.97208 (see Table 6). In the same way, the parameters 2, 3, and 8 and 3, 4, and 8 are removed. Finally, we experiment with removing a fourth column, feeding the neural network with only 4 inputs: in addition to columns 1, 2, and 4, the fourth removed column is column 5. After the simulation the average deviation of all the 140 samples is 2.19 and the coefficient obtained from the MATLAB program is 0.95927.
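The column removals tried in these experiments can be expressed with a small helper. It is a hypothetical sketch that uses the 1-based column numbers of the text; the list of experiments repeats only the column combinations for which results are reported above.

```python
def drop_columns(table, removed):
    """Return a copy of the table without the given 1-based columns."""
    removed = set(removed)
    return [[value for col, value in enumerate(row, start=1)
             if col not in removed]
            for row in table]


# Column combinations with reported results: none, one, two, three, four removed.
experiments = [(), (1,), (3,), (3, 5), (1, 3, 5), (1, 2, 4, 5)]

# One hypothetical row with 8 criteria, used only to show the effect.
row = [[10, 20, 30, 40, 50, 60, 70, 80]]
reduced = drop_columns(row, (1, 3, 5))
```

Each reduced table would then be fed to a network whose number of inputs equals the number of remaining columns.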

Discussion
In support of the method, Tables 6, 7, and 8 present the correlation coefficients between the different criteria. The tables also present the maximal values of the coefficient sums per criteria. In the last column, the triples of selected criteria are given, sorted in descending order by the correlation coefficient $C_i$-$C_j$.
In Table 9, comparisons between the ICA approach and correlation analysis according to Pearson, Kendall, and Spearman are shown.
The selected pairs, based on the four methods, are identical in the first row. In the second row three of the methods yield identical results (ICA, Kendall, and Spearman), and the only difference is in the selected criteria as calculated by the Pearson method. In the third row, the situation is the same. Here the triples are the same up to ordering. Only the triple of correlation criteria calculated by the Pearson method is different. In the fourth row, the triples are quite similar. The triples calculated by ICA and Pearson are identical. The triple determined by Kendall correlation coincides with the first row of the table. The last triple, defined by the Spearman correlation, coincides with the second and third row of the triples defined by the correlation analyses of ICA, Pearson, and Spearman.
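For reference, the three classical coefficients used in this comparison can be sketched in plain Python. The sketch assumes data without ties, so that simple ranks and Kendall's tau-a are adequate; the function names are hypothetical. Under the counting view of ICA, in which the membership part is the share of concordant object pairs and the nonmembership part the share of discordant ones (an assumption, as this section does not spell it out), tau-a equals the difference between the membership and nonmembership parts.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of object pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for a, b in pairs:
        s = (x[a] - x[b]) * (y[a] - y[b])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)
```

For a monotone but nonlinear relation, the two rank-based coefficients reach 1 while Pearson's coefficient stays below 1, which is one reason why the Pearson column of such a comparison can differ from the others.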
So far, such a detailed comparison between the four methods has been conducted over medical [21,22] and petrochemical [23] data. It was observed that considerable divergence of the ICA results from the results obtained by the rest of the methods is only found when the input data contain mistakes, as a result of misplacing the decimal point by at least one position to the left or to the right. We anticipate future theoretical research to check the validity of this practical observation. If it proves to be true, then ICA, together with the other three types of analysis, will turn into a criterion for data correctness.
Table 7: Correlation coefficients for pairs of criteria $C_i$-$C_j$ according to Kendall.
As we stated above, reducing the number of input parameters of a classical neural network leads to reduction of the weight matrices, resulting in implementation of the neural network in limited hardware and saving time and resources in training. For this aim, we use the intuitionistic fuzzy sets-based approach of InterCriteria Analysis (ICA), which gives dependencies between the criteria and thus helps us reduce the number of highly correlating input parameters, while keeping the level of precision high enough.
Table 10 summarizes the most significant parameters of the process of testing the neural network with different numbers of inputs, gradually reducing the number in order to discover optimal results. These process parameters are the NN-specific parameters "average deviation," "regression coefficient R," and "number of the weight coefficients." The average deviation when we use 8 input vectors is 1.8134, with 405 weight coefficients. By reducing the number of inputs, the number of weight coefficients is also decreased, which theoretically is supposed to reduce the matching coefficient.
In this case, the removal of column 1 (and therefore of one input) decreases the average deviation further, to 1.6327. The additional information (without column 5) used for training the neural network is very little, and the total Mean Square Error is lower. The result is better than that of the earlier attempt, in which the neural network was trained with 8 data columns.
When we use 7 columns (and 7 inputs of the neural network), excluding some of the columns gives a better result than the previous one. This shows that, while maintaining the number of weight coefficients and reducing the maximal membership in the intercriteria IF pairs, the neural network receives an additional small amount of information which it uses for further learning.
Best results (average deviation = 1.5716) are obtained by removing the two columns (6 inputs without inputs 1 and 3) with the greatest membership components of the respective d.
In this case, the effect of reducing the number of weight coefficients from 360 to 315 and the corresponding MSE is greater than the effect of the two columns.
The use of 5 columns (without columns 1, 3, and 5) leads to a result which is worse than the previous one, that is, 1.857. This shows that, with the reduction of the number of weight coefficients (and the total MSE) and of the information at the input of the neural network, a small amount of the information with which the network is trained is lost. As a result, the overall accuracy of the neural network is decreased.
The worst results (average deviation = 2.217) are obtained with the lowest number of columns, 4. In this case, columns 1, 2, 4, and 5 are removed. Although the number of weight coefficients here is the smallest, the information that is used for training the neural network is less informative.

Conclusion
In the paper we apply the latest development of the theoretical research on InterCriteria Analysis to a dataset with the measurements of 140 probes of crude oil against 8 physicochemical criteria. At the first step we put all data from these measurements on the input of a classical neural network. After performing ICA analysis of the pairwise intercriteria correlations, we apply the recently developed method for identification of intercriteria triples in an attempt to reduce the inputs of the neural network without significant loss of precision. This leads to a reduction of the weight matrices, thus allowing implementation of the neural network on limited hardware and saving time and resources in training.
A very important aspect of the testing of the neural network after reducing some of the data (resp., the number of inputs) is to obtain an acceptable correlation between the input and output values, as well as the average deviation (or match) of the result.

Conflicts of Interest
The authors declare that they have no conflicts of interest.