The approach of InterCriteria Analysis (ICA) was applied with the aim of reducing the set of variables at the input of a neural network, taking into account the fact that a large number of inputs increases the number of neurons in the network, thus making it unusable for hardware implementation. Here, for the first time, correlations between triples of the input parameters used for training the neural networks were obtained with the help of the ICA method. In this case, we use the ICA approach for data preprocessing, which may reduce the total time for training the neural networks and, hence, the time for the network's processing of data and images.
Working with neural networks presents many difficulties; for example, the number of neurons needed to perceive the individual values can be too large, and since a proportionally larger amount of memory and computing power is necessary to train the networks, this leads to longer training periods. Therefore, researchers are forced to look for better methods for training neural networks. Backpropagation, in which the error is propagated backwards through the network (applied to a Multilayer Perceptron), is the most widely applied such method. There are, however, many other methods that accelerate the training of neural networks [
In the preprocessing stage, the data at the input of the neural network can be compared against a constant threshold value to distinguish static from dynamic activities, as was done in [
Another approach is to use a wavelet-based neural network classifier to reduce power interference in the training of the neural network, or to handle randomly corrupted measurements [
Particle Swarm Optimization (PSO) is an established method for parameter optimization. It is a population-based adaptive optimization technique that is influenced by several "strategy parameters." Choosing reasonable parameter values for PSO is crucial for its convergence behavior and depends on the optimization task. In [
When working with neural networks it is essential to reduce the number of neurons in the hidden layer, which also reduces the number of weight coefficients of the neural network as a whole. This leads to a smaller dimension of the weight matrices and hence to a smaller amount of memory used. An additional consequence is the decreased usage of computing power and the shortened training time [
Multilayer Perceptrons are often used to model complex relationships between sets of data. The removal of nonessential components of the data can lead to smaller sizes of the neural networks, and, respectively, to lower requirements for the input data. In [
In this paper, we apply the intuitionistic fuzzy sets-based method of InterCriteria Analysis to reduce the number of input parameters of a Multilayer Perceptron. This allows the reduction of the weight matrices, as well as the implementation of the neural network on limited hardware, and saves time and resources in training.
The neural network is tested after reducing the data (effectively the number of inputs), so as to obtain an acceptable relation between the input and output values, as well as the average deviation (or match) of the result.
The InterCriteria Analysis (ICA) method is introduced in [
The ICA method is based on two fundamental concepts: intuitionistic fuzzy sets and index matrices. Intuitionistic fuzzy sets were first defined by Atanassov [
According to the ICA method, a set of objects is evaluated or measured against a set of criteria, and the table with these evaluations is the input for the method. The number of criteria can be reduced by calculating the correlations (differentiated in ICA to: positive consonance, negative consonance, and dissonance) in each pair of criteria in the form of intuitionistic fuzzy pairs of values, that is, a pair of numbers in the interval
Let us consider a number of
We obtain an index matrix
The next step is to apply the InterCriteria Analysis for calculating the evaluations. The result is a new index matrix
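The pairwise ICA evaluations can be sketched as follows. This is a minimal Python illustration (the study itself uses MATLAB): for every pair of criteria it counts, over all pairs of objects, how often the two criteria rank the objects the same way (the membership part) and how often oppositely (the non-membership part); ties contribute to neither.

```python
from itertools import combinations

def intercriteria(A):
    """InterCriteria Analysis sketch: A is an m x n table
    (m objects evaluated against n criteria).
    Returns two n x n matrices (mu, nu) with the membership and
    non-membership parts of the intuitionistic fuzzy pairs."""
    m, n = len(A), len(A[0])
    total = m * (m - 1) // 2              # number of object pairs
    mu = [[0.0] * n for _ in range(n)]
    nu = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            agree = disagree = 0
            for i, j in combinations(range(m), 2):
                dk = A[i][k] - A[j][k]
                dl = A[i][l] - A[j][l]
                if dk * dl > 0:
                    agree += 1            # same ordering under both criteria
                elif dk * dl < 0:
                    disagree += 1         # opposite ordering
                # ties count toward neither part, so mu + nu <= 1
            mu[k][l] = agree / total
            nu[k][l] = disagree / total
    return mu, nu
```

For two criteria that rank all objects identically this yields the pair (1; 0), that is, complete positive consonance.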
The last step of the algorithm is to determine the degrees of correlation between groups of indicators depending on the chosen thresholds for
Type of correlations between the criteria
strong positive consonance
positive consonance
weak positive consonance
weak dissonance
dissonance
strong dissonance
dissonance
weak dissonance
weak negative consonance
negative consonance
strong negative consonance
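The bands above can be expressed as a small lookup over the membership part of the IF pair. The numeric thresholds below are an assumption, taken from the consonance scale commonly used in the ICA literature; they reproduce the classification in the tables that follow.

```python
# Assumed threshold scale (the consonance scale commonly used in the
# ICA literature); classification is by the membership part mu.
SCALE = [
    (0.95, "strong positive consonance"),
    (0.85, "positive consonance"),
    (0.75, "weak positive consonance"),
    (0.67, "weak dissonance"),
    (0.57, "dissonance"),
    (0.43, "strong dissonance"),
    (0.33, "dissonance"),
    (0.25, "weak dissonance"),
    (0.15, "weak negative consonance"),
    (0.05, "negative consonance"),
    (0.00, "strong negative consonance"),
]

def classify(mu):
    """Return the correlation label for a membership degree mu."""
    for low, label in SCALE:
        if mu >= low:
            return label
```

For example, the pair (I, V) with membership 0.956 falls into strong positive consonance, and (V, VI) with 0.134 into negative consonance, matching the classification reported below.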
The algorithm for identifying intercriteria triples is introduced in [
Starting from the input dataset of
Let us denote with Σ the subset of the closest to (1; 0) triples of criteria. The way we construct the subset Σ may slightly differ per user preference or external requirement, with at least three possible alternatives, as listed below (see Figure
Select top
Select all ICA pairs whose corresponding points are within a given radius
Select all ICA pairs whose corresponding points fall within the trapezoid formed between the abscissa, the hypotenuse, and the two lines corresponding to
Three alternatives for constructing the subset Σ [
Check whether there are triples of criteria, each pair of which corresponds to a point belonging to the subset Σ. If not, then no triples of criteria conform with the stipulated requirements. However, if triples are to be found, then we extend the subset Σ accordingly, by either taking a larger number
We start top-down with the first pair of criteria, let it be
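The triple-identification step can be sketched as follows, under the radius-based alternative for constructing Σ: a pair enters Σ when its IF point lies within a given radius of (1; 0), and a triple qualifies when all three of its pairs belong to Σ. Function and variable names here are illustrative, not taken from the cited algorithm's pseudocode.

```python
from itertools import combinations
from math import hypot

def find_triples(mu, nu, radius):
    """Identify intercriteria triples: pairs whose IF points (mu, nu)
    lie within `radius` of the ideal point (1, 0) form the subset
    Sigma; a triple of criteria qualifies if all three of its pairs
    belong to Sigma."""
    n = len(mu)
    dist = lambda k, l: hypot(1 - mu[k][l], nu[k][l])
    sigma = {(k, l) for k, l in combinations(range(n), 2)
             if dist(k, l) <= radius}
    triples = [(a, b, c) for a, b, c in combinations(range(n), 3)
               if {(a, b), (a, c), (b, c)} <= sigma]
    return sigma, triples
```

Enlarging the radius extends Σ, which is exactly the fallback used when no qualifying triple is found at first.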
The artificial neural networks [
Abbreviated notation of a classical Multilayer Perceptron.
In two-layered neural networks, one layer's outputs become inputs for the next one. The equations describing this operation are
The neuron in the first layer receives
The “backpropagation” algorithm [
For this investigation we use MATLAB and a neural network structure 8:45:1 (8 inputs, 45 neurons in the hidden layer, and one output) (Figure
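The 8:45:1 structure can be illustrated with a minimal forward pass. This is a NumPy sketch, not the MATLAB configuration used in the study; the uniform random initialization in [−1, 1], the tanh hidden activation, and the linear output are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 8:45:1 Multilayer Perceptron: 8 inputs, 45 hidden
# neurons, 1 output, mirroring the structure described above.
W1 = rng.uniform(-1, 1, (45, 8));  b1 = rng.uniform(-1, 1, 45)
W2 = rng.uniform(-1, 1, (1, 45));  b2 = rng.uniform(-1, 1, 1)

def forward(p):
    """Two-layer forward pass: a2 = f2(W2 f1(W1 p + b1) + b2)."""
    a1 = np.tanh(W1 @ p + b1)     # hidden-layer output
    return W2 @ a1 + b2           # linear output layer

p = rng.uniform(size=8)           # one 8-dimensional input vector
y = forward(p)                    # scalar network output, shape (1,)
```

Removing inputs shrinks W1 column-wise, which is the source of the weight-matrix reduction discussed below.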
The proposed method is focused on removing a part of the neurons (and weight coefficients) while not degrading the average deviation of the samples used for the training, testing, and validation of the neural network.
We consider a number of
The ICA method was applied to the 140 crude oil probes, measured against 8 criteria as listed below:
density at 15°C, g/cm^{3};
10% (v/v) ASTM D86 distillation, °C;
50% (v/v) ASTM D86 distillation, °C;
90% (v/v) ASTM D86 distillation, °C;
refractive index at 20°C;
H_{2} content, % (m/m);
aniline point, °C;
molecular weight, g/mol.
So we work with a
Membership parts of the IF pairs, giving the InterCriteria correlations.

(I)  (II)  (III)  (IV)  (V)  (VI)  (VII)  (VIII) 

(I)  1  0.699  0.770  0.658  0.956  0.176  0.446  0.703 
(II)  0.699  1  0.787  0.597  0.676  0.408  0.640  0.775 
(III)  0.770  0.787  1  0.777  0.728  0.395  0.665  0.922 
(IV)  0.658  0.597  0.777  1  0.627  0.468  0.674  0.771 
(V)  0.956  0.676  0.728  0.627  1  0.134  0.404  0.661 
(VI)  0.176  0.408  0.395  0.468  0.134  1  0.730  0.473 
(VII)  0.446  0.640  0.665  0.674  0.404  0.730  1  0.743 
(VIII)  0.703  0.775  0.922  0.771  0.661  0.473  0.743  1 
Nonmembership parts of the IF pairs, giving the InterCriteria relations.

(I)  (II)  (III)  (IV)  (V)  (VI)  (VII)  (VIII) 

(I)  0  0.288  0.217  0.326  0.042  0.822  0.552  0.295 
(II)  0.288  0  0.204  0.391  0.312  0.580  0.348  0.213 
(III)  0.217  0.204  0  0.212  0.261  0.595  0.325  0.068 
(IV)  0.326  0.391  0.212  0  0.359  0.518  0.312  0.215 
(V)  0.042  0.312  0.261  0.359  0  0.866  0.596  0.339 
(VI)  0.822  0.580  0.595  0.518  0.866  0  0.270  0.527 
(VII)  0.552  0.348  0.325  0.312  0.596  0.270  0  0.257 
(VIII)  0.295  0.213  0.068  0.215  0.339  0.527  0.257  0 
In Table
Correlations between the pairs of criteria.
Type of InterCriteria relation  Pairs of criteria

Strong positive consonance  (I-V)
Positive consonance  (III-VIII)
Weak positive consonance  (II-III, III-IV, II-VIII, IV-VIII, I-III)
Weak dissonance  (VII-VIII, III-V, VI-VII, I-II, I-VIII, II-V, IV-VII)
Dissonance  (III-VII, I-IV, V-VIII, II-VII, IV-V, II-IV)
Strong dissonance  (IV-VI, VI-VIII, I-VII)
Dissonance  (II-VI, V-VII, III-VI)
Weak dissonance  none
Weak negative consonance  (I-VI)
Negative consonance  (V-VI)
Strong negative consonance  none
The calculated distance
Distance

(I)  (II)  (III)  (IV)  (V)  (VI)  (VII)  (VIII) 

(I)  0  0.416  0.316  0.473  0.061  1.165  0.783  0.419 
(II)  0.416  0  0.295  0.561  0.450  0.829  0.501  0.310 
(III)  0.316  0.295  0  0.307  0.377  0.849  0.467  0.104 
(IV)  0.473  0.561  0.307  0  0.518  0.742  0.452  0.314 
(V)  0.061  0.450  0.377  0.518  0  1.225  0.843  0.480 
(VI)  1.165  0.829  0.849  0.742  1.225  0  0.382  0.745 
(VII)  0.783  0.501  0.467  0.452  0.843  0.382  0  0.363 
(VIII)  0.419  0.310  0.104  0.314  0.480  0.745  0.363  0 
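The distances in the table above follow directly from the membership and non-membership tables as the Euclidean distance of each IF pair to the ideal point (1; 0); small discrepancies in the last digit come from the published pairs themselves being rounded to three digits.

```python
from math import hypot

def ica_distance(mu, nu):
    """Euclidean distance of an intuitionistic fuzzy pair (mu, nu)
    from the ideal point (1, 0) of complete positive consonance."""
    return hypot(1 - mu, nu)

# Reproduce a few entries of the distance table from the
# membership/non-membership tables:
print(ica_distance(0.956, 0.042))   # pair (I, V):   ~0.061
print(ica_distance(0.770, 0.217))   # pair (I, III): ~0.316
print(ica_distance(0.922, 0.068))   # pair (III, VIII): ~0.104
```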
The next step is to choose the pair
Distance to the point (1; 0) and identification of the intercriteria triples. For each pair (C_i, C_j), C_k denotes the third criterion closest to C_i and C_m the third criterion closest to C_j; the last column gives the total pairwise distance of the chosen triple.

(C_i)  (C_j)  μ(i,j)  d(i,j)  (C_k)  μ(i,k)  d(i,k)  d(j,k)  (C_m)  μ(j,m)  d(i,m)  d(j,m)  Total distance of chosen triple

(I)  (V)  0.956  0.061  (III)  0.770  0.319  0.377  (III)  0.728  0.319  0.377  0.756
(III)  (VIII)  0.922  0.104  (II)  0.787  0.295  0.310  (II)  0.775  0.295  0.310  0.709
(II)  (III)  0.787  0.295  (VIII)  0.775  0.310  0.104  (IV)  0.777  0.561  0.307  0.709
(III)  (IV)  0.777  0.307  (I)  0.770  0.319  0.473  (VIII)  0.771  0.104  0.314  0.725
(II)  (VIII)  0.775  0.310  (I)  0.699  0.416  0.418  (IV)  0.771  0.561  0.314  1.144
(IV)  (VIII)  0.771  0.314  (VII)  0.674  0.452  0.363  (VII)  0.743  0.452  0.363  1.129
(I)  (III)  0.770  0.316  (VIII)  0.703  0.418  0.104  (V)  0.728  0.061  0.377  0.753
At the input of the neural network we put the experimental data for obtaining the cetane number of crude oil. Testing is done as at the first step: all measurements of the 140 crude oil probes against the 8 criteria are analyzed, in order to compare the obtained results afterwards. For this comparison to be possible, the predefined weight coefficients and offsets, which are normally random values between −1 and 1, are now fixed and the same in all studies, set to 1.
For the learning process, we set the following parameters: performance goal (MSE) = 0.00001; validation checks = 25. The input vector is divided into three parts: training (70%), validation (15%), and testing (15%). As target we use the cetane number ASTM D613.
At the first step of the testing process, we use all the 8 criteria listed above, in order to train the neural network. After the training process all input values are simulated by the neural network.
The average deviation of all 140 samples is 1.8134. The coefficient
Correlation coefficients for pairs of criteria.

(C_i)  (C_j)  Correlation coefficient (i, j)  (C_k)  Correlation coefficient (i, k)  (C_m)  Correlation coefficient (j, m)  Max sum of correlation coefficients  Chosen triple of criteria

(I)  (V)  0.989  (III)  0.616  (III)  0.495  1.605  (I-V-III)
(III)  (VIII)  0.971  (IV)  0.819  (II)  0.797  1.789  (III-VIII-IV)
(VI)  (VII)  0.831  (VIII)  0.024  (VIII)  0.576  1.406  (VI-VII-VIII)
(III)  (IV)  0.819  (VIII)  0.971  (VIII)  0.796  1.789  (III-IV-VIII)
At the next step of the testing process, we branch and try independently removing one of the columns, experimenting with data from the remaining seven columns. We compare the results in the next section, "Discussion." First, we reduce column 1 (based on Table
After the training process all input values are simulated. The average deviation of all the 140 samples is 1.63 and the coefficient
At the next step, we alternatively perform reduction of column 3 (according to Table
After the training process all input values are simulated. The average deviation of all 140 samples is 1.8525 and the coefficient
Now, at the next step, we proceed with feeding the neural network with 6 inputs, with the reduction of both columns, 3 and 5, according to the data from Table
At the next step, we reduce the number of inputs by one more; that is, we put at the input of the neural network experimental data from 5 inputs, with columns 1, 3, and 5 removed. The average deviation of all 140 samples is 1.857 and the coefficient
Finally, we experiment with the reduction of a fourth column, feeding the neural network with only 4 inputs. After the reduced columns 1, 2, and 4, the fourth reduced column is column 5. After the simulation the average deviation of all 140 samples is 2.19 and the coefficient
In support of the method, Tables
Correlation coefficients for pairs of criteria.

(C_i)  (C_j)  Correlation coefficient (i, j)  (C_k)  Correlation coefficient (i, k)  (C_m)  Correlation coefficient (j, m)  Max sum of correlation coefficients  Chosen triple of criteria

(I)  (V)  0.915  (III)  0.557  (III)  0.470  1.472  (I-V-III)
(III)  (VIII)  0.858  (II)  0.582  (II)  0.566  1.440  (III-VIII-II)
(II)  (III)  0.582  (VIII)  0.566  (VIII)  0.566  1.147  (II-III-VIII)
(I)  (III)  0.557  (V)  0.915  (VIII)  0.858  1.472  (I-III-V)
Correlation coefficients for pairs of criteria.

(C_i)  (C_j)  Correlation coefficient (i, j)  (C_k)  Correlation coefficient (i, k)  (C_m)  Correlation coefficient (j, m)  Max sum of correlation coefficients  Chosen triple of criteria

(I)  (V)  0.988  (III)  0.728  (III)  0.641  1.716  (I-V-III)
(III)  (VIII)  0.962  (II)  0.762  (II)  0.753  1.724  (III-VIII-II)
(II)  (III)  0.762  (VIII)  0.753  (VIII)  0.962  1.724  (II-III-VIII)
(II)  (VIII)  0.753  (III)  0.762  (III)  0.962  1.715  (II-VIII-III)
In Table
ICA  Pearson  Kendall  Spearman

(I-V-III)  (I-V-III)  (I-V-III)  (I-V-III)
(III-VIII-II)  (III-VIII-IV)  (III-VIII-II)  (III-VIII-II)
(II-III-VIII)  (VI-VII-VIII)  (II-III-VIII)  (II-III-VIII)
(III-IV-VIII)  (III-IV-VIII)  (I-III-V)  (II-VIII-III)
The triples selected by the four methods are identical in the first row. In the second row, three of the methods yield identical results (ICA, Kendall, and Spearman), and the only difference is in the criteria selected by the Pearson method. In the third row, the situation is the same. Here the triples are the same up to ordering. Only the triple of correlated criteria calculated by the
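For reference, the three statistical coefficients used in this comparison can be computed as follows. This is a plain-Python sketch assuming no tied evaluations; in practice library routines (e.g., from SciPy) would be used.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sqrt(sum((a - mx) ** 2 for a in x) *
               sum((b - my) ** 2 for b in y))
    return cov / var

def kendall(x, y):
    """Kendall tau (tau-a variant, assumes no tied evaluations)."""
    n = len(x)
    sgn = lambda d: (d > 0) - (d < 0)
    s = sum(sgn(x[i] - x[j]) * sgn(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return 2 * s / (n * (n - 1))

def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks (no ties)."""
    rank = lambda v: [sorted(v).index(e) + 1 for e in v]
    return pearson(rank(x), rank(y))
```

Unlike these three coefficients, the ICA pair carries two numbers (membership and non-membership), which is why its dissonance bands have no direct statistical counterpart.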
So far, such a detailed comparison between the four methods has been conducted over medical [
As we stated above, reducing the number of input parameters of a classical neural network leads to a reduction of the weight matrices, allowing implementation of the neural network on limited hardware and saving time and resources in training. For this aim, we use the intuitionistic fuzzy sets-based approach of InterCriteria Analysis (ICA), which gives the dependencies between the criteria and thus helps us reduce the number of highly correlated input parameters while keeping the level of precision high enough.
Table
Table of comparison.
Number of inputs  Average deviation  Regression  Number of weight coefficients

8 inputs  1.8134  0.97434  405 
7 inputs without input 1  1.6327  0.9772  360 
7 inputs without input 3  1.8525  0.97256  360 
7 inputs without input 5  1.6903  0.9734  360 
7 inputs without input 2  2.1142  0.96511  360 
7 inputs without input 8  1.7735  0.97511  360 
7 inputs without input 4  1.9913  0.96932  360 
6 inputs without inputs 3, 5  1.7644  0.97089  315 
6 inputs without inputs 1, 5  1.8759  0.97289  315 
6 inputs without inputs 1, 3  1.5716  0.97881  315 
6 inputs without inputs 2, 3  2.0716  0.96581  315 
6 inputs without inputs 3, 8  1.9767  0.97213  315 
6 inputs without inputs 3, 4  1.9792  0.97163  315 
6 inputs without inputs 4, 8  2.0174  0.96959  315 
5 inputs without inputs 1, 3, 5  1.857  0.97209  270 
5 inputs without inputs 2, 3, 8  2.0399  0.96713  270 
5 inputs without inputs 3, 4, 8  2.0283  0.96695  270 
4 inputs without inputs 1, 2, 4, 5  2.217  0.95858  225 
4 inputs without inputs 2, 3, 4, 8  2.1989  0.95927  225 
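The weight-coefficient counts in the table correspond to the input-to-hidden plus hidden-to-output weights of an n:45:1 network, i.e., 45n + 45 (bias terms are apparently not included in the paper's counts; that reading is an assumption, but it reproduces every entry in the table).

```python
HIDDEN = 45   # hidden-layer size used throughout the study

def weight_count(n_inputs, hidden=HIDDEN, n_outputs=1):
    """Weight coefficients of an n:hidden:1 MLP as counted in the
    comparison table: input-to-hidden plus hidden-to-output weights."""
    return n_inputs * hidden + hidden * n_outputs

print(weight_count(8))   # -> 405
print(weight_count(7))   # -> 360
print(weight_count(4))   # -> 225
```

Each removed input thus saves exactly 45 weight coefficients, which is the step visible between the rows of the table.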
The average deviation when we use 8 input vectors is 1.8134, with 405 weight coefficients. By reducing the number of inputs, the number of weight coefficients is also decreased, which theoretically is supposed to reduce the matching coefficient. In this case, the removal of column 1 (and therefore of one input) further decreases the average deviation, to 1.6327. The additional information (without column 5) used for training the neural network is very little, and the total Mean Square Error is smaller. The result is better compared to the former attempt of training the neural network with all 8 data columns.
When we use 7 columns (and 7 inputs of the neural network), excluding some of the columns gives a better result than the previous one. This shows that, while maintaining the number of weight coefficients and reducing the maximal membership in the intercriteria IF pairs, the neural network receives an additional small amount of information which it uses for further learning.
Best results (average deviation = 1.5716) are obtained by removing the two columns (6 inputs without inputs 1 and 3) with the greatest membership components of the respective
In this case, the effect of reducing the number of weight coefficients from 360 to 315 (and the corresponding MSE) is greater than the effect of removing the two columns.
The use of 5 columns (without columns 1, 3, and 5) leads to a worse result than the previous one, namely 1.857. This shows that, by reducing the number of weight coefficients (and the total MSE) and the information at the input of the neural network, a small amount of the information with which the network is trained is lost. As a result, the overall accuracy of the neural network decreases.
The worst results (average deviation = 2.217) are obtained with the lowest number of columns, 4. In this case, columns 1, 2, 4, and 5 are removed. Although the number of weight coefficients here is the smallest, the information used for training the neural network is less informative.
In this paper we apply the latest line of theoretical research on InterCriteria Analysis to a dataset with the measurements of 140 probes of crude oil against 8 physicochemical criteria. At the first step we put all data from these measurements at the input of a classical neural network. After performing ICA analysis of the pairwise intercriteria correlations, we apply the recently developed method for identification of intercriteria triples in an attempt to reduce the inputs of the neural network without significant loss of precision. This leads to a reduction of the weight matrices, thus allowing implementation of the neural network on limited hardware and saving time and resources in training.
A very important aspect of testing the neural network after reducing some of the data (respectively, the number of inputs) is obtaining an acceptable correlation between the input and output values, as well as the average deviation (or match) of the result.
The authors declare that they have no conflicts of interest.
The authors are thankful for the support provided by the Bulgarian National Science Fund under Grant Ref. no. DFNII025