A Grey Wolf Optimizer for Modular Granular Neural Networks for Human Recognition

A grey wolf optimizer for modular neural network (MNN) with a granular approach is proposed. The proposed method performs optimal granulation of data and design of modular neural networks architectures to perform human recognition, and to prove its effectiveness benchmark databases of ear, iris, and face biometric measures are used to perform tests and comparisons against other works. The design of a modular granular neural network (MGNN) consists in finding optimal parameters of its architecture; these parameters are the number of subgranules, percentage of data for the training phase, learning algorithm, goal error, number of hidden layers, and their number of neurons. Nowadays, there is a great variety of approaches and new techniques within the evolutionary computing area, and these approaches and techniques have emerged to help find optimal solutions to problems or models and bioinspired algorithms are part of this area. In this work a grey wolf optimizer is proposed for the design of modular granular neural networks, and the results are compared against a genetic algorithm and a firefly algorithm in order to know which of these techniques provides better results when applied to human recognition.


Introduction
In this paper, a grey wolf optimizer for modular granular neural networks (MGNN) is proposed. The main goal of this optimizer is the design of modular neural networks architectures using a granular approach and to evaluate its effectiveness, these modular granular neural networks are applied to one of the most important pattern recognition problems, human recognition. For a long time human recognition has been a widely studied area, where its study mainly lies in finding those techniques and biometric measures that allow having a trustworthy identification of persons to protect information or areas [1,2]. Some of the most used biometric measures are face [3,4], iris [5], ear [6,7], voice [8], vein pattern [9], hand geometry [10], signature [11], and gait [12], among others.
It is important to mention that some soft computing techniques such as neural networks and fuzzy logic combined with a bioinspired algorithm can allow achieving better performance when they are individually used. When two or more techniques are combined the resulting system is called hybrid intelligent system [7,32]. In this paper a hybrid intelligent system is proposed using modular neural networks (MNN), granular computing (GrC), and a grey wolf optimizer (GWO). The optimization of artificial neural network (ANN) using a grey wolf optimizer was already proposed in [33][34][35][36]. These works applied their methods to classification and function-approximation, where optimal initials weights of a neural network are sought using the grey wolf optimizer.  A modular neural network is an improvement of the conventional artificial neural network, where a task is divided into subtasks and an expert module learns some of these subtasks without communication with other modules; this technique allows having systems resistant to failures and works with a large amount of information. Usually this kind of networks has been used for human recognition based on biometric measures, classification problems, and time series prediction [40]. On the other hand, granular computing defines granules as classes or subsets used for complex applications to build computational models where a large amounts of data and information are used [19,41,42]. In this work granular computing is applied to perform granulation of information into subsets that also define number of modules of a modular neural network; the combination of modular neural networks and granular computing was already proposed in [7,37,38], where the advantages of modular granular neural networks over conventional neural networks and modular neural networks were widely demonstrated. In [7], the modular granular neural network architectures were designed using an improvement of a genetic algorithm, a hierarchical genetic algorithm (HGA), where the main differences between them are the control genes in the HGA that allow activating and deactivating genes allowing solving complex problems. That design consisted in optimization of number of modules (subgranules), percentage of data for the training phase, learning algorithm, goal error, and number of hidden layers with their respective number of neurons. In [38], a firefly algorithm was proposed for MGNN optimization using an experts submodules for each division of image. In [37], also modular granular neural network architectures were designed but using a firefly algorithm and without an expert submodule for each division of image. In this work, the design of MGNN architecture is performed and applied to human recognition based on ear, face, and iris, but using a grey wolf optimizer, statistical comparisons are performed to define which of these optimization techniques is better to perform optimization of MGNNs.
This paper is organized as follows. In Section 2, the proposed method is described. The results achieved by the proposed method are explained in Section 3. In Section 4, statistical comparisons of results are presented. Finally, conclusions are given in Section 5.

Proposed Method
The proposed hybrid intelligence method is described in this section; this method uses modular neural networks with a granular approach and their architectures are designed by a grey wolf optimizer.

General Architecture of the Proposed Method.
The proposed method uses modular granular neural networks, this kind of artificial neural network was proposed in [7] and [37], and their optimization were performed using, respectively, a hierarchical genetic algorithm and a firefly algorithm. In this work, the optimization is performed using a grey wolf optimizer and a comparison among HGA, FA, and GWO is performed to know which of these techniques is better for MGNN optimization. As a main task, the optimization techniques have to find the number of subgranules (modules), and as a preprocessing process each image is divided into 3 regions of interest; these regions will be described later. In Figure 1, the granulation process used in this work and Computational Intelligence and Neuroscience proposed in [7] is illustrated, where a database represents a whole granule. This granule can be divided into "m" subgranules (modules), this parameter (m) can have up to a certain limit set depending on the application, each of these subgranules can have different size for example, when this granulation is applied to human recognition, and each granule can have different number of persons that the corresponding submodules will learn. The grey wolf optimizer in this work performs optimization of the granulation and hidden layers and other parameters described later.

Description of the Grey Wolf
Optimizer. This algorithm is based on the hunting behavior of grey wolf and was proposed in [27]. A group of wolves has been between 5 and 12 wolves, and each wolf pack has a dominant hierarchy where the leaders are called alphas, and this type of wolves makes the most important decisions of the pack. The complete social dominant hierarchy is illustrated in Figure 2.
This algorithm is based on 5 points: social hierarchy, encircling prey, hunting, attacking prey, and search for prey. These points are explained as follows.
Social Hierarchy. The best solution is alpha ( ), the second best solution is beta ( ), the third best solution is delta ( ), and the rest of the population are considered as omega ( ), where the omega solutions follow alpha, beta, and delta wolves.
Encircling Prey. During the hunt process grey wolves encircle their prey. Mathematically model encircling behavior can be represented using the equations where → and → are coefficient vectors, → is the prey position vector, → is the position vector of a grey wolf, and is the current iteration. Vectors → and → are calculate by where → 1 and → 2 are random vectors with values in 0 and 1 and → is a vector with components that linearly decreased from 2 to 0 during iterations.
Hunting. It is assumed that alpha, beta, and delta are the best solutions; therefore, they have knowledge about location of prey, as these solutions are saved; the position of the other search agents is updated according to the position of the best search agent. This part is mathematically represented by Attacking Prey (Exploitation). → decreases from 2 to 0 during iterations and → has random numbers in an interval [− , ] so the next position of a search agent will be any position between its current position and the prey.

Search for Prey (Exploration).
There are different components that allow having divergence and a good exploration. The divergence is mathematically modeled using → , this part obliges solutions to diverge and to have a global search; meanwhile → contains values in an interval [0, 2] and provides to the prey random weights to favor exploration and avoid a local optima problem. In Pseudocode 1, the pseudo code of the grey wolf optimizer is shown.

Description of the Grey Wolf Optimizer for MGNN.
The grey wolf optimizer seeks to optimize modular granular neural networks architectures. The optimized parameters are as follows: (1) Number of subgranules (modules).
(2) Percentage of data for the training phase. Each parameter is represented by a dimension in each solution (search agent), and to determine the total number of dimensions for each solution the next equation is used: where is the maximum number of subgranules that the grey wolf optimizer can use and ℎ is maximum of number of hidden layers per module that the optimizer can use to perform the optimization. The variables mentioned above can be established depending of the application or the database, and the values used for this work are mentioned in the next section. In Figure 3, the structure of each search agent is shown.
This optimizer aims to minimize the recognition error and the objective function is given by the equation: where is the total number of subgranules (modules), is 0 if the module provides the correct result and 1 if not, and is total number of data/images used for testing phase in the corresponding module.

Proposed Method Applied to Human Recognition.
One of the most important parameters of the architecture is its learning algorithm, backpropagation algorithms are used in the training phase to perform the learning, and 3 variations of this algorithm can be selected by the proposed optimizer: gradient descent with scaled conjugate gradient (SCG), gradient descent with adaptive learning and momentum (GDX), and gradient descent with adaptive learning (GDA). These 3 algorithms were selected because they have between demonstrated to be the fastest algorithms and with them better performances and results have been obtained [6,7,[37][38][39].
The main comparisons with the proposed method are the optimizations proposed in [7,37,38]. In the first one a hierarchical genetic algorithm is developed, in the second and third work a firefly algorithm is developed to perform the MGNN optimization, and to have a fair comparison the number of individuals/fireflies and number of generations/iterations used in [7,37,38] are the same used by the proposed method in this work; obviously for the GWO these values are number of search agents and iterations. In Table 1, the values of the parameters used for each optimization algorithm are presented.
As it was mentioned, the number of dimensions is established using (4), where values such as ℎ and are established depending on the application. For this work as in [7,37,38], the minimum and maximum values used for the search space of each optimizer are shown in Table 2. The optimization techniques also have two stopping conditions: when the maximum number of iterations/generations is achieved and when the best solution has error value equal to zero. In Figure 4, the diagram of the proposed method is shown.

Data Selection, Databases, and Preprocessing.
The description of the databases, data selection for each phase (training and testing), and the applied preprocessing are presented below.

Data Selection.
To understand the data selection, it is important to mention that the MGNNs as the MNNs and the conventional ANNs have two important phases: (i) First phase: neural network learns information or patterns.
(ii) Second phase: neural network simulates other pieces of information not given for learning.
As it was observed, data selection is an important part of the neural network and for this reason in [7], a new method to select information or images was proposed. In the proposed data selection, depending of a percentage of data (a value between 20% and 80%) for the training phase, this percentage is converted to a number of images (depending of the number of images per person in the database), and randomly images for each phase are selected. In Figure 5, an example is illustrated, when a person has 4 images (as ear database) and 2 of them are for training phase.

Database of Face (ORL).
The ORL database contains 40 persons, and each person has 10 images. This database is from the AT&T Laboratories Cambridge, where each image has a dimension of 92 × 112 pixels, with PGM format. Figure 7 shows a sample of the images of this database [48].

Database of Face (FERET)
. The FERET database [49] contains 11338 images from 994 persons, and each image has a dimension of 512 × 768, pixels, with PGM format. Figure 8 shows a sample of the images of this database.   are 320 × 280 pixels, with JPEG format. Figure 9 shows a sample of the images of this database.

2.3.6.
Preprocessing. The preprocessing applied to these databases is simple because the focus of the proposed method is the optimization of the granulation. For the ear database, the ear image is manually cut, a resizing of the new image to 132 × 91 pixels is performed, and automatically the image is divided into three regions of interest (helix, shell, and lobe); this preprocessing was already performed in [7]. For the FERET database, the Viola-Jones algorithm [51,52] was used to detect the face in each image, and a resizing of 100 × 100 pixels is performed to each image, converted to grayscale, and automatically the image is divided into three regions (front, eyes, and mouth). For iris database the method developed by Masek and Kovesi [53] is used to obtain the coordinates and radius of the iris and pupil to perform a cut in the iris, a resizing of 21 × 21 pixels is performed to each image, and finally, each image is automatically divided into three parts. For the ORL database, each image is automatically 8 Computational Intelligence and Neuroscience  divided into three regions of interest (front, eyes, and mouth). The preprocessing process for these databases is shown in Figure 10.

Experimental Results
The proposed method is applied to human recognition and the results achieved are shown in this section. The main comparisons of the proposed method are against a hierarchical genetic algorithm proposed in [7] and a firefly algorithm proposed in [37,38], where in [7,38] the ear database is used; meanwhile in [37] the iris database is used and architectures of MGNNs are optimized. In [7,38], two optimized tests for the MGNNs were performed, these tests in this work are replicated (30 trials/runs for each test), and to summarize only the 5 best results are shown. In [37], two optimized tests for the MGNNs were performed, the second test in this work is replicated (20 trials/runs), and to summarize also only the Computational Intelligence and Neuroscience   Computational Intelligence and Neuroscience  Table 3, the best 5 results using the proposed method in this work are shown. The behavior of trial #4 is shown in Figure 11, where the best, the average, and the worst results of each iteration are shown. In Figure 12, alpha (first best solution), beta (second best solution), and delta (third best solution) behavior of trial #4 are shown. This trial was one of the fastest trials to obtain an error value equal to zero.
In Figure 13, the recognition errors obtained by the proposed grey wolf optimizer, the HGA proposed in [7], and the FA proposed in [38] are shown.
In all the trials performed by the grey wolf optimizer an error equal to zero is obtained. In Table 4, a comparison of results between the proposed method and the work in [7,38] is shown. An average of convergence of the 30 trials/runs of each optimization technique is shown in Figure 14, where it can be observed that the GWO always found an error equal to zero in the first 5 iterations; meanwhile the HGA and the FA in some runs did not obtain this value.

Test #2 Results for Ear.
In this test, the proposed grey wolf optimizer can use up to 50% of data for the training phase to design the MGNNs architectures. In Table 5, five architectures with the best results are shown.
The behavior of trial #2 is shown in Figure 15, where the best, the average, and the worst results of each iteration are shown. In Figure 16, the alpha (first best solution), beta (second best solution), and delta (third best solution) behaviors of trial #2 are shown. This trial was one of the best trials, where an error of recognition equal to 0.325 is obtained.
In Figure 17, the errors of recognition obtained by the proposed grey wolf optimizer, the HGA proposed in [7] and the FA proposed in [38] for test #2, are shown. It can be visually seen that the results obtained by grey wolf optimizer and firefly algorithm are more stable than the HGA.
In Table 6, a comparison of results between the proposed method and [7,38] is shown. The best result is obtained by the HGA, but the average is slightly improved by the firefly algorithm; meanwhile the worst errors are improved by the proposed method and the firefly algorithm. An average of convergence of the 30 trials/runs of each optimization technique is shown in Figure 18, where the HGA tends in a general behavior to stagnate more than the GWO and the FA.

Face Results (ORL).
The results achieved, using the ORL database, are presented in this section. For this database 2 tests were also performed, but to compare with other works the percentage of data for the training phase is set fixed. Each test is described as follows: (i) Test #1: the percentage of data for the training phase is set to 80%.
(ii) Test #2: the percentage of data for the training phase is set to 50%.

Test #1 Results for Face.
In this test, the proposed grey wolf optimizer uses 80% of data for the training phase to design the MGNNs architectures. In Table 7, five architectures with the best results are shown. The behavior of trial #5 is shown in Figure 19, where the best, the average, and the worst results of each iteration   are shown. In Figure 20, the alpha (first best solution), beta (second best solution), and delta (third best solution) behaviors of trial #5 are shown. This trial was one of the fastest trials to obtain an error value equal to zero. In Figure 21, the recognition rates obtained by [4,38,39] and the proposed grey wolf optimizer are shown. The proposed method and the firefly proposed in [38] allow obtaining a recognition rate of 100%.
Computational Intelligence and Neuroscience In Table 8, a comparison of results is presented. The best result is obtained by the work in [38,39] and the proposed method, but the average and the worst error are improved by the proposed method and the firefly algorithm.

Test #2 Results for Face.
In this test, the proposed grey wolf optimizer uses 50% of data for the training phase to design the MGNNs architectures. In Table 9, the best 5 results using the proposed method in this work are shown.
The behavior of trial #1 is shown in Figure 22, where the best, the average, and the worst results of each iteration are shown. In Figure 23, the alpha (first best solution), beta (second best solution), and delta (third best solution) behaviors of trial #1 are shown. This trial was one of the best trials, where an error of recognition equal to 0.0100 is obtained.
In Figure 24, the recognition rates obtained by [3,38,39,43] and the proposed method are shown.  In Table 10, a comparison of results between the proposed method and the other works is shown. The best and the worst error are improved by the proposed method and the firefly algorithm, but the average of recognition is slightly improved by the proposed method.

Iris Results.
In this test, the proposed grey wolf optimizer uses up to 80% of data for the training phase to design the MGNNs architectures as in [37,44]. In Table 11, five architectures with the best results are shown.
The behavior of trial #2 is shown in Figure 25, where the best, the average, and the worst results of each iteration are shown. In Figure 26, the alpha (first best solution), beta (second best solution), and delta (third best solution)  behaviors of trial #2 are shown. This trial was one of the best trials, where an error of recognition equal to 0 is obtained. In Figure 27, the errors of recognition obtained by [37,44] and the proposed method are presented.
In Table 12, a comparison of results is presented. The best, the average, and the worst errors are improved by the proposed method.
An average of convergence of the 20 trials/runs of each optimization technique is shown in Figure 28, where although these techniques does not tend to stagnate for a long time, the GWO tends to convergence faster with better results.

Summary Results. Summary of results and comparison
with other works using the same databases and neural networks are shown in this section. The testing time of a set of images depends on the number of images and their size, but the training time also depends on the neural network architecture (number of hidden layers, neurons in each hidden layers, and number of modules) and learning factors (initial weights and error goal, among others). An approximation of the training and testing times for each search agent is, respectively, shown in Figures 29 and 30. In Table 13 a summary of each database setup is shown. It can be noticed that the Iris database has more images in each test, but images size is smaller than the other databases; for this reason the training and testing times for this database are the smallest ones. In the case of ear database the number of images is smaller than the other databases but the size of its images is bigger, so the training and testing times tend to increase.
In Table 14, the summary of results obtained using the GWO applied to the ear, face, and iris database is shown.
In [7], modular granular neural networks are proposed and are compared with conventional neural networks using a hierarchical genetic algorithm to design neural networks architectures. In [38], the design of modular granular neural networks architectures is proposed using a firefly algorithm. In [45], the architectures of modular neural networks are 16 Computational Intelligence and Neuroscience  designed using a hierarchical genetic algorithm but without a granular approach; that is, the number of modules and the number of persons learned by each modules always were left fixed. In Table 15, the comparisons among the optimized results obtained using the proposed method and other optimized works are presented, where the average was improved for the ear database by the proposed method (test #1, using 3 images) and the firefly algorithm (test #2, using 2 images).   Azami et al. [43] Ch'Ng et al. [3] Sánchez et al. [38] Sánchez et al. [39] Proposed GWO  In Table 16, the 4-fold cross-validation results for the ear database are shown, where for each training set 3 images for each person were used.
In [43], a neural network is proposed based on a conjugate gradient algorithm (CGA) and a principal component analysis. In [3], the principal components analysis (PCA) and a linear discriminant analysis (LDA) are used. In [38], a firefly algorithm is developed to design modular granular neural networks architectures. In [39], modular neural network with a granular approach is used, but in that work, the granulation is performed using nonoptimized training to assign a complexity level to each person and to form subgranules with persons that have the same complexity level. That method was recommended for databases with a large numbers of people. In [4], a comparison of fuzzy edge detectors based on the image recognition rate as performance index calculated with neural networks is proposed. In Table 17, the comparisons among the optimized results obtained using the proposed method and other optimized works for the face database are presented, where the best, average, and worst values were improved for this database by the proposed method and the firefly algorithm for test #1 (using 8 images) and in test #2 (using 5 images); the average is only improved by the proposed method.    In Table 18, the 5-fold cross-validation results are shown, where for each training set 4 images for each person were used.
In [46] a scale invariant feature transform (SIFT) is proposed. In Table 19, the comparisons among the results obtained using the proposed method and the other work for the FERET database are presented.       In Table 20, the 5-fold cross-validation results are shown, where for each training set 4 images for each person were used.
In [44] and [37], a hierarchical genetic algorithm and a firefly algorithm are, respectively, proposed to optimize modular granular neural networks using iris as biometric measure. The main difference between these works is that in the first and the second one there is no a subdivision of each image as in the proposed method where submodules are experts in parts of the image. In Table 21, the comparison between the optimized results obtained using the proposed method and the other optimized works is presented.   In Table 22, the 5-fold cross-validation results are shown, where for each training set 11 images for each person were used.

Statistical Comparison of Results
The results obtained by the proposed method are visually better than the other works; now statistical -tests are performed in order to verify if there is enough evidence to say that the results of the proposed method are better. In these -tests, the recognition rates and errors previously presented were used. Table 23, the values obtained in the -test between [7] and [38] and the proposed method are shown, where the -values were, respectively, 1.38 and 1.41; this means that there is no sufficient evidence to say that ear results for test #1 were improved with the proposed method.

Statistical Comparison for Test #1. In
In Figure 31, the distribution of the samples is shown, where it can be observed that the samples are very close to each other.
For the ORL database in test #1, the different values obtained in the -test between the proposed method and [4,39] are shown in Table 24. The -values were 4.12 and 2.42; this means that there is sufficient evidence to say that the results were improved using the proposed method. In Figure 32, the distribution of the samples is shown. It can be observed that samples of [39] are very separated from each other.
For the FERET database, the different values obtained in the -test between the proposed method and [46] are shown in Table 25. The -value was 4.24; this means that there is sufficient evidence to say that the results were improved using the proposed method. In Figure 33, the distribution of the samples is shown.
For the iris database, the different values obtained in the -test between the proposed method and [44] and [37] are  shown in Table 26. The -values were, respectively, 3.18 and 5.62; this means that there is sufficient evidence to say that the results were improved using the proposed method.
In Figure 34, the distribution of the samples is shown. It can be observed that samples of [44] are more separated from each other than in [37]. Table 27, the values obtained in the -test between [7] and [38] and the proposed method for ear database are shown, where the -values were, respectively, 2.09 and −5.70; this means that there is sufficient evidence to say that face results were improved with the proposed method only versus [7].  In Figure 35, the distribution of the samples is shown, where it can be observed that the samples for [7] and the proposed method are closer than the proposed method and [38]. The distribution of the proposed method and [38] seems to be uniform.

Statistical Comparison for Test #2. In
The different values obtained in the -test for the face database between the proposed method and [43], [3], [38], and [39] are shown in Table 28. The -values were, respectively, 8.96, 5.90, 0.67, and 1.15; this means that only compared with [3,43] there is sufficient evidence to say that the face results were improved using the proposed method.
In Figure 36, the distribution of the samples is shown, where it can be observed that the samples are very close between the proposed method and [38,39].

Conclusions
In this paper, the design of modular granular neural network architectures using a grey wolf optimizer is proposed. The design of these architectures consists in the number of modules, percentage of data for the training phase, error goal, learning algorithm, number of hidden layers, and their respective number of neurons. As objective function this optimizer seeks to minimize the recognition error applying the proposed method to human recognition, where benchmark databases of ear and face biometric measures were used to prove the effectiveness of the proposed method. Statistical comparisons were performed to know if there is sufficient evidence of improvements using the proposed method, mainly with previous works, where a hierarchical genetic algorithm and a firefly algorithm were developed and also use MGNNs, but more comparisons with other works were also performed. As a conclusion, the proposed method has been shown which improves recognition rates in most of the comparisons, especially when the granular approach is not used. An improvement provided by the grey wolf optimizer over the genetic algorithm and the firefly algorithm lies in the fact that the first one allows having the first three best solutions (alpha, beta, and delta) and their others search agents will update their position based on them; otherwise, the genetic algorithm only has a best solution in each iteration, and the firefly algorithm updates 24 Computational Intelligence and Neuroscience  Figure 36: Sample database (test #2, ORL database). the position of the fireflies by evaluating couples of fireflies, where if one firefly is not better than the other their move will be random. This allows the GWO to have greater stability in its trials and in its results. It is important to mention that the results shown in this work were performed using different databases; this prove that the proposed method was designed to be easily adaptable depending of the number of persons and the number of images independently of the biometric measure used. In future works, the proposed method will seek to reduce the complexity of the MGNNs architectures and to minimize the percentage of information and subgranules to design MGNNs.