Prototype Generation Using Self-Organizing Maps for Informativeness-Based Classifier

The k-nearest neighbor (kNN) is one of the most important and simplest procedures for data classification tasks. The kNN, as it is called, requires only two parameters: the number of neighbors k and a similarity measure. However, the algorithm has some weaknesses that make it difficult to use in real problems. Since the algorithm has no model, an exhaustive comparison between the object under classification and all training dataset objects is necessary. Another weakness is the optimal choice of the parameter k when the analyzed object lies in an overlap region. To mitigate these negative aspects, in this work a hybrid algorithm is proposed which combines the Self-Organizing Map (SOM) artificial neural network and a classifier that uses a similarity measure based on informativeness. Since SOM has vector quantization properties, it is used as a Prototype Generation approach to select a reduced training dataset for a classification approach based on the nearest neighbor rule with an informativeness measure, named iNN. The SOMiNN combination was exhaustively experimented with, and the results show that the proposed approach achieves important accuracy on databases where the object classes in the border region are not well defined.


Introduction
The main task of a data classifier is to predict the class of an object that is under analysis. The simplest procedure for data classification tasks is the k-nearest neighbor (kNN) algorithm. The algorithm's strategy for classification comprises three operations: (i) an unlabeled sample is compared to the training dataset through a similarity measure; (ii) the labeled objects are sorted in order of similarity to the unlabeled sample; and finally, (iii) classification occurs by giving the unlabeled sample the majority class of the k nearest neighbor objects. Because of its simple algorithm (three basic operations) and reduced number of parameters (the similarity measure and the number of nearest neighbors k), this instance-based learning algorithm is widely used in the data mining community as a benchmarking algorithm [1][2][3][4][5].
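The three operations above can be sketched as a minimal function. This is an illustration only (the paper's experiments were implemented in R, and all names and defaults here are ours):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training objects."""
    # (i) compare the unlabeled sample with every training object
    dists = np.linalg.norm(X_train - x, axis=1)
    # (ii) sort the labeled objects by similarity (here, Euclidean distance)
    nearest = np.argsort(dists)[:k]
    # (iii) assign the majority class among the k nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Note that step (i) touches every stored object, which is exactly the exhaustive-comparison weakness discussed next.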
Since the kNN algorithm has no model, an exhaustive comparison of the unlabeled sample with all the labeled objects stored in the database is necessary, which increases the computational time of the process. In addition to this weakness of the algorithm, the decision boundaries are defined by the instances stored in the training set and, because of this, the algorithm has low tolerance to noise; that is, all training dataset objects are considered relevant patterns. Finally, the optimal choice of k depends on the dataset, mainly when the analyzed object lies in a boundary region, so this parameter must be tuned according to the application [6][7][8][9].
To overcome the drawbacks above, there are in the literature different approaches, such as similarity measures alternative to the Euclidean distance to minimize misclassification in boundary regions [10], methods to avoid searching the whole training set [11], and dataset summarization to find representative objects of the training set [9]. For the dataset summarization approach, there are two main strategies to reduce the dataset volume: one based on instance selection and the other based on prototypes. For the approaches based on pattern (or instance) selection, the aim is to find a representative and reduced set of objects from the training dataset which has the same or higher classification accuracy than the raw dataset [8,[12][13][14][15]. The strategies based on prototypes, on the other hand, are divided into two approaches: Prototype Selection (PS) [16] and Prototype Generation (PG) [13,[17][18][19]. The approaches are equivalent; both can be used to identify an optimal subset of representative prototypes, discarding noise and redundancy. The difference is that PG can also be used to generate an artificial dataset and replace the raw dataset with it. The use of prototypes, or of reduced training objects that are represented by prototypes, minimizes some of the kNN drawbacks previously mentioned, such as the exhaustive comparison with the whole training dataset.
Silva and Del-Moral-Hernandez [5] presented combination methods that use the winning-neuron and topology preservation concepts of the Self-Organizing Map (SOM) neural network to define a reduced subset of training set objects that are highly similar to the object under analysis for classification [5,20]. This object subset is retrieved and then used by the kNN to execute the classification task. In other words, the SOM executes a preprocessing step for the kNN classifier, recovering the similar objects from the winning neuron and from the adjacent neighbors on the SOM map [21].
With respect to the drawback of tuning the parameter k, Zhang et al. proposed a computational learning approach for this parameter [22]. Song et al., on the other hand, proposed a metric based on informativeness to perform the classification process in boundary regions, where the choice of k is more sensitive [10]. This algorithm was called iNN, and its main idea is to investigate the most informative nearest objects instead of simply the closest ones. This approach outperforms the use of kNN with Euclidean distance; however, it further increases the complexity of the comparison, consequently increasing the processing time [23].
Inspired by the use of PG [5,20,21], we introduce a hybrid approach, in which the first step is the SOM, whose vector quantization and topology preservation are important features for using it as a preprocessing stage that presents to the classifier algorithm a reduced set of objects highly similar to the unknown object under investigation. Next, the iNN algorithm attributes a class to the unknown object based on the most informative objects of the selected set. In initial exploratory experiments, we observed important results in accuracy and classification time [23].
We here formally detail how SOMiNN works as a hybrid architecture for classification problems. Besides that, we introduce an experimental methodology to qualitatively analyze the SOMiNN classifier on three artificial datasets, experimenting with different distributions in the region of class overlap. In addition, we perform experiments on 21 publicly available databases from the UCI repository (7 times more than in the previous study), sampled by the 5-fold cross-validation method as in the complementary website of the paper published by Triguero et al. [9]. The results are analyzed using accuracy, kappa, prototype reduction, and time as performance indices.
The rest of the paper is organized as follows: in Section 2, a brief explanation of Prototype Generation and the taxonomy proposed by [9] is given; Self-Organizing Maps and the methods to use them in classification with kNN are presented in Section 3. In Section 4, the experimental methodology is introduced. Experimental results, discussion, and comparative results are given in Section 5. In the last section, the conclusions are provided.

A Brief Introduction to Prototype Generation.
For a better understanding of the Prototype Generation idea, let us consider an object x_i of a dataset, defined as a set of d descriptive attributes together with a class attribute c_i; that is, x_i = [a_1, a_2, ..., a_d, c_i]. Then, let us assume that X_train is a training dataset with n_train samples x_i. The purpose of Prototype Generation (PG) is to obtain a reduced set, X_red, with n_red instances selected or generated from X_train, but with n_red ≪ n_train. The cardinality of this reduced set must be sufficiently small to decrease the evaluation time taken by a classifier (kNN, for example), while maintaining the classification accuracy. In fact, data reduction approaches aim mainly to summarize the raw dataset without damaging its analytical properties, which implies preserving accuracy.
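A minimal PG example can make the n_red ≪ n_train idea concrete: generating one artificial prototype per class (the class centroid). This toy scheme is our illustration, not one of the PG algorithms surveyed in [9]:

```python
import numpy as np

def centroid_prototypes(X_train, y_train):
    """Generate one prototype per class: the class centroid.

    A deliberately minimal PG sketch: the prototypes are *generated*
    (they need not coincide with any training object), and
    n_red = number of classes << n_train.
    """
    classes = np.unique(y_train)
    X_red = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return X_red, classes
```

A nearest-prototype classifier over (X_red, classes) then replaces the exhaustive kNN search over X_train.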
In PG methods, prototypes are used by classifiers instead of the raw dataset, or they are used to generate an artificial dataset. Data generation can be interesting in some cases to eliminate noise or to handle datasets with unbalanced classes. Since the possibilities of usage are diverse, the literature presents different methods, approaches, and algorithms. This was the reason for Triguero et al. [9] to propose a PG taxonomy, aimed at mitigating kNN drawbacks, defined hierarchically in three levels (generation mechanisms, resulting generation set, and type of reduction), and also to review all the PG algorithms from the literature (see [9] for a detailed explanation).
In the next section, we give a brief introduction to Self-Organizing Maps and present the proposed approach, the combination of SOM and iNN.

A Brief Summary for the Kohonen Self-Organizing Maps.
The Kohonen Self-Organizing Map (SOM) is a type of neural network that consists of neurons located on a regular low-dimensional grid, usually two-dimensional (2D). Typically, the lattice of the 2D grid is either hexagonal or rectangular [24]. The SOM learning or training process is an iterative algorithm which aims to represent the distribution of the input pattern objects on that regular grid of neurons. Similar input patterns are associated with the same neuron or with adjacent neurons of the grid.
For the SOM training, a dataset is chosen and divided into two distinct sets. The training set, here called X_train, is used to train the SOM. The other set is used to test the trained SOM (X_test). After this dataset division, SOM training starts. Formally, an object is randomly selected from X_train during training, defined as x = [a_1, a_2, ..., a_d], where each element a_i is an attribute or feature of the object. The object is similar to the one defined before, but without the class information. Additionally, each neuron j of the SOM grid has a weight vector w_j = [w_j1, w_j2, ..., w_jd], where j = 1, 2, ..., M; here M is the total number of neurons of the map.
During the learning process, an input pattern is randomly selected from the training set and compared with the weight vectors of the map, which are initially randomized. The comparison between x and w_j is usually made through the Euclidean distance. The shortest distance indicates the closest neuron c, which will have its weight vector w_c updated to get closer to the selected input pattern x. Formally, neuron c is defined as follows:

c = arg min_j ‖x − w_j‖,  j = 1, 2, ..., M.    (1)

The closest weight vector w_c and its neighbors are updated using the Kohonen algorithm [24]. The topological neighborhood is defined so that the farther a neuron is from w_c, the lower the intensity of its update. The intensity of the neighborhood function is also defined in relation to the training time: in the initial iterations it has a high value, and it is reduced at each subsequent iteration. See Kohonen [24] for a complete explanation of the SOM training rule.
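One iteration of the training rule described above can be sketched as follows (a simplified illustration with a Gaussian neighborhood; Kohonen's full algorithm also decays the learning rate and neighborhood width over time, which is omitted here):

```python
import numpy as np

def som_step(W, grid, x, lr, sigma):
    """One Kohonen update step.

    W:    (M, d) array of weight vectors, modified in place.
    grid: (M, 2) array of unit coordinates on the 2D lattice.
    Finds the best matching unit c = argmin_j ||x - w_j|| and pulls every
    unit toward x, weighted by a Gaussian neighborhood centered on c.
    """
    c = np.argmin(np.linalg.norm(W - x, axis=1))          # winning neuron
    grid_d2 = np.sum((grid - grid[c]) ** 2, axis=1)       # lattice distance²
    g = np.exp(-grid_d2 / (2 * sigma ** 2))               # neighborhood weights
    W += lr * g[:, None] * (x - W)                        # pull toward x
    return c
```

Repeating this step over randomly drawn training objects, while shrinking lr and sigma, yields the trained map.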

Building a Prototype Generation Method Based on SOM.
Once the training phase has been completed, each input pattern object from the training set is grouped with its closest neuron. The idea in this approach of using SOM as a PG technique is that the index of each instance x_i becomes part of the list of its nearest neuron. The list of each neuron is here called the Best Matching Unit List (BMUL), formally defined as

BMUL_j = {i | c(x_i) = j},  j = 1, 2, ..., M,    (2)

where j is the index of the map neuron and BMUL_j is a list with the indexes of the input pattern objects associated with that nearest neuron. The relationship between the training set instances x_i and the best matching unit lists BMUL_j is many-to-one. That is, some units j, which we could call microclusters, will be associated with one or more instances, while other units may have no associations; that is, the list can be empty (∅).
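The BMUL construction described above amounts to one assignment pass over the training set (an illustrative sketch; names are ours):

```python
import numpy as np

def build_bmul(W, X_train):
    """Build the Best Matching Unit Lists for a trained map.

    For each training object, append its index to the list of its nearest
    unit. The relationship is many-to-one, and some lists may stay empty.
    """
    bmul = {j: [] for j in range(len(W))}
    for i, x in enumerate(X_train):
        j = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # nearest unit
        bmul[j].append(i)
    return bmul
```

Each non-empty list is a microcluster; empty lists correspond to units whose Voronoi region captured no training object.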
The classification method proposed herein explores two important characteristics of the SOM: vector quantization and topological ordering [24]. For a better understanding of these features, consider the representation in Figure 1, with the input pattern objects (filled circles) used for training a SOM map and the weight vectors of each neuron (squares) after the training phase. In this figure, each weight vector represents a microcluster of input patterns, which is the quantization characteristic. The relationship between the weight vectors can be interpreted as a boundary, which can be understood as a Voronoi region, as exemplified by the shaded area in Figure 1. In operational terms, the strategy introduced and explored herein establishes a two-step classification process.

Figure 1: The border between the weight vectors (squares) can be interpreted as a Voronoi region (shaded area). Thus, each input pattern object (filled circle) belongs to a Voronoi region.

In the first step, when a test sample x_t (see Figure 1, the unfilled circle) is compared to the weight vectors of the trained SOM map (the squares in Figure 1), the algorithm defines the closest unit according to the following equation:

c = arg min_j ‖x_t − w_j‖,  j = 1, 2, ..., M.    (3)

Hence, as c is the nearest unit, we know the list with the input pattern indices that should be queried, that is, BMUL_c. Illustratively, consider that the weight vector w_c belongs to a Voronoi region (see Figure 1, the shaded area) which has a BMUL list with the indices of known input patterns (filled circles). Also in this figure, the unlabeled sample x_t (unfilled circle) belongs to the region covered by unit c (shaded area); thus, in the second step of the classification process, the kNN algorithm is performed with a reduced set of objects.
However, note that some input pattern objects stored in the dataset (filled circles) that are closest to the object being classified x_t (unfilled circle) may belong to neighboring Voronoi regions and are consequently represented in other lists; see Figure 1, the circle with a dotted line.
For that reason, in a classification task with kNN (or iNN, as will be introduced in the next section) combined with SOM, using only the objects represented in the BMUL_c list results in a substantially reduced classification time but can reduce the accuracy rate. Thus, we explore the second important feature of SOM, the topological ordering of the training dataset objects. In other words, in addition to the BMUL_c list, the lists of adjacent neurons on the SOM map grid are also consulted.
The visit to adjacent units depends on the lattice initially set in the SOM training phase. For a SOM trained with a rectangular lattice topology, the four adjacent units should be considered. Thus, the query list BMUL_query for the unknown pattern x_t is defined as

BMUL_query = BMUL_c ∪ ( ⋃_{a ∈ A(c)} BMUL_a ),    (4)

where A(c) denotes the set of units adjacent to the winning unit c on the grid. For a hexagonal lattice topology, six adjacent units have to be considered, and so on. In previous studies using SOM with kNN [5, 20, 21], we compared the two neighborhood topologies (rectangular and hexagonal) and the results were equivalent; for this reason, the rectangular lattice topology was chosen in this work. Finally, in the second step of the classification method proposed here, the reduced object set belonging to BMUL_query (4) is used to find the nearest neighbors (kNN). Note that the set of objects extracted from the query lists, X_BMUL_query, is part of the set of input patterns used for SOM training; that is, X_BMUL_query ⊂ X_train. Formally, we have

X_BMUL_query = {x_i ∈ X_train | i ∈ BMUL_query}.    (5)

Thus, the class of the nearest (or most informative, as will be explained in the next section) objects is used to label the unknown sample x_t. This framework combination was initially called SOMkNN (and here the SOMiNN classifier will be introduced).

Algorithm 1: Prototype Generation based on SOM, briefly described in pseudo-code.
Input: the weight vectors of the trained SOM map (W); the training object dataset (X_train); and an unknown sample (x_t).
Output: the label of the unknown sample.
(1) Compare the unknown sample x_t with each weight vector w_j of W using Eq. (3).
(2) Define the units to be visited by Eq. (4) and retrieve the input pattern objects by Eq. (5), recovering a reduced training dataset X_BMUL_query.
(3) Use the reduced dataset and the unknown sample in a classifier (kNN or iNN) that returns the object class.
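The two-step process described above (winner selection plus a query over the adjacent units' lists, then nearest-neighbor voting on the reduced set) can be sketched as follows. This is an illustrative implementation with kNN as the final classifier and a rectangular lattice; all names are ours:

```python
import numpy as np

def som_knn_classify(W, grid, bmul, X_train, y_train, x, k=3):
    """Two-step SOM+kNN classification sketch.

    Step 1: find the unit closest to x and its grid-adjacent units
            (Manhattan lattice distance <= 1: the winner + 4 neighbors
            on a rectangular lattice).
    Step 2: run kNN only over the objects in those units' BMUL lists.
    """
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))     # Eq. (3): winner
    adj = [j for j in range(len(W))                       # Eq. (4): query units
           if abs(grid[j][0] - grid[c][0]) + abs(grid[j][1] - grid[c][1]) <= 1]
    idx = [i for j in adj for i in bmul[j]]               # Eq. (5): reduced set
    d = np.linalg.norm(X_train[idx] - x, axis=1)
    nearest = [idx[i] for i in np.argsort(d)[:k]]         # kNN on reduced set
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Only the objects listed under the winner and its adjacent units are compared with x, which is the source of the speedup over plain kNN.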
In summary, the conventional kNN (or iNN) algorithm compares the unknown sample with all the instances of the dataset; here, the comparison is limited to a selection of objects, that is, restricted to a small number of instances from the training dataset. The main implementation steps are described as pseudo-code in Algorithm 1.
As shown in this section, we formalized a strategy to select input pattern objects to be used as references in a classification task and to speed up the kNN algorithm. The next section introduces the iNN algorithm, which is less sensitive to the parameter k and therefore works better than kNN on datasets with overlapped classes (boundary not well defined).

Informative Nearest Neighbors.
Some data classification approaches based on the nearest neighbor, in addition to defining a given range of values to find the nearest neighbors, also utilize new distance metrics, such as the informative nearest neighbor [10]. In other words, when analyzing a new object of unknown class, they use a measure that quantifies which training set object is most informative.
In order to find the informative nearest neighbor, the iNN algorithm, as it is called in the proposal by Song et al. [10], calculates the informativity through the following equation:

I(x_i | x_t) = −log(1 − P(x_i | x_t)),    (6)

where I(x_i | x_t) is the value of the informativity between the neighbor x_i and the object under analysis of unknown class (x_t), and P(x_i | x_t) is the probability of the object x_i being the informative nearest neighbor. This probability is defined by the following equation:

P(x_i | x_t) = Pr(x_i | x_t) ∏_{j≠i} (1 − Pr(x_j | x_i))^{1[c_j ≠ c_i]}.    (7)

The first term in (7), Pr(x_i | x_t), is defined as the probability that the object x_i is close to the object x_t, normalized as k_{x_i}/k, where k_{x_i} is the number of objects that have the same class as x_i. The second part in (7) indicates the probability that the object x_i is distant from the other objects x_j of the training dataset. The indicator 1[⋅] is 1 if the class attributes of the objects x_i and x_j are different, that is, c_j ≠ c_i; therefore, it can be understood as a penalty factor.
The probability Pr(x_i | x_t) in (7) can be defined as a function of the distance between the objects; in other words,

Pr(x_i | x_t) ∝ exp(−‖x_i − x_t‖).    (8)

To understand the iNN algorithm in practical terms, consider the dataset in Figure 2(a), represented by shaded circles, where the shades (dark and light) represent the two classes of the set. Now consider Figure 2(b), which has the same training objects plus an object without class, represented by an unshaded circle. Figure 2(c) adds contours around the training and test objects, representing the classification process executed with the traditional kNN, with Euclidean distance and k equal to 5. In this process, the majority class of the nearest neighbors is the one represented by dark shading, and therefore the decision is made for this class. However, the object under analysis has as its nearest neighbor a training object that belongs to the light-shaded class, which in turn also has as its neighbor another object of the same class. Utilizing the iNN algorithm, the informativity takes into consideration not only the majority class but also the nearest objects and the concordance that the other training set objects have with the nearest object. In conclusion, in the case of iNN, the classification would be made for the class represented by the light shading.

Thus, the concept of the informative nearest neighbor has the following definitions. Within the k nearest neighbors, the object that is nearest to the object being classified, while distant from objects of different classes, is considered the most informative object, and its class is attributed to the unknown object. On the other hand, an object that has a different class from the most informative object is considered less informative. An object is also considered less informative within the k nearest neighbors when it has the same class as the most informative object but is near objects of different classes.
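The informativeness idea can be sketched in code. This is our reading of the verbal description only, not Song et al.'s exact equations: each of the k nearest neighbors is scored by its closeness to the query, discounted by a penalty for proximity to objects of other classes:

```python
import numpy as np

def informativeness(X_train, y_train, x, k=5):
    """Return the index of the most informative neighbor of x.

    Illustrative sketch (our formulation, not the published iNN):
    score = closeness-to-query * (1 - penalty), where the penalty grows
    when the neighbor sits close to objects of a different class.
    """
    d = np.linalg.norm(X_train - x, axis=1)
    neigh = np.argsort(d)[:k]                         # k nearest candidates
    scores = {}
    for i in neigh:
        closeness = np.exp(-d[i])                     # near x -> high score
        mask = y_train != y_train[i]                  # objects of other classes
        d_other = np.linalg.norm(X_train[mask] - X_train[i], axis=1)
        penalty = np.exp(-d_other).prod() if d_other.size else 0.0
        scores[i] = closeness * (1.0 - penalty)
    return max(scores, key=scores.get)
```

Note the extra inner comparisons against the rest of the training set, which is exactly the computational-cost issue raised in the next paragraph.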
The informativity calculation has a high computational cost because, in addition to comparing the object under analysis with the training set objects (first part of (7)), the algorithm still requires comparisons between the training set objects themselves (second part of (7)). In order to reduce the computational effort, Song et al. [10] suggest executing the kNN algorithm before the iNN to define a reduced dataset with the most similar objects according to the Euclidean metric. However, the kNN algorithm has the disadvantages presented in the Introduction (the need to store the training set, noise sensitivity, etc.), and its use before the iNN can affect the classification performance for objects in a border region, as illustrated in Figure 2(c).
The following section presents a proposal that combines the SOM with the iNN algorithm to build a process named SOMiNN. It will also show the advantages of the SOMiNN over the iNN.

Methodology for Combining SOM and iNN: A Hybrid SOMiNN Classifier
The approach utilized by the SOMiNN classifier explores the concepts of quantization, topology maintenance, and informativity. As already mentioned, an informative object allows the correct prediction of an unknown object, even when the boundary is not well defined. When talking about information, we cannot have information quality without first measuring it meaningfully; information quality is one of the determining keys for the quality of the decisions and actions made from it [25]. This is exactly what the SOMiNN classifier proposes to do; in other words, before predicting the class of the unknown object, it measures the informativeness of the training set objects.

In order to understand the SOMiNN combination, consider a SOM trained with the objects from Figure 2(a) without using the class information (shaded color). The adjusted prototypes resulting from the trained SOM map (weight vectors) are represented in Figure 3(a). The result of the SOM can be generally understood as a summary of the training set through a set of prototypes, each with its own Voronoi region, with the number of prototypes being smaller than the training set size; in this example, the twelve objects were summarized into four prototypes. The number of prototypes is a parameter that corresponds to the number of neurons of the SOM map. Now consider the classification of the new object presented in Figure 2(b), with the prototype set being used in a first comparison instead of the training set. In this case, in the initial phase a comparison is made between the object under analysis and the set of prototypes, as illustrated in Figure 3(a). Repeating the process that takes place in SOM training for the selection of the winning neuron, made using the Euclidean distance, the neuron nearest to the object under analysis (winner or best match) is selected.
From this process, the nearest prototype is known and, in turn, it is possible to know which training set objects are represented by that prototype; see Figure 3(b), where each prototype has a Voronoi region. Thus, the reduced set of training objects is retrieved to start the classification phase with the iNN. Finally, the classification is done with a reduced set, as shown in Figure 3(c).
In summary, the last step of Algorithm 1, the classifier algorithm, is executed with iNN. This process is here called the SOMiNN classifier.
Since the process depends on the selection of the number of neurons of the SOM map, we utilize the empirical proposal of Vesanto et al. [26], which defines the number of neurons as the square root of the number of objects of the training set. After training the SOM map, some prototypes can be empty; in other words, a prototype may represent no object of the training set. In order to prevent this situation, the proposal of Silva and Del-Moral-Hernandez [5] is utilized: besides retrieving the objects of the winning prototype, the objects of the adjacent prototypes are also retrieved.
The combination of the SOM neural network with the iNN explores the main characteristics that define the potential of a data classifier: storage reduction, noise tolerance, generalization accuracy, and time requirements [9]. Contrary to the iNN, which preprocesses the data using the kNN algorithm with its high computational cost, the algorithm proposed in this work reduces the data representation through the SOM. In addition, with the use of the SOM, the classification time of the SOMiNN is expected to be shorter than that of the iNN, with less memory use, maximizing the classifier's performance in terms of classification time.
The next section details all the steps taken in the experiments with the SOMiNN classifier.

Experimental Methodology, Results, and Analysis
This section presents the datasets utilized in the experiments and the parameterization of the classifiers used for comparison with our SOMiNN proposal. The experiments consist of artificial datasets for qualitative and quantitative analysis and public datasets used as benchmarks in the literature to evaluate the efficacy of the proposed algorithm.

Datasets.
In order to provide a qualitative analysis with visualization of the decision border area and a quantitative analysis in terms of classifier accuracy, three datasets were generated with the following features: 300 objects, two attributes, two classes, and a balanced number of objects per class. For all datasets, the objects were distributed with the same means but with different standard deviation values, in order to force class overlap. Thus, each dataset represents a distinct situation at the class border: no, low, and high confusion. In order to evaluate the efficacy of the proposed algorithm and compare it with others from the literature, 21 public databases were chosen (from the Repository of the University of California, Irvine, UCI) that are used as benchmarks for Prototype Generation approaches. Table 1 summarizes the properties of each benchmark dataset in number of objects (Obj), number of attributes (Att), and number of classes (Cla). For all databases, the attributes are numerical. The separation of these datasets into training and test sets was done with 5-fold cross validation.
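A generator in the spirit of the artificial datasets described above can be sketched as follows. The class means and the seed are our choices for illustration; the paper does not publish the exact values, only that the means are fixed and the standard deviation varies to control overlap:

```python
import numpy as np

def make_overlap_dataset(std, n=300, seed=0):
    """Generate a 2-attribute, 2-class, balanced dataset of n objects.

    The two class means are fixed; increasing std forces the classes to
    overlap (no / low / high confusion for small / medium / large std).
    Means (0,0) and (2,2) and the seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    half = n // 2
    X0 = rng.normal(loc=[0.0, 0.0], scale=std, size=(half, 2))  # class 0
    X1 = rng.normal(loc=[2.0, 2.0], scale=std, size=(half, 2))  # class 1
    X = np.vstack([X0, X1])
    y = np.array([0] * half + [1] * half)
    return X, y
```

Calling it with, e.g., std = 0.3, 0.7, and 1.5 yields the three border situations (well separated, partially overlapped, highly overlapped).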

Parameterization of Algorithms.
The SOMiNN approach will be compared with kNN, iNN, and SOMkNN. The classifier parameterizations are presented in Table 2. The SOM parametrization is the same for SOMkNN and SOMiNN.
The experiments were implemented in the R language, version 3.1.2, with the RStudio IDE version 0.98, on a conventional computer with Windows 10 and an i7 processor with 8 GB of RAM. The experimental results are presented in the following section.

Qualitative and Quantitative Analysis Using the Artificial Datasets.

The objective of the experiments using artificial datasets was to compare the performance of the kNN, iNN, SOMkNN, and SOMiNN classifiers in situations with well-separated classes (Figure 4(a)), partially overlapped classes (Figure 4(b)), and largely overlapped classes (Figure 4(c)).
Table 1: Properties of the benchmark datasets (number of objects, attributes, and classes).

Analyzing qualitatively, starting with Figures 4(a)(A), 4(b)(A), and 4(c)(A), the kNN results, we can note that the boundary separation degrades as the classes start to overlap. In the worst case, the high overlap (Figure 4(c)) clearly exposes one of the kNN disadvantages: the decision boundary is built considering all objects as having the same importance. For the iNN results (Figures 4(a)(B), 4(b)(B), and 4(c)(B)), it is clear that the separation border is smoother, even when the class overlap increases. This is because the separation was defined by the informative representation of the objects from the same class. This fact is most evident in the last experiment (Figure 4(c)), where we can observe that the boundary is created so as to preserve the predominant class in the border region. Figures 4(a)(C and D), 4(b)(C and D), and 4(c)(C and D) are the results using SOM as the Prototype Generation approach. That is, the decision boundary was generated without using all objects of the database but, instead, based on the objects distributed among the prototypes of the trained SOM map. In this qualitative analysis, the most important observation is that the decision boundary was preserved in all experiments, without significant changes.
Finally, we analyzed the experiments with artificial data quantitatively, with the average classification accuracy computed on a dataset with 10^6 objects (which were also used to generate the decision boundaries of Figure 4). The conclusion for the experiments using the artificial datasets is that the iNN is more effective than the kNN when the class separation has high confusion, and that its accuracy is not abruptly reduced in that case. We also note that the use of SOM as a Prototype Generation method does not significantly degrade the accuracy. The use of artificial datasets thus enables both qualitative and quantitative comparisons between the classifiers. The next experiment has the objective of expanding the previous study [23] through analysis with other performance measures, such as kappa, the impact of dataset reduction on accuracy, and classification time. For these new experiments, 12 new public datasets were used that are benchmarks in the Prototype Generation literature [9].

Experiments and Analysis Using the Benchmark Datasets.

This section shows the experiments and results for the datasets introduced in Table 1. The results are analyzed using the following performance measures: accuracy, kappa, hypothesis test, dataset reduction rate, and classification time. Table 3 shows all the classification results of the experiments. In this table, the accuracy and kappa measures are given as average and standard deviation. The other results are also discussed in this section.
In practical terms, the accuracy and kappa measures are equivalent in terms of performance. For simplicity, only the accuracy will be considered in the extended discussion of the result analysis.
The accuracy is analyzed by comparing the results of the classifiers in pairs. The average and standard deviation for each dataset are compared using the t-test with a 95% confidence interval. The comparison result is shown in Table 4. In this table, the dataset indices ("#," see Table 1) are separated according to the classifier results: higher (A > B), equal (A = B), and lower (A < B), with A and B being the classifiers compared in pairs. From this result, the same comparison structure is used again to count the incidence percentages. The general counting is shown in Table 5. This table has an additional column for the sum of the percentages of equal and lower results, in order to show when one classifier's performance is really better than the other's. This was the reason to compare (A > B) with (A = B) and (A < B) in Table 5.
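The pairwise decision behind Table 4 can be sketched as follows. The paper reports a t-test at 95% confidence on the per-dataset averages and deviations but not the exact variant; the paired form over the 5 cross-validation folds is our assumption:

```python
import numpy as np

def compare_classifiers(acc_a, acc_b, t_crit=2.776):
    """Decide 'A>B', 'A=B', or 'A<B' from per-fold accuracies.

    Paired t-test over the folds; t_crit = 2.776 is the two-sided 95%
    critical value for 4 degrees of freedom (5 folds). Both the paired
    form and this critical value are illustrative assumptions.
    """
    d = np.asarray(acc_a, dtype=float) - np.asarray(acc_b, dtype=float)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))      # paired t statistic
    if abs(t) < t_crit:
        return "A=B"                                      # not significant
    return "A>B" if d.mean() > 0 else "A<B"
```

Counting the three outcomes across the 21 datasets reproduces the structure of Table 5.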
Analyzing the results from Table 5, it is possible to note in the first two lines of the table that iNN is better than kNN in most cases (66.7%, and 52.4% when combined with SOM). The use of SOM in the classification process (last two lines of the table) is shown to be slightly better or worse depending on the case: the SOM performance with kNN is improved (52.4%), while with iNN there is a slight degradation (47.7%). However, an important result that should be emphasized (last row of the table) is that the use of SOM with iNN maintains or improves the performance in most databases (52.4%).
As a final analysis of accuracy performance, in order to show that the degradation from using SOM has little impact on the final result, the comparison of the same pairs of classifiers presented above is performed using a radar chart (Figure 6). In this graph, the external values (polar scale) indicate the dataset number, from 1 to 21, and the internal values show the accuracy performance, from 0.6 to 1.0. The ideal result would be a graph contour at 1.0. In this study, the main results are the overlapping lines, which represent an equivalent result for the contrasted classifiers. Combining the statistical and graphical accuracy results, we can consider that iNN is, in the vast majority of the studies, superior in classification performance to kNN. From this result, it is interesting to note that the iNN superiority occurs mainly in datasets with performance below 90%, namely, 7, 9, 11, 12, 14, 15, 16, 18, 19, and 20. On the other hand, the result is lower in experiments with datasets 2 and 21, that is, where the accuracy performance is close to 1 (100%). Therefore, the results also suggest that iNN has superior results in datasets in which the decision boundary is not well separated.
The next analysis consists in verifying the SOM efficiency in reducing the input objects. For this, the reduction percentage and the accuracy performance of each dataset are checked. The results are shown in Figure 7. Interestingly, in both results, SOMkNN and SOMiNN, three very well defined regions appear in the reduction-accuracy experiment: the first group of datasets has an average of 150.5 objects, the second an average of 215, and the last an average of 694.5 objects. That is, the reduction varies with the number of objects. Therefore, the results of SOMkNN (Figure 7(a)) and SOMiNN (Figure 7(b)) show that the more objects in the dataset, the higher the reduction rate.
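The reduction rate reported per dataset is the fraction of training objects no longer consulted once the SOM prototypes (and the objects retrieved through them) replace the full set. A minimal sketch with illustrative sizes, not the paper's figures:

```python
# Sketch: reduction rate as the percentage of training objects removed
# from the comparison step. Dataset sizes are illustrative only.

def reduction_rate(n_objects, n_retained):
    return 100.0 * (1 - n_retained / n_objects)

print(round(reduction_rate(150, 60), 1))    # small dataset: 60.0 % reduction
print(round(reduction_rate(1000, 120), 1))  # larger dataset: 88.0 % reduction
```

The larger dataset attains the higher rate, mirroring the observation above that the reduction grows with the number of objects.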
The next results to be analyzed concern the time consumed in the classification process. The results are shown in Figure 8; for interpretation purposes, the databases are arranged on the vertical axis in ascending order of number of objects. On the vertical axis, each dataset is described by name, number of objects, and number of attributes (as in Table 3). The time shown on the horizontal axis is measured in seconds.
Analyzing in detail the classification times of the kNN and iNN algorithms in Figure 8(a), it is observed that, up to a certain number of objects, around 180 (datasets appendicitis to wine), the classification time grows almost linearly. From this point on, the trend is not clear. The reason is the increasing number of objects in the other databases combined with the variation in the number of attributes. This means that the classification time depends not only on the number of objects but also on the number of attributes; for example, between the balance database (625 objects) and dermatology (366 objects), the latter has fewer objects but consumes more time. Another interesting case is observed between the mov libras and vowel datasets: the former has almost half the number of objects and nearly ninefold more attributes than the latter, yet both consumed an equivalent time in the classification process. Another point to note in the graph is that, for every experiment, the classification time of iNN is higher than that of kNN. This result was expected because, as mentioned earlier, iNN is computationally more costly: kNN is run before it as a preprocessing step to find the closest informative objects. Although it seems an obvious result, the experiments confirm it. Finally, to give a general idea of the timing, a trend line was added to the results; the best adjustment was an exponential trend, with a Pearson coefficient above 0.7, which is considered high. As it is difficult to find a relationship combining the numbers of objects and attributes to explain the processing time, the trend is indexed to the number of objects alone. Thus, for this experiment, the classification time is more sensitive to the number of objects.
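The exponential trend fit can be reproduced, under assumptions, by linear regression on the logarithm of the time, with the Pearson coefficient measured on the log-linear fit. The sample points below are illustrative, not the measured times:

```python
# Sketch of the trend analysis: fit t = a * exp(b * n) to
# (dataset size, classification time) pairs via linear regression on
# log(t), and report the Pearson coefficient of the fit.
# The sample points are illustrative, not the paper's measurements.
import math

def fit_exponential(ns, times):
    ys = [math.log(t) for t in times]
    n = len(ns)
    mx, my = sum(ns) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(ns, ys))
    sxx = sum((x - mx) ** 2 for x in ns)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                        # growth rate per object
    a = math.exp(my - b * mx)            # scale factor
    r = sxy / math.sqrt(sxx * syy)       # Pearson coefficient (log space)
    return a, b, r

ns = [100, 200, 400, 800]
times = [0.02, 0.05, 0.21, 2.9]
a, b, r = fit_exponential(ns, times)
print(round(r, 2))
```

A Pearson coefficient near 1 on these synthetic points indicates an almost perfect exponential relation; the paper reports values above 0.7 on the real measurements.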
The same time experiment discussed above was repeated for SOMkNN and SOMiNN (Figure 8(b)). The behavior of the results is similar to that discussed for Figure 8(a). This can be interpreted in two ways: first, the above analysis also applies to these results; second, and more importantly, the objects selected by the SOM reduce the classification time, which is more advantageous in databases with more than 180 objects (from the sonar dataset onward). This can be observed at the upper end, where the classification time consumption is high (vowel dataset) and the use of SOM reduces the kNN and iNN classification times by more than 3 times.

Contrasting the Results of This Work with Those in the Literature.
To give an idea of the importance of the results herein, mainly regarding iNN and the SOMiNN combination, the performance indexes obtained here were compared with the literature results [9]. The approach chosen for the comparative experiments, the Chen algorithm [9, 27], belongs to the same Prototype Generation category as SOM. The algorithm was executed using the datasets of Table 3, and the compiled results are shown in terms of average and standard deviation of accuracy, time, and reduction. The comparative results are summarized in Table 6. Note from Table 6 that iNN is the algorithm with the best accuracy performance. This is an important result because it is the algorithm introduced here as an alternative to kNN. In terms of time, the lowest result was obtained by SOMkNN, which involves SOM as the Prototype Generation approach introduced in this work; it is also expected that iNN is more time consuming than kNN, as discussed in Section 2. Finally, the Chen algorithm has the highest reduction, which is expected too, since, according to Triguero et al. [9], its prototypes parameter has to be configured as 90% of the number of objects of the dataset.

Conclusion
This paper introduces a new classifier named SOMiNN, which is based on the combination of Self-Organizing Maps (SOM) and the informative nearest neighbor (iNN) classifier. The iNN classifier is costly in computational terms because, in a classification process, informativeness is not calculated only for the object under classification analysis but also considering the other objects of the training set. Song et al. [10] suggested running the kNN algorithm (with the best k experimentally found to be 7) before iNN to minimize this high computational cost, that is, using 7-NN to find a reduced subset for the classification process with the informative nearest neighbor algorithm.
As a contribution beyond Song et al. [10], in this paper the kNN preselection has been replaced by SOM, because of its vector quantization and topology-preserving properties with respect to the raw dataset. In other words, a SOM map is trained with the dataset and, after this, the objects of this set are associated with their nearest (or winning) neurons; thus, each neuron of the map, or prototype, represents a subset of objects. In a classification process, the object is compared with the map prototypes and a winner is elected. The objects mapped to this winning neuron and to its adjacent neurons are retrieved, and iNN is then executed on them.
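The steps above can be sketched with a minimal SOM implementation: train a small map, associate each training object with its winning neuron, and, at classification time, retrieve only the objects mapped to the query's winner and its adjacent neurons. A plain 1-NN vote stands in for the informativeness-based iNN step, which is not reproduced here; grid size, learning schedule, and the two Gaussian blobs are illustrative choices.

```python
# Minimal sketch of the SOM-based prototype retrieval pipeline.
# A 1-NN decision replaces the iNN step for brevity.
import numpy as np

rng = np.random.default_rng(0)

def train_som(X, rows=4, cols=4, iters=2000, lr0=0.5, sigma0=1.5):
    W = rng.random((rows, cols, X.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        frac = t / iters
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.1
        d = ((W - x) ** 2).sum(axis=2)
        bmu = np.unravel_index(d.argmin(), d.shape)   # winning neuron
        h = np.exp(-((grid - bmu) ** 2).sum(axis=2) / (2 * sigma ** 2))
        W += lr * h[..., None] * (x - W)              # neighborhood update
    return W

def winner(W, x):
    d = ((W - x) ** 2).sum(axis=2)
    return np.unravel_index(d.argmin(), d.shape)

def classify(W, X, y, q):
    # retrieve only objects mapped to the winner and its adjacent neurons
    bmu = winner(W, q)
    wins = np.array([winner(W, x) for x in X])
    near = np.abs(wins - bmu).max(axis=1) <= 1
    Xs, ys = X[near], y[near]
    if len(Xs) == 0:                                  # fallback: full set
        Xs, ys = X, y
    return ys[((Xs - q) ** 2).sum(axis=1).argmin()]   # 1-NN stand-in for iNN

# two illustrative Gaussian blobs
X = np.vstack([rng.normal(0.2, 0.05, (40, 2)), rng.normal(0.8, 0.05, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
W = train_som(X)
print(classify(W, X, y, np.array([0.25, 0.20])))  # class 0 expected
```

The key point of the design is that `classify` compares the query against a handful of prototypes instead of the whole training set, and only the objects behind the winner and its neighbors reach the (expensive) final classifier.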
Thus, due to the preprocessing performed by the SOM for the iNN algorithm, the overall computational effort to find the informative nearest neighbor is much smaller, which results in a significant reduction in classification time when compared to the classification time of iNN without preprocessing.
Therefore, the primary objective of the classifier addressed in this paper was to maintain the accuracy of iNN while reducing the classification time. We conclude that the use of the objects represented by the winning neuron and its adjacent neurons was effective, since it does not degrade the performance of iNN. The results presented in Section 4 indicate this time reduction and, in addition, that the classification rates of SOMiNN are statistically similar to those of iNN, that is, time reduction with accuracy preservation. Another important conclusion from the classification experiments, mainly with the artificial dataset but also with the benchmarking datasets where the accuracy performance was worst, is that the iNN approach presents more significant accuracy results when the objects of different classes are not well separated, with high mixture in the border region.
As a final conclusion, iNN is an algorithm with better accuracy performance than kNN, but its classification time is a bottleneck, which is minimized by using SOM as a Prototype Generation technique. Thus, the SOMiNN classifier proposed here is specialized in solving, in tolerable time, problems where the border region is not well defined.

Conflicts of Interest
The authors declare that there are no conflicts of interest with regard to the publication of this paper.