Visualizing Clusters in Artificial Neural Networks Using Morse Theory

This paper develops a process whereby a high-dimensional clustering problem is solved using a neural network and a low-dimensional cluster diagram of the results is produced using the Mapper method from topological data analysis. The low-dimensional cluster diagram makes the neural network’s solution to the high-dimensional clustering problem easy to visualize, interpret, and understand. As a case study, a clustering problem from a diabetes study is solved using a neural network. The clusters in this neural network are visualized using the Mapper method during several stages of the iterative process used to construct the neural network. The neural network and Mapper clustering diagram results for the diabetes study are validated by comparison to principal component analysis.


Introduction
Topological data analysis (TDA) is an emerging field of mathematics that focuses on constructing topological models for data and calculating algebraic invariants of such models [1][2][3]. The fundamental idea is to use methods from topology to determine shapes or patterns in high-dimensional data sets [4]. One method from TDA called Mapper constructs a low-dimensional topological model for a data set $X \subset \mathbb{R}^n$ from the clusters in the level sets of a function $h : X \to \mathbb{R}^m$ on the data set [5]. This topological model for $X$ is a cluster diagram that shows the clusters in the level sets of $h$ (i.e., clusters in the layers of a stratification of $X$) and how clusters in adjacent, overlapping level sets are connected (i.e., how the neighboring layers are glued together). The topological model built in this way is analogous to how Morse theory is used to construct a cell decomposition of a manifold using sublevel sets of a Morse function on the manifold [5][6][7]. The resolution of the cluster diagram produced by Mapper can be adjusted by changing the level sets, that is, by varying the number, size, and shape of the regions used to cover the image of the function $h$. Further, the Mapper method allows for different clustering algorithms to be used. The most important step for obtaining a useful topological model from Mapper is finding a function $h : X \to \mathbb{R}^m$ that solves a particular clustering problem of interest for a data set $X$. This study examines the case when the function $h$ is a neural network.
A feedforward, multilayer perceptron artificial neural network (hereafter called a neural network) is a function $f : \mathbb{R}^n \to \mathbb{R}^m$ constructed by an iterative process in order to approximate a training function $g : A \to B$ between two finite sets of points $A$ and $B$ called the inputs and target outputs, where $A \subseteq X \subset \mathbb{R}^n$ and $B \subset \mathbb{R}^m$. In a context where a target output value represents the classification of an input point, the neural network $f$ is a solution to a classification or clustering problem because $f$ has been trained to learn the rule of association of inputs with target outputs given by $g$. In this manner, many clustering problems for high-dimensional data sets $X \subset \mathbb{R}^n$ have been solved by finding collections of points in the domain of $f$ that have similar output values, which is to say that the level sets of a neural network are solutions to a clustering problem [8][9][10][11][12]. Although neural networks are adept at solving clustering problems, it is hard to visualize these clusters when the neural network's domain has dimension $n > 3$. To address this limitation, Mapper will be used to construct a low-dimensional, visualizable topological model that shows the clusters in the level sets of $f$ as well as how clusters in neighboring level sets are connected. More generally, using Mapper to make a cluster diagram of the level sets of a neural network will provide a low-dimensional picture of the solution to a clustering problem that makes interpreting the neural network results much easier.
The research presented in this paper uses the Miller-Reaven diabetes study data [13,14] as a case study for the method of using a neural network to solve a clustering problem and Mapper to visualize and interpret the results. A neural network is constructed that classifies patients of a diabetes study as overt diabetic, chemical diabetic, or not diabetic based on the results of five medical tests. The neural network is trained using the five medical tests as inputs and the diagnosis of diabetes type as the target output. At several intermediate stages of the weight update process during the construction of this neural network, the Mapper method is used to create a topological model of the level sets of the neural network at that stage of its formation. The results are compared to principal component analysis (PCA) as a means to validate the method. The general method presented in this paper for solving and visualizing clustering problems combines the efficacy of neural networks, which are nonlinear functions that have a proven track record for solving a wide variety of clustering problems whenever a training function is available [15], with the clarity and simplicity of the cluster diagrams produced by the Mapper method to make the neural network's solution to the clustering problem readily comprehensible.
The Mapper method has been used previously in the context of unsupervised learning by using functions such as density and eccentricity estimates to study diabetes data, breast cancer data, and RNA hairpin folding [4,5,16,17]. Since neural networks employ supervised learning, using neural networks together with Mapper may provide more accurate and precise results than what could be attained by unsupervised learning on the same data. Other techniques for visualization of high-dimensional data sets such as projection pursuit, Isomap, locally linear embedding, and multidimensional scaling are discussed in relation to Mapper in [5]. Methods for visualizing the clusters in a neural network have been constructed by a variety of other dimension reduction techniques. Such techniques include linear and nonlinear projection methods [18], principal component analysis [19], Sammon's mapping [20], multidimensional scaling and nonlinear mapping networks [21], and fuzzy clustering [22]. These dimension reduction techniques produce useful two- and three-dimensional models of the data set and have varying degrees of success in solving specific real-world problems. Some of these constructions can be quite sensitive to the distance metric chosen, outliers in the data, or other factors.
This paper is organized as follows. In Section 2, background information on neural networks is given, followed by a description of the Mapper method from topological data analysis. Section 3 describes the Miller-Reaven diabetes study, principal component analysis, and the configuration of the neural network and Mapper algorithm used to analyze the diabetes data. Section 4 demonstrates the results of applying PCA and a neural network to the diabetes data and compares the PCA results to the cluster diagram for the neural network produced using the Mapper method. Section 5 summarizes the main results of the case study, the general method of using neural networks to solve clustering problems, and the Mapper method to visualize the resulting clustering diagrams.

Background
This section provides a brief overview of neural networks and the Mapper method from topological data analysis.

Brief Description of Neural Networks.
A neural network is a function $f : \mathbb{R}^n \to \mathbb{R}^m$ constructed via an iterative process in order to approximate a training function $g : A \to B$ between two finite sets of points $A \subseteq X \subset \mathbb{R}^n$ and $B \subset \mathbb{R}^m$ called the inputs $A$ (which is a subset of a data set $X$) and target outputs $B$. Neural networks are universal approximators in the sense that for every training function $g$, there exists a globally defined neural network $f$ that approximates $g$ to any desired degree of accuracy [23,24]. Even though it is possible to find a neural network $f$ that approximates $g$ to any predetermined degree of accuracy, in practice such a neural network $f$ could have a very large network architecture and be impractical. Thus, it is often desirable to find a moderately sized network architecture for $f$ that approximates $g$ to an acceptable degree of accuracy. This study will examine a neural network with one hidden layer of $h_1$ nodes. Such a neural network $f : \mathbb{R}^n \to \mathbb{R}^m$ has the form $f(x) = \sigma_2(W_2\,\sigma_1(W_1 x + b_1) + b_2)$, where $x \in \mathbb{R}^n$, $W_1$ is an $h_1 \times n$ weight matrix, $W_2$ is an $m \times h_1$ weight matrix, $b_1$ is an $h_1 \times 1$ bias vector, $b_2$ is an $m \times 1$ bias vector, and $\sigma_i : \mathbb{R} \to \mathbb{R}$ denotes an activation function.
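The forward evaluation of a one-hidden-layer network of this kind can be sketched in a few lines. This is a minimal illustration rather than the author's implementation: it assumes the hyperbolic tangent on the hidden layer and the identity on the output layer, and the function name `forward` and the list-of-lists weight layout are hypothetical choices made only for this example.

```python
import math

def forward(x, W1, b1, W2, b2):
    """Evaluate a one-hidden-layer network on the input vector x.

    Assumed (illustrative) conventions: W1 has one row per hidden node,
    W2 has one row per output node, tanh is applied entrywise on the
    hidden layer, and the output activation is the identity.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

# Tiny example with 2 inputs, 2 hidden nodes, and 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
b2 = [0.5]
y_origin = forward([0.0, 0.0], W1, b1, W2, b2)
y_antisym = forward([1.0, -1.0], W1, b1, W2, b2)
```

In this toy configuration the two hidden activations cancel for the input (1, -1) because tanh is an odd function, so both evaluations return the output bias.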
For classification problems with multiple classes of data, it is common to choose $\sigma_1(x) = \tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$ and $\sigma_2(x) = x$ as activation functions. An activation function is evaluated on a vector by applying the function to each entry of the vector. The iterative process for constructing a neural network $f : \mathbb{R}^n \to \mathbb{R}^m$ from a training function $g : A \to B$ begins by initializing the weights $W_i$ and biases $b_i$ with random values. Points $a \in A$ are sequentially presented to the neural network, and the weights $W_i$ and the biases $b_i$ are adjusted to minimize the error between $f(a)$ and $g(a)$. When a generalizable neural network is desired, only a subset of the points in $A$ is used for adjusting its weights and biases, while the remaining points in $A$ are used for cross-validation and/or testing to ensure that the neural network does not overlearn its training data. The weights and biases are adjusted by this iterative process until a tolerable level of error is reached for all points in $A$, or all points in the cross-validation set, or until a predetermined number of iterations is reached. The weights and biases can be adjusted by a variety of methods, including backpropagation via gradient descent, the conjugate gradient method, or the Levenberg-Marquardt method. The conjugate gradient and Levenberg-Marquardt methods generally construct a neural network $f$ in very few iterations, but each iteration is more computationally intensive and therefore slower. In contrast, the backpropagation via gradient descent method generally requires many more iterations, but each iteration is very fast. Details of how the weight update process is used to construct a neural network from a training function can be found in the neural networks literature [10][11][12].
After a neural network has been constructed and has reached a tolerable level of error, its level sets can be used to solve a clustering or classification problem. In particular, for any connected region $U \subset \mathbb{R}^m$, the level set $f^{-1}(U)$ can be thought of as a set of points in the domain of $f$ that all map to points in the same region $U$. This means that these points in the domain have classification values close to each other because their images all lie in $U$. Thus, a level set $f^{-1}(U)$ can be viewed as a cluster (or clusters) of points that solve a classification problem.

Mapper.
Given a function $h : X \to \mathbb{R}^m$ on a finite data set $X \subset \mathbb{R}^n$, the Mapper method from topological data analysis uses the level sets of $h$ to construct a topological model that shows the clusters in the level sets and how the clusters in adjacent, overlapping level sets intersect. The topological model is a simplicial complex, which is a topological space formed by gluing together vertices, edges, filled triangular faces, solid tetrahedra, and higher dimensional analogues of these convex polytopes according to a few rules about how the gluing is allowed to be done [25]. The Mapper method abstracts ideas from Morse theory, in which a smooth real-valued function $h : M \to \mathbb{R}$ on a manifold $M$ is used to construct a cell decomposition of the manifold.
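The Mapper method for a finite data set $X \subset \mathbb{R}^n$ and a real-valued function $h$ on that data set produces a one-dimensional topological model (i.e., a graph) for $X$ as follows.
(3) Cover the image of $h$ by $\ell$ overlapping intervals $[a_1, b_1], [a_2, b_2], \ldots, [a_\ell, b_\ell]$, where $a_1 = a$, $b_\ell = b$, and $a_{i+1} < b_i$ for all $1 \le i < \ell$. Here $a$ and $b$ are the minimum and maximum values of $h$ on $X$.
(4) Form the level sets $X_i = h^{-1}([a_i, b_i]) = \{x \in X \mid a_i \le h(x) \le b_i\}$ for $1 \le i \le \ell$.
(5) Apply the clustering algorithm to each level set. Let $X_{i,j}$ be the $j$th cluster in the $i$th level set $X_i$.
(6) Construct a graph with one vertex $v_{i,j}$ for each cluster $X_{i,j}$.
(7) Construct an edge connecting vertices $v_{i,j}$ and $v_{i+1,k}$, for all $1 \le i < \ell$ and all $j$ and all $k$, whenever $X_{i,j} \cap X_{i+1,k} \neq \emptyset$. That is, an edge is constructed whenever a pair of clusters $X_{i,j}$ and $X_{i+1,k}$ from adjacent level sets $X_i$ and $X_{i+1}$ have nonempty intersection.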
The resolution of the model changes from coarse to fine as the number of level sets $\ell$ increases. The amount of overlap between intervals $[a_i, b_i]$ and $[a_{i+1}, b_{i+1}]$ determines whether the level sets $X_i$ and $X_{i+1}$ will have nonempty intersection, which in turn determines the number of edges in the graph. When the intervals $[a_i, b_i]$ all have the same length $L$ and the intersection of every pair of adjacent intervals also has the same length $\delta$, the percent overlap is said to be $(100\,\delta/L)\%$.
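The one-dimensional construction can be sketched end to end on a toy data set. The code below is an illustrative sketch under stated assumptions, not the author's Matlab/Octave implementation: the names `make_cover`, `single_linkage`, and `mapper_1d` are hypothetical, single-linkage clustering is realized by cutting at a fixed distance threshold `eps` (an assumed stopping rule), and the test data are deterministic points on three concentric circles standing in for an annulus, with the height coordinate as the function.

```python
import math
from itertools import combinations

def make_cover(a, b, n_intervals, overlap):
    """Cover [a, b] by equal-length intervals; `overlap` is the fraction of
    an interval's length shared with its neighbor (0.5 means 50%)."""
    length = (b - a) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(a + i * step, a + i * step + length) for i in range(n_intervals)]
    cover[-1] = (cover[-1][0], b)  # guard against floating-point shortfall
    return cover

def single_linkage(points, eps):
    """Clusters = connected components of the graph joining points at
    distance <= eps (a single-linkage dendrogram cut at height eps)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def mapper_1d(points, h, n_intervals, overlap, eps):
    values = [h(p) for p in points]
    cover = make_cover(min(values), max(values), n_intervals, overlap)
    vertices = []  # each vertex: (level index, frozenset of point indices)
    for level, (lo, hi) in enumerate(cover):
        idx = [k for k, v in enumerate(values) if lo <= v <= hi]
        for cluster in single_linkage([points[k] for k in idx], eps):
            vertices.append((level, frozenset(idx[c] for c in cluster)))
    # Join clusters from adjacent level sets that share at least one point.
    edges = [(u, v) for u, v in combinations(range(len(vertices)), 2)
             if vertices[v][0] - vertices[u][0] == 1
             and vertices[u][1] & vertices[v][1]]
    return cover, vertices, edges

# Toy annulus: three concentric circles of radius 1.5, 1.75, and 2.
annulus = [(r * math.cos(2 * math.pi * k / 72), r * math.sin(2 * math.pi * k / 72))
           for r in (1.5, 1.75, 2.0) for k in range(72)]
cover, vertices, edges = mapper_1d(annulus, lambda p: p[1],
                                   n_intervals=3, overlap=0.5, eps=0.3)
```

With three intervals and 50% overlap on this toy input, the middle level set splits into a left and a right cluster, so the resulting graph is a cycle that reflects the hole in the annulus. The threshold `eps` has to exceed the spacing between sampled points but stay below the width of the hole, which is a tuning choice specific to this example.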
More generally, for a function $h : X \to \mathbb{R}^m$, the Mapper method constructs a topological space called a simplicial complex, of which a graph is a one-dimensional example. In its full generality, the Mapper method applied to a function $h : X \to \mathbb{R}^m$ results in a simplicial complex with one vertex (or 0-simplex) for every cluster, one edge (or 1-simplex) connecting a pair of vertices whenever 2 clusters from neighboring level sets have nonempty intersection, one triangular face (or 2-simplex) filling the region enclosed by three edges whenever 3 clusters from neighboring level sets have nonempty intersection, one solid tetrahedron (or 3-simplex) filling the region enclosed by four triangles whenever 4 clusters from neighboring level sets have nonempty intersection, and so on. The level sets, and thus the simplicial complex, are determined by the size and shape of the regions used to cover the image of $h$. There are several common ways to cover bounded regions in $\mathbb{R}^m$, such as using rectangles, hexagons, or circular disks in $\mathbb{R}^2$ or boxes or spherical balls in $\mathbb{R}^3$, and different coverings of the image of $h$ will result in different level sets and thus a different simplicial complex. More details on using the Mapper method to produce a simplicial complex from a function $h : X \to \mathbb{R}^m$ with $m > 1$ can be found in the paper by Singh et al. [5].
An example of the Mapper method is given in Figure 1. In this example, the data set $X \subset \mathbb{R}^2$ is a finite set of points randomly selected on an annulus and the function $h : X \to \mathbb{R}$ is the height projection $h(x, y) = y$. Single-linkage clustering was used on each of the $\ell = 3$ level sets $X_1$, $X_2$, and $X_3$, which are the preimages of the intervals $[-2, 0]$, $[-1, 1]$, and $[0, 2]$ with 50% overlap between neighboring intervals. The level set $X_2$ is a disjoint union of two sets $X_{2,1}$ and $X_{2,2}$ which have points with negative and positive $x$-coordinates, respectively. Using single-linkage clustering, each of the sets $X_1$, $X_{2,1}$, $X_{2,2}$, and $X_3$ produces one cluster and thus one vertex in the Mapper model, while each of the nonempty intersections $X_1 \cap X_{2,1}$, $X_1 \cap X_{2,2}$, $X_{2,1} \cap X_3$,

Methods
This section provides a description of the Miller-Reaven diabetes study data, how the data will be analyzed using PCA, the configuration of the neural network, and how the Mapper method will be used to visualize the results.

Case Study: The Miller-Reaven Diabetes Data.
In [13,14,26], Reaven and Miller describe the results obtained by applying the projection pursuit method to data obtained from a diabetes study conducted at the Stanford Clinical Research Center. The diabetes study data consisted of the (1) relative weight, (2) fasting plasma glucose, (3) area under the plasma glucose curve for the three-hour oral glucose tolerance test (OGTT), (4) area under the plasma insulin curve for the OGTT, and (5) steady state plasma glucose response (SSPG) for 145 volunteers in a study of the etiology of diabetes [27]. The goal of the study was to determine the connection between this set of 5 variables and whether patients were classified as overt diabetic, chemical diabetic, or not diabetic.
In the study, 33 patients were diagnosed as overt diabetic, 36 as chemical diabetic, and 76 as not diabetic on the basis of their oral glucose tolerance [13].

Principal Component Analysis.
To establish a basis for comparison, the Miller-Reaven diabetes study data will be analyzed using principal component analysis (PCA) to project the data from $\mathbb{R}^5$ to $\mathbb{R}^3$. PCA is a variance-maximizing projection of the data onto a set of orthonormal basis vectors [28][29][30]. As PCA is a linear projection, some of the lower variance content of the data will be lost when the dimensionality of the data is reduced. Also, since PCA identifies vectors along which the variance (or spread) of the data is greatest, it is sensitive to outliers.
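The variance-maximizing idea behind PCA can be illustrated with a short sketch that finds only the first principal direction. It uses power iteration on the sample covariance matrix, which is a technique swapped in here for brevity and is not necessarily how the PCA in this study was computed; the function name `first_principal_component` and the toy data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, variance along it, fraction of total variance)
    for the leading principal component, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration; the all-ones start is an arbitrary choice and would
    # fail only if it were orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, variance, variance / total

# Toy data lying exactly on the line y = 2x.
direction, variance, fraction = first_principal_component(
    [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

For data lying on a single line, the leading direction is parallel to that line and accounts for all of the variance. For real data, the fraction returned here corresponds to the kind of percentage of total variance that a table of principal values reports.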

Neural Networks and Mapper.
The general method for analyzing the Miller-Reaven diabetes study data with a neural network and Mapper is as follows.
First, the Miller-Reaven diabetes study data were preprocessed by normalizing each of the five data inputs by computing $z$-scores. Since this normalization is an invertible affine transformation, it has no effect on the neural network's ability to solve the classification problem. The target output values for the neural network were set to −1 for overt diabetic, 0 for chemical diabetic, and 1 for not diabetic. A generalizable neural network was constructed by using 67% of the data for training and holding out 33% for testing, and these sets were stratified so that each class (overt, chemical, and not diabetic) appeared in the same proportion as in the entire data set. No extra measures were deemed necessary to denoise the Miller-Reaven data set before constructing a neural network.
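The preprocessing just described, standardizing each input and making a stratified 67%/33% split, might look like the following sketch. The helper names `z_scores` and `stratified_split` are hypothetical, the use of the sample standard deviation and of per-class rounding are assumptions (the text does not specify either), and only the class sizes 33, 36, and 76 are taken from the study.

```python
import random
import statistics

def z_scores(column):
    """Standardize one input variable to mean 0 and unit standard deviation.
    The sample standard deviation is an assumption made for this sketch."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mu) / sd for v in column]

def stratified_split(labels, train_fraction=0.67, seed=0):
    """Split indices so that each class keeps roughly the same proportion
    in the training and testing sets."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Class sizes from the study: 33 overt (-1), 36 chemical (0), 76 not diabetic (1).
labels = [-1] * 33 + [0] * 36 + [1] * 76
train_idx, test_idx = stratified_split(labels)
standardized = z_scores([1.0, 2.0, 3.0])
```

With per-class rounding this particular split puts 22, 24, and 51 patients of the three classes into the training set; the exact counts used in the study are not stated, so these numbers are a property of the sketch only.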
Second, a feedforward, multilayer perceptron neural network was constructed with 5 input nodes, 4 hidden nodes, and 1 output node, and the method of backpropagation via gradient descent was used for weight updates. Many different numbers of hidden nodes were considered, and four hidden nodes were chosen by using mean square error on the training and testing sets as a criterion for determining whether a neural network underfits or overfits the data. The activation functions chosen were $\sigma_1(x) = \tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$ and $\sigma_2(x) = x$. The weights and biases in the neural network were initialized by random values between −0.5 and 0.5. This study emphasizes visualizing how the clusters in a neural network evolve during the weight update process. Thus, a learning rate of 0.1 was chosen to be small so that as the weights and biases were updated, changes in the topological model produced by Mapper could be observed. Neural network performance was evaluated after every cycle through the training data (i.e., epoch). The training data were the same (i.e., not reselected) from epoch to epoch, and they were presented to the neural network in random order to expedite learning [31]. The implementation of the neural network was written by the author in Matlab/Octave and used the standard backpropagation algorithm by stochastic gradient descent [10, Chapter 11].
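A training loop with this configuration (5 inputs, 4 tanh hidden nodes, 1 linear output, weights drawn from [−0.5, 0.5], learning rate 0.1, samples shuffled each epoch) can be sketched as follows. This is an illustrative Python sketch, not the author's Matlab/Octave code: the function names, the dictionary layout, the squared-error loss, and the synthetic toy data are all assumptions made for the example.

```python
import math
import random

def init_network(n_in=5, n_hidden=4, seed=0):
    """Random weights and biases in [-0.5, 0.5]."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": rng.uniform(-0.5, 0.5),
    }

def forward(net, x):
    """Return (hidden activations, scalar output) with tanh hidden nodes
    and an identity output activation."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]
    return hidden, output

def mse(net, inputs, targets):
    return sum((forward(net, x)[1] - t) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)

def train_epoch(net, inputs, targets, rng, lr=0.1):
    """One pass of stochastic gradient descent over shuffled samples."""
    order = list(range(len(inputs)))
    rng.shuffle(order)
    for k in order:
        x, t = inputs[k], targets[k]
        hidden, output = forward(net, x)
        err = output - t  # gradient of 0.5 * (output - t)^2 w.r.t. output
        for j, h in enumerate(hidden):
            # Backpropagated error for hidden node j (uses W2 before its update).
            delta = err * net["W2"][j] * (1.0 - h * h)
            net["W2"][j] -= lr * err * h
            net["b1"][j] -= lr * delta
            for i, xi in enumerate(x):
                net["W1"][j][i] -= lr * delta * xi
        net["b2"] -= lr * err

# Synthetic stand-in data (NOT the diabetes data): the class depends on x[0].
rng = random.Random(1)
inputs = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(60)]
targets = [-1 if x[0] < -1 / 3 else (1 if x[0] > 1 / 3 else 0) for x in inputs]
net = init_network()
mse_before = mse(net, inputs, targets)
for epoch in range(50):
    train_epoch(net, inputs, targets, rng)
mse_after = mse(net, inputs, targets)
```

Evaluating `mse` on held-out data after each call to `train_epoch` would give the per-epoch training and testing curves of the kind described above.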
Finally, the Mapper method (see Section 2.2) was applied to the neural network to produce a cluster diagram of the level sets in the neural network after several different stages of the weight update process during the formation of the neural network.

Results and Discussion
This section describes the results of analyzing the diabetes data using PCA and a neural network. Also, the PCA results are compared to the Mapper cluster diagrams for the neural network.

Results for Principal Component Analysis.
The results of principal component analysis on the Miller-Reaven diabetes study data for dimension reduction from $\mathbb{R}^5$ to $\mathbb{R}^3$ are shown in Table 1 and Figure 2. The PCA results in $\mathbb{R}^3$ show that the data consist of a large central cluster of nondiabetic patients (red +), and that clusters of patients diagnosed as overt diabetic (blue ∘) or chemical diabetic (green ×) emanate away from the large central cluster in two different directions.
The PCA results show that the classification problem is not entirely linearly separable in $\mathbb{R}^2$ by two lines, but they suggest that it may be possible to construct two planes in $\mathbb{R}^3$ (and thus also in $\mathbb{R}^5$) that separate the data into three categories with a small number of misclassified patients. The PCA results suggest that a neural network which uses a moderate number of separating hyperplanes (i.e., a neural network with one hidden layer and a moderate number of hidden nodes) might be able to solve this classification problem completely. The projections of the PCA results to $\mathbb{R}^2$ shown in Figure 2 show that from left to right there is a progression of diagnoses from not diabetic (red +) to chemical diabetic (green ×) to overt diabetic (blue ∘). The principal values in Table 1 show that almost all of the total variance in the data is captured by the first two principal components, which suggests that the original data set in $\mathbb{R}^5$ could be projected to $\mathbb{R}^2$, as in Figure 2, thereby effectively compressing the data in the three directions in which it has very little variance.

Results for a Neural Network with Mapper.
The performance of the neural network during the weight update process is given in Figure 3. The number of patients misclassified is determined by rounding the output of the neural network to the nearest integer and then counting the number of times the rounded outputs differ from the target outputs. These performance results show that the classification problem can be solved by a neural network for the entire data set. Figure 3 shows that the mean square error (MSE) on the testing set is almost always less than on the training set and that MSE on the testing set rarely increased while the MSE on the training set decreased, which indicates that the neural network did not overlearn the training set. The spikes in Figure 3 likely occur because different classes of input points are very close to each other, and thus small changes in decision boundaries (i.e., separating hyperplanes) for the neural network could lead to sudden changes in the amount of error. The positive performance results in Figure 3 after epochs 12, 32, 57, and 107 indicate four interesting neural networks which misclassified 7, 2, 2, and 0 patients, respectively. The neural network after epoch 12 would be a good choice for a compromise between performance and training time since it had a small number of misclassifications and it trained in only a few epochs. The neural network at epoch 12 had an observed success rate of 138/145 = 95.17%, and thus with 95% confidence the true success rate is between 90.37% and 97.64%. It should be noted that the data set is relatively small, so the true success rate of the resulting neural network has a somewhat large confidence interval.
The results for using $\ell = 3$ and $\ell = 10$ intervals (i.e., level sets) in Mapper are shown in Figures 4 and 5, respectively. These results show how the cluster diagrams in the neural network evolve as the number of weight updates increases. The color of each node (i.e., vertex) indicates the average neural network output value of all of the points in that node. Output values of the neural network are encoded using a color gradient in which dark blue indicates values near −1 (overt diabetic), light blue/green indicates values near 0 (chemical diabetic), and dark red indicates values near 1 (not diabetic). The size of each node is proportional to the number of patients in that cluster, and the number in each node is the number of patients in that cluster. Note that the results in Figures 4 and 5 are free-form cluster diagrams in the sense that the absolute position of each node is not important, but the adjacency of nodes connected by edges is important. Further, chains of nodes connected by edges reveal a partial ordering given by the neural network to patients in different nodes, who are assigned different output values by the neural network.

Discussion.
The Mapper results in Figures 4 and 5 show that the graph is connected until the error becomes very low, at which point it may split into several connected components. With only three level sets and 25% overlap of intervals in Figure 4, there are only a few clusters in the neural network and they each have a large number of patients. In contrast, using ten intervals and 50% overlap in Figure 5 produces a higher resolution picture that displays chains of vertices linked by edges for much of the evolution of the neural network. The chains of vertices in Figures 4 and 5 progress from red (not diabetic) to green (chemical diabetic) to blue (overt diabetic), just as the PCA results in $\mathbb{R}^2$ do in Figure 2.
The large clusters in Figure 5 are useful because they identify homogeneous groups of patients who
Using projection pursuit instead of PCA in [13,14], Miller and Reaven showed that in $\mathbb{R}^3$ this diabetes data looks like a central cluster of nondiabetic patients with two different "flares" of clusters of overt and chemical diabetic patients emanating from this central cluster, which is very similar to the PCA results in Figure 2. This is not surprising since PCA can be viewed as an example of projection pursuit [29]. Further, analysis of the Miller-Reaven data using Mapper with a kernel density estimator in [5], instead of a neural network, also produced a topological model for the data with a central cluster and two "flares" analogous to the projection pursuit results. Examination of the PCA results suggests that while a kernel density estimator might work well for overall shape, it might not be very accurate in differentiating between red (not diabetic) and green (chemical diabetic) in Figure 2 because they are interspersed to some extent. Viewing the projection pursuit and PCA results in $\mathbb{R}^3$ shown in Figure 2 as a central cluster with flares, it would appear that the green (chemical diabetic) is connected to red (not diabetic) which is connected to blue (overt diabetic). However, viewing the PCA results in $\mathbb{R}^2$ shown in Figure 2 suggests that the clusters should be connected to each other in the order red to green to blue, as the neural network has done in many of the cluster diagrams in Figures 4 and 5.
According to Halkidi et al. [34], visualization of a data set is crucial for verifying clustering results. The PCA results in Table 1 indicate that the inputs in $\mathbb{R}^5$ can be projected to $\mathbb{R}^2$ without much variance being lost, so the data is very close to being two-dimensional. Further, the results of projecting the data to $\mathbb{R}^2$ shown in Figure 2 make this data set ideal for the purpose of validating a clustering method by visual comparison. The neural network performance results in Figure 3 show that the neural network was able to solve the Miller-Reaven diabetes classification problem. Visual comparison of the neural network and Mapper results in Figures 4 and 5 with the PCA results in Figure 2 reveals that the cluster diagram and PCA results convey the same information in compressed (i.e., clustered) and noncompressed ways, respectively. Thus, these results serve to validate the cluster diagrams generated by a neural network and Mapper.
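The misclassification count and the 95% confidence interval reported for the epoch-12 network (138 of 145 patients classified correctly) can be checked with a short sketch. The text does not say which interval formula was used; the Wilson score interval is assumed here because it is a standard choice for a binomial proportion and yields bounds that agree with the reported ones. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def count_misclassified(outputs, targets):
    """Round each network output to the nearest integer class label and
    count disagreements with the target labels."""
    return sum(1 for y, t in zip(outputs, targets) if round(y) != t)

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

low, high = wilson_interval(138, 145)
misses = count_misclassified([-0.9, 0.2, 0.6, 1.4], [-1, 0, 0, 1])
```

For 138 successes out of 145, the bounds come out at roughly 90.37% and 97.64%, matching the interval quoted in the results.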

Conclusions
Neural networks and the Mapper method have a symbiotic relationship for solving clustering problems and modeling the solution.The level sets of a neural network can be used to solve a clustering problem for high-dimensional data sets, and the Mapper method can produce a low-dimensional cluster diagram from these level sets that shows how they are glued together to form a skeletal picture of the data set.
Using neural networks and the Mapper method together simultaneously solves the problem that visualizing the level sets of a neural network is difficult for high-dimensional data and the problem that the Mapper method only produces useful results when applied to a function that solves a clustering problem effectively. Together, they combine the efficacy of neural networks at solving clustering problems with the clarity and simplicity of cluster diagrams produced by the Mapper method, thereby making the neural network's solution to the clustering problem much easier to interpret and understand. Further, the Mapper method allows the neural network's solution to a clustering problem to be viewed at different resolutions, which can help with developing a model that shows important features at the right scale.
The results of the case study provide evidence in support of the conclusion that using a neural network to solve a clustering problem and the Mapper method to produce a clustering diagram is a valid means of producing an accurate low-dimensional topological model for a data set. In particular, the most important pattern observed in the scatterplot of the PCA results, which was the progression of classifications from not diabetic (red +) to chemical diabetic (green ×) to overt diabetic (blue ∘) in Figure 2, was also observed at a finer resolution in the cluster diagram for the neural network in Figure 5. Further, the linear chains of nodes connected by edges in the clustering diagrams in Figures 4 and 5 provided a partial ordering on the neural network results that made the results easier to interpret. In order to firmly establish the validity of using a neural network with Mapper for a wide variety of applications, it is evident that in the future this

(1) Choose a real-valued function $h : X \to \mathbb{R}$ on the data set, a clustering algorithm (e.g., single-linkage clustering), and a positive integer $\ell$ for the number of level sets.
(2) Find the image (or range) of the function $h$. Let $a = \min\{h(x) \mid x \in X\}$ and $b = \max\{h(x) \mid x \in X\}$. The image of $h$ is then a finite subset of the interval $[a, b]$.
and $X_{2,2} \cap X_3$ produces one edge in the Mapper model.

Figure 1: Illustration of the Mapper method applied to points on an annulus with $h(x, y) = y$, single-linkage clustering, and $\ell = 3$ level sets. Top row: (a) the annular data set, (b) its level sets, and (c) intersections of adjacent, overlapping level sets. Bottom row: (a) the topological model produced by Mapper, (b) its vertices showing clusters in level sets, and (c) its edges showing adjacent level sets that intersect nontrivially.

(1) Preprocess the data set and divide it into stratified training and testing sets. If necessary, preprocess the data to reduce noise.
(2) Use the training data to construct a neural network $h : \mathbb{R}^n \to \mathbb{R}$, while evaluating the error of the neural network on the testing set to prevent overlearning and overfitting.
(3) Apply the Mapper method (see Section 2.2) to the neural network function $h$ to produce a diagram of the clusters formed by the neural network.

Figure 2: Principal component analysis of the Miller-Reaven diabetes data. The diagnosis is color coded with a red + for not diabetic, a green × for chemical diabetic, and a blue ∘ for overt diabetic.

Figure 4: Mapper visualization of clusters in the neural network for the Miller-Reaven diabetes study data using 3 intervals with 25% overlap. From left to right: clusters in the neural network after 12, 32, 57, and 107 epochs. The neural network classification is color coded with dark blue for overt diabetic (class −1), light blue/green for chemical diabetic (class 0), and dark red for not diabetic (class 1).

Figure 5: Mapper visualization of clusters in the neural network for the Miller-Reaven diabetes study data using 10 intervals with 50% overlap. From left to right: clusters in the neural network after 12, 32, 57, and 107 epochs. The neural network classification is color coded with dark blue for overt diabetic (class −1), light blue/green for chemical diabetic (class 0), and dark red for not diabetic (class 1).

Table 1: Principal values and their percentages of the total variance in the Miller-Reaven data.
Using Mapper to visualize the clusters in the neural network as the neural network develops shows how the clusters in the neural network change as the training data is learned. Mapper was implemented in Matlab/Octave [32] and utilized GraphViz [33] to produce the graphs. The clustering algorithm used for Mapper was single-linkage clustering. The clusters in the level sets were viewed at different resolutions by varying the number of level sets and the amount of overlap between them. Decreasing the number of level sets can be used to reduce sensitivity to noise. The diagram of clusters in the neural network will be validated by visual comparison to the PCA results.