Schizophrenia EEG Signal Classification Based on Swarm Intelligence Computing

Schizophrenia is a serious mental disorder in which people interpret reality abnormally. It causes a combination of extremely disordered thinking, delusions, and hallucinations, and it severely disturbs a person's daily functioning, thinking, and behaviour. In the field of human neuroscience, the analysis of brain activity is an important research area. For general cognitive activity analysis, electroencephalography (EEG) signals are widely used as a low-resolution diagnosis tool, and they are particularly useful for understanding the abnormalities associated with brain disorders such as schizophrenia. In this work, schizophrenia EEG signal classification is performed. First, features such as Detrended Fluctuation Analysis (DFA), Hurst Exponent, Recurrence Quantification Analysis (RQA), Sample Entropy, Fractal Dimension (FD), Kolmogorov Complexity, Hjorth parameters, Lempel-Ziv Complexity (LZC), and Largest Lyapunov Exponent (LLE) are extracted. The extracted features are then optimized to select the best features using four optimization algorithms, namely, Artificial Flora (AF) optimization, Glowworm Search (GS) optimization, Black Hole (BH) optimization, and Monkey Search (MS) optimization, and the selected features are finally classified through a set of classifiers. The best results show that, for normal cases, a classification accuracy of 87.54% is obtained and, for schizophrenia cases, a classification accuracy of 92.17% is obtained, in both cases when BH optimization is utilized with the Support Vector Machine-Radial Basis Function (SVM-RBF) kernel.


Introduction
The cause of schizophrenia is generally not known, but researchers believe that a combination of environmental factors and brain genetics contributes to the development of this disorder [1]. The signs and symptoms usually involve disorganized speech, delusions, impaired functions of organs, and hallucinations [2]. Symptoms generally vary in type and severity over time; sometimes remission of symptoms occurs, and sometimes existing symptoms worsen to a great extent. The symptoms of schizophrenia in teenagers are more or less the same as those in adults, and they include a drop in performance at school, lack of motivation, irritability, depressed mood, withdrawal from friends and family, and trouble sleeping [3]. People affected by schizophrenia generally lack awareness that their difficulty originates from a mental disorder which requires careful medical screening [4]. Suicidal thoughts and behaviour are also very common symptoms of schizophrenia. Certain naturally occurring brain chemicals such as neurotransmitters, when altered or disturbed, may contribute to schizophrenia [5]. Though the precise cause of schizophrenia is not known, the common risk factors are a family history of schizophrenia, birth complications, exposure to toxic elements, malnutrition, and psychotropic drugs which alter the state of mind [6]. If schizophrenia is left untreated, it can result in a plethora of problems such as anxiety disorder, depression, alcohol abuse, social isolation, aggressive behaviour leading to victimization, and financial issues arising from various health and medical problems [7]. There is no sure method to prevent schizophrenia, but following the treatment plan effectively can help in preventing relapses [8]. Schizophrenia is a serious mental illness characterized by relapsing episodes of psychosis.
To determine the neural dynamics of human cognition, EEG recording acts as a sensitive tool because it provides millisecond-level temporal resolution [9]. EEG data are complex and, at the same time, high-dimensional, as they are event-dependent time series [10]. Because EEG signals capture the electrical activity of the brain, they are well suited to analyzing data from schizophrenia patients.
Some of the most common works on schizophrenia EEG signal analysis and classification reported in the literature are as follows. An EEG-dependent nonlinearity analysis technique for schizophrenia diagnosis was proposed by Zhao et al. [11]. The abnormal EEG complexity in patients with schizophrenia and depression was studied by Li et al. [12]. Fractal dimension was used by Raghavendra et al. [13] to analyze the complexity of EEG in schizophrenia patients. Complexity measures and entropy were used for EEG signal classification of schizophrenia and control participants by Sabeti et al. [14]. A preliminary report on the reduced nonlinear complexity and chaos during sleep in first-episode schizophrenia was given by Keshavan et al. [15]. A machine learning-based diagnosis of schizophrenia using combined sensor-level and source-level EEG features was proposed by Shim et al. [16]. A preliminary analysis comparing the EEG nonlinearity in deficit and nondeficit schizophrenia patients was conducted by Cerquera et al. [17]. The nonlinear analysis of EEG in schizophrenia patients with persistent auditory hallucination was performed by Lee et al. [18]. For a schizophrenia patient, the first positive Lyapunov exponent of the EEG signal was estimated by Kim et al. [19]. A magnetoencephalography (MEG) study on the LZC in schizophrenia patients was conducted by Fernandez et al. [20]. A multiscale entropy analysis of abnormal EEG complexity was performed by Takahashi et al. [21]. Spectral EEG abnormality as a diagnostic test for schizophrenia was analyzed by Boutros et al. [22]. An in-depth analysis of the utility of quantitative EEG in unmedicated schizophrenia was conducted by Kim et al. [23]. For schizophrenic and healthy adults, machine learning was used by Johannesen et al. [24] to identify EEG features that help in predicting working memory performance.
A data-driven methodology based on random matrix theory for resting-state EEG signal classification of schizophrenia and control participants was developed by Liu et al. [25]. Deep-learning methods along with random forest and voting classifiers were used by Chu et al. for individual recognition in schizophrenia using resting-state EEG streams [26]. Convolutional Neural Networks (CNNs), with the Pearson Correlation Coefficient (PCC) representing the EEG channel relationships, were used by Naira and Alamo to classify schizophrenic and healthy subjects from EEG signals [27]. The Random Forest machine learning algorithm was used by Zhang to identify and diagnose schizophrenia from EEG signals [28]. A higher order pattern discovery was used to classify the schizophrenia EEG by Sheng et al. [29]. The complexity of the EEG signals in schizophrenia syndromes was analyzed by Kutepov et al. [30]. A fractal-based classification of EEG signals for both schizophrenic and healthy subjects was performed by Namazi et al. [31]. A new approach for EEG signal classification using Linear Discriminant Analysis (LDA) and Adaboost was proposed for schizophrenic and control participants by Sabeti et al. [32]. In this work, schizophrenia EEG signal classification is performed using a combination of features, optimization techniques, and classifiers, and this attempt is the first of its kind in schizophrenia EEG signal classification. The organization of the work is as follows. Section 2 explains the materials and methods, Section 3 explains the feature extraction techniques, Section 4 explains the swarm intelligence computing techniques, Section 5 explains the classification techniques, and Section 6 presents the results and discussion, followed by the conclusion in Section 7.

Materials and Methods
The EEG dataset of 14 healthy subjects and 14 schizophrenic subjects was collected from the Institute of Psychiatry and Neurology, Warsaw, Poland, and the details are explained in [33]. For the subjects, 7 males with an average age of 27.9 ± 3.3 years and 7 females with an average age of 28.3 ± 4.1 years were selected. The standard International 10-20 system was used for the acquisition of the EEG data. The data were recorded with the subjects in a relaxed state and with their eyes closed. The acquired EEG signals were segmented into intervals over which the signal is considered to be stationary. A simplified pictorial representation of the work is given in Figure 1.
Over a duration of 15 minutes, 19-channel EEG signals were obtained. Every EEG channel comprises 225,000 samples, which are then divided into segments of 5000 samples each. Therefore, a data matrix of size [5000 × 45] is formed per channel. For the preprocessing of the EEG signals, Independent Component Analysis (ICA) was utilized in this work.
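The segmentation arithmetic above can be illustrated with a minimal sketch. It assumes that one channel is held as a flat Python list and that segments are consecutive and non-overlapping; the function name `segment_channel` is hypothetical and the data are placeholders, not real EEG.

```python
def segment_channel(samples, segment_length=5000):
    """Split one channel into consecutive, non-overlapping segments."""
    if len(samples) % segment_length != 0:
        raise ValueError("channel length must be a multiple of segment_length")
    return [samples[i:i + segment_length]
            for i in range(0, len(samples), segment_length)]


channel = [0.0] * 225_000            # placeholder samples for one channel
segments = segment_channel(channel)  # 225,000 / 5000 = 45 segments
print(len(segments), len(segments[0]))  # 45 5000
```

Each of the 45 segments corresponds to one column of the [5000 × 45] matrix described in the text.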

Feature Extraction Techniques
Feature extraction is necessary to describe a very large amount of data, as it reduces the amount of resources required to a manageable limit. Once the preprocessing of the EEG signals is conducted, the following features are extracted from the EEG signals:
(i) DFA: to trace the self-similarity characteristics of the EEG signal, DFA [34] is used.
(ii) Hurst exponent: the self-similarity and the predictability of the EEG signal are expressed by the Hurst exponent [35]. A larger magnitude of the Hurst exponent denotes that the EEG signal is smoother and less complicated.
(iii) RQA: to measure the complexity of the EEG signals, the total number of recurrences is evaluated by RQA [36].
(iv) Entropy: to evaluate the irregularity and uncertainty present in the EEG signal, entropy features are used. When the complexity and the variability of the EEG signal increase, the entropy of the EEG signal is higher. Sample entropy [37], Shannon entropy [38], approximate entropy [39], Renyi entropy [40], permutation entropy [41], Kolmogorov-Sinai entropy [42], and fuzzy entropy [43] are the types of entropy usually used for the analysis; in this work, only sample entropy has been extracted.
(v) FD: to compare and analyze the complexity of details in the EEG, FD is used, thereby enabling the detection of EEG signal patterns [44]. The Fractal Dimension of a signal is expressed through the Katz method as follows:
$$\mathrm{FD} = \frac{\log_{10}(H)}{\log_{10}(d)},$$
where H represents the sum of distances between the successive points and d represents the estimated diameter.
(vi) Kolmogorov complexity: the characteristics of the EEG signal are described by this parameter [45]. If the signals are more random, then the description length is also longer.
(vii) Hjorth: to quantify the morphology of the EEG signal, the Hjorth parameters utilized here are complexity, mobility, and activity [46].
(viii) LZC: to assess the repetitiveness of the EEG signals, LZC is used [47]. Lower LZC values show that the signal is more repetitive, whereas higher values show that it is more complex.
(ix) LLE: to assess the degree of chaos present in the EEG signals, the LLE is estimated [48]. If the complexity of the signals is high, then the value of the LLE is also high.
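As an illustration of two of the listed features, the following is a minimal sketch of the simplified Katz fractal dimension, assumed here to be log10(H)/log10(d), and of the three Hjorth parameters in their usual textbook form. The helper names are hypothetical, distances are taken as amplitude differences only (an assumption), and the input is a synthetic sine wave rather than EEG.

```python
import math


def diff(x):
    """First difference of a sequence."""
    return [b - a for a, b in zip(x, x[1:])]


def variance(x):
    """Population variance."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def katz_fd(x):
    """Simplified Katz FD = log10(H) / log10(d); undefined when d == 1."""
    H = sum(abs(v) for v in diff(x))        # sum of distances between successive points
    d = max(abs(v - x[0]) for v in x[1:])   # diameter: farthest sample from the first one
    return math.log10(H) / math.log10(d)


def hjorth(x):
    """Return (activity, mobility, complexity) of a signal."""
    dx = diff(x)
    ddx = diff(dx)
    activity = variance(x)
    mobility = math.sqrt(variance(dx) / activity)
    complexity = math.sqrt(variance(ddx) / variance(dx)) / mobility
    return activity, mobility, complexity


signal = [10.0 * math.sin(0.3 * i) for i in range(200)]  # synthetic signal, not EEG
fd = katz_fd(signal)
activity, mobility, complexity = hjorth(signal)
ramp_fd = katz_fd([float(i) for i in range(100)])  # a straight line has FD = 1
```

Because this simplified form depends on the amplitude scale of the signal, a practical implementation would typically normalize the input first.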
The feature extraction starts with DFA, the first of the nine features. The feature matrix attained per method per channel is of the form [5000 × 10]. Then, four types of optimization procedures, namely, AF optimization, GS optimization, BH optimization, and MS optimization, are utilized to further extract a better-represented feature column matrix of size [5000 × 1]. This procedure is repeated for all the channels among the subjects. The same procedure is then repeated for the other eight features, namely, the Hurst exponent, RQA, entropy, fractal dimension, Kolmogorov complexity, Hjorth, LZC, and LLE, and thus, the feature extraction is performed for each data segment.

Swarm Intelligence Computing Techniques
To determine and understand a certain subset of initial features, feature selection is required. The features which are selected will usually have the useful and most relevant information from the original data so that, using the reduced form of representation, the desired task can be performed easily. The following four optimization techniques are utilized in this work.

Artificial Flora Algorithm.
Developed by Cheng et al. [49], this algorithm comprises four basic elements: the original plant, the offspring plant, the location of the plant, and the distance of propagation. The plants that are used to spread seeds are called original plants. The seeds of original plants are called offspring plants, and at that moment, they cannot spread seeds. Plant location is the specific location of the plant, and the distance of propagation refers to how far a seed can spread. Three major behaviour patterns are present here: evolution behaviour, spreading behaviour, and select behaviour. Evolution behaviour refers to the probability that the plant evolves to adapt to the environment. Spreading behaviour refers to the movement of seeds, and select behaviour refers to the survival or death of the flora due to the environment. The main rules here are as follows:
Rule 1. Owing to environmental or other external factors, a species is randomly distributed in a region where no such species was found earlier, and so, it becomes the primitive original plant.
Rule 2. As the environment changes, the plants adapt to live in the changed environment. Therefore, the propagation distance of the offspring plants is not completely inherited from the parent plant.
Rule 3. When the seeds are spread around the original plant autonomously, the range is a circle whose radius is the maximum propagation distance. The offspring plants can be distributed anywhere in the circle.
Rule 4. The adaptability of the plants to the environment is referred to as fitness. In other words, the survival probability of a plant in a specific position is termed fitness. If the fitness is higher, then the probability of survival is greater.
Rule 5. The survival probability is lower if the distance from the original plant is greater because the difference between the current environment and the previous environment is much greater.
Rule 6. If the seeds are spread by external means, then the spread distance cannot exceed the maximum limit area because of other constraints.

Evolution Behaviour.
The original plant spreads seeds around itself within a circle whose radius is the propagation distance. This distance evolves from the respective propagation distances of the parent plant and the grandparent plant and is represented as
$$d_h = d_{1h} \times \mathrm{rand}(0, 1) \times c_1 + d_{2h} \times \mathrm{rand}(0, 1) \times c_2,$$
where $d_{1h}$ is the propagation distance of the grandparent plant, $d_{2h}$ is the propagation distance of the parent plant, $c_1$ and $c_2$ are the learning coefficients, and rand(0, 1) denotes an independent uniformly distributed number in (0, 1).
The new grandparent distance of propagation is expressed as
$$d'_{1h} = d_{2h}.$$
The standard deviation between the respective positions of the original and offspring plants is the new parent propagation distance and is given as
$$d'_{2h} = \sqrt{\frac{\sum_{i=1}^{N} \left(L_{i,h} - L'_{i,h}\right)^2}{N}},$$
where N is the number of positions over which the deviation is taken.

Spreading Behaviour.
The original plants are first distributed at random as
$$L_{i,h} = \mathrm{rand}(0, 1) \times d \times 2 - d,$$
where the maximum limit area is represented by d, and rand(0, 1) is an array of random numbers that have a uniform distribution in (0, 1). The position of the offspring plant is then generated with the help of the following propagation function:
$$L'_{i,h \times m} = D_{i,h \times m} + L_{i,h},$$
where m is the number of seeds that one plant can propagate, $L'_{i,h \times m}$ denotes the offspring position, $L_{i,h}$ is the original plant position, and $D_{i,h \times m}$ is a random number drawn from a Gaussian distribution with zero mean and variance $d_h$. If no offspring plant survives, then a new plant is generated using the random distribution equation given above.

Select Behaviour.
The survival probability is used to assess whether the offspring plants are alive or not and is represented as
$$p = \left| \sqrt{\frac{F\left(L'_{i,h \times m}\right)}{F_{\max}}} \right| \times P_y^{(h \times m - 1)},$$
where $P_y^{(h \times m - 1)}$ is $P_y$ to the power of (h × m − 1) and $P_y$ is the selective probability. This value has to be between 0 and 1, and in our experiment, it is 0.5. When the offspring plant is farther from the original plant, its fitness is lower. $P_y$ determines the exploration ability of this algorithm; for problems that are prone to local optima, $P_y$ should be larger. The maximum fitness in the flora is denoted by $F_{\max}$, and the fitness of the h-th solution is denoted by $F(L'_{i,h \times m})$. To decide whether the offspring plant is alive or not, the roulette wheel selection method is used in our work. The procedure is explained in Pseudocode 1.
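The evolution, spreading, and select behaviours described above can be sketched as one generation of the algorithm. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `af_generation`, the values of `c1`, `c2`, and `m`, the positive "larger is better" fitness, and the use of a running seed index as the exponent of the selective probability are all assumptions. Only `p_y = 0.5` is taken from the text.

```python
import math
import random


def af_generation(plants, d1, d2, fitness, m=3, c1=0.75, c2=1.25,
                  p_y=0.5, rng=random):
    """One Artificial Flora generation over a list of plant positions."""
    # Evolution behaviour: propagation distance from the grandparent (d1)
    # and parent (d2) distances.
    d = d1 * rng.random() * c1 + d2 * rng.random() * c2
    # Spreading behaviour: every plant scatters m seeds with a zero-mean
    # Gaussian offset of variance d (standard deviation sqrt(d)).
    offspring = [[x + rng.gauss(0.0, math.sqrt(d)) for x in plant]
                 for plant in plants for _ in range(m)]
    scores = [fitness(o) for o in offspring]   # assumed positive
    f_max = max(scores)
    # Select behaviour: survival probability of each seed; the exponent k
    # is the zero-based running index of the seed (an assumption).
    probs = [math.sqrt(s / f_max) * p_y ** k for k, s in enumerate(scores)]
    # Roulette-style draw: a seed survives if its probability beats a
    # uniform random number.
    survivors = [o for o, p in zip(offspring, probs) if p > rng.random()]
    return survivors, probs, d


def demo_fitness(position):
    """Hypothetical fitness with its maximum at the origin."""
    return 1.0 / (1.0 + sum(v * v for v in position))


rng = random.Random(0)
plants = [[1.0, 1.0], [-2.0, 0.5]]
survivors, probs, d = af_generation(plants, 1.0, 1.0, demo_fitness, rng=rng)
```

If `survivors` comes back empty, the text above calls for generating a new plant at random; that step is left out of the sketch for brevity.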

Glowworm Swarm Optimization Algorithm.
In this algorithm, glowworms communicate their position and exchange information by sending out short rhythmic beams of light [50]. The main intention of GS optimization is to find the brightest (flaring) neighbour within a particular searching scope. The glowworms always move from their initial position to a better position and finally to an extreme value point. The attraction of a glowworm is closely related to its brightness. The attractiveness of an individual glowworm is inversely proportional to the distance between the two individuals and directly proportional to its brightness. The position of each individual glowworm corresponds to an objective function value. The individual search scope is defined by the dynamic decision domain. The individual movement is then updated step by step in the procedure given below.
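The movement rule described above can be sketched as a single iteration of a glowworm swarm in its commonly published form. The sketch is an assumption-laden illustration: the function name `gso_step`, the decay and enhancement constants `rho` and `gamma`, and all default values are hypothetical and are not taken from this work.

```python
import math
import random


def gso_step(pos, fluo, radius, objective, st=0.03, beta=0.08, i_n=5,
             q_st=3.0, rho=0.4, gamma=0.6, rng=random):
    """One glowworm iteration: fluorescein, movement, decision domain."""
    n = len(pos)
    # Fluorescein update from the objective value at each position.
    fluo = [(1 - rho) * f + gamma * objective(p) for f, p in zip(fluo, pos)]
    new_pos, new_radius = [], []
    for j in range(n):
        # Neighbours: brighter individuals inside the decision domain of j.
        nbrs = [h for h in range(n) if h != j and fluo[h] > fluo[j]
                and 0 < math.dist(pos[h], pos[j]) < radius[j]]
        moved = list(pos[j])
        if nbrs:
            # Pick a neighbour with probability proportional to the
            # fluorescein difference, then take one step towards it.
            weights = [fluo[h] - fluo[j] for h in nbrs]
            h = rng.choices(nbrs, weights=weights)[0]
            gap = math.dist(pos[h], pos[j])
            moved = [a + st * (b - a) / gap for a, b in zip(pos[j], pos[h])]
        new_pos.append(moved)
        # Decision domain update, clipped to [0, q_st].
        new_radius.append(
            min(q_st, max(0.0, radius[j] + beta * (i_n - len(nbrs)))))
    return new_pos, fluo, new_radius


def demo_objective(p):
    """Hypothetical objective with its maximum at the origin."""
    return -(p[0] ** 2 + p[1] ** 2)


positions = [[0.0, 0.0], [1.0, 0.0]]
new_pos, fluo, rad = gso_step(positions, [0.0, 0.0], [2.0, 2.0],
                              demo_objective)
```

In this two-glowworm example the individual at the origin is the brighter one, so it stays in place while the other takes one step of length `st` towards it.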

Procedure
(i) Parameter initialization: i individuals are initially placed at random within a feasible region. $f_0$ denotes the fluorescein value, $q_0$ indicates the dynamic decision domain, st indicates the step, $i_n$ expresses the domain threshold, the update coefficient of the domain is expressed as β, $q_{st}$ denotes the maximum search radius, and the iteration number is expressed as n.
(ii) The fluorescein value $f_j(n)$ of each individual j is updated from the objective function value at its position, where $y_j(n)$ denotes the position of individual j at instant n.
(iii) Within each decision domain $q_d^j(n)$, the individuals with a higher fluorescein value are selected, thereby forming the neighbourhood set $I_j(n)$. Therefore,
$$I_j(n) = \left\{ h : \left\| y_h(n) - y_j(n) \right\| < q_d^j(n),\ f_j(n) < f_h(n) \right\}.$$
(iv) The probability that individual j moves towards a neighbour h is expressed as
$$p_{jh}(n) = \frac{f_h(n) - f_j(n)}{\sum_{k \in I_j(n)} \left( f_k(n) - f_j(n) \right)},$$
where h is chosen according to $p_{jh}(n)$.
(v) The update of the position of individual j is expressed as
$$y_j(n + 1) = y_j(n) + st \times \frac{y_h(n) - y_j(n)}{\left\| y_h(n) - y_j(n) \right\|}.$$
(vi) The update of the dynamic decision domain is expressed as
$$q_d^j(n + 1) = \min\left\{ q_{st}, \max\left\{ 0, q_d^j(n) + \beta \left( i_n - \left| I_j(n) \right| \right) \right\} \right\}. \tag{12}$$

Black Hole Algorithm.
The algorithm is inspired by the black hole phenomenon [51]. In this method, at each iteration, the best candidate is chosen as the black hole and the other candidates are treated as normal stars. The black hole is one of the actual candidates of the entire population; it is not created randomly. The movement of a candidate towards the black hole is determined by its current location and a randomly generated number. The step-by-step details of the algorithm are as follows:
(i) It is a population-based algorithm. Therefore, a randomly generated population of candidate solutions (stars) is placed in the search space of some function.
(ii) The fitness values of the population are evaluated after the initialization process.
(iii) The best candidate in the population is the one which has the highest fitness value, and it is assumed to be the initial black hole.
(iv) Meanwhile, the rest of the candidates form the normal stars.
(v) The specialty of the black hole is that it has the capacity to absorb the stars around it.
(vi) Once the stars and the black hole are initialized, the absorption of the stars by the black hole takes place, and therefore, all other stars now move towards the black hole.
(vii) The absorption of stars by the black hole is formulated as follows:
$$Y_j(t + 1) = Y_j(t) + \mathrm{rand} \times \left( Y_{cl} - Y_j(t) \right), \quad j = 1, 2, \ldots, S.$$
Here, $Y_j(t)$ and $Y_j(t + 1)$ are the locations of the star j in iterations t and t + 1, respectively, $Y_{cl}$ is the location of the black hole, and rand is a random number in [0, 1]. The radius of the black hole is computed as
$$R = \frac{G_{cl}}{\sum_{j=1}^{S} G_j},$$
where $G_{cl}$ represents the black hole fitness value, $G_j$ is the fitness value of the star j, S denotes the number of stars, which indicates the size of the population of candidate solutions, and R denotes the black hole radius. A candidate collapses when the distance between the best candidate (black hole) and the candidate solution is less than R. In such a case, a new candidate is created and randomly distributed in the search space, which yields the optimized and best values.
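The black hole steps above can be sketched as one iteration over a small population. The sketch follows the commonly published form of the algorithm and rests on assumptions: the function name `black_hole_step`, a positive fitness where larger is better, one random number per star, and a box-shaped search space given by `bounds` are all hypothetical choices.

```python
import math
import random


def black_hole_step(stars, fitness, bounds, rng=random):
    """One Black Hole iteration; returns the new stars, the index of the
    black hole, and its radius."""
    fit = [fitness(s) for s in stars]                 # assumed positive
    cl = max(range(len(stars)), key=lambda j: fit[j])  # best candidate
    black_hole = list(stars[cl])
    R = fit[cl] / sum(fit)                             # black hole radius
    low, high = bounds
    new_stars = []
    for j, star in enumerate(stars):
        if j == cl:
            new_stars.append(black_hole)
            continue
        # Move the star towards the black hole by a random fraction.
        r = rng.random()
        moved = [y + r * (b - y) for y, b in zip(star, black_hole)]
        # A star that crosses the radius is absorbed and replaced by a
        # new random candidate in the search space.
        if math.dist(moved, black_hole) < R:
            moved = [rng.uniform(low, high) for _ in star]
        new_stars.append(moved)
    return new_stars, cl, R


def demo_fitness(position):
    """Hypothetical fitness with its maximum at the origin."""
    return 1.0 / (1.0 + sum(v * v for v in position))


rng = random.Random(1)
stars = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]]
new_stars, cl, R = black_hole_step(stars, demo_fitness, (-5.0, 5.0), rng=rng)
```

Calling the function repeatedly re-selects the black hole at every iteration, so a star that moves to a better location than the current black hole takes over its role in the next call.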

Monkey Search Algorithm.
Based on the mountain-climbing procedure of monkeys, the MS algorithm is one of the recently developed metaheuristic algorithms [52]. Initially, the population size of the monkeys is defined. Each monkey's specific position is denoted by a vector $a_j$, which is initialized randomly. The monkey positions are changed with the help of a step-by-step climbing process. A well-known recursive optimization algorithm called Simultaneous Perturbation Stochastic Approximation (SPSA) was used to design the climb process in the MS algorithm. The objective function value is improved by the climbing process. Once a monkey arrives on top of a mountain after the climbing process, it searches for points higher than its current position. With its sharp and keen eyesight, a monkey can easily shift its base to a higher point by jumping. The maximum distance watched by a monkey is determined by its eyesight. The position is then updated. Next, the monkeys start to search novel search domains by utilizing the current position as a pivot. This process is termed the somersault process, and it helps the monkeys to reach a new position. The objective function values are evaluated, and the process is stopped once the required number of iterations is reached. The main process of this algorithm is as follows:
(1) Representation of a solution: the population size M of the monkeys is defined initially. For any monkey j, the position $a_j = (a_{j1}, a_{j2}, \ldots, a_{jn})$ represents a solution of the optimization problem with n dimensions.
(2) Initialization: the initial solution of the monkeys is generated in a random manner.
(3) Climbing process:
(i) A vector $\Delta a_j = (\Delta a_{j1}, \Delta a_{j2}, \ldots, \Delta a_{jn})$ is generated randomly, where $\Delta a_{jk}$ is set as s, and s denotes the step length of the climb process. The vector with components
$$g'_{jk}(a_j) = \frac{g(a_j + \Delta a_j) - g(a_j - \Delta a_j)}{2 \Delta a_{jk}}, \quad k = 1, 2, \ldots, n,$$
is termed the pseudogradient of the objective function g at point $a_j$.
(ii) Assign $z_k = a_{jk} + \alpha\, \mathrm{sign}(g'_{jk}(a_j))$, k = 1, 2, . . . , n, and $z = (z_1, z_2, \ldots, z_n)$.
(iii) Set $a_j \leftarrow z$ provided that z is feasible; otherwise, keep $a_j$ unchanged.
(iv) Steps (i) to (iii) of the climbing process are repeated until the maximum number of allowed iterations $N_q$ is reached or there is little change in the objective function between iterations.
(4) Watch-jump process:
(i) A real number $z_k$ is generated randomly from $(a_{jk} - e, a_{jk} + e)$, k = 1, 2, . . . , n, giving $z = (z_1, z_2, \ldots, z_n)$, where e represents the eyesight of the monkey, which denotes the maximum distance that can be witnessed by a monkey.
(ii) Set $a_j \leftarrow z$ provided that $g(z) > g(a_j)$ and z is feasible. The steps are repeated until an appropriate point z is found or a certain number of watch times has been reached.
(iii) The climb process is repeated by employing z as an initial position.
(5) Somersault process:
(i) From the somersault interval [q, d], a real number θ is generated randomly.
The abovementioned steps are repeated until the stopping criterion is met. The number of iterations is used as the stopping criterion.
Thus, using these metaheuristic algorithms, the optimal and best solutions are explored and exploited from the initial random solutions.
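The climbing process of the Monkey Search algorithm can be sketched as follows. This is a minimal illustration that treats the problem as maximization of an objective g; the function name `climb`, the default step sizes, and the choice of a random sign for each perturbation component (as in standard SPSA) are assumptions rather than details taken from this work.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def climb(a, g, s=0.01, alpha=0.01, n_q=2000, feasible=lambda z: True,
          rng=random):
    """Climb process: follow the sign of an SPSA-style pseudogradient."""
    a = list(a)
    for _ in range(n_q):
        # Random perturbation: each component is +s or -s (assumption).
        delta = [s if rng.random() < 0.5 else -s for _ in a]
        up = g([x + d for x, d in zip(a, delta)])
        down = g([x - d for x, d in zip(a, delta)])
        # Pseudogradient of g at the current position.
        pseudo = [(up - down) / (2 * d) for d in delta]
        # Candidate position one step along the sign of the pseudogradient.
        z = [x + alpha * sign(p) for x, p in zip(a, pseudo)]
        if feasible(z):
            a = z
    return a


def demo_objective(p):
    """Hypothetical objective with its maximum at (1, -2)."""
    return -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)


start = [0.0, 0.0]
end = climb(start, demo_objective, rng=random.Random(0))
```

The watch-jump and somersault processes would wrap around this routine, restarting the climb from each newly accepted position.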

Classification Techniques
The optimized values or the best selected feature values through swarm intelligence computing techniques are finally fed to the classifiers for classification.

ANN.
Here, a three-layer perceptron comprising an input layer, a hidden layer, and an output layer was utilized [53]. The total number of hidden neurons is determined from the number of input neurons A, the number of output neurons B, and a constant d in the range d ∈ [0, 1]. For the ANN classifier, the total number of hidden neurons used here is 50.

QDA.
QDA maximizes the ratio of the between-class variance to the within-class variance [54]. It also allows quadratic decision boundaries between classes so that the classifier can perform the classification more accurately. In this QDA, no shrinkage or regularization is used.

SVM.
SVM is used because of its good scalability and high classification performance [55]. The main idea of SVM is to create a hyperplane that maximizes the margin between the classes, which is obtained by minimizing a cost function so that the maximum classification accuracy is attained. The training vectors that define the hyperplane are known as support vectors. By minimizing the cost function, the SVM obtains the optimal solution that maximizes the distance between the hyperplane and the nearest training point as follows:
$$\text{Minimize } \frac{1}{2} \|w\|^2 + C \sum_{j} \xi_j \quad \text{subject to } z_j \left( w^T x_j + f \right) \geq 1 - \xi_j, \ \xi_j \geq 0,$$
where $w, x_j \in R^2$, $f \in R^1$, and $\|w\|^2 = w^T w$. The tradeoff between the margin and the error is denoted as C. The training error of the j-th sample is measured by $\xi_j$. The class label for the j-th sample is denoted as $z_j$. SVM can be utilized as both a linear and a nonlinear classifier. Various types of kernel functions are utilized to make SVM a nonlinear classifier. The types of kernels generally used are Polynomial, Radial Basis Function (RBF), and Sigmoid kernels. In our work, only the SVM-RBF kernel is used, and this nonlinear SVM is used to get higher classification accuracy.
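To make the role of the RBF kernel concrete, the following minimal sketch evaluates the kernel in its usual form, exp(−γ‖x − y‖²), and the decision value of an already trained kernel SVM. The width parameter `gamma`, the function names, and the toy support vector are assumptions for illustration; no values are taken from this work.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_value(x, support_vectors, labels, alphas, bias, gamma=0.5):
    """Signed score of a trained kernel SVM; its sign gives the class."""
    return sum(a * z * rbf_kernel(sv, x, gamma)
               for sv, z, a in zip(support_vectors, labels, alphas)) + bias


same = rbf_kernel([0.0, 0.0], [0.0, 0.0])   # identical points give 1.0
far = rbf_kernel([0.0, 0.0], [1.0, 1.0])    # equals exp(-1) for gamma = 0.5
score = decision_value([0.0, 0.0], [[0.0, 0.0]], [1], [2.0], -0.5)
```

The kernel value decays with distance, so only support vectors close to the query point contribute noticeably to the decision value.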

Logistic Regression.
The relationship between an independent variable and a response variable is assessed by Logistic Regression (LR) [56]. An observation is classified into one of two classes using Logistic Regression in a very simple manner. It can be used when the variables are nominal or binary. Similar to the Bayesian group, the data are comprehensively analyzed after the discretization process for the continuous variables is performed.

FLDA.
The main intention of Fisher's Linear Discriminant Analysis (FLDA) is to find a direction in the feature space along which the distance between the class means, relative to the within-class scatter described by the within-class scatter matrix $S_W$, reaches a maximum [57]. When this relative distance reaches a maximum, the class separability is maximized. This goal can be achieved by maximizing the following criterion with the between-class scatter matrix $S_B$:

$$J(W) = \frac{W^T S_B W}{W^T S_W W}. \tag{18}$$
To maximize the criterion, the direction w is expressed as
$$w = S_W^{-1} (m_1 - m_2),$$
where $m_1$ and $m_2$ are the means for the two classes. For the two classes, the FLDA acts as a suboptimal classifier when their respective distributions are Gaussian in nature.
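The Fisher direction can be computed directly from the class means and the within-class scatter matrix. The following minimal sketch does this for two-dimensional data, assuming the standard closed form w = S_W⁻¹(m₁ − m₂); the function names and the toy data are hypothetical.

```python
def mean(points):
    """Mean vector of a list of 2-D points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(2)]


def scatter(points, m):
    """2x2 scatter matrix of the points around the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dx, dy = p[0] - m[0], p[1] - m[1]
        s[0][0] += dx * dx
        s[0][1] += dx * dy
        s[1][0] += dx * dy
        s[1][1] += dy * dy
    return s


def flda_direction(class1, class2):
    """Fisher direction w = S_W^{-1} (m1 - m2) for 2-D data."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[r][c] + s2[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


class1 = [[2.0, 0.0], [4.0, 0.0], [3.0, 1.0], [3.0, -1.0]]
class2 = [[-2.0, 0.0], [-4.0, 0.0], [-3.0, 1.0], [-3.0, -1.0]]
w = flda_direction(class1, class2)   # points along the x axis
```

Projecting each sample onto w and thresholding the projection gives the two-class decision.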

KNN.
KNN is one of the well-known nonparametric algorithms utilized for both classification and regression [58]. KNN does not make any assumption about the underlying data distribution. There is also no explicit training phase. During the testing phase, the training data are used directly: the distance between each training instance and the test instance is measured. The class of the test instance is predicted by majority voting among the K nearest training instances. The value of K is set to 4 in our work.
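The majority-voting rule can be sketched in a few lines. The sketch assumes Euclidean distance and resolves a tied vote in favour of the label that appears first among the nearest neighbours; neither choice is stated in the text, and the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=4):
    """Predict the label of query by majority vote of its k nearest
    training instances."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train_x = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
train_y = ["normal"] * 4 + ["schizophrenia"] * 4
near_second_cluster = knn_predict(train_x, train_y, [5.2, 5.1])
near_first_cluster = knn_predict(train_x, train_y, [0.2, 0.1])
```

With an even K such as 4, a 2-2 tie is possible, which is why the tie-breaking rule has to be fixed explicitly.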

Results and Discussion
The data are classified using a 70% training and 30% testing split, and the performance is computed. The experiment was repeated five times to check whether similar results are obtained every time the analysis is done. The mathematical formulae for computing the Performance Index (PI), Sensitivity, Specificity, Accuracy, Good Detection Rate (GDR), and Mean Square Error (MSE) rate are given in the literature, and using the same, the values are computed and exhibited. PC indicates Perfect Classification, FA indicates False Alarm, and MC indicates Missed Classification.
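For orientation, the following sketch shows one set of definitions for these measures in terms of PC, MC, and FA. The formulas are an assumption: they follow a convention found in related EEG classification literature and are not recovered from this text, and the input values are illustrative only. In particular, the split between MC and FA below is chosen for the example and is not reported in this work.

```python
def classification_metrics(pc, mc, fa):
    """Performance measures from perfect classification (pc), missed
    classification (mc), and false alarm (fa), all in percent.
    The definitions are assumed, not taken from this paper."""
    sensitivity = pc / (pc + fa) * 100
    specificity = pc / (pc + mc) * 100
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (sensitivity + specificity) / 2,
        "performance_index": (pc - mc - fa) / pc * 100,
        "gdr": (pc - mc) / (pc + fa) * 100,
    }


def mse(observed, target):
    """Mean square error of the observed values against one target value."""
    return sum((o - target) ** 2 for o in observed) / len(observed)


# Illustrative inputs: pc = 84.34 with an assumed split mc = 15.66, fa = 0.
m = classification_metrics(pc=84.34, mc=15.66, fa=0.0)
error = mse([0.9, 1.1, 1.0], target=1.0)
```

Under these assumed definitions, a perfect classification of 84.34% with no false alarms corresponds to an accuracy of about 92.17%.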
The sensitivity, specificity, accuracy, Performance Index, and Good Detection Rate (GDR) are computed from the classifier outputs. The Mean Square Error (MSE) is expressed as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( O_i - T_j \right)^2,$$
where $O_i$ indicates the observed value at a specific time, $T_j$ denotes the target value at model j, j = 1 to 19, and N is the total number of observations per channel for a single optimization method, which in our case is 45000. The classifiers were trained to a zero training error in terms of MSE.
Computational Intelligence and Neuroscience
Table 1 shows the average statistical parameters such as mean, variance, skewness, and kurtosis of the nine extracted features through the four optimization processes for the normal cases. A higher value of the mean indicates the peaked position of the selected feature. A lower value of the mean indicates that peak and valley points exist in the features. For the Kolmogorov complexity features under AF optimization, a peaked value of the mean is attained. The variance parameter shows the energy component of the feature. Here also, the Kolmogorov complexity under the AF optimization method attains a higher value. Skewness depicts the skewed features of the data points, and all the features in Table 1 indicate the same. The flatness is indicated by higher kurtosis values. The Hurst exponent, fractal dimension, and Hjorth features under AF optimization show higher values of kurtosis.
Table 2 demonstrates the average statistical parameters such as mean, variance, skewness, and kurtosis of the nine extracted features through the four optimization processes for the schizophrenia cases. A higher value of the mean indicates the peaked position of the selected feature. A lower value of the mean indicates that peak and valley points exist in the features, as all the optimization methods made the mean parameter a smaller one among the nine features.
Skewness depicts the skewed features of the data points, and all the features in Table 2 indicate the same. The flatness is indicated by higher kurtosis values. The Kolmogorov complexity under AF optimization and BH optimization shows a higher value of kurtosis.
The correlation between the normal and schizophrenia cases can be established through Canonical Correlation Analysis (CCA), as shown in Table 3. A CCA value greater than 0.5 indicates that the two classes are highly correlated, and a lower value indicates the opposite. In Table 3, the CCA is calculated for the nine features between the normal and schizophrenia cases. As observed from Table 3, the low CCA values indicate nil correlation among the features across the classes. Table 4 exhibits the average PCC with different features for normal and schizophrenia cases. The values in Table 2 indicate a nonlinear relation among the features in the same class of the data. Therefore, the uncorrelated and nonlinear features have to be optimized by the optimization process.
Table 5 shows the CCA of various optimization techniques with different features for normal and schizophrenia cases. It is observed from Table 5 that the low CCA values indicate nil correlation among the features of the two classes. Table 6 depicts the average PCC at various optimization techniques with nine different features for normal and schizophrenia cases. As shown in Table 6, the low value of CCA in the normal cases indicates the presence of nonlinearity in the features. The negative values of PCC in the schizophrenia cases indicate an inverse relation among the features as well as the optimization methods.
Table 7 shows the consolidated results of accuracy among the classifiers at various optimization techniques with different features of normal cases. As indicated in Table 7, the ANN classifier attains a low accuracy with three of the optimization methods, namely, AF, GS, and BH. The poor performance of the ANN is due to overlearning and false alarms in the classifier outputs. The FLDA classifier arrives at a low accuracy with the MS optimization method.
The SVM classifier outperforms all the other classifiers, with the highest accuracy of 87.54% under the BH optimization method. Table 8 presents the consolidated accuracy results of the classifiers for the various optimization techniques with different features for schizophrenia cases. As observed in Table 8, the LR classifier drops to a low accuracy of 78.64% under the BH optimization method. The poor performance of LR is due to its rigid parametric values and the occurrence of missed classifications in the classifier outputs. The SVM classifier outperforms all the other classifiers, with the highest accuracy of 92.17% under the BH optimization method. The MS method shows an even performance, with high accuracy values across the classifiers. Table 9 presents the average perfect classification of the classifiers for the various optimization techniques with different features for normal cases. As observed in Table 9, the LR classifier drops to a low perfect classification of 54.3% under the AF optimization method. The poor performance of LR is due to its nonadaptive parametric values and the occurrence of false alarms in the classifier outputs. The SVM classifier outperforms all the other classifiers, with the highest perfect classification of 75.093% under the BH optimization method. The MS method again shows an even performance, with high perfect classification values across the classifiers. Table 10 presents the average perfect classification of the classifiers for the various optimization techniques with different features for schizophrenia cases. As shown in Table 10, the LR classifier reaches a low perfect classification of 57.29% under the BH optimization method. The poor performance of LR is due to its nonadaptive parametric values and the occurrence of missed classifications in the classifier outputs. The SVM classifier outperforms all the other classifiers, with the highest perfect classification of 84.34% under the BH optimization method.
The MS method shows an even performance, with high perfect classification values across the classifiers. Table 11 depicts the average Performance Index (PI) of the classifiers for the various optimization techniques with different features for normal cases. As shown in Table 11, the LR classifier reaches a low PI of 15.63% under the AF optimization method. The poor performance of LR is due to the occurrence of missed classifications in the classifier outputs. The SVM classifier outperforms all the other classifiers, with the highest PI of 64.6% under the BH optimization method. The PI of the ANN and QDA classifiers is poor across the four optimization methods, owing to the larger number of missed classifications and false alarms in the classifier outputs. Table 12 presents the average Performance Index of the classifiers for the various optimization techniques with different features for schizophrenia cases. As shown in Table 12, the LR classifier reaches a low PI of 24.1% under the BH optimization method. The poor performance of LR is due to the occurrence of missed classifications in the classifier outputs. The SVM classifier outperforms all the other classifiers, with the highest PI of 81.59% under the BH optimization method. Once again, MS optimization handles the PI parameter evenly across the classifiers. Table 13 exhibits the average performance parameters of the classifiers for the various optimization techniques with different features for normal cases. As shown in Table 13, the ANN classifier reaches a low PI of 22.01% and a low PC of 56.7%. FLDA shows a low GDR value of 37.67%. The SVM classifier outperforms all the other classifiers, with higher PC, PI, and GDR values and a lower error rate of 27.4%. In terms of accuracy and error rate, the KNN classifier closely follows the performance of the SVM classifier.
Table 14 depicts the average performance parameters of the classifiers for the various optimization techniques with different features for schizophrenia cases. As shown in Table 14, the LR classifier reaches a low PI of 31.5% and a low PC of 60.67%. LR also shows a low GDR value of 41.9%. The SVM classifier outperforms all the other classifiers, with higher PC, PI, and GDR values and a lower error rate of 18.11%. In terms of accuracy and error rate, the ANN classifier closely follows the performance of the SVM classifier.
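The relation between the performance parameters compared above (PC, PI, GDR, accuracy, and error rate) can be sketched as follows. The formulas in this block are assumed definitions, expressed in terms of perfect classification (PC), missed classification (MC), and false alarm (FA) percentages; they are not restated in this section, so the function `performance_metrics` and its outputs should be read as an illustration rather than as the exact computation behind Tables 7 to 14.

```python
def performance_metrics(pc, mc, fa):
    """Derive summary metrics from PC, MC, and FA, all given in percent.

    Every formula below is an assumed definition used for illustration.
    """
    if pc <= 0:
        raise ValueError("perfect classification must be positive")
    sensitivity = pc / (pc + fa) * 100.0   # penalized by false alarms
    specificity = pc / (pc + mc) * 100.0   # penalized by missed classifications
    return {
        "performance_index": (pc - mc - fa) / pc * 100.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (sensitivity + specificity) / 2.0,
        "good_detection_rate": (pc - mc) / (pc + fa) * 100.0,
        "error_rate": mc + fa,
    }


# Illustrative input: PC of 84.34% with the remaining 15.66% counted
# entirely as missed classification (this split is an assumption).
example = performance_metrics(pc=84.34, mc=15.66, fa=0.0)
```

Under these assumed definitions, the illustrative input yields an accuracy of 92.17%, which is arithmetically consistent with the PC and accuracy values reported for SVM under BH optimization in schizophrenia cases. This is only a consistency check; the actual split between missed classifications and false alarms is not given here.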

Conclusions and Future Work
Disorders in the lobes of the brain can lead to schizophrenia. Because the lobes of the brain are important for information processing and memory management, schizophrenia causes severe damage to these activities. The diagnosis, classification, and analysis of schizophrenia spectrum disorders are quite challenging, and the scientific community is working hard to bring the latest scientific techniques into clinical diagnosis. EEG has emerged as a highly useful tool for brain state interpretation and diagnosis. The method proposed in this work extracts a wide range of features and applies four different optimization techniques before proceeding to classification. The best results show that, for normal cases, a classification accuracy of 87.54% is obtained when BH optimization is used with the SVM-RBF kernel, and for schizophrenia cases, a classification accuracy of 92.17% is obtained when BH optimization is used with the SVM-RBF kernel. In future work, we plan to combine other optimization mechanisms with deep learning techniques for schizophrenia EEG signal classification.

Data Availability
Data will be provided to genuine researchers upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.