A Novel Classification and Identification Scheme of Emitter Signals Based on Ward's Clustering and Probabilistic Neural Networks with Correlation Analysis

The rapid development of modern communication technology makes the identification of emitter signals more complicated. Based on Ward's clustering and probabilistic neural networks with correlation analysis, an ensemble identification algorithm for mixed emitter signals is proposed in this paper. The algorithm consists of two parts: the classification of signals and the identification of signals. First, self-adaptive filtering and the Fourier transform are used to obtain the frequency spectrum of the signals. Then, the Ward clustering method and several clustering validity indexes are used to determine the range of the optimal number of clusters. In order to narrow this scope and find the optimal number of classifications, a sufficient number of samples are selected in the vicinity of each class center to train probabilistic neural networks, which correspond to different numbers of classifications. Then, the classifier of the optimal probabilistic neural network is obtained by calculating the maximum value of a classification validity index. Finally, the identification accuracy of the classifier is effectively improved by using bivariable correlation analysis. Simulation results also illustrate that the proposed algorithms can accurately identify the pulse emitter signals.

The earliest research on the identification of emitter signals began in the 1970s, and it is one of the key technologies in Electronic Warfare systems. At present, emitter identification is mainly based on two modeling techniques: syntactic pattern-based methods and parametric pattern-based methods. In [1], Visnevski proposed a syntactic model to identify multifunction radar (MFR) signals. In this model, MFRs were treated as stochastic discrete event systems that communicate information through radar word level, radar phrase level, and radar sentence level modeling.
The radar word was a fixed arrangement of a finite number of pulses, the radar phrase a series of a limited number of radar words, and the radar sentence a combination of a limited number of radar phrases. His simulation experiments showed that the designed principle was effective for identifying MFRs. Based on the syntactic model of MFRs, Alex Wang and Vikram Krishnamurthy used stochastic context-free grammars to describe the behavior of the MFR system and obtained good results. Although the stochastic context-free grammar is a model that captures the essential features of MFR dynamics [2], it has defects in the estimation of its parameters. Therefore, the expectation maximization algorithm was proposed by L. P. Dai et al. to estimate these parameters, which can then be used to estimate the characteristic parameters of MFRs [3]. From this point of view, the ultimate goal of syntax-based modeling was to find the feature parameters of emitters, which highlights the importance of identification techniques based on the parametric pattern. The feature parameter matching technique is a basic method of this pattern: the measured signal characteristic parameter vector is matched against the corresponding characteristic parameters in a known database (the libraries of radar types).
This method depends on the feature parameter database, so it can only be applied to emitter identification problems with invariant characteristic parameters. Moreover, the libraries of types have an inherent uncertainty resulting inevitably from the data collection methods. Therefore, Jan Matuszewski et al. proposed knowledge-based techniques to identify emitters [4,5]. They argue that information about known radar platforms (knowledge), including position, intent, and recent operational history, plays an important role in the identification of emitters. Their approaches have been successfully used to identify some specific emitters. In order to fully utilize this knowledge, Janusz Dudczyk proposed constructing an Emitter DataBase based on entity-relationship modeling. The entity-relationship diagram was introduced to realize this idea, which pointed out a new direction for the construction of complete and accurate Electronic Intelligence systems [6]. Meanwhile, artificial intelligence techniques and optimized feature selection methods have been used to improve the identification accuracy of emitter signals. In [7], the authors proposed a vector neural network with a supervised learning algorithm for signal classification and emitter identification.
This network takes carrier frequency, pulse width, and pulse repetition interval as inputs to complete the identification. In [8], the authors proposed an identification method for radar signals based on an immune radial-basis function neural network, which improves the convergence speed and performance of the algorithm. In [9], a multichannel recognition system with an independent distance defined on an impartiality condition was proposed to identify specific emitters. By modifying the distance in each special recognition channel, the radio frequency, pulse width, and pulse repetition interval of radar signals can be extracted and classified into an appropriate class. Beyond that, several methods have been proposed to reduce the identification error rate. In [10], wavelet features were used as inputs of neural networks to identify emitters. In [11-13], the support vector machine was introduced to identify emitter signals. In [14,15], fuzzy c-means and probabilistic neural networks (PNN) were used to identify emitter signals. However, determining the optimal number of clusters is a major challenge for these methods, and they cannot effectively improve the identification accuracy. Therefore, Jawad et al. designed a clustering validity function in the hidden-layer output space of the PNN to find the optimal number of clusters. Their method was successfully applied to the classification of land use [16]. However, their way of determining the range of the clustering number was subjective, which may reduce the identification accuracy of the algorithm. To overcome this deficiency and further improve accuracy, a classification and identification scheme of emitter signals based on the Ward clustering method (WCM) and PNN with correlation analysis is proposed in this paper.
Its advantages are presented in three aspects: (1) Self-adaptive filtering, Ward's clustering, and clustering validity indexes are used together to determine the scope of the optimal number of clusters. (2) The classification validity index D is used to find the optimal PNN classifier. (3) Probabilistic neural networks with a bivariable correlation analysis approach are proposed to improve the identification accuracy of emitter signals. The rest of this paper is organized as follows. In Section 2, the classification and identification schemes, including adaptive filtering, frequency spectrum analysis, evaluation indexes, the WCM, and the PNN classifier, are introduced. The flowchart and the pseudocodes of the classification algorithms are designed in Section 3. The flowchart and the pseudocodes of the identification algorithm are given in Section 4, where the identification experiments are also carried out. Comparisons of different schemes are discussed in Section 5. The innovations and applicable conditions of the proposed method are summarized in Section 6.

Classification Model of Emitter Signals
2.1. Self-Adaptive Filtering. In engineering applications, the signal x(n) received at time n usually contains two parts. One is the useful signal s(n), which is what we need: it enables us to understand the properties of the object under study. The other is the interference signal x_1(n), which we do not need and which prevents us from understanding those properties. The actual signal is the combination of the two parts, that is, x(n) = s(n) + x_1(n).
Weakening the interference signal x_1(n) while maintaining or enhancing the useful signal s(n) is an important purpose of signal processing. The usual method is to multiply the frequency spectrum X(f) of the signal x(n) by a frequency function H(f).
This process is called filtering. Its essence is to weaken the interference signal and highlight the useful signal. Currently, the most widely used filters include Kalman filtering, Wiener filtering, median filtering, sequential statistical filtering, wavelet transforms, self-adaptive filtering, etc. In terms of adaptability and filtering performance, one of the best methods is self-adaptive filtering, which was developed on the basis of Kalman, Wiener, and linear filtering. Its most important feature is that it can track the time-varying characteristics of input signals and eliminate the unknown interference contained in them. Self-adaptive filtering based on the least mean square (LMS) algorithm was proposed by Widrow and Hoff, and it has been widely used in many fields because of its simplicity, robustness, and easy implementation. The principle diagram of self-adaptive filtering is shown in Figure 1, which presents the schematic of noise elimination with a self-adaptive filter. The actual signal x(n) contains the interference signal x_1(n) generated from signal channel 1. In order to eliminate it, a noise signal x_0(n), independent of s(n) but correlated with x_1(n), must be sampled from the noise source through signal channel 2. The main function of the self-adaptive filter is to process x_0(n) so that the output y(n) approximates x_1(n). Under the condition of convergence of the filtering algorithm, the output e(n) of the system approximates s(n) when y(n) approaches x_1(n). The iterative formulas of the LMS self-adaptive filtering algorithm are defined as follows [17]:

y(n) = ω^T(n) X(n),
e(n) = d(n) − y(n),
ω(n + 1) = ω(n) + 2μ e(n) X(n),

where d(n) is the desired signal; X(n) = [x(n), x(n − 1), ..., x(n − M + 1)]^T is the input signal vector at time n; M is the length of the filter; and μ is the fixed step size, satisfying 0 < μ < 1/(M P_in), where P_in is the input power of the filter.
ω(n) = [w_0(n), w_1(n), ..., w_{M−1}(n)]^T is the weight vector of the M-order adaptive filter at time n, with ω(0) = 0 at the initial time. y(n) = ω^T(n) X(n) represents the actual output signal of the filter. In noise elimination applications, x(n) is usually used as the desired signal d(n), and x_0(n) is used as the input signal of the filter in order to eliminate x_1(n). After many iterations, the difference between d(n) and y(n) is the estimate of the signal s(n).
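The LMS recursion above can be sketched in a few lines of Python. This is a toy noise-cancellation setup with synthetic signals: the channel coefficients, filter length M, and step size μ are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lms_filter(d, x0, M=8, mu=0.005):
    """LMS adaptive noise canceller (Widrow-Hoff).

    d  : desired signal (the corrupted measurement x(n))
    x0 : reference noise, correlated with the interference x1(n)
    Returns e, the error signal, which converges to the useful signal s(n).
    """
    N = len(d)
    w = np.zeros(M)                          # weight vector, zero at initial time
    e = np.zeros(N)
    for n in range(M, N):
        X = x0[n - M + 1:n + 1][::-1]        # [x0(n), x0(n-1), ..., x0(n-M+1)]
        y = w @ X                            # filter output, approximates x1(n)
        e[n] = d[n] - y                      # error, approximates s(n)
        w = w + 2 * mu * e[n] * X            # LMS weight update
    return e

# toy demo: sinusoid buried in interference passed through a short channel
rng = np.random.default_rng(0)
n = np.arange(4000)
s = np.sin(2 * np.pi * 0.05 * n)             # useful signal s(n)
noise = rng.standard_normal(4000)            # reference noise x0(n)
x1 = np.convolve(noise, [0.5, 0.3], mode="same")   # interference via channel 1
d = s + x1                                   # actual signal x(n)
e = lms_filter(d, noise, M=8, mu=0.005)

err_before = np.mean((d[2000:] - s[2000:]) ** 2)   # interference power
err_after = np.mean((e[2000:] - s[2000:]) ** 2)    # residual after cancellation
```

After convergence, the error signal e(n) is much closer to s(n) than the raw measurement was.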
This algorithm has the advantages of a small amount of computation, easy implementation, and stable performance, but its convergence speed is relatively slow. Therefore, the authors of [18] proposed a variable-step adaptive filtering algorithm based on a bell-shaped function, in which the step size μ(n) is a bell-shaped function of the error e(n) with shape parameters a and b, bounded above by μ_max ∈ (0, 1/(M P_in)), the maximum step size that maintains convergence of the adaptive filtering algorithm. Experiments show that this algorithm can effectively improve the convergence speed and reduce the steady-state error when a = 0.08 and b = 4. In Section 3, this method is used to eliminate the interference signal of emitters.

2.2. The Spectrum of Signals.
In order to classify and identify signals, it is necessary to analyze their frequency spectrum and energy spectrum. From electrical knowledge, P = V²/R, where V represents voltage and R represents resistance. If the resistance R = 1 and V is replaced by the signal x(t), the instantaneous energy is x²(t). Thus, the total energy of the signal can be expressed as ∫_{−∞}^{+∞} x²(t) dt. According to Parseval's theorem, it can be obtained by the following equation:

∫_{−∞}^{+∞} x²(t) dt = ∫_{−∞}^{+∞} |X(f)|² df,   (3)

where X(f) = ∫_{−∞}^{+∞} x(t) e^{−j2πft} dt is the Fourier transform of the signal x(t) and f represents its frequency. |X(f)| is called the amplitude spectrum and arg X(f) the phase spectrum of x(t); together they constitute the frequency spectrum of x(t). |X(f)|² is called the energy spectrum density. Equation (3) indicates that the energy of x(t) is closely related to |X(f)|², and it can be obtained by integrating over (−∞, +∞). Therefore, we can obtain the frequency distribution and energy distribution of each signal by analyzing its spectrum. Then, the key amplitudes and the frequencies of the energy distribution can be further obtained, which makes it possible to observe directly the similarities and differences of paired signals.
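The amplitude spectrum and the discrete form of Parseval's relation can be checked numerically. In this sketch the test tone frequencies are arbitrary choices; only the sampling frequency (1.01 MHz) and the sample length (1024) echo values used later in the paper.

```python
import numpy as np

fs = 1.01e6                               # sampling frequency (as in Section 3)
t = np.arange(1024) / fs
# synthetic two-tone signal standing in for an emitter pulse
x = np.sin(2 * np.pi * 1e5 * t) + 0.5 * np.sin(2 * np.pi * 2e5 * t)

X = np.fft.fft(x)
amp = np.abs(X)                           # amplitude spectrum |X(f)|
phase = np.angle(X)                       # phase spectrum arg X(f)

energy_time = np.sum(x ** 2)              # sum of instantaneous energies x^2(t)
energy_freq = np.sum(amp ** 2) / len(x)   # discrete Parseval's theorem
```

The two energy values agree to floating-point precision, which is the discrete analogue of Equation (3).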

2.3. The Ward Clustering Method. As a hierarchical agglomerative clustering algorithm, the WCM has a wide range of applications [19-21]. It starts by treating each node as a separate cluster; then, at each stage of the algorithm, the pair of clusters with the minimum distance between them is merged. This smallest distance is called the Ward distance and is defined as follows:

d(r, s) = sqrt(2 n_r n_s / (n_r + n_s)) ||x̄_r − x̄_s||,

where r and s represent two distinct clusters, n_r and n_s represent their numbers of data points, x̄_r and x̄_s represent the corresponding cluster centers, and ||·|| is the Euclidean norm. The center and cardinality of the merged cluster are updated according to the following equations:

x̄_r′ = (n_r x̄_r + n_s x̄_s) / (n_r + n_s),
n_r′ = n_r + n_s.
Ward's clustering algorithm has the following steps: Step 1. Each sample point is treated as a cluster. At this time, the sum of squared deviations for each cluster is equal to 0.
Step 2. Every pair of clusters is tentatively merged, and the resulting sum of squared deviations is calculated from Equations (5)-(7). If there are N clusters in total, this must be calculated N(N − 1)/2 times.
Step 3. The two clusters whose merger gives the smallest sum of squared deviations are combined into one class. When the number of clusters is unknown, the method eventually aggregates all sample points into one class.
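The steps above correspond to Ward linkage in standard libraries. A minimal sketch, using synthetic feature vectors as stand-ins for the signal spectra (the cluster centers and sizes are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
# three well-separated synthetic clusters standing in for three emitters
data = np.vstack([rng.normal(0, 0.2, (30, 4)),
                  rng.normal(3, 0.2, (30, 4)),
                  rng.normal(6, 0.2, (30, 4))])

# successive merges of the pair with minimum Ward distance (Steps 1-3)
Z = linkage(data, method="ward")
# cut the resulting dendrogram into 3 clusters
labels = fcluster(Z, t=3, criterion="maxclust")
```

The linkage matrix Z also provides the intercluster distances used to draw the dendrogram in Section 3.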

Computational Intelligence and Neuroscience
If the number of clusters is known, the WCM can be directly used to classify the signal data after removing the noise. Otherwise, the number of clusters can be estimated by analyzing the clustering dendrogram. This, however, is a rather subjective approach, which makes it difficult to find the true number of clusters for a given data set. In recent research, clustering validity indexes, such as the Calinski-Harabasz (CH) index, Gap index, Silhouette (Silh) index, and Davies-Bouldin (DB) index, have been demonstrated to be the best validation tools for determining the optimal number of clusters [22-27].

Calinski-Harabasz Index.
For a given set Y = {y_1, y_2, ..., y_N}, assume that the dimension of each entity y_i is v, i = 1, 2, ..., N. K nonempty disjoint cluster sets S = {S_1, S_2, ..., S_K} around the centroid set C = {c_1, c_2, ..., c_K} can be obtained by minimizing the within-cluster distance W_K:

W_K = Σ_{k=1}^{K} Σ_{y_i ∈ S_k} d(y_i, c_k),   (8)

where d(y_i, c_k) = ||y_i − c_k||², k = 1, 2, ..., K, is the squared Euclidean distance between the entity y_i and the centroid c_k. Then, the CH index is defined as follows [24]:

CH(K) = [(T_1 − W_K)/(K − 1)] / [W_K/(N − K)],

where W_K is defined as in (8) and T_1, the total scatter, can be calculated by

T_1 = Σ_{i=1}^{N} ||y_i − c̄||², with c̄ = (1/N) Σ_{i=1}^{N} y_i.

The CH index reflects the compactness of the clusters by means of the overall within-cluster variance, and the separation degree of the clusters by the overall between-cluster variance. Therefore, a good clustering scheme corresponds to a higher value of the CH index.
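A compact numerical sketch of the CH index as reconstructed above (the two-blob data set and the deliberately bad labelling are illustrative assumptions):

```python
import numpy as np

def ch_index(data, labels):
    """CH(K) = ((T1 - W_K)/(K - 1)) / (W_K/(N - K))."""
    ks = np.unique(labels)
    N, K = len(data), len(ks)
    c = data.mean(axis=0)                                    # overall centroid
    W = sum(((data[labels == k] - data[labels == k].mean(axis=0)) ** 2).sum()
            for k in ks)                                     # within-cluster scatter W_K
    T1 = ((data - c) ** 2).sum()                             # total scatter T_1
    return ((T1 - W) / (K - 1)) / (W / (N - K))

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
good = np.array([0] * 40 + [1] * 40)   # labels matching the true structure
bad = np.array([0, 1] * 40)            # labels ignoring the true structure
```

A labelling that matches the true structure yields a much higher CH value than one that ignores it.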

Silhouette Index.
For each entity y_i, its silhouette value measures how similar y_i is to the points in its own cluster, compared with the points in other clusters. This similarity is captured by measuring the distances between the entity y_i and the points of the different clusters. The silhouette value of the entity y_i ∈ S_k is defined as follows [24]:

s(y_i) = (b(y_i) − a(y_i)) / max{a(y_i), b(y_i)},

where a(y_i) is the average distance from y_i to all other points y_j ∈ S_k, and b(y_i) is the minimum, over the other clusters S_j (j ≠ k), of the average distance from y_i to the points of that cluster. When s(y_i) is close to zero, the entity y_i could equally be assigned to another cluster. A negative value of s(y_i) indicates that the corresponding assignment seriously damages cluster cohesion, and the clustering of y_i is not advisable. y_i is well matched to its own cluster when s(y_i) is close to 1. Finally, the validity of the whole clustering can be quantified by the Silh index, defined as follows:

Silh = (1/N) Σ_{i=1}^{N} s(y_i).

The Silh index can be used with any distance metric, including the Manhattan and Euclidean distances.
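The definition translates directly into code (Euclidean distances; the demo data and labellings are illustrative assumptions):

```python
import numpy as np

def silhouette(data, labels):
    """Mean of s(y_i) = (b - a) / max(a, b) over all entities."""
    N = len(data)
    D = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    s = np.zeros(N)
    for i in range(N):
        same = labels == labels[i]
        same[i] = False                        # exclude y_i itself
        a = D[i, same].mean()                  # a(y_i): own-cluster average distance
        b = min(D[i, labels == k].mean()       # b(y_i): nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
good = np.array([0] * 40 + [1] * 40)
bad = np.array([0, 1] * 40)
```

As expected, the correct labelling scores close to 1, while a labelling that mixes the two blobs scores near zero.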

Davies-Bouldin Index.
A good partition should have a larger intercluster separation degree and stronger within-cluster homogeneity and compactness. The DB index is proposed based on this idea [26]. More concretely, it is constructed from a ratio of within-cluster and between-cluster distances. The DB index is defined as follows:

DB(K) = (1/K) Σ_{k=1}^{K} max_{j≠k} (d_k + d_j) / d_kj,

where

d_k = [ (1/|S_k|) Σ_{y_i ∈ S_k} ||y_i − c_k||^q ]^{1/q}

represents the average distance between each point y_i in cluster k and the centroid of cluster k, and |S_k| is the number of points in cluster k. If q = 1, d_k is the average Euclidean distance from the points in cluster k to the centroid of cluster k. If q = 2, d_k is the standard deviation of the distances of the points in cluster k to the center of cluster k. Replacing k in d_k by j gives d_j. In addition, d_kj can be calculated according to the following equation:

d_kj = [ Σ_{h=1}^{v} |c_kh − c_jh|^p ]^{1/p}.

It represents the distance between the centroids of the kth and jth clusters, where c_kh is the hth component of the centroid of cluster k; d_kj is the Minkowski metric of the centroids characterizing clusters k and j. Specifically, for p = 1, d_kj is the Manhattan distance between centroids; for p = 2, d_kj is the Euclidean distance between centroids. The DB index reflects the degree of within-cluster dispersion and between-cluster separation, so the true number of clusters may be determined according to the minimum value of the DB index.

Gap Index.
Robert Tibshirani et al. proposed the gap statistic method for estimating the number of clusters in a data set [27]. A graph of the within-cluster dispersion versus the number of clusters k shows that the dispersion decreases monotonically as k increases but, from some k onwards, the decrease becomes markedly flatter. Such a position is called an 'elbow', and it often indicates the appropriate number of clusters. The gap criterion estimates the number of clusters by locating this elbow: under this criterion, the optimal number of clusters occurs at the largest gap value. The Gap index is defined as follows:

Gap_N(K) = E_N[log(U_K)] − log(U_K),

where N represents the number of points, K represents the number of clusters being evaluated, and U_K, defined in (17), represents the within-cluster dispersion degree:

U_K = Σ_{k=1}^{K} (1/(2 N_k)) H_k,   (17)

where N_k is the number of points in cluster k and H_k is the sum of the distances between any two points in the kth cluster. The expected value E_N[log(U_K)] is determined by Monte Carlo sampling from a reference distribution. The Gap index can also be used with any distance metric.

The WCM is an unsupervised categorization technique, which can help us find the centroid of each cluster. However, its classification accuracy is limited, so the method cannot be used directly for signal recognition. Comparatively, PNN can effectively improve the accuracy of classification and identification [28,29].
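For completeness, a sketch of the DB index with q = 1 and p = 2 (average Euclidean distance to the centroid, Euclidean distance between centroids); the demo data and labellings are illustrative assumptions:

```python
import numpy as np

def db_index(data, labels):
    """DB(K) = (1/K) * sum_k max_{j != k} (d_k + d_j) / d_kj, q = 1, p = 2."""
    ks = np.unique(labels)
    cents = np.array([data[labels == k].mean(axis=0) for k in ks])
    # d_k: average Euclidean distance of the points of cluster k to its centroid
    d = np.array([np.linalg.norm(data[labels == k] - cents[i], axis=1).mean()
                  for i, k in enumerate(ks)])
    K = len(ks)
    worst = [max((d[i] + d[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(K) if j != i)
             for i in range(K)]
    return sum(worst) / K

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
good = np.array([0] * 40 + [1] * 40)
bad = np.array([0, 1] * 40)
```

In line with the text, the correct partition minimises the index: compact, well-separated clusters give a small DB value, while a partition that mixes the blobs gives a large one.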

Probabilistic Neural Network Classifier.
As a nonparametric Parzen-window estimation method, the PNN was first proposed by Specht. It is a nonlinear classification technique and essentially a parallel algorithm based on the Bayesian minimum risk criterion [30]. Given a sample x to be identified, its posterior probability P(S_k|x) can be obtained by the PNN classifier. However, if the probability densities of the classes to be separated are unknown, training samples with known identity must be used to estimate them. Finally, the trained PNN is used to determine the identity of x. A typical PNN classifier consists of an input layer, a pattern layer (hidden layer), a summation layer, and an output layer. The flowchart of the PNN is shown in Figure 2. The input layer neurons receive values from training samples and send data to the neurons in the pattern layer, which is fully connected to the input layer. The number of neurons in the input layer is equal to the length of the input vector, and the number of neurons in the pattern layer is the same as the number of training samples. Here, all neurons are collected into different groups, and the ith neuron in group k corresponds to a Gaussian function f_i^{(k)}(x, σ), i = 1, 2, ..., m_k, where m_k represents the number of neurons in group k, k = 1, 2, ..., K. The Gaussian function, which also serves as the probability density function, is defined as follows:

f_i^{(k)}(x, σ) = (1 / ((2π)^{v/2} σ^v)) exp(−||x − x_i^{(k)}||² / (2σ²)),   (18)

where v is the dimension of the input vector and x_ij^{(k)} is the jth component of the ith neuron in class k. The so-called smoothing parameter σ ∈ (0, 1), determined experimentally by comparing the corresponding classification accuracies, plays an important role in the estimation error of the PNN classifier. The outputs of the pattern layer are connected to the summation units according to the class of the patterns.
There is one summation neuron for each group, and each neuron in the summation layer sums the outputs of the pattern layer neurons of its group as follows:

p_k(x) = (1/m_k) Σ_{i=1}^{m_k} f_i^{(k)}(x, σ).   (19)

Finally, the output layer neurons output a single 1 and multiple 0s; the value 1 corresponds to the classifier's decision for the input vector. More specifically, the input vector x belongs to class k if p_k(x) > p_{k′}(x) for all k′ = 1, 2, ..., K with k′ ≠ k.
Hence, the main purpose of training the PNN is to find the optimal estimate of the probability density functions from the training samples and their labels, ensuring that the classifier works at the minimum error rate and risk. When the samples to be identified are sent to the pattern layer, the output of each neuron is calculated according to the trained density function, and the identified results are then obtained through the computations in the summation and output layers. Due to the following advantages, the PNN is a wise choice as the further classifier for signals [31]: (1) It has a simple structure and is easy to train; in a PNN based on probability density function estimation, the weights of the pattern layer neurons are taken directly from the input sample values. (2) The training process of the network is simple, and there is no need to retrain for a long time when the number of groups is increased or reduced. (3) It does not easily produce a local optimal solution, and its precision is higher than that of other classification approaches. No matter how complex the classification problem is, as long as there are enough training samples, the optimal solution under the Bayes criterion can be obtained.
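The pattern/summation/output structure described above fits in a few lines. This is a minimal sketch: the class name, the demo data, and the test points are assumptions, and the normalising constant of (18) is dropped since it does not affect the arg max.

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network (Parzen-window classifier)."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma                    # smoothing parameter

    def fit(self, X, y):
        self.classes = np.unique(y)
        # pattern layer: each training sample becomes a neuron of its group
        self.groups = [X[y == k] for k in self.classes]
        return self

    def predict(self, X):
        out = []
        for x in X:
            # summation layer: p_k(x) = mean Gaussian activation of group k
            p = [np.exp(-np.sum((g - x) ** 2, axis=1)
                        / (2 * self.sigma ** 2)).mean()
                 for g in self.groups]
            out.append(self.classes[int(np.argmax(p))])   # output layer: arg max
        return np.array(out)

rng = np.random.default_rng(3)
train = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
clf = PNN(sigma=0.5).fit(train, labels)
pred = clf.predict(np.array([[0.1, -0.2], [4.8, 5.1]]))
```

Note that "training" is just storing the samples, which is why adding or removing a group needs no retraining, as claimed in advantage (2).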

Flowchart and Algorithms of Classification Scheme.
The flowchart of the proposed classification algorithms is shown in Figure 3, which indicates that the proposed scheme is composed of four modules: a data processing module, a preclassification module, an evaluation module, and an accurate classification module. In the evaluation module, the clustering validity indexes are used to determine the range [K_Min, K_Max] of K when it is unknown. For each K ∈ [K_Min, K_Max], the classification validity index D is calculated as follows [16]:

D(K) = (1/N_1) Σ_{j=1}^{N_1} max_{1≤k≤K} p_kj,   (20)

where N_1 is the number of input vectors and p_kj is the element of the K × N_1 matrix Q from the output of the PNN's pattern layer representing the membership of the jth input vector in cluster k. When N_1 = 1, p_k1 is equal to p_k(x) as presented in (19). When N_1 > 1, the matrix Q = (p_kj)_{K×N_1} can be obtained by the PNN. max_{1≤k≤K} p_kj is the largest element of the jth column of Q. Equation (20) indicates that D(K) is a nonlinear function of K when N_1 ≠ Σ_{j=1}^{N_1} max_{1≤k≤K} p_kj. Therefore, the K corresponding to the maximum value of D(K) is the optimal number of clusters K*. The pseudocodes for the case where K is known are listed in Algorithm 1.
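Under one plausible reading of the index from [16] (the mean, over the input vectors, of the largest normalised class membership; the normalisation step is our assumption), D can be computed directly from the membership matrix Q:

```python
import numpy as np

def validity_D(Q):
    """Mean over the N1 inputs of the largest normalised membership.

    Q has shape (K, N1): rows are clusters, columns are input vectors.
    """
    Q = Q / Q.sum(axis=0, keepdims=True)   # normalise each column to unit sum
    return Q.max(axis=0).mean()

# crisp memberships give D = 1; ambiguous memberships give a smaller value
crisp = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
fuzzy = np.array([[0.6, 0.4],
                  [0.4, 0.6]])
```

This matches the behaviour reported in Section 3, where the optimal classifier reaches D = 1 while the others stay below it.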
If the classification number K is unknown, the WCM and the clustering validity indexes are used to determine the range of K. The corresponding pseudocodes are listed in Algorithm 2. The algorithms show that the supervised-learning PNN is used to classify samples; therefore, the training (teaching) samples must be selected first. By the Ward clustering method, a preliminary classification of all samples has already been obtained.
That is, the identities of most samples have been determined, except for some boundary points, which need to be further determined by the trained PNN. Therefore, some labeled proximity points x_kj (k = 1, 2, ..., K, j = 1, 2, ..., J_k) around the center c_k can be selected to train the PNN, where J_k represents the number of samples selected in class k and should be preset, for example J_k = ⌈a · |S_k|⌉ with a ∈ [0.6, 0.8].
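The selection of the ⌈a · |S_k|⌉ points closest to each cluster center can be sketched as follows (the function name and demo data are assumptions for illustration):

```python
import numpy as np

def training_subset(data, labels, a=0.7):
    """Pick the ceil(a * |S_k|) points closest to each cluster centre c_k."""
    X, y = [], []
    for k in np.unique(labels):
        pts = data[labels == k]
        c = pts.mean(axis=0)                              # cluster centre c_k
        J_k = int(np.ceil(a * len(pts)))                  # J_k = ceil(a * |S_k|)
        nearest = np.argsort(np.linalg.norm(pts - c, axis=1))[:J_k]
        X.append(pts[nearest])
        y.extend([k] * J_k)
    return np.vstack(X), np.array(y)

rng = np.random.default_rng(4)
data = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
Xtr, ytr = training_subset(data, labels, a=0.7)
```

Keeping only the points nearest each center excludes exactly the ambiguous boundary points whose identity the trained PNN is later asked to resolve.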

Classification Experiments.
A signal set Rs sampled from several pulse emitters is used to test the effectiveness of the proposed algorithms. Each emitter emits continuous signals in the pulse state. After a period of time, the receiver receives multiple signals from all emitters. These signals are converted into digital signals by an analog-to-digital converter with a sampling frequency of 1.01 MHz. Signal samples y_i (i = 1, 2, ..., 500) are randomly extracted from these digital signals. The signal set is Rs = [y_1, y_2, ..., y_500]^T, and the dimension of each y_i is 1024. Considering that each y_i is disturbed by signals from other emitters, the mean of all signals is used as the noise signal Is = [Is(1), Is(2), ..., Is(1024)], where Is(n) = Σ_{i=1}^{500} y_i(n)/500, n = 1, 2, ..., 1024. First, self-adaptive filtering is applied to process these signals; in this algorithm, the actual signal x(n) corresponds to y_i(n) and the noise signal x_0(n) corresponds to Is(n).
Then, the Fourier transform is used to obtain the amplitude spectrum of all processed signals. The amplitude spectrum of the thirteenth signal is shown in Figure 4. Figure 4(a) shows that the sampled signal contains obvious white noise, which obscures the features of the useful signal. Most of the noise is removed after self-adaptive filtering; moreover, the characteristics of the signals are highlighted, so that their amplitude spectra can be analyzed correctly. For these transformed signals, the clustering dendrogram obtained with the WCM is shown in Figure 5. When the signals are divided into 3, 4, 5, and 6 classes, the intercluster distances are 75.5, 57.5, 27, and 14, respectively; the increments of the distance between clusters are thus 13, 30.5, and 18 in turn. If the numbers of elements in the classes are required to be relatively close, the candidate numbers of classifications are 3, 4, and 5. However, these numbers need to be further examined with the clustering validity indexes. The number of clusters K suggested by the different evaluation indexes is shown in Figure 6: the optimal number of clusters is 3 when the DB index and the Silh index are used, 2 when the CH index is used, and 5 when the Gap index is used. Therefore, the optimal number of classifications should belong to the interval [2,5], and the PNN needs to be used to obtain a more accurate result.
For each K in this interval, seventy (a = 0.7) sample points near each c_k are selected to train the PNN classifiers. Then, the optimal classifier is obtained by computing the maximum value of D(K).
The results show that D(5) = 1, while the other values are less than D(5); that is to say, the optimal number of classifications is 5. Since the size of the matrix Rs is 500 × 1024, three columns are randomly selected as the X-axis, Y-axis, and Z-axis to plot the classification results. Let Ac_k be the matrix whose jth row y_jk = [y_jk(1), y_jk(2), ..., y_jk(1024)] is the jth sample in class k, j = 1, 2, ..., M_k. Thus, Ac = [Ac_1, Ac_2, ..., Ac_5]^T is a matrix of size 500 × 1024. If the data in columns a_1, a_2, a_3 of Ac are selected to form a matrix Ac′ of size 500 × 3, each row of Ac′ is a three-dimensional point in the coordinate system. For a = 0.5 and σ = 1, scatter plots of these data are shown in Figure 7; the first column of Ac′ corresponds to the X-axis, the second to the Y-axis, and the third to the Z-axis. Therefore, all signals can be regarded as coming from five emitters, each emitting 100 signals. Although only three distribution figures of the classified signals are presented in Figure 7, in our experiments we obtained more than 1000 scatter plots drawn by randomly selecting three columns from the matrix Ac, and in these classification results the distributions of the sample sets are similar.

Flowchart and Algorithms of Identification Scheme.
Besides classification, data identification is an important function of the PNN. Since the PNN is based on the maximum posterior probability, it gives the optimal solution under the Bayesian criterion whether or not the samples to be identified, Ix_i, belong to the five classes determined in Section 3. If a sample belongs to one of them, the PNN will identify it accurately; however, it loses this ability when the sample does not belong to them. Therefore, the amplitude spectra of all samples should be analyzed beforehand, so that it can be judged in advance whether they belong to the determined classes. Considering that it is difficult to compare the amplitude spectrum of a signal to be identified with that of every signal in every class, we adopt a curve-fitting method to find the feature sequences of each class and of the signals to be identified. Finally, the correlation degree of these sequences is calculated to obtain preliminary identity information about the samples to be identified. This method is called bivariable correlation analysis, and it is introduced as follows.
Step 1. Simplifying the amplitude spectrum. Let L_i represent the length of the signal Ix_i and Fs_i represent the Fourier transform of Ix_i, from which the sequence F_i is derived. For each training sample x_kl in class k, the corresponding sequence F_kl is obtained by the same method, where k = 1, 2, ..., K*, l = 1, 2, ..., L, i = 1, 2, ..., p; L represents the number of training samples in class k, and p represents the number of samples to be identified.
Step 2. Fitting curves. The fitting curve z_i is obtained by fitting the sequence F_i, and the fitting curve y_kl in class k is obtained by fitting the sequence F_kl.
Step 3. Constructing the feature sequence. First, for each k, y_kl(t), t = 1, 2, ..., T, is calculated after choosing the upper bound T. Then, the tth signal feature of class k is obtained by averaging the fitted curves of its training samples:

Y_k(t) = (1/L) Σ_{l=1}^{L} y_kl(t).

Finally, Cs(k) = [Y_k(1), Y_k(2), ..., Y_k(T)] is the feature sequence of class k. Similarly, the feature sequences Z_i = [Z_i(1), Z_i(2), ..., Z_i(T)] (i = 1, 2, ..., p) of the samples to be identified can be obtained.
Step 4. Correlation analysis. The correlation coefficient between Z_i and Cs(k) is calculated as

r = Cov(Cs(k), Z_i) / (σ_{Cs(k)} σ_{Z_i}),

where Cov(Cs(k), Z_i) is the sample covariance of Cs(k) and Z_i, and σ_{Cs(k)} and σ_{Z_i} are their sample standard deviations. The correlation test indicates that a sample to be identified belongs to the corresponding class when r ≥ 0.95; all samples that do not satisfy this condition for any class should be removed. Finally, the remaining samples can be effectively identified by the trained PNN. The flowchart of the proposed identification scheme is shown in Figure 8. It mainly includes two blocks: a preidentification module and an identification module. The role of the former is to eliminate samples with small correlation to the determined classes. When the correlations between a sample to be identified and several different classes are all high, the PNN identifies it accurately based on the Bayesian criterion. The pseudocodes are given in Algorithm 3, which is called the identification algorithm.
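The correlation gate of Step 4 is just the sample Pearson coefficient with a 0.95 threshold. A minimal sketch (the feature sequences here are made-up toy values, not data from the experiments):

```python
import numpy as np

def pearson_r(cs, z):
    """Sample Pearson correlation between a class feature sequence Cs(k)
    and a candidate signal's feature sequence Z_i."""
    cs, z = np.asarray(cs, float), np.asarray(z, float)
    return np.cov(cs, z)[0, 1] / (cs.std(ddof=1) * z.std(ddof=1))

cs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy class feature sequence
z_similar = cs * 2.0 + 0.1                 # linearly related: r is 1
z_other = np.array([5.0, 1.0, 4.0, 2.0, 3.0])   # unrelated ordering
```

A sequence that tracks the class profile passes the r ≥ 0.95 gate and goes on to the PNN; an unrelated one is removed in the preidentification module.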
Through the methods proposed in Section 3, all signal samples have been classified into five classes. That is to say, the identity of each sample has been determined.
Thus, the identification algorithm consists of two parts. On the one hand, some proximity points x_kl around the center c_k of class k are selected to train the PNN, k = 1, 2, …, K*, l = 1, 2, …, L; x_kl is the input vector of the network, and its label is the output of the system. On the other hand, the trained PNN classifier is used to determine the identity of Ix_i, i = 1, 2, …, p. If the dimension of Ix_i is greater than that of x_kl, the method of adding a time window can be used to adjust their dimensions. Let D_x represent the dimension of x_kl and D_Ix represent the dimension of Ix_i. The time window Ts = ⌈D_Ix/D_x⌉ is used to reduce the value of D_Ix. Thus, the adjusted sample to be identified is Ĩx_i = [Ix_i(1), Ix_i(1 + Ts), …, Ix_i(1 + (D_x − 1)Ts)]. When D_Ix < D_x, the same method can be used to reduce the dimension of x_kl and make it consistent with D_Ix; in this case, the roles of D_Ix and D_x are exchanged.
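The time-window adjustment can be sketched as below (0-based indexing; the clipping of the last index is an added safeguard for lengths that are not exact multiples of the target dimension):

```python
import math
import numpy as np

def adjust_dimension(sample, target_dim):
    """Reduce a long sample to target_dim points via the time window
    Ts = ceil(len(sample) / target_dim), keeping every Ts-th point."""
    Ts = math.ceil(len(sample) / target_dim)
    idx = np.arange(target_dim) * Ts
    idx = np.minimum(idx, len(sample) - 1)  # guard against index overshoot
    return np.asarray(sample)[idx]

# D_Ix = 10240 and D_x = 1024 give Ts = 10, the setting used in the experiments.
out = adjust_dimension(np.arange(10240), 1024)
```

When D_Ix < D_x, the same function can be applied to the training samples instead, with the two dimensions exchanged.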

Identification Experiments.
By adjusting some parameters of the emitters, such as pulse width and output power, signals different from y_i (i = 1, 2, …, 500) can be obtained. Suppose that the samples Ix_i (i = 1, 2, 3, 4) are randomly selected from these signals for identification and that the dimension of each Ix_i is 10240. x_kl represents the lth training sample in class k, and Bx_kl is the label matrix. First, the adjusted samples Ĩx_i = [Ix_i(1), Ix_i(11), …, Ix_i(10231)] can be obtained according to the method of adding a time window.
Then, the single-sided amplitude spectrum of the center c_k for class k is shown in Figure 9, k = 1, 2, …, 5. The double-sided amplitude spectra of Ix_i (i = 1, 2, 3, 4) are shown in Figure 10. Finally, the identity of Ix_i (i = 1, 2, 3, 4) is preliminarily determined by observing the similarity between these amplitude spectra.
It is observed that the amplitude spectrum of Ix_1 is similar to that of the signals in class 2 or class 3, and Ix_2 is likely to belong to class 5. However, the classes of Ix_3 and Ix_4 are difficult to determine by observation alone. At this point, it is necessary to carry out the correlation test for these samples. The test results are shown in Table 1 when N_2 = 300 and T = 100. Table 1 shows that r_12 = 0.9704 and r_13 = 0.9945, which are both greater than 0.95. So, Ix_1 might be an element of class 2 or class 3, and Ix_2 should be classified into class 5. These results are consistent with the observations. Ix_3 and Ix_4 should be removed before formal classification. Therefore, it is necessary to identify Ix_1 and Ix_2 by using the PNN. L points are selected as training samples in each class to train the PNN, L = 70. Finally, the trained PNN is used to judge the identity of Ix_1 and Ix_2. Since Ĩx_1 corresponds to Ix_1 and Ĩx_2 corresponds to Ix_2, we can obtain the identity of Ix_1 and Ix_2 by identifying Ĩx_1 and Ĩx_2. The identified results are shown in Figure 11 when σ = 1.
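For reference, a minimal probabilistic neural network (a Parzen-window classifier with Gaussian kernels and smoothing parameter σ) can be sketched as follows; this is a generic PNN for illustration, not the authors' trained network:

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network (Parzen-window classifier)."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma  # smoothing parameter of the Gaussian kernel

    def fit(self, X, y):
        self.X = np.asarray(X, float)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, Z):
        Z = np.atleast_2d(np.asarray(Z, float))
        labels = []
        for z in Z:
            d2 = np.sum((self.X - z) ** 2, axis=1)       # squared distances
            act = np.exp(-d2 / (2.0 * self.sigma ** 2))  # pattern-layer outputs
            scores = [act[self.y == c].mean() for c in self.classes]
            labels.append(self.classes[int(np.argmax(scores))])  # Bayes decision
        return np.array(labels)
```

Training such a classifier on the 70 samples selected near each class center, and then predicting the adjusted samples, mirrors the procedure described above.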
In order to obtain intuitive results, the data in columns 41, 256, and 682 of Ix_1 are selected to form a three-dimensional array [Ix_1(41), Ix_1(256), Ix_1(682)]. Data in the same columns of Ac are selected to form a matrix of size 500 × 3. All data are drawn in Figure 11(a). Obviously, Ix_1 should be an element of class 3 (black solid ball) and Ix_2 should be an element of class 5 (red solid ball). The same results can also be obtained when the data in columns 35, 709, and 929 of Ix_1 and Ac are selected. The corresponding results are drawn in Figure 11(b).

Comparative Experiments
The performance of the proposed algorithm can be reflected through comparative experiments. Therefore, it is necessary to compare it with some common identification methods, such as support vector machine based schemes [32], particle swarm optimization and support vector machine based schemes [12], artificial neural network and intelligent filter based schemes [33], PNN and simplified fuzzy adaptive resonance theory map neural network based schemes [15], and fuzzy c-means based schemes [34]. When the class label of each signal is known, the performance of the above methods can be compared by calculating the classification accuracy and the identification accuracy. The comparison results are shown in Table 2, and the samples that cannot be correctly judged are shown in Figure 12. Table 2 shows that the classification accuracy obtained from Liu's method and ours is 100%, and the identification accuracy obtained from the proposed method also reaches 100%. Figure 12(a) shows that some algorithms fail to give the correct classification results for some signals; in terms of signal classification, the numbers of samples without correct classification for the six methods are 9, 0, 18, 11, 27, and 0, respectively. By comparison, the identification accuracy derived from Zarei's and Cannon's schemes is poor, and they fail to identify samples 3 and 4 (Figure 12(b)). This is mainly because artificial neural networks and the fuzzy c-means algorithm give a judgment result for every signal, whether or not the signal belongs to one of the determined classes. Although the PNN has a similar problem, it can be skillfully solved by the method of bivariate correlation analysis, which gives the proposed method unique advantages in signal identification.

Concluding Remarks
It is an indisputable fact that probabilistic neural networks can be used to classify and identify patterns. They have a wide range of applications, including the identification of emitter signals. In this paper, a novel classification and identification scheme, which is designed by the WCM and the PNN with correlation analysis, has been proposed for emitter signals. The scheme starts with self-adaptive filtering and spectrum analysis; then the WCM and clustering validity indexes including CH, Silh, DB, and Gap are utilized to determine the range of the optimal number of clusters. For each candidate classification number K, 70 samples near each center are selected as training samples to establish PNN classifiers. Finally, the optimal PNN classifier, which is used to identify signals, is determined by the maximum of the classification validity index D(K). At this stage, the method of bivariate correlation analysis is cleverly used to improve the identification accuracy of the PNN classifier. Experiments show that the proposed method obtains higher accuracy and is more stable than other schemes in identification problems.
Finally, it should be pointed out that the scheme presented above is mainly used to classify signals derived from pulse emitters. The classification and identification of signals derived from continuous-wave emitters, or from a mixture of these two types, is the next topic to be studied. In addition, the proposed method can also be used to identify other signals, such as biomedical signals and monitoring signals of digital virtual assets. When the data set to be classified is not numeric, it can first be converted to binary strings and then converted to the required format.

Data Availability
The data involve secrets and need to be kept confidential.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.