Automatic Recognition of Communication Signal Modulation Based on the Multiple-Parallel Complex Convolutional Neural Network

School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan, Hubei 430070, China
School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan 450002, China
MOE Key Laboratory of Image Processing and Intelligence Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China


Introduction
Secure and efficient transmission of information is the basic requirement of wireless communication. In a practical communication system, the baseband signal cannot be transmitted directly because of the spectral characteristics of the channel, so modulation is used to load the symbol information carried by the baseband signal onto the parameters of a sinusoidal carrier, which is then transmitted through the antenna [1]. The choice of modulation type varies with the channel type and occupancy, and signals produced by different modulations exhibit different structural and statistical characteristics in the time-frequency domain. Modulation recognition (MR) refers to the use of mathematical models, such as machine learning models obtained by supervised training, to determine the modulation type of a received signal from a given set of candidate modulation schemes. MR is an important application of pattern recognition in signal processing and is of practical importance in both cooperative and noncooperative communication [2]. As deep learning networks have been widely used in speech recognition, image feature learning, and other fields with many practical results, researchers have introduced deep learning into the modulation pattern recognition of communication signals, hoping to make communication devices capable of self-learning and self-renewal so that they can better cope with the challenges brought by the complex electromagnetic environment and the growing number of modulation patterns [3].
Modulation pattern recognition is one of the key technologies for software radio, communication countermeasures, and illegal spectrum monitoring, and it has important military and civil value. In the military field, future warfare is information-driven and electronic countermeasures are an important part of information warfare; modulation pattern recognition must therefore be considered in receiver design, since it allows the receiver to demodulate intercepted signals automatically. In electronic reconnaissance and jamming, modulation recognition is the key to implementing precise jamming, which in turn disrupts enemy communications. In civil applications, modulation pattern recognition is a key technique for identifying the illegal use of frequency bands, preventing spectrum abuse and interference with normal communications [4]. Traditional modulation pattern recognition methods include maximum likelihood hypothesis testing methods and statistical pattern recognition-based methods; the former have high complexity and poor robustness to model and parameter mismatch, and the performance of the latter depends strongly on manually selected feature parameters. In recent years, machine learning (ML), one of the most important branches of artificial intelligence, has attracted wide interest because it can classify and predict data with excellent recognition performance, and research on ML-based modulation pattern recognition is receiving more and more attention. Therefore, this paper studies deep learning-based algorithms for the modulation recognition of digital communication signals.
This paper focuses on deep learning-based modulation pattern recognition for communication signals. The scheme adopts a multibranch CNN architecture to realize convolutional mapping of the input signal in the complex domain and to carry out the preprocessing steps of signal denoising and channel equalization, which improves the input to modulation recognition. It investigates how the abstract features learned by the CNN, manually designed expert features, and multiple machine learning classification models affect modulation recognition. With the help of a general-purpose software radio platform, a variety of modulated signal sequences are collected, the data sets used for training and testing are established, and the algorithms are designed, coded, and validated with a deep learning framework and software platform. Section 1 is the introduction, which presents the background and significance of the work and gives the research content and structure of the paper. Section 2 covers related work; it describes the state of research and analyzes the advantages and disadvantages of existing techniques for modulation identification, signal denoising, and channel equalization. Section 3 studies communication signal feature processing, explains the specific implementation of the algorithm, and presents the design of the modulation identifier. Section 4 analyzes the results: performance analysis and simulation tests show that the method identifies the modulation patterns of communication signals well at low signal-to-noise ratio, which supports its feasibility and effectiveness. Section 5 summarizes the work and provides an outlook.

Related Work
The maximum likelihood hypothesis testing method based on decision theory theoretically ensures that its decisions are optimal under the Bayesian minimum-cost criterion and can maintain its performance in certain low signal-to-noise ratio environments. However, the method has several main drawbacks. First, it requires considerable a priori knowledge, such as the signal-to-noise ratio, carrier frequency, symbol rate, and oversampling factor. Second, unknown parameters lead to a complex derivation and high computational complexity, which makes the method difficult to implement in practice [5]. Third, its classification function is relatively limited, and a single framework cannot identify all common modulation types. Finally, in a real channel the noise is not necessarily white Gaussian noise, and multipath fading, time delay, Doppler shift, and other effects are present [6]; the maximum likelihood hypothesis testing method is susceptible to these influences and is therefore less robust [7]. Mishra et al. implemented deep learning-based denoising by improving the deep network structure. In [8], the authors used ResNet to remove rain noise from photographs, and the experimental results showed that the model has superior denoising performance [9]. Gupta et al. used a single-layer, stacked LSTM network to identify seven signals: 2ASK, 4ASK, BFSK, 4FSK, BPSK, QPSK, and 16QAM [10]. Patnaik et al. used a CNN and a DBN to identify CPFSK, BPSK, and 16QAM signals [8]. However, such a cascade structure parallelizes poorly, the model is complex, and training is time-consuming. In addition, a time-series signal loses some of its timing features after passing through the convolutional layer [11].
To take full advantage of the temporal and structural features of the signal, this paper studies modulation recognition algorithms that run a CNN and an LSTM in parallel. To further improve recognition performance, an ensemble learning algorithm with heterogeneous base classifiers is also studied [12]. Because deep learning can automatically extract features from large-scale data and learn from them, it has developed quickly and is widely used [13]. Although the design of deep learning parameters still lacks some theoretical justification and has some shortcomings, practice has shown that applying deep learning to the modulation recognition problem is a very effective approach that is worthy of in-depth research and exploration.
Wireless Communications and Mobile Computing

Machine Learning Decision Theory-Based Communication Signal Modulation Pattern Recognition
3.1. Communication Signal Feature Processing. With the increase in computer processing speed and storage capacity, the design and implementation of CNNs has gradually become a development trend [14]. In this section, a convolutional neural network-based method is used to recognize the modulation pattern of a communication signal [15]. First, the received modulated signal is preprocessed by normalization and time-frequency feature image generation to produce the training set and test set required by the network. Second, the CNN classifier for modulation pattern recognition is designed and built, and the training set is input to the CNN for training to obtain the CNN model. Finally, the modulated signal to be identified is preprocessed to generate the test set, which is input into the trained CNN model to identify the modulation pattern of the communication signal. The method takes the time-frequency map as the input and the modulation pattern of the signal as the output; the flow chart of the algorithm is shown in Figure 1. The digital signal (−1, +1) to be transmitted is mapped onto N carriers by the IFFT, and the carriers are superimposed and then sent. For example, suppose the bandwidth is 1000 MHz and the IFFT size is 1024. The 1000 MHz bandwidth is then divided into 1024 parts, which reduces the bandwidth of each part and thereby reduces the ISI. The frequency of the first carrier is 1000 MHz/1024, the frequency of the second carrier is 2 × 1000 MHz/1024, and so on. This step makes the N carriers orthogonal to each other, which eliminates interference between carriers.
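As a rough illustration of the subcarrier mapping described above, the sketch below places ±1 symbols on N orthogonal carriers with an inverse DFT and recovers them with a forward DFT. It is not the authors' implementation: the function names `idft`, `dft`, and `subcarrier_frequencies` are hypothetical, a naive O(N²) transform is used in place of a real IFFT, and a small N is used in the demonstration to keep it fast.

```python
import cmath


def idft(symbols):
    """Naive inverse DFT: superimpose one complex sinusoid per subcarrier."""
    n = len(symbols)
    return [
        sum(symbols[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def dft(samples):
    """Naive forward DFT: project the received samples back onto each subcarrier."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def subcarrier_frequencies(bandwidth_hz, n_subcarriers):
    """Carrier frequencies k * B / N for k = 1..N, following the example in the text."""
    spacing = bandwidth_hz / n_subcarriers
    return [k * spacing for k in range(1, n_subcarriers + 1)]


if __name__ == "__main__":
    bits = [1, -1, -1, 1, 1, 1, -1, 1]      # (-1, +1) symbols, one per subcarrier
    tx = idft(bits)                          # superimposed time-domain signal
    rx = dft(tx)                             # orthogonality lets each symbol be recovered
    print([round(v.real) for v in rx])
    print(subcarrier_frequencies(1000e6, 1024)[:2])
```

Because the carriers are mutually orthogonal over one symbol period, the forward transform separates them again without interference between carriers, which is the property the text relies on.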
Noise and the channel cause the amplitude of the communication signal to vary widely. If the acquired time domain signal is used directly for time-frequency analysis, the resulting time-frequency map is difficult to process. Therefore, the signal has to be normalized [16]. In this paper, the acquired time domain signal is normalized by the min-max method, $x' = (x - x_{\min})/(x_{\max} - x_{\min})$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum amplitudes of the signal. This simply rescales the amplitude of the modulated signal into the interval [0, 1], so that the amplitude values of different modulated signals are relatively close to each other and the effect of channel fading on the signal amplitude is reduced.
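The min-max normalization described above can be sketched in a few lines. This is a minimal illustration using the standard definition of min-max scaling; the function name `min_max_normalize` and the handling of a constant signal are assumptions, not details given in the paper.

```python
def min_max_normalize(samples):
    """Rescale real-valued amplitudes into [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # Assumption: a constant signal carries no amplitude variation, so map it to 0.
        return [0.0 for _ in samples]
    return [(x - lo) / (hi - lo) for x in samples]


if __name__ == "__main__":
    print(min_max_normalize([-2.0, 0.0, 2.0]))
```

Since the scaling depends only on the extreme values of each record, two records of the same signal received at different fading levels end up with comparable amplitude ranges.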
A feature is an abstract representation of an object or a class of objects. Relative to the objects themselves, features use a set of low-dimensional tensors to express the key properties of the original object and are the key to distinguishing multiple objects. The features, together with the training set data, determine the theoretical upper limit of a machine learning task, and the models and algorithms are intended to approximate this limit as closely as possible. Therefore, the selection of features should be centered on the task [17]. For the modulation recognition task, features fall into two kinds: manually designed expert features and abstract features learned by the CNN itself. This section elaborates on and analyzes these two kinds of features [18].
In the convolutional layer, each convolutional kernel can be regarded as a linear system that extracts a certain feature. Before the network is trained, the parameters of this system are unknown, so the weights of the convolutional kernels are initialized randomly. The parameters are then updated by the BP algorithm, which optimizes them by continually reducing the value of the objective function; when training is complete, the system can be used to extract features from the input. The cascade of convolutional layers maps the input signal through successive layers of abstraction to obtain the feature vector needed by the classifier.
In the encoder stage, the signal is mapped by a convolutional layer of the convolutional autoencoder (CAE) and then downsampled by a pooling layer to compress the size of the output feature map. After several such operations it reaches the bottom convolutional layer, whose output can be regarded as an abstraction of the original input signal h(n). In the decoder stage, these abstract features are successively upsampled by deconvolution layers until they reach the top output layer. The goal of the CAE is to make the output consistent with the input, and the computational process can be characterized as "compression before reconstruction" [19].
The result of the time-frequency analysis of the modulated signal is a representation of the signal in the time-frequency plane, which cannot be input directly into the deep neural network model. Therefore, the time-frequency map of the signal must first be converted into a digital image, after which the deep learning algorithm is used to identify the modulation of the signal [20]. A color image is usually converted to gray scale by the maximum value method, the component method, or the average method. The component method uses one of the three components R, G, and B of the color image as the gray value; the maximum value method uses the maximum of the three components as the gray value; the average method uses the average of the three components as the gray value. Different gray scale processing methods produce different gray scale feature images [21]. In this paper, the average method is selected, and the gray value is calculated as $\mathrm{Gray} = (R + G + B)/3$. By generating gray scale feature images of modulated signals at different signal-to-noise ratios, it is found that when the signal-to-noise ratio is low, the modulated signals are more affected by the background noise, which makes them appear more disordered in the gray scale feature images and blurs the feature information somewhat [22]. When the signal-to-noise ratio is high, the modulated signals are less affected by the background noise and their gray scale feature images are clearer and more regular. Comparing the binary feature image with the gray scale feature image shows that the gray scale feature image retains almost all of the original feature information and has a certain noise immunity.
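The three gray scale conversions named above (component, maximum value, and average) can be compared with a short sketch. The image representation, a nested list of (R, G, B) tuples, and the function names are assumptions chosen for illustration only.

```python
def to_gray_average(rgb_image):
    """Average method: gray value is the mean of the R, G, and B components."""
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb_image]


def to_gray_max(rgb_image):
    """Maximum value method: gray value is the largest of the three components."""
    return [[max(pixel) for pixel in row] for row in rgb_image]


def to_gray_component(rgb_image, channel=0):
    """Component method: gray value is one chosen component (0 = R, 1 = G, 2 = B)."""
    return [[pixel[channel] for pixel in row] for row in rgb_image]


if __name__ == "__main__":
    image = [[(30, 60, 90), (255, 255, 255)]]   # a 1 x 2 color image
    print(to_gray_average(image))
    print(to_gray_max(image))
    print(to_gray_component(image, channel=1))
```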
For the noise, Gaussian white noise can be used. The noise in a typical signal falls into two main categories: external noise and the internal noise of the receiver. The noise characteristics of the collected signal are very similar to those of Gaussian white noise, so it is appropriate to model the internal noise of the receiver as Gaussian white noise. As for the noise level, the signal-to-noise ratio is required to be greater than 10 dB in order to ensure a certain detection probability.
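One common way to realize the Gaussian white noise model above is to scale complex Gaussian noise so that a target signal-to-noise ratio is met. The sketch below is an assumption about how such noise could be added, not the procedure used in the paper; `add_awgn`, `signal_power`, and `measured_snr_db` are hypothetical helper names.

```python
import math
import random


def signal_power(samples):
    """Mean squared magnitude of a list of (possibly complex) samples."""
    return sum(abs(x) ** 2 for x in samples) / len(samples)


def add_awgn(samples, snr_db, rng=None):
    """Add complex white Gaussian noise so that the result has the requested SNR in dB."""
    rng = rng or random.Random()
    noise_power = signal_power(samples) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)   # half of the noise power on each of I and Q
    return [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in samples]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB computed from the clean signal and its noisy version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [complex(1.0, 0.0)] * 20000
    noisy = add_awgn(clean, 10.0, rng)
    print(round(measured_snr_db(clean, noisy), 2))
```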

3.2. Research on Modulation Recognition Algorithm Based on the Neural Network. Decision trees can be divided into classification trees and regression trees according to the nature of the data labels. When the data labels are continuous values, the decision tree is called a regression tree; when the data labels are a series of discrete values, it is called a classification tree. Each leaf node of a classification tree represents a classification result, and the branches of the tree correspond to the features on which the classification is based. Since digital signal modulation identification is a classification problem, the decision trees discussed in this paper are classification trees unless stated otherwise. The creation of a classification tree can be summarized as follows: the training data are input to the decision tree model, and new branches are grown recursively from the root node toward the leaf nodes, with the direction of data flow determined by the conditions at the internal nodes, until leaf nodes are generated. Classification trees can be generated using a variety of algorithms [23].
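Classification-tree algorithms choose each split with an impurity measure, for instance the information entropy used by ID3 or the Gini value used by CART. The sketch below shows the textbook definitions of both measures on a list of class labels; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini value 1 - sum(p_i ** 2) over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy reduction obtained by splitting `labels` into the given `groups`."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


if __name__ == "__main__":
    labels = ["BPSK", "QPSK", "BPSK", "QPSK"]
    print(entropy(labels), gini(labels))
    print(information_gain(labels, [["BPSK", "BPSK"], ["QPSK", "QPSK"]]))
```

A split that separates the classes perfectly drives the entropy of each branch to zero, so its information gain equals the entropy of the parent node.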
Classification trees are generated by branching on the conditions at internal nodes, which is essentially a feature selection process. In algorithms such as ID3, information entropy is used to perform feature selection [24]. Information entropy can be understood as a quantitative indicator of data uncertainty. For example, for a training data set S in which a fraction $p_i$ of the samples belongs to class $i$, the information entropy is $H(S) = -\sum_i p_i \log_2 p_i$. In the CART algorithm, information entropy is no longer used and is replaced by the Gini value [25]. For a data set S whose samples belong to the set of categories L(N), the Gini value is $\mathrm{Gini}(S) = 1 - \sum_i p_i^2$.
A CNN consists of a series of convolutional layers, pooling layers, and fully connected layers, of which the convolutional layer is the core [26]. Convolution is a mathematical operation on two functions; it is a linear operation that satisfies the commutative and associative laws, among other properties, and can be written as $U(x) = \int V(\tau)\, W(x - \tau)\, d\tau$.
The above equation is a convolution operation on a one-dimensional continuous-time system, where V(x) is the input to the model, W(x) is called the kernel function, and U(x) is the model output. The two-dimensional discrete convolution can be expressed as $U(i, j) = \sum_m \sum_n V(m, n)\, W(i - m, j - n)$. The convolution of the input produces a large number of feature maps; if they were passed directly to the next layer, the input to that layer would be too large. Therefore, a pooling operation is generally applied to the output feature maps before they are passed on: on the one hand, it reduces the number of parameters and the training time; on the other hand, it reduces redundancy and improves generalization. The pooling function replaces the network's output at a given location with an overall statistic of the neighbouring outputs. Before the classification output, the CNN flattens the resulting feature maps into a one-dimensional vector [27].
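The convolution, pooling, and flattening steps described above can be illustrated with plain Python lists. This is a minimal sketch under stated assumptions: the function names are hypothetical, only the "valid" output region is computed, and max pooling is chosen as one example of a neighbourhood statistic. Note that many deep learning frameworks actually compute cross-correlation (no kernel flip) in their convolutional layers; the kernel is flipped here to match the mathematical definition.

```python
def conv2d_valid(image, kernel):
    """Two-dimensional discrete convolution (kernel flipped), 'valid' region only."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0.0
            for m in range(kh):
                for n in range(kw):
                    # Flipping both kernel axes turns correlation into convolution.
                    acc += image[i + m][j + n] * kernel[kh - 1 - m][kw - 1 - n]
            row.append(acc)
        out.append(row)
    return out


def max_pool2d(fmap, size=2):
    """Replace each non-overlapping size x size block by its maximum."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, w - size + 1, size)]
        for i in range(0, h - size + 1, size)
    ]


def flatten(fmap):
    """Turn a feature map into the one-dimensional vector fed to the classifier."""
    return [v for row in fmap for v in row]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(conv2d_valid(image, [[1, 0], [0, -1]]))
    grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(flatten(max_pool2d(grid)))
```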
Communication signals not only have temporal characteristics; different modulated signals also have different constellation diagrams. This suggests that communication signals have strong spatial characteristics, so this paper explores a CNN-based modulation pattern recognition method [28]. The received baseband I and Q signals are concatenated to form a matrix of size 2 × 256, denoted by $I_{2 \times 256}$, whose elements are denoted by I(m, n). Different ways of processing the I and Q information produce different recognition results, and for CNNs in particular, the kernel size directly affects the classification performance.
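To make the 2 × 256 input format concrete, the sketch below stacks the in-phase and quadrature parts of complex baseband samples into a two-row matrix and slides a two-row kernel along the time axis. The helper names and the kernel values are hypothetical, and the sliding product is written as a cross-correlation without padding, which is an assumption rather than the paper's exact layer definition.

```python
import cmath


def iq_matrix(samples):
    """Stack I (real) and Q (imaginary) parts of complex samples into a 2 x N matrix."""
    return [[s.real for s in samples], [s.imag for s in samples]]


def apply_2xm_kernel(matrix, kernel):
    """Slide a 2 x m kernel along time; the output is 1 x (N - m + 1)."""
    n = len(matrix[0])
    m = len(kernel[0])
    return [
        sum(matrix[r][t + k] * kernel[r][k] for r in range(2) for k in range(m))
        for t in range(n - m + 1)
    ]


if __name__ == "__main__":
    # 256 unit-magnitude QPSK-like samples, used only as placeholder data.
    samples = [cmath.exp(1j * (cmath.pi / 4 + (t % 4) * cmath.pi / 2)) for t in range(256)]
    mat = iq_matrix(samples)
    out = apply_2xm_kernel(mat, [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0]])
    print(len(mat), len(mat[0]), len(out))
```

A kernel with two rows mixes the I and Q rows in a single output value, which is why its width and height influence what the first layer can represent.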
After training is completed, the algorithm integrates the base classifiers by linear weighting. In the first step, the initial value of the weight is determined. It should be pointed out that H(X) is called the weight of the base classifier, which essentially indicates the importance of each classifier. The specific definition is as follows.
Next, DX is updated, and finally, the base classifiers from the X iterations are summed to form the final classifier H. The network parameters are updated iteratively by stochastic gradient descent: during training, the gradient of the objective function is not computed from a single sample; instead, each iteration samples a random portion of the training data set and passes it into the network. The specific flow of the modulation recognition algorithm in this paper is shown in Figure 2.
3.3. Modulation Identifier Design Study. Simulation analysis shows that the time-frequency map of the signal describes the modulation pattern characteristics of the communication signal well. In this paper, the time-frequency map is used as the input of the CNN, and the size of the time-frequency image is generally 32 × 32, 64 × 64, or 128 × 128. First, this paper compares the performance of neural networks with different numbers of convolutional layers; then, it compares neural networks with different input sizes; finally, it compares neural networks with different convolutional kernel sizes to obtain an optimal network model.
Considering the characteristics of the modulation pattern of the communication signal and the size of the time-frequency map, the CNN uses convolutional and pooling layers to extract the effective feature vector. The nonlinearity of the network model is provided by the ReLU activation function applied after each convolutional layer; using ReLU as the activation function mitigates the vanishing or exploding gradients that can occur during training. The final layer is a fully connected layer that integrates local features to obtain global features of the input data. Finally, the global features are classified by the softmax activation function [29][30][31][32][33]. The following illustrates the effects of different network structures on the classification of modulated signals from three aspects: the number of convolutional layers, the input size, and the convolutional kernel size.
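The roles of ReLU, the fully connected layer, and softmax in the structure described above can be sketched as follows. The layer sizes, weights, and function names are illustrative assumptions; they do not reproduce the network used in the paper.

```python
import math


def relu(values):
    """ReLU activation: negative inputs become 0, positive inputs pass through."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum of all inputs per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn class scores into a probability vector (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    features = relu([-1.0, 0.5, 2.0])                   # stand-in for flattened feature maps
    weights = [[0.2, 0.1, 0.4], [0.0, -0.3, 0.9]]       # hypothetical 2-class output layer
    probs = softmax(dense(features, weights, [0.0, 0.1]))
    print(probs, probs.index(max(probs)))
```

The index of the largest softmax probability is taken as the predicted modulation class.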
In this paper, convolutional kernels of size 1 × m are used extensively, which is equivalent to mapping the real and imaginary parts of the input signal h(n) separately. Such a mapping of h(n) can only characterize the amplitude information of the signal, whereas the frequency and phase information must be characterized jointly by the real and imaginary parts. However, according to the convolutional layer algorithm, if a convolutional kernel of size 2 × m is used, the size of the output feature map is reduced to 1 × 1024, so only one 2 × m convolutional layer is used in RESTNET. To obtain the resonant feature mapping of the convolutional layer for the frequency and phase of the input signal, this paper proposes using MR-Net in series with the traditional sequential structure to build a CNN as the modulation recognition network. The input signal s(n) is the signal recovered from the received signal y(n) by the pretrained Recover-Net, and the classification probability vector is finally output by the softmax layer.
This paper proposes to verify the modulation recognition algorithm in a real environment by building a wireless communication transceiver platform. The NI-USRP is based on the public version of the USRP software radio platform, produced by NI with some modifications to the external circuitry. The NI-USRP hardware has a common software-defined radio (SDR) architecture. At the transmitter, the FPGA digital signal processing logic modulates user data into digital baseband data, which a high-speed digital-to-analog converter (DAC) turns into analog baseband I/Q signals. After a low-pass filter removes high-order harmonics and spurious components, the analog baseband signal is upconverted to the 915 MHz transmit carrier frequency by analog RF quadrature modulation, and transmission is completed by power amplification and the antenna. At the receiver, the weak wireless signal picked up by the antenna is conditioned by a low-noise amplifier and then downconverted by analog RF quadrature demodulation; the resulting I/Q baseband signal is passed to a high-speed analog-to-digital converter (ADC), and the quantized I/Q data stream is sent to the FPGA digital signal processing logic. The FPGA logic implements digital downconversion (DDC) at the receiver side and digital upconversion (DUC) at the transmitter side. In the figure, the upsampling and downsampling logic adjusts the sampling rate of the complex baseband signal to process the nonbroadband communication signal efficiently and to facilitate transfer over Ethernet to the host computer for further processing.

Results and Analysis
4.1. Algorithm Performance Analysis. The test set of 10,000 samples from the radio signal data set was evaluated on both the classic convolutional neural network and the improved convolutional neural network to obtain the final prediction results. The modulation pattern recognition accuracy of the different algorithms is shown in Figure 3. It can be seen in Figure 3 that the modulation pattern recognition accuracy of the improved convolutional neural network is 3.5% higher than that of the classic convolutional neural network. The results show that the improved convolutional neural network proposed in this paper has an advantage over the classic convolutional neural network in modulation pattern recognition accuracy.
The running time of the CNN algorithm with different numbers of convolutional layers is given in Figure 4. It can be seen that the running time of the CNN algorithm increases with the number of convolutional layers. Considering both the ACRP and the running time of the CNN algorithm, this paper uses two convolutional layers, with the first layer having 14 dimensions and the second layer having 5 dimensions.
Figure 5 shows the correct recognition rates of the proposed CNN algorithm for BPSK, QPSK, 8PSK, 16APSK, 16QAM, 32QAM, and 32APSK signals. As can be seen from the figure, the correct recognition rate of all seven signals is higher than 92.13% when the signal-to-noise ratio is greater than 2 dB. The CNN algorithm has the highest recognition rate for 16QAM signals, followed by QPSK signals, and the worst recognition performance for 32QAM signals, followed by 8PSK signals.
The individual recognition rates of the CNN algorithm for the seven signals at SNR = 6 dB are shown in Figure 6. It can be seen that at an SNR of 4 dB the CNN algorithm has good recognition performance: only 40 samples are misclassified overall, and the ACRP reaches 98%. The recognition rates of the 16APSK, 16QAM, and QPSK signals are 99.99%, those of the 32APSK and BPSK signals are 99.7% and 98.91%, respectively, and those of the 32QAM and 8PSK signals are 93.9%.

4.2. Communication Signal Simulation Test Analysis. For the simulation test, a test set of 3000 samples containing five modulated signals (2ASK, 2FSK, 2PSK, AM, and FM) was used. The average recognition accuracy over the five types of signals from −10 dB to 10 dB is presented in Figure 7(a), and the recognition accuracy of each of the five types of signals at each signal-to-noise ratio from −10 dB to 10 dB is shown in Figure 7(b). As the signal-to-noise ratio increases, the recognition rate of each type of signal gradually increases, and at 10 dB the recognition rate of all five types of signals is greater than 97.1%. The recognition rates of 2FSK and AM are 99.99%; the recognition rate of FM increases rapidly from 99.3% to 99.76% at the beginning and then remains unchanged; the recognition rates of 2ASK and 2PSK are relatively low at −10 dB, at 84.3% and 90%, respectively. The recognition rates of the 2FSK and FM signals also increase as the signal-to-noise ratio increases. When the signal-to-noise ratio is small, however, the energy of the noise is higher than the energy of the signal, so the 2ASK and 2PSK signals are drowned in the background noise, which lowers their recognition rates. Finally, the recognition rates of both 2ASK and 2PSK reach 97.2% at 10 dB. It can be concluded that the higher the signal-to-noise ratio, the better the recognition performance of the convolutional neural network; at lower signal-to-noise ratios the recognition performance decreases slightly, but classification is still possible. The convolutional neural network therefore performs well in modulation recognition and has a certain degree of noise immunity.
The data set required by the network is produced by preprocessing the measured signals, with the same number of samples as in the simulation test (3000), and is then tested. The test results are shown in Figure 8. It can be seen in Figure 8 that the recognition rate of the measured signals is higher than 95.36% after classification by the modulation recognizer designed in this paper. This also verifies the conclusion that the convolutional neural network has good recognition performance in modulation recognition.
The accuracy and the loss value of both the training set and the validation set change with the number of iterations. Figure 9 shows how the loss values of the training set and the validation set vary with the number of iterations. A comparison shows that the trends of the training set accuracy and the validation set accuracy are basically the same, whereas the trends of the training set loss and the validation set loss are not consistent. It can be seen in Figure 9 that after 10 iterations of training, the loss value of the training set continues to decrease, while the loss value of the validation set oscillates. This indicates that the network is overfitting and that its complexity needs to be reduced.
For continuous phase modulation signal, this paper considers the GMSK signal, which is a commonly used communication modulation signal with high spectrum utilization and high noise and channel interference immunity. Unlike modulation types such as MPSK and MQAM, GMSK signals do not have fixed theoretical constellation points in the complex plane, as shown in Figure 10, where we demonstrate the effect of Recover-Net on GMSK signals before and after recovery from the time domain waveform; the channel is still an indoor channel simulated by MATLAB with SNR = 10 dB. It can be observed that the amplitude and phase recovery of the time domain waveform by Recover-Net is good and there is only a small deviation in the details at some peaks, with

Conclusion
In this paper, the problem of communication signal modulation pattern recognition based on deep learning is studied. First, the mechanism of communication signal generation and the related theory of deep learning are introduced to provide the theoretical basis for the identification of modulated signals. Second, working in the time-frequency domain of the signal, multiple time-frequency analysis methods are compared to select the one that best characterizes the modulated signal, and a feature image generation algorithm is used to convert the time-frequency images into a data set that can be used by the neural network. Finally, through the design of a modulation recognizer based on the convolutional neural network and the analysis of its performance, a convolutional neural network model is established that recognizes the modulation patterns of communication signals quickly and accurately in a complex electromagnetic environment and performs well at low signal-to-noise ratio. The convolutional neural network can be improved by changing the network layer structure, by using a sequential convolutional module structure, or by using small convolutional kernels to extract the fine details of the radio signal, which makes the characteristics of the radio signal clearer. Compared with modulation pattern recognition by the classical convolutional neural network, the improved convolutional neural network not only shortens the training time but also improves the recognition accuracy. Theoretical analysis and computer simulation results show that the proposed algorithm, based on machine learning decision theory for communication signal modulation recognition, is practical, effective, and easy to implement and has application value in practical engineering.

Data Availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Consent
Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest
We declare that there is no conflict of interest.