CNN and DCGAN for Spectrum Sensors over Rayleigh Fading Channel

Spectrum sensing (SS) has attracted much attention in the field of Internet of things (IoT) due to its capacity of discovering the available spectrum holes and improving the spectrum efficiency. However, the limited sensing time leads to insufficient sampling data due to the tradeoff between sensing time and communication time. In this paper, deep learning (DL) is applied to SS to achieve a better balance between sensing performance and sensing complexity. More specifically, the two-dimensional dataset of the received signal is first established under various signal-to-noise ratio (SNR) conditions. Then, an improved deep convolutional generative adversarial network (DCGAN) is proposed to expand the training set so as to address the issue of data shortage. Moreover, the LeNet, AlexNet, VGG-16, and the proposed CNN-1 network are trained on the expanded dataset. Finally, the false alarm probability and detection probability are obtained under various SNR scenarios to validate the effectiveness of the proposed schemes. Simulation results show that the sensing accuracy of the proposed scheme is greatly improved.


Introduction
In recent years, the spectrum resource has become increasingly scarce due to the great demand for wireless communication, Internet of Things (IoT), Artificial Intelligence (AI) [1][2][3], etc. One of the most important issues for wireless communication technology in the near future is to improve its spectrum efficiency. As a possible scheme to improve spectrum efficiency, cognitive radio (CR) [4] has attracted much attention.
The core idea behind CR is to realize dynamic spectrum allocation (DSA) and spectrum sharing through spectrum sensing and the intelligent learning ability of the system [5]. The most important technology behind CR is to periodically monitor the absence or the presence of the registered users within the observed bands, which is named spectrum sensing (SS) [6]. In SS, the registered users are the primary users (PU) of the observed bands and have priority access to the spectrum. The purpose of CR is to opportunistically access the registered spectrum when the PU is absent. Accordingly, the cognitive users are the secondary users (SU). Once the PU is back, the SU releases the spectrum at once and waits for another opportunity.
Classical SS schemes include matched filtering, energy detector (ED) [7], cyclic spectrum detection [8], covariance matrix detection [9], etc. The sensing performance of matched filtering is optimal if the prior knowledge of the primary signal is known in advance. ED is the optimal blind detector considering both sensing performance and sensing complexity. However, it suffers from noise uncertainty in the low signal-to-noise ratio (SNR) regime. Compared with ED, cyclic spectrum detection and covariance matrix detection improve the sensing performance in the low-SNR case at the expense of a higher complexity. In short, these traditional SS schemes have either poor performance or high complexity.
Recently, with the wide application of deep learning (DL) in the field of computer vision [10], DL-based wireless communication has become a hot topic [11][12][13]. The essence of DL is to automatically learn pattern features and combine them, thus reducing the incompleteness caused by manually designed features. In [14], a stacked autoencoder based spectrum sensing approach (SAE-SS) was proposed to relieve the influence of carrier frequency offset (CFO), timing delay, and noise uncertainty. A deep learning based signal detector was considered in [15] to exploit the underlying structural information of the modulated signals. Transfer learning strategies were used in [16] to improve the performance for real-world signals. In [17], a convolutional neural network (CNN) and long short term memory network (LSTM) detector was proposed to extract the spatial and temporal features of the input.
Motivated by the above, DL is applied to SS in this paper, where the covariance matrix of the received signal is converted into a true color image. Then, an improved deep convolutional generative adversarial network (DCGAN) is proposed to expand the training set and address the issue of data shortage.
After that, the LeNet [18], AlexNet [19], VGG-16 [20], and a novel network are trained on the expanded data. Finally, simulations are carried out to validate the effectiveness of the proposed schemes. The main contributions of this paper are summarized as follows.
(1) The two-dimensional dataset of the received signal is established under various SNR conditions. Each SNR from -10 dB to 2 dB contains 4000 samples.
(2) An improved DCGAN network is proposed to expand the obtained two-dimensional dataset. In the expanded dataset, each SNR contains 8000 samples.
(3) The LeNet, AlexNet, and VGG-16 networks are trained on the expanded dataset. The corresponding false alarm probability and detection probability are given under various SNR scenarios.
(4) Based on the sensing performance of the LeNet, AlexNet, and VGG-16 networks, an improved network is provided in this paper to balance the sensing performance and the sensing complexity.
The remainder of the paper is organized as follows. Section 2 introduces the related work and gives the system model. In Section 3, the improved DCGAN scheme is discussed to solve the issue of data shortage. In Section 4, SS with the LeNet, AlexNet, VGG-16, and an improved network is conducted. Finally, conclusions are drawn in Section 5.

Related Work
In this section, three classical convolutional neural networks (CNN) and the deep convolutional generative adversarial network (DCGAN) [21] are reviewed. In addition, the system model of this paper is provided. As shown in Figure 1, the LeNet network contains three convolutional layers with 5 × 5 kernels, two pooling layers of size 2 × 2, and three fully connected layers [18]. The AlexNet network contains five convolutional layers, three pooling layers of size 3 × 3, and three fully connected layers [19].
As shown in Figure 2, the VGG-16 network contains 13 convolutional layers with 3 × 3 kernels and three fully connected layers [20]. Maximum pooling of size 2 × 2 is used in the VGG-16 network. Each of the first two pooling layers is preceded by two convolutional layers, while each of the remaining pooling layers is preceded by three convolutional layers.

DCGAN.
Generative adversarial networks (GAN) [21] have attracted wide attention in the field of machine learning because of their great potential to imitate high-dimensional and complex real data. In scenarios where data is lacking, a GAN can be used to generate more sample data. In order to reduce the high acquisition cost of training set samples, this paper utilizes a GAN to generate more training set samples. A GAN consists of a generating network (Generator) and a discriminating network (Discriminator). The generating network learns the real data distribution to generate new data under the guidance of the discriminating network. The deep convolutional generative adversarial network is one of the more effective and stable networks based on GAN. The basic framework of DCGAN [22] is shown in Figure 3.
Assume the input of the Generator is the random Gaussian noise $z$ and its output is the fake sample $G(z)$. The true sample $x$ and the fake sample $G(z)$ are input to the Discriminator, respectively, and the corresponding outputs are $D(x)$ and $D(G(z))$, where $D(x)$ denotes the probability that the input $x$ of the Discriminator is a real sample. The Discriminator aims to make $D(x)$ tend to 1 and $D(G(z))$ tend to 0. At the same time, the Generator aims to make $D(G(z))$ close to 1. The loss function of DCGAN is shown as
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{x \sim P_g}[\log(1 - D(x))],$$
where $P_r$ represents the real sample distribution and $P_g$ represents the fake sample distribution.
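As a rough illustration of the adversarial objective described above, the following sketch estimates the Discriminator and Generator losses from batches of Discriminator outputs. It is not the training code of this paper: the function name `gan_losses` and the clipping constant `eps` are hypothetical, and the sample values are made up for the example.

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimates of the GAN objectives from Discriminator outputs.

    d_real: D(x) for real samples, values in (0, 1)
    d_fake: D(G(z)) for generated samples, values in (0, 1)
    eps is a hypothetical clipping constant that avoids log(0).
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    # Value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
    value = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
    d_loss = -value                          # the Discriminator maximizes V
    g_loss = np.mean(np.log(1.0 - d_fake))   # the Generator minimizes this term
    return d_loss, g_loss

# A Discriminator that separates real from fake well, and one that cannot decide
confident = gan_losses([0.99, 0.98], [0.01, 0.02])
undecided = gan_losses([0.5, 0.5], [0.5, 0.5])
```

When the Discriminator separates the two classes well, its loss is small and the Generator term is close to 0, which is its largest possible value; when the Discriminator outputs 0.5 everywhere, its loss equals 2 ln 2.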

Wireless Communications and Mobile Computing
The objective functions of the Discriminator and the Generator are, respectively, written as
$$\max_D \; \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{x \sim P_g}[\log(1 - D(x))],$$
$$\min_G \; \mathbb{E}_{x \sim P_g}[\log(1 - D(x))].$$
The objective function of the Generator can be further rewritten as
$$\min_G \; \mathbb{E}_{x \sim P_g}[-\log D(x)].$$

System Model.
In this paper, the signal is received by a multiantenna system, and then the covariance matrix of the signal is calculated. After that, the covariance matrix is transformed into a true color image, which forms the dataset. The system model of this paper is shown in Figure 4. An M-element antenna system is considered to receive the signals based on N observation vectors. Let $s_i(n)$, $n = 0, 1, \cdots, N-1$, denote the $n$-th discrete-time sample at the $i$-th antenna. Generally, spectrum sensing can be regarded as a binary classification [21],
$$H_0: s_i(n) = w_i(n), \qquad H_1: s_i(n) = x_i(n) + w_i(n),$$
where $w_i(n)$ denotes the background noise and $x_i(n)$ denotes the primary signal with Rayleigh fading [7]. $H_0$ and $H_1$, respectively, signify the absence and the presence of the PU.
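Let $s_i = [s_i(0), s_i(1), \cdots, s_i(N-1)]$ denote the sampling sequence at the $i$-th antenna. Its average can be expressed as
$$\bar{s}_i = \frac{1}{N} \sum_{n=0}^{N-1} s_i(n).$$
The time series matrix of the received signal $S$ can be formulated as
$$S = \begin{bmatrix} s_1 - \bar{s}_1 \\ s_2 - \bar{s}_2 \\ \vdots \\ s_M - \bar{s}_M \end{bmatrix},$$
where $\bar{s}_i$ is subtracted from every entry of $s_i$. The corresponding covariance matrix is
$$R(N) = \frac{1}{N} S S^{H}.$$
The covariance matrix is calculated for the real part and for the imaginary part of $S$, and the two results are taken as two channels of the true color image, together with a zero matrix as the third channel.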
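The mapping from received samples to a three-channel image can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the mean of each antenna is removed, each covariance is normalized by N, the zero matrix is placed in the third channel, and the function name `covariance_image` and the random noise input are hypothetical.

```python
import numpy as np

def covariance_image(S):
    """Map an M x N complex sample matrix to an M x M x 3 real array."""
    S = np.asarray(S, dtype=complex)
    n_samples = S.shape[1]
    channels = []
    # One covariance matrix for the real part and one for the imaginary part
    for part in (S.real, S.imag):
        centered = part - part.mean(axis=1, keepdims=True)  # remove antenna means
        channels.append(centered @ centered.T / n_samples)  # M x M covariance
    channels.append(np.zeros_like(channels[0]))             # all-zero channel
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(0)
M, N = 10, 1000
# Unit-power complex Gaussian noise, i.e. a noise-only observation
noise = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
image = covariance_image(noise)
```

For a noise-only input, the first two channels are close to a scaled identity matrix, so the resulting image is dark away from the diagonal.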
The samples are divided into a training set and a test set. Then, the training set is expanded by DCGAN. After that, the CNN is trained on the expanded training set. Finally, the sensing performance is validated on the test set.
Figure 4: System model.

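The false alarm probability and the detection probability can be formulated as
$$P_{FA} = P(\varphi[R(N)] > \lambda \mid H_0), \qquad P_D = P(\varphi[R(N)] > \lambda \mid H_1),$$
where $\varphi[R(N)]$ denotes the feature extraction operation of the proposed network and $\lambda$ denotes the sensing threshold.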
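In practice, the two probabilities can be estimated by counting how often the test statistic exceeds the threshold under each hypothesis. The sketch below assumes that the network output is available as one scalar score per test image; the function name `sensing_probabilities` and the example scores are hypothetical.

```python
import numpy as np

def sensing_probabilities(scores_h0, scores_h1, threshold):
    """Empirical false alarm and detection probabilities for a test statistic."""
    scores_h0 = np.asarray(scores_h0, dtype=float)
    scores_h1 = np.asarray(scores_h1, dtype=float)
    p_fa = np.mean(scores_h0 > threshold)  # decide H1 although the PU is absent
    p_d = np.mean(scores_h1 > threshold)   # decide H1 when the PU is present
    return p_fa, p_d

# Made-up scores for four H0 images and four H1 images
p_fa, p_d = sensing_probabilities([0.1, 0.2, 0.7, 0.3], [0.9, 0.8, 0.4, 0.6], 0.5)
```

Raising the threshold lowers both probabilities at once, which is why the two are always reported together.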

Data Enhancement with DCGAN
In this section, the data enhancement scheme with DCGAN is discussed and an improved DCGAN scheme is proposed, where Python 3.7 and the PyTorch 1.5 machine learning library are used to implement the generative adversarial network and the convolutional neural networks. The CPU is an Intel(R) Core(TM) i5-6300HQ and the GPU is an NVIDIA GeForce GTX 960M.

Data Generation.
In this subsection, the original training set and test set are generated, where the OFDM signal [23] is considered as the primary signal, and the Rayleigh fading [24] is regarded as the propagation channel. The number of antennas in the multiantenna system used in this paper is M = 10, so the size of the sampling covariance matrix is 10 × 10 × 2; the matrix size depends on the antenna number of the multiantenna system. A true color image has 3 channels, two of which hold the sampling covariance matrix (the real part and the imaginary part), while the remaining channel is set to a zero matrix. In the process of data generation, 8 datasets are generated, whose SNR varies from -10 dB to 2 dB with the step 2 dB. Each dataset is divided into two parts, $H_0$ and $H_1$. For the $H_0$ part, the real and imaginary parts are, respectively, sampled to obtain 3000 sets of data, where each set has N = 1000 sampling points and contains two matrices; that is, each set is a 10 × 1000 × 2 sampling time series matrix. The sampling covariance matrix of each matrix is then calculated according to the covariance matrix defined in the system model, which yields 3000 sets of dual-channel sampling covariance matrices. After that, these 3000 sets of data are randomly shuffled and converted into true color images, where the first channel of the true color image is the first channel of the sampling covariance matrix, the second channel is the zero matrix, and the third channel is the second channel of the sampling covariance matrix. Finally, the first 1000 sets are taken as the $H_0$ data of the test set, and the last 2000 sets are taken as the $H_0$ data of the original training set.
For the $H_1$ part, the same operations are conducted as for the $H_0$ part. As a result, 1000 sets of test set data and 2000 sets of original training set data are obtained for the $H_1$ part. Figure 5 exhibits the obtained true color images in the $H_0$ and $H_1$ cases under 2 dB, where the left image corresponds to the $H_0$ case and the right image corresponds to the $H_1$ case. From Figure 5, the color of the $H_0$ image is very dark except for the diagonal, which indicates that the off-diagonal values of the corresponding two-channel sampling covariance matrix are very small. In the $H_1$ case, the color of the image is uneven and somewhat chaotic, which indicates that the corresponding two-channel sampling covariance matrix values are also irregular.
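The generation of one block of received samples under each hypothesis can be sketched as follows. This is a simplified illustration, not the procedure of this paper: a unit-power complex Gaussian sequence stands in for the OFDM primary signal, the fading is assumed flat with one Rayleigh gain per antenna, the noise power is normalized to 1, and the function name `received_samples` is hypothetical.

```python
import numpy as np

def received_samples(rng, snr_db, pu_present, M=10, N=1000):
    """Draw one M x N block of received samples under H0 or H1."""
    # Unit-power complex Gaussian background noise at every antenna
    noise = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    if not pu_present:
        return noise
    # Stand-in primary signal: unit-power complex Gaussian (not a real OFDM waveform)
    signal = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    # Flat Rayleigh fading: one complex Gaussian gain per antenna, E|h|^2 = 1
    h = (rng.standard_normal((M, 1)) + 1j * rng.standard_normal((M, 1))) / np.sqrt(2)
    amplitude = np.sqrt(10.0 ** (snr_db / 10.0))  # SNR relative to unit noise power
    return amplitude * h * signal + noise

rng = np.random.default_rng(1)
h0_block = received_samples(rng, snr_db=2, pu_present=False)
h1_block = received_samples(rng, snr_db=2, pu_present=True)
```

Because every antenna observes the same primary signal through its own gain, the $H_1$ block is correlated across antennas, which is what produces the off-diagonal structure in the covariance image.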

Data Enhancement.
The sampled data is enhanced in this subsection based on the classical DCGAN scheme, where the sampled data is doubled, from 4000 images to 8000 images for each SNR.
Figure 6 shows the training loss versus the iterations, where G denotes the Generator, D denotes the Discriminator, and the number of training cycles is 16. According to Figure 6, a large loss occurs at the 10th cycle due to gradient explosion. As a result, how effectively the loss is reduced determines the quality of data enhancement.
In Figure 7, the initial weights of the convolution kernel are adjusted from the Gaussian distribution with mean 0 and variance 0.02 to the Gaussian distribution with mean 0.05 and variance 0.02. As a result, the loss function is enlarged, and the trend of the loss function can be better observed. From Figure 7, increasing the mean of the initial weight not only increases the loss function value but also slows down the training and reduces the training gradient. Figures 8 and 9, respectively, exhibit the training results with the mean of initial weight 0.05 and 0.02, where the left image denotes the original image and the right image denotes the image generated with DCGAN. From Figures 8 and 9, the smaller the mean of the initial weight is, the better the generated image quality is. It can be seen that the mean of the initial weight has a great influence on the convergence speed of the model. The larger the initial weight of the convolution kernel, the slower the convergence speed may be, and the smaller the gradient will be. But at the same time, more useful information can be obtained.
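The effect of the initial weight mean on the first forward pass can be illustrated with a small experiment. The sketch below is only indicative: the kernel shape is hypothetical, and the value 0.02 is used as the standard deviation of the Gaussian, which is an assumption since the text refers to it as the variance.

```python
import numpy as np

def init_kernel(rng, shape, mean, std=0.02):
    """Draw convolution kernel weights from a Gaussian distribution."""
    return rng.normal(loc=mean, scale=std, size=shape)

rng = np.random.default_rng(0)
shape = (64, 3, 4, 4)   # hypothetical (out_channels, in_channels, height, width)
zero_mean = init_kernel(rng, shape, mean=0.0)
shifted = init_kernel(rng, shape, mean=0.05)

# Response of every output channel to an all-ones input patch
patch = np.ones(shape[1:])
zero_mean_response = np.abs((zero_mean * patch).sum(axis=(1, 2, 3))).mean()
shifted_response = np.abs((shifted * patch).sum(axis=(1, 2, 3))).mean()
```

With a nonzero mean, the contributions of the individual weights no longer cancel, so the initial layer outputs are much larger in magnitude; this is one plausible reason for the larger initial loss values, although the sketch does not reproduce the training behavior itself.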

Improved DCGAN.
According to the discussion above, increasing the mean of the initial weight reduces the training gradient but sacrifices the generated image quality. The improved DCGAN scheme is proposed in this subsection to solve this issue.
The change of $D(G(z))$ before and after the gradient explosion is shown in Figure 10 with the mean of initial weight 0. From Figure 10, when the gradient explosion occurs, the loss functions of the generating network and the discriminating network become extremely large while $D(x)$ and $D(G(z))$ are both 0. At this point, both networks collapse. Motivated by this, an adjustment scheme is proposed as shown in Table 1. Figures 11 and 12 show the improved DCGAN result and the improved DCGAN loss, both with the mean of initial weight 0, when the algorithm runs to the 40th training loop. It can be seen that the generated image is hard to distinguish from a real one. Moreover, from the point of view of the loss function, from the 100th training iteration onward, the losses of the generating network and the discriminating network show an obvious up-and-down jitter, that is, adversarial evolution. At the same time, it can be concluded that changing the mean value of the initial weight from 0.2 to 0 speeds up the convergence of the model.

SS with CNN Network
For SS, two factors determine the sensing performance: the detection probability (PD) and the false alarm probability (PFA). Accordingly, the PD and PFA are provided with LeNet, AlexNet, VGG-16, and the proposed CNN-1 network in this section.

LeNet.
Figures 13 and 14, respectively, show the PD [25] and PFA [26] under various SNR conditions. From Figures 13 and 14, the detection performance of the LeNet network differs markedly under different SNRs. When the SNR is small, PD is small and PFA is high because the extracted features are not significant. However, when SNR ≥ −4 dB, PD can be maintained at or above 0.9, and PFA is basically lower than 0.1. In addition, after data enhancement with the proposed scheme, the mean value of PD becomes higher.

AlexNet.
From Figure 15, the average value of PD is 0.95 or more, and the minimum value is 0.8 or more when −2 dB ≤ SNR ≤ 4 dB. The average value of PD is close to the maximum, which means that most PD values are close to 1, and only a few are close to the minimum value. In addition, after the data enhancement, the average value of PD is also improved slightly.
The PFA is provided with the AlexNet network in Figure 16. From Figure 16, the mean value of PFA is less than 0.05 and its curve almost coincides with the minimum curve. At the same time, after the data enhancement, the maximum value of PFA decreases markedly, which indicates that the fluctuation of PFA is clearly reduced.
As a summary, when −2 dB ≤ SNR ≤ 4 dB, PD is close to 1 and PFA is close to 0, which indicates that the detection performance of the AlexNet is better than that of the LeNet. After the data enhancement, PD is improved slightly, and PFA becomes more stable.

VGG-16.
In Figures 17 and 18, the PD and PFA are, respectively, exhibited with the VGG-16 network. According to Figure 17, when SNR = −10 dB or SNR = −4 dB, the average value of PD is improved by nearly 0.1. When SNR ≥ −4 dB, the average value of PD is above 0.9. After the data enhancement, the minimum value of PD is also significantly improved.
From Figure 18, the average value of PFA is less than 0.2 and decreases with the increase of SNR. After the data enhancement, the mean and maximum values of PFA decrease significantly.
To sum up, the detection performance of VGG-16 is better than that of LeNet-5 and AlexNet. In addition, data enhancement clearly improves the performance of the model, where the fluctuation of PD and PFA decreases markedly. However, the performance of VGG-16 comes at the expense of computational complexity due to its large number of network parameters.

CNN-1.
According to the sensing performance of the LeNet, AlexNet, and VGG-16 networks, the network depth of VGG-16 is too large and the depth of LeNet-5 is too small to achieve the optimal result. Therefore, this paper designs a novel convolutional neural network with an appropriate depth, named CNN-1. The network parameters of the proposed CNN-1 network are shown in Table 2.
As described in the system model, $R_x$ is a real matrix with the dimension $M \times M \times 3$. Let $R_x(i, j, \tau)$ denote the element at position $(i, j)$ of the $\tau$-th dimension of $R_x$.
(1) Convolutional Layer $C_1$. $C_1$ contains 32 feature maps, and each feature map is gained by a convolution operation with the kernel size of 3 × 3. Thus,
$$C_1(i, j, \tau) = \varphi_{ReLU}\big( [R_x * k^{C_1}_{\tau}](i, j) \big),$$
where $C_1(i, j, \tau)$ denotes the element at position $(i, j)$ of the $\tau$-th feature map in the $C_1$ layer, $k^{C_1}_{\tau}$ denotes the kernel of the $\tau$-th feature map in the $C_1$ layer, and $*$ denotes the convolution operation. $\varphi_{ReLU}(x) = \max(0, x)$ denotes the ReLU function, where $\max(\cdot)$ denotes the maximum value and $x$ is the independent variable of the function.
(2) Pooling Layer $S_1$. $S_1$ contains 32 feature maps, and maximum pooling is conducted for each feature map.
(3) Convolutional Layer $C_2$. $C_2$ contains 64 feature maps, and each feature map is gained by a convolution operation.
(4) Pooling Layer $S_2$. $S_2$ contains 64 feature maps, and maximum pooling is conducted for each feature map.
(5) Convolutional Layer $C_3$. $C_3$ contains 128 feature maps, and each feature map is gained by a convolution operation.
(6) Pooling Layer $S_3$. $S_3$ contains 128 feature maps, and maximum pooling is conducted for each feature map.
(7) Convolutional Layer $C_4$. $C_4$ contains 128 feature maps, and each feature map is gained by a convolution operation.
(8) Pooling Layer $S_4$. $S_4$ contains 128 feature maps, and maximum pooling is conducted for each feature map.
From Figures 19 and 20, the detection probability of the CNN-1 with the proposed data enhancement scheme is higher than that without the data enhancement, while the false alarm probability of the CNN-1 with the proposed data enhancement scheme is lower than that without the data enhancement. In addition, when SNR ≥ −8 dB, the mean value of PD is more than 0.8 and the mean value of PFA is less than 0.2, which is much better than that of the LeNet and the same as that of the VGG-16. Figure 21 gives the performance comparisons of various SS schemes including the LeNet based scheme, the AlexNet based scheme, the VGG-16 based scheme, and the proposed CNN-1 based scheme. On each line, such as the black one, the points from bottom to top correspond to -2 dB, 0 dB, 2 dB, and 4 dB, respectively.
From Figure 21, the sensing performance of the CNN-1 based SS scheme is the highest among the SS schemes under the same SNR, which is consistent with the per-network results above. In Figure 22, the computation time of the different CNN networks is compared, where the time a model takes to process one true color image is used to represent the computational complexity. From Figure 22, the computation time of the LeNet, AlexNet, CNN-1, and VGG-16 is about 6 ms, 12.5 ms, 16 ms, and 33 ms, respectively, which indicates that the LeNet is the simplest and the VGG-16 is the most complex.
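A per-image computation time of this kind can be measured with a simple timing loop. The sketch below shows one possible way to do so and is not the measurement code of this paper: the helper `mean_inference_time_ms`, the number of repeats, and the toy dense model standing in for a CNN are all hypothetical.

```python
import time
import numpy as np

def mean_inference_time_ms(model, image, repeats=50):
    """Average wall-clock time, in milliseconds, of one call to model(image)."""
    model(image)                       # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        model(image)
    return (time.perf_counter() - start) * 1000.0 / repeats

# Hypothetical stand-in model: a single dense layer on the flattened image
rng = np.random.default_rng(0)
weights = rng.standard_normal((10 * 10 * 3, 2))
toy_model = lambda img: img.reshape(-1) @ weights
elapsed_ms = mean_inference_time_ms(toy_model, rng.standard_normal((10, 10, 3)))
```

Averaging over many calls and discarding the first one reduces the influence of one-time setup costs, so the figure reflects the steady-state cost of processing one image.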
As a summary, the detection performance of CNN-1 is similar to that of VGG-16, but the computation time of CNN-1 is nearly half that of VGG-16. This indicates that CNN-1 is a better convolution neural network model for SS.
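To make the first stage of CNN-1 concrete, the sketch below applies a 3 × 3 convolution with ReLU (layer $C_1$, 32 feature maps) followed by maximum pooling (layer $S_1$) to one M × M × 3 input. Zero padding that preserves the spatial size, the 2 × 2 pooling window, the absence of a bias term, and the random kernels are assumptions made for this illustration; the actual settings are those of Table 2.

```python
import numpy as np

def conv_relu(x, kernels):
    """'Same'-padded 3x3 convolution followed by ReLU.

    x: H x W x C input, kernels: K x 3 x 3 x C, output: H x W x K.
    """
    H, W, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero padding on both spatial axes
    out = np.zeros((H, W, kernels.shape[0]))
    for i in range(H):
        for j in range(W):
            window = padded[i:i + 3, j:j + 3, :]
            # Correlate the window with every kernel at once
            out[i, j, :] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)

def max_pool_2x2(x):
    """Non-overlapping 2x2 maximum pooling (H and W must be even)."""
    H, W, K = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, K).max(axis=(1, 3))

rng = np.random.default_rng(0)
R_x = rng.standard_normal((10, 10, 3))           # stand-in for a covariance image
k_C1 = rng.standard_normal((32, 3, 3, 3)) * 0.1  # 32 random 3x3 kernels
C1 = conv_relu(R_x, k_C1)
S1 = max_pool_2x2(C1)
```

Under these assumptions, a 10 × 10 × 3 input gives a 10 × 10 × 32 output after $C_1$ and a 5 × 5 × 32 output after $S_1$; the later stages repeat the same pattern with 64, 128, and 128 feature maps.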
As a supplement, performance comparisons are made between the proposed CNN-1 and the schemes in [16,17]. On each line, such as the black one, the points from bottom to top correspond to -2 dB, 0 dB, 2 dB, and 4 dB, respectively. From Figure 23, the sensing performance of the proposed CNN-1 scheme is slightly higher than that of the scheme in [16]. Meanwhile, the sensing performance of the proposed CNN-1 scheme is significantly higher than that of the scheme in [17]. This indicates that the proposed CNN-1 scheme with data enhancement is more suitable for the detection of the spectrum state.

Conclusions
In this paper, deep learning based spectrum sensing is discussed for sustainable cities and society, where the LeNet, AlexNet, VGG-16, and the proposed CNN-1 network are considered. First, the two-dimensional dataset of the received signal is established and expanded by the proposed DCGAN scheme. Then, the four networks are trained on the expanded dataset. Finally, the networks are tested under various SNR conditions. The experimental results show that the sensing performance is greatly improved by the proposed data enhancement scheme and the novel CNN network.

Data Availability
The data supporting the results of this study can be found at https://figshare.com/articles/dataset/__part1_rar/14245763.

Conflicts of Interest
The authors declare that they have no conflicts of interest.