Channel Noise Optimization of Polar Codes Decoding Based on a Convolutional Neural Network

Polar codes are simple to encode and highly reliable, and they have been adopted as the control channel coding scheme of 5G wireless communication. However, their decoding algorithms suffer from large decoding delay and high iteration complexity when dealing with channel noise. To address these challenges, this paper proposes a channel noise optimized decoding scheme based on a convolutional neural network (CNN). First, a CNN is trained on the colored channel noise to produce a more accurate noise estimate; then, the belief propagation (BP) decoding algorithm decodes the polar code based on the output of the CNN. To analyze and verify the performance of the proposed scheme, we simulate the decoding of polar codes with different correlation coefficients, different loss function parameters, and different code lengths. The experimental results show that CNN-BP concatenated decoding suppresses colored channel noise better and significantly improves the decoding gain compared with the traditional BP decoding algorithm.


Introduction
As an excellent channel coding technology, the polar code can be theoretically proven to reach the channel capacity based on the phenomenon of channel polarization [1]. The polarization phenomenon is the result of channel merging and channel splitting. During the polarization process, multiple independent uses of the input channel are converted into successive uses of a set of binary composite channels. After polarization, some of the channels are nearly perfect, with capacity close to 1, while the capacity of the others is close to 0. Although channel polarization is defined for binary discrete memoryless channels (B-DMCs), the polarization phenomenon is widespread. Correlated noise is present in practically every channel, and the signal is always affected by noise when it is transmitted.
At present, the two most common polar code decoding algorithms are the successive-cancellation (SC) algorithm [2][3][4] and the belief propagation (BP) algorithm [5,6]. The SC decoding algorithm has low computational complexity and very good error correction performance. However, its serial structure causes a long decoding delay [7]. Compared with SC, the BP algorithm has a parallel data processing structure, in which messages are propagated iteratively over the factor graph, and it has a higher decoding throughput [8]. The improved BP algorithms in [9][10][11][12] involve many operations, and once a short cycle appears in the factor graph, the decoding performance is seriously affected. Research on polar codes usually assumes an ideal channel with white Gaussian noise, and there are few references on using machine learning to decode polar codes under correlated noise. Correlated noise is the most common noise in actual communication systems, and it is also a major factor that seriously degrades channel coding performance [13][14][15]. The traditional methods of dealing with this noise are to whiten it, that is, to convert the colored noise into white noise, or to estimate the probability density distribution of the noise and then use the estimated joint distribution to construct an optimal decoder [16]. However, the complexity of these two methods is relatively high, which is not conducive to optimizing the spectrum efficiency [17] and energy efficiency [18,19] of the whole wireless communication network. In particular, whitening involves matrix multiplication, so it is highly complex for polar codes with long codewords.
As the application of deep learning algorithms in face recognition, visual localization, and image processing has matured, researchers have also begun to apply deep learning algorithms such as the convolutional neural network (CNN) to channel coding [20,21]. In one such approach, a trained CNN is combined with the standard BP decoding algorithm to suppress the noise in the decoding process of low-density parity check (LDPC) codes, and this idea achieves a good decoding effect in suppressing channel noise [20]. The reported experimental results show that this framework further reduces the bit error rate (BER) of iterative decoding. In addition, relevant studies show that applying a long short-term memory (LSTM) neural network decoder to polar codes in a Markov Gaussian memory impulse noise channel can not only reduce the decoding complexity but also improve the BER performance [22,23].
Based on the above ideas, this paper implements a CNN-based polar code noise optimization algorithm for channel decoding under correlated noise. We use the CNN as a prediction model to extract and learn the relevant characteristics of the noise [24,25], estimate and whiten the channel noise more accurately, calculate a more reliable log-likelihood ratio (LLR), and reduce the impact of noise on decoding performance. Compared with [20], the main contribution of this paper is to apply the concatenation of CNN and BP to the decoding of polar codes. The CNN is trained offline, simulations are carried out with the trained model, and the resulting decoding gain is analyzed. The experimental results show that the CNN-based noise optimized concatenated structure yields an obvious gain in BER performance.
The remainder of the paper is organized as follows. Section 2 briefly introduces the basic principles of the traditional BP decoding algorithm for polar codes. Section 3 proposes a CNN-based polar code noise optimization decoding structure. Section 4 shows the performance analysis of this algorithm and its comparison with other methods. Finally, Section 5 summarizes the paper and presents the research prospects.

Traditional BP Decoding Algorithm
As an iterative algorithm based on message passing, the BP decoding algorithm has been studied theoretically for different types of codes [26]. At present, it is widely used in the decoding of LDPC codes, because the confidence values of the nodes tend to converge over the iterations. The process of BP decoding is bidirectional, and in some cases the iteration can stop before reaching the preset maximum number of iterations.
In [27], Arikan proposed a factor graph to represent the polar code, and the BP algorithm was used to decode the polar code [28]. A polar code can be expressed as a coset code $(N, K, A, u_{A^c})$, where $N = 2^n$ is the code length, $K$ is the dimension, $A$ is the information set determined by the channel polarization, and $u_{A^c}$ denotes the fixed (frozen) bits [29]. The total number of nodes in the factor graph is $N(n + 1)$, and each node is represented by an integer pair $(i, j)$. Figure 1 shows the factor graph of the polar code when $N = 8$. There are three stages, and each stage includes four processing units [20,30]. The information $L_{n+1,j}$ in the figure (for $N = 8$ and $n = 3$, $L_{4,1}$ to $L_{4,8}$) is initialized with the LLRs of the channel output, and the information $R_{1,j}$ in the first column on the left is the predecoded value of the LLR. The information transfer process of each basic processing unit is shown in Figure 2.
As shown in Figure 2, in each basic processing unit, information is transferred and updated between adjacent nodes [31]. The symbol $L_{i,j}$ denotes the message passed to the left at node $(i, j)$, and $R_{i,j}$ denotes the message passed to the right. In one iteration of the BP decoding algorithm, the information is first transmitted from the rightmost nodes to the leftmost nodes and then from the leftmost nodes back to the rightmost nodes [32,33]. In this process, the information is expressed in the form of LLRs. The traditional iterative formula is given by Equation (1), where $1 \le i \le n$, $1 \le j \le 2^{n-1}$, and $g(x, y) = 2\tanh^{-1}(\tanh(x/2)\tanh(y/2))$. The information from the channel, $L_{n+1,j}$ $(1 \le j \le N)$, is defined by Equation (2), and the initialization information $R_{1,j}$ $(1 \le j \le N)$ is defined by Equation (3).
The steps of the BP decoding algorithm for polar codes are as follows:
(a) Calculate the value $L_{n+1,j}$ $(1 \le j \le N)$ in Equation (2) and use it as the initial channel information.
(b) In each iteration, update the node information of each processing unit with Equation (1), according to the iteration rules of the BP algorithm of the polar code [34].
(c) When the number of iterations reaches the set value, make a decision: if $j \in A^c$, or if $j \in A$ and $L_{1,j} > 0$, the decision for $u_j$ is 0; otherwise, the decision is 1.
Although the BP decoding algorithm has been widely shown to perform well, it is still difficult to implement in hardware. Many improvements have been made to the traditional BP decoding algorithm, including early termination strategies based on mean and variance and approximation by the min-sum algorithm [5][6][7]. Here, we mainly introduce the traditional BP decoding algorithm.
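The update inside one basic processing unit can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the four update rules are written in the form commonly used for polar BP decoding, the function and port names (`update_processing_unit`, `L_ru`, `R_lu`, and so on) are hypothetical, and the clipping constant in `g` is an illustrative numerical safeguard. The helper `g_min_sum` shows the min-sum approximation mentioned above.

```python
import math


def g(x, y):
    """Exact combining function: 2 * atanh(tanh(x/2) * tanh(y/2))."""
    t = math.tanh(x / 2.0) * math.tanh(y / 2.0)
    limit = 1.0 - 1e-12  # keep atanh finite when both inputs are very large
    t = max(-limit, min(limit, t))
    return 2.0 * math.atanh(t)


def g_min_sum(x, y):
    """Min-sum approximation of g: sign(x) * sign(y) * min(|x|, |y|)."""
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    return sign * min(abs(x), abs(y))


def update_processing_unit(L_ru, L_rl, R_lu, R_ll, combine=g):
    """Update the four outgoing messages of one processing unit.

    Inputs (hypothetical port names):
      L_ru, L_rl: left-going LLRs arriving at the right upper/lower nodes
      R_lu, R_ll: right-going LLRs arriving at the left upper/lower nodes
    Returns (L_lu, L_ll, R_ru, R_rl): the left-going messages leaving the
    left nodes and the right-going messages leaving the right nodes.
    """
    L_lu = combine(L_ru, L_rl + R_ll)
    L_ll = combine(R_lu, L_ru) + L_rl
    R_ru = combine(R_lu, L_rl + R_ll)
    R_rl = combine(R_lu, L_ru) + R_ll
    return L_lu, L_ll, R_ru, R_rl
```

In a full decoder, one iteration would apply this update to every processing unit from the rightmost stage to the leftmost stage and then back again; passing `combine=g_min_sum` trades a small accuracy loss for cheaper arithmetic.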

Proposed CNN Noise Optimization Algorithm
Research on polar codes usually assumes the ideal additive white Gaussian noise (AWGN) channel. In actual communication systems, however, owing to interference from the atmospheric environment, the noise is not a white Gaussian sequence but colored noise with certain temporal characteristics. The common solution to this kind of colored noise is whitening. Since whitening involves matrix multiplication, it has a high complexity for polar codes with long codewords. Although the typical SC decoding algorithm has good BER performance, its bit-by-bit decoding causes a large decoding delay when the code length is large. The BP decoding algorithm avoids this problem because of its parallel data processing structure, but its BER performance under the influence of channel noise is not satisfactory. Therefore, a CNN-BP concatenated system of polar codes that optimizes the channel noise is introduced in this section. First, the framework of the whole concatenated decoding system is briefly introduced, and then the Gaussian colored noise that may appear in the channel is modeled. The generation of noise sample data, the design of the CNN model, and the training of the network are also discussed.

System Framework.
The flowchart of the channel noise optimization structure based on a CNN is shown in Figure 3. This architecture mainly follows the iterative BP-CNN architecture for channel decoding proposed in [20]. The information symbol u is encoded as x by the encoder. After binary phase-shift keying (BPSK) modulation and mapping, the symbol vector s is obtained. The modulated s is transmitted through the channel with additive noise.
In Figure 3, n denotes the channel noise, a Gaussian random vector characterized by its autocorrelation matrix. At the receiving end, the received signal y can be expressed as y = s + n, and the result s′ is obtained through a preliminary hard decision. n′ represents a preliminary estimate of the noise, which can be expressed as n′ = y − s′. Taking n′ as the input of the CNN, we can obtain a more accurate noise estimate and make the residual noise follow a Gaussian distribution as closely as possible.

Wireless Communications and Mobile Computing
n′′ denotes the output of the CNN, and the vector y′ after noise suppression can be expressed as y′ = y − n′′.
Before decoding, the LLR values are also corrected according to the received values and the residual noise distribution, as given by Equation (4), where $P_{0,i}$ and $P_{1,i}$ represent the probabilities that the ith transmitted bit is 0 or 1, respectively, given the ith element of y′. Finally, the more reliable LLRs are fed into the polar code decoder to obtain the final information symbol estimate u′.
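The receiver-side chain described in this subsection (preliminary decision, preliminary noise estimate, CNN refinement, noise subtraction, LLR calculation) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the BPSK mapping is taken to be bit 0 to +1 and bit 1 to −1, the residual noise is assumed to be zero-mean Gaussian with variance `residual_var`, so that the LLR reduces to 2y′/σ², and `denoiser` is a hypothetical stand-in for the trained CNN. The function names are not from the paper.

```python
def hard_decision(y):
    """Preliminary BPSK decision s': +1 for non-negative samples, else -1."""
    return [1.0 if v >= 0.0 else -1.0 for v in y]


def corrected_llrs(y, denoiser, residual_var):
    """Return LLRs after subtracting the refined noise estimate.

    y            : received samples (list of floats)
    denoiser     : callable mapping the preliminary noise estimate n' to a
                   refined estimate n'' (placeholder for the trained CNN)
    residual_var : assumed variance of the residual noise y' - s
    """
    s_hat = hard_decision(y)                              # s'
    n_prelim = [yi - si for yi, si in zip(y, s_hat)]      # n' = y - s'
    n_refined = denoiser(n_prelim)                        # n'' = CNN(n')
    y_clean = [yi - ni for yi, ni in zip(y, n_refined)]   # y' = y - n''
    # For BPSK (0 -> +1, 1 -> -1) in Gaussian noise: ln(P0 / P1) = 2 y' / var
    return [2.0 * v / residual_var for v in y_clean]
```

For example, `corrected_llrs(y, lambda n: [0.0] * len(n), sigma2)` disables the noise refinement and returns the usual AWGN LLRs, which gives a convenient baseline for comparison.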

Establishment of Channel Noise Model.
After encoding, the codeword is transmitted using BPSK modulation. During transmission, the signal is inevitably corrupted by noise, and the noisy version of the transmitted signal forms the received signal. In this paper, the colored noise in the channel is modeled by a first-order infinite impulse response (IIR) digital filter: passing AWGN through this filter yields the colored noise commonly encountered in communication systems. Its Z transform is given by Equation (5), where a represents a gain factor, i.e., the ratio of output to input amplitude, and b represents a correlation coefficient, −1 < b < 1. When b = 0, the digital filter becomes an all-pass filter that introduces no correlation, and the result of filtering is still white Gaussian noise. As can be seen from Equation (5), the IIR filter associates each white Gaussian noise sample with the preceding noise sample. Additive colored Gaussian noise (ACGN) is obtained when the white Gaussian noise passes through the IIR filter, and the jth ACGN sample is given by Equation (6). It can be seen from Equation (6) that the larger the magnitude of b, the stronger the dependence between adjacent noise samples, so the filter coefficient b is generally called the correlation coefficient. If the power of the colored noise after filtering is to equal that of the white noise before filtering, the variances of the two noise processes must be the same. Therefore, the gain factor is generally set to $a = \sqrt{1 - b^2}$. In this way, the power of the noise is given by Equation (7), where SNR represents the signal-to-noise ratio in dB, E represents the energy, and $\sigma_w^2$ and $\sigma_c^2$ represent the variances of the AWGN and ACGN, respectively.
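A first-order IIR coloring filter of this kind can be sketched in a few lines of Python. This is a sketch under an assumed sign convention, not the authors' code: the recursion is taken to be $n_c[j] = a\,n_w[j] + b\,n_c[j-1]$ with $a = \sqrt{1 - b^2}$, so that the lag-1 correlation of the output equals b and the variance is unchanged. With the opposite sign on the feedback term, the roles of positive and negative b in the frequency response would be swapped. The function names and the `sigma` and `seed` parameters are illustrative.

```python
import math
import random


def colored_noise(num_samples, b, sigma=1.0, seed=None):
    """Generate first-order autoregressive Gaussian noise.

    Assumed recursion: n_c[j] = a * n_w[j] + b * n_c[j - 1], a = sqrt(1 - b^2),
    where n_w is white Gaussian noise with standard deviation sigma.
    """
    if not -1.0 < b < 1.0:
        raise ValueError("b must satisfy -1 < b < 1")
    rng = random.Random(seed)
    a = math.sqrt(1.0 - b * b)
    prev = rng.gauss(0.0, sigma)  # start from the stationary distribution
    samples = []
    for _ in range(num_samples):
        prev = a * rng.gauss(0.0, sigma) + b * prev
        samples.append(prev)
    return samples


def variance(x):
    """Biased sample variance."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def lag1_correlation(x):
    """Sample correlation between consecutive samples."""
    mean = sum(x) / len(x)
    cov = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(len(x) - 1))
    return cov / (len(x) - 1) / variance(x)
```

Because $a^2 = 1 - b^2$, the stationary variance $a^2\sigma^2 / (1 - b^2)$ equals $\sigma^2$, which is the equal-power condition stated above; setting b = 0 returns plain white noise.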
Figure 4 shows the amplitude-frequency response of $F_C(z)$ for different values of b. It can be observed that when b > 0, $F_C(z)$ becomes increasingly high-pass as b increases: the noise energy is concentrated at high frequencies, and the low-frequency components are weaker. When b < 0, $F_C(z)$ becomes increasingly low-pass as the absolute value of b increases. In this case, the colored noise approaches red noise, in which the higher-frequency components are attenuated. To avoid confusion, we only consider the system performance when b > 0 in the following simulation experiments.
3.3. CNN Structure.
The network structure adopted in the proposed system architecture is shown in Figure 5. Assume that the CNN has $l$ layers (the input layer is not counted), that the sizes of the convolution kernels in the layers are $f_1, f_2, f_3, \ldots, f_l$, and that the numbers of feature maps are $k_1, k_2, k_3, \ldots, k_l$. The structure of the CNN is determined by the number of convolutional layers, the size of the filters, and the number of feature maps in each layer. These parameters need to be determined before training the network. Our CNN structure contains a total of four convolutional layers. The first layer uses 64 filters with a kernel size of 9, the second layer uses 32 filters with a kernel size of 3, the third layer uses 16 filters with a kernel size of 3, and the last layer uses 1 filter with a kernel size of 15. Unlike the networks used for low-level image tasks such as denoising and super-resolution, our network takes a one-dimensional vector as input instead of a two-dimensional image.
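To make the layer dimensions concrete, the following dependency-free Python sketch runs a forward pass through a four-layer one-dimensional CNN with the kernel sizes and feature-map counts listed above. Several details are assumptions made only for illustration and are not stated in the text: zero ("same") padding so that the output has the same length as the input, ReLU activations on all layers except the last, and small random weights in place of trained ones. A practical implementation would use a deep learning framework instead.

```python
import random

KERNEL_SIZES = [9, 3, 3, 15]    # f_1 .. f_4 from the text
FEATURE_MAPS = [64, 32, 16, 1]  # k_1 .. k_4 from the text


def init_layers(seed=0):
    """Create random (untrained) weights: w[out_ch][in_ch][tap], b[out_ch]."""
    rng = random.Random(seed)
    layers, in_ch = [], 1
    for size, out_ch in zip(KERNEL_SIZES, FEATURE_MAPS):
        w = [[[rng.gauss(0.0, 0.05) for _ in range(size)]
              for _ in range(in_ch)] for _ in range(out_ch)]
        layers.append((w, [0.0] * out_ch))
        in_ch = out_ch
    return layers


def conv1d_same(x, w, b):
    """1-D convolution with zero padding; x is a list of channels."""
    n, size = len(x[0]), len(w[0][0])
    pad = size // 2
    out = []
    for o, filt in enumerate(w):
        row = []
        for t in range(n):
            acc = b[o]
            for c, taps in enumerate(filt):
                for d in range(size):
                    idx = t + d - pad
                    if 0 <= idx < n:
                        acc += taps[d] * x[c][idx]
            row.append(acc)
        out.append(row)
    return out


def forward(noise_estimate, layers):
    """Map a preliminary noise vector to a refined one of the same length."""
    x = [list(noise_estimate)]
    for i, (w, b) in enumerate(layers):
        x = conv1d_same(x, w, b)
        if i < len(layers) - 1:  # assumed ReLU on hidden layers only
            x = [[max(0.0, v) for v in ch] for ch in x]
    return x[0]


def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) * len(w[0][0]) + len(b) for w, b in layers)
```

With these sizes the network has 8609 trainable parameters (640 + 6176 + 1552 + 241 including biases), which illustrates how compact the noise estimator is.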
The performance of neural networks largely depends on the choice of loss function. The loss function measures the difference between the actual output and the expected output of the CNN, and it needs to be carefully defined according to the specific task of the network. In the framework of this paper, the CNN is mainly used for channel noise estimation, and its output affects the performance of the subsequent decoding. We adopt a loss function with a normality test [20,30], so that how closely the residual noise follows a Gaussian distribution can be measured. In the loss function, the first term measures the power of the residual noise, and the second term is based on the Jarque-Bera test from statistics. λ is a scale factor that balances the two terms. The choice of λ is very important for the trained model and the final decoding result. Therefore, in the experiments, we use a range of λ values to compare and analyze the decoding performance of polar codes. In addition, S and C denote the skewness and kurtosis of the residual noise, respectively, where $r_i$ represents the ith element of the residual noise vector and $\bar{r}$ represents the sample mean. Although the normality test is not optimal, the resulting loss function is differentiable, and training is relatively simple.
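A loss of this shape can be sketched as follows. The exact expression is an assumption based on the usual form of the Jarque-Bera statistic: mean residual power plus λ times $S^2 + \frac{1}{4}(C - 3)^2$, with S and C computed from the central moments of the residual vector. The function name `normality_loss` and the argument names are hypothetical.

```python
def normality_loss(residual, lam):
    """Residual power plus a Jarque-Bera-style normality penalty.

    residual : list of residual noise samples r_i
    lam      : scale factor (lambda) balancing the two terms
    """
    n = len(residual)
    mean = sum(residual) / n
    m2 = sum((r - mean) ** 2 for r in residual) / n  # second central moment
    if m2 == 0.0:
        raise ValueError("residual has zero variance")
    m3 = sum((r - mean) ** 3 for r in residual) / n
    m4 = sum((r - mean) ** 4 for r in residual) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    power = sum(r * r for r in residual) / n
    # A Gaussian has skewness 0 and kurtosis 3, so the penalty vanishes
    # when the residual looks Gaussian.
    return power + lam * (skewness ** 2 + 0.25 * (kurtosis - 3.0) ** 2)
```

Setting `lam=0` reduces the expression to the mean squared residual, while a larger `lam` weights the Gaussianity of the residual more heavily relative to its power.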
The power of the noise is reduced by subtracting the estimated noise from the received value, which is equivalent to improving the signal-to-noise ratio (SNR). Making the residual noise follow a Gaussian distribution is equivalent to whitening the noise, which makes the LLR calculation more convenient and reliable. Experiments also show that this loss function further reduces the BER of iterative decoding compared with the standard quadratic loss function.
Figure 5: The CNN structure of the system model in this paper.

Performance Analysis and Comparison
In this section, we compare the performance of polar code decoding algorithms under different conditions. The simulation experiments are developed in Python and implemented in the TensorFlow 1.10.0 framework, and the data analysis is carried out in MATLAB. The CNN structure is {4; 9, 3, 3, 15; 64, 32, 16, 1}, and the CNN is trained by stochastic gradient descent with the Adam optimization method until the value of the loss function no longer decreases. The training data are generated under different channel SNR values, such as {0, 0.5, 1, 1.5, 2, 2.5, 3} dB. Unless otherwise stated, the length of the polar codes in the simulation experiments is 256 and the code rate is 0.5. To understand the influence of different parameter settings on decoding performance, multiple correlation coefficients b (0.5, 0.6, 0.7, 0.8, 0.9) and corresponding scale factors λ are tested. For example, when b = 0.6, a wide range of λ values from 0.01 to 100 is tested.

Decoding Performance with Different Correlation Coefficients and Scale Factors.
The effects of different correlation coefficients b and scale factors λ on the performance of CNN-BP concatenated decoding and traditional BP decoding are compared to select the optimal parameter values. At present, the selection of λ is still empirical, and the most appropriate value has not been determined. A very small λ may not guarantee that the residual noise follows a Gaussian distribution, while a very large λ may not reduce the power of the residual noise. Therefore, the selection of the scale factor λ indirectly affects the decoding performance of polar codes.
With the correlation coefficient b fixed, CNN models are trained for different λ values. The decoding performance of the resulting models is then simulated, and the λ value with the lowest BER is selected. Figure 6 shows the simulation results of CNN-BP concatenated decoding with different λ values and of standard BP when the correlation coefficient b is 0.6 and the number of iterations is 12. It can be observed that the CNN-BP concatenated scheme does not perform well when λ is too large or too small. For example, when λ = 10 or λ = 100, the decoding performance of CNN-BP is poor, and its BER is higher than that of the standard BP algorithm. When λ = 0.1 or λ = 1, CNN-BP decoding outperforms standard BP. In addition, when λ takes other values, such as 0.01, 0.03, or 0.07, the corresponding BER is higher than that when λ = 0.1 or λ = 1. Therefore, selecting an appropriate λ value helps to improve the decoding performance of the CNN-BP concatenated scheme.
Based on these results, only the CNN-BP simulation results for λ = 0.1 and λ = 1 are considered in the following experiments. Figure 7 shows the BER performance of the CNN-BP concatenated decoding scheme with these two scale factors for correlation coefficients of 0.5, 0.6, 0.7, 0.8, and 0.9. It can be observed that the BER decreases as the correlation coefficient increases. In addition, when the correlation coefficient is small, the BER of the CNN-BP concatenated decoding scheme with λ = 1 is lower than that with λ = 0.1. However, when the correlation coefficient is large, the BER with λ = 1 is higher than that with λ = 0.1.
As the noise correlation becomes stronger, a smaller λ tends to be preferable. This is because more information is available to reduce the input error of the neural network when the channel noise is strongly correlated, so choosing a smaller λ lets the network focus on reducing the power of the residual noise. When the noise is weakly correlated, however, relatively little information can be exploited. In this case, the neural network needs to pay more attention to the distribution of the residual noise, so a larger λ tends to be preferable.
In addition, to reflect more differences, a separate simulation experiment is carried out on the BER performance of the standard BP decoding with different correlation coefficients. Figure 8 shows the performance of the standard BP decoding algorithm.
With the increase of channel noise correlation intensity, the standard BP decoding algorithm cannot resist the noise interference, resulting in the increase of BER of decoding. In the decoding algorithm using concatenated CNN, the influence of channel noise can be adjusted by the scale factor λ.

When the channel noise is strongly correlated, more information is available to reduce the input error of the neural network, which reduces the BER of decoding. In addition, comparing Figure 7 with Figure 8, it can be observed that when the correlation coefficient b ≥ 0.6, the performance of CNN-BP concatenated decoding is significantly improved compared with standard BP decoding. Table 1 shows the gain of CNN-BP concatenated decoding over standard BP decoding at a BER of $10^{-3}$. For example, when b = 0.6 and λ = 0.1, CNN-BP concatenated decoding has a gain of 0.5 dB over standard BP decoding. It can be observed from Table 1 that when the correlation coefficient is relatively large, CNN-BP concatenated decoding has an obvious gain over standard BP decoding. It should be noted that standard BP decoding cannot make up this gain by running more iterations. When the correlation coefficient is small, such as b = 0.5, the BER of CNN-BP concatenated decoding is higher than that of standard BP decoding. This is because when the correlation of the colored noise is weak, only a few characteristics of the channel noise can be extracted, and the advantage of the CNN is not obvious. When the correlation of the colored noise is strong, the CNN can better extract the characteristics of the noise, so CNN-BP concatenated decoding significantly improves the decoding performance of polar codes.

Decoding Performance with Different Code Lengths.
In this subsection, simulation experiments on polar codes with different code lengths are carried out. In the experiments, the code rate of the polar codes is 0.5, the number of iterations is 12, and the correlation coefficient of the channel noise is 0.8 (the colored noise is strongly correlated). For comparison with the standard BP decoding algorithm, the two scale factor values λ = 0.1 and λ = 1 are again used. For further comparison, decoding with λ = 0 and with b = −0.8 is also simulated. Figure 9 shows the BER performance when the code length of the polar codes is 64, 128, and 512, respectively. It can be observed that the BER of CNN-BP concatenated decoding is about 2 orders of magnitude lower than that of standard BP decoding when the code length is 64. As the code length increases, the improvement in decoding performance becomes more obvious. In addition, as the SNR increases, the decoding performance of CNN-BP concatenated decoding improves markedly.
In addition, the decoding performance of the standard BP algorithm is almost the same when b = −0.8 and b = 0.8 with the same code length. However, the results of CNN-BP concatenated decoding are significantly different in these two cases. This shows that the standard BP algorithm cannot distinguish between different noise scenarios with the same correlation intensity, whereas the CNN-BP scheme can. In addition, this subsection also simulates the CNN-BP concatenated decoding scheme when λ = 0, which means that the normality-test loss function reduces to the ordinary loss function used in neural network regression. It can be observed that no matter what kind of loss function is used to form CNN-BP concatenated

Performance Comparison with Other Polar Codes Decoding Algorithms.
To show the difference between the proposed polar code decoding algorithm and other related algorithms, the prior decision CNN (PD-CNN) decoding algorithm is selected for comparison. The PD-CNN decoding algorithm uses deep learning to decode polar codes: it first makes a simple 0-or-1 prediction for the information sequence at the receiving end and then uses a CNN for decoding [35]. Figure 10 shows the performance comparison between the CNN-BP concatenated scheme and other decoding algorithms when the correlation coefficient is 0.8 and the code rate is 0.9. It can be observed that the BER of the CNN-BP concatenated decoding scheme is lower when the SNR is small. This is because the noise is relatively large when the SNR is low, so the prediction error rate of the information sequence in the PD-CNN decoding algorithm is relatively high. When the SNR is more than 5 dB, the decoding performance of the CNN-BP concatenated algorithm is slightly worse than that of the PD-CNN algorithm. Both CNN-based methods perform better than the standard BP decoding algorithm, and in general, the proposed CNN-BP concatenated scheme adapts better across SNR values.

Conclusions
With the advent of the 5G era, mobile communication technology is developing rapidly, and the decoding algorithms of polar codes have attracted more and more attention from researchers. At present, the methods for handling colored noise in the channel are not perfect, and problems such as high BER and high delay make it difficult to meet the needs of specific applications.
In this paper, we use the regression capability of a CNN to realize a polar code decoding structure that estimates channel noise more accurately and calculates LLRs more reliably. The comprehensive results show that with the increase in SNR, the BER curve of the proposed decoding algorithm has a good downward trend. CNNs can be trained offline, so this architecture has significant advantages in reducing complexity. Since the proposed architecture is a general CNN-based solution for noise optimization, it can also be applied to other types of CNN-based polar code decoders.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.