Predicting Wireless MmWave Massive MIMO Channel Characteristics Using Machine Learning Algorithms

This paper proposes a procedure for predicting channel characteristics based on a well-known machine learning (ML) algorithm and convolutional neural network (CNN), for three-dimensional (3D) millimetre wave (mmWave) massive multiple-input multiple-output (MIMO) indoor channels. The channel parameters, such as amplitude, delay, azimuth angle of departure (AAoD), elevation angle of departure (EAoD), azimuth angle of arrival (AAoA), and elevation angle of arrival (EAoA), are generated by ray tracing software. After data preprocessing, we obtain the channel statistical characteristics (including the expectations and spreads of the above-mentioned parameters) used to train the CNN. The channel statistical characteristics of any subchannel in a specified indoor scenario can be predicted when the location information of the transmitter (Tx) antenna and receiver (Rx) antenna is input into the CNN trained with limited data. The predicted channel statistical characteristics fit the real channel statistical characteristics well. The probability density functions (PDFs) of the error square and the root mean square errors (RMSEs) of the channel statistical characteristics are also analyzed.


Introduction
The fifth generation (5G) wireless communication networks impose many new requirements, such as 1000 times the system capacity of the fourth generation (4G) networks, a wide frequency range (e.g., 450 MHz-100 GHz, covering millimetre wave (mmWave) bands), increased data rate, and reduced latency, energy, and cost [1][2][3][4][5][6]. To satisfy these requirements, several advanced technologies, such as mmWave and massive multiple-input multiple-output (MIMO), have been proposed, and they bring new challenges to channel modeling. Since the performance bound of wireless communication systems is determined by channel characteristics [7], an accurate channel model plays an important role in designing, evaluating, and developing wireless communication systems. The 5G wireless communication channel models, such as the Mobile and wireless communications Enablers for the Twenty-twenty Information Society (METIS) channel model [8], the Millimetre-Wave Evolution for Backhaul and Access (MiWEBA) channel model [9], the ITU-R IMT-2020 channel model [10], the COST 2100 channel model [11,12], the IEEE 802.11ay channel models [13], the millimetre-wave based mobile radio access network for fifth generation integrated communications (mmMAGIC) channel model [14], the quasi deterministic radio channel generator (QuaDRiGa) channel model [15,16], and a general three-dimensional (3D) nonstationary 5G channel model [17], can be classified as deterministic and stochastic channel models. As the most important technologies of 5G wireless communication networks, massive MIMO and mmWave have also attracted great attention. Based on the massive MIMO and mmWave indoor channel measurement in [18], the authors of [19] performed massive MIMO and mmWave channel parameter estimation.

Wireless Communications and Mobile Computing
The authors of [20] summarized recent massive MIMO channel measurements and models. The above-mentioned models are complex and hard to use, so a fundamentally new channel modeling approach is necessary.
The explosive increase of frequencies/bandwidths, antennas, and new services/scenarios will generate massive data and bring the research of 5G wireless communications to the era of artificial intelligence (AI) [21,22]. Machine learning (ML), as an important branch of AI, has received extensive attention due to its capability of extracting valuable hidden rules from enormous amounts of unknown channel information. It can take advantage of both the low complexity of stochastic channel models and the accuracy of deterministic channel models. As a conventional ML algorithm, the convolutional neural network (CNN) exhibits excellent performance in compressing and processing redundant channel information [23].
Until now, there have been two kinds of applications of AI to 5G wireless communication channels. One is measurement data preprocessing based on statistical learning methods, e.g., clustering algorithms. The Kernel-Power-Density algorithm proposed in [24] used the kernel density and only considered the neighboring points when computing the density. The authors of [25] proposed a novel clustering framework based on the Kernel-Power-Density algorithm and took elevation angles into consideration. The Kuhn-Munkres algorithm was proposed to solve the tracking problem in [26]. The Kalman filter in [27] was used to track the clusters and to predict the cluster positions. Furthermore, several other algorithms were used for cluster identification in measurement data preprocessing, such as the KPowerMeans algorithm [28] and the hierarchical tree [29]. These clustering algorithms play a significant role in conventional cluster-based stochastic channel models, such as the COST 2100 channel model and the WINNER channel models, but they cannot predict the channel characteristics. The other is to predict the channel characteristics based on ML algorithms, which can uncover the mapping relationship between physical environment information and the channel characteristics. The function between frequency, distance, and path loss (PL) was modeled by two types of artificial neural networks (ANNs), i.e., multilayer perceptron (MLP) and radial basis function (RBF) [30][31][32][33][34]. In [35], PL was also modeled as a mapping relationship between delay and the atmosphere by MLP. The authors of [36] and [37] modeled Doppler frequency shift by RBF and MLP, respectively. The mapping relationship between channel characteristics and geographical location was modeled by a feed-forward network (FFN) in [38] and a DeepFi architecture in [39]. In-vehicle wireless channels at 60 GHz were modeled by an FFN and an RBF network [40,41]. The author of [42] proposed a three-layer structure based on ML: "wave, cluster-nuclei, and channel". Most of the existing research works can only obtain the mapping relationship between a single channel characteristic and physical channel environment information but cannot predict comprehensive channel characteristics. At the same time, the channel characteristics of an arbitrary subchannel in a specified scenario could not be predicted until now, although they play an important role in channel estimation and communication quality. CNN can compress and process redundant channel information well, but it has not been applied to channel characteristics prediction.
In this paper, we propose an AI enabled procedure to predict channel statistical characteristics based on CNN to obtain the mapping relationship between the location information of transmitter (Tx) and receiver (Rx) antennas and almost all the characteristics of amplitude, delay, and angles. The main contributions of this paper are summarized as follows: (1) A procedure for predicting channel statistical characteristics based on a specified CNN for 3D mmWave MIMO indoor channels is proposed. (2) For the first time, five different wireless channel characteristic datasets, collected in different ways, are compared. By comparing their training results, we obtain better rules for data generation and collection, which provide practical guidance for building such datasets.
We have organized the rest of the paper as follows. The AI enabled procedure to predict channel statistical characteristics is shown in Section 2. In Section 3, we describe the two indoor scenarios of data collection and the principle of data preprocessing. The five datasets are also given in this section. The proposed CNN is shown in Section 4. In Section 5, we discuss and analyze the results. Conclusions and future work are given in Section 6.

System Model
The flowchart of the AI enabled procedure to predict channel statistical characteristics is shown in Figure 1. Firstly, we set up the indoor scenario and obtain simulated channel information. Here, we construct two 3D indoor scenarios by setting the sizes and materials of rooms and objects in ray tracing software. Then we can obtain the multipath component parameters (amplitude, delay, azimuth angle of departure (AAoD), elevation angle of departure (EAoD), azimuth angle of arrival (AAoA), and elevation angle of arrival (EAoA)). We preprocess the data to obtain the channel statistical characteristics (PL, DM, DS, AAMA, AASA, AAMD, AASD, EAMA, EASA, EAMD, and EASD) and build the dataset. The dataset of the specified indoor scenario is randomly split into two sets in the proportion 7:3. One is the train set; the other is the validation set. Samples in both the train set and the validation set have the 3D coordinates of Tx and Rx as input vectors and channel statistical characteristics as output vectors. The train set is used to train the CNN. The input vectors of the validation set are fed into the CNN to obtain the predicted output vectors. Whether to train the CNN again is decided by comparing and analyzing the root mean square errors (RMSEs) and probability density functions (PDFs) between the predicted output vectors and the output vectors of the validation set. More detailed information is given in the following sections.
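The random 7:3 split described above can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the fixed seed, and the placeholder sample values are assumptions, not part of the original procedure.

```python
import random


def split_dataset(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into a train list and a validation list."""
    rng = random.Random(seed)          # fixed seed only for repeatability
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder samples: (Tx x, y, z, Rx x, y, z) as input, one label as output.
samples = [((float(i), 0.0, 1.5, 0.0, float(i), 3.0), 0.0) for i in range(1000)]
train_set, val_set = split_dataset(samples)
print(len(train_set), len(val_set))
```

With 1000 subchannels this gives 700 train and 300 validation samples; rounding to the nearest integer gives 717 train samples for 1024 subchannels.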

Database Generation
The ray tracing software, Wireless InSite [43], is used to build the simulation datasets. Ray tracing is a classical deterministic method used for modeling radio propagation. It is based on geometrical optics (GO) and the uniform theory of diffraction (UTD). The interactions between rays and objects can be classified as reflection, transmission, scattering, and diffraction. By tracing paths in a specified scenario built in the simulator, all the possible rays can be obtained and we can get the parameter vector $\boldsymbol{\psi}_{pq,l}$ of the $l$-th ($l = 1, 2, \ldots, L$, $L = 250$) multipath between the $p$-th Tx antenna and $q$-th Rx antenna; i.e.,

$$\boldsymbol{\psi}_{pq,l} = \left[\alpha_{pq,l},\ \tau_{pq,l},\ \phi^{T}_{pq,l},\ \theta^{T}_{pq,l},\ \phi^{R}_{pq,l},\ \theta^{R}_{pq,l}\right],$$

where $\alpha_{pq,l}$, $\tau_{pq,l}$, $\phi^{T}_{pq,l}$, $\theta^{T}_{pq,l}$, $\phi^{R}_{pq,l}$, and $\theta^{R}_{pq,l}$ are the amplitude, delay, AAoD, EAoD, AAoA, and EAoA of the $l$-th multipath between the $p$-th Tx antenna and $q$-th Rx antenna, respectively.
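As an illustration of how per-path parameters of one subchannel can be reduced to an expectation and a spread, the sketch below computes a power-weighted mean delay and RMS delay spread. It assumes linear (not dB) amplitudes and the standard power-weighted definitions; the preprocessing formulas actually used for the datasets are not reproduced here, and the function name is hypothetical.

```python
import math


def delay_mean_and_spread(amplitudes, delays):
    """Power-weighted mean and RMS spread of the path delays.

    Assumes linear amplitudes, so each path's power is amplitude squared.
    """
    powers = [a * a for a in amplitudes]
    total = sum(powers)
    mean = sum(p * t for p, t in zip(powers, delays)) / total
    second_moment = sum(p * t * t for p, t in zip(powers, delays)) / total
    # Guard against tiny negative values caused by floating-point rounding.
    spread = math.sqrt(max(second_moment - mean * mean, 0.0))
    return mean, spread


# Two equal-power paths arriving at 10 ns and 30 ns.
mean_ns, spread_ns = delay_mean_and_spread([1.0, 1.0], [10.0, 30.0])
print(mean_ns, spread_ns)  # 20.0 10.0
```

The same weighting could be applied to the angle parameters, although angular quantities would additionally need wrap-around handling that this sketch does not attempt.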

The Descriptions of Data Generation.
To verify the general predictive capability of the CNN, we construct two indoor scenarios in Wireless InSite to collect multipath component parameters. One is a virtual classroom scenario shown in Figure 2; the other is a real lab scenario shown in Figure 3. The ceiling is made of concrete. Both the floor and the walls are made of the 3-layered dielectric in [44]. The layout of the classroom is shown in Figure 2. The maximum number of paths allowed in the simulation is 250. To limit the complexity of the simulation, we do not take scattering caused by surface roughness into consideration, while the power loss caused by the rough surfaces of objects is calculated by multiplying the reflection coefficient by a roughness coefficient. The carrier frequency and bandwidth are set to 60 GHz and 2 GHz, respectively. To evaluate the performance of data collection, two datasets are built. In the 10 × 100 random dataset (10100R, R stands for random) of the virtual classroom scenario, we set up 10 Tx isotropic antennas with 0 dBi antenna gain in all directions and 100 Rx isotropic antennas at random locations to obtain multipath parameters of 1000 subchannels. In the 32 × 32 random dataset (3232R) of the virtual classroom scenario, 32 Tx isotropic antennas and 32 Rx isotropic antennas are randomly set up to obtain channel parameters of 1024 subchannels. In indoor uplink communication scenarios, the Txs can be mobile phones, laptops, iPads, etc., while the Rxs are normally access points (APs) located on the ceilings. In the virtual classroom environment, we assume that the height of the Txs is 1.5 m and the height of the Rxs is 3 m.
Lab supplies such as books and computer monitors on the desks are not modeled in the 3D ray tracing scenario, because their irregular shapes lead to a significant increase of computational complexity and they are usually shadowed by the higher clapboards on the desktops. Similarly, we also neglect chairs in the scenario modeling because they are about 0.8 m high, which is lower than the antennas, and they were positioned near the desks. Since the lab scenario is more complex than the virtual classroom, at most 5 orders of reflection, 3 orders of transmission, and 1 order of diffraction are simulated in the ray tracing setup of this lab scenario. The maximum number of propagation paths is 250. The carrier frequency and bandwidth are set to 60 GHz and 2 GHz, respectively. To evaluate the performance of data collection, three datasets are built. In the 30 × 30 random dataset (3030R, R stands for random) of the lab scenario, we set up 30 Tx isotropic antennas and 30 Rx isotropic antennas at random locations to obtain channel parameters of 900 subchannels.

After data preprocessing, the details of datasets are shown in Table 1. We separated the total samples into train sets and validation sets by the proportion of 7:3 randomly.

Architecture of the Proposed CNN for Channel Characteristics Prediction
The architecture of the CNN is presented in Figure 4. It includes two main stages: a first stage of two convolutional layers and a second stage of four dense layers, which are also called fully connected layers. A large number of iterations is required for the neural network to converge, i.e., to fit the node thresholds and the connection weights for the least loss. The input vector $\mathbf{x}_{pq}$ is the 3D coordinates of the $p$-th Tx antenna and the $q$-th Rx antenna. The output vector $\mathbf{y}_{pq}$ is the channel characteristic vector of the subchannel between the $p$-th Tx and the $q$-th Rx. They can be expressed as

$$\mathbf{x}_{pq} = \left[x^{T}_{p},\ y^{T}_{p},\ z^{T}_{p},\ x^{R}_{q},\ y^{R}_{q},\ z^{R}_{q}\right],$$

$$\mathbf{y}_{pq} = \left[\mathrm{PL},\ \mathrm{DM},\ \mathrm{DS},\ \mathrm{AAMA},\ \mathrm{AASA},\ \mathrm{AAMD},\ \mathrm{AASD},\ \mathrm{EAMA},\ \mathrm{EASA},\ \mathrm{EAMD},\ \mathrm{EASD}\right].$$

The first convolutional layer filters the 1 × 6 input vector with 16 kernels of size 1 × 3. The second convolutional layer takes the output of the first convolutional layer as input and filters it with 32 kernels of size 16 × 3. Both convolutional layers use a stride of one node. We zero pad the activation to match the number of features. After each convolutional layer, batch normalization [48] and a rectified linear unit (ReLU) are placed to speed up the model convergence. The output of the second convolutional layer is then fully connected to the first dense layer. The four dense layers have 16, 32, 64, and 1 neurons, respectively. In order to obtain the optimized training result, we train the 11 channel characteristics individually with the CNN. Each time, we input all 6 elements of $\mathbf{x}_{pq}$ into the CNN and output 1 element of $\mathbf{y}_{pq}$. A ReLU is placed after each dense layer except the last layer. Unlike in computer vision, we do not place a pooling layer between the convolutional layers, because the input of our network, which has only 6 nodes, is much smaller than an image, which commonly contains millions of pixels. A pooling layer would lose useful information and make the model converge at a high loss.
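To make the layer sizes concrete, the following NumPy sketch runs one forward pass through a network with the shapes described above: two zero-padded, stride-one 1D convolutions with 16 and 32 kernels of width 3, followed by dense layers of 16, 32, 64, and 1 neurons. It is a sketch under stated assumptions, not the authors' implementation: batch normalization is omitted, the weights are untrained random values, and all function names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv1d_same(x, w, b):
    """Stride-1 1D convolution with zero padding.

    x: (C_in, L), w: (C_out, C_in, K), b: (C_out,) -> output (C_out, L).
    """
    c_out, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))      # zero padding on both ends
    length = x.shape[1]
    out = np.empty((c_out, length))
    for i in range(length):
        window = xp[:, i:i + k]               # (C_in, K)
        out[:, i] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out


def build_params(seed=0):
    """Random, untrained weights with the layer shapes from the text."""
    rng = np.random.default_rng(seed)
    dense_shapes = [(192, 16), (16, 32), (32, 64), (64, 1)]  # 192 = 32 * 6
    return {
        "conv1": (rng.normal(scale=0.1, size=(16, 1, 3)), np.zeros(16)),
        "conv2": (rng.normal(scale=0.1, size=(32, 16, 3)), np.zeros(32)),
        "dense": [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
                  for n_in, n_out in dense_shapes],
    }


def forward(params, coords):
    """Map six Tx/Rx coordinates to one predicted characteristic."""
    x = np.asarray(coords, dtype=float).reshape(1, 6)
    x = relu(conv1d_same(x, *params["conv1"]))   # (16, 6)
    x = relu(conv1d_same(x, *params["conv2"]))   # (32, 6)
    x = x.reshape(-1)                            # flatten to 192 values
    for w, b in params["dense"][:-1]:
        x = relu(w @ x + b)
    w, b = params["dense"][-1]
    return w @ x + b                             # no ReLU on the last layer


prediction = forward(build_params(), [1.0, 2.0, 1.5, 4.0, 5.0, 3.0])
print(prediction.shape)  # (1,)
```

Since each characteristic is trained individually, eleven such networks (one per label) would be needed to produce the full output vector.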
As shown in Table 2, this model has 7280 parameters in total. The largest share of the parameters lies between the second convolutional layer and the first dense layer, accounting for 42.20% of the total number of model parameters.
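The total of 7280 and the 42.20% share can be reproduced by counting connection weights layer by layer. Note the assumption: the figures match only when biases and batch-normalization parameters are left out of the count, which is an inference from the arithmetic rather than something stated in the text.

```python
# Connection weights per layer (biases and batch-norm parameters excluded).
layers = [
    ("conv1", 1 * 3 * 16),       # 16 kernels of size 1 x 3
    ("conv2", 16 * 3 * 32),      # 32 kernels of size 16 x 3
    ("dense1", 32 * 6 * 16),     # flattened 32 x 6 features -> 16 neurons
    ("dense2", 16 * 32),
    ("dense3", 32 * 64),
    ("dense4", 64 * 1),
]

total = sum(count for _, count in layers)
dense1_share = 100.0 * dict(layers)["dense1"] / total
print(total, round(dense1_share, 2))  # 7280 42.2
```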
A CNN with one output node was designed. The 11 different labels (PL, DM, DS, AAMA, AASA, AAMD, AASD, EAMA, EASA, EAMD, and EASD) are used individually to train the CNN to obtain the weights that give the least loss. Once the label is determined, the loss function and back propagation are applied end-to-end. The mean square error (MSE) function is used as the loss function in the CNNs for all 11 labels. The learning rate is fixed throughout training; we used an equal learning rate, initialized at 0.0001, for all layers. The root mean square propagation (RMSProp) in [49] with momentum of 0.9 and smooth factor of $10^{-6}$ is used to optimize the weights of the model. The update rule for weight $w$ is

$$E[g^2]_t = 0.9\,E[g^2]_{t-1} + 0.1\,g_t^2,$$

$$w_{t+1} = w_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\,g_t,$$

where $t$ is the iteration index, $\eta$ is the learning rate, $\epsilon$ is the smooth factor, $g_t$ is the gradient at the current iteration $t$, and $E[g^2]_t$ is the running average of the squared gradient.
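A single RMSProp step with the stated hyperparameters (learning rate 0.0001, decay 0.9, smooth factor 1e-6) can be sketched as below. The placement of the smooth factor inside the square root follows one common formulation and is an assumption here; some libraries add it outside the square root instead. The function name is hypothetical.

```python
import math


def rmsprop_step(w, g, v, lr=1e-4, rho=0.9, eps=1e-6):
    """One RMSProp update for a scalar weight.

    w: current weight, g: current gradient,
    v: running average of squared gradients from the previous step.
    """
    v = rho * v + (1.0 - rho) * g * g
    w = w - lr * g / math.sqrt(v + eps)   # eps inside the root: an assumption
    return w, v


# One step from w = 1.0 with gradient 2.0 and an empty history.
w_new, v_new = rmsprop_step(1.0, 2.0, 0.0)
print(w_new, v_new)

# Toy usage: minimise (w - 3)^2 with a larger learning rate for brevity.
w, v = 0.0, 0.0
for _ in range(1000):
    w, v = rmsprop_step(w, 2.0 * (w - 3.0), v, lr=0.01)
print(w)
```

Because the gradient is divided by its running root mean square, the step size stays close to the learning rate regardless of the gradient scale, which is why the toy example needs a few hundred steps to travel from 0 to about 3.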
The Glorot uniform initializer in [50], which is also called the Xavier uniform initializer, was used to initialize the weights in each layer. The weights were randomly drawn from a uniform distribution within $[-l, l]$ with

$$l = \sqrt{\frac{6}{n_{\mathrm{in}} + n_{\mathrm{out}}}},$$

where $n_{\mathrm{in}}$ is the number of input units and $n_{\mathrm{out}}$ is the number of output units in the weight tensor. We initialized the neuron biases in both convolutional layers and dense layers with the constant 0. This initialization accelerates the early stages of learning by providing the ReLUs with positive inputs.
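The Glorot uniform rule can be illustrated for the first dense layer (192 inputs, 16 outputs). This is a minimal pure-Python sketch with hypothetical function names; deep learning frameworks provide their own equivalents.

```python
import math
import random


def glorot_limit(n_in, n_out):
    """Half-width of the Glorot (Xavier) uniform interval."""
    return math.sqrt(6.0 / (n_in + n_out))


def glorot_uniform(n_in, n_out, rng):
    """Draw an n_in x n_out weight matrix from U(-limit, limit)."""
    limit = glorot_limit(n_in, n_out)
    return [[rng.uniform(-limit, limit) for _ in range(n_out)]
            for _ in range(n_in)]


rng = random.Random(0)
weights = glorot_uniform(192, 16, rng)   # first dense layer
biases = [0.0] * 16                      # biases start at the constant 0
print(round(glorot_limit(192, 16), 4))   # 0.1698
```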

Results and Analysis
The aim of this section is twofold. First, we verify the CNN in two indoor scenarios. Second, we compare five different datasets to analyze the influence of the dataset on the CNN.

Fittings between Predicted and Real Channel Characteristics.
In both the virtual classroom scenario and the lab scenario, all the predicted channel statistical characteristics generated by the CNN are in fairly good agreement with the channel statistical characteristics generated by the ray tracing software. In Figures 5-7, we show the fittings of PL, DM, and AAMA between predicted results and virtual simulation data in the two scenarios, respectively. As we can see, the predictive capability of the CNN is very good, and this method can be used to predict the channel statistical characteristics with limited simulation data in specified indoor scenarios. This shows that AI is meaningful for channel modeling. The massive data in wireless communication should be fully used and explored to improve the performance of wireless communication networks.

RMSE.
To evaluate and compare the performance of the CNN with different datasets, we calculate the RMSE between the predicted channel statistical characteristics and the virtual simulation channel statistical characteristics; i.e.,

$$\mathrm{RMSE}(c) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{c}_n - c_n\right)^2},$$

where $\mathrm{RMSE}(c)$ is the RMSE of the channel characteristic $c$, such as PL, DM, DS, AAMA, AASA, AAMD, AASD, EAMA, EASA, EAMD, and EASD; $\hat{c}_n$ and $c_n$ denote the predicted result and virtual simulation result of the channel characteristic for the $n$-th sample, respectively; and $N$ is the number of samples.

Table 1: Details of the datasets.

Dataset    Tx    Rx    Total samples    Train samples    Validation samples
Virtual classroom scenario
10100R     10    100   1000             700              300
3232R      32    32    1024             717              307
Lab scenario
3030R      30    30    900              630              270
3030G      30    30    900              630              270
211211G    211   211   44521            31165            13356

Figure 4: Architecture of the proposed CNN for channel statistical characteristics prediction.
The RMSEs of channel statistical characteristics with the two datasets in the virtual scenario are listed in Table 3. Train loss (TL) is the RMSE between the channel characteristic generated by the CNN and the virtual simulation channel characteristic in the train data. Validation loss (VL) is the RMSE between the channel characteristic predicted by the CNN and the virtual simulation channel characteristic in the validation data. In Table 3, the VL of the channel statistical characteristics of the 10100R is always larger than the TL. A similar result is seen for the 3232R. The parameters of the CNN are optimized for MSE on the train data, and the validation data is entirely separate from the train data, so the results on the validation data cannot be as good as those on the train data. The performance of the CNN in the 10100R is better than in the 3232R, which is most obvious in the PL. The TL of the PL in 10100R (0.6408) is only 14.26% of that in 3232R (4.4932). The VL of the PL in 10100R (0.9586) is only 20.04% of that in 3232R (4.7832).
The RMSEs of channel statistical characteristics with two datasets in the lab scenario are listed in Table 4. As we can see, the performance of the CNN in the 3030G is better than that in the 3030R, which is most obvious in the PL. The TL of the PL in 3030G (1.0616) is only 34.70% of that in 3030R (3.0590). The VL of the PL in 3030G (1.3186) is only 41.27% of that in 3030R (3.1949). The performance of the CNN in the 211211G is better than that in the 3030G, which is most obvious in the AAMD. The TL of the AAMD in 211211G (6.7187) is only 47.27% of that in 3030G (14.2148). The VL of the AAMD in 211211G (7.1652) is only 35.24% of that in 3030G (20.3331).
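The RMSE metric and the percentage comparisons quoted for Tables 3 and 4 can be checked with a short sketch. The `rmse` helper and the toy input values are illustrative assumptions; the loss values in `reported` are copied from the text above.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values, not measured data.
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(example)

# (better dataset loss, worse dataset loss) as reported in the text.
reported = {
    "PL TL 10100R vs 3232R": (0.6408, 4.4932),
    "PL VL 10100R vs 3232R": (0.9586, 4.7832),
    "PL TL 3030G vs 3030R": (1.0616, 3.0590),
    "PL VL 3030G vs 3030R": (1.3186, 3.1949),
    "AAMD TL 211211G vs 3030G": (6.7187, 14.2148),
    "AAMD VL 211211G vs 3030G": (7.1652, 20.3331),
}
ratios = {name: 100.0 * a / b for name, (a, b) in reported.items()}
for name, pct in ratios.items():
    print(f"{name}: {pct:.2f}%")
```

The printed ratios agree with the percentages stated in the text to two decimal places.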
There are 1024 samples and 1000 samples in the 3232R and the 10100R, respectively. The sample numbers of the two datasets are of the same order of magnitude, and both datasets are generated with randomly located Txs and Rxs. For each Tx antenna location, there are 100 samples with different Rx antenna locations in the 10100R, while there are only 32 samples with different Rx antenna locations in the 3232R. The former is more diverse and more robust, which explains why the performance of 10100R is better than that of 3232R. The performance is therefore determined by the robustness of the data, even when the datasets are of the same order of magnitude. The comparison between the performance of 3030G and 3030R shows that collecting data on a grid is better than collecting data at random locations. The comparison between the performance of 211211G and 3030G shows that more robust data generated by the specified collection method results in better prediction performance. These conclusions are significant for data collection.

PDF of Channel Characteristics Error Square.
To further analyze the performance of the five different datasets, the PDFs of the channel statistical characteristics error square, which show the distribution of the error square, are given in Figure 8. The PDFs of the error square of DM and AAMA are shown in Figures 8(a) and 8(b), respectively. Since the train loss and validation loss of the 211211G in the lab scenario are only slightly lower than those of the 3030G in the lab scenario in Table 4, the advantage of the large dataset is not obvious if we take the time and energy cost of data collection into account. However, Figures 8(a) and 8(b) show that the core superiority of the 211211G in the lab scenario lies in the PDFs of the error square, in which the proportion of accurately predicted channel characteristics (error square = 0) is very large. It is desirable that a lower channel characteristic error square has a higher probability and vice versa. In Figure 8(b), the proportion of accurately predicted AAMA (error square = 0) of the 3232R in the virtual scenario is larger than that of the 10100R in the virtual scenario, but the proportion of predicted AAMA with a high error square in the 3232R is also larger than that of the 10100R. A rapidly decaying PDF of the channel characteristic error square is what we expect.
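An empirical PDF of the error square can be estimated with a normalized histogram, as in the sketch below. The bin width, the number of bins, the choice to fold the tail into the last bin, and the toy values are all assumptions for illustration; they are not the settings used for Figure 8.

```python
def error_square_pdf(predicted, reference, bin_width, n_bins):
    """Histogram estimate of the PDF of squared prediction errors.

    Errors beyond the last bin edge are counted in the last bin.
    """
    errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    counts = [0] * n_bins
    for e in errors:
        index = min(int(e / bin_width), n_bins - 1)
        counts[index] += 1
    # Divide by (sample count * bin width) so the densities integrate to 1.
    return [c / (len(errors) * bin_width) for c in counts]


# Toy values: three accurate predictions and one outlier.
pdf = error_square_pdf([1.0, 2.0, 3.0, 4.0], [1.0, 2.1, 3.0, 6.0],
                       bin_width=1.0, n_bins=4)
print(pdf)  # [0.75, 0.0, 0.0, 0.25]
```

A dataset whose histogram mass is concentrated in the first bin and decays quickly corresponds to the desirable behaviour described above.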

Conclusions and Future Work
An AI enabled procedure to predict channel statistical characteristics has been proposed in this paper. The channel parameters of massive MIMO and mmWave indoor channels have been generated by the ray tracing software Wireless InSite. The channel statistical characteristics obtained after data preprocessing, such as PL, DM, DS, AAMA, AASA, AAMD, AASD, EAMA, EASA, EAMD, and EASD, can be predicted by the CNN. A virtual classroom scenario and a real lab scenario have been set up to verify this algorithm. Good fittings between the predicted channel statistical characteristics and the real channel statistical characteristics have been shown. By comparing the performance of different datasets, a better data collection rule has also been proposed. Generalizing the AI enabled procedure to predict channel statistical characteristics for more scenarios is an important task for future work.

Data Availability
The data can be made available if requested.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.