Transmit Antenna Selection for Sum-Rate Maximization with Multiclass Scalable Gaussian Process Classification

Antenna selection techniques are extensively applied to reduce hardware cost and power consumption in multiple-input multiple-output (MIMO) systems. This paper proposes a low-cost antenna selection method for system sum-rate maximization based on multiclass scalable Gaussian process classification (SGPC). SGPC can perform analytical inference and scales to massive data. Simulation results show that the average sum-rate obtained by SGPC is 1.9 bps/Hz higher than that of the conventional optimization-driven user-centric antenna selection (UCAS) algorithm. It is also 1 bps/Hz higher than that of the up-to-date learning scheme based on a deep neural network (DNN). These results hold when the signal-to-noise ratio (SNR) is 10 dB, the base station (BS) has 6 antennas in total, 4 antennas are selected, and there are 4 single-antenna users. The advantage of SGPC over UCAS and DNN becomes more pronounced as the SNR, the number of selected antennas, or the number of users increases.


Introduction
Multiple-input multiple-output (MIMO) is a key technology for supporting massive data transmission and high communication reliability in 5G and 6G wireless networks [1,2]. However, in massive MIMO the number of radio frequency (RF) chains associated with the available antennas increases dramatically, which results in expensive hardware and high power consumption. One effective solution is antenna selection, in which a subset of the available antennas is selected and connected to a small number of RF chains. This considerably improves system energy efficiency while retaining comparable spectral efficiency and spatial diversity [3,4].
In general, antenna selection is a nonconvex optimization problem. Its optimal solution can only be obtained by exhaustive search over the available antenna subsets, which has prohibitive complexity in massive MIMO scenarios. To reduce the search complexity, the authors in [5] introduced an iterative antenna selection algorithm based on variable relaxation and successive convex approximation to maximize the achievable sum-rate. A fast greedy antenna selection method was presented in [6] for capacity maximization with considerable additional channel gain and minimal quantization accuracy loss. The authors of [7] proposed a low-complexity scheme, user-centric antenna selection (UCAS), which clusters the available antennas into K groups, where the k-th group contains the antennas with the maximum channel norms for the k-th user. The sum-rate is then maximized by selecting antennas from these groups, which reduces the search complexity by a factor of K. The aforementioned conventional optimization-driven methods are more efficient than exhaustive search, but they only obtain suboptimal results.
In recent years, emerging machine learning techniques for classification and decision-making in wireless communications have been shown to achieve excellent performance with feasible complexity compared with their conventional parametric counterparts. A typical example is antenna selection, which can be solved with powerful multiclass classifiers and/or predictors from machine learning. The authors in [8] deployed a deep neural network (DNN) to model the relation between the input features and the optimal antenna subsets for sum-rate maximization. Their approach achieved more than 95% of the optimal performance with less than 5% of its computational complexity. The authors in [9] used support vector machine (SVM) classifiers with separating hyperplanes to assign channel feature vectors to the class representing the antenna subset with maximal channel capacity. The authors in [10] proposed an antenna selection scheme for channel capacity maximization based on principal component analysis (PCA). It projects the data points representing different antennas onto the principal components and selects the data points with the maximum Euclidean distance along the corresponding principal component. In [11], decision trees and multilayer perceptrons were adopted for antenna selection to improve bit error rate (BER) performance. The authors in [12] applied reinforcement learning via Monte Carlo tree search (MCTS) to select the antennas with maximal channel capacity or minimal BER, which correspond to the highest reward in the decision-making process. The authors in [13] maximized the receiver-end signal-to-noise ratio (SNR) through transmit antenna selection with a multiclass import vector machine (IVM). The IVM selects a small subset of the training data, i.e., the import vectors, that closely approximates the full classification model.
An efficient joint antenna selection and user scheduling method based on stochastic gradient descent learning was devised in [14] to obtain the optimal joint uplink and downlink energy efficiency. There is still much room for learning-based antenna selection methods to improve in either complexity or accuracy.
This paper proposes a novel antenna selection approach based on multiclass scalable Gaussian process classification (SGPC) to maximize the system sum-rate. Conventional GPC is a powerful multiclass classifier with complexity $O(CN^3)$, where $C$ is the number of classes and $N$ is the number of training data. It faces two main challenges: intractable inference due to the non-Gaussian posterior, and poor scalability to massive data [15]. The scalable GPC methods available in the literature [16][17][18] rely on additional assumptions that may degrade their performance. In contrast, SGPC improves on the conventional GPC paradigm without additional assumptions [19]. It provides closed-form variational inference and reduces the complexity to $O(M^3)$, where $M$ is the number of inducing data and is much smaller than the number of training data. To the best of our knowledge, this is the first work to apply SGPC to antenna selection, and it surpasses state-of-the-art machine learning counterparts.
The main contributions of this paper are summarized as follows:
(1) Conventional antenna selection methods make optimization-driven decisions with intractable complexity. This paper tackles antenna selection for system sum-rate maximization with the up-to-date multiclass classifier SGPC, achieving excellent performance with feasible complexity.
(2) This paper develops a novel input feature based on the channel correlation matrix. The feature captures the important properties of interuser interference, which is the main limiting factor in multiuser systems, and uses them as discriminative characteristics for identifying the optimal antenna subsets.
(3) This work conducts extensive simulation experiments to evaluate the proposed method. It provides a detailed comparison with the conventional optimization-driven UCAS algorithm and the up-to-date DNN-based learning scheme in terms of average sum-rate and complexity, which demonstrates the superiority of the proposed method.

System Model
Consider a multiuser MIMO system operating in a time division duplex (TDD) downlink transmission scenario, as shown in Figure 1. A base station (BS) with $L$ antennas and $K$ RF chains serves $J$ single-antenna users, $J \le K < L$. The BS selects a subset of $K$ antennas and sends data streams to the users. Suppose that the channel between the BS and user $j$ is quasistatic flat fading. Quasistatic means that the coherence time of the channel is long enough for the whole data stream to be transmitted within it [20]. Flat fading means that all frequency components of the transmitted signal experience the same magnitude of fading. Denote the channel vector between all antennas at the BS and user $j$ by $\mathbf{h}_j \in \mathbb{C}^{L}$, and the channel vector corresponding to antenna subset $\mathbf{a}_c$ by $\mathbf{h}_{j,c} \in \mathbb{C}^{K}$. The received signal at user $j$ is as follows [8]:
$$y_j = \mathbf{h}_{j,c}^{H}\mathbf{w}_j t_j + \sum_{i \ne j} \mathbf{h}_{j,c}^{H}\mathbf{w}_i t_i + n_j, \tag{1}$$
where $\mathbf{w}_j \in \mathbb{C}^{K}$ is the beamforming vector, $t_j$ is the transmitted signal, $n_j \sim \mathcal{CN}(0, \sigma^2)$ is additive white Gaussian noise (AWGN), and the second term of equation (1) is the interuser interference. The achievable rate of user $j$ is as follows [21]:
$$r_{j,c} = \log_2\left(1 + \frac{|\mathbf{h}_{j,c}^{H}\mathbf{w}_j|^2}{D_j}\right), \tag{2}$$
$$D_j = \sum_{i \ne j} |\mathbf{h}_{j,c}^{H}\mathbf{w}_i|^2 + \sigma^2, \tag{3}$$
where $D_j$ represents the noise and interuser interference at user $j$. The main objective is to select the subset of $K$ antennas at the BS that maximizes the sum-rate of all users under a limited transmit power. The antenna selection problem can be modelled as follows [21]:
$$\max_{\mathbf{a}_c \in \mathcal{A},\,\{\mathbf{w}_j\}} \ \sum_{j=1}^{J} r_{j,c} \quad \text{s.t.} \quad \sum_{j=1}^{J} \|\mathbf{w}_j\|^2 \le P, \tag{4}$$
where $\mathbf{a}_c$ denotes the selected antenna index vector, $\mathcal{A} = \{\mathbf{a}_1, \ldots, \mathbf{a}_c, \ldots, \mathbf{a}_C\}$, $C = \binom{L}{K}$ denotes the number of available antenna subsets, and $P$ denotes the transmit power budget.
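As a numerical companion to equations (2) to (4), the sketch below evaluates the sum-rate for a given channel restriction and set of beamformers. It assumes unit-power symbols $t_j$, which equation (2) implies. It also stacks the conjugated channels $\mathbf{h}_{j,c}^{H}$ as the rows of a matrix. The function names `sum_rate` and `within_power_budget` are illustrative and do not come from the paper.

```python
import numpy as np


def sum_rate(H_sel: np.ndarray, W: np.ndarray, sigma2: float) -> float:
    """Sum of the per-user rates r_{j,c} in equations (2)-(3).

    H_sel  : (J, K) array whose j-th row is h_{j,c}^H, i.e. user j's channel
             restricted to the selected antenna subset a_c (conjugated).
    W      : (K, J) array whose j-th column is the beamforming vector w_j.
    sigma2 : noise variance sigma^2.
    """
    # G[j, i] = |h_{j,c}^H w_i|^2: power of stream i received at user j
    G = np.abs(H_sel @ W) ** 2
    signal = np.diag(G)                   # desired-signal power of each user
    D = G.sum(axis=1) - signal + sigma2   # interuser interference + noise, eq. (3)
    return float(np.sum(np.log2(1.0 + signal / D)))


def within_power_budget(W: np.ndarray, P: float) -> bool:
    """Transmit-power constraint of problem (4): sum_j ||w_j||^2 <= P."""
    return float(np.sum(np.abs(W) ** 2)) <= P
```

Consider two users with orthogonal unit channels, unit beamformers, and $\sigma^2 = 1$. With no interference, each user gets $\log_2 2 = 1$ bps/Hz, so `sum_rate(np.eye(2), np.eye(2), 1.0)` gives 2.0. Making both channels identical turns half of each user's received power into interference.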

International Journal of Antennas and Propagation
After the optimal antenna subset $\mathbf{a}_{\mathrm{opt}}$ is determined, the optimal beamforming vectors $\mathbf{w}_j$ can be obtained with the weighted minimum mean square error (WMMSE) scheme [22].
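Algorithm 1 labels each channel realization by exhaustive search over all $C$ antenna subsets, running WMMSE on each candidate subset. The minimal sketch below replaces WMMSE, which is iterative and not reproduced here, with zero-forcing (ZF) beamforming scaled to the total power $P$. That substitution is an assumption made for brevity, so the labels it produces can differ from WMMSE-based labels. The helper names are hypothetical, and the loop mirrors the keep-the-best structure of Algorithm 1.

```python
import itertools

import numpy as np


def zf_beamformer(H_sel: np.ndarray, P: float) -> np.ndarray:
    """Zero-forcing beamformers (stand-in for WMMSE), scaled to total power P.

    H_sel is (J, K) with rows h_{j,c}^H and J <= K; the result W is (K, J)
    and H_sel @ W is diagonal, so interuser interference vanishes.
    """
    W = H_sel.conj().T @ np.linalg.inv(H_sel @ H_sel.conj().T)
    return W * np.sqrt(P / np.sum(np.abs(W) ** 2))


def sum_rate(H_sel: np.ndarray, W: np.ndarray, sigma2: float) -> float:
    """Sum-rate of equations (2)-(3) for one antenna subset."""
    G = np.abs(H_sel @ W) ** 2
    signal = np.diag(G)
    return float(np.sum(np.log2(1.0 + signal / (G.sum(axis=1) - signal + sigma2))))


def label_channel(H: np.ndarray, K: int, P: float, sigma2: float):
    """Exhaustive search in the spirit of Algorithm 1.

    H is (J, L) with rows h_j^H. Returns (c, a_opt, R): the class index,
    the best antenna subset, and its sum-rate.
    """
    best_c, best_a, R = -1, None, -np.inf  # -inf guarantees a subset is always chosen
    for c, a_c in enumerate(itertools.combinations(range(H.shape[1]), K)):
        H_sel = H[:, list(a_c)]            # channels restricted to subset a_c
        r = sum_rate(H_sel, zf_beamformer(H_sel, P), sigma2)
        if r > R:                          # keep the best subset seen so far
            best_c, best_a, R = c, a_c, r
    return best_c, best_a, R
```

The cost grows with $C = \binom{L}{K}$ evaluations per channel matrix. This is acceptable for offline labeling of training data, but it is the very cost the learned classifier is meant to avoid at run time.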

Materials and Methods
From a multiclass classification and decision-making perspective, this paper treats antenna selection in MIMO systems as the task of classifying the input data into the antenna subset, among all possible subsets, that meets the maximum sum-rate criterion. It is therefore promising to address this problem with a powerful multiclass classifier and predictor.
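Treating each antenna subset as a class makes the label space explicit. With the default setup from the abstract ($L = 6$, $K = 4$), there are $C = 15$ classes. The sketch below assumes lexicographic ordering of the subsets and zero-based antenna indices. The paper does not state which ordering it uses, and the helper names are hypothetical.

```python
import itertools

import numpy as np


def antenna_subsets(L: int, K: int) -> list:
    """All C = binom(L, K) candidate subsets; list position c is the class label."""
    return list(itertools.combinations(range(L), K))


def one_hot(c: int, C: int) -> np.ndarray:
    """Training output y_n: 1 at the index of the best subset, 0 elsewhere."""
    y = np.zeros(C)
    y[c] = 1.0
    return y
```

For example, `antenna_subsets(6, 4)[0]` is `(0, 1, 2, 3)` and the last entry is `(2, 3, 4, 5)`. Any fixed ordering works, provided training and prediction use the same one.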

Scalable Gaussian Process Classification.
Gaussian process classification (GPC) is an excellent learning-based probabilistic classifier. It differs from other classifiers in that it provides both class predictions, in the form of predictive probabilities, and a measure of prediction uncertainty [15]. However, conventional GPC must resort to approximate rather than exact inference because the posterior is non-Gaussian. Moreover, conventional GPC has an infeasible complexity of $O(CN^3)$, where $C$ is the number of classes and $N$ is the number of training data. It therefore scales poorly to massive data.
Scalable Gaussian process classification (SGPC) addresses these issues of conventional GPC by augmenting its probability space with Gumbel noise variables [19]. This yields an analytical model evidence, or evidence lower bound (ELBO), which enables efficient stochastic variational inference. The complexity is reduced to $O(M^3)$, where $M$ is the number of inducing data and is much smaller than the number of training data [19].
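For a rough sense of scale, take the default setup ($L = 6$ antennas, $K = 4$ selected) and the 400 training matrices per fold of the 5-fold split used in Results and Discussion. Assume, purely for illustration, $M = 40$ inducing points; this is not the paper's setting. Ignoring constant factors, the dominant terms are

$$
\begin{aligned}
C &= \binom{6}{4} = 15, \qquad N = 400,\\
CN^3 &= 15 \times 400^3 = 15 \times 6.4 \times 10^{7} = 9.6 \times 10^{8},\\
M^3 &= 40^3 = 6.4 \times 10^{4},
\end{aligned}
$$

a ratio of $1.5 \times 10^{4}$. Because big O notation drops constants, this compares growth rates rather than actual run times.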
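Consider $N$ training inputs $X = \{\mathbf{x}_n\}_{n=1}^{N}$ and the corresponding outputs $\mathbf{y}_n = [y_{1n}, \ldots, y_{cn}, \ldots, y_{Cn}]$, with $\mathbf{y} = \mathrm{vec}[\mathbf{y}_1; \ldots; \mathbf{y}_n; \ldots; \mathbf{y}_N]$. Here $y_{cn} = 1$ and the remaining elements of $\mathbf{y}_n$ are zero, which denotes that the $n$-th input sample belongs to the $c$-th class. SGPC places a GP prior over the latent function values $\mathbf{f} = [\mathbf{f}_1; \ldots; \mathbf{f}_c; \ldots; \mathbf{f}_C]$ of all $N$ training inputs for all $C$ classes, where $\mathbf{f}_c = [f_{c1}, \ldots, f_{cN}]^{T}$. It then squashes these values through the softmax function to predict the class probability $\pi_{cn}$:
$$p(\mathbf{f}) = \prod_{c=1}^{C}\mathcal{N}(\mathbf{f}_c \mid \mathbf{0}, \mathbf{K}^c_N), \qquad \pi_{cn} = \frac{\exp(f_{cn})}{\sum_{c'=1}^{C}\exp(f_{c'n})}, \tag{5}$$
where $[\mathbf{K}^c_N]_{i,j} = k(f_{ci}, f_{cj})$. The kernel (covariance) function $k(\cdot)$ of the input vectors defines the similarity between inputs, under the assumption that nearby inputs are likely to have similar outputs [15]. The kernel function also implicitly maps the inputs from the original space into a feature space in which the classes are more easily separated [23].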
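The GP prior and softmax squashing in equation (5) can be illustrated numerically. This minimal NumPy sketch assumes an RBF kernel, the kernel adopted in Results and Discussion, with unit hyperparameters shared across classes. It also adds a small diagonal jitter for numerical stability. In the paper each class may have its own kernel hyperparameters, and the function names here are hypothetical.

```python
import numpy as np


def rbf_kernel(X1: np.ndarray, X2: np.ndarray, lengthscale: float = 1.0,
               variance: float = 1.0) -> np.ndarray:
    """RBF kernel k(x, x') = variance * exp(-||x - x'||^2 / (2 lengthscale^2))."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def softmax(F: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    E = np.exp(F - F.max(axis=axis, keepdims=True))
    return E / E.sum(axis=axis, keepdims=True)


def sample_prior_class_probs(X: np.ndarray, C: int, rng: np.random.Generator,
                             jitter: float = 1e-6) -> np.ndarray:
    """Draw f_c ~ N(0, K_N) for each class c and squash through the softmax.

    Returns pi with shape (C, N): pi[c, n] is the class probability pi_{cn}.
    For simplicity every class shares one RBF kernel here.
    """
    N = X.shape[0]
    chol = np.linalg.cholesky(rbf_kernel(X, X) + jitter * np.eye(N))
    F = (chol @ rng.standard_normal((N, C))).T   # row c is one draw of f_c
    return softmax(F, axis=0)                    # normalize over classes per input
```

Each column of the returned array sums to one, and nearby inputs receive similar class probabilities because their latent values are strongly correlated under the kernel.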
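To label the class of a test input $\mathbf{x}_*$, first compute the posterior of its latent variable $\mathbf{f}_*$:
$$p(\mathbf{f}_* \mid X, \mathbf{y}, \mathbf{x}_*) = \int p(\mathbf{f}_* \mid X, \mathbf{x}_*, \mathbf{f})\, p(\mathbf{f} \mid X, \mathbf{y})\, d\mathbf{f}. \tag{6}$$
Then compute the class probabilities of $\mathbf{x}_*$ from this posterior and label $\mathbf{x}_*$ with the class that has the largest class probability.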
However, equation (6) is analytically intractable because the posterior $p(\mathbf{f} \mid X, \mathbf{y})$ of the latent variables is non-Gaussian. An approximation is therefore needed.
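Consider $M$ inducing variables $\mathbf{u}_c = [u_{c1}, \ldots, u_{cm}, \ldots, u_{cM}] \sim \mathcal{N}(\mathbf{0}, \mathbf{K}^c_M)$ as sufficient statistics for $\mathbf{f}_c$. The Gaussian approximation to the posterior of $\mathbf{f}_c$ is
$$q(\mathbf{f}_c) = \int p(\mathbf{f}_c \mid \mathbf{u}_c)\, q(\mathbf{u}_c \mid X, \mathbf{y})\, d\mathbf{u}_c = \mathcal{N}\!\left(\mathbf{K}^c_{NM}(\mathbf{K}^c_M)^{-1}\boldsymbol{\varepsilon}_c,\; \mathbf{K}^c_N - \mathbf{K}^c_{NM}(\mathbf{K}^c_M)^{-1}(\mathbf{K}^c_M - \mathbf{s}_c)(\mathbf{K}^c_M)^{-1}\mathbf{K}^c_{MN}\right),$$
where $q(\mathbf{u}_c \mid X, \mathbf{y}) = \mathcal{N}(\boldsymbol{\varepsilon}_c, \mathbf{s}_c)$ is the variational posterior, assumed to be a tractable Gaussian. In addition, $[\mathbf{K}^c_M]_{i,j} = k(u_{ci}, u_{cj})$, $[\mathbf{K}^c_{NM}]_{n,m} = k(f_{cn}, u_{cm})$, and $\mathbf{K}^c_{MN} = (\mathbf{K}^c_{NM})^{T}$. The Gaussian approximation to the posterior of $f_{c*}$ is
$$q(f_{c*}) = \mathcal{N}\!\left(\mathbf{k}^c_{*M}(\mathbf{K}^c_M)^{-1}\boldsymbol{\varepsilon}_c,\; k(f_{c*}, f_{c*}) - \mathbf{k}^c_{*M}(\mathbf{K}^c_M)^{-1}(\mathbf{K}^c_M - \mathbf{s}_c)(\mathbf{K}^c_M)^{-1}(\mathbf{k}^c_{*M})^{T}\right),$$
where $[\mathbf{k}^c_{*M}]_m = k(f_{c*}, u_{cm})$. The class probability $\pi_{c*}$ of the test input is predicted by Monte Carlo sampling [15]. That is, $b$ samples of $\mathbf{f}_*$ are drawn according to equations (10) to (12), each sample is passed through the softmax, and the results are averaged according to equations (13) and (14).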
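Once $\boldsymbol{\varepsilon}_c$ and $\mathbf{s}_c$ are learned, prediction needs only the Gaussian moments of $q(f_{c*})$ and an average of softmax outputs over samples. The sketch below uses the standard sparse variational GP predictive mean and variance, followed by plain Monte Carlo averaging. It is an illustration under those assumptions, not the implementation of [19], and the function names are hypothetical. A small jitter is added to $\mathbf{K}^c_M$ before solving.

```python
import numpy as np


def predictive_moments(k_sM, k_ss, K_M, eps, S, jitter=1e-8):
    """Mean and variance of q(f_{c*}) for one class c.

    k_sM   : (M,) vector k^c_{*M}
    k_ss   : prior variance k(f_{c*}, f_{c*})
    K_M    : (M, M) matrix K^c_M
    eps, S : mean (M,) and covariance (M, M) of q(u_c) = N(eps_c, s_c)
    """
    A = np.linalg.solve(K_M + jitter * np.eye(len(eps)), k_sM)  # (K^c_M)^{-1} k_{M*}
    mean = float(A @ eps)
    var = float(k_ss - A @ (K_M - S) @ A)
    return mean, max(var, 0.0)  # clip tiny negative values from round-off


def mc_class_probs(means, variances, b, rng):
    """pi_{c*} ~= (1/b) sum_i softmax(f_*^(i))_c with f_{c*}^(i) ~ N(mean_c, var_c)."""
    means = np.asarray(means, dtype=float)
    std = np.sqrt(np.asarray(variances, dtype=float))
    F = means + std * rng.standard_normal((b, means.size))  # (b, C) latent samples
    F -= F.max(axis=1, keepdims=True)                       # stable softmax
    P = np.exp(F)
    P /= P.sum(axis=1, keepdims=True)
    return P.mean(axis=0)                                   # average over the b samples
```

If $\mathbf{s}_c = \mathbf{K}^c_M$ and $\boldsymbol{\varepsilon}_c = \mathbf{0}$, the variational posterior equals the prior, and the predictive moments reduce to mean $0$ and variance $k(f_{c*}, f_{c*})$. This makes a convenient sanity check.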

The variational parameters $\boldsymbol{\varepsilon}_c$ and $\mathbf{s}_c$, together with the kernel hyperparameters, are learned simultaneously by maximizing the closed-form ELBO $\mathcal{L}$ in equation (15) with the Adam optimizer [24], where $c'$ denotes the class label of the $n$-th input sample.

Antenna Selection with SGPC.
We propose the following antenna selection method based on SGPC. Its efficacy is demonstrated by the performance evaluation in the Results and Discussion section. Figure 2 shows the flowchart of this method.

Preparing Training Data.
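Step 1. Generate the input feature $\mathbf{x}_n$ of each channel matrix $\mathbf{H}_n$ from its channel correlation matrix, as in equation (16).
Step 2. Normalize $\mathbf{x}_n$ to $\mathbf{x}'_n$ as in equation (17) to mitigate possible significant learning bias.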
Step 3. Classify $\mathbf{H}_n$ according to the key performance indicator, the sum-rate of all users in equation (4). Label its class $c$, i.e., its antenna subset index, with Algorithm 1, and create the training output $\mathbf{y}_n = [y_{1n}, \ldots, y_{cn}, \ldots, y_{Cn}]$, where $y_{cn} = 1$ and $y_{c'n} = 0$ for $c' \ne c$.
Step 4. Repeat Steps 2 and 3 for all $\mathbf{H}_n$ to generate the training dataset $\mathcal{T} = \{(\mathbf{x}'_n, \mathbf{y}_n)\}$, $1 \le n \le N$. Initialize the $M$ inducing points with k-means clustering.
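The feature, normalization, and inducing-point initialization parts of these steps can be sketched in NumPy. The exact input feature of equation (16) and the normalization of equation (17) are not reproduced in this text. The functions `correlation_feature` and `zscore` below are therefore hypothetical stand-ins: they use the magnitudes of the upper-triangular entries of $\mathbf{H}^{H}\mathbf{H}$ and per-feature standardization. Only the k-means initialization of the inducing points follows the text directly. Labels would come from Algorithm 1.

```python
import numpy as np


def correlation_feature(H: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL stand-in for eq. (16): magnitudes of the upper-triangular
    entries of the L x L channel correlation matrix H^H H (H is J x L)."""
    R = H.conj().T @ H
    return np.abs(R[np.triu_indices(R.shape[0])])


def zscore(X: np.ndarray):
    """HYPOTHETICAL stand-in for eq. (17): per-feature standardization.
    Returns the normalized data and the statistics to reuse on test inputs."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    return (X - mu) / sd, mu, sd


def kmeans(X: np.ndarray, M: int, rng: np.random.Generator, iters: int = 50) -> np.ndarray:
    """Lloyd's k-means; the M centroids serve as initial inducing points."""
    Z = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)  # (N, M) squared distances
        assign = d.argmin(axis=1)
        for m in range(M):
            members = X[assign == m]
            if len(members):  # leave empty clusters where they are
                Z[m] = members.mean(axis=0)
    return Z
```

In this sketch a test input $\mathbf{x}_*$ would be normalized with the training `mu` and `sd`, so that training and prediction use the same scaling.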

Learning SGPC Model.
Maximize the ELBO $\mathcal{L}$ in equation (15) with the Adam optimizer to learn the variational parameters $\boldsymbol{\varepsilon}_c$ and $\mathbf{s}_c$ as well as the kernel hyperparameters.
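Equation (15) is not reproduced here, but the Adam update itself is generic. The sketch below runs Adam as gradient ascent on a toy concave objective that stands in for the ELBO. The values of $\beta_1$, $\beta_2$, and $\epsilon$ are the commonly used defaults, and the learning rate is chosen for the toy problem rather than taken from this work. `adam_maximize` is a hypothetical helper.

```python
import numpy as np


def adam_maximize(grad, theta0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Gradient *ascent* with Adam: theta moves along the bias-corrected
    first moment, scaled by the root of the bias-corrected second moment."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g**2     # running mean of squared gradients
        m_hat = m / (1 - beta1**t)             # bias corrections for zero init
        v_hat = v / (1 - beta2**t)
        theta = theta + lr * m_hat / (np.sqrt(v_hat) + eps)  # ascent step
    return theta
```

As a usage example, maximizing $-\|\boldsymbol{\theta} - \boldsymbol{\theta}_0\|^2$ with gradient $-2(\boldsymbol{\theta} - \boldsymbol{\theta}_0)$ from $\boldsymbol{\theta} = \mathbf{0}$ converges to $\boldsymbol{\theta}_0$. In the SGPC setting, `grad` would instead return the gradient of the ELBO with respect to $\boldsymbol{\varepsilon}_c$, $\mathbf{s}_c$, and the kernel hyperparameters.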

Predicting Optimal Antenna Subset.
Apply the test input $\mathbf{x}_*$ to the learned SGPC model and predict the optimal antenna subset as the one with the largest class probability $\pi_{c*}$ in equation (14).
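Mapping the predicted class back to antennas is a single argmax followed by a lookup. This minimal sketch assumes the lexicographic subset ordering used for labeling, which the paper does not specify. `predict_subset` is a hypothetical helper.

```python
import itertools

import numpy as np


def predict_subset(pi_star: np.ndarray, L: int, K: int):
    """Return (c_hat, a_hat): the class with the largest probability pi_{c*}
    and the antenna subset it indexes (lexicographic ordering assumed)."""
    subsets = list(itertools.combinations(range(L), K))
    if len(pi_star) != len(subsets):
        raise ValueError("expected one probability per antenna subset")
    c_hat = int(np.argmax(pi_star))
    return c_hat, subsets[c_hat]
```

With $L = 6$ and $K = 4$, a probability vector peaking at index 7 selects antennas `(0, 2, 3, 5)`.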

Results and Discussion
Extensive Monte Carlo simulation experiments were conducted in MATLAB to evaluate the proposed SGPC-based antenna selection method in comparison with the conventional optimization-driven UCAS scheme [7] and the up-to-date DNN-based learning approach [8].
The entries of 500 channel matrices $\mathbf{H}_n$ are randomly generated as i.i.d. complex Gaussian variables. To avoid a large variance in the performance evaluation, we employ 5-fold cross-validation [25], which splits the 500 matrices $\mathbf{H}_n$ into 5 equally sized subsets of 100 matrices each. One subset is used for testing and the remaining four for training. The procedure is repeated 5 times so that every subset is tested once. Unless otherwise specified, the default primary parameters are those summarized in Table 1. The radial basis function (RBF) kernel is adopted as the kernel function [13].
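The data generation and 5-fold split described above can be sketched as follows. Unit-variance $\mathcal{CN}(0,1)$ entries and a random permutation before splitting are assumptions, since the text states neither. The dimensions $J = 4$ and $L = 6$ in the usage match the defaults given in the abstract, and the helper names are hypothetical.

```python
import numpy as np


def generate_channels(n: int, J: int, L: int, rng: np.random.Generator) -> np.ndarray:
    """n channel matrices with i.i.d. CN(0, 1) entries, shape (n, J, L)."""
    return (rng.standard_normal((n, J, L)) + 1j * rng.standard_normal((n, J, L))) / np.sqrt(2)


def kfold_indices(n: int, n_folds: int, rng: np.random.Generator):
    """Yield (train_idx, test_idx) so that every sample is tested exactly once."""
    folds = np.array_split(rng.permutation(n), n_folds)
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, folds[k]
```

With `n = 500` and `n_folds = 5`, every fold trains on 400 matrices and tests on the remaining 100. Averaging the sum-rate over the five test folds gives the reported figure.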
Figure 3 shows the system average sum-rate of the three antenna selection methods for SNR in the range [0, 20] dB. Owing to its superior multiclass probabilistic classification capability, SGPC outperforms DNN and UCAS at every SNR studied. DNN performs moderately, and UCAS performs worst. The average sum-rate of all three approaches rises as the SNR increases, because a higher SNR means weaker noise relative to the signal, which yields higher communication reliability and a higher data rate.
Figure 4 illustrates the system average sum-rate of the three antenna selection methods for different numbers of selected antennas, $2 \le K \le 5$, with $J = 2$ single-antenna users in this experiment. The average sum-rate with SGPC exceeds that with DNN and UCAS for every number of selected transmit antennas. This again confirms the advantage of SGPC-based antenna selection in achieving a high average sum-rate. The average sum-rate of all three methods also rises as the number of selected antennas increases. More selected antennas either allow more streams to be transmitted simultaneously (multiplexing gain), which increases the data rate, or yield a higher SINR (spatial diversity gain), which enhances communication reliability [13].
Figure 5 further demonstrates the superiority of SGPC over DNN and UCAS by displaying the system average sum-rate for various numbers of users, $1 \le J \le 4$. Regardless of the number of users, the average sum-rate is the largest for SGPC, intermediate for DNN, and the smallest for UCAS. The average sum-rate of all three schemes also rises as the number of users increases, since more users naturally yield a larger sum-rate under the same conditions.
ALGORITHM 1: Labelling training data.
(1) Initialize the system sum-rate $R = 0$.
(2) for $c = 1 : C$ do
(3)   Apply the WMMSE scheme to antenna subset $\mathbf{a}_c$ to obtain the optimal beamforming vectors $\mathbf{w}_{j,c}$.
(4)   if $\sum_{j=1}^{J} r_{j,c} > R$ then
(5)     $\mathbf{a}_{\mathrm{opt}} = \mathbf{a}_c$, $\mathbf{w}_{\mathrm{opt}} = \mathbf{w}_{j,c}$, $R = \sum_{j=1}^{J} r_{j,c}$.
(6)   end if
(7) end for
Finally, Table 2 compares the complexity of the algorithms in big O notation, a theoretical measure commonly used in the literature. An algorithm with complexity $O(f(n))$ has an execution time whose asymptotic upper bound is of the order of $f(n)$, where $n$ is the problem size. The complexity of SGPC is cubic in $M$, the number of inducing data [19]. The complexity of DNN is linear in the sum of the products of the numbers of nodes $N_i$ in every two successive layers, i.e., $O(\sum_{i=1}^{I-1} N_{i-1}N_i)$ [26]. The complexity of UCAS is $O(J^2L + J^2K + JL + JK)$ [7], where $J$ is the number of users, $L$ is the number of total antennas at the BS, and $K$ is the number of selected antennas. In massive MIMO systems, $M$ is typically smaller than $N_i$, $J$, $L$, and $K$. SGPC therefore has the lowest complexity and is the most feasible of the three schemes.
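The complexity expressions above can be evaluated directly to compare how each cost scales for a given configuration. Big O notation drops constant factors, so these numbers indicate growth only, not actual run times. The inducing-point count and DNN layer sizes passed in are illustrative inputs, not values from the paper. `complexity_terms` is a hypothetical helper.

```python
def complexity_terms(M: int, layer_sizes: list, J: int, L: int, K: int) -> dict:
    """Evaluate the growth terms stated for each method (constants dropped).

    layer_sizes = [N_0, ..., N_{I-1}] for the DNN, from input to output.
    """
    return {
        "SGPC": M ** 3,
        # sum over successive layer pairs of N_{i-1} * N_i
        "DNN": sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:])),
        "UCAS": J**2 * L + J**2 * K + J * L + J * K,
    }
```

For the default $J = 4$, $L = 6$, $K = 4$, the UCAS term evaluates to $16 \cdot 6 + 16 \cdot 4 + 24 + 16 = 200$.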

Conclusions
This paper formulates the antenna selection problem in MIMO systems from a multiclass classification and decision-making perspective. It proposes an antenna selection method based on multiclass SGPC, which outperforms the conventional optimization-driven UCAS algorithm and the up-to-date DNN-based learning scheme in terms of average sum-rate and complexity. SGPC is therefore an appealing antenna selection technique for MIMO systems. Future work will extend the method to users with multiple antennas.

Data Availability
The data that support the findings of this study are not publicly available. They are, however, available upon reasonable request and with the permission of Yulin Normal University to access the confidential data.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Note. $M$ is the number of inducing data, $I$ is the number of DNN layers, $N_i$ is the number of nodes in the $i$-th DNN layer ($N_0$ is the input dimension and $N_{I-1}$ is the output dimension), $J$ is the number of users, $L$ is the number of total antennas at the BS, and $K$ is the number of selected antennas.