Downlink Linear Precoders Based on Statistical CSI for Multicell MIMO-OFDM

With 5G communication systems on the horizon, efficient interference management in heterogeneous multicell networks is more vital than ever. This paper investigates linear precoder design for downlink multicell multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems, where base stations (BSs) coordinate to reduce the interference across space and frequency. In order to minimize the overall feedback overhead in next-generation systems, we consider precoding schemes that require statistical channel state information (CSI) only. We apply random matrix theory to approximate the ergodic weighted sum rate of the system with a closed-form expression. After formulating the approximation for general channels, we reduce the results to a more compact form using the Kronecker channel model, for which several multicarrier concepts such as frequency selectivity, channel tap correlations, and intercarrier interference (ICI) are rigorously represented. We find a locally optimal solution for the maximization of the approximate rate using a gradient method that requires only the covariance structure of the MIMO-OFDM channels. This covariance structure contains the channel tap correlations and the ICI information, both of which are taken into consideration in the precoder design. Simulation results show that the rate approximation is very accurate even for very small MIMO-OFDM systems and that the proposed method converges rapidly to a near-optimal solution that competes with networked MIMO and with precoders based on instantaneous full CSI.


Introduction
Multicell multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) is a promising technology for next-generation telecommunication networks. MIMO and OFDM each bring strong capabilities to wireless networks; when they are combined, MIMO-OFDM achieves very high capacities that 5G systems are expected to exploit. While MIMO-OFDM systems perform very well, the characteristics of MIMO and OFDM raise several challenges in the system design [1-6]. Competing technologies for 5G networks include filtered-OFDM (F-OFDM) and universal filtered multicarrier (UFMC), which are receiving attention in the literature [7, 8].
MIMO wireless cellular systems can achieve excellent data rates if proper coordination among base stations (BSs) is employed to suppress the intercell and intracell interference. The highest system capacity is achieved by networked MIMO, where both the channel state information (CSI) and the data streams of all users in different cells are made available at all BSs [9, 10]. A more practical and less complex form of coordination is joint linear precoding with only the CSI of the users shared among BSs [11-13]. In this case, each BS obtains the CSI of its intracell users and shares it with its adjacent BSs. Such a precoding approach is very efficient for a small number of BS antennas and can alleviate the interference to a significant degree. As the number of BS antennas increases, the amount of signaling overhead for passing the CSI among BSs becomes taxing [14]. This problem is aggravated in frequency-selective channels employing OFDM, where the CSI is different on each subcarrier. To overcome this difficulty, the precoders can be designed based on statistical CSI at the transmitter rather than instantaneous CSI [3, 15-20]. Since the statistical CSI changes much more slowly than the instantaneous CSI, the BSs need to be updated less frequently, and hence far less signaling overhead is incurred. While there is a performance loss due to the lack of exact CSI, precoder design based on statistical CSI is an efficient approach for large-scale MIMO systems because the performance loss is negligible [18, 20].
In order to design MIMO precoders with statistical CSI, the ergodic rate, which is a function of the covariance matrices of the MIMO channels, must be maximized over the precoders [16]. To facilitate the maximization, the ergodic rate can be approximated by a compact deterministic expression when the MIMO system is large-scale. The optimization can then be carried out on this deterministic approximation, which is a function of the statistics of the channels [20-23]; thus, the solution is a set of precoders that are functions of the statistical CSI. A deterministic approximation to the Shannon rate can be obtained with techniques such as the Bai and Silverstein method [22, 24, 25], the Gaussian method [26, 27], the Lindeberg principle [20, 26, 28, 29], or a combination of them [22, Theorem 6.9], [26]. All of these methods approximate the Shannon transform of large dimensional random matrices, but they are based on slightly different assumptions.
The deterministic equivalents of the rates for large-scale MIMO systems that are derived in recent works [20-23, 25, 26, 29] are functions of the channel statistics, so they can be used as objective functions for precoder design with statistical CSI. This has been done in [21-23], where closed-form optimal precoders are derived for simple single-user scenarios. In [20], an uplink multiuser multicell MIMO system is considered and suboptimal linear precoders are found based on the Lindeberg approximation to the ergodic rate. We shall extend this methodology to the problem of downlink multiuser multicell MIMO-OFDM linear precoder design with statistical CSI. Considering the downlink leads to a somewhat different power constraint compared to the uplink, and the OFDM assumption gives rise to concepts such as frequency selectivity [17, 30, 31], channel tap correlations [17, 31-34], and intercarrier interference (ICI) [1, 35-38], all of which must be taken into account in the precoder design. Our derivation of statistical precoding for the downlink multicell MIMO-OFDM system is a unification of previous works [20, 29], and while several details are different from these two works, this derivation is not the main contribution. The main point of this work is applying the already established results of random matrix theory to the MIMO-OFDM scenario in order to study the impact of frequency selectivity, tap correlations, and ICI on the statistical precoder design and system performance.
We shall study the downlink multicell MIMO-OFDM linear precoder design with statistical CSI using the deterministic approximation of the Shannon rate, a problem studied before only under instantaneous CSI [12]. In particular, with instantaneous CSI, the channel statistics are not used in precoder design and one may try to optimize the instantaneous rate, as is done in [12], where it is shown that such an optimization problem is nonconvex and a suboptimal solution is proposed using the Karush-Kuhn-Tucker (KKT) conditions. In this work, however, we assume that only statistical CSI is available at the BSs and we design linear precoders that maximize the ergodic weighted sum rate. As opposed to the instantaneous case, the analysis of the ergodic rate requires a statistical model for the MIMO channel, which is usually determined by a covariance structure [23, 39]. We first consider the general correlated channel model with arbitrary probability distribution described in [23], which includes several statistical models such as the independent and identically distributed (i.i.d.) Rayleigh fading channel and the Kronecker channel [39]. Then we reduce the results to a simpler form for the Kronecker channel model.
To maximize the ergodic rate, inspired by [20, 23, 29], we find a deterministic approximation to the Shannon rate of the MIMO-OFDM system using random matrix theory [21, 40]. The methodology is applied to the MIMO-OFDM system with tap correlations and intercarrier interference. Then, assuming that the BSs have the statistical CSI of all users in the form of covariance matrices, we form a weighted sum rate maximization problem and propose a suboptimal solution using the KKT conditions along with the gradient descent method. Our simulations show that the approximation is quite accurate even for small MIMO-OFDM systems and that the proposed algorithm converges rapidly to a maximum that substantially improves over isotropic precoding. The results only slightly underperform those obtained with perfect instantaneous CSI [12]. We then extend to the case where the frequency-selective channel suffers from correlation among channel taps, and we show that the precoders become frequency dependent under tap correlations. We study the effects of tap correlation on the precoder design and system performance. Next, we allow ICI among OFDM subcarriers, which is caused by carrier frequency offset due to synchronization errors and Doppler shifts [1]. The ICI introduces a new source of interference in addition to the intercell and intracell interference. We find the deterministic approximation to the rate under ICI and then study its impact on the precoder design and system performance. Our simulations show that while spatial correlations, tap correlations, and ICI decrease the system sum rate, our method alleviates this performance loss by incorporating the correlation information and ICI intensity information into the precoder design. It must be noted that our statistical CSI based method is also applicable to networked MIMO with full cooperation, where the BSs share the channel statistics and the transmit data.
The remainder of the paper is organized as follows. In Section 2, we provide the system model and formulate the optimization problem. In Section 3, we obtain the deterministic approximation to the ergodic rate function. In Section 4, we give a gradient-descent-based algorithm to obtain the suboptimal precoders. In Section 5, we discuss the channel model and how the formulas simplify for the separable channel models. In Section 6, we extend to the case where there is ICI. In Section 7, we present simulation results, and finally, conclusions are given in Section 8.
We denote matrices, vectors, and scalars by upper-case bold letters as in X, lower-case bold letters as in x, and nonbold letters as in  and , respectively.Moreover, X * , X  , and X  denote conjugate, transpose, and conjugate-transpose, respectively.The element on the th row and th column of

System Model and Problem Formulation
where the second and third terms represent the intracell and intercell interference, respectively.We assume that x (,), and z (,), are i.i.d.Gaussian with E[x (,), x  (,), ] = I and E[z (,), z  (,), ] = I.We also assume that MIMO channels are independent across users and BSs.
Each user is assumed to know its own instantaneous CSI for detecting the data [16,20,22,23,29 where Θ (,), denotes the total interference plus noise on subcarrier  given by where the first, second, and third terms represent the noise power, intracell interference power, and intercell interference power, respectively.

Problem
Unlike precoder design based on instantaneous CSI as in [12], the precoders in F are restricted to be functions of {Ω   ,(,), }. This approach to precoder design has several advantages. It is very practical, since acquiring full CSI for all   , (, ),  induces a huge communication overhead due to CSI exchanges. Also, since higher-order statistics can be costly to obtain, the second-order statistics often serve as the minimal statistics for various kinds of estimation.
In order to solve the nonconvex problem (P1), we need to find the expectation E H [ (,), ]. However, this expectation makes the problem hard to analyse even for the single-user case [16]. Therefore, it is useful to replace the objective function by the analytically tractable approximation r(,), ≈ E H [ (,), ]. Thanks to random matrix theory, as will be shown in the next section, we can find the deterministic approximation r(,), that allows analytical expressions for the ergodic rate [20-23, 29]. The approximation is appropriate for our problem because r(,), is a function of second-order statistics only. While the convergence r(,), → E H [ (,), ] is asymptotic in the number of BS antennas, our simulations show that the approximation is also accurate for MIMO-OFDM systems with a very small number of antennas.
We now replace the ergodic rate E H [ (,), ] with its deterministic approximation r(,), and form the analytically tractable problem max While analytically tractable, problem (P2) is still nonconvex.
Since the global optimum of a nonconvex optimization problem is generally hard to obtain, we will develop an algorithm to find a locally optimal solution to (P2). In the next sections, we first derive the asymptotic approximation r(,), and then propose an algorithm to obtain locally optimal precoder matrices.

Asymptotic Approximation of the Rate
In this section we derive the asymptotic approximation to the ergodic rate function.From (2) and (3), it follows that the rate function can be rewritten as which is associated with the total received signal, and the  2  ( − 1) ×  2  ( − 1) covariance matrix of B (,), defined as which is associated with interference plus noise.From the definition of A (,), and statistical independence of channel matrices across BSs, it follows that where Σ   ,(,), is the  2   ×  2   subcovariance matrix corresponding to BS   and given by Similarly, from the definition of B (,), and statistical independence of channel matrices across BSs, it follows that where Σ ,(,), is the  2  ( − 1) ×  2  ( − 1) subcovariance matrix corresponding to BS   and given by Now, we only need to calculate the subcovariance matrices Σ   ,(,), and Σ ,(,), to completely determine the covariance matrices Σ A (,), and Σ B (,), .It is straightforward to show that where Ω   ,(,), is defined in (4).
We remark here that, in contrast to the uplink transmission considered in [20] where each subcovariance matrix is     ×     , in the downlink transmission each subcovariance matrix is  2  × 2  , so each dimension is approximately  times larger.The increased size of the covariance matrix is due to the fact that the downlink transmitter (BS) has  intended receivers (users), while the uplink transmitter (user) has only one intended receiver (BS).In addition to this, there are  covariance matrix groups in the MIMO-OFDM system, one for each subcarrier.
The deterministic approximation will depend on the eigenvalues of Σ A (,), and Σ B (,), , so we shall define two eigenvalue matrices.Define the   ×    eigenvalue matrix Λ A (,), and   ×   ( − 1) eigenvalue matrix Λ B (,), , which are formed by stacking the eigenvalues of Σ A (,), and Σ B (,), in   ×1 sized columns, respectively.Such a definition is required to apply the theorems from random matrix theory.Note that the first dimension of eigenvalue matrices is the receiver antenna number   .It is easy to see that where   ×    eigenvalue matrix Λ   ,(,), and   ×   (−1) eigenvalue matrix Λ,(,), are formed by stacking the eigenvalues of Σ   ,(,), and Σ ,(,), in   × 1 sized columns, respectively.

Approximation of the Rate Function.
In this subsection, we will derive the approximation to the rate based on the random matrix theory.We start by reviewing some concepts from the random matrix theory.For an  ×  random matrix X, the correlation function is defined as Define the kernel of the correlation function by a set of orthonormal eigenfunctions  , (, ) ∈ C satisfying ∑   ,   X (, ;   ,   ) , (  ,   ) =  ,  , (, ) where  , is the kernel eigenvalue [21].We say the kernel is factorable if  , (, ) =   ()V  () for some functions   () and V  ().Also, define Λ X ∈ R × as the eigenvalue matrix which is formed by stacking the eigenvalues of where Λ X denotes the eigenvalue matrix and u ∈ R  , k ∈ R  are solutions to the following fixed point equations: Now based on this approximation technique and the second-order statistics of A   ,(,), and B   ,(,), derived previously, we can obtain the asymptotic approximation to the ergodic rate in the following theorem.
in which r(,), is a deterministic function given by r(,), = − where and  k B (,), ∈ R (−1)  are the solutions to the following fixed point equations: where  and  range according to the vector sizes.
Proof.We will apply Proposition 1 to each term in (8).Consider the first term log |I + A (,), A  (,), | in which A (,), ∈ C   ×   must satisfy conditions (a) and (b) of Theorem 2 in accordance with the same conditions in Proposition 1.Since   /  = , the ratio of dimensions is always fixed as   ≥   → ∞.Condition   ≥   guarantees that the precoders have full column rank so (2) remains valid.Now, from Proposition 1, the first term in (8) divided by   is approximated as In the first term of approximation fA (,), , we can express , ]  using (22).In the second term of approximation fA (,), , we can express , ]  using (21).For the third term of approximation fA (,), , we can write and express [Λ A (,), k A (,), ]  in terms of [u A (,), ]  using (21).So all the terms in fA (,), are expressed in terms of Forming (1/  )r (,), = fA (,), − fB (,), along with some mathematical manipulations completes the proof.Some remarks are in order.It is shown in [23] that many channel models including the unitary-independentunitary (UIU) model, Kronecker model, and independent nonidentically distributed (IND) model satisfy conditions (a) and (b) of Theorem 2. While the approximation is guaranteed to converge as   ≥   → ∞, it is accurate even for very small number of antennas (e.g.,   = 3,   = 3) as will be shown in the simulation section.The existence of positive solutions to the fixed point equations in Theorem 2 is proved in [21].Computationally, the equations can be solved numerically by iteratively substituting the value on the righthand side into the left-hand side.The convergence result suggests that the ergodic rate E H [ (,), ] can be approximated by r(,), .
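The remark above that the fixed point equations can be solved by repeated substitution can be illustrated with a minimal Python sketch. The coupled system used here is a hypothetical stand-in with a similar shape (two positive vectors linked through a nonnegative matrix); it is not the system of Theorem 2, whose exact equations are not reproduced in this text. The names `solve_fixed_point`, `Lam`, and `g` are assumptions made for illustration.

```python
import numpy as np

def solve_fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Iterate x <- g(x) until successive iterates differ by less than tol."""
    x = np.asarray(x0, dtype=float)
    for it in range(1, max_iter + 1):
        x_new = np.asarray(g(x), dtype=float)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, it
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical nonnegative "eigenvalue matrix" (3 x 2), for illustration only.
Lam = np.array([[0.5, 1.0],
                [2.0, 0.3],
                [0.7, 0.9]])

def g(x):
    """Toy coupled map: u_i = 1/(1 + [Lam v]_i), v_j = 1/(1 + [Lam^T u]_j)."""
    u, v = x[:3], x[3:]
    u_new = 1.0 / (1.0 + Lam @ v)
    v_new = 1.0 / (1.0 + Lam.T @ u)
    return np.concatenate([u_new, v_new])

# Start from the all-ones vector and substitute the right-hand side repeatedly.
solution, iterations = solve_fixed_point(g, np.ones(5))
```

In this toy system the iterates stay positive and settle to a point that the map leaves unchanged; whether plain substitution converges as quickly for the actual equations depends on the eigenvalue matrices involved.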

Weighted Sum Rate Maximization
In this section, we develop an algorithm to obtain a suboptimal solution to problem (P2).We will often need to differentiate one matrix with respect to another matrix; therefore, to facilitate calculations, the following definition is presented [41].Definition 3. Let f(x) be a function mapping R  → R  , define its derivative D x f as the  ×  matrix with elements Furthermore, let F(X) be a function mapping R × → R × ; define its derivative D X F(X) as the  ×  matrix The most important property of this definition is that the chain rule holds, which enables the differentiation of complicated functions.Adopting this definition, we can now seek to solve the optimization problem.
Problem (P2) is nonconvex, so a global maximum is generally hard to obtain.Fortunately, the KKT conditions still serve as the necessary conditions for local optima, so we seek a suboptimal solution that satisfies the KKT conditions.To this end, we define which is the complex-real isomorphism of F (,), .Problem (P2) then becomes max where F is a set that has complex-real isomorphism with respect to F.
To further simplify the problem, we can reformulate it as an unconstrained optimization.To do so, similar to [42], we make a change of variables from f (,), to the spherical coordinates  (,), that belongs to R 2    if (, ) ̸ = (, ) and to R 2    −1 if (, ) = (, ).The transformation is described by where ≺ denotes the lexicographical order of vectors.Under this parametrization, the power constraint is automatically satisfied since the variables are on the surface of a hypersphere with radius √  .Since the local optima occur at the interior of the domain of angle parameters, the KKT condition for problem (P2) is simplified to where    ∈ R (2    −1)×1 holds all the angle parameters corresponding to BS   .The above expression simply states that the gradient vanishes at any optimum point.So finding a local optimum solution is equivalent to pursuing a set of points  opt   , ∀  at which the gradient vanishes.In practice, the equation for setting the gradient to zero is highly nonlinear and it is impossible to solve it directly.However, starting from any initial point for    , we can use a simple gradient search method to increase the objective function after each iteration and gradually approach a point where the gradient is zero.In order to do so, we need to derive the 1×(2    −1) vector D    r(,), , which is the transpose of the gradient vector.The remaining materials of this section will thus be devoted to the derivation of D    r(,), .
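The idea that a spherical parametrization satisfies the power constraint automatically can be sketched in Python as follows. This uses the standard hyperspherical coordinate map, which may order the components differently from the transformation in the paper (the exact formula, including its lexicographical ordering, is not reproduced here). The function name `angles_to_vector` and the example values are assumptions.

```python
import numpy as np

def angles_to_vector(theta, radius):
    """Map n-1 angles to a point in R^n on the sphere of the given radius.

    Standard hyperspherical coordinates:
      x_1 = cos(t_1), x_2 = sin(t_1) cos(t_2), ..., x_n = sin(t_1)...sin(t_{n-1}).
    """
    theta = np.asarray(theta, dtype=float)
    x = np.empty(theta.size + 1)
    sin_prod = 1.0
    for i, angle in enumerate(theta):
        x[i] = sin_prod * np.cos(angle)
        sin_prod *= np.sin(angle)
    x[-1] = sin_prod
    return radius * x

# Hypothetical example: power budget P = 4, five free angles (six real entries).
P = 4.0
rng = np.random.default_rng(1)
angles = rng.uniform(0.0, np.pi, size=5)
f = angles_to_vector(angles, np.sqrt(P))
power = float(f @ f)  # equals P for any choice of angles
```

Because the squared norm equals the power budget for every choice of angles, an optimizer can move freely in the angle domain without a projection or a Lagrange multiplier.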
Now we need to find D f   , r(,), and D    f   , in order to form the chain rule and obtain (33). We shall find derivatives with respect to the subvectors of f   , , that is, with respect to vectors f (  ,  ), . The components of D    f (  ,  ), can be obtained by differentiating (30), which yields the following: The components of D f (  ,  ), r(,), can be found through the complex-real isomorphism as Substituting (34) and (35) in (33) yields (36). There are three derivatives involved in each term of (36). In what follows, we shall calculate each of them.

4.1.1. First Chain of (36). The first chains can be obtained by differentiating r(,), , as given in Theorem 2, which yields
4.1.2. Second Chain of (36). The second chains can be obtained by differentiating the fixed point equations given in Theorem 2. It can be shown that where e   denotes the th standard basis vector in R  and We now need to calculate the terms in the above formulas. Note that the nonzero terms in the above matrices are equal to D F (  ,  ), Λ   ,(,), and D F (,  ), Λ,(,), expressed through the chain rule.
Based on the results in [41], the derivative of the eigenvalues with respect to the matrix can be explicitly written as a function of the eigenvectors, so we obtain where     ,(,), and μ  ,(,), denote the th eigenvector of Σ   ,(,), and Σ ,(,), , respectively.
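The statement that eigenvalue derivatives can be written through eigenvectors can be checked numerically. For a Hermitian matrix with distinct eigenvalues, a first-order perturbation E changes the i-th eigenvalue by u_i^H E u_i, where u_i is the corresponding unit eigenvector. The sketch below compares this with a finite difference; the random matrices are hypothetical test data, not the covariance matrices of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical Hermitian positive semidefinite "covariance" matrix.
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Sigma = G @ G.conj().T

# Hermitian perturbation direction.
D = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
E = (D + D.conj().T) / 2

# Analytic directional derivative of each eigenvalue: u_i^H E u_i.
eigvals, eigvecs = np.linalg.eigh(Sigma)
analytic = np.real(np.einsum("ji,jk,ki->i", eigvecs.conj(), E, eigvecs))

# Central finite difference of the sorted eigenvalues along E.
h = 1e-6
numeric = (np.linalg.eigvalsh(Sigma + h * E)
           - np.linalg.eigvalsh(Sigma - h * E)) / (2 * h)
```

The comparison relies on the eigenvalues being distinct, so that their sorted order is preserved under a small perturbation; repeated eigenvalues would need separate treatment.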
Finally, differentiating (15) and using the results in [41], we obtain the formulas for D F (  ,  ), Σ   ,(,), and D F (,  ), Σ ,(,), given by in which K , is the commutation matrix defined in [41] satisfying K , vec{X} = vec{X  } for every matrix X ∈ C × , and With all the chains derived, ( 36) is completely characterized which in turn enables an explicit expression for D    r(,), given by (33).
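The defining property of the commutation matrix, K vec{X} = vec{X^T}, can be made concrete with a short Python construction. The helper names `commutation_matrix` and `vec` are assumptions for illustration; `vec` stacks columns, which is the usual convention for this identity.

```python
import numpy as np

def commutation_matrix(m, n):
    """Return K of size (mn x mn) with K @ vec(X) == vec(X.T) for X of shape (m, n)."""
    K = np.zeros((m * n, m * n), dtype=int)
    for i in range(m):
        for j in range(n):
            # With column stacking, X[i, j] sits at index j*m + i in vec(X)
            # and at index i*n + j in vec(X.T).
            K[i * n + j, j * m + i] = 1
    return K

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

X = np.arange(6).reshape(2, 3)
K = commutation_matrix(2, 3)
lhs = K @ vec(X)
rhs = vec(X.T)
```

K is a permutation matrix, so its transpose is its inverse, which is convenient when it appears inside longer derivative chains.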

Local Optimum Solution.
With the gradient D    r(,), computed, a gradient search method can be applied. We now propose Algorithm 1 to find a locally optimal solution for problem (P2). Algorithm 1 is not limited to multicell networks with partial cooperation; with small modifications it can be used for precoder design based on statistical CSI for networked MIMO systems. In networked MIMO, in addition to the channel statistics, it is assumed that each BS has all the transmit data, so the whole system can be viewed as a MIMO super-cell with statistical CSI at the transmitter.
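The gradient search idea can be sketched in Python as below. This is not Algorithm 1 itself: the objective `toy_rate` is a hypothetical stand-in for the deterministic rate approximation, the gradient is taken by finite differences rather than by the analytic chain rule derived in this section, and the backtracking step-size rule is an assumption. The sketch only shows the overall loop: parametrize by angles, follow the gradient, and stop when no further increase is found.

```python
import numpy as np

def angles_to_vector(theta, radius):
    """Standard hyperspherical map from n-1 angles to a radius-scaled point in R^n."""
    x = np.empty(theta.size + 1)
    sin_prod = 1.0
    for i, angle in enumerate(theta):
        x[i] = sin_prod * np.cos(angle)
        sin_prod *= np.sin(angle)
    x[-1] = sin_prod
    return radius * x

def numerical_gradient(func, theta, eps=1e-6):
    """Central-difference gradient of func at theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (func(theta + step) - func(theta - step)) / (2 * eps)
    return grad

def gradient_ascent(func, theta0, step=1.0, tol=1e-8, max_iter=500):
    """Gradient ascent with backtracking; every accepted step increases func."""
    theta = np.array(theta0, dtype=float)
    history = [func(theta)]
    for _ in range(max_iter):
        grad = numerical_gradient(func, theta)
        if np.linalg.norm(grad) < tol:
            break
        t = step
        # Halve the step until the objective strictly increases.
        while t > 1e-12 and func(theta + t * grad) <= history[-1]:
            t *= 0.5
        if t <= 1e-12:
            break
        theta = theta + t * grad
        history.append(func(theta))
    return theta, history

# Hypothetical objective: log(1 + f^T A f) with f on the sphere of radius sqrt(P).
A = np.array([[3.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 0.5]])
P = 2.0

def toy_rate(theta):
    f = angles_to_vector(theta, np.sqrt(P))
    return np.log(1.0 + f @ A @ f)

theta_opt, history = gradient_ascent(toy_rate, np.array([1.0, 1.0]))
```

As with the method described in the text, such a search only guarantees a point where the gradient (approximately) vanishes, so different starting angles can in general end at different local optima.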

MIMO-OFDM Kronecker Channel Model
While the results obtained so far are valid for general correlated channel models, in this section we reduce the results to a more compact form with less computational complexity by considering the Kronecker channel model. The Kronecker model arises in practice when the immediate surroundings dominate the spatial correlation and the intermediate scattering clusters exist in a narrow angular range seen from the antennas [34, 39]. We start by describing the MIMO-OFDM channel and expressing it with the Kronecker structure.

Statistical Representation.
A wideband MIMO channel is characterized by  channel taps H0 , . . ., H−1 ∈ C × .The channel matrix on the th subcarrier is then given by H  = ∑ −1 =0 H  −2/ ∈ C × .Now the correlation among channel taps is given by the tap correlation matrix Υ ∈ C × that is defined as [Υ] , ≜ E{[ H ] , [ H ] *   ,  }, ∀, ,   ,   .On the other hand, the Kronecker model assumes that the correlation of transmitter side and the receiver side is separable, so for each channel tap H , we have E{[ H ] , [ H ] *   ,  } = [R] ,  [T] ,  where R ∈ C × and T ∈ C × are receiver and transmitter correlation matrices, respectively.Based on the above definitions, it is easy to show that the channel correlation matrix on subcarrier  is given by Therefore, the channel correlation matrix Ω  is characterized by the Kronecker product of transmitter and receiver correlation matrices T and R, multiplied by the quadratic form  Υ  which depends on the tap correlation matrix and the th Fourier vector.Finally, the following statistical representation can be considered for the channel: where H  ∈ C × is a white random matrix whose elements are uncorrelated with zero mean and unit variance.Through the properties of the Kronecker product, the above statistical representation yields Ω  =  Υ  (T ⊗ R).We shall use (45) to describe the MIMO-OFDM Kronecker channel model.
If the channel taps are uncorrelated, Υ is diagonal and the quadratic form in Υ becomes independent of the subcarrier index; hence, the channel correlation matrix is the same over all subcarriers. However, whenever the channel taps are correlated, Υ is not diagonal and the channel statistics are different on each subcarrier. Since the precoders depend on the channel statistics, tap correlation makes the MIMO-OFDM precoding matrices frequency dependent, which limits the system performance [17, 31-34]. When there is no tap correlation, the precoders are the same across all frequencies. We shall study the effect of tap correlation on the system sum rate in the simulation results section.
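The effect of tap correlation on the per-subcarrier statistics can be illustrated with a small Python sketch of the scalar quadratic form that multiplies the Kronecker product of the transmitter and receiver correlation matrices. The Fourier-vector convention, the power profile `p`, the correlation factor `rho`, and the exponential form of the correlated matrix are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def fourier_vector(k, num_taps, num_subcarriers):
    """Assumed convention: entry l is exp(-j 2 pi k l / K)."""
    l = np.arange(num_taps)
    return np.exp(-2j * np.pi * k * l / num_subcarriers)

def subcarrier_gain(Upsilon, k, num_subcarriers):
    """Quadratic form f_k^H Upsilon f_k (real for Hermitian Upsilon)."""
    f = fourier_vector(k, Upsilon.shape[0], num_subcarriers)
    return float(np.real(f.conj() @ Upsilon @ f))

K, L = 8, 3                     # subcarriers and taps (example sizes)
p = np.array([0.5, 0.3, 0.2])   # hypothetical tap powers summing to one
rho = 0.6                       # hypothetical tap correlation factor

# Uncorrelated taps: diagonal tap correlation matrix.
Upsilon_uncorr = np.diag(p)

# Correlated taps: assumed exponential correlation between tap amplitudes.
lags = np.abs(np.subtract.outer(np.arange(L), np.arange(L)))
Upsilon_corr = np.sqrt(np.outer(p, p)) * rho ** lags

gains_uncorr = np.array([subcarrier_gain(Upsilon_uncorr, k, K) for k in range(K)])
gains_corr = np.array([subcarrier_gain(Upsilon_corr, k, K) for k in range(K)])
```

With the diagonal matrix every subcarrier sees the same gain, namely the total tap power, whereas the correlated matrix produces a gain that varies with the subcarrier index, which is why the precoders become frequency dependent.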

Intercarrier Interference
In this section, we allow ICI among OFDM subcarriers; the case without ICI is then a special case of this scenario. ICI occurs when there is a carrier frequency offset due to synchronization errors and Doppler shifts [1, 38, 43]. The offset destroys the orthogonality among subcarriers, which introduces additional interference into the system. To adapt our method to this situation, we approximate the rate under ICI and then extend the proposed algorithm to perform precoder design with statistical CSI.

System Model and Problem Formulation under ICI.
When there is ICI, the received signal at user (, ) on subcarrier  can be modelled as where |   , | < 1 models the power leaked from subcarrier   to subcarrier  due to the ICI.Following [1], we model the ICI through the normalized frequency offset denoted by 0 <  < 1, which is the ratio of the actual frequency offset to the intercarrier spacing.It is shown in [1] that    , relates to the  by Here at the receiver, in addition to the summations over   ,   that model intercell and intracell interference, the received signal over subcarrier  depends also on all other subcarriers through ICI and thus the summation over   .Note that the formulation reduces to that of the non-ICI scenario when  = 0, or equivalently    , =    , .The achievable instantaneous data-rate under ICI for user (, ) on subcarrier  is then given by Based on this result, we can utilize the same random matrix method employed for the non-ICI scenario to derive the approximated rate rICI The optimization problem we consider is similar to (P1) but using the rate under ICI, that is,  ICI (,), .As before, we replace the rate  ICI (,), by its approximation rICI (,), for tractability.Therefore, the following optimization problem is formed: (P3)
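The leakage coefficients can be illustrated numerically. The closed-form expression from [1] is not recoverable from this text, so the Python sketch below instead computes the textbook DFT leakage caused by a normalized carrier frequency offset directly from its defining sum; treating this as the model used in the paper is an assumption, as are the names `ici_coefficients` and `eps`.

```python
import numpy as np

def ici_coefficients(num_subcarriers, eps):
    """S[k, kp]: leakage from subcarrier kp into subcarrier k for offset eps.

    Computed as (1/N) * sum_t exp(j 2 pi t (kp - k + eps) / N), i.e. the DFT
    of a subcarrier tone whose frequency is shifted by eps subcarrier spacings.
    """
    n = np.arange(num_subcarriers)
    k = n[:, None, None]
    kp = n[None, :, None]
    t = n[None, None, :]
    phase = 2j * np.pi * t * (kp - k + eps) / num_subcarriers
    return np.exp(phase).sum(axis=2) / num_subcarriers

S_none = ici_coefficients(8, 0.0)   # no offset: identity, no leakage
S_ici = ici_coefficients(8, 0.1)    # 10% offset: energy spreads to neighbors
leak_power = np.sum(np.abs(S_ici) ** 2, axis=1)  # total power per received subcarrier
```

In this model a zero offset gives the identity matrix, which matches the remark that the formulation reduces to the non-ICI case, and every coefficient has magnitude below one for a nonzero fractional offset.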

Second-Order Statistics under ICI.
Due to the ICI, in addition to per-carrier statistics, the BSs now share the additional information of cross-carrier correlation.Specifically, the cross-carrier covariance matrices, are assumed to be available at all BSs.This additional information is important for the BSs to suppress the ICI.For Kronecker model, the transmitter, receiver, and tap correlation matrices are sufficient because we have with We will now find the second-order statistics of Ã(,), and B(,), which depend on Ω   ,(,), where Σ ICI   ,(,), and Σ ICI ,(,), are block matrices holding  2 blocks with block ( 1 ,  2 ) given by and     ,(,), and β  ,(,), are the th eigenvectors of W   ,(,), and W,(,), , respectively, and

Simulation Results
In this section, we demonstrate the approximation accuracy and evaluate the performance of our algorithms.We consider a downlink multicell MIMO-OFDM system, where the number of cells and users in each cell is set to be 4, and the number of subcarriers is set to 8, that is,  = 4,  = 4, and  = 8.The number of antennas for users and BSs is set to 3 unless stated otherwise; that is,   = 3 and   = 3.The users are uniformly distributed in cells with 0.5 Km radius.Without loss of generality, the weighting for the sum rate maximization is uniform in the simulations, that is,  (,) = 1/, ∀, .We assume the power constraints are the same for all BSs, that is,   = , ∀.Since the noise power was normalized to unity, the transmitter signal-to-noise ratio is SNR = 10 log 10 .
We consider the Kronecker channel model introduced in Section 5 with correlation matrices in which    ,(,) denotes the distance between BS   and user (, ), and  T ,  R are the spatial correlation factors as defined in [44].The constant  is chosen so that E[(/   (,) ) 2 ] = 1.While not explicitly specified, we also choose  T ,  R to be random and uniformly distributed in the interval [0.5, 1] for each user.The tap correlation matrix is chosen as where  Υ is the tap correlation factor and ∑  =1   = 1 with   / +1 = 3 dB as defined in [34].The number of channel taps is assumed to be 3 throughout our simulations; that is,  = 3.The ICI is modelled according to (51).
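The tap power profile described above (consecutive tap powers 3 dB apart, normalized to sum to one) can be computed with a few lines of Python. Only the power profile is sketched; the tap correlation matrix built from it is not reproduced here because its expression is missing from this text. The function name `power_delay_profile` is an assumption.

```python
import numpy as np

def power_delay_profile(num_taps, decay_db=3.0):
    """Tap powers with p_l / p_{l+1} equal to decay_db (in dB), normalized to sum to 1."""
    ratio = 10.0 ** (decay_db / 10.0)          # linear power ratio between neighbors
    p = ratio ** (-np.arange(num_taps, dtype=float))
    return p / p.sum()

p = power_delay_profile(3)                      # three taps, as in the simulations
step_db = 10.0 * np.log10(p[:-1] / p[1:])       # spacing between consecutive taps in dB
```

Normalizing the profile keeps the total channel power fixed, so changes in the tap correlation factor alter only how the power is distributed across subcarriers.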

Approximation Accuracy.
We compare the exact ergodic sum rate obtained by simulations with its approximation given by Theorem 2 or Corollary 4. The results are depicted in Figure 2, where it is seen that the approximations are extremely accurate. Although the approximations are asymptotic, high accuracy is observed even for our small MIMO system (three antennas at each BS and at each user); this justifies the approximation approach for practical antenna sizes. The approximation is also accurate across a wide range of SNRs.
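The "exact" ergodic rate in such a comparison is typically obtained by Monte Carlo averaging. The Python sketch below estimates the ergodic value of a single log-determinant term for one Kronecker-correlated link; it is a simplified illustration and not the multicell sum rate of the paper, which involves the difference of two such terms with interference. The exponential correlation model, the trial count, and the function names are assumptions.

```python
import numpy as np

def exponential_correlation(n, rho):
    """Assumed correlation model: entry (i, j) equals rho**|i - j|, with |rho| < 1."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ergodic_rate_mc(R, T, num_trials=2000, seed=0):
    """Monte Carlo estimate of E[log det(I + H H^H)] in nats, H = L_R H_w L_T^H."""
    rng = np.random.default_rng(seed)
    n_r, n_t = R.shape[0], T.shape[0]
    L_r = np.linalg.cholesky(R)   # R = L_r L_r^H
    L_t = np.linalg.cholesky(T)   # T = L_t L_t^H
    total = 0.0
    for _ in range(num_trials):
        # White complex Gaussian matrix with zero-mean, unit-variance entries.
        H_w = (rng.standard_normal((n_r, n_t))
               + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        H = L_r @ H_w @ L_t.conj().T
        total += np.linalg.slogdet(np.eye(n_r) + H @ H.conj().T)[1]
    return total / num_trials

rate_iid = ergodic_rate_mc(np.eye(3), np.eye(3))
rate_corr = ergodic_rate_mc(exponential_correlation(3, 0.7),
                            exponential_correlation(3, 0.7))
```

A deterministic approximation would then be evaluated for the same correlation matrices and plotted against these averages over a range of SNRs.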

System Performance.
We now demonstrate the performance of our proposed method given by Algorithm 1, which is based only on statistical CSI at the BSs. For comparison, we also depict the results of two other methods. The first is the algorithm proposed in [12], which is based on perfect instantaneous CSI and assumes the same system model as this work but with full CSI at the BSs. The second is networked MIMO based on statistical CSI, where, in addition to the channel statistics, it is assumed that each BS has all the transmit data, so the whole system can be viewed as a MIMO super-cell with statistical CSI at the transmitter. These two systems require much heavier BS coordination, but as we shall see, our method competes with them. As mentioned before, our proposed algorithm is not limited to multicell networks with partial cooperation; with small modifications it can be used for precoder design based on statistical CSI for networked MIMO systems. Figure 3 depicts the results. Although our method uses statistical CSI with limited BS cooperation, the achievable weighted sum rate is comparable to that with full CSI and to networked MIMO for a wide range of SNRs; note that statistical CSI incurs a much lower signaling overhead and requires less frequent updates compared to full CSI. Figure 3 also shows the results for different tap correlation factors. While tap correlations degrade system performance, using our method to incorporate the correlation information into the precoder design alleviates the performance loss. Moreover, lack of knowledge about the tap correlations, or neglecting them (assuming uncorrelated channel taps), leads to a performance loss. When no precoders are employed (identity matrix precoding), there is a large disadvantage, so precoder design is crucial in the MIMO-OFDM system.

Correlation Factors.
In Figures 4 and 5, we investigate the effect of the tap correlation and spatial correlation factors on the system performance. Both tap correlation and spatial correlation decrease the weighted sum rate, but the spatial correlation has the more prominent effect. As long as the correlation information is incorporated into the precoder design via our proposed algorithm, the performance loss is not significant for a wide range of correlation degrees. The performance loss becomes slightly larger at lower SNRs.

Number of Antennas.
Figure 6 shows the sum rate as the number of transmitter antennas increases; here the number of receiver antennas is fixed to 2. The weighted sum rate is seen to increase with the number of transmitter antennas. The slope of the performance increase for Algorithm 1 is similar to that of the method in [12] based on full CSI. This suggests that our proposed method is reliable for arbitrarily large MIMO-OFDM systems.

Because ICI adds interference from other subcarriers, the sum rate decreases, as seen in Figure 7. For a nonnegligible frequency offset (FO), there can be a serious decrease in the sum rate. However, smart precoder design implemented by Algorithm 2 can suppress ICI and achieve a reasonable sum rate. The performance gain of our method over the nonprecoding scheme increases with SNR. Moreover, the advantage is more pronounced for smaller frequency offsets. However, we note that the gap between the ICI-free system and the system with ICI also increases with SNR. In Figure 8, we plot the weighted sum rate versus the ICI intensity factor for SNR = 15 dB. We see that the system sum rate is highly sensitive to ICI at small values of this factor.

Convergence Rate.
We now compare the convergence rates of Algorithms 1 and 2 for SNR = 15 dB and ICI levels of 0%, 1%, 10%, and 20%. The initial precoders are chosen to be the identity matrix. Under various degrees of ICI, we see from Figure 9 that Algorithm 1 always converges faster than Algorithm 2. When there is ICI, the convergence rate of Algorithm 2 is similar for various degrees of ICI.

Conclusions
We investigated linear precoding for downlink multicell MIMO-OFDM systems based on statistical CSI. The main contribution of this work was applying the already established results of random matrix theory to the MIMO-OFDM scenario in order to study the impact of frequency selectivity, tap correlations, and ICI on the statistical precoder design and system performance. The asymptotic approximations to the ergodic rates in the ICI and ICI-free scenarios were derived, based on which we formulated two nonconvex sum rate maximization problems and proposed locally optimal gradient-based solutions to them. Simulation results showed that while spatial correlations, tap correlations, and ICI decrease

4.1. Derivation of D    r(,), . It immediately follows by the chain rule that

Figure 1: Multicell MIMO-OFDM system model.
Figure 2: Approximation accuracy for the ergodic sum rate.

a matrix is denoted by [X] , , and the th element of a vector is denoted by [x]  . Vectorization, trace, and expected value operators are denoted by vec{⋅}, tr{⋅}, and E[⋅], respectively. The all-one and all-zero vectors of size  and identity matrix of size  ×  are denoted by 1 , 0  , and I  , respectively. The operators Re{⋅} and Im{⋅} represent real part and imaginary part, respectively.