On the Capacity and Transmission Techniques of Massive MIMO Systems

A massive MIMO wireless system is a multiuser MISO system in which each base station has a large number of antennas relative to the number of user devices, each of which is equipped with a single antenna. Massive MIMO is seen as the way forward in enhancing the transmission rate and user capacity in 5G wireless. The potential of a massive MIMO system lies in its ability to almost always realize multiuser channels with near-zero mutual coupling. The coupling factor halves for each doubling of transmit antennas. In a high bit rate massive MIMO system with $m$ base station antennas and $n$ users, the downlink capacity increases as $\log_2 m$ bps/Hz, and the capacity per user reduces as $\log_2 n$ bps/Hz. This capacity can be achieved by power sharing, using signal weighting vectors aligned to the respective $1 \times m$ channels of the users. For low bit rate transmission, time sharing achieves the capacity just as power sharing does. System capacity reduces as the channel coupling factor increases. Interference avoidance or minimization strategies can be used to achieve the available capacity in such scenarios. The probability distribution of the channel coupling factor is a convenient tool to predict the number of antennas needed to qualify a system as massive MIMO.


Introduction
Research and development activities in Multiple Input Multiple Output (MIMO) wireless systems originated more than two decades ago. Nevertheless, the potential of MIMO systems to enhance the capabilities of wireless mobile communication is yet to be fully utilized. Initial research activities in MIMO wireless systems addressed point-to-point communication between transceivers equipped with comparable numbers of antennas at the transmitting and receiving ends. The origin of this work can be traced back to the application of the theory of random matrices in the information theory of mutually coupled random vectors [1,2]. The term MIMO is first found in [3], where the authors derived the information capacity of certain MIMO channels with memory. The results from this work have been useful for solving the cross-talk problem in wired communication systems [4] and the cross-coupling problem in wireless communications [5]. Consequently, the landmark papers on the information theory of matrix wireless channels appeared during the 1990s [6,7]. Coding methods to realize the capacity of the MIMO wireless channel were also reported around this time [8][9][10]. Published work on experimental MIMO wireless systems followed [11].
Since commercial wireless communication systems have been multiuser systems where typical user devices have one or two antennas, subsequent attention has been on multiuser MIMO systems where base stations with multiple antennas serve user devices equipped with relatively few antennas. Although one can envision a system with full cooperation among base stations and user devices to form a single matrix channel in this context, the theoretical and practical issues to be solved are immense, and such a system is not realistic with the current state of the art in technology. It is difficult to solve issues related to channel estimation and feedback, synchronization, and precoding methods, to name a few. Somewhat simpler forms of multiuser MIMO systems were incorporated in 3G wireless standards, where a base station allocates a few antennas per user and each user device has up to two antennas [12]. The design of such systems involves the consideration of mutual interference among different MIMO links within a cell site as well as the mutual interference among links from different cell sites [13]. These methods are difficult to scale up as the system grows.
In recent years, researchers and developers have been working on multiuser MIMO systems where the base stations have a large excess of antennas relative to the number of users being served and each user device may have a single antenna. This particular form of MIMO system is known as a massive MIMO or full-dimension MIMO system [14,15]. Massive MIMO systems have several desirable properties. The statistical characteristics make it possible to almost always realize orthogonal or nearly orthogonal Multiple Input Single Output (MISO) channels. In this scenario, the intracell as well as intercell interference among users becomes negligible. Thus, the system scales with ease as the number of operational user devices increases. Further, malfunctioning of a few antennas will not affect the performance since the serving antennas are a dynamic group (can be hot swapped) [16]. Much of the recent literature on massive MIMO focuses on implementation issues such as carrier synchronization [17] and pilot contamination [18]. Increased hardware complexity due to the requirement of an analog radio frequency (RF) chain for each antenna has also been a concern. Nevertheless, low cost implementation is possible as individual RF chains operate at significantly low power. The power per antenna is inversely proportional to the number of transmit antennas. The total transmit power increases only linearly with the number of user terminals being served. Recently published work such as [19] proposes hybrid analog-digital techniques in the context of millimeter wave based massive MIMO systems. Massive MIMO systems are based on asymptotic results where the number of transmit antennas is considered large and optimal performance relies on the knowledge of channel covariance. Real-time estimation of such statistical parameters is difficult with large systems. Randomized algorithms have been studied for effective beamforming in massive MIMO systems as an alternative [20].
This paper presents a study on the achievable downlink capacity with ideally uncoupled (orthogonal) and partially coupled multiuser channels where there are multiple antennas at the base station and a single antenna at each user terminal. Optimal and suboptimal transmission vector design strategies to achieve the capacity are investigated. The analytical and simulation results are obtained for a single cell. It is assumed that the base station has complete knowledge of the downlink channels of all users. In Section 2 to follow, expressions of Signal to Interference plus Noise Ratio (SINR) and transmission capacity are scrutinized to draw key conclusions on the trends when channel instantiations are uncoupled. This case characterises the ideal massive MIMO system. It is shown that the selection of transmission vectors and multiple access techniques required to achieve the capacity in this case is trivial. Next, Section 3 addresses the necessary strategies when the channel instantiations are partially coupled. More elaborate transmission vector design methods are required to achieve the capacity in this case. Rate regions of optimal and suboptimal methods are studied and compared. Simulation results are presented to illustrate the trends in the relative performances of different methods as the channel coupling factor varies. Illustrations exemplify how the performances of the various strategies converge to the performance of the trivial case as the channels become uncoupled.
The coupling factors between channels of different users are random, and good uncoupling is achieved only in the probabilistic sense. The probability distributions of the coupling factor are examined in Section 4. The cumulative distributions of the coupling factor are derived for varying numbers of base station antennas in this section. These curves are examined to draw important conclusions on the antenna requirements for a system to qualify as massive MIMO.

Transmission in Uncoupled Channels
In the system studied, there are $m$ antennas at the base station serving $n$ user devices, each having a single antenna. Let the $1 \times m$ vector channels between the base station and the $n$ users be $h_k$, $k = 1, 2, \ldots, n$. Each of the $m$ elements of $h_k$ is the fading coefficient of the wireless channel between a transmit/receive antenna pair. The $m$ elements in each of these vector channels can be modeled as uncorrelated zero mean circularly symmetric complex Gaussian random variables with unit variance [21,22]. This model assumes a physical wireless environment rich in scatterers. The total available power $P_T$ is distributed to the $n$ users. It should be noted that $P_T$ is the spatial average power available at a receiver when there is a single pair of transmit and receive antenna elements; i.e., the analytical framework accounts for path loss and shadow fading loss. The distribution of power among the $m$ transmit antennas for each user is determined by a weighting vector $w_k$, $k = 1, 2, \ldots, n$. This vector can be represented in the normalized form $w_k = \sqrt{p_k}\, v_k$, where $p_k$ is the share of total power assigned to user $k$ and $v_k$ has unit norm, i.e., $\|v_k\| = 1$. The norm of a $1 \times m$ complex vector $v_k$ here is $\|v_k\| = \sqrt{\sum_{i=1}^{m} |v_{k,i}|^2}$, where $v_{k,i}$, $i = 1, 2, \ldots, m$, are the elements of $v_k$.
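The channel model above can be illustrated with a minimal sketch, assuming i.i.d. unit-variance circularly symmetric complex Gaussian entries as described. The helper names `rayleigh_channel`, `inner`, `norm`, and `coupling_factor` are hypothetical and chosen here for illustration; the coupling factor is taken as the normalized inner product $|g \circ h| / (\|g\|\|h\|)$ used later in the paper.

```python
import math
import random


def rayleigh_channel(m, rng):
    """Draw a 1 x m channel with i.i.d. CN(0, 1) entries."""
    s = math.sqrt(0.5)  # real and imaginary parts each carry variance 1/2
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(m)]


def inner(a, b):
    """Inner product a o b = a b^H for row vectors a and b."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def norm(a):
    """Euclidean norm of a complex row vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in a))


def coupling_factor(g, h):
    """Normalized coupling factor |g o h| / (||g|| ||h||), between 0 and 1."""
    return abs(inner(g, h)) / (norm(g) * norm(h))


rng = random.Random(1)  # fixed seed so the sketch is repeatable
m = 1024
g = rayleigh_channel(m, rng)
h = rayleigh_channel(m, rng)
print("||g||^2 / m :", norm(g) ** 2 / m)        # close to 1 for large m
print("coupling    :", coupling_factor(g, h))   # small for large m
```

With many antennas the squared norm concentrates around the number of antennas and two independent draws are nearly orthogonal. This is the behaviour the later sections rely on.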
Received SINR at user $k$ can be written as
$$\gamma_k = \frac{|w_k \circ h_k|^2}{1 + \sum_{j \neq k} |w_j \circ h_k|^2}. \tag{1}$$
The notation $\circ$ stands for the inner product defined as $a \circ b = a b^H$, where $a$ and $b$ are row vectors of size $1 \times m$ and $b^H$ is the conjugate transpose of $b$. The notation $|\cdot|$ implies absolute value. In this expression, the numerator is the received signal power, which results from the projection of the weighting vector onto the corresponding channel vector. The denominator is the interference plus noise at the receiver. The weighting vectors $w_k$ are normalized to the noise voltages at the receiver. Therefore, $p_k$ is normalized to the noise power at the receive antenna and so is $P_T$. The maximum achievable bit rate (capacity $C_k$) of user $k$ in the information theoretic sense [23] is
$$C_k = \log_2(1 + \gamma_k) \tag{2}$$
and hence the maximum sum rate is
$$C = \sum_{k=1}^{n} \log_2(1 + \gamma_k). \tag{3}$$
Since the capacity $C_k$ achieved by user $k$ is a monotonically increasing function of $\gamma_k$, maximization of $\gamma_k$ given by (1) maximizes $C_k$. Thus, it can be observed that the capacity of user $k$ depends on the power allocations, the channels, and the weighting vectors of all $n$ users. Power allocation and weighting vectors are under the control of signal processing algorithms, whereas the channels are decided by the propagation environment.
If an instantiation of $n$ vector channels $h_k$, $k = 1, \ldots, n$, is mutually orthogonal (uncoupled), then for a given power allocation $p_k$, $k = 1, \ldots, n$, the weighting vectors $w_k$, $k = 1, \ldots, n$, that are aligned to the respective channels $h_k$, i.e., $v_k = h_k/\|h_k\|$, maximize the numerator and null the interference term in the denominator of (1), thus maximizing $\gamma_k$ and hence the transmission rate $C_k$. The rates achieved are the capacities of the user channels for the allocated shares of power. On the other hand, it is possible to allocate the total power to each user in turn for a fraction of the time frame to implement time shared transmission. The capacities of the time sharing and power sharing methods are further discussed in the subsections below.
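The SINR and rate definitions can be checked numerically. The following is a minimal sketch, assuming unit noise power (weights normalized to the noise voltage, as stated above) and the inner product $a \circ b = a b^H$. The function names `sinr` and `matched_weights` and the example channels are illustrative assumptions, not taken from the paper.

```python
import math


def inner(a, b):
    """Inner product a o b = a b^H for row vectors a and b."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def sinr(k, channels, weights):
    """SINR of user k: signal power over (unit noise + interference)."""
    signal = abs(inner(weights[k], channels[k])) ** 2
    interference = sum(abs(inner(w, channels[k])) ** 2
                       for j, w in enumerate(weights) if j != k)
    return signal / (1.0 + interference)


def matched_weights(channels, powers):
    """Weights aligned to the channels: w_k = sqrt(p_k) h_k / ||h_k||."""
    weights = []
    for h, p in zip(channels, powers):
        h_norm = math.sqrt(sum(abs(x) ** 2 for x in h))
        weights.append([math.sqrt(p) * x / h_norm for x in h])
    return weights


# Two mutually orthogonal example channels over m = 4 antennas.
channels = [[1 + 0j, 1j, 0j, 0j], [0j, 0j, 2 + 0j, 0j]]
powers = [3.0, 1.0]  # power shares, normalized to the noise power
weights = matched_weights(channels, powers)

sinrs = [sinr(k, channels, weights) for k in range(len(channels))]
rates = [math.log2(1.0 + s) for s in sinrs]  # bps/Hz per user
print(sinrs, rates, sum(rates))
```

Because the example channels are orthogonal, the interference terms vanish and each SINR reduces to the user's power share times its squared channel norm.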

Time Sharing.
Let the base station send data by sharing the time among the $n$ users, allocating the total available power to a single user at a time. User $k$ achieves the rate $R_k$ using a fraction $\alpha_k$ of the time frame; thus, $\sum_{k=1}^{n} \alpha_k = 1$. From (1) and (2), the capacity achieved by user $k$ with time sharing is
$$R_k = \alpha_k \log_2\left(1 + P_T \|h_k\|^2\right). \tag{4}$$
Series expansion of the logarithm and omission of higher order terms with the assumption $P_T \|h_k\|^2 \ll 1$ results in the approximation
$$R_k \approx \alpha_k P_T \|h_k\|^2 \log_2 e, \tag{5}$$
and this is the rate achieved for low bit rate (low power) transmission. The results of (4) and (5) are independent of the level of coupling between channels since transmission takes place for one user at a time.
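The quality of the low-power approximation can be gauged with a short sketch. It assumes the time sharing rate has the form $\alpha \log_2(1 + P_T\|h\|^2)$ and that the approximation keeps only the first-order term of the logarithm. The function names and the example value of $\|h\|^2$ are illustrative assumptions.

```python
import math


def time_sharing_rate(alpha, p_total, h_norm_sq):
    """Rate of a user served with full power for a fraction alpha of the frame."""
    return alpha * math.log2(1.0 + p_total * h_norm_sq)


def time_sharing_rate_low_power(alpha, p_total, h_norm_sq):
    """First-order (low power) approximation of the same rate."""
    return alpha * p_total * h_norm_sq * math.log2(math.e)


alpha = 0.5        # half of the time frame
h_norm_sq = 4.0    # assumed squared channel norm, e.g. m = 4 with ||h||^2 = m
for p_db in (-21, -18, -15, -12):
    p_total = 10.0 ** (p_db / 10.0)
    exact = time_sharing_rate(alpha, p_total, h_norm_sq)
    approx = time_sharing_rate_low_power(alpha, p_total, h_norm_sq)
    print(p_db, "dB:", exact, approx, (approx - exact) / exact)
```

The relative gap grows with power, which is why the equivalence between time sharing and power sharing is stated only for low bit rate transmission.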

Power Sharing.
For uncoupled channels, $h_j \circ h_k = 0$ for $j \neq k$ and the optimum weighting vectors are those in alignment with the respective channels, i.e., $v_k = h_k/\|h_k\|$. From (1) and (2), these vectors lead to the capacity
$$C_k = \log_2\left(1 + p_k \|h_k\|^2\right). \tag{6}$$
Comparison of (5) and (6) reveals that, at low bit rate transmission, the rate achieved by user $k$ via time sharing with a fraction $\alpha_k$ of the time frame converges to the capacity achievable by simultaneous transmission to all users in the entire time frame with shares of power $p_k = \alpha_k P_T$. This is an elegant and useful result showing the possibility of achieving capacity with simple time sharing for low bit rate transmission. Figure 1 illustrates this fact for an example case. This figure compares rate regions with time sharing and power sharing for two users ($n = 2$) at total power levels ranging from -21 dB to -12 dB in steps of 3 dB. These results have been obtained for the case of four base station antennas ($m = 4$) and a significantly small coupling factor $\cos\theta_{12}$, where $\cos\theta_{12}$ is the normalized coupling factor of the two channels. While the number of antennas used in this illustration may not qualify as "massive", the pair of channels was selected by repeatedly generating channel instantiations from a zero mean circularly symmetric Gaussian distribution of unit variance until the coupling level was significantly small.
For high bit rate (high power) transmission, the following results are obtained. Let $p_k \|h_k\|^2 \gg 1$ in (6). For a sufficiently large number of base station antennas $m$, $\|h_k\|^2 \approx m$. Assuming equal power shares for the $n$ users, (6) reduces to
$$C_k \approx \log_2 P_T + \log_2 m - \log_2 n. \tag{7}$$
Two observations are made here. First, the achievable capacity of a user reduces by 1 bps/Hz for each doubling of the number of users $n$. Second, the capacity increases by 1 bps/Hz for each doubling of the number of base station antennas $m$.
Figure 2 exemplifies these results. The curves in the figure have been obtained with $m = 4$ and $m = 16$ for a range of power settings in steps of 1 dB. With $m = 4$ the power levels are in the range from 20 dB to 30 dB, whereas with $m = 16$ a range of 14 dB to 24 dB has been used. The 6 dB difference compensates for the factor of 4 in the number of antennas $m$. Channel instantiations are selected so as to have negligibly small values of the coupling factor $\cos\theta_{12}$. The two sets of curves for different values of $m$ do not coincide in this case since the number of antennas is not sufficient to satisfy the condition $\|h_k\|^2 \approx m$. It can be observed that the optimal pair of rates is 1 bps/Hz below that achievable by allocating the total available power to a single user. It is worthwhile to mention here that in a massive MIMO system with a large excess of base station antennas compared to the number of users, the $\log_2 n$ term in (7) can be neglected and the capacity of any user becomes independent of the total number of users.
This section has focused on transmission in uncoupled multiuser channel instantiations. In a massive MIMO system, most of the channel instantiations are mutually uncoupled. Nevertheless, there exists a finite probability, however small, for the channels to have a significant level of coupling. This probability increases as the number of active users increases with respect to the number of base station antennas. Thus, a close look at the cases of coupled channels is necessary. In this case, simple alignment of weighting vectors to the respective channels is suboptimal. Alternative strategies such as zero forcing, best linear optimization, and Dirty Paper Coding are considered to achieve better performance. Section 3 presents the optimal weighting vector design techniques for coupled channels.
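The two observations on high bit rate transmission can be verified with a small sketch. It assumes uncoupled channels, equal power shares $P_T/n$, and the idealization $\|h_k\|^2 = m$. The function names are illustrative assumptions.

```python
import math


def per_user_capacity(p_total, m, n):
    """Per-user capacity with equal power shares, assuming ||h_k||^2 = m."""
    return math.log2(1.0 + p_total * m / n)


def per_user_capacity_high_power(p_total, m, n):
    """High-power form: log2(P_T) + log2(m) - log2(n)."""
    return math.log2(p_total) + math.log2(m) - math.log2(n)


p_total = 10.0 ** (30.0 / 10.0)  # 30 dB relative to the noise power
base = per_user_capacity(p_total, 16, 2)
more_antennas = per_user_capacity(p_total, 32, 2)  # antennas doubled
more_users = per_user_capacity(p_total, 16, 4)     # users doubled
print(base, more_antennas - base, more_users - base)
print(per_user_capacity_high_power(p_total, 16, 2))
```

Doubling the antennas adds close to 1 bps/Hz and doubling the users removes close to 1 bps/Hz. The small deviation comes from the "1 +" term that the high-power form drops.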

Transmission in Coupled Channels
Maximization of transmission rate implies maximization of SINR. To this end, methods such as Lagrangian based best linear optimization (BLO) and Zero Forcing (ZF) are studied and compared with Dirty Paper Coding (DPC). DPC is known to be optimal for the scenario of concern ([24] and references therein). The relative performances of these techniques are studied considering two users, each equipped with a single antenna, and $m$ transmit antennas at the base station.
In the discussion to follow, the two $1 \times m$ vector channels are denoted by $g$ and $h$. The primary optimization problem here is to maximize the sum capacity of the two users for the available channel instantiations as the fractions of total power $P_T$ assigned to the two users vary. The solution can be obtained by first evaluating the surface of minimum total power required to achieve pairs of rates in the $R_h$-$R_g$ rate plane. The intersection of the surface of minimum total power and the plane at $P_T$ parallel to the $R_h$-$R_g$ plane gives the curve of achievable capacities for the available total power.
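The geometric procedure above (intersecting the minimum-power surface with the plane at the available power) can be sketched as a one-dimensional search. This is a minimal illustration, not the authors' implementation. The names `boundary_rate` and `min_power_uncoupled` are hypothetical. The uncoupled-channel power expression serves only as a stand-in for the minimum-power surface of whichever transmission technique is being studied, and it is assumed to increase monotonically with each rate.

```python
import math


def min_power_uncoupled(r_g, r_h, g_norm_sq, h_norm_sq):
    """Stand-in minimum-power surface: uncoupled channels, matched weights."""
    return (2.0 ** r_g - 1.0) / g_norm_sq + (2.0 ** r_h - 1.0) / h_norm_sq


def boundary_rate(r_h, p_total, min_power, lo=0.0, hi=64.0, iters=60):
    """Largest R_g such that min_power(R_g, R_h) <= P_T, found by bisection.

    Returns None when R_h alone already needs more than the available power.
    """
    if min_power(0.0, r_h) > p_total:
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if min_power(mid, r_h) <= p_total:
            lo = mid   # still feasible: move the lower bound up
        else:
            hi = mid   # infeasible: move the upper bound down
    return lo


def surface(r_g, r_h):
    """Example surface with ||g||^2 = ||h||^2 = 4 (assumed values)."""
    return min_power_uncoupled(r_g, r_h, 4.0, 4.0)


p_total = 100.0  # 20 dB relative to the noise power
boundary = [(r_h, boundary_rate(r_h, p_total, surface)) for r_h in (0.0, 3.0, 6.0)]
print(boundary)
```

Sweeping one rate over its feasible range and solving for the other traces the rate region boundary. Any other minimum-power expression with the same monotonic behaviour can be passed in place of `surface`.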

Best Linear Optimization (BLO).
For two users with channels $g$ and $h$, the SINR expression of (1) reduces to
$$\gamma_g = \frac{|w_g \circ g|^2}{1 + |w_h \circ g|^2}, \qquad \gamma_h = \frac{|w_h \circ h|^2}{1 + |w_g \circ h|^2} \tag{8}$$
and the total power is
$$P = \|w_g\|^2 + \|w_h\|^2.$$
The optimum weighting vectors and power values can be found by solving a Lagrange optimization problem. The objective function is minimum when its partial derivatives with respect to the conjugate transposes $w_g^H$ and $w_h^H$ become zero. The optimum Lagrange multipliers $\lambda_g$ and $\lambda_h$ are the solutions to the set of hyperbolic equations (10) and (11).
These equations involve the quantity $D = \|g\|^2 \|h\|^2 - |g \circ h|^2$ (nonnegative by the Cauchy-Schwarz inequality), the ratios $\|h\|^2/\sqrt{D}$ and $\|g\|^2/\sqrt{D}$, and the reciprocals $1/\lambda_h$ and $1/\lambda_g$ of the multipliers. The solution is two valued, and one of the solutions leads to a feasible set of power levels. Accordingly, the minimum total power is achieved for the weighting vectors given by (12) and (13). Thus, the minimum power $p_g$ is given by (14). The expression for the power $p_h$ can be easily written by replacing each parameter in the numerator by its dual, i.e., $g$ by $h$, $h$ by $g$, etc. Hence, the total minimum power to achieve the pair of rates $(R_g, R_h)$ is given by (15).

Dirty Paper Coding (DPC).
In the BLO of Section 3.1, the transmissions to the two users mutually interfere. It is possible to eliminate interference to one of the users by making the other user 'phantom' via precoding techniques [25], Dirty Paper Coding (DPC) [26] in particular. Assume that the user of channel $h$ is made phantom by precoding the symbol transmitted to the privileged user with channel $g$ with respect to the symbol transmitted to the phantom user of channel $h$. The privileged user of channel $g$ does not experience interference from the phantom user of channel $h$ and hence (8) reduces to
$$\gamma_g = |w_g \circ g|^2, \qquad \gamma_h = \frac{|w_h \circ h|^2}{1 + |w_g \circ h|^2}. \tag{16}$$
The phantom user of channel $h$ experiences interference from the privileged user of channel $g$. If the power available to the phantom user is significantly large compared to the power available to the privileged user, then the interference experienced by the phantom user will not significantly degrade the SINR $\gamma_h$. With such a precoding, the choice of weighting vector $w_h$ of the phantom user has no effect on the privileged user. Thus, $w_h$ can be selected to maximize the phantom user's rate. Aligning $w_h$ with the channel, i.e., choosing the unit-norm direction $v_h = h/\|h\|$, achieves the maximum rate. In this context, there is a freedom of choice for the weighting vector $w_g$. It can be computed based on one of three different criteria, leading to the polite, aggressive, and objective versions of DPC discussed below.

Polite DPC.
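In this approach, the privileged user's weighting vector is selected such that interference to the phantom user is avoided. The term "polite" emphasizes the fact that the privileged user, which utilizes precoding to achieve interference-free reception from the phantom user, chooses its weighting vector to eliminate the interference to the phantom user by zero forcing, i.e., by choosing $w_g$ such that $w_g \circ h = 0$. Gram-Schmidt orthonormalization [27] provides the unit-norm direction $v_g = \frac{1}{\|g\| \sin\theta}\left(g - (g \circ h)\frac{h}{\|h\|^2}\right)$, where $\cos\theta = |g \circ h|/(\|g\|\|h\|)$ is the channel coupling factor. Thus, with $\gamma_g$ and $\gamma_h$ denoting the SINRs required for the specified pair of rates, the total power is
$$P = \frac{\gamma_g}{\|g\|^2 \sin^2\theta} + \frac{\gamma_h}{\|h\|^2}. \tag{17}$$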

Aggressive DPC.
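The rate of the privileged user can be maximized by aligning its weighting vector to the channel, i.e., with the unit-norm direction $v_g = g/\|g\|$, and this approach is dubbed aggressive DPC. Here the privileged user achieves the best possible rate, leaving the phantom user to achieve the best rate it can in the presence of interference. With $\gamma_g$ and $\gamma_h$ denoting the SINRs required for the specified pair of rates, the total power in aggressive DPC is
$$P = \frac{\gamma_g (1 + \gamma_h \cos^2\theta)}{\|g\|^2} + \frac{\gamma_h}{\|h\|^2}. \tag{18}$$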

Objective DPC.
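In the objective mode of DPC, the weighting vectors are selected to maximize the sum rate for the available total power. The choice of weighting vector of the phantom user is the same as in the previous two forms of DPC, i.e., $v_h = h/\|h\|$. The weighting vector $w_g$ of the privileged user is chosen to minimize the total power required to achieve the specified pair of rates. The power minimization process is as follows. Substitution of $w_h = \sqrt{p_h}\, h/\|h\|$ in (16) gives $p_h = \gamma_h (1 + |w_g \circ h|^2)/\|h\|^2$, so the total power is $\Delta + \gamma_h/\|h\|^2$ with $\Delta = w_g A w_g^H$ and $A = I + \gamma_h h^H h/\|h\|^2$, to be minimized subject to $|w_g \circ g|^2 = \gamma_g$. Hence, the optimal weighting vector is $w_g = \sqrt{\gamma_g}\, g A^{-1}/(g A^{-1} g^H)$ and the corresponding $\Delta = \gamma_g/(g A^{-1} g^H)$. It can be shown that $A^{-1} = I - \gamma_h h^H h/(\|h\|^2 (1 + \gamma_h))$. Thus, the total power is
$$P = \frac{\gamma_g (1 + \gamma_h)}{\|g\|^2 (1 + \gamma_h \sin^2\theta)} + \frac{\gamma_h}{\|h\|^2}. \tag{19}$$
It can be observed that if the channels $g$ and $h$ are mutually orthogonal, i.e., $\sin\theta = 1$, the total power requirements for all three cases of (17)-(19) reduce to
$$P = \frac{\gamma_g}{\|g\|^2} + \frac{\gamma_h}{\|h\|^2}. \tag{20}$$
This also is the minimum total power to achieve the pair of rates in the case of uncoupled channels discussed in Section 2.
On the other extreme, when channels $g$ and $h$ are highly coupled, $\sin\theta \to 0$ and $\cos\theta \to 1$. The total power requirement for polite DPC of (17) grows without bound in this case, making it an unsuitable strategy. However, the total power expressions of aggressive DPC of (18) and objective DPC of (19) converge to a unique form given by
$$P = \frac{\gamma_g (1 + \gamma_h)}{\|g\|^2} + \frac{\gamma_h}{\|h\|^2}. \tag{21}$$

Zero Forcing Beam Forming.
The approach here is to null out the mutual interference to both users. The Gram-Schmidt orthogonalization process leads to the unit-norm directions $v_g = \frac{1}{\|g\| \sin\theta}\left(g - (g \circ h)\frac{h}{\|h\|^2}\right)$ and $v_h = \frac{1}{\|h\| \sin\theta}\left(h - (h \circ g)\frac{g}{\|g\|^2}\right)$. The total power requirement for such a zero forcing approach becomes
$$P = \frac{1}{\sin^2\theta}\left(\frac{\gamma_g}{\|g\|^2} + \frac{\gamma_h}{\|h\|^2}\right). \tag{22}$$
When $g$ and $h$ are mutually orthogonal, (22) reduces to (20), as in the case of the uncoupled channels discussed in Section 2. This convergence is similar to that of DPC in all three forms. However, for highly coupled channels, i.e., when $\sin\theta \to 0$, the total power required to achieve any given rate pair grows without bound, as with polite DPC.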
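The limiting behaviour of the four strategies can be explored numerically. The sketch below is an illustration under stated assumptions, not a transcription of the paper's equations: unit noise power, the user of channel `h` as the phantom user, and SINR targets `sinr_g` and `sinr_h` (a rate $R$ corresponds to an SINR of $2^R - 1$). The closed forms are derived from the weighting-vector choices described in this section. The name `total_powers` and the example channels are hypothetical.

```python
import math


def inner(a, b):
    """Inner product a o b = a b^H for row vectors a and b."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def norm_sq(a):
    """Squared Euclidean norm of a complex row vector."""
    return sum(abs(x) ** 2 for x in a)


def total_powers(g, h, sinr_g, sinr_h):
    """Total power needed by each strategy to meet the two SINR targets."""
    ng, nh = norm_sq(g), norm_sq(h)
    cos2 = abs(inner(g, h)) ** 2 / (ng * nh)   # squared coupling factor
    sin2 = max(0.0, 1.0 - cos2)
    base_g, base_h = sinr_g / ng, sinr_h / nh  # single-user power needs
    inf = float("inf")
    return {
        # privileged user transmits orthogonally to h
        "polite_dpc": base_g / sin2 + base_h if sin2 > 0 else inf,
        # privileged user aligned to g; phantom user absorbs the interference
        "aggressive_dpc": base_g * (1.0 + sinr_h * cos2) + base_h,
        # privileged direction chosen to minimize the total power
        "objective_dpc": base_g * (1.0 + sinr_h) / (1.0 + sinr_h * sin2) + base_h,
        # both users transmit orthogonally to the other user's channel
        "zero_forcing": (base_g + base_h) / sin2 if sin2 > 0 else inf,
    }


s = math.sqrt(0.5)
coupled = total_powers([1 + 0j, 0j], [complex(s), complex(s)], 3.0, 3.0)
orthogonal = total_powers([1 + 0j, 0j], [0j, 1 + 0j], 3.0, 3.0)
parallel = total_powers([1 + 0j, 0j], [1 + 0j, 0j], 3.0, 3.0)
print(coupled)
print(orthogonal)
print(parallel)
```

For orthogonal channels all four strategies need the same power. For parallel channels, polite DPC and zero forcing become infeasible while aggressive and objective DPC coincide. In between, objective DPC needs the least power of the four.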
Figure 3 shows a comparison of the achievable rate regions of the transmission techniques discussed above when the channels $g$ and $h$ are coupled, i.e., $\cos\theta \gg 0$. The rate region of time sharing is included in this figure (labeled TDMA) as a reference.
As can be observed in this figure, DPC (objective type) achieves the best rate region. The curve labeled DPC-order 1 is the rate region when the user with channel $h$ is made phantom. This order provides better performance when the power available to the user with channel $h$ is relatively large compared to that of the user with channel $g$. The curve obtained with the alternative order (DPC-order 2) performs better when the power available to the user with channel $g$ increases beyond a certain share of the total power. Further, there is a range of power sharing in which adherence to only one order of DPC is not optimal. In this range, the optimal sum rate is achieved by time sharing between two specific optimal power-sharing points, each with the corresponding optimal DPC order. The solid curve labeled "DPC with Convex Hull" shows the combined optimal rate region achieved by switching between the two DPC orders as well as by including time sharing whenever appropriate. BLO (labeled best no DPC) results in the second best performance. It is seen that there are two ranges of power sharing where the asymmetry in the power levels of the two users is high; in these ranges, time sharing between single user transmission and an optimal power shared simultaneous transmission is better than BLO alone. ZF yields the third best performance. Here too, time sharing is beneficial to achieve better performance. Convex hulls of the optimal rate regions are included for BLO and ZF.
Figure 4 illustrates the trends in the rate regions of objective DPC (labeled full DPC), BLO (labeled best, no DPC), and ZF as the coupling level between channels $g$ and $h$ reduces. The three sets of curves have been obtained, respectively, for $m = 4$, 8, and 16 base station antennas. Nevertheless, the convergence depends only on the coupling factor $\cos\theta$, as concluded from (17)-(22). As the value of $\cos^2\theta$ changes from 0.4752 to 0.1004, the rate regions of the different weighting vector design strategies get closer to each other. At a value of 0.0089, the rate regions converge to one and the same region. At such a level of coupling, transmitting using simple power sharing without paying attention to mutual interference would perform as well as any other method, if it is a high bit rate transmission. For low bit rate transmission, simple time shared transmission is sufficient to achieve rates close to capacity, as discussed in Section 2.
Figure 5 exemplifies the performance trends of objective DPC, aggressive DPC, ZF, and time sharing (labeled TDMA) in highly coupled channels, i.e., $\cos\theta \to 1$. Since the power level in this example is high, it corresponds to high bit rate transmission. It can be seen that the rate region of time sharing approaches the DPC rate region. As can be concluded from this figure as well as (17) and (22), the performances of DPC in polite mode and ZF become far inferior to all other approaches because $\sin\theta \to 0$ as the channels become highly coupled, leading to large minimum total power requirements to achieve any pair of rates.
Intuitively, and as evident from the above example, time sharing is the best approach when channels are highly coupled, regardless of the power levels (transmission rates). Previously, it was concluded in Section 2 that time sharing is the best approach for low bit rate (low power) transmissions in uncoupled channels.

Channel Coupling Statistics
In the previous sections, the performances of time sharing, power sharing, and several interference minimization techniques were studied using instantiations of channels drawn from a complex Gaussian probability distribution. The instantiations were selected with channel coupling factor ($\cos\theta$) values representing low, medium, and high levels of channel coupling in order to investigate the relative performances of signal weighting vector design strategies for channels of different levels of coupling. At this juncture, it is important to explore the probability distribution of the channel coupling factor in a multiuser wireless channel for different numbers of antennas at the base station. This provides insight into the requirements of a massive MIMO system. Figure 6 shows the cumulative distribution of $\cos\theta$ in dB scale. The set of curves has been obtained for numbers of base station antennas from 4 to 1024, doubling the number of antennas each time. It is observed that, at any given cumulative probability in the range 0-1, the coupling factor reduces by approximately 3 dB, i.e., the coupling factor in power terms ($\cos^2\theta$) halves, for each doubling of antennas. Table 1 summarizes the variation of the 99% coupling factor for each quadrupling of the number of antennas. Observing the results in this table and Figure 4, it can be concluded that a number of base station antennas in the order of 256 may ensure sufficiently low coupling factors among pairs of channels for 99% of the channel realizations, such that a multiuser transmission with simple power shared simultaneous transmission can achieve throughput close to capacity.
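The trend in the coupling statistics can be reproduced with a short sketch. It assumes the i.i.d. unit-variance complex Gaussian channel model of Section 2. Under that model, the squared coupling factor of two independent channels follows a Beta(1, m-1) distribution; this is a standard property of isotropic random vectors, stated here as an assumption rather than taken from the paper. The dB value is taken as $10\log_{10}\cos^2\theta$. The function names are hypothetical.

```python
import math
import random


def coupling_sq_quantile(m, q):
    """q-quantile of cos^2(theta) under the Beta(1, m-1) model."""
    return 1.0 - (1.0 - q) ** (1.0 / (m - 1))


def simulate_coupling_sq(m, trials, rng):
    """Sorted Monte Carlo samples of cos^2(theta) for i.i.d. CN(0, 1) channels."""
    s = math.sqrt(0.5)
    samples = []
    for _ in range(trials):
        g = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(m)]
        h = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(m)]
        ip = sum(x * y.conjugate() for x, y in zip(g, h))
        ng = sum(abs(x) ** 2 for x in g)
        nh = sum(abs(x) ** 2 for x in h)
        samples.append(abs(ip) ** 2 / (ng * nh))
    samples.sort()
    return samples


rng = random.Random(7)  # fixed seed so the sketch is repeatable
samples = simulate_coupling_sq(16, 4000, rng)
empirical_median = samples[len(samples) // 2]
print("m = 16 median:", empirical_median, coupling_sq_quantile(16, 0.5))

# 99% coupling level in dB for each quadrupling of the antenna count.
for m in (4, 16, 64, 256, 1024):
    q99 = coupling_sq_quantile(m, 0.99)
    print(m, round(10.0 * math.log10(q99), 1), "dB")
```

For large antenna counts the 99% level of $\cos^2\theta$ drops by roughly 3 dB per doubling, or about 6 dB per quadrupling, in line with the trend described for Figure 6.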

Conclusion
This paper presented key results on the downlink transmission capacity of a massive MIMO wireless system in terms of the number of base station antennas and the number of user devices. The capacity, and the simplicity of the transmission techniques that achieve it, are attributed to the ability of a massive MIMO system to naturally uncouple multiuser MISO channels. However, when the ratio of the number of base station antennas to the number of users in a multiuser MISO channel is not sufficiently large, channel instantiations with medium to high levels of coupling can occur with significant probability. In such scenarios, the transmission vector can be optimized by selecting the best design technique to match the prevailing channel coupling level.
In other words, a hybrid approach with the flexibility to switch between several weighting vector design methods is useful. Interference minimization strategies are efficient in achieving the available capacity when the channel coupling factor is significantly large. Table 2 summarizes the signal vector design strategies appropriate for different channel coupling levels and transmission rates. For low bit rate transmission, time sharing with full power can achieve throughput close to system capacity, independently of the channel coupling level. However, the coupling factor plays a major role in deciding the capacity for high bit rate transmission. When the coupling factor is high, time sharing is the best approach even for high bit rate transmission. When the coupling factor is moderate, neither very high nor negligible, interference minimization or avoidance methods such as DPC, BLO, and ZF are necessary to achieve optimal performance. With negligibly small coupling factors, simple alignment of the signal transmission vectors to the respective channels of the users is the best approach for high bit rate transmission. A massive MIMO system is one that can almost always realize channels with negligible channel coupling factors due to the large excess of base station antennas compared to the number of users to be served. This ability simplifies many of the design and implementation tasks.
As highlighted in this paper, a massive MIMO system increases the capacity and simplifies the signal transmission vector design process needed to achieve throughput approaching the capacity. A massive MIMO system has many other merits which are not addressed in this paper. Examples are robustness, reliability, and the use of low cost hardware. Among the challenges faced in realizing the full potential of massive MIMO are computational complexity, efficient distributed processing algorithms, and synchronization of the antenna units.

Figure 1: Rate regions with time sharing and power sharing for low bit rate (low power) transmission.

Figure 3: Comparison of rate regions with significantly high channel coupling.

Figure 4: Convergence of rate regions as coupling factor is reduced.

Figure 5: Rate regions with high channel coupling.