A Survey: Non-Orthogonal Multiple Access with Compressed Sensing Multiuser Detection for mMTC

One objective of the 5G communication system and beyond is to support massive machine type communication (mMTC) to propel the fast growth of diverse Internet of Things use cases. The mMTC aims to provide connectivity to tens of billions of sensor nodes. The dramatic increase of sensor devices and massive connectivity imposes critical challenges for the network, which must handle the enormous control signaling overhead with limited radio resources. Non-Orthogonal Multiple Access (NOMA) is a paradigm shift in the design of multiple user detection and multiple access. NOMA with compressive sensing based multiuser detection is one of the promising candidates to address the challenges of mMTC. This survey article provides an overview of the current state-of-the-art research on various compressive sensing based techniques that enable NOMA. We present the characteristics of different algorithms and compare their pros and cons, thereby providing useful insights for researchers to make further contributions in NOMA using compressive sensing techniques.


Introduction
The fifth-generation (5G) wireless communication system envisions three major use cases: (i) enhanced mobile broadband (eMBB), (ii) ultrareliable and low latency communication (URLLC), and (iii) massive machine type communication (mMTC). The enhanced mobile broadband is characterized by ubiquitous coverage with a peak data rate of 20 Gbps and a latency of less than 10 ms [1]. The ultrareliable and low latency communication is focused on enabling a variety of mission-critical applications and tactile Internet applications, such as autonomous vehicles, remote industrial control, and remote manufacturing, with a latency requirement of less than 1 millisecond [2]. The mMTC is aimed at providing connectivity to a massive number of low power and low data rate Internet of Things (IoT) devices, which will open up enormous business opportunities in, e.g., building automation, smart agriculture, smart cities, and fleet management. It is expected that the number of such devices will rise to 83 billion by 2024 [3], which will increase the number of connections to one million per square kilometer [4]. In order to accommodate such a huge number of devices, the network capacity needs to be significantly improved.
The current orthogonal multiple access (OMA) schemes, such as time division multiple access (TDMA), frequency division multiple access (FDMA), and code division multiple access (CDMA), serve a single user in each orthogonal resource block. Therefore, the maximum number of simultaneously supported devices in an OMA scheme is limited by the number of orthogonal resources. For example, in orthogonal CDMA where orthogonal codes are assigned to the users, the maximum number of connected devices cannot exceed the spreading factor. This orthogonality constraint makes the OMA schemes highly spectrally inefficient for mMTC in 5G. Figure 1 shows resource allocation in OMA schemes where the resources assigned to the users are orthogonal to each other. Due to this orthogonality constraint, the current OMA-based long-term evolution (LTE) [5] can only support a fraction of the anticipated number of devices for future mMTC with its control channel element [6]. Furthermore, the sporadic transmission of small data packets in mMTC requires minimum control signaling overhead, whereas LTE has a high signaling overhead and high channel access latency for small data transmission. Therefore, connecting a massive number of resource-constrained devices to the network requires a paradigm shift in the multiple access technique.
Nonorthogonal multiple access (NOMA) is a promising technique for massive connectivity which has attracted tremendous interest from researchers from both academia and industry. The NOMA schemes assign nonorthogonal resources to the users and therefore enable system overloading, i.e., allowing users to more efficiently share the same resources. The overloading capability of the NOMA scheme is characterized by the overloading factor, which is the ratio of the total number of nodes to the total available orthogonal resources. At the receiver, advanced multiuser detection (MUD) techniques are deployed to separate the users. The NOMA schemes can be categorized into two main groups: the power domain NOMA and the code domain NOMA. Note that node and user are used interchangeably in this paper.
1.1. Power Domain NOMA. In power domain NOMA, the users are assigned with different power levels which enable them to share the available resources, i.e., time, frequency, or code [7,8]. At the receiver, successive interference cancellation (SIC) is used for multiuser detection, which differentiates the users according to the assigned power levels. The resource allocation in power domain NOMA is shown in Figure 2. It is depicted that at a given frequency, more than one user can transmit using different power levels.
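The power-level separation and SIC described above can be illustrated with a minimal two-user sketch. It assumes BPSK symbols, unit channel gains, and a hypothetical power split of 0.8/0.2; the function names are illustrative and are not taken from [7,8].

```python
import random

def superimpose(bits_far, bits_near, p_far=0.8, p_near=0.2):
    """Superimpose two BPSK streams on the same resource with different power levels."""
    return [(p_far ** 0.5) * (2 * bf - 1) + (p_near ** 0.5) * (2 * bn - 1)
            for bf, bn in zip(bits_far, bits_near)]

def sic_receiver(rx, p_far=0.8):
    """Detect the high-power signal first, cancel it, then detect the low-power signal."""
    bits_far, bits_near = [], []
    for r in rx:
        s_far = 1.0 if r >= 0 else -1.0           # low-power user is treated as noise
        residual = r - (p_far ** 0.5) * s_far     # successive interference cancellation
        s_near = 1.0 if residual >= 0 else -1.0
        bits_far.append(int(s_far > 0))
        bits_near.append(int(s_near > 0))
    return bits_far, bits_near

rng = random.Random(0)
tx_far = [rng.randint(0, 1) for _ in range(1000)]
tx_near = [rng.randint(0, 1) for _ in range(1000)]
# Small additive Gaussian noise (standard deviation 0.05, an assumed value).
rx = [s + rng.gauss(0.0, 0.05) for s in superimpose(tx_far, tx_near)]
rx_far, rx_near = sic_receiver(rx)
```

The detection order follows the assigned power levels: the stronger signal is decoded first because the weaker one acts only as a small perturbation on it.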
It is shown analytically in [9] that NOMA outperforms the OMA techniques in terms of outage performance and ergodic sum rate. However, the user data rate and allocated power should be chosen carefully, as the outage performance critically depends on them. NOMA improves the bandwidth efficiency; however, the cell-edge users, i.e., the users that are far from the base station (BS), have poorer channel conditions than the cell-centered users, i.e., the users that are closer to the BS, which degrades the performance of the cell-edge users. This effect is mitigated by using cooperative NOMA [10]. In cooperative NOMA, the users having better channel conditions are paired with users having poor channel conditions. Figure 3 depicts a downlink scenario of cooperative NOMA in which a pair of users receives the superimposed signal in the first time slot. In the second time slot, the cell-centered user acts as a relay and forwards the received signal to the cell-edge user. The whole task is completed in two time slots using cooperative NOMA, whereas it would take three time slots in cooperative OMA [11].
Cooperative NOMA improves the performance gain of NOMA; however, due to the extra process of relaying, the cell-centered users suffer from battery drainage. In machine-type communication, the battery life is of significant importance as the MTC devices are mostly low power and the batteries once installed are rarely changed. To cope with this issue, in [12], the concept of wireless energy transfer, i.e., the transfer of energy from the source to the destination through the air, is combined with cooperative NOMA. The proposed protocol is named cooperative nonorthogonal multiple access with simultaneous wireless information and power transfer (cooperative SWIPT NOMA). In cooperative SWIPT NOMA, the cell-centered users harvest energy from the base station, which is used for the extra step of relaying and hence reduces the battery drainage. Another variation in power domain NOMA is the introduction of the cognitive radio (CR) concept [13,14]. The conventional power domain NOMA ensures the user fairness; however, it cannot [...] Figure 4 by the overlapping parts of the codes. The correlations between the codes increase the probability of errors in detecting the active users at the receiver. However, advanced multiuser detection techniques such as the message passing algorithm, minimum mean square error, and SIC are used to efficiently recover the transmitted data. The prominent code domain NOMA schemes include sparse code multiple access (SCMA) [15-19], pattern division multiple access (PDMA) [20], and multiuser shared access (MUSA) [21]. SCMA and PDMA are codebook-based schemes, in which a unique codebook is assigned to each user and a message-passing algorithm (MPA) [22] is used at the receiver for multiuser detection. In MUSA, the users are separated by assigning low-correlation spreading sequences at the transmitter, and SIC is used at the receiver for MUD. An overview of the code domain NOMA, also called signature-based NOMA, is given in [23].
All these NOMA schemes enable system overloading and hence facilitate the massive connectivity. However, all the nodes within the vicinity of a base station (BS) are assumed active in these schemes while in fact in mMTC, a small fraction of the total nodes is active at a time. Moreover, in mMTC, the data packet is of small size, for which a grant-free medium access control mechanism is indispensable to reduce the control signaling overhead.
To overcome the limitations of the code domain NOMA and to more efficiently exploit the MTC traffic pattern, compressive sensing-based multiuser detection (CS-MUD) techniques have recently been used for enabling the code domain NOMA. In order to accommodate a large number of nodes within the available resources, nonorthogonal spreading sequences are assigned to the nodes. At the receiver, both the activity and data are jointly detected by using CS-MUD techniques. The CS-MUD exploits the fact that the mMTC transmission is sporadic, i.e., a small number of nodes are active at a time, which allows the compressive sensing techniques to detect the activity at the receiver. The nonorthogonal spreading sequences serve as signatures for the users, which are used to distinguish the active users at the receiver. In NOMA with CS-MUD, the users transmit their data directly using the spreading sequences, thereby avoiding the control signaling overhead to access the channel.

Contributions.
In the current literature, both the power domain NOMA and the code domain NOMA schemes for 5G are well investigated and surveyed. The NOMA with CS-MUD is a potential code domain NOMA scheme to meet the mMTC requirements in the 5G wireless system. It has the capability of accommodating a massive number of devices with no signaling overhead and with comparatively less complex multiuser detection. However, there is no review or survey article that describes the NOMA with CS-MUD. In this paper, we give an overview of the current research trends within this field and provide useful insights, which will help researchers to better understand the NOMA with CS-MUD. The active research areas are classified into three main categories and the progress made in each domain is presented. The first category comprises the development and improvement of efficient detection algorithms for the CS-MUD. The second category is the CS-based medium access, i.e., how the signatures can efficiently be assigned to the users to gain maximum output in terms of increasing the overloading capability and improving the activity and data detection. The last category combines CS-MUD with other techniques in order to utilize the advantages of different schemes. Different approaches towards nonorthogonal multiple access with CS-MUD are compared, and their pros and cons are explained. Table 1 categorizes the publications in the literature based on the above-mentioned categories.
The paper is organized as shown in Figure 5. Section 2 introduces the compressive sensing basics. In Section 3, two different ways to formulate the nonorthogonal CDMA with CS-MUD are described. Section 4 compares the different CS-MUD algorithms and explains their pros and cons. Section 5 presents the variants of CS-based nonorthogonal multiple access schemes and shows their connections and differences. Section 6 presents the combination of CS-MUD with other techniques. Section 7 compares the representative CS-based MUD schemes. Finally, Section 8 concludes the paper.
Notations. In this paper, all boldface uppercase letters represent matrices such as S, while all lowercase boldface letters represent vectors such as s and x. The sets of binary, integer, and complex numbers are represented by 𝔹, ℤ, and ℂ, respectively. Italic letters such as k and x represent variables. Uppercase letters such as K represent constant values.

Compressive Sensing Basics
Compressed sensing (CS) is a signal processing technique which samples a sparse signal at a rate much lower than the Nyquist rate [51]. A signal $\mathbf{x} \in \mathbb{C}^{K \times 1}$ is said to be $K_a$-sparse if it has only $K_a$ nonzero elements, $K_a \ll K$. The signal $\mathbf{x}$ can be sparse with respect to any basis $\mathbf{\Phi}$. Let $\mathbf{z} = \mathbf{\Phi}\mathbf{x}$ be a compressible signal with respect to $\mathbf{\Phi}$. The CS encoding process produces the measurement $\mathbf{y} \in \mathbb{C}^{N \times 1}$ by a measurement matrix $\mathbf{\Psi} \in \mathbb{C}^{N \times K}$, $K_a < N < K$,

$$\mathbf{y} = \mathbf{\Psi}\mathbf{z} + \mathbf{n} = \mathbf{\Psi}\mathbf{\Phi}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{n}$ is the background noise vector. The measurement matrix $\mathbf{\Psi}$ and the basis $\mathbf{\Phi}$ should have low coherence. The reconstruction of the vector $\mathbf{x}$ amounts to finding a sparse vector $\hat{\mathbf{x}}$ which satisfies Equation (1). As Equation (1) is an underdetermined system of equations, the reconstruction of the vector $\mathbf{x}$ is formulated as

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{\Psi}\mathbf{\Phi}\mathbf{x}, \tag{2}$$

where $\|\cdot\|_0$ is the $l_0$ norm, which simply gives the total number of nonzero elements in the vector. Equation (2) is a nonconvex optimization problem, which is known to be NP-hard to solve. Certain relaxation approaches are used to solve Equation (2), such as basis pursuit denoising [52], in which the $l_0$ norm is relaxed to the $l_1$ norm. The $l_1$ minimization is a convex optimization problem which recovers the signal from an underdetermined system of equations with higher accuracy at the cost of higher complexity of cubic order.

Another category of compressive sensing reconstruction algorithms is greedy algorithms. In greedy algorithms, e.g., orthogonal matching pursuit (OMP), the support of $\hat{\mathbf{x}}$ is obtained iteratively by selecting the column of $\mathbf{\Psi}$ which has the maximum correlation with the residual. The residual is initialized to $\mathbf{y}$ and is updated in each iteration of OMP. Once the support is obtained, the corresponding data is estimated by using least squares estimation. The greedy algorithms have lower complexity compared to the basis pursuit algorithms at the cost of slightly poorer performance.
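The OMP steps described above (correlate, select, least squares, update the residual) can be sketched as follows. This is a minimal illustration that assumes an identity sparsity basis, a known number of nonzero elements, and a noise-free measurement; the function name `omp` and all sizes are hypothetical.

```python
import numpy as np

def omp(Psi, y, n_nonzero):
    """Greedy recovery of a sparse x from y = Psi @ x (orthogonal matching pursuit)."""
    residual = y.copy()                      # the residual is initialized to y
    support = []
    coeffs = None
    for _ in range(n_nonzero):
        # Select the column with maximum correlation with the current residual.
        correlations = np.abs(Psi.conj().T @ residual)
        if support:
            correlations[support] = 0.0      # do not reselect a column
        support.append(int(np.argmax(correlations)))
        # Least squares estimate on the current support, then residual update.
        coeffs, *_ = np.linalg.lstsq(Psi[:, support], y, rcond=None)
        residual = y - Psi[:, support] @ coeffs
    x_hat = np.zeros(Psi.shape[1], dtype=complex)
    x_hat[support] = coeffs
    return x_hat, sorted(support)

rng = np.random.default_rng(0)
N, K, K_a = 64, 128, 4                       # assumed sizes with K_a < N < K
Psi = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
Psi /= np.linalg.norm(Psi, axis=0)           # unit-norm columns
true_support = sorted(rng.choice(K, size=K_a, replace=False).tolist())
x = np.zeros(K, dtype=complex)
x[true_support] = rng.choice([-1.0, 1.0], size=K_a)
y = Psi @ x
x_hat, support = omp(Psi, y, K_a)
```

In practice the sparsity level is rarely known exactly, so a stopping rule based on the residual energy is a common alternative to a fixed iteration count.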

Nonorthogonal Multiple Access with CS-MUD
In mMTC, the number of simultaneously active users is far less than the total number of users in a cell [40]. A typical uplink mMTC system is depicted in Figure 6, in which a total of $K$ nodes are in the range of a base station (BS) and each node is active with an activity probability $p_a \ll 1$. Due to this low activity probability, out of $K$ nodes, only a small portion of the nodes are simultaneously active. Compressive sensing multiuser detection exploits this sporadic nature of mMTC to enable nonorthogonal multiple access at the transmitter. In the context of the CDMA scheme, the nonorthogonal multiple access is realized by assigning nonorthogonal spreading sequences to the nodes. Two models in the literature are used to formulate the nonorthogonal CDMA as a compressive sensing problem: single measurement vector-based compressive sensing (SMV-CS) and multiple measurement vector-based compressive sensing (MMV-CS).
In the SMV-CS model, a one-shot transmission is considered. The received signal, $\mathbf{y}$, is a vector which consists of the superimposed symbols of the active nodes. Figure 7 is a typical SMV-CS representation of an uplink mMTC. The vector $\mathbf{d}$ consists of the data symbols of $K$ users, each with a frame length of $N_d$. Out of $K$ users, only a small fraction, $K_a$, is active. The matrix $\tilde{\mathbf{A}} \in \mathbb{C}^{L' \times K N_d}$ contains the influences of the channel matrix, $\mathbf{H}$, and the spreading matrix, $\mathbf{S}$. In this model, the number of channel taps of user $k$, the relative delay $\tau_k$, and the spreading factor $N_s$ are taken into account [27]. In the SMV-CS model, when the number of users increases, the size of the sensing matrix, $\tilde{\mathbf{A}}$, becomes huge, which leads to poor sampling matrix properties and therefore limits the scalability of the system in terms of hardware cost, memory storage, and detection speed.
In MMV-CS, the received signal is modeled as a matrix instead of a single vector as shown in Figure 8.
Each row vector of the matrix $\mathbf{D} \in \mathbb{C}^{K \times N_d}$ in Figure 8 represents the data frame of a single user, and each column vector represents the symbols from all users at time instant $t$. It is clear that with the MMV-CS model, the size of the sensing matrix $\mathbf{A} \in \mathbb{C}^{N_s \times K}$ depends only on the number of users and is independent of the data frame size. Hence, compared with the SMV-CS model, the complexity of the MMV-CS model scales better with the growing number of users.
Figure 6: Machine-type communication scenario (active and inactive nodes).
Figure 7: Single measurement vector-based compressive sensing model for nonorthogonal multiple access.

Wireless Communications and Mobile Computing
In [32, 34-37, 43, 53], MMV-CS is used to represent the nonorthogonal CDMA in the uplink mMTC scenario while the SMV-CS model is used to describe the system in all other cited papers in this article.
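The difference in sensing matrix size between the two models of this section can be made concrete with a small numerical sketch. It assumes a synchronous, flat, unit-gain channel (so the SMV sensing matrix reduces to a block-diagonal repetition of the spreading matrix, which is a simplification of the model in [27]), Bernoulli activity, and BPSK data; all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N_s, N_d, p_a = 40, 16, 8, 0.1             # assumed system parameters

# Nonorthogonal spreading matrix: one unit-norm signature (column) per node.
A = rng.standard_normal((N_s, K)) + 1j * rng.standard_normal((N_s, K))
A /= np.linalg.norm(A, axis=0)

active = rng.random(K) < p_a                  # Bernoulli activity with probability p_a
D = np.zeros((K, N_d), dtype=complex)         # row-sparse multiuser data matrix
D[active] = rng.choice([-1.0, 1.0], size=(int(active.sum()), N_d))

W = 0.05 * (rng.standard_normal((N_s, N_d)) + 1j * rng.standard_normal((N_s, N_d)))
Y = A @ D + W                                 # MMV-CS: sensing matrix of size N_s x K

# Equivalent SMV-CS form: stack the columns of D and Y into long vectors.
A_smv = np.kron(np.eye(N_d), A)               # size (N_s * N_d) x (K * N_d)
y_smv = A_smv @ D.T.reshape(-1) + W.T.reshape(-1)
```

With these sizes, the MMV sensing matrix has 16 x 40 entries while the stacked SMV matrix has 128 x 320, which illustrates why the SMV formulation scales poorly with the frame length.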

CS-MUD Algorithms
Since the introduction of exploiting the sparsity in mMTC, different CS-MUD algorithms have been proposed. These MUD techniques can be categorized into the following two groups: maximum a posteriori probability (MAP)-based algorithms and greedy algorithms.

Maximum A Posteriori-Based Algorithms
4.1.1. Sparsity Aware MAP-Based Algorithms. In [54], the possibility of exploiting the sporadic nature of user activity for multiuser detection was introduced for the first time. For the model

$$\mathbf{y} = \mathbf{A}\mathbf{d} + \mathbf{n},$$

where $\mathbf{n} \in \mathbb{C}^{N_s \times 1}$ is the AWGN noise vector, a sparsity-aware maximum a posteriori (S-MAP) criterion is used, and the detection process is formulated as [54]

$$\hat{\mathbf{d}}_{\text{MAP}} = \arg\max_{\mathbf{d} \in \mathcal{A}^K} \Pr(\mathbf{d} \mid \mathbf{y}), \tag{4}$$

where $\mathcal{A}$ is the modulation alphabet. Two approaches were followed to solve the detection process in Equation (4): relaxed S-MAP and S-MAP with lattice search. In relaxed S-MAP, it is assumed that $\mathcal{A} = \{\pm 1, 0\}$, which makes Equation (4) equivalent to [54]

$$\hat{\mathbf{d}}_{\text{MAP}} = \arg\min_{\mathbf{d} \in \mathcal{A}^K} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{d}\|_2^2 + \lambda \|\mathbf{d}\|_p^p,$$

with a sparsity-regularization weight $\lambda$. Algorithms were designed for $p = 1$ and $p = 2$. Ignoring the finite-alphabet constraint, the optimal solution for $p = 1$ is a quadratic programming problem, while for $p = 2$ it takes a linear form. The relaxed S-MAP multiuser detection is suboptimal but has the advantage of low complexity. In the second approach, i.e., S-MAP detectors with lattice search, the alphabet is defined as $\mathcal{A} = \{\pm 1, \pm 3, \ldots, \pm(M-1)\}$, with $M$ even. Using the QR decomposition, the S-MAP problem is reformulated and the elements of the vector $\mathbf{d}$ are obtained by searching over a subset of the alphabet. The performance of the detector improves at the cost of higher complexity. The algorithms presented aim at a fully loaded ($N_s = K$) CDMA system; however, the lattice search-based algorithms can have fair performance for the moderately overloaded ($N_s < K$) case.
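The linear form obtained for $p = 2$ can be sketched as a ridge-type detector followed by quantization. The sketch assumes a real-valued model and the objective $\frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{d}\|_2^2 + \lambda\|\mathbf{d}\|_2^2$, whose unconstrained minimizer is $(\mathbf{A}^T\mathbf{A} + 2\lambda\mathbf{I})^{-1}\mathbf{A}^T\mathbf{y}$. The quantization thresholds of ±0.5, the value of `lam`, and the function name are assumptions made for illustration, not the exact rules of [54].

```python
import numpy as np

def ridge_detector(A, y, lam):
    """Relaxed S-MAP for p = 2: closed-form soft estimate, then mapping to {-1, 0, +1}."""
    K = A.shape[1]
    # Setting the gradient of 0.5*||y - A d||^2 + lam*||d||^2 to zero gives a linear system.
    d_soft = np.linalg.solve(A.T @ A + 2.0 * lam * np.eye(K), A.T @ y)
    d_hat = np.zeros(K)
    d_hat[d_soft > 0.5] = 1.0       # assumed decision thresholds
    d_hat[d_soft < -0.5] = -1.0
    return d_hat, d_soft

rng = np.random.default_rng(5)
K = N_s = 16                        # fully loaded system
lam = 0.01
A = rng.standard_normal((N_s, K)) / np.sqrt(N_s)
d = np.zeros(K)
d[[2, 9]] = [1.0, -1.0]             # two active users
y = A @ d + 0.01 * rng.standard_normal(N_s)
d_hat, d_soft = ridge_detector(A, y, lam)
```

The cost is dominated by one $K \times K$ linear solve, which reflects the low complexity of the relaxed detector compared with a search over the alphabet.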
In [32], the MMV-CS model was considered to reduce the complexity of the receiver and increase the computation speed. The activity is detected based on the covariance matrix of the received signal $\mathbf{Y} \in \mathbb{C}^{N_s \times N_d}$, which is given as [32]

$$\mathbf{\Phi}_{YY} = \mathbf{S}\mathbf{V}\mathbf{S}^H + \mathbf{\Phi}_{WW}, \tag{7}$$

where $\mathbf{\Phi}_{WW}$ is the sample noise covariance matrix, $\mathbf{S}$ is the spreading matrix, and $\mathbf{V} = E(\mathbf{D}\mathbf{D}^H)$ is the covariance matrix of the transmitted multiuser frame. The $k$th user is active if the $k$th diagonal element of $\mathbf{V}$ is 1. A MAP detection is used to obtain the positions of the nonzero elements in $\mathbf{V}$. The complexity of the proposed algorithms is invariant to the length of the frame, $N_d$; however, the performance of the algorithms depends on $N_d$ and improves with increasing $N_d$. It is derived that for reliable activity detection, the length of the sequence should be greater than the square root of the number of nodes [32]. Furthermore, the simulation results show that the algorithms can handle asynchronous frames where the time shifts are up to 60% of the frame duration.

Sphere Decoding Based MUD.
In [55], the MAP detection for nonorthogonal CDMA is formulated as

$$\hat{\mathbf{d}} = \arg\min_{\mathbf{d}} \|\mathbf{y} - \mathbf{A}\mathbf{d}\|_2^2 - \sigma^2 \log \Pr\{\mathbf{d}\},$$

where $\Pr\{\mathbf{d}\}$ is the probability distribution for the vector $\mathbf{d}$ and $\sigma^2$ is the noise variance. In [54], a simple Bernoulli traffic model was considered for the probability distribution of $\mathbf{d}$, in which each user is independently active with a probability $p_a \ll 1$. To consider a more realistic mMTC traffic, the Poisson model was considered instead of the Bernoulli model in [55], and it is shown that the prior-dependent term, $-\sigma^2 \log \Pr\{\mathbf{d}\}$, is monotonically increasing. A sorting algorithm is also proposed which reduces the searching levels of the sphere decoding by sorting the sensing matrix according to the correlations with the received signal. The proposed multiuser detectors give optimal performance; however, the detection complexity is still high when the number of nodes is enormous. Moreover, the detection algorithm exploits the sparsity for joint activity and data detection in a fully loaded system where $N_s = K$ and does not consider the overloaded system.

In CS-MUD, the activity is detected at the receiver and the detection errors, i.e., false alarms and miss detections, lead to significant performance loss. The activity detection therefore becomes a crucial step. In the context of sphere decoding, the mitigation of detection errors is addressed in [24]. The author proposed a Neyman-Pearson-based approach [56] to reduce the detection error. Analytically, the problem is formulated as finding an optimal threshold $t^*$ which minimizes the probability of false alarm, $P_{Fa}$, while keeping the probability of miss detection, $P_{Md}$, below a predefined threshold, $\eta$ [56]:

$$t^* = \arg\min_t P_{Fa}(t) \quad \text{s.t.} \quad P_{Md}(t) \le \eta. \tag{9}$$

The calculation of the probabilities in Equation (9) is based on the activity log-likelihood ratios of the received symbols, which are estimated by using sphere decoding.
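The threshold selection in Equation (9) can be illustrated empirically. The sketch below assumes that sample activity log-likelihood ratios (LLRs) are already available for active and inactive nodes (in [24] they come from sphere decoding, which is not reproduced here) and that a node is declared active when its LLR exceeds the threshold. The function name and the sample values are hypothetical.

```python
def neyman_pearson_threshold(llr_active, llr_inactive, eta):
    """Return (t, P_fa, P_md) minimizing the empirical P_fa subject to P_md <= eta."""
    # Candidate thresholds: every observed LLR plus -inf (declares every node active).
    candidates = [float("-inf")] + sorted(set(llr_active) | set(llr_inactive))
    best = None
    for t in candidates:
        # Miss detection: an active node whose LLR does not exceed the threshold.
        p_md = sum(llr <= t for llr in llr_active) / len(llr_active)
        # False alarm: an inactive node whose LLR exceeds the threshold.
        p_fa = sum(llr > t for llr in llr_inactive) / len(llr_inactive)
        if p_md <= eta and (best is None or p_fa < best[1]):
            best = (t, p_fa, p_md)
    return best

llr_active = [2.0, 3.0, 4.0, 5.0]        # illustrative sample values
llr_inactive = [-3.0, -1.0, 1.0, 2.5]
t_star, p_fa, p_md = neyman_pearson_threshold(llr_active, llr_inactive, eta=0.25)
```

Raising the threshold trades false alarms for miss detections, so the constraint on the miss detection probability determines how far the threshold can be pushed.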

K-Best Detector for Sphere Decoding.
The sphere decoder-based MUDs are considered to achieve the maximum a posteriori probability (MAP) performance in terms of BER [54]. However, sphere decoding has two main disadvantages: it has an exponential complexity [57], and its computations cannot be parallelized [58]. To address these issues, a K-best detection approach is followed in [59]. The proposed algorithm reduces the search paths by iteratively performing a breadth-first tree search, which keeps only the K paths with the smallest metric. Note that in K-best detection, K denotes the number of selected paths and does not represent the number of nodes. The K-best detection reduces the complexity by reducing the number of search paths and facilitates the parallelization of computation by fixing the number of search paths. However, the limited coverage of the search tree results in an error floor and increases the BER. Moreover, to support higher overloading, the number of paths needs to be increased, which results in higher complexity.
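The breadth-first search with a fixed number of surviving paths can be sketched as follows. The sketch assumes a real-valued model with at least as many chips as users (so that a reduced QR decomposition applies), the alphabet {-1, 0, +1}, and an optional per-symbol sparsity penalty `lam`. It is a generic K-best search and not the exact detector of [59]; the function name and the example matrix are hypothetical.

```python
import numpy as np

def k_best_detect(A, y, alphabet=(-1.0, 0.0, 1.0), n_paths=4, lam=0.0):
    """Breadth-first tree search that keeps only the n_paths best partial candidates."""
    Q, R = np.linalg.qr(A)                 # reduced QR: R is K x K upper triangular
    z = Q.T @ y
    K = A.shape[1]
    paths = [(0.0, [])]                    # (accumulated metric, symbols of layers i+1..K-1)
    for i in range(K - 1, -1, -1):         # detect the last user first
        candidates = []
        for metric, tail in paths:
            interference = sum(R[i, i + 1 + j] * s for j, s in enumerate(tail))
            for s in alphabet:
                increment = (z[i] - R[i, i] * s - interference) ** 2 + lam * (s != 0)
                candidates.append((metric + increment, [s] + tail))
        candidates.sort(key=lambda c: c[0])
        paths = candidates[:n_paths]       # fixed number of survivors per layer
    best_metric, best_symbols = paths[0]
    return np.array(best_symbols), best_metric

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
d_true = np.array([1.0, 0.0, -1.0])        # the second user is inactive
y = A @ d_true                             # noise-free observation
d_hat, metric = k_best_detect(A, y, n_paths=4)
```

Because the number of survivors is constant in every layer, the work per layer is fixed, which is what makes the approach amenable to parallel hardware.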

Greedy Algorithms.
The main disadvantage of the convex optimization and sphere decoding-based algorithms is their higher complexity, which restricts their implementation in the mMTC environment. To reduce the complexity of MUD, greedy algorithms were introduced for CS-MUD.

Orthogonal Least Square and Orthogonal Matching Pursuit.
To separate the nodes at the receiver by exploiting the sparsity at symbol level, orthogonal matching pursuit (OMP) and orthogonal least squares (OLS) are used for MUD in [25]; they iteratively select the most probable active users and subsequently estimate their data. For both OMP and OLS, the spreading influence, S, in the sensing matrix, A, is known at the base station, while the influence of the channel, H, can be estimated. At each iteration of OMP, the column of the sensing matrix, A, which has the highest correlation with the received signal is selected and the corresponding user is detected as active. In the case of OLS, the selection is based on the minimum least squares distance instead of correlations. OLS is more robust to errors than OMP at the cost of relatively higher complexity. However, as the greedy algorithms iteratively estimate the activity and data, they may suffer from error propagation.

Block-OLS and Group-OMP.
The activity detection of CS-MUD using greedy algorithms was improved by exploiting the fact that an active node transmits several bits in the current frame, which makes the multiuser signal frame block sparse [6,26]. In [26], the OLS is extended to the block-wise orthogonal least squares (BOLS) algorithm, which detects the active user for a block of transmitted symbols. The activity is detected based on the sum of the minimum Euclidean distances of the spreading sequences from the block of $N_d$ symbols of the received signal. In the group orthogonal matching pursuit (GOMP) for CS-MUD [6], instead of considering only the column of the spreading matrix that is maximally correlated with the received signal, the sum of the correlations of the $N_d$ received signals with the spreading sequence is set as the selection criterion, which improves the activity detection. In [6], to avoid the complexity of matrix inversion for larger $N_d$, the frame is divided into subframes and parallel detectors are deployed at the receiver to detect the signal. Furthermore, it is shown that false alarms, which are less critical than miss detections, are reduced by incorporating an activity-aware Viterbi decoder, which acts as a decision device for the activity of a user at the frame level. The activity detection was further improved by deploying a weighted GOMP (wGOMP) algorithm in [27]. In the wGOMP algorithm, the weights are generated by the channel decoder [27] using $\xi_C(\mathbf{d}_k)$, the Euclidean distance of the most likely codeword, and $\xi_\varepsilon(\mathbf{d}_k) = \|\mathbf{d}_k\|^2$, which gives an indication about the activity of node $k$. $\Gamma(k)$ contains the vector indices corresponding to group $k$, and $N_d$ is the length of the vector $\hat{\mathbf{d}}$. The activity detection is enhanced by multiplying these weights with the correlation in the group selection step of GOMP.
The symbol error rate (SER) of the CS-MUD with wGOMP improved by one order of magnitude compared to GOMP and reached the oracle least squares performance at SNR = 20 dB [27].
In [34], the MMV-CS model was considered and the simultaneous orthogonal matching pursuit algorithm (SOMP) is proposed. The algorithm is similar to the GOMP algorithm and detects the support for a group of symbols. It is assumed that the sparsity remains the same for the whole frame.
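The frame-level selection rule shared by GOMP and SOMP, i.e., accumulating the correlations over all symbols of the frame before selecting a user, can be sketched for the MMV-CS model as follows. This is a generic simultaneous OMP under the assumptions of a known number of active users, BPSK data, and a flat unit-gain channel. It is not claimed to be the exact algorithm of [6] or [34], and all names and sizes are hypothetical.

```python
import numpy as np

def somp(A, Y, n_active):
    """Simultaneous OMP: one common support is detected for all columns (symbols) of Y."""
    residual = Y.copy()
    support = []
    D_s = None
    for _ in range(n_active):
        # Sum the correlation energy of each signature over the whole frame.
        scores = np.sum(np.abs(A.conj().T @ residual) ** 2, axis=1)
        if support:
            scores[support] = 0.0
        support.append(int(np.argmax(scores)))
        # Joint least squares estimate of the data of all selected users.
        D_s, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        residual = Y - A[:, support] @ D_s
    D_hat = np.zeros((A.shape[1], Y.shape[1]), dtype=complex)
    D_hat[support] = D_s
    return D_hat, sorted(support)

rng = np.random.default_rng(2)
N_s, K, N_d, K_a = 32, 64, 16, 4             # overloading factor K / N_s = 2
A = rng.standard_normal((N_s, K)) + 1j * rng.standard_normal((N_s, K))
A /= np.linalg.norm(A, axis=0)
true_support = sorted(rng.choice(K, size=K_a, replace=False).tolist())
D = np.zeros((K, N_d), dtype=complex)
D[true_support] = rng.choice([-1.0, 1.0], size=(K_a, N_d))
W = 0.05 * (rng.standard_normal((N_s, N_d)) + 1j * rng.standard_normal((N_s, N_d)))
Y = A @ D + W
D_hat, support = somp(A, Y, K_a)
```

Accumulating over the frame averages out the symbol-dependent interference, which is why the frame-level decision is more reliable than a symbol-by-symbol one when the support is constant over the frame.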

Iterative Order Recursive Least Square.
The limitation of GOMP is that its complexity increases exponentially with the group size. To reduce the higher complexity associated with the frame length, the iterative order recursive least square (IORLS) algorithm is proposed in [31]. IORLS iteratively employs the OMP algorithm and, therefore, avoids the computations of group correlations. The OMP node selection criterion is also enhanced by multiplying a weight matrix, $\mathbf{W}$, with the selection metric, i.e., the correlations of the sequences with the received signal. The weight matrix is a diagonal matrix where each entry, $W_{n,n}$, represents the number of symbols for which the $n$th sequence is detected as active in the previous iteration, $1 \le n \le N$, where $N$ is the number of sequences. Moreover, the matrix inversion in OMP is replaced by order recursive least squares to further reduce the complexity. The complexity of IORLS increases linearly with the number of iterations, while that of GOMP increases exponentially with the number of symbols in a subframe. The performance of IORLS depends on the length of the frame and the number of iterations.

Structured Matching Pursuit.
In the algorithms which exploit block sparsity, the general assumption is that a node is active for several consecutive symbols. However, in mMTC, some nodes may be active for several consecutive symbols while others may be active for fewer symbols. Therefore, the active node set changes within the duration of a frame. This structured sparsity in the multiuser signal is considered in [28,29] to improve the MUD. It is assumed that a portion of the active users remain active for several continuous time slots (common active users) while others change at each time slot (dynamic active users). In [28], it is assumed that the number of common active users is known a priori. An algorithm called the structured matching pursuit (SMP) algorithm is proposed for MUD. In SMP, the common active users are detected first based on the sum of their energies over the specified continuous time slots. After the common active users are detected, the residual (initialized to the received signal) is updated by subtracting the effect of the common active users. The dynamic active users are then detected at each time slot using OMP-like selection criteria. The algorithm is compared with OMP, which detects the activity on a symbol-by-symbol basis, and it is shown that the performance is improved by one order of magnitude at an SNR of 3 dB [28]. In [29], the idea was extended to estimate the data efficiently by exploiting this temporal correlation in the user activity. In the algorithm proposed in [29], the active users are detected at symbol level and the support obtained for one symbol is used as an initial support for estimating the data at the next time slot. The proposed algorithm significantly improves the data estimation as compared to the conventional symbol-by-symbol OMP data estimation.
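The two-stage idea of SMP (common active users first, dynamic active users per time slot afterwards) can be sketched as below. The sketch assumes that both the number of common users and the number of dynamic users per slot are known, that at least one common user exists, and that each slot carries one symbol vector. It follows the description above only loosely and should not be read as the exact algorithm of [28]; all names and sizes are hypothetical.

```python
import numpy as np

def smp(A, Y, n_common, n_dynamic):
    """Two-stage greedy detection: common support over all slots, then per-slot additions."""
    K, n_slots = A.shape[1], Y.shape[1]
    common, residual = [], Y.copy()
    for _ in range(n_common):                     # stage 1: energy summed over the slots
        energy = np.sum(np.abs(A.conj().T @ residual) ** 2, axis=1)
        if common:
            energy[common] = 0.0
        common.append(int(np.argmax(energy)))
        X_c, *_ = np.linalg.lstsq(A[:, common], Y, rcond=None)
        residual = Y - A[:, common] @ X_c         # subtract the effect of the common users
    X_hat = np.zeros((K, n_slots), dtype=complex)
    supports = []
    for t in range(n_slots):                      # stage 2: OMP-like selection per slot
        support = list(common)
        x_s, *_ = np.linalg.lstsq(A[:, support], Y[:, t], rcond=None)
        r = Y[:, t] - A[:, support] @ x_s
        for _ in range(n_dynamic):
            corr = np.abs(A.conj().T @ r)
            corr[support] = 0.0
            support.append(int(np.argmax(corr)))
            x_s, *_ = np.linalg.lstsq(A[:, support], Y[:, t], rcond=None)
            r = Y[:, t] - A[:, support] @ x_s
        X_hat[support, t] = x_s
        supports.append(sorted(support))
    return X_hat, sorted(common), supports

rng = np.random.default_rng(4)
N_s, K, n_slots = 48, 96, 7
A = rng.standard_normal((N_s, K)) + 1j * rng.standard_normal((N_s, K))
A /= np.linalg.norm(A, axis=0)
users = rng.permutation(K)
true_common = sorted(users[:3].tolist())
true_dynamic = users[3:3 + n_slots].tolist()      # one distinct dynamic user per slot
X = np.zeros((K, n_slots), dtype=complex)
X[true_common] = rng.choice([-1.0, 1.0], size=(3, n_slots))
for t, k in enumerate(true_dynamic):
    X[k, t] = rng.choice([-1.0, 1.0])
noise = 0.05 * (rng.standard_normal((N_s, n_slots)) + 1j * rng.standard_normal((N_s, n_slots)))
Y = A @ X + noise
X_hat, common, supports = smp(A, Y, n_common=3, n_dynamic=1)
```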

Matrix Matching Pursuit.
Considering the MMV-CS model, besides the MAP-based algorithm in Section 4.1.1, the authors in [32] also presented a greedy algorithm called matrix matching pursuit (MMP) for the activity detection. MMP is an extension of OMP, which selects the active users based on the maximum correlation of the spreading sequences with the sample covariance matrix, $\mathbf{\Phi}_{YY}$, of the received matrix $\mathbf{Y}$ in Equation (7). The proposed algorithm improves the activity detection with a complexity that is invariant to the length of the frame. However, the algorithms proposed in [32] only perform activity detection, and a separate data detection has to be implemented once the support is obtained. Moreover, the performance evaluation considers only the AWGN channel, and the effect of fading channels is not incorporated. Some parameters considered for the evaluation are not realistic in the context of mMTC, e.g., mMTC transmissions are of a few bytes while here a frame of 1000 bits is considered. The activity probability for mMTC is $p_a < 0.1$, while in [32], $p_a = 0.35$ is considered for evaluating the effect of frame size. A summary of the CS-MUD algorithms is given in Table 2.

CS-Based Medium Access
In mMTC, the small data packet size necessitates the use of a grant-free medium access scheme. CS-MUD is introduced to facilitate a grant-free multiple access scheme in the physical layer. To enhance the medium access scheme, many solutions are proposed in the literature, which can be grouped into the following categories.

Baseline Nonorthogonal Medium Access.
Most of the current research considers the conventional CDMA-like medium access in the physical layer, where a dedicated spreading code is assigned to each node in the time domain. These code sequences act as signatures to distinguish the active nodes. The data frame of the node consists of only the payload, and no extra control signaling is needed for medium access. As every node has a dedicated sequence, there is no collision in accessing the medium; however, the sequences are nonorthogonal and have some correlation. The authors in [35-37] improved the medium access scheme by introducing the CS-MUD in multicarrier CDMA, which is named multicarrier compressed sensing-based multiuser detection (MCSM). The proposed scheme improves the scalability and flexibility of accessing both the time and frequency resources. The spectral efficiency of the system can be improved by reducing the number of subcarriers per node. To gain frequency diversity in MCSM, a scheduling technique for allocating the subcarriers is introduced in [36]. In the proposed scheduling scheme, the set of allocated subcarriers changes after a predefined number of symbols. Furthermore, the subcarriers are allocated such that they lie within the coherence bandwidth of the channel, which facilitates the use of noncoherent modulation and avoids pilot transmissions for channel estimation. A hardware implementation is also demonstrated in [37].
Although the nonorthogonal medium access scheme with CS-MUD facilitates the grant-free medium access and increases the spectral efficiency, the main challenge is the activity detection in massive mMTC. In CS-MUD, the activity detection is dependent on the correlation between the spreading sequences. For a fixed length of the spreading sequence, increasing the number of sequences increases the correlation between sequences. Therefore, in case of massive mMTC where the number of devices is higher, allocating low correlated sequences to each node will be a challenging task.
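The growth of correlation with the number of sequences can be quantified with the Welch bound, a standard result that is not taken from the surveyed papers: for K unit-norm sequences of length N with K > N, the largest pairwise correlation magnitude is at least sqrt((K - N) / (N (K - 1))). The sketch below evaluates this bound; the function name and the choice of 16 chips are illustrative assumptions.

```python
import math


def welch_bound(seq_len, num_seq):
    """Lower bound on the largest |<a_i, a_j>| among num_seq unit-norm sequences of length seq_len."""
    if num_seq <= seq_len:
        return 0.0  # orthogonal sequences exist, so zero correlation is achievable
    return math.sqrt((num_seq - seq_len) / (seq_len * (num_seq - 1)))


# Fixed spreading length of 16 chips, growing number of nodes.
bounds = {k: welch_bound(16, k) for k in (16, 32, 64, 128)}
```

For a fixed spreading length, the bound rises with the number of nodes and approaches 1/sqrt(N) (0.25 for 16 chips), so no sequence design can keep the worst-case correlation near zero once the system is heavily overloaded.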

Enhanced Nonorthogonal Medium Access.
The medium access scheme in the MCSM is enhanced in [43] by introducing spreading sequence diversity. Each user uses two spreading sequences from the sensing matrix, A, to spread their data frame d. Half of the data symbols are spread over one sequence and the other half over the other as shown in Figure 9.
At the receiver, for MUD, the correlation between the received symbols and the spreading sequence is now averaged over two sequences, which results in more accurate activity detection. Due to this spreading diversity, the MUD in a nonorthogonal multiple access scheme is improved for a given number of users. In other words, more users can be accommodated while maintaining the same BER, as compared to single-sequence spreading.
Although the proposed enhanced nonorthogonal medium access scheme improves the performance and makes it possible to increase the overloading, accurate activity detection still depends on the correlations of the spreading sequences. In [38,39], the scarcity of low-correlation spreading sequences for a massive number of nodes is addressed, and a multiple sequence-based medium access is proposed. In the proposed scheme, the data frame of the kth user, $x_k$, is divided into $R = N_d/\nu$ subframes, where $\nu$ is the number of symbols in one subframe such that $N_d \bmod \nu = 0$. Instead of allocating a single dedicated sequence, sets of sequences are generated from a pool of sequences. Each user randomly selects a set of sequences, $S_k \in \mathbb{C}^{N_s \times \nu}$, from the pool of sequence sets. These sequence sets are used to spread the subframes, which are then transmitted as depicted in Figure 10. Another variant of the multiple sequence-based schemes, named sequence block-based CSMUD (SB-CSMUD), is proposed in [61]. Unlike MS-CSMUD, here the authors use dedicated multiple sequences for the users, which eliminates the probability of collision.
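The two-sequence spreading and the averaged correlation can be sketched as follows. This is a simplified single-user, real-valued, noise-free illustration and not the receiver of [43]; the split into first and second half of the frame, the function names, and the toy sequences are assumptions.

```python
def spread_frame(symbols, seq_a, seq_b):
    """Spread the first half of the frame with seq_a and the second half with seq_b."""
    half = len(symbols) // 2
    return [[d * c for c in (seq_a if i < half else seq_b)]
            for i, d in enumerate(symbols)]


def activity_metric(received, seq_a, seq_b):
    """Average the squared correlation over both halves of the frame."""
    half = len(received) // 2
    total = 0.0
    for i, chips in enumerate(received):
        seq = seq_a if i < half else seq_b
        total += sum(s * r for s, r in zip(seq, chips)) ** 2
    return total / len(received)


tx = spread_frame([1, -1, 1, -1], [1, 1], [1, -1])
metric = activity_metric(tx, [1, 1], [1, -1])
```

Because the metric averages over two different sequences, an interferer that happens to be strongly correlated with one of them affects only half of the terms, which is the intuition behind the improved activity detection.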
The activity detection in the multiple sequence-based transmission improves due to the averaging of the correlations between the sequences as compared to a single sequence. As the medium is randomly accessed and there is no dedicated sequence, the data frames are also embedded with the frame IDs, which are used to identify the active nodes. The multiple sequence scheme improves the activity detection and limits the problem of scarcity of sequences by using random access. However, as the users select the sequences randomly, there is a probability of collision and increasing the number of sequence sets increases the correlations, which consequently increases the probability of detection errors. Moreover, the large number of sequence sets also increases the complexity of the multiuser detector.
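The subframe division and the collision risk of random sequence-set selection can be sketched as follows. The collision model assumes that every active user picks one set uniformly and independently from the pool, which is an illustrative assumption rather than a result from [38,39]; the function names are hypothetical.

```python
def split_into_subframes(frame, symbols_per_subframe):
    """Split a frame of N_d symbols into R equally sized subframes."""
    if len(frame) % symbols_per_subframe != 0:
        raise ValueError("frame length must be a multiple of the subframe length")
    return [frame[i:i + symbols_per_subframe]
            for i in range(0, len(frame), symbols_per_subframe)]


def collision_probability(num_active, pool_size):
    """Probability that at least two active users pick the same sequence set."""
    p_no_collision = 1.0
    for i in range(num_active):
        p_no_collision *= max(0.0, 1.0 - i / pool_size)
    return 1.0 - p_no_collision


subframes = split_into_subframes([1, 2, 3, 4, 5, 6], 2)
p_two_users = collision_probability(2, 10)
```

Under this model, enlarging the pool lowers the collision probability, while the text notes that it also raises the correlations and the detector complexity, so the pool size is a trade-off.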

Codebook-Based Nonorthogonal Medium Access.
In [53], a codebook-based nonorthogonal CDMA scheme is proposed. The modulation and spreading processes are merged together into a direct symbol-to-sequence spreader as shown in Figure 11.
Each user is assigned a unique codebook, and the incoming bits are directly mapped to a codeword in the codebook. The codewords are the spreading sequences, which are designed to have low correlations. The spreading sequences in the codebook are treated as codewords drawn from a multidimensional constellation; therefore, for higher-order modulation, the Euclidean distance between the codewords does not decrease as much as it does in a two-dimensional constellation. Consequently, the required SNR at higher modulation orders (i.e., modulation order greater than 4) is lower than that of a scheme using PSK modulation.
The codebook-based CDMA with CS-MUD simplifies the encoding process at the sensor nodes, which prolongs battery life, and increases the spectral efficiency at the cost of a minimal performance loss. However, the proposed scheme is sensitive to multiple access interference, and a more robust decoding scheme is required for situations where the number of active users is high.
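The direct symbol-to-sequence mapping can be sketched as a table lookup. The toy codebook below (four Hadamard rows) and the function name are assumptions for illustration; the low-correlation codebook design of [53] is not reproduced here.

```python
import math


def bits_to_codewords(bits, codebook):
    """Map groups of log2(len(codebook)) bits directly to spreading sequences."""
    if len(codebook) < 2:
        raise ValueError("codebook needs at least two codewords")
    bits_per_word = int(math.log2(len(codebook)))
    if 2 ** bits_per_word != len(codebook):
        raise ValueError("codebook size must be a power of two")
    if len(bits) % bits_per_word != 0:
        raise ValueError("bit count must be a multiple of the bits per codeword")
    words = []
    for i in range(0, len(bits), bits_per_word):
        # Interpret the bit group as a binary index into the codebook.
        index = int("".join(str(b) for b in bits[i:i + bits_per_word]), 2)
        words.append(codebook[index])
    return words


# Hypothetical per-user codebook: 4 codewords of 4 chips, i.e. 2 bits per codeword.
toy_codebook = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
chips = bits_to_codewords([1, 0, 0, 1], toy_codebook)
```

No separate modulator or spreader is needed at the node: each bit group selects a complete chip sequence, which is why the encoding stays simple.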

Channel Division Medium Access.
The increase of correlation between the sequences with the increase in the number of devices limits the performance of NOMA schemes that use spreading sequences as the signature. Moreover, when there is a massive number of nodes, it is a challenging task to assign dedicated sequences to each node. In [41], a channel division multiple access scheme is proposed which uses the channel state information (CSI) as the signature for separating the active users at the receiver. The active users transmit pilot sequences, which are randomly chosen from a set of sequences. At the receiver, the CSI is estimated and, consequently, the active users are detected based on their respective CSI. As the pilot sequences are randomly selected, there is a probability of pilot sequence collision. The effect of pilot collision is reduced by sending multiple pilot sequences within a frame. The frame structure of the proposed scheme is depicted in Figure 12, where each frame of a node consists of $T_p$ pilot blocks and $T_d$ data blocks.
This approach allows the system to be overloaded without being limited by the number of spreading sequences. However, to separate the users' data patterns, the CSI of the users should be highly uncorrelated, which means that the channel impulse response should be of larger size. The proposed scheme will not work for a flat fading channel because it would not be possible to differentiate the CSI of different users.
A similar scheme is also adopted in [44], in which the number of possible Zadoff-Chu and power residue-based pilot sequences is calculated. For Zadoff-Chu, the number of possible sequences in the sequence set is $P = (N_s - 1)(N_s/V_t + 1)$, where $N_s$ is the length of the sequence and $V_t$ is the smallest integer greater than the number of channel taps.
A summary of the CS-based medium access schemes is presented in Table 3.

Combination of CS-MUD with Other Techniques
CS-MUD has been combined with other techniques to enhance the overall performance of the machine-type communication system. The nonorthogonal low-density signature-based orthogonal frequency division multiplexing and CDMA (LDS-OFDM/CDMA) scheme exploits sparse signatures, which allows the use of a relatively low-complexity MPA to approximate the MAP detection. However, the basic assumption in MPA is that the activity is known at the receiver, which requires a higher control signaling overhead. In [45], CS-MUD is combined with an MPA receiver, and the resulting CS-MPA receiver is proposed to optimize the activity and data detection. An active node transmits a dedicated dense spreading sequence before transmitting the actual symbols spread over the sparse LDS signatures. The multiplexed data frame of all active nodes constitutes $y_1$ and $y_2$, where $y_1$ represents the superimposed dense signatures from all active nodes and $y_2$ represents the superimposed sparse LDS signatures over which the data symbols are spread. At the receiver, the CS-MUD takes $y_1$ as input and uses a compressive sensing algorithm to detect the active users prior to the MPA detector. The support obtained by the CS-MUD and $y_2$ are fed into the MPA to efficiently estimate the data. The detection process is depicted in Figure 13.
When the activity is not known, the MPA has to search over all the possible combinations of data colliding at a subcarrier, which increases the complexity and degrades the performance. In the CS-MPA receiver, when the support is fed into the MPA detector, the number of possible combinations for data estimation is reduced, which enhances the data estimation accuracy and reduces the complexity. The CS-MPA is optimized in [46] by incorporating a two-stage CS-based activity detection. At the first stage, a correlation-based activity detection is carried out. The approximate support obtained at the first stage is fed into the second stage, which executes a compressive sampling matching pursuit (CoSaMP) [62] algorithm. The CoSaMP uses $y_1$ and the support estimated in the first stage as prior information and detects the activity more accurately.
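The reduction in the search space can be made concrete with a rough count of the symbol combinations at one resource. The sketch assumes that unknown activity is modelled by adding a "silent" symbol to each user's alphabet; this model and the function name are illustrative assumptions, not necessarily the formulation used in [45].

```python
def mpa_hypotheses(constellation_size, users_per_resource, known_active=None):
    """Count the symbol combinations an exhaustive search covers at one resource."""
    if known_active is None:
        # Unknown activity (assumption): each user sends one of M symbols or stays silent.
        return (constellation_size + 1) ** users_per_resource
    # Known support: only the active users contribute M hypotheses each.
    return constellation_size ** known_active


unknown = mpa_hypotheses(4, 3)                  # 3 users collide, activity unknown
known = mpa_hypotheses(4, 3, known_active=1)    # support says only 1 of them is active
```

With a 4-point constellation and three users sharing a resource, the count drops from 125 to 4 once the support indicates that a single user is active.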
The LDS-based nonorthogonal schemes are capable of accommodating more users on fewer physical resources when all users are active, while the nonorthogonal CDMA with CS-MUD only works if a small number of the users are active. For example, for 6 users sharing 4 physical resources, using the LDS-based NOMA, when all 6 users are active, the MPA algorithm is capable of separating the data of all users. On the other hand, the CS-MUD cannot separate users when the number of active users is greater than half of the number of physical resources. However, in mMTC, where only a small number of users are active, the LDS-based NOMA requires control signaling overhead for medium access, while nonorthogonal CDMA with CS-MUD does not need any control signaling.
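The half-of-the-resources limit is consistent with a standard noise-free uniqueness condition from compressed sensing, sketched here for a generic sensing matrix $A$ with $N$ rows (physical resources) and $K$ columns (users). The symbols $N$, $K$, and $K_a$ (number of active users) are introduced for this illustration only.

```latex
\begin{aligned}
&\text{Every } K_a\text{-sparse } x \text{ is uniquely determined by } y = Ax
  \iff \operatorname{spark}(A) > 2K_a, \\
&\operatorname{spark}(A) \le \operatorname{rank}(A) + 1 \le N + 1
  \;\Longrightarrow\; 2K_a \le N, \\
&N = 4 \;\Longrightarrow\; K_a \le 2 \quad \text{(out of } K = 6 \text{ users).}
\end{aligned}
```

In the 6-user, 4-resource example, at most two simultaneously active users can therefore be guaranteed to be separable by a sparsity-based detector, even without noise.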
Considering the variations in the sparsity level of the multiuser signal, in [50], a switching mechanism between the CS-MUD and the classical multiuser data detection is proposed. A threshold for the sparsity level based on the OMP algorithm is derived. The multiuser detector uses the OMP algorithm if the sparsity is high and is switched to the classical linear minimum mean square error estimator when the sparsity is lower than the threshold.
Figure 12: Frame structure of channel division medium access transmission.
To efficiently handle the massive connectivity, the MAC layer coded slotted ALOHA [63] and the PHY layer CS-MUD are combined in [49,64]. The CS-MUD exploits the sparsity at the physical layer to jointly detect the activity and data, while the coded slotted ALOHA is capable of resolving collisions by exploiting successive interference cancellation. The joint activity and data detection at the physical layer facilitates efficient collision resolution and increases the user density at the MAC layer.
In [48], a large-scale spatial modulation multiple-input multiple-output system is proposed to reduce the number of radio frequency chains and hence the hardware cost and power consumption. Each user selects one antenna element out of all the antenna elements used for data transmission. The MUD at the base station is posed as an underdetermined signal detection problem, and CS-MUD is used for MUD.

Performance Comparison of CSMUD Schemes
The prominent CS-based MUD schemes are compared in terms of detection error rate (DER) and overall bit error rate (BER) in Figure 14, taking the single-sequence CSMUD at activity probability $P_a = 0.1$ and overloading factor $\lambda = 4$ as the baseline. From the figure, it is evident that the spreading diversity introduced by E-CSMUD and SB-CSMUD reduces the DER. As the BER depends on the successful detection of activity, the BER also improves. The effect of increasing the activity probability from $P_a = 0.05$ to $P_a = 0.1$ on the BER performance is illustrated for different CS-based MUD schemes in Figure 15. For $P_a = 0.05$, the performance of the enhanced-CSMUD (E-CSMUD), multisequence-based CSMUD (MS-CSMUD), and CS-based MPA (CS-MPA) is significantly better than that of the baseline CSMUD. The improvement in performance for E-CSMUD and MS-CSMUD comes from averaging the correlation over multiple sequences, while for the CS-MPA, it comes from the use of separate activity detection by GOMP and data detection by MPA. The MS-CSMUD performs slightly better than the E-CSMUD because of the multiple spreading sequences used by a single user. When the activity probability is increased to $P_a = 0.1$, a performance degradation is observed due to higher multiple access interference; however, the rate of performance degradation varies for different schemes. The baseline CSMUD suffers a performance loss of 25% due to the increase in correlations between the sequences. The rate of performance degradation of E-CSMUD is lower than that of the baseline due to the averaging of correlation over two spreading sequences. The performance degradation rate in MS-CSMUD is higher than that of the E-CSMUD because in MS-CSMUD, the users draw the spreading sequences randomly from a pool of sequences; therefore, with an increase in the number of active nodes, the probability of collision increases, which increases the bit error rate in MS-CSMUD. Moreover, in MS-CSMUD, the users are identified by decoding the user IDs that are included within the data frame; the probability of wrong detection increases with an increase in the activity probability. The CS-MPA performs well at low activity probability; however, for a higher number of active nodes, two sources contribute to the performance degradation. Firstly, the correlation between spreading sequences increases, and secondly, the multiuser interference at the nonzero chips of the spreading sequences increases.
The performance improvement in the variants of CSMUD schemes comes at the cost of higher computational complexity. Figure 16 compares the computational complexity of various CSMUD schemes. The dominant operation in CS decoding algorithms, e.g., GOMP, is the pseudoinverse of the sensing matrix. The E-CSMUD uses two sequences per user, which makes its complexity twice that of the baseline. Similarly, for a group size of four, the MS-CSMUD uses four sequences for each user, which makes its complexity four times that of the baseline. Moreover, the decoding of the frame ID also contributes to the complexity of MS-CSMUD. The CS-MPA is a two-stage decoding technique with the highest complexity amongst the CS-based MUD schemes. Besides the complexity of GOMP, the major complexity in CS-MPA comes from the implementation of a separate MPA for data detection, whose complexity increases exponentially with the number of users superimposed on a single chip of the spreading sequence. Figure 16 shows that as the activity probability increases from 0.05 to 0.1, the complexity of the CS-MPA increases significantly. Owing to the increase in the number of active nodes, the complexity of CSMUD, E-CSMUD, and MS-CSMUD also increases for an activity probability of 0.1.

Conclusion and Future Work
This survey article summarizes all the relevant algorithms and schemes of NOMA based on CS-MUD, which is a promising solution for massive connectivity in mMTC. CS-MUD leverages the sporadic nature of mMTC and the theoretical foundation of compressive sensing techniques to provide an elegant way to tackle the NOMA problem. NOMA with CS-MUD significantly simplifies the encoding process and enables the request-grant-free medium access scheme for the resource-constrained sensor nodes. Furthermore, it successfully shifts the computation load and complexity to the BS. However, there is a trade-off between the detection error and the computational complexity in the state-of-the-art CS-MUD algorithms. As the total number of sensor nodes grows and the measurement matrix grows with it, the computational complexity of multiuser detection will increase dramatically, which calls for more efficient CSMUD algorithms. Moreover, the traffic pattern of the mMTC nodes can be exploited to reduce the set of possibly active nodes, which will consequently reduce the computational complexity.
The performance of CS-based multiple access schemes is by and large determined by the correlation of the nonorthogonal signatures, i.e., the spreading sequences. The higher the correlation between the signatures, the higher the error rate. Therefore, it is desirable to design new spreading sequences with low cross-correlation. Moreover, the signature design in the case of SB-CSMUD and MS-CSMUD, from the sensing matrix, can be optimized to reduce the correlation. Furthermore, it is also noted that the maximum detection error occurs when two nodes having maximally correlated signatures are active simultaneously; therefore, based on the activity probability, the signature allocation can be optimized such that the highly correlated signatures are assigned to nodes that are less likely to be active simultaneously.
Another future direction could be the combination of CS-MUD with other NOMA techniques which will create new opportunities to utilize the advantages of different schemes and further improve the overloading and radio resource sharing for mMTC in the 5G communication system. For instance, the combination of power domain NOMA and SB-CSMUD can be investigated, which has the potential to increase the overloading capability of the CSMUD-based NOMA schemes.

Conflicts of Interest
The authors declare that they have no conflicts of interest.