Superresolution 2D DOA Estimation for a Rectangular Array via Reweighted Decoupled Atomic Norm Minimization



Introduction
The problem of two-dimensional (2D) direction of arrival (DOA) estimation arises in various signal processing applications, such as 2D angle estimation with planar arrays in multiple-input and multiple-output (MIMO) communication systems. Although numerous algorithms for this purpose have emerged over the past three decades [1][2][3][4][5], there is still an urgent need for fast, superresolution algorithms that can work with only a handful of snapshots, especially in massive systems.
Conventional superresolution algorithms such as MUSIC and ESPRIT [6][7][8][9] are based on subspace techniques. These methods either require a heavy computational load for the 2D spectrum search or are applicable only to arrays exhibiting geometric invariance, and they rely on a number of samples larger than the number of sensors. We therefore focus on algorithms built on the popular theory of compressed sensing (CS), which makes it possible to estimate the DOA from a small number of snapshots, or even a single snapshot, even when the sources are correlated. However, another challenge then arises: grid mismatch [10, 11].
Recently, the mathematical concept of continuous compressed sensing (CCS) was introduced by Candès and Fernandez-Granda [12]. CCS abandons traditional grid-based models, which require linear independence among the grid atoms, in favor of a superresolution framework based on the total variation norm; it thus fundamentally avoids the problem of grid mismatch and further facilitates the application of CS to parameter estimation. Subsequently, Chandrasekaran et al. [13] and Tang et al. [14] developed the atomic norm metric and cast atomic norm minimization (ANM) as a positive semidefinite (PSD) program by exploiting the Vandermonde structure of the atoms. The theory was further developed for the estimation of one-dimensional (1D) parameters by Yang et al. [15], who also proposed the reweighted atomic norm minimization (RAM) algorithm to enhance the resolution [16]. The metric was then extended to higher-dimensional parameter estimation by Xu et al. [17] and by the present authors [18, 19]. However, the computational complexity is enormous even for 2D applications. To combat this issue, Tian et al. [20] presented a decoupled atomic norm minimization (DAM) algorithm in which the two-level Toeplitz matrix is split into two one-level matrices, thereby improving the computational efficiency. Nevertheless, the resolution remains limited because the minimum resolvable angular interval is inversely proportional to the number of rows or columns of the array [12, 19, 20].
To address the above issues, we propose a low-complexity superresolution optimization algorithm for 2D DOA estimation with a rectangular array. In essence, the proposed algorithm adopts an extended atomic \(\ell_0\) norm that avoids any constraint on the angular intervals, and we show that the problem can be translated into a decoupled rank minimization problem in PSD form. Since the rank minimization problem is difficult to solve directly, a novel sparse surrogate is then introduced, inspired by the works [16, 21], and the resulting concave objective function is iteratively optimized with the well-known majorization-minimization (MM) method. The algorithm ultimately takes the form of a weighted DAM problem in each iteration; we therefore name it reweighted decoupled atomic norm minimization (RDAM). A series of proofs is presented in this paper to establish the soundness of the proposed method, and corresponding simulations are provided to demonstrate its validity. The proposed algorithm greatly enhances the resolution without increasing the computational burden, and it requires prior information on neither the number of sources nor the noise level, both of which can be obtained during the iterative process. In addition, owing to its iterative nature, our method is more robust to noise than the DAM algorithm.

Signal Model
Suppose that \(K\) far-field narrowband sources \(\{s_k\}_{k=1}^{K}\) impinge on a uniform rectangular array (URA) composed of \(M \times N\) sensors, where \(M\) and \(N\) are the numbers of elements in the \(x\) and \(y\) directions, respectively. For the sake of the subsequent discussion, we adopt a decoupled observation model for the array output [22]. As shown in Figure 1, we redefine the azimuth \(\theta\) of the received signal as the angle between the signal and the \(yoz\) plane instead of using the traditional definition; the definition of the elevation \(\phi\) remains unchanged. Correspondingly, the single-snapshot array observation model without noise is
\[
\mathbf{X} = \sum_{k=1}^{K} s_k\, \mathbf{a}_x(\theta_k)\, \mathbf{a}_y^{T}(\phi_k),
\]
where \(\mathbf{a}_x(\theta_k) = [a_1, a_2, \ldots, a_M]^T\) and \(\mathbf{a}_y(\phi_k) = [b_1, b_2, \ldots, b_N]^T\) are the 1D steering vectors in the \(x\) and \(y\) directions, respectively. Suppose that the array spacing is set to half the wavelength; then \(a_m = \exp(-j2\pi \cdot 0.5(m-1)\sin\theta_k)\) and \(b_n = \exp(j2\pi \cdot 0.5(n-1)\sin\phi_k)\). Note that \(\mathbf{a}_y(\phi_k)\) is the conjugate of the regular steering vector.
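To make the model concrete, the following minimal sketch builds the single-snapshot observation \(\mathbf{X}\) under the conventions above; the directions, amplitudes, and function names are illustrative, and half-wavelength spacing is assumed.

```python
# A minimal sketch of the decoupled single-snapshot URA model (illustrative
# values; not the authors' code).
import numpy as np

def steering_x(theta, M):
    # a_m = exp(-j*pi*(m-1)*sin(theta)), m = 1..M
    m = np.arange(M)
    return np.exp(-1j * np.pi * m * np.sin(theta))

def steering_y(phi, N):
    # b_n = exp(+j*pi*(n-1)*sin(phi)): the conjugate of the regular vector
    n = np.arange(N)
    return np.exp(1j * np.pi * n * np.sin(phi))

def observation(thetas, phis, s, M, N):
    # X = sum_k s_k * a_x(theta_k) * a_y(phi_k)^T  (single snapshot, noiseless)
    X = np.zeros((M, N), dtype=complex)
    for theta, phi, sk in zip(thetas, phis, s):
        X += sk * np.outer(steering_x(theta, M), steering_y(phi, N))
    return X

M = N = 10
thetas = np.deg2rad([34.35, 38.09])
phis = np.deg2rad([67.03, 76.04])
s = np.array([1.0 + 0.5j, 0.8 - 0.3j])   # complex source amplitudes
X = observation(thetas, phis, s, M, N)   # M x N observed data matrix
```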
Our goal in this paper is to estimate \(\theta_k\) and \(\phi_k\) from the observed data \(\mathbf{X}\), and we assume that the following conditions hold: (1) \(\mathbf{a}_x(\theta_k)\) and \(\mathbf{a}_y(\phi_k)\) have the Vandermonde structure; (2) the array elements are uniformly spaced at half the wavelength; (3) the signal phases are independently and uniformly distributed.
The first two conditions, which obviously hold for such array models, are the basic requirements for the algorithm subsequently proposed in this paper, and the last one guarantees that the solution of our proposed algorithm is exact with probability at least \(1-\delta\), where \(\delta\) is a very small value [14, 19, 23].

Decoupled Rank Minimization Problem

The atomic \(\ell_0\) norm was proposed in [14] as a measure of the sparsest number of spikes with unknown positions that can compose an observed signal in the frequency domain. In this paper, we employ this norm to express the number of sources with unknown directions composing the array output in the spatial domain and implement 2D DOA estimation by optimizing the atomic representation. More precisely, we merge the signal phase \(\psi_k\) of \(s_k = c_k e^{j\psi_k}\) (with \(c_k \ge 0\)) into the steering vectors and redefine them as \(\mathbf{a}_x(\theta_k, \frac{1}{2}\psi_k) = e^{j\psi_k/2}\,\mathbf{a}_x(\theta_k)\) and \(\mathbf{a}_y(\phi_k, \frac{1}{2}\psi_k) = e^{-j\psi_k/2}\,\mathbf{a}_y(\phi_k)\). Regarding \(\mathbf{a}_x(\theta, \frac{1}{2}\psi)\,\mathbf{a}_y^{H}(\phi, \frac{1}{2}\psi)\) as an atom \(\mathbf{A}(\theta, \phi, \psi)\), we define the atom set \(\mathcal{B}\) as
\[
\mathcal{B} = \left\{ \mathbf{A}(\theta, \phi, \psi) : \theta, \phi \in \left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right),\ \psi \in [0, 2\pi) \right\}.
\]
Then, the atomic \(\ell_0\) norm induced by \(\mathcal{B}\) is given by
\[
\|\mathbf{X}\|_{\mathcal{B},0} = \inf_{\{c_k \ge 0\}} \left\{ K : \mathbf{X} = \sum_{k=1}^{K} c_k \mathbf{A}(\theta_k, \phi_k, \psi_k) \right\}.
\]
This is an optimization problem whose objective is to find the fewest possible atoms \(\{\mathbf{A}(\theta_k, \phi_k, \psi_k)\}\) whose positive combination represents the observed data \(\mathbf{X}\). Inspired by the idea proposed in [14] and by previous research on rank minimization [24][25][26], we will demonstrate that the above norm can be represented as the sum of the ranks of two 1D Toeplitz matrices satisfying a PSD constraint, leading to the decoupled rank minimization problem stated in the following proposition.
Proposition 1. Let \(\mathbf{X} \in \mathbb{C}^{M \times N}\) be array observed data that has the atomic form
\[
\mathbf{X} = \sum_{k=1}^{K} c_k \mathbf{A}(\theta_k, \phi_k, \psi_k), \qquad c_k > 0,
\]
and satisfies the conditions given in the previous section. The corresponding atomic \(\ell_0\) norm minimization problem can be expressed as
\[
\|\mathbf{X}\|_{\mathcal{B},0} = \inf_{\{c_k \ge 0\}} \left\{ K : \mathbf{X} = \sum_{k=1}^{K} c_k \mathbf{A}(\theta_k, \phi_k, \psi_k) \right\}. \tag{7}
\]
Then, (7) is equivalent to the optimum value of the following decoupled rank minimization problem:
\[
\min_{\mathbf{u}_x, \mathbf{u}_y}\ \tfrac{1}{2}\,\mathrm{rank}(T(\mathbf{u}_x)) + \tfrac{1}{2}\,\mathrm{rank}(T(\mathbf{u}_y)) \quad \text{s.t.} \quad
\begin{bmatrix} T(\mathbf{u}_x) & \mathbf{X} \\ \mathbf{X}^{H} & T(\mathbf{u}_y) \end{bmatrix} \succeq 0, \tag{8}
\]
where \(T(\mathbf{u}_x)\) and \(T(\mathbf{u}_y)\) denote two 1D Hermitian Toeplitz matrices whose first rows are \(\mathbf{u}_x\) and \(\mathbf{u}_y\), respectively.
Proof. On one side, we show that \(\frac{1}{2}r_1^{\star} + \frac{1}{2}r_2^{\star} \le K^{\star}\) given problem (7). Suppose that \(\|\mathbf{X}\|_{\mathcal{B},0} = K^{\star} < \min\{M, N\}\) is attained by some atomic decomposition
\[
\mathbf{X} = \sum_{k=1}^{K^{\star}} c_k\, \mathbf{a}_x\!\left(\theta_k, \tfrac{1}{2}\psi_k\right) \mathbf{a}_y^{H}\!\left(\phi_k, \tfrac{1}{2}\psi_k\right), \qquad c_k > 0,
\]
and let
\[
T(\mathbf{u}_x) = \sum_{k=1}^{K^{\star}} c_k\, \mathbf{a}_x(\theta_k)\mathbf{a}_x^{H}(\theta_k), \qquad
T(\mathbf{u}_y) = \sum_{k=1}^{K^{\star}} c_k\, \mathbf{a}_y(\phi_k)\mathbf{a}_y^{H}(\phi_k).
\]
Thus, we obtain the PSD matrix
\[
\begin{bmatrix} T(\mathbf{u}_x) & \mathbf{X} \\ \mathbf{X}^{H} & T(\mathbf{u}_y) \end{bmatrix}
= \sum_{k=1}^{K^{\star}} c_k
\begin{bmatrix} \mathbf{a}_x(\theta_k, \frac{1}{2}\psi_k) \\ \mathbf{a}_y(\phi_k, \frac{1}{2}\psi_k) \end{bmatrix}
\begin{bmatrix} \mathbf{a}_x(\theta_k, \frac{1}{2}\psi_k) \\ \mathbf{a}_y(\phi_k, \frac{1}{2}\psi_k) \end{bmatrix}^{H} \succeq 0,
\]
i.e., the relational expressions \(\mathrm{rank}(T(\mathbf{u}_x)) \le K^{\star}\) and \(\mathrm{rank}(T(\mathbf{u}_y)) \le K^{\star}\) hold, so \(\frac{1}{2}r_1^{\star} + \frac{1}{2}r_2^{\star} \le K^{\star}\).

On the other side, given problem (8), we verify that \(K^{\star} \le \frac{1}{2}r_1^{\star} + \frac{1}{2}r_2^{\star}\) when \(r_1^{\star} < M\) and \(r_2^{\star} < N\) (the full-rank case is trivial). Given the PSD condition, and letting \(\mathbf{u}_x^{\star}\) and \(\mathbf{u}_y^{\star}\) compose the optimal solution to (8), it can be derived in accordance with the Schur complement lemma [27, 28] that \(\mathbf{X}\) lies in the column space of \(T(\mathbf{u}_x^{\star})\) and \(\mathbf{X}^{H}\) lies in the column space of \(T(\mathbf{u}_y^{\star})\). In addition, note that \(T(\mathbf{u}_x^{\star})\) and \(T(\mathbf{u}_y^{\star})\) admit Vandermonde decompositions of the forms \(T(\mathbf{u}_x^{\star}) = \mathbf{A}_x \boldsymbol{\Sigma}_x \mathbf{A}_x^{H}\) and \(T(\mathbf{u}_y^{\star}) = \mathbf{A}_y \boldsymbol{\Sigma}_y \mathbf{A}_y^{H}\) [14], where \(\mathbf{A}_x\) and \(\mathbf{A}_y\) have \(r_1^{\star}\) and \(r_2^{\star}\) Vandermonde-structured columns, respectively, and both \(\boldsymbol{\Sigma}_x\) and \(\boldsymbol{\Sigma}_y\) are diagonal matrices with positive real elements. Therefore, the former relation implies that \(\mathbf{X}\) lies in the range of \(\mathbf{A}_x\), which means that \(\mathbf{X}\) can be composed of no more than \(r_1^{\star}\) atoms; i.e., \(K^{\star} \le r_1^{\star}\). Similarly, the latter relation leads to the conclusion that \(K^{\star} \le r_2^{\star}\). Combining the two inequalities yields \(K^{\star} \le \frac{1}{2}r_1^{\star} + \frac{1}{2}r_2^{\star}\), which completes the proof.

Based on the above proposition, we can directly optimize the decoupled rank minimization problem (8) under the PSD constraint to solve (7). It is evident from the proof that \(\theta_k\) and \(\phi_k\) are encoded in the Toeplitz matrices \(T(\mathbf{u}_x^{\star})\) and \(T(\mathbf{u}_y^{\star})\), respectively. Once the optimal solution \((\mathbf{u}_x^{\star}, \mathbf{u}_y^{\star}, \mathbf{X}^{\star})\) is obtained, \(\theta_k\) and \(\phi_k\) can be determined from the Vandermonde decompositions of these two matrices; many off-the-shelf computational methods are available, such as the dual root-finding method [12, 14, 15] and the matrix pencil method [29, 30]. In addition, once \(\theta_k\) and \(\phi_k\) are acquired, the simple and straightforward angle-pairing method provided in [20] can be applied. Therefore, our task reduces to optimizing problem (8) for \((\mathbf{u}_x^{\star}, \mathbf{u}_y^{\star}, \mathbf{X}^{\star})\).
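The forward direction of the proof can be checked numerically: build \(\mathbf{X}\) from \(K\) atoms, form the two Toeplitz matrices from the same decomposition, and verify the PSD block constraint and the rank bounds. The directions and coefficients below are hypothetical.

```python
# A numerical check of Proposition 1 (illustrative directions/coefficients).
import numpy as np

M = N = 10
thetas = np.deg2rad([10.0, 25.0, 40.0])
phis = np.deg2rad([50.0, 60.0, 70.0])
c = np.array([1.0, 0.7, 1.3])             # positive atomic coefficients, K = 3

ax = lambda t: np.exp(-1j * np.pi * np.arange(M) * np.sin(t))
ay = lambda p: np.exp(-1j * np.pi * np.arange(N) * np.sin(p))

# X = sum_k c_k a_x a_y^H, and the two Toeplitz matrices of the proof.
X = sum(ck * np.outer(ax(t), ay(p).conj()) for ck, t, p in zip(c, thetas, phis))
Tux = sum(ck * np.outer(ax(t), ax(t).conj()) for ck, t in zip(c, thetas))
Tuy = sum(ck * np.outer(ay(p), ay(p).conj()) for ck, p in zip(c, phis))

# The block matrix [[T(u_x), X], [X^H, T(u_y)]] is PSD with rank-K blocks.
B = np.block([[Tux, X], [X.conj().T, Tuy]])
print(np.min(np.linalg.eigvalsh(B)) > -1e-9)                   # True: PSD
print(np.linalg.matrix_rank(Tux), np.linalg.matrix_rank(Tuy))  # 3 3
```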

A Novel Sparse Surrogate for Rank Minimization.
The essential objective of the decoupled rank minimization problem (8) discussed above is to recover two low-rank matrices within the PSD feasible set. Unfortunately, this problem is as difficult to solve as the original optimization problem, so a new metric is urgently needed to express the objective function.
The main idea is to find a simpler surrogate function that not only effectively approximates the ranks of the two Toeplitz matrices but also encourages a low-rank solution. Most papers that discuss the rank minimization problem [16, 27, 31, 32] use the log-det function
\[
p_1(T(\mathbf{u})) = \ln\det\!\left(T(\mathbf{u}) + \epsilon_1 \mathbf{I}\right).
\]
In this paper, however, we consider a novel concave surrogate function for \(T(\mathbf{u}) \in \mathbb{C}^{Q \times Q}\), given by
\[
p_2(T(\mathbf{u})) = \mathrm{tr}\!\left(T(\mathbf{u})\left(T(\mathbf{u}) + \epsilon_2 \mathbf{I}\right)^{-1}\right),
\]
where \(\mathbf{I}\) represents the identity matrix. The regularization parameter \(\epsilon_2 > 0\) enables the function \(p_2(T(\mathbf{u}))\) to approximate \(\mathrm{rank}(T(\mathbf{u}))\) as it varies and keeps the function well defined when \(T(\mathbf{u})\) is rank-deficient. The parameter \(\epsilon_1\) does the same job for \(p_1(T(\mathbf{u}))\), preventing it from taking a value of \(-\infty\).
From the properties of matrix eigenvalues and the trace, it follows that \(\mathrm{eig}\left((T(\mathbf{u}) + \epsilon_2 \mathbf{I})^{-1}\right) = \{(\lambda_i + \epsilon_2)^{-1}\}_{i=1}^{Q}\) and
\[
p_2(T(\mathbf{u})) = \sum_{i=1}^{Q} \frac{\lambda_i}{\lambda_i + \epsilon_2} = \sum_{i=1}^{Q} \left[-\left(\frac{\lambda_i}{\epsilon_2} + 1\right)^{-1} + 1\right], \tag{17}
\]
where \(\{\lambda_i\}_{i=1}^{Q}\) is the set of eigenvalues of \(T(\mathbf{u})\). Equation (17) implies that the function \(p_2(T(\mathbf{u}))\) essentially operates on \(\{\lambda_i\}_{i=1}^{Q}\); thus, to better understand the rank-approximation property of \(p_2(T(\mathbf{u}))\) on a PSD definition domain, we denote a new subfunction by \(q_2(\lambda_i) = -(\lambda_i/\epsilon_2 + 1)^{-1} + 1\) for any eigenvalue \(\lambda_i \ge 0\), which is an inverse function obtained through a certain translation and scaling process and has the features of monotonicity and concavity. Note that the constant term 1 is added merely for consistency with the \(\ell_0\) norm; it amounts to adding a constant (an identity-matrix term inside the trace) and does not influence the outcome. Obviously, \(q_2(0) = 0\); for \(\lambda_i > 0\) near the zero point, \(q_2(\lambda_i)\) has an increasingly steep slope of \(1/\epsilon_2\) as \(\epsilon_2 \to 0\), which equivalently imposes a large penalty on near-zero \(\lambda_i\) and drives the values of \(q_2(\lambda_i)\) at these points much smaller. In addition, \(q_2(\lambda_i)\) tends towards a constant value of 1 for sufficiently large \(\lambda_i\). This behaviour shows that as \(\epsilon_2 \to 0\), \(q_2(\lambda_i)\) favours the separation of the \(\lambda_i\) at different levels, thus becoming a relaxation of the \(\ell_0\) norm \(\|\lambda_i\|_{\ell_0}\), which has an infinite slope near the zero point and a constant value of 1 except at the zero point. To provide an intuitive illustration, we plot \(q_2(\lambda_i)\) for several values of \(\epsilon_2\) at \(\lambda_i \ge 0\) in Figure 2.
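Identity (17) is easy to verify numerically; the following minimal sketch (with an illustrative rank-2 test matrix) evaluates both the matrix form and the eigenvalue form of \(p_2\) and shows that both approach the rank as \(\epsilon_2 \to 0\).

```python
# Verify p2(T(u)) = sum_i lam_i/(lam_i + eps) and its rank approximation.
import numpy as np

def p2(T, eps):
    Q = T.shape[0]
    return np.real(np.trace(T @ np.linalg.inv(T + eps * np.eye(Q))))

# Rank-2 PSD test matrix built from two steering vectors (hypothetical angles).
a = lambda t, Q: np.exp(-1j * np.pi * np.arange(Q) * np.sin(t))
T = np.outer(a(0.3, 8), a(0.3, 8).conj()) + 0.5 * np.outer(a(0.9, 8), a(0.9, 8).conj())

lam = np.linalg.eigvalsh(T)
for eps in (1.0, 1e-2, 1e-4):
    # Both expressions agree, and both tend to rank(T) = 2 as eps -> 0.
    print(p2(T, eps), np.sum(lam / (lam + eps)))
```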
Meanwhile, the curves of the \(\ell_0\) norm, the \(\ell_1\) norm, and the log-det subfunction \(q_1(\lambda_i) = \ln(\lambda_i/\epsilon_1 + 1)\) are provided for comparison. Suitable proportionality constants are applied so that \(q_2(\lambda_i)\) and \(q_1(\lambda_i)\) are zero at \(\lambda_i = 0\) and take the value 1 at \(\lambda_i = 1\) without modifying their properties. As shown in Figure 2, both \(q_2\) and \(q_1\) approach the \(\ell_0\) norm as the regularization parameters decrease; the smaller \(\epsilon_2\) is, the steeper the slope, which offers a basis for the selection of \(\epsilon_2\) during both initialization and iteration, as discussed in Section 3.3. Moreover, \(q_2\) can be expected to perform better because it approaches the \(\ell_0\) norm much faster; i.e., \(q_2\) is deemed more sparsity-enhancing.
The approximate relationship between \(q_2(\lambda_i)\) and \(\|\lambda_i\|_{\ell_0}\) explains why \(\mathrm{rank}(T(\mathbf{u}))\) can be relaxed by \(p_2(T(\mathbf{u}))\). In particular, since each larger \(\lambda_i\) corresponds to a value of \(q_2(\lambda_i)\) that tends towards 1 as \(\epsilon_2 \to 0\), and since summation preserves monotonicity and concavity, the sum of all \(q_2(\lambda_i)\) becomes a relaxation of the number of large eigenvalues, which is precisely the rank of the Hermitian matrix \(T(\mathbf{u})\), or the number of sources; i.e., the sparse surrogate \(p_2(T(\mathbf{u}))\) can be regarded as a relaxation of \(\mathrm{rank}(T(\mathbf{u}))\). Furthermore, the proposed function \(p_2(T(\mathbf{u}))\) encourages low-rank solutions more strongly than \(p_1(T(\mathbf{u}))\), as seen by comparing the curves for \(q_2\) with those for \(q_1\); stronger interpretations are presented in later subsections.

RDAM via the Majorization-Minimization (MM) Algorithm.
Using the surrogate function \(p_2(T(\mathbf{u}))\), the decoupled rank minimization problem (8) can be relaxed as
\[
\min_{\mathbf{u}_x, \mathbf{u}_y}\ p_2(T(\mathbf{u}_x)) + p_2(T(\mathbf{u}_y)) \quad \text{s.t.} \quad
\begin{bmatrix} T(\mathbf{u}_x) & \mathbf{X} \\ \mathbf{X}^{H} & T(\mathbf{u}_y) \end{bmatrix} \succeq 0, \tag{18}
\]
where the regularization parameters \(\epsilon_x\) and \(\epsilon_y\) of the two surrogate terms are deemed mutually independent. In this subsection, we aim to minimize the objective function in (18), denoted by \(f(T(\mathbf{u}_x), T(\mathbf{u}_y))\), on the PSD cone \(\mathcal{D}(\mathbf{u}_x, \mathbf{u}_y)\). Considering that \(f(T(\mathbf{u}_x), T(\mathbf{u}_y))\) is a sum of two concave functions, the MM algorithm [33, 34] can be adopted to obtain a local optimum. This algorithm follows an iterative procedure consisting of two steps, majorization and minimization, in each iteration. Given an initial point \(\{T(\mathbf{u}_{x,0}), T(\mathbf{u}_{y,0})\} \in \mathcal{D}(\mathbf{u}_x, \mathbf{u}_y)\), a series of feasible points \(\{T(\mathbf{u}_{x,j}), T(\mathbf{u}_{y,j})\}_{j \in \mathbb{N}}\) is generated. At each feasible point, the majorization step constructs a tractable function that majorizes \(f(T(\mathbf{u}_x), T(\mathbf{u}_y))\) and equals it at the current point \(\{T(\mathbf{u}_{x,j}), T(\mathbf{u}_{y,j})\}\). Here, we use the first-order Taylor expansion for this purpose. Since both terms of \(f(T(\mathbf{u}_x), T(\mathbf{u}_y))\) possess the same structure, we consider only the first term as an example, for which we have the first-order approximate expression
\[
p_2(T(\mathbf{u}_x)) \le \mathrm{tr}\!\left(\mathbf{W}_{x,j}\, T(\mathbf{u}_x)\right) + c_{x,j}, \qquad
\mathbf{W}_{x,j} = \epsilon_x \left(T(\mathbf{u}_{x,j}) + \epsilon_x \mathbf{I}\right)^{-2}, \tag{19}
\]
where \(c_{x,j}\) is a constant.
Then, in the minimization step, the following problem is solved, ignoring the constants \(c_{x,j}\) and \(c_{y,j}\), to update \(T(\mathbf{u}_{x,j+1})\) and \(T(\mathbf{u}_{y,j+1})\):
\[
\min_{\mathbf{u}_x, \mathbf{u}_y}\ \mathrm{tr}\!\left(\mathbf{W}_{x,j}\, T(\mathbf{u}_x)\right) + \mathrm{tr}\!\left(\mathbf{W}_{y,j}\, T(\mathbf{u}_y)\right) \quad \text{s.t.} \quad
\begin{bmatrix} T(\mathbf{u}_x) & \mathbf{X} \\ \mathbf{X}^{H} & T(\mathbf{u}_y) \end{bmatrix} \succeq 0, \tag{20}
\]
which is a semidefinite program (SDP) that can be solved in polynomial time. Notice that (20) is precisely a weighted DAM problem in the PSD form [20]. By the MM construction, \(\{f(T(\mathbf{u}_{x,j}), T(\mathbf{u}_{y,j}))\}_{j \in \mathbb{N}}\) is a nonincreasing sequence; i.e., the iterative process makes the objective function \(f(T(\mathbf{u}_x), T(\mathbf{u}_y))\) repeatedly smaller, and it can be expected to converge to a local minimum \(f(T(\mathbf{u}_x^{\star}), T(\mathbf{u}_y^{\star}))\), as proven in detail in [21, 33]. Furthermore, since the algorithm is only locally convergent, suitable initial values are indispensable for guaranteeing convergence to the desired local optimum. One immediate possibility is to use the DAM solution for initialization: let \(\mathbf{W}_{x,0}\) and \(\mathbf{W}_{y,0}\) be identity matrices; then the initial values \(T(\mathbf{u}_{x,1})\) and \(T(\mathbf{u}_{y,1})\) are obtained by solving a DAM problem. Next, let us emphasize the selection of \(\epsilon_{x,1}\) and \(\epsilon_{y,1}\), both of which influence the outcome. As argued in Section 3.2, \(f(T(\mathbf{u}_x), T(\mathbf{u}_y))\) is the sum of two families of subfunctions, \(q_2(\lambda_{x,i}) = -(\lambda_{x,i}/\epsilon_x + 1)^{-1}\) and \(q_2(\lambda_{y,i}) = -(\lambda_{y,i}/\epsilon_y + 1)^{-1}\), for the \(x\) and \(y\) dimensions, respectively. For consistency with the curves in Figure 2, we set \(\epsilon_x = \eta\,\lambda_{x,\max}\) and \(\epsilon_y = \rho\,\lambda_{y,\max}\). Then, both factors \(\eta\) and \(\rho\) can be viewed as equivalent to the parameter \(\epsilon_2\) in this figure, and we have \(\lambda_{x,i}/\lambda_{x,\max} \in [0, 1]\) and \(\lambda_{y,i}/\lambda_{y,\max} \in [0, 1]\). Now, consider the choice of \(\eta\) as an example. Let the larger eigenvalues, corresponding to the signals, be denoted by \(\lambda_{x,s}\), and the smaller eigenvalues, corresponding to the noise, by \(\lambda_{x,n}\); we should then ensure a relatively large difference in the values of \(q_2\) between the points \(\lambda_{x,s}/\lambda_{x,\max}\) and \(\lambda_{x,n}/\lambda_{x,\max}\) by tuning the initial factor \(\eta\). For example, if these points are close to each other, we can select a small \(\eta\), such as 0.1 or smaller; the resulting sharp slope then distinctly separates the subfunction values. In contrast, for relatively distant points, a large \(\eta\), such as 1, suffices to achieve good estimation performance. The same analysis applies to the initialization of the factor \(\rho\).
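The iteration can be sketched with an off-the-shelf convex solver. The following is a minimal, illustrative implementation of the weighted step (20) using CVXPY (assuming the SCS solver and the observation \(\mathbf{X}\) from the model sketch above); it is not the authors' code, and the parameter values are hypothetical.

```python
# One RDAM pass: repeated weighted DAM steps with quadratic weight updates.
import numpy as np
import cvxpy as cp

def rdam_step(X, Wx, Wy):
    M, N = X.shape
    Z = cp.Variable((M + N, M + N), hermitian=True)
    Tx, Ty = Z[:M, :M], Z[M:, M:]
    cons = [Z >> 0, Z[:M, M:] == X]   # [[T(u_x), X], [X^H, T(u_y)]] PSD
    # Enforce the Toeplitz structure on both diagonal blocks.
    cons += [Tx[i, j] == Tx[i + 1, j + 1]
             for i in range(M - 1) for j in range(M - 1)]
    cons += [Ty[i, j] == Ty[i + 1, j + 1]
             for i in range(N - 1) for j in range(N - 1)]
    obj = cp.Minimize(cp.real(cp.trace(Wx @ Tx)) + cp.real(cp.trace(Wy @ Ty)))
    cp.Problem(obj, cons).solve(solver=cp.SCS)
    return Tx.value, Ty.value

M, N, eps_x, eps_y = 10, 10, 0.1, 0.1
Wx, Wy = np.eye(M), np.eye(N)         # identity weights = plain DAM (j = 0)
for _ in range(4):                    # a few MM iterations usually suffice
    Tx, Ty = rdam_step(X, Wx, Wy)
    Ix, Iy = np.linalg.inv(Tx + eps_x * np.eye(M)), np.linalg.inv(Ty + eps_y * np.eye(N))
    Wx, Wy = eps_x * (Ix @ Ix), eps_y * (Iy @ Iy)   # quadratic weights of (19)
```

Angle retrieval from the recovered \(T(\mathbf{u}_x)\) and \(T(\mathbf{u}_y)\) (e.g., via the matrix pencil method) and the pairing step of [20] then complete the estimate.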
Finally, it is instructive to explore the implications of the RDAM algorithm in combination with the atomic norm [13, 14]. The concept of the 1D weighted atom set proposed by Yang et al. [16] can be extended to a new 2D weighted atom set \(\mathcal{B}_w\) as follows:
\[
\mathcal{B}_w = \left\{ w_x(\theta)\, w_y^{*}(\phi)\, \mathbf{a}_x\!\left(\theta, \tfrac{1}{2}\psi\right) \mathbf{a}_y^{H}\!\left(\phi, \tfrac{1}{2}\psi\right) \right\}, \tag{22}
\]
where \(\{w_x(\theta), w_y^{*}(\phi)\}\) denotes a pair of nonnegative weights on the 2D space angles. Accordingly, the weighted atomic norm is given by
\[
\|\mathbf{X}\|_{\mathcal{B}_w} = \inf_{\{c_k \ge 0\}} \left\{ \sum_k c_k : \mathbf{X} = \sum_k c_k\, w_x(\theta_k)\, w_y^{*}(\phi_k)\, \mathbf{a}_x\!\left(\theta_k, \tfrac{1}{2}\psi_k\right) \mathbf{a}_y^{H}\!\left(\phi_k, \tfrac{1}{2}\psi_k\right) \right\}, \tag{23}
\]
where
\[
w_x^{-2}(\theta) = \mathbf{a}_x^{H}(\theta)\, \mathbf{W}_x\, \mathbf{a}_x(\theta), \qquad
w_y^{-2}(\phi) = \mathbf{a}_y^{H}(\phi)\, \mathbf{W}_y\, \mathbf{a}_y(\phi). \tag{24}
\]
This implies that iteratively minimizing the weighted atomic norm \(\|\mathbf{X}\|_{\mathcal{B}_w}\) induced by the 2D weighted atom set \(\mathcal{B}_w\) serves the same purpose as the proposed RDAM algorithm; i.e., RDAM can also be called decoupled reweighted atomic norm minimization (DRAM). More significantly, our proposed trace function \(p_2(T(\mathbf{u}))\) gives rise to quadratic weighting factors, which greatly accelerates local convergence and further strengthens the preference for low-rank solutions in (18) compared with the first-order factors generated by the log-det function \(p_1(T(\mathbf{u}))\) [16].

Another Interpretation from the Perspective of Conventional Direction of Arrival (DOA) Estimation Methods.
The proposed RDAM algorithm can also be interpreted from the perspective of traditional DOA estimation methods. Suppose that the signals are independent of each other and that all steering vectors have been normalized. Then, we can define two decoupled covariance matrices for the noiseless observation data as follows:
\[
\mathbf{R}_x = \sum_{k=1}^{K} |s_k|^2\, \mathbf{a}_x(\theta_k)\mathbf{a}_x^{H}(\theta_k), \qquad
\mathbf{R}_y = \sum_{k=1}^{K} |s_k|^2\, \mathbf{a}_y(\phi_k)\mathbf{a}_y^{H}(\phi_k).
\]
From the proof of Proposition 1, we know that the Toeplitz matrices \(T(\mathbf{u}_x)\) and \(T(\mathbf{u}_y)\) are exactly equivalent to these two decoupled covariance matrices, except that the coefficients differ by a square. The whole RDAM process can therefore be viewed as recovering \(\mathbf{R}_x\) and \(\mathbf{R}_y\) with the regularizers \(\epsilon_x\) and \(\epsilon_y\). Furthermore, the two atomic weights in (24) are related to the Capon algorithm; in particular, \(w_x^{-2}(\theta)\) and \(w_y^{-2}(\phi)\) are exactly 1D modified cost functions of the Capon algorithm. In other words, we use the inverse of the revised power spectrum to weight the atoms, with the aim of penalizing directions that contain no sources. Moreover, the atomic weights produced by the log-det function \(p_1(T(\mathbf{u}))\) correspond to the conventional Capon cost function; this comparison, on the other hand, again suggests that our surrogate \(p_2(T(\mathbf{u}))\) performs better.
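The Capon-style reading can be visualized directly: evaluating the penalty \(w_x^{-2}(\theta) = \mathbf{a}_x^{H}(\theta)\mathbf{W}_x\mathbf{a}_x(\theta)\) over a fine angle grid gives a spectrum whose minima indicate source directions. A minimal sketch follows, assuming `Tx` and `eps_x` from the RDAM iteration above; all names are illustrative.

```python
# Sketch: the reciprocal squared weight acts as a modified Capon cost, so
# directions holding sources receive small penalties.
import numpy as np

def weight_spectrum(Tx, eps_x, M, grid):
    Winv = np.linalg.inv(Tx + eps_x * np.eye(M))
    Wx = eps_x * (Winv @ Winv)                  # quadratic weighting matrix
    pen = []
    for theta in grid:
        a = np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))
        pen.append(np.real(a.conj() @ Wx @ a))  # w_x^{-2}(theta)
    return np.asarray(pen)

grid = np.deg2rad(np.linspace(-90, 90, 721))
penalty = weight_spectrum(Tx, 0.1, 10, grid)
# Minima of `penalty` (peaks of 1/penalty) mark the estimated x-angles.
```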
Remarkably, the advantages of our algorithm over conventional subspace methods such as MUSIC and ESPRIT extend beyond 2D DOA estimation accuracy. The 2D MUSIC method not only requires prior information on the number of sources but also incurs a heavy computational load for the 2D spectrum search; by contrast, the proposed algorithm directly estimates continuous-valued angles through two Vandermonde decompositions and merely needs prior knowledge of the noise level (and even this is unnecessary, as shown by a later analysis). Meanwhile, the 2D ESPRIT method is suitable only for arrays with the property of so-called geometric invariance, whereas the proposed algorithm can be applied to a sparse rectangular array (SRA), with missing array outputs or only a few elements of a URA, and still achieves good estimation performance under certain conditions [12, 15, 19, 35]. In addition, far fewer snapshots than the array size are required for exact estimation, which is not true of the subspace methods.

Further Analysis
Superresolution.

The DAM algorithm requires the following angular interval conditions [20] to guarantee that the directions can be estimated from the observed data \(\mathbf{X}\) with probability \(1-\delta\), where \(\delta\) is the estimation precision:
\[
\Delta_x \ge \frac{1}{\lfloor (M-1)/4 \rfloor}, \qquad \Delta_y \ge \frac{1}{\lfloor (N-1)/4 \rfloor}.
\]
Here, \(\Delta_x\) and \(\Delta_y\) are the minimum wrap-around distances on the complex unit circle. These conditions mean that the resolutions in \(\theta\) and \(\phi\) are inversely proportional to the numbers of sensors per row and per column, respectively. By contrast, the proposed algorithm (RDAM) imposes no such restriction on the angular intervals. As the analysis in Section 3 shows, the quadratic inverse of the solution from the previous iteration, namely \(\mathbf{W}_{x,j} = \epsilon_x (T(\mathbf{u}_{x,j}) + \epsilon_x \mathbf{I})^{-2}\) and \(\mathbf{W}_{y,j} = \epsilon_y (T(\mathbf{u}_{y,j}) + \epsilon_y \mathbf{I})^{-2}\), supplies the weighting factors for the objective function (20) in the current iteration; thus, large weights are applied to small eigenvalues, while small weights are applied to large eigenvalues. Minimizing (20) is therefore equivalent to penalizing small eigenvalues, making them much smaller while leaving the large ones relatively larger, so that the eigenvalues corresponding to signals become far easier to pick out.
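A toy computation makes this concrete: the weights applied to each eigenvalue are \(\epsilon/(\lambda_i + \epsilon)^2\), the eigenvalues of \(\epsilon(T + \epsilon\mathbf{I})^{-2}\). The eigenvalues below are hypothetical.

```python
# Small (noise) eigenvalues receive large penalties; large (signal)
# eigenvalues receive nearly none, so the gap widens with each iteration.
import numpy as np

eps = 0.1
lam = np.array([8.0, 5.0, 0.05, 0.02])   # hypothetical eigenvalues
weights = eps / (lam + eps) ** 2
print(weights)   # ~[0.0015, 0.0038, 4.44, 6.94]
```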
In other words, the iterative nature of the algorithm allows it to overcome the minimal-angular-interval limits, and hence the resolution is greatly improved. The simulations presented in the next section likewise indicate that, given a proper initialization, our algorithm yields a very high resolution.

Computational Complexity.
Another merit of RDAM is its low complexity. As mentioned in [18], the vector-based atomic norm minimization (vector-ANM) method has a computational complexity as high as \(\mathcal{O}((MN)^{3.5}\log(1/\epsilon))\) due to its PSD constraint of size \((MN+1) \times (MN+1)\). In the DAM method [20], by contrast, the two-level Toeplitz matrix is divided into two one-level Toeplitz matrices, reducing the complexity to \(\mathcal{O}((M+N)^{3.5}\log(1/\epsilon))\). Fortunately, our proposed RDAM algorithm has the same per-iteration complexity as DAM, so its computational efficiency depends merely on the number of iterations. Since such a reweighted iterative algorithm can converge within a few steps [21], the computational cost of RDAM is no greater than \(J \cdot \mathcal{O}((M+N)^{3.5}\log(1/\epsilon))\), where \(J\) denotes the number of iterations; i.e., the complexity is comparable to that of DAM but far below the heavy load of vector-ANM. An empirical run-time comparison is also provided based on the simulations reported below.
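For a concrete sense of the gap, take the simulation size \(M = N = 10\) used below:
\[
(MN)^{3.5} = 100^{3.5} = 10^{7}, \qquad (M+N)^{3.5} = 20^{3.5} \approx 3.6 \times 10^{4},
\]
so a single decoupled iteration is roughly \(280\times\) cheaper than vector-ANM; even with \(J = 5\) reweighted iterations, RDAM retains a speedup of about \(56\times\).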

The Noisy Case.
In fact, DOA estimation is typically implemented in a noisy environment; we therefore consider the following observation model with additive noise \(\mathbf{N}\), whose entries are independently drawn from \(\mathcal{N}(0, \sigma^2)\):
\[
\mathbf{Y} = \mathbf{X} + \mathbf{N}.
\]
Then, an iterative denoising optimization problem is obtained as follows:
\[
\min_{\mathbf{u}_x, \mathbf{u}_y, \mathbf{X}}\ \tau \left[ \mathrm{tr}\!\left(\mathbf{W}_{x,j} T(\mathbf{u}_x)\right) + \mathrm{tr}\!\left(\mathbf{W}_{y,j} T(\mathbf{u}_y)\right) \right] + \frac{1}{2}\|\mathbf{Y} - \mathbf{X}\|_F^2 \quad \text{s.t.} \quad
\begin{bmatrix} T(\mathbf{u}_x) & \mathbf{X} \\ \mathbf{X}^{H} & T(\mathbf{u}_y) \end{bmatrix} \succeq 0, \tag{32}
\]
where \(\tau\) is a regularization factor that keeps a balance between data fitting and a low-rank solution. When \(\tau \ge \mathbb{E}\|\mathbf{N}\|_2\), where \(\|\cdot\|_2\) represents the spectral norm, the soft-threshold denoising algorithm has the asymptotic mean square error (MSE) bound \(\frac{1}{MN}\mathbb{E}\|\hat{\mathbf{X}} - \mathbf{X}\|_F^2 \le \frac{\tau}{MN}\|\mathbf{X}\|_{\mathcal{B}}\) [36]. In this paper, we directly optimize the dual form [37] of (32):
\[
\min_{\mathbf{u}_x, \mathbf{u}_y, \mathbf{X}}\ \mathrm{tr}\!\left(\mathbf{W}_{x,j} T(\mathbf{u}_x)\right) + \mathrm{tr}\!\left(\mathbf{W}_{y,j} T(\mathbf{u}_y)\right) \quad \text{s.t.} \quad
\begin{bmatrix} T(\mathbf{u}_x) & \mathbf{X} \\ \mathbf{X}^{H} & T(\mathbf{u}_y) \end{bmatrix} \succeq 0, \quad \|\mathbf{Y} - \mathbf{X}\|_F \le \eta, \tag{33}
\]
where \(\eta \ge \|\mathbf{Y} - \mathbf{X}\|_F\). In practice, we can set an initial value for \(\eta\) and then gradually decrease it by a suitable small factor in each iteration; as shown by the simulations reported below, this approach results in better performance and enables faster convergence.

Simulation Results

In the simulations, we let \(K = 3\) and \(M = N = 10\), and consider randomly selected signal directions \((34.35^\circ, 67.03^\circ)\), \((38.09^\circ, 76.04^\circ)\), and \((44.12^\circ, 70.56^\circ)\). Initial factors of 0.01 and 0.1 are selected for the functions \(p_1(T(\mathbf{u}))\) and \(p_2(T(\mathbf{u}))\), respectively; under these settings, the subfunctions have similar approximation properties, as seen from Figure 2. The iterative processes using the two surrogate functions are shown in Figure 5, where the first two subplots in both (a) and (b) present the evolution of the eigenvalues of \(T(\mathbf{u}_x)\) and \(T(\mathbf{u}_y)\), respectively, and the third subplot shows the estimation performance in each iteration. The previous function \(p_1(T(\mathbf{u}))\) clearly requires four iterations, while the proposed function \(p_2(T(\mathbf{u}))\) needs only three, indicating a faster estimation speed. Finally, we carry out 200 Monte Carlo experiments with very closely spaced sources to illustrate the antinoise performance of the proposed algorithm relative to the Cramér-Rao bound (CRB). The array response with i.i.d. additive Gaussian noise drawn from \(\mathcal{N}(0, \sigma^2)\) was generated; the signal-to-noise ratio (SNR) was \(10\log_{10}(\sigma^{-2})\), since the powers of all signals were set to a constant value of 1. As before, we set \(M = N = 10\), and the directions of the two sources were \((10.10^\circ, 40.01^\circ)\) and \((13.56^\circ, 43.12^\circ)\).
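A minimal sketch of the noisy weighted step (33) with a shrinking \(\eta\) follows; the shrink factor 0.8 matches the experiment above, while the noise level and other values are illustrative (`X` is the noiseless observation from the earlier sketch, not the authors' code).

```python
# Noisy RDAM: constrained data fit ||Y - X||_F <= eta, eta shrunk per iteration.
import numpy as np
import cvxpy as cp

def rdam_denoise_step(Y, Wx, Wy, eta):
    M, N = Y.shape
    Z = cp.Variable((M + N, M + N), hermitian=True)
    Tx, Ty, Xv = Z[:M, :M], Z[M:, M:], Z[:M, M:]
    cons = [Z >> 0, cp.norm(Y - Xv, 'fro') <= eta]
    cons += [Tx[i, j] == Tx[i + 1, j + 1]
             for i in range(M - 1) for j in range(M - 1)]
    cons += [Ty[i, j] == Ty[i + 1, j + 1]
             for i in range(N - 1) for j in range(N - 1)]
    obj = cp.Minimize(cp.real(cp.trace(Wx @ Tx)) + cp.real(cp.trace(Wy @ Ty)))
    cp.Problem(obj, cons).solve(solver=cp.SCS)
    return Tx.value, Ty.value, Xv.value

sigma, M, N = 0.3, 10, 10
Y = X + sigma * (np.random.randn(M, N) + 1j * np.random.randn(M, N)) / np.sqrt(2)
eta, eps = np.sqrt(M * N) * sigma, 0.1    # initial noise level ~ E||N||_F
Wx, Wy = np.eye(M), np.eye(N)
for _ in range(4):
    Tx, Ty, Xhat = rdam_denoise_step(Y, Wx, Wy, eta)
    Ix = np.linalg.inv(Tx + eps * np.eye(M))
    Iy = np.linalg.inv(Ty + eps * np.eye(N))
    Wx, Wy = eps * (Ix @ Ix), eps * (Iy @ Iy)
    eta *= 0.8                            # gradually tighten the data fit
```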

Figure 6 shows the average root mean square error (RMSE) versus the SNR, with the factors set to \(\eta = \rho = 1\) and the initial noise level set to \(\sqrt{MN\sigma^2}\). It is worth noting that the noise level was reduced by a factor of 0.8 in the first weighted iteration for both our RDAM algorithm and the vector-RAM algorithm. The performance of the proposed algorithm is close to the CRB at higher SNRs, although its antinoise performance degrades below SNR = 15 dB; the main reason for this degradation is that, at lower SNRs, the probability of efficient estimation decreases, so a small number of badly deteriorated estimates is obtained, which drags down the curve. Nevertheless, our algorithm still shows more robust performance than the others in the presence of noise.

Conclusion
In this paper, we proposed a superresolution reweighted decoupled atomic norm minimization (RDAM) algorithm for 2D DOA estimation based on the popular theory of continuous compressed sensing (CCS). By directly employing the atomic \(\ell_0\) norm and converting the optimization problem into a decoupled rank minimization problem, we greatly reduced the computational complexity. Moreover, we presented a novel surrogate function and formulated a new concave optimization problem, overcoming the shortcomings of existing methods in terms of resolution while also accelerating convergence. The algorithm is even applicable to sparse rectangular arrays (SRAs), although better denoising processing is still needed, which remains to be studied in future work.

Figure 2 :
Figure 2: An illustration of the sparsity-enhancing properties of \(q_1\) and the proposed \(q_2\), compared with the \(\ell_0\) norm and the \(\ell_1\) norm.