A Greedy Multistage Convex Relaxation Algorithm Applied to Structured Group Sparse Reconstruction Problems Based on Iterative Support Detection

$\ell_1$-regularized models, as expected. In this paper we make further use of the prior grouping information, as well as possibly other prior information, by considering a weighted $\ell_{2,1}$ model. Specifically, we propose a multistage convex relaxation procedure that alternates between estimating the weights and solving the resulting weighted problem. The weight estimation step makes better use of the prior grouping information and is implemented based on iterative support detection (Wang and Yin, 2010). Comprehensive numerical experiments show that our approach brings significant recovery enhancements compared with the plain $\ell_{2,1}$ model, solved via the alternating direction method (ADM) (Deng et al., 2013), in both noiseless and noisy environments.


Group Sparse Reconstruction and Related Work.
Finding sparse solutions of underdetermined linear systems has become an active research topic in recent years in fields such as compressive sensing (CS), signal processing, statistics, and machine learning [1], with applications including multiple kernel learning [2], microarray data analysis [3], and channel estimation in doubly dispersive multicarrier systems [4]. In machine learning, high dimensionality makes it difficult to build interpretable models with high prediction accuracy, and sparsity-related regularization techniques are commonly used to obtain more stable and interpretable models. In compressive sensing, sparsity regularization allows high dimensional data to be reconstructed from only a small number of samples. However, recent studies encourage going beyond sparsity to further enhance recoverability by taking into account additional information about the underlying structure of the solutions. As an important case, many solutions are known to have a group sparsity structure: their components have a natural grouping, and the components within a group are likely to be either all zero or all nonzero. Encoding the group sparsity structure therefore reduces the degrees of freedom in the solution and leads to better recovery performance.
In this paper, we focus on group sparsity and the corresponding commonly used $\ell_{2,1}$-regularized model, and we try to extend it for better recovery performance. Assume that $x \in \mathbb{R}^n$ denotes an unknown group sparse solution. Let $\{x_{g_i} \in \mathbb{R}^{n_i} : i = 1, 2, \ldots, s\}$ be the grouping of $x$, where $g_i \subseteq \{1, 2, \ldots, n\}$ is the index set corresponding to the $i$th group and $x_{g_i}$ denotes the subvector of $x$ indexed by $g_i$; the grouping is generally predefined based on prior information about the underlying solution. The mixed $\ell_{2,1}$ norm is defined as follows:
$$\|x\|_{2,1} := \sum_{i=1}^{s} \|x_{g_i}\|_2. \quad (1)$$

Mathematical Problems in Engineering
Compared with the classical use of $\ell_1$-regularization for sparse reconstruction, $\ell_{2,1}$-regularization takes the grouping information into consideration and promotes group sparsity. The resulting problem is convex, and the mixed $\ell_{2,1}$ optimization problem based on (1) is commonly solved with efficient first-order algorithms proposed in the literature, for example, the spectral projected gradient method (SPGL1) [5], the accelerated gradient method (SLEP) [6], block-coordinate descent algorithms [7], and SpaRSA [8].
Among them, the authors of [9] proposed the alternating direction method (ADM) to solve the primal and dual formulations of $\ell_{2,1}$-regularized optimization problems. Their preliminary numerical results show that the ADM algorithms are fast, stable, and robust, outperforming the previously known state-of-the-art algorithms.
1.2. Weighted $\ell_{2,1}$-Norm Regularized Group Sparse Reconstruction Problem. In this paper, for better reconstruction quality, instead of the $\ell_{2,1}$-norm we consider a more general formulation, that is, the weighted $\ell_{2,1}$ (or $\ell_{w,2,1}$)-norm [9], defined as follows:
$$\|x\|_{w,2,1} := \sum_{i=1}^{s} w_i \|x_{g_i}\|_2, \quad (2)$$
where $w_i > 0$ ($i = 1, 2, \ldots, s$) are the weights associated with each group, forming a weighting vector $w = [w_1, w_2, \ldots, w_s]$. We assume that the groups $\{x_{g_i} : i = 1, 2, \ldots, s\}$ form a partition of $x$ unless otherwise specified. We emphasize that the weighted $\ell_{2,1}$-norm can also be extended to more general group configurations, for example, overlapping and/or incomplete cover group sparsity. As a nonconvex model, (2) is well known to behave better than the unweighted counterpart (1) when the weights are set appropriately. The key is how to determine the weights, and this constitutes our main contribution, as summarized in Section 1.3.
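To make the definitions of the plain and weighted mixed norms concrete, the following minimal Python sketch evaluates them on a small vector. The function name `weighted_l21_norm`, the 0-based index lists, and the sample data are illustrative assumptions, not part of the original text.

```python
import math

def weighted_l21_norm(x, groups, weights=None):
    """Return sum_i w_i * ||x[g_i]||_2 for the index lists in `groups`.

    With weights=None every w_i is 1, which gives the plain l_{2,1} norm.
    """
    if weights is None:
        weights = [1.0] * len(groups)
    total = 0.0
    for g, w in zip(groups, weights):
        total += w * math.sqrt(sum(x[j] ** 2 for j in g))
    return total

# Hypothetical example: three groups of two entries each.
x = [3.0, 4.0, 0.0, 0.0, 1.0, 0.0]
groups = [[0, 1], [2, 3], [4, 5]]

plain = weighted_l21_norm(x, groups)                        # 5 + 0 + 1
reweighted = weighted_l21_norm(x, groups, [0.0, 1.0, 1.0])  # 0*5 + 0 + 1
```

Setting the weight of the first group to 0 removes that group from the penalty, which is how a detected group would stop being penalized under a 0-1 weighting.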
We will consider several weighted models in this paper. The following basis pursuit (BP) model is considered when the measurement vector does not contain noise:
$$\min_x \|x\|_{w,2,1} \quad \text{s.t.} \quad Ax = b, \quad (3)$$
where $A \in \mathbb{R}^{m \times n}$ ($m < n$) and $b \in \mathbb{R}^m$. Without loss of generality, we suppose $A$ has full rank. The basis pursuit denoising (BPDN) models are commonly employed when the measurement vector contains noise, including the constrained form
$$\min_x \|x\|_{w,2,1} \quad \text{s.t.} \quad \|Ax - b\|_2 \le \delta, \quad (4)$$
and the unconstrained form
$$\min_x \|x\|_{w,2,1} + \frac{1}{2\mu}\|Ax - b\|_2^2, \quad (5)$$
where $\delta \ge 0$ and $\mu > 0$ are parameters. It should be noted that the constrained form is equivalent to the unconstrained form from the viewpoint of optimization theory when the parameters $\delta$ and $\mu$ are properly selected.
In this paper, we will focus on the basis pursuit model, and the extension of our proposed algorithms for the basis pursuit denoising models (4) and (5) follows similarly. Furthermore, it should be pointed out that the basis pursuit model (3) can also be good for reconstructing noisy data if the iterations are stopped properly prior to convergence based on the noise level.

Contributions.
In this paper, we propose an effective way to determine the weights of the above weighted models by extending the iterative support detection (ISD) proposed in [10] from sparsity to group sparsity, which is a special case of structured sparsity [11][12][13]. In other words, for the weighted $\ell_{2,1}$ reconstruction problem in this paper, we obtain the final result via a multistage convex relaxation process [14] based on ISD. This process consists of solving a series of weighted $\ell_{2,1}$ (also referred to as $\ell_{w,2,1}$) convex optimization problems, where the weights are estimated via support detection applied to the reconstructed signal of the previous stage. The solution of the multistage weighted $\ell_{2,1}$ model is usually better than that of the traditional model with uniform weights $w = \vec{1}$ (denoted $\ell_{\vec{1},2,1}$) in terms of relative error and reconstruction quality, from both the theoretical and practical points of view, and numerical results demonstrate that in our cases the proposed algorithm can recover a satisfactory result from a failed reconstruction of the $\ell_{\vec{1},2,1}$ model. In addition, we empirically demonstrate that, in the case of group sparsity, the fast decaying property previously required for plain sparse signal recovery with the threshold-ISD proposed in [10] is no longer necessary, and we give some intuitive explanation.

1.4. Organization. The rest of this paper is organized as follows. In Section 2, we present our algorithmic framework. In Section 3, we provide comprehensive numerical experiments to evaluate the performance of our proposed algorithm for group sparse signal reconstruction and compare it with the ADM approach proposed in [9]. We end this paper with some conclusions and a discussion of possible future work.

Algorithmic Framework
In this section, without loss of generality, we suppose that the grouping is a partition of the solution, predefined as prior knowledge. The framework can be easily extended to general group configurations, for example, overlapping and incomplete cover grouping.
The main difficulty of setting the weights is that we do not usually know the true solution and that even when we have some knowledge about the true solution, we still need to find a proper way to make use of it to help obtain a better solution. The iterative support detection is an effective way to deal with the above difficulty and we will generalize this idea to the cases of group sparsity. So we first review the iterative support detection.

Revisiting and Some New Thoughts of Iterative Support Detection. Iterative support detection was first proposed in our early work [10], and the idea of exploiting partial support detection appears in several subsequent works, for example, [15][16][17][18][19][20]. We first briefly review iterative support detection in compressive sensing in terms of single sparse signal reconstruction [10]. In addition, we give some novel analysis of the advantages of the 0-1 weighting scheme adopted in ISD compared with other weighting alternatives [21,22]. Compressive sensing (CS) [23,24] reconstructs a sparse signal from a small set of linear projections. Let $\bar{x}$ denote a $k$-sparse signal, let $A \in \mathbb{R}^{m \times n}$ be the measurement matrix, and let $b = A\bar{x}$ represent the linear projections of $\bar{x}$. The general optimization model is the basis pursuit (BP) problem:
$$\min_x \|x\|_1 \quad \text{s.t.} \quad Ax = b. \quad (6)$$
ISD alternately calls its two components: support detection and signal reconstruction. Support detection identifies an index set $I$ from an inexact reconstruction, which contains some elements of $\mathrm{supp}(\bar{x}) = \{i : \bar{x}_i \ne 0\}$. After acquiring the support detection, a resulting truncated BP problem is considered:
$$\min_x \|x_T\|_1 \quad \text{s.t.} \quad Ax = b, \quad (7)$$
where $T = I^C$ and $\|x_T\|_1 = \sum_{i \notin I} |x_i|$. If the support detection satisfies $I = \mathrm{supp}(\bar{x})$, then the solution of (7) is, of course, equal to the true $\bar{x}$. But we should point out that even if $I$ contains enough, not necessarily all, entries of $\mathrm{supp}(\bar{x})$, a better solution can still be expected. When $I$ does contain enough of $\mathrm{supp}(\bar{x})$, those entries of $\mathrm{supp}(\bar{x})$ in $I$ help (7) return a better solution than (6), and from this better solution, support detection will be able to identify more entries of $\mathrm{supp}(\bar{x})$ and then yield an even better $I$. In this way, the two components of ISD work together to gradually recover $\mathrm{supp}(\bar{x})$ and improve the reconstruction performance. It is clear that ISD is a multistage procedure.
ISD requires reliable support detection from an inexact reconstruction, which can be obtained by taking advantage of the features and prior information of the true signal [10,25]. For example, for sparse or compressible signals whose nonzero components have a fast decaying distribution, as considered in [10], one can perform the support detection by thresholding the solution of (7); the corresponding ISD implementation is denoted as threshold-ISD.
In this paper, we present some further analysis of ISD. ISD adopts a specific 0-1 weighting scheme, which has several advantages. First, its advantage over the single stage pure $\ell_1$ model has been proved rigorously in [10], provided that correct partial support information can be detected (by thresholding, e.g., in this paper), while most of the related alternatives, such as the reweighted $\ell_1$ algorithm [21], do not have such rigorous theoretical guarantees. Secondly, for the reweighted $\ell_1$ algorithm, the weights are usually set as $w_i = 1/(|x_i| + \epsilon)$. In [10], we pointed out that the tuning parameter $\epsilon > 0$ is the key parameter and should be determined carefully. Roughly, $\epsilon$ should not be fixed but should decrease from a large value to a small value. In an extreme case, if $\epsilon$ is always fixed at 0, then we would not get a better solution at the next stage, even if there were no numerical trouble of dividing by 0, because we would passively use all the information of the current solution without filtering out the inaccurate part. From our analysis, the choice of $\epsilon$ controls the extraction of useful information and the suppression of the distortion caused by recovery noise. The 0-1 weights of ISD are a more explicit and straightforward way to express the idea of using the correct information (mainly the locations of the components of large magnitude, whose weights are set to 0) and giving up the rest of the overly noisy information (components of small magnitude are mostly overwhelmed by the recovery noise, so there is little point in setting different weights according to their magnitudes, and it is more reasonable to set the same weight 1 for all of them).
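The contrast between the two weighting schemes discussed above can be illustrated with a short Python sketch. The function names, the sample vector, and the particular values of the smoothing parameter and the threshold are hypothetical choices made only for illustration.

```python
def reweighted_l1_weights(x, eps):
    """Weights of the reweighted l_1 scheme: 1 / (|x_i| + eps)."""
    return [1.0 / (abs(v) + eps) for v in x]

def zero_one_weights(x, threshold):
    """0-1 weights in the spirit of ISD: 0 for entries detected as support
    (magnitude above the threshold), 1 for all remaining entries."""
    return [0.0 if abs(v) > threshold else 1.0 for v in x]

# Hypothetical intermediate reconstruction: two large entries, two noisy ones.
x = [2.0, 0.05, -1.5, 0.01]

rw = reweighted_l1_weights(x, eps=0.1)   # weights vary with every magnitude
zo = zero_one_weights(x, threshold=0.5)  # small entries share the same weight
```

The reweighted scheme assigns different weights to the two small entries even though both may be dominated by recovery noise, whereas the 0-1 scheme treats them identically and only removes the penalty on the entries that are confidently large.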
We need to point out that the estimation of w based on the support detection is often advantageous. The implementations of support detection can be more flexible and different for specific signals, in order to make use of their different underlying structure, for example, grouping structure, tree structure, or even graph structure.

ISD Extended to the Weighted $\ell_{2,1}$ Model. The main effort of this paper is to extend the idea of ISD from plain sparse vector recovery to group sparsity and to demonstrate the advantages of thresholding-based support detection in this setting compared with the plain sparsity setting.
As with the original ISD, our extension is in general an alternating optimization procedure that decouples the nonlinear combination of $w$ and $x$ by repeatedly applying the following two steps.
Step 1. First we optimize $x$ with $w$ fixed (initially $\vec{1}$): this is a convex problem in $x$.
Step 2. Second we determine the value of $w$ according to the currently reconstructed $x$. The value of $w$ will be used in Step 1 of the next iteration. The plain $\ell_{\vec{1},2,1}$ model, as a single stage process, is generally solved once, and its solution is taken as the final result. For the weighted $\ell_{2,1}$ model, we obtain the final result from a multistage process by solving a series of weighted $\ell_{2,1}$ problems. At each stage, the adaptive weights change according to the newly reconstructed signal. The full procedure and the details of Steps 1 and 2 are presented in the following sections.

2.3. Step 1: Solving the Weighted $\ell_{2,1}$ Model Given Weights. Assuming the weights are given, [9] proposed an approach for solving the weighted $\ell_{2,1}$ problem (3) based on the variable splitting technique and the alternating direction method (ADM, for short) [26][27][28][29][30][31][32]. However, how to select proper weights was not discussed there, and uniform weights, that is, the plain $\ell_{\vec{1},2,1}$ model, were used in their numerical experiments. Here, we briefly review their approach for the nonoverlapping case. In the numerical experiments, overlapping group sparsity [33] will also be considered.
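Introducing an auxiliary variable $z$, problem (3) can be written in the equivalent split form
$$\min_{x,z} \|z\|_{w,2,1} \quad \text{s.t.} \quad z = x, \; Ax = b. \quad (8)$$
The corresponding augmented Lagrangian function of (8) is defined as
$$\mathcal{L}_A(x, z, \lambda_1, \lambda_2) = \|z\|_{w,2,1} - \lambda_1^T (z - x) + \frac{\beta_1}{2}\|z - x\|_2^2 - \lambda_2^T (Ax - b) + \frac{\beta_2}{2}\|Ax - b\|_2^2,$$
where $\lambda_1 \in \mathbb{R}^n$, $\lambda_2 \in \mathbb{R}^m$ are multipliers and $\beta_1, \beta_2 > 0$ are penalty parameters. If we start at $x = x^k$ and $(\lambda_1^k, \lambda_2^k)$, the iterative framework of the augmented Lagrangian problem has the following form:
$$z^{k+1} = \arg\min_z \mathcal{L}_A(x^k, z, \lambda_1^k, \lambda_2^k), \quad x^{k+1} = \arg\min_x \mathcal{L}_A(x, z^{k+1}, \lambda_1^k, \lambda_2^k),$$
$$\lambda_1^{k+1} = \lambda_1^k - \beta_1 (z^{k+1} - x^{k+1}), \quad \lambda_2^{k+1} = \lambda_2^k - \beta_2 (Ax^{k+1} - b).$$
The $x$-subproblem of this iterative framework, namely, the minimization of $\mathcal{L}_A(x, z^{k+1}, \lambda_1^k, \lambda_2^k)$ with respect to $x$, is
$$\min_x \; (\lambda_1^k)^T x + \frac{\beta_1}{2}\|z^{k+1} - x\|_2^2 - (\lambda_2^k)^T A x + \frac{\beta_2}{2}\|Ax - b\|_2^2,$$
and it can be transformed into an equivalent convex quadratic problem:
$$\min_x \; \frac{1}{2} x^T (\beta_1 I + \beta_2 A^T A) x - (\beta_1 z^{k+1} - \lambda_1^k + \beta_2 A^T b + A^T \lambda_2^k)^T x.$$
Based on the optimality condition, it reduces to solving the linear system
$$(\beta_1 I + \beta_2 A^T A) x^{k+1} = \beta_1 z^{k+1} - \lambda_1^k + \beta_2 A^T b + A^T \lambda_2^k. \quad (13)$$
Similarly, minimizing with respect to $z$ in the iterative framework has the following formula:
$$\min_z \; \|z\|_{w,2,1} - (\lambda_1^k)^T z + \frac{\beta_1}{2}\|z - x^k\|_2^2, \quad (14)$$
whose solution is given group-wise by the shrinkage (soft-thresholding) formula
$$z_{g_i}^{k+1} = \max\Big\{\|r_i^k\|_2 - \frac{w_i}{\beta_1}, 0\Big\} \frac{r_i^k}{\|r_i^k\|_2}, \quad r_i^k = x_{g_i}^k + \frac{1}{\beta_1}(\lambda_1^k)_{g_i}, \quad i = 1, 2, \ldots, s,$$
with the convention $0 \cdot (0/0) = 0$.

Algorithm 1: Primal-based ADM for group sparsity [9].
While stopping criterion is not met Do
(1) Update $z$: Compute $z^{k+1}$ according to (14) for given $(x^k, \lambda_1^k, \beta_1)$.
(2) Update $x$: Compute $x^{k+1}$ by solving (13).
(3) Update $\lambda_1$ and $\lambda_2$: Update $\lambda_1^{k+1}$ and $\lambda_2^{k+1}$ via (17).
(4) $k = k + 1$.
end while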
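The update of the auxiliary variable in Step (1) of Algorithm 1 is a group-wise shrinkage. The sketch below shows this operation under the standard closed form for a subproblem of the type "weighted group norm plus a quadratic proximity term"; the function name `group_shrink`, the argument `r` (standing for the point being shrunk, which in the primal ADM would combine the current iterate and the scaled multiplier), and the sample numbers are assumptions made for illustration.

```python
import math

def group_shrink(r, groups, weights, beta1):
    """Group-wise soft-thresholding.

    For each group g_i: z[g_i] = max(||r[g_i]||_2 - w_i / beta1, 0) * r[g_i] / ||r[g_i]||_2,
    with z[g_i] = 0 when r[g_i] is the zero vector.
    """
    z = [0.0] * len(r)
    for g, w in zip(groups, weights):
        norm = math.sqrt(sum(r[j] ** 2 for j in g))
        if norm == 0.0:
            continue  # zero group stays zero
        scale = max(norm - w / beta1, 0.0) / norm
        for j in g:
            z[j] = scale * r[j]
    return z

# Hypothetical data: one strong group, one weak group, one empty group.
r = [3.0, 4.0, 0.3, 0.4, 0.0, 0.0]
groups = [[0, 1], [2, 3], [4, 5]]

z_uniform = group_shrink(r, groups, [1.0, 1.0, 1.0], beta1=1.0)
z_zero_one = group_shrink(r, groups, [0.0, 1.0, 1.0], beta1=1.0)
```

With uniform weights the strong group is shrunk from norm 5 to norm 4 and the weak group is set to zero; with weight 0 on the first group, that group passes through unchanged.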
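Finally, we update the multipliers $(\lambda_1^{k+1}, \lambda_2^{k+1})$. Note that step lengths can be incorporated into the update of $(\lambda_1^{k+1}, \lambda_2^{k+1})$; that is,
$$\lambda_1^{k+1} = \lambda_1^k - \gamma_1 \beta_1 (z^{k+1} - x^{k+1}), \quad \lambda_2^{k+1} = \lambda_2^k - \gamma_2 \beta_2 (Ax^{k+1} - b), \quad (17)$$
where $\gamma_1, \gamma_2 > 0$ are step lengths. Under certain assumptions, convergence of the ADM framework with such step lengths has been established in the literature.

2.3.2. Applying ADM to the Dual Weighted $\ell_{2,1}$ Model. In this part, we briefly review the ADM approach to the dual form of the weighted $\ell_{2,1}$ model and derive an equally simple yet more efficient algorithm. The dual form of (3) is given as follows:
$$\max_y \; b^T y \quad \text{s.t.} \quad \|A_{g_i}^T y\|_2 \le w_i, \; i = 1, 2, \ldots, s, \quad (18)$$
where $A_{g_i}$ denotes the submatrix of $A$ formed by the columns indexed by $g_i$. Similarly, we introduce an auxiliary variable $z$ and transform (18) into an equivalent constrained optimization problem:
$$\min_{y,z} \; -b^T y \quad \text{s.t.} \quad z = A^T y, \; \|z_{g_i}\|_2 \le w_i, \; i = 1, 2, \ldots, s. \quad (19)$$
The augmented Lagrangian function of (19) is given by
$$\mathcal{L}_A(y, z, x) = -b^T y - x^T (z - A^T y) + \frac{\beta}{2}\|z - A^T y\|_2^2,$$
where $\beta > 0$ is a penalty parameter, and note that $x \in \mathbb{R}^n$ is a multiplier and essentially the primal variable. If we start at $x = x^k$ and $z = z^k$, the iterative framework of the augmented Lagrangian problem is given by
$$y^{k+1} = \arg\min_y \mathcal{L}_A(y, z^k, x^k), \quad z^{k+1} = \arg\min_z \{\mathcal{L}_A(y^{k+1}, z, x^k) : \|z_{g_i}\|_2 \le w_i, \; i = 1, \ldots, s\}, \quad x^{k+1} = x^k - \beta (z^{k+1} - A^T y^{k+1}).$$
The $y$-subproblem of this iterative framework can be solved as a linear system according to its optimality condition:
$$\beta A A^T y^{k+1} = b - A x^k + \beta A z^k. \quad (23)$$
The $z$-subproblem has the form
$$\min_z \; -(x^k)^T z + \frac{\beta}{2}\|z - A^T y^{k+1}\|_2^2 \quad \text{s.t.} \quad \|z_{g_i}\|_2 \le w_i, \; i = 1, 2, \ldots, s. \quad (24)$$
It is easy to see that (24) has a closed-form solution:
$$z_{g_i}^{k+1} = \mathcal{P}_{B_2^i}\Big( (A^T y^{k+1})_{g_i} + \frac{1}{\beta} x_{g_i}^k \Big), \quad i = 1, 2, \ldots, s, \quad (25)$$
where $\mathcal{P}$ represents a projection (in the Euclidean norm) onto the convex set denoted by its subscript and $B_2^i \triangleq \{z \in \mathbb{R}^{n_i} : \|z\|_2 \le w_i\}$. Finally, we update the multiplier $x$, that is, essentially the primal variable:
$$x^{k+1} = x^k - \gamma \beta (z^{k+1} - A^T y^{k+1}),$$
where $\gamma \in (0, (\sqrt{5} + 1)/2)$ is a step length. The resulting ADM iteration scheme for (18) is summarized in Algorithm 2.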
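The closed-form solution mentioned above is a Euclidean projection of each group onto a ball whose radius is the weight of that group. A minimal Python sketch of this projection follows; the function name `project_groups`, the argument `v` (standing for the point to be projected), and the sample data are illustrative assumptions.

```python
import math

def project_groups(v, groups, weights):
    """Project each subvector v[g_i] onto the ball {u : ||u||_2 <= w_i}.

    A group already inside its ball is left unchanged; otherwise it is
    rescaled to have norm exactly w_i. A weight of 0 maps the group to zero.
    """
    z = list(v)
    for g, w in zip(groups, weights):
        norm = math.sqrt(sum(v[j] ** 2 for j in g))
        if norm > w:
            factor = w / norm
            for j in g:
                z[j] = factor * v[j]
    return z

# Hypothetical data: the first group lies outside the unit ball, the second inside.
v = [3.0, 4.0, 0.3, 0.4]
groups = [[0, 1], [2, 3]]

z_unit = project_groups(v, groups, [1.0, 1.0])
z_zero = project_groups(v, groups, [0.0, 1.0])
```

In the dual formulation a weight of 0 forces the corresponding dual slack group to zero, which is the dual counterpart of leaving that group unpenalized in the primal problem.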

2.4. Step 2: Adaptive Weights Determined Based on Iterative Support Detection. In this section, we present the way to determine the weights $w$ in Step 2, extending the idea of iterative support detection (ISD) [10] from the single sparse vector case to the group sparsity case.
The support detection based on thresholding in terms of group sparsity is as follows:
$$I^{(l+1)} := \{i : \|x_{g_i}^{(l)}\|_2 > \epsilon^{(l)}\}, \quad (27)$$
where $l = 1, 2, \ldots$ denotes the stage number. The elements of the weighting vector $w^{(l+1)}$ are equal to 0 if the corresponding positions belong to the support detection $I^{(l+1)}$, and 1 otherwise. Before discussing the choice of $\epsilon^{(l)}$, it should be pointed out that the support detection sets $I^{(l)}$ are not necessarily increasing and nested; that is, $I^{(l)} \subset I^{(l+1)}$ may not hold for all $l$, because the support detection obtained from the current solution by thresholding with $\epsilon^{(l)}$ may contain wrong detections, and not requiring $I^{(l)}$ to be monotonic leaves the chance for support detection to remove previous wrong detections. This makes $I^{(l)}$ less sensitive to $\epsilon^{(l)}$, thus making the threshold value $\epsilon^{(l)}$ easier to choose. The tuning parameter $\epsilon^{(l)}$ is a key parameter; it is not fixed but is preferably decreased from a large value to a small value, which extracts more correct nonzero information from the intermediate reconstruction results as the ISD iteration proceeds. In addition, we proved in [10] that ISD can tolerate a certain ratio of wrong support detections and still achieve a better reconstruction.
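The thresholding step and the resulting 0-1 weights can be sketched in a few lines of Python. The sketch assumes that the quantity compared with the threshold is the Euclidean norm of each group; the function names, the two hypothetical stage solutions, and the threshold values are illustrative only.

```python
import math

def detect_group_support(x, groups, eps):
    """Return the set of group indices whose Euclidean norm exceeds eps."""
    return {i for i, g in enumerate(groups)
            if math.sqrt(sum(x[j] ** 2 for j in g)) > eps}

def zero_one_group_weights(num_groups, detected):
    """Weight 0 for detected groups, weight 1 for all other groups."""
    return [0.0 if i in detected else 1.0 for i in range(num_groups)]

groups = [[0, 1], [2, 3], [4, 5]]

# Hypothetical solutions of two consecutive stages.
x_stage1 = [0.9, 1.1, 0.30, 0.40, 0.05, -0.02]
x_stage2 = [1.0, 1.0, 0.02, 0.01, 0.00, 0.00]

I2 = detect_group_support(x_stage1, groups, eps=0.45)  # detects groups 0 and 1
I3 = detect_group_support(x_stage2, groups, eps=0.20)  # group 1 is dropped again
w3 = zero_one_group_weights(len(groups), I3)
```

In this toy run the second detection set is not a superset of the first, which illustrates how a wrong detection from an earlier stage can be removed later because the sets are not required to be nested.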
We set the threshold value $\epsilon^{(l)}$ according to rule (28), which is governed by a parameter $\rho > 0$. An excessively large $\rho$ results in too many false detections and leads to low solution quality, while an excessively small $\rho$ tends to require a large number of iterations. This rule is quite effective with an appropriate $\rho$, and the proper range of $\rho$ is case-dependent [35]. Empirically, the performance of our algorithm is not very sensitive to the choice of $\rho$ in our cases.
Here we would like to point out the difference between the situation in this paper and that in our previous work [10]. While we still use the threshold-ISD [10], the fast-decaying property of the unknown signal is no longer required, because the prior grouping information improves the performance of threshold-based support detection. A simple intuitive explanation is as follows. In [10], the components of a sparse signal are considered separately, while in this paper the known grouping information of the sparse signal is taken into account. Therefore, when performing the thresholding via (27), the prior grouping information reduces the degrees of freedom of the unknown signal and provides better robustness to the recovery errors of the intermediate results than plain sparse recovery, which corresponds to either an arbitrary grouping or a purely component-wise treatment.

Our Algorithm Framework and Some Further Analysis.
Now, we summarize the algorithm framework of the multistage convex relaxation for the weighted $\ell_{2,1}$ model based on ISD. The algorithm repeatedly performs the two steps mentioned above: support detection to determine the 0-1 weights, and solving the resulting weighted $\ell_{2,1}$ problem using the ADM algorithms. Moreover, we present some new viewpoints of ISD.

Algorithm 2 (dual-based ADM iteration):
While stopping criterion is not met Do
(1) Update $y$: Compute $y^{k+1}$ according to (23) for given $(x^k, z^k)$.
(2) Update $z$: Compute $z^{k+1}$ by solving (25).

Algorithm 3:
Given observation $b$ and the measurement matrix $A$.
(1) Set the iteration number $l \leftarrow 0$ and initialize the set of detected entries $I^{(l)} \leftarrow \emptyset$.
Notice that since $w^{(0)} = \vec{1}$, the weighted model (3) in Step 2(b) is nothing but the plain $\ell_{\vec{1},2,1}$ model in iteration 0. The weighted $\ell_{2,1}$ model in Step 2(b) of Algorithm 3 can be solved by Algorithm 1 or Algorithm 2. The support detection in Step 2(c) was introduced in Section 2.4. In each iteration, ISD estimates the indices of the nonzero groups using thresholding. Since ISD belongs to the greedy methods and is also a multistage procedure, the new method is denoted by GM-ADM. In each iteration, GM-ADM needs to solve a weighted $\ell_{2,1}$ problem, and hence GM-ADM is computationally more demanding than ADM. However, its running time is not necessarily several times that of the original $\ell_{\vec{1},2,1}$ model. The reason is warm-starting: the output of the current stage (outer iteration) is used as the input of the next stage, so the next stage often needs only a few inner iterations to obtain a better updated solution. In addition, the number of stages is usually not large (no more than 9 empirically). For our problems, we mainly focus on reconstruction quality; GM-ADMs perform much better than ADMs in this respect, which is worth the extra computing cost.
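The overall multistage procedure described above can be sketched as the following Python skeleton. It is only an illustration under stated assumptions: `solve_weighted` is a hypothetical callback standing in for an ADM solver of the weighted model, the thresholds are passed in as a fixed schedule rather than computed by a specific rule, and `toy_solver` is a deliberately simplistic stand-in (it merely shrinks a known vector) used so that the loop can be run end to end.

```python
import math

def group_norms(x, groups):
    return [math.sqrt(sum(x[j] ** 2 for j in g)) for g in groups]

def multistage_isd(solve_weighted, groups, thresholds):
    """Alternate Step 1 (weighted solve) and Step 2 (0-1 weights by thresholding)."""
    weights = [1.0] * len(groups)   # stage 0: uniform weights, i.e. the plain model
    x = None
    for eps in thresholds:
        x = solve_weighted(weights, x)  # Step 1, warm-started from the previous x
        weights = [0.0 if nrm > eps else 1.0
                   for nrm in group_norms(x, groups)]  # Step 2
    return solve_weighted(weights, x), weights

# --- toy stand-in for a solver (NOT an ADM implementation) -----------------
TRUTH = [1.0, 1.0, 0.0, 0.0, 0.6, 0.6]
GROUPS = [[0, 1], [2, 3], [4, 5]]

def toy_solver(weights, warm_start):
    """Return TRUTH with each group norm shrunk by 0.5 * weight, mimicking
    the bias that a penalized group suffers. `warm_start` is ignored here."""
    x = [0.0] * len(TRUTH)
    for g, w, nrm in zip(GROUPS, weights, group_norms(TRUTH, GROUPS)):
        if nrm > 0.0:
            scale = max(nrm - 0.5 * w, 0.0) / nrm
            for j in g:
                x[j] = scale * TRUTH[j]
    return x

x_plain = toy_solver([1.0, 1.0, 1.0], None)
x_multi, w_final = multistage_isd(toy_solver, GROUPS, thresholds=[0.3])
```

In this toy setting the single-stage result is biased toward zero in both active groups, while one additional stage with 0-1 weights removes the penalty on the detected groups and returns the unshrunk values.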
ISD can also be considered as a procedure based on mixed soft-thresholding and hard-thresholding. Specifically, obtaining the solution of the subproblem (15) is actually a soft-thresholding procedure, and we obtain a group sparse solution via this group-wise shrinkage operation if the weights are all equal to 1. However, this kind of shrinkage has a serious disadvantage: it shrinks the components of the true nonzero groups as well and reduces the sharpness of the groups in the solution. ISD does not use uniform weights in the weighted $\ell_{2,1}$ ($\ell_{w,2,1}$) model but uses the greedy 0-1 weights. The components of certain groups are not shrunk if we believe these groups are unlikely to be zero groups. In such cases, the weights of these groups are set to 0, which corresponds to hard-thresholding. Thus, when ISD is applied, the solution process becomes a selective shrinkage procedure, that is, mixed soft-thresholding and hard-thresholding.
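The selective shrinkage described above can be made explicit with a short worked formula. It assumes the standard group-wise shrinkage form, in which $r_i$ denotes the subvector of the point being shrunk that belongs to group $i$ and $\beta_1 > 0$ is the penalty parameter of the primal ADM; these symbols are assumptions introduced here for illustration.

```latex
% Group-wise shrinkage with weight w_i:
z_{g_i} = \max\Big\{ \|r_i\|_2 - \frac{w_i}{\beta_1},\, 0 \Big\} \frac{r_i}{\|r_i\|_2}.

% Undetected group (w_i = 1): the group norm is reduced by 1/\beta_1,
% and the group vanishes when \|r_i\|_2 \le 1/\beta_1 (soft-thresholding):
\|z_{g_i}\|_2 = \max\Big\{ \|r_i\|_2 - \frac{1}{\beta_1},\, 0 \Big\}.

% Detected group (w_i = 0): no shrinkage at all, the group is kept as is:
z_{g_i} = r_i.
```

Groups with weight 1 are therefore biased toward zero by a fixed amount, while groups with weight 0 are passed through unchanged, which is the sense in which the 0-1 weights mix soft and hard decisions.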

Numerical Experiments
In this section, we present numerical results to evaluate the performance of our proposed GM-ADM approach in comparison with the ADM approach [9] in the case of group sparsity. We compare only with [9] because that work has already made comprehensive comparisons with other group sparsity methods, and its algorithms mostly outperform the previously known state-of-the-art algorithms. The code of ADM (referred to as the YALL1-group package) can be downloaded from the website [36]. All the experiments were performed under Windows 7 and MATLAB v7.10.0 (R2010a) running on a desktop with an Intel(R) Pentium(R) CPU G640 (2.80 GHz) and 2 GB of memory.

Synthetic Nonoverlapping Group Sparsity Experiment.
We generate the nonoverlapping group sparse solutions as follows: we first randomly divide an $n$-vector into $s$ groups and then randomly pick $K$ of them as active (nonzero) groups, whose entries are iid random Gaussian or ±1 (Bernoulli signals). For the Bernoulli signals, instead of randomly dividing the $n$-vector into groups, we fix the number of components in each group to be $n/s$ and randomly pick $K$ out of the $s$ groups as active groups. The purpose of testing the sparse Bernoulli signal is to demonstrate that, in the case of group sparsity, the fast decaying property of the magnitudes of the nonzero components is no longer required, even in terms of the magnitudes of groups of variables. We use the standard iid Gaussian sensing matrices generated by A = randn(m,n) in MATLAB. We add Gaussian noise to the observation by noise = randn(m,1), b = b + sigma * norm(b)/norm(noise) * noise in MATLAB. The test sets are summarized in Table 1. It should be pointed out that the YALL1-group package not only includes solvers for the constrained model (3) but also includes 6 different approaches for solving the corresponding nonoverlapping group sparse problems, as summarized in Table 2. The continuation technique follows this rule: for the primal-based ADM, that is, PADM3, when a prescribed condition on the iterates is met, we rescale the corresponding parameter by a constant factor.

Table 3: The optimal parameter setting of 6 different primal-based and dual-based ADM approaches that achieve the best reconstruction quality.

The ADMs are terminated when one of the following conditions is met: the relative change of two consecutive iterates becomes smaller than the tolerance, or the iteration number reaches the prescribed maximum, for example, 1000. We use the same inner iteration termination condition for GM-ADMs, that is, the termination condition of Step 2(b) in Algorithm 3. We set the tolerance to max(0.1 * sigma, 1e-16) in all of the numerical experiments.
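The data generation described above can be mirrored in a few lines of Python. This is a sketch under simplifying assumptions: equal-size contiguous groups are used for both signal types (the Gaussian case in the text uses a random division), and the function names, the seed handling, and the problem sizes are hypothetical.

```python
import math
import random

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def make_group_sparse_signal(n, num_groups, num_active, kind, rng):
    """Build an n-vector with `num_active` nonzero groups out of `num_groups`
    equal-size groups; entries are N(0,1) ("gaussian") or +/-1 ("bernoulli")."""
    size = n // num_groups  # assumes num_groups divides n
    groups = [list(range(i * size, (i + 1) * size)) for i in range(num_groups)]
    active = sorted(rng.sample(range(num_groups), num_active))
    x = [0.0] * n
    for i in active:
        for j in groups[i]:
            x[j] = rng.gauss(0.0, 1.0) if kind == "gaussian" else rng.choice([-1.0, 1.0])
    return x, groups, active

def add_scaled_noise(b, sigma, rng):
    """Same scaling as b + sigma * norm(b)/norm(noise) * noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in b]
    scale = sigma * norm2(b) / norm2(noise)
    return [bi + scale * ni for bi, ni in zip(b, noise)]

rng = random.Random(1)
x, groups, active = make_group_sparse_signal(64, 16, 3, "bernoulli", rng)

b = [1.0, -2.0, 2.0]                     # hypothetical clean observation
b_noisy = add_scaled_noise(b, 0.05, rng)
noise_norm = norm2([p - q for p, q in zip(b_noisy, b)])
```

Because the noise is rescaled by norm(b)/norm(noise), the added perturbation always has norm sigma * norm(b), so sigma acts as a relative noise level.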
We list the optimal parameters for the 6 different ADMs in Table 3, which achieve the best reconstruction quality in terms of relative error. These parameter values are mostly borrowed from [9], which can be consulted for guidance, and the parameters for our proposed 6 different GM-ADMs are set the same as those of the corresponding ADMs. Empirically, the results are not very sensitive to these parameters, and we believe the comparisons are fair under this parameter setting. Here we use the MATLAB-type notation mean(abs(b)) to denote the arithmetic average of the absolute values of the entries of $b$. For the primal-based ADMs and dual-based ADMs, we set $\gamma_1 = \gamma_2 = 1.618$ and $\gamma = 1.618$, respectively. With regard to our proposed GM-ADM, considering that Step 2(b) in Algorithm 3 can be solved by the 6 different above-mentioned ADMs, we call them GM-PADM1, GM-PADM2, GM-PADM3, GM-DADM1, GM-DADM2, and GM-DADM3, respectively. In addition, we set $\rho = 5$ in (28); empirically the results are not very sensitive to this parameter, and thus we fix it in our experiments.
As mentioned before, in the case of group sparsity, the fast decaying property of the magnitudes of the nonzero components is no longer required for the effectiveness of ISD, though the fast decaying property might further enhance the performance of ISD. Our experiments include results for both fast decaying and non-fast decaying signals. Figure 1 shows that the sorted group magnitudes of our test nonoverlapping group sparse Gaussian signal have the fast decaying property, which is helpful for our threshold-ISD, though not necessary. In Figure 2, we present the relative error between the recovered signal and the original true signal for test 1, and we can see that our GM-ADMs bring significant enhancements compared with the corresponding ADMs, for both primal-based and dual-based ADMs. From Figure 2, we can also see that GM-ADMs obtain a satisfactory improvement over ADMs even if the maximum stage number is not very large. The key factor of our algorithm is that ISD requires reliable support detection from an inexact reconstruction. If the output of the first iteration (the result of ADMs) is rather unsatisfactory, for example, because of an insufficient number of measurements and/or a considerable amount of noise, it becomes hard for GM-ADMs to achieve a large improvement because of the inexact support detection. Thus the improvement by ISD when the number of measurements is 180 is less obvious than in the other cases. In Figure 3, we show the comparison between ADMs and GM-ADMs in noisy environments, and ISD still brings better results.
In order to better illustrate the practical advantage of our algorithm, we also give visual comparisons of the reconstructions. Figures 4, 5, and 6 plot the reconstructed signals in the case of $m = 200$ measurements and a maximum stage number of 3, with noise levels $\sigma = 0$, $\sigma = 5 \times 10^{-3}$, and $\sigma = 5 \times 10^{-2}$, respectively. For conciseness, we only give the comparisons between PADM3 and GM-PADM3 and between DADM3 and GM-DADM3, since the other cases lead to similar conclusions. From Figures 4, 5, and 6, we can see that the results of GM-PADM3 and GM-DADM3 are much better than those of the corresponding PADM3 and DADM3, in both noiseless and noisy environments.
In Figure 7, we show the sorted group magnitudes ($\{\|x_{g_1}\|_2^2, \|x_{g_2}\|_2^2, \ldots, \|x_{g_s}\|_2^2\}$) of a nonoverlapping group sparse Bernoulli signal, whose nonzero components are randomly set to either 1 or −1 and do not have the fast decaying property. In Figure 8, we present the comparison between ADMs and the corresponding GM-ADMs, and we can see that GM-ADMs achieve a considerable improvement in both noiseless and noisy environments. We also give the visual comparison between PADM3 and GM-PADM3 in Figure 9. From Figures 8 and 9, for nonoverlapping group sparse Bernoulli signals, we can draw conclusions similar to those for nonoverlapping group sparse Gaussian signals.

Synthetic Overlapping Group Sparsity Experiment.
To assess the performance of our algorithm when overlapping groups are given a priori, we generate simulation data with $n = 1024$ variables, covered by 126 groups of 10 variables with 2 variables of overlap between two consecutive groups. Then we randomly pick a number of them as active (nonzero) groups, whose entries are either iid random Gaussian or ±1, while the remaining groups are all zeros. We use the same test sets as shown in Table 1; the only difference is that the signal has an overlapping group sparse structure here. In addition, the optimal parameters and termination condition of the nonoverlapping experiment are still applicable here. Considering that the YALL1-group package only includes primal-based ADMs for overlapping group sparse problems, we only compare PADM3 and GM-PADM3, for fair comparison and conciseness. Empirically, a maximum stage number of 2 already achieves satisfactory reconstruction quality in terms of relative error, and therefore we set the maximum stage number to 2 for all of the experiments.
In Figure 10, we show that the sorted group magnitudes of our tested overlapping group sparse signals, whose nonzero components are generated from a Gaussian distribution, also have the fast decaying property. In Figure 11, we present the comparison between PADM3 and GM-PADM3, and we can see that GM-PADM3 achieves a considerable improvement in both noiseless and noisy environments.
In order to better illustrate the practical advantage of our algorithm for overlapping group sparse reconstruction, we also give visual comparisons in Figure 12, and we can see that the accuracy of the element values recovered by PADM3 is fairly low, while GM-PADM3 achieves a rather high accuracy. In Figure 13, we show that the sorted group magnitudes
of overlapping group sparse signals, whose nonzeros are randomly set to either 1 or −1, do not have the fast decaying property. In Figure 14, we present the comparison between PADM3 and GM-PADM3, where GM-PADM3 achieves a considerable improvement in both noiseless and noisy environments. We also give the visual comparison between PADM3 and GM-PADM3 in Figure 15. From Figures 14 and 15, for overlapping group sparse Bernoulli signals, we can draw conclusions similar to those for overlapping group sparse Gaussian signals.

A Nonoverlapping Group Sparsity Simulation Example from Collaborative Spectrum Sensing.
In this part, we study an interesting special case of the nonoverlapping group sparsity structure called joint sparsity; namely, a set of sparse solutions share a common nonzero support. This example that comes from collaborative spectrum sensing [37], which aims at detecting spectrum holes (i.e., channels not used by any primal users), is the precondition for the implementation of Cognitive Radio (CR). The Cognitive Radio (CR) nodes must constantly sense the spectrum in order to detect the presence of the Primary Radio (PR) nodes and use the spectrum holes without causing harmful interference to the PRs. Hence, sensing the spectrum in a reliable manner is of critical importance and constitutes a major challenge in CR networks. Collaborative spectrum sensing is expected to improve the ability of checking complete spectrum usage. We consider a cognitive radio network with CR nodes that locally monitor a subset of channels. A channel is either occupied by a PR or unoccupied, corresponding to the states 1 and 0, respectively. It is assumed that the number of occupied channels is much smaller than . The goal is to recover the occupied channels form the CR nodes' observations. Via frequency-selective filters, a CR takes a small number of measurements that are linear combinations of multiple channels. In order to mix the different channel sensing information, the filter coefficients are designed to be random numbers. Then, the filter outputs are sent to the fusion center. Assume that there are frequency selective filters in each CR node sending out reports regarding the channels. The sensing process at each CR can be represented by × filter coefficients matrix . Let an × diagonal matrix represent the states of all the channel sources using 0 and 1 as diagonal entries, indicating the unoccupied or occupied states, respectively. There are nonzero entries in the diagonal matrix . 
In addition, the channel gains between the CRs and the channels are described in a channel gain matrix given by [38]. Then, the measurement reports sent to the fusion center can be written as a matrix with one row per filter and one column per CR node, obtained by applying the filter coefficient matrix to an unknown matrix that combines the channel states with the channel gains. Now, we need a highly effective method for recovering this unknown matrix. In it, each column corresponds to the channel occupancy status received by one CR, and each row corresponds to the status of one channel. Since only a small number of channels are used, the matrix is sparse in terms of the number of nonzero rows. In each nonzero row, if one entry is nonzero, the other entries in the same row are likely to be nonzero as well. Therefore, it is a joint sparse matrix.
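The row-sparse structure described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the product form (filters times states times gains), the gain range, the tiny problem size, and all names such as `simulate_reports` are hypothetical choices for illustration and are not taken from [37] or [38].

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def simulate_reports(n_channels, n_nodes, n_filters, occupied, seed=0):
    """Build a joint sparse unknown and the reports sent to the fusion center."""
    rng = random.Random(seed)
    # Diagonal state matrix: 1 for occupied channels, 0 for unoccupied ones.
    states = [[1.0 if (i == j and i in occupied) else 0.0
               for j in range(n_channels)] for i in range(n_channels)]
    # Assumed positive channel gains, one row per channel, one column per node.
    gains = [[rng.uniform(0.1, 1.0) for _ in range(n_nodes)]
             for _ in range(n_channels)]
    # Random filter coefficients, one row per filter, one column per channel.
    filters = [[rng.gauss(0.0, 1.0) for _ in range(n_channels)]
               for _ in range(n_filters)]
    unknown = matmul(states, gains)     # channels x nodes, row sparse
    reports = matmul(filters, unknown)  # filters x nodes
    return unknown, reports


unknown, reports = simulate_reports(8, 3, 4, occupied={1, 5})
nonzero_rows = [i for i, row in enumerate(unknown) if any(v != 0.0 for v in row)]
print(nonzero_rows)  # [1, 5]
```

Only the rows of the unknown matrix that belong to occupied channels are nonzero, and they are nonzero in every column, which is exactly the common support shared by all CR nodes.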
The weighted ℓ 2,1 model of the joint sparsity problem is

where = [ 1 , 2 , . . . , ] ∈ R × denotes the collection of jointly sparse solutions, and and denote the th row and th column of , respectively.
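For orientation, a weighted ℓ 2,1 joint sparsity model of this kind is commonly written in the following form. The symbols $X$, $A$, $B$, $w_i$, $n$, and $l$ are assumed notation for this sketch and may differ from the notation of (34); only the equality constrained form is shown.

```latex
\min_{X \in \mathbb{R}^{n \times l}} \; \|X\|_{w,2,1} := \sum_{i=1}^{n} w_i \,\|x^{i}\|_2
\quad \text{s.t.} \quad A X = B ,
\qquad X = [x_1, x_2, \ldots, x_l], \quad w_i \ge 0 ,
```

Here $x^{i}$ is the $i$th row and $x_j$ the $j$th column of $X$, so each row of $X$ forms one group and a zero weight $w_i$ removes the penalty on that row.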
Let us define ̃ ≜ vec ( ) = ( where ∈ R × is the identity matrix, and vec(⋅) and ⊗ are standard notations for the vectorization of a matrix and the Kronecker product, respectively. We partition ̃ into groups { 1 , 2 , . . . , }, where ∈ R ( = 1, 2, . . . , ) corresponds to the th row of matrix . Thus, we can obtain an equivalent group ℓ w,2,1 problem to (34) as follows:
The main advantage of joint sparsity reconstruction is its applicability to large scale problems. Therefore, the following simulations are carried out for a relatively high dimensional application with the following settings: we consider a 16-node cognitive radio network, the number of channels is 1024, the number of active PR nodes ranges from 100 to 120 on the given set of 1024 channels, and the measurement matrix is a Gaussian random matrix whose size is fixed as 256.
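The vectorized reformulation mentioned above rests on a standard identity for the Kronecker product. The derivation below is a sketch in assumed notation ($A \in \mathbb{R}^{m \times n}$, $X \in \mathbb{R}^{n \times l}$, $I_l$ the $l \times l$ identity, vec stacking columns); the symbols are not taken from the original display.

```latex
\operatorname{vec}(PQR) = (R^{\top} \otimes P)\,\operatorname{vec}(Q)
\;\;\Longrightarrow\;\;
\operatorname{vec}\bigl((AX)^{\top}\bigr)
  = \operatorname{vec}\bigl(I_l\, X^{\top} A^{\top}\bigr)
  = (A \otimes I_l)\,\operatorname{vec}(X^{\top}).
```

Because $\operatorname{vec}(X^{\top})$ stacks the rows of $X$ one after another, each consecutive block of $l$ entries is one row of $X$, so the row groups of the matrix problem become contiguous, nonoverlapping groups of the vectorized unknown.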
Here, we adopt the support detection strategy originally used in our previous work [10]. To illustrate this strategy, we introduce a vector whose elements are the magnitudes of the groups (i.e., ‖ ‖ 2 , = 1, 2, . . . , ). Our selection rule is based on locating the "first significant jump" in the increasingly sorted sequence of these magnitudes. Then, we set = ( ) [ ] in (27). For brevity, here we only compare the ADM1 algorithm with the GM-ADM1 algorithm to demonstrate the superiority of our method; namely, the multistage ℓ w,2,1 process can achieve an impressive improvement over the original single stage ℓ 2,1 process. The parameters and tolerance of the ADM1 method are the same as in the nonoverlapping group sparse experiment. In addition, for the GM-ADM1 algorithm, we empirically fix the maximum stage number max = 3 and ( ) = 0.01 * mean(diff(sort( ( ) ))) in MATLAB.
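The "first significant jump" rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `first_jump_threshold`, `detect_support`, `tau`, and `eps` are hypothetical, the jump tolerance is assumed to be 0.01 times the mean gap of the sorted group magnitudes (mirroring the MATLAB expression above), and the 0/1 weights follow the usual iterative support detection convention, which is assumed here rather than taken from (27).

```python
def first_jump_threshold(group_norms, scale=0.01):
    """Return (eps, tau): eps is the sorted magnitude just before the first gap > tau."""
    if not group_norms:
        raise ValueError("group_norms must not be empty")
    v = sorted(float(x) for x in group_norms)
    gaps = [b - a for a, b in zip(v, v[1:])]
    if not gaps:
        return v[0], 0.0
    # Assumed tolerance: 0.01 * mean(diff(sort(v))).
    tau = scale * sum(gaps) / len(gaps)
    for i, gap in enumerate(gaps):
        if gap > tau:
            return v[i], tau
    # No significant jump found: nothing exceeds the threshold.
    return v[-1], tau


def detect_support(group_norms, eps):
    """Indices of groups whose magnitude is strictly above the threshold."""
    return [i for i, x in enumerate(group_norms) if x > eps]


norms = [0.001, 3.0, 0.002, 0.0015, 4.0]
eps, tau = first_jump_threshold(norms)
support = detect_support(norms, eps)
# Assumed weighting: 0 on detected groups, 1 elsewhere.
weights = [0.0 if i in support else 1.0 for i in range(len(norms))]
print(eps, support, weights)
```

In a multistage loop, these weights would be passed to the weighted solver, and the detection would be repeated on the new solution until the maximum stage number is reached.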
Figure 16 compares the ADM1 algorithm with the corresponding GM-ADM1 algorithm in terms of the relative error, in both the noiseless and the noisy environment (noise level 5 × 10 −3 ). We can draw conclusions similar to those in parts A and B. Moreover, we also give the ideal results of our algorithm, in which the support detection is based on the underlying true solution. Since the true solution is usually unknown in practice, we use it only as a reference and name it the ideal GM-PADM1 method (shortened as IGM-PADM1); it serves as an ideal upper bound on the performance of iterative support detection based multistage methods. It clearly demonstrates the merit of our idea: the iterative support detection based multistage process can bring significant enhancement compared to the standard single stage process. It also suggests that the GM-ADMs can bring dramatically better reconstructions than the corresponding ADMs, as long as sufficiently reliable support detections are available. Although the ideal case is not attainable in practice, it serves as a benchmark and points out a direction for us to explore. To further compare the ADM1 and GM-ADM1 algorithms, Figure 17 shows the recoverability results of the two algorithms. We regard a reconstruction as successful if its relative error is below a given threshold. Here, we consider only the noiseless case and set the threshold to 1 × 10 −6 and 1 × 10 −4 , respectively. Not surprisingly, the recoverability of GM-ADM1 is better than that of ADM1.
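The success criterion above can be made concrete with a short sketch. The function names and the sample error values are hypothetical and serve only to show how a recoverability rate follows from a list of relative errors and a threshold; the relative error is assumed to be the Euclidean norm of the difference divided by the norm of the true solution.

```python
import math


def relative_error(x_hat, x_true):
    """Euclidean norm of (x_hat - x_true) divided by the norm of x_true."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x_true)))
    den = math.sqrt(sum(b ** 2 for b in x_true))
    return num / den


def success_rate(errors, threshold):
    """Fraction of trials whose relative error is below the threshold."""
    return sum(1 for e in errors if e < threshold) / len(errors)


# Made-up relative errors from three hypothetical trials.
errors = [1e-8, 1e-5, 1e-3]
rate_strict = success_rate(errors, 1e-6)
rate_loose = success_rate(errors, 1e-4)
print(rate_strict, rate_loose)
```

A matrix-valued solution can be flattened into a single list before calling `relative_error`, which corresponds to using the Frobenius norm.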

Conclusions and Possible Future Work
In this paper, we propose a novel GM-ADM approach for group sparse reconstruction problems. The final result is obtained from a multistage process that alternates between solving a weighted ℓ 2,1 model and determining the adaptive weights via ISD. The numerical results demonstrate that iterative support detection extends from sparsity to group sparsity.
Our previous work [10] has demonstrated that the effectiveness of threshold-ISD depends on the fast decaying property in the case of plain sparse signal recovery. In this paper, however, threshold-ISD still achieves impressive performance without relying on the fast decaying property, thanks to the prior grouping information. Considering that support detection is not limited to thresholding and that reliable support detection guarantees better performance, we would like to study other specific signals in the future, for example, structured sparse signals, and design more effective support detection methods based on the particular properties of these signals.