Distributed Joint Cluster Formation and Resource Allocation Scheme for Cooperative Data Collection in Virtual MIMO-Based M2M Networks



Introduction
Machine-to-machine (M2M) communications, also known as machine-type communications (MTC), have been recognized as a key technique for next-generation communication networks [1]. Driven by recent theoretical advances, M2M-based application networks such as the smart grid [2], vehicular communication networks [3], and e-health [4] are attracting increasing attention. In these networks, various types of electronic terminals self-organize and operate without human intervention. Whatever the application, an efficient scheme for real-time information collection is essential to the feasibility and reliability of the network, since it underpins network monitoring, automatic control, safety precautions, and so forth.
In many networks, the number of terminals may be very large, and their payloads and spatial distributions generally differ. Collecting real-time information from the whole network efficiently and automatically is therefore a great challenge. To deal with this issue, many traditional schemes can be used, such as CSMA-based schemes, typical cellular multiple access schemes (e.g., CDMA, OFDMA) [5], or schemes based on noncooperative games [6]. However, given the limited frequency resources and the growth in the number of terminals and in payload size, cooperative schemes have shown more promising performance for highly efficient transmission with finite resources [7].
Among cooperative schemes, cluster-based communication is a promising research direction. Although it is attracting growing attention, how to construct a highly efficient network structure and make full use of the limited resources remains a major challenge in M2M networks. The authors of [8] gave a comprehensive survey of existing clustering algorithms for mobile ad hoc networks (MANETs). Works such as [9, 10] focused on transmission schemes in cluster-based wireless sensor networks (WSNs) under different situations. Furthermore, how to select or deploy the cluster heads has also been considered in the context of cluster-based WSNs [11]. However, the particularities of WSNs make these results unsuitable for general M2M networks, for two main reasons. First, existing research pays more attention to the battery life of sensors and takes no account of delay performance [10]. Second, intrinsic properties of the information, such as compressibility, are usually taken as the clustering criterion in WSNs [11]. Although some cooperative data collection schemes for M2M networks have been studied recently, such as [12, 13], most of them focus on centralized algorithms. In addition, joint cluster formation and resource allocation, together with the corresponding theoretical analysis, has not been sufficiently studied either.
In this paper, we focus on cluster-based M2M networks with virtual MIMO (VMIMO) protocols. In such networks, the members of a cluster transmit their information jointly by means of VMIMO, which differs from cooperative transmission schemes that rely on cluster heads [14]. Although VMIMO-based schemes have already been studied in WSNs [15, 16], few practical solutions address efficient cluster formation while taking resource allocation into consideration. Moreover, a quantitative theoretical analysis of the advantages and properties of this kind of system has not been clearly given in existing research.
In this context, a distributed joint cluster formation and resource allocation scheme is proposed in this paper for real-time data collection in VMIMO-based M2M networks. We first give a hierarchical cooperative transmission scheme with two communication phases: the information sharing phase and the cooperative transmission phase. To demonstrate the advantages of this cooperative scheme and the influence of different factors on its performance, some theoretical results are given. Based on this work, we propose a joint cluster formation and resource allocation scheme based on a coalition formation game and then provide the whole implementation procedure via a distributed cooperative algorithm. In this paper, the time cost of data collection is chosen as the performance indicator, to meet the requirements of real-time applications.
The rest of this paper is organized as follows. We first present the system model in Section 2. An optimization problem and the corresponding theoretical analysis are then given in Section 3. In Section 4, a distributed joint cluster formation and resource allocation scheme is proposed. Simulation results and the data analysis are provided in Section 5. Finally, we conclude this paper in Section 6.

System Model
Consider the cooperative data collection system for M2M networks shown in Figure 1. There are $M$ distributed single-antenna data acquisition nodes (DANs), a base station (BS) with $N_r$ antennas, and a data center within a certain region. The DANs play different roles in different networks, for example, smart meters in a smart grid or vehicle sensors in a vehicular communication network. All information acquired by the DANs should be collected by the BS in a timely manner and forwarded to the data center (such as the power center in a smart grid).

Cluster-Based Data Collection.
In this system, the data collection process is carried out cooperatively and hierarchically. Before transmission, the DANs organize themselves into $K$ ($1 \le K \le M$) disjoint clusters according to certain rules, without human intervention. Let $\mathcal{S} = \{S_1, \ldots, S_K\}$ be the set of clusters, where $S_k$ is the $k$th cluster with $|S_k|$ nodes. Based on this clustered network structure, the whole data collection process can be divided into two phases (subprocesses). In the first phase, the nodes in the same cluster share their data $\{L_{1,k}, \ldots, L_{|S_k|,k}\}$ with each other by broadcasting, where $L_{i,k}$ denotes the payload of the $i$th DAN in cluster $S_k$. Since the range of these intracluster communications is always short, unlicensed spectrum can be used to enhance the spectral efficiency. In the second phase, the nodes in the same cluster cooperate as a virtual MIMO (VMIMO) transmitter and jointly send the combined payload $L_{S_k} = \sum_{i \in S_k} L_{i,k}$ to the BS. In this phase, licensed spectrum (such as the cellular spectrum) is occupied to allow long-distance communication. For the first-layer transmission, a TDMA scheme is chosen, since it is the most common option for supporting the various short-distance communication technologies used in different scenarios, such as WiFi, ZigBee, and Bluetooth. However, since the distances in the second-layer transmission are much larger, a multiple access (MA) scheme compatible with the LTE system is needed to avoid extra interference. Therefore, we adopt an OFDMA scheme for the second-layer transmission.
International Journal of Antennas and Propagation

2.2. Traffic Model. The payload conditions differ between kinds of networks. In this paper, we define the traffic model according to IEEE Std 2030 [17], the guidance provided by IEEE for modeling the features of the smart grid. For the $i$th DAN, the packet number and the size of each packet both obey a truncated Pareto distribution, which has also been used in several data analysis fields [17]. Let the upper and lower bounds of the packet number be $n_{\max}$ and $n_{\min}$. The packet number $n_i$ of the $i$th DAN is then generated with the probability density function
$$f(n) = \frac{\alpha_1 n_{\min}^{\alpha_1} n^{-(\alpha_1 + 1)}}{1 - (n_{\min}/n_{\max})^{\alpha_1}}, \quad n_{\min} \le n \le n_{\max}, \quad (1)$$
where $\alpha_1$ is the constant parameter of the Pareto distribution and $(1 - (n_{\min}/n_{\max})^{\alpha_1})^{-1}$ is the probability normalization factor.
Similarly, the length of each packet, which ranges from $l_{\min}$ to $l_{\max}$, also obeys a truncated Pareto distribution, with parameter $\alpha_2$. Hence, the total payload of the $i$th DAN is $L_i = \sum_{j=1}^{n_i} l_{i,j}$, where $l_{i,j}$ is the length of its $j$th packet; this payload is to be shared within cluster $S_k$ in the first communication phase.
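The traffic model above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' simulator: it draws truncated Pareto samples by inverting the cumulative distribution function, and the function names, the numeric bounds, and the choice to floor the packet count to an integer are all assumptions made here for illustration.

```python
import random


def sample_truncated_pareto(alpha, lo, hi, rng=random):
    """Draw one sample from a Pareto(alpha) law truncated to [lo, hi].

    Uses inverse-CDF sampling: F(x) = (1 - (lo/x)**alpha) / (1 - (lo/hi)**alpha).
    """
    u = rng.random()                      # uniform in [0, 1)
    tail = 1.0 - (lo / hi) ** alpha       # inverse of the normalization factor
    return lo * (1.0 - u * tail) ** (-1.0 / alpha)


def sample_payload(rng=random, n_min=1.0, n_max=10.0,
                   l_min=100.0, l_max=2000.0, alpha1=1.5, alpha2=3.0):
    """Total payload (bits) of one DAN: a random number of random-size packets.

    The bounds are hypothetical; only alpha1 and alpha2 follow the values
    quoted in the simulation section. Flooring the packet count is an
    assumption, since the density is continuous.
    """
    n_packets = int(sample_truncated_pareto(alpha1, n_min, n_max, rng))
    return sum(sample_truncated_pareto(alpha2, l_min, l_max, rng)
               for _ in range(n_packets))


if __name__ == "__main__":
    rng = random.Random(1)
    print([round(sample_payload(rng)) for _ in range(5)])
```

Inverse-CDF sampling is used because the truncated Pareto distribution has a closed-form quantile function, so no rejection step is needed.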

Data Transmission Model.
As mentioned above, the data collection process contains two phases, and hence the data transmission model also consists of two parts.
Since the TDMA scheme is adopted in the intracluster information sharing process, all DANs within the same cluster broadcast their information sequentially. We assume that the transmit power of the DANs is low enough to eliminate intercluster interference (ICI). In this context, the minimum time cost for the $i$th DAN to share its information with all the others in the same cluster $S_k$ can be expressed as
$$t^{(1)}_{i,k} = \frac{L_{i,k}}{B \log_2\left(1 + P_1 |h_{i,\hat\imath}|^2 d_{i,\hat\imath}^{-\alpha} / \sigma^2\right)}, \quad (2)$$
where $h_{i,\hat\imath}$ and $d_{i,\hat\imath}$ are the channel and distance between the $i$th DAN and its farthest partner $\hat\imath$ in cluster $S_k$, $\alpha$ is the path loss factor, $B$ is the available bandwidth in the unlicensed spectrum, $P_1$ is the transmit power of the DANs in the first communication phase, and $\sigma^2$ is the noise variance. Therefore, the total time cost for finishing the information sharing process in cluster $S_k$ can be calculated as
$$t^{(1)}_{S_k} = \sum_{i \in S_k} t^{(1)}_{i,k}. \quad (3)$$
Before cooperative transmission, the DANs in the same cluster combine their own data with the received data and generate the whole payload $L_{S_k} = \sum_{i \in S_k} L_{i,k}$. After that, they transmit $L_{S_k}$ to the BS with the VMIMO protocol in the second phase, in which the OFDMA scheme is adopted. Hence, after being multiplied by the receiving weight vector, the received signal at the BS that comes from cluster $S_k$ in the second phase is
$$y_{S_k} = \mathbf{w}_{r,S_k}^H \left( \sqrt{P_2 d_{S_k}^{-\alpha}}\, \mathbf{H}_{S_k} \mathbf{w}_{t,S_k} x_{S_k} + \mathbf{n}_{S_k} \right), \quad (4)$$
where $x_{S_k}$ is the transmitted symbol, $\mathbf{H}_{S_k} \in \mathbb{C}^{N_r \times |S_k|}$ is the channel matrix between the DANs and the antennas of the BS, $\mathbf{w}_{t,S_k} \in \mathbb{C}^{|S_k| \times 1}$ is the normalized (unit-norm) precoding vector adopted by the DANs in $S_k$, $P_2$ is the transmit power in the second phase, $d_{S_k}$ is the distance between the cluster center and the BS, $\mathbf{w}_{r,S_k} \in \mathbb{C}^{N_r \times 1}$ is the receiving weight vector of the BS, $\eta_{S_k} = \mathbf{w}_{r,S_k}^H \mathbf{n}_{S_k}$ is the scalar noise after processing, and $\mathbf{n}_{S_k}$ is the additive white Gaussian noise (AWGN), whose entries are distributed as $\mathcal{CN}(0, \sigma^2)$.
In this system, maximal ratio transmission (MRT) and maximal ratio combining (MRC) are adopted by the DANs and the BS, respectively [18]. Specifically, the precoding vector for the DANs in $S_k$ is $\mathbf{w}_{t,S_k} = \mathbf{v}_{1,S_k}$, where $\mathbf{v}_{1,S_k}$ is the first right singular vector of the channel matrix $\mathbf{H}_{S_k}$. Correspondingly, the receiving weight vector for the BS is $\mathbf{w}_{r,S_k} = \mathbf{H}_{S_k} \mathbf{v}_{1,S_k}$. In this case, the signal-to-noise ratio (SNR) is
$$\gamma_{S_k} = \frac{P_2 \lambda_{\max,k}^2 d_{S_k}^{-\alpha}}{\sigma^2}, \quad (5)$$
where $\lambda_{\max,k}$ is the largest singular value of $\mathbf{H}_{S_k}$, which corresponds to the first right singular vector $\mathbf{v}_{1,S_k}$. The time cost for cooperative transmission in the second phase can then be derived as
$$t^{(2)}_{S_k} = \frac{L_{S_k}}{B_{S_k} \log_2\left(1 + \gamma_{S_k}\right)}, \quad (6)$$
where $B_{S_k}$ is the available bandwidth of cluster $S_k$. Therefore, the total time cost for the BS to collect all data from cluster $S_k$ is
$$t_{S_k} = t^{(1)}_{S_k} + t^{(2)}_{S_k}. \quad (7)$$
The total time cost for information submission of the $i$th DAN in cluster $S_k$ is also equal to $t_{S_k}$; that is, $t_{i,S_k} = t_{S_k}$.
In many application scenarios, such as network monitoring or safety precaution in vehicular networks or the smart grid, timeliness is a key performance indicator and may even determine the success or failure of network functions. We therefore take the time cost as the performance evaluation criterion in this paper. As can be seen from (2) to (7), the network structure and the resource allocation strategy both influence the performance of this system. In the next section, a theoretical analysis is given to show this influence explicitly. After that, a practical scheme is proposed.
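As a rough numerical companion to the two-phase transmission model, the sketch below computes the phase-one sharing time, the phase-two cooperative transmission time, and their sum for one cluster. It is only an illustration under stated assumptions: the function and parameter names are hypothetical, rates use a base-2 logarithm so that payloads in bits and bandwidths in Hz give times in seconds, and the largest singular value of the cluster channel matrix is supplied by the caller (for example, from a singular value decomposition computed elsewhere).

```python
import math


def phase1_time(payloads_bits, worst_gains, worst_dists,
                bandwidth_hz, p1_w, noise_w, alpha):
    """Sequential (TDMA) sharing time inside one cluster.

    worst_gains[i] is |h|^2 towards node i's farthest partner and
    worst_dists[i] the corresponding distance in metres.
    """
    total = 0.0
    for bits, gain, dist in zip(payloads_bits, worst_gains, worst_dists):
        snr = p1_w * gain * dist ** (-alpha) / noise_w
        total += bits / (bandwidth_hz * math.log2(1.0 + snr))
    return total


def phase2_time(total_payload_bits, lambda_max, dist_bs,
                bandwidth_hz, p2_w, noise_w, alpha):
    """Cooperative (VMIMO, MRT/MRC) transmission time to the BS."""
    snr = p2_w * lambda_max ** 2 * dist_bs ** (-alpha) / noise_w
    return total_payload_bits / (bandwidth_hz * math.log2(1.0 + snr))


def cluster_time(payloads_bits, worst_gains, worst_dists, lambda_max, dist_bs,
                 b1_hz, b2_hz, p1_w, p2_w, noise_w, alpha):
    """Total collection time of a cluster: sharing phase plus cooperative phase."""
    t1 = phase1_time(payloads_bits, worst_gains, worst_dists,
                     b1_hz, p1_w, noise_w, alpha)
    t2 = phase2_time(sum(payloads_bits), lambda_max, dist_bs,
                     b2_hz, p2_w, noise_w, alpha)
    return t1 + t2
```

All numeric inputs (powers in watts, noise power, gains) are left to the caller because the text does not fix them at this point.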

Theoretical Analysis on Hierarchical Cooperative Transmission
3.1. Optimization Problem Formulation. In this system, the ideal solution is a highly efficient network structure together with a resource allocation strategy that minimizes the average time cost for data collection, which is defined in (8). If global network information, such as the channel information, the payload conditions, and the distribution of the DANs, is available at the BS, a centralized approach can be used to obtain the optimal system performance.
To describe the cooperative relationships, a network structure indicator matrix $\mathbf{S}$ and a resource allocation indicator matrix $\mathbf{W}$ can be defined, whose dimensions are determined by the number of DANs, the number of clusters, and the number of available subbands. Their elements are given in (9), where $\mathbf{w}_k$ indicates the available subbands for cluster $S_k$. Thus, the time cost $t_{S_k}$ can be rewritten as in (10), and the average time cost in (8) can be redefined correspondingly. In this context, the optimization problem (11) can be formulated, where $T_0$ is the constraint on the maximum collection period. The third constraint guarantees nonoverlapping cluster formation, which means that one DAN can join only one cluster. The last constraint ensures exclusive bandwidth allocation to avoid interference among different clusters.
However, problem (11) is a nonlinear integer programming problem. To obtain the optimal solution, all possible cluster formations and the corresponding optimal resource allocation patterns would have to be enumerated, which is NP-hard and quickly becomes intractable as the number of DANs grows. Although the optimal solutions can hardly be obtained, their properties and the factors that influence system performance can be derived by theoretical analysis, which is the subject of the next subsection.
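To give a feel for the size of the search space mentioned above, note that the number of ways to split a set of nodes into disjoint, nonempty clusters is the Bell number. The short sketch below computes it with the Bell triangle; the function name is our own, and the count covers cluster structures only, before any bandwidth allocation pattern is considered.

```python
def bell_number(m):
    """Number of partitions of a set of m elements (Bell triangle recurrence)."""
    row = [1]
    for _ in range(m - 1):
        nxt = [row[-1]]            # each row starts with the last entry of the previous row
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[-1]


if __name__ == "__main__":
    for m in (5, 10, 20):
        print(m, bell_number(m))
```

Even a few tens of DANs already give far more candidate partitions than can be enumerated, which motivates the distributed approach developed later.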

Advantages of Hierarchical Cooperative Transmission.
Consider the cluster $S_k$ with $N_k$ DANs. To demonstrate the advantages of cluster-based transmission with the VMIMO protocol, we compare the average time cost of the noncooperative and cooperative schemes under the same bandwidth and power resources. Assume that the available bandwidth is $B_k$ and the transmit power of each node is $P_2$. If all DANs send their information separately with a time-division strategy and the BS processes the received signals with MRC, the average time cost is given by (12), where $\lambda_{i,\max}$ is the largest singular value of the channel vector $\mathbf{h}_i \in \mathbb{C}^{N_r \times 1}$, $d_i$ is the distance between node $i$ and the BS, and $E(n)$ and $E(l)$ are the average packet number and packet size of the DANs, respectively. Since both $n$ and $l$ obey truncated Pareto distributions, $E(n)$ and $E(l)$ follow directly from the density in (1); for $\alpha_1 \neq 1$,
$$E(n) = \frac{\alpha_1 n_{\min}^{\alpha_1}}{1 - (n_{\min}/n_{\max})^{\alpha_1}} \cdot \frac{n_{\min}^{1-\alpha_1} - n_{\max}^{1-\alpha_1}}{\alpha_1 - 1}, \quad (13)$$
and $E(l)$ has the same form with $(\alpha_2, l_{\min}, l_{\max})$ in place of $(\alpha_1, n_{\min}, n_{\max})$. According to (6), if the DANs in cluster $S_k$ cooperate with each other as a VMIMO system, the average time cost can be written as in (14), where $\lambda_{\max}$ is the largest singular value of the whole channel matrix $\mathbf{H}_k$, $\bar{d}$ is the average distance between the DANs and the BS, and $E(n)$ and $E(l)$ are the same as in (13).
Since $\mathbf{H}_k$ is composed of the vectors $\mathbf{h}_i$, $\forall i \in S_k$, the relationship $\lambda_{\max} \ge \lambda_{i,\max}$ holds for every $i \in S_k$: the largest singular value of a matrix is at least the norm of any of its columns, and $\lambda_{i,\max} = \|\mathbf{h}_i\|$. The inequality is strict unless $\mathbf{h}_i$ is orthogonal to all other columns of $\mathbf{H}_k$, so we take $\lambda_{\max} > \lambda_{i,\max}$ in the following. Assume that the range of a cluster is much smaller than its distance to the BS; then the approximation $\bar{d} \approx d_i$ holds. Let $R_i$ and $R^*$ denote the achievable rates per unit bandwidth of node $i$ in the noncooperative scheme and of the cluster in the cooperative scheme, respectively. The difference between the noncooperative average time cost in (12) and the cooperative one in (14) is then given by (15). Since $R^* > R_i$, $\forall i \in S_k$, this difference is positive. From (15) it can be seen that, for a given number of DANs, the proposed cluster-based transmission scheme achieves higher transmission efficiency than the noncooperative scheme under the same conditions.
However, the result in (15) does not take the cluster formation cost into consideration. For the whole data collection process described in Section 2, the cost of the information sharing phase cannot be ignored. According to (2) and (3), the average time cost for the information sharing process in cluster $S_k$ is given by (16). It shows that this process not only increases the time cost for data collection but also occupies a certain amount of power resources, and therefore has a negative effect on the system performance. Defining the corresponding phase-one rate per unit bandwidth and combining (15) and (16), we obtain the final performance enhancement of this scheme, which is given in (17). The result in (17) shows that the performance enhancement is influenced by many factors, of which the network structure and the resource allocation strategy are the most important.

Influence of Network Structure and Resource Allocation.
As shown in (17), many parameters are related to the network structure, such as $N_k$, $h_{i,\hat\imath}$, $\mathbf{H}_k$, and $d_{i,\hat\imath}$. Therefore, how to form an optimal network structure is a key issue in this network. Define the network structure as $\mathcal{S} = \{S_1, \ldots, S_K\}$; then the following property can be established.
Property 1. Although the cluster-based cooperative transmission scheme benefits the system, a grand cluster, denoted by $\mathcal{S} = \mathcal{M}$, can hardly be formed to obtain the optimal performance.
Proof. Assume that the total bandwidth for the second-phase transmission is $B_2$ and that the available bandwidth of cluster $S_k$ is $B_k = N_k \cdot B_2 / M$. Consider the average time cost $\bar{t}_k^*$ in the cooperative transmission phase. As the number of DANs $N_k$ in cluster $S_k$ increases, its first-order and second-order derivatives with respect to $N_k$ are given by (18), where $\rho \triangleq P_2 \lambda_{\max}^2 \bar{d}^{-\alpha} / \sigma^2$. From (18), it can be seen that $\bar{t}_k^*$ decreases as $N_k$ increases, but at a progressively slower rate. On the other hand, the average farthest distance within a cluster, $\bar{d}_{i,\hat\imath} = (1/N_k) \sum_{i \in S_k} d_{i,\hat\imath}$, is generally proportional to $N_k$; that is, $\bar{d}_{i,\hat\imath} \propto N_k$. Meanwhile, to avoid interference among clusters, the transmit power $P_1$ should be low enough.
Therefore, the first-order derivative $\partial \bar{t}_k^{(1)} / \partial N_k > 0$, which means that the average time cost for information sharing in the first phase scales up with $N_k$. Thus, the average total time cost $\bar{t}_k = \bar{t}_k^{(1)} + \bar{t}_k^*$ is a convex function of $N_k$. As $N_k$ grows, $\bar{t}_k$ may even exceed the time cost of the noncooperative scheme. Some DANs then have an incentive to deviate from the grand cluster and form smaller clusters instead. Therefore, the grand cluster can hardly be formed.
Apart from the network structure, power and frequency resource allocation can also greatly affect the performance. Assume that the total available power of a DAN is $P_0$, which is allocated to the two communication phases; that is, $P_0 = P_1 + P_2$, as shown in (2) and (5). Although it can be verified that $\partial^2 t_{S_k} / \partial P_1^2 > 0$ for the total time cost in (7), the optimal power allocation point cannot be chosen, because interference among clusters may be introduced if $P_1$ is large.
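The trade-off behind Property 1 can be illustrated with a toy numerical model. The sketch below is not the paper's analysis: it assumes that the squared largest singular value grows linearly with the cluster size, that the farthest intracluster distance grows linearly with the cluster size, and that every node carries the same payload. All function names, parameters, and numeric values are hypothetical and chosen only to make the shape of the curve visible.

```python
import math


def toy_total_time(n, payload_bits=5000.0, b_virtual_hz=1e5, b_unlicensed_hz=5e6,
                   snr2_per_node=1.0, snr1_ref=1000.0, alpha=3.0):
    """Toy cluster collection time (seconds) for a cluster of n nodes.

    Phase 2: payload n*L over bandwidth n*B_V, with SNR growing linearly in n
             (assumed array gain), so the factor n cancels outside the log.
    Phase 1: n sequential broadcasts whose SNR decays as n**(-alpha)
             (assumed farthest-partner distance proportional to n).
    """
    t2 = payload_bits / (b_virtual_hz * math.log2(1.0 + snr2_per_node * n))
    if n == 1:
        return t2                      # a single node has nothing to share
    snr1 = snr1_ref * n ** (-alpha)
    t1 = n * payload_bits / (b_unlicensed_hz * math.log2(1.0 + snr1))
    return t1 + t2


costs = {n: toy_total_time(n) for n in range(1, 17)}
best_n = min(costs, key=costs.get)

if __name__ == "__main__":
    for n, c in costs.items():
        print(n, round(c * 1e3, 3), "ms")
    print("cluster size with the lowest toy cost:", best_n)
```

Under these assumptions the cost first falls and then rises with the cluster size, so the minimum lies at an intermediate size rather than at the grand cluster, which is the qualitative behavior argued in the proof.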

Therefore, for simplicity, we set $P_1 = \beta P_2$ in this system, where $\beta$ is a constant satisfying $0 < \beta < 0.5$.
Since the time-division scheme is adopted for intracluster transmission, frequency resource allocation refers to the bandwidth allocation in the second communication phase. According to (14) and (17), the derivatives with respect to the available bandwidth can be calculated as in (19). The results in (19) show that, for a given cluster, the average total time cost $\bar{t}_k = \bar{t}_k^{(1)} + \bar{t}_k^*$ decreases as the available bandwidth increases. On the other hand, the average performance enhancement also decreases as the available bandwidth grows, which means that the spectral efficiency decreases. Therefore, an appropriate resource allocation strategy is necessary to maximize the spectral efficiency while guaranteeing fairness among clusters.
Based on the above theoretical analysis, a joint cluster formation and resource allocation scheme, implemented in a distributed manner, is proposed in the next section. Although the time cost of the proposed scheme may be worse than that of the centralized scheme, the optimal solution of the latter can hardly be obtained in practical M2M applications, since it is hard to deploy a central controller for all the DANs and the computational complexity is quite high.

Distributed Joint Cluster Formation and Resource Allocation Scheme
In this section, we first formulate the problem as a coalitional game. A simplified resource allocation criterion is then provided to transform the complicated original game into a tractable one. After that, we give the detailed implementation process of the scheme, based on a practical distributed algorithm.

Coalition Formation Game Formulation.
We formulate the cooperation among DANs as a coalitional game $(\mathcal{M}, V)$, where $\mathcal{M}$ is the set of players, who seek partners to maximize their utilities, and $V$ is the payoff mapping of these players. For a coalition $S_k$, if its utility function $V(S_k)$ is decided only by its own members and is independent of the player set $\mathcal{M} \setminus S_k$, the game is a coalitional game in characteristic form [19]. On the contrary, if the behavior of outside players can influence $V(S_k)$, the game is a coalitional game in partition form. From the viewpoint of utility allocation, coalitional games can also be divided into two categories: games with transferable utility (TU) and games with nontransferable utility (NTU). In the former category, the coalition utility $V(S_k)$ can be divided and transferred among the players inside the coalition, which satisfies $\sum_{i \in S_k} V_i = V(S_k)$, where $V_i$ is the payoff of the $i$th player. In an NTU coalitional game, however, the utility $V(S_k)$ is a payoff vector, whose elements are the payoffs of the players, decided by their own strategies and the joint behavior of the others in $S_k$.
In this system, the utility function of the $i$th DAN in cluster $S_k$ is defined as
$$V_{i,k} = -t_{i,S_k} = -\left(t^{(1)}_{S_k} + t^{(2)}_{S_k}\right), \quad (20)$$
where $t_{i,S_k}$ has been defined in (7). Since the time costs of all DANs in the same cluster are the same, the total utility of cluster $S_k$ is given by (21). Among all coalitional games, canonical coalitional games and coalition formation games are the most typical ones that have been studied in communication systems [20]. However, since we have already shown in Section 3 that the grand cluster cannot be formed, we focus only on the latter kind of game, in which many small clusters can be formed to benefit all DANs in the network.
As we can see, the utility function $V(S_k)$ contains the bandwidth term $B_k$, which is determined not only by the DANs in cluster $S_k$ but also by the resource allocation strategy of the whole network. Therefore, this game is an NTU coalition formation game $(\mathcal{M}, V)$ in partition form. However, coalitional games in partition form are usually quite complicated to solve in practical scenarios [19]. In the next subsection, we provide a simplified resource allocation rule for this system, with the aim of transforming the game into one in characteristic form.

Resource Allocation.
In this system, both power and frequency resources can be allocated. As shown in Section 3, although proper power allocation offers an optimal tradeoff between $t^{(1)}_{S_k}$ and $t^{(2)}_{S_k}$, for simplicity we set a small fixed ratio between $P_1$ and $P_2$ to avoid intercluster interference. Therefore, resource allocation here refers to the bandwidth allocation in the second phase.
To transform the original game into a game in characteristic form, the available bandwidth $B_{S_k}$ of cluster $S_k$ should depend only on its own members. To achieve this, we first virtually divide the total bandwidth $B_2$ into $M$ parts of the same size $B_V = B_2 / M$. Each DAN is then assumed to be bound to one part of the bandwidth resource. If the $i$th DAN joins a cluster $S_k$, its bandwidth part $B_{i,V}$ can be used by all DANs within that cluster. In this way, the frequency resource is allocated at the node level, and the total available bandwidth for cluster $S_k$ is $B_k = N_k \cdot B_V$. Although this allocation is not optimal, its computational complexity is much lower than that of the centralized scheme in (11). Meanwhile, $V(S_k)$ is now determined only by the DANs in $S_k$, and therefore the original game has been transformed into a game in characteristic form.
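The virtual bandwidth rule described above is simple enough to state in code. The following sketch assumes a partition given as a list of node collections and a total licensed bandwidth in Hz; the function name and data layout are illustrative choices, not taken from the paper.

```python
def cluster_bandwidths(partition, total_bandwidth_hz):
    """Per-cluster bandwidth when each node carries an equal virtual share.

    partition: list of clusters, each a collection of node identifiers.
    Every node is bound to total_bandwidth_hz / (number of nodes); a cluster
    may use the sum of its members' shares.
    """
    n_nodes = sum(len(cluster) for cluster in partition)
    share_hz = total_bandwidth_hz / n_nodes
    return [len(cluster) * share_hz for cluster in partition]


if __name__ == "__main__":
    print(cluster_bandwidths([{1, 2, 3}, {4}], 10e6))
```

Because each cluster's bandwidth depends only on its own size, its utility no longer depends on how the other nodes are grouped, which is exactly what the characteristic form requires.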

Initialization
Set each individual DAN as a cluster, so that the initial partition is $\mathcal{S} = \mathcal{M} = \{S_1, \ldots, S_M\}$.

Result reported and final resource allocation
(1) Report the final partition $\mathcal{S}^* = \{S^*_{1,V}, \ldots, S^*_{K,V}\}$ and the corresponding virtual resource allocation results $\mathbf{W} = \{B^*_{1,V}, \ldots, B^*_{K,V}\}$ to the BS.
(2) The BS allocates to each cluster frequency resources of the amount specified in $\mathbf{W}$.
Algorithm 1: Implementation of the joint cluster formation and resource allocation scheme.

Distributed Implementation for Joint Cluster Formation and Resource Allocation.
According to the theoretical results in Section 3 and the resource allocation rule above, the joint problem of cluster formation and resource allocation can be solved with a distributed method. Before describing the implementation process, some concepts should be briefly introduced [19]. Under the utilitarian order, the DANs prefer to self-organize into the clusters indicated by partition $\mathcal{S}$ rather than those indicated by partition $\mathcal{R}$ if the following condition is satisfied:
$$\mathcal{S} \rhd \mathcal{R} \iff \sum_{S_k \in \mathcal{S}} V(S_k) > \sum_{R_j \in \mathcal{R}} V(R_j), \quad (22)$$
where the operator $\rhd$ denotes the comparison relation. Equation (22) means that the total social welfare in the network increases if the network structure changes from $\mathcal{R}$ to $\mathcal{S}$. In other words, the average time cost for data collection with network structure $\mathcal{S}$ is lower.
Based on these definitions, two important rules for partition adjustment, the merge rule and the split rule, can be introduced [19]; they are the foundation of the distributed implementation process described below. The resulting merge-and-split algorithm has also been used in other cooperative networks [20]. Based on this algorithm, the implementation process for the distributed joint cluster formation and resource allocation scheme can be designed. The detailed procedures are provided in Algorithm 1. The whole process can be carried out in a distributed manner and is therefore easy to implement. Specifically, in the iteration process, all DANs try to find suitable partners to maximize their utilities. The solution for network structure and resource allocation changes only virtually with each merge and split operation. Only when the iteration process stops does the final solution become valid, at which point it is actually executed by the DANs and the BS.
The authors of [19] have proved that the merge-and-split algorithm converges to a partition that satisfies at least one kind of stability, namely $\mathbb{D}_{hp}$ stability. Since the cluster utilities keep increasing during the merge-and-split operations, the final results can be regarded as optimal in the sense of the utilitarian order. Meanwhile, since no payload is transmitted during the virtual iteration process, the time cost for cluster formation is very low. Therefore, although the performance of the proposed scheme is worse than that of the optimal solution to problem (11), its computational and implementation complexity is much lower, whereas the solutions to (11) can hardly be obtained.
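A merge-and-split iteration under the utilitarian order can be sketched as follows. This is a simplified illustration and not Algorithm 1 itself: it only tries pairwise merges and two-way splits (the general rules allow any subset of clusters), it runs centrally rather than through message exchange among DANs, and the names `merge_and_split`, `try_merge`, `try_split`, and the toy utility in the usage example are assumptions introduced here.

```python
from itertools import combinations


def total_utility(partition, utility):
    """Social welfare of a partition: the sum of its cluster utilities."""
    return sum(utility(cluster) for cluster in partition)


def try_merge(partition, utility):
    """Return a new partition after one beneficial pairwise merge, else None."""
    for a, b in combinations(range(len(partition)), 2):
        merged = partition[a] | partition[b]
        if utility(merged) > utility(partition[a]) + utility(partition[b]):
            rest = [c for i, c in enumerate(partition) if i not in (a, b)]
            return rest + [merged]
    return None


def try_split(partition, utility):
    """Return a new partition after one beneficial two-way split, else None."""
    for idx, cluster in enumerate(partition):
        members = sorted(cluster)
        for r in range(1, len(members) // 2 + 1):
            for part in combinations(members, r):
                left = frozenset(part)
                right = cluster - left
                if utility(left) + utility(right) > utility(cluster):
                    return partition[:idx] + partition[idx + 1:] + [left, right]
    return None


def merge_and_split(nodes, utility, max_rounds=1000):
    """Start from singleton clusters and apply merges/splits until none helps."""
    partition = [frozenset([n]) for n in nodes]
    for _ in range(max_rounds):
        nxt = try_merge(partition, utility) or try_split(partition, utility)
        if nxt is None:
            break
        partition = nxt
    return partition


if __name__ == "__main__":
    # Hypothetical utility: negative "cost" that is smallest for clusters of size 3.
    toy_utility = lambda c: -len(c) * abs(len(c) - 3)
    print(merge_and_split(range(6), toy_utility))
```

Every accepted move strictly increases the total utility and the number of partitions is finite, so the loop terminates; the result is stable only with respect to the restricted moves tried here and need not be the globally best partition.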

Simulation Results
To verify the performance of the proposed scheme, we set up a practical simulation scenario. A number of DANs are uniformly distributed within a 1 km × 1 km square area, and the BS is located at the center of this area. The total available bandwidth is 5 MHz in the unlicensed spectrum and 10 MHz in the licensed spectrum. The total power constraint per node is set to 10 dBm or 20 dBm and is divided between the two phases with the power ratio fixed at $\beta = 0.1$. The channel model consists of Rayleigh fading and path loss components, and the path loss factor is set to $\alpha = 3$. The payloads are generated by truncated Pareto distributions with parameters $\alpha_1 = 1.5$ and $\alpha_2 = 3$, and the average payload size is 5 kbits. The constraint on the maximum collection period is 1 s.
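A scenario of this kind can be generated with a few lines of Python. The sketch below is an assumption-laden illustration rather than the authors' setup: it draws node positions uniformly in the square, places the BS at the center, and attaches one Rayleigh-faded, path-loss-scaled power gain per node (a single-antenna link for simplicity). The function name, the dictionary layout, and the 1 m clamp on the distance are choices made here.

```python
import math
import random


def make_scenario(num_dans, side_m=1000.0, alpha=3.0, seed=0):
    """Random node layout with per-node channel power gains to the BS."""
    rng = random.Random(seed)
    bs = (side_m / 2.0, side_m / 2.0)
    dans = []
    for _ in range(num_dans):
        x, y = rng.uniform(0.0, side_m), rng.uniform(0.0, side_m)
        # Clamp very short distances so the path loss term stays bounded (assumption).
        dist = max(math.hypot(x - bs[0], y - bs[1]), 1.0)
        # Rayleigh fading: unit-variance circularly symmetric complex Gaussian coefficient.
        h = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
        dans.append({"pos": (x, y), "dist_to_bs": dist,
                     "gain": abs(h) ** 2 * dist ** (-alpha)})
    return bs, dans


if __name__ == "__main__":
    bs, dans = make_scenario(5)
    for d in dans:
        print(d)
```

Fixing the seed keeps runs reproducible, which is convenient when comparing clustering schemes on the same layout.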

Figure 2: Average time cost versus different number of DANs ($N_k$) with different cluster locations ($d$ = 100 m, 200 m).
Figure 2 shows how the average time cost varies with the number of DANs and the cluster location. It can be seen that, as the number of DANs in a cluster increases, the average total time cost first falls to a minimum and then begins to rise. Specifically, with more DANs, the time cost for cooperative transmission declines, but at a progressively slower rate. Meanwhile, more DANs create a larger cluster range, which raises the cost of information sharing. Hence, a certain number of DANs is required for a cluster to obtain the optimal performance in this context. The cluster location also influences the performance trend. For example, identical clusters located 100 m and 200 m from the BS have different time costs as well as different turning points in the number of DANs. The reason is that the farther a cluster is from the BS, the more performance enhancement can be obtained via cooperative transmission, and therefore the larger the number of DANs required.
Figure 3 compares the performance of the proposed and noncooperative schemes as the available bandwidth changes, for different numbers of DANs. As expected, larger available bandwidth brings better performance for both schemes. However, the gap between the two schemes becomes smaller and smaller. This means that although increasing the bandwidth of a cluster reduces the time cost, the spectral efficiency decreases. In addition, Figure 3 shows that it is reasonable to allocate more frequency resources to clusters with more DANs, since the spectral efficiency in large clusters is higher.
Figure 4 compares the proposed and noncooperative schemes for varying numbers of DANs and transmit powers under the same frequency resources; that is, $B_k = N_k \cdot B_V$ as mentioned in Section 4. It can be seen that, for a certain range of DAN numbers, the proposed scheme performs much better than the noncooperative scheme owing to the advantages of VMIMO. However, if there are too many DANs in a cluster, the performance declines, since the benefit of cooperative transmission is no longer large enough to cover the cost of cluster formation and information sharing. Meanwhile, from Figure 4, we can also conclude that the proposed scheme brings more benefit at lower transmit power. The reason is that the poorer the channel conditions are, the more the DANs need to find partners.

Conclusion
In this paper, we focused on cooperative data collection schemes in M2M networks. To collect real-time information efficiently, a cluster-based hierarchical transmission scheme with a virtual MIMO protocol was provided. In this scheme, the data is first shared by intracluster broadcasting, and then all partners transmit the total payload cooperatively. To construct the network and allocate resources optimally, we first formulated an optimization problem and derived theoretical results on the advantages and properties of the system. Since the optimal solution can hardly be obtained in practical use, a distributed joint cluster formation and resource allocation scheme was proposed in the context of a coalition formation game. Simulation results showed the efficiency of the proposed scheme by evaluating the average time cost of data collection.

Figure 1: An illustration of system model.

Definition 1 (partition). If the whole set of DANs $\mathcal{M}$ can be divided into a set of coalitions $\mathcal{S} = \{S_1, \ldots, S_K\}$ that satisfies $\bigcup_{k=1}^{K} S_k = \mathcal{M}$ and $S_k \cap S_j = \emptyset$, $\forall k \neq j$, then $\mathcal{S}$ is a partition of $\mathcal{M}$. Each partition represents a kind of network structure.
Definition 2 (utilitarian order). Assume that $\mathcal{R} = \{R_1, \ldots, R_{|\mathcal{R}|}\}$ and $\mathcal{S} = \{S_1, \ldots, S_K\}$ are two different partitions of $\mathcal{M}$.

( i )
Merge Rule.For any subset of clusters {S  1 , . . ., S   }, they can merge to a new cluster S  when the condition {S  } = {⋃

Figure 3: Performance comparison between the proposed and the noncooperative schemes versus different frequency resources ($B_k$) with different number of DANs ($N_k$ = 2, 5).