Clustering Optimization for Out-of-Band D2D Communications

A significant increase in multimedia traffic challenges 5G networks in terms of capacity and the corresponding QoS parameters. The device-to-device (D2D) communication paradigm has already become an integral part of 3GPP standards; nevertheless, it has not yet been widely deployed, for a variety of reasons. D2D is expected to enable many qualitatively new services, and to deliver them efficiently D2D devices are supposed to form clusters. Due to practical limitations, current D2D implementations are mostly out-of-band and use Wi-Fi Direct. In this paper, we propose a novel model for throughput optimization in out-of-band D2D clusters. We deliver numerical results for different typical cluster member distributions and reveal key functional dependencies. Further, for the first time, we compare clustering algorithms for out-of-band D2D and identify an effective clustering algorithm that increases the network resource utilization rate.


Introduction and Rationale
Integration of device-to-device (D2D) communication technology has become a mainstream direction for fifth-generation (5G) communication networks. Driven by a huge increase in demand for multimedia traffic, D2D communication saves scarce network resources by transferring data directly between devices, either in-band or out-of-band, and significantly reduces traffic between the base station (BS) and the end-user device [1].
According to the 3rd Generation Partnership Project (3GPP) [2], D2D is a flexible paradigm of direct communication between devices that is open for use and can be based either on cellular communication technologies (in-band D2D communication) or on WLAN technologies standardized as IEEE 802.11 (out-of-band D2D communication) [3].
The latter approach has recently become attractive due to its ease of implementation compared to in-band D2D, where the end-user device must be equipped with appropriate uplink/downlink functionality. In the case of in-band D2D communications, the transmission power should be properly regulated so that the D2D transmitter does not interfere with cellular UE communication while maintaining the minimum SINR requirement of the D2D receiver [4]. This significantly complicates wide-scale implementation of in-band D2D, at least for the time being. Out-of-band D2D can be easily implemented with a network assistance option; hence cellular operators are able to control out-of-band sessions. For obvious reasons, IEEE 802.11-based Wi-Fi is taken as the transmission technology for implementing out-of-band D2D functionality [5]. Operators of communication networks can encourage regular users to use D2D technology in order to improve the overall performance of the communication system, in return for rewards proportional to their contributions.
Typically, geographically adjacent D2D nodes can form a cluster (Figure 1), where traffic circulates between cluster nodes directly and outside-of-cluster traffic is forwarded to the BS via a relay node, the so-called cluster head. A number of algorithms for cluster head selection are available today, e.g., [6,7]. The selection of a particular cluster member as the cluster head affects at least network efficiency, energy expenditure, and the quality of service (QoS) offered to all members within the cluster. Generally, if all parties to a data transfer are members of the same cluster, the cluster can operate off-line, that is, without a connection to the network BS.
A larger number of cluster members is expected to lead to larger savings of network resources. However, the maximum number of members in a cluster is restricted by the coverage of the selected D2D technology, the channel throughput of the cluster head, the cluster traffic intensity, and the physical location of cluster members relative to the cluster head. Existing studies show that D2D clustering in 5G reduces signaling traffic and provides higher spectral efficiency and better energy performance than conventional cellular systems [7,8]. Thus, efficient D2D clustering in 5G networks, especially with a high density of devices, is of paramount importance.
Multiple past works have concentrated on quantitative and qualitative analysis of clustering algorithms for D2D communications. In [1,3] the authors provide a comprehensive analysis of D2D communications. The use of out-of-band D2D communications and D2D clustering is discussed in detail, with cluster head selection criteria based on the channel quality between the cluster head and the BS. In [9], the authors designed a clustering algorithm for the in-band D2D case, which increases system-level spectral efficiency. Numerical analysis and simulation modeling have shown that this proposal gives a 66% gain in throughput compared to traditional solutions in the case where 20% of users use D2D communication. The authors derived the probability density function (pdf) for the optimal number of repeater units in the cluster and proposed a cross-cluster interaction scheme. Also, via simulation, the authors show that the proposed algorithm provides gains of up to 40% in network resource utilization efficiency.
Different aspects of out-of-band D2D communication are presented in [4,10,11]. In [4,10], the authors developed an analytical model of network offloading for different D2D scenarios using stochastic geometry. The authors estimated the potential of out-of-band D2D communication using both system-level and mathematical analysis. They show that at 30% clustering, network productivity and energy performance increase by up to four and two times, respectively. In [11] the authors studied problems of implementing network-assisted D2D communication for interaction in social networks. In addition, they use the existing experimental LTE testbed [12] to implement a D2D system and evaluate its performance in terms of latency and user satisfaction. The selection of D2D transmission technology is still largely limited to Wi-Fi and Bluetooth due to their wide implementation in consumer devices. Most recent works, e.g., [13,14], consider D2D devices forming clusters by using Wi-Fi Direct (see Figure 2). Due to the features of the radio channel, the resources of the channel between a cluster member and the cluster head may vary drastically for different nodes within one cluster. Therefore, when forming a cluster, we suggest that cluster head selection be based primarily on anticipated QoS parameters, not on distance as many studies have suggested to date [15,16]. The same applies to the selection of cluster members. A clustering algorithm can be implemented for different target parameters, such as the cumulative throughput of the cluster as a whole, the maximum number of cluster nodes, and quality of service.
Summarizing the analysis of publications, we can conclude that most of them are devoted to analyzing and demonstrating the effectiveness of both in-band and out-of-band clustering. The efficiency of spectrum utilization is not the only criterion that should be considered when solving the clustering problem. Therefore, further in this paper we design a clustering algorithm that is characterized by the bandwidth of the channels between cluster members and the cluster head.
The remainder of this paper is organized as follows. In Section 2, the reference scenario for D2D is defined and the analytical model for a cluster is introduced. The proposed model is delivered in Section 3. The analytical and simulation performance evaluation campaign is reported in Section 4. Section 5 discusses clustering algorithms, whereas the concluding remarks and future research directions appear in Section 6.

Reference Scenario Definition
The use of D2D communication increases the effectiveness of cellular communication systems; moreover, D2D directly influences both efficiency and energy consumption at the system level. The users are distributed randomly over the BS coverage area. Generally, network planning takes into account the distribution of nodes in the geographical area, letting the operator provide at least the desired coverage and the required throughput and QoS. The possible cluster structures are presented in Figure 2.
We assume that mobile stations (MS) can interact with the base station (BS) directly, through a transit node (CH1), or through the head node of a cluster (CH2, CH3). Generally, a cluster can have a star-like structure with a single transit node (the head node of the cluster) or a tree-like structure that includes both a head node and other transit nodes. The choice of the cluster structure can be made without the involvement of network functionality (by the mobile stations) or with the involvement of network functionality (by the BS). Shaping a cluster consists of choosing the MS group and distributing roles within the cluster (terminal node, transit node, or head node of the cluster). To shape a cluster, one needs to define the indicators that characterize the clustering decision (status indicators), the criterion of decision validity (quality), the control settings that affect the status indicators, and the method of finding the valid (optimal) solution.
The authors in [3,4] suggested an approach for the analysis and shaping of tree-like cluster structures. Following that proposal, in this paper we focus on clusters with a star-like structure and one head node; such a structure is very useful for high-density wireless environments such as apartment blocks, offices, and stadiums.
The quality of traffic service within a cluster and between the cluster head node and the BS depends on the channel throughput between cluster participants ($b_{ij}$) and between the head node and the BS ($b_{\mathrm{BS}}$), and also on the traffic intensity ($\lambda$) produced by users. We suppose that in the BS service area there are $N$ MSs that support D2D mode. We denote the set of MSs as $M = \{m_1, m_2, \ldots, m_N\}$. Then the task of clustering consists of estimating the number of clusters ($K$) and choosing their structure such that the best possible QoS is provided. We assume that the efficiency of the solution is higher if the BS service area contains a smaller number of BS-CH channels (i.e., a smaller number of clusters $K$) and a greater number of MSs using D2D mode. At the same time, the QoS for cluster participants should not be below the target value (the rule) $q_0$. Generally, target values can be different for different users. They depend on the characteristics of the produced traffic, that is, the type of services required.
We denote the set of clusters in the BS service area as $C = \{c_1, c_2, \ldots, c_K\}$. We assume that all clusters in $C$ are formed from elements of the MS set $M$, that not all elements of $M$ need to be included in clusters, and that the clusters have no common elements (they form disjoint subsets): $c_i \subseteq M$ for all $c_i \in C$, and $c_i \cap c_j = \emptyset$ for $i \neq j$.
As stated above, cluster shaping can be performed by various methods, the choice of which depends on the desired outcome. Below we consider the use of centroid methods for the clustering task [17,18]. The solution of the clustering problem is the solution of an optimization problem in which a certain metric $d(m, \mu_i)$ is minimized or maximized. This metric characterizes the "distance" between a cluster participant $m$ and the center $\mu_i$ of cluster $c_i$, where $\mu_i = (1/|c_i|) \sum_{m \in c_i} m$.
Distance, throughput, time delay, and so forth can be used as the optimization metric. Generally, this is a nonconvex optimization task, which may not have a unique solution. As a rule, solving this problem requires dynamic programming, which can minimize the parameter $d^2(m, \mu_i)$ for all clusters.
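The centroid-based formulation above can be illustrated with a minimal sketch of Lloyd's heuristic for k-means in plain Python. This is an illustration under our own assumptions, not the procedure used in the paper: Euclidean distance stands in for the metric, and the names `centroid`, `kmeans`, and `within_cluster_cost` are hypothetical.

```python
import math
import random


def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def kmeans(points, k, initial_centers=None, iterations=100, seed=0):
    """Lloyd's heuristic: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    # Assumption: without explicit starting centers, pick k random members.
    centers = list(initial_centers) if initial_centers else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its current members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels


def within_cluster_cost(points, centers, labels):
    """Sum of squared distances between members and their cluster centers."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
```

Because the problem is nonconvex, the result depends on the starting centers; the heuristic returns a local optimum, which is why the text speaks of particular (near-optimal) solutions.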
Well-known clustering algorithms allow us to find a particular (near-optimal) solution. As stated above, the clustering approach can be chosen differently in different scenarios, depending on goals and restrictions. Therefore, solving the considered clustering task requires an analysis of possible solutions and an approach to selecting the criteria and the clustering method.

The Proposed Model Description
Quality of traffic service is characterized by probabilistic and temporal indicators such as the probability of availability, the discard probability, and the data delivery time. These indicators depend on the traffic parameters and the bandwidth of the network connection. Thus, for given traffic characteristics, the amount of bandwidth best describes the outcome of a decision in terms of the quality of communication services. By throughput we mean the achievable data rate. Throughput is not a complete metric for describing quality of service; to capture packet loss ratio and delay, it is necessary to build a complete model based on queuing theory. At this stage we concentrate on the initial model, where known parameters such as the number of users and the distribution of users in the service area are available. We suppose that the initial clustering solution is made taking into account only the throughput parameter, under the assumption that all users generate equal traffic flows. In fact, the analytical results given below can be used for different types of traffic flows. As the target metric we consider the throughput $b_{ij}$ between network elements. In our analysis we consider the case when the head node is already defined. We assume that the communication area of the head node is a circle of radius $R$ centered at the location of the CH node, as shown in Figure 3.
Considering the IEEE 802.11 family of standards as the communication technology between D2D nodes, one shall define the form of the throughput dependence $b(p)$. According to [12], the data transmission rate between two cluster members is defined by the choice of modulation and coding scheme (MCS) according to the receiving conditions (i.e., radio channel quality). These conditions are evaluated by the signal power at the receiver input $p = p_s$, by the signal-to-noise ratio (SNR), or by the signal-to-interference-plus-noise ratio (SINR), where, in addition to the useful signal, the total power of all other received signals, including noise, is taken into account. The interference power is the power created by nodes from neighboring clusters. According to [12], the model dependence of the data transmission rate on the signal power at the receiver input $p$ is a jump (step) function that increases with power. Figure 4 shows an example of the dependence between the data transfer rate and $p$ for the IEEE 802.11n standard using the 20 MHz channel width.
A similar dependence can also be constructed for values of SNR and SINR. Signal power, SNR, and SINR are estimated by the equations given below:

$$p_s = 10 \log_{10}\left(\frac{P_s}{10^{-3}}\right), \qquad \mathrm{SNR} = 10 \log_{10}\left(\frac{P_s}{P_n}\right), \qquad \mathrm{SINR} = 10 \log_{10}\left(\frac{P_s}{P_n + P_i}\right),$$

where $p_s$ is the receiver input power (dBm); $P_s$ is the receiver input power (W); $P_n$ is the input noise power (W); $P_i$ is the receiver input interference power (W). The receiver input signal power can be described by the received channel power indicator (RCPI). According to [19], the value of this indicator is measured with an accuracy of ±5 dB (95% confidence interval), taking into account the noise within the channel bandwidth.
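The three receiving-condition metrics named above can be computed with a few lines of Python. This is a minimal sketch using the standard decibel definitions; the function names are hypothetical and all powers are assumed to be given in watts.

```python
import math


def watts_to_dbm(power_w):
    """Convert a power in watts to dBm (decibels relative to 1 mW)."""
    return 10.0 * math.log10(power_w / 1e-3)


def snr_db(signal_w, noise_w):
    """Signal-to-noise ratio in dB."""
    return 10.0 * math.log10(signal_w / noise_w)


def sinr_db(signal_w, noise_w, interference_w):
    """Signal-to-interference-plus-noise ratio in dB."""
    return 10.0 * math.log10(signal_w / (noise_w + interference_w))
```

Since interference only adds to the denominator, `sinr_db` never exceeds `snr_db` for the same signal and noise powers.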
Along with the specified metrics, the signal power at the receiver input can be described by the received signal strength indicator (RSSI). For this value, an exact correspondence with the received signal power is not defined. According to the IEEE 802.11 standard, it can vary from the minimum to the maximum value.
Each of the parameters specified above affects the throughput of the channel and can be selected as a metric in the clustering task. The choice of a specific parameter depends on whether it can be assessed and on the problem statement. In this article we restrict ourselves to the analysis of throughput in a cluster without considering a specific method of cluster shaping (this is a topic for further research). In the presented analysis, we selected as the metric the signal power at the receiver input $p$, which can be estimated by the RCPI parameter.
The value of throughput defined by this model (see Figure 4) depends on the receiving conditions and, in practice, exhibits considerable dispersion. Taking this into account, when analyzing the bandwidth we can assume that approximating the jump model by a continuous function does not introduce a significant error in the results but significantly simplifies the task.
Considering that most D2D users are concentrated indoors, we describe the signal attenuation using the model recommended in [20] for indoor applications (ITU-R P.1238), thus considering out-of-band D2D:

$$L(d) = 20 \log_{10} f + N \log_{10} d + L_f(n) - 28,$$

where $d$ is the distance in meters; $f$ is the frequency in MHz; $N$ is the distance power loss coefficient; $n$ is the number of obstacles; $L_f(n)$ is the power loss parameter for passing through the obstacles (dB). Considering the attenuation model, we describe the dependence of throughput on distance as a jump function (see Figure 5). The jump function was approximated by a normal (Gaussian) curve [21], which reflects the trend of the throughput with varying distance:

$$b(d) = b_{\max}\, e^{-d^2/(2\sigma^2)}, \quad (5)$$

where $d$ is the distance in meters; $e$ is the base of the natural logarithm; $b_{\max}$ is the maximum possible data transfer rate (Mbps); $\sigma$ is the half-width of the curve in meters.
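For orientation, the indoor attenuation model referred to above can be sketched in Python. The sketch assumes the commonly cited ITU-R P.1238 site-general form (20 log f + N log d + L_f(n) - 28, with f in MHz and d in meters); the function name and the coefficient values used in the usage note are illustrative assumptions, not parameters taken from the paper.

```python
import math


def indoor_path_loss_db(distance_m, freq_mhz, power_loss_coeff, obstacle_loss_db=0.0):
    """Indoor path loss in dB, assuming the ITU-R P.1238 site-general form."""
    if distance_m < 1.0:
        # The site-general model is stated for distances greater than 1 m.
        raise ValueError("distance must be at least 1 m")
    return (20.0 * math.log10(freq_mhz)
            + power_loss_coeff * math.log10(distance_m)
            + obstacle_loss_db
            - 28.0)
```

For example, `indoor_path_loss_db(10.0, 2400.0, 30.0)` evaluates the loss at 10 m for a hypothetical 2.4 GHz link with an assumed loss coefficient of 30; each tenfold increase in distance then adds 30 dB.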
Generally, mobile stations are distributed across the service area in a random way; therefore, the distance $d$ between them and the signal attenuation $L(d)$ over that distance are also random values.
In this case we do not consider signal fading, which also affects the random character of the attenuation and, consequently, the throughput. In this analysis we consider only the factor of the relative positioning of users ($d$).
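To make the "throughput as a function of a random distance" idea concrete, the following Python sketch assumes the Gaussian-shaped rate curve implied by the inverse relation $d = \sigma\sqrt{-2\ln(b/b_{\max})}$ used later in the text. The function names and the parameter values in the usage note (65 Mbps, 49 m, 124 m) are illustrative assumptions.

```python
import math
import random


def throughput_mbps(distance_m, b_max, sigma):
    """Gaussian-shaped approximation of data rate versus distance."""
    return b_max * math.exp(-distance_m ** 2 / (2.0 * sigma ** 2))


def distance_for_throughput(b, b_max, sigma):
    """Inverse of the rate curve: distance at which the rate equals b."""
    return sigma * math.sqrt(-2.0 * math.log(b / b_max))


def sample_distance_uniform_disk(radius, rng):
    """Distance from the center of a point drawn uniformly over a disk."""
    # The square root makes the point density uniform per unit area.
    return radius * math.sqrt(rng.random())
```

A single random user can then be simulated as `throughput_mbps(sample_distance_uniform_disk(124.0, random.Random(1)), 65.0, 49.0)`; repeating the draw yields an empirical throughput distribution.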
Since throughput is a function of a random value, its distribution function $F(b)$ can be determined according to [22] as

$$F(b) = \iint_{D_b} f(x, y)\,dx\,dy, \quad (8)$$

where $D_b$ is the region of integration corresponding to the range of $b$ values and $f(x, y)$ is the density of the user distribution within the circle with radius $R$.
The probability density of $b$ can be determined as

$$f(b) = \frac{dF(b)}{db}. \quad (9)$$

The mathematical expectation of $b$ is

$$E(b) = \int b\,f(b)\,db. \quad (10)$$

Further we consider two kinds of user distribution over the service area: (i) the uniform distribution and (ii) the normal probability distribution.
3.1. Uniform Distribution. The uniform distribution is defined on the set of points inside a circle of area $S$ and radius $R$ ($S = \pi R^2$), with the distance from the center in the interval $0 \le d \le R$. The radius $R$ of the circle is defined as $R = \arg\{b(d) = 0\}$ (m).
The probability density function $f(x, y)$ inside the circle is constant:

$$f(x, y) = \frac{1}{S} = \frac{1}{\pi R^2}. \quad (11)$$

If the function expressing the dependence of throughput $b$ on the distance to the base station has the form (5), that is,

$$b(d) = b_{\max}\, e^{-d^2/(2\sigma^2)}, \quad 0 \le d \le R,$$

where $R$ is the radius of the service area, which is defined from the attenuation model, we can express from (5) $d = \sigma\sqrt{-2 \ln(b/b_{\max})}$ (m). The throughput distribution function $F(b)$ inside the circle, according to (8), can be expressed as

$$F(b) = 1 - \frac{d^2}{R^2} = 1 + \frac{2\sigma^2}{R^2} \ln\left(\frac{b}{b_{\max}}\right). \quad (12)$$

The probability density function according to (9) is

$$f(b) = \frac{2\sigma^2}{R^2\, b}. \quad (13)$$

The throughput distribution and probability density functions are shown in Figure 6. For example, for the IEEE 802.11g standard the mathematical expectation of the throughput $E(b)$ in the service area (the circle of radius $R$) according to (9) can be determined as

$$E(b) = \int_{b(R)}^{b_{\max}} b\,f(b)\,db = \frac{2\sigma^2}{R^2}\left(b_{\max} - b(R)\right). \quad (14)$$

With the chosen approximation of throughput as a function of distance and a uniform user distribution in the service area, the mathematical expectation of throughput obtained for IEEE 802.11n is 19.51 Mbps.
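The uniform case can be cross-checked numerically. The Python sketch below assumes the Gaussian-shaped rate curve $b(d) = b_{\max} e^{-d^2/(2\sigma^2)}$ and users spread uniformly over a disk of radius $R$; under these assumptions the mean throughput has the closed form $(2\sigma^2/R^2)(b_{\max} - b(R))$, which the code compares against direct integration over the radial density $2r/R^2$. All names and the parameter values in the usage note are illustrative assumptions.

```python
import math


def throughput(d, b_max, sigma):
    """Gaussian-shaped rate curve (assumed form of the approximation)."""
    return b_max * math.exp(-d * d / (2.0 * sigma * sigma))


def uniform_throughput_cdf(b, b_max, sigma, radius):
    """P(throughput <= b) for users uniform over a disk."""
    if b <= throughput(radius, b_max, sigma):
        return 0.0
    if b >= b_max:
        return 1.0
    return 1.0 + (2.0 * sigma ** 2 / radius ** 2) * math.log(b / b_max)


def uniform_mean_throughput(b_max, sigma, radius):
    """Closed-form mean throughput for the uniform user distribution."""
    return (2.0 * sigma ** 2 / radius ** 2) * (b_max - throughput(radius, b_max, sigma))


def numeric_mean_throughput(b_max, sigma, radius, steps=100000):
    """Midpoint-rule integral of b(r) against the radial density 2r/R^2."""
    h = radius / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += throughput(r, b_max, sigma) * (2.0 * r / radius ** 2) * h
    return total
```

Calling `uniform_mean_throughput(65.0, 49.0, 124.0)` and `numeric_mean_throughput(65.0, 49.0, 124.0)` with these hypothetical parameters should give matching values, which is a quick sanity check on the closed form.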

3.2. Normal Distribution.
It is supposed that the traffic intensity at each point of the surface is a random value determined by random, independent coordinates $x$ and $y$. Then the probability density function is determined by the joint distribution of $x$ and $y$. For a normal distribution whose center of dispersion coincides with the center of the circle and whose dispersion is circular (equal variances of $x$ and $y$), the density is

$$f(x, y) = \frac{1}{2\pi\sigma_u^2} \exp\left(-\frac{x^2 + y^2}{2\sigma_u^2}\right), \quad (15)$$

where $\sigma_u$ is the root-mean-square deviation.
For throughput analysis, we assume that the probability of falling inside the circle of radius $R$ (which is the service area) is equal to 1. The normal law of probability distribution is unbounded in $x$ and $y$. Under this condition, the probability distribution law cannot be exactly normal. However, with sufficient accuracy it can be described by the truncated normal distribution [21] as

$$f_R(x, y) = \frac{f(x, y)}{P_R}, \quad (16)$$

where

$$P_R = \iint_{S_R} f(x, y)\,dx\,dy = 1 - e^{-R^2/(2\sigma_u^2)}, \quad (17)$$

and $S_R$ is the area bounded by the circle with radius $R$.
Let $F(b)$ be the throughput distribution function inside the circle of radius $R$.
From (5) it can be seen that $d = \sigma\sqrt{-2 \ln(b/b_{\max})}$, and from (5), according to (16) and (17),

$$F(b) = \frac{(b/b_{\max})^{\sigma^2/\sigma_u^2} - e^{-R^2/(2\sigma_u^2)}}{1 - e^{-R^2/(2\sigma_u^2)}}. \quad (18)$$

The throughput probability density according to (8) is

$$f(b) = \frac{\sigma^2}{\sigma_u^2} \cdot \frac{b^{\sigma^2/\sigma_u^2 - 1}}{b_{\max}^{\sigma^2/\sigma_u^2}\left(1 - e^{-R^2/(2\sigma_u^2)}\right)}. \quad (19)$$

The throughput distribution function and probability density are presented in Figure 7.
The mathematical expectation of the throughput according to (19) is

$$E(b) = \int_{b(R)}^{b_{\max}} b\,f(b)\,db = b_{\max}\, \frac{\sigma^2}{\sigma^2 + \sigma_u^2} \cdot \frac{1 - e^{-(R^2/2)\left(1/\sigma^2 + 1/\sigma_u^2\right)}}{1 - e^{-R^2/(2\sigma_u^2)}}. \quad (20)$$

The values of $E(b)$ are 55.72, 47.28, and 25.15 Mbps when the root-mean-square deviation is 20, 30, and 80 m, respectively. Thus, under the normal law of user traffic distribution inside the service area, the average throughput depends on the variance (scattering): lower variance yields higher throughput.
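The dependence of the mean throughput on the spread of users can be explored with a short Python sketch. It assumes the Gaussian-shaped rate curve $b(d) = b_{\max} e^{-d^2/(2\sigma^2)}$ and a circular normal user distribution truncated to a disk of radius $R$; the closed form is our own derivation under those assumptions, checked against numerical integration. Function names and the parameter values in the usage note are hypothetical.

```python
import math


def truncated_normal_mean_throughput(b_max, sigma, radius, spread):
    """Closed-form mean throughput; `spread` is the RMS deviation of user coordinates."""
    kappa = 1.0 / spread ** 2 + 1.0 / sigma ** 2
    inside = 1.0 - math.exp(-radius ** 2 / (2.0 * spread ** 2))  # mass inside the disk
    weight = (1.0 / spread ** 2) / kappa
    return b_max * weight * (1.0 - math.exp(-0.5 * kappa * radius ** 2)) / inside


def numeric_truncated_normal_mean(b_max, sigma, radius, spread, steps=100000):
    """Midpoint-rule integral of b(r) against the truncated Rayleigh radial density."""
    inside = 1.0 - math.exp(-radius ** 2 / (2.0 * spread ** 2))
    h = radius / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        density = (r / spread ** 2) * math.exp(-r * r / (2.0 * spread ** 2)) / inside
        total += b_max * math.exp(-r * r / (2.0 * sigma ** 2)) * density * h
    return total
```

Evaluating `truncated_normal_mean_throughput(65.0, 49.0, 124.0, s)` for hypothetical spreads `s` of 20, 30, and 80 m shows the mean falling as the spread grows, in line with the trend described above.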
Wireless Communications and Mobile Computing
The average throughput in the case of the normal distribution always exceeds the corresponding value for the uniform distribution.
From the analysis of the examples we can assume that the uniform distribution is the worst one in terms of average throughput in comparison with other types of distributions for which the coordinates of the access point coincide with the mathematical expectation of the traffic distribution. The mathematical expectation of the traffic distribution coordinates is the most appropriate base station installation point in terms of maximum throughput support (potential capacity), at least for unimodal distribution laws.
Using (12) or (18), depending on the type of user distribution, we can estimate the probability that the throughput is not less than a given value:

$$P(b \ge b_{\min}) = 1 - F(b_{\min}), \quad (21)$$

where $b_{\min}$ is the minimum throughput value permitted for a cluster member and $F(b)$ is the probability distribution given by (12) or (18).
Under the $b_{\min}$ restriction, the probability (21) represents the share of users included in clusters. It allows estimating the number of clustered users for network parameters chosen from (12) or (18). If a number of randomly distributed clusters are formed in the network service area, then using (21) we can estimate the number of users served with throughput no lower than $b_{\min}$.
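The share of users that satisfy a minimum-throughput requirement can be sketched in Python as follows. The code assumes the Gaussian-shaped rate curve $b(d) = b_{\max} e^{-d^2/(2\sigma^2)}$, so that "throughput at least $b_{\min}$" is equivalent to "distance at most $d(b_{\min})$"; the function names and the parameter values in the usage note are hypothetical.

```python
import math


def reach_for_throughput(b_min, b_max, sigma):
    """Largest distance at which the assumed rate curve still delivers b_min."""
    return sigma * math.sqrt(max(0.0, -2.0 * math.log(b_min / b_max)))


def served_fraction_uniform(b_min, b_max, sigma, radius):
    """Share of uniformly placed users with throughput >= b_min."""
    d = min(reach_for_throughput(b_min, b_max, sigma), radius)
    return (d / radius) ** 2


def served_fraction_normal(b_min, b_max, sigma, radius, spread):
    """Same share for a circular normal placement truncated to the disk."""
    d = min(reach_for_throughput(b_min, b_max, sigma), radius)
    inside = 1.0 - math.exp(-radius ** 2 / (2.0 * spread ** 2))
    return (1.0 - math.exp(-d ** 2 / (2.0 * spread ** 2))) / inside
```

Multiplying the returned fraction by the number of users in the service area gives the expected number of users served at or above the threshold, for example `100 * served_fraction_uniform(20.0, 65.0, 49.0, 124.0)` for a hypothetical population of 100 users.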

Performance Evaluation
To verify the models above, a simulation campaign was performed. The following data represent the results of throughput simulation between user terminals and terminals within the communication area with uniform distribution. In the simulation, variations in the received signal level are permitted; they are caused by the short-term fading effect and are modeled as a random value with a Nakagami distribution [22]:

$$f(x; m, \Omega) = \frac{2 m^m}{\Gamma(m)\,\Omega^m}\, x^{2m-1} \exp\left(-\frac{m}{\Omega} x^2\right),$$

where $m$ is the shape parameter; $\Omega$ is the scale parameter; $m$ and $\Omega$ are associated with the throughput by the formula given below:

Figure 8 represents analytical and simulation results for the case of uniformly distributed users: the dependence between throughput and the distance between cluster head and cluster member (a) and the probability density function (b), obtained via simulation (red) and analysis according to formula (13) (blue). The average throughput for this case is 17.2 Mbps. Figure 9, in turn, represents simulation results for the case of normally distributed users: the dependence between throughput and the distance between cluster head and cluster member (a) and the probability density function (b), obtained via simulation (red) and analysis according to formula (19) (blue). The average throughput for this case is 47.4 Mbps. Therefore, one can conclude that in the case of the normal distribution the average throughput between cluster head and cluster member is more than 2.5 times that for the case of the uniform distribution. Also, as can be seen from the simulated and analytical distributions, the analytical probability density function of throughput is sufficiently close to the simulation results. This allows us to conclude that the obtained models for uniform and normal distributions of user devices are credible. Hence, both analytical and simulation models allow defining the throughput in clusters with different user distributions in the service area of the head node.
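Nakagami-distributed fading of the kind mentioned above can be sampled with the Python standard library. This is a minimal sketch relying on the standard relation that the square of a Nakagami($m$, $\Omega$) variate is gamma distributed with shape $m$ and scale $\Omega/m$. How the paper maps the fading parameters to throughput is not recoverable from the text, so applying the fading as a unit-mean power gain on a mean received power is purely our assumption, and the function names are hypothetical.

```python
import math
import random


def nakagami_sample(m, omega, rng):
    """Draw one Nakagami amplitude as the square root of a gamma variate."""
    return math.sqrt(rng.gammavariate(m, omega / m))


def faded_power_dbm(mean_power_dbm, m, rng):
    """Apply a unit-mean Nakagami power gain to a mean received power (assumption)."""
    gain = nakagami_sample(m, 1.0, rng) ** 2
    # Guard against a zero gain, for which the logarithm is undefined.
    return mean_power_dbm + 10.0 * math.log10(max(gain, 1e-300))
```

With `rng = random.Random(1)`, repeated calls to `faded_power_dbm(-60.0, 2.0, rng)` produce a scatter of received power levels around a hypothetical mean of -60 dBm; larger `m` gives milder fading.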

Clustering Method Selection
Figure 10 shows an example of clustering 5,000 objects using two different algorithms [23,24] (the k-means algorithm on the left and the FOREL algorithm on the right), obtained as a result of our simulation modeling. This is a theoretical example that does not reflect the real sizes of clusters in the network. However, it demonstrates quite clearly how the result differs depending on the chosen clustering method.
As can be seen from Figure 10, with an equal number of clusters (25 clusters in total) the shapes of the clusters are visibly different in the first and second cases.
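For readers unfamiliar with FOREL, its basic idea can be sketched in Python: a ball of fixed radius is moved to the centroid of the points it covers until it stops moving, the covered points are removed as one cluster, and the procedure repeats on the remaining points. This is a simplified illustration, not the implementation used for Figure 10; the function names, the choice of the first remaining point as the starting center, and the iteration cap are our own assumptions.

```python
import math


def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def forel(points, radius, max_iter=1000):
    """Return a list of (center, members) clusters found by the FOREL idea."""
    remaining = list(points)
    clusters = []
    while remaining:
        center = remaining[0]  # assumption: start from the first remaining point
        for _ in range(max_iter):
            members = [p for p in remaining if math.dist(p, center) <= radius]
            new_center = centroid(members)
            if new_center == center:  # the ball stopped moving
                break
            center = new_center
        members = [p for p in remaining if math.dist(p, center) <= radius]
        if not members:  # safety net so that every pass removes a point
            members = [remaining[0]]
        clusters.append((center, members))
        taken = set(members)
        remaining = [p for p in remaining if p not in taken]
    return clusters
```

Unlike k-means, the number of clusters is not an input here: it follows from the chosen radius, which is one reason the two algorithms partition the same point set differently.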
To compare these methods, we analyzed the distribution of cluster members relative to the cluster centers. The distribution was obtained by simulation modeling. For the center of mass we used the following expression:

$$\mu_k = \frac{1}{n_k} \sum_{m_{k,i} \in c_k} x_{k,i}, \quad k = 1, \ldots, |C|,$$

where $\mu_k$ is the center-of-mass coordinate of cluster $k$; $x_{k,i}$ is the coordinate of the $i$th element of the $k$th cluster; $c_k$ is cluster $k$; $m_{k,i}$ is element $i$ of cluster $k$; $n_k$ is the number of elements within cluster $k$; $|C|$ is the number of clusters.
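The comparison described above amounts to computing each cluster's center of mass and then the offsets of members from their own center. A minimal Python sketch follows, with hypothetical function names and cluster membership given as one label per point:

```python
import math


def cluster_centers(points, labels):
    """Center of mass of each cluster, keyed by cluster label."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


def offsets_from_centers(points, labels):
    """Coordinates of every member relative to its own cluster center."""
    centers = cluster_centers(points, labels)
    return [(x - centers[lab][0], y - centers[lab][1])
            for (x, y), lab in zip(points, labels)]


def offset_rms_x(offsets):
    """Root-mean-square spread of the x offsets (their mean is zero per cluster)."""
    return math.sqrt(sum(dx * dx for dx, _ in offsets) / len(offsets))
```

A histogram of the pooled offsets gives an empirical density of member coordinates relative to the cluster centers, and `offset_rms_x` summarizes its dispersion in a single number.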
Figure 11 shows the empirical probability density of cluster member coordinates relative to the cluster center (histogram) and its approximation by the normal distribution (smooth curve) for the k-means and FOREL cases, respectively. The diagrams show that in both cases the distribution of elements within clusters (relative to the cluster centers) is close enough to normal. It is worth noting that for the considered clustering algorithms the dispersion of elements in clusters is remarkably different, as can be clearly seen from Figure 11. Figure 12 shows an example for a small number of clusters, 2 and 5 clusters, respectively. The figure shows that different algorithms form clusters in different ways. k-means split the elements into two clusters of similar size, whereas FOREL combined most of the elements into one cluster of a given size and formed 4 clusters from the remaining elements.
As shown in Figure 13, with a small number of clusters, the distribution of the elements inside them is closer to uniform than to normal; more specifically, it is closer to the original distribution of elements in the clustering area. It should be noted that the shape of the cluster boundaries when using k-means is close to the Voronoi diagram [17] constructed relative to the cluster centers. Therefore, in the case of a large number of clusters the distribution of cluster members can be approximated by the normal law, and in the case of a small number of clusters by the uniform distribution.

Conclusions and Future Work
In this paper, we propose and evaluate a novel model for throughput estimation in out-of-band D2D clustering. Compared to existing studies, where the distance between cluster head and cluster members was used, we suggest using throughput. We obtained numerical results for two typical distributions of cluster members, uniform and normal. We delivered closed-form analytical expressions for the probability distribution of throughput in the cluster for a given distribution and density of users. Through analytical and simulation studies, we show that the average throughput between cluster head and cluster member for the normal distribution is 2.5 times larger than for the case of the uniform distribution. The obtained results show that known clustering algorithms can be used to choose a cluster head that provides a near-optimal solution for the throughput of channels between cluster head and cluster members.
Further, by using the well-known clustering algorithms k-means and FOREL, we obtained the distribution of cluster members for large and small numbers of clusters. We show that, for the case with a large number of cluster members, the k-means algorithm gives remarkably smaller dispersion of elements in clusters than FOREL. Thus k-means constructs clusters of similar sizes, whereas FOREL constructs clusters of very different sizes. Therefore, for the out-of-band D2D case, k-means provides better clustering when the targets are even resource distribution and an increased network resource utilization rate.
Our future work will concentrate on further enhancements of the delivered model by introducing more QoS metrics such as packet loss ratio, delay, and energy as optimization parameters.Also, a system-level performance evaluation would be needed to understand ultimate implications of the suggested model.In addition, we plan to consider 3D cases for out-of-band D2D scenarios, as well as cluster member location dynamics, which will allow addressing advanced drone-based scenarios of D2D communications.

Figure 4: Dependence of the data transfer rate on the signal power at the receiver input $p$.

Figure 5: Dependence of data transfer rate on the distance to the device.

Figure 6: Throughput probability distribution and the probability density functions.

Figure 8: Dependence of throughput on the distance and probability density in case of uniform distribution of user devices.

Figure 10: Clustering of 5,000 elements using two different algorithms.

Figure 12: The node distribution within clusters by using different clustering methods, 2 and 5 clusters.