D2D-Enabled Small Cell Network Control Scheme Based on the Dynamic Stackelberg Game



Introduction
Today, we are witnessing a rapid and tremendous increase in mobile data traffic owing to the growing number of users and the explosive growth of mobile multimedia services. Ubiquitous devices such as smartphones have fueled the demand for data-intensive applications, including live video streaming, real-time social networking, and mobile gaming. With the explosive growth of multimedia data traffic, wireless cellular networks need more bandwidth to boost their system capacity. However, wireless bandwidth is an extremely valuable and scarce resource, and it is almost exhausted. Furthermore, classical ways of improving cellular network capacity face physical and economic limitations. Therefore, current research on 5G networks is geared towards developing intelligent ways of data dissemination that depart from the traditional network architecture [1][2][3].
Small cells have been widely viewed as a key enabling technology for 5G mobile wireless networks. By densely deploying low-power, low-cost small cell base stations (SBSs), a network can improve local coverage, bandwidth efficiency, network throughput, and energy efficiency [1,2,4]. However, small cell networks cannot solve the problem of backhaul congestion. With the rapid increase of multimedia data traffic, the pressure on the backhaul becomes more and more serious, jeopardizing QoS satisfaction throughout the entire network [4]. To solve this problem, caching in SBSs is an attractive approach that improves the transmission rate while reducing the backhaul load. In cache-based operation, each SBS is equipped with a local cache and serves user requests using its cached contents. If the requested contents already exist in an SBS's cache, the SBS can transmit them directly to users without using the backhaul. In this way, caching offloads backhaul traffic and reduces access delay [4][5][6].
In next-generation cellular networks, device-to-device (D2D) communication has recently attracted substantial attention from both industry and academia. D2D communication enables mobile users to communicate directly when they are within range of each other [2]. D2D was initially proposed to enhance the performance of multihop systems. In 5G cellular networks, D2D communication is one of the key technologies for achieving very high data rates by offloading part of the cellular traffic onto D2D links. This approach can reduce the backhaul load without the cost of additional network infrastructure. While extensive research addresses the many challenges of 5G networks, D2D communications still face a myriad of technical challenges [7,8].
To provide D2D communication services, the bandwidth of each SBS should be divided into two subbands, called the licensed and unlicensed bands. The unlicensed band can be leased to unlicensed users for D2D communication. Licensed users of the SBS are protected in the licensed band, while unlicensed users seek opportunities to transmit with a limited amount of power. However, owing to temporal fluctuations in service requests, a fixed assignment of licensed and unlicensed bands may not be practical. To achieve globally desirable 5G network performance, the bandwidth of each SBS should be split adaptively [9,10].
Designing a novel 5G network control scheme calls for a new control paradigm. Nowadays, the interaction among rational agents with conflicting objectives is often characterized using game theory. Game theory is the study of strategic interactions among multiple intelligent, rational decision makers, each trying to maximize the expected value of its own payoff. In particular, game theory has been successfully applied to wireless communications to solve competition problems over network resources [11]. In D2D-enabled SBS operation, SBSs and users are rational individuals. Motivated by these characteristics of 5G network systems, we adopt a game theoretic approach to develop practical cache placement and bandwidth splitting algorithms. In this way, we can avoid the heavy computational burden of theoretically optimal centralized solutions.
However, game theory also has its own shortcomings. First, classical game theory has mostly been developed from a perfectly rational perspective. Perfect rationality requires complete information in real-world operations, an assumption that rarely holds in practice. Second, most game theoretic models seek one-sided or unilateral stability in a static setting. Therefore, they cannot capture how players adapt their strategies to reach an effective solution over time. Last but not least, game theoretic methods often require solving multiple high-order polynomial equations, which is complex and difficult to do in real time [11].
In this study, we devise a new game model, called the dynamic Stackelberg game, to adapt effectively to the D2D-enabled small cell network situation. The hierarchical relationship between SBSs and users is well suited to the Stackelberg game model. In a classical Stackelberg game, one player acts as the leader and the rest as followers. The main goal is to find an optimal strategy for the leader, assuming that the followers respond rationally by optimizing their own objective functions given the leader's actions [11,12]. In our dynamic Stackelberg game, however, game players can switch between leader and follower roles dynamically. Based on their current roles, players make control decisions to pursue their own interests while learning the current system conditions. In the cache placement algorithm, each SBS is a single follower and its corresponding users are multiple leaders. In the bandwidth splitting algorithm, each SBS is a single leader and its corresponding users are multiple followers. Under dynamically changing 5G network environments, this flexible approach is designed to approach the best achievable solution.
To maintain well-balanced network performance, we use learning and bargaining algorithms in a distributed manner. By combining local, global, and social learning with a simple bargaining process, individual SBSs make intelligent decisions that effectively address the caching and splitting issues. Unlike existing work, we focus on design principles such as feasibility and self-adaptability to provide a desirable solution. Therefore, the major novelty of our scheme is its effectiveness under 5G network dynamics. Although several D2D-enabled small cell network control schemes have been proposed, very little research has integrated game theory and learning algorithms.
1.1. Contribution. Our study generalizes the cache placement and bandwidth splitting algorithms in the following aspects. To model the interaction between each SBS and its users, we design a new dynamic Stackelberg game. As a follower, the SBS monitors its multiple leaders, that is, its corresponding users, to decide its cache contents. As a leader, the SBS splits its bandwidth between licensed and unlicensed services. Using learning and bargaining approaches, the control decisions in each algorithm are made in an adaptive online manner. Finally, a fair and well-balanced solution can be obtained under diverse D2D-enabled small cell network situations. In summary, the contributions of this paper are as follows.
(i) Dynamic Stackelberg Game Model. Motivated by hierarchical and feedback-dependent situations, we introduce a new game model that captures a variety of D2D-enabled network characteristics. This approach is generic and applicable to various small cell network scenarios.
(ii) Cache Placement Algorithm. We design a multiple-leaders single-follower Stackelberg game to decide the SBS's cache contents. By considering users' external social influences, the single-follower SBS exploits the social ties of its leaders for efficient multifile dissemination. Most current studies ignore the social relations among mobile users.
(iii) Bandwidth Splitting Algorithm. We design a single-leader multiple-followers Stackelberg game to adaptively split the SBS's bandwidth. Through an interactive learning and bargaining process, we model the responsive tradeoff between licensed and unlicensed communications.
(iv) Implementation Practicality. As game players, SBSs and users learn how to update their prior knowledge and select their strategies with bounded rationality. This approach requires lower control and computational overhead, which makes it practical and suitable for real-world network operations.
(v) Solution Concept. The main idea of our dynamic Stackelberg game lies in its combination of optimality and practicality. Instead of analyzing the equilibrium of the dynamic Stackelberg game, this study mainly investigates the potential benefit of practical cooperation among different control methods. Our solution concept approximates the best solution using local, global, and social learning and simple bargaining approaches.
(vi) Conclusions. Numerical results show that our dynamic Stackelberg game approach can increase the bandwidth utilization, cache hit ratio, and system throughput by 5% to 10% under different service request rates, compared with the existing SADC [13], HSAC [14], and SDWC [15] schemes.
1.2. Organization. The remainder of this article is organized as follows. In the next section, we review related D2D-enabled small cell network control schemes and their problems. Section 3 presents our Stackelberg game model and the proposed cache placement and bandwidth splitting algorithms in detail. In particular, this section provides fresh insights into the benefits and design of game-based learning and bargaining approaches. For convenience, the main steps of the proposed scheme are then listed. In Section 4, we analyze the performance of the proposed scheme and present numerical results compared with several existing methods. Finally, Section 5 concludes the paper and discusses the remaining open challenges in this area along with possible solutions.

Related Work
There has been considerable research into the design of D2D-enabled small cell network control schemes. Ma et al. proposed a contract-based cooperative spectrum sharing mechanism that exploits transmission opportunities for D2D links while maximizing the profit of cellular links [16]. First, they designed a cooperative relaying algorithm that employed superposition coding at both the cellular and D2D transmitters. This algorithm can maximize the data rate of the D2D links without degrading the performance of the cellular links. Second, they modeled the spectrum trading process and derived the optimal power-payment contracts for the cellular links. Finally, they provided numerical results on the performance of the proposed cooperative relaying scheme and the optimal contracts [16].
Zhao et al. considered complex social connections in the social domain and introduced social relationships in the continuum space into the resource allocation problem for D2D communications [17]. To evaluate the joint optimization performance of the social and physical domains qualitatively, they investigated users' payoffs and defined the utility of each D2D communication user. To maximize the social group utility of each D2D user, they formulated a social group utility maximization game and theoretically investigated its Nash equilibrium. Finally, their numerical results showed an increase in the utility of the overall social groups [17].
In [18], a new game theoretic approach was employed to analyze the interactions and correlations among user equipment. An iterative power allocation mechanism based on nonlinear fractional programming was then developed to establish mutual preferences. To match D2D communication pairs with cellular user equipment, the well-known Gale-Shapley algorithm was adopted, which yields a stable and weakly Pareto optimal solution. The proposed matching algorithm was also extended to address scalability issues in large-scale networks. One main focus of this study was how to establish mutual preferences from the perspective of energy efficiency. The existence and uniqueness of the Nash equilibrium were analyzed theoretically via mathematical proofs [18].
In [19], the authors analyzed the impact of mobile social networks on the performance of edge caching in fog radio access networks. Using a Markov chain, they analyzed edge caching among edge nodes. The scheme in [19] computed the expected bandwidth consumption of the radio access network and fronthaul with edge caching. It also computed the corresponding content diffusion ratio in complicated scenarios in social-aware fog radio access networks. In [20], the authors proposed a hierarchical game framework to investigate distributed solutions to the resource allocation problems in D2D-enabled small cell networks. This hierarchical game consisted of two subgames: an overlapping coalition formation game and a Stackelberg game. The subband allocation problem was modeled as an overlapping coalition formation game, in which the D2D links in the same subband acted cooperatively to maximize the payoff sum. The interference control problem was modeled as a Stackelberg game, in which the base station acted as the leader and the D2D links acted as followers, playing the best response after the leader's move [20].
Ma et al. [13] developed a Socially Aware Distributed Caching (SADC) scheme based on a decentralized learning automaton to optimize cache placement in D2D-enabled cellular networks. SADC is a practical feedback scheme that takes into account three key factors. These are (i) file request probability, (ii) the physical distance between D2D transmitters and receivers, and (iii) the social relationships between D2D communication users. Finally, the authors characterized the mutual impact between the contents cached in different D2D users [13].
Zhi et al. [14] designed a Hierarchically Social Aware Caching (HSAC) scheme that incentivizes mobile nodes to cache content for others. To address this incentive issue, the scheme adopted a hierarchical, socially aware incentivized caching method based on both physical and social relationships. An incentive method was then proposed to ensure benefit maximization given the selfish nature of nodes. In particular, this approach considered social ties and physical distance as cost factors. Finally, the authors showed the existence of a Nash equilibrium in HSAC and demonstrated that it can significantly reduce the total cost of mobile nodes [14].
Liu et al. [15] formulated a Stackelberg based D2D Wireless Caching (SDWC) scheme to resolve the conflict of interests in D2D-enabled wireless caching networks. The Stackelberg-based system model had a hierarchical structure in which the base stations optimized their strategies based on prices. The optimal price and optimal power were then derived in closed form. The optimal price was associated with the channel gain: a high channel gain led to a high price. Finally, the tradeoffs between power and price were presented in the simulation results [15]. These earlier studies [13][14][15][16][17][18][19][20] have attracted considerable attention while introducing unique challenges in handling caching and D2D-enabled small cell network control. In this paper, we demonstrate that our proposed scheme significantly outperforms the existing SADC [13], HSAC [14], and SDWC [15] schemes.

The Proposed D2D-Enabled Network Control Algorithms
In this section, we briefly introduce our new game model, which forms the theoretical basis of the proposed D2D-enabled small cell network control scheme. Using a dynamic Stackelberg game-based approach, we design cache placement and bandwidth splitting protocols that adapt to dynamically changing 5G network environments.
(ix) The set $\{H_1, \ldots, H_t, H_{t+1}, \ldots\}$ denotes time, represented as a sequence of time steps with imperfect information for the dynamic Stackelberg game process.

Dynamic Stackelberg Game
Our dynamic Stackelberg game (G) is a special case of the traditional Stackelberg game. To solve the joint problem of cache placement and bandwidth splitting, G is naturally designed as a two-stage game. Whether leaders or followers, all individual game players select their strategies independently and selfishly to maximize their payoffs. At the end of each game iteration, players examine their payoffs and dynamically adapt their decisions in an entirely distributed fashion. This step-by-step feedback process is repeated until the best solution is found.

Cache Placement Algorithm in Dynamic Stackelberg Game
Caching technology can store popular contents to serve UE effectively and locally; otherwise, UE must download these files via the backhaul. Therefore, caching can reduce backhaul overhead and access delay while improving system performance. However, it is impossible to cache all files because of the limited cache capacity of each SBS. Popular contents must therefore be cached carefully to achieve effective content distribution. To select the proper cache contents collaboratively, we model the interaction between each SBS and its corresponding UE as a new dynamic Stackelberg game.
For the cache placement algorithm, we consider a commercial small cell caching system consisting of SBSs and a number of UE. By adopting the multiple-leaders single-follower Stackelberg game model, we develop a cooperative SBS caching algorithm. A practical caching mechanism is commonly coupled with file placement. In our small cell network architecture, we assume the following. A multimedia file set $\mathbb{M} = \{M_1, \ldots, M_K\}$ consists of the $K$ most popular files among all multimedia files, and each file in $\mathbb{M}$ can be cached in each SBS. The popularity distribution over $\mathbb{M}$ reflects how frequently users request each file and is represented by the vector $\mathbf{Q} = [g_1, \ldots, g_K]$. Generally, $\mathbf{Q}$ can be modeled by a Zipf distribution, a discrete heavy-tailed distribution commonly used to model content popularity [21].
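As a minimal sketch of the popularity vector $\mathbf{Q}$, the snippet below assumes the standard Zipf rank-frequency form $g_k \propto 1/k^{\alpha}$. The exponent `alpha` and the function name `zipf_popularity` are hypothetical; the paper's Table 1 values are not used here.

```python
def zipf_popularity(num_files: int, alpha: float = 0.8) -> list[float]:
    """Return request probabilities [g_1, ..., g_K] for files ranked 1..K.

    Hypothetical helper: assumes the Zipf law g_k = k^(-alpha) / sum_j j^(-alpha),
    so rank 1 is the most popular file and alpha = 0 gives a uniform distribution.
    """
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    Q = zipf_popularity(num_files=5, alpha=0.8)
    print([round(g, 3) for g in Q])  # decreasing probabilities that sum to 1
```

A larger `alpha` concentrates requests on the top-ranked files, which is what makes caching only a few popular files worthwhile.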
In this study, we consider the social relations of UE and the interactions among SBSs to obtain the g values in $\mathbf{Q}$ adaptively. Social characteristics, such as the external influence of users' relationships and ties, have played a crucial role in information propagation over the Internet and will continue to shape the way information is accessed [22]. To exploit the correlation between users' social relations, centrality can be used to identify the most influential users, who may act as conduits for information diffusion [23,24]. In this paper, centrality is given substantial weight when estimating g values. At time $H_t$, the g value of file $M_k$ at SBS $B_i$, $g_{M_k}(B_i, H_t)$, is defined as follows:
where a weighting factor averages local and global caching information, upper and lower bounds limit the range of the corresponding function, and median{⋅} returns the median value over all SBSs. Generally, the most popular files account for the majority of download requests. In the proposed algorithm, the social and global properties of the small cell network are incorporated into the cache placement protocol through the Θ(⋅) and H(⋅) functions, among others. Finally, the files with higher g values are cached in each SBS according to (1) and (3).
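Because equations (1)-(3) are not preserved here, the sketch below only illustrates the ingredients the text names. It is not the authors' formula. It assumes each request is weighted by the requester's centrality and that the local estimate is blended with the median of other SBSs' estimates, then clipped to bounds. `weight`, `lower`, `upper`, and all function names are hypothetical stand-ins.

```python
from statistics import median


def centrality_weighted_scores(requests, centrality):
    """Hypothetical local popularity: each (user, file) request counts 1 + centrality[user]."""
    scores = {}
    for user, file_id in requests:
        scores[file_id] = scores.get(file_id, 0.0) + 1.0 + centrality.get(user, 0.0)
    total = sum(scores.values())
    return {f: s / total for f, s in scores.items()} if total > 0 else {}


def blended_popularity(local_score, neighbor_scores, weight=0.6, lower=0.0, upper=1.0):
    """Hypothetical g value: weighted average of the local estimate and the median
    estimate at other SBSs, clipped to [lower, upper]."""
    global_score = median(neighbor_scores) if neighbor_scores else local_score
    g = weight * local_score + (1.0 - weight) * global_score
    return min(upper, max(lower, g))


def select_cache(g_values, capacity):
    """Cache the `capacity` files with the highest g values (ties broken by file id)."""
    ranked = sorted(g_values, key=lambda f: (-g_values[f], f))
    return ranked[:capacity]


if __name__ == "__main__":
    requests = [("u1", "M1"), ("u2", "M1"), ("u3", "M2"), ("u1", "M3")]
    centrality = {"u1": 0.9, "u2": 0.1, "u3": 0.2}
    local = centrality_weighted_scores(requests, centrality)
    neighbors = {"M1": [0.3, 0.4], "M2": [0.5, 0.2, 0.6], "M3": []}
    g = {f: blended_popularity(local[f], neighbors[f]) for f in local}
    print(select_cache(g, capacity=2))
```

In this toy input, M2 is requested only once locally but is popular at the neighboring SBSs. The global median therefore lifts it just above M3 and into the cache.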

Bandwidth Splitting Algorithm in Dynamic Stackelberg Game
Recently, traffic offloading technology has been introduced to improve system capacity significantly. It reduces the amount of data carried on the cellular bands, freeing bandwidth for other UE. However, because bandwidth is limited, bandwidth splitting needs to be studied carefully. In this study, we consider a scenario in which the bandwidth is licensed to the SBSs. The SBSs are willing to lease part of their assigned bandwidth to unlicensed UE for D2D communication. The unlicensed bandwidth can provide substantial gains in capacity and coverage. Therefore, bandwidth splitting is critical for maximizing the total system capacity and the QoS satisfaction of UE [9,25]. D2D-enabled small cell networks thus have a two-tier structure. The licensed bandwidth $C_B \times (1 - P_B)$ is allocated to cellular communications, and the unlicensed bandwidth $C_B \times P_B$ is allocated to D2D communications. To support this mechanism, each SBS splits its total bandwidth $C_B$ between the two communication paradigms [9,25]. To cope with the design challenges of bandwidth splitting, we design a single-leader multiple-followers Stackelberg game model. In this model, traffic offloading through opportunistic communications exploits D2D links in the unlicensed bands. However, the bandwidth splitting problem for D2D communications is generally NP-hard. For this reason, our single-leader multiple-followers Stackelberg game model is reformulated on the basis of a reinforcement learning algorithm with low computational complexity, which is well suited to fine-tuning system performance.
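The split itself is simple arithmetic: a fraction $P_B$ of $C_B$ goes to D2D and the rest stays licensed. The sketch assumes a hypothetical discrete strategy set $S_B = \{0.0, 0.1, \ldots, 1.0\}$ for the leader and a hypothetical 20 MHz total bandwidth. Neither value is taken from the paper.

```python
def split_bandwidth(total_bw_hz: float, p_b: float) -> tuple[float, float]:
    """Return (licensed, unlicensed) = (C_B * (1 - P_B), C_B * P_B)."""
    if not 0.0 <= p_b <= 1.0:
        raise ValueError("P_B must lie in [0, 1]")
    return total_bw_hz * (1.0 - p_b), total_bw_hz * p_b


# Hypothetical discrete strategy set S_B for the SBS (leader): P_B in {0.0, 0.1, ..., 1.0}.
STRATEGIES = [round(0.1 * k, 1) for k in range(11)]

if __name__ == "__main__":
    licensed, unlicensed = split_bandwidth(20e6, 0.3)  # assumed C_B = 20 MHz
    print(f"cellular: {licensed / 1e6:.1f} MHz, D2D: {unlicensed / 1e6:.1f} MHz")
```

Discretizing $P_B$ keeps the leader's action set finite. A finite action set is what lets a learning rule keep one value per strategy.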
When cellular and D2D communications coexist, each UE must decide whether to use the unlicensed bandwidth for D2D communications or the licensed bandwidth for cellular communications. From the standpoint of UE, two utility functions are defined, one for cellular and one for D2D communications. Both capture the tradeoff between throughput and transmit power. For $E_i \in \mathbb{E}$, the utility functions for cellular communications ($\mathcal{U}^{CC}_{E_i}$) and for D2D communications ($\mathcal{U}^{D2D}_{E_i}$) are given by
where:
- $B_{CC}$ and $B_{D2D}$ are the bandwidth channels assigned to $E_i$'s cellular and D2D communications, respectively.
- $\Omega$ ($\Omega \ge 1$) is the gap between uncoded M-QAM and the capacity, minus the coding gain.
- The SINR term is $E_i$'s signal-to-interference-plus-noise ratio under the power vector Ρ of all UE.
- $P^{CC}_{E_i}$ and $P^{D2D}_{E_i}$ are $E_i$'s power levels for cellular and D2D communications, respectively.

Each $E_i$ aims to maximize its own payoff by selecting a strategy in $S_{E_i}$, that is, by deciding its type: S-UE or D-UE.
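The exact utility expressions are not preserved here. The sketch below is therefore one hypothetical throughput-minus-power-cost form. It uses the SNR-gap rate $B \log_2(1 + \mathrm{SINR}/\Omega)$, consistent with the role of $\Omega$ described above, and a linear power price. `power_price` (bit/s per watt) and all numeric inputs are assumptions.

```python
import math


def utility(bandwidth_hz, sinr, power_w, gap=1.0, power_price=1e6):
    """Hypothetical UE utility: SNR-gap throughput minus a linear transmit-power cost.

    gap plays the role of Omega (>= 1); power_price converts watts into bit/s of penalty.
    """
    if gap < 1.0:
        raise ValueError("gap (Omega) must be >= 1")
    throughput = bandwidth_hz * math.log2(1.0 + sinr / gap)
    return throughput - power_price * power_w


def choose_type(u_cc, u_d2d):
    """Follower best response: pick the communication mode with the larger utility."""
    return "D-UE" if u_d2d > u_cc else "S-UE"


if __name__ == "__main__":
    u_cc = utility(1e6, sinr=10.0, power_w=0.2, gap=2.0)    # assumed cellular link
    u_d2d = utility(0.5e6, sinr=40.0, power_w=0.05, gap=2.0)  # assumed D2D link
    print(choose_type(u_cc, u_d2d))
```

With these assumed numbers, the wider licensed channel outweighs the D2D link's higher SINR, so the UE stays an S-UE. Shrinking the licensed share (a larger $P_B$) would tip it toward D-UE.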
Each $E_i$ therefore chooses its type by solving $\max\{\mathcal{U}^{CC}_{E_i}, \mathcal{U}^{D2D}_{E_i}\}$. From the standpoint of the SBS, the utility function is based on the total system throughput, obtained as the sum of the cellular and D2D throughputs in its coverage area. For $B_i$, the utility function ($\mathcal{U}_{B_i}$) is defined as follows:
where the two sets denote the S-UE type and D-UE type UE served by $B_i$, respectively. To maximize $\mathcal{U}_{B_i}$, each SBS selects its strategy, which determines the amounts of licensed and unlicensed bandwidth. To determine the bandwidth splitting policy ($P_B$) effectively, we develop a new learning algorithm. In this algorithm, learning is divided into two categories: local learning and global learning. Local learning refers to temporal learning within the local SBS, and global learning refers to spatial learning through neighboring SBSs. The main novelty of the proposed bandwidth splitting algorithm is the joint design of the local and global learning approaches. To specify the global relationship among SBSs, the affinity indicator between $B_i$ and $B_j$ is defined as [26]
According to (11), $B_i$ stochastically selects its strategy $P_{B_i}$ using its strategy selection distribution (P). Based on the selected $P_{B_i}$ strategy, $B_i$ then attempts to adjust the system performance. Using a simple bargaining process, $B_i$ carefully deliberates on its final decision. In particular, the outcome of the selected $P_{B_i}$ strategy is taken as the status quo point of this bargaining process.
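Equations (8)-(11) and the affinity indicator of [26] are not recovered in this text. The sketch below therefore swaps in common stand-ins and should not be read as the authors' rules. It uses:
- an exponential moving average of observed payoffs for local (temporal) learning;
- an affinity-weighted average of neighbors' values for global (spatial) learning, with hypothetical affinity weights supplied as input;
- a Boltzmann (softmax) distribution for stochastic strategy selection.

```python
import math
import random


def update_local(learning, strategy, payoff, rate=0.1):
    """Local learning: move the chosen strategy's value toward the observed payoff."""
    learning[strategy] += rate * (payoff - learning[strategy])


def combine_global(local, neighbor_values, affinity, beta=0.3):
    """Global learning: mix own values with affinity-weighted neighbor values.

    affinity maps neighbor id -> nonnegative weight (hypothetical stand-in for [26]).
    """
    total_aff = sum(affinity.values())
    combined = {}
    for s, v in local.items():
        if total_aff > 0:
            g = sum(affinity[n] * neighbor_values[n][s] for n in affinity) / total_aff
            combined[s] = (1.0 - beta) * v + beta * g
        else:
            combined[s] = v
    return combined


def selection_distribution(values, temperature=0.1):
    """Boltzmann distribution over strategies; subtract the max for numerical stability."""
    m = max(values.values())
    exps = {s: math.exp((v - m) / temperature) for s, v in values.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}


def sample_strategy(dist, rng=random):
    """Draw one strategy P_B according to the selection distribution."""
    strategies = list(dist)
    return rng.choices(strategies, weights=[dist[s] for s in strategies], k=1)[0]
```

The `temperature` parameter trades exploration against exploitation: high values keep the choice near uniform, while low values concentrate on the best-valued split.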
At time $H_{t-1}$, the status quo point is the vector $V_{H_{t-1}}(P_{B_i})$ of cellular and D2D communication payoffs achieved by $B_i$ with strategy $P_{B_i}$, as given in (12). Using the strategy obtained by the learning process, $B_i$ then makes its final strategy selection to obtain the desired solution as follows:
The bargaining process is carried out sequentially at each game round, and through it each SBS can correct unexpected outcomes. The bargaining solution concept has become an active research topic because of its many appealing properties. However, the traditional bargaining approach is equivalent to a random search over a huge strategy space and converges only with difficulty [10]. This makes it impractical for real-world network operations. In this study, we implement the bargaining model efficiently by combining it with the learning process. This combination is a promising approach for practical network operations and attains better performance under diverse system environments.
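Equation (13) is not preserved, so the sketch below swaps in the Nash bargaining solution as an illustration. Among candidate splits whose predicted (cellular, D2D) payoffs weakly improve on the status quo point V, it picks the one that maximizes the product of gains. If no candidate improves on V, it keeps the status-quo strategy. The candidate payoff predictions are hypothetical inputs.

```python
def nash_bargain(status_quo_strategy, status_quo, candidates):
    """Pick the split maximizing the Nash product of gains over the status quo point.

    status_quo: (cellular payoff, D2D payoff) achieved with status_quo_strategy at H_{t-1}.
    candidates: dict mapping strategy P_B -> predicted (cellular, D2D) payoff vector.
    """
    v_cc, v_d2d = status_quo
    best, best_product = status_quo_strategy, 0.0
    for strategy, (u_cc, u_d2d) in candidates.items():
        gain_cc, gain_d2d = u_cc - v_cc, u_d2d - v_d2d
        if gain_cc >= 0 and gain_d2d >= 0:  # only splits that hurt neither side
            product = gain_cc * gain_d2d
            if product > best_product:
                best, best_product = strategy, product
    return best


if __name__ == "__main__":
    candidates = {0.2: (12.0, 3.0), 0.4: (11.0, 6.0), 0.5: (10.5, 7.0)}
    print(nash_bargain(0.3, (10.0, 4.0), candidates))
```

The product form rejects splits that favor one tier at the other's expense (here 0.2, which lowers the D2D payoff). It also prefers balanced joint gains.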

Main Steps of Proposed D2D-Enabled Small Cell Control Scheme
In recent years, special attention has been paid to D2D-enabled cellular network systems with caching SBSs that maximize the total system capacity while ensuring QoS. This approach is expected to play a crucial role in 5G network-controlled decentralized communications. However, designing a proper combination of cache placement and bandwidth splitting algorithms is a particularly challenging problem. In this paper, we propose a new dynamic Stackelberg game model, implemented as a distributed and dynamic repeated game in which SBSs and UE can dynamically act as leaders or followers. In the proposed scheme, individual game players learn the current network situation locally and globally. They determine the strategies that maximize their payoffs through a step-by-step interactive game process.
Generally, well-known game-theoretic solution concepts are expressed in closed form under complete information. However, they cannot capture how 5G network operations adapt over time. From the viewpoint of practical operation, our learning-based solution concept is suitable for dynamic and unknown D2D-enabled cellular network environments. In addition, the computational burden is shifted from a central system to individual SBSs in a distributed, online fashion, which is practical for real-world decision making. The main steps of the proposed scheme are as follows.
Step 1. Control parameters are determined by the simulation scenario (see Table 1).
Step 2. At the initial time, the learning values L(⋅) of all strategies in each SBS are set equal. This initial setting ensures that every bandwidth splitting strategy of each SBS has the same value at the beginning of the dynamic Stackelberg game.
Step 3. For each SBS's cache placement algorithm, a multiple-leaders single-follower Stackelberg game model is adopted. As leaders, UE freely request multimedia files from their corresponding SBSs. As a follower, each SBS monitors the most popular files.
Step 4. To select files for caching adaptively, each SBS estimates the files' popularity distribution. Taking the social characteristics of UE into account, each SBS calculates each file's g value using (3) and (5), which include the social and global properties of the small cell network.
Step 5. In each SBS, the files with higher g values are cached in a distributed manner. At each game period H, the files' g values are dynamically recalculated to keep up with the current network environment.
Step 6. For each SBS's bandwidth splitting algorithm, a single-leader multiple-followers Stackelberg game model is adopted. As followers, UE decide their communication mode using (5) and (6). Based on the $\mathcal{U}^{CC}_{E}$ and $\mathcal{U}^{D2D}_{E}$ utility functions, UE select the strategies that maximize their payoffs.
Step 7. As a leader, the SBS splits its total bandwidth ($C_B$) between cellular and D2D communications. To maximize its payoff ($\mathcal{U}_B$), each SBS adaptively selects its splitting strategy $P_B \in S_B$.
Step 8. During each game period (H), SBSs learn the global relationship by using the affinity indicator and estimate the learning values L(⋅) for each strategy $P_B$ from local and global viewpoints. Finally, the probability distribution over the $P_B$ strategy selections is obtained according to (8), (9), and (11).
Step 9. Based on the selected $P_B$ strategy, the status quo point (V) is obtained using (12). Each SBS then determines its final strategy through the bargaining process.
Step 10. Through the interactive feedback process, the dynamics of our Stackelberg game cause a cascade of interactions among SBSs and UE, locally and globally. As game players, they dynamically choose their best strategies in an online distributed fashion.
Step 11. In the dynamic D2D-enabled cellular network environment, individual game players constantly self-monitor in preparation for the next game round; go to Step 3.
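To show how Steps 2-11 interlock for a single SBS, here is a toy loop. It uses a hypothetical request model, a hypothetical payoff, and hypothetical parameters, and it is not the authors' simulator. UE requests drive count-based caching (Steps 3-5). The SBS samples a split from a softmax over learning values (Steps 7-8) and updates them from a synthetic payoff (local learning only). UE type selection (Step 6), global learning, and bargaining (Step 9) are omitted for brevity.

```python
import math
import random


def run_sbs(periods=200, capacity=3, num_files=10, seed=1):
    """Toy single-SBS loop; returns (most valued P_B, observed cache hit ratio)."""
    rng = random.Random(seed)
    strategies = [round(0.1 * k, 1) for k in range(11)]
    # Step 2: equal initial learning values (set optimistically high to encourage exploration).
    learning = {s: 1.0 for s in strategies}
    counts = {f: 0 for f in range(num_files)}
    popularity = [1.0 / (r + 1) for r in range(num_files)]  # hypothetical Zipf-like requests
    cache, hits, requests = set(), 0, 0
    for _ in range(periods):
        # Steps 3-5: UE (leaders) request files; SBS (follower) caches the most requested ones.
        for f in rng.choices(range(num_files), weights=popularity, k=20):
            requests += 1
            hits += int(f in cache)
            counts[f] += 1
        cache = set(sorted(counts, key=lambda f: (-counts[f], f))[:capacity])
        # Steps 7-8: SBS (leader) samples a split P_B from a softmax over learning values.
        m = max(learning.values())
        weights = [math.exp((learning[s] - m) / 0.05) for s in strategies]
        p_b = rng.choices(strategies, weights=weights, k=1)[0]
        # Hypothetical payoff: D2D demand share fluctuates around 0.4; mismatch costs throughput.
        demand = min(1.0, max(0.0, rng.gauss(0.4, 0.05)))
        payoff = 1.0 - abs(p_b - demand)
        learning[p_b] += 0.1 * (payoff - learning[p_b])  # local learning update
    best = max(learning, key=learning.get)
    return best, (hits / requests if requests else 0.0)


if __name__ == "__main__":
    print(run_sbs())
```

The optimistic initial values keep untried splits attractive until they are sampled. This is one simple way to honor the equal starting point of Step 2 without freezing on the first split that happens to be chosen.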

Performance Evaluation
In this section, we evaluate the performance of our proposed protocol and compare it with that of the SADC [13], HSAC [14], and SDWC [15] schemes.Based on the simulation results, we confirm the superiority of the proposed approach.
To ensure a fair comparison, the following assumptions and system scenario were used: (viii) For simplicity, we assume the absence of physical obstacles in the experiments.
To demonstrate the validity of the proposed method, we measured the bandwidth utilization, cache hit ratio, and system throughput. Table 1 lists the major system control parameters used in our simulator. Figure 1 compares the bandwidth utilization of each scheme. In this study, bandwidth utilization is measured as the percentage of the bandwidth actually used, and it is a key indicator of resource usability in D2D-enabled cellular network systems. All schemes exhibit a similar trend; however, the proposed scheme outperforms the existing methods from low to high traffic loads. In our single-leader multiple-followers Stackelberg game model, each SBS acts as the leader. It negotiates the amounts of licensed and unlicensed bandwidth to improve communication capacity and adaptively splits the available bandwidth. This improves bandwidth utilization compared with the other schemes.
Figure 2 presents the cache hit ratio of each scheme. During small cell network operation, cache-enabled SBSs clearly improve system capacity and save backhaul resources. Generally, the cache hit ratio increases with the service generation rate, as intuitively expected. In our cache placement algorithm, the SBS intelligently monitors every file requested by UE and caches the most popular files. Therefore, the proposed scheme attains a higher cache hit ratio than the other schemes, from low to high traffic load intensities.
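The two metrics reported in Figures 1 and 2 can be written down directly from their descriptions. The helpers below are a small illustrative sketch with hypothetical names and sample inputs. Bandwidth utilization is the percentage of available bandwidth actually used, and the cache hit ratio is the fraction of requests served from the SBS cache without a backhaul fetch.

```python
def bandwidth_utilization(used_hz, total_hz):
    """Percentage of the available bandwidth that is actually used."""
    if total_hz <= 0:
        raise ValueError("total bandwidth must be positive")
    return 100.0 * used_hz / total_hz


def cache_hit_ratio(requests, cache):
    """Fraction of requests served directly from the SBS cache (no backhaul fetch)."""
    requests = list(requests)
    if not requests:
        return 0.0
    return sum(1 for f in requests if f in cache) / len(requests)


if __name__ == "__main__":
    print(bandwidth_utilization(15e6, 20e6))                      # 75.0
    print(cache_hit_ratio(["M1", "M2", "M1", "M4"], {"M1", "M2"}))  # 0.75
```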
The curves in Figure 3 show the normalized system throughput of the D2D-enabled cellular network system. As game players, all UE adaptively select their communication paradigm in a distributed online manner. From the viewpoint of SBSs, the main goal is to maximize the total system throughput. Through an interactive feedback mechanism, SBSs effectively learn the current system environment and attempt to improve the system throughput. In particular, social, local, and global learning methods effectively address the cache placement and bandwidth splitting problems. At every game period, the combined learning approach provides synergistic and complementary features for adapting to dynamic network situations.
The simulation results shown in Figures 1-3 demonstrate that the proposed scheme, which uses a learning-based dynamic Stackelberg game model, can monitor the current D2D-enabled small cell network conditions and adapt to highly dynamic system situations. In particular, all SBSs and UE in our approach obtain real-time information from the current environment and make intelligent decisions in a self-adapting manner. The simulation results indicate that the proposed scheme attains attractive network performance that the existing schemes cannot offer.

Summary and Conclusions
In 5G cellular networks, small cell networks will be a prevailing and promising trend. In this paper, we study a new D2D-enabled small cell network control scheme with a caching strategy. The motivation of this research is to jointly explore the cache placement and bandwidth splitting problems. Using a learning-based game approach, we formulate a dynamic Stackelberg game model and design appropriate cache-file placement and bandwidth splitting algorithms to maximize the system capacity. Game theory is now widely recognized as a practical framework for implementing real-world network operations. Based on the multiple-leader single-follower Stackelberg game, our cache placement algorithm caches the most popular files to relieve the heavy traffic load on fronthaul links while decreasing request latency. Based on the single-leader multiple-follower Stackelberg game, our bandwidth splitting algorithm considers both cellular and D2D communications to achieve balanced system performance. Using feedback-based self-monitoring and social, local, and global learning techniques, SBSs and UE dynamically adapt to the current D2D-enabled small cell network situation and effectively maximize their expected benefits. Extensive simulations verify the effectiveness of the proposed scheme and show its superiority over existing schemes.
Several open issues and practical challenges remain for future research. Interesting topics include employing other game theory models, such as mechanism design and cooperative games, to further improve system performance. Another interesting direction is to address optimality issues in D2D-enabled small cell network systems from the operator's perspective. In addition, our methodology can be used to develop new adaptive game-theoretic algorithms. Control decisions in interprocess communication, disk and memory management, file and I/O systems, CPU scheduling, and distributed operating systems must also be made without perfect information. Therefore, the main concept employed in this study is suitable for developing various real-time control decision algorithms.

(vii) $L(P, B)$ is the learning value for $B$'s strategy $P$; $L(P, B)$ is used to estimate the probability distribution $\mathbb{P}^{B}$ for the next bandwidth splitting strategy selection. (viii) In $\{U^{B}, U^{E}\}$, $U^{B}$ is the payoff received by $B$ and $U^{E}$ is the payoff received by $E$ during the D2D-enabled small cell network operation.
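One way to turn learning values into a probability distribution over the splitting strategies is sketched below. The exponential smoothing of payoffs (`alpha`) and the Boltzmann (softmax) selection rule with temperature `tau` are assumptions chosen for illustration. The paper's own social, local, and global learning rules are not reproduced here. The strategy values are the five splitting ratios from the simulation setup.

```python
import math
import random
from typing import Dict, Optional

# Splitting ratios P^B_1..P^B_5 from the simulation parameters.
STRATEGIES = (0.1, 0.2, 0.3, 0.4, 0.5)


def update_learning_value(learning: Dict[float, float], strategy: float,
                          payoff: float, alpha: float = 0.2) -> None:
    """Exponentially smooth the payoff observed after playing `strategy` (assumed rule)."""
    learning[strategy] = (1.0 - alpha) * learning[strategy] + alpha * payoff


def strategy_distribution(learning: Dict[float, float], tau: float = 1.0) -> Dict[float, float]:
    """Boltzmann (softmax) distribution: a higher learning value gives a higher probability."""
    peak = max(learning.values())  # subtract the maximum for numerical stability
    weights = {p: math.exp((v - peak) / tau) for p, v in learning.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def select_strategy(learning: Dict[float, float], tau: float = 1.0,
                    rng: Optional[random.Random] = None) -> float:
    """Sample the next splitting ratio from the current distribution."""
    if rng is None:
        rng = random.Random()
    dist = strategy_distribution(learning, tau)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


if __name__ == "__main__":
    learning = {p: 0.0 for p in STRATEGIES}
    update_learning_value(learning, 0.3, payoff=10.0)  # L(0.3) becomes 2.0
    dist = strategy_distribution(learning)
    print(max(dist, key=dist.get))                    # 0.3
    print(select_strategy(learning, rng=random.Random(7)))
```

A smaller `tau` makes selection greedier. A larger `tau` keeps exploring the other ratios, which matters when network conditions drift between game periods.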

(i) The simulated system consists of 50 SBSs and 1000 UE. The bandwidth capacity ($C_{B}$) of each SBS is 100 Gbps.
(ii) Reflecting the UE's characteristics, service requests are generated according to a Poisson process whose rate (services/s) is varied from 0 to 3.
(iii) There are 8 different service requests, which are randomly generated by the UE.
(iv) To represent various application services, eight different traffic types are assumed based on connection duration and bandwidth requirement; they are generated with equal probability.
(v) The durations of service applications are exponentially distributed.
(vi) The bandwidth splitting strategies in $S^{B}$ are defined as $P^{B}_{\min=1} = 0.1$, $P^{B}_{2} = 0.2$, $P^{B}_{3} = 0.3$, $P^{B}_{4} = 0.4$, and $P^{B}_{\max=5} = 0.5$.
(vii) System performance measures obtained from 100 simulation runs are plotted as functions of the service request generation rate.
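The workload in items (ii)-(v) can be sketched with a standard Poisson-process generator. It draws exponential inter-arrival times at the given rate, a uniformly chosen traffic type, and an exponential holding time. The parameter list does not give them, so the mean holding time, the time horizon, and all names here are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

NUM_TRAFFIC_TYPES = 8  # eight traffic types, generated with equal probability


@dataclass
class ServiceRequest:
    arrival: float     # arrival time in seconds
    traffic_type: int  # index 0..7
    duration: float    # connection holding time in seconds


def generate_requests(rate: float, horizon: float, mean_duration: float,
                      seed: Optional[int] = None) -> List[ServiceRequest]:
    """Poisson arrivals at `rate` services/s over [0, horizon)."""
    rng = random.Random(seed)
    requests: List[ServiceRequest] = []
    if rate <= 0:
        return requests  # a rate of 0 produces no arrivals
    # Exponential inter-arrival times with mean 1/rate yield a Poisson process.
    t = rng.expovariate(rate)
    while t < horizon:
        requests.append(ServiceRequest(
            arrival=t,
            traffic_type=rng.randrange(NUM_TRAFFIC_TYPES),
            duration=rng.expovariate(1.0 / mean_duration),  # exponential durations
        ))
        t += rng.expovariate(rate)
    return requests


if __name__ == "__main__":
    reqs = generate_requests(rate=3.0, horizon=1000.0, mean_duration=60.0, seed=1)
    print(len(reqs) / 1000.0)  # close to 3 arrivals per second
```

Sweeping `rate` from 0 to 3 reproduces the x-axis of the performance plots. The explicit zero-rate branch is needed because `expovariate(0)` would divide by zero.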

Figure 1: Bandwidth utilization of the network system.
Model. During D2D-enabled small cell network operation, SBSs and user equipment (UE) make control decisions individually while taking their mutual relationship into account. This situation is well suited for study using game theory. In this paper, we develop a new dynamic Stackelberg game model for each SBS and its corresponding UE. The game procedure consists of two phases. In the cache placement phase, each SBS observes the file request frequency of its corresponding UE and places files in its limited cache. In this case, we can assume that the users are multiple leaders and the SBS is a single follower that keeps track of the availability of the cached content. Therefore, a multiple-leader single-follower Stackelberg game is an appropriate model.
In the bandwidth splitting phase, each SBS investigates the underlaid D2D communications and splits its bandwidth between licensed and unlicensed communication services to improve communication capacity. In this case, a traditional single-leader multiple-follower Stackelberg game model is suitable: the SBS is the leader and the UE are the followers. In this model, UE can be in different situations, and there are two types of UE in the SBS coverage area, S-UE and D-UE. S-UE connect directly to the SBS using licensed bandwidth, whereas D-UE communicate with each other using unlicensed bandwidth without traversing the SBS. As game players, SBSs and UE select their strategies to maximize their payoffs through the interactions of a feedback mechanism. At each time period of gameplay, we formally define our dynamic Stackelberg game model $\mathbb{G} = \{\{\mathbb{B}, \mathbb{E}\}, C_{B}, \{(S^{B}_{\cdot}, S^{B}_{\cdot}), S^{E}\}, L(P, B), \{U^{B}, U^{E}\}, \ldots\}$ as follows: (i) In $\{\mathbb{B}, \mathbb{E}\}$, $\mathbb{B} = \{B_1, \ldots\}$ represents the set of SBSs and $\mathbb{E} = \{E_1, \ldots\}$ is the set of UE; they are the game players. (ii) $\mathbb{E}$ can be divided into two subsets, $\mathbb{E} = I_S \cup I_D$, where $I_S \subset \mathbb{E}$ is the subset of S-UE and $I_D \subset \mathbb{E}$ is the subset of D-UE, that is, […]. Each splitting strategy $P^{B}$ denotes a bandwidth splitting ratio for unlicensed D2D communications: if the SBS selects strategy $P^{B}$, a bandwidth amount of $P^{B} \times C_{B}$ is assigned to D2D communications and the remaining amount $(1 - P^{B}) \times C_{B}$ is assigned to cellular communications. (vi) $S^{E}$ is the strategy set of $E \in \mathbb{E}$; each UE decides its type, that is, S-UE or D-UE, for its communications.
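The splitting rule above assigns the fraction P of the SBS capacity C_B to D2D and the remainder to cellular communications. The sketch below applies it to the simulation's five ratios and 100 Gbps capacity. The `best_split` helper is a hypothetical leader step. It assumes the SBS already knows the aggregate D2D and cellular demand its UE will generate, as a stand-in for anticipating the followers' responses, and it picks the ratio that serves the most traffic. It is not the paper's payoff function.

```python
from typing import Sequence, Tuple

C_B_GBPS = 100.0                                  # per-SBS capacity from the simulation setup
SPLITTING_STRATEGIES = (0.1, 0.2, 0.3, 0.4, 0.5)  # P^B_1 .. P^B_5


def split_bandwidth(p: float, capacity: float = C_B_GBPS) -> Tuple[float, float]:
    """Return (D2D/unlicensed, cellular/licensed) bandwidth for splitting ratio p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("splitting ratio must lie in [0, 1]")
    return p * capacity, (1.0 - p) * capacity


def best_split(d2d_demand: float, cellular_demand: float,
               strategies: Sequence[float] = SPLITTING_STRATEGIES,
               capacity: float = C_B_GBPS) -> float:
    """Hypothetical leader step: choose the ratio that serves the most demand."""
    def served(p: float) -> float:
        d2d, cellular = split_bandwidth(p, capacity)
        return min(d2d, d2d_demand) + min(cellular, cellular_demand)
    return max(strategies, key=served)


if __name__ == "__main__":
    for p in SPLITTING_STRATEGIES:
        d2d, cellular = split_bandwidth(p)
        print(f"P = {p:.1f}: D2D {d2d:.1f} Gbps, cellular {cellular:.1f} Gbps")
    print(best_split(d2d_demand=35.0, cellular_demand=70.0))  # 0.3
```

With 35 Gbps of D2D demand and 70 Gbps of cellular demand, P = 0.3 serves 30 + 70 = 100 Gbps, more than any other ratio in the set.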
$X(M_j, B_i)$ is the set of UE that request file $M_j$ in $B_i$. $D(E_l)$ is the number of first-degree social friends of $E_l$, and $\mathcal{M}$ is the maximum value of $D(\cdot)$ in $B_i$. In (1), the function $\Phi(M_j, B_i, H_t)$ represents the skewness of the request distribution of $M_j$ in $B_i$; a higher $\Phi(\cdot)$ outcome corresponds to higher file reuse. To obtain the $\Phi(\cdot)$ outcome adaptively, we adopt the notion of globally aware networking, which has attracted significant attention in the social and behavioral research communities. By considering the file request situations of neighboring SBSs, each SBS learns the global trends of the UE's propensity. Finally, $\Phi(M_j, B_i, H_t)$ is given by

$$\Phi(M_j, B_i, H_t) = \max\{\min(W(M_j, B_i, H_t), \Phi_{\max}), \Phi_{\min}\} \tag{3}$$

s.t.

$$W(M_j, B_i, H_t) = \frac{\omega \times H(M_j, B_i, H_{t-1}) + (1 - \omega) \times \sum_{k=1, k \neq i}^{n} H(M_j, B_k, H_{t-1}) / (n - 1)}{\operatorname{median}\{H(M, B, H_{t-1})\}}, \tag{4}$$

where the function $H(M_j, B_i, H_{t-1})$ returns the $g$ value of file $M_j$ in $B_i$ at time $H_{t-1}$. If file $M_j$ was not in $\mathbb{M}$ at time $H_{t-1}$, $H(M_j, B_i, H_{t-1})$ returns zero.
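One reading of Equations (3) and (4) treats the skewness value as a weighted blend of the SBS's own previous g value for a file and the average over the other SBSs. That blend is normalized by a median and clipped to a bounded range. The sketch below assumes that the median term divides the whole blended expression and that the weight and both bounds are free parameters. Their names here are hypothetical.

```python
from statistics import median
from typing import Sequence


def skewness_value(own_prev: float, others_prev: Sequence[float],
                   median_pool: Sequence[float], weight: float,
                   lower: float, upper: float) -> float:
    """Blend own and neighbour g values, normalise by a median, then clip.

    own_prev:    this SBS's g value for the file at the previous period
    others_prev: the same file's g value at every other SBS
                 (0.0 where the file was not present at the previous period)
    median_pool: the g values whose median normalises W (assumed set)
    weight, lower, upper: hypothetical names for the blending weight and bounds
    """
    neighbour_avg = sum(others_prev) / len(others_prev) if others_prev else 0.0
    blended = weight * own_prev + (1.0 - weight) * neighbour_avg
    m = median(median_pool) if median_pool else 0.0
    w = blended / m if m > 0 else 0.0  # a zero median yields W = 0 (then clipped)
    return max(min(w, upper), lower)   # clip W to [lower, upper] as in Eq. (3)


if __name__ == "__main__":
    print(skewness_value(6.0, [2.0, 4.0], [1.0, 3.0, 6.0],
                         weight=0.5, lower=0.5, upper=2.0))  # 1.5
```

In the example, the blend is 0.5 × 6 + 0.5 × 3 = 4.5, the median is 3, and W = 1.5 lies inside the bounds, so it is returned unchanged.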
[…]$_{B}^{CC}(P_{t-1})$ are the […]$_{B}^{CC}$ and […]$_{B}^{D2D}$ values with the $P^{B}$ strategy at time $H_{t-1}$. Based on the selection strategy $P$, $H$ […]

Table 1: System parameters used in the simulation experiments.