Optimal Bandwidth Scheduling of Networked Learning Control System Based on Nash Theory and Auction Mechanism

This paper addresses the optimal bandwidth scheduling problem for a double-layer networked learning control system (NLCS). An auction mechanism is employed, and a dynamic bandwidth scheduling methodology is proposed to allocate bandwidth to each subsystem. A noncooperative game fairness model is formulated, and the utility function of the subsystems is designed. Under this framework, an estimation of distribution algorithm (EDA) is used to obtain the Nash equilibrium of the NLCS. Finally, simulation and experimental results demonstrate the effectiveness of the proposed approach.

Notably, most of the aforementioned studies focus on a single-layer network structure; few results have been reported on NCSs with a double-layer structure. As pointed out in [21,22], a networked learning control system (NLCS) with a double-layer structure can obtain better control performance and stronger robustness. Nonetheless, in a real-time NLCS with limited network resources, random network-induced delay may have a significant impact on the performance and stability of the system [23]. Bandwidth availability is the major concern in many networking problems. A good schedule allocates resources appropriately to the network nodes and reduces packet collisions. The performance of network applications is directly affected by the amount of available bandwidth and the sampling rate [24]. Therefore, the quality of service (QoS) and the quality of control (QoC) depend not only on the control algorithm and the system structure but also on the allocation and scheduling of the network resources. A better allocation of the network bandwidth is the key to improving QoS and QoC. Motivated by the above discussion, this paper studies bandwidth scheduling and optimization based on the structure of the NLCS.
It is known that if each node is allowed to occupy as much of the network resource as its own requirements dictate, the overall system performance will be very poor [25]. The limited network resources cause competition among the network nodes. Each node tries to maximize its own network utility, which forms a noncooperative game among them [26]. To meet the demands of both the individual subsystems and the NLCS as a whole, a network bandwidth scheduling strategy based on a noncooperative game model and an auction mechanism is introduced. The auction mechanism is used at the upper-layer level in a centralized manner, and the control system performance is estimated by the cumulative output error of the closed control loops. Because an exact solution is difficult to compute, EDA is applied to obtain an optimal solution.
The remainder of this paper is organized as follows. Section 2 describes the NLCS and the scheduling optimization schemes. Section 3 explains the auction-based methodology of bandwidth allocation, presents a fair noncooperative game model for scheduling in the NLCS, and shows how to use the EDA-based optimization algorithm to find the equilibrium point of the game model. Simulation and experimental results are provided in Section 4. Finally, Section 5 concludes this study.
Notation. Throughout this paper, $\mathbb{R}^n$ denotes the $n$-dimensional Euclidean space. The superscript "T" denotes matrix transposition, and $\mathbb{Z}^+$ stands for the set of nonnegative integers.

System Structure.
A typical double-layer NLCS structure is shown in Figure 1 [21,22]. Suppose that the number of closed-loop subsystems in the lower layer communication network (LLN) is $N$ and that every closed-loop subsystem has two data transfer nodes (the sensor node and the controller node); the total number of data transfer nodes is then $M = 2N$. In this figure, $C_i$, $A_i$, and $S_i$ represent the $i$th controller, actuator, and sensor, respectively. The actuator is event driven, and the controller and sensor are time driven. The sampling period is longer than the polling period of the system. The data (sampling data, node identities, and others) are packed by the sensors and sent to the controller. The controller uses these data to calculate the control commands, which are packed and sent to the actuator; the actuator then updates the control algorithm based on the new control commands. At the same time, the upper layer communication network (ULN) collects control performance data of the subsystems from the LLN controllers through a shared network (LAN, WAN, or Internet) and then optimizes the sampling period and control parameters based on a self-adaptive and scheduling algorithm.
The timing diagram of bandwidth allocation is illustrated in Figure 2. The controllers report their bandwidth requirements to the learning and scheduling center through the ULN during $T_p$ (the polling period). The center then collects the information and allocates bandwidth during $T_a$ (the allocation period). According to the given bandwidth and the scheduling algorithm, all control loops send data during $T_t$ (the transmission period). After $T_o$ (the total operating period), a new period begins.

The Controlled Plants.
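The dynamics of the remotely controlled plant are given by the following linear model:

$$x(k+1) = \Phi x(k) + \Gamma u(k), \qquad y(k) = C x(k), \tag{1}$$

where $x(k) \in \mathbb{R}^n$, $u(k)$, and $y(k)$ are the state, input, and output of the plant in the networked subsystem. The sensor is time driven; that is, at time instant $kh$, it sends the most recent motor state and its timestamp to the controller via the network. Note that (1) can be considered as discretized from a continuous-time system given by

$$\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t), \tag{2}$$

with the sampling period $h$ and

$$\Phi = e^{Ah}, \qquad \Gamma = \int_{0}^{h} e^{As}\,ds\,B. \tag{3}$$

The electromechanical dynamics of the networked dc-motor subsystem used in this paper can be described as

$$L\frac{di_a}{dt} = -R\,i_a - K_b\,\omega + u, \qquad J\frac{d\omega}{dt} = K_t\,i_a - B_m\,\omega, \tag{4}$$

where $i_a$ is the armature winding current; $\omega$ is the rotor angular speed; $u$ is the armature winding input voltage; $R$ is the armature winding resistance; $L$ is the armature winding inductance; $K_b$ is the back-electromotive-force (EMF) constant; $K_t$ is the torque constant; $B_m$ is the system damping coefficient; $J$ is the system moment of inertia. The parameters of the dc-motor subsystem are listed in Table 1.
By letting $x \triangleq [i_a, \omega]^T$ and taking the rotor angular speed as the output, $y = \omega$, the dc-motor subsystem (Loop 1) can be expressed by

$$\dot{x} = \begin{bmatrix} -R/L & -K_b/L \\ K_t/J & -B_m/J \end{bmatrix} x + \begin{bmatrix} 1/L \\ 0 \end{bmatrix} u, \qquad y = \begin{bmatrix} 0 & 1 \end{bmatrix} x. \tag{5}$$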
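The step from the continuous-time dc-motor model to a sampled model with period h can be illustrated with a small zero-order-hold discretization. The following is a minimal sketch, not the authors' implementation: the function names (`mat_exp`, `discretize_dc_motor`) and the parameter values in the usage lines are hypothetical, since the values of Table 1 are not available here. It assumes the standard dc-motor state-space form with state [armature current, rotor speed] and uses the identity that the exponential of the augmented matrix [[A, B], [0, 0]]·h contains both discrete-time matrices.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m, terms=20):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(v) for v in row) for row in m)
    s = 0
    while norm / (2 ** s) > 0.5:      # scale so the series converges quickly
        s += 1
    scaled = [[v / (2 ** s) for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):                # undo the scaling by repeated squaring
        result = mat_mul(result, result)
    return result


def discretize_dc_motor(R, L, K_b, K_t, B_m, J, h):
    """Return (Phi, Gamma) for the state [i_a, omega] under a zero-order hold."""
    A = [[-R / L, -K_b / L], [K_t / J, -B_m / J]]
    Bc = [1.0 / L, 0.0]
    # exp([[A, Bc], [0, 0]] * h) = [[Phi, Gamma], [0, 1]]
    aug = [[A[0][0] * h, A[0][1] * h, Bc[0] * h],
           [A[1][0] * h, A[1][1] * h, Bc[1] * h],
           [0.0, 0.0, 0.0]]
    e = mat_exp(aug)
    Phi = [e[0][:2], e[1][:2]]
    Gamma = [e[0][2], e[1][2]]
    return Phi, Gamma


# Hypothetical motor parameters, for illustration only.
Phi, Gamma = discretize_dc_motor(R=1.0, L=0.5, K_b=0.01, K_t=0.01,
                                 B_m=0.1, J=0.01, h=0.05)
```

A quick sanity check on such a routine is that the sampled model and the continuous model share the same steady-state speed for a constant input voltage.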

NLCS Performance Analysis.
The performance of the NLCS is determined jointly by its subsystems, and the performance of each subsystem can be described by a function such as the integral absolute error (IAE) [27]. The system error becomes larger if a control loop is disturbed, indicating that the sampling frequency should be raised so that the system can return to the equilibrium point rapidly. If the error is small, no higher sampling frequency is needed. Figure 3 shows the cumulative error of the rotor speed for sampling periods ranging from $h = 0.05$ s to $h = 0.5$ s. The integral absolute error is defined as

$$\mathrm{IAE} = \int_{0}^{T_e} |e(t)|\,dt, \tag{6}$$

where $e(t)$ is the output error and $T_e$ is the evaluation time interval. As can be seen in Figure 3, the relation between control performance and the allowed range of sampling periods can be approximated by a linear relationship [28]:

$$J_i = \alpha_i + \beta_i h_i, \tag{7}$$

where $\alpha_i$ and $\beta_i$ are specific to each control loop and can be determined prior to system run time. Because the optimization relies on the gradient, the constant term $\alpha_i$ can be ignored.
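The two quantities used above, the IAE of a recorded error signal and the linear approximation of cost versus sampling period, can be computed as in the following sketch. It assumes uniformly sampled error data and an ordinary least-squares fit; the function names `iae` and `fit_linear_cost` and the numbers in the usage lines are illustrative, not taken from the paper.

```python
def iae(errors, dt):
    """Trapezoidal approximation of the integral of |e(t)| for samples spaced dt apart."""
    a = [abs(e) for e in errors]
    return dt * (sum(a) - 0.5 * (a[0] + a[-1]))


def fit_linear_cost(periods, costs):
    """Least-squares fit of cost = alpha + beta * period; returns (alpha, beta)."""
    n = len(periods)
    mean_h = sum(periods) / n
    mean_j = sum(costs) / n
    sxx = sum((h - mean_h) ** 2 for h in periods)
    sxy = sum((h - mean_h) * (j - mean_j) for h, j in zip(periods, costs))
    beta = sxy / sxx
    alpha = mean_j - beta * mean_h
    return alpha, beta


# Illustrative data: IAE values measured at four sampling periods (made-up numbers).
alpha, beta = fit_linear_cost([0.1, 0.2, 0.3, 0.4], [0.5, 0.8, 1.1, 1.4])
```

Only the slope returned by the fit matters for a gradient-based optimization, which is why the constant term can be dropped.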
For each control loop, the relation between the sampling period $h_i$ and the allocated bandwidth $b_i$ is given by [29]

$$b_i = \frac{c_i}{h_i}, \tag{8}$$

where $c_i$ is the control time of the control loop (which may include the data exchange from sensor to controller and from controller to actuator, as well as the time spent executing the controller); $c_i$ is assumed to be within one sampling period. On the one hand, the QoC of each subsystem should be improved; on the other hand, the bandwidth demand of each subsystem should be reduced to leave more resources for additional subsystems. Due to the bandwidth limitation, the allocation must satisfy

$$\sum_{i=1}^{N} b_i \le 1 - \lambda, \qquad b_i^{\min} \le b_i \le b_i^{\max}. \tag{9}$$

To deal with accidental overload and avoid a sudden deterioration of the system performance, the scale factor $\lambda$ is introduced to retain part of the bandwidth. The bounds $b_i^{\min}$ and $b_i^{\max}$ can be calculated from the resource constraints of the control loop and the maximum allowable delay bound (MADB) [30].
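The bandwidth constraints described above can be checked with a few lines of code. This is a sketch under stated assumptions: the bandwidth of a loop is taken as control time divided by sampling period, the shared budget is one minus the reserved fraction, and the per-loop bounds are derived from a smallest and largest admissible sampling period. All function names and the sampling-period limits in the usage lines are hypothetical.

```python
def sampling_period(control_time, bandwidth):
    """Sampling period implied by a bandwidth share (bandwidth = control_time / period)."""
    return control_time / bandwidth


def bandwidth_bounds(control_time, h_min, h_max):
    """Bandwidth interval implied by the admissible sampling periods [h_min, h_max]."""
    return control_time / h_max, control_time / h_min


def is_feasible(shares, bounds, reserve, eps=1e-12):
    """True if every share is inside its bounds and the total leaves `reserve` unused."""
    within_budget = sum(shares) <= 1.0 - reserve + eps
    within_bounds = all(lo <= b <= hi for b, (lo, hi) in zip(shares, bounds))
    return within_budget and within_bounds


# Illustrative values: a 0.03 s control time and periods between 0.05 s and 0.5 s.
bounds = [bandwidth_bounds(0.03, 0.05, 0.5)] * 3
ok = is_feasible([0.22, 0.37, 0.31], bounds, reserve=0.1)
periods = [sampling_period(0.03, b) for b in [0.22, 0.37, 0.31]]
```

A larger share shortens the sampling period of that loop, so the budget check is what couples the loops to one another.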

Main Results
The game model based on the auction mechanism is discussed in this section. Auction theory was first proposed by Vickrey in 1961 and mainly includes four basic types: the English auction (ascending-bid auction), the Dutch auction (descending-bid auction), the first-price auction, and the second-price auction (Vickrey auction). The highest offer wins the bid regardless of the auction type, and the optimum outcome is that the price exactly equals what the second highest bidder can afford.
Remark 1. It is worth noting that the first two kinds of auction are open-bid, and the other two are sealed-bid. In a sealed-bid auction, each bidder submits a price without any information about the others; the auctioneer then announces the winner, who offers the highest bid. Both sealed-bid types are suitable for bandwidth allocation in the NLCS. The only difference between the first-price and second-price auctions is the price that the winner actually pays. In a second-price auction, the winner pays the second highest bid instead of his own. For ease of operation, the first-price auction is applied in this paper.
The closed-loop control subsystem is modeled as a player. No player is explicitly aware of the existence of the other players or of their status. Each player puts forward its own bandwidth strategy, denoted as $b_i$. Obviously, the bandwidth consumption of each player affects the bandwidth allocation of the other players, but it is impossible to improve the working performance by only increasing the bandwidth requirement. Thus, a noncooperative game is formed among the closed-loop control subsystems in the NLCS. In the game, the ULN plays the role of auctioneer, and every player pays for the bandwidth based on its own bidding strategy and the bandwidth it finally receives. The bidding and auction process is as follows.
Step 1. Every player has the same amount of money $W$ before every round of bidding begins.
Step 2. Every player submits a price $p_i$ ($p_i \in [0, W]$) based on its own bandwidth requirement.
Step 3. ULN will run the bandwidth allocation programs based on the bidding prices and allocate the bandwidth resources to each player.
Step 4. Every player will pay for the bandwidth they have got.
The price of the bandwidth $b_i$ that player $i$ needs to pay, denoted as $P_i$, is calculated by (10) from the price of unit bandwidth $\rho$. The revenue function of every player is described by (11), and the revenue function $U_i$ based on auction theory and the IAE evaluation method is given by (12). Obviously, the revenue of a player is determined by the received QoC and the payment under a certain QoS. Every player makes its bidding strategy based on its own revenue and will not increase the price aimlessly. If a player overspends for a much larger bandwidth than it actually needs, it will only get a low payoff due to the reduced network utility. The money each player has to pay is based on the bandwidth it receives, which may not match the initial offer.
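To make the trade-off between control quality and payment concrete, the following worked example uses one possible revenue form. It is an assumption made for illustration and is not claimed to be the paper's equations (10)-(12): the payment is taken as linear in the bandwidth, the control cost as linear in the sampling period with slope $\beta_i > 0$, and the sampling period as the control time $c_i$ divided by the bandwidth $b_i$.

```latex
\begin{align*}
  % Assumed payment and revenue (illustrative forms only)
  P_i &= \rho\, b_i, &
  U_i(b_i) &= -\beta_i \frac{c_i}{b_i} - \rho\, b_i, \\
  % First- and second-order derivatives with respect to b_i
  \frac{\partial U_i}{\partial b_i} &= \frac{\beta_i c_i}{b_i^{2}} - \rho, &
  \frac{\partial^{2} U_i}{\partial b_i^{2}} &= -\frac{2\beta_i c_i}{b_i^{3}} < 0
  \quad (b_i > 0), \\
  % Interior maximizer, before clipping to the admissible interval
  b_i^{\star} &= \sqrt{\frac{\beta_i c_i}{\rho}}.
\end{align*}
```

Under this assumed form the revenue is strictly concave for positive bandwidth, so buying bandwidth beyond $b_i^{\star}$ lowers the payoff, which matches the qualitative statement that a player will not raise its price aimlessly.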
Every player in the LLN wants to maximize its revenue under the framework of the NLCS. The ULN adopts a network resource allocation strategy under which no player can obtain more bandwidth by changing its offer. Generally speaking, the network utility of a single node depends on the scheduling strategies of the others. If no node can benefit from changing its scheduling strategy once the strategies of the other nodes are fixed, the equilibrium in the network will not be broken. This equilibrium is called the Nash equilibrium (NE) [31]. Thus, the allocation problem of the network resources can be transformed into finding the Nash equilibrium of a noncooperative game model.
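The equilibrium condition described above (no player gains by changing only its own strategy) can be tested numerically for a candidate allocation. The sketch below checks unilateral deviations on a grid; the function name `is_nash_equilibrium`, the grid resolution, and the example utility are hypothetical, and the shared bandwidth budget is deliberately left out to keep the check short.

```python
def is_nash_equilibrium(utility, shares, bounds, grid=200, tol=1e-9):
    """Return True if no player can raise its utility by a unilateral change.

    utility(i, shares) gives the payoff of player i for the full strategy vector.
    Deviations are tried on a uniform grid over each player's interval.
    """
    for i, (lo, hi) in enumerate(bounds):
        current = utility(i, shares)
        for k in range(grid + 1):
            trial = list(shares)
            trial[i] = lo + (hi - lo) * k / grid
            if utility(i, trial) > current + tol:
                return False
    return True


# Example utility (made up): each player prefers a share near 0.3 but is
# penalized in proportion to the bandwidth taken by the others.
def example_utility(i, shares):
    others = sum(shares) - shares[i]
    return -(shares[i] - 0.3) ** 2 - 0.1 * shares[i] * others


bounds = [(0.06, 0.6)] * 3
symmetric = [0.3 / 1.1] * 3   # solves b = 0.3 - 0.05 * (2 * b) for this utility
print(is_nash_equilibrium(example_utility, symmetric, bounds))
```

A grid check of this kind can only confirm an equilibrium up to the grid resolution, so it is best used as a sanity test on the output of an optimizer.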
To address the modelling problem, we introduce the following definitions to prove the existence and uniqueness of Nash equilibrium in NLCS.
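Theorem 4. A Nash equilibrium exists in the game $G = (\Gamma, \{b_i\}, \{U_i(\cdot)\})$ if, for all $i \in \Gamma$, (1) the strategy space of player $i$ is a nonempty, convex, and compact subset of some Euclidean space; (2) $U_i(b)$ is continuous in $b$ and quasi concave in $b_i$.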
Proof. Theorem 4 can be obtained using the classical Kakutani fixed-point theorem and in some sense generalizes Nash's setting on the strategy space of the players. For the proof of this theorem, please refer to [32].

In the NNLCG, the strategy space $[b_i^{\min}, b_i^{\max}]$ of each player is a nonempty, convex, and compact subset of a Euclidean space. Thus, the first condition is satisfied. It remains to prove that the revenue function $U_i(b)$ is quasi concave in $b_i$ for all $i$ in the NNLCG.
The first-order and second-order differentials of the revenue function $U_i(\cdot)$ with respect to $b_i$ show that the revenue function is continuous and differentiable in $b$ and concave in $b_i$. A concave function $U_i(b)$ is also quasi concave. This completes the proof of the theorem.

Theorem 6. The NNLCG has a unique equilibrium.
Proof.The proof of the unique equilibrium can be carried out by following similar lines as in the proof of Theorem 2 in [33] and is thus omitted here to avoid unnecessary repetition.
Up to now, the problem of network resource allocation under a general framework of the double-layer NLCS has been transformed into the problem of finding the Nash equilibrium point of the noncooperative game model. It is hard to solve for the Nash equilibrium point by traditional numerical methods. Therefore, an estimation of distribution algorithm (EDA) is introduced in this paper to find the Nash equilibrium point. The EDA used in this paper is described as follows.
Rule 1. Initialization: randomly generate gain candidates that meet the ratio of bandwidth (RoB) constraints to form an initial population.
Rule 2. Repeat the following steps until the termination criterion is met.
(a) Selection: select the best gain candidates from the parent generation.
(b) Updating: update the probability model using the selected promising gain candidates.
(c) Sampling: generate gain candidates that meet the RoB constraints based on the updated probability model; copy the best gain candidate in the current population to the next population.
For more details about EDA, please refer to [34,35] and the references therein.
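The rules above can be sketched as a small continuous EDA. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses an independent Gaussian per dimension as the probability model (a common univariate variant), a simple repair step to enforce the bandwidth constraints, and a user-supplied fitness function. The names `repair` and `eda_maximize`, the default settings, and the example fitness are all hypothetical.

```python
import random


def repair(shares, bounds, budget):
    """Clip each share to its bounds, then shrink toward the lower bounds if over budget."""
    clipped = [min(max(b, lo), hi) for b, (lo, hi) in zip(shares, bounds)]
    total = sum(clipped)
    if total > budget:
        lows = [lo for lo, _ in bounds]
        scale = (budget - sum(lows)) / (total - sum(lows))
        clipped = [lo + (b - lo) * scale for b, lo in zip(clipped, lows)]
    return clipped


def eda_maximize(fitness, bounds, budget, pop_size=100, n_best=30,
                 generations=10, seed=0):
    """Gaussian univariate EDA that maximizes `fitness` over feasible allocations."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Rule 1: random initial population that meets the constraints.
    pop = [repair([rng.uniform(lo, hi) for lo, hi in bounds], bounds, budget)
           for _ in range(pop_size)]
    for _ in range(generations):
        # (a) Selection: keep the best candidates of the parent generation.
        pop.sort(key=fitness, reverse=True)
        elite = pop[:n_best]
        # (b) Updating: refit the mean and standard deviation of each dimension.
        mean = [sum(c[d] for c in elite) / n_best for d in range(dim)]
        std = [max((sum((c[d] - mean[d]) ** 2 for c in elite) / n_best) ** 0.5, 1e-6)
               for d in range(dim)]
        # (c) Sampling: draw new feasible candidates and carry over the best one.
        pop = [elite[0]] + [
            repair([rng.gauss(mean[d], std[d]) for d in range(dim)], bounds, budget)
            for _ in range(pop_size - 1)]
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Made-up fitness: total payoff of three loops with a cost term c*beta/b
    # and a linear payment term; the coefficients are illustrative only.
    beta_c = [0.02, 0.05, 0.035]
    price = 0.5

    def total_payoff(shares):
        return sum(-k / b - price * b for k, b in zip(beta_c, shares))

    best = eda_maximize(total_payoff, [(0.06, 0.6)] * 3, budget=0.9)
    print([round(b, 3) for b in best])
```

Copying the best candidate into the next population means the best fitness never decreases between generations, which makes "the fitness value has stabilized" a usable stopping signal.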

Experiments and Result Analysis
In this section, an illustrative example is presented to show the effectiveness of the proposed method. To this end, consider a double-layer NLCS as shown in Figure 1. The considered NLCS has three different networked dc-motor subsystems, where one takes the form (5) and the models of the other two subsystems are listed in Table 2. The parameters are $\lambda = 0.1$, $T_o = 0.5$ s, $T_p = 0.005$ s, and $c_i = 0.03$ s, and the rest of $T_o$ is $T_t$. The allocation period $T_a$ is very short compared to $T_o$, so $T_a$ can be assumed to be zero. The auction-based bandwidth allocation (ABA) uses an initial population of 100 solutions and ten generations. When the overall fitness value has stabilized, the Nash equilibrium point is reached. The bandwidth vector allocated by the ULN in this case is $b^* = [0.22\ \ 0.37\ \ 0.31]^T$. To illustrate the improvement in QoC and the bandwidth occupied, we compare ABA with equal bandwidth allocation (EBA), which is a simple static bandwidth allocation method in which the ratio of bandwidth given to player $i$ is computed as $b_i = (1 - \lambda)/N$. The IAE and the RoB of the two strategies are given in Table 3. Accordingly, the step responses of the three networked dc-motor subsystems with the corresponding RoB are shown in Figure 4.
As shown in Table 3 and Figure 4, the control performance of a networked subsystem degrades when network conditions become worse (that is, when it is given less RoB). This phenomenon is reasonable, since the performance of each dc-motor subsystem worsens with longer time delays and more packet losses. Although some loops have a higher IAE under ABA than under EBA, ABA still gives the better overall performance according to the simulation results. Equal bandwidth allocation may therefore be ineffective, since some loops may not receive enough bandwidth to meet their demands, while others may receive more than they need.
Figure 5 shows the box-and-whisker diagram of the system revenue over 30 paired simulations, and the statistics of the 30 simulation results are listed in Table 4. Within each pair, the same network condition is used for the networked system under ABA and under EBA. Note that the 30 paired runs used 30 different network conditions, which were generated randomly under the given network constraints. The overall payoff of ABA is higher than that of EBA. Based on the mean values of the system payoff using ABA and EBA (denoted as $\bar{U}_{\mathrm{ABA}}$ and $\bar{U}_{\mathrm{EBA}}$, respectively), as listed in Table 4, we can further conclude that, on average, the ABA method increases the system payoff by 21.56% compared with EBA, computed as $(\bar{U}_{\mathrm{ABA}} - \bar{U}_{\mathrm{EBA}})/\bar{U}_{\mathrm{EBA}} \times 100\%$.
These results show that auction-based bandwidth allocation, which optimizes the resource scheduling strategy, can effectively meet the desired objectives in the resource-constrained NLCS.

Conclusion
This paper presents a noncooperative game model based on Nash theory and an auction mechanism for bandwidth allocation in an NLCS with limited resources. The estimation of distribution algorithm is introduced to solve the problem effectively. The proposed method drives all players sharing the same network to bandwidth allocations at the Nash equilibrium point. Network resources are allocated so as to reduce delays and packet losses, and the overall performance of systems with communication constraints is significantly improved. The simulation and experimental results indicate the effectiveness and applicability of the proposed approach.

Theorem 5. A Nash equilibrium exists in the NNLCG, $G = (\Gamma, \{b_i\}, \{U_i(\cdot)\})$.

Proof. By Theorem 4, a Nash equilibrium exists in the NNLCG when the conditions in Theorem 4 are satisfied. Each player in the NNLCG has a strategy space $[b_i^{\min}, b_i^{\max}]$ with $b_i^{\min} > 0$ and $b_i^{\min} \le b_i^{\max}$.

Table 1: The parameters of the dc-motor subsystem.

Table 2: Parameters of the other two subsystems.

Table 3: Simulation results of the comparison test.

Table 4: Statistics of the 30 simulation results.