UCB-Based Route and Power Selection Optimization for SDN-Enabled Industrial IoT in Smart Grid

As an essential building block of the smart grid, the industrial internet of things (IIoT) plays a significant role in providing powerful sensing capability and ubiquitous connectivity for differentiated power services. The rapid development of the smart grid imposes higher data monitoring and transmission requirements in terms of delay and energy efficiency. However, the severe electromagnetic interference (EMI) caused by massive electrical equipment degrades the transmission performance of IIoT. The traditional single-hop transmission mode is therefore evolving towards a multihop cooperation mode to satisfy differentiated quality of service (QoS) requirements. In this paper, we propose an upper confidence bound (UCB)-based joint route and power selection optimization algorithm to support this evolution, which adopts a software-defined networking (SDN)-enabled IIoT network framework to simplify network configuration and management. Simulation results demonstrate that, compared with the existing UCB-based route selection (UCB-RS) and shortest route selection (SRS) algorithms, the proposed algorithm achieves better total delay, energy efficiency, and utility.


Introduction
The industrial internet of things (IIoT) is an essential building block of the smart grid, providing powerful sensing capability and ubiquitous connectivity. With the development of the smart grid, a large number of IIoT devices need to be deployed to collect information such as voltage, current, power, temperature, and humidity and to transmit it back for real-time analysis. IIoT has strict requirements on transmission delay, energy efficiency, and network coverage [1]. However, electrical equipment in the smart grid emits electromagnetic interference (EMI), which affects the transmission performance of IIoT. Therefore, the traditional single-hop transmission mode needs to evolve towards a multihop cooperation mode to satisfy the quality of service (QoS) requirements [2]. IIoT utilizes massive devices laid along different routes to form a mesh network for multihop transmission. In multihop transmission, dynamic route selection can avoid long, low-quality routes and thus enables IIoT to reduce transmission delay and enhance energy efficiency.
Route selection needs to be optimized according to the dynamic network environment. However, the traditional network architecture, with its tight coupling between the control and data planes, cannot adapt to complex IIoT application scenarios. Software-defined networking (SDN) provides a solution by separating the control plane from the data plane [3]. SDN can manage and control IIoT networks through a standard and open programmable interface, which supports more efficient and flexible route selection solutions [4]. However, research on route selection optimization for SDN-enabled IIoT in smart grid still faces three challenges, which are summarized below.
First, because of the highly time-varying channel states and complex EMI, the global state information (GSI) is incomplete [5], so traditional GSI-based route selection optimization cannot be applied. Second, battery-powered IIoT devices have strict requirements on energy efficiency. Improving energy efficiency via dynamic power selection not only makes route selection more complicated but may also lead to a larger transmission delay. Therefore, how to meet differentiated QoS requirements through joint route and power selection optimization is also a challenge. Finally, electric equipment such as inverters and insulation switches emits EMI [6,7], which greatly reduces QoS performance and makes joint route and power selection optimization considerably harder.
Route selection of IoT has always been a research hotspot. In [8], Desuo et al. employed an improved Dijkstra algorithm to find the shortest path between two consecutive points for IoT networks. In [9], He et al. proposed an energy-aware route selection algorithm for simultaneous information and power transfer to decrease energy consumption. However, these works do not consider SDN architecture and only consider a single QoS metric. In [10], Saha et al. proposed a traffic-aware QoS route selection scheme by exploiting the flow-based nature of SDN and obtained the optimal route based on Yen's K-shortest path algorithm. In [11], Li et al. proposed an SDN-enabled IoT adaptive transmission architecture for different delay flow situations. However, these works assume that perfect GSI is available, which is not applicable for smart grid with incomplete information.
Upper confidence bound (UCB), a reinforcement learning algorithm, has emerged as a powerful solution to problems without perfect GSI [12]. In [13], Sun et al. designed an energy-aware mobility management (EMM) scheme based on UCB to optimize energy consumption. In [14], Maghsudi and Stanczak proposed two joint power and channel selection strategies based on UCB to maximize energy efficiency. However, these works only consider energy consumption optimization and ignore delay and other QoS requirements. In [15], Zhao et al. proposed a delay minimization algorithm based on UCB but neglected the joint optimization of energy consumption and delay. In [16], Bae et al. proposed a downlink network routing algorithm based on UCB to jointly optimize throughput and delay but ignored the influence of complex EMI and service priority. In summary, none of the abovementioned works considers the impact of the complex EMI and service priority of smart grid on the joint optimization of route and power selection.
To address the abovementioned challenges, we propose a UCB-based joint route and power selection optimization algorithm. First, considering the influence of EMI, we construct an SDN-based multihop IIoT framework and formulate the joint route and power selection optimization problem. The objective is to maximize the overall network utility function under the threshold constraints of signal-to-interference-plus-noise ratio (SINR) and energy efficiency. Second, we model the joint optimization problem as a multiarmed bandit (MAB) problem, where the options of route and power are combined to form an arm. Finally, we utilize UCB to learn the optimal route and power combination based on local and historical information. The main contributions of this work are summarized as follows:
(i) We propose an SDN-enabled multihop IIoT framework for smart grid, which greatly simplifies network management by separating the control and data planes. In addition, the control plane supports the configuration of intelligent route and power selection algorithms.
(ii) The route and power options are combined to form a set of arms in MAB. The proposed algorithm dynamically learns the optimal combination by interacting with the environment.
(iii) By dynamically adjusting the values of the weight parameters, the proposed algorithm can satisfy the differentiated QoS requirements of smart grid, since the weights set the tradeoff between delay, energy efficiency, and service priority.
The rest of this paper is organized as follows: Section 2 describes the system model and problem formulation. The proposed joint route and power selection algorithm is introduced in Section 3. Section 4 provides simulation results. Finally, the conclusion is provided in Section 5.

System Model
In this section, the system model and the problem formulation are introduced.

Network Model of SDN-Enabled IIoT for Smart Grid.
The SDN-enabled IIoT for smart grid is shown in Figure 1, which consists of two planes, i.e., the data plane and the control plane [17]. The data plane mainly contains IIoT devices, which provide data forwarding services. The control plane mainly contains the SDN controller, which is located in the gateway. The SDN controller can obtain the IIoT network topology, learn the optimal route and power selection strategy, and send the strategy to the IIoT source device (SD) [18]. The SDN-enabled IIoT network topology is represented by a directed graph $G = (V, L)$ [19], where $V = \{v_1, \ldots, v_i, \ldots, v_I\}$ denotes the set of $I$ IIoT devices. $v_1$ and $v_I$ are the SD and the destination device (DD), respectively, and $v_i$, $i = 2, 3, \ldots, I-1$, is a relay device. $L$ denotes the set of physical links, each of which joins a device $v_i$ to a device in the set of devices connected with $v_i$. There exist $M$ routes between $v_1$ and $v_I$, and the set is represented as $F = \{f_1, \ldots, f_m, \ldots, f_M\}$. Each route consists of $K$ devices, which are the SD $v_1$, the DD $v_I$, and $K-2$ relay devices.
The $k$-th device in $f_m$ is denoted as $d_m^k$, $k = 1, \ldots, K$. In this paper, the set of $T$ time slots is represented as $\mathcal{T} = \{1, \ldots, t, \ldots, T\}$. The slot length depends on the transmission delay from the SD to the DD [20]. At the beginning of the $t$-th slot, $v_1$ generates a data packet of size $U_t$, $U_{\min} \le U_t \le U_{\max}$, which needs to be transmitted to $v_I$. Each data packet can only be transmitted over one route [21]. The transmission is unsuccessful if the delay exceeds $\tau$.
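The route set $F$ can be pictured as the simple directed paths from the SD to the DD. The following is a minimal sketch under stated assumptions: the adjacency list `ADJ` is a hypothetical five-device topology (it is not the topology of Figure 3), and `enumerate_routes` is an illustrative helper name, not part of the paper.

```python
from typing import Dict, List


def enumerate_routes(adj: Dict[int, List[int]], src: int, dst: int) -> List[List[int]]:
    """Return all simple directed paths from src to dst using depth-first search."""
    routes: List[List[int]] = []

    def dfs(node: int, path: List[int]) -> None:
        if node == dst:
            routes.append(path.copy())
            return
        for nxt in adj.get(node, []):
            if nxt not in path:  # keep the path simple: never revisit a device
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(src, [src])
    return routes


# Hypothetical topology: device 1 is the SD, device 5 is the DD.
ADJ = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
ROUTES = enumerate_routes(ADJ, 1, 5)
```

In this toy topology every route contains $K = 4$ devices (the SD, two relays, and the DD), which mirrors the fixed route length assumed in the model.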

Delay Model.
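We assume that the data packets are transmitted over wireless channels. We denote $P_h$ as the transmission power, which contains $H$ levels.
The set of transmission power levels is given by $\{P_1, \ldots, P_h, \ldots, P_H\}$, where $P_h = P_{\min} + (h-1)(P_{\max} - P_{\min})/(H-1)$. The achievable transmission rate from $d_m^k$ to $d_m^{k+1}$ is determined by the SINR of the link, $\mathrm{SINR}_{d_m^k, d_m^{k+1}}(h)$, which depends on the transmission power $P_h$, the channel gain $g$ between the two devices, and the EMI. The transmission delay from $d_m^k$ to $d_m^{k+1}$ is the time needed to send the packet of size $U_t$ at this rate, and the total forwarding delay on the route $f_m$, denoted as $D_t(m, h)$, is the sum of the transmission delays over the $K-1$ hops of the route. We denote the route selection variable as $x_t(m) \in \{0, 1\}$, where $x_t(m) = 1$ represents that $v_1$ selects $f_m$ and $x_t(m) = 0$ otherwise [23]. The total forwarding delay in the $t$-th slot is the forwarding delay of the selected route.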
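The delay model can be sketched numerically as follows. This is an illustration under stated assumptions, not the paper's exact formulation: the rate is assumed to follow the Shannon capacity form $B \log_2(1 + \mathrm{SINR})$, and the bandwidth, noise power, and all function names are hypothetical. Only the evenly spaced power levels are taken directly from the text.

```python
import math
from typing import List


def power_levels(p_min: float, p_max: float, num_levels: int) -> List[float]:
    """Evenly spaced levels P_h = P_min + (h - 1)(P_max - P_min)/(H - 1), with H >= 2."""
    step = (p_max - p_min) / (num_levels - 1)
    return [p_min + (h - 1) * step for h in range(1, num_levels + 1)]


def hop_rate(bandwidth_hz: float, p_tx_w: float, gain: float,
             emi_w: float, noise_w: float) -> float:
    """ASSUMPTION: Shannon-type rate with EMI treated as additional interference."""
    sinr = p_tx_w * gain / (emi_w + noise_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


def route_delay(packet_bits: float, hop_rates_bps: List[float]) -> float:
    """Total forwarding delay: sum of per-hop transmission delays (bits / rate)."""
    return sum(packet_bits / rate for rate in hop_rates_bps)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
LEVELS = power_levels(0.1, 0.5, 5)
RATE = hop_rate(1e6, 1.0, 3.0, 0.5, 0.5)   # SINR = 3, so rate = 2 * bandwidth
DELAY = route_delay(1e6, [2e6, 1e6])       # 0.5 s on hop 1 plus 1.0 s on hop 2
```

A higher power level raises the SINR and shortens every hop delay, which is why power selection interacts with route selection in the delay objective.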

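Energy Efficiency Model.
The energy consumption for data packet transmission from $d_m^k$ to $d_m^{k+1}$ is the product of the transmission power and the transmission delay on that hop, and the total energy consumption on route $f_m$ is the sum of the energy consumption over its hops. We define $c_t(m, h)$ as the energy efficiency of data packet transmission on route $f_m$ with power $P_{\min} + (h-1)(P_{\max} - P_{\min})/(H-1)$ in the $t$-th time slot. Therefore, the total energy efficiency in the $t$-th slot, $\alpha_t$, is the energy efficiency of the selected route and power level.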
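The energy bookkeeping can be illustrated with a short sketch. It assumes that the energy spent on a hop equals transmission power multiplied by hop delay, and it further assumes, purely for illustration, that energy efficiency is measured as delivered bits per joule; the exact definition of $c_t(m, h)$ is not recoverable from the text above, and the function names are hypothetical.

```python
from typing import List


def route_energy(p_tx_w: float, hop_delays_s: List[float]) -> float:
    """Total energy on a route: transmission power times delay, summed over hops."""
    return sum(p_tx_w * delay for delay in hop_delays_s)


def energy_efficiency(packet_bits: float, p_tx_w: float,
                      hop_delays_s: List[float]) -> float:
    """ASSUMPTION: efficiency expressed as delivered bits per joule."""
    return packet_bits / route_energy(p_tx_w, hop_delays_s)


# Hypothetical values: a 1 Mbit packet sent at 0.2 W over two hops.
ENERGY = route_energy(0.2, [0.5, 1.0])
EFFICIENCY = energy_efficiency(1e6, 0.2, [0.5, 1.0])
```

Under this assumed definition, raising the power shortens the delays but multiplies every hop's energy cost, so the most efficient power level is not necessarily the fastest one.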

Problem Formulation.
Since the data packets have different QoS requirements, the service priority needs to be taken into account. We use $\eta_t$ to represent the priorities of different data packets. We define the overall network utility function $\Phi$ in terms of the total forwarding delay, the service priority, and the total energy efficiency, where $V$ is the weight used to balance the order of magnitude.
Wireless Communications and Mobile Computing
Therefore, the objective is to maximize $\Phi$ by optimizing the route and power selection strategies. The optimization problem is formulated as
$$
\begin{aligned}
\mathbf{P1}:\ & \max_{\{x_t(m),\, y_t(h)\}} \Phi \\
\text{s.t.}\ & C_1: \sum_{m=1}^{M} x_t(m) = 1,\ \forall t \in T, \\
& C_2: \sum_{h=1}^{H} y_t(h) = 1,\ \forall t \in T, \\
& C_3: P_{\min} \le P_h \le P_{\max}, \\
& C_4: \mathrm{SINR}_{d_m^k, d_m^{k+1}}(h) \ge \mathrm{SINR}_{\min}, \\
& C_5: \alpha_t \ge \alpha_{\min},
\end{aligned}
$$
where $y_t(h) \in \{0, 1\}$ is the power selection variable, and $\mathrm{SINR}_{\min}$ and $\alpha_{\min}$ represent the thresholds of SINR and energy efficiency, respectively. $C_1$ is the route selection constraint; i.e., each data packet can only select one route. $C_2$ is the power selection constraint; i.e., each data packet can only select one power level. $C_3$ is the transmission power constraint. $C_4$ is the SINR constraint. $C_5$ is the energy efficiency constraint.

UCB-Based Route and Power Selection Optimization for SDN-Enabled Industrial IoT in Smart Grid
It is impractical to obtain perfect GSI due to the dynamic network topology and complex EMI, so IIoT devices should optimize route and power selection based on local-side information. MAB is an effective approach to decision-making problems with incomplete information [24]. In each slot, the decision maker pulls an arm. Then, the pulled arm generates a reward. The decision maker's goal is to maximize the cumulative reward. We transform P1 into an MAB problem. The decision maker, arm, and reward are modeled as follows:
(i) Decision maker: the decision maker generates the decision. In this paper, we define the SDN controller as the decision maker.
(ii) Arm: each combination of a route $f_m$ and a power level $P_h$ forms an arm $c_{m,h}$, and the set of arms is denoted as $C$. The selected arm is required to satisfy $C_3$ and $C_4$.
(iii) Reward: if $\mathrm{SINR}_{d_m^k, d_m^{k+1}}(h) \ge \mathrm{SINR}_{\min}$ and $\alpha_t \ge \alpha_{\min}$, the reward of arm $c_{m,h}$ is the utility achieved in the slot, which is determined by the service priority $\eta_t$, the energy efficiency $c_t(m, h)$, and the forwarding delay $D_t(m, h)$. Otherwise, the reward is zero.
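The arm set and the threshold-gated reward described above can be sketched as follows. This is a hedged illustration: the utility value is passed in as a plain number because its closed form is not reproduced here, the SINR check is assumed to use the worst hop on the route, and the names `build_arms` and `gated_reward` are hypothetical. The route count of 6 matches the simulation topology, while the 5 power levels are an assumed value.

```python
from itertools import product
from typing import List, Tuple

Arm = Tuple[int, int]  # (route index m, power level index h), both 1-based


def build_arms(num_routes: int, num_power_levels: int) -> List[Arm]:
    """Every (route, power level) pair is one arm, so |C| = M * H."""
    return list(product(range(1, num_routes + 1), range(1, num_power_levels + 1)))


def gated_reward(utility: float, worst_hop_sinr_db: float, alpha: float,
                 sinr_min_db: float, alpha_min: float) -> float:
    """Return the utility when both thresholds are met, and zero otherwise."""
    if worst_hop_sinr_db >= sinr_min_db and alpha >= alpha_min:
        return utility
    return 0.0


ARMS = build_arms(6, 5)                              # 6 routes, 5 assumed power levels
R_OK = gated_reward(2.5, 15.0, 1.0, 14.0, 0.5)       # both thresholds satisfied
R_BAD = gated_reward(2.5, 10.0, 1.0, 14.0, 0.5)      # SINR below its threshold
```

Returning zero for infeasible selections steers the learner away from arms that violate the constraints without requiring the constraints to be known in advance.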
We propose a UCB-based joint route and power selection algorithm for SDN-enabled IIoT in smart grid to address the MAB problem. UCB is a low-complexity learning-based algorithm that balances exploitation and exploration [25]. The proposed algorithm enables the SDN controller to take actions based on local state information such as delay. Afterwards, the obtained reward and the updated state information are perceived by the SDN controller for the next selection. The implementation of the proposed algorithm is shown in Figure 2.
The proposed algorithm consists of three phases, which are summarized in Algorithm 1.
(i) Phase I: $x_t(m)$, $y_t(h)$, $\theta_t(c_{m,h})$, and $n_t(c_{m,h})$ are initialized as zero. When $t \le |C|$, the controller sequentially selects each arm once and obtains its initial reward estimate.
(ii) Phase II: when $t > |C|$, the preference of the SD towards arm $c_{m,h}$ in the $t$-th slot, denoted as $\hat{\theta}_t(c_{m,h})$, is the sum of two terms. The first term is $\theta_{t-1}(c_{m,h})$, the average reward of $c_{m,h}$ up to the $(t-1)$-th time slot. The second term is an exploration bonus weighted by $\omega$, the weight of exploration, which becomes smaller as $n_{t-1}(c_{m,h})$, the number of times $c_{m,h}$ has been selected, grows. The second term allows the controller to explore arms with few selections to improve their estimates and to focus on exploitation once the arms have been estimated well enough. After obtaining the preferences, the selected arm is given by $c_{m^*,h^*} = \arg\max_{c_{m,h} \in C} \hat{\theta}_t(c_{m,h})$, where $c_{m^*,h^*}$ represents that the SD selects $f_{m^*}$ and $P_{h^*}$, i.e., $x_t(m^*) = 1$ and $y_t(h^*) = 1$.
(iii) Phase III: the controller observes the delay and energy efficiency performances as well as the service priority and computes the reward of $c_{m^*,h^*}$. Accordingly, $n_t(c_{m^*,h^*}) = n_{t-1}(c_{m^*,h^*}) + 1$, and $\theta_t(c_{m^*,h^*})$ is updated to the average of all rewards obtained from $c_{m^*,h^*}$ so far, while the values of the other arms remain unchanged. Finally, the algorithm terminates when $t > T$.
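The three phases can be sketched as a short loop. This is a minimal illustration under stated assumptions rather than a reproduction of Algorithm 1: the exploration bonus is assumed to take the common form $\omega\sqrt{\ln t / n_{t-1}(c_{m,h})}$, the reward function is a hypothetical deterministic stand-in for the observed utility, and the names `ucb_select` and `toy_reward` are invented for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

Arm = Tuple[int, int]  # (route index m, power level index h)


def ucb_select(arms: List[Arm], reward_fn: Callable[[Arm, int], float],
               horizon: int, omega: float) -> Tuple[Dict[Arm, float], Dict[Arm, int]]:
    """Run the three-phase loop and return average rewards and selection counts."""
    theta = {a: 0.0 for a in arms}  # average reward of each arm
    count = {a: 0 for a in arms}    # number of times each arm was selected
    for t in range(1, horizon + 1):
        if t <= len(arms):
            arm = arms[t - 1]  # Phase I: try every arm once
        else:
            # Phase II: average reward plus an ASSUMED exploration bonus
            arm = max(arms, key=lambda a: theta[a]
                      + omega * math.sqrt(math.log(t) / count[a]))
        reward = reward_fn(arm, t)
        # Phase III: incremental update of the running average
        count[arm] += 1
        theta[arm] += (reward - theta[arm]) / count[arm]
    return theta, count


# Hypothetical environment: 2 routes x 2 power levels with fixed rewards.
TOY_MEANS = {(1, 1): 0.2, (1, 2): 0.5, (2, 1): 0.8, (2, 2): 0.4}


def toy_reward(arm: Arm, t: int) -> float:
    return TOY_MEANS[arm]


THETA, COUNTS = ucb_select(list(TOY_MEANS), toy_reward, horizon=500, omega=1.0)
```

In this toy run the arm (2, 1) ends up selected far more often than the others, because the bonus of a rarely chosen arm only occasionally exceeds the gap to the best average reward. A larger `omega` spends more slots on exploration, and a smaller one commits earlier.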

Simulation Results
In this section, we first introduce the simulation parameter setting. Then, the simulation analysis is described.

Simulation Parameter Setting.
In this section, we evaluate the proposed algorithm through simulations. The considered IIoT route topology is shown in Figure 3, which includes 9 IIoT devices and 6 routes. $v_1$ and $v_9$ are the SD and DD, respectively. The distances between adjacent devices on each route are shown in Table 1. In the case of large-scale fading, the channel gain is calculated according to $g_{i,j}(t) = 127 + 30\log(r_{i,j})$ [26], where $r_{i,j}$ is the distance between $v_i$ and $v_j$. The EMI varies from 28 dBm to 30 dBm. The service priority $\eta_t$ is set as [0.1, 0.2, 0.3, 0.4, 0.5]. The simulation parameters are summarized in Table 2 [27,28]. We consider two existing algorithms for comparison. The first one is the UCB-based route selection algorithm named UCB-RS [29]. The other one is the shortest route selection algorithm named SRS [30]. Both UCB-RS and SRS neglect the optimization of power selection.

Figure 3: IIoT route topology.

Algorithm 1, line 1: Input: $\omega$, $\eta_t$, $D_t(m, h)$, $c_t(m, h)$.

Figure 4 shows the average utility versus time slot. Compared with UCB-RS and SRS, the proposed algorithm improves the utility by 27.28% and 37.31%, respectively. The reason is that the proposed algorithm jointly optimizes the route and power selection. In contrast, UCB-RS neglects the power selection. SRS always selects the shortest route, which cannot overcome the adverse impact caused by the dynamic change of channel state, and thereby performs the worst.

Figure 5 shows the average delay versus time slot. The proposed algorithm outperforms UCB-RS and SRS by 17.93% and 23.17% in delay performance, respectively. Neither UCB-RS nor SRS takes the optimization of power selection into consideration, which results in worse delay performance.

Figure 6 shows the average energy efficiency versus time slot. Compared with UCB-RS and SRS, the proposed algorithm improves the energy efficiency by 28.61% and 51.15%, respectively, because it can select a suitable power level to optimize energy efficiency.

Figure 7 shows the ratio of optimal route selection versus time slot. SRS performs the worst. The reason is that the proposed algorithm and UCB-RS can dynamically adjust the route selection strategy, whereas SRS always selects the shortest route and cannot escape the adverse impact of EMI.

Figures 8-10 show the average energy efficiency, average delay, and average utility versus $\mathrm{SINR}_{\min}$. As $\mathrm{SINR}_{\min}$ increases, the energy efficiency and delay of the proposed algorithm decrease, while the utility first increases and then decreases. When $\mathrm{SINR}_{\min} = 14$ dB, the utility reaches its maximum value. The performance of UCB-RS fluctuates. The reason is that the proposed algorithm can learn to optimize power selection to meet a more stringent SINR constraint, whereas UCB-RS neglects power selection, which makes it difficult to adapt to different SINR constraints.

Figure 11 shows the impact of $V$ on the delay and energy efficiency. As $V$ increases, the proposed algorithm places more emphasis on energy efficiency than on delay. The proposed algorithm can thus dynamically balance the tradeoff between energy efficiency and delay. Moreover, the simulation results provide a reference for the setting of the weight $V$.
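The large-scale fading expression $127 + 30\log(r_{i,j})$ from the parameter setting can be evaluated with a few lines of code. The sketch rests on assumptions that the text does not state: the logarithm is taken to be base 10, the distance is taken to be in kilometers, and the result is read as a path loss in dB that is converted to a linear gain by negating it. The function names are hypothetical.

```python
import math


def path_loss_db(distance_km: float) -> float:
    """ASSUMPTION: 127 + 30 * log10(r) with r in km, interpreted as loss in dB."""
    return 127.0 + 30.0 * math.log10(distance_km)


def linear_gain(distance_km: float) -> float:
    """Convert the assumed path loss in dB into a linear channel gain."""
    return 10.0 ** (-path_loss_db(distance_km) / 10.0)


LOSS_1KM = path_loss_db(1.0)     # log10(1) = 0, so the loss is the 127 dB offset
LOSS_100M = path_loss_db(0.1)    # one decade closer removes 30 dB
GAIN_RATIO = linear_gain(0.1) / linear_gain(1.0)
```

Under these assumptions, each tenfold reduction in distance improves the linear gain by a factor of 1000, which is why shorter hops along a multihop route can outperform a single long hop.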

Conclusions
In this paper, we proposed a UCB-based joint route and power selection optimization algorithm for SDN-enabled IIoT. The proposed algorithm can effectively optimize route and power selection strategies based only on local information and historical observations. It provides a low-complexity route and power selection strategy while maximizing the overall network utility. Simulation results show that the proposed algorithm achieves better delay, energy efficiency, and utility. Compared with the existing UCB-RS and SRS algorithms, the proposed algorithm reduces the delay by 17.93% and 23.17%, improves the utility by 27.28% and 37.31%, and improves the energy efficiency by 28.61% and 51.15%, respectively. In the future, we will use deep reinforcement learning to optimize the multidimensional resource allocation in SDN-enabled IIoT.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.