Intelligent Forwarding Strategy for Congestion Control Using Q-Learning and LSTM in Named Data Networking



Introduction
The rapid development of the Internet has led to a significant increase in the amount of content transmitted every year. However, the Internet has struggled to adapt to these changes because its current architecture depends on IP addresses and is designed for end-to-end communication. This drawback causes problems in areas such as transport efficiency and security.
Information-centric networking (ICN) was proposed as a solution to the problems caused by the rapid increase in content. The goal was to change the communication paradigm from the IP-based model to a content-oriented model [1]. Named data networking (NDN), one of the most well-known ICN architectures, is attracting attention as a research hotspot [2][3][4].
NDN is a future network architecture that adapts the current IP-based Internet to the changing Internet environment by replacing IP addresses with named content for communication. Compared with traditional TCP/IP, it introduces the following new features in the transmission method. First, NDN communication is connectionless and follows a consumer-driven pull model: the consumer sends an interest packet to request content, and the producer holding the requested content returns the matching data. Second, NDN supports multiple sources. Each NDN node has a content store (CS) in which returned content can be temporarily stored at intermediate nodes in the network. Therefore, the consumer can receive the requested data from multiple sources, including the CS of intermediate nodes and the producer holding the original data.
Third, NDN has multipath features, so it supports dynamic multipath forwarding.
Each NDN node provides multiple paths from the consumer to the sources via a forwarding information base (FIB), which stores the interfaces to which a packet can be sent next, and it decides how to use those paths through a forwarding strategy. Although these changes address some limitations of the current Internet, if NDN is applied to a service such as video streaming, congestion may occur at nodes where data is concentrated when many users request content during a certain period. Therefore, congestion control is one of the major research topics in NDN.
Congestion control for NDN has been proposed by adapting TCP/IP methods. TCP/IP congestion control detects congestion via the retransmission timeout (RTO) and adjusts the sending rate with a window-based additive increase/multiplicative decrease (AIMD) mechanism. However, RTO is not a reliable congestion indicator in NDN because, owing to its multisource feature, a different round-trip time (RTT) is measured for each source. Furthermore, the TCP/IP window control method, which targets a single path, is not suitable for NDN with its multiple sources and multiple paths. When a consumer receives data from two sources through different paths and one path becomes congested, reducing the window size also decreases the throughput of the path that is not congested. Thus, directly applying existing solutions does not adequately account for the characteristics of NDN, and a new congestion control method for NDN is needed.
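The problem with a single window over multiple paths can be illustrated with a toy simulation. In this sketch, which is an assumption for illustration only, one AIMD window is split evenly over two paths each round; if the share sent on either path exceeds that path's capacity (in packets per round), the whole window is halved, otherwise it grows by one packet. The function name and capacities are hypothetical.

```python
from typing import List, Tuple


def simulate_shared_window(rounds: int, capacity_a: float, capacity_b: float) -> List[Tuple[float, float]]:
    """Simulate one AIMD window shared by path A and path B.

    Each round the window is split evenly over the two paths. If the share on
    either path exceeds its capacity, the consumer sees a loss and halves the
    single window (multiplicative decrease); otherwise it adds one packet
    (additive increase). Returns the packets delivered per path in each round.
    """
    window = 2.0
    delivered = []
    for _ in range(rounds):
        share = window / 2
        delivered.append((min(share, capacity_a), min(share, capacity_b)))
        if share > capacity_a or share > capacity_b:
            window = max(window / 2, 1.0)  # a loss on ANY path shrinks the shared window
        else:
            window += 1.0
    return delivered


congested = simulate_shared_window(50, capacity_a=2.0, capacity_b=10.0)
print(max(b for _, b in congested))  # path B stays far below its capacity of 10
```

With path A limited to 2 packets per round, the uncongested path B never carries more than 2.5 packets per round, whereas with two uncongested paths the share on B grows to its full capacity of 10.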
In this paper, we propose an intelligent forwarding strategy for congestion control using Q-learning and LSTM in named data networking (IFS-QLSTM), a dynamic forwarding method that utilizes multiple paths. First, IFS-QLSTM trains an LSTM model on the number of entries added to the pending interest table (PIT) of an NDN node over time (hereafter referred to as the PIT entry rate). Second, the PIT entry rate predicted by the trained LSTM model is used by reinforcement learning to identify congested nodes, which are then bypassed when the packet is forwarded. The rest of this paper is organized as follows. Section 2 explains the background of NDN and related research. Section 3 describes the intelligent forwarding strategy for congestion control using Q-learning and LSTM in NDN. Section 4 presents the performance evaluation and an analysis of the simulation results. Finally, Section 5 concludes the paper.

Related Works
In recent years, NDN has been studied as a future network architecture that will replace the current Internet.One of the core technologies of NDN architecture is congestion control.We survey related studies in two aspects: (1) studies on control of the interest sending rate for congestion control and (2) studies on adaptive forwarding strategy [5][6][7].
Research on controlling the interest sending rate for congestion control includes receiver-based window control methods and hop-by-hop interest sending rate control methods. In [8], the authors describe a receiver-based window control scheme in which the receiver controls the interest sending rate by adjusting its congestion window with a TCP-like, RTT-based mechanism. Similarly, both ICTP and CCTCP adjust the congestion window based on RTT [9,10]. However, NDN caches data in the CS added to each router, which causes the RTT to change irregularly. In addition, when a consumer requests data from multiple sources and one source is congested, the consumer reduces its window size. This means that the amount of data requested from sources that are not congested is also reduced.
Therefore, the traditional receiver-based window control method is not suitable for congestion control in NDN. In [11], the authors demonstrate a representative hop-by-hop method that detects congestion at intermediate nodes and adjusts the interest sending rate using interest shaping. Wang et al. [12] proposed a method that improves on [11] by adding NACK feedback to inform downstream nodes of congestion.
A forwarding strategy can dynamically select one or more interfaces in the FIB to forward an interest packet. The BestRoute strategy forwards the interest packet on the available path with the lowest routing cost [13]. In [14], the authors propose a forwarding strategy that weights each outgoing FIB interface according to its number of pending interests. In [15], the authors design an adaptive forwarding scheme that retrieves data through the best-performing paths and quickly detects and recovers from packet delivery problems. In [16], the authors propose an adaptive SRTT-based forwarding strategy (ASF).
At each node, ASF periodically measures the SRTT of the adjacent nodes, ranks the nodes to which it can forward accordingly, and transmits the interest packet to the node with the lowest SRTT. If a problem such as a timeout occurs, the node where it occurred is penalized and moved to the end of the ranking.
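The ranking behavior described for ASF can be sketched as follows. This is a simplified illustration of the description above, not the ASF strategy as implemented in NFD (which also probes alternative faces); the class name, the face names, and the smoothing gain of 1/8 (the SRTT gain used by TCP in RFC 6298) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AsfLikeRanker:
    """Ranks next-hop faces by smoothed RTT; timed-out faces are moved last."""

    gain: float = 0.125                                    # EWMA gain (1/8, as for TCP's SRTT)
    srtt: Dict[str, float] = field(default_factory=dict)   # face -> smoothed RTT in ms
    penalized: List[str] = field(default_factory=list)     # penalized faces, in penalty order

    def record_rtt(self, face: str, rtt_ms: float) -> None:
        # Exponentially weighted moving average of the measured RTT.
        old = self.srtt.get(face)
        self.srtt[face] = rtt_ms if old is None else (1 - self.gain) * old + self.gain * rtt_ms

    def record_timeout(self, face: str) -> None:
        # A face that timed out is penalized by sending it to the end of the ranking.
        if face not in self.penalized:
            self.penalized.append(face)

    def ranking(self) -> List[str]:
        healthy = sorted((f for f in self.srtt if f not in self.penalized), key=self.srtt.__getitem__)
        return healthy + [f for f in self.penalized if f in self.srtt]

    def select(self) -> str:
        # Forward to the face with the lowest SRTT among non-penalized faces.
        return self.ranking()[0]


ranker = AsfLikeRanker()
ranker.record_rtt("face1", 30.0)
ranker.record_rtt("face2", 20.0)
ranker.record_timeout("face2")
print(ranker.ranking())
```

Because the choice depends only on SRTT order, two faces with equal SRTT give the ranker no reason to switch, which is relevant to the ASF behavior discussed in the simulation results.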
In this paper, we design an intelligent forwarding strategy for congestion control using Q-learning and LSTM in NDN.In the first phase, we predict the change in the PIT entry rate in the next time step through time series prediction based on a pretrained LSTM model.In the second phase, based on the predicted PIT entry rate, an appropriate alternative route is selected through Q-learning in congestion situations.

Basic NDN Forwarding Mechanism.
An NDN node is composed of three elements: the PIT, the CS, and the FIB.
The PIT records the interface on which each interest packet arrived, which tells the node where to return the corresponding data packet. The CS, a distinctive feature of NDN nodes, temporarily stores data. The FIB records the next hops that can be used for each name prefix; when an interest packet arrives, the node looks up its prefix in the FIB to determine where to send it next.
2 Mobile Information Systems
Figure 1 illustrates the forwarding process of NDN. When an interest packet arrives, the NDN node first searches the CS and, if there is matching data, returns it to the incoming interface. If not, it looks up the PIT. If the same data has already been requested and is recorded in the PIT, the interface on which the interest packet arrived is added to the existing entry. Otherwise, a new entry is recorded in the PIT and the packet is passed to the FIB. Finally, if the FIB contains an interface that matches the name of the received packet, the packet is transmitted on the optimal path according to the forwarding strategy; if there is no such interface, the packet is discarded. Next, when a data packet arrives at the NDN node, the node first searches the PIT to check whether there is a request for the received data. If there is a request, the data is returned along the recorded reverse path; if there is no request, the incoming data packet is discarded. Before the data is transmitted over the reverse path, the node checks the CS, and if the data is not yet cached, it is stored in the CS so that the node can respond quickly to the next request.
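The interest and data pipelines above can be summarized by a toy node model. This is a simplified sketch, not NFD's implementation: names are plain strings, the FIB lookup is a longest-prefix match over '/'-separated components, and the forwarding decision is a placeholder that takes the first listed face (standing in for a BestRoute-style lowest-cost choice, assuming faces are listed cheapest first). The class name `ToyNdnNode` and all face names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ToyNdnNode:
    """Toy NDN node with a CS, PIT, and FIB (illustrative, not NFD)."""

    fib: Dict[str, List[str]]                               # prefix -> faces, assumed cheapest first
    cs: Dict[str, bytes] = field(default_factory=dict)      # content store: name -> data
    pit: Dict[str, Set[str]] = field(default_factory=dict)  # pending name -> incoming faces

    def _fib_lookup(self, name: str) -> List[str]:
        # Longest-prefix match over '/'-separated name components.
        parts = name.strip("/").split("/")
        for i in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:i])
            if prefix in self.fib:
                return self.fib[prefix]
        return []

    def on_interest(self, name: str, in_face: str) -> Tuple[str, Optional[str]]:
        if name in self.cs:                # 1. CS hit: reply on the incoming face
            return ("data", in_face)
        if name in self.pit:               # 2. Already pending: record the extra face only
            self.pit[name].add(in_face)
            return ("aggregated", None)
        faces = self._fib_lookup(name)     # 3. FIB lookup
        if not faces:
            return ("drop", None)          # no route: discard the interest
        self.pit[name] = {in_face}
        return ("forward", faces[0])       # placeholder strategy: first listed face

    def on_data(self, name: str, payload: bytes) -> List[str]:
        if name not in self.pit:           # unsolicited data is discarded
            return []
        self.cs.setdefault(name, payload)  # cache it if not already cached
        return sorted(self.pit.pop(name))  # return along every recorded reverse path


node = ToyNdnNode(fib={"/video": ["f2", "f3"]})
print(node.on_interest("/video/seg1", "f1"))  # forwarded via the FIB
print(node.on_interest("/video/seg1", "f4"))  # aggregated in the PIT
print(node.on_data("/video/seg1", b"..."))    # sent back to both requesters
```

IFS-QLSTM keeps this pipeline and changes only the last step of `on_interest`, replacing the placeholder face choice with a learned one.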

Proposed System Model.
The system model of the intelligent forwarding strategy for congestion control using Q-learning and LSTM in named data networking is shown in Figure 2. When an NDN node receives an interest packet, it checks whether there is a matching name in the CS and PIT; if not, it looks up the outgoing interfaces in the FIB and forwards the packet to the interface chosen by the forwarding strategy. As shown in Figure 2, IFS-QLSTM proceeds in the same way up to the PIT but differs in its forwarding strategy, which bypasses congested nodes. First, the PIT entry rate of each neighboring node is predicted by the pretrained LSTM from the PIT entry rates of the nodes obtained from data packets. Then, the predicted values are used as the state of Q-learning to detect congestion, and an appropriate alternative path is selected as the action and used to forward the packet.

Pretrained LSTM.
Because the PIT of an NDN node records the incoming interface of each received interest packet, it indicates how much data is expected to be returned. Since the PIT entry rate changes over time, it can be treated as time-series data. Thus, by training an LSTM, a deep learning model widely used for time-series prediction, we can predict the PIT entry rate in the next time interval. From this prediction, the arrival rate of data packets can be estimated and congestion can be forecast in a timely manner.
In advance, the PIT entry rate of each node is measured and normalized for use as the input to the LSTM model. Then, as shown in Figure 3, we train the LSTM model to predict the value at time t + n + 1 from the inputs at times t through t + n. Finally, the trained model is saved and used to predict the PIT entry rate of the neighboring nodes at the next time step.
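The data preparation step can be sketched as follows. This is a minimal sketch under stated assumptions: the text says only that the rates are normalized, so min-max scaling is an assumption, and the helper names and sample values are hypothetical. Each input window holds the n + 1 values at times t through t + n, and the target is the value at t + n + 1. The resulting pairs would then be fed to an LSTM regressor trained with Adam at a learning rate of 0.001, as in the simulation settings; the model itself is omitted.

```python
from typing import List, Tuple


def min_max_normalize(series: List[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1]; also return min and max so predictions can be rescaled."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # a constant series would otherwise divide by zero
    return [(x - lo) / span for x in series], lo, hi


def make_windows(series: List[float], n: int) -> Tuple[List[List[float]], List[float]]:
    """Build (input, target) pairs: inputs at t..t+n, target at t+n+1."""
    inputs, targets = [], []
    for t in range(len(series) - n - 1):
        inputs.append(series[t : t + n + 1])
        targets.append(series[t + n + 1])
    return inputs, targets


# Hypothetical PIT entry rates (entries per second) measured at one node.
rates = [12.0, 15.0, 20.0, 18.0, 30.0, 26.0]
scaled, lo, hi = min_max_normalize(rates)
x, y = make_windows(scaled, n=2)  # three inputs per window, next value as target
print(len(x), len(y))
```

Keeping `lo` and `hi` allows a prediction made in normalized units to be mapped back to entries per second before it is compared across nodes.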

Q-Learning Structure
State.
Reinforcement learning agents must be given enough information to know their current state accurately. However, with the Q-learning used in this paper, using too many state variables makes the Q-table, which consists of states and actions, too large and complicated. Therefore, appropriate state variables that represent the current state must be chosen. In this paper, we use the following two state variables. First, because the agent must know where the decision is being made, the current node that has received the interest packet is part of the state. Second, to capture the congestion condition of the nodes to which the current node can forward, the PIT entry rates of those nodes, predicted with the pretrained LSTM model, are part of the state. Based on these two state variables, the agent knows where it is currently located and the congestion condition of the neighboring nodes.
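One way to turn the two state variables into a Q-table key is sketched below. This is an assumption: the text does not say how the continuous predicted PIT entry rates are discretized, so the thresholds (0.3 and 0.7 on normalized rates) and the helper names `discretize` and `encode_state` are hypothetical. Coarse congestion levels keep the Q-table small, which is the concern raised above.

```python
from typing import Dict, Tuple

StateKey = Tuple[str, Tuple[Tuple[str, int], ...]]


def discretize(rate: float, thresholds: Tuple[float, ...] = (0.3, 0.7)) -> int:
    """Map a normalized predicted PIT entry rate to a congestion level (0 = low)."""
    return sum(1 for th in thresholds if rate >= th)


def encode_state(current_node: str, predicted_rates: Dict[str, float]) -> StateKey:
    """Combine the current node with per-neighbor congestion levels.

    Neighbors are sorted by name so the same situation always yields the same key.
    """
    levels = tuple((nbr, discretize(r)) for nbr, r in sorted(predicted_rates.items()))
    return (current_node, levels)


print(encode_state("Node1", {"Node3": 0.8, "Node2": 0.1}))
```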

Action.
Because IFS-QLSTM forwards packets by selecting a path suited to the current congestion condition, the action is the choice of one of the neighboring nodes to which the received interest packet can be forwarded.

Reward.
Since the reward indicates the direction of training, its definition is important in reinforcement learning. To train in the desired direction, a reward suited to that direction must be defined. Thus, the reward is defined as follows:

$$R(N) = \alpha \cdot \mathrm{Throughput}(N) - \beta \cdot \mathrm{PacketLoss}(N) - \gamma \cdot \mathrm{RTT}(N) \tag{1}$$

where N represents a node, and α, β, and γ are the weight values for controlling the throughput, packet loss, and RTT, respectively. Throughput represents the number of packets processed per second by node N, and packet loss represents the number of packets discarded per second by node N. RTT represents the time between transmitting a packet and receiving the response at node N. We considered that if only packet loss were used as the reward, the agent might learn to avoid congested paths while ignoring packet transmission time and throughput. Therefore, we designed the reward to increase packet throughput and reduce packet loss and RTT while avoiding congested paths.
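As a minimal sketch, assuming the reward is a linear combination in which throughput adds to the reward while packet loss and RTT subtract from it, the per-node reward can be computed as follows. The function name and example weights are hypothetical (the weight values are not reported), throughput and loss are in packets per second, and RTT is assumed to be in seconds.

```python
def node_reward(throughput: float, packet_loss: float, rtt: float,
                alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Reward for node N: reward throughput, penalize packet loss and RTT.

    throughput and packet_loss are packets per second; rtt is in seconds (assumed).
    alpha, beta, gamma are the weights for throughput, loss, and RTT.
    """
    return alpha * throughput - beta * packet_loss - gamma * rtt


# Hypothetical measurements: an uncongested next hop versus a congested one.
print(node_reward(100.0, 0.0, 0.02))
print(node_reward(60.0, 20.0, 0.08))
```

Because RTT in seconds is numerically small compared with packet rates, its weight has to be much larger than the others for RTT to influence the learned policy; this is one reason the three weights are kept separate.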

Q Value Update.
In this paper, the Q value is updated every second using the general Q-learning update formula shown in Equation (2), where Q(s, a) is the Q value of performing action a in state s, r is the reward received for taking action a in state s, s′ is the resulting next state, and η is the learning rate. The discount factor, λ, is a number between 0 and 1 that values rewards received earlier more highly than those received later.

$$Q(s, a) \leftarrow Q(s, a) + \eta \left[ r + \lambda \max_{a'} Q(s', a') - Q(s, a) \right] \tag{2}$$

Q-Learning-Based Forwarding Strategy.
Figure 4 shows the Q-learning packet transmission process when an NDN node receives an interest packet. When the interest packet arrives, the NDN node first checks the CS and PIT for a matching name; if no match exists, it looks up the name in the FIB. If the FIB has a matching entry, the PIT entry rates of the corresponding nodes (the nodes to which the current node can forward) are predicted using the pretrained LSTM; otherwise, the interest packet is discarded. The packet is then forwarded on the best path selected through Q-learning. Specifically, the predicted PIT entry rates and the current node are used as the Q-learning state to obtain the Q values of the candidate nodes from the Q-table. Next, a random value between 0 and 1 is drawn; if it is less than the current epsilon value, the agent explores by forwarding the interest packet to a randomly selected node other than the one with the highest Q value. Exploration is used to gain varied experience: a path that performed poorly in the past may since have improved, so always choosing the currently best option can hinder training. Otherwise, the agent exploits by forwarding the interest packet to the candidate node with the largest Q value in the Q-table. In this way, exploration and exploitation are performed according to the epsilon value; because excessive exploration degrades performance, the epsilon value is decreased over time.
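The selection and update steps described above can be sketched as a small tabular Q-learning agent. This is a minimal sketch, not the authors' implementation: the class name `QForwarder`, the learning rate, the initial epsilon, and the multiplicative decay factor are hypothetical; the discount factor of 0.9 and the epsilon floor of 0.01 follow the parameters reported in the simulation setup. A state can be any hashable value, for example a tuple of the current node and discretized predicted PIT entry rates.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional


class QForwarder:
    """Tabular Q-learning next-hop selector with epsilon-greedy exploration."""

    def __init__(self, learning_rate: float = 0.1, discount: float = 0.9,
                 epsilon: float = 1.0, epsilon_min: float = 0.01,
                 epsilon_decay: float = 0.995, seed: Optional[int] = None):
        self.q = defaultdict(float)        # (state, action) -> Q value, 0.0 if unseen
        self.lr = learning_rate
        self.discount = discount
        self.epsilon = epsilon
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay
        self.rng = random.Random(seed)

    def choose(self, state: Hashable, neighbors: List[str]) -> str:
        best = max(neighbors, key=lambda a: self.q[(state, a)])
        others = [a for a in neighbors if a != best]
        if others and self.rng.random() < self.epsilon:
            return self.rng.choice(others)  # explore: random node other than the best
        return best                         # exploit: node with the highest Q value

    def update(self, state: Hashable, action: str, reward: float,
               next_state: Hashable, next_neighbors: List[str]) -> None:
        # Standard Q-learning target: r + discount * max_a' Q(s', a').
        future = max((self.q[(next_state, a)] for a in next_neighbors), default=0.0)
        target = reward + self.discount * future
        self.q[(state, action)] += self.lr * (target - self.q[(state, action)])

    def decay_epsilon(self) -> None:
        # Shrink exploration over time, but never below the floor.
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)


agent = QForwarder(seed=7)
s = ("Node1", (("Node2", 0), ("Node3", 2)))
hop = agent.choose(s, ["Node2", "Node3"])
agent.update(s, hop, reward=10.0, next_state="done", next_neighbors=[])
agent.decay_epsilon()  # e.g., once per one-second update period
```

Excluding the current best node during exploration matches the description above; standard epsilon-greedy would instead pick uniformly among all nodes, including the best.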

Simulation and Analysis
Simulation Environment.
In this section, we implemented IFS-QLSTM using the open-source ndnSIM [17,18], an NS-3-based simulator developed for NDN, and evaluated its performance through simulation. Two evaluation metrics were selected to quantitatively evaluate the effectiveness of our method.
The first criterion was the InData rate, which we used to evaluate the utilization of the bottleneck links and alternate links. InData represents the amount of data arriving at the node and indicates how many data packets were actually delivered during congestion.
The second criterion is the packet drop rate. A low packet drop rate for IFS-QLSTM indicates that it effectively mitigates packet dropping.
The topology used in the experiment is shown in Figure 5. In this topology, the consumer (Node0) sends interest packets, and the producer (Node8) returns data matching the requested interests. The link bandwidth and delay in this topology are set to 10 Mbps and 10 ms, respectively. In our experiments, we induce congestion by lowering the bandwidth of specific links to 1 Mbps according to the requirements of each congestion scenario.
Next, the Q-learning parameters of IFS-QLSTM are as follows. First, a random number between 0.0 and 1.0 was drawn for comparison with epsilon.
The epsilon value, which determines the balance between exploration and exploitation, decreased over time until it reached 0.01. The discount factor, which weights future rewards relative to immediate rewards, was set to 0.9. For the LSTM, Adam was used as the optimizer with a learning rate of 0.001. We chose BestRoute and ASF for comparison because BestRoute is the basic NDN forwarding method used as a baseline in many papers, ASF is a more advanced forwarding algorithm, and, most importantly, both are well-established algorithms. We therefore simulated them and compared them with IFS-QLSTM. The graph in Figure 7 shows the average number of data packets received per second by the consumer in the three cases of Figures 6(a)-6(c).
IFS-QLSTM showed performance almost identical to that of ASF and a 17.3% higher data receiving rate than BestRoute. The graph in Figure 8 shows the average total packet drops in Figures 6(a)-6(c). With 35,750 packets transmitted, ASF, BestRoute, and IFS-QLSTM show packet drop rates of 0.07%, 15.9%, and 0.09%, respectively. As with the data receiving rate, the packet drop rate of IFS-QLSTM is similar to that of ASF and is 15.81 percentage points lower than that of BestRoute.
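Because the drop rates above are fractions of the 35,750 transmitted packets, they can be converted back to approximate packet counts, and the comparison with BestRoute is a difference in percentage points rather than a relative reduction. The small sketch below does that arithmetic; the helper names are hypothetical, and the counts it yields are back-calculated from the rounded rates reported here, not measured values.

```python
TOTAL_PACKETS = 35_750  # packets transmitted per scenario, as stated in the text


def implied_drops(rate_percent: float, total: int = TOTAL_PACKETS) -> int:
    """Approximate number of dropped packets implied by a drop rate given in percent."""
    return round(total * rate_percent / 100)


def point_gap(higher_percent: float, lower_percent: float) -> float:
    """Difference between two percentages, in percentage points."""
    return round(higher_percent - lower_percent, 2)


# Drop rates (%) reported for the three cases of Figures 6(a)-6(c).
reported = {"ASF": 0.07, "BestRoute": 15.9, "IFS-QLSTM": 0.09}
for method, rate in reported.items():
    print(f"{method}: {rate}% of {TOTAL_PACKETS} is about {implied_drops(rate)} packets")
print(f"BestRoute minus IFS-QLSTM: {point_gap(15.9, 0.09)} percentage points")
```

The same helper applies to the later scenarios, for example 18.8% versus 0.16% for BestRoute and IFS-QLSTM.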
In detail, the data rates in Figures 6(a)-6(c) show how each method transmits packets. ASF measures the SRTT of adjacent nodes periodically, so it quickly detects bottleneck links, finds alternate links, and sends packets over them, achieving a high InData rate. BestRoute selects an alternative route only when the FIB is updated, but because updates are infrequent or not performed at the right time, packets continue to be transmitted over the bottleneck link, resulting in a low InData rate. Finally, the proposed method has a slightly lower initial InData rate because, at the beginning, exploration also sends packets on paths with low Q values. However, through the reward, the model learns which PIT entry rates do not cause packet drops and how much traffic to send to each node according to its PIT entry rate. Through this, packets are appropriately divided between the bottleneck path and the alternate path, and IFS-QLSTM therefore shows an InData rate similar to that of ASF. In addition, the packet drop rates in Figure 6 show that BestRoute cannot find an alternative path, resulting in many packet drops on the bottleneck link. In contrast, with ASF and IFS-QLSTM, packet drops occur briefly at the beginning and stop once an alternative path is found.
The link delay is set to 10 ms for all links. Therefore, as shown in Figures 9(a)-9(c), bottleneck links exist regardless of which path from the consumer to the producer is selected. The graph in Figure 10 shows the average number of data packets received per second by the consumer in the three cases of Figures 9(a)-9(c). IFS-QLSTM showed 15.3% and 21.1% higher data rates than ASF and BestRoute, respectively. The graph in Figure 11 shows the average total packet drops in Figures 9(a)-9(c). With 35,750 packets transmitted, ASF, BestRoute, and IFS-QLSTM show packet drop rates of 14.7%, 18.8%, and 0.16%, respectively. In this experiment, IFS-QLSTM outperforms both ASF and BestRoute.
In detail, the InData rate and the packet drop rate in Figures 9(a)-9(c) show that ASF, unlike in the previous cases, performs poorly. The reason is that when the adjacent nodes have the same SRTT, the path is not updated in time, so packets continue to be transmitted over the bottleneck link, resulting in a low InData rate. Unlike in the previous case, many packets are also dropped on the bottleneck link because an alternative path cannot be found properly. BestRoute, as before, shows a low InData rate and a high packet drop rate because of its slow FIB updates. In contrast, IFS-QLSTM selects an alternative path according to the PIT entry rate of the neighboring nodes, as described above, so it maintains stable packet transmission even with bottleneck links. By achieving a high InData rate and a low packet drop rate, IFS-QLSTM performs more effectively than ASF and BestRoute in this scenario.

Conclusions
In this paper, we proposed IFS-QLSTM, an intelligent forwarding strategy for congestion control using Q-learning and LSTM in named data networking. The proposed method first trains an LSTM model on the PIT entry rate, which can serve as a congestion indicator because it reflects the amount of data to be returned in the future. Q-learning then detects congestion at adjacent nodes from the PIT entry rate predicted by the trained LSTM model and forwards packets along an appropriate path. The simulation results showed that, by distributing packets appropriately between bottleneck links and alternative links, IFS-QLSTM achieves a higher data rate and lower packet drop than BestRoute in all scenarios, performs comparably to ASF when an uncongested alternative path exists, and outperforms ASF when every path contains a bottleneck link. These results indicate that the proposed method is efficient and reliable in the simulated scenarios and suggest that it has potential as an effective congestion control algorithm for future NDN applications.
Future work will focus on evaluating our approach in various topologies and combining it with window-based congestion control algorithms, which we expect to further improve the congestion control performance of IFS-QLSTM.

Data Availability
The data used to support the findings of this study have not been made available because this work was supported by the Korean government and the data cannot be made public.
Figure 3: Pretrained LSTM

Figure 11: Comparison of the average packet drop between IFS-QLSTM and ASF and BestRoute.
Figure 2: System model of the IFS-QLSTM.