MEC-Based Cooperative Multimedia Caching Mechanism for the Internet of Vehicles

Multimedia applications are expected to be widely deployed over vehicular networks. To meet the low-latency and high-speed transmission requirements of multimedia applications, edge caching is introduced to reduce network traffic and transmission delay. Because the storage of the edge cache server is limited, an efficient approach to content management plays a decisive role in edge cache performance. This paper proposes a vehicle-to-infrastructure-based cooperative caching mechanism for the Internet of Vehicles to improve edge cache utilization. The system model is established with the goal of maximizing the cooperative caching hit rate. To jointly consider the collaboration between macro base stations (MBSs) and multiple roadside units (RSUs), we propose a reinforcement learning algorithm to adaptively control cache management. According to the content popularity and the network status, the proposed algorithm can dynamically adjust the cached content across the relevant MBSs and RSUs. The simulation results show that the proposed cooperative caching mechanism significantly improves the cache utilization and the quality of service.


Introduction
With the rapid development of the Internet of Vehicles (IoV) and 5G communication technology, a large number of multimedia applications, such as traffic video processing, in-vehicle infotainment, and transportation environment monitoring, have emerged to enrich the intelligent transport system [1][2][3].
To provide high quality of service for multimedia applications in IoV, mobile edge computing (MEC) has attracted attention as an emerging technology that improves system performance and resource utilization and reduces transmission delay [4][5][6]. By introducing computation and storage capabilities to network edge nodes, such as roadside units (RSUs) and base stations, the transmission pressure on the core network can be effectively relieved, and at the same time, the content transmission delay can be reduced [7][8][9][10]. However, the limited cache space of edge nodes, the time-varying content popularity, the high speed of vehicles, and the constant change of the IoV topology all challenge edge cache performance. It is therefore necessary to design an efficient strategy for managing the edge cache [11][12][13].
Currently, numerous studies have addressed edge cache management for IoV. Huang et al. [14] proposed a cache location selection mechanism based on the vehicle trajectory, which can effectively reduce the system load and cache energy consumption. Shi et al. [15] proposed a deep learning communication model based on multimodel compression, which exploits the redundancy between deep learning models in different scenarios to accelerate content transmission in edge networks. In [16], a mixed integer nonlinear programming method was proposed to minimize the cooperative delay between edge servers, and the Lyapunov optimization method was used to optimize the delay problem. In [17], the authors comprehensively considered the mobility of vehicles and proposed an edge caching scheme with perceptible mobility probability. By dividing the data into data blocks of different sizes and buffering these data blocks in the edge server close to the vehicle, the overhead and transmission delay of backhaul traffic were reduced. Meng et al. [18] studied the cache service strategy of offline networking in the edge computing environment and proposed a cache storage algorithm on node core. In [19], a new information-centric heterogeneous network framework was designed, using a distributed algorithm with alternating-direction multipliers to solve the problem of cache resource allocation. To further reduce the transmission delay and improve the response rate, the authors of [20] proposed a cooperative cache allocation and computation offloading scheme in which the MEC servers cooperate to perform computation tasks and data caching. With the rapid development of artificial intelligence, deep reinforcement learning [21] has been widely used in edge caching and resource allocation of vehicular networks owing to its perception and decision-making capabilities [22,23].
The main contributions of this paper are as follows. To take full advantage of edge cache resources, a hierarchical cooperative architecture, including MBSs, RSUs, and vehicles, is introduced. We establish a Markov decision model based on the proposed architecture to describe the cooperative caching process. We propose a reinforcement learning cache management algorithm, which follows the Deep Deterministic Policy Gradient (DDPG) scheme. The proposed algorithm converges quickly and can adapt itself to the complex network environment.
The rest of this paper is organized as follows. Section 2 presents the system model for cooperative edge caching. Section 3 discusses the proposed cooperative caching mechanism. The experiment settings and result analysis are presented in Section 4. Finally, Section 5 presents concluding remarks and our future work.

Cooperative Edge Caching Model.
To make full use of the storage of MBSs, RSUs, and vehicles, we construct a three-layer cooperative cache architecture, as shown in Figure 1. The core layer includes MBSs and the backhaul network, and the MBSs are connected to the RSUs through wired links. The cooperative RSU layer consists of RSUs distributed in different areas, and the RSUs communicate through wireless links. The vehicle layer includes vehicles running in different areas. MBSs, RSUs, and vehicles have storage to temporarily buffer a certain amount of content. Initially, the vehicle sends a content request. If the vehicle itself has the content, it obtains the content directly from its own cache. If not, it sends the request to the local RSU. If the local RSU does not store the content, the local RSU queries the cooperative RSUs. If neither the local RSU nor the cooperative RSUs have the content, the request is sent to the core layer.
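The lookup order described above can be illustrated with a minimal Python sketch. The function name `resolve_request`, the dictionary layout of the RSU caches, and the returned labels are hypothetical choices made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def resolve_request(content: str,
                    vehicle_cache: Set[str],
                    local_rsu: str,
                    rsu_caches: Dict[str, Set[str]],
                    cooperative_rsus: List[str]) -> Tuple[str, str]:
    """Return (layer, node) serving the request, in the order given in the text."""
    # 1. The vehicle checks its own cache first.
    if content in vehicle_cache:
        return ("vehicle", "self")
    # 2. Otherwise the request goes to the local RSU.
    if content in rsu_caches.get(local_rsu, set()):
        return ("local_rsu", local_rsu)
    # 3. The local RSU queries its cooperative RSUs.
    for rsu in cooperative_rsus:
        if content in rsu_caches.get(rsu, set()):
            return ("cooperative_rsu", rsu)
    # 4. On a miss everywhere, the request is forwarded to the core layer.
    return ("core", "MBS")


# Hypothetical example: content "c" is only cached at a cooperative RSU.
example_caches = {"rsu1": {"a", "b"}, "rsu2": {"c"}}
print(resolve_request("c", {"x"}, "rsu1", example_caches, ["rsu2"]))
```

The sketch treats the core layer as always able to serve the request, which is an assumption; the paper does not state what happens when the MBS does not hold the content either.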
The MBS is responsible for collecting system status information, managing global resources, and making content caching decisions. Compared with obtaining content from a remote server, the cooperative caching model can effectively reduce the transmission delay and transmission cost. The set of RSUs can be expressed as $\mathcal{R} = \{1, 2, 3, \cdots, R\}$, and $\mathcal{V} = \{v_1, v_2, \cdots, v_N\}$ represents the set of vehicles under the coverage of the RSU. The RSU is responsible for collecting relevant information about the vehicles in its own coverage area and uploading it to the MBS.

Content Delivery Model.
In the multilevel cooperative edge caching model, the vehicle $v_i$ can send content requests to the RSU or adjacent vehicles. Vehicles within the coverage area of one RSU use the same frequency band to communicate, which leads to interference between vehicles. Therefore, the transmission rate from RSU $r$ to vehicle $v$ is obtained from Shannon's formula [24], where $b_{r,v}$ represents the channel bandwidth allocated by the RSU $r$ to the vehicle $v$, $B_R$ represents the channel bandwidth of the RSU $r$, and $p_r$ is the transmission power of the RSU $r$. $h_{r,v}$ is the channel gain between the RSU $r$ and the vehicle $v$, and $\sigma^2$ represents the noise power.
Orthogonal frequency division multiple access (OFDMA) is used between MBSs and vehicles. Each vehicle associated with the MBS is assigned an orthogonal subcarrier, and the transmission rate from the MBS to vehicle $v_i$ is likewise obtained from Shannon's formula [25], where $B_m$ is the channel bandwidth of the vehicle, and $p_m$ represents the transmission power of the vehicles. $h_{m,v}$ is the channel gain between the vehicle $v$ and the MBS, and $\sigma^2$ represents the noise power.
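As a rough illustration of the two rate expressions, the following Python sketch evaluates the generic Shannon capacity, bandwidth times $\log_2(1 + \text{SINR})$. The function name, the parameter names, and in particular the way interference is added to the noise term are assumptions for illustration; the exact interference model of the paper is not recoverable from this text.

```python
import math


def shannon_rate(bandwidth_hz: float,
                 tx_power_w: float,
                 channel_gain: float,
                 noise_power_w: float,
                 interference_w: float = 0.0) -> float:
    """Achievable rate in bit/s: bandwidth * log2(1 + SINR).

    interference_w is a hypothetical aggregate interference power; it is set to
    zero for the OFDMA link between the MBS and a vehicle, because orthogonal
    subcarriers do not interfere with each other.
    """
    sinr = tx_power_w * channel_gain / (noise_power_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


# RSU-to-vehicle link with co-channel interference (all values are made up).
rsu_rate = shannon_rate(bandwidth_hz=1e6, tx_power_w=3.0, channel_gain=1.0,
                        noise_power_w=1.0, interference_w=2.0)
# MBS-to-vehicle link over an orthogonal subcarrier, so no interference term.
mbs_rate = shannon_rate(bandwidth_hz=1e6, tx_power_w=3.0, channel_gain=1.0,
                        noise_power_w=1.0)
print(rsu_rate, mbs_rate)
```

With identical power, gain, and bandwidth, the interference-free link achieves the higher rate, which matches the motivation for using OFDMA on the MBS link.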

Content Popularity Model.
Assume that there are $K$ contents that can be requested, with request probabilities $P_1, P_2, P_3, \cdots, P_K$ that obey the Zipf distribution [26]. The relationship between the content request probability and the content popularity level can be expressed as [27] $P_s = s^{-\theta} / \sum_{k=1}^{K} k^{-\theta}$, where $s$ represents the content popularity level, and $\theta$ is the Zipf impact factor, also known as the popularity slope. As $\theta$ becomes larger, the Zipf distribution becomes steeper, and the popularity tends to be concentrated on fewer contents [28,29]. The value of the Zipf factor depends on the users' behavior. Figure 2 shows the relationship between the popularity level and the request probability. The influence of the popularity slope on the request probability distribution can be seen there: the content with a high request probability accounts for only a small part of all content [30].
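The effect of the Zipf factor can be checked numerically with a short Python sketch of the standard normalized Zipf law over $K$ ranked contents. The function name `zipf_probabilities` and the chosen values of $K$ and $\theta$ are illustrative assumptions.

```python
from typing import List


def zipf_probabilities(num_contents: int, theta: float) -> List[float]:
    """Request probability of each popularity rank s = 1..K under a Zipf law."""
    # Unnormalized weight of rank s is s^(-theta).
    weights = [rank ** (-theta) for rank in range(1, num_contents + 1)]
    total = sum(weights)
    # Normalize so that the K probabilities sum to one.
    return [w / total for w in weights]


# A larger theta concentrates the requests on the most popular contents.
for theta in (0.6, 1.2):
    probs = zipf_probabilities(num_contents=100, theta=theta)
    print(theta, round(sum(probs[:10]), 3))
```

The printed value is the share of requests that target the ten most popular contents; it grows with $\theta$, which is why caching popular content pays off more when the distribution is steep.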

Cooperative Caching Mechanism
Problem Model.
Cooperative caching can theoretically achieve a higher cache hit rate than noncooperative caching. We use a binary variable $\delta_{n,i} \in \{0, 1\}$ to indicate whether the RSU $n$ cooperates with the RSU $i$; the average cooperative cache hit ratio of the system is expressed in terms of this variable. For RSU $i$, the size of the cache space is $S_{RSU_i}$, and the first optimization problem is to maximize the average cooperative cache hit rate under this cache space limit.
Regarding the vehicle cache, a cache hit rate is defined for each vehicle $j$, and the average cache hit rate is taken over all vehicles. The size of the cache space of the vehicle $j$ is $S_{v_j}$. Under the limitation of the cache space, the second optimization problem is to maximize the average cache hit rate of the vehicles. Maximizing the cache hit rate of the system means maximizing the average cache hit rate of both the cooperative caches and the vehicles.

Cooperative Caching Algorithm Based on DDPG.
To solve the optimization problem in the previous section, it is necessary to build the Markov decision process for the cooperative edge caching scenario. The Markov decision process is a tuple including state, action, and reward. The components are defined as follows. The system state at each time $t$ is defined as $s_t = [S_{MBS}, R * S_{RSU}, V * S_v, q_t]$, that is, at time $t$, the cooperative cache space, the cache state information, the vehicle cache information, and the vehicle content request. The action space at each time $t$ is defined as $a_t = [a_0, a_1, a_2, a_3]$, where $a_0$ represents the content cached in the MBS, $a_1$ represents the content cached in the RSU, $a_2$ represents the content cached in the vehicle itself, and $a_3$ represents the content cached randomly.
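To make the role of the cooperation variable $\delta_{n,i}$ in the problem model concrete, the following Python sketch counts a request at RSU $i$ as a hit if the content is cached at RSU $i$ itself or at any RSU $n$ with $\delta_{n,i} = 1$. This hit definition, the function name, and the data layout are assumptions for illustration; the exact hit ratio expression of the paper is not reproduced here.

```python
from typing import List, Set, Tuple


def cooperative_hit_ratio(requests: List[Tuple[int, str]],
                          caches: List[Set[str]],
                          delta: List[List[int]]) -> float:
    """Fraction of requests served by the local RSU or one of its cooperators.

    requests: (rsu_index, content) pairs; caches[i]: contents stored at RSU i;
    delta[n][i] == 1 means RSU n cooperates with RSU i.
    """
    if not requests:
        return 0.0
    hits = 0
    for i, content in requests:
        local_hit = content in caches[i]
        coop_hit = any(delta[n][i] == 1 and content in caches[n]
                       for n in range(len(caches)) if n != i)
        if local_hit or coop_hit:
            hits += 1
    return hits / len(requests)


# Two RSUs, each caching one content (hypothetical data).
caches = [{"a"}, {"b"}]
requests = [(0, "a"), (0, "b"), (1, "c"), (1, "a")]
print(cooperative_hit_ratio(requests, caches, [[0, 1], [1, 0]]))  # cooperative
print(cooperative_hit_ratio(requests, caches, [[0, 0], [0, 0]]))  # noncooperative
```

With cooperation enabled, requests that miss locally can still be served from a neighboring RSU, so the same cache contents yield a higher hit ratio.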
The system joint reward function $R(t)$ is expressed as $R(t) = CP \sum \sum_{v} \left( h(t) + h_j(t) \right)$, where $C$ is the characteristic constant, and $h(t)$ and $h_j(t)$ are the average cache hit rate of the cooperative cache and the average cache hit rate of the vehicle $j$ at time $t$, respectively. $P$ represents the penalty coefficient given by the vehicle.
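A minimal Python sketch of the reward computation is given below. It collapses the double summation into a single sum over the vehicles, which is a simplifying assumption, and the values chosen for $C$, $P$, and the hit rates are made up.

```python
from typing import Sequence


def joint_reward(c: float, p: float,
                 coop_hit_rate: float,
                 vehicle_hit_rates: Sequence[float]) -> float:
    """R(t) = C * P * sum over vehicles j of (h(t) + h_j(t)).

    coop_hit_rate is h(t); vehicle_hit_rates holds h_j(t) for each vehicle j.
    """
    return c * p * sum(coop_hit_rate + h_j for h_j in vehicle_hit_rates)


# Two vehicles with hit rate 0.5 each and a cooperative hit rate of 0.5.
print(joint_reward(c=10.0, p=1.0, coop_hit_rate=0.5, vehicle_hit_rates=[0.5, 0.5]))
```

Because the reward grows with both hit rates, an agent maximizing it is pushed toward placements that raise the cooperative and the vehicle hit rates together.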
The system block diagram of the cooperative edge caching algorithm is shown in Figure 3. The environment consists of an actor network, a critic network, and an experience replay memory. The actor and critic networks are each composed of two different deep neural networks. The online network is used for actions, and the target network is used for the evaluation of actions. The agent receives the environmental state information and executes the corresponding action. Algorithm 1 shows the flow of the cooperative edge caching algorithm [16]. First, it initializes the network parameters of the actor network, the critic network, and the experience replay memory. After the parameter initialization is completed, the agent obtains the environmental state information and makes a decision for the content caching. The agent receives immediate reward feedback from the system, and the system enters the next state. The agent stores the current status information in the experience replay memory for future training.

Algorithm 1: Cooperative edge caching algorithm.
1: Initialize Actor online network parameters $\theta^{\mu}$, Critic online network parameters $\theta^{Q}$, experience replay memory $M$
2: Initialize Actor target network parameters $\theta^{\mu'}$, Critic target network parameters $\theta^{Q'}$
3: Initialize caching state of RSUs, MBS, and vehicles, and the content popularity
4: for episode = 1, M do
5:   Initialize the environment state space and the system cache hit rate
6:   Initialize a random noise process $N$ for action exploration
7:   for t = 1, 2, 3, …, T do
8:     Select action $a_t = \mu(s_t, \theta^{\mu}) + N_t$ according to observed state $s_t$ and current strategy
9:     Calculate reward $R(t)$ based on current selection action $a_t$ and state $s_t$, update state $s_t \to s_{t+1}$
10:     Update the reward $R(t) = CP \sum \sum_{v} \left( h(t) + h_j(t) \right)$ and store $(s_t, a_t, R(t), s_{t+1})$ in $M$
11:     Randomly sample $N$ samples from the experience replay memory $M$, ...
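Steps 8, 10, and 11 of Algorithm 1 (noisy action selection, storing a transition, and sampling a minibatch) can be sketched in plain Python as follows. The class and function names are hypothetical, the policy is a stand-in function rather than a neural network, and Gaussian exploration noise is an assumption, since the noise process is not specified in the text.

```python
import random
from collections import deque
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[Sequence[float], Sequence[float], float, Sequence[float]]


class ReplayMemory:
    """Fixed-capacity experience replay memory M; oldest transitions are dropped."""

    def __init__(self, capacity: int) -> None:
        self.buffer: deque = deque(maxlen=capacity)

    def store(self, state, action, reward: float, next_state) -> None:
        # Step 10: store (s_t, a_t, R(t), s_{t+1}).
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int) -> List[Transition]:
        # Step 11: uniformly sample a minibatch without replacement.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self) -> int:
        return len(self.buffer)


def noisy_action(policy: Callable[[Sequence[float]], Sequence[float]],
                 state: Sequence[float],
                 noise_std: float,
                 rng: random.Random) -> List[float]:
    """Step 8: a_t = mu(s_t) + N_t, here with zero-mean Gaussian noise (assumed)."""
    return [a + rng.gauss(0.0, noise_std) for a in policy(state)]


# Stand-in deterministic policy: four action components, as in a_t = [a_0..a_3].
policy = lambda s: [0.25, 0.25, 0.25, 0.25]
memory = ReplayMemory(capacity=1000)
rng = random.Random(0)
state = [0.0, 0.0, 0.0, 0.0]
action = noisy_action(policy, state, noise_std=0.1, rng=rng)
memory.store(state, action, reward=1.0, next_state=state)
print(len(memory), len(memory.sample(32)))
```

Sampling uniformly from stored transitions breaks the temporal correlation between consecutive caching decisions, which is the usual reason for using a replay memory in DDPG-style training.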

Experiment Results and Analysis
In the simulation environment, the cache capacity of the MBS is 15 TB, and the coverage radius is 2 km. The RSU   [15]. The detailed settings of parameters are shown in Table 1.
To verify the performance of the cooperative edge caching strategy, we compare the cooperative caching scheme with the noncooperative caching scheme under the parameter settings of Table 1. The learning process of the cooperative and noncooperative caching strategies is shown in Figure 4. The average reward of cooperative caching rapidly increases to 170 after 50 episodes and then gradually stabilizes. For noncooperative caching, the average reward stabilizes at around 100 as the number of training episodes increases. The cooperative caching strategy not only makes the most of the cache space but also effectively improves the system performance. The advantage of noncooperative caching is that it does not need to consider the content caching status of other edge servers, so the system complexity is low. Figure 5 compares the cache hit rate of the different caching strategies. As training continues, the cooperative cache hit rate obtained by the system stabilizes above 85%. For noncooperative caching, the system cache hit rate is roughly 10% lower than that of cooperative caching. When the training reaches 400 rounds, the hit rate of the noncooperative cache gradually approaches 75%. The reason for the gap is that noncooperative caching cannot make full use of the cache space, which wastes storage and lowers the system cache hit rate.
The relationship between the system cache hit rate and the Zipf distribution parameter is shown in Figure 6. A large Zipf distribution parameter indicates that the vehicle users request highly popular content more often. Caching the highly popular content therefore improves the system cache hit rate. When the number of content requests increases, noncooperative caching has difficulty meeting the vehicle requests. The proposed cooperative cache strategy fully considers the cooperation between RSU and MBS, and the system cache hit rate increases significantly. Figures 7 and 8 compare the average cache hit rate of different algorithms under the different schemes. As training proceeds, the average cache hit rate of the MBS gradually approaches a stable value above 80%. For the deep Q network (DQN) algorithm, the cache hit rate fluctuates greatly in the first 250 episodes, because DQN has difficulty handling the complex state information. For the average cache hit rate of the RSU, the DDPG algorithm performs better in the first 250 episodes. After 250 training episodes, its hit rate is slightly lower than that of the policy gradient (PG) and DQN algorithms. Figure 9 presents the cooperative cache performance under different numbers of vehicles. Compared with the PG-based and DQN-based algorithms, the DDPG-based algorithm brings greater benefits to the system and tends to be stable when dealing with the complex environment. At the same time, the results also verify that the DDPG-based algorithm has an advantage in improving the overall average hit rate.

Conclusions
This paper focuses on improving cache performance in the IoV environment and proposes a V2I-based cooperative caching strategy. We propose an MBS-RSU-vehicle three-layer architecture and model the problem as maximizing the average cooperative cache hit rate. The objective function is solved by using a reinforcement learning algorithm based on DDPG. To verify the performance of the proposed cache strategy, the effects of cooperative and noncooperative caching are compared under different system parameters. In future work, we will further consider the content transmission delay and use game theory to solve the problem of resource competition between cooperative cache servers.

Data Availability
The data used to support the study are available within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.