Adaptive VR Video Data Transmission Method Using Mobile Edge Computing Based on AIoT Cloud VR

To address the demanding data-rate and delay requirements of cloud-service-based virtual reality (VR) in AIoT, a cloud VR system scheme based on MEC (Mobile Edge Computing) is proposed. The scheme combines viewpoint-based VR video data processing with hybrid digital-analog (HDA) transmission optimization and can serve the AIoT transmission field. Firstly, a learning-driven multiaccess MEC offloading strategy is designed, in which the VR terminal automatically selects the optimal MEC server for task offloading, thereby effectively improving network efficiency and reducing service delay. Secondly, progressive transmission of the VR data is realized through viewpoint-aware dynamic streaming based on ROI (region of interest) and the priorities of different objects. The transmission priority of each object in the scene is determined through ROI layering, which effectively resolves the conflict between the large data volume of VR scenes and the limited network bandwidth in the AIoT domain and further improves the real-time performance of the system. Then, the HDA technique is introduced to optimize the transmission. Finally, the base station protocol stack is modified on the basis of the LTE (Long-Term Evolution) system, and the MEC technology is integrated to realize a complete cloud VR system in AIoT. The experimental results show that, compared with other advanced schemes, the proposed scheme achieves more robust and efficient data transmission and provides a better VR user experience.


Introduction
VR (virtual reality) systems combined with AIoT incorporate multiple technologies, such as digital image processing, computer graphics, multimedia, computer simulation, sensors, and computer networks, and have a wide range of applications in fields such as entertainment, simulation training, aerospace, scientific and computing visualization, and art [1]. VR has three prominent features: immersion, interactivity, and imagination, also known as the 3I characteristics of VR [2], on which human-computer interaction can be built. Cloud VR is a real-time VR technology based on cloud computing that can be used for large-volume data transmission in AIoT; cloud servers replace users' local computing devices, which promotes the popularization of VR applications [3]. The cloud VR architecture consists of four parts: the content layer, the platform layer, the network layer, and the terminal layer. Among them, the content handled by the platform layer includes cloud VR video services and cloud VR strong-interaction services; the platform layer is responsible for video import, transcoding, storage, broadcast control and distribution processing, as well as logical calculation and real-time rendering for strong-interaction services. The network layer consists of four parts: the backbone network, the MAN (metropolitan area network), the access network, and the home network, which together meet cloud VR's requirements for large bandwidth and low latency. Finally, the terminal layer connects to the platform layer through 5G/Wi-Fi access to realize functions such as VR content presentation and user authentication [4]. The cloud VR solution architecture is shown in Figure 1.
However, because of the huge volume of VR video data, network transmission bandwidth and delay, in addition to cloud computing and rendering, have become new bottlenecks for the entire system [5]. For basic 4K-resolution cloud VR services, the network bandwidth needs to reach at least 40 Mbit/s, and the RTT (round-trip time) of transmission should be kept within 40 ms to provide users with a good viewing experience [6]. In the current mobile network architecture, the distance between the user and the server is at least at the metro level. Even ignoring device forwarding and image transmission, the RTT of fiber transmission alone is as high as 30-40 ms, which makes the requirements of cloud VR difficult to meet [7].
With the development of 5G technology and the needs of AIoT, the bandwidth of mobile networks has greatly increased. By sinking computing nodes to the vicinity of the application scenario (gateways), MEC places data collection and analysis close to the user side and closes the data processing loop at the edge, which alleviates network transmission pressure and shortens data processing time [8]. Under the MEC architecture, deploying VR applications on MEC nodes brings the following advantages: (1) the MEC computing nodes are deployed directly next to the mobile gateways, which reduces the number of network hops between VR application data and end users and reduces network processing delay [9]; (2) the VR application runs on the MEC node, which enables local data processing, so for users under the same UPF (User Plane Function), the data does not need to enter the Internet, which reduces the transmission pressure on the Internet [10]; (3) VR content can be processed at the edge: in VR live broadcast scenarios, for example, VR video splicing, encoding, transcoding, and distribution can be performed directly on the MEC nodes, data offloading can be realized for nearby local users, and content can be quickly distributed through a CDN (content delivery network) to OTT (Over the Top) users or users covered by other MEC nodes [11]; and (4) the distributed networking of MEC enables a continuous VR experience in mobile scenarios, such as watching VR live broadcasts or participating in video conferences on high-speed vehicles (such as cars and high-speed trains) [12].
In addition, MEC can work with low-latency protocols such as QUIC (Quick UDP Internet Connections) and RTP (Real-Time Transport Protocol) to make cloud VR possible [13]. Moreover, in the MEC scenario, the source ends (edge servers) are more closely connected with the channel ends (base stations), and the bandwidths are sufficient to support the transmission of baseband data between the servers and the base stations, which greatly improves the feasibility of using pseudoanalog, HDA (hybrid digital-analog), and other joint source-channel coding techniques [14].
When a user watches a VR video, the FoV (field of view) of the display device limits the user to a certain part of the whole video at any moment. If the server transmits the entire video, most of the bandwidth resources will inevitably be wasted. Therefore, the adaptive block transmission method based on DASH (Dynamic Adaptive Streaming over HTTP) is the most widely used method for VR video streaming [15]. In the block transmission mode, a complete VR video is divided into many video blocks, and each video block is encoded at several quality levels. The server adaptively selects a quality level for each video block according to factors such as network bandwidth and transmits it to the user. There are two basic schemes in the adaptive block transmission mode: view adaptation and rate adaptation [16]. The former centers on predicting changes in users' viewpoints, and the latter centers on resource allocation by the servers.
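The block-based selection described above can be sketched as a simple greedy allocation. This is only an illustration under stated assumptions: the function name, the size table, and the "FoV tiles first" upgrade rule are hypothetical and are not taken from the cited DASH-based schemes.

```python
def select_tile_qualities(tile_sizes, fov_tiles, budget):
    """Pick a quality level per tile under a size budget.

    tile_sizes[t][q] is the (hypothetical) size of tile t at quality q,
    with q = 0 the lowest level. Every tile starts at q = 0 so the full
    view is always available; spare budget upgrades FoV tiles first.
    """
    levels = [0] * len(tile_sizes)
    used = sum(sizes[0] for sizes in tile_sizes)
    if used > budget:
        raise ValueError("budget cannot cover the lowest quality level")

    fov = list(fov_tiles)
    others = [t for t in range(len(tile_sizes)) if t not in fov]

    # Upgrade FoV tiles as far as the budget allows, then the other tiles.
    for group in (fov, others):
        upgraded = True
        while upgraded:
            upgraded = False
            for t in group:
                q = levels[t]
                if q + 1 < len(tile_sizes[t]):
                    extra = tile_sizes[t][q + 1] - tile_sizes[t][q]
                    if used + extra <= budget:
                        levels[t] = q + 1
                        used += extra
                        upgraded = True
    return levels, used
```

With four tiles, a budget of 12 size units, and tiles 1 and 2 inside the FoV, the spare budget goes to the FoV tiles while the remaining tiles stay at the lowest level.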
Based on mobile edge computing technology in AIoT, this paper modifies the LTE (Long-Term Evolution) base station protocol stack to build a mobile edge computing platform for AIoT that is extended to support HDA (hybrid digital-analog) transmission, in order to realize an efficient and reliable cloud VR system. The main contributions of this paper are as follows: (1) a learning-driven MEC server offloading strategy is adopted, so that users can automatically select the optimal MEC server; (2) a complete cloud VR system is realized through viewpoint-aware dynamic streaming based on ROI and object priorities; and (3) based on the HDA technique, the system transmission efficiency is optimized to provide high-quality VR videos under limited bandwidth.

Related Research
The main content of a VR system is its virtual scenes, which contain a large number of 3D scenes. In the current network environment, the system must resolve the conflict between the data transmission of 3D scenes and the limited network bandwidth available in practice in AIoT. This involves the processing, sending, and receiving of the 3D scenes, and it is also necessary to ensure that the user-side scenes are generated with a good visual experience [17]. Whether these problems can be solved is fundamental to the successful implementation of the VR system. Researchers have proposed FOV transmission schemes for differential transmission of panoramic video information based on viewpoint areas [18]. These include the pyramid projection transmission scheme proposed by Facebook and the Tile Wise transmission scheme promoted by Huawei.
In the pyramid projection transmission scheme, a full-view nonuniform-quality code stream is prepared for each view; high-quality coding is used in the user's viewpoint region, while low-quality coding is used in other regions [19]. This method greatly reduces the bandwidth requirements of system users when watching panoramic videos and improves the effective utilization of network bandwidth, but the total size of all perspective video files on the system server is more than 6 times that of the original files. The Tile Wise transmission scheme combines a low-quality full view with high-quality viewpoint regions. The server does not need to prepare a stream for each viewing angle; instead, it divides the panoramic video image into multiple tiles, each corresponding to a stream that can be decoded independently, and it also prepares a low-quality panoramic full-view video stream. The client obtains the full-view stream and the high-quality tiles selected according to the viewpoint information [20]. For the construction of panoramic images with nonuniform quality, Hosseini et al. [21] proposed a viewpoint-aware adaptive VR transmission framework based on extended MPEG-DASH SRD (moving picture experts group-dynamic adaptive streaming over HTTP spatial relation description). Similarly, Kim et al. [22] proposed an SSAS (spatial segmented adaptive streaming) scheme based on the HLS (HTTP live streaming) protocol to realize real-time adaptive streaming based on user viewpoints. These solutions draw on existing HTTP adaptive transmission protocols such as DASH and extend time-based segmentation into the spatial domain to achieve dynamic adaptive streaming.
For MEC task offloading, appropriate MEC servers are selected according to situational information such as the computing capability of the users' mobile terminals, so as to guarantee network delay performance and reduce energy consumption. Guo et al. [23] proposed an MEC task offloading strategy based on nonorthogonal multiple access, which takes into account the constraints of different access technologies. By considering different quality of service (QoS) constraints, Henri et al. [24] proposed a game-theoretic offloading strategy that can guarantee a strong delay bound. Based on Stackelberg game theory, Hosseini et al. [25] proposed a price-based distributed MEC task offloading algorithm, which enables users to make autonomous decisions. In addition, Liu and Liu [26] proposed an energy-efficient MEC task offloading algorithm for ultradense wireless network scenarios, in which energy overheads are minimized by optimizing the offloading decision variables and the power and bandwidth allocation. Existing research on MEC task offloading assumes that the computing power and storage capacity of the MEC server are known and makes the offloading decision for a single MEC server with the goal of optimal delay or optimal energy. However, with the densification of base station deployment in 5G networks, a large number of MEC servers will be deployed on base stations or access points (APs) that are closer to user mobile terminals, and the computing and storage capabilities of different MEC servers differ. Therefore, the mobile terminals on the user side need to decide independently which MEC server to access according to situational information such as service characteristics and the network environment, so as to minimize network delay and network energy consumption at the same time, thereby realizing an energy-efficient MEC server task offloading strategy.
Liu et al. [27] proposed an efficient VR transmission mechanism based on source-channel joint coding. After tiling the VR videos with reference to the users' FOV information, different levels of error correction strategies are used to maximize the viewing quality within users' FOV. Feng et al. [28] defined a new QoE (Quality of Experience) metric to measure the user's viewing experience and presented an efficient modulation control algorithm to maximize the QoE value under different channel conditions. Zhang and Ma [32] proposed multiobject crowd real-time tracking in the dynamic environment based on a novel neural network, which can be used in AIoT cloud VR.

VR Adaptive Transmission Scheme Based on MEC and Viewpoint Awareness
3.1. System Architecture. The proposed solution integrates MEC technology into the LTE system and extends the base station to support the HDA transmission mode, so as to meet the strict latency and network quality requirements of interactive VR services and to provide high-performance network support for cloud VR, especially for AIoT data transmission [32]. The system structure is shown in Figure 2. The base station protocol stack is modified to introduce the MEC function while remaining compatible with standard LTE terminals. Tunneling protocols are used to redirect traffic at the network layer, so that traffic that is sensitive for the edge services is filtered and offloaded. The base station maintains a sensitive traffic table for the edge services, which records the IP addresses, protocols, and port numbers of data packets that need to be forwarded to the edge servers. Each passing data packet is matched against this table. If the packet matches an entry, the destination IP in the tunnel packet header of the GPRS Tunneling Protocol (GTP) is rewritten: the original core network IP is replaced with the edge server IP, so the packet is forwarded to the edge server. For the returned downlink data, the edge server masquerades the source IP address as the real public network address of the application server.
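The table lookup and tunnel-destination rewrite described above can be sketched as follows. This is a minimal illustration, not the modified protocol stack itself: the `FlowKey` type, the function name, and all IP addresses are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """One entry of the sensitive traffic table (hypothetical layout)."""
    dst_ip: str
    protocol: str
    dst_port: int


# Hypothetical addresses, for illustration only.
CORE_NETWORK_IP = "10.0.0.1"
EDGE_SERVER_IP = "10.0.1.1"

# Flows that should be served by the edge server instead of the core network.
SENSITIVE_TABLE = {
    FlowKey("203.0.113.10", "udp", 5004),
    FlowKey("203.0.113.10", "tcp", 443),
}


def tunnel_destination(dst_ip, protocol, dst_port):
    """Return the outer (tunnel header) destination IP for an uplink packet."""
    if FlowKey(dst_ip, protocol, dst_port) in SENSITIVE_TABLE:
        return EDGE_SERVER_IP  # matched: redirect to the edge server
    return CORE_NETWORK_IP  # no match: default path through the core network
```

Because only the outer tunnel header changes, the inner packet is untouched, which is consistent with the stated goal of keeping standard LTE terminals compatible.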
The proposed architecture implements cloud VR based on the MEC system. The computing tasks and services are moved down to the edge of the base stations, so as to minimize the transmission delays from both the network structures and the physical distances, and the RTT will be controlled within 10 ms. It not only improves the response speed of the server but also ensures the stability of the network service quality and greatly improves the users' viewing experience. The introduction of HDA technique provides a more flexible transmission mode for edge servers, enabling them to make full use of bandwidth resources and alleviating the saturation effect of existing digital transmission in AIoT.

3.2. Learning-Driven MEC Server Adaptive Offloading Strategy. In the MAB (multiarmed bandit) model [24], there are N gambling arms and one player who makes selections over multiple rounds. In each round, the player selects one arm and receives the corresponding reward; the reward value of an arm is observed only after that arm has been selected. The reward of each arm follows an unknown distribution, and the arms are independent of each other. The player learns the reward distributions of the different arms through exploration and exploitation. Over J rounds, the player's optimization goal is to maximize the expected cumulative reward.
It is assumed that there are U users and M base stations in the 5G wireless network scenario, and each base station contains an MEC server (to simplify the description, the base station and the MEC server are collectively represented by M). Let the total system bandwidth be B, divided into K subcarriers. A user can access only one base station at time t, and each subcarrier can serve at most one user; k ∈ K denotes a resource block. The SINR (signal to interference plus noise ratio) of user terminal i at base station m on resource block k is determined by $P_{i,m}^{k}$, the transmission power from base station m to user i on resource block k; by $g_{i,m}^{k}$, the channel gain between base station m and user i; and by $N_0$, the noise power, which follows an $\mathcal{N}(0, \delta)$ distribution. The transmission rate from the user to the base station is then computed from this SINR. For delay-sensitive services, it is assumed that packet arrivals follow a Poisson process with arrival rate $\lambda_{ds}$ and that the packets have a fixed length $L_{ds}$. To meet the QoS constraints of the delay-sensitive services, the effective bandwidth $W(\theta_v)$ under a transmission delay bound is defined on the basis of effective bandwidth theory, where $\theta_v$ is the QoS value of the user terminal, $Z(t)$ is the number of packets arriving within the period (0, t), and $E(\cdot)$ denotes the mathematical expectation.
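As a numerical illustration of the link model, the sketch below uses the textbook SINR ratio and the Shannon capacity of one subcarrier of width B/K. These standard forms are assumed here for illustration and are not claimed to be the exact expressions used in this paper; the function and parameter names are hypothetical.

```python
import math


def sinr(p_serving, g_serving, interferers, noise_power):
    """Textbook SINR: serving signal power over interference plus noise.

    interferers is a list of (power, gain) pairs from other base stations
    transmitting on the same resource block.
    """
    interference = sum(p * g for p, g in interferers)
    return (p_serving * g_serving) / (interference + noise_power)


def shannon_rate(total_bandwidth_hz, num_subcarriers, sinr_value):
    """Shannon capacity in bit/s of one subcarrier of width B / K."""
    subcarrier_bandwidth = total_bandwidth_hz / num_subcarriers
    return subcarrier_bandwidth * math.log2(1.0 + sinr_value)
```

For example, a 20 MHz band split into 100 subcarriers with an SINR of 3 gives 200 kHz × log2(4) = 400 kbit/s per subcarrier.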
Assume that the maximum computing frequency of each MEC server (base station) is $f_m^{\max}$ ($\forall m \in M$) and that $f(t) = [f_{i,m}(t)]$, $m \in M$, $i \in U$. The amount of computation that MEC server m can provide for the users' mobile terminals is then expressed in terms of $f_{i,m}$, which represents the computing frequency of each server, and $b_i$, which represents the computing load of the user-side tasks and can be measured offline. The MEC network architecture is shown in Figure 3.
In the proposed learning-driven MEC-MAB autonomous offloading algorithm, the user's mobile terminal i is the player, and the MEC server m is the gambling arm. If user i chooses to access MEC server m, the corresponding random reward value $Q_{i,m}$ is obtained. The reward values of the MEC servers follow specific distributions with means $\pi = [\pi_1, \pi_2, \cdots, \pi_M]$ and are independent of each other, where $\pi_m$ is the real reward of MEC server m. Since the user cannot always choose the server with the highest real reward, the regret value $R_j$ is defined as the difference between the actual reward value obtained after j selections and the expected maximum reward value, where $\pi^* = \max_{1 \le s \le M} \pi_s$ and $N_j(m)$ is the number of times MEC server m has been selected in the previous j rounds.
In the MAB model, the real reward value of a gambling arm is generated only after the action is performed, so the reward value of each arm selection must be estimated. Using the Thompson sampling algorithm [29], the probability of the reward value of each MEC server selection is modeled as a Beta(α, β) distribution, so the probability density function of the reward value of the MEC server selection behavior is the Beta density $f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$ for $0 \le x \le 1$. The parameters of the Beta distribution are updated from the observed rewards as follows. Initially, the user's mobile terminal observes situational information such as the QoS of its computing task and sets t = 0 and γ = 0. At time t, t ≤ T, the reward estimate of the user's mobile terminal for the MEC server selection behavior satisfies $W(m) \sim \mathrm{Beta}(\alpha_m, \beta_m)$. The user selects the MEC server with the largest sampled reward value, $\arg\max_m W(m) \to \mathrm{MEC}_t$. The network applies the selected access behavior and measures the corresponding reward value $r_t$, and the parameters of the selected server m are updated as $(\alpha_m, \beta_m) + (r_t, 1 - r_t) \to (\alpha_m, \beta_m)$.
In the proposed MEC-MAB algorithm, as the number of observations from the MEC server selections increases, the confidence interval of the beta distribution becomes narrower, enabling the users to automatically select the optimal MEC server with maximum reward.
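The selection loop described above can be sketched with a Beta-Bernoulli Thompson sampler. This is a simplified illustration, not the paper's simulator: it assumes binary rewards, a uniform Beta(1, 1) prior, and a pseudo-regret accumulated as the gap between the best mean and the chosen server's mean; the function name and the example means are hypothetical.

```python
import random


def thompson_offloading(true_means, rounds, seed=0):
    """Simulate MEC server selection with Beta-Bernoulli Thompson sampling.

    true_means[m] is the mean reward of server m (unknown to the player).
    Returns per-server selection counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    num_servers = len(true_means)
    alpha = [1.0] * num_servers  # Beta(1, 1) uniform prior (assumption)
    beta = [1.0] * num_servers
    counts = [0] * num_servers
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(rounds):
        # Draw one sample W(m) ~ Beta(alpha_m, beta_m) per server.
        samples = [rng.betavariate(alpha[m], beta[m]) for m in range(num_servers)]
        chosen = max(range(num_servers), key=lambda m: samples[m])

        # Observe a Bernoulli reward r_t in {0, 1} from the chosen server.
        reward = 1.0 if rng.random() < true_means[chosen] else 0.0

        # Posterior update: (alpha, beta) <- (alpha + r_t, beta + 1 - r_t).
        alpha[chosen] += reward
        beta[chosen] += 1.0 - reward

        counts[chosen] += 1
        regret += best_mean - true_means[chosen]

    return counts, regret


# Three hypothetical servers; the third has the highest mean reward.
counts, regret = thompson_offloading([0.2, 0.5, 0.8], rounds=2000)
```

As observations accumulate, the posterior of each server concentrates around its mean, so selections shift toward the best server and the regret grows far more slowly than under uniform random selection.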
3.3. Viewpoint-Aware Progressive Dynamic Streaming. This paper proposes viewpoint-aware dynamic streaming based on ROI and object priority. The strategy first divides the data into current scenes, potential scenes, and future scenes based on ROI judgment. The concept of ROI is then extended: the vertical and horizontal priorities of objects are determined through ROI layering, which in turn determines the transmission priority of each object in the scene.
Visibility judgment and culling in object space reduce the transmission of unnecessary scenes and thus minimize the transmission volume, improving the real-time performance of interaction. To achieve this, the whole scene is first analyzed to determine the visible scenes according to the position and viewing angle of the user, and visibility is calculated with the relevant algorithm to remove unnecessary or unimportant scenes. The currently visible scenes are then transmitted first, and as the viewpoint moves, the incremental part of the scenes is transmitted gradually, which is the original intention of progressive transmission.
As shown in Figure 4, the viewpoint range is divided into three areas: CPVS (Current Potential Visible Scenes), IPVS (Incremental Potential Visible Scenes), and FPVS (Future Potential Visible Scenes), based on the visual habit of looking directly in front and then looking around, and observing the near area first and then the far area.
CPVS Zone: this is the immediate, nearest, currently visible scene area. All object models in this area are visible to the user and are relatively close to the user, so objects in this area should have the highest priority in transmission and interaction. IPVS Zone: this zone consists of two parts. The objects in this area are not immediately visible to the user and are relatively far from the user, but if the user's viewpoint moves (walking forward or turning), they become visible immediately. Therefore, the object models in this area should have a relatively high priority. After the object models in the CPVS Zone have been accessed, the scenes in this area are downloaded next, so that these models can be displayed in time when the user roams. FPVS Zone: all object models in this area are not currently visible to the user and are relatively far away from the user. Object models in this area should have lower priority and are prefetched to the client for scene roaming only when the network is idle or additional network bandwidth is available. The object models in these three zones correspond to three queues, queue 1, queue 2, and queue 3, respectively, with priority queue 1 > queue 2 > queue 3.
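The three-queue ordering can be sketched as below. The function names and the `network_idle` flag are hypothetical; the sketch assumes each object has already been assigned to a zone and only illustrates the strict queue 1 > queue 2 > queue 3 ordering with FPVS prefetching gated on spare capacity.

```python
from collections import deque

# Queue 1 > queue 2 > queue 3 in transmission priority.
ZONE_ORDER = ("CPVS", "IPVS", "FPVS")


def build_queues(objects):
    """objects: iterable of (name, zone) pairs. Returns one FIFO queue per zone."""
    queues = {zone: deque() for zone in ZONE_ORDER}
    for name, zone in objects:
        queues[zone].append(name)
    return queues


def transmission_order(queues, network_idle=True):
    """List object names in strict zone priority order.

    FPVS objects are included (prefetched) only when the network is idle
    or spare bandwidth is available.
    """
    order = []
    for zone in ZONE_ORDER:
        if zone == "FPVS" and not network_idle:
            continue
        order.extend(queues[zone])
    return order
```

For a scene containing a door and a desk in CPVS, a car in IPVS, and a tree in FPVS, the door and desk are sent first, then the car, and the tree only when bandwidth is spare.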
When there are many objects in the scene, in order to better distinguish different object models and realize the progressive transmission of the scene, it is necessary to further determine the access sequence of a certain scene with multiple objects in it; that is, it is necessary to determine the visual importance of different objects in a certain area. This paper extends the concept of ROI and determines the order of a specific object in the access queue by layering a certain ROI area and considering the horizontal and vertical importance of different objects.
Level of ROI: from the server's perspective, the ROI area of the current user is further subdivided into several levels, and the transmission order of the objects is determined accordingly. As shown in Figure 5, if there are multiple objects in the currently visible ROI, the distance and viewing angle can be used to determine the order in which the objects are accessed: objects in level A have the highest priority, objects in level B have the next priority, and objects in level C have the lowest priority.
In this way, the system mainly uses limited bandwidth resources to transmit videos within the user's visible range, compresses redundant content as much as possible, and provides the best viewing effect. At the same time, each frame of video transmitted contains full-view information, which can achieve "device-cloud asynchronous" rendering. When the user's posture changes, the local display device does not need to wait for the server to send back data and completes the rendering locally in real time, updating the scene with the shortest delay to ensure the complete and smooth transition.
The essence of progressive streaming is "download while browsing" to achieve the optimal real-time effect. According to the principle of human vision, the closer an object is to the viewpoint and the smaller its angular deviation from the viewing direction, the higher the resolution at which it can be observed. To save bandwidth, it is not necessary to download the full model increments; instead, only the ORM of the scene needs to be downloaded. The ORM of an object can be determined from its visual importance to the viewpoint, $W(O_i)$, which depends on R, the ROI radius of the avatar; $D_i$, the distance between $O_i$ and the avatar's viewpoint; and $\theta_i$, the deviation angle between object i and the avatar's viewpoint ($0 \le \theta_i \le 180^\circ$).
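To make the dependence on distance and deviation angle concrete, the sketch below uses a placeholder importance score. The specific form (a linear distance falloff multiplied by a cosine-based angle term) is an assumption chosen only because it decreases with both $D_i$ and $\theta_i$; it is not the expression for $W(O_i)$ used in this paper, and both function names are hypothetical.

```python
import math


def visual_importance(distance, angle_deg, roi_radius):
    """Placeholder importance score in [0, 1] (not the paper's formula).

    Decreases with the distance D_i relative to the ROI radius R and with
    the deviation angle theta_i, given in degrees within [0, 180].
    """
    if not 0.0 <= angle_deg <= 180.0:
        raise ValueError("angle must be within [0, 180] degrees")
    distance_term = max(0.0, 1.0 - distance / roi_radius)
    angle_term = (1.0 + math.cos(math.radians(angle_deg))) / 2.0
    return distance_term * angle_term


def resolution_level(importance, num_levels=3):
    """Map an importance score to a discrete resolution level (0 = coarsest)."""
    return min(num_levels - 1, int(importance * num_levels))
```

An object directly in front of the viewpoint and close to it receives the finest level, while an object at the ROI boundary receives the coarsest.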

3.4. Optimized HDA Transmission Scheme. In existing wireless communication systems, the channel end encodes the video data into a bit stream for transmission. If a bit error occurs during transmission of the video code stream, decoding the video data causes serious visual distortion or even a decoding failure. Although current wireless video soft transmission schemes can adapt video transmission quality seamlessly to channel conditions, their transmission efficiency is not satisfactory. By combining the high efficiency of traditional digital transmission with the robustness of video soft transmission, the HDA transmission technique has the potential to provide stable, reliable, and high-efficiency VR video transmission in AIoT.
In the proposed HDA transmission system, a time-division multiplexing HDA video soft transmission scheme is designed. The video in the user window is decomposed into two layers. The first layer is the base layer signal, which is generated from the video source compressed by the HEVC encoder. The second layer is the enhancement layer signal, which is the residual obtained by subtracting the reconstructed signal of the first layer from the original video signal. The two layers of video signals are transmitted in a time-division multiplexing manner. On the one hand, to achieve reliable transmission of the digital part, the target bit rate is controlled by the quantization parameter, and the channel coding rate and modulation order are determined by the SNR. On the other hand, the overall video quality depends directly on the MSE (mean squared error) of the analog signal, which can be expressed as a function of the data variance of the analog part, the power and bandwidth allocated to the analog part, and the noise power of the channel.
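The two-layer decomposition can be illustrated on a toy signal. In this sketch a uniform quantizer stands in for the HEVC encode and decode round trip, which is an assumption made purely for brevity; the function names are hypothetical.

```python
def split_layers(samples, step):
    """Split a signal into a coarse base layer and a residual enhancement layer.

    A uniform quantizer with the given step stands in for the HEVC
    encode/decode round trip (an assumption made for brevity).
    """
    # Reconstructed base layer, as the decoder would see it.
    base = [step * round(x / step) for x in samples]
    # Residual between the original and the reconstructed base layer.
    enhancement = [x - b for x, b in zip(samples, base)]
    return base, enhancement


def reconstruct(base, received_enhancement):
    """Receiver side: add the (possibly noisy) residual to the decoded base layer."""
    return [b + e for b, e in zip(base, received_enhancement)]
```

If the residual arrives without noise, the original samples are recovered; with channel noise on the residual, the reconstruction error equals that noise, which is why the overall quality follows the MSE of the analog part.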
In terms of power allocation, it is first necessary to ensure that the base layer can be successfully decoded. Once this is ensured, the overall video quality of HDA video transmission is determined by the data variance of the enhancement layer (analog source), the resources allocated to the analog part, and the channel noise power. Based on the SNR, the bitstream signals of the base layer are turbo-coded with a selected channel coding rate, and the coded signals are modulated with quadrature amplitude modulation. Considering that HEVC (high efficiency video coding) has largely removed the interframe correlation of video sequences, the residual between the original video and the reconstructed video contains almost no interframe redundancy. The residual part is further decorrelated by a 3D-DCT transform, and the power-scaled DCT coefficients are used to modulate the signal amplitude.
Since the signals in the second layer are the enhancement signals of the video, restoring as many of them as possible under limited bandwidth and power helps to improve the quality of the reconstructed video. After the decorrelation of the enhancement layer signals, the energy of the analog coefficients is relatively concentrated, with a few large coefficients gathered in the upper left corner. In time-division multiplexing coding, appropriate parameters should be selected for the code rate and the channel coding and modulation mode of the first layer to ensure correct decoding. Since the first layer is designed to be decoded correctly at a given channel noise power, the overall system distortion is determined by the reconstruction distortion of the second layer. To reduce the interference of the large coefficients of the analog part with the digital signal, the large coefficients are transmitted by time-division multiplexing. Because of the bandwidth limitation, the small coefficients are discarded. Although discarding small coefficients saves bandwidth, the high-frequency information carried by these coefficients cannot be recovered at the receiving end, which causes additional performance loss.
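The trade-off between saved bandwidth and the loss from discarded coefficients can be sketched per block. The sketch assumes zero-mean coefficient blocks, so that replacing a discarded block by zero at the receiver costs exactly its variance in MSE; the function name and the "keep the highest-variance blocks" rule are illustrative assumptions.

```python
def select_blocks(block_variances, max_blocks):
    """Keep the max_blocks highest-variance blocks and discard the rest.

    Returns (kept_indices, discard_distortion). The distortion from
    discarding a zero-mean block is taken to be its variance (assumption).
    """
    ranked = sorted(
        range(len(block_variances)),
        key=lambda i: block_variances[i],
        reverse=True,
    )
    kept = sorted(ranked[:max_blocks])
    discarded = ranked[max_blocks:]
    discard_distortion = sum(block_variances[i] for i in discarded)
    return kept, discard_distortion
```

With block variances 9.0, 0.5, 4.0, 0.25, and 1.0 and room for three blocks, the blocks with variances 9.0, 4.0, and 1.0 are kept and the discarded blocks contribute 0.75 to the distortion.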
Without loss of generality, MSE is used as the distortion measure. Let $D_a$ and $D_d$ be the analog distortion and the digital distortion, respectively. To decode the digital base layer successfully, the SNR of the digital part must be greater than the signal-to-noise ratio threshold $\mathrm{SNR}_{th}$. The spectral efficiency corresponding to $\mathrm{SNR}_{th}$ is $e_f$, which depends on the MCS (modulation and coding scheme). This decoding condition is expressed in terms of $P_d$, the average power distribution coefficient of the base layer digital signal, and $\sigma_n^2$, the power of the additive white Gaussian channel noise. Since the sum of the digital power and the analog power is limited by the total power $P_T$, the power constraint involves $P_d$; $P_a$, the average power allocated to the enhancement layer analog signals; $B_a$, the bandwidth occupied by the transmitted analog signal; and $B_d$, the bandwidth occupied by the transmitted digital signal. After the enhancement layer is transformed by 3D-DCT, each group of video frames is further divided into N blocks, and the variance of the i-th block is defined as $\lambda_i$. The analog signals transmitted through the channel are corrupted by the channel noise, which introduces distortion [30]. In addition, the discarded coefficients cannot be recovered at the receiver, which introduces the additional distortion $D_{a2}$. The optimal power allocation problem is therefore to minimize the overall distortion subject to the decoding threshold and the total power constraint. The variance of the i-th block can be expressed as a function of the quantization parameter QP [31], where $k_i$ and $w_i$ are two parameters of the i-th block that describe the exponential relationship between $\lambda_i$ and QP. After the video is digitally compressed and encoded, the number of quantized bits per pixel can likewise be fitted with an exponential function, with fitting parameters a and b. This gives the relationship between the quantization parameter QP and the number of bits produced per pixel, R. When a group of video frames has M pixels, the total number of bits obtained after digital compression is M times R.
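The rate model can be illustrated with a short sketch. The exponential form R(QP) = a·exp(b·QP) is an assumed reading of "fitted with an exponential function" and may differ from the paper's exact fit; the function names and the example parameter values are hypothetical, and the QP range 0 to 51 is the standard HEVC range.

```python
import math


def bits_per_pixel(qp, a, b):
    """Assumed exponential fit R(QP) = a * exp(b * QP); b < 0 gives a decreasing rate."""
    return a * math.exp(b * qp)


def total_bits(qp, num_pixels, a, b):
    """Total bits for a group of frames containing num_pixels pixels."""
    return num_pixels * bits_per_pixel(qp, a, b)


def smallest_qp_within_budget(bit_budget, num_pixels, a, b, qp_range=range(0, 52)):
    """Return the lowest QP (best quality) whose bit count fits the budget, or None."""
    for qp in qp_range:
        if total_bits(qp, num_pixels, a, b) <= bit_budget:
            return qp
    return None
```

A search of this kind is one simple way to tie the digital bit budget to QP before the remaining power is assigned to the analog layer; a joint optimization over $P_a$ and QP would replace the linear scan.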

Experiment and Results
Based on the laboratory's software radio platform and the LTE protocol stack modified for the MEC architecture, a complete MEC platform is built and used as the bearer network for developing a cloud VR system suitable for AIoT. The system can use standard commercial terminals or professional VR headsets as client display devices. A complete performance evaluation of the proposed system is performed on this platform.

MEC Offloading Algorithm Verification.
First, the proposed learning-driven MEC task offloading strategy is verified by simulation. The number of users is assumed to be 10, the computing tasks of the users' mobile terminals follow a Poisson distribution, and the path loss exponent is set to 2. Figure 6 shows how the simulated regret value varies with the number of iterations when the number of MEC server nodes (base stations) is 3, 5, and 10. The network regret value converges in a short time for every number of MEC servers. As the number of MEC servers increases, the algorithm converges more slowly, but the overall convergence speed remains reasonable, which shows that the proposed MEC-MAB offloading strategy has good convergence performance.
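The regret behavior described above can be reproduced qualitatively with a generic multi-armed bandit simulation. The sketch below is not the MEC-MAB algorithm of this paper: it assumes the standard UCB1 rule, models each MEC server as a Bernoulli arm, and uses hypothetical success probabilities (`make_server_means`) in place of the delay and energy rewards of the real system.

```python
import math
import random


def make_server_means(num_servers):
    """Hypothetical per-server success probabilities, evenly spaced in [0.3, 0.8]."""
    return [0.3 + 0.5 * i / (num_servers - 1) for i in range(num_servers)]


def ucb1_cumulative_regret(arm_means, rounds, seed=0):
    """Run UCB1 on Bernoulli arms; return the cumulative pseudo-regret after each round."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each server was chosen
    totals = [0.0] * k    # sum of rewards observed per server
    best_mean = max(arm_means)
    regret = 0.0
    history = []
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1   # initialization: try every server once
        else:
            # Choose the server with the highest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best_mean - arm_means[arm]
        history.append(regret)
    return history


results = {}
for num_servers in (3, 5, 10):
    history = ucb1_cumulative_regret(make_server_means(num_servers), rounds=2000)
    results[num_servers] = history
    print(num_servers, "servers: average regret per round =",
          round(history[-1] / len(history), 4))
```

The printed values depend on the assumed probabilities and the random seed, so they are not comparable with Figure 6. The sketch only shows how a regret curve is accumulated for different numbers of candidate servers.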

VR Transmission Scheme Validation.
A MATLAB simulation of the proposed HDA transmission scheme is carried out. The proposed HDA transmission system consists of a data channel and a control channel. The data channel executes the function blocks of the transmitting and receiving ends and relies on the power allocation calculation to solve for $P_a$ and QP so that the distortion is minimized. The data channel combines digital transmission and pseudo-analog transmission. The digital transmission scheme uses HEVC for source coding and the LTE-based adaptive modulation and coding scheme for transmission. Different combinations of channel coding rates and modulation modes can be selected.
Experimental simulations were performed using standard HD sequences with a resolution of 1664 × 1664 pixels. For analog transmission, each frame in the video sequence is divided into 64 blocks. Each group of pictures is set to 16 frames, so the analog symbols of each group of pictures are divided into 1024 coefficient blocks. When the video frame rate is 30 fps, the source bandwidth $N_s$ is 41.5 MHz. The bandwidth used for data transmission is defined as $N_c$, and the implementation ensures that the number of symbols used for source coding and channel coding in the digital part is less than or equal to $N_c$.
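The block count and source bandwidth quoted above can be checked with a few lines of arithmetic. The sketch assumes that two real-valued samples are carried per complex channel symbol (one on each of the I and Q components); the text does not state this, but it is the pairing under which the numbers reproduce the 41.5 MHz figure.

```python
WIDTH, HEIGHT = 1664, 1664       # frame resolution in pixels
BLOCKS_PER_FRAME = 64            # blocks per frame for analog transmission
FRAMES_PER_GROUP = 16            # frames per group of pictures
FRAME_RATE = 30                  # frames per second
REAL_SAMPLES_PER_SYMBOL = 2      # assumption: I and Q each carry one real sample

blocks_per_group = BLOCKS_PER_FRAME * FRAMES_PER_GROUP
samples_per_second = WIDTH * HEIGHT * FRAME_RATE
source_bandwidth_mhz = samples_per_second / REAL_SAMPLES_PER_SYMBOL / 1e6

print(blocks_per_group)                 # 1024
print(round(source_bandwidth_mhz, 1))   # 41.5
```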
The performance of the proposed HDA transmission scheme is compared with that of the existing digital video transmission scheme HEVC. PSNR is measured at the receiving end to evaluate the quality of video transmission, and the same bandwidth and power are used for the two schemes. According to the LTE adaptive modulation and coding scheme, the channel coding adopts LTE turbo coding with a code rate of R = 1/3. The modulation scheme supports QPSK, 16QAM, and 64QAM. Taking a channel SNR of 5.5 dB as an example, the spectral efficiency is about 1.47. Table 1 gives the results of the HEVC scheme and the proposed HDA scheme on the test sequence under different QPs when the target channel SNR = 10 dB and the spectral efficiency of the digital part is 1.47. The ratio of the available video channel bandwidth to the source bandwidth is denoted by $\beta$, i.e., $\beta = N_c / N_s$.
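Since PSNR is the quality metric used in this comparison, a minimal sketch of its computation from the MSE is given here. It assumes 8-bit samples with a peak value of 255, which the text does not specify, and the pixel values are hypothetical.

```python
import math


def psnr_db(reference, received, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(received) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf   # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical pixels in which every sample is off by one level, so MSE = 1.
reference_pixels = [100, 120, 140, 160]
received_pixels = [101, 119, 141, 159]
print(round(psnr_db(reference_pixels, received_pixels), 2))   # 48.13
```

For a video sequence, the per-frame values would be averaged over all frames to obtain an average PSNR of the kind reported for the two schemes.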
It can be seen from Table 1 that when the digital part adopts a certain QP value, the video quality of the receiving  if the available channel bandwidth is only 1/4 of the source bandwidth, the data length of the digital part will exceed the available bandwidth, so the digital part cannot be decoded correctly. In contrast, with the proposed method, even when the available bandwidth resources are severely limited, increasing the QP makes it possible to encode and transmit the digital part of the data.
Next, the proposed HDA scheme and the HEVC scheme are compared further. Under varying channel conditions, the classic HEVC scheme is inevitably affected by the cliff effect. As the SNR increases, so does the spectral efficiency, which allows HEVC to choose a lower QP. Consider an SNR of 0 to 20 dB and $\beta = 0.5$, with 10% of the bandwidth reserved for hybrid automatic retransmission of the digital part. As shown in Figure 7, the average PSNR of the proposed HDA scheme is 0.41 dB higher than that of the HEVC scheme. By adding analog signals to the existing digital transmission scheme and allocating part of the bandwidth to them, the saturation effect of the video quality at the receiving end can be mitigated. When the SNR of the target channel is high, analog signal transmission can achieve even greater performance gains, which benefits AIoT applications.

Conclusions
With the rapid development of multimedia services such as VR, AIoT places heavy demands on data transmission. Video resolution is also increasing steadily, which adds to the pressure on network bandwidth. New business models such as cloud VR have therefore drawn more attention to the latency of the transmission network. In this paper, we design and build an AIoT-oriented cloud VR system that uses MEC technology to perform dynamic streaming based on user viewpoint information. Through the learning-driven MEC autonomous offloading strategy, the optimal MEC server is selected autonomously for task offloading without prior information such as MEC server computing and storage capabilities or channel status, and the energy consumption is minimized while user delay constraints are satisfied. Combined with the viewpoint-aware progressive streaming based on ROI and object priorities used on the server side, the bandwidth requirements are reduced, and a good VR viewing experience is achieved. Meanwhile, based on the relationship between edge servers and base stations in the MEC architecture, HDA transmission technology is incorporated to further optimize the transmission bandwidth and efficiency in AIoT cloud VR systems. AIoT data transmission can enrich the data information and enhance human-computer interaction in AIoT cloud VR systems; however, it requires faster data processing and faster network support. In subsequent studies, better compression schemes for panoramic video and AIoT data transmission in AIoT cloud VR will be explored, and channel fading will be considered with a view to achieving further performance gains.

Data Availability
The data used in this study are available, and the performance optimization scheme proposed in this paper can be used in adaptive data transmission of VR videos. Part of the data is available from the corresponding authors upon request (923785608@qq.com; zfq@mju.edu.cn).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding this study and the publication of this paper.