5G Network Slicing: Methods to Support Blockchain and Reinforcement Learning

With the advent of the 5G era, the limited network resources and the methods used until now can no longer guarantee that all services are delivered. In the 5G era, network services are not limited to mobile phones and computers; they also support the normal operation of equipment across all industries. Application scenarios are becoming more numerous and more complex, so more convenient and faster methods are needed to support network services. To offload network traffic more effectively, refine service delivery, and support the further development of 5G network technology, this article proposes 5G network slicing methods that support blockchain and reinforcement learning, with the aim of improving the efficiency of network services. The results show the following. (1) In the model testing stage, the delay increases with the number of slices, but the blockchain + reinforcement learning method consistently has the lowest delay; with 3 slices, its delay is 155 ms. (2) A latency comparison across slice types shows that 5G network slicing has lower latency than 4G, 3G, and 2G network slicing, and 5G network slicing with blockchain and reinforcement learning achieves a minimum latency of only 15 ms. (3) System reliability decreases as the number of users increases, because reliability is tied to delay: the greater the transmission delay, the lower the reliability. The blockchain + reinforcement learning method has the highest reliability, 0.95. (4) In the resource utilization experiment on different slices, the blockchain + reinforcement learning method has the highest resource utilization: under this method, the utilization of all four slices is above 0.8, and the highest is 1. (5) In the simulation test, video stream 1 has a higher average received throughput than video stream 2, IoT devices, and mobile devices, and the average cumulative received throughput is highest under the blockchain + reinforcement learning method, reaching 1450 kbps. Video stream 1 also has a higher average QoE than video stream 2, IoT devices, and mobile devices, and the average QoE is highest under the blockchain + reinforcement learning method, reaching 0.83.


Introduction
Relieving network congestion for users, reducing network latency, and offloading network traffic are top priorities for 5G networks. As a core technology, 5G network slicing can effectively address the challenges of service creation, exclusive network access for different users, and the coexistence of multiple application scenarios. The 5G network is expected to meet the different needs of users [1], and network slicing may be a natural solution [2]: a wide range of services required by vertical use cases can be accommodated simultaneously on a common public network infrastructure. 5G mobile networks are expected to meet flexible demands [3], so network resources can be allocated dynamically according to demand. Network slicing technology is a core part of the 5G network [4], and its definition opens a broad field for communication service innovation [5]. 5G networks support the vertical markets they target through multiple network slices running on general-purpose, programmable infrastructure [6]. Network slicing divides a physical network into multiple virtual networks so that each can be flexibly applied to a different network scenario. The future 5G network will also change the mobile network ecosystem [7]. The 5G mobile network is expected to meet the diversified needs of a variety of commercial services [8] and must support a large number of different service types [9]. Network slicing allows programmable network instances to be provisioned to meet the different needs of users. Blockchain can establish a secure and decentralized resource-sharing environment [10]; it is a distributed open ledger [11] used to record transactions among multiple computers. Reinforcement learning algorithms can effectively handle problems with large state spaces [12], although reinforcement learning has mainly been applied to simple learning tasks [13].
5G networks are designed to support many vertical industries with different performance requirements [14]. Network slicing is considered an important enabler for such networks and provides the flexibility needed to achieve this goal; it is regarded as one of the key technologies of the 5G network [15]. It can create virtual networks and provide customized services on demand [16]. Faced with the different needs of different users, the network is divided into many slices, each of which provides targeted services to meet customer needs.

Network Slice Classification.
The ultimate goal of 5G network slicing is to organically combine multiple network resource systems into a complete network that can serve different types of users. Network slices can be divided into independent slices and shared slices, as shown in Table 1. The application scenarios of 5G networks fall into three categories: mobile broadband, massive Internet of Things, and mission-critical Internet of Things [17]. The details are given in the corresponding table. A blockchain consists of a shared, fault-tolerant distributed database and a multi-node network [18].

Blockchain Structure.
A blockchain is composed of blocks, each consisting of a block header and a block body; the blocks are linked into a chain through the hash of the parent block [19]. The structure is shown in Figure 1. The block header contains the parent block hash, a timestamp, a nonce (random number), the difficulty, and the Merkle root [20]. Their functions are listed in Table 3.
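To make the chain structure concrete, here is a minimal sketch, assuming SHA-256 hashing, JSON serialization of the header, and a simplified Merkle tree that pairs an odd node with a copy of itself. The names `Block`, `merkle_root`, and `is_valid_chain` are hypothetical; only the header fields come from the text. It shows why altering a parent block breaks the hash link stored in its child.

```python
from __future__ import annotations

import hashlib
import json
import time
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions: list[str]) -> str:
    """Simplified Merkle root: hash the leaves, then hash adjacent pairs level by level.

    Assumption: an odd node at any level is paired with a copy of itself.
    """
    if not transactions:
        return sha256_hex(b"")
    level = [sha256_hex(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


@dataclass
class Block:
    parent_hash: str                 # header hash of the parent block
    transactions: list[str]          # block body
    difficulty: int = 1
    nonce: int = 0
    timestamp: float = field(default_factory=time.time)

    def header(self) -> dict:
        # Header fields: parent hash, timestamp, nonce, difficulty, Merkle root of the body.
        return {
            "parent_hash": self.parent_hash,
            "timestamp": self.timestamp,
            "nonce": self.nonce,
            "difficulty": self.difficulty,
            "merkle_root": merkle_root(self.transactions),
        }

    def header_hash(self) -> str:
        # Only the header is hashed; the body is committed to through the Merkle root.
        return sha256_hex(json.dumps(self.header(), sort_keys=True).encode())


def is_valid_chain(chain: list[Block]) -> bool:
    """Every block after the first must store the header hash of the block before it."""
    return all(chain[i].parent_hash == chain[i - 1].header_hash() for i in range(1, len(chain)))


genesis = Block(parent_hash="0" * 64, transactions=["genesis"])
block1 = Block(parent_hash=genesis.header_hash(), transactions=["tx1", "tx2", "tx3"])
chain = [genesis, block1]
print(is_valid_chain(chain))             # True
genesis.transactions.append("forged")    # tampering changes genesis's Merkle root and hash
print(is_valid_chain(chain))             # False
```

Hashing only the header keeps the link check cheap, while the Merkle root still makes any change to the body visible in the header hash.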

Blockchain Properties.
Blockchain technology has three properties, distribution, security, and robustness [21], as shown in Table 4.

Reinforcement Learning Process.
In reinforcement learning, the agent makes decisions based on the information it observes in the environment [22]. In response to each action, the environment returns a corresponding reward to the agent, and the agent then enters a new state. The process is shown in Figure 2.
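The interaction loop can be sketched as follows. This is a minimal illustration on a hypothetical toy environment (a short corridor with a goal cell), not the 5G slicing environment; `CorridorEnv` and `run_episode` are illustrative names, and the reward values are assumptions chosen only to make the loop concrete.

```python
from __future__ import annotations


class CorridorEnv:
    """Hypothetical toy environment: the agent walks along cells 0..size-1.

    Reaching the last cell gives reward +1 and ends the episode; any other step costs 0.01.
    """

    def __init__(self, size: int = 5):
        self.size = size
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        # action 1 = move right, anything else = move left; position is clamped to the corridor
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.size - 1)
        done = self.state == self.size - 1
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def run_episode(env: CorridorEnv, policy, max_steps: int = 100) -> float:
    """Agent-environment loop: observe the state, act, receive a reward, enter the next state."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent decides from the observed state
        state, reward, done = env.step(action)  # environment rewards the action and moves on
        total_reward += reward
        if done:
            break
    return total_reward


print(run_episode(CorridorEnv(), policy=lambda s: 1))  # always move right
```

The learning algorithms described later differ only in how `policy` is improved from the observed rewards; the loop itself stays the same.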

Model Design.
The 5G network slicing architecture is composed of the network slice demander, slice management (service design, instance orchestration, and operation management), the slice selection function, and virtualization management and orchestration. The 5G network slicing model works as follows: network services enter the slice manager through the network slice demander, where they pass through service design, instance orchestration, and operation management. From the slice manager, services proceed to the slice selection function, which directs them either to the shared slice function or to the dedicated independent slice function; they can also enter virtualization management and orchestration. The process is shown in Figure 3.
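A minimal sketch of this request flow is given below. It is hypothetical throughout: the class and function names are illustrative, and the selection rule (a service that needs isolation gets an independent slice, otherwise a shared one) is an assumption standing in for the slice selection function, which the text does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class SliceType(Enum):
    SHARED = "shared"
    INDEPENDENT = "independent"


@dataclass
class ServiceRequest:
    name: str
    needs_isolation: bool  # hypothetical criterion for choosing an independent slice


@dataclass
class SliceManager:
    """Hypothetical slice manager running the three stages named in the text, in order."""
    log: list[str] = field(default_factory=list)

    def process(self, req: ServiceRequest) -> None:
        for stage in ("service_design", "instance_orchestration", "operation_management"):
            self.log.append(f"{stage}:{req.name}")


def select_slice(req: ServiceRequest) -> SliceType:
    """Hypothetical slice selection function: isolated services get an independent slice."""
    return SliceType.INDEPENDENT if req.needs_isolation else SliceType.SHARED


def handle_request(req: ServiceRequest, manager: SliceManager) -> SliceType:
    # Demander -> slice manager -> slice selection function
    manager.process(req)
    return select_slice(req)


manager = SliceManager()
print(handle_request(ServiceRequest("factory-control", needs_isolation=True), manager).value)  # independent
print(handle_request(ServiceRequest("video", needs_isolation=False), manager).value)           # shared
```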

Scalability within Shards.
During block consensus verification, the scalability within a shard [23] is determined by the following quantities: $b_I$, the average transaction size; $B_{Ih}$, the block header size; and $K$, the number of shards.

Scalability of the Directory Shard.
Let the average transaction size be $b_F$ and the block header size be $B_H$; the scalability of the directory shard is then expressed in terms of these quantities.

Independent slice
A slice with logically independent and complete network functions. The slice includes a user data plane, a network control plane, and various user service function components, and can provide a logically independent end-to-end private network service for a specific user group. If necessary, it can provide only part of the services of specific functions.

Shared slice
A shared slice is a specific network slice whose network resources can be used by different independent slices. The slice can provide end-to-end services and, when necessary, only part of the shared functions.

Distributed
The blockchain connects the participating nodes through a peer-to-peer network to realize resource sharing and task allocation between peer nodes. Network nodes do not need to rely on a central node and can share and exchange information directly. Each peer node can be both a consumer and a provider of services, resources, and information, which reduces networking complexity while improving the fault tolerance of the network.

Scalability of Sharded Blockchain.
The scalability of the entire sharded blockchain is composed of the internal scalability of the shards and the scalability of the directory shard [24]. Assuming that the block packing time within a shard and within the directory shard takes the same value $T'_I$, and that the block header size takes the same value $B'_H$, the overall scalability can be expressed in terms of these quantities.

Value Function Method.
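The value function method estimates the value of each state. For a given policy $\pi$, the state-value function $V^{\pi}(s)$ is the expected discounted return obtained by starting from state $s$ and following $\pi$ thereafter:
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s\right] \qquad (4)$$
where $\gamma \in [0, 1)$ is the discount factor and $r_{t+k+1}$ is the reward received $k$ steps after time $t$. The optimal policy $\pi^*$ has a corresponding state-value function $V^*(s)$, expressed as follows:
$$V^{*}(s) = \max_{\pi} V^{\pi}(s)$$
In the RL setting, the state transition function $P$ is difficult to obtain, so a state-action value function is constructed instead.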
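The sketch below illustrates the state-value function on a tiny, hypothetical MDP whose transition probabilities are known, using iterative policy evaluation (repeated Bellman expectation backups). It also shows why the model-free setting is harder: this procedure needs `P[s][a][s2]` explicitly, which is exactly what is difficult to obtain in RL. The function name and list-of-lists layout are assumptions for illustration.

```python
def evaluate_policy(P, R, policy, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Iterative policy evaluation with a known model.

    P[s][a][s2] : probability of moving from s to s2 under action a
    R[s][a]     : expected immediate reward for taking a in s
    policy[s][a]: probability that the policy picks a in s
    Returns V[s], the expected discounted return from s under the policy.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iters):
        V_new = []
        for s in range(n_states):
            v = 0.0
            for a, pi_sa in enumerate(policy[s]):
                # Bellman expectation backup: r + gamma * E[V(s')]
                q_sa = R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                v += pi_sa * q_sa
            V_new.append(v)
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# One state, one action, reward 1 forever: analytically V = 1 / (1 - gamma) = 10 for gamma = 0.9.
print(evaluate_policy(P=[[[1.0]]], R=[[1.0]], policy=[[1.0]]))
```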

Security
Blockchain uses asymmetric cryptography to protect transmitted data. A task request that writes data to the blockchain must carry the private-key signature of the task initiator, and the signed request is broadcast to the participating nodes in the network. Each node can verify the initiator's identity, so task requests cannot be forged or tampered with. In addition, the chained data structure of the blockchain ensures that block contents cannot be altered at will. Even if some nodes in the chain are maliciously forged, tampered with, or destroyed, the normal operation of the entire blockchain is not affected.
Robustness
The consensus mechanism determines how agreement is reached among participants according to their voting weight or computing power. The blockchain system uses an incentive mechanism to attract miners to participate in generating and verifying data blocks: miners perform mathematical computations in a distributed structure, a consensus algorithm selects a node, and that node creates a new valid block, which is added to the blockchain. The entire process does not rely on a trusted third party.
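The text does not name a specific consensus algorithm; proof of work (PoW) is assumed here as one common example of "miners performing mathematical computations" to decide which node creates the next block. In this sketch, difficulty is simplified to the number of leading zero hex digits of a SHA-256 digest (real systems compare against a numeric target), and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def pow_hash(header: str, nonce: int) -> str:
    """SHA-256 of the block header concatenated with a candidate nonce."""
    return hashlib.sha256(f"{header}|{nonce}".encode()).hexdigest()


def mine(header: str, difficulty: int, max_nonce: int = 10_000_000) -> tuple[int, str]:
    """Brute-force search for the smallest nonce whose hash starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in range(max_nonce):
        digest = pow_hash(header, nonce)
        if digest.startswith(target):
            return nonce, digest
    raise RuntimeError("no valid nonce found below max_nonce")


def verify(header: str, nonce: int, difficulty: int) -> bool:
    """Any node can check a proposed block with a single hash computation."""
    return pow_hash(header, nonce).startswith("0" * difficulty)


nonce, digest = mine("parent=abc;merkle=def", difficulty=4)
print(nonce, digest)
print(verify("parent=abc;merkle=def", nonce, difficulty=4))  # True
```

The asymmetry between costly search and cheap verification is what lets every node check a new block without trusting the miner that produced it.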

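$$Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a\right] \qquad (6)$$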
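Given $Q^{\pi}(s,a)$, the action in each state can be selected greedily as $\arg\max_{a} Q^{\pi}(s,a)$. Under this greedy policy, $V^{\pi}(s)$ is obtained by maximizing $Q^{\pi}(s,a)$ over actions:
$$V^{\pi}(s) = \max_{a} Q^{\pi}(s,a)$$
Mature reinforcement learning methods such as SARSA and off-policy Q-learning can be used to solve for the value function. SARSA updates
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_{t+1} + \gamma Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right]$$
and Q-learning updates
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\right]$$
where $\alpha$ is the learning rate.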
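The two update rules can be sketched in tabular form as below. The update functions follow the SARSA and Q-learning formulas above; the training loop uses a hypothetical four-cell corridor (action 1 moves right, reaching the last cell pays 1) purely to show Q-learning converging to the greedy "move right" policy. It is not the slicing environment from this article, and all names and hyperparameters are illustrative assumptions.

```python
import random
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy TD target: bootstraps from the greedy (max) action in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD target: bootstraps from the action a_next actually chosen in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a greedy action (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(s, b)] for b in actions)
    return rng.choice([b for b in actions if Q[(s, b)] == best])


def train_q_learning(episodes=500, n_states=4, epsilon=0.1, seed=0):
    """Hypothetical corridor: action 1 moves right, action 0 moves left; the last cell pays 1."""
    rng = random.Random(seed)
    actions = [0, 1]
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0, including the terminal state
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = epsilon_greedy(Q, s, actions, epsilon, rng)
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            q_learning_update(Q, s, a, r, s_next, actions)
            s = s_next
    return Q


Q = train_q_learning()
print([max((0, 1), key=lambda b: Q[(s, b)]) for s in range(3)])  # greedy action per non-terminal state
```

Switching to SARSA means choosing `a_next` with `epsilon_greedy` before the update and calling `sarsa_update` instead, so the target follows the behavior policy rather than the greedy one.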

Strategy Method.
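The strategy (policy-based) method directly outputs actions by searching for the optimal policy $\pi^*$. The objective function $J(\theta)$ is defined as the discounted cumulative expected reward of the policy $\pi_\theta$ with parameters $\theta$:
$$J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1}\right]$$
The policy parameters $\theta$ are updated along the estimated gradient $\nabla_\theta J(\theta)$ with learning rate $\alpha_l$, which gives the policy gradient update:
$$\theta \leftarrow \theta + \alpha_l \nabla_\theta J(\theta)$$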
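A minimal sketch of the policy gradient update, using REINFORCE (one standard estimator of $\nabla_\theta J(\theta)$) on a hypothetical multi-armed bandit with a softmax policy. Because each episode is a single action, the return equals the reward and discounting plays no role; the arm means, noise level, and names are assumptions for illustration only.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(true_means, steps=2000, alpha_l=0.1, seed=0):
    """REINFORCE on a multi-armed bandit with a softmax policy pi_theta.

    The gradient estimate is G * grad log pi_theta(a), and theta moves by alpha_l times it.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(true_means)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        G = rng.gauss(true_means[a], 1.0)  # noisy reward of the chosen arm (= return here)
        for k in range(len(theta)):
            # d/d theta_k of log softmax(theta)[a] = 1[k == a] - pi(k)
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += alpha_l * G * grad_log
    return softmax(theta)


print(reinforce_bandit([0.2, 1.0]))  # probability mass should shift toward arm 1
```

In practice a baseline (for example, a running average of rewards) is usually subtracted from `G` to reduce the variance of the gradient estimate; it is omitted here to keep the update identical to the formula above.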

MDP.
An MDP mainly addresses how an agent learns from the experience gathered while interacting with the environment in order to achieve its goal [25]. The state space $S$ is built from the following components. $h$ denotes the state of all wireless channels in the 5G network slice and $H$ denotes the channel state space, which is expressed in terms of $h_m$, the channel state, and $H_m$, the corresponding channel state space.
$x$ denotes the connection status and $X$ the connection status space; $d$ denotes the state of all data transmission rates in the slice and $D$ the data transmission rate state space; $\varphi$ denotes the topological state of the physical network and $\psi$ the topological state space of the physical network. $A_r$ denotes the action space for wireless resource allocation, defined as follows:
$$A_r = \left\{\left(a_{r,1}, a_{r,2}, \ldots, a_{r,|U|}\right) \,\middle|\, \forall u \in U,\ a_{r,u} \in A_{r,u}\right\}$$
Here, $a_{r,u}$ is the radio resource allocation action of user $u$ in the 5G network and $A_{r,u}$ is its corresponding action space, in which $v'_{u,m}$ denotes the occupied wireless resources.

Model Building.
Suppose the physical network is represented by the weighted undirected graph $C = (A_i, S_i)$, where the set of network nodes is denoted as $A_i = \{a_1, a_2, \ldots, a_n\}$, the computing capacity of the nodes in $A_i$ is denoted as $S_i = \{s_1, s_2, \ldots, s_n\}$, and the set of links between nodes is denoted as $L_n = \{l_1, l_2, \ldots, l_n\}$. On this basis, state transition functions are defined for a first and a second dynamic scheduling queue, and combining the above analysis yields the formulation of the 5G network slicing model.
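The radio resource action space $A_r$ defined in the MDP formulation is a Cartesian product of the per-user action spaces $A_{r,u}$. The sketch below builds it with `itertools.product`. The per-user choices (0, 1, or 2 resource blocks) and the capacity filter are hypothetical, loosely standing in for the occupied-resource limit expressed by $v'_{u,m}$; function names are illustrative.

```python
from __future__ import annotations

from itertools import product


def joint_action_space(per_user_actions: dict[str, list[int]]) -> list[tuple[int, ...]]:
    """A_r = {(a_{r,1}, ..., a_{r,|U|}) | a_{r,u} in A_{r,u} for every user u}.

    Users are ordered by name so each tuple position always refers to the same user.
    """
    users = sorted(per_user_actions)
    return list(product(*(per_user_actions[u] for u in users)))


def feasible_actions(per_user_actions: dict[str, list[int]], capacity: int) -> list[tuple[int, ...]]:
    """Hypothetical constraint: total resource blocks granted may not exceed capacity."""
    return [a for a in joint_action_space(per_user_actions) if sum(a) <= capacity]


# Hypothetical example: each user may be granted 0, 1, or 2 resource blocks.
A_ru = {"u1": [0, 1, 2], "u2": [0, 1, 2]}
print(len(joint_action_space(A_ru)))              # 9 joint actions = 3 * 3
print(len(feasible_actions(A_ru, capacity=2)))    # 6 joint actions fit within 2 blocks
```

Since $|A_r| = \prod_{u} |A_{r,u}|$ grows exponentially with the number of users, exhaustive search quickly becomes impractical, which is one reason learning-based allocation is attractive here.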

Variation of Time Delay with the Number of Slices.
This article studies 5G network slicing methods that support blockchain and reinforcement learning. We first test the model by comparing the blockchain + reinforcement learning method with blockchain alone, reinforcement learning alone, and a baseline that uses neither. The results are shown in Figure 4.
Computational Intelligence and Neuroscience
The comparison shows that the delay increases with the number of slices, but the blockchain + reinforcement learning method consistently has the lowest delay: with 3 slices, its delay is 155 ms. The overall delay of the blockchain method is lower than that of reinforcement learning because, when selecting nodes and link mappings, the blockchain gives priority to nodes with abundant resources and strong data processing capability.

Delay Comparison of Different Slice Types.
For different slice types, with the number of users set to 30, we compare the delays produced by several methods: 5G, 4G, 3G, and 2G network slicing are each evaluated under blockchain + reinforcement learning, blockchain, reinforcement learning, and a baseline that uses neither. The results are shown in Figure 5.
The comparison shows that 5G network slicing has lower latency than 4G, 3G, and 2G slicing, and that 5G slicing with blockchain + reinforcement learning achieves the lowest latency, only 15 ms. This is because the more VNFs a slice contains, the more nodes the same data packet must traverse for processing, the longer its path, and the greater the delay.

System Reliability.
Evaluating system reliability is an indispensable step before the remaining experiments. We compare the reliability of the different methods (blockchain + reinforcement learning, blockchain, and reinforcement learning) under different numbers of users; the results are shown in Figure 6. Reliability decreases as the number of users increases, because reliability is tied to delay: the greater the transmission delay, the lower the reliability. The blockchain + reinforcement learning method achieves the highest reliability, 0.95, which means that 5G network slicing supporting this method can serve more services.

Resource Utilization of Different Slices.
This article studies methods that support blockchain and reinforcement learning; here we examine their resource utilization on different slices. We set up 4 slices and test each slice with three methods (blockchain + reinforcement learning, blockchain alone, and reinforcement learning alone), then compare their resource utilization; the results are shown in Figure 7. The blockchain + reinforcement learning method has the highest resource utilization: under this method, the utilization of all four slices is above 0.8, and the highest is 1.

Average Cumulative Received Throughput.
This experiment compares 4 types of devices under three methods: blockchain + reinforcement learning, blockchain, and reinforcement learning. The methods are ranked by their average cumulative received throughput (kbps), where throughput is the amount of data received per unit of time. The results are listed in Table 5 and plotted as a bar chart in Figure 8.
According to the experimental results, video stream 1 has a higher average received throughput than video stream 2, IoT devices, and mobile devices, and the average cumulative received throughput is highest under the blockchain + reinforcement learning method, reaching 1450 kbps.

Average QoE.
We compare the average QoE of different devices under the three methods to determine which method is more suitable for 5G network slicing. QoE (quality of experience) refers to the user's overall perception of the quality and performance of the network system. The results are listed in Table 6 and plotted as a bar chart in Figure 9.
According to the experimental results, video stream 1 has a higher average QoE than video stream 2, IoT devices, and mobile devices, and the average QoE is highest under the blockchain + reinforcement learning method, reaching 0.83.

Conclusion
With the advent of the 5G era, current technologies can no longer meet the needs of users; network congestion and slow network speeds are major problems. To let users access network services more smoothly and conveniently, this article designs a 5G network slicing model that supports blockchain and reinforcement learning. The model improves the distribution and management of network resources, increases users' transmission rates, and reduces transmission delay. The research results are as follows. (1) In the model testing stage, the delay increases with the number of slices, but the blockchain + reinforcement learning method consistently has the lowest delay; with 3 slices, its delay is 155 ms. (2) The comparison of different slice types shows that 5G network slicing has lower delay than 4G, 3G, and 2G slicing, and 5G slicing with blockchain and reinforcement learning has the lowest delay, only 15 ms. (3) System reliability decreases as the number of users increases, because reliability is tied to delay: the greater the transmission delay, the lower the reliability. The blockchain + reinforcement learning method has the highest reliability. (4) In the resource utilization experiment on different slices, the blockchain + reinforcement learning method has the highest resource utilization; under this method, the utilization of all four slices is above 0.8, and the highest is 1.
(5) In the simulation test, video stream 1 has a higher average received throughput than video stream 2, IoT devices, and mobile devices, and the average cumulative received throughput is highest under the blockchain + reinforcement learning method, reaching 1450 kbps. Video stream 1 also has a higher average QoE than video stream 2, IoT devices, and mobile devices, and the average QoE is highest under the blockchain + reinforcement learning method, reaching 0.83.
Although the results of this experiment are clear, the study has limitations: it is confined to 5G network slicing. Further research is needed to improve the generality of the approach and apply it to more scenarios. Future work can refine the methods for supporting blockchain and reinforcement learning proposed in this article so that they can serve network service requirements with more diverse objectives.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.