A Time Delay Prediction Model of 5G Users Based on the BiLSTM Neural Network Optimized by APSO-SD

To address the problems of 5G network planning and optimization, a 5G user time delay prediction model based on the BiLSTM neural network optimized by APSO-SD is proposed. First, a channel generative model based on the ray-tracing model and the statistical channel model is constructed to obtain a large amount of time delay data, and a 5G user ray data feature model based on three-dimensional stereo mapping is proposed for input feature extraction. Then, an adaptive particle swarm optimization algorithm based on a search perturbation mechanism and differential enhancement strategy (APSO-SD) is proposed to optimize the parameters of BiLSTM neural networks. Finally, the APSO-SD-BiLSTM model is proposed to predict the time delay of 5G users. The experimental results show that APSO-SD achieves better convergence and optimization performance on benchmark functions than the other PSO algorithms, and that the APSO-SD-BiLSTM model achieves better user time delay prediction accuracy in different scenarios.


Introduction
Cellular systems are evolving towards 5G wireless systems due to the explosive growth of mobile devices and mobile traffic. Compared with 4G networks, 5G networks introduce new standards in spectrum, air interface, and network architecture to meet future application scenarios. These new standards and technologies bring challenges to 5G wireless network planning [1][2][3]. In addition, large-scale antenna arrays increase the difficulty of network optimization, and the complex spatial correlation makes it difficult for traditional fixed-point road tests and quantitative coverage models to meet the requirements of fast, comprehensive, and accurate assessment of raster-level network performance. Although traditional network performance simulations can achieve accurate analysis, their massive computation and time overhead make them impractical in the field [4]. search (CS) optimization. Owing to the high cost of sample collection and the need to regenerate the sample data whenever the network parameters change, these methods are difficult to apply in actual networks.
Reference [13] used neural networks to predict 5G channel state information online, but only the wireless channel characteristics were predicted, not network performance, and the approach suffered from limited prediction parameters, a small application range, and a long data collection cycle. Reference [14] predicted the path loss of a wireless channel using a feedforward neural network trained on path loss data. This approach performed well in path loss prediction, but data could not be collected before the network was deployed, and obtaining data through transmitters is relatively costly. In [15], a network time delay prediction model based on gated recurrent neural networks was designed and validated on real delay data between offsite Amazon server rooms. The model performed well on enterprise device latency but has a narrow scope of application for complex 5G user scenarios. Zhu et al. [16] implemented network latency prediction for grid-level users with deep neural networks. This method was effective in realistic 5G network planning and optimization, but it did not consider the impact of the deep neural network's parameter settings on time delay prediction performance.
Previous research shows that work on 5G user time delay prediction has achieved certain results, but problems remain in the user time delay modeling process and in model prediction accuracy. To address these shortcomings, this paper proposes a 5G user ray data feature model based on three-dimensional stereo mapping and uses the deep neural network BiLSTM to learn the ray data features. The trained network model is used to predict grid-level user network time delay. However, because its initial weights and thresholds are assigned randomly, BiLSTM converges slowly and easily falls into local optima. To improve the accuracy of 5G user time delay prediction, this paper proposes an adaptive particle swarm optimization algorithm based on a search perturbation mechanism and differential enhancement strategy (APSO-SD) to optimize the initial parameters of the BiLSTM network.
The main contributions of this paper are as follows:
(1) An adaptive particle swarm optimization algorithm based on a search perturbation mechanism and differential enhancement strategy (APSO-SD) is proposed for the parameter optimization of BiLSTM neural networks.
(2) A 5G user time delay data feature model based on three-dimensional stereo mapping is proposed.
(3) A 5G user time delay prediction model (APSO-SD-BiLSTM), based on BiLSTM optimized by APSO-SD, is proposed.
The rest of the article is organized as follows. Section 2 introduces the related theory. Section 3 introduces the 5G user ray data feature model based on three-dimensional stereo mapping. Section 4 introduces the adaptive particle swarm optimization based on the search perturbation mechanism and differential enhancement strategy. Section 5 introduces 5G user time delay prediction based on the BiLSTM neural network optimized by APSO-SD. Section 6 presents the experimental and simulation results. The conclusion is given at the end.

Time Delay Simulation Model.
The main flowchart of the time delay simulation model used in this article is shown in Figure 1. The model first fuses the ray-tracing model and the statistical model to generate a channel matrix and then feeds the channel matrix into the 5G simulation platform to generate grid-level user delay. A large amount of 5G user time delay data can thus be obtained through the delay simulation model.

Channel Generative Model Based on the Ray-Tracing Model and Statistical Channel Model.
Statistical channel models can generate channel matrices from model parameters configured for a specific scenario. The ray-tracing model can output the geometric information of all emission points between the start and end points of a ray, including the three-dimensional coordinates of the start point, end point, and reflection points, as well as the large-scale path loss of each ray [17]. Because the ray-tracing model contains no small-scale information, the geometric positions of the rays alone are not sufficient to calculate multipath effects. Therefore, the missing part must rely on the probability distributions of the statistical model. To calculate the complete channel fading coefficient matrix, the ray data must be supplemented with antenna layout information, antenna patterns, and power delay distribution information. The antenna layout and antenna patterns support the standard antenna configurations of 3GPP TR 38.901 as well as antenna template input. The power delay distribution information is generated from the probability distributions and parameters defined by 3GPP TR 38.901 for the different scenarios. This article proposes a channel generative model based on the ray-tracing model and the statistical channel model to generate the channel matrix. The new model calculates the channel matrix according to the process and method defined in 3GPP TR 38.901: the parameters that the ray-tracing model can provide are taken from it, and the remaining parameters are drawn from the statistical model. The specific parameters of the channel matrix are shown in Table 1.
The final channel coefficient is computed with the cluster-based formulation of 3GPP TR 38.901. In it, $n$ indexes the ray clusters ($N$ in total), $m$ indexes the rays within a cluster, $P_n$ is the power of cluster $n$, and $M$ is the number of rays within a cluster; $\theta_{n,m,ZOA}$ and $\phi_{n,m,AOA}$ are the zenith and azimuth angles of arrival at receiving antenna $u$, and $\theta_{n,m,ZOD}$ and $\phi_{n,m,AOD}$ are the zenith and azimuth angles of departure from transmitting antenna $s$; $R_{rx,u,\theta}$ and $R_{rx,u,\phi}$ are the field patterns of receiving antenna $u$ in the directions of the spherical basis vectors $\theta$ and $\phi$, and $R_{tx,s,\theta}$ and $R_{tx,s,\phi}$ are the corresponding field patterns of transmitting antenna $s$; $r_{rx,n,m}$ and $r_{tx,n,m}$ are the spherical unit vectors defined by the arrival and departure angles of ray $m$ in cluster $n$; and $d_{rx,u}$ and $d_{tx,s}$ are the location vectors of receiving antenna $u$ and transmitting antenna $s$.
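The channel-coefficient equation itself did not survive extraction. For orientation only, the per-cluster NLOS coefficient defined in 3GPP TR 38.901, which the model is stated to follow, has the form below. It is written with $R$ for the antenna field patterns, as in the definitions above. The cross-polarization power ratio $\kappa_{n,m}$, the random initial phases $\Phi_{n,m}$, the carrier wavelength $\lambda_0$, and the receiver velocity vector $\bar{v}$ are 38.901 symbols not defined in this section. Whether the authors retained every factor (for example, the Doppler term) is an assumption.

```latex
H_{u,s,n}(t) = \sqrt{\frac{P_n}{M}} \sum_{m=1}^{M}
\begin{bmatrix} R_{rx,u,\theta}(\theta_{n,m,ZOA},\phi_{n,m,AOA}) \\ R_{rx,u,\phi}(\theta_{n,m,ZOA},\phi_{n,m,AOA}) \end{bmatrix}^{T}
\begin{bmatrix} e^{j\Phi^{\theta\theta}_{n,m}} & \sqrt{\kappa_{n,m}^{-1}}\, e^{j\Phi^{\theta\phi}_{n,m}} \\ \sqrt{\kappa_{n,m}^{-1}}\, e^{j\Phi^{\phi\theta}_{n,m}} & e^{j\Phi^{\phi\phi}_{n,m}} \end{bmatrix}
\begin{bmatrix} R_{tx,s,\theta}(\theta_{n,m,ZOD},\phi_{n,m,AOD}) \\ R_{tx,s,\phi}(\theta_{n,m,ZOD},\phi_{n,m,AOD}) \end{bmatrix}
\exp\!\left(\frac{j2\pi\, \hat{r}_{rx,n,m}^{T}\bar{d}_{rx,u}}{\lambda_0}\right)
\exp\!\left(\frac{j2\pi\, \hat{r}_{tx,n,m}^{T}\bar{d}_{tx,s}}{\lambda_0}\right)
\exp\!\left(\frac{j2\pi\, \hat{r}_{rx,n,m}^{T}\bar{v}\, t}{\lambda_0}\right)
```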

Time Delay Calculation Method.
The latency in this paper refers to the air-interface delay on the wireless side, mainly the one-way downlink delay of a packet from the MAC layer of the base station to the MAC layer of the user. It is assumed that packets do not need to be segmented or reassembled, the full-buffer model is used as the service model, the number of hybrid automatic repeat request (HARQ) processes is 8, the single-transmission duration of a packet (no retransmission) is 4 ms (1 TTI), and the maximum number of packet retransmissions is 4. The packet retransmission process is shown in Figure 2. This paper calculates the downlink one-way transmission delay on the user's wireless side by counting packet retransmissions. The number of retransmissions of a single packet gives the transmission delay of that packet in the current transmission task, and the user's delay is obtained from the transmission delays of all of the user's packets during the simulation period. The transmission delay of a single packet is

$$T = t + N \cdot T_d$$

where $T$ denotes the transmission delay of a single packet, $t$ denotes the transmission delay of the first successful transmission of a packet, $N$ denotes the number of retransmissions of the packet, and $T_d$ denotes the time required for one retransmission. The user's time delay is then

$$\bar{T} = \frac{T_S}{P_N}$$

where $\bar{T}$ denotes the user's transmission delay, $T_S$ denotes the total transmission delay of all of the user's packets during the simulation, and $P_N$ denotes the total number of packets sent during the simulation.
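The delay formulas described in this subsection (per-packet delay $T = t + N \cdot T_d$, user delay $T_S / P_N$) reduce to a few lines of code. What follows is a minimal Python sketch. The HARQ outcome model (independent failures at a fixed block error rate `bler`), the function names, and the retransmission time `T_RETX` are hypothetical illustration choices: the paper gives neither $T_d$ nor a BLER model.

```python
import random


def packet_delay(t_first: float, n_retx: int, t_retx: float) -> float:
    """Delay of one packet: T = t + N * T_d."""
    return t_first + n_retx * t_retx


def user_delay(packet_delays: list) -> float:
    """User delay: total delay of all packets (T_S) divided by the packet count (P_N)."""
    if not packet_delays:
        raise ValueError("no packets were sent during the simulation")
    return sum(packet_delays) / len(packet_delays)


def simulate_retx(bler: float, max_retx: int, rng: random.Random) -> int:
    """Hypothetical HARQ model: every attempt fails independently with probability
    `bler`; the number of retransmissions is capped at `max_retx`."""
    n = 0
    while n < max_retx and rng.random() < bler:
        n += 1
    return n


T_FIRST = 4.0   # ms, single transmission (1 TTI) as stated in the text
T_RETX = 8.0    # ms, HYPOTHETICAL value for T_d
MAX_RETX = 4    # maximum number of retransmissions, as stated in the text

rng = random.Random(42)
delays = [packet_delay(T_FIRST, simulate_retx(0.1, MAX_RETX, rng), T_RETX) for _ in range(1000)]
print(f"average user delay: {user_delay(delays):.2f} ms")
```

In the actual pipeline the retransmission counts come from the 5G simulation platform rather than from a BLER coin flip. Only the two aggregation functions mirror the text.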

Bidirectional Long Short-Term Memory (BiLSTM) Network.
In 1997, Hochreiter and Schmidhuber proposed the long short-term memory (LSTM) network [18], a variant of the recurrent neural network that introduces a gating mechanism to alleviate the vanishing and exploding gradient problems of traditional recurrent neural networks simply and effectively. The LSTM controls the information transfer between cells by means of gates. In the gate equations of the LSTM model, $i_t$, $f_t$, and $o_t$ represent the input gate, forget gate, and output gate; $c_t$ represents the cell state; $\sigma$ and $\tanh$ are the two activation functions; $W_f$ and $b_f$ are the weight matrix and bias of the forget gate; $W_i$ and $b_i$ are the weight matrix and bias of the input gate; $W_o$ and $b_o$ are the weight matrix and bias of the output gate; and $\odot$ denotes element-wise multiplication of two matrices.
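For reference, the standard LSTM gate equations are given below as a sketch consistent with the symbols defined in this subsection. The input $x_t$, the hidden state $h_t$, the candidate state $\tilde{c}_t$, and its parameters $W_c$ and $b_c$ are not named in the text and are added here. $[h_{t-1}, x_t]$ denotes concatenation and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```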
Although LSTM alleviates the problems of vanishing gradients and long-term dependence, in user delay prediction the current state of the network may depend not only on the previous state but also on the subsequent state. To improve prediction, the bidirectional long short-term memory (BiLSTM) network is introduced to predict user delay [19]. BiLSTM consists of a forward LSTM layer and a backward LSTM layer, and its structure is shown in Figure 3.
The output of BiLSTM is determined jointly by the two LSTM layers. The forward layer computes from the first time step to the last, and the backward layer computes from the last time step to the first, with both layers processed in the same way. Finally, the outputs of the forward and backward layers at each time step are combined to obtain the output for that time step.
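The states at each time step of the BiLSTM model are computed as in equations (10) and (11), and the output is jointly determined by the states of the two directions, as in equation (12):

$$\overrightarrow{h}_t = \mathrm{LSTM}\left(x_t, \overrightarrow{h}_{t-1}\right) \quad (10)$$
$$\overleftarrow{h}_t = \mathrm{LSTM}\left(x_t, \overleftarrow{h}_{t+1}\right) \quad (11)$$
$$y_t = W_t \overrightarrow{h}_t + V_t \overleftarrow{h}_t + b_t \quad (12)$$

where $x_t$ is the input at time $t$, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the forward and backward hidden states, $W_t$ is the weight matrix of the forward output, $V_t$ is the weight matrix of the backward output, and $b_t$ is the bias at time $t$.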
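To make the forward/backward combination concrete, here is a toy pure-Python BiLSTM forward pass built on a scalar (hidden size 1) LSTM cell. It is a sketch, not the paper's model. The parameter layout `p[gate] = (w_x, w_h, b)`, the zero initial states, and the time-shared scalar output weights `w`, `v`, `b` are assumptions: they stand in for the output weights $W_t$ and $V_t$ and the bias $b_t$ of Eq. (12).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step. p maps gate name ('f', 'i', 'o', 'c') to (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))          # forget gate
    i = sigmoid(pre("i"))          # input gate
    o = sigmoid(pre("o"))          # output gate
    c_tilde = math.tanh(pre("c"))  # candidate cell state
    c = f * c_prev + i * c_tilde   # new cell state
    h = o * math.tanh(c)           # new hidden state
    return h, c


def run_lstm(xs, p):
    """Run the cell over a sequence from zero initial states; return all hidden states."""
    h, c, hs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        hs.append(h)
    return hs


def bilstm(xs, p_fwd, p_bwd, w, v, b):
    """y_t = w * h_fwd[t] + v * h_bwd[t] + b, with time-shared scalar weights."""
    h_fwd = run_lstm(xs, p_fwd)
    h_bwd = run_lstm(xs[::-1], p_bwd)[::-1]  # process last-to-first, then re-align with t
    return [w * hf + v * hb + b for hf, hb in zip(h_fwd, h_bwd)]


p_fwd = {"f": (0.5, 0.1, 0.0), "i": (0.4, 0.2, 0.0), "o": (0.3, 0.1, 0.0), "c": (0.8, 0.5, 0.0)}
p_bwd = {"f": (0.2, 0.3, 0.1), "i": (0.6, 0.1, 0.0), "o": (0.5, 0.2, 0.0), "c": (0.7, 0.4, 0.0)}
xs = [0.1, 0.5, -0.3, 0.8]
ys = bilstm(xs, p_fwd, p_bwd, w=1.0, v=1.0, b=0.0)
print(ys)
```

Reversing the backward outputs before combining them is the key step. Without it, the backward state paired with time $t$ would belong to time $T-1-t$.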

The 5G User Ray Data Feature Model Based on Three-Dimensional Stereo Mapping
As described in Section 2.1.1, the ray data are the input of the channel model, and the frequency-domain channel matrix is its output. The ray data contain the spatial characteristics of ray propagation, the large-scale path loss, and the delay information of the rays. The spatial feature vector of the channel matrix is contained in the geometric parameters of ray propagation: once the positions of the ray reflection points relative to the user and the base station are determined, the spatial feature vector of the channel matrix is determined accordingly. When the frequency-domain channel matrix produced by channel generation is used as input to the 5G wireless simulation platform, combined with other influencing factors, the platform outputs the user delay on the wireless side.
Using neural network models to predict user delay requires designing reasonable input features. For this reason, this paper proposes a three-dimensional feature model that extracts the features of ray data as input features for the neural network model's training.
As shown in Figure 4, the user's ray data contain the spatial features of ray propagation. Once the locations of the base station, the user, and the ray reflection points are determined, the propagation path of the user's rays is determined. The three-view feature model projects all reflection points of the user ray data onto three planes: XOY, XOZ, and YOZ. Each plane contains part of the spatial features of the user ray data, and combining the three planes restores the full spatial features of the user ray data.
Taking the reflection points on the XOY plane as an example, all reflection points of the user ray data are projected onto the XOY plane, which first requires some simple processing of the plane. Because the base station coverage range in this article is 1000 m, the base station is taken as the coordinate origin, and the coverage area extends 1000 m in both the positive and negative directions of the X and Y axes. Similar to pixel data in image processing, this coverage area of the XOY plane is divided into a grid of, for example, 32 × 32 cells. After the cells are divided, the projection position of each reflection point, that is, the cell of the XOY plane into which the point falls, must be determined. Some cells contain no projection points, and others contain several. Once the projection positions are determined, empty cells are filled with zero, and for cells containing multiple projection points the large-scale path loss or delay values of the corresponding rays are averaged and filled in; the resulting grid data serve as the input to the subsequent neural network models. In addition, because all reflection points lie above the ground, the Z axis extends only 100 m in its positive direction (height) to construct the coverage area.
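The projection step can be sketched in a few lines of pure Python. The extents (±1000 m in X and Y, 0 to 100 m in Z) and the 32-cell grid follow the text. The function names, the clipping of points outside the coverage area, and the `(x, y, z)` tuple layout are assumptions.

```python
from collections import defaultdict

# Coverage extents taken from the text: +/-1000 m in X and Y around the base station,
# 0-100 m in Z (height). Axis names map to tuple positions in (x, y, z).
EXTENTS = {"x": (-1000.0, 1000.0), "y": (-1000.0, 1000.0), "z": (0.0, 100.0)}
AXIS = {"x": 0, "y": 1, "z": 2}


def cell_index(value, lo, hi, n_cells):
    """Map a coordinate to a cell index in [0, n_cells - 1]; values outside are clipped."""
    idx = int((value - lo) / (hi - lo) * n_cells)
    return min(max(idx, 0), n_cells - 1)


def project(points, features, plane, n_cells=32):
    """Project reflection points onto one plane ('xy', 'xz' or 'yz').

    Empty cells stay 0.0; a cell hit by several points holds the mean feature value
    (for example large-scale path loss or delay of the corresponding rays).
    """
    a, b = plane
    sums, counts = defaultdict(float), defaultdict(int)
    for p, f in zip(points, features):
        i = cell_index(p[AXIS[a]], *EXTENTS[a], n_cells)
        j = cell_index(p[AXIS[b]], *EXTENTS[b], n_cells)
        sums[(i, j)] += f
        counts[(i, j)] += 1
    grid = [[0.0] * n_cells for _ in range(n_cells)]
    for (i, j), s in sums.items():
        grid[i][j] = s / counts[(i, j)]
    return grid


def three_view(points, features, n_cells=32):
    """Return the XOY, XOZ and YOZ grids for one user's ray data."""
    return {plane: project(points, features, plane, n_cells) for plane in ("xy", "xz", "yz")}


pts = [(0.0, 0.0, 50.0), (10.0, 10.0, 51.0), (-900.0, 400.0, 5.0)]
path_loss_db = [100.0, 120.0, 95.0]
views = three_view(pts, path_loss_db)
print(views["xy"][16][16])  # 110.0 (mean of the two points sharing a cell)
```

Stacking the three 32 × 32 grids (one set per feature, path loss or delay) gives an image-like tensor of the kind the text feeds to the neural network.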

Adaptive Particle Swarm Optimization Based on the Search Perturbation Mechanism and Differential Enhancement Strategy
PSO is a global search algorithm proposed by Kennedy and Eberhart, inspired by the foraging behavior of bird flocks [20]. The algorithm randomly initializes a group of particles and assigns each particle a random velocity and position; each particle represents a candidate solution. During the iterations, each particle updates itself by tracking its individual best value and the global best value. The standard PSO is described as follows:

$$V_{i,n}^{k+1} = \omega V_{i,n}^{k} + c_1 \left(Pbest_{i,n}^{k} - X_{i,n}^{k}\right) + c_2 \left(Gbest_{i,n}^{k} - X_{i,n}^{k}\right)$$
$$X_{i,n}^{k+1} = X_{i,n}^{k} + V_{i,n}^{k+1}$$

where $\omega$ represents the inertia weight, $k$ the iteration number, $n$ the vector dimension, $c_1$ and $c_2$ random numbers between 0 and 1, and $V_{i,n}^{k}$, $X_{i,n}^{k}$, $Pbest_{i,n}^{k}$, and $Gbest_{i,n}^{k}$ the velocity, position, individual best value, and global best value of the $i$th particle in the $n$th dimension at the $k$th iteration.
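A minimal global-best PSO in pure Python helps to check the update rule. The text describes $c_1$ and $c_2$ as random numbers in $[0, 1]$. The sketch instead uses the common form with constant acceleration coefficients multiplied by per-dimension random factors $r_1, r_2$, and setting `c1 = c2 = 1` recovers the text's form. The values `w=0.729` and `c1=c2=1.49445`, the position clipping, and the sphere test function are assumptions, not settings from the paper.

```python
import random


def pso(f, dim, bounds, n_particles=20, iters=200, w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimise f over [lo, hi]^dim with global-best PSO.

    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x),  r1, r2 ~ U(0, 1) per dimension
    x <- x + v, clipped to the search bounds
    """
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for n in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][n] = (w * vs[i][n]
                            + c1 * r1 * (pbest[i][n] - xs[i][n])
                            + c2 * r2 * (gbest[n] - xs[i][n]))
                xs[i][n] = min(max(xs[i][n] + vs[i][n], lo), hi)
            val = f(xs[i])
            if val < pbest_val[i]:                 # update the individual best
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:                # update the global best
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val


def sphere(x):
    return sum(t * t for t in x)


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```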
Like other intelligent algorithms, PSO is prone to premature convergence and to falling into local optima when solving complex high-dimensional functions. Starting from the particle iteration mechanism, this paper proposes an adaptive particle swarm optimization based on the search perturbation mechanism and differential enhancement strategy (APSO-SD). APSO-SD provides targeted optimization strategies for the different problems that arise in the population generation and optimization stages of PSO. These strategies cover the following two aspects:

Journal of Electrical and Computer Engineering
(1) To expand the search space of the particles and increase the diversity of solutions, disturbance factors are introduced to perturb the optimization state of the particles and achieve intelligent search.
(2) To avoid losing the excellent genes of poorer individuals, a genetic mechanism is used to prevent ineffective operations caused by "inbreeding." Particle irrelevance is then introduced, and "distant relatives" of the individuals undergoing differential variation are found to increase population diversity.

Search Disturbance Mechanism.
In APSO-SD, to balance the global and local search of the particles, this paper introduces the switching mechanism of the flower pollination algorithm (FPA) [21]. In FPA, simulated cross-pollination performs the global search and simulated self-pollination performs the local search. The two search modes are controlled by the transition probability P, a value between 0 and 1: the smaller P is, the more likely a particle is to perform a local search, and the larger P is, the more likely it is to perform a global search. Instead of keeping P fixed, this paper lets it decrease linearly from a maximum value P_max to a minimum value P_min over the iterations. Based on the experimental data in [21], P_max = 0.95 and P_min = 0.4 are generally good settings.
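Here is a minimal sketch of the linearly decreasing transition probability. It assumes the common form $P(k) = P_{max} - (P_{max} - P_{min})\,k / k_{max}$, with $k_{max}$ the iteration budget. The exact equation did not survive extraction, so treat both the form and the name `k_max` as assumptions.

```python
def transition_probability(k: int, k_max: int, p_max: float = 0.95, p_min: float = 0.4) -> float:
    """Assumed linear schedule: p_max at k = 0, decreasing to p_min at k = k_max."""
    if k_max <= 0:
        raise ValueError("k_max must be positive")
    return p_max - (p_max - p_min) * k / k_max


K_MAX = 3000  # the iteration budget used for the benchmark runs
for k in (0, 1500, 3000):
    print(k, round(transition_probability(k, K_MAX), 3))
```

Early iterations therefore favour global search (P near 0.95) and late iterations favour local refinement (P near 0.4).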
During global search, to strengthen the algorithm's search capability, this paper modifies the particles' search strategy by borrowing the iterative mechanism of lioness foraging from the lioness algorithm [22]: a particle is randomly selected from the population to assist the current particle in the global search, which increases the exchange of information between particles, as shown in equation (16). During local search, to help particles escape local optima, a sinusoidal disturbance factor is introduced, as in equation (17). In these equations, $x^c_i$ is the historical best position of a collaborating partner randomly selected from the remaining particles, $\alpha_f$ is the disturbance factor, $x_{max}$ and $x_{min}$ are the maximum and minimum values of the particle search space, $c$ is a random number uniformly distributed on (0, 1), and $r_1$ is a random number uniformly distributed on (0, 2π).
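The switching logic between the two branches can be sketched as follows. **The update formulas inside each branch are placeholders**, because Eqs. (16) and (17) did not survive extraction. The global branch moves toward a random partner's historical best, scaled by $c \sim U(0, 1)$. The local branch adds $\alpha_f \sin(r_1)(x_{max} - x_{min})$ with $r_1 \sim U(0, 2\pi)$. Both forms are illustrative guesses built from the symbols in the text, not the paper's equations. The linear schedule for P is also assumed.

```python
import math
import random


def transition_p(k, k_max, p_max=0.95, p_min=0.4):
    """Linearly decreasing transition probability (assumed form)."""
    return p_max - (p_max - p_min) * k / k_max


def perturb_step(x, partner_best, k, k_max, bounds, alpha_f, rng, p_max=0.95, p_min=0.4):
    """Global/local switch for one particle.

    PLACEHOLDER update rules, not the paper's Eqs. (16) and (17):
      global: x + c * (partner_best - x), with c ~ U(0, 1)
      local:  x + alpha_f * sin(r1) * (x_max - x_min), with r1 ~ U(0, 2*pi)
    """
    lo, hi = bounds
    if rng.random() < transition_p(k, k_max, p_max, p_min):   # global search branch
        c = rng.random()
        new = [xi + c * (pj - xi) for xi, pj in zip(x, partner_best)]
    else:                                                      # local search branch
        r1 = rng.uniform(0.0, 2.0 * math.pi)
        new = [xi + alpha_f * math.sin(r1) * (hi - lo) for xi in x]
    return [min(max(v, lo), hi) for v in new]                 # stay inside [x_min, x_max]


rng = random.Random(1)
pbests = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(10)]
i = 0
partner = rng.choice([j for j in range(len(pbests)) if j != i])  # random collaborating partner
x_new = perturb_step(pbests[i], pbests[partner], k=100, k_max=3000,
                     bounds=(-5.0, 5.0), alpha_f=0.01, rng=rng)
print(x_new)
```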

Differential Enhancement Strategy.
The entire optimization process of PSO is driven only by the historical best positions of the individuals and of the swarm, and the evaluation results serve only as a measure of the optimization effect. The results of each iteration cannot be fed back to the population effectively, so the population cannot adjust its next operation based on the current search results. To feed each search result of PSO back to the population and keep the population active, the population must transform itself according to the optimization results, achieving adaptive adjustment. This paper therefore introduces the reconstruction probability: the probability, based on a particle's fitness value, that the particle is selected to rebuild an intermediate population. According to the reconstruction probability, some poor individuals are selected during optimization to build the intermediate population. By optimizing and enhancing these poor individuals, the particles can converge faster to the global optimal solution.
Second, all operations of the population target only excellent individuals, which tends to ignore the information of the other particles, especially poor individuals. The genes of poor and excellent individuals often differ significantly, so genes are gradually lost and the population is prone to falling into local extrema from which it cannot escape. To preserve the excellent genes of the poorer individuals in the intermediate population, this paper proposes a differential enhancement strategy, which introduces particle irrelevance based on the genetic concept of "distant breeding, hybridizing to yield advantages." Particle irrelevance is used to calculate the selection probability, which in turn identifies "distant" individuals for hybrid breeding. This ensures the inheritance of excellent genes, keeps the population active during the search, and helps the algorithm avoid local extrema.

The Reconstruction Probability.
Assuming the fitness value of the $i$th particle in generation $k$ is $f_i^k$, the probability that the particle is selected to form the intermediate population is given by equation (18), where $i = 1, 2, 3, \ldots, NP/3$ and $NP$ is the population size. Equation (18) indicates that the intermediate population is generated from the population according to the reconstruction probability: the lower a particle's fitness value, the greater its reconstruction probability, and hence the more likely it is to be selected for the intermediate population.

The Irrelevance of Particles.
Assuming that $X_{i1,n}^k$ and $X_{j1,n}^k$ are the $n$-dimensional vectors of the $i1$th and $j1$th particles in the $k$th-generation population, the irrelevance between individuals $X_{i1,n}^k$ and $X_{j1,n}^k$ is given by equation (19), where $t = 1, 2, \ldots, n$ indexes a single dimension of the $n$-dimensional vector. Equation (19) indicates that the more the variable values of two individuals differ, the greater their irrelevance.

Selection Probability.
Assuming that $P_c(X_{j1,n}^k / X_{i1,n}^k)$ is the probability that individual $X_{j1,n}^k$ is selected to undergo differential variation with $X_{i1,n}^k$, the selection probability is computed from the irrelevance values, where $np$ is the number of individuals remaining in the intermediate population apart from $X_{i1,n}^k$, $1/np$ is the average probability of an individual being selected, and $R_{avg}$, $R_{max}$, and $R_{min}$ are the average, maximum, and minimum irrelevance between the remaining individuals and the selected individual.
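Neither the irrelevance equation (19) nor the selection-probability equation survived extraction, so the sketch below uses **stand-ins** that only match the qualitative description. Irrelevance is taken as the mean absolute difference over dimensions, so it grows as individuals differ more. The selection probability is taken as proportional to irrelevance, falling back to the uniform $1/np$ when every candidate is identical to the target. The paper's formula also involves $R_{avg}$, $R_{max}$, and $R_{min}$; this sketch does not reproduce it.

```python
def irrelevance(a, b):
    """STAND-IN for Eq. (19): mean absolute difference over the n dimensions."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def selection_probs(target, others):
    """STAND-IN for the selection probability: proportional to each remaining
    individual's irrelevance to `target`; uniform 1/np if all are identical to it."""
    r = [irrelevance(target, o) for o in others]
    total = sum(r)
    if total == 0:
        return [1.0 / len(others)] * len(others)
    return [ri / total for ri in r]


probs = selection_probs([0.0, 0.0], [[1.0, 1.0], [3.0, 3.0], [0.0, 0.0]])
print(probs)  # [0.25, 0.75, 0.0]
```

The "distant relative" most likely to be paired with the target is the one that differs from it most, which is the behaviour the text describes.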
The differential enhancement strategy first uses the reconstruction probability to rebuild the intermediate population, selecting some individuals with low fitness values by roulette-wheel selection, and then randomly selects the differential individuals within the intermediate population.
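Python's `random.Random.choices` draws with replacement, with probability proportional to `weights`, which is exactly roulette-wheel (fitness-proportionate) sampling. The sketch uses it to build an intermediate population of NP/3 individuals. The weighting is a **stand-in** for Eq. (18), which did not survive extraction. Following the text's statement that lower fitness values receive larger reconstruction probability, each weight is the gap to the largest fitness plus a tiny epsilon.

```python
import random


def reconstruction_weights(fitness):
    """STAND-IN for Eq. (18): lower fitness -> larger weight. Weights are shifted by the
    largest fitness plus a tiny epsilon so that every weight is positive."""
    f_max = max(fitness)
    return [f_max - f + 1e-12 for f in fitness]


def build_intermediate(population, fitness, rng):
    """Roulette-wheel selection (with replacement) of NP/3 individuals."""
    k = max(1, len(population) // 3)
    idx = rng.choices(range(len(population)), weights=reconstruction_weights(fitness), k=k)
    return [population[i] for i in idx]


rng = random.Random(7)
pop = [[float(i), float(-i)] for i in range(9)]
fit = [sum(t * t for t in ind) for ind in pop]
inter = build_intermediate(pop, fit, rng)
print(len(inter), inter)
```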

Implementation Steps and Pseudocode of APSO-SD.
APSO-SD initializes the population randomly. After the population is generated, the current local and global optimal solutions are first calculated, the search perturbation mechanism is used to update the velocities and positions of the particles, and the population enters the local differential enhancement stage. Second, the fitness values of the updated population are calculated, and the reconstruction probability is used to select individuals with lower fitness values to form an intermediate population. Finally, the average fitness value of the initial intermediate population is calculated, the selection probability is calculated from the irrelevance of particles in the population, and two individuals that differ significantly from the individual to be mutated are selected for mutation, crossover, and selection. The mutation operation is

$$V_i^{k+1} = x_{r1}^{k} + F\left(x_{r2}^{k} - x_{r3}^{k}\right)$$

where $x_{r1}^k$ is the selected individual in the current ($k$th-generation) population, $x_{r2}^k$ and $x_{r3}^k$ are two different randomly selected individuals, $x_{r2}^k - x_{r3}^k$ is the difference vector they generate, and $F$ is a mutation operator that controls the scaling of the difference vector; reasonable scaling balances the search step size and search rate of individuals. $V_i^{k+1}$ is the mutant intermediate of $x_{r1}^k$ in the $(k+1)$th generation. The crossover operation is

$$U_{i,n}^{k+1} = \begin{cases} v_{i,n}^{k+1}, & \mathrm{rand}(0,1) < CR_i \\ x_{i,n}^{k}, & \text{otherwise} \end{cases}$$

where $CR_i$ is the crossover probability. When the random number rand(0, 1) generated for individual $i$ is less than the crossover probability, the trial vector component $U_{i,n}^{k+1}$ takes the mutant component $v_{i,n}^{k+1}$; otherwise, it inherits the parent component $x_{i,n}^k$.
The selection operation is

$$x_i^{k+1} = \begin{cases} u_i^{k+1}, & f\left(u_i^{k+1}\right) < f\left(x_i^{k}\right) \\ x_i^{k}, & \text{otherwise} \end{cases}$$

When the fitness value of the trial vector $u_i^{k+1}$ is smaller than that of the parent vector $x_i^k$, the algorithm places the trial vector in the next-generation population, and the population is successfully updated in the $k$th generation.
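The mutation, crossover, and selection steps described above match the classic DE/rand/1/bin scheme, sketched below in pure Python. Two details are assumptions. First, $x_{r1}$ is drawn at random here. If the text intends $x_{r1}$ to be the selected individual itself, the scheme becomes DE/current/1. Second, the usual `n_rand` index is added so that each trial vector takes at least one mutant component; the text's crossover rule does not mention it.

```python
import random


def de_trial(pop, i, F, CR, rng):
    """DE/rand/1/bin trial vector for individual i (needs at least 4 individuals).

    Mutation:  v = x_r1 + F * (x_r2 - x_r3), r1, r2, r3 distinct and different from i.
    Crossover: u_n = v_n if rand < CR or n == n_rand, else x_i,n.
    """
    r1, r2, r3 = rng.sample([j for j in range(len(pop)) if j != i], 3)
    x = pop[i]
    v = [a + F * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]
    n_rand = rng.randrange(len(x))   # forces at least one component to come from v
    return [vn if (rng.random() < CR or n == n_rand) else xn
            for n, (vn, xn) in enumerate(zip(v, x))]


def de_select(x, u, f):
    """Greedy selection: the trial vector survives only if its fitness is smaller."""
    return u if f(u) < f(x) else x


def sphere(z):
    return sum(t * t for t in z)


rng = random.Random(3)
pop = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(8)]
new_pop = [de_select(x, de_trial(pop, i, F=0.5, CR=0.9, rng=rng), sphere) for i, x in enumerate(pop)]
print(sum(map(sphere, new_pop)) <= sum(map(sphere, pop)))  # True
```

Greedy selection guarantees that no individual gets worse from one generation to the next. That property is what lets the enhanced intermediate population be compared safely against the initial one.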
After the local differential enhancement of all individuals in the intermediate population is complete, the average fitness value of the intermediate population is recalculated and compared with that of the initial intermediate population. If the average fitness has improved, the algorithm returns to particle swarm optimization, replaces the initial intermediate population with the enhanced one, and continues the particle swarm optimization. Otherwise, it keeps applying local differential enhancement to these intermediate individuals until their average fitness improves or the number of enhancement rounds reaches a preset value, and then ends the local differential enhancement. When the number of generations of the whole particle swarm optimization reaches a preset value, APSO-SD terminates.
The implementation steps of the APSO-SD algorithm are as follows:
(1) Step 1: Initialize the population.
(2) Step 2: Calculate the fitness value of each particle from its current velocity and position, and obtain the individual historical best position Pbest and the global best position Gbest.
(3) Step 3: Check whether the condition rand < P_1 is satisfied. If it is, update the particle position by equation (16); otherwise, update it by equation (17). Recalculate the fitness value of each particle and update Gbest and Pbest. Otherwise, continue with the particle swarm optimization algorithm.
(10) Step 10: Determine whether the termination condition of the iteration is met. If it is, jump to the next step; otherwise, return to Step 3 for the next iteration.
(11) Step 11: Output the global optimal value and end the algorithm.
The pseudocode of the APSO-SD algorithm is shown in Table 2.

5G User Time Delay Prediction Based on the BiLSTM Neural Network Optimized by APSO-SD
For BiLSTM networks, the choice of structural parameters, such as the number of hidden layers, the weights, the number of hidden-layer cells, and the learning rate, is crucial to model performance. Many researchers set these parameters by experience or by trial and error, which makes the robustness and accuracy of the model unreliable. Therefore, this paper optimizes the structural parameters of the BiLSTM network with a particle swarm algorithm, which has simple principles, low complexity, and fast convergence and is well suited to real-valued problems.
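One way to let a particle swarm search these structural parameters is to decode each particle's position into a hyperparameter set. The sketch below is hypothetical. The parameter list, the ranges, the log-scale learning rate, and the $[0, 1]$ encoding are illustration choices, not the paper's configuration. Encoding initial weights as well would simply lengthen the position vector.

```python
# HYPOTHETICAL search space: the text names hidden layers, hidden-layer cells, weights and
# learning rate as candidates, but these particular ranges are illustration choices.
SEARCH_SPACE = [
    ("units_layer1", 16, 256, int),
    ("units_layer2", 16, 256, int),
    ("log10_learning_rate", -4.0, -2.0, float),
]


def decode(position):
    """Map a particle position in [0, 1]^d to a dict of BiLSTM hyperparameters."""
    params = {}
    for z, (name, lo, hi, kind) in zip(position, SEARCH_SPACE):
        z = min(max(z, 0.0), 1.0)          # clip the coordinate to [0, 1]
        value = lo + z * (hi - lo)
        params[name] = int(round(value)) if kind is int else value
    params["learning_rate"] = 10.0 ** params.pop("log10_learning_rate")
    return params


print(decode([0.5, 0.25, 0.5]))
```

In a full loop, each particle's fitness would be the validation error of a BiLSTM trained with `decode(position)`. That training step is not shown.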

Construction of the BiLSTM Model.
The BiLSTM model used for the experiments in this paper is shown in Figure 5.
(1) BiLSTM layers: two BiLSTM layers are used so that their combined forward and backward capabilities can be fully exploited to enhance the model's learning capability.
To reduce the influence of the randomness of the PSO algorithms on the experimental results, the average of 30 independent runs is used to evaluate the performance of each algorithm. For all benchmark functions, experiments are run with search dimensions of 30 and 50, and the maximum number of iterations per run is 3000.

Function Optimization Results.
In this section, the PSO algorithms are compared on six benchmark functions, and the mean optimization results of each algorithm are shown in Tables 4 and 5.
For both D = 30 and D = 50, the quality and stability of the APSO-SD solutions are superior to those of the other seven algorithms on most functions, which suggests that the differential enhancement strategy maintains population diversity and lets APSO-SD keep searching for the optimal solution. The accuracy of its solutions is also better than that of the other algorithms, which suggests that the search perturbation mechanism helps maintain the vitality of the population and the diversity of the particles.

The Effectiveness Verification of Individual Improvement Measures
(1) Search Perturbation Mechanism. To verify the effectiveness of the search perturbation mechanism (SPM), this section uses test functions f2 and f6 to compare the basic PSO, APSO-SD-SPM (APSO-SD with the SPM removed), and APSO-SD. The experimental results are shown in Table 6.
Figure 6: Flowchart of the APSO-SD-BiLSTM model.
Table 6 shows that the variance of the fitness value of PSO on functions f2 and f6 is 6.57E+02 and 3.44E+06 times that of APSO-SD, respectively. When operating for 1000, In conclusion, the optimization results of APSO-SD-SPM are significantly better than those of the basic PSO, but a certain gap remains between APSO-SD-SPM and APSO-SD. This is because both APSO-SD and APSO-SD-SPM generate new individuals through crossover and mutation, but APSO-SD additionally uses the search perturbation mechanism to increase the diversity of the particle population in each iteration, which gives it better optimization results.
(2) Differential Enhancement Strategy. To verify the effectiveness of the differential enhancement strategy (DES), this section uses test functions f2 and f6 to compare the basic PSO, APSO-SD-DES (APSO-SD with the DES removed), and APSO-SD. The experimental results are shown in Table 7. Table 7 shows that the optimization results of APSO-SD-DES are significantly better than those of the basic PSO, but a certain gap remains between APSO-SD-DES and APSO-SD. This is because both APSO-SD and APSO-SD-DES increase the diversity of the particle population through the search perturbation mechanism, but APSO-SD also generates two populations through DE and PSO in each iteration and compares them to select the best Pbest and Gbest, thus achieving better optimization results.
The results of the two ablation experiments above show that the search perturbation mechanism and the local distantly related differential enhancement strategy proposed in this paper have significant advantages in both the convergence rate of the algorithm and the maintenance of population diversity.

T-Test and Friedman Test.
In the comparative analysis of swarm intelligence optimization algorithms, researchers usually use the T-test [30], the Friedman test [31], the Wilcoxon signed-rank test [32], and the Mann-Whitney U test [33] to assess whether the differences between algorithms are significant. Following this practice, this article uses the T-test and the Friedman test to compare the performance of eight algorithms on six test functions. The experimental results are shown in Table 8.
Here, "+" indicates that APSO-SD outperforms the compared algorithm, "=" indicates that there is no significant difference between the two, "−" indicates that APSO-SD is inferior, and w/t/l gives the numbers of wins, ties, and losses of APSO-SD. The T-test results show that, compared with PSO-HS, APSO-SD performs better on four test functions, with no significant difference on the other two. Compared with PSONHM, APSO-SD performs better on two test functions, shows no difference on three, and performs worse on one. Compared with CS-PSO, APSO-SD performs better on four test functions, shows no difference on one, and performs worse on one. The performance difference between SE-PSO and APSO-SD is significant: APSO-SD performs better on four test functions, shows no difference on one, and performs worse on one. Compared with AERPSO, APSO-SD performs better on five test functions and worse on one. Compared with AdPSO, APSO-SD performs better on five test functions and shows no difference on one. The Friedman test results show that APSO-SD has the smallest mean rank and therefore the best overall performance among the compared algorithms.
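The w/t/l summaries and the Friedman mean ranks reported in Table 8 can be computed from raw per-function scores with a few lines of code. The sketch below is a minimal pure-Python illustration, assuming that lower scores are better and using the textbook Friedman chi-square statistic without a tie correction; the T-test itself (which requires a t-distribution, for example from SciPy) and the significance thresholds are left out, and all function names are hypothetical.

```python
def mean_ranks(results):
    """results maps algorithm -> scores on each test function (lower is better).
    Ranks the algorithms on every function (1 = best, ties share the average
    rank) and returns each algorithm's mean rank."""
    algos = list(results)
    n_funcs = len(next(iter(results.values())))
    totals = dict.fromkeys(algos, 0.0)
    for f in range(n_funcs):
        ordered = sorted(algos, key=lambda a: results[a][f])
        i = 0
        while i < len(ordered):
            j = i
            # extend j over algorithms tied with ordered[i] on this function
            while j + 1 < len(ordered) and results[ordered[j + 1]][f] == results[ordered[i]][f]:
                j += 1
            for a in ordered[i:j + 1]:
                totals[a] += (i + j) / 2 + 1  # average of ranks i+1 .. j+1
            i = j + 1
    return {a: t / n_funcs for a, t in totals.items()}

def friedman_statistic(ranks, n_funcs):
    """Friedman chi-square from mean ranks R_j of k algorithms over N functions
    (no tie correction): 12N / (k(k+1)) * (sum R_j^2 - k(k+1)^2 / 4)."""
    k = len(ranks)
    s = sum(r * r for r in ranks.values())
    return 12 * n_funcs / (k * (k + 1)) * (s - k * (k + 1) ** 2 / 4)

def wtl(signs):
    """Summarize per-function T-test outcomes ('+', '=', '-') as 'w/t/l'."""
    return f"{signs.count('+')}/{signs.count('=')}/{signs.count('-')}"
```

For example, the PSO-HS comparison described above (four wins and two ties) corresponds to `wtl(list("++++=="))`, which returns `"4/2/0"`; the algorithm with the smallest value from `mean_ranks` is the one ranked best by the Friedman test.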
Based on the previous experimental results, the APSO-SD algorithm obtains high-quality optimization results on both unimodal and multimodal functions. Compared with the other algorithms, it has better stability and search ability: it alleviates the conflict between premature convergence and convergence speed and effectively balances global and local search.

Dataset Construction.
In this section, the reflection points are first projected onto the three coordinate planes of a three-dimensional space. Then, the region of interest is defined with the base station at its center: it extends 1000 m in both the positive and negative directions along the X and Y axes, and along the Z axis from the ground to 100 m above the base station. Finally, each of the three planes is divided into a 64 × 64 grid to determine the cells in which the reflection points are located, and the resulting grids form the dataset for neural network training.
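The gridding step can be sketched as follows. This is a minimal illustration under stated assumptions: the three planes are taken to be the XY, XZ, and YZ coordinate planes, each cell simply counts the reflection points that fall in it (the section does not say which per-cell value is stored), and the base-station height `h_bs` and the function name `grid_counts` are hypothetical.

```python
def grid_counts(points, bs_xy, h_bs, half_width=1000.0, z_margin=100.0, n=64):
    """Count reflection points per cell of an n x n grid on each of the
    XY, XZ and YZ planes.

    points: iterable of (x, y, z) in metres; bs_xy: base-station (x, y);
    h_bs: base-station height above ground (hypothetical parameter).
    """
    x_rng = (bs_xy[0] - half_width, bs_xy[0] + half_width)
    y_rng = (bs_xy[1] - half_width, bs_xy[1] + half_width)
    z_rng = (0.0, h_bs + z_margin)  # from the ground up to z_margin above the BS

    def cell(v, lo_hi):
        lo, hi = lo_hi
        if not lo <= v <= hi:
            return None
        return min(int((v - lo) / (hi - lo) * n), n - 1)  # upper edge -> last cell

    planes = {p: [[0] * n for _ in range(n)] for p in ("xy", "xz", "yz")}
    for x, y, z in points:
        ix, iy, iz = cell(x, x_rng), cell(y, y_rng), cell(z, z_rng)
        if None in (ix, iy, iz):
            continue  # point lies outside the modelled box
        planes["xy"][ix][iy] += 1
        planes["xz"][ix][iz] += 1
        planes["yz"][iy][iz] += 1
    return planes
```

Each plane then becomes a 64 × 64 array that can be stacked as an input channel; other per-cell values (for example, received power) could replace the count without changing the indexing.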

Evaluation Indicators.
The relative error in time delay is given by the following equation, where $X_i$ is the predicted value of the network model, $Y_i$ is the actual value, and $n$ is the number of users. The relative error in the number of message packet retransmissions is expressed as follows, where $N_{\max}$ is the maximum value of the sum of the numbers of message packet retransmissions. The mean and standard deviation of the errors are expressed as follows:
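The equations referred to above are not reproduced in this section. Purely as an illustration, the sketch below assumes common forms for these metrics: the delay relative error as the mean of |X_i − Y_i| / Y_i over the n users, the retransmission relative error normalized by N_max (which avoids dividing by a zero retransmission count), and the population mean and standard deviation of the signed error X_i − Y_i. These forms and the function names are assumptions and may differ from the paper's definitions.

```python
import math

def delay_relative_error(pred, actual):
    """Assumed form: mean over the n users of |X_i - Y_i| / Y_i."""
    n = len(actual)
    return sum(abs(x - y) / y for x, y in zip(pred, actual)) / n

def retx_relative_error(pred, actual, n_max):
    """Assumed form: mean of |X_i - Y_i| / N_max, so that users with zero
    retransmissions do not cause a division by zero."""
    n = len(actual)
    return sum(abs(x - y) for x, y in zip(pred, actual)) / (n * n_max)

def error_mean_std(pred, actual):
    """Mean and population standard deviation of e_i = X_i - Y_i."""
    errs = [x - y for x, y in zip(pred, actual)]
    mu = sum(errs) / len(errs)
    sd = math.sqrt(sum((e - mu) ** 2 for e in errs) / len(errs))
    return mu, sd
```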

Parameter Selection.
In the APSO-SD-BiLSTM model, the parameter settings of the APSO-SD algorithm are the same as those in Section 5.1. The BiLSTM neural network has 5 input-layer nodes, 2 hidden layers, and 3 output-layer nodes. This section trains the APSO-SD-BiLSTM model with different time steps and batch sizes, and Figure 8 shows the training results. As the algorithm iterates, the time step and batch size gradually converge to their optimal values: Figure 8 shows that the optimal batch size is 2 and the optimal time step is 5.
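To make the time-step and batch-size parameters concrete: with 5 input features per step, a time step of 5 means that each training sample is a 5 × 5 window, and a batch size of 2 groups two such windows per parameter update. The sketch below assumes that each window's target is the 3-value label aligned with the window's last step; the text does not state how targets are aligned, and `make_windows` and `batches` are hypothetical helpers.

```python
def make_windows(features, targets, time_step=5):
    """Hypothetical windowing: input = features[i : i + time_step]
    (time_step x 5 values), target = targets[i + time_step - 1] (3 values),
    i.e. the label aligned with the last step of the window."""
    if len(features) != len(targets):
        raise ValueError("features and targets must be aligned")
    return [(features[i:i + time_step], targets[i + time_step - 1])
            for i in range(len(features) - time_step + 1)]

def batches(samples, batch_size=2):
    """Split the list of windows into consecutive mini-batches."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
```

With 8 consecutive samples, for instance, a time step of 5 yields 4 overlapping windows, which a batch size of 2 splits into 2 mini-batches.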

Wireless Side Network Latency Prediction.
In this section, network latency is first predicted directly with the APSO-SD-BiLSTM model. Then, the number of message packet retransmissions is predicted by the APSO-SD-BiLSTM model as a regression task and used to predict network latency indirectly. Finally, the number of message packet retransmissions is predicted as a classification task and used to predict network latency indirectly. The experiments use 90,000 training samples and 9,000 prediction samples.
(1) Direct Prediction of User Network Latency. In this section, the APSO-SD-BiLSTM model is used to directly predict the network delay, and the predicted delay is shown in Figure 9. Figure 9 shows that the predicted values closely follow the trend of the actual values.
The evaluation metrics for the network delay prediction results are shown in Table 9. As Table 9 shows, both the relative error and the mean error are small.
(2) Indirect Prediction of User Network Latency for Regression Tasks. The number of message packet retransmissions predicted by the APSO-SD-BiLSTM model is treated as a regression task to indirectly predict network latency. The prediction results are shown in Figure 10. As can be seen from Figure 10, the trend of the predicted and actual values is generally consistent.
The evaluation metrics for the prediction results are shown in Table 10. Table 10 shows that, for the regression-oriented indirect prediction of network delay, the relative error and the mean error are larger than those of the direct prediction.
(3) Indirect Prediction of User Network Latency for Classification Tasks. The number of message packet retransmissions predicted by the APSO-SD-BiLSTM model is treated as a classification task to indirectly predict network latency. The prediction results are shown in Figure 11. As can be seen from Figure 11, the trend of the predicted and actual values is generally consistent, and only a few points differ.
The evaluation metrics for the prediction results are shown in Table 11. Table 11 shows that the relative error and the mean error of the classification-oriented indirect prediction of network time delay are slightly better than those of the regression-oriented indirect prediction.
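The two indirect approaches differ in how the retransmission count is read off the network output. A hedged sketch follows, assuming a single continuous output for the regression variant and one score per possible count (0 to `n_max`) for the classification variant. How the count is then converted to latency is not described in this section, so that step is omitted, and both function names are hypothetical.

```python
import math

def decode_regression(y_hat, n_max):
    """Regression head: one real-valued output, rounded half-up to the nearest
    count and clipped to the valid range 0..n_max."""
    return min(max(math.floor(y_hat + 0.5), 0), n_max)

def decode_classification(scores):
    """Classification head: one score per possible count 0..len(scores)-1;
    the predicted count is the index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)
```

Rounding is written as `floor(y + 0.5)` rather than Python's built-in `round`, which rounds halves to the nearest even integer.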
(4) Error Analysis for Predicting Network Delay. The relative error probability distributions for the direct prediction of network latency and for the indirect prediction via the number of message packet retransmissions are shown in Figures 12 and 13, respectively. Figure 12 shows that the relative error of the direct prediction of network latency is concentrated around 1%. Figure 13 shows that the relative errors of the regression-oriented and classification-oriented indirect predictions are not significantly different, and for both, the relative error in the number of message packet retransmissions is concentrated around 10%.
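A relative-error probability distribution like those in Figures 12 and 13 can be estimated by binning the per-sample relative errors. The minimal sketch below assumes 1% bins (the bin width used in the figures is not stated) and uses hypothetical function names; the most probable bin indicates where the errors are concentrated.

```python
def error_distribution(rel_errors, bin_width=0.01):
    """Empirical probability of each relative-error bin; key k stands for the
    interval [k * bin_width, (k + 1) * bin_width). Values lying exactly on a
    bin edge may fall into the lower bin because of floating-point rounding."""
    counts = {}
    for e in rel_errors:
        k = int(e // bin_width)
        counts[k] = counts.get(k, 0) + 1
    n = len(rel_errors)
    return {k: c / n for k, c in sorted(counts.items())}

def mode_bin(dist):
    """Bin with the highest probability, i.e. where the errors concentrate."""
    return max(dist, key=dist.get)
```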
The results of the experimental comparison are as follows: (1) Compared with BiLSTM, PSO-BiLSTM, APSO-BiLSTM, IPSO-BiLSTM, and ASPSO-BiLSTM, APSO-SD-BiLSTM achieves better results in all three evaluation metrics. This is mainly because APSO-SD has better optimization performance than the PSO-based alternatives and can find parameters that give the BiLSTM neural network better prediction performance. (2) Compared with APSO-SD-LSTM and APSO-SD-RNN, the prediction results of APSO-SD-BiLSTM improve substantially in all three evaluation metrics. This is mainly because, compared with LSTM and RNN, BiLSTM better captures the dependence of time-delay data on both preceding and following moments, which helps improve the prediction accuracy for temporal data.
(3) Compared with existing models such as CNN, CNNDP, AMCA, PABAFT, OCEAN, CNN-LSTM, and CNN-KF, the APSO-SD-BiLSTM model achieves the highest prediction accuracy and can effectively predict user time delay. This is mainly because the APSO-SD-BiLSTM model makes full use of the dependence of user latency data on the latency data of the preceding and following time steps.
Figure 9: Results of direct network delay prediction (predicted value vs. real data).

Conclusions and Future Work
In this paper, a 5G user time delay prediction model based on the BiLSTM neural network optimized by APSO-SD is proposed. First, a large amount of delay data is obtained using a delay simulation model that fuses the ray-tracing model and the statistical channel model. Then, a user ray data feature model based on 3D stereo mapping is proposed. Finally, the 5G user network time delay prediction model (APSO-SD-BiLSTM) is built on the BiLSTM neural network optimized by APSO-SD. The experimental results show that the APSO-SD-BiLSTM model has better prediction accuracy than existing prediction models and can effectively predict network delay.
The APSO-SD-BiLSTM model proposed in this paper has two main limitations. First, the proposed 5G user ray data feature model considers only the users' temporal characteristics; second, the APSO-SD-BiLSTM model optimizes only the structural parameters of BiLSTM. Notwithstanding these limitations, the APSO-SD-BiLSTM model can still effectively predict 5G user time delay.
Future research can proceed in the following three directions: (1) building on the extracted ray reflection point features, fusing the temporal and spatial features of 5G user time delay data and performing user time delay prediction based on the fused features; (2) using APSO to jointly optimize the initial weight parameters and the structural parameters of BiLSTM, and then using the optimized BiLSTM to further improve the accuracy of user latency prediction; (3) studying the differences between 5G and 6G user delay characteristics, and predicting 6G user delay by combining swarm intelligence algorithms with neural networks.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.