Research on Biological Population Evolutionary Algorithm and Individual Adaptive Method Based on Quantum Computing

Quantum computers have been developed on the foundations of classical computing. For some large-scale parallel problems, a quantum computer is simpler and faster than a traditional computer. Today, physical qubit devices still face many limitations, but classical computers can simulate quantum computing in many ways, among the most effective of which are studies of quantum supremacy and quantum algorithms. Ensuring computational efficiency, accuracy, and precision is of great significance for the study of large-scale quantum computing. Compared with other algorithms, the genetic algorithm has several advantages, such as strong adaptability and a global optimization ability, so it can be applied more widely. From the research in Section 4, we conclude that the variance of A2C is clearly smaller than that of PPO, and therefore that A2C has better robustness.


Introduction
The purpose of this paper is to summarize the important ideas and results shaping the field of quantum computation and quantum information. We first review the characteristics and principles of quantum mechanics that enable more efficient and secure information processing, and then discuss their applications in general information theory, cryptography, algorithms, computational complexity, and error correction [1]. If the bits of a computer are one day reduced to the size of a single atom, quantum mechanical effects may profoundly change the nature of computing itself [2]. A special issue on mathematical structures in computer science contains several contributions related to modern quantum information and quantum computing [3]. A computer is usually regarded as a general-purpose computing device; that is, it is believed to be able to simulate any physical computing device at a computational time cost of at most a polynomial factor [4]. In topological quantum computing, quantum information is stored in states containing multiple quasiparticles, which have topological degeneracy. The unitary gate operations required by quantum computation are realized by braiding quasiparticles and then measuring their joint states. The fault tolerance of a topological quantum computer comes from the nonlocal encoding of the quasiparticle states, which makes the quasiparticles immune to errors caused by local disturbances [5]. Decoherence in quantum computers can be expressed with semigroup methods; it has been proved that certain subspaces are stable under perturbation and that universal quantum computation can be carried out within them [6]. It has also been shown that the combination of a genetic algorithm and a K-nearest neighbor classifier can effectively extract protein-water binding information from X-ray crystallographic protein structure data [7]. Combining the advantages of the traditional real-coded genetic algorithm and the structured genetic algorithm, a real-coded structured genetic algorithm has been proposed for biological logic circuit design problems [8]. An interference cancellation system has been designed using the LM algorithm of an artificial neural network with adaptive learning ability, with the number of hidden-layer neurons, the learning rate, and the momentum factor of the network optimized by a swarm intelligence algorithm [9]. Other work uses the main biological learning mechanisms, rather than machine learning, to adjust synaptic weights and applies evolutionary algorithms to optimize the learning meta-parameters; these algorithms are implemented in parallel and use the island-model approach to achieve sufficient speed [10]. Biological evolutionary algorithms have also been combined with other artificial intelligence methods, for example by using the adaptive, self-learning ability of a support vector machine improved by genetic computing to predict early gear failure [11]. Another algorithm is based on the concept of variable step-size control; it has been proved that a known weakness of variable step-size control can be avoided by relatively small changes to the algorithm, and that the derandomization scheme of variable step-size control helps to reliably adapt a single step size [12]. One study analysed the interaction networks underlying individual differences in perceived odor quality by assessing the effects of self-adaptation and cross-adaptation on reported odor intensity and quality in 18 human subjects [13].
In a generalized evolutionary minority game model, agents are divided into several groups and the performance of each agent is averaged within its group; three distinct effects arise in the generalized model: (1) a group-average effect, (2) a left-right asymmetry effect, and (3) a self-interaction effect [14]. Statistical results on night-shift nurses show that a positive coping style significantly and positively predicts positive self-suggestion about sleep while negatively predicting the perceived importance of sleep, and that a negative coping style negatively predicts positive self-suggestion about sleep; coping style therefore predicts the cognitive aspect of individual sleep adaptation well [15].

Genetic Algorithm
2.1. Standard Genetic Algorithm. The standard genetic algorithm simulates the learning process of a whole population and has two important characteristics: a population-based search strategy and information exchange among individuals in the population. First, a specific fitness function is created to test the adaptability of individuals. The algorithm then proceeds as follows:

(a) Randomly select N starting points to form a population. The total number N is the size of the whole population. For convenience, each individual in the population is represented in a specific coding form; an encoded element may represent a specific feature of a chromosome, and what it represents varies with the type of problem to be solved.

(b) Choose suitable individuals according to certain selection requirements: select the M individuals with the best fitness to form a new population that breeds the next generation.

(c) Among the selected M individuals, carry out pairwise grouping, then hybridize and recombine each group. Hybridization (crossover) is the main way for two chromosomes to exchange information.

(d) If required, select some individuals from the M according to a given mutation probability and apply the mutation operation to them. Mutation plays a very important role in the genetic algorithm's ability to find the optimal solution.

(e) Judge whether the termination condition is met; if not, return to step (b) and continue, and if it is met, end the process. The basic workflow is shown in Figure 1, and a minimal code sketch of the whole loop is given below.

2.2. Adaptive Genetic Algorithm. The central idea is to adjust Pc and Pm at any time according to the actual state of the population during evolution. When the population converges, reduce Pc and increase Pm, that is, increase the mutation probability and reduce the crossover probability, thereby preserving population diversity and avoiding premature convergence; when the population diverges, increase Pc and reduce Pm, that is, increase the crossover probability and reduce the mutation probability, so that the population tends to converge and the convergence rate of the algorithm increases.
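To make steps (a)-(e) of Section 2.1 concrete, the following is a minimal Python sketch of the standard genetic algorithm loop. The bit-string encoding, the ones-counting fitness function, and all parameter values are illustrative assumptions, not part of the original description.

```python
# Minimal sketch of the standard genetic algorithm loop (steps (a)-(e)).
import random

def fitness(individual):
    # Placeholder fitness: count of 1-bits; replace with a problem-specific function.
    return sum(individual)

def evolve(n=50, m=20, length=32, pc=0.8, pm=0.01, generations=100):
    # (a) Randomly create the initial population of N individuals.
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(generations):
        # (b) Select the M fittest individuals to breed the next generation.
        parents = sorted(population, key=fitness, reverse=True)[:m]
        offspring = []
        while len(offspring) < n:
            p1, p2 = random.sample(parents, 2)
            # (c) Pairwise hybridization (one-point crossover) with probability pc.
            if random.random() < pc:
                cut = random.randint(1, length - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # (d) Mutation: flip each bit with probability pm.
            child = [b ^ 1 if random.random() < pm else b for b in child]
            offspring.append(child)
        # (e) Repeat with the new population until the termination condition is met.
        population = offspring
    return max(population, key=fitness)

best = evolve()
print(fitness(best))
```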

The index that defines the degree of premature convergence of the population is $\Delta$. Let $F^t_{\max}$ be the fitness of the optimal individual at generation $t$ and $\bar{F}^t$ be the average fitness of the population; the difference $\Delta$ is defined as

$$\Delta = F^t_{\max} - \bar{F}^t. \tag{1}$$

The degree of premature convergence of the population can be judged from the size of formula (1). Based on the index $\Delta$, a new strategy is proposed in which $\Delta$ influences the values of the crossover probability $P_c$ and the mutation probability $P_m$:

$$P_c = \frac{1}{1 + e^{-k_1 \Delta}}, \tag{2}$$

$$P_m = \frac{1}{1 + e^{k_2 \Delta}}, \tag{3}$$

where $k_1, k_2 > 0$. Since $\Delta$ is always greater than or equal to 0, $P_c$ ranges over $[0.5, 1]$ and $P_m$ ranges over $[0, 0.5]$. These formulas show how $\Delta$ affects $P_c$ and $P_m$ during evolution: when $\Delta$ increases, $P_c$ increases and $P_m$ decreases, and the ability to develop excellent individuals increases.
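A small helper implementing formulas (1)-(3) might look as follows; the default values of $k_1$ and $k_2$ are illustrative assumptions.

```python
# Adaptive crossover/mutation probabilities from the premature-convergence index.
import math

def adaptive_rates(f_max, f_avg, k1=1.0, k2=1.0):
    # Premature-convergence index: gap between best and average fitness, formula (1).
    delta = f_max - f_avg  # delta >= 0 by construction
    # Formulas (2) and (3): pc in [0.5, 1) grows with delta,
    # pm in (0, 0.5] shrinks with delta.
    pc = 1.0 / (1.0 + math.exp(-k1 * delta))
    pm = 1.0 / (1.0 + math.exp(k2 * delta))
    return pc, pm

print(adaptive_rates(f_max=0.9, f_avg=0.6))
```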

Quantum Machine Learning Algorithms
3.1. Quantum Nearest Neighbor Algorithm. The QNN algorithm first loads the training sample set $v_j$ ($j = 1, 2, \cdots, M$) and the sample to be classified, $v_0 := u$, onto quantum states by probability amplitude encoding:

$$|v_j\rangle = \frac{1}{\|v_j\|} \sum_i v_{ji} |i\rangle, \qquad |u\rangle = \frac{1}{\|u\|} \sum_i v_{0i} |i\rangle,$$

where $v_{ji} = r_{ji} e^{i\varphi_{ji}}$, $r_{\max}$ is an upper bound on the magnitudes, and each $r_{ji}$ is a number greater than 0. The squared inner product of $|v_j\rangle$ and $|u\rangle$ can then be obtained as

$$|\langle u | v_j \rangle|^2 = \frac{\left| \sum_i v_{0i}^{*} v_{ji} \right|^2}{\|u\|^2 \, \|v_j\|^2}.$$

Although the Euclidean distance between quantum states is also widely used, experimental results show that the Euclidean distance method is more complicated than the inner product method.
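To make the decision rule concrete, the following classical numpy sketch evaluates the same squared inner product that the quantum routine estimates and classifies by its maximum; the toy vectors and labels are assumptions for illustration.

```python
# Classical analogue of the QNN decision rule: classify u by the largest
# squared inner product |<u|v_j>|^2 over amplitude-normalized samples.
import numpy as np

def qnn_classify(u, samples, labels):
    u = u / np.linalg.norm(u)  # normalize as amplitude encoding would
    sims = []
    for v in samples:
        v = v / np.linalg.norm(v)
        sims.append(abs(np.vdot(u, v)) ** 2)  # squared inner product
    return labels[int(np.argmax(sims))]

samples = [np.array([1.0, 0.2]), np.array([0.1, 1.0])]
labels = ["class A", "class B"]
print(qnn_classify(np.array([0.9, 0.3]), samples, labels))
```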
3.2. Quantum Support Vector Machine Algorithm. The earliest combination of quantum computing and the support vector machine used the quadratic speedup of Grover's search algorithm to handle the optimization computation of the SVM model. The QSVM algorithm first encodes the $j$ feature components of each data sample point $x_i = (x_{i1}, x_{i2}, \cdots, x_{ij})$, $j = 1, 2, \cdots, m$, into quantum states by probability amplitude encoding:

$$|x_i\rangle = \frac{1}{|x_i|} \sum_{k=1}^{m} x_{ik} |k\rangle,$$

where $|x_i|^{-1}$ normalizes the vector and $m$ is the feature dimension. The quantum state of the training set is prepared as

$$|\chi\rangle = \frac{1}{\sqrt{N_\chi}} \sum_{i=1}^{n} |x_i| \, |i\rangle |x_i\rangle,$$

where $N_\chi = \sum_{i=1}^{n} |x_i|^2$ and the $x_i$ are the training samples. The inner products $K_{ij} = x_i \cdot x_j$ of the training data set can be obtained from the partial trace of the density matrix $|\chi\rangle\langle\chi|$. The optimization can then be transformed into solving a linear system of equations

$$F \begin{pmatrix} b \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ y \end{pmatrix}, \qquad F = \begin{pmatrix} 0 & \vec{1}^{\,T} \\ \vec{1} & K + \gamma^{-1} I \end{pmatrix},$$

where $K_{ij} = x_i \cdot x_j$ is the kernel matrix and $\gamma$ is a regularization parameter; $(b, \alpha)^T$ can be obtained by solving this system with the HHL algorithm. Finally, the quantum state $|\tilde{u}\rangle$ and the query state $|\tilde{x}\rangle$ of the sample to be classified are constructed, and the class of $|\tilde{x}\rangle$ is obtained by measuring the similarity between the two quantum states with a controlled swap gate:

$$|\tilde{u}\rangle = \frac{1}{\sqrt{N_{\tilde{u}}}} \Big( b|0\rangle|0\rangle + \sum_{k=1}^{M} \alpha_k |x_k| \, |k\rangle |x_k\rangle \Big),$$

$$|\tilde{x}\rangle = \frac{1}{\sqrt{N_{\tilde{x}}}} \Big( |0\rangle|0\rangle + \sum_{k=1}^{M} |x| \, |k\rangle |x\rangle \Big),$$

where $N_{\tilde{u}} = b^2 + \sum_{k=1}^{M} \alpha_k^2 |x_k|^2$ and $N_{\tilde{x}} = M|x|^2 + 1$.
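For illustration, the following classical numpy sketch assembles and solves the same linear system directly, where a quantum computer would use HHL; the value of gamma and the toy data are assumptions.

```python
# Classical solve of the least-squares SVM system that QSVM hands to HHL.
import numpy as np

def ls_svm_solve(X, y, gamma=1.0):
    m = len(y)
    K = X @ X.T                        # kernel matrix K_ij = x_i . x_j
    F = np.zeros((m + 1, m + 1))
    F[0, 1:] = 1.0                     # top row: 1^T
    F[1:, 0] = 1.0                     # left column: 1
    F[1:, 1:] = K + np.eye(m) / gamma  # K + gamma^{-1} I
    rhs = np.concatenate(([0.0], y))
    b_alpha = np.linalg.solve(F, rhs)  # HHL would prepare |b, alpha> instead
    return b_alpha[0], b_alpha[1:]     # offset b and coefficients alpha

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0])
b, alpha = ls_svm_solve(X, y)
print(b, alpha)
```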

3.3. Quantum Linear Regression. The purpose of the quantum linear regression algorithm is to use a quantum algorithm to solve for the optimal fitting parameter $w$:

$$w = (X^\dagger X)^{-1} X^\dagger y, \tag{14}$$

where $y = (y_1, y_2, \cdots, y_n)^T$ and $X = (x_1, x_2, \cdots, x_n)^T$ is the data matrix. To compute $w$ more conveniently, formula (14) is decomposed into two formulas that are solved separately, namely $y' = X^\dagger y$ and $w = (X^\dagger X)^{-1} y'$. For $y' = X^\dagger y$, the vector $y$ is first loaded onto a quantum state by probability amplitude encoding:

$$|y\rangle = \frac{1}{\|y\|} \sum_i y_i |i\rangle.$$

Since the matrix $X$ is not Hermitian, it is transformed into a Hermitian matrix:

$$I(X) = \begin{pmatrix} 0 & X \\ X^\dagger & 0 \end{pmatrix}.$$

At this point, $y' = X^\dagger y$ can be transformed into $|y'\rangle = I(X)|y\rangle$ and solved. For $w = (X^\dagger X)^{-1} y'$, the matrix $(X^\dagger X)^{-1}$ likewise needs to be transformed into the Hermitian matrix $I((X^\dagger X)^{-1})$, and the problem becomes solving $|w\rangle = I((X^\dagger X)^{-1})|y'\rangle$. Solving $|y'\rangle = I(X)|y\rangle$ and $|w\rangle = I((X^\dagger X)^{-1})|y'\rangle$ with quantum algorithms then yields the quantum state $|w\rangle$ of the fitting parameters.
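A classical numpy counterpart of this two-step decomposition is shown below; the toy data matrix and targets are assumptions for illustration.

```python
# Classical analogue of quantum linear regression: y' = X^dagger y,
# then w = (X^dagger X)^{-1} y', computed directly instead of in
# amplitude-encoded form.
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # data matrix
y = np.array([1.1, 1.9, 3.2])                        # target values

y_prime = X.conj().T @ y                        # y' = X^dagger y
w = np.linalg.solve(X.conj().T @ X, y_prime)    # w = (X^dagger X)^{-1} y'
print(w)  # fitted parameters
```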

3.4. Quantum K-Means Algorithm. In 2013, a quantum version of the nearest-centroid algorithm was proposed; it can be regarded as a subroutine of the quantum K-means algorithm and can accelerate the classical algorithm exponentially. Its core subroutine estimates the distance between a sample $u$ and a cluster centroid through a swap test.

Table 1: Return values of the Gym step function.

Return value   Description
Observation    The pixel image of the video game
Reward         The reward for the previous action; its scale varies with the environment
Done           Whether the environment needs to be reset; when the value is true, the current game has ended
Info           Diagnostic information for debugging

Table 2: Task descriptions of the six Atari games.

Breakout: Initialization gives 5 lives. Move the paddle to rebound the ball and break the colored blocks to score. A ball falling past the bottom line loses a life. The more blocks are hit, the faster the ball rebounds.

MsPacman: Initialization gives 3 lives; avoid the enemies, and lose one life when hit. Swallow the pink dots on the blue paths to earn rewards. The enemies periodically become weak; at such times, swallowing an enemy yields a higher reward.

Qbert: Initialization gives 3 lives. Jump on the pyramid to light up each square and earn rewards; lighting up all squares advances to the next level for a large reward. Jumping off the pyramid or hitting an enemy loses a life.

Seaquest: Initialization gives 3 lives. Destroy sharks and enemy submarines with torpedoes to earn rewards while avoiding them; being hit loses a life, running out of oxygen loses a life, and surfacing without rescued divers loses a life.

SpaceInvaders: Initialization gives 3 lives. Destroy the enemies to earn rewards; being hit by an enemy loses a life. The enemies invade faster and faster, and if they reach the bottom line the game ends regardless of how many lives remain.
The normalization constant involved is $Z = |u|^2 + \frac{1}{M}\sum_j |v_j|^2$. The quantum K-means algorithm also incorporates a quantum minimum-finding algorithm and the phase estimation algorithm; their application can greatly reduce the overall computational complexity.
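As a classical reference point, the following numpy sketch computes the centroid distance that the quantum subroutine estimates, together with the constant $Z$ defined above; the toy cluster is an assumption.

```python
# Classical analogue of the quantum nearest-centroid subroutine: distance
# from a sample u to the mean of a cluster {v_j}, with the normalization
# constant Z = |u|^2 + (1/M) * sum_j |v_j|^2 from the text.
import numpy as np

def distance_to_centroid(u, V):
    centroid = V.mean(axis=0)
    Z = np.linalg.norm(u) ** 2 + np.mean([np.linalg.norm(v) ** 2 for v in V])
    dist = np.linalg.norm(u - centroid)  # quantity the swap test estimates
    return dist, Z

V = np.array([[1.0, 0.0], [0.8, 0.3], [1.1, -0.1]])  # one cluster
print(distance_to_centroid(np.array([0.2, 1.0]), V))
```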

3.5. Quantum Principal Component Analysis. In its eigenbasis, the density matrix $\rho$ can be expressed as

$$\rho = \sum_j \lambda_j |u_j\rangle \langle u_j|,$$

where $\lambda_j$ is an eigenvalue and $|u_j\rangle$ is its corresponding eigenvector. The most critical part of QPCA is the construction of the controlled $e^{-i\rho t}$ operation. The construction of $e^{-i\rho t}$ can be expressed as

$$\mathrm{tr}_1 \left\{ e^{-iS\Delta t} \, (\rho \otimes \delta) \, e^{iS\Delta t} \right\} = \delta - i\Delta t \, [\rho, \delta] + O(\Delta t^2) \approx e^{-i\rho \Delta t} \, \delta \, e^{i\rho \Delta t},$$

where $\mathrm{tr}_1$ is the partial trace over the first variable, $S$ is the swap operator, and $\delta = |u\rangle\langle u|$ is the state on which $e^{-i\rho t}$ acts. Thus $e^{-i\rho t}$ can be realized by partial-trace operations with the aid of $N$ copies of the density operator $\rho$ and the sparse swap matrix $S$. The controlled $e^{-i\rho t}$ can be realized by applying the unitary operator $\sum_n |n\Delta t\rangle\langle n\Delta t| \otimes \prod_{j=1}^{n} e^{-iS_j \Delta t}$ to $|n\Delta t\rangle\langle n\Delta t| \otimes \delta \otimes \rho \otimes \cdots \otimes \rho$ and then performing a partial trace over each copy subsystem.
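For comparison, the following classical numpy sketch extracts the eigenvalues $\lambda_j$ and eigenvectors $u_j$ that QPCA would reveal; the construction of $\rho$ from toy data vectors is an assumption.

```python
# Classical analogue of QPCA: eigendecomposition of a density matrix
# rho = sum_j lambda_j |u_j><u_j| (Hermitian, positive semidefinite, trace 1).
import numpy as np

def pca_of_density(rho):
    eigvals, eigvecs = np.linalg.eigh(rho)
    order = np.argsort(eigvals)[::-1]   # principal components first
    return eigvals[order], eigvecs[:, order]

# Illustrative density matrix built from data vectors.
X = np.array([[1.0, 0.2], [0.9, 0.1], [0.2, 1.0]])
C = X.T @ X
rho = C / np.trace(C)                   # normalize to unit trace
vals, vecs = pca_of_density(rho)
print(vals)                             # eigenvalues lambda_j
```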

Experimental Research on Deep Reinforcement Learning
Deep reinforcement learning is characterized by a wide variety of methods, extreme sensitivity to hyperparameters, and great randomness: even the same algorithm can show performance differences under different implementations or different test environment platforms. The best way to address these problems is for the field to unify the models and test environments, so that results can be replicated and improved and new ideas tried. Therefore, many research institutions have begun to develop their own algorithm models and test environments.
For algorithm models, DeepMind, OpenAI, and Baidu have all open-sourced some of their own deep reinforcement learning algorithms and applications. OpenAI's collection is the most abundant, including high-quality implementations of nine algorithms: TRPO, PPO, HER, GAIL, DQN, DDPG, ACKTR, ACER, and A2C; it is currently the preferred benchmark code for researchers in the field. For test environments, the main choices are OpenAI Gym, MuJoCo, rllab, DeepMind Lab, TORCS, PySC2, and so on. Among them, MuJoCo mainly covers robot control problems; Gym, rllab, and DeepMind Lab are similar and mainly focus on video games; TORCS is a racing-car control problem; and PySC2 is a StarCraft II environment. At present, the Atari video game environment is widely used in the field of machine game playing, usually through the Gym interface.
Deep learning for machine game playing involves many components, the most important of which is interaction with simulators. If an algorithm supports a multiprocess mode, the time needed to gather state information during training is shortened. Therefore A2C, DDPG, and PPO, which support multiprocess execution, are currently the most used algorithms in the field; a sketch of multiprocess environment sampling is given below. This section presents experimental research and analysis on the A2C, DDPG, and PPO algorithms to provide a reference for scholars in the field.
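As an illustration of the multiprocess mode, here is a minimal sketch using the SubprocVecEnv wrapper from OpenAI baselines; the environment id and the process count follow the experiments below, while the exact import path may differ between baselines versions.

```python
# Minimal sketch of multiprocess environment sampling in the spirit of A2C,
# following the layout of OpenAI baselines (import path may vary by version).
import gym
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

def make_env(env_id, seed):
    def _thunk():
        env = gym.make(env_id)
        env.seed(seed)  # distinct seed per worker process
        return env
    return _thunk

if __name__ == "__main__":
    # 16 worker processes, matching the default process count used below.
    envs = SubprocVecEnv([make_env("BreakoutNoFrameskip-v4", i) for i in range(16)])
    obs = envs.reset()   # stacked observations, one per worker
    print(obs.shape)
    envs.close()
```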

4.1. Test Environment.
Related experiments are conducted in the Atari 2600 video game environment.

Table 3: Final scores under computational budgets of 10 M, 20 M, 30 M, and 40 M timesteps.

Game           10 M    20 M    30 M    40 M
BeamRider      2711    4302    5408    6495
Breakout        227     290     376     413
MsPacman       2022    2357    3051    3762
Qbert         10737   15061   18211   21695
Seaquest       1072    1642    1850    1881
SpaceInvaders   782    1536    2021    2137

The Gym environment is usually needed for interaction in reinforcement learning. The specific step is to pass the action produced by the agent into the step function, which then yields four return values: observation, reward, done, and info, as shown in Table 1.
The agent uses this interface in a very simple way. To ensure the reproducibility of the experimental results, all video game experiments in this paper are completed on the Gym platform. Six popular classic video games are selected as the specific test environments: BeamRider, Breakout, MsPacman, Qbert, Seaquest, and SpaceInvaders; the task of each game is described in Table 2.
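A minimal interaction loop with the step interface of Table 1 might look as follows; a random policy stands in for the agent, and Gym's Atari dependency is assumed to be installed.

```python
# Minimal Gym interaction loop using the four return values from Table 1.
import gym

env = gym.make("BreakoutNoFrameskip-v4")
observation = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()   # random stand-in for the agent's policy
    observation, reward, done, info = env.step(action)
    total_reward += reward               # per-step reward, as in Table 1
print(total_reward)
env.close()
```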

4.2. Experimental Setup and Configuration.
To ensure the reproducibility of the results, this paper uses the A2C and PPO implementations provided by OpenAI baselines as the benchmark experimental code; the Gym version is 0.10.5 and the baselines version is 0.1.5. All experiments were completed on a PC with the following configuration: CPU: Intel(R) Core(TM) i9-7900X @ 3.3 GHz; GPU: NVIDIA GTX 1080 Ti; memory: 32 GB. Every experiment was repeated five times under its settings.
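A sketch of this repetition protocol is shown below; the seed values and the training stub are placeholders for illustration, not the actual baselines entry point.

```python
# Sketch of the five-repeat protocol: each run is seeded and produces a
# final score, and summary statistics are reported across the repeats.
import random
import numpy as np

def run_experiment(seed):
    random.seed(seed)
    np.random.seed(seed)
    # A real run would launch baselines training here; this stub only
    # returns a placeholder "final score".
    return np.random.rand()

scores = [run_experiment(seed) for seed in range(5)]
print(np.mean(scores), np.std(scores))
```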

4.3. Performance Analysis of Reinforcement Learning for Machine Game. When OpenAI updated Gym to the V4 version, it announced that the *NoFrameskip-v4 environments are considered the standard Atari environments and warned that the V4 version would not reproduce the experimental results of the previous V3 version. As a consequence, researchers who need comparative experiments often have to rerun them in the new environment version, which is very time-consuming. At the same time, our research found that there is no up-to-date baseline result for the community to reference. In addition, different researchers often need the performance of benchmark algorithms under different computational budgets, but no complete and detailed experimental results across budgets are currently available. In this paper, six mainstream classic video games are used as the test environment, and the latest Gym version is used to run standard experimental tests on A2C, PPO, and DDPG under computational budgets of 10 M, 20 M, 30 M, and 40 M timesteps, with a default of 16 processes. The final performance data of the models are shown in Tables 3-5.
Sampling data for the A2C, PPO, and DDPG algorithms with different numbers of threads are shown in Table 6.
In addition, performance comparison diagrams of A2C, PPO, and DDPG under different environments and different computational budgets are given in Figures 2-5. Observing Figures 2, 3, 4, and 5, we can conclude that PPO's performance is the highest in every environment, DDPG's is in the middle, and A2C's is the lowest, which is particularly easy to observe on Qbert.
We use the sampled data of A2C and PPO algorithms with different thread numbers to draw an efficiency diagram as shown in Figure 6.
Observing the data tables and the performance curves obtained from the experiments, this paper also makes some new discoveries. For example, in some test tasks the performance of A2C is lower than that of PPO and DDPG in the early stage, but as the computational budget increases, its performance overtakes PPO and DDPG. The PPO paper, however, only ran experiments at a computational cost of 10 M and concluded that PPO outperforms A2C, a conclusion that no longer appears rigorous. Usually, because of the long training times of video games and the limitation of computing resources, researchers only conduct experiments under a small computational budget, so this phenomenon is difficult to observe; it also provides a new focus for future algorithm comparisons. In addition, the variance of A2C is significantly smaller than that of PPO in almost all test environments, especially in the Seaquest and SpaceInvaders games; therefore, A2C can be considered to have better robustness, as the sketch below illustrates.
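The robustness comparison reduces to comparing the variances of final scores across the five repeated runs; the numbers below are illustrative placeholders, not the paper's data.

```python
# Robustness comparison across repeated runs: a smaller variance of final
# scores over the five repeats is read as better robustness.
import numpy as np

a2c_scores = np.array([410.0, 402.0, 396.0, 415.0, 408.0])  # placeholder values
ppo_scores = np.array([430.0, 365.0, 455.0, 390.0, 420.0])  # placeholder values

print("A2C variance:", a2c_scores.var())
print("PPO variance:", ppo_scores.var())
```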
In this section, we first briefly introduced the background of reinforcement learning and deep learning, leading to deep reinforcement learning methods. On this basis, deep reinforcement learning methods for machine game playing were systematically surveyed, including reinforcement learning methods based on value functions and on policy gradients, together with hierarchical reinforcement learning and multiagent reinforcement learning. In the experimental part, we briefly introduced the current mainstream code bases and standard test platforms for deep reinforcement learning. Finally, the mainstream reinforcement learning algorithms were studied experimentally; the performance of A2C, DDPG, and PPO under different computational costs and the corresponding training curves were given, and the experimental results were analyzed.

Conclusion
In this paper, the current state of biological population evolutionary algorithms, quantum machine learning algorithms, and deep reinforcement learning is analyzed and studied. In the coming era of "quantum computing," we face many unprecedented opportunities and challenges: how to cross-integrate "biological population evolutionary algorithm + individual adaptation + quantum computing" to describe the uncertainties of big data quantitatively and accurately, and how to exploit the parallel, efficient processing of quantum computing to handle the complexity of biological population evolutionary algorithms and individual adaptive methods. These opportunities and challenges are urgent problems to be solved in the era of "quantum computing" and will become research hotspots in the field of big data in the age of intelligence.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.