Cooperative Evolution Mechanism of Unmanned Swarm within the Framework of Public Goods Game

Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
Institute of Communication Engineering, Army Engineering University of PLA, Nanjing 210007, China
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Turbomachinery and Propulsion Technology Institute of Zhejiang Province, Hangzhou, Zhejiang 310000, China


Introduction
With the continuing advance of the third wave of artificial intelligence, "group evolutionary intelligence," which developed from "single-agent autonomous intelligence," has become one of the important characteristics of the new generation of artificial intelligence. Particularly in the military field, unmanned swarm operations (unmanned vehicle clusters [1], unmanned boat clusters [2], and unmanned aerial vehicle clusters [3]) have received unprecedented attention over the past two years. The US military has listed unmanned swarm operations as a "subversive technology" that can change the rules of war.
There are two main control modes for an unmanned swarm: centralized control and autonomous collaboration. When communication is reliable, the Command and Control (C2) center can exercise centralized control over the swarm. However, in the complex electromagnetic environment of the battlefield, there is a real risk of communication failure [4]. In that case, the centralized control mode fails, and the unmanned swarm must respond effectively on the spot to the external situation and achieve self-management and self-coordination. How unmanned swarms can autonomously and cooperatively complete assigned military operations has therefore attracted considerable interest. A sketch of the autonomous cooperation of an unmanned swarm is shown in Figure 1.
Autonomous collaboration requires overall planning and reallocation of operational resources (communication, firepower, intelligence, etc.) within the unmanned swarm. In this process, however, there are often conflicts between individual interests and swarm needs that are difficult to reconcile. For example, in a fire strike task, a "rational" unmanned unit with intelligence and decision-making ability will choose to "contribute" as little ammunition to the swarm as possible in order to maintain its own combat effectiveness; on the other hand, the more ammunition each unit contributes to the swarm, the higher the survival rate and the greater the combat effectiveness of the whole swarm. The conflict between the two leads to a "tragedy of the commons" [5]. Therefore, how to increase the number of units willing to contribute ammunition to the swarm and avoid this tragedy has become a crucial and urgent problem in both the technical research and the practical application of unmanned swarms.

Related Work
Evolutionary game theory [6][7][8][9] combines "equilibrium" in economics with "adaptability" in biology to describe how individuals adapt to the external environment through learning, imitation, and trial and error under bounded rationality and asymmetric information. The evolutionary public goods game (PGG) [10] provides a basic theoretical framework for revealing the cooperative evolution mechanism and coping with the tragedy of the commons. In the PGG, investors (collaborators) and free riders (betrayers) play strategic games over time based on cost, multiplication factor, selection intensity, etc., so that the proportion of collaborators and betrayers in the population changes dynamically and finally tends to an evolutionarily stable state (ESS). The research focus of the PGG is to calculate the mathematical expectation of the proportion of collaborators in a population after a multiround game, that is, the average abundance, and then to analyze the relationship between the average abundance and the parameters (cost, multiplication factor, selection intensity, etc.) so that cooperation can ultimately be regulated deliberately.
At present, there are two main research directions for solving the problem of swarm cooperation with evolutionary game theory: one studies the evolutionary dynamics and cooperation mechanisms of spatially structured populations, such as complex networks, based on graph theory [11,12]; the other studies the evolutionarily stable state of well-mixed populations and the conditions under which cooperation dominates, based on Markov stochastic processes [13,14].
For the former, the team of Professor Nowak at Harvard University theoretically derived the evolution of populations on spatial structures such as circles, random graphs, and scale-free networks and proposed a relationship between the benefit-to-cost ratio b/c and the average network degree k. They pointed out that the fewer connections a network has, the more natural selection favors cooperation [15]. Then, they used pair approximation theory to analyze cooperation on regular lattices and obtained the boundary conditions for the emergence and expansion of cooperation [16]. Building on these results, the differences between homogeneous and heterogeneous networks in promoting cooperative behavior were further compared, and simulation results show that weak connections better promote cooperative behavior on heterogeneous networks [17]. At the same time, other researchers studied the dynamics of multiparty games on graphs, and simulation results show that spatially structured populations promote cooperation better than unstructured populations. In the last two years, the team of Nowak has applied the evolutionary dynamics of cooperation in spatial structures to social networks, analyzed the critical conditions for cooperative behavior in human society [18], made an initial exploration of the trade-off between the probability and the time of evolutionary convergence [19], and extended cooperative evolution on structured populations to weighted graphs [20]. Other representative studies [21,22] take the multiplayer snowdrift game as a specific model, give the relationship curves between the ratio b/c and the cooperation level for well-mixed and structured populations, respectively, and compare the significant differences between homogeneous/heterogeneous networks and unstructured populations in promoting cooperation.
For the latter, representative studies are as follows. Wang at Wuhan University obtained the average abundance function of the snowdrift evolutionary game using the stable distribution of a Markov chain and simulated the effect of the parameters on the average abundance [23]. Based on the work of Tarnita et al. [24], Du at Peking University obtained the inequality for the strategy dominance condition in two-party evolutionary games through strict mathematical derivation; in addition, the simulation results show that the average abundance under weak selection intensity is independent of the aspiration level [25]. The aforementioned work on swarm cooperation is of great theoretical and engineering value. We have also explored the cooperation mechanism of unmanned swarms, and the relevant results can be found in [26][27][28][29][30]. Nevertheless, these achievements still have two shortcomings when applied to the cooperative evolution of unmanned swarms. First, they do not focus on the public goods game: although there are similarities between the snowdrift game and the public goods game [31], their game mechanisms are essentially different. Second, the cooperative evolution of an unmanned swarm involves multiple interacting combat units; that is, the evolution result depends not only on the strategy selected by a single unit but also on the strategies of the other units in the swarm, which is characteristic of multiplayer games [32,33]. So far, the academic community has established the payoff matrix [34] of the multiplayer public goods game and has simulated the influence of different selection intensities [35][36][37] and threshold values [31] on the cooperation level.
In particular, the authors of [34] derived a general average abundance formula for multiparty games in a finite population under aspiration-driven dynamics, which is applicable to any multiparty game of that kind. However, the average abundance formula for the specific case of the public goods game is not given, so an important task of this study is to obtain the analytic expression of the average abundance for the public goods game based on the existing payoff matrix.
Furthermore, the generalized evolutionary game model in a finite population can be reduced to a Markov chain plus strategy update rules, so the average abundance function is also closely related to the strategy update rules. Within the framework of evolutionary game theory, strategy update rules can be divided into imitation processes and aspiration-driven dynamics [38]. In the former case, individuals imitate the strategy of a more successful peer [39]. In the latter case, individuals adjust their strategies based on a comparison between their own payoff and the value they aspire to, called the level of aspiration [40]. Unlike the imitation processes of pairwise comparison, aspiration-driven updates do not require additional information about the strategic environment and can thus be interpreted as being more spontaneous [41,42]. In the complex battlefield environment, information acquisition is incomplete and asymmetric, which requires the swarm to achieve self-management and self-coordination, and this requirement coincides with aspiration-driven dynamics. Moreover, existing results show that, in both the prisoner's dilemma game and the public goods game, aspiration-driven dynamics can raise the average abundance and promote cooperation more than traditional imitation dynamics [43,44].
To study the cooperative evolution mechanism of the unmanned swarm, we model the evolution process with the multiplayer public goods game framework and the aspiration-driven update rule and then derive the average abundance function of the model by analyzing the stable distribution of the Markov chain. On this foundation, we study the influence of the relevant parameters on the average abundance through theoretical analysis and numerical calculation. Finally, we study the effect of parameter adjustment on swarm cooperation in a case study and discuss corresponding solutions and advice for avoiding "the tragedy of the commons."

Model Hypothesis
In essence, the autonomous collaboration of unmanned swarms is a multiparty, multiround game that focuses on the autonomous allocation of public resources. Therefore, we use the multiplayer public goods evolutionary game to model the cooperative evolution of unmanned swarms. The mapping between the concepts of cooperative evolution in unmanned swarms and those of the multiplayer public goods evolutionary game is listed in Table 1.

Framework of Multiplayer Public Goods Evolutionary Game.
We assume that the autonomous cooperation of unmanned units takes place in a well-mixed swarm of size N and that every unit has two alternative strategies, A (cooperate, i.e., contribute to the swarm) and B (free ride). Every d units interact simultaneously to obtain their payoffs; i.e., they play a two-strategy, d-player game. The strategy update procedure is as follows:
(i) A focal individual X is selected at random from the swarm.
(ii) Another d − 1 individuals are selected at random to form a group of size d with X; k of them (0 ≤ k ≤ d − 1) hold strategy A.
(iii) At the end of each round of the game, the focal individual X evaluates the benefits under different strategy choices and then updates its strategy according to aspiration-driven dynamics.
The above process is repeated until the proportion of each strategy in the whole population tends to be stable.
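The value of k determines the payoffs $a_k$ and $b_k$: when k of the other d − 1 group members hold strategy A, a focal individual holding A receives $a_k$ and one holding B receives $b_k$. All possible payoffs of a focal individual are thus uniquely defined by the number of A players in the group, and the payoff matrix is given in Table 2.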

Mathematical Problems in Engineering 3
For any group engaging in a one-shot game, we can obtain each member's payoff according to Table 2. When X chooses strategy A, the total contribution of the individuals to the swarm is kc + c, the total gain of the swarm is this contribution multiplied by the profit coefficient r, i.e., r(kc + c), and the gain of each individual is r(kc + c)/d. As the cost of X is c, the net gain of X is r(kc + c)/d − c. When X chooses strategy B, the total contribution of the individuals to the swarm is kc, the total gain of the swarm is rkc, and the gain of each individual is rkc/d. As there is no cost in this case, the net gain of X is rkc/d. Thus, the payoffs for A and B are

$$a_k = \frac{r(k+1)c}{d} - c, \tag{1}$$

$$b_k = \frac{rkc}{d}. \tag{2}$$
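The two net gains described above can be written as a pair of small functions. This is a minimal sketch; the function names are illustrative and not taken from the paper, and the printed values use the baseline parameters d = 15, r = 1.3, c = 1 that appear later in the text.

```python
def cooperator_payoff(k: int, d: int, r: float, c: float) -> float:
    """Net payoff a_k of a focal A player with k cooperating co-players."""
    # Pool is (k + 1) * c, multiplied by r, shared among d players, minus own cost c.
    return r * (k + 1) * c / d - c


def free_rider_payoff(k: int, d: int, r: float, c: float) -> float:
    """Net payoff b_k of a focal B player with k cooperating co-players."""
    # Pool is k * c, multiplied by r, shared among d players; no own cost.
    return r * k * c / d


if __name__ == "__main__":
    d, r, c = 15, 1.3, 1.0
    for k in (0, 7, 14):
        print(k, cooperator_payoff(k, d, r, c), free_rider_payoff(k, d, r, c))
```

For the same k, the free rider's payoff exceeds the cooperator's by the constant c(1 − r/d), which is positive whenever r < d; this is the individual incentive behind the tragedy of the commons.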

Expected Payoff for Strategies A and B.
In a finite well-mixed population of size N, groups of size d are assembled randomly, so the probability of choosing a group that consists of another k players of type A and d − k − 1 players of type B is given by a hypergeometric distribution [45]. For example, the probability that an A player is in a group with k other A players is given by

$$\frac{C_{i-1}^{k}\, C_{N-i}^{d-1-k}}{C_{N-1}^{d-1}},$$

where i is the number of A players in the population. The symbol $C_n^k$ denotes the number of ways to choose a k-element subset from an n-element set. The expected payoffs for any A or B player in a population of size N, with i players of type A and N − i players of type B, are given by

$$\pi_A(i) = \sum_{k=0}^{d-1} \frac{C_{i-1}^{k}\, C_{N-i}^{d-1-k}}{C_{N-1}^{d-1}}\, a_k, \qquad \pi_B(i) = \sum_{k=0}^{d-1} \frac{C_{i}^{k}\, C_{N-i-1}^{d-1-k}}{C_{N-1}^{d-1}}\, b_k.$$
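The hypergeometric averaging above can be sketched directly with Python's `math.comb`. The function names are illustrative assumptions, and the payoffs are the public goods payoffs r(k + 1)c/d − c and rkc/d described earlier.

```python
from math import comb


def group_probability(k: int, n_a_others: int, n_b_others: int, d: int) -> float:
    """Probability that exactly k of the d - 1 co-players are A players.

    Co-players are drawn without replacement from the other N - 1 individuals,
    of which n_a_others hold A and n_b_others hold B.
    """
    total = n_a_others + n_b_others
    return comb(n_a_others, k) * comb(n_b_others, d - 1 - k) / comb(total, d - 1)


def expected_payoff_A(i: int, N: int, d: int, r: float, c: float) -> float:
    """Expected payoff of an A player when i of the N individuals hold A."""
    if not 1 <= i <= N:
        raise ValueError("an A player exists only for 1 <= i <= N")
    return sum(group_probability(k, i - 1, N - i, d) * (r * (k + 1) * c / d - c)
               for k in range(d))


def expected_payoff_B(i: int, N: int, d: int, r: float, c: float) -> float:
    """Expected payoff of a B player when i of the N individuals hold A."""
    if not 0 <= i <= N - 1:
        raise ValueError("a B player exists only for 0 <= i <= N - 1")
    return sum(group_probability(k, i, N - i - 1, d) * (r * k * c / d)
               for k in range(d))


if __name__ == "__main__":
    print(expected_payoff_A(40, 100, 15, 1.3, 1.0))
    print(expected_payoff_B(40, 100, 15, 1.3, 1.0))
```

Because both payoffs are linear in k, each sum collapses to the payoff evaluated at the hypergeometric mean of k, which is (d − 1)(i − 1)/(N − 1) for an A player and (d − 1)i/(N − 1) for a B player. This gives a quick consistency check on the sums.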
Aspiration-driven dynamics focuses on comparing the payoff with the aspiration level in order to make new decisions. Players need not see any payoffs other than their own, which they compare with an aspiration value. Aspiration-driven dynamics therefore matches the requirement of self-management and self-coordination of an unmanned swarm under incomplete information acquisition on a complex battlefield. The level of aspiration, α, is a variable that influences the stochastic strategy updating. When an individual's payoff is close to α, strategy switching is essentially random, reflecting the basic degree of uncertainty in the population. When the payoff exceeds α, strategy switching is unlikely. When α is high compared with the payoff, switching probabilities are high.
To model stochastic aspiration-driven switching (from strategy A to B), we can use the following probability function:

$$P_{A \to B} = \frac{1}{1 + e^{-\omega(\alpha - \pi_A)}}.$$

The aspiration level, α, provides the benchmark used to evaluate how "greedy" an individual is. Higher aspiration levels mean that individuals aspire to higher payoffs. The intensity of selection, ω, measures how important individuals deem the impact of the actual game on their update. Let $\Delta = \pi_A - \alpha$. If Δ = 0, then $P_{A \to B} = 1/2$, which means that individuals have the same preference for strategies A and B; if Δ > 0 (i.e., the individual payoff $\pi_A$ is higher than the aspiration level α), then $P_{A \to B} < 1/2$, which means that individuals prefer strategy A; if Δ < 0 (i.e., the individual payoff $\pi_A$ is lower than the aspiration level α), then $P_{A \to B} > 1/2$, which means that individuals prefer strategy B. Whether an individual actually updates its strategy or keeps it unchanged in a given round can be determined by other algorithms, such as the roulette algorithm.
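The switching rule can be sketched as follows, assuming the logistic (Fermi-type) form that is consistent with the three cases Δ = 0, Δ > 0, and Δ < 0 described above. The names `switching_probability` and `decide_switch` are illustrative, and comparing a uniform random draw with the probability is only one simple stand-in for the roulette step mentioned in the text.

```python
import math
import random


def switching_probability(payoff: float, alpha: float, omega: float) -> float:
    """Probability of abandoning the current strategy given its payoff."""
    # Equals 1/2 when payoff == alpha, falls below 1/2 when payoff > alpha.
    # For very large omega * (payoff - alpha), math.exp may overflow.
    return 1.0 / (1.0 + math.exp(-omega * (alpha - payoff)))


def decide_switch(payoff: float, alpha: float, omega: float, rng: random.Random) -> bool:
    """Draw one stochastic update decision (assumed roulette-style step)."""
    return rng.random() < switching_probability(payoff, alpha, omega)


if __name__ == "__main__":
    rng = random.Random(0)
    for payoff in (0.5, 1.0, 1.5):
        p = switching_probability(payoff, alpha=1.0, omega=5.0)
        print(payoff, round(p, 4), decide_switch(payoff, 1.0, 5.0, rng))
```

With ω = 0 the function returns 1/2 for every payoff, so the game has no influence on the update.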
In the same way, the probability of the focal individual updating from strategy B to A is

$$P_{B \to A} = \frac{1}{1 + e^{-\omega(\alpha - \pi_B)}}.$$

In aspiration-driven dynamics, at each time step, the number of A individuals, i, can only increase by one, decrease by one, or stay the same. The number of A individuals increases by one when two successive events happen: first, a B individual is selected from the population; then, it is not satisfied with the payoff it obtains and switches to strategy A. Similar reasoning applies when the number of A individuals decreases by one. Therefore, the probabilities that the number of A individuals changes in one time step are

$$T_i^{+} = \frac{N-i}{N} \cdot \frac{1}{1 + e^{-\omega(\alpha - \pi_B(i))}}, \qquad T_i^{-} = \frac{i}{N} \cdot \frac{1}{1 + e^{-\omega(\alpha - \pi_A(i))}}, \qquad T_i^{0} = 1 - T_i^{+} - T_i^{-}.$$

Because a Markov chain without an absorbing state has a stable distribution, the average abundance function of the multiplayer evolutionary game can be derived from the above state transition equations.
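The one-step transition probabilities can be sketched as below. The expected payoffs are passed in as plain numbers so that the block stays self-contained; the function names and the sample payoff values are illustrative assumptions.

```python
import math


def switching_probability(payoff: float, alpha: float, omega: float) -> float:
    """Logistic probability of abandoning the current strategy."""
    return 1.0 / (1.0 + math.exp(-omega * (alpha - payoff)))


def transition_probabilities(i, N, pi_A, pi_B, alpha, omega):
    """Return (T_plus, T_minus, T_stay) for a state with i A players.

    pi_A and pi_B are the expected payoffs of A and B players in state i.
    """
    # A B player is picked with probability (N - i) / N and switches to A.
    t_plus = (N - i) / N * switching_probability(pi_B, alpha, omega)
    # An A player is picked with probability i / N and switches to B.
    t_minus = i / N * switching_probability(pi_A, alpha, omega)
    return t_plus, t_minus, 1.0 - t_plus - t_minus


if __name__ == "__main__":
    print(transition_probabilities(30, 100, pi_A=-0.5, pi_B=0.4, alpha=1.0, omega=5.0))
```

At i = 0 the upward probability is strictly positive and at i = N the downward probability is strictly positive, so neither boundary state is absorbing.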

Average Abundance Function
At present, most research on average abundance is based on numerical simulation, and no strict mathematical expression is given. In this part, we first define the average abundance of the unmanned swarm and then derive its mathematical expression by analyzing the stable distribution of the non-absorbing Markov chain, to support the subsequent simulation analysis in Section 5. The average abundance is the expected proportion of A individuals in the swarm. Therefore, the definition of the average abundance $\langle X_A(j) \rangle$ can be expressed as

$$\langle X_A(j) \rangle = \sum_{j=0}^{N} \frac{j}{N}\, \nu(j). \tag{10}$$

The key to calculating the average abundance is to determine the probability distribution ν(j). For Markov chains without an absorbing state, ν(j) is just the stable distribution $\varphi_j$ (j ∈ [0, N]), and it satisfies the detailed balance condition [48]:

$$\varphi_j T_j^{+} = \varphi_{j+1} T_{j+1}^{-}, \qquad j = 0, 1, \ldots, N-1.$$

Equation (10) is just a definition, which cannot be directly applied to actual calculation and analysis. Next, we derive the average abundance formula from the detailed balance condition so as to reveal the quantitative relationship between the average abundance and the related parameters (cost, multiplication factor, selection intensity, etc.) and to provide a theoretical basis for the subsequent analysis.

Function Deduction.
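It can be derived from the detailed balance condition that

$$\varphi_1 = \frac{T_0^{+}}{T_1^{-}} \varphi_0, \quad \varphi_2 = \frac{T_1^{+}}{T_2^{-}} \varphi_1, \quad \ldots, \quad \varphi_N = \frac{T_{N-1}^{+}}{T_N^{-}} \varphi_{N-1}.$$

Summarizing the above formulas by induction, we get

$$\varphi_j = \varphi_0 \prod_{i=0}^{j-1} h(i), \qquad j = 1, \ldots, N,$$

where $h(i) = T_i^{+} / T_{i+1}^{-}$ is the strategy dominant function. If h(i) > 1, that is, the probability that the number of A individuals increases is greater than the probability that it decreases, strategy A is dominant in the swarm; otherwise, strategy B is dominant. Since $\sum_{j=0}^{N} \varphi_j = 1$, we have

$$\varphi_0 = \frac{1}{1 + \sum_{k=1}^{N} \prod_{i=0}^{k-1} h(i)}, \qquad \varphi_j = \frac{\prod_{i=0}^{j-1} h(i)}{1 + \sum_{k=1}^{N} \prod_{i=0}^{k-1} h(i)}. \tag{16}$$

Inserting (16) into (10), $\langle X_A(j) \rangle$ can be written as

$$\langle X_A(j) \rangle = \frac{\sum_{j=1}^{N} \frac{j}{N} \prod_{i=0}^{j-1} h(i)}{1 + \sum_{k=1}^{N} \prod_{i=0}^{k-1} h(i)}, \tag{17}$$

$$h(i) = \frac{N-i}{i+1} \cdot \frac{1 + e^{-\omega(\alpha - \pi_A(i+1))}}{1 + e^{-\omega(\alpha - \pi_B(i))}}. \tag{18}$$

Equations (17) and (18) are just a general expression for the average abundance of multiplayer evolutionary games under aspiration-driven dynamics, and the specific application depends on $a_k$ and $b_k$. Therefore, the combination of equations (1)-(4), (17), and (18) gives the average abundance function of the multiplayer public goods game.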
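The derivation above can be turned into a short numerical routine. The sketch below assumes the logistic switching rule and the public goods payoffs r(k + 1)c/d − c and rkc/d; the function names are illustrative. The products of h(i) are accumulated in log space, which is an implementation choice made here to avoid overflow and is not part of the derivation.

```python
import math
from math import comb


def expected_payoffs(i, N, d, r, c):
    """Return (pi_A(i), pi_B(i)); an entry is None if no such player exists."""
    denom = comb(N - 1, d - 1)
    pi_A = pi_B = None
    if i >= 1:
        pi_A = sum(comb(i - 1, k) * comb(N - i, d - 1 - k) / denom
                   * (r * (k + 1) * c / d - c) for k in range(d))
    if i <= N - 1:
        pi_B = sum(comb(i, k) * comb(N - i - 1, d - 1 - k) / denom
                   * (r * k * c / d) for k in range(d))
    return pi_A, pi_B


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def average_abundance(N, d, r, c, alpha, omega):
    """Expected fraction of A players under the stable distribution."""
    log_w = [0.0]  # log of the unnormalised phi_j, with phi_0 set to 1
    for i in range(N):
        pi_A_next, _ = expected_payoffs(i + 1, N, d, r, c)
        _, pi_B_here = expected_payoffs(i, N, d, r, c)
        # log h(i) = log[(N - i) / (i + 1)] + log(1 + e^{omega (pi_A(i+1) - alpha)})
        #            - log(1 + e^{omega (pi_B(i) - alpha)})
        log_h = (math.log(N - i) - math.log(i + 1)
                 + softplus(omega * (pi_A_next - alpha))
                 - softplus(omega * (pi_B_here - alpha)))
        log_w.append(log_w[-1] + log_h)
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    return sum(j * w for j, w in enumerate(weights)) / (N * sum(weights))


if __name__ == "__main__":
    for omega in (0, 5, 10, 15, 20):
        print(omega, average_abundance(100, 15, 1.3, 1.0, 1.0, omega))
```

For ω = 0 the factor h(i) reduces to (N − i)/(i + 1), the stable distribution is binomial with success probability 1/2, and the routine returns 0.5, which gives a simple sanity check.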

Evolutionary Game Analysis
On the basis of the average abundance of the unmanned swarm obtained above, we analyze the impact of the cost c, the multiplication factor r, and the aspiration level α on it. The baseline parameters are N = 100, d = 15, c = 1, r = 1.3, and α = 1; when the impact of one parameter is calculated, the others remain unchanged. In addition, to highlight how strongly the parameters influence the average abundance under different selection intensities, ω = 0, 5, 10, 15, 20 is used in each simulation scenario.

Average Abundance with Respect to Cost.
Equations (1) and (2) show that $a_k$ and $b_k$ are both proportional to c, so changing c rescales $\pi_A(i)$ and $\pi_B(i)$ and thereby changes both switching probabilities at the same time. As a result, the change of h(i) and $\langle X_A \rangle$ with increasing c is difficult to determine directly. Next, we give a set of numerical solutions to observe the interaction within a certain range, so as to reveal the influence of c on $\langle X_A \rangle$ through simulation.
We select the interval c ∈ [0.9, 1.8] and plot the average abundance curve of strategy A under different selection intensities in Figure 2.
As shown in Figure 2, as c increases, $\langle X_A \rangle$ decreases monotonically. When ω = 0, $\langle X_A \rangle = 0.5$ (i.e., the proportions of collaborators and betrayers in the swarm are balanced), whereas when ω ≠ 0, $\langle X_A \rangle$ increases with ω. Moreover, as ω decreases, the influence of c on $\langle X_A \rangle$ increases: $\Delta \langle X_A(\omega = 20) \rangle \approx 0.028$, while $\Delta \langle X_A(\omega = 5) \rangle \approx 0.063$.
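A cost sweep of this kind can be sketched as follows. The sketch assumes the logistic switching rule and replaces the hypergeometric sums with their closed forms, which is exact because the public goods payoffs are linear in k; the function names and the grid of cost values are illustrative. The numbers it prints depend on these assumptions and are not guaranteed to reproduce the values read from Figure 2.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def average_abundance(N, d, r, c, alpha, omega):
    """Average abundance of A using closed-form expected payoffs."""
    def pi_A(i):  # valid for 1 <= i <= N
        return r * c / d * (1 + (d - 1) * (i - 1) / (N - 1)) - c

    def pi_B(i):  # valid for 0 <= i <= N - 1
        return r * c / d * (d - 1) * i / (N - 1)

    log_w = [0.0]  # log of the unnormalised stable distribution
    for i in range(N):
        log_h = (math.log(N - i) - math.log(i + 1)
                 + softplus(omega * (pi_A(i + 1) - alpha))
                 - softplus(omega * (pi_B(i) - alpha)))
        log_w.append(log_w[-1] + log_h)
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    return sum(j * w for j, w in enumerate(weights)) / (N * sum(weights))


def sweep_cost(costs, omega, N=100, d=15, r=1.3, alpha=1.0):
    """Average abundance for each cost value at a fixed selection intensity."""
    return [average_abundance(N, d, r, c, alpha, omega) for c in costs]


if __name__ == "__main__":
    costs = [0.9 + 0.1 * n for n in range(10)]  # grid over [0.9, 1.8]
    for omega in (0, 5, 10, 15, 20):
        print(omega, [round(x, 4) for x in sweep_cost(costs, omega)])
```

The same loop can be reused for r or α by sweeping the corresponding argument of `average_abundance` instead of c.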

Conclusion 1.
Increasing the cost decreases the average abundance, especially when the selection intensity is small.

Average Abundance with Respect to Multiplication Factor.
Similarly, when r increases, the change of h(i) and $\langle X_A \rangle$ cannot be determined by deduction alone. We select the interval r ∈ [0.9, 1.8] and plot the average abundance curve of strategy A under different selection intensities in Figure 3.
As the multiplication factor increases, $\langle X_A \rangle$ decreases monotonically, which means that "free riding" appears, weakening cooperation and lowering $\langle X_A \rangle$; moreover, as ω decreases (ω ≠ 0), the influence of r on $\langle X_A \rangle$ increases.

Conclusion 2.
Increasing the multiplication factor decreases the average abundance, especially when the selection intensity is small.

Average Abundance with Respect to Aspiration Level.
Select the interval α ∈ [0.9, 1.4] and draw the average abundance curve of the strategy A under different selection intensities as shown in Figure 4.
As the aspiration level increases, $\langle X_A \rangle$ increases monotonically, which means that a rising aspiration level makes it more difficult for betrayers to satisfy their expectations, and thus more betrayers become cooperators; moreover, as ω decreases (ω ≠ 0), the influence of α on $\langle X_A \rangle$ increases: $\Delta \langle X_A(\omega = 20) \rangle \approx 0.002$, while $\Delta \langle X_A(\omega = 5) \rangle \approx 0.037$.

Conclusion 3.
Increasing the aspiration level increases the average abundance, especially when the selection intensity is small.
According to the simulation results, c, r, and α all affect the trend of the average abundance curve. When c and r increase, the average abundance decreases monotonically, whereas when α increases, the average abundance increases monotonically. These conclusions provide a theoretical basis for the regulation of swarms in practical applications. Based on them, the following section provides a case study to further reveal the cooperative evolution mechanism of the unmanned swarm.

Case Study
Fire strike is a typical task in unmanned swarm operations. Because the ammunition loading/mounting capacity is limited, when the unmanned swarm carries out a fire strike task after the centralized control mode has failed, the "rational" unmanned units with intelligence and decision-making ability will strictly limit the quantity of ammunition they launch or deliver, with a "free riding" mentality. From the perspective of the combat effectiveness of the whole swarm, however, each unit should provide as much ammunition as possible to ensure the overall strike effectiveness against the enemy.
The key to coping with this conflict is how to raise the proportion of cooperators in the swarm through self-regulation and self-coordination. Consistent with the above section, we set N = 100, d = 15, c = 1, r = 1.3, and α = 1 and plot the basic curve (see Figure 5). Since $\langle X_A \rangle < 0.5$, strategy A is not dominant in this case; that is, most units choose strategy B. Therefore, we try to regulate the relevant parameters to increase the average abundance of the unmanned swarm and promote cooperation.
As shown in Figure 5, reducing the cost or increasing the aspiration level raises the proportion of cooperators. However, increasing the multiplication factor causes the average abundance curve to deviate downward from the basic curve, because increasing the payoffs of cooperators and betrayers by the same margin makes "free riding" more serious. Consequently, we separate the multiplication factor of cooperators from that of betrayers and increase only the multiplication factor $r_A$ of cooperators (the multiplication factor $r_B$ of betrayers remains unchanged); the average abundance curve then deviates upward from the basic curve.
Furthermore, we simulate the average abundance under different values of $r_A$ (see Figure 6). When $r_A = 2$, the average abundance is approximately equal to 0.5, which indicates that the proportions of cooperators and betrayers in the swarm are basically balanced. As $r_A$ increases further, the average abundance exceeds 0.5 at ω ≈ 10 when $r_A = 2.65$ and at ω ≈ 5 when $r_A = 2.9$. Thus, we can reach the following conclusions.
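The effect of separating the two multiplication factors can be explored with the sketch below. It rests on an assumption about how the separation enters the payoffs: a cooperator is taken to receive r_A(k + 1)c/d − c and a free rider r_B kc/d, which is one plausible reading of the text and may differ from the formulation behind Figure 6. The function names are illustrative, and the expected payoffs again use the closed form of the hypergeometric average.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def average_abundance_split(N, d, r_a, r_b, c, alpha, omega):
    """Average abundance of A with separate factors r_a (A) and r_b (B)."""
    def pi_A(i):  # assumed cooperator payoff, averaged over groups
        return r_a * c / d * (1 + (d - 1) * (i - 1) / (N - 1)) - c

    def pi_B(i):  # assumed free-rider payoff, averaged over groups
        return r_b * c / d * (d - 1) * i / (N - 1)

    log_w = [0.0]  # log of the unnormalised stable distribution
    for i in range(N):
        log_h = (math.log(N - i) - math.log(i + 1)
                 + softplus(omega * (pi_A(i + 1) - alpha))
                 - softplus(omega * (pi_B(i) - alpha)))
        log_w.append(log_w[-1] + log_h)
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    return sum(j * w for j, w in enumerate(weights)) / (N * sum(weights))


if __name__ == "__main__":
    for r_a in (1.3, 2.0, 2.65, 2.9):
        row = [round(average_abundance_split(100, 15, r_a, 1.3, 1.0, 1.0, w), 4)
               for w in (5, 10, 15, 20)]
        print(r_a, row)
```

Under this assumption, raising r_A increases only the cooperators' expected payoff, which lowers their probability of switching to B and therefore raises the average abundance for any ω > 0.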
(1) Adjusting $r_A$ can switch the dominant strategy, making the average abundance of strategy A greater than 0.5.
(2) The lower ω is, the more stringent the requirement on $r_A$, and the higher ω is, the looser the requirement on $r_A$: $\langle X_A \rangle(\omega = 5, r_A = 2.9) > 0.5$, while $\langle X_A \rangle(\omega = 10, r_A = 2.65) > 0.5$.
To investigate the regulation sensitivity of the different parameters, we simulate the effect of a unit variation of c, α, and $r_A$ on the average abundance. We discuss the simulation results for ω = 0, 5, 15, shown in Figures 7(a)-7(c), respectively.
(1) When ω = 0, the average abundance is identically equal to 0.5, and thus parameter regulation has no effect (see Figure 7(a)).
(2) When ω ≠ 0 and the unit variation of the parameters (i.e., Δ) is small (note that the threshold of Δ depends on ω: Δ ≈ 1.70 at ω = 5 and Δ ≈ 1.53 at ω = 15), the change in average abundance caused by adjusting α and c is much greater than that caused by adjusting $r_A$ (see Figures 7(b) and 7(c)). The regulation of α and c is more sensitive than that of $r_A$.
(3) When ω ≠ 0 and Δ is large, the regulation effect of $r_A$ is much better than that of α and c. Moreover, the larger ω is, the more sensitive $r_A$ is; i.e., a small $\Delta r_A$ leads to a large increase in average abundance (see Figures 7(b) and 7(c)).
To improve the average abundance, the ideal measure is to increase the multiplication factor of cooperators, reduce their cost, or both. However, to ensure the effectiveness of the operation on the actual battlefield, the cost is difficult to reduce and may even have to increase. Therefore, it is necessary to consider increasing both $r_A$ and c. Figure 8 shows the change in average abundance when $r_A$ and c increase at the same time (c increases by 50%, and $r_A$ increases by 69% and 73%, respectively). Accordingly, as long as $r_A$ increases by more than 73%, the adverse effect of the cost increase on the average abundance is offset and cooperation in the swarm is promoted.
Unfortunately, the above regulation can only achieve a limited increase in the average abundance; that is, it cannot make the average abundance greater than 0.5. According to the conclusions from Figures 5 and 6, switching the dominant strategy (a large increase in average abundance) depends on a large selection intensity ω and a large unit variation $\Delta r_A$, and thus we further increase $r_A$ while keeping the 50% increase in c (see Figure 9). According to the results in Figure 9, when $r_A = 2.52$ and ω ≈ 15, $\langle X_A \rangle$ is greater than 0.5; when $r_A = 2.65$ and ω ≈ 5, $\langle X_A \rangle$ is greater than 0.5. The increase of $r_A$ means that a free rider no longer obtains as much payoff as a cooperator, and this lower payoff directly increases the strategy update probability $P_{B \to A}$, so more units tend to cooperate (more betrayers become cooperators).
According to the above simulation results and conclusions, the following measures can be considered, from the two dimensions of management and technology, in the actual control of unmanned swarm cooperation: (1) Increase the multiplication factor $r_A$ of cooperators as much as possible. For example, with the help of advanced management means, the investment (i.e., cost c) of each combat unit in previous operations can be accumulated, and units with a higher cumulative investment can be given more supplies (e.g., ammunition) or a higher supply priority in follow-up operations.

In addition, since c, r, and α are closely related to specific operation tasks, specific control measures must also be discussed in combination with the operation task and within the admissible range of the parameter values.

Conclusion
The advantage of unmanned swarm operation lies in its autonomy: the swarm can continue to conduct cooperative operations efficiently in the case of combat unit damage or communication failure.
This work addresses the autonomous collaboration of the unmanned swarm when the centralized control mode fails and proposes a cooperative evolution mechanism within the framework of the multiplayer public goods evolutionary game. We obtain the average abundance function by theoretical derivation and then simulate the influence of different parameters (i.e., c, r, and α) on the abundance. The simulation results for the unmanned swarm fire strike show that increasing the multiplication factor $r_A$ and reducing the cost c can improve the average abundance of cooperators; furthermore, when the unit variation Δ is large, $r_A$ not only has a high regulation sensitivity but can also switch the dominant strategy. Finally, we suggest some measures as a first step in the transition from theory to application.
The evolution of cooperation is a fascinating topic that has been studied from different perspectives and with different theoretical approaches. Our approach, based on the multiplayer public goods evolutionary game, sheds new light on how to study and analyze the evolution of cooperation in the unmanned swarm. In our work, we assume that the units in the swarm are homogeneous, which implies a globally consistent α in the process of strategy updating. In reality, however, different units (firepower units, intelligence units, etc.) probably have different requirements for α. Thus, how to obtain the average abundance and explore the cooperative evolution mechanism when multiple values of α coexist will be our future work.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.