Using Genetic Algorithms to Represent Higher Level Planning in Simulation Models of Conflict

The focus of warfare has shifted from the Industrial Age to the Information Age, as encapsulated by the term Network Enabled Capability. This emphasises information sharing, command decision making and the resultant plans made by commanders on the basis of that information. Planning by a higher level military commander is, in most cases, regarded as such a difficult process to emulate, that it is performed by a real commander during wargaming or during an experimental session based on a Synthetic Environment. Such an approach gives a rich representation of a small number of data points. However, a more complete analysis should allow search across a wider set of alternatives. This requires a closed form version of such a simulation. In this paper we discuss an approach to this problem, based on emulating the higher command process using a combination of game theory and genetic algorithms. This process was initially implemented in an exploratory research initiative, described here, and now forms the basis of the development of a 'Mission Planner', potentially applicable to all of our higher level closed form simulation models.


Introduction
Since the Cold War period, the scenario context has widened considerably, reflecting the uncertainties of the future.
Moreover, decision cycles for our customer community in the UK Ministry of Defence (MoD) have significantly shortened. The focus of war has also shifted from the Industrial Age of grinding attrition to the Information Age, as encapsulated in the term Network Enabled Capability (NEC). NEC is a key goal for the MoD, with the emphasis on command, the sharing of awareness among commanders, and the creation of agile effects. These influences together have led to the need for simulation models which are focussed on command rather than equipment, which can consider a large number of future contexts, and which can robustly examine a number of 'what if' alternatives (Taylor and Lane, 2004).
In response to these demands, we have built a new generation of simulation models, with command (and commander decision making in particular) at their core (Moffat, 2000). These span the range from the single environment (e.g. a land only conflict at the tactical level) to the whole joint campaign, and across a number of coalition partners (Moffat, Campbell and Glover, 2004). They also encompass both warfighting and peacekeeping operations. These models have been deliberately built as a hierarchy, feeding up from the tactical (or systems) level to the operational (or system of systems) level, to give enhanced analytical insight, as shown in Figure 1.

WISE
As part of these development activities, we have constructed a stochastic wargame called 'WISE' (Wargame Infrastructure and Simulation Environment). As the name suggests, this is more than just a single model, and in fact provides a modelling infrastructure from which a number of tailored models can be created. The key development thus far has been the wargame itself (Robinson and Wright, 2002; Pearce et al., 2003). However, a logistics simulation has also been developed and is being used to examine vehicle reliability and consequent repair.
The model addresses a previous gap in modelling capability relating to the representation of command decision making, and has utilised, where possible, novel techniques to represent this key aspect of Network Enabled Capability. The wargame represents operations up to Army Division level. Army commanders play the roles of Division and Brigade commanders in the game, on both sides, and they are supported by an underlying simulation environment which represents the evolution of events. The Synthetic Environment (SE) representation exploits the Rapid Planning process (Moffat, 2002) to determine the decisions made by the lower level commanders that are not explicitly represented by players. We define a Synthetic Environment as consisting of real and simulated people interacting with simulated environments. In contrast, a closed form constructive simulation consists of simulated people (i.e. computer algorithms) interacting with simulated environments, with no human intervention during the model run. Synthetic Environments are particularly good at exploring new situations and future contexts.

In problem exploration, SEs give a rich understanding (i.e. qualitative information) of a small set of possible options.
The number of options which can be explored, however, is limited due to the high cost and time required to stage such wargaming events. In order to allow us to explore around these initial options, and thus develop a wider understanding of their robustness (a key aspect of understanding force 'agility'), we needed to develop a closed form discrete event simulation equivalent of the WISE wargame, in essence replacing the human players by some form of artificial intelligence representation, to allow the running of the scenario without human intervention. This was done by exploiting the Deliberate Planning Process, an algorithmic representation of higher level command based on a combination of game theory and genetic algorithms. Our implementation was exploratory, and a test of the feasibility of generating 'sensible' higher level plans, within a realistic conflict context, using genetic algorithms rather than expert military players. Using the same model as both an SE and a closed form constructive simulation has the additional benefit that the algorithms derived for planning in the closed form model can be calibrated by running experiments using expert military players in the SE version of the same situation.

Deliberate Planning Process
The Deliberate Planner emulates the 'formal estimate process' whereby a high level commander develops an overall plan for the campaign. At this level of the command process, a 'Blue' (friendly) commander considers a number of potential courses of action, taking into account his intent (i.e. his primary goal or objective), and the intent of the enemy ('Red') force. The algorithms which we have implemented to represent this process firstly develop a 'picture' of the layout and intent of the enemy force, based on sensor inputs and a Bayesian approach to information updating. On the basis of this 'picture', the planner then decides on a layout of the friendly force which best achieves the commander's goals. It does this by 'breeding' plans in an innovative way, using a genetic algorithm, and then selecting a plan with a high 'fitness' level. Our approach to using genetic algorithms is based on that of Goldberg (1989).
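As an illustration only of the Bayesian updating step mentioned above, the following Python sketch revises a belief over candidate Red intents after a single sensor report. It is not the WISE implementation: the function name `bayes_update`, the two intent hypotheses and all probabilities are hypothetical values chosen for the example.

```python
def bayes_update(prior, likelihood):
    """Return the posterior belief over hypotheses after one sensor report.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(report | hypothesis)
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("report is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalised.items()}


# Hypothetical example: two candidate Red intents with equal prior belief.
prior = {"defend_north": 0.5, "defend_south": 0.5}
# Assumed likelihood of a sensor sighting Red armour in the north under each intent.
sighting = {"defend_north": 0.8, "defend_south": 0.2}
posterior = bayes_update(prior, sighting)
```

Feeding each posterior back in as the next prior gives a picture that sharpens as further reports arrive, which is the behaviour the paragraph above describes.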
A plan, in this context, is an allocation of forces to different potential areas of operation across the whole 'theatre' of operations. This is turned into a 'chromosome' by expressing each allocation to a specific region in binary terms, so that the plan is then a string of binary numbers (i.e. a string of 0s and 1s). The fitness of the plan is calculated using a number of historical analysis equations, exploiting the approach of Rowland (2006), which relate force layout to potential campaign outcome. These allow the model to calculate aspects of the plan such as the likely level of casualties (own and opposing forces), the likely rate of advance towards the objective, the probability of breakthrough, and the probability of overall success. These are individual contributors to the plan fitness function. They are weighted to allow for the representation of different 'styles of command' (for example, a risk averse commander might put a high priority on keeping casualties to a minimum, while another commander might put more priority on getting to the objective). The fitness value also reflects the style of the commander through a game theory approach which seeks to maximise, across the different areas of operation, the Blue commander's minimum payoff in each such area, taking account of the courses of action (the strategies in our game theory formulation) available to the Red opponent. This is a maximin solution, corresponding to what we term 'cautious command'. Alternative formulations of the fitness function attempt an even spread of risk (median command) or attempt to maximise the commander's maximum payoff (bold command) (Moffat, 2002).
Bold command appears to be a high risk strategy. However, when played through the modelling environment, it can give rise to very 'manoeuvrist' plans which can catch the opponent out.
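The chromosome encoding and the three command styles can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the fitness function used in WISE: the historical analysis equations are not reproduced, a plan is scored by a single Blue payoff per Red strategy rather than per area of operation, and the names `encode_plan`, `decode_plan`, `weighted_score` and `style_score`, together with all numbers, are hypothetical.

```python
from statistics import median


def encode_plan(allocation, bits_per_area):
    """Encode unit counts per area of operation as one binary string."""
    for units in allocation:
        if not 0 <= units < 2 ** bits_per_area:
            raise ValueError("allocation does not fit in the bit width")
    return "".join(format(units, f"0{bits_per_area}b") for units in allocation)


def decode_plan(chromosome, bits_per_area):
    """Invert encode_plan: split the string into fixed-width fields."""
    return [int(chromosome[i:i + bits_per_area], 2)
            for i in range(0, len(chromosome), bits_per_area)]


def weighted_score(contributors, weights):
    """Combine fitness contributors using command-style weights."""
    return sum(weights[name] * value for name, value in contributors.items())


def style_score(payoffs_vs_red, style):
    """Score one Blue plan from its payoffs against each Red strategy."""
    if style == "cautious":   # worst case; picking the best plan is then maximin
        return min(payoffs_vs_red)
    if style == "median":     # middle outcome, a stand-in for an even spread of risk
        return median(payoffs_vs_red)
    if style == "bold":       # best case
        return max(payoffs_vs_red)
    raise ValueError(f"unknown command style: {style}")


# Hypothetical payoffs of two Blue plans against three Red strategies.
plans = {"A": [0.4, 0.5, 0.45], "B": [0.1, 0.9, 0.6]}
cautious_choice = max(plans, key=lambda p: style_score(plans[p], "cautious"))
bold_choice = max(plans, key=lambda p: style_score(plans[p], "bold"))

chromosome = encode_plan([5, 0, 3], bits_per_area=3)
combined = weighted_score({"success": 0.6, "own_casualties": -0.2},
                          {"success": 2.0, "own_casualties": 1.0})
```

In this toy example the cautious commander prefers plan A, whose worst outcome is tolerable, while the bold commander prefers plan B for its high best-case payoff.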
The initial 'gene pool' of the genetic algorithm corresponds to 100 'random plans'. Each of these plan 'chromosomes' is represented by a random string of 0s and 1s. In general, the fitness values for these initial plans will be low, and we need to evolve this set of plans in order to breed a plan which is 'sufficiently good' (as measured by the plan fitness function). As a first step, all of these initial plans are evaluated, arranged in rank order of fitness, and then randomly selected for pairing, with higher ranked chromosomes being more likely to be chosen ('survival of the fittest').
Crossover operators act on these pairs. The probability of such a crossover being applied is user definable (with a default value of 0.7). Only a single fixed crossover point is used. If employed, the crossover operator then swaps the tails of the two chromosomes. We also employ a mutation operator, corresponding to the possibility of flipping a 0 to a 1 or vice versa, when applied to the binary representation of a particular plan. The probability of this occurring is also user definable, with a default value of 0.033 (Moffat, 2002). The gene pool is then updated across a number of generations. A form of 'elitism' is applied in which the best plan thus far, across the generations, is carried forward to the final stage. Here the best plan carried forward, together with the best plans from the final generation, are considered, and a final choice of plan is made.
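The breeding cycle described above can be sketched in Python as follows. This is a minimal illustration rather than the WISE code. The defaults for pool size, crossover probability and mutation probability come from the text; the function name `evolve`, the number of generations, the position of the fixed crossover point (the midpoint), the linear rank weighting and the per-bit application of mutation are assumptions. The final choice is simplified to returning the best plan seen.

```python
import random


def evolve(fitness, n_bits, pop_size=100, generations=50,
           p_crossover=0.7, p_mutation=0.033, seed=0):
    """Breed bit-string plans and return the fittest one encountered."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    cut = n_bits // 2                      # assumed fixed crossover point
    best = max(pop, key=fitness)           # elitism: best plan so far
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)  # least fit first
        rank_weights = list(range(1, pop_size + 1))  # fitter plans drawn more often
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.choices(ranked, weights=rank_weights, k=2)
            a, b = a[:], b[:]              # copy so that parents are not modified
            if rng.random() < p_crossover:
                a[cut:], b[cut:] = b[cut:], a[cut:]  # swap the two tails
            for child in (a, b):
                for i in range(n_bits):
                    if rng.random() < p_mutation:
                        child[i] ^= 1      # flip 0 to 1 or 1 to 0
                offspring.append(child)
        pop = offspring[:pop_size]
        best = max(pop + [best], key=fitness)
    return best


# Toy fitness for demonstration only: the number of 1s in the chromosome.
initial_best = evolve(sum, n_bits=16, pop_size=30, generations=0, seed=1)
evolved_best = evolve(sum, n_bits=16, pop_size=30, generations=20, seed=1)
```

Because the best plan is carried across generations, the fitness of the returned plan can never fall below that of the best plan in the initial gene pool.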

Testing the Algorithm in WISE
In order for a plan to be generated there is an implicit assumption that the unit undertaking Deliberate Planning has an understanding (or 'picture') of what is happening around it in the model. This is derived from a number of sensor platforms or units which feed information in to allow the Recognised Picture to be compiled. At the beginning of a run an 'Assess Current Situation' task is called, which sends out the initial orders to the sensors to search for information updating the picture. An intelligence fusion process, and possible additional tasking of sensor units to add further information, is then carried out to build up the picture further, and to allow an analysis of enemy intent and likely courses of action (i.e. Red strategies in the game theory sense) to be completed. All sensor acquisitions are made using the 'Surveillance and Target Acquisition' model in WISE and are passed up the command chain. When a sensor asset completes a search of its tasked zone, a 'fused' set of acquisitions is passed into the Intelligence Fusion process, and a new order is generated for that sensor asset.
As already discussed, a number of cycles of intelligence fusion are required in order to build up a suitable picture against which to create a plan. Two criteria are specified in the data that determine when intelligence fusion is deemed to be complete enough for planning: (a) the number of times that specified zones must be searched, or (b) a time period.
The first of these criteria to be realised is used to initiate the plan generation task. We also normally assume that one side in the model is attacking, and the other defending, with the attacker (either Blue or Red) being the first to formulate a plan, followed by the defender.
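The two triggering criteria can be expressed as a simple check, sketched here in Python. The function name `planning_trigger`, the zone names and the numerical values are hypothetical; the text does not specify how the data are held in WISE, nor which criterion is reported if both are met in the same cycle (the sketch reports the search criterion).

```python
def planning_trigger(search_counts, required_searches, elapsed_time, trigger_time):
    """Return which criterion releases plan generation, or None if neither has."""
    zones_done = all(search_counts.get(zone, 0) >= needed
                     for zone, needed in required_searches.items())
    if zones_done:
        return "searches"   # criterion (a): every specified zone searched enough
    if elapsed_time >= trigger_time:
        return "time"       # criterion (b): the time period has elapsed
    return None


# Hypothetical data: two zones that must each be searched three times.
required = {"zone_1": 3, "zone_2": 3}
counts = {"zone_1": 3, "zone_2": 2}
early = planning_trigger(counts, required, elapsed_time=60, trigger_time=100)
late = planning_trigger(counts, required, elapsed_time=120, trigger_time=100)
```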
Once started, the Blue plan generation process takes account of the likely Red strategies, and possible own Blue strategies, together with the assumed style of command (bold, median or cautious) in order to determine the course of action to adopt and hence the orders that need to be issued. A plan (a Blue strategy) is a force allocation to a number of areas of operation, and this is evaluated using the plan fitness function, given the possible set of Red strategies. The initial set of Blue plans is then 'bred' using the genetic algorithm, as previously described, to determine the best plan to adopt. Once this process is complete, a set of orders is generated and picked up by the interface classes to be translated into the orders required to task units within WISE.
As the plan is executed in the simulation, sensor assets continue to search for further information, and the Deliberate Planner's recognised picture continues to be updated. Each time that this process is carried out, an assessment is made (the 'Plan Supervision and Repair' process) to determine whether the plan is performing within defined bounds. This is done by applying the plan fitness function to the Blue plan as it evolves through the simulation, taking account of additional sensor based information (i.e. Blue's evolving perception) about the location of both Blue and Red units. If the plan is failing (i.e. not achieving the required fitness level) the Plan Supervision and Repair process takes place. The planning algorithm determines which areas of operation are failing to meet the plan. It also identifies which units are surplus or in reserve and places these in an availability pool. The areas of operation that are in deficit are then supplemented as required and a new set of orders is issued.
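The repair step can be illustrated with the following Python sketch, which pools surplus and reserve units and then tops up the areas of operation in deficit. It is a simplification under stated assumptions: units are treated as interchangeable counts, and the function name `repair_plan`, the area names and the required force levels are hypothetical rather than taken from WISE.

```python
def repair_plan(allocated, required, reserve=0):
    """Pool surplus and reserve units, then reinforce areas in deficit.

    allocated: dict mapping area -> units currently assigned
    required:  dict mapping area -> units the plan now needs there
    Returns the new allocation and the number of units left in the pool.
    """
    new_alloc = dict(allocated)
    pool = reserve
    # Pass 1: strip any surplus from each area into the availability pool.
    for area, need in required.items():
        surplus = new_alloc.get(area, 0) - need
        if surplus > 0:
            new_alloc[area] = need
            pool += surplus
    # Pass 2: supplement areas in deficit for as long as the pool lasts.
    for area, need in required.items():
        deficit = need - new_alloc.get(area, 0)
        if deficit > 0:
            moved = min(deficit, pool)
            new_alloc[area] = new_alloc.get(area, 0) + moved
            pool -= moved
    return new_alloc, pool


# Hypothetical example: one area over strength, one under, one unit in reserve.
new_allocation, remaining = repair_plan(
    allocated={"channel_1": 10, "channel_2": 2},
    required={"channel_1": 7, "channel_2": 4},
    reserve=1,
)
```

The new allocation would then be translated into a fresh set of orders, as described above.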

Testing the Genetic Algorithm
In order to test the genetic algorithm, we played through a future scenario using the SE version of WISE, employing expert military players on both the Blue and Red sides. We also represented the same scenario within the closed form constructive simulation version of WISE. In the Deliberate Planner, the broad movement of the forces on the ground is task organised into 'channels' or areas of operation, which head towards objectives (such as an area of ground to be attacked, or a capital city to be defended). These are options which the Deliberate Planner can use in its consideration of how to deploy the force, and forces can be moved between channels as the scenario progresses, as part of the Plan Supervision and Repair process. In our future scenario there are two Blue channels (Figure 2) and two Red channels (Figure 3). Red are initially static with Blue moving towards their objective. In order to make a fair comparison, both the players in the wargame, and the closed form simulation, started with the same information from sensors and intelligence reports, and had the same initial appreciation of the battlespace in terms of movement and key areas of ground. Thus, for example, the initial picture available to a Blue commander in the SE version of the scenario had the same information content as the picture available to the algorithms representing that commander in the closed form constructive simulation version. Of course this could diverge as the scenario unfolded, depending on the choices made subsequently either in the SE or the closed form version. It was also assumed in each case that the Unmanned Air Vehicles (UAVs) deployed as sensors could not be shot down, in order that a reasonable level of situational awareness could be maintained and that this factor (i.e. loss of sensor input) would not greatly influence the plan created. For our comparison, the planner was run with a cautious command style (i.e. a maximin payoff function was assumed for Blue, as part of the evaluation of his plan fitness). A higher weighting in the fitness function was also given to the impact of Blue's plan on Red forces than the impact of Blue's plan on Blue forces.

Comparison of the simulation model algorithm with the wargame
When the Deliberate Planner algorithm is initialised it allocates airborne unmanned air vehicle sensors (UAVs) to the first zones on the channels. Data is used to define the list of sensors allocated to each channel, as well as how many should be used on that channel at any one time. The output log from the closed form simulation showed that both the initial sensor tasking and subsequent sensor tasking took place in the model, with information from these sensors influencing the Deliberate Planner. A plan is generated either when the sensors have searched all of the zones three times or when a user defined trigger time has been reached. In this simulation run, plan generation was triggered by the user defined time.
Prior to the generation of the plan, information is supplied to the Deliberate Planner to allow it to build up its Recognised Picture. An idea of the type of information available in completing this situational assessment can be seen in the two Brigade perception screenshots at Figure 5, showing the initial assessment made by the planner, and then a more refined assessment as further sensor information is taken into account by the algorithms. Figure 6 shows the execution of the plan within the simulation following the dissemination of orders to the forces. This higher level plan is a 'left hook' by Blue forces, bypassing concentrations of Red force in order to achieve Blue's objectives and intent in a timely way. A small allocation of Blue force is also directed towards the Red perceived objective in order to 'fix' Red forces in place.
Figure 6: The higher level plan generated and implemented by the simulation.

Discussion
The plan that was generated sent the majority of units along the 'left hook', with only two company sized units being tasked down the second channel to 'fix' the Red forces. At first glance this appears to be counter-intuitive. However, since the planner is clearly trying to reach the objective as quickly and with as few casualties on its side as possible, the plan is militarily credible. By choosing the left hook, the main enemy dispositions in the two urban areas are bypassed so that the objective can be reached through the least cost path. Bypassing urban areas rather than clearing them is an accepted tactic in order to maintain tempo, but the enemy left behind must be fixed or at the very least screened to provide intelligence on enemy movement. In the orders generated by the planner in the simulation, the allocation of Blue units to these areas would be insufficient to conduct this without support from other assets, e.g. UAVs, Attack Helicopters, Indirect Fire, etc.

Further Developments
We have demonstrated that higher level planning can be carried out using genetic algorithms, and produces militarily credible plans. This approach is being exploited further within the UK in current model developments, as illustrated in Figure 7. In one of our other models (CLARION; see Figure 1) we are developing a Mission Planner based on the same genetic algorithm approach employed in Deliberate Planning. As indicated in Figure 7, the approach being constructed is that each unit develops a local plan using Rapid Planning (Moffat, 2002). These resultant 'missions' or course of action choices are then coordinated within an area by the Mission Planner. Meanwhile, the Deliberate Planning algorithms deal with the larger scale allocation of forces to areas of operation.
Figure 7: The interaction of Deliberate, Mission and Rapid Planning.

Figure 1: The hierarchy of key simulation models.

Figure 2: Blue areas of operation ('channels') within the context of Blue intent.

Figure 3: Red areas of operation ('channels') within the context of Red intent.

Figure 4: Blue and Red initial deployment locations.

Figure 5: Brigade picture evolution following additional sensor based information.