We describe an approach for simulating human game-play in strategy games using a variety of AI techniques, including simulated annealing, decision tree learning, and case-based reasoning. We have implemented an AI-bot that uses these techniques to form a novel approach for planning fleet movements and attacks in DEFCON, a nuclear war simulation strategy game released in 2006 by Introversion Software Ltd. The AI-bot retrieves plans from a case-base of recorded games, then uses these to generate a new plan using a method based on decision tree learning. In addition, we have implemented more sophisticated control over low-level actions that enable the AI-bot to synchronize bombing runs, and used a simulated annealing approach for assigning bombing targets to planes and opponent cities to missiles. We describe how our AI-bot operates, and the experimentation we have performed in order to determine an optimal configuration for it. With this configuration, our AI-bot beats Introversion's finite state machine automated player in 76.7% of 150 matches played. We briefly introduce the notion of ability versus enjoyability and discuss initial results of a survey we conducted with human players.
DEFCON is a multiplayer real-time strategy game from Introversion Software Ltd.; a screenshot is provided in the figure below.
Figure: Screenshot of DEFCON.
The existing single-player mode contains a computer opponent that employs a finite state machine with five states, carried out in sequence: (1) placement of ground units and fleets; (2) scouting by planes and fleets to uncover opponent structures; (3) assaults on the opponent with bombers; (4) a full strike on the opponent with missiles from silos, submarines, and bombers; and (5) a final state, in which fleets of ships approach and attack random opponent positions. Once the state machine has reached the fifth state, it remains there for the remainder of the game. This results in a predictable strategy that may appear monotonous to human players.
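For illustration, the sequential behavior described above can be captured in a few lines of Python. This is a minimal sketch under our own naming; the state behaviors are stubs and the `finished` predicate is hypothetical, since Introversion's actual transition conditions are not public.

```python
# Minimal sketch of the sequential five-state opponent described above.
def placement(game):    pass  # place ground units and fleets
def scouting(game):     pass  # uncover opponent structures with planes and fleets
def assault(game):      pass  # attack the opponent with bombers
def full_strike(game):  pass  # fire missiles from silos, submarines, and bombers
def fleet_attack(game): pass  # fleets approach and attack random positions

STATES = [placement, scouting, assault, full_strike, fleet_attack]

class SequentialOpponent:
    """Visits each state once, in order, and stays in the final state."""
    def __init__(self, finished):
        self.index = 0
        self.finished = finished  # predicate: is the current state done?

    def update(self, game):
        STATES[self.index](game)
        if self.index < len(STATES) - 1 and self.finished(STATES[self.index], game):
            self.index += 1       # advance; earlier states are never revisited
```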
We have designed and implemented a novel two-tiered bot to play DEFCON. On the bottom layer, enhanced low-level actions make use of in-match history and information from recorded games to estimate and predict opponent behavior and manoeuvre units accordingly. This information is gathered in influence maps, which we describe below.
Our AI-bot also controls naval resources,
organized into
fleets and metafleets (i.e., groups of fleets). Because these
resources move during the game, the AI-bot uses a high-level plan
to dictate the initial placement and metafleet movement/attack strategies. To
generate a bespoke plan to fit the game being played, the AI-bot again
retrieves cases from the case-base, and produces a plan by extracting pertinent
information from retrieved plans via decision tree learning, as described below.
During the game, the AI-bot carries out the metafleet
movement and attack plan using a movement desire model which takes its context
(including the targets assigned to ships and opponent threats) into account.
The AI-bot also controls low-level actions at game-time, such as launching plane bombing runs, attempting to destroy incoming missiles, and launching missile attacks from fleets; these mechanisms are described in detail below.
We have performed extensive experimentation to fine-tune our AI-bot in order to maximize the proportion of games it wins against
Introversion's own player simulator. In particular, in order to determine the
weights in the fitness function for the placement of silos and airbases, we
have calculated the correlation of various resource attributes with the final
score of the matches. We have also experimented with the parameters of the
simulated annealing search for assignments. Finally, we have experimented with
the size of the case-base to determine if/when overfitting occurs. With the
most favorable setup, in a session of 150 games, our AI-bot won 76.7% of the
time. We describe this experimentation in detail below.
The superiority of our AI-bot leads to the question of
whether higher ability implies higher enjoyability for human players. To this
end, we have proposed a hypothesis and conducted an initial survey, which we
describe below.
The design of the learning bot is based on an iterative optimization cycle, similar to a typical evolutionary process. An overview of the cycle is depicted in the figure below.
Figure: Overview of the system design.
Given a situation requesting a plan, a case-base of
previous plan—game pairs is used to select matching plans according to a
similarity measure described in the next subsection. This subset of plans is
then used in a generalization process to create a decision tree, where each
node contains an atomic plan item, as described below.
We treat the training of our AI-bot as a machine learning problem. Each case stored in the case-base records:
(i) the starting positions of the airbases, radar stations, fleets, and nuclear silos for both players;
(ii) the metafleet movement and attack plan which was used (as described below);
(iii) performance statistics for deployed resources: for nuclear silos, the number of missiles attacked and destroyed and the planes shot down by each silo; for radar stations, the number of missiles identified; and for airbases, the number of planes launched and the number of planes which were quickly lost;
(iv) an abstraction of the opponent attacks which took place; we abstract these into waves, by clustering using time-frames of 500 seconds and a threshold of 5 missiles fired (these settings were determined empirically; see the figure below);
(v) the routes taken by opponent fleets;
(vi) the final scores of the two players in the game.
Figure: Launched opponent missiles during a match, showing the wave-pattern of attacks. The time shown is in-game time.
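For concreteness, here is a minimal sketch of the wave abstraction under the stated settings (500-second time-frames, threshold of 5 missiles); the plain list-of-launch-times input format is an assumption.

```python
# Cluster opponent missile launches into attack waves, as described above.
TIME_FRAME = 500   # seconds of in-game time per frame
THRESHOLD = 5      # minimum missiles fired for a frame to count as a wave

def abstract_waves(launch_times):
    frames = {}
    for t in launch_times:
        frames.setdefault(int(t // TIME_FRAME), []).append(t)
    # keep only frames with enough launches, in chronological order
    return [sorted(ts) for f, ts in sorted(frames.items()) if len(ts) >= THRESHOLD]
```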
Cases are retrieved from the case-base using the starting configuration of the game. There are 6 territories that players can be assigned to (North America, Europe, South Asia, etc.); hence there are 30 possible starting configurations (ordered pairs of distinct territories for the two players), and the AI-bot retrieves the cases whose configuration matches that of the current game.
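A minimal sketch of this retrieval step, assuming a simplified case record (the fields shown are a subset of the information listed above):

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    own_territory: str            # e.g. "Europe"
    opponent_territory: str       # e.g. "North America"
    plan: dict = field(default_factory=dict)            # metafleet plan used
    resource_stats: dict = field(default_factory=dict)  # per-structure statistics
    final_scores: tuple = (0, 0)                        # (own score, opponent score)

def retrieve(case_base, own_territory, opponent_territory):
    """Return all cases whose starting configuration matches the current game."""
    return [c for c in case_base
            if c.own_territory == own_territory
            and c.opponent_territory == opponent_territory]
```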
Airbases are structures from which bombing runs are launched; silos are structures which launch nuclear missiles at opponent cities and defend the player's own cities against opponent missile strikes and planes; and radar stations are able to identify the position of enemy planes, missiles, and ships within a certain range. As such, all these structures are very important, and because they cannot be moved at game time, their initial placement by the AI-bot at the start of the game is a key to a successful outcome. The AI-bot uses the previously played games to calculate airbase, silo, and radar placement for the current game. To do this, it retrieves cases with the same starting configuration as the current game, as described above. For each retrieved game, it analyzes the statistics of how each airbase, silo, and radar performed.
Each silo is given an effectiveness score as a weighted sum of the normalized values for the following:
(a) the number of enemy missiles it shot at;
(b) the number of enemy missiles it shot down;
(c) the number of enemy planes it shot down;
(d) the time it survived before being destroyed.
With respect to the placement of silos, each case is
ranked using the sum of the effectiveness of its silos. Silo placement from the
most effective case is then copied for the current game. The same calculations
inform the placement of the radar stations, with the effectiveness given by the
following values:
(a) the number of enemy planes detected;
(b) the number of enemy planes detected before other radars;
(c) the number of enemy ships detected;
(d) the time it survived before being destroyed.
Finally, the placement of airbases is determined with
these effectiveness values:
(a) the number of planes launched;
(b) the number of units destroyed by launched planes;
(c) the time it survived before being destroyed.
To find suitable weights in the weighted sum for
effectiveness, we performed a correlation analysis for the retaining/losing of
resources against the overall game score. This analysis was performed using
1500 games played randomly (see the experiments below).
Table: Pearson product-moment correlation coefficient for loss/retention of resources.
We note that—somewhat counter-intuitively—the loss of carriers, airbases, bombers, fighters, battleships, and missiles is correlated with a winning game. This is explained by the fact that games where fewer of these resources were lost will have been games where the AI-bot did not attack enough (when attacks are made, resources are inevitably lost). For our purposes, it is interesting that the retention of silos is highly correlated with winning games. This informed our choice of weights in the calculation of effectiveness for silo placement: we weighted value (d), namely, the time a silo survived, higher than values (a), (b), and (c). In practice, for silos, we use 1/10, 1/3, 1/6, and 2/5 as weights for values (a), (b), (c), and (d), respectively. We used similar correlation analyses to determine how best to calculate the effectiveness of the placement of airbases and radar stations.
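As an illustration, here is a sketch of the silo effectiveness calculation and placement selection under these weights. The per-silo inputs are assumed to be normalized to [0, 1] beforehand, as stated above, and the per-case record layout is our own assumption.

```python
# Silo effectiveness as the weighted sum given above, using the weights
# 1/10, 1/3, 1/6 and 2/5 for values (a)-(d) respectively.
SILO_WEIGHTS = (1 / 10, 1 / 3, 1 / 6, 2 / 5)  # (a), (b), (c), (d)

def silo_effectiveness(shot_at, shot_down, planes_down, survival):
    values = (shot_at, shot_down, planes_down, survival)
    return sum(w * v for w, v in zip(SILO_WEIGHTS, values))

def best_silo_placement(retrieved_cases):
    # Rank cases by the summed effectiveness of their silos and copy the
    # silo positions from the most effective case. The `silos` list with
    # "stats"/"position" entries is an assumed structure.
    best = max(retrieved_cases,
               key=lambda c: sum(silo_effectiveness(*s["stats"]) for s in c.silos))
    return [s["position"] for s in best.silos]
```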
An important aspect of playing DEFCON is the careful control of naval resources (submarines,
battleships, and aircraft carriers). We describe here how our AI-bot generates
a high-level plan for ship placement, movement, and attacks at the start of the
game, and how it carries out such plans at game-time (see the figure below).
Figure: Two fleets attacking each other in DEFCON.
It is useful
to group resources into larger units so that their movements and attacks can be
synchronized. DEFCON already allows collections of ships to be moved as a
fleet, but players must target and fire each ship's missiles independently. To
enhance this, we have introduced the notion of a metafleet which is a
collection of a number of fleets of ships. Our AI-bot will typically have a
small number of metafleets (usually one or two), with each one independently
targeting an area of high opponent population. The metafleet movement and
attack plans describe a strategy for each metafleet as a subplan, where the
strategy consists of two large-scale movements of the metafleet. Each subplan
specifies the following information.
(1) In what general area (sea territory) the ships in the metafleet should be initially placed, relative to the expected opponent fleet positions.
(2) What the aim of the first large-scale movement should be, including where (if anywhere) the metafleet should move to, how it should move there, and what general target area the ships should attack, if any.
(3) When the metafleet should switch to the second large-scale movement.
(4) What the aim of the second large-scale movement should be, including the same details as for (2).
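Such a subplan maps naturally onto a small record; here is a sketch with field names of our own choosing, not the bot's internal representation:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# One metafleet subplan, mirroring items (1)-(4) above. The aims are one
# of the five options listed in the next paragraph.
@dataclass
class MetafleetSubplan:
    start_area: str                               # (1) initial sea territory
    first_aim: str                                # (2) aim of first large-scale movement
    first_target: Optional[Tuple[float, float]]   # (2) general target area, if any
    switch_condition: str                         # (3) when to begin the second movement
    second_aim: str                               # (4) aim of second large-scale movement
    second_target: Optional[Tuple[float, float]]
```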
Sea territories—assigned by DEFCON at the start of a game—are split into two oceans, and the plan dictates which one each metafleet should be placed in. The exact positions of the metafleet members are calculated at the start of the game using the case-base, that is, given the set of games retrieved, the AI-bot determines which sea territory contains on average most of the opponent's fleets. Within the chosen sea territory, the starting position depends on the aim of the first large-scale movement and an estimation of the likelihood of opponent fleet encounter which is calculated using the retrieved games. This estimation uses the fleet movement information associated with each stored game in the case-base. The stored information allows the retrieval of the position of each fleet from the stored game as a function of time. For any given position, the closest distance from that position which each fleet obtains during the game can be calculated. The likelihood of enemy fleet encounter is then estimated by the fraction of games in which enemy fleets get closer to the observed position than a predefined threshold.
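A minimal sketch of this likelihood estimate, assuming fleet tracks are stored as lists of position samples; the distance threshold value is not given in the text, so the one here is a placeholder.

```python
import math

# Estimate the likelihood of meeting enemy fleets at `position`: the
# fraction of retrieved games in which any enemy fleet came closer to
# that position than a predefined threshold.
THRESHOLD = 20.0   # placeholder value; the actual threshold is not stated

def encounter_likelihood(position, retrieved_games):
    px, py = position
    hits = 0
    for game in retrieved_games:
        closest = min(
            math.hypot(px - x, py - y)
            for track in game.enemy_fleet_tracks   # one track per enemy fleet
            for (x, y) in track
        )
        if closest < THRESHOLD:
            hits += 1
    return hits / len(retrieved_games)
```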
There are five aims for the large-scale movements:
(a) to stay where they are and await the opponent's fleets in order to engage them later;
(b) to move in order to avoid the opponent's fleets;
(c) to move directly to the target of the attack;
(d) to move to the target while avoiding the opponent's fleets;
(e) to move towards the opponent's fleets in order to intercept and engage them.
The aim determines whether the AI-bot should place the
fleets at (i) positions with high-opponent encounter likelihoods (in which
case, large-scale movements (a) and (e) are undertaken), (ii) positions with
low-opponent encounter likelihoods (in which case, large-scale movements (b)
and (d) are undertaken), or (iii) positions which are as close to the attack
spot as possible (in which case, large-scale movement (c) is undertaken). To
determine a general area of the opponent's territory to attack (and hence to
guide a metafleet towards), our AI-bot constructs an influence map based on the opponent's population centers.
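A minimal sketch of such an influence map, assuming cities are given as (x, y, population) triples; the grid resolution and the Gaussian fall-off are illustrative choices, not the bot's exact ones.

```python
import numpy as np

# Accumulate opponent population influence on a grid and pick the cell
# with the highest value as the general attack area.
GRID = 64

def attack_area(opponent_cities, world_w, world_h, spread=2.0):
    grid = np.zeros((GRID, GRID))
    xs = np.arange(GRID) + 0.5          # cell-centre coordinates
    ys = np.arange(GRID) + 0.5
    for (cx, cy, population) in opponent_cities:
        gx, gy = cx / world_w * GRID, cy / world_h * GRID
        # each city contributes influence that falls off with distance
        dist2 = (xs[None, :] - gx) ** 2 + (ys[:, None] - gy) ** 2
        grid += population * np.exp(-dist2 / (2 * spread ** 2))
    iy, ix = np.unravel_index(np.argmax(grid), grid.shape)
    # convert the best cell back into world coordinates
    return ((ix + 0.5) / GRID * world_w, (iy + 0.5) / GRID * world_h)
```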
We implemented a central mechanism to determine both
the best formation of a set of fleets into a metafleet, and the direction of
travel of the metafleet given the aims of the large-scale movement currently
being executed. During noncombative game-play, the central mechanism guides the
metafleets towards the positions dictated in the plan (see the figure below).
Figure: Fleet formation in DEFCON with front direction shown. Each circle indicates a separate fleet of up to three ships.
Hence, we also implemented a movement desire model to take over from the default central mechanism when an attack on the metafleet is detected. This determines the direction for each ship in a fleet using (a) proximity to the ship's target, if one has been specified, (b) distance to any threatening opponent ships, and (c) distance to any general opponent targets. A direction vector for each ship is calculated in light of the overall aim of the large-scale metafleet movement. For instance, if the aim is to engage the opponent, the ship will sail in the direction of the opponent's fleets.
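A minimal sketch of such a desire model follows. The weight table is our own illustration; only its sign pattern (towards versus away from threats) reflects the aims described above.

```python
import math

# A ship's heading is a weighted sum of unit vectors towards its target (a),
# away from or towards threatening ships (b), and towards general opponent
# targets (c), with weights depending on the metafleet's current aim.
AIM_WEIGHTS = {                     # (target, threats, general targets)
    "intercept": (1.0,  0.8, 0.3),  # sail towards threats to engage
    "avoid":     (1.0, -1.5, 0.2),  # sail away from threats
}

def unit(dx, dy):
    d = math.hypot(dx, dy) or 1.0
    return dx / d, dy / d

def desired_direction(ship, target, threats, general_targets, aim):
    w_t, w_th, w_g = AIM_WEIGHTS.get(aim, (1.0, -0.5, 0.3))
    vx = vy = 0.0
    for weight, points in ((w_t, [target] if target else []),
                           (w_th, threats), (w_g, general_targets)):
        for (px, py) in points:
            ux, uy = unit(px - ship[0], py - ship[1])
            vx, vy = vx + weight * ux, vy + weight * uy
    return unit(vx, vy)
```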
The movement desire model relies on being able to
predict where the opponent's fleets will be at certain times in the future. To
estimate these positions, our AI-bot retrieves cases from the case-base at
game-time, and looks at all the various positions the opponent's fleets were
recorded at in the case. It then ranks these positions in terms of how close
they are to the current positions of the opponent's fleets. To do this, it must
assign each current opponent ship to one of the ships in the recorded game in
such a way that the overall distance between the pairs of ships in the
assignment is as low as possible. As this is a combinatorially expensive task,
the AI-bot uses a simulated annealing approach to find a good solution, which
is described in more detail below.
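A minimal sketch of this annealing search, with a permutation as the state and a two-element swap as the neighborhood move (both our own choices); the temperature parameters shown are the values that performed best in the experiments reported below.

```python
import math
import random

# Simulated annealing for a low-cost one-to-one assignment between the
# opponent's current ships and the ships recorded in a retrieved case.
def assignment_cost(current, recorded, perm):
    return sum(math.dist(current[i], recorded[perm[i]]) for i in range(len(perm)))

def anneal_assignment(current, recorded, start_temp=0.5, cooldown=0.9, steps=1000):
    perm = list(range(len(current)))
    random.shuffle(perm)
    cost = assignment_cost(current, recorded, perm)
    temp = start_temp
    for _ in range(steps):
        i, j = random.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]            # propose a swap
        new_cost = assignment_cost(current, recorded, perm)
        delta = new_cost - cost
        if delta <= 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
            cost = new_cost                             # accept the move
        else:
            perm[i], perm[j] = perm[j], perm[i]         # revert the swap
        temp *= cooldown
    return perm, cost
```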
As mentioned
above, at the start of a game, the starting configuration is used to retrieve a
set of cases. These are then used to generate a bespoke plan for the current
game as follows. Firstly, each case contains the final score information of the
game that was played. These are ranked according to the AI-bot's
score (which will be positive if it won, and negative if it
lost). Within this ranking, the first half of the retrieved games are labeled
as negative, and the second half are labeled as positive. Hence, sometimes,
winning games may be labeled negative and, at other times, losing games may be
labeled positive. This is done to achieve an example set that generates a more
detailed decision tree using the ID3 algorithm.
These positive and negative examples are used to
derive a decision tree which can predict whether a plan will lead to a positive
or a negative game. The attributes of the plan in the cases are used as
attributes to split over in the decision tree, that is, the number of
metafleets, their starting sea territories, their first and second large-scale
movement aims, and so on. Our AI-bot uses the ID3 algorithm to induce the tree.
We portray the top nodes of an example tree in the figure below.
Figure: Full selected path through the decision tree. The chosen path is highlighted in bold; branches not chosen are truncated. Remaining plan items are filled in randomly; in this case, these are the second large-scale movement, the first attack time, and the number of carriers in the metafleet.
Each branch from the top node to a leaf node in these decision trees represents a partial plan, as it will specify the values for some—but not necessarily all—of the attributes which make up a plan. The AI-bot chooses one of these branches using an iterative fitness-proportionate method, that is, it chooses a path down the tree by examining the subtree below each possible choice of value at the node it is currently looking at. Each subtree has a set of positive leaf nodes and a set of negative leaf nodes, and the subtree with the highest proportion of positive leaf nodes is chosen (with a random choice between equally high-scoring subtrees). This continues until a leaf node is reached. Having chosen a branch in this way, the AI-bot fills in the other attributes of the plan randomly. The number of randomly assigned attributes depends on the size of the case-base; for 35 cases, it is about 3 attributes.
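A minimal sketch of this fitness-proportionate traversal, assuming a simple node representation (attribute, children, leaf label) for the learned ID3 tree:

```python
import random

def iter_leaves(node):
    if node.is_leaf:
        yield node
    else:
        for child in node.children.values():
            yield from iter_leaves(child)

def positive_fraction(node):
    leaves = list(iter_leaves(node))
    return sum(leaf.label == "positive" for leaf in leaves) / len(leaves)

def choose_partial_plan(root):
    # walk down the tree, always into the subtree with the highest
    # fraction of positive leaves, breaking ties randomly
    plan, node = {}, root
    while not node.is_leaf:
        scores = {value: positive_fraction(child)
                  for value, child in node.children.items()}
        best = max(scores.values())
        value = random.choice([v for v, s in scores.items() if s == best])
        plan[node.attribute] = value    # this branch fixes one plan attribute
        node = node.children[value]
    return plan   # attributes not fixed by the branch are later filled randomly
```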
In order for players not to have to micromanage the playing of the game, DEFCON automatically performs certain actions. For instance, air defence silos automatically target planes in attack range, and a battleship will automatically attack hostile ships and planes in its range. Players are expected to control where their planes attack, and where missiles are fired (from submarines, bombers, and silos).
As mentioned above, the AI-bot uses an influence map
to determine the most effective general radius for missile attacks from its
silos, submarines, and bombers. Within this radius, it must assign a target to
each missile. This is a combinatorially difficult problem, so we frame it as an
instance of the assignment problem and solve it with the same simulated annealing approach used above for matching fleet positions.
For most of our testing, we used the parameter values for this search that performed best in our experiments (see the table below).
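A minimal sketch of this framing, with an illustrative cost function (the bot's actual scoring is not specified in the text):

```python
import math

# Frame missile targeting as an assignment problem: build a cost matrix
# over (missile, candidate target) pairs within the chosen attack radius,
# then search it for a low-cost assignment, e.g. with the annealing
# routine sketched earlier.
def target_cost(missile_pos, target_pos, target_value):
    # prefer close, high-value targets; the trade-off here is illustrative
    return math.dist(missile_pos, target_pos) - target_value

def cost_matrix(missiles, targets):
    # targets: list of ((x, y), value) pairs inside the attack radius
    return [[target_cost(m, pos, value) for (pos, value) in targets]
            for m in missiles]
```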
Only silos can defend against missiles, and silos require a certain time to destroy each missile. Thus attacks are more efficient when the time frame of missile strikes is kept small, so we enabled our AI-bot to organize planes to arrive at a target location at the same time.
To achieve such a synchronized attack, our AI-bot
makes individual planes take detours so that they arrive at the time that the
furthest of them would arrive without a detour (see the figure below).
Figure: Synchronized attack in DEFCON.
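A minimal sketch of this synchronization, producing a single perpendicular detour waypoint per plane; the bot's actual detour geometry may differ.

```python
import math

# Every plane flies a path whose length matches that of the farthest
# plane, so all planes arrive at the target at the same time.
def detour_waypoint(start, target, required_length):
    direct = math.dist(start, target)
    if required_length <= direct:
        return None                     # farthest plane: fly straight
    # offset the midpoint so the two legs sum to the required length:
    # 2 * sqrt((direct/2)^2 + offset^2) == required_length
    offset = math.sqrt((required_length / 2) ** 2 - (direct / 2) ** 2)
    mx, my = (start[0] + target[0]) / 2, (start[1] + target[1]) / 2
    dx, dy = (target[0] - start[0]) / direct, (target[1] - start[1]) / direct
    return (mx - dy * offset, my + dx * offset)   # perpendicular offset

def synchronized_waypoints(planes, target):
    longest = max(math.dist(p, target) for p in planes)
    return [detour_waypoint(p, target, longest) for p in planes]
```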
We tested the
hypothesis that our AI-bot can learn to play DEFCON better by playing games
randomly, then storing the games in the case-base for use as described above.
To this end, for experiment 1, we generated 5 games per starting configuration
(hence 150 games in total), by randomly choosing values for the plan
attributes, then using the AI-bot to play against Introversion's own automated
player. Moreover, whenever our AI-bot would ordinarily use retrieved cases from
the case-base, uninformed (random) decisions were made for the fleet movement
method, and the placement algorithm provided by Introversion was used. The five
games were then stored in the case-base. Following this populating of the
case-base, we enabled the AI-bot to retrieve and use the cases to play against
Introversion's player 150 times, and we recorded the percentage of games our
AI-bot won. We then repeated the experiment with 10, rather than 5 randomly
played games per starting configuration, then with 15, and so on, up to 70
games, with the results portrayed in the figure below.
Figure: Number of winning configurations versus the size of training data.
We see that the optimal number of cases to use in the
case-base is 35, and that our AI-bot was able to beat Introversion's player in
76.7% of the games. We analyzed games with 35 cases and games with 40 cases to
attempt to explain why performance degrades after this point, and we found that
the decision tree learning process was more often exploiting idiosyncrasies of the cases in the larger case-base, hence overfitting. We describe some possible remedies for overfitting in the future work section below.
Using the 35-case optimum, we further experimented with the starting temperature and cool-down rate of the simulated annealing search for mappings described above. The results are shown in the table below.
Table: Performance versus simulated annealing parameters.
| Starting temperature | Cool-down rate | Games won (%) | Mean score differential |
|---|---|---|---|
| 0 | 0 | 53.3 | 13.3 |
| 0.3 | 0.75 | 73.3 | 22.7 |
| 0.5 | 0.9 | 76.7 | 33.2 |
| 1.0 | 0.95 | 69.0 | 34.9 |
| 1.0 | 0.99 | 73.3 | 33.0 |
In a final set of experiments, we tried various different metafleet planning setups to estimate the value of learning from played games. We found that with random generation of plans, our AI-bot won 50% of the time, and that by using hand-crafted plans developed using knowledge of playing DEFCON, we could increase this value to 63%. However, this was not as successful as our case-based learning approach, which—as previously mentioned—won 76.7% of games. This gives us confidence to try different learning methods, such as getting our AI-bot to play against itself, which we aim to experiment with in future work.
We achieved the initial goal of building an AI-bot that can consistently beat the one written by Introversion and included with the DEFCON distribution. Its capability to learn and improve from previous experience makes it more competitive, and thus we can assign it a higher ability than the original bot.
We define ability as a measure of how strongly a bot plays, and enjoyability as a measure of how much human players enjoy playing against it. The first figure below shows a hypothetical graph of enjoyability as a function of bot ability; the second shows enjoyability as a function of time.
Figure: Graph for enjoyability as a function of bot ability (hypothetical).
Figure: Graph for enjoyability as a function of time (hypothetical).
To approach
the question of enjoyability versus ability in strategy games, we conducted a
pilot survey with students, none of whom had played DEFCON before. The sample
size of 10 is too small to yield statistically significant results, but it
provided valuable responses for further improvement of our AI-bot and helped us
to remedy inaccuracies in the test protocol. The test was carried out as a
blind test, that is, half the subjects played against the original AI-bot and
the other half against our AI-bot. All other game parameters such as starting
territories and game mode were identical. After each of the 10 successive
matches against their computer opponent, the players were asked to rate their
enjoyment, frustration, difficulty, desire to play again and confidence of
winning the next match. The results are portrayed in the figure below.
Figure: Results of the conducted survey, averaged over 10 games. Values marked with an asterisk range from 1 = very low to 6 = very high.
The results indicate that the novices were
overburdened with the increased strength of the new AI-bot, as they won 39% of
the games against our AI-bot, while the group playing the original DEFCON bot won 56% of their games. This was reflected in the questionnaire answers, where the people playing the original bot were less frustrated (see the figures below).
Figures: Comparison of survey results on the individual questionnaire scales.
Regarding our initial hypothesis, these results suggest that, at least for novice players, higher bot ability does not automatically imply higher enjoyability.
The use of case-based reasoning, planning, and other AI techniques for board games is too extensive to cover here. Randall et al. have used DEFCON as a test-bed for AI techniques, in particular for learning ship fleet formations.
A comparison of artificial neural networks and evolutionary algorithms for optimally controlling a motocross bike in a video game has also been investigated in related work.
Although the developed bot is in itself an application of the techniques used, the underlying concept of combining artificial intelligence methods to benefit from synergy effects is applicable to many problems, including, but not restricted to, other strategy computer games with similar requirements of optimizing and planning actions in order to compete with skilled humans.
In particular, the combination of case-bases and decision trees to retrieve, generalize, and generate plans is a promising approach that is applicable to a wide range of problems in which past episodes can be recorded, compared for similarity, and generalized into new plans.
There are many
ways in which we can further improve the performance of our AI-bot. In
particular, we aim to lessen the impact of overfitting when learning plans by implementing different decision tree learning techniques, filling in missing plan details in nonrandom ways, and trying other logic-based machine learning methods, such as Inductive Logic Programming.
With improved skills to both beat and engage players,
the question of how to enable the AI-bot to play in a multiplayer environment
can be addressed. This represents a significant challenge, as our AI-bot will
need to collaborate with other players by forming alliances, which will require
opponent modelling techniques. We aim to use DEFCON and similar video games to
test various combinations of AI techniques, as we believe that integrating
reasoning methods has great potential for building intelligent systems. To
support this goal, we are developing an open AI interface for DEFCON, which is
available online.
The initial results of the survey and the discussion of ability versus enjoyability raise another important point for the future direction of our research. It is usual practice in academic AI research to strive for an algorithm that plays at maximum strength, that is, one that tries to win at all costs. This is apparent in the application of AI techniques to playing board games: chess playing programs, for example, usually try to optimize their probability of winning. However, this behavior may be undesirable for opponents in modern video games. The goal of the game is not to make the player lose as often as possible, but to make him/her enjoy the game. This may involve opponents that act nonoptimally, fall for traps, and make believable mistakes. This is another aspect we hope to improve in our bot in the future. It also suggests further player studies, as it is imperative to evaluate the enjoyability and believability of a bot through player feedback.
We have implemented an AI-bot to play the commercial game DEFCON, and showed that it outperforms the existing automated player. In addition to fine-grained control over game actions, including the synchronization of attacks, intelligent assignment of targets via a simulated annealing search, and the use of influence maps, our AI-bot uses plans to determine large-scale fleet movements. It uses a case-base of randomly-planned previously played games to find similar games, some of which ended in success while others ended in failure. It then identifies the factors which best separate good and bad games by building a decision tree using ID3. The plan for the current game is then derived using a fitness-proportionate traversal of the decision tree to find a branch which acts as a partial plan, and the missing parts are filled in randomly. To carry out the fleet movements, ships are guided by a central mechanism, but this is superseded by a movement-desire model if a threat to the fleet is detected.