Jason Brownlee
April 2003
AI Writing Competition
Adaptation - alteration in the structure or function of organisms which enables them to survive and multiply in a changed environment
- The Macquarie Dictionary
Forward
I had read a number of articles on seemingly complex systems derived from
simple rule sets, but it wasn’t until I constructed a Wator [1] simulation for a
C++ class at University that I truly appreciated the magic of these systems.
Later I reread Craig Reynolds: Boids [2] documentation and built a simple 2D
implementation in Java. Honestly, I was spellbound. I could stare at the
simulations for hours, wholly engrossed in the effect small changes in rules at
the individual level had on the system as a whole.
I became intensely interested in machine learning algorithms and data mining
after taking a few courses in Intelligent Agents and Artificial Intelligence.
After constructing a reasonable library of various neural network algorithms and
playing with the plethora of toy classification and prediction problems I needed
something I could sink my teeth into.
The result is this article and its associated software. The goal was to
attempt the construction of a self-perpetuating system of non-playable
characters (NPC’s) that would use learning algorithms to optimize elements of
their behaviour to engender their species.
The system was designed to be simplistic, with the intent of reinforcing
ideas. Although I believe the concepts involved are thought provoking
nonetheless. I hope you’ll agree and possibly like myself spend countless hours
of enjoyment and frustration in continually tweaking and observing the program’s
parameters.
Introduction
The goal of this article is to discuss the design and the practical
implementation details of two common artificial intelligence techniques used in
the construction of a non-deterministic, self- perpetuating and self-optimizing
ecosystem of simple NPC’s in a real-time environment. Practical considerations
are emphasized over technique detail. Upon completing this article the reader
will have an understanding of development issues involved in applying the
selected techniques in a real-time game environment.
We will start with what the ecosystem is trying to achieve and what type of
agents populate its universe. Next a development platform is selected and
evaluated, with a focus on its strengths and weaknesses.
A review is then made of the plant NPC with a focus on parameters which
directly influence the species ability to survive and interact with other
agents. A genetic algorithm is proposed and constructed to give the agent the
flexibility to adapt to its local surroundings.
Next the herbivore agent is examined. We look at critical sub-systems such as
goal selection focus on the problem of feed goal evaluation. A perceptron
solution is proposed and implemented, with a focus on management of herbivore
memories and the affect dynamic learning has on goal selection.
The system is evaluated using concepts such as diversity, longevity and
determinism. Interesting observations and behaviour are discussed and finally
extensions and additional features are proposed.
We have an exciting journey ahead, so let’s gets straight into it.
What is Ecosystem?
To begin, please forgive my use of the word agent; I have used it in the
terms of "A means by which something is done or caused", rather than the
Artificial Intelligence meaning defined by "Agency, Intelligence and Mobility".
Associate my usage with the term Non-Playable Character (NPC).
The vision of the ecosystem is a system of agents that use learning
techniques, to directly or indirectly optimize their behaviour to meet
challenges in their real-time environment with the goal of individual survival
and reproduction. There are two agent types that exist in the ecosystem, which
are highly dependent on each other for survival.
Plant
The plant is a basic organism which exists to consume food and act as a food
source to the Herbivore. Life for a plant is started as a seed which bursts into
existence and floats down to root. Seeds can be sticky so they may stick to
surfaces, but they are delicate little things so if they touch the wrong surface
they die. Food for the plant is extracted from the environment over time
(concept of sun-light). A plant can reproduce, but it must be grown-up (of a
certain age) and it must receive pollen (genetic material) from another plant.
Pollen is attached to agents whenever they touch/eat from the plant. Given a
special combination of plant genes, a plant can be poisonous to eat.
Herbivore
The herbivore is a simple creature that lives to eat plants. Herbivores start
life as children able only to select plants to eat, and eat them. Once grown up,
herbivores are able to consume plants at a faster rate and are able to
reproduce. Herbivore’s live for a shorter time than plants, but have the unique
ability of passing their instinct (regarding what makes a good plant to eat)
onto their children.
System
The system is small and low in complexity and its clear there are strong
relationships between the two agents. A plant requires herbivores to act as a
delivery vehicles to distribute its genetic material to other plants. Plants
also rely on the herbivore to expose them to new pollen, providing the only
mechanism for reproduction and ultimately the continuation of the plant species.
Herbivores require plants to eat. Without plants, herbivores’ would die
before they could grow up and reproduce. Herbivores need their population size
maintained. Too many herbivores mean the food supply is exhausted before it can
be replenished. Therefore plant population size has a relationship with the
maximum number of herbivores it can sustain. An additional interesting feature
is that not all plants are good for herbivores. There are some plants that have
bad genes to cause them to be harmful to herbivores once eaten.
The combination of the above relationships defines the general rules which
restrict the system. Both species are forever walking the tight-rope of
extinction relying on the individual agent decisions and the combined total of
the actions/parameters of all agents to determine the systems condition.
Combined with this is the fact that the very environment they exist in is
subject to constant change with things like hazards and having paths to goals
blocked. Agent associations are defined by their behaviour and need to be
continually adjusted to adapt to local conditions.
Lifecycle
Although each of the two distinct species perform different actions and have
different behaviours they all share a common lifecycle. This lifecycle is a
loose abstraction from nature:
The lifecycle specification is better demonstrated through the use of a
diagram (see figure 1.0).
Figure 1.0 - The common agent lifecycle
Platform
I chose to build the simulation on the Quake II computer game platform [3] by
id Software [4]. The following are reasons why I think this particular platform
is a good starting point for experimentation:
The above points are great, but the selling point for me was the existing
monster sub-systems. They provided me with a basic tool kit of monster behaviour
which I could hook into as required, meaning I didn’t have to bother with detail
outside the scope of what I was trying to achieve.
Framework
Each monster in the game is provided with processing time via a common
interface. The approach could almost be called polymorphism if it were an object
oriented implementation. Generic monster functionality is abstracted and
specific behaviour is defined by implementing the required interfaces (in this
case provided as function pointers). This elegant approach is used throughout
the game’s code and provides a welcomed level of flexibility.
The monster framework provides an extensive collection of predefined generic
monster behaviour. These subsystems include:
Where appropriate, the generic monster framework will call monster specific
functionality to cope with a situation which generally in turn calls generic
monster functions to deal with those conditions. The majority of the state
transitions are managed for the monsters, all that is required is situation
specific functions for things like attacking goals.
The monster framework is low in complexity, but provides a lot of flexibility
in allowing the choice to use as much or as little of the existing functionality
as is required. This, in conjunction with the excellent rendering engine,
physics, and network code make it a recommended launch pad for creative AI
simulations and experiments.
For some useful resources for programming on this platform, please refer to
[7], [8] and [9].
Plant Agents
Plants are very simple organisms. This section will discuss the four states a
plant may be in and the constraints and actions of those states. The issue of
plant reproduction will be raised and addressed using a genetic algorithm
inspired solution.
Plant States
A plant has a number of defined states it can be in at any one time. The
state a plant agent instance is in narrows the set of actions it can perform and
defines the rules which must be fulfilled to trigger a transition into other
valid plant states.
The following is a list of valid plant states.
The following figure 1.1 provides a diagrammatical representation of the
states for a plant and the valid state transitions.
Figure 1.1 - The 4 state of a plant and the triggers for each state transition
Plants live for a defined length of time as a sprout and then grow to a
mature state. Once mature, plants received an increase in the amount of
resources that they can extract from the environment per server frame. They are
also allowed to reproduce if pollinated. There are constraints regarding the
number of times a plant can reproduce, the number of seeds it will release, the
length of time between reproductions and so on. These variables are critical to
the plant as they directly influence its behaviour and thus its ability to
survive.
To address the critical nature of plant parameters, many variables have been
left as user adjustable constants in a configuration file. A small collection of
critical variables were extracted and make up what is defined as the plants DNA.
These parameters include:
Genetic Algorithm (GA)
Plants pass on genetic material to their seeds when they reproduce. A direct
copy of this material would be clones of plants, meaning the species would be
forever stuck in its random starting condition. The environments that the plants
exist in are large and consist of varying terrain and hazards. There is a need
for plants to be able to adapt their behaviour to meet local conditions; both
local to the selected environment and local to the plants position and
surroundings in the selected environment.
Genetic Algorithms (GA’s) are a method of computing inspired by the ideas of
evolution and natural selection. GA’s simulate the evolution process for a given
problem via processes of selection, mutation and reproduction. They are known as
being an effective search/learning/optimization technique at finding close to
optimal solutions. Please refer to the Genetic Algorithms FAQ [10] for thorough
treatment.
The following provides an outline of the benefit of the various genetic
operators to the plant species:
Figure 1.2 - Representation of the reproduction action of a plant using GA operators
Plant Fitness
Figure 1.2 shows the design and features of plant reproduction using genetic
operators, though there is something important missing; Fitness Evaluation and
Selection. This is the determination of which plant agents are given the chance
to reproduce and how that choice is made, a critical component of which the
above scheme would be irrelevant without.
Selection of which plants may reproduce is made by the herbivore agents.
Herbivores use a selfish fitness function to evaluate plants in their local area
and select one to eat from. Upon eating from a plant they pass on genetic
material from the last plant they ate from, and then take genetic material away
with them to their next food goal.
Reiterating a plants ability to reproduce along with the implications of
using the herbivores as the selection mechanism, a plant must:
Sounds like an interesting self optimisation problem, but there is one more
catch: the poison genes. Plants optimise their DNA ensuring they produce enough
seeds, and that their seeds float through the air the right amount to survive
and get eaten by herbivores. The downside is crossover and mutation also
manipulates their poison genes and once activated will injure the herbivores
they work so hard to optimise themselves for.
Generally when employing GA techniques, the problems involve a large search
space, where unit populations counted in the hundreds or thousands, and where
converge on a solution can occur over many hundreds of generations. In this case
the problem is small and relatively simple and it is expected smaller population
sizes and convergence time are required. So called "errors" in plant behaviour
are welcomed as they provide the simulation with a level of non-determinism
expected by the user. They are also required by the optimisation procedure for
pushing out of locally optimal values, to traverse the search space for good
plant DNA.
Herbivore Agents
The herbivore agent is more complex than the plant in that it makes use of a
goal selection and evaluation system and moves about the environment pursuing
its goals. This section discusses the states that a herbivore agent may have, as
well as the actions it may perform in each state. The problem of flexible food
selection is discussed and a Perceptron based solution implemented, along with a
simple short term memory subsystem to capture and maintain training samples.
Herbivore States
A herbivore has a number of states that it may exist in, and each state
defines a set of actions which a herbivore instance may perform.
The following is a brief description of the four valid states a herbivore may
be in:
Figure 1.3 is a diagrammatic representation of the state lifecycle of a
herbivore, depicting valid state transitions.
Figure 1.3 - Shows the lifecycle states of a herbivore agent and the triggers for each state transition
Herbivore Feeding Problem
Herbivores must eat plants to survive. The problem is not all plants are
edible; some are poisonous which injure the herbivore when eaten. Herbivores
cannot reproduce until they mature, but when they reproduce they pass a snapshot
of their instincts onto their offspring, giving them the best start in life they
can. Obviously only herbivores that are able to select enough good food to live
until they reproduce will continue the species, providing the interesting
situation of Darwinism (survival of the fittest or survival of the best
classifiers). The combined result of this behaviour is that over a number of
generations it is expected that the amount of accumulated knowledge regarding
good food selection would continually increase.
Herbivores are asexual meaning they reproduce by themselves and are not
required to mate. This means genetic operators cannot be used to combine genetic
material of parents, so how is change introduced into the species? What is to
stop herbivores continually refining their behaviour and passing it on to their
children until they are near-perfect at selecting food? What affect would this
have when all the "good" food is exhausted?
We have seen that plants require the herbivore to act as a fitness evaluation
mechanism, selecting which plants may reproduce. We have also seen that the
plants require a fitness evaluation that is flexible enough to not always make
selections based on the same plant characteristics, in an attempt at maintain a
level of plant diversity. Now we have seen that herbivores also require this
flexibility. Both agent species have common requirements for the same
functionality, so how can these problems be addressed?
Enter the Perceptron
Effectively, the plant attributes most relevant to the herbivores immediate
needs (what can I eat right now) are the poison genes.
The plants poison genes are floating point values which must be above a
threshold to be active. To make a plant poisonous for its lifetime, two of the
threes genes must be active. This problem can be characterized by the following
propositional rules:
(P1 ^ P2) v (P1 ^ P3) v (P2 ^ P3) => Not Edible
¬(P1 ^ P2) ^ ¬(P1 ^ P3) ^ ¬(P2 ^ P3) => Edible
Those not familiar with logic connectives: (^ is AND, v is OR, ¬ is NOT, =>
is Implies).
Below is the same problem expressed as a binary truth table:
|
Poison1 |
Poison2 |
Poison3 |
Edible |
|
1 |
1 |
1 |
No |
|
1 |
0 |
1 |
No |
|
1 |
1 |
0 |
No |
|
1 |
0 |
0 |
Yes |
|
0 |
1 |
1 |
No |
|
0 |
0 |
1 |
Yes |
|
0 |
1 |
0 |
Yes |
|
0 |
0 |
0 |
Yes |
Table 1.0 - all possible combinations of plant poison genes and their affect on a plants edibility, one indicates active gene, and zero indicates inactive gene.
The herbivore is required to learn this classification problem, but retain
the flexibility of not learning it perfectly. Once learned, the herbivore is
required to apply the classification to unseen cases in the environment and
predict whether or not the plants are edible. An effective classification and
prediction algorithm suited to this problem is the Perceptron learning rule.
Why Perceptron
The Perceptron is a learning algorithm catered to learning problems with a
small number of binary inputs and a single binary output. Basically a Perceptron
is a single neuron where inputs are weighted. The sum of the inputs passes
through a linear transfer function, determining whether or not the neuron will
fire (output one) or not fire (output zero).
Figure 1.4 - Example Perceptron showing plant input variables and classification output
The algorithm has an advantage over other learning algorithms of similar
nature which is its speed and simplicity, both welcome features in a real-time
environment. A highly publicized weakness of this algorithm is its inability to
learn problems which are not linearly separable, forever demonstrated by the XOR
problem. Non-linearly separable problems are those in which outputs of the
training cases cannot be separated by a straight line or hyperplane. Fortunately
the plant poison problem is linearly separable.
For further understanding of this technique and its strengths and weaknesses
please refer to [11], [12] and [13]. For a broader understanding of neural
networks see [14] and [15].
Memory and Learning
The Perceptron is a supervised learning technique meaning that it learns by
being exposed to samples from the problem domain along with expected outputs.
The algorithm then trains itself to produce similar outputs from the same
inputs. The problem now becomes; where do herbivores get example patterns
from? To solve this a herbivore memory management system was designed to
allowed herbivores to remember plants they eat from and the result of their
eating (obtained sustenance or was poisoned).
Figure 1.5 - Shows how the memory system and Perceptron is used by the herbivore
The herbivore’s goal selection subsystem can be divided into three main
sections:
Memory Storage
When a herbivore selects a food goal, it makes a copy of the plant’s details
and stores it in its state. These details are
If the herbivore manages to navigate to its selected food goal, it will take
a number of bites from the plant. Each bite will create a new memory. A memory
consists of the above details of the current food goal combined with the result
of taking a bite (gain or lose health).
Memories are maintained in a queue data structure where new memories are
added to the end of the queue, possibly popping old memories from the front of
the queue. This simulates the concept of herbivores remembering a handful of
food goals and forgetting food goals from a long time ago. If the herbivore
already has a memory for the plant the memory is updated, otherwise a new memory
is created. This is not required functionality however it stops the generation
of too many similar memories.
Training from Memory
This module is responsible for using the remembered goal selections and
results to train the herbivore’s perceptron. Basically the memory is knowledge
collected by the herbivore through experience in the simulation. Application of
this knowledge is provided by teaching the perceptron structure to recognize
good and bad food goals based on the value of a plants poison genes. Training is
performed on the creation or update of each memory.
Plant selection - flexible food selection
When a herbivore is hungry it evaluates its surroundings and constructs a
list of all plants near by. Each plant is then evaluated by the herbivores
Perceptron in a read-only fashion. The Perceptron will classify one or more
plants as good food goals and the herbivore will randomly select one of these
good food goals to eat from. If the Perceptron fails to classify any food goals
as edible a random plant is selected to eat from the list.
Training Time and Training Patterns
To get an idea of the poison plant problem and the Perceptrons ability to
learn it, I performed a number of experiments using the Perceptron learning
algorithm external to the proposed system. The goal was to gain an understanding
into how difficult the problem was in terms of the number of training epochs and
the number of patterns required to learn and generalize the problem.
This information was determined to be critical to the success of the
Perceptron implementation. A herbivore can only capture so many patterns during
its lifetime, and only so many of those patterns will be unique. The number of
training epochs must be minimized. Large numbers of training iterations means
that too much processor time is allocated to a single task, which may cause
pauses in the simulation and restrict the maximum number of agents the ecosystem
can support.
The following table is a listing of results averaged over 10 tests, each test
stopping when all training patterns were learned successfully.
| % of Training / Test Patterns | Epochs | Average % Correct (Test Patterns) | Average % Correct (All Patterns) |
| 25% / 75% | 1.1 | 48.33% | 61.25% |
| 50% / 50% | 4 | 57.50% | 78.75% |
| 75% / 25% | 5 | 60% | 90% |
| 100% / 100% | 5.4 | 100% | 100% |
The table shows that on average, 6 epochs are enough to generalize the
problem sufficiently to classify approximately 50% of the unseen patterns
correctly when using from between 25% to 75% of the possible unique gene
combinations as training data.
It is expected the herbivore would be able to collect roughly 10 patterns
during its lifetime and that at least half of those collected patters would be
unique, providing the herbivore with an estimated 60% chance of classifying
unseen plants correctly and a 90% chance of classifying any given plant
correctly. These estimations are the best case scenarios. A herbivore that is
able to only capture 2 unique training samples has a 60% an estimated chance of
classifying any given plant correctly.
None of these estimations take the herbivores ability to inherit learned
behaviour into account, which is expected to give the herbivore a jumpstart on
good classifications and possibly inherit features which need to be unlearned.
Safe values for the max epochs and herbivore memory size were selected as
follows:
The Ecosystem
An environment was constructed, specifically designed to test the ecosystem.
I’m no artist so it’s laughably simple. It contains three points from which an
initial population can be started. The level provides a high ceiling to allow
seeds to power up into the air, and an observation platform is provided for the
user to oversee the action.
Figure 1.6 - a screenshot taken of the test environment showing the central island structure and the observation tower.
The agents were implemented using game models and textures from the Quake II
platform. A plant is represented as a round ball. Upon reaching maturity they
transform into a small cylindrical shape. Poisonous plants are represented by a
purple shell effect.
Herbivores have the appearance of a mechanical creature which runs around on
all fours. When feeding from a plant they extend a tongue / tentacle. The
herbivores current feed and roam goals are represented by a green line extending
from the herbivore to its goal. Healthy herbivores are coated in a green shell,
sick herbivores are shown by a red shell.
Figure 1.7 - Shows a number of plant’s in their seed state. Poisonous seeds are represented in purple.
The scene contains two herbivores, one of which is sick; designated by a red shell.
Both the herbivores and plants have a lot of constants which are used to
manage and manipulate various behaviours. These are stored in a configuration
file called "agents.ini". Locating suitable values for these parameters is where
half the fun lies in tweaking the simulation, and due to the flexibility of the
system it is possible for one askew value to have a dramatic impact on the way
the simulation progresses.
Ecosystem Observations
This section is devoted to the meat of this experiment - the observations. By
far the best part of a project like this for me is in the tweak-and-observe
cycle which I can get lost in for hours at a time. Small changes are all that is
usually required to produce very different effects in things like food
selection, population migration and diversity. The following are a number of
small interesting situations observed during the evaluation of my many tweaks:
Figure 1.8 - Shows a dense population of plants and feeding herbivores.
Proposed Feature Enhancements
The program is not perfect. It was constructed over a short period of time
and there are number of features and feature enhancements that could make for a
more enjoyable experience. This section mentions a handful of features I’d have
liked to see implemented:
Next Step: Learning in real time environments
Though this simulation was a trivial example it did show an example of how to
achieve learning in a real-time environment and some of the issues involved such
as pattern diversity and population diversity. Though the program is simple, the
effectiveness of learning was demonstrated.
A next easy step could be to expand the current application. Allow herbivores
to learn more about their food, parameters such as:
Plants could also benefit from encoding more relevant parameters such as:
The area that I find most fascinating is determining how much of an agents’
state machine can be replaced or intergraded with connectionist structures. The
herbivore has only its goal classification managed by a single Perceptron, and
proved to be most effective. It would not be too difficult a task to replace
other subsystems and decision making function with small multiplayer networks.
From there the very triggers between states could again be replaced by small
networks allowing for learned response to stimuli.
Interesting, if not unoriginal ideas [16], but by using the concepts
demonstrated in this article, they could be most achievable. The specific
problems need to be identified, abstracted and solved effectively. This was
shown with goal selection by testing the chosen structure externally from the
application to determine an optimal number of training patterns to learn the
problem and the number of iterations required to reproduce those training
patterns.
Conclusion
We have covered a lot of ground from the design of the agents state to the
implementation of learning algorithms, and finally to the system observations
and ideas for the future. A simple society was constructed with relatively small
population sizes compared to well known alife simulations [17]. Intuitive
concepts for the managing of learning in a real-time environment were
demonstrated through practical design and implementation of both a Genetic
Algorithm and Perceptron.
Those readers with the computer game Quake II can download the program and
all its source code from [18]. I’m interested to see what kind of wacky
simulations readers are able to generate and the parameters which sparked them.
For those with programming background you’ll find the source code commented like
a novel, with a reasonable design, decoupling systems such as the herbivore from
its goal selection, goal selection from its perception and the plants from the
DNA manipulation. This will allow easy replacement of subsystems with little
effect on existing code.
Download
Bibliography
[1] Wator, Not mine, but a great example applet, though I recommend reducing size to 5500: Link
[2] Craig Reynolds Boids: Link
[3] Quake II Computer Game: Link
[4] id Software: Link
[5] Quake II Computer Game Source Code: Link
[6] GNU General Public License (GPL): Link
[7] Inside 3D tutorials for Quake II: Link
[8] The Quake Developers Library Site Offline Tutorials: Link
[9] Open source Quake Development: Link
[10] Genetic Algorithms FAQ: Link
[11] Generation5.com, Perceptrons, Link
[12] Muchele D. Estebon, Perceptrons: An Associative Learning Network, 1997, Link
[13] Omri Weisman & Ziv Pollack, The Perceptron, 1995, Link
[14] Neural-nets Usenet News FAQ: Link
[15] AI-Depot.com, Neural Network Warehouse, Link
[16] [Finite State Machines and neural nets: the inception], McCulloch and Pitts’ neural logical calculus: Link
[17] Seeing around corner, Jonathan Ranch the Atlantic Online, April 2002, Link