tag:blogger.com,1999:blog-61785244605223550632024-02-21T08:16:27.651+00:00CharlieAIThis is a project I started a few years ago; an experiment to create AI that can learn to play board games, and play them well, without at first even knowing the rules.Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.comBlogger25125tag:blogger.com,1999:blog-6178524460522355063.post-692369260611560412013-07-05T23:00:00.002+01:002013-07-05T23:00:50.603+01:00Connection EstablishedFor a while I've been working on a GGP agent that can connect to the <a href="http://tiltyard.ggp.org/">Tiltyard hosting server</a> and I've finally got it working. It uses the framework from <a href="http://cadia.ru.is/wiki/public:cadiaplayer:main">cadiaplayer</a> for the web server part, my own custom message handler to govern the central message loop and the <a href="http://www.general-game-playing.de/downloads.html">KIF parsing framework from Dresden University</a> to parse and reason about the games.<br />
<br />
To test things, this agent simply picks the first valid move it comes across. It applies no strategy or move evaluation of any sort. Still, it is great to see it play valid moves across a wide range of games without knowing their rules ahead of time.<br />
<br />
My agent first tells the server it is ready to play; the server then sends it the rules of a game and informs it of the other players' moves, and my agent replies with valid moves of its own. When a game is over, the server starts a new one with a different set of rules.<br />
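The central message loop can be sketched roughly as follows. This is an illustrative Python sketch, not the agent's actual code: the simplified message tuples and the `legal_moves`/`apply_moves` helpers are assumptions (the real protocol exchanges KIF s-expressions over HTTP).

```python
# Illustrative sketch of the message loop described above. The message
# shapes and the legal_moves/apply_moves helpers are assumptions; the
# real GGP protocol wraps equivalent start/play/stop messages in HTTP.

def first_legal_move(state, role, legal_moves):
    """Like the current agent: just play the first valid move found."""
    return legal_moves(state, role)[0]

def handle_message(msg, game, legal_moves, apply_moves):
    kind = msg[0]
    if kind == "start":              # server sends the game's rules
        game["state"], game["role"] = msg[1], msg[2]
        return "ready"
    if kind == "play":               # server relays everyone's last moves
        if msg[1] is not None:       # None on the very first turn
            game["state"] = apply_moves(game["state"], msg[1])
        return first_legal_move(game["state"], game["role"], legal_moves)
    if kind == "stop":               # game over; wait for a new game
        return "done"
```

The same dispatcher shape works regardless of which game the server chooses, since all game-specific knowledge lives behind the two helper functions.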
<br />
In testing it, my agent played through the games of Peg Jumping, Knight's Tour, Tic Tac Toe, 3-player Chinese Checkers and Bomberman! Below is a screenshot of it playing Chinese Checkers (my agent played Red).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPsLM7Ykucb7TJRqI3LRdGw7iI65zpxWRsG97QpmOb-8DthmmgioyGQe50KuM2Wg2cwxSeMGFnpCQeK3k3aDasFl3rfh2G_mMQjV8qV0fKeyTXhlUnEQaOtp5a2_EMld-648DPV3RVRJ-u/s1600/Screen+Shot+2013-07-04+at+22.12.53.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPsLM7Ykucb7TJRqI3LRdGw7iI65zpxWRsG97QpmOb-8DthmmgioyGQe50KuM2Wg2cwxSeMGFnpCQeK3k3aDasFl3rfh2G_mMQjV8qV0fKeyTXhlUnEQaOtp5a2_EMld-648DPV3RVRJ-u/s320/Screen+Shot+2013-07-04+at+22.12.53.png" width="320" /></a></div>
<br />
<br />
The next step is to integrate <a href="http://charlieai.blogspot.co.uk/2013/05/monte-carlo-tree-seach.html">my Monte Carlo Tree Search algorithm</a> - I will report back :)<br />
<br />
<br />Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-15794584332497504242013-06-29T07:35:00.000+01:002013-06-29T07:37:52.449+01:00General Game Playing Platforms<div>
For several years there has been a community of people building game-playing AIs and competing with them. Recently this community has expanded considerably, mainly due to the projects at the <a href="http://logic.stanford.edu/">Stanford Logic Group</a>, so I wanted to give a summary of the field as it stands now.</div>
<div>
<br /></div>
<div>
<a href="http://en.wikipedia.org/wiki/General_game_playing">General Game Playing (GGP)</a> is the art of making AI that can play a diverse range of games well without being hand-programmed for any specific game. Rules are described <a href="http://en.wikipedia.org/wiki/Game_Description_Language">in a formal language called GDL</a> and agents parse the game descriptions, infer the valid moves at any given time and apply a strategy to win.<br />
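To give a flavour of what agents have to work with, here is an abridged GDL fragment in the style of the standard Tic Tac Toe description (simplified by me; the exact predicates vary between game repositories):

```
(role xplayer)
(role oplayer)
(init (cell 1 1 b))            ; ...one init fact per square...
(init (control xplayer))

;; a player may mark any blank cell while in control
(<= (legal ?player (mark ?x ?y))
    (true (cell ?x ?y b))
    (true (control ?player)))

;; the game ends, and xplayer scores 100, when x completes a line
(<= terminal (line x))
(<= (goal xplayer 100) (line x))
```

An agent never sees "Tic Tac Toe" as such - only rules like these, from which it must infer legal moves, state updates and goals.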
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9BxA90NjZLPubdW-csyoDROIG7I9Xv6Bb78mplNbvqFvfXJ54tQcaXdW2mMuCdk_ExhPuJQ6fFkhmUUbEOtTNPXeKsUPHrpoCQp7rWmxOL16auGQhBubup8UtYoh8-LYaoVg_6izkpj92/s1600/Screen+Shot+2013-06-25+at+21.14.06.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="276" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9BxA90NjZLPubdW-csyoDROIG7I9Xv6Bb78mplNbvqFvfXJ54tQcaXdW2mMuCdk_ExhPuJQ6fFkhmUUbEOtTNPXeKsUPHrpoCQp7rWmxOL16auGQhBubup8UtYoh8-LYaoVg_6izkpj92/s400/Screen+Shot+2013-06-25+at+21.14.06.png" width="400" /></a></div>
There are multiple Game Manager hosts that developers can register their agents with. The two main ones are <a href="http://130.208.241.192/ggpserver/">Dresden GGP Server</a> and <a href="http://tiltyard.ggp.org/">Tiltyard</a>, both of which provide <a href="http://www.ggp.org/view/tiltyard/games/">large databases of games</a> played by <a href="http://www.ggp.org/view/tiltyard/players/">registered agents</a>. Matches between agents are hosted automatically across the Internet, and participants are awarded scores and ratings. The time between an agent receiving the rules and having to start playing is typically less than a minute, so the AI must be truly general-purpose to play well across such a large variety of games.</div>
<div>
<br /></div>
<div>
Every year a world cup with prizes is held for the very best agents. Fluxplayer, Ary and Turtleplayer have been strong in the past, but <a href="http://cadia.ru.is/wiki/public:cadiaplayer:main">Cadiaplayer</a> from Reykjavik University has won most world cups and is also the current champion.</div>
<div>
<br /></div>
<div>
The <a href="http://www.inf.tu-dresden.de/index.php?node_id=2195">slides from a previous course at Dresden University</a> are a good introduction to getting started, and several existing frameworks and reference players help you get up to speed quickly. There is also <a href="https://www.coursera.org/course/ggp">an online course starting in September at Coursera</a> that will teach all the theory and practical knowledge needed to participate.<br />
<br />
My next step is to put together a framework that can parse GDL and communicate with the Game Manager via the established protocol, and then start plugging in my previous AI algorithms. Exciting times ahead! :)</div>
Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-70988351535666705302013-05-27T17:06:00.001+01:002013-06-20T09:31:42.769+01:00Connect Four Embedded, Too<div style="text-align: center;">
<div style="text-align: left;">
And here is the Connect Four player available for you to play against (on Mac OS X and Windows). Again, let me know if you manage to beat it.</div>
<div style="text-align: start;">
<div style="text-align: left;">
<br /></div>
</div>
</div>
<div style="text-align: center;">
<iframe height="320" src="http://unitygadgets.gayasoft.ch/WebContainer/UnityWebContainer_2_0.php?url=https://dl.dropboxusercontent.com/u/83880716/connect4.unity3d&width=300&height=300" width="320"></iframe></div>
Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com2tag:blogger.com,1999:blog-6178524460522355063.post-72074312654082544202013-05-27T15:24:00.003+01:002013-05-27T15:46:10.240+01:00Tic Tac Toe EmbeddedI managed to put together a simple interactive application for the Tic Tac Toe player, and embed it here. Unfortunately it only supports Mac OS X and Windows, so apologies if you are viewing this from some other operating system.<br />
<br />
Let me know if you manage to win! :)<br />
<div style="text-align: center;">
<br /></div>
<div style="text-align: center;">
<iframe height="320" src="http://unitygadgets.gayasoft.ch/WebContainer/UnityWebContainer_2_0.php?url=https://dl.dropboxusercontent.com/u/83880716/web.unity3d&width=300&height=300" width="320"></iframe></div>
Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com2tag:blogger.com,1999:blog-6178524460522355063.post-36506468361545416032013-05-26T07:37:00.000+01:002013-06-20T09:39:24.355+01:00Does it play Connect Four? Yes, it does.<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdlVICTkHy_-Qa7or2ulfgpuCmyHee_PaF2Pr9xuqitSgTPSXPvr4Om8iXVPDj3s3K80lBRVOZKQmHqbhDQqDjfdV4WphdL4Ub8uhcPn48romWtOCKFRAXENUNd9ZEVuEs94M7qTmuuHVw/s1600/connect4.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="222" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdlVICTkHy_-Qa7or2ulfgpuCmyHee_PaF2Pr9xuqitSgTPSXPvr4Om8iXVPDj3s3K80lBRVOZKQmHqbhDQqDjfdV4WphdL4Ub8uhcPn48romWtOCKFRAXENUNd9ZEVuEs94M7qTmuuHVw/s320/connect4.gif" width="320" /></a></div>
Based on how well my <a href="http://charlieai.blogspot.co.uk/2013/05/monte-carlo-tree-seach.html">Monte Carlo Tree Search agent played Tic Tac Toe</a>, I wanted to move on to Connect Four.<br />
<br />
Because the algorithm is domain independent, I merely had to add the logic to list valid moves and determine a win, loss or draw outcome. One half-hour commute later, I had it playing Connect Four!<br />
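That domain-specific logic really is small. Here is an illustrative Python sketch of the two pieces mentioned above - not the post's actual code, and the board encoding (a list of 7 columns, each a bottom-up list of pieces) is my assumption:

```python
# The only Connect Four-specific logic an MCTS agent needs, per the
# paragraph above: listing valid moves and detecting the outcome.
# Board encoding (7 columns, each a bottom-up list) is an assumption.

ROWS, COLS = 6, 7

def valid_moves(board):
    """A move is any column that is not yet full."""
    return [c for c in range(COLS) if len(board[c]) < ROWS]

def winner(board):
    """Return the player with four in a row, or None."""
    def piece(c, r):
        if 0 <= c < COLS and 0 <= r < len(board[c]):
            return board[c][r]
        return None
    for c in range(COLS):
        for r in range(len(board[c])):
            # check right, up, and the two diagonals from each piece
            for dc, dr in ((1, 0), (0, 1), (1, 1), (1, -1)):
                line = [piece(c + i * dc, r + i * dr) for i in range(4)]
                if line[0] is not None and line.count(line[0]) == 4:
                    return line[0]
    return None
```

A draw is then simply the case where `winner` returns None and `valid_moves` is empty.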
<br />
And it plays well - in fact, really well. Despite my trying my very best, with help from my wife, who beats most of our family and friends at this game, we cannot beat the AI when it plays first. Connect Four has been fully solved by search, and it is known that the first player can always force a win - which is exactly what my MCTS agent does. It will set up various potential win cases in the higher slots and then eventually force you into playing moves that allow it to win.<br />
<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJTdaVxqS361MD94eI5gc-O9U-mBb1_iAApT8R4w-sau-_wQavwsl-DLiSGAg9DpFwC8JHCT8rzr1gPIHgRMhOKy0srnHKYUCQI8kJDCFYk5JeCQKk2yYxZ_zAjA7YX9xC-QFBLz25DvaP/s1600/Screen+Shot+2013-05-26+at+07.31.33.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJTdaVxqS361MD94eI5gc-O9U-mBb1_iAApT8R4w-sau-_wQavwsl-DLiSGAg9DpFwC8JHCT8rzr1gPIHgRMhOKy0srnHKYUCQI8kJDCFYk5JeCQKk2yYxZ_zAjA7YX9xC-QFBLz25DvaP/s1600/Screen+Shot+2013-05-26+at+07.31.33.png" /></a>The other nice thing about the playing style you get from MCTS is that even if you play the same moves against it, due to its random sampling of the search tree, it will typically come up with slightly varied responses - ultimately leading to a win of course! :)<br />
<br />
What's next? I'm currently deciding on whether to move onto 9x9 Go, some other intermediate step, or perhaps spend a bit of time making simple web-based GUIs for the Tic Tac Toe and Connect 4 agents, as my current UI is a bit lacking...see insert :)<br />
<div>
<br /></div>
Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-24766001267790316022013-05-25T16:26:00.001+01:002013-05-25T16:26:46.504+01:00Monte Carlo Tree SearchRecently I have made great progress on my AI project, so today I will attempt to wake this sleepy blog from hibernation.<br />
<br />
While I had heard about the success of Monte Carlo Tree Search applied to Go and other games a few years ago, it was only about a month ago that I became aware that, in its basic form, the algorithm is domain independent. That makes it very suitable for my goal of making general game playing AI that doesn't need to be taught the rules of the games it plays.<br />
<br />
For the necessary background knowledge and a thorough introduction to the algorithm and its variations, <a href="http://www.cameronius.com/cv/mcts-survey-master.pdf">"A Survey of Monte Carlo Tree Search Methods"</a> is a great resource; it also includes pseudo-code for the basic algorithm.<br />
<br />
In essence, MCTS plays thousands of games to completion, taking random moves to get there. The resulting win, loss or draw of each game then informs the algorithm about which parts of the search tree to explore further. This combination of random moves and statistical analysis makes the algorithm applicable to a large range of domains, and it also means it scales well to harder problems - the more CPU time you give it, the better the result will be.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRzGn7WNcJJnQSyE0zZGubR_GN7VvEReIon5tMLQIgEAAVserE2nFRI9jR3h8ezBvFqEIwk3nJljogJyOMix0YSPXFMSduBQ90mC5Ej1Ep5wd9UxhHfZBsAAWpzxp68vAjHbPmVeWgo-A3/s1600/mcts.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRzGn7WNcJJnQSyE0zZGubR_GN7VvEReIon5tMLQIgEAAVserE2nFRI9jR3h8ezBvFqEIwk3nJljogJyOMix0YSPXFMSduBQ90mC5Ej1Ep5wd9UxhHfZBsAAWpzxp68vAjHbPmVeWgo-A3/s1600/mcts.png" /></a></div>
<br />
<br />
Because the algorithm plays the games to completion, it only needs to be informed about which moves are valid for each turn and whether the game resulted in a win, loss or draw at the end. No other domain-specific knowledge is required.<br />
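The whole algorithm fits in a page. Below is a compact illustrative sketch in Python - not my actual implementation - where the game is supplied as four functions (an interface I am assuming for the example): `moves(state)`, `play(state, move)`, `winner(state)` (the winning player, or None), and `turn(state)` (whose turn it is). Nothing else about the game is needed.

```python
import math
import random

# Compact, illustrative UCT-style MCTS. The four game functions form
# an assumed interface; only valid moves and the final outcome are
# ever consulted, as described above.

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.untried = None          # moves not yet expanded
        self.visits = 0
        self.reward = 0.0

def mcts(root_state, moves, play, winner, turn, iters=2000, c=1.4):
    root = Node(root_state)
    root.untried = list(moves(root_state))
    for _ in range(iters):
        node = root
        # 1. Selection: descend with UCB1 while fully expanded
        while not node.untried and node.children:
            node = max(node.children,
                       key=lambda n: n.reward / n.visits
                       + c * math.sqrt(math.log(node.visits) / n.visits))
        # 2. Expansion: try one new move from this node
        if node.untried:
            m = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(play(node.state, m), node, m)
            child.untried = list(moves(child.state))
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves to the end of the game
        state = node.state
        while winner(state) is None and moves(state):
            state = play(state, random.choice(moves(state)))
        w = winner(state)
        # 4. Backpropagation: credit each node from the viewpoint of
        #    the player whose move led to it (draws score 0.5)
        while node is not None:
            node.visits += 1
            if node.parent is not None:
                mover = turn(node.parent.state)
                node.reward += 0.5 if w is None else float(w == mover)
            node = node.parent
    # Most-visited child is the most robust final choice
    return max(root.children, key=lambda n: n.visits).move
```

Plugging in a game is just a matter of writing those four functions - which is why moving from Tic Tac Toe to Connect Four was a single commute's work.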
<br />
Is this the end of my genetic programming approach? Quite the opposite: I will be exploring ways to combine the two approaches in the near future.<br />
<br />
To try out MCTS, I implemented the basic algorithm and applied it to Tic Tac Toe. After a couple of hours of coding I was ready to play against it...and...it plays extremely well! If the agent starts, it appears impossible to win against it. It sets up clever double threats, so the best a human opponent can hope for is a draw. I was very impressed!<br />
<br />
It is time to move on to the slightly more complex game of Connect 4...Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-17472756597303097152011-10-08T19:24:00.000+01:002011-10-08T19:24:11.045+01:00Objective Score for Good Move Algorithm<div style="text-align: justify;">I wanted to get an idea of what a relatively strong Tic-tac-toe player would score with the objective measure. So instead of using an evolved player, I measured the Good Move algorithm. As mentioned before, it looks for two pieces in a row with an empty square at either end, and then plays there or blocks accordingly. </div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Here are the scores from ten tests and the average score computed:</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">967, 975, 844, 968, 1021, 917, 992, 1059, 1051, 986 => 978.0</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Now, this algorithm doesn't look ahead more than one move so it will not play perfectly, but it will be relatively strong. If I can somehow evolve something as strong as that, using co-evolution, it would certainly be a worthy opponent.</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-5989245962665140162011-10-07T22:38:00.002+01:002011-10-08T12:23:24.900+01:00Initial Objective ScoresI completed the <a href="http://charlieai.blogspot.com/2011/10/tic-tac-toe-yardstick.html">objective scoring function that I mentioned earlier</a> and wanted to document some initial results. I did 10 evolution runs against each opponent type, until a perfect fitness case was found, and calculated the average objective score:<br />
<br />
Random Move: 124, 180, 108, 132, 366, 160, 108, 72, 198, 108 => 155.6<br />
Good Move: 310, 344, 882, 138, 273, 1172, 150, 518, 128, 246 => 416.1<br />
<br />
The Random Move opponent simply picks a random valid move each turn.<br />
<br />
Good Move picks a winning move whenever a single move completes three in a row, and blocks the opponent whenever they could complete three in a row on their next move. Otherwise it picks a random valid move.<br />
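That rule fits in a few lines. Here is an illustrative Python sketch (not the experiment's code; the flat 9-cell board encoding is my assumption):

```python
import random

# Sketch of the Good Move opponent described above. Board is assumed
# to be a flat list of 9 cells containing "X", "O" or " ".

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),    # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),    # columns
         (0, 4, 8), (2, 4, 6)]               # diagonals

def completing_square(board, player):
    """Index of an empty square giving `player` three in a row, or None."""
    for line in LINES:
        vals = [board[i] for i in line]
        if vals.count(player) == 2 and vals.count(" ") == 1:
            return line[vals.index(" ")]
    return None

def good_move(board, me, opponent):
    win = completing_square(board, me)
    if win is not None:
        return win                       # 1. take the win
    block = completing_square(board, opponent)
    if block is not None:
        return block                     # 2. block the opponent
    empties = [i for i, v in enumerate(board) if v == " "]
    return random.choice(empties)        # 3. otherwise play randomly
```

As noted above, this policy never looks more than one move ahead, which is exactly why it is strong but not perfect.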
<br />
The setup was as follows:<br />
<br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><td><b>Function nodes</b></td><td><i>+, -, *, /, IfThenElse, =, &gt;, &lt;, And, Or, MakeMove</i></td></tr>
<tr><td><b>Terminal nodes</b></td><td><i>0, 1, 2, Ephemeral Constant (0-10), SQ_00...SQ_22 </i></td></tr>
<tr><td><b>Crossover probability</b></td><td><i>90%</i></td></tr>
<tr><td><b>Reproduction probability</b></td><td><i>9%</i></td></tr>
<tr><td><b>Mutation probability</b></td><td><i>1%</i></td></tr>
<tr><td><b>Mutation depth</b></td><td><i>2</i></td></tr>
<tr><td><b>Max tree size (number of nodes)</b></td><td><i>100</i><br />
</td></tr>
<tr><td><b>Max tree depth</b></td><td><i>30</i><br />
</td></tr>
</tbody></table><br />
Of course, this was more teaching the AI how to play than letting it evolve from scratch, so the true test would be with co-evolution. However, I needed to do some more groundwork to get that up and running in my new environment...Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-11083017273392536502011-10-02T21:18:00.000+01:002011-10-05T15:56:04.695+01:00A Tic-tac-toe Yardstick<div style="text-align: justify;">After getting a new laptop and installing Linux on it (after some experimentation I settled on <a href="http://www.ubuntu.com/">Ubuntu</a>), I now have a better system on which to continue experimenting. I have been spending some time converting my existing code to make use of the new <a href="http://en.wikipedia.org/wiki/C%2B%2B11">C++11</a> features and its updated standard library. The GP system now relies on the Boost library only for timing functions and is thus much more portable.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;"> </div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgC3VQA0eySwPMKkbBCPLzjx-XKiTgJichObEDIA6qRGsu5SXJL-_QjrAsKxCH7VNeugF-aySe7d3xk9X_ginXb-Z_HzSMr2APY7R6HkLgEW0BTzJKrJ9xn6Vly_agAA2Wp1lqSjMVNw658/s1600/Yardstick-500x375.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="150" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgC3VQA0eySwPMKkbBCPLzjx-XKiTgJichObEDIA6qRGsu5SXJL-_QjrAsKxCH7VNeugF-aySe7d3xk9X_ginXb-Z_HzSMr2APY7R6HkLgEW0BTzJKrJ9xn6Vly_agAA2Wp1lqSjMVNw658/s200/Yardstick-500x375.jpg" width="200" /></a></div><div style="text-align: justify;">So now the focus is on what I can do to evolve something that successfully plays Tic-tac-toe. Something I have been wanting to introduce for a long time is a yardstick - a way to objectively measure how good my Tic-tac-toe players are. You cannot really do good science without one - how else do you know when you have improved the system or by how much?</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Fortunately, for a simple game like Tic-tac-toe it should be possible to make the yardstick something that systematically plays all possible game permutations against the evolved players. An upper bound for the number of permutations can be computed by observing that the first move can be any of 9 possible squares, then 8, then 7 and so on, reaching a total of 9! = 362880 permutations. Since a winner is often determined before the game gets that far, the actual number of games to play through will be a lot lower. Thus, the yardstick should not be too CPU-intensive and can easily be applied at the end of each run to give an objective score: the number of wins against the yardstick player, which tries out each possible permutation.</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-82369320888257035792011-08-05T21:03:00.001+01:002011-10-02T20:36:20.059+01:00Fibonacci Sequence<div style="text-align: justify;">I wanted to test my GP framework with a more challenging problem. So a few weeks ago I decided to use my newly found <a href="http://charlieai.blogspot.com/2011/05/cloud-computing-to-rescue.html">computing resource</a> to try and evolve a program that can output the first 30 numbers of the <a href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci sequence</a>.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">My initial setup was similar to the one used in <a href="http://charlieai.blogspot.com/2011/05/loops-in-genetic-programming.html">the search for even numbers</a> - only the fitness function differed. This time, however, it did not result in a perfect solution. Elaborate approximations were evolved, but at no point did it evolve the simple iterative solution of keeping track of the last two numbers output and adding them up.</div><div style="text-align: justify;"><br />
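For reference, the target behaviour is tiny when written by hand. An illustrative Python version of the iterative solution the runs never found (assuming the sequence is taken to start 1, 1):

```python
# Hand-written equivalent of the program the GP runs never evolved:
# keep the last two numbers and add them up. Starting the sequence
# at 1, 1 is an assumption.

def fibonacci(n):
    seq, a, b = [], 1, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq
```

That such a short loop kept losing out to elaborate arithmetic approximations is what points to the local-maximum problem discussed below.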
</div><div style="text-align: justify;">I thought there might be some subtle bug in my system preventing that kind of program from being constructed, so I tried manually creating one using the function nodes available to the simulation. That worked fine, so evolving an equivalent program was evidently just very improbable. That led me to experiment with other pseudo-random number generators (I ended up using Boost's Mersenne twister implementation). Still no perfect solution.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">I had been using a fixed seed for my random number sequences so the final thing I tried was to change the system to use the system clock as seed in order to get variation in successive runs. I ran the final setup with large populations and hundreds of generations ten times over. Still no perfect solution. Still just very good approximations.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">I finally concluded that the simulation is very likely to generate a good approximate solution early on and thus gets stuck in a local maximum, where progressive improvements do not move the population in the direction of a perfect solution, and where the distance to the perfect solution is so vast that arriving at it is extremely improbable.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">The experiment underlined how important it is to provide an environment with gradual selection pressure if one wants to succeed with genetic programming. </div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-28631251038317955142011-05-29T20:24:00.000+01:002011-05-29T20:24:41.764+01:00Cloud computing to the rescue<div style="text-align: justify;">I've been playing with the thought of getting a fast, reliable computer for my experiments for a while, and a couple of weeks ago I found the perfect solution: <a href="http://aws.amazon.com/ec2/">Amazon Elastic Compute Cloud</a>. You basically launch computer instances on a need by need basis and within a few seconds they are up and running with whatever OS and software you have configured on them.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">It is relatively cost-effective compared to buying your own hardware and maintaining that. I only need computing power from time to time, but most often a lot of it, and this way I can launch as many cores as I need whenever I need! :)</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Their Linux instances are the most powerful, so I've spent some time porting my GP code, as a few components were not as portable as I would have liked (was using Windows-based threads and timers). It's now all <a href="http://www.boost.org/">Boost</a> based and today I got it all running so I can't wait to try some more experiments on fast CPUs.</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-56055416888979759522011-05-07T21:35:00.000+01:002011-05-07T21:35:34.376+01:00Loops in Genetic Programming<div style="text-align: justify;">Dealing with loops and other iterative constructs in GP is far from trivial. You have to deal with closure and you have to avoid infinite or lengthy fitness evaluations. It only takes a few nested loops of a million iterations each in your individuals to grind everything to (nearly) a halt.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">This is perhaps why loops have not seen much use in the GP field. <a href="http://goanna.cs.rmit.edu.au/%7Exiali/pub/cec04.exploops.pdf">Experiments with Explicit For-loops in Genetic Programming</a> by Vic Ciesielski and Xiang Li provides good coverage of related work.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">I ended up using a binary variant of protected for-loops, where the first subtree provides the number of iterations and the second subtree, the body of the loop, is then iterated. I did not restrict the number of loop nodes per program, but instead enforced an overall limit of 1000 iterations per program.</div><div style="text-align: justify;"><br />
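The protected-loop evaluation can be sketched as below. This is an illustrative Python sketch, not the C++ system: the tuple encoding of tree nodes and the reduced node set (including the output node) are my assumptions.

```python
# Sketch of protected loop evaluation with a shared iteration budget,
# as described above. Node encoding - e.g. ("loop", count_tree,
# body_tree) - and the small node set are assumptions.

MAX_ITERATIONS = 1000  # overall per-program budget, as in the post

def eval_node(node, env, budget):
    """budget is a one-element list so the count is shared globally."""
    if isinstance(node, int):
        return node
    op = node[0]
    if op == "loop":               # ("loop", count_subtree, body_subtree)
        count = eval_node(node[1], env, budget)
        result = 0
        for i in range(max(0, int(count))):
            if budget[0] <= 0:     # protection: budget exhausted, stop
                break
            budget[0] -= 1
            env["X"] = i           # X is set to the current iteration
            result = eval_node(node[2], env, budget)
        return result
    if op == "add":
        return eval_node(node[1], env, budget) + eval_node(node[2], env, budget)
    if op == "getx":
        return env.get("X", 0)
    if op == "output":             # append a number to the program's output
        env["out"].append(eval_node(node[1], env, budget))
        return env["out"][-1]
    raise ValueError(op)
```

With this encoding, the even-numbers solution is the small tree `("loop", 100, ("output", ("add", ("add", ("getx",), ("getx",)), 2)))`, which outputs 2X+2 for X = 0..99, and a runaway million-iteration loop is simply cut off at the budget.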
</div><div style="text-align: justify;">To give the programs something to work with, I also introduced a few variable nodes to read and write the variables X and Y. X was always set to the current iteration of a loop.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">This allowed me to evolve a perfect solution to the "100 First Even Numbers" sequence in just 8 generations amongst a population of 10,000! :) The run parameters as well as the tree of the perfect solution can be seen below:</div><div style="text-align: justify;"><br />
</div><br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><td><b>Function nodes</b></td><td><i>Add, Subtract, Multiply, Divide, SetX, SetY, OutputNumber, Loop</i></td></tr>
<tr><td><b>Terminal nodes</b></td><td><i>0, 1, 2, Ephemeral Constant (0-10), GetX, GetY </i></td></tr>
<tr><td><b>Crossover probability</b></td><td><i>90%</i></td></tr>
<tr><td><b>Reproduction probability</b></td><td><i>9%</i></td></tr>
<tr><td><b>Mutation probability</b></td><td><i>1%</i></td></tr>
<tr><td><b>Mutation depth</b></td><td><i>2</i></td></tr>
<tr><td><b>Max tree size (number of nodes)</b></td><td><i>30</i></td></tr>
<tr><td><b>Max tree depth</b></td><td><i>10</i></td></tr>
</tbody></table><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_nT3C71XNXNdZw475qlReM5SX44C_5tLgVF4WAJ6ZggA7fkoJuy_0zxREDRYSl3jSsihW7LZR491nZvezyr_kI8hrZmAdutJ39w5jLhtCKhIvyy1LzX0MkvT0oA6RMFoBsYMSA4_kKFGb/s1600/2timestable_8gen_10000.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_nT3C71XNXNdZw475qlReM5SX44C_5tLgVF4WAJ6ZggA7fkoJuy_0zxREDRYSl3jSsihW7LZR491nZvezyr_kI8hrZmAdutJ39w5jLhtCKhIvyy1LzX0MkvT0oA6RMFoBsYMSA4_kKFGb/s400/2timestable_8gen_10000.gif" width="400" /></a></div><div style="text-align: justify;"><br />
</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com2tag:blogger.com,1999:blog-6178524460522355063.post-9491896228461644242011-05-03T18:49:00.004+01:002011-05-04T13:00:50.494+01:00Sequences<div style="text-align: justify;">I wanted to do some experiments with simpler problems, so last night I experimented with evolving individuals that can generate a sequence of numbers. Fitness is then based on how many of these numbers match a predefined sequence. Conceptually, sequences of numbers are a bit like board games, which in a way just boil down to sequences of moves...<br />
<br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><td><b>Function nodes</b></td><td><i>Add, Subtract, Multiply, Divide, SetX</i></td></tr>
<tr><td><b>Terminal nodes</b></td><td><i>0, 1, 2 </i></td></tr>
<tr><td><b>Crossover probability</b></td><td><i>90%</i></td></tr>
<tr><td><b>Reproduction probability</b></td><td><i>9%</i></td></tr>
<tr><td><b>Mutation probability</b></td><td><i>1%</i></td></tr>
<tr><td><b>Mutation depth</b></td><td><i>2</i></td></tr>
<tr><td><b>Max tree size (number of nodes)</b></td><td><i>300</i></td></tr>
<tr><td><b>Max tree depth</b></td><td><i>100</i></td></tr>
</tbody></table><br />
I experimented with the sequence of even numbers. The system struggled to evolve large sequences correctly, but produced perfect solutions for up to the first 6 even numbers: 2, 4, 6, 8, 10, 12 after a few minutes of computation. </div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Note that the "index" was not provided to the population - the programs simply had to add numbers to a list, with a SetX function node, and then the list was examined in full afterwards. </div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">At first I limited the tree depth to 30, which turned out to be too stringent. Expanding the limit to 100 was sufficient for up to the first 6 numbers. </div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">After examining the trees containing perfect solutions, I'm starting to see why the trees need to be so deep. Personally, if I coded the program tree manually, I would not be able to create a much shallower tree with the function nodes provided; I would need a Loop node. So I will try to add that next.<br />
<br />
But that got me thinking about the Tic-tac-toe experiment. I had avoided adding a Loop node in order to not make the search space too large. In my mind, the solution I had expected to evolve was a giant if-then-else-if-else-if... block with all the possible positions to block against. But considering the number of possible "near-wins" for the opponent there are quite a few: 3 horizontal, 3 vertical and 2 diagonal lines, each of which can have two pieces in 3 different ways, so 24 moves to block against. The Sequence experiment struggled with anything higher than 6 numbers of a sequence, so it's not surprising that the Tic-tac-toe opponents couldn't cover most winning strategies.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">So a Loop function node may actually help quite a lot. Alternatively, a FunctionList node may do the trick...i.e. something with high arity that calls a number of sub trees in turn, similar to calling a function in procedural programming languages (making the tree wide instead of deep). Automatically-Defined-Functions would also be interesting to add, to improve reuse of code.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">But first things first, so I will try with Loops initially.</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-65958852947013652862011-05-02T16:38:00.001+01:002011-05-06T16:38:04.261+01:00Tic-tac-toe with Charlie<div style="text-align: justify;">I first attempted the Tic-tac-toe experiment, using Charlie, with the same setup as the symbolic regression experiment, along with a MakeMove function node and terminal nodes to represent the status of each square on the board:</div><br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><td><b>Function nodes</b></td><td><i>Add, Subtract, Multiply, Divide, MakeMove</i></td></tr>
<tr><td><b>Terminal nodes</b></td><td><i>0, 1, 2, EphemeralConstant(0,10), SQ_00, SQ_01, SQ_02, SQ_10, SQ_11, SQ_12, SQ_20, SQ_21, SQ_22</i></td></tr>
<tr><td><b>Crossover probability</b></td><td><i>90%</i></td></tr>
<tr><td><b>Reproduction probability</b></td><td><i>9%</i></td></tr>
<tr><td><b>Mutation probability</b></td><td><i>1%</i></td></tr>
<tr><td><b>Mutation depth</b></td><td><i>2</i></td></tr>
<tr><td><b>Max tree size (number of nodes)</b></td><td><i>unlimited</i></td></tr>
<tr><td><b>Max tree depth</b></td><td><i>unlimited</i></td></tr>
</tbody></table><br />
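To illustrate how the function and terminal nodes in the table might fit together, here is a hypothetical mini-evaluator (the SQ, Num, Add and MakeMove classes are my own sketch, not Charlie's implementation; rejecting invalid coordinates is left to the game environment):

```python
# Illustrative sketch: SQ terminals read the board; MakeMove interprets its
# two subtrees as a coordinate pair and proposes a move.
EMPTY, X, O = 0, 1, 2

class SQ:
    """Terminal: the status of square (row, col) on the current board."""
    def __init__(self, row, col): self.row, self.col = row, col
    def eval(self, board): return board[self.row][self.col]

class Num:
    """Constant terminal such as 0, 1 or 2."""
    def __init__(self, v): self.v = v
    def eval(self, board): return self.v

class Add:
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, board): return self.a.eval(board) + self.b.eval(board)

class MakeMove:
    """Function node whose value is a (row, col) move proposal."""
    def __init__(self, r, c): self.r, self.c = r, c
    def eval(self, board):
        return (int(self.r.eval(board)), int(self.c.eval(board)))

board = [[X, EMPTY, EMPTY], [EMPTY, O, EMPTY], [EMPTY, EMPTY, EMPTY]]
tree = MakeMove(Add(SQ(0, 0), Num(1)), SQ(1, 1))
print(tree.eval(board))  # (2, 2)
```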
<div style="text-align: justify;">That didn't produce very good results, which was to be expected, as there were no conditional function nodes. So I introduced If-Then-Else, Equals, Greater-Than, Less-Than, And and Or operators and immediately got more promising results.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">However, it was still very easy to beat the evolved opponents, so I went ahead and made various performance optimisations, including making the simulation multi-threaded. The results were still not very good.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">I then tried running the setup with huge populations, on a 64-bit OS with lots of RAM, for several weeks but still the best opponents were mediocre and very easy for humans to beat. They frequently made large blunders and you didn't have to be particularly careful to ensure a win against them.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">All in all, very similar results to the <a href="http://charlieai.blogspot.com/2011/03/hall-of-fame-archives.html">initial Beagle experiment</a>, so quite disappointing. That brings my journal up to the present. Yes, I'm stuck on this current issue of the simulation stagnating before strong players are evolved. I'm going to try and think of some new approach or tweak to the setup to improve things. Do let me know if you have any suggestions!</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-54745803071666178702011-04-11T18:15:00.001+01:002011-04-12T14:16:01.775+01:00Symbolic regression results<div style="text-align: justify;">An important role of this blog is to act as a journal of my experiments. From now on I will seek to document my experiments with enough detail to allow others to compare results, and so that I can compare results myself at a later date. So here we go...</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">I managed to dig out some notes from my early Charlie tests - at first I experimented using the following setup: </div><br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><td><b>Function nodes</b></td><td><i>Add, Subtract, Multiply and Divide</i></td></tr>
<tr><td><b>Terminal nodes</b></td><td><i>0, 1, 2 and x</i></td></tr>
<tr><td><b>Crossover probability</b></td><td><i>90%</i></td></tr>
<tr><td><b>Reproduction probability</b></td><td><i>9%</i></td></tr>
<tr><td><b>Mutation probability</b></td><td><i>1%</i></td></tr>
<tr><td><b>Mutation depth</b></td><td><i>2</i></td></tr>
<tr><td><b>Max tree size (number of nodes)</b></td><td><i>unlimited</i></td></tr>
<tr><td><b>Max tree depth</b></td><td><i>unlimited</i></td></tr>
</tbody></table><br />
<div style="text-align: justify;">Below is a table of experiments I made with the above configuration. The function that I was trying to evolve is on the left. Population size and maximum generations are also listed along with some comments on the results:</div><br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><th>Function</th><th>Population Size</th><th>Generations</th><th>Result</th></tr>
<tr><td>7.2x<sup>3</sup>+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution and struggling.</td></tr>
<tr><td>7x<sup>3</sup>+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>Perfect solution in 16 generations</td></tr>
<tr><td>7x+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>Perfect solution in 1 generation</td></tr>
<tr><td>7.2x+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution</td></tr>
<tr><td>7.2x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution. Best fitness score: ~442</td></tr>
<tr><td>7.2</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution</td></tr>
</tbody></table><br />
<div style="text-align: justify;">I was surprised that the system struggled to generate a sub tree equal or close to 7.2. It seemed to evolve integers just fine but struggled with fractions. I therefore tried introducing two further terminal nodes: the constants 4 and 8 in the hope of guiding the evolution to the right ballpark figures. I repeated the last experiment above, but still got no perfect or even approximate solution:</div><br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><th>Function</th><th>Population Size</th><th>Generations</th><th>Result</th></tr>
<tr><td>7.2</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution</td></tr>
</tbody></table><br />
<div style="text-align: justify;">I got desperate and decided to introduce ephemeral constants, which was on my to-do list anyway. I also removed constants 4 and 8 to reduce the search space a bit again. This resulted in the following figures:</div><br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><th>Function</th><th>Population Size</th><th>Generations</th><th>Result</th></tr>
<tr><td>7.2</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution, but very close (~0.0001)</td></tr>
<tr><td>7.2x<sup>3</sup>+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution but no longer struggling.</td></tr>
<tr><td>7x<sup>3</sup>+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution.</td></tr>
<tr><td>7x+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>Perfect solution in 1 generation</td></tr>
<tr><td>7.2x+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution</td></tr>
<tr><td>7.2x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution. Best fitness score: ~13</td></tr>
<tr><td>7.2x</td><td style="text-align: center;">100000</td><td style="text-align: center;">43</td><td>No perfect solution but very close</td></tr>
<tr><td>7.2x</td><td style="text-align: center;">10000</td><td style="text-align: center;">100</td><td>No perfect solution. Best fitness score: ~12</td></tr>
<tr><td>sqrt(x)</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution. Best score: ~34</td></tr>
<tr><td>7x+x<sup>2</sup>-3x+x<sup>3</sup></td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>Perfect solution in 6 generations.</td></tr>
</tbody></table><br />
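The ephemeral constants mentioned above can be sketched in a few lines (illustrative only; Charlie's actual implementation may differ): the node draws its value once, when it is created, and behaves as a fixed terminal thereafter.

```python
import random

class EphemeralConstant:
    """Hypothetical ERC terminal: sampled once at creation, constant after."""
    def __init__(self, low=0.0, high=10.0, rng=random):
        self.value = rng.uniform(low, high)   # drawn exactly once
    def eval(self, x):
        return self.value                     # same value on every evaluation

node = EphemeralConstant(0, 10)
assert 0 <= node.eval(1.0) <= 10
assert node.eval(1.0) == node.eval(99.0)      # never re-sampled
```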
<div style="text-align: justify;">The results were better but by no means amazing. I then came across a subtle bug that meant some arguments were not forwarded to sub trees, and after fixing that I got the following results:</div><br />
<table border="0" bordercolor="#ffcc00" cellpadding="3" cellspacing="3" style="background-color: #cccccc; width: 500px;"><tbody>
<tr><th>Function</th><th>Population Size</th><th>Generations</th><th>Result</th></tr>
<tr><td>7.2</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution, but very close (~0.00001)</td></tr>
<tr><td>7.2x<sup>3</sup>+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution but no longer struggling.</td></tr>
<tr><td>7x<sup>3</sup>+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>Perfect solution in 12 generations.</td></tr>
<tr><td>7x+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>Perfect solution in 1 generation</td></tr>
<tr><td>7.2x+x<sup>2</sup>-3x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution, but very close (~0.0001)</td></tr>
<tr><td>7.2x</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution, but very close: ~10<sup>-6</sup>.</td></tr>
<tr><td>7.2x</td><td style="text-align: center;">100000</td><td style="text-align: center;">31</td><td>No perfect solution, but very close: ~10<sup>-6</sup>.</td></tr>
<tr><td>sqrt(x)</td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>No perfect solution. Best score: ~0.036</td></tr>
<tr><td>7x+x<sup>2</sup>-3x+x<sup>3</sup></td><td style="text-align: center;">10000</td><td style="text-align: center;">50</td><td>Perfect solution in 3 generations.</td></tr>
</tbody></table><br />
<div style="text-align: justify;">That was more like it! :) It was time to try the Tic-tac-toe experiments with Charlie...</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-37354083963620508922011-03-08T22:21:00.000+00:002011-03-08T22:21:57.489+00:00Introducing Charlie<div style="text-align: justify;">A few years ago I decided that to fully understand the dynamics of genetic programming, I would have to implement my own framework. <a href="http://charlieai.blogspot.com/2011/02/open-beagle.html">Beagle</a> had proved incredibly useful, but I required full control of the entire software design, down to minute details, so I had to write my own.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Named after a dog we had when I was a kid, and after a certain pioneer in the field of evolution, I called my new framework Charlie (if you were wondering why this blog was called CharlieAI, you now have your answer!)</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">I will spare you the implementation details and instead list the feature set I ended up with:</div><div style="text-align: justify;"><br />
</div><ul><li>Tree-based genomes</li>
<li>Cross-over, mutation and re-selection operators</li>
<li>Node and sub-tree mutation</li>
<li>Tournament-based selection</li>
<li>Ephemeral random constants</li>
<li>Population statistics</li>
<li>Lisp-style text output and .gif graph output (via Graphviz)</li>
<li>Multi-threaded evaluation</li>
<li>Single and co-evolution</li>
<li>Hall-of-fame archives</li>
</ul><div style="text-align: justify;"></div><div style="text-align: justify;"><br />
This was all built over six months or so, but I only had a few hours to devote each week, so the whole thing could easily have been done in something like two weeks of full-time work. I must admit, though, that it was nice to have ample thinking time between implementation steps, to ensure it all fitted well together and performed efficiently.<br />
<br />
Next I will cover the results I got from running symbolic regression experiments with this framework. In the meantime, if you would like any more details on any of the above features or any aspects of the software, do let me know and I will attempt to cover it in a future post.</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-64975816781592083982011-03-03T21:46:00.001+00:002011-03-04T07:50:26.669+00:00Hall-of-fame archives<div style="text-align: justify;">To solve the problem with <a href="http://charlieai.blogspot.com/2011/03/strategy-loops.html">strategy loops</a>, I implemented a hall-of-fame archive - a collection of the best individual from each generation that led to a new "champion". Generations that did not beat the opposing population were ignored.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Fitness was then measured as an individual's performance against the opposing population's entire archive. Only if all members could be beaten would a new champion emerge and be added to its population's archive. Thus, loops were made impossible and only definite progress was allowed.</div><div style="text-align: justify;"><br />
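The promotion rule can be sketched as follows, with a toy numeric "beats" standing in for playing an actual game (the names archive_fitness and try_promote are my own, hypothetical):

```python
# Toy sketch of archive-based fitness: a candidate only becomes the new
# champion if it beats every member of the opposing population's archive.
def beats(candidate, opponent):
    return candidate > opponent   # placeholder for playing a full game

def archive_fitness(candidate, opposing_archive):
    """Fraction of the opposing archive the candidate can beat."""
    wins = sum(beats(candidate, member) for member in opposing_archive)
    return wins / len(opposing_archive)

def try_promote(candidate, own_archive, opposing_archive):
    """Add a new champion only on a clean sweep - definite progress only."""
    if archive_fitness(candidate, opposing_archive) == 1.0:
        own_archive.append(candidate)
        return True
    return False

archive_a, archive_b = [1, 3, 5], []
assert not try_promote(4, archive_b, archive_a)  # loses to 5: no promotion
assert try_promote(6, archive_b, archive_a)      # beats all of A's archive
assert archive_b == [6]
```

Note how each evaluation touches every archive member, which is exactly the growing CPU cost mentioned below.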
</div><div style="text-align: justify;">One issue with this approach was the increasing CPU cost of evaluating individuals as the archives expanded, but this is the approach I still use today. It would be interesting to try using only, say, the ten most recent, and therefore strongest, members of the archive, as that would still make progress likely, although not guaranteed, at a much reduced CPU cost.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Using hall-of-fame archives resulted in much stronger Tic-tac-toe players than previously achieved. At first a new champion would be evolved every few generations, but once the archive reached a few dozen members, the time between new champions increased, seemingly exponentially, and progress stagnated - even if I left the simulation running for weeks.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">While the evolved programs were much stronger than any I had previously evolved, they were still not as strong as I had expected. If I played certain patterns they would block well and eventually win, but they were playing far from perfectly and were still pretty easy to beat.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">I knew the search space was huge, but I had thought that for a game as simple as Tic-tac-toe it would not be unrealistic to evolve a perfect player, so that the game would always end in a draw or a win for the AI.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">I thought there must be something I could do to improve things, but didn't know exactly what; it was back to the drawing board...</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-21875830724049784822011-03-02T19:12:00.001+00:002011-03-02T19:19:51.466+00:00Coevolutionary PrinciplesBefore I continue my project log, I wanted to point you at an excellent chapter that I came across today:<br />
<br />
<a href="http://people.cs.uu.nl/dejong/publications/nchb-coev.pdf">Coevolutionary Principles (PDF)</a><br />
<br />
The chapter is an extensive overview of co-evolution and a summary of progress made in the last two decades.<br />
<br />
It is by Popovici, E., Bucci, A., Wiegand, P. and De Jong E.D. and from <a href="http://www.springer.com/computer/theoretical+computer+science/book/978-3-540-92911-6">Handbook of Natural Computing, Springer-Verlag</a>, which will be out later this month.Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-22143471546364016352011-03-01T18:19:00.006+00:002011-03-01T18:19:00.207+00:00Strategy loops<div style="text-align: justify;">After close examination of the tactics employed by the best individuals from successive generations, I noticed that strategies that were the best in early generations got superseded, as you would expect, but then reappeared several generations later!</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">While this was at first somewhat puzzling, there turned out to be a logical explanation: if program x is always beaten by program y, and y by program z, that does not mean that x cannot beat z. In practice it was more like this:</div><div style="text-align: justify;"><br />
</div><ol style="text-align: justify;"><li>program a from population A is randomly selected</li>
<li>program b from population B beats a</li>
<li>c from A beats b</li>
<li>d from B beats c</li>
<li>e from A beats d</li>
<li>f from B beats e</li>
<li>g from A beats f</li>
<li>...</li>
<li>m from A beats l</li>
<li>n, equivalent to f, from B beats m</li>
<li>repeats from step 7.</li>
</ol><div style="text-align: justify;"><br />
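The underlying issue is that "beats" need not be transitive. Rock-paper-scissors is the minimal illustration of such a cycle:

```python
# Each strategy beats exactly one other and loses to the third, so repeatedly
# selecting the best response to the current champion cycles forever - no
# absolute progress, just like the loop in the numbered steps above.
beats = {
    "rock": "scissors",
    "scissors": "paper",
    "paper": "rock",
}

def best_response(strategy):
    """Return the strategy that beats the given one."""
    return next(s for s, loser in beats.items() if loser == strategy)

champion = "rock"
history = [champion]
for _ in range(6):
    champion = best_response(champion)
    history.append(champion)

print(history)
# ['rock', 'paper', 'scissors', 'rock', 'paper', 'scissors', 'rock']
```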
</div><div style="text-align: justify;">So this was why the simulation never produced strong players.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Next, I will describe how I solved this problem, but I'd be interested to hear from you if you have any suggestions.</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-83855966581189040052011-02-27T19:14:00.000+00:002011-02-28T22:30:29.810+00:00Co-evolution<div style="text-align: justify;">The main reason for introducing co-evolution was to provide gradual selection pressure. Instead of just evolving one population against a fixed fitness measure I instead setup two populations, A and B, each initialised with random individuals. I then picked a random individual from A and evolved B until it produced a player that could beat the individual from A. This individual, now the strongest from B, was used to evolve A. Once some player from A could beat it, I swapped again and so forth:</div><div style="text-align: justify;"></div><ol style="text-align: justify;"><li>Select random individual from A, Best of A, to get things started.</li>
<li>Evolve B until it can beat Best of A and thus select Best of B.</li>
<li>Evolve A until it can beat Best of B and thus select Best of A.</li>
<li>Repeat from step 2.</li>
</ol><div style="text-align: justify;">This meant that small steps were continually taken to improve the AI and, in theory, I should have ended up with much stronger players than when evolving against Mr Random, as per my <a href="http://charlieai.blogspot.com/2011/02/rule-inference.html">previous post on rule inference</a>. That was not the case, however - when playing against the best individuals it was evident that they were pretty poor, not much stronger than the ones evolved against Mr Random.</div><div style="text-align: justify;"><br />
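The alternating scheme in the numbered steps can be sketched with a toy domain, where individuals are plain numbers, "winning" is simply being larger, and "evolving" is random mutation (all stand-ins for real games and real genetic operators):

```python
import random

def evolve_until_winner(population, rival, rng):
    """Mutate the population until some member beats the rival champion."""
    while True:
        for i, individual in enumerate(population):
            if individual > rival:                     # stand-in for a win
                return individual
            population[i] = individual + rng.random()  # stand-in for mutation

rng = random.Random(42)
pop_a = [rng.random() for _ in range(5)]
pop_b = [rng.random() for _ in range(5)]

best_a = rng.choice(pop_a)   # step 1: random starting champion from A
for _ in range(10):
    best_b = evolve_until_winner(pop_b, best_a, rng)   # step 2
    best_a = evolve_until_winner(pop_a, best_b, rng)   # step 3
assert best_a > best_b   # each swap produced a strictly stronger champion
```

In this toy domain progress is guaranteed because "beats" is transitive; with real game strategies it is not, which is exactly what the next post uncovers.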
</div><div style="text-align: justify;">I tried much larger populations and leaving it running for several days on more powerful PCs but to no avail, so I decided to analyse the evolution process and the individual generations in more detail...</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-16429995807709141872011-02-26T17:38:00.001+00:002011-02-28T22:30:29.810+00:00Rule inference<div style="text-align: justify;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnjnSy4r5Un9vCf24D3MaM7wxkiw7rpPC_q2nE5iQNdKIeAKjAEnKaPHPLo9x4WyzDGEaQdf-UsODLoLi5masNF5vjNiN2hoJ4nSOaliFDEmCbEfgo3QZinnsPbki6fEBeT9NXs08WIShP/s1600/tictactoe.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnjnSy4r5Un9vCf24D3MaM7wxkiw7rpPC_q2nE5iQNdKIeAKjAEnKaPHPLo9x4WyzDGEaQdf-UsODLoLi5masNF5vjNiN2hoJ4nSOaliFDEmCbEfgo3QZinnsPbki6fEBeT9NXs08WIShP/s200/tictactoe.jpg" width="200" /></a>At that point I was ready to start on the board game AI, and I decided to experiment with a Tic-tac-toe player first; once I had evolved a strong player of this game, I could then move onto more complex games. And since I was going to include only minimal game-specific code, the system would be easy to apply to other games. Individuals playing one game could even be used in evolution of players for other games.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">But how exactly do you do that? You could program a very simple individual by hand that just makes very simple, but valid, moves and then use that as the starting point for the evolution run. I decided that was not general enough and had the potential to send the search in the wrong direction, especially for more complex games like Go.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Instead I decided to just expose a function to get the status (blank, black or white) of an individual board square at position (x, y) and another to make a move at a particular square at position (x', y'). It should then be possible to evolve programs that could examine the state across the board, apply some logic, and then make a move accordingly. In the first generation of a run, all programs were initialised completely randomly.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">If a program tried to read the status outside the boundaries of the board it would just return blank, and if it tried to make a move outside the boundaries or where another piece was already placed, it would simply have no effect. The analogy is a natural environment where e.g. fish that mutate and evolve to swim into rocks simply get stopped in their tracks (and ultimately starve and die out). So all I was providing was a set of "laws of nature" for a game environment, in which individuals could be evolved with any conceivable strategy.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">As a fitness measure, individuals would at first play against a dumb training agent I wrote, Mr Random, which simply makes random valid moves. They would each play 3 times as white and 3 times as black, and the proportion of games won was a direct measure of an individual's fitness.</div><div style="text-align: justify;"><br />
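These "laws of nature" plus the Mr Random fitness measure can be sketched as follows (a toy reconstruction, not Charlie's code; the centre_agent at the end is just a hypothetical example individual):

```python
import random

BLANK = 0  # players are 1 and 2
LINES = ([[(r, c) for c in range(3)] for r in range(3)]
         + [[(r, c) for r in range(3)] for c in range(3)]
         + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def status(board, x, y):
    """Reads outside the board just return blank."""
    return board[y][x] if 0 <= x < 3 and 0 <= y < 3 else BLANK

def make_move(board, x, y, player):
    """Invalid moves (out of bounds or occupied) simply have no effect."""
    if 0 <= x < 3 and 0 <= y < 3 and board[y][x] == BLANK:
        board[y][x] = player

def winner(board):
    for line in LINES:
        values = {board[r][c] for r, c in line}
        if len(values) == 1 and values != {BLANK}:
            return values.pop()
    return None

def mr_random(board, player, rng):
    """The dumb trainer: a random valid move."""
    empties = [(x, y) for y in range(3) for x in range(3) if board[y][x] == BLANK]
    if empties:
        make_move(board, *rng.choice(empties), player)

def play(agent, agent_player, rng):
    board = [[BLANK] * 3 for _ in range(3)]
    for turn in range(9):
        player = 1 + turn % 2
        if player == agent_player:
            agent(board, player, rng)   # a wasted invalid move loses the turn
        else:
            mr_random(board, player, rng)
        if winner(board):
            return winner(board)
    return None

def fitness(agent, rng):
    """Proportion of wins over 3 games per colour against Mr Random."""
    games = [play(agent, p, rng) == p for p in (1, 2) for _ in range(3)]
    return sum(games) / len(games)

def centre_agent(board, player, rng):
    """Hypothetical individual: grab the centre, else play randomly."""
    if board[1][1] == BLANK:
        make_move(board, 1, 1, player)
    else:
        mr_random(board, player, rng)

print(fitness(centre_agent, random.Random(0)))
```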
</div><div style="text-align: justify;">Early generations would mostly make invalid moves, so they didn't do very well, even against random play. After numerous generations, however, I observed the fittest individuals at play and, as I had hoped, they had learned to play by the rules! This was a simple side effect of only rewarding individuals that happened, by chance, to play valid moves. </div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">When I tried playing against the fittest individuals it did of course turn out to be very easy to beat them, as the hardest opponent they had ever met was Mr Random. Thus, the next step was to implement co-evolution...</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-38760734673198860712011-02-25T07:44:00.003+00:002011-04-04T21:40:25.814+01:00Open BEAGLE<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf2WzcGUyzxMkbtQhWIOTNDQ9ClUrPP7ytMJuA0qfk85Lp2w3y7_H1DDNBjQLymAKTJsSJrWXCm0Ofk4qVTW7Op5VPy0fQKjDWp-m8k-qNxSpHnAs2TU_TWRam_xRl68mLjehLS9cf0tH2/s1600/bealge_logo.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf2WzcGUyzxMkbtQhWIOTNDQ9ClUrPP7ytMJuA0qfk85Lp2w3y7_H1DDNBjQLymAKTJsSJrWXCm0Ofk4qVTW7Op5VPy0fQKjDWp-m8k-qNxSpHnAs2TU_TWRam_xRl68mLjehLS9cf0tH2/s1600/bealge_logo.jpg" /></a></div><div style="text-align: justify;">When I first started on my project, a few years ago, I decided to save time by making use of an existing tree-based GP system. I did some research into the various alternatives and by far the best framework I came across was Christian Gagné's <a href="http://beagle.sourceforge.net/">Open BEAGLE</a>. It is well-designed, flexible, easily extendable and very efficient.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Open BEAGLE also has a <a href="http://tech.groups.yahoo.com/group/openbeagle/">Yahoo mailing list</a> offering help and support along the way.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">As an initial test of the system, and my understanding of it, I played around with symbolic and numerical regression, where the input and output to a mathematical function is known but the function itself must be evolved.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">All you need to do is loop over a series of input data (e.g. the integers 1 to 100), run some function (e.g. x<sup>2</sup>+x+1) on the input, store the output for fitness evaluation, and then provide only the input data to your population along with some basic operators (e.g. addition, subtraction, multiplication and division). Fitness is determined as the negated magnitude of the error when comparing an individual's output with the known output - the closer the error is to zero, the fitter the individual. Thus, if the error is exactly zero, you have a perfect fitness case. </div><div style="text-align: justify;"><br />
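As a concrete sketch of this fitness measure, using x<sup>2</sup>+x+1 as the target over the integers 1 to 100 (a generic illustration, not Open BEAGLE's API):

```python
# Fitness = negated accumulated error against the known outputs, so a
# perfect individual scores exactly zero and closer is always fitter.
def target(x):
    return x * x + x + 1

def fitness(candidate, inputs=range(1, 101)):
    return -sum(abs(candidate(x) - target(x)) for x in inputs)

assert fitness(lambda x: x * x + x + 1) == 0       # perfect fitness case
assert fitness(lambda x: x * x + x) == -100        # off by 1 on each input
assert fitness(lambda x: x * x) < fitness(lambda x: x * x + x)  # closer wins
```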
</div><div style="text-align: justify;">If everything works you will end up with a program that matches the function exactly, after a few generations of evolution, e.g. for x<sup>2</sup>+x+1:</div><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivJTpuggCcruNNu3zKaGe5ODD1FM_0qKnBRfTMrqHmvnmmPc7M2neaIy0wwoFeuMq6moY6NYNLyIsSDzJnWLF3rYZhvzRTlLc5_WMvontHvS9ItXS9CAllhwjSyx9auzwIZf8pyOyOWhOb/s1600/individual.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivJTpuggCcruNNu3zKaGe5ODD1FM_0qKnBRfTMrqHmvnmmPc7M2neaIy0wwoFeuMq6moY6NYNLyIsSDzJnWLF3rYZhvzRTlLc5_WMvontHvS9ItXS9CAllhwjSyx9auzwIZf8pyOyOWhOb/s320/individual.gif" width="209" /></a></div><br />
<div style="text-align: justify;">Next, I tested whether this system could evolve more elaborate functions like 4x<sup>3</sup>+2x-27. I played around with how many generations were required and how large a population was necessary to get exact solutions. I also tried using a square root function without exposing a square root operator to the GP system, and was amazed when the GP system found very good approximate solutions within a few dozen generations of a few thousand individuals!</div>Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-68983666553092128402011-02-24T07:40:00.000+00:002011-02-28T22:30:29.811+00:00Introduction to genetic programmingAs a taster of genetic programming, there are some excellent slides from a course at University of Edinburgh, which cover a lot of theory behind genetic algorithms and, more specifically, genetic programming:<br />
<a href="http://www.inf.ed.ac.uk/teaching/courses/gagp/index0910.html">"Genetic Algorithms and Genetic Programming 2009-10"</a>. <br />
<br />
My favourite book on the subject is <a href="http://www.amazon.co.uk/gp/product/155860510X">"Genetic Programming: An Introduction" by Banzhaf et al.</a> It is a well-written and concise introduction that will give you a sound grounding in all the theory necessary to evolve your own GP populations.<br />
<br />
Once you are ready to either tweak existing GP systems or write your own from scratch, <a href="http://www.amazon.co.uk/Field-Guide-Genetic-Programming/dp/1409200736/ref=pd_sim_b_1">"A Field Guide to Genetic Programming" by Poli et al</a> is a priceless resource. It is also <a href="http://www.lulu.com/product/file-download/a-field-guide-to-genetic-programming/2502914">available for free as pdf</a>.<br />
<br />
After reading through these slides and books (as well as some that turned out to be less helpful), I decided on a tree-based GP approach, where the programs evolved are Lisp-like structures. Below is a simple example from a Tic-tac-toe player. The program's outputs, in this case, are the MAKEMOVE nodes, and each node computes its value based on any subtrees it may have, including the current board state read from SQ_X_Y nodes.<br />
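A Lisp-like tree can be represented and evaluated recursively in just a few lines (an illustrative sketch; terminals are looked up in a state dictionary standing in for the board):

```python
import operator

def evaluate(node, state):
    """Recursively evaluate a Lisp-style tree of nested tuples."""
    if isinstance(node, tuple):              # function node: (op, child, ...)
        op, *children = node
        return op(*(evaluate(c, state) for c in children))
    return state.get(node, node)             # terminal: board symbol or constant

state = {"SQ_1_1": 2, "SQ_0_0": 1}
tree = (operator.add, (operator.mul, "SQ_1_1", 3), "SQ_0_0")
print(evaluate(tree, state))  # (+ (* SQ_1_1 3) SQ_0_0) = 2*3 + 1 = 7
```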
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEireToNo5WKiojRxjjDMYfSwSwH9LI9FgsnEz0GRbdOF6pkpBiLTqaFXIuCAELp6WBOw3qLAgh9ei4iEBdBsaKIh992434giU08-9bYEfN0OB77IYKVZqi35VFaIVFdPCZ5nb6kVacZ4bk6/s1600/individual.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEireToNo5WKiojRxjjDMYfSwSwH9LI9FgsnEz0GRbdOF6pkpBiLTqaFXIuCAELp6WBOw3qLAgh9ei4iEBdBsaKIh992434giU08-9bYEfN0OB77IYKVZqi35VFaIVFdPCZ5nb6kVacZ4bk6/s400/individual.gif" width="337" /></a></div><br />
But first some more info on how I got to the stage where I could generate programs like the one above...Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-88956096597622316192011-02-23T18:39:00.000+00:002011-02-28T22:32:02.561+00:00The Blind Watchmaker<a href="http://www.amazon.co.uk/gp/product/0141026162" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvZvbrA1RVuedS161dC2BMGufwKmmR8a0ZUivQcrAuWxkJjlwrw158CFa8REs4tbkx8mBuP-6cflHFYYOhXqIvRmogoXVjS5o5Y8df9Vx1M30X-3RWNd6hB3ywq8WWnc1FMYhFy7QvTYrR/s1600/watchmaker.jpeg" /></a>Before I move on to more technical resources, if you want to know exactly why evolution works from a biological point of view, <a href="http://www.amazon.co.uk/gp/product/0141026162">Richard Dawkins' "The Blind Watchmaker"</a> is an excellent introduction.<br />
<br />
Genetic programming is a simulation of biological evolution, and of course a simplified one, but it doesn't hurt to know how nature does it.<br />
<br />
One of the pieces from this book that proved most useful, later on in my project, is how gradual selection pressure is achieved via co-evolution of species and sub-species. More on that later on...Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com0tag:blogger.com,1999:blog-6178524460522355063.post-85697328972962939572011-02-22T18:55:00.001+00:002011-02-28T22:30:29.811+00:00The beginning...It all started with the notion that to play board games like Tic-tac-toe, Connect Four, Checkers, Chess, Shogi and Go, the strategy to play each game may not be obvious, yet there are patterns that repeat across all these games.<br />
<br />
In the case of Tic-tac-toe a simple min-max search may do well. Similarly, in Chess, if we apply min-max along with a host of extra heuristics to guide the search for the best move, we can end up with very strong play. However, for games like Go we know that the search space is too vast for a pure min-max approach to be practical.<br />
<br />
Yet, when humans play these games there are often more abstract lessons learned in one game that can be applied in others. With application-specific approaches to AI these lessons are not easily transferable from one game to another.<br />
<br />
I've always been a big fan of genetic algorithms and amazed at the novel solutions found in nature through Darwinian evolution. One variant in particular, genetic programming, where actual computer programs are evolved, is of special interest to me.<br />
<br />
As opposed to application-specific search algorithms like min-max or machine learning algorithms like neural networks for pattern recognition, genetic programming is particularly applicable when relatively little is known about the problem domain, or when very little is known about what shape a good solution would take. For example, a neural network may give us a reasonably good solution, but without further degrees of freedom we will never know whether, say, a stochastic approach combined with a neural network would give us an even better one.<br />
<br />
Genetic programming has the potential to evolve any existing AI approach, or combination of approaches, and even to evolve brand new algorithms. In a sense, genetic programming is a superset of all learning algorithms.<br />
<br />
So I decided to start a hobby project where I used a genetic programming approach to evolve computer programs that can play board games without domain-specific logic or application-specific AI.<br />
<br />
In a future post I will go into what exactly is being evolved and how the system teaches itself the rules. But for anyone just starting out with genetic programming, I will first cover some of the resources that helped me out enormously when learning about all this.Stig Petersenhttp://www.blogger.com/profile/02245544831641536759noreply@blogger.com2