Saturday, October 8, 2011

Objective Score for Good Move Algorithm

I wanted to get an idea of what a relatively strong Tic-tac-toe player would score with the objective measure. So instead of using an evolved player, I measured the Good Move algorithm. As mentioned before, it looks for two pieces in a row with an empty square at either end and then plays there, either to win or to block.

Here are the scores from ten tests and the average score computed:

967, 975, 844, 968, 1021, 917, 992, 1059, 1051, 986 => 978.0

Now, this algorithm doesn't look ahead more than one move so it will not play perfectly, but it will be relatively strong. If I can somehow evolve something as strong as that, using co-evolution, it would certainly be a worthy opponent.

Friday, October 7, 2011

Initial Objective Scores

I completed the objective scoring function that I mentioned earlier and wanted to document some initial results. I did 10 evolution runs against each opponent type, each run continuing until a perfect fitness case was found, and then calculated the average objective score:

Random Move: 124, 180, 108, 132, 366, 160, 108, 72, 198, 108 => 155.6
Good Move: 310, 344, 882, 138, 273, 1172, 150, 518, 128, 246 => 416.1

The Random Move opponent simply picks a random valid move each turn.

Good Move plays a winning move if a single move will complete 3 in a row, and blocks the opponent if they could complete 3 in a row with their next piece. Otherwise it picks a random valid move.
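For concreteness, here is a minimal sketch of that logic in C++. This is purely illustrative - the board representation, function names and random tie-breaking are my own assumptions, not the actual Good Move implementation:

    // Illustrative sketch of the Good Move heuristic (not the actual implementation).
    // board[r][c]: 0 = empty, 1 = our piece, 2 = opponent piece.
    #include <array>
    #include <cstdlib>
    #include <utility>
    #include <vector>

    using Board = std::array<std::array<int, 3>, 3>;

    // The 8 winning lines: 3 rows, 3 columns, 2 diagonals.
    static const int LINES[8][3][2] = {
        {{0,0},{0,1},{0,2}}, {{1,0},{1,1},{1,2}}, {{2,0},{2,1},{2,2}},
        {{0,0},{1,0},{2,0}}, {{0,1},{1,1},{2,1}}, {{0,2},{1,2},{2,2}},
        {{0,0},{1,1},{2,2}}, {{0,2},{1,1},{2,0}}};

    // If 'player' has two pieces in a line whose remaining square is empty,
    // return that square; otherwise return {-1, -1}.
    std::pair<int, int> completingSquare(const Board& b, int player) {
        for (const auto& line : LINES) {
            int count = 0, er = -1, ec = -1;
            for (const auto& sq : line) {
                if (b[sq[0]][sq[1]] == player) ++count;
                else if (b[sq[0]][sq[1]] == 0) { er = sq[0]; ec = sq[1]; }
            }
            if (count == 2 && er != -1) return {er, ec};
        }
        return {-1, -1};
    }

    // Assumes at least one empty square remains when called.
    std::pair<int, int> goodMove(const Board& b) {
        auto win = completingSquare(b, 1);      // play the winning move if there is one
        if (win.first != -1) return win;
        auto block = completingSquare(b, 2);    // otherwise block the opponent
        if (block.first != -1) return block;
        std::vector<std::pair<int, int>> empties;   // otherwise pick a random valid move
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 3; ++c)
                if (b[r][c] == 0) empties.push_back({r, c});
        return empties[std::rand() % empties.size()];
    }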

The setup was as follows:

Function nodes: +, -, *, /, IfThenElse, =, >, <, And, Or, MakeMove
Terminal nodes: 0, 1, 2, Ephemeral Constant (0-10), SQ_00...SQ_22
Crossover probability: 90%
Reproduction probability: 9%
Mutation probability: 1%
Mutation depth: 2
Max tree size (number of nodes): 100
Max tree depth: 30

Of course, this was more a case of teaching the AI how to play than letting it evolve from scratch, so the true test will be co-evolution. However, I needed to do some more groundwork to get that up and running in my new environment...

Sunday, October 2, 2011

A Tic-tac-toe Yardstick

After getting a new laptop and installing Linux on it (after some experimentation I settled on Ubuntu), I now have a better system on which to continue experimenting. I have been spending some time converting my existing code to make use of the new C++11 features and standard library. The GP system now relies on the Boost library only for timing functions and is thus much more portable.

So now the focus is on what I can do to evolve something that successfully plays Tic-tac-toe. Something I have been wanting to introduce for a long time is a yardstick - a way to objectively measure how good my Tic-tac-toe players are. You cannot really do good science without one - how else do you know when you have improved the system or by how much?

Fortunately, for a simple game like Tic-tac-toe it should be possible to make the yardstick something that systematically plays all possible game permutations against the evolved players. An upper bound on the number of permutations can be computed by observing that the first move can be any of 9 possible moves, then 8, then 7 and so on, giving a total of 9! = 362880 permutations. Since a winner is often determined before the game gets that far, the actual number of games to play through will be a lot lower. Thus, the yardstick should not be too CPU-intensive and can easily be applied at the end of each run to give it an objective score: the number of wins against the yardstick player, which tries out every possible permutation.
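To make the idea concrete, a recursive enumeration along these lines could look roughly as follows. This is an illustrative sketch, not the actual yardstick code; the board encoding and the chooseMove callable (representing the evolved player, assumed to return a legal empty square) are my own assumptions:

    #include <array>

    // Board: 9 squares, 0 = empty, 1 = evolved player, 2 = yardstick player.
    using Board = std::array<int, 9>;

    // Returns 1 or 2 if that side has three in a row, otherwise 0.
    int winner(const Board& b) {
        static const int L[8][3] = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},
                                    {1,4,7},{2,5,8},{0,4,8},{2,4,6}};
        for (auto& l : L)
            if (b[l[0]] != 0 && b[l[0]] == b[l[1]] && b[l[1]] == b[l[2]]) return b[l[0]];
        return 0;
    }

    bool full(const Board& b) { for (int v : b) if (v == 0) return false; return true; }

    // Recursively enumerate every move the yardstick player could make; at the
    // evolved player's turns, ask it for its (deterministic) move via chooseMove.
    // Returns how many of the finished games the evolved player wins.
    template <typename ChooseMove>
    int countWins(Board b, bool evolvedToMove, ChooseMove chooseMove) {
        int w = winner(b);
        if (w != 0 || full(b)) return w == 1 ? 1 : 0;
        if (evolvedToMove) {
            b[chooseMove(b)] = 1;
            return countWins(b, false, chooseMove);
        }
        int wins = 0;
        for (int sq = 0; sq < 9; ++sq) {
            if (b[sq] == 0) {
                Board next = b;
                next[sq] = 2;
                wins += countWins(next, true, chooseMove);
            }
        }
        return wins;
    }

Calling it once with the evolved player to move first and once with the yardstick to move first would then cover both colours.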

Friday, August 5, 2011

Fibonacci Sequence

I wanted to test my GP framework with a more challenging problem. So a few weeks ago I decided to use my newly found computing resource to try and evolve a program that can output the first 30 numbers of the Fibonacci sequence.

My initial setup was similar to the one used in the search for even numbers - only the fitness function differed. This time, however, it did not result in a perfect solution. Elaborate approximations were evolved, but at no point did a simple iterative solution emerge - one that keeps track of the last two numbers output and adds them up.
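For reference, the kind of iterative solution I mean corresponds to something like the following ordinary C++ (written by hand, not evolved, and assuming the sequence starts 1, 1, 2, ...):

    #include <iostream>

    int main() {
        // Print the first 30 Fibonacci numbers by keeping track of the
        // last two values and adding them up each iteration.
        long long a = 1, b = 1;
        for (int i = 0; i < 30; ++i) {
            std::cout << a << '\n';
            long long next = a + b;
            a = b;
            b = next;
        }
    }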

I thought there might have been some subtle bug in my system preventing that kind of program from being constructed, so I tried manually creating one using the function nodes available to the simulation. That worked fine, so evolving an equivalent program was evidently just very unlikely. That led me to experiment with other pseudo-random number generators (I ended up using Boost's Mersenne twister implementation). Still no perfect solution.

I had been using a fixed seed for my random number sequences, so the final thing I tried was seeding from the system clock in order to get variation across successive runs. I ran the final setup, with large populations and hundreds of generations, ten times over. Still no perfect solution. Still just very good approximations.

I finally concluded that the simulation is very likely to generate a good approximate solution early on and thus gets stuck in a local maximum, where progressive improvements do not move the population in the direction of a perfect solution, and where the distance to the perfect solution is so vast that arriving at it is extremely improbable.

The experiment underlined how important it is to provide an environment with gradual selection pressure if one wants to succeed with genetic programming.

Sunday, May 29, 2011

Cloud computing to the rescue

I've been playing with the idea of getting a fast, reliable computer for my experiments for a while, and a couple of weeks ago I found the perfect solution: Amazon Elastic Compute Cloud. You launch computer instances on an as-needed basis, and within a few seconds they are up and running with whatever OS and software you have configured on them.

It is relatively cost-effective compared to buying and maintaining your own hardware. I only need computing power from time to time, but usually a lot of it, and this way I can launch as many cores as I need, whenever I need them! :)

Their Linux instances are the most powerful, so I've spent some time porting my GP code, as a few components were not as portable as I would have liked (they were using Windows-based threads and timers). It's now all Boost-based, and today I got it all running, so I can't wait to try some more experiments on fast CPUs.

Saturday, May 7, 2011

Loops in Genetic Programming

Dealing with loops and other iterative constructs in GP is far from trivial. You have to deal with closure and you have to avoid infinite or lengthy fitness evaluations. It only takes a few nested loops of a million iterations each in your individuals to grind everything to (nearly) a halt.

This is perhaps why loops have not seen much use in the GP field. "Experiments with Explicit For-loops in Genetic Programming" by Vic Ciesielski and Xiang Li provides good coverage of related work.

I ended up using a binary variant of protected for-loops where the first sub tree provides the number of iterations, and the second sub tree, the body of the loop, is then iterated. I did not restrict the number of loop nodes per program, but instead enforced an overall limit of 1000 iterations per program.

To give the programs something to work with, I also introduced a few variable nodes to read and write the variables X and Y. X was always set to the current iteration of a loop.
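As an illustration, evaluating such a protected Loop node could look roughly like this. It is a sketch under assumed names and interfaces, not the actual framework code:

    #include <algorithm>

    // Sketch of a protected binary Loop node (illustrative only).
    // First child yields the iteration count, second child is the loop body.
    struct Context {
        double x = 0.0, y = 0.0;        // the X and Y variables exposed to programs
        int iterationsLeft = 1000;      // overall per-program iteration budget
    };

    struct Node {
        virtual ~Node() {}
        virtual double eval(Context& ctx) = 0;
    };

    struct LoopNode : Node {
        Node* countChild;
        Node* bodyChild;
        LoopNode(Node* count, Node* body) : countChild(count), bodyChild(body) {}

        double eval(Context& ctx) override {
            // Clamp the requested count to what the program's budget still allows.
            int requested = static_cast<int>(countChild->eval(ctx));
            int count = std::max(0, std::min(requested, ctx.iterationsLeft));
            ctx.iterationsLeft -= count;
            double last = 0.0;
            for (int i = 0; i < count; ++i) {
                ctx.x = i;                  // X always holds the current iteration
                last = bodyChild->eval(ctx);
            }
            return last;                    // value of the last body evaluation
        }
    };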

This allowed me to evolve a perfect solution to the "100 First Even Numbers" sequence in just 8 generations amongst a population of 10,000! :) The run parameters as well as the tree of the perfect solution can be seen below:


Function nodes: Add, Subtract, Multiply, Divide, SetX, SetY, OutputNumber, Loop
Terminal nodes: 0, 1, 2, Ephemeral Constant (0-10), GetX, GetY
Crossover probability: 90%
Reproduction probability: 9%
Mutation probability: 1%
Mutation depth: 2
Max tree size (number of nodes): 30
Max tree depth: 10


Tuesday, May 3, 2011

Sequences

I wanted to do some experiments with simpler problems, so last night I experimented with evolving individuals that can generate a sequence of numbers. Fitness is then based on how many of these numbers match a predefined sequence. Conceptually, sequences of numbers are a bit like board games, which in a way just boil down to sequences of moves...

Function nodes: Add, Subtract, Multiply, Divide, SetX
Terminal nodes: 0, 1, 2
Crossover probability: 90%
Reproduction probability: 9%
Mutation probability: 1%
Mutation depth: 2
Max tree size (number of nodes): 300
Max tree depth: 100

I experimented with the sequence of even numbers. The system struggled to evolve long sequences correctly, but produced perfect solutions for up to the first 6 even numbers (2, 4, 6, 8, 10, 12) after a few minutes of computation.

Note that the "index" was not provided to the population - the programs simply had to add numbers to a list, with a SetX function node, and then the list was examined in full afterwards. 
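The fitness calculation for this setup is then conceptually very simple - something along these lines (an illustrative sketch with assumed names, not the actual code):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Sketch of the sequence fitness idea: the program appends numbers to an
    // output list via SetX; fitness is how closely that list matches the
    // target sequence 2, 4, 6, 8, ...
    double sequenceFitness(const std::vector<double>& output, std::size_t targetLength) {
        double error = 0.0;
        for (std::size_t i = 0; i < targetLength; ++i) {
            double target = 2.0 * (i + 1);
            double actual = i < output.size() ? output[i] : 0.0;   // missing entries count as 0
            error += std::fabs(actual - target);
        }
        return -error;   // closer to zero is fitter; exactly zero is a perfect solution
    }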

At first I limited the tree depth to 30, which turned out to be too stringent. Expanding the limit to 100 was sufficient for up to the first 6 numbers.

After examining the trees containing perfect solutions, I'm starting to see why the trees need to be so deep. Personally, if I coded the program tree manually, I would not be able to create a much shallower tree with the function nodes provided; I would need a Loop node. So I will try to add that next.

But that got me thinking about the Tic-tac-toe experiment. I had avoided adding a Loop node in order to not make the search space too large. In my mind, the solution I had expected to evolve was a giant if-then-else-if-else-if... block covering all the possible positions to block against. But there are quite a few possible "near-wins" for the opponent to cover: 3 horizontal, 3 vertical and 2 diagonal lines, each of which can hold two pieces in 3 different ways, giving 24 different threats to block against. The sequence experiment struggled with anything beyond 6 numbers, so it's not surprising that the Tic-tac-toe opponents couldn't cover most winning threats.

So a Loop function node may actually help quite a lot. Alternatively, a FunctionList node may do the trick - i.e. something with high arity that calls a number of sub trees in turn, similar to calling functions in a procedural programming language (making the tree wide instead of deep). Automatically Defined Functions would also be interesting to add, to improve reuse of code.

But first things first, so I will try with Loops initially.

Monday, May 2, 2011

Tic-tac-toe with Charlie

I first attempted the Tic-tac-toe experiment, using Charlie, with the same setup as the symbolic regression experiment, along with a MakeMove function node and terminal nodes to represent the status of each square on the board:

Function nodes: Add, Subtract, Multiply, Divide, MakeMove
Terminal nodes: 0, 1, 2, Ephemeral Constant (0-10), SQ_00, SQ_01, SQ_02, SQ_10, SQ_11, SQ_12, SQ_20, SQ_21, SQ_22
Crossover probability: 90%
Reproduction probability: 9%
Mutation probability: 1%
Mutation depth: 2
Max tree size (number of nodes): unlimited
Max tree depth: unlimited

That didn't produce very good results, which was to be expected as there were no conditional function nodes. So I introduced If-Then-Else, Equals, Greater-Than, Less-Than, And and Or operators and immediately got more promising results.

However, it was still very easy to beat the evolved opponents, so I went ahead and made various performance optimisations including making the simulation multi-threaded. Still not very good results.

I then tried running the setup with huge populations, on a 64-bit OS with lots of RAM, for several weeks but still the best opponents were mediocre and very easy for humans to beat. They frequently made large blunders and you didn't have to be particularly careful to ensure a win against them.

All in all, very similar results to the initial Beagle experiment, so quite disappointing. That brings my journal up to the present: I'm stuck with the simulation stagnating before strong players are evolved. I'm going to try to think of some new approach or tweak to the setup to improve things. Do let me know if you have any suggestions!

Monday, April 11, 2011

Symbolic regression results

An important role of this blog is to act as a journal of my experiments. From now on I will seek to document my experiments in enough detail to allow others to compare results, and so that I can compare results myself at a later date. So here we go...

I managed to dig out some notes from my early Charlie tests - at first I experimented using the following setup:

Function nodes: Add, Subtract, Multiply and Divide
Terminal nodes: 0, 1, 2 and x
Crossover probability: 90%
Reproduction probability: 9%
Mutation probability: 1%
Mutation depth: 2
Max tree size (number of nodes): unlimited
Max tree depth: unlimited

Below is a table of experiments I made with the above configuration. The function that I was trying to evolve is on the left. Population size and maximum generations are also listed along with some comments on the results:

Function | Population size | Generations | Result
7.2x^3 + x^2 - 3x | 10000 | 50 | No perfect solution and struggling.
7x^3 + x^2 - 3x | 10000 | 50 | Perfect solution in 16 generations
7x + x^2 - 3x | 10000 | 50 | Perfect solution in 1 generation
7.2x + x^2 - 3x | 10000 | 50 | No perfect solution
7.2x | 10000 | 50 | No perfect solution. Best fitness score: ~442
7.2 | 10000 | 50 | No perfect solution

I was surprised that the system struggled to generate a sub tree equal to or close to 7.2. It seemed to evolve integers just fine but struggled with fractions. I therefore tried introducing two further terminal nodes, the constants 4 and 8, in the hope of guiding the evolution into the right ballpark. I repeated the last experiment above, but still got no perfect, or even approximate, solution:

Function | Population size | Generations | Result
7.2 | 10000 | 50 | No perfect solution

I got desperate and decided to introduce ephemeral constants, which was on my to-do list anyway. I also removed constants 4 and 8 to reduce the search space a bit again. This resulted in the following figures:

Function | Population size | Generations | Result
7.2 | 10000 | 50 | No perfect solution, but very close (~0.0001)
7.2x^3 + x^2 - 3x | 10000 | 50 | No perfect solution but no longer struggling.
7x^3 + x^2 - 3x | 10000 | 50 | No perfect solution.
7x + x^2 - 3x | 10000 | 50 | Perfect solution in 1 generation
7.2x + x^2 - 3x | 10000 | 50 | No perfect solution
7.2x | 10000 | 50 | No perfect solution. Best fitness score: ~13
7.2x | 100000 | 43 | No perfect solution but very close
7.2x | 10000 | 100 | No perfect solution. Best fitness score: ~12
sqrt(x) | 10000 | 50 | No perfect solution. Best score: ~34
7x + x^2 - 3x + x^3 | 10000 | 50 | Perfect solution in 6 generations.
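As a side note on the ephemeral constants introduced above: an ephemeral random constant is a terminal node that draws its value once, when the node is created, and keeps that value for the rest of its life. A minimal sketch, illustrative only and assuming the 0-10 range used elsewhere in this journal, not the actual Charlie code:

    #include <random>

    // Sketch of an ephemeral random constant terminal: the value is drawn once
    // at node creation and is constant thereafter.
    struct EphemeralConstant {
        double value;
        explicit EphemeralConstant(std::mt19937& rng)
            : value(std::uniform_real_distribution<double>(0.0, 10.0)(rng)) {}
        double eval() const { return value; }
    };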

The results were better but by no means amazing. I then came across a subtle bug that meant some arguments were not forwarded to sub trees, and after fixing that I got the following results:

Function | Population size | Generations | Result
7.2 | 10000 | 50 | No perfect solution, but very close (~0.00001)
7.2x^3 + x^2 - 3x | 10000 | 50 | No perfect solution but no longer struggling.
7x^3 + x^2 - 3x | 10000 | 50 | Perfect solution in 12 generations.
7x + x^2 - 3x | 10000 | 50 | Perfect solution in 1 generation
7.2x + x^2 - 3x | 10000 | 50 | No perfect solution, but very close (~0.0001)
7.2x | 10000 | 50 | No perfect solution, but very close: ~10e-6
7.2x | 100000 | 31 | No perfect solution but very close: ~10e-6
sqrt(x) | 10000 | 50 | No perfect solution. Best score: ~0.036
7x + x^2 - 3x + x^3 | 10000 | 50 | Perfect solution in 3 generations.

That was more like it! :) It was time to try the Tic-tac-toe experiments with Charlie...

Tuesday, March 8, 2011

Introducing Charlie

A few years ago I decided that to fully understand the dynamics of genetic programming, I would have to implement my own framework. Beagle had proved incredibly useful, but I required full control of the entire software design, down to minute details, so I had to write my own.

I called my new framework Charlie, named after a dog we had when I was a kid, and after a certain pioneer in the field of evolution (if you were wondering why this blog is called CharlieAI, you now have your answer!)

I will spare you the implementation details and instead list the feature set I ended up with:

  • Tree-based genomes
  • Cross-over, mutation and re-selection operators
  • Node and sub-tree mutation
  • Tournament-based selection
  • Ephemeral random constants
  • Population statistics
  • Lisp-style output and graph output to .gif files (via Graphviz)
  • Multi-threaded evaluation
  • Single and co-evolution
  • Hall-of-fame archives

This was all built over six months or so, but I only had a few hours to devote each week, so the whole thing could probably have been done in something like two weeks of full-time work. I must admit, though, that it was nice to have ample thinking time between implementation steps, to ensure it all fitted well together and performed efficiently.

Next I will cover the results I got from running symbolic regression experiments with this framework. In the meantime, if you would like any more details on any of the above features or any aspects of the software, do let me know and I will attempt to cover it in a future post.

Thursday, March 3, 2011

Hall-of-fame archives

To solve the problem with strategy loops, I implemented a hall-of-fame archive - a collection of the best individual from each generation that produced a new "champion". Generations that did not produce an individual able to beat the opposing population were ignored.

Fitness was then measured as an individual's performance against the opposing population's entire archive. Only if all members could be beaten would a new champion emerge and be added to its own population's archive. Thus, loops were made impossible and only definite progress was allowed.
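In outline, the archive-based check amounts to something like this (an illustrative sketch; the names and the beats predicate are assumptions, not the actual code):

    #include <vector>

    // Sketch of hall-of-fame evaluation: an individual only becomes the new
    // champion if it beats every member of the opposing population's archive.
    template <typename Individual, typename PlayMatch>
    bool beatsWholeArchive(const Individual& candidate,
                           const std::vector<Individual>& opposingArchive,
                           PlayMatch beats) {     // beats(a, b) == true if a beats b
        for (const Individual& member : opposingArchive)
            if (!beats(candidate, member))
                return false;                     // one loss is enough to reject
        return true;
    }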

One issue with this approach was the increasing CPU cost of evaluating individuals as the archives expanded, but it is the approach I still use today. It would be interesting to try using only the, say, ten most recent (and therefore strongest) members of the archive, as that would still make progress likely, although not guaranteed, at a much reduced CPU cost.

Using hall-of-fame archives resulted in much stronger Tic-tac-toe players than previously achieved. At first a new champion would be evolved every few generations, but once an archive held a few dozen members the time between new champions increased, seemingly exponentially, and progress stagnated - even if I left the simulation running for weeks.

While the evolved programs were much stronger than any I had previously evolved, they were still not as strong as I had expected. If I played certain patterns they would block well and eventually win, but they were playing far from perfectly and were still pretty easy to beat.

I was under no illusion about the size of the search space, but I had thought that for a game as simple as Tic-tac-toe it would not be unrealistic to evolve a perfect player, so that the game would always end in a draw or a win for the AI.

I thought there must be something I could do to improve things, but didn't know exactly what; it was back to the drawing board...

Wednesday, March 2, 2011

Coevolutionary Principles

Before I continue my project log, I wanted to point you at an excellent chapter that I came across today:

Coevolutionary Principles (PDF)

The chapter is an extensive overview of co-evolution and a summary of progress made in the last two decades.

It is by Popovici, E., Bucci, A., Wiegand, P. and De Jong, E.D., and appears in the Handbook of Natural Computing (Springer-Verlag), which will be out later this month.

Tuesday, March 1, 2011

Strategy loops

After close examination of the tactics employed by the best individuals from successive generations, I noticed that strategies that were the best in early generations got superseded, as you would expect, but then reappeared several generations later!

While this was somewhat puzzling at first, there turned out to be a logical explanation: if program x is always beaten by program y, and y by program z, that does not mean that x cannot beat z. In practice it looked more like this:

  1. program a from population A is randomly selected
  2. program b from population B beats a
  3. c from A beats b
  4. d from B beats c
  5. e from A beats d
  6. f from B beats e
  7. g from A beats f
  8. ...
  9. m from A beats l
  10. n, equivalent to f, from B beats m
  11. repeats from step 7.

So this was why the simulation never produced strong players.

Next, I will describe how I solved this problem, but I'd be interested to hear from you if you have any suggestions.

Sunday, February 27, 2011

Co-evolution

The main reason for introducing co-evolution was to provide gradual selection pressure. Instead of just evolving one population against a fixed fitness measure, I set up two populations, A and B, each initialised with random individuals. I then picked a random individual from A and evolved B until it produced a player that could beat the individual from A. This individual, now the strongest from B, was used to evolve A. Once some player from A could beat it, I swapped again, and so forth:
  1. Select random individual from A, Best of A, to get things started.
  2. Evolve B until it can beat Best of A and thus select Best of B.
  3. Evolve A until it can beat Best of B and thus select Best of A.
  4. Repeat from step 2.
This meant that small steps were continually taken to improve the AI and, in theory, I should have ended up with much stronger players than when evolving against Mr Random, as per my previous post on rule inference. That was not the case, however - when playing against the best individuals it was evident that they were pretty poor, not much stronger than the ones evolved against Mr Random.
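For reference, the alternating loop in steps 1-4 boils down to something like this (an illustrative sketch; the Population interface and function names are assumptions, not the actual code):

    // Sketch of the alternating co-evolution loop described above.
    template <typename Population>
    void coevolve(Population& A, Population& B, int rounds) {
        auto bestOfA = A.randomIndividual();                     // step 1: seed with a random pick
        for (int i = 0; i < rounds; ++i) {
            auto bestOfB = B.evolveUntilSomeoneBeats(bestOfA);   // step 2: new Best of B
            bestOfA = A.evolveUntilSomeoneBeats(bestOfB);        // step 3: new Best of A
        }                                                        // step 4: repeat
    }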

I tried much larger populations and left it running for several days on more powerful PCs, but to no avail, so I decided to analyse the evolution process and the individual generations in more detail...

Saturday, February 26, 2011

Rule inference

At that point I was ready to start on the board game AI, and I decided to experiment with a Tic-tac-toe player first; once I had evolved a strong player of this game, I could then move on to more complex games. And since I was going to include only minimal game-specific code, the system would be easy to apply to other games. Individuals playing one game could even be used in the evolution of players for other games.

But how exactly do you do that? You could program a very simple individual by hand that just makes very simple, but valid, moves and then use that as the starting point for the evolution run. I decided that was not general enough and had the potential to send the search in the wrong direction, especially for more complex games like Go.

Instead I decided to just expose a function to get the status (blank, black or white) of an individual board square at position (x, y) and another to make a move at a particular square at position (x', y'). It should then be possible to evolve programs that could examine the state across the board, apply some logic, and then make a move accordingly. In the first generation of a run, all programs were initialised completely randomly.

If a program tried to read the status of a square outside the boundaries of the board, it would just get blank back, and if it tried to make a move outside the boundaries or on a square where another piece was already placed, the move would simply have no effect. The analogy is a natural environment where, say, fish that mutate and evolve to swim into rocks simply get stopped in their tracks (and ultimately starve and die out). So all I was providing was a set of "laws of nature" for a game environment, in which individuals could evolve any conceivable strategy.
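In code, these "laws of nature" amount to little more than the following (an illustrative sketch with assumed names, not the actual implementation):

    // Sketch of the protected board interface.
    // Squares hold 0 = blank, 1 = black, 2 = white.
    struct GameBoard {
        int squares[3][3] = {};

        // Reading outside the board just returns blank.
        int getStatus(int x, int y) const {
            if (x < 0 || x > 2 || y < 0 || y > 2) return 0;
            return squares[x][y];
        }

        // Moves outside the board, or onto an occupied square, simply have no effect.
        void makeMove(int x, int y, int player) {
            if (x < 0 || x > 2 || y < 0 || y > 2) return;
            if (squares[x][y] != 0) return;
            squares[x][y] = player;
        }
    };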

As a fitness measure, individuals would at first play against a dumb training agent I wrote, Mr Random, which simply makes random valid moves. Each individual would play 3 times as white and 3 times as black, and the proportion of games won was a direct measure of its fitness.
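The fitness measure itself is then just the fraction of those six games won - roughly like this (a sketch; the playGameVsRandom callable and Colour type are assumptions):

    // Sketch of the fitness measure: the individual plays Mr Random three times
    // as white and three times as black; fitness is the proportion of games won.
    enum class Colour { White, Black };

    template <typename PlayGameVsRandom>
    double fitnessAgainstMrRandom(PlayGameVsRandom playGameVsRandom) {
        int wins = 0;
        for (int i = 0; i < 3; ++i) {
            if (playGameVsRandom(Colour::White)) ++wins;   // true if the individual won
            if (playGameVsRandom(Colour::Black)) ++wins;
        }
        return wins / 6.0;
    }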

Early generations would mostly make invalid moves, so they didn't do very well, even against random play. After numerous generations, however, I observed the fittest individuals at play and, as I had hoped, they had learned to play by the rules! This was a simple side effect of only rewarding individuals that happened, by chance, to play valid moves.

When I tried playing against the fittest individuals it did of course turn out to be very easy to beat them, as the hardest opponent they had ever met was Mr Random. Thus, the next step was to implement co-evolution...

Friday, February 25, 2011

Open BEAGLE

When I first started on my project, a few years ago, I decided to save time by making use of an existing tree-based GP system. I did some research into the various alternatives and by far the best framework I came across was Christian Gagné's Open BEAGLE. It is well-designed, flexible, easily extendable and very efficient.

Open BEAGLE also has a Yahoo mailing list offering help and support along the way.

As an initial test of the system, and of my understanding of it, I played around with symbolic and numerical regression, where the input and output of a mathematical function are known but the function itself must be evolved.

All you need to do is loop over a series of input data (e.g. the integers 1 to 100), run some function (e.g. x^2 + x + 1) on the input, store the output for fitness evaluation, and then provide only the input data to your population along with some basic operators (e.g. addition, subtraction, multiplication and division). Fitness is determined as the negated magnitude of the error when comparing an individual's output with the known output - the closer the error is to zero, the fitter the individual. Thus, if the error is exactly zero, you have a perfect fitness case.
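As a sketch of that fitness evaluation (illustrative only; representing the evolved program as a callable is my own assumption, not how any particular framework does it):

    #include <cmath>

    // Sketch of symbolic regression fitness: evaluate the evolved expression on
    // each input, compare it with the known target function x^2 + x + 1, and
    // negate the accumulated error so that fitter is higher.
    template <typename EvolvedProgram>
    double regressionFitness(EvolvedProgram& program) {
        double error = 0.0;
        for (int x = 1; x <= 100; ++x) {
            double target = static_cast<double>(x) * x + x + 1;   // known output
            error += std::fabs(program(x) - target);               // evolved output vs target
        }
        return -error;   // an error of exactly zero is a perfect fitness case
    }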

If everything works, you will end up with a program that matches the function exactly after a few generations of evolution, e.g. for x^2 + x + 1.


Next, I tested whether this system could evolve more elaborate functions like 4x^3 + 2x - 27. I played around with how many generations were required and how large a population was necessary to get exact solutions. I also tried using a square root target function without exposing a square root operator to the GP system, and was amazed when it found very good approximate solutions within a few dozen generations with a population of a few thousand individuals!

Thursday, February 24, 2011

Introduction to genetic programming

As a taster of genetic programming, there are some excellent slides from a course at the University of Edinburgh, which cover a lot of the theory behind genetic algorithms and, more specifically, genetic programming:
"Genetic Algorithms and Genetic Programming 2009-10"

My favourite book on the subject is "Genetic Programming: An Introduction" by Banzhaf et al. It is a well-written and concise introduction that will give you a sound grounding in all the theory necessary to evolve your own GP populations.

Once you are ready to either tweak existing GP systems or write your own from scratch, "A Field Guide to Genetic Programming" by Poli et al is a priceless resource. It is also available for free as a PDF.

After reading through these slides and books (as well as some that turned out to be less helpful), I decided on a tree-based GP approach, where the programs evolved are Lisp-like structures. Below is a simple example from a Tic-tac-toe player. The program's output, in this case, comes from the MAKEMOVE nodes, and each node computes its value from any subtrees it may have, including the current board state read from SQ_X_Y nodes.
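As a purely made-up illustration of the flavour (not the example referred to above, and with the node arities guessed), such a Lisp-style program might read:

    (IFTHENELSE (= SQ_1_1 0)
                (MAKEMOVE 1 1)
                (MAKEMOVE 0 0))

i.e. if the centre square is blank, move there; otherwise move in the top-left corner.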


But first some more info on how I got to the stage where I could generate programs like the one above...

Wednesday, February 23, 2011

The Blind Watchmaker

Before I move on to more technical resources, if you want to know exactly why evolution works from a biological point of view, Richard Dawkins' "The Blind Watchmaker" is an excellent introduction.

Genetic programming is a simulation of biological evolution, and of course a simplified one, but it doesn't hurt to know how nature does it.

One of the pieces from this book that proved most useful later in my project is how gradual selection pressure is achieved via co-evolution of species and sub-species. More on that in a later post...

Tuesday, February 22, 2011

The beginning...

It all started with the notion that for board games like Tic-tac-toe, Connect Four, Checkers, Chess, Shogi and Go, the strategy for playing each game may not be obvious, yet there are patterns that repeat across all of them.

In the case of Tic-tac-toe a simple min-max search may do well. Similarly, in Chess, if we apply min-max along with a host of extra heuristics to guide the search for the best move, we can end up with very strong play. However, for games like Go we know that the search space is too vast for a pure min-max approach to be practical.

Yet when humans play these games, more abstract lessons learned in one game can often be applied in others. With application-specific approaches to AI, these lessons are not easily transferable from one game to another.

I've always been a big fan of genetic algorithms and amazed at the novel solutions Darwinian evolution has produced in nature. One variant in particular, genetic programming, where actual computer programs are evolved, is of special interest to me.

As opposed to application-specific search algorithms like min-max, or machine learning algorithms like neural networks for pattern recognition, genetic programming is particularly applicable when relatively little is known about the problem domain, or about what shape a good solution would take. For example, a neural network may give us a reasonably good solution, but without further degrees of freedom we will never know whether, say, a stochastic approach combined with a neural network would give us an even better one.

Genetic programming has the potential to evolve any existing AI approach, or combination of approaches, and even brand new algorithms. In a sense, genetic programming is a superset of all learning algorithms.

So I decided to start a hobby project where I used a genetic programming approach to evolve computer programs that can play board games without domain-specific logic or application-specific AI.

In a future post I will go into what exactly is being evolved and how the system teaches itself the rules. But for anyone just starting out with genetic programming, I will first cover some of the resources that helped me out enormously when learning about all this.