Sunday, February 27, 2011

Co-evolution

The main reason for introducing co-evolution was to provide gradual selection pressure. Instead of evolving one population against a fixed fitness measure, I set up two populations, A and B, each initialised with random individuals. I then picked a random individual from A and evolved B until it produced a player that could beat the individual from A. This individual, now the strongest from B, was used to evolve A. Once some player from A could beat it, I swapped again, and so forth (sketched in code after the list):
  1. Select a random individual from A, Best of A, to get things started.
  2. Evolve B until some individual can beat Best of A; that individual becomes Best of B.
  3. Evolve A until some individual can beat Best of B; that individual becomes Best of A.
  4. Repeat from step 2.
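In outline, the loop looks something like this (a minimal sketch, not my actual Open BEAGLE code; Individual and evolveUntilBeaten are hypothetical stand-ins for the real machinery):

    #include <vector>

    struct Individual { /* an evolved program tree */ };

    // Hypothetical helper: evolve `pop` until one of its individuals
    // beats `opponent`, then return that individual.
    Individual evolveUntilBeaten(std::vector<Individual>& pop,
                                 const Individual& opponent);

    Individual coevolve(std::vector<Individual>& a,
                        std::vector<Individual>& b, int swaps)
    {
        Individual bestOfA = a.front();  // step 1: a random pick would do
        for (int i = 0; i < swaps; ++i) {
            Individual bestOfB = evolveUntilBeaten(b, bestOfA);  // step 2
            bestOfA = evolveUntilBeaten(a, bestOfB);             // step 3
        }
        return bestOfA;  // step 4: the loop repeats from step 2
    }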
This meant that small steps were continually taken to improve the AI and, in theory, I should have ended up with much stronger players than when evolving against Mr Random, as per my previous post on rule inference. That was not the case, however: when playing against the best individuals it was evident that they were pretty poor, not much stronger than those evolved against Mr Random.

I tried much larger populations and left it running for several days on more powerful PCs, but to no avail, so I decided to analyse the evolution process and the individual generations in more detail...

Saturday, February 26, 2011

Rule inference

At that point I was ready to start on the board game AI, and I decided to experiment with a Tic-tac-toe player first; once I had evolved a strong player of this game, I could then move on to more complex games. And since I was going to include only minimal game-specific code, the system would be easy to apply to other games. Individuals playing one game could even be used in the evolution of players for other games.

But how exactly do you do that? You could program a simple individual by hand that just makes basic but valid moves and then use that as the starting point for the evolution run. I decided that was not general enough and had the potential to send the search in the wrong direction, especially for more complex games like Go.

Instead I decided to just expose a function to get the status (blank, black or white) of an individual board square at position (x, y) and another to make a move at a particular square. It should then be possible to evolve programs that could examine the state across the board, apply some logic, and then make a move accordingly. In the first generation of a run, all programs were initialised completely randomly.

If a program tried to read the status of a square outside the boundaries of the board, it would just get blank back, and if it tried to make a move outside the boundaries, or on a square where a piece was already placed, the move would simply have no effect. The analogy is a natural environment where, say, fish that mutate to swim into rocks simply get stopped in their tracks (and ultimately starve and leave no descendants). So all I was providing was a set of "laws of nature" for a game environment, in which individuals could be evolved with any conceivable strategy.
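In code, those laws of nature boil down to something like this (a minimal sketch for a 3x3 board; the class and names are illustrative, not my actual implementation):

    #include <array>

    enum class Square { Blank, Black, White };

    class Board {
    public:
        // Reading outside the board just returns blank.
        Square get(int x, int y) const {
            if (x < 0 || x >= 3 || y < 0 || y >= 3) return Square::Blank;
            return squares_[y * 3 + x];
        }
        // Moves off the board, or on an occupied square, have no effect.
        void makeMove(int x, int y, Square colour) {
            if (x < 0 || x >= 3 || y < 0 || y >= 3) return;
            if (squares_[y * 3 + x] != Square::Blank) return;
            squares_[y * 3 + x] = colour;
        }
    private:
        std::array<Square, 9> squares_{};  // all squares start blank
    };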

As a fitness measure, individuals would at first play against a dumb training agent I wrote, Mr Random, which simply makes random valid moves. Each individual played three games as white and three as black, and the proportion of games won was a direct measure of its fitness.
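As a sketch (playGame and RandomPlayer are hypothetical stand-ins for the real evaluation code):

    struct Individual;       // an evolved program tree
    struct RandomPlayer {};  // Mr Random: makes random valid moves

    // Hypothetical helper: play one game, return true if `player` wins.
    bool playGame(const Individual& player, RandomPlayer opponent,
                  bool playerIsWhite);

    double evaluateFitness(const Individual& player)
    {
        int wins = 0;
        for (int g = 0; g < 3; ++g) {
            if (playGame(player, RandomPlayer{}, true))  ++wins;  // as white
            if (playGame(player, RandomPlayer{}, false)) ++wins;  // as black
        }
        return wins / 6.0;  // proportion of the six games won
    }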

Early generations mostly made invalid moves, and consequently they didn't do very well, even against random play. After numerous generations, however, I observed the fittest individuals at play and, as I had hoped, they had learned to play by the rules! This was a simple side effect of only rewarding individuals that happened, by chance, to play valid moves.

When I tried playing against the fittest individuals, they did of course turn out to be very easy to beat, as the hardest opponent they had ever met was Mr Random. Thus, the next step was to implement co-evolution...

Friday, February 25, 2011

Open BEAGLE

When I first started on my project, a few years ago, I decided to save time by making use of an existing tree-based GP system. I did some research into the various alternatives and by far the best framework I came across was Christian Gagné's Open BEAGLE. It is well-designed, flexible, easily extendable and very efficient.

Open BEAGLE also has a Yahoo mailing list offering help and support along the way.

As an initial test of the system, and of my understanding of it, I played around with symbolic and numerical regression, where the inputs and outputs of a mathematical function are known but the function itself must be evolved.

All you need to do is loop over a series of input data (e.g. the integers 1 to 100), run some function (e.g. x² + x + 1) on each input, store the output for fitness evaluation, and then provide only the input data to your population along with some basic operators (e.g. addition, subtraction, multiplication and division). Fitness is determined as the negated magnitude of the error when comparing an individual's output with the known output - the closer the error is to zero, the fitter the individual. Thus, if the total error is exactly zero, the individual solves every fitness case perfectly.
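The core of the fitness evaluation then looks something like this (a sketch in which the evolved tree is abstracted as a plain function):

    #include <cmath>
    #include <functional>

    // Negated total error against the target x*x + x + 1 over the
    // inputs 1..100; zero error gives the maximal fitness of zero.
    double regressionFitness(const std::function<double(double)>& evolved)
    {
        double error = 0.0;
        for (int x = 1; x <= 100; ++x) {
            double target = double(x) * x + x + 1;    // known output
            error += std::fabs(evolved(x) - target);  // magnitude of error
        }
        return -error;
    }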

If everything works, then after a few generations of evolution you will end up with a program that matches the function exactly.
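For x² + x + 1, for example, a winning individual might reduce to a tree like this (with ADD and MUL as illustrative node names for the basic operators):

    (ADD (MUL X X) (ADD X 1))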


Next, I tested whether this system could evolve more elaborate functions like 4x³ + 2x - 27. I played around with how many generations were required and how large a population was necessary to get exact solutions. I also tried targeting a square root function without exposing a square root operator to the GP system, and was amazed when it found very good approximate solutions within a few dozen generations of a few thousand individuals!

Thursday, February 24, 2011

Introduction to genetic programming

As a taster of genetic programming, there are some excellent slides from a course at the University of Edinburgh, which cover a lot of the theory behind genetic algorithms and, more specifically, genetic programming:
"Genetic Algorithms and Genetic Programming 2009-10"

My favourite book on the subject is "Genetic Programming: An Introduction" by Banzhaf et al. It is a well-written and concise introduction that will give you a sound grounding in all the theory necessary to evolve your own GP populations.

Once you are ready to either tweak existing GP systems or write your own from scratch, "A Field Guide to Genetic Programming" by Poli et al. is a priceless resource. It is also available for free as a PDF.

After reading through these slides and books (as well as some that turned out to be less helpful), I decided on a tree-based GP approach, where the evolved programs are Lisp-like structures. Below is a simple example from a Tic-tac-toe player. The program's outputs, in this case, are the MAKEMOVE nodes, and each node computes its value based on any subtrees it may have, including the current board state read from SQ_X_Y nodes.
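Such a tree might, for instance, look like this (IF, EQ and BLANK are illustrative node names; SQ_1_1 reads the centre square):

    (IF (EQ SQ_1_1 BLANK)
        (MAKEMOVE 1 1)
        (MAKEMOVE 0 2))

That is: take the centre square if it is free, otherwise fall back to a corner.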


But first some more info on how I got to the stage where I could generate programs like the one above...

Wednesday, February 23, 2011

The Blind Watchmaker

Before I move on to more technical resources, if you want to know exactly why evolution works from a biological point of view, Richard Dawkins' "The Blind Watchmaker" is an excellent introduction.

Genetic programming is, of course, a simplified simulation of biological evolution, but it doesn't hurt to know how nature does it.

One of the ideas from this book that proved most useful later in my project is how gradual selection pressure is achieved via co-evolution of species and sub-species. More on that in a later post...

Tuesday, February 22, 2011

The beginning...

It all started with the notion that, for board games like Tic-tac-toe, Connect Four, Checkers, Chess, Shogi and Go, the strategy for playing each game may not be obvious, yet there are patterns that repeat across all these games.

In the case of Tic-tac-toe a simple min-max search may do well. Similarly, in Chess, if we apply min-max along with a host of extra heuristics to guide the search for the best move, we can end up with very strong play. However, for games like Go we know that the search space is too vast for a pure min-max approach to be practical.

Yet, when humans play these games there are often more abstract lessons learned in one game that can be applied in others. With application-specific approaches to AI these lessons are not easily transferable from one game to another.

I've always been a big fan of genetic algorithms and amazed at the novel solutions Darwinian evolution finds in nature. One variant in particular, genetic programming, where actual computer programs are evolved, is of special interest to me.

As opposed to application-specific search algorithms like min-max, or machine learning algorithms like neural networks for pattern recognition, genetic programming is particularly applicable when relatively little is known about the problem domain, or about what shape a good solution would take. For example, a neural network may give us a somewhat good solution, but without further degrees of freedom we will never know whether, say, a stochastic approach combined with a neural network would give us an even better one.

Genetic programming has the potential to evolve any existing AI approach, or combination of such, and even to evolve brand new algorithms. In a sense, genetic programming is the superset of all learning algorithms.

So I decided to start a hobby project in which I would use a genetic programming approach to evolve computer programs that can play board games, without domain-specific logic or application-specific AI.

In a future post I will go into what exactly is being evolved and how the system teaches itself the rules. But for anyone just starting out with genetic programming, I will first cover some of the resources that helped me out enormously when learning about all this.