Tuesday, March 8, 2011

Introducing Charlie

A few years ago I decided that to fully understand the dynamics of genetic programming, I would have to implement my own framework. Open BEAGLE had proved incredibly useful, but I required full control of the entire software design, down to the minute details, so I had to write my own.

I called my new framework Charlie, after a dog we had when I was a kid and after a certain pioneer in the field of evolution (if you were wondering why this blog is called CharlieAI, you now have your answer!).

I will spare you the implementation details and instead list the feature set I ended up with:

  • Tree-based genomes
  • Cross-over, mutation and re-selection operators
  • Node and sub-tree mutation
  • Tournament-based selection
  • Ephemeral random constants
  • Population statistics
  • Lisp-style text output and graph output to .gif files (via Graphviz)
  • Multi-threaded evaluation
  • Single and co-evolution
  • Hall-of-fame archives
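
To give a flavour of a couple of these features, here is a minimal sketch in Python (illustrative only, not Charlie's actual code) of a tree-genome node and tournament-based selection:

    import random

    class Node:
        """A node in a tree-based genome: a function symbol or terminal plus children."""
        def __init__(self, symbol, children=None):
            self.symbol = symbol            # e.g. '+', '*', 'x', or an ephemeral random constant
            self.children = children or []  # empty for terminal nodes

    def tournament_select(population, fitness, k=7):
        """Pick k individuals at random and return the fittest of them (k is the tournament size)."""
        contestants = random.sample(population, k)
        return max(contestants, key=fitness)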

This was all built over six months or so, but I only had a few hours to devote to it each week, so the whole thing could easily have been done in something like two weeks of full-time work. I must admit, though, that it was nice to have ample thinking time between implementation steps, to ensure it all fitted together well and performed efficiently.

Next I will cover the results I got from running symbolic regression experiments with this framework. In the meantime, if you would like more details on any of the above features, or on other aspects of the software, do let me know and I will attempt to cover it in a future post.

Thursday, March 3, 2011

Hall-of-fame archives

To solve the problem of strategy loops, I implemented a hall-of-fame archive - a collection holding the best individual from each generation that produced a new "champion". Generations whose best individual did not beat the opposing population were ignored.

Fitness was then measured as an individual's performance against the opposing population's entire archive. Only if all members could be beaten would a new champion emerge and be added to its own population's archive. Thus loops were made impossible and only definite progress was allowed.
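
A rough sketch of the promotion rule, with hypothetical helper names rather than Charlie's actual code:

    def beats_entire_archive(candidate, opposing_archive, play):
        """True only if the candidate beats every member of the opposing archive.

        play(a, b) is assumed to return True if program a beats program b.
        """
        return all(play(candidate, opponent) for opponent in opposing_archive)

    def try_promote(candidate, own_archive, opposing_archive, play):
        """A new champion emerges, and is added to its population's archive,
        only when the whole opposing archive can be beaten, so every addition
        represents definite progress and loops become impossible."""
        if beats_entire_archive(candidate, opposing_archive, play):
            own_archive.append(candidate)
            return True
        return False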

One issue with this approach was the increasing CPU cost of evaluating individuals as the archives expanded, but it is the approach I still use today. It would be interesting to try using only, say, the ten most recent, and therefore strongest, members of the archive; that would still make progress likely, although not guaranteed, at a much reduced CPU cost.

Using hall-of-fame archives resulted in much stronger Tic-tac-toe players than previously achieved. At first a new champion would be evolved every few generations, but once the archive held a few dozen members the time between new champions increased, seemingly exponentially, and progress stagnated - even if I left the simulation running for weeks.

While these programs were much stronger than any I had previously evolved, they were still not as strong as I had expected. If I played certain patterns they would block well and eventually win, but they were playing far from perfectly and were still pretty easy to beat.

I was under no illusion that my search space was small, but I had thought that for a game as simple as Tic-tac-toe it would not be unrealistic to evolve a perfect player, one that would always end the game in a draw or a win for the AI.

I thought there must be something I could do to improve things, but didn't know exactly what; it was back to the drawing board...

Wednesday, March 2, 2011

Coevolutionary Principles

Before I continue my project log, I wanted to point you at an excellent chapter that I came across today:

Coevolutionary Principles (PDF)

The chapter is an extensive overview of co-evolution and a summary of progress made in the last two decades.

It is by Popovici, E., Bucci, A., Wiegand, P. and De Jong, E.D., and appears in the Handbook of Natural Computing (Springer-Verlag), which will be out later this month.

Tuesday, March 1, 2011

Strategy loops

After close examination of the tactics employed by the best individuals from successive generations, I noticed that strategies that dominated in early generations got superseded, as you would expect, but then reappeared several generations later!

While this was somewhat puzzling at first, there turned out to be a logical explanation: if program x is always beaten by program y, and y by program z, that does not mean that x cannot beat z. In practice it played out more like this:

  1. program a from population A is randomly selected
  2. program b from population B beats a
  3. c from A beats b
  4. d from B beats c
  5. e from A beats d
  6. f from B beats e
  7. g from A beats f
  8. ...
  9. m from A beats l
  10. n, equivalent to f, from B beats m
  11. the cycle repeats from step 7.

So this was why the simulation never produced strong players.
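
A simple way to spot such a loop (a sketch with hypothetical names, not the code I actually used) is to check whether a later champion repeats the strategy of an earlier one:

    def find_strategy_loop(champions, equivalent):
        """Return (i, j) if champion j plays the same strategy as earlier champion i.

        champions is the alternating sequence a, b, c, d, ... of best individuals;
        equivalent(p, q) is assumed to decide whether two programs play identically
        (for example, by comparing their moves on a fixed set of probe boards).
        """
        for i in range(len(champions)):
            for j in range(i + 1, len(champions)):
                if equivalent(champions[i], champions[j]):
                    return i, j  # evolution has come full circle back to champion i
        return None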

Next, I will describe how I solved this problem, but I'd be interested to hear from you if you have any suggestions.

Sunday, February 27, 2011

Co-evolution

The main reason for introducing co-evolution was to provide gradual selection pressure. Instead of evolving one population against a fixed fitness measure, I set up two populations, A and B, each initialised with random individuals. I then picked a random individual from A and evolved B until it produced a player that could beat the individual from A. This individual, now the strongest from B, was used to evolve A. Once some player from A could beat it, I swapped again, and so forth:

  1. Select a random individual from A, Best of A, to get things started.
  2. Evolve B until it can beat Best of A, and thus select Best of B.
  3. Evolve A until it can beat Best of B, and thus select Best of A.
  4. Repeat from step 2.

This meant that small steps were continually taken to improve the AI and, in theory, I should have ended up with much stronger players than when evolving against Mr Random, as per my previous post on rule inference. That was not the case, however - when playing against the best individuals it was evident that they were pretty poor, not much stronger than the ones evolved against Mr Random.
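
A minimal sketch of this alternating scheme (hypothetical helper names, not the actual Charlie code):

    def coevolve(pop_a, pop_b, evolve_until_beaten, pick_random):
        """Alternate between two populations, each evolving until it beats the
        other's current best individual.

        evolve_until_beaten(pop, opponent) is assumed to run the GP system on
        pop until some individual beats opponent, and to return that individual.
        """
        best_a = pick_random(pop_a)                      # step 1
        while True:                                      # in practice, a generation or time budget
            best_b = evolve_until_beaten(pop_b, best_a)  # step 2
            best_a = evolve_until_beaten(pop_a, best_b)  # step 3
                                                         # step 4: repeat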

I tried much larger populations and left it running for several days on more powerful PCs, but to no avail, so I decided to analyse the evolution process and the individual generations in more detail...

Saturday, February 26, 2011

Rule inference

At that point I was ready to start on the board game AI, and I decided to experiment with a Tic-tac-toe player first; once I had evolved a strong player of this game, I could then move on to more complex games. And since I was going to include only minimal game-specific code, the system would be easy to apply to other games. Individuals playing one game could even be used in the evolution of players for other games.

But how exactly do you do that? You could program a simple individual by hand that just makes basic, but valid, moves and then use that as the starting point for the evolution run. I decided that was not general enough and had the potential to send the search in the wrong direction, especially for more complex games like Go.

Instead I decided to just expose a function to get the status (blank, black or white) of an individual board square at position (x, y) and another to make a move at a particular square at position (x', y'). It should then be possible to evolve programs that could examine the state across the board, apply some logic, and then make a move accordingly. In the first generation of a run, all programs were initialised completely randomly.

If a program tried to read the status of a square outside the boundaries of the board, it would simply get back blank, and if it tried to make a move outside the boundaries, or on a square where another piece was already placed, the move would have no effect. The analogy is a natural environment where, say, fish that mutate and evolve to swim into rocks are simply stopped in their tracks (and ultimately starve and die out). So all I was providing was a set of "laws of nature" for a game environment, in which individuals with any conceivable strategy could be evolved.
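
A minimal sketch of those "laws of nature" for Tic-tac-toe, with made-up names rather than the actual implementation:

    BLANK, BLACK, WHITE = 0, 1, 2
    SIZE = 3  # Tic-tac-toe board

    class Board:
        def __init__(self):
            self.squares = [[BLANK] * SIZE for _ in range(SIZE)]

        def status(self, x, y):
            """Exposed to evolved programs: the status of square (x, y).
            Reading outside the board simply returns blank."""
            if 0 <= x < SIZE and 0 <= y < SIZE:
                return self.squares[y][x]
            return BLANK

        def move(self, x, y, colour):
            """Exposed to evolved programs: place a piece at (x, y).
            Off-board or already-occupied squares mean the move has no effect."""
            if 0 <= x < SIZE and 0 <= y < SIZE and self.squares[y][x] == BLANK:
                self.squares[y][x] = colour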

As a fitness measure, individuals would at first play against a dumb training agent I wrote, Mr Random, which simply makes random valid moves. Each individual played three games as white and three as black, and the proportion of games won was a direct measure of its fitness.
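
Continuing the board sketch above (again with illustrative names), the fitness evaluation amounted to something like:

    import random

    def mr_random(board, colour):
        """Dumb training agent: make a random valid move, if one exists."""
        blanks = [(x, y) for x in range(SIZE) for y in range(SIZE)
                  if board.status(x, y) == BLANK]
        if blanks:
            x, y = random.choice(blanks)
            board.move(x, y, colour)

    def fitness_vs_mr_random(individual, play_game, games_per_colour=3):
        """Proportion of games won over three games as white and three as black.

        play_game(individual, opponent, colour) is assumed to play one full game
        with the individual as the given colour and to return True if it wins.
        """
        wins = 0
        for colour in (WHITE, BLACK):
            for _ in range(games_per_colour):
                wins += play_game(individual, mr_random, colour)
        return wins / (2 * games_per_colour)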

Early generations would mostly make invalid moves, so they didn't do very well, even against random play. After numerous generations, however, I observed the fittest individuals at play and, as I had hoped, they had learned to play by the rules! This was a simple side effect of only rewarding individuals that happened, by chance, to play valid moves.

When I tried playing against the fittest individuals, they did of course turn out to be very easy to beat, as the hardest opponent they had ever met was Mr Random. Thus, the next step was to implement co-evolution...

Friday, February 25, 2011

Open BEAGLE

When I first started on my project, a few years ago, I decided to save time by making use of an existing tree-based GP system. I did some research into the various alternatives and by far the best framework I came across was Christian Gagné's Open BEAGLE. It is well-designed, flexible, easily extendable and very efficient.

Open BEAGLE also has a Yahoo mailing list offering help and support along the way.

As an initial test of the system, and of my understanding of it, I played around with symbolic and numerical regression, where the inputs and outputs of a mathematical function are known but the function itself must be evolved.

All you need to do is loop over a series of input data (e.g. the integers 1 to 100), run some function (e.g. x²+x+1) on the input, store the output for fitness evaluation, and then provide only the input data to your population along with some basic operators (e.g. addition, subtraction, multiplication and division). Fitness is determined as the negated magnitude of the error when comparing an individual's output with the known output - the closer the error is to zero, the fitter the individual. Thus, if the error is exactly zero, you have a perfect fitness case.
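
As a sketch of that setup, independent of any particular GP framework:

    def target(x):
        return x**2 + x + 1

    # Training cases: the inputs and the known outputs of the target function.
    cases = [(x, target(x)) for x in range(1, 101)]

    def fitness(program):
        """Negated magnitude of the total error: the closer to zero, the fitter.

        program(x) stands for the evolved expression, built only from the input x
        and basic operators such as addition, subtraction, multiplication and division.
        """
        return -sum(abs(program(x) - y) for x, y in cases)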

If everything works you will end up with a program that matches the function exactly, after a few generations of evolution, e.g. for x²+x+1:

[Graphviz tree of an evolved solution for x²+x+1]

Next, I tried whether this system could evolve more elaborate functions like 4x³+2x-27. I played around with how many generations were required and how large a population was necessary to get exact solutions. I also tried a target function involving a square root, without exposing a square root operator to the GP system, and was amazed when it found very good approximate solutions within a few dozen generations of a few thousand individuals!