CharlieAI: Tic-tac-toe

Showing posts with label Tic-tac-toe. Show all posts

Thursday, March 3, 2011

Hall-of-fame archives

To solve the problem with strategy loops, I implemented a hall-of-fame archive - a collection of the best individual from each generation that led to a new "champion". Generations that did not beat the opposing population were ignored.

Fitness was then measured as an individual's performance against the opposing population's entire archive. Only if all members could be beaten, would a new champion emerge and get added to its population's archive. Thus, loops were made impossible and only definite progress was allowed.

One issue with this approach was the increasing CPU costs of evaluating individuals as the archives expanded, but this is the approach I still use today. It would be interesting to try only using the e.g. ten most recent, and therefore strongest, members of the archive as that would still make progress likely, although not guaranteed, at a much reduced CPU cost.

Using hall-of-fame archives resulted in much stronger Tic-tac-toe players than previously achieved. At first a new champion would be evolved every few generations, but after about a few dozen members in the archive the time between new champions increased, seemingly exponentially, and thus stagnated - even if I left the simulation running for weeks.

While the evolved programs were much stronger than any I had previously evolved, they were still not as strong as I had expected. If I played certain patterns they would block well and eventually win, but they were playing far from perfectly and were still pretty easy to beat.

I was under no illusion that my search space was not huge, but I had thought that for a game as simple as Tic-tac-toe it would not be unrealistic to evolve a perfect player, so that the game would always end in a draw or a win for the AI.

I thought there must be something I could do to improve things, but didn't know exactly what; it was back to the drawing board...

Sunday, February 27, 2011

Co-evolution

The main reason for introducing co-evolution was to provide gradual selection pressure. Instead of just evolving one population against a fixed fitness measure I instead setup two populations, A and B, each initialised with random individuals. I then picked a random individual from A and evolved B until it produced a player that could beat the individual from A. This individual, now the strongest from B, was used to evolve A. Once some player from A could beat it, I swapped again and so forth:

Select random individual from A, Best of A, to get things started.
Evolve B until it can beat Best of A and thus select Best of B.
Evolve A until it can beat Best of B and thus select Best of A.
Repeat from step 2.

This meant that small steps were continually taken to improve the AI and, in theory, I should have ended up with much stronger players than when evolving against Mr Random, as per my previous post on rule inference. That was not the case however - when playing against the best individuals it was evident that they were pretty poor, not much stronger than the ones evolved against Mr Random.

I tried much larger populations and leaving it running for several days on more powerful PCs but to no avail, so I decided to analyse the evolution process and the individual generations in more detail...

Saturday, February 26, 2011

Rule inference

At that point I was ready to start on the board game AI, and I decided to experiment with a Tic-tac-toe player first; once I had evolved a strong player of this game, I could then move onto more complex games. And since I was going to include only minimal game-specific code, the system would be easy to apply to other games. Individuals playing one game could even be used in evolution of players for other games.

But how exactly do you do that? You could program a very simple individual by hand that just makes very simple, but valid, moves and then use that as the starting point for the evolution run. I decided that was not general enough and had the potential to send the search in the wrong direction, especially for more complex games like Go.

Instead I decided to just expose a function to get the status (blank, black or white) of an individual board square at position (x, y) and another to make a move at a particular square at position (x', y'). It should then be possible to evolve programs that could examine the state across the board, apply some logic, and then make a move accordingly. In the first generation of a run, all programs were initialised completely randomly.

If a program tried to read the status outside the boundaries of the board it would just return blank, and if it tried to make a move outside the boundaries or where another piece was already placed, it would simply have no effect. The analogy is a natural environment where e.g. fish that mutate and evolve to swim into rocks simply just get stopped in their way (and ultimately die of hunger and become an extinct species). So all I was providing was a set of "laws of nature" for a game environment, in which individuals could be evolved with any conceivable strategy.

As a fitness measure, individuals would at first play against a dumb training agent, Mr Random, which I wrote that would simply just make random valid moves. They would each play 3 times as white and 3 times as black and the proportion of games won was a direct measure of an individual's fitness.

Early generations would mostly make invalid moves, so consequently they didn't do very well, even against random play. After numerous generations, however, I observed the fittest individuals at play and, as I had hoped, they had learned to play by the rules! This was a simple side effect of only rewarding individuals that happened, by chance, to play valid moves.

When I tried playing against the fittest individuals it did of course turn out to be very easy to beat them, as the hardest opponent they had ever met was Mr Random. Thus, the next step was to implement co-evolution...

Thursday, February 24, 2011

Introduction to genetic programming

As a taster of genetic programming, there are some excellent slides from a course at University of Edinburgh, which covers a lot of theory behind genetic algorithms and, more specifically, genetic programming:
"Genetic Algorithms and Genetic Programming 2009-10".

My favourite book on the subject is "Genetic Programming: An Introduction" by Banzhaf et al. It is a well-written and concise introduction that will give you a sound grounding in all the theory necessary to evolve your own GP populations.

Once you are ready to either tweak existing GP systems or write your own from scratch, "A Field Guide to Genetic Programming" by Poli et al is a priceless resource. It is also available for free as pdf.

After reading through these slides and books (as well as some that turned out to be less helpful), I decided on a tree-based GP approach, where the programs evolved are Lisp-like structures. Below is a simple example from a Tic-tac-toe player. The program's output, in this case, are the MAKEMOVE nodes and each node computes its value based on any subtrees it may have, including the current board state read from SQ_X_Y nodes.

But first some more info on how I got to the stage where I could generate programs like the one above...

Tuesday, February 22, 2011

The beginning...

It all started with the notion that to play board games like Tic-tac-toe, Connect Four, Checkers, Chess, Shogi and Go, the strategy to play each game may not be obvious, yet there are patterns that repeat across all these games.

In the case of Tic-tac-toe a simple min-max search may do well. Similarly, in Chess, if we apply min-max along with a host of extra heuristics to guide the search for the best move, we can end up with very strong play. However, for games like Go we know that the search space is too vast for a pure min-max approach to be practical.

Yet, when humans play these games there are often more abstract lessons learned in one game that can be applied in others. With application-specific approaches to AI these lessons are not easily transferable from one game to another.

I've always been a big fan of genetic algorithms and amazed at the novel solutions found in nature employed by Darwinian evolution. One particular variant in particular, genetic programming, where actual computer programs are evolved, is of special interest to me.

As opposed to application-specific search algorithms like min-max or machine learning algorithms like neural networks for pattern recognition, genetic programming is particularly applicable when relatively little is known about the problem domain, or when very little is known about what kind of shape a good solution would take. For example, it may be that a neural network may give us a somewhat good solution but without further degrees of freedom we will never know if for example a stochastic approach combined with a neural network would give us an even better solution.

Genetic programming has the potential to evolve any existing AI approach or combination of such and even to evolve brand new algorithms. In a sense genetic programming is the super set of all learning algorithms.

So I decided to start a hobby project where I used a genetic programming approach to evolve computer programs that can play board games without domain-specific logic or application-specific AI.

In a future post I will go into what exactly is being evolved and how the system teaches itself the rules. But for anyone just starting out with genetic programming, I will first cover some of the resources that helped me out enormously when learning about all this.