Saturday, February 26, 2011

Rule inference

At that point I was ready to start on the board game AI, and I decided to experiment with a Tic-tac-toe player first; once I had evolved a strong player of this game, I could then move onto more complex games. And since I was going to include only minimal game-specific code, the system would be easy to apply to other games. Individuals playing one game could even be used in evolution of players for other games.

But how exactly do you do that? You could program a very simple individual by hand that just makes very simple, but valid, moves and then use that as the starting point for the evolution run. I decided that was not general enough and had the potential to send the search in the wrong direction, especially for more complex games like Go.

Instead I decided to just expose a function to get the status (blank, black or white) of an individual board square at position (x, y) and another to make a move at a particular square at position (x', y'). It should then be possible to evolve programs that could examine the state across the board, apply some logic, and then make a move accordingly. In the first generation of a run, all programs were initialised completely randomly.

If a program tried to read the status outside the boundaries of the board it would just return blank, and if it tried to make a move outside the boundaries or where another piece was already placed, it would simply have no effect. The analogy is a natural environment where e.g. fish that mutate and evolve to swim into rocks simply just get stopped in their way (and ultimately die of hunger and become an extinct species). So all I was providing was a set of "laws of nature" for a game environment, in which individuals could be evolved with any conceivable strategy.

As a fitness measure, individuals would at first play against a dumb training agent, Mr Random, which I wrote that would simply just make random valid moves. They would each play 3 times as white and 3 times as black and the proportion of games won was a direct measure of an individual's fitness.

Early generations would mostly make invalid moves, so consequently they didn't do very well, even against random play. After numerous generations, however, I observed the fittest individuals at play and, as I had hoped, they had learned to play by the rules! This was a simple side effect of only rewarding individuals that happened, by chance, to play valid moves.

When I tried playing against the fittest individuals it did of course turn out to be very easy to beat them, as the hardest opponent they had ever met was Mr Random. Thus, the next step was to implement co-evolution...

No comments:

Post a Comment