Monday, April 11, 2011

Symbolic regression results

An important role of this blog is to act as a journal of my experiments. Onwards from now I will seek to document my experiments with enough detail to allow others to compare results, and so that I can compare results myself at a later date. So here we go...

I managed to dig out some notes from my early Charlie tests - at first I experimented using the following setup:

Function nodesAdd, Subtract, Multiply and Divide
Terminal nodes0, 1, 2 and x
Crossover probability90%
Reproduction probability9%
Mutation probability1%
Mutation depth2
Max tree size (number of nodes)unlimited
Max tree depthunlimited

Below is a table of experiments I made with the above configuration. The function that I was trying to evolve is on the left. Population size and maximum generations are also listed along with some comments on the results:

FunctionPopulation SizeGenerationsResult
7.2x3+x2-3x1000050No perfect solution and struggling.
7x3+x2-3x1000050Perfect solution in 16 generations
7x+x2-3x1000050Perfect solution in 1 generation
7.2x+x2-3x1000050No perfect solution
7.2x1000050No perfect solution. Best fitness score: ~442
7.21000050No perfect solution

I was surprised that the system struggled to generate a sub tree equal or close to 7.2. It seemed to evolve integers just fine but struggled with fractions. I therefore tried introducing two further terminal nodes: the constants 4 and 8 in the hope of guiding the evolution to the right ballpark figures. I repeated the last experiment above, but still got no perfect or even approximate solution:

FunctionPopulation SizeGenerationsResult
7.21000050No perfect solution

I got desperate and decided to introduce ephemeral constants, which was on my to-do list anyway. I also removed constants 4 and 8 to reduce the search space a bit again. This resulted in the following figures:

FunctionPopulation SizeGenerationsResult
7.21000050No perfect solution, but very close (~0.0001)
7.2x3+x2-3x1000050No perfect solution but no longer struggling.
7x3+x2-3x1000050No perfect solution.
7x+x2-3x1000050Perfect solution in 1 generation
7.2x+x2-3x1000050No perfect solution
7.2x1000050No perfect solution. Best fitness score: ~13
7.2x10000043No perfect solution but very close
7.2x10000100No perfect solution. Best fitness score: ~12
sqrt(x)1000050No perfect solution. Best score: ~34
7x+x2-3x+x31000050Perfect solution in 6 generations.

The results were better but by no means amazing. I then came across a subtle bug that meant some arguments were not forwarded to sub trees, and after fixing that I got the following results:

FunctionPopulation SizeGenerationsResult
7.21000050No perfect solution, but very close (~0.00001)
7.2x3+x2-3x1000050No perfect solution but no longer struggling.
7x3+x2-3x1000050Perfect solution in 12 generations.
7x+x2-3x1000050Perfect solution in 1 generation
7.2x+x2-3x1000050No perfect solution, but very close (~0.0001)
7.2x1000050No perfect solution, but very close: ~10e-6.
7.2x10000031No perfect solution but very close: ~10e-6.
sqrt(x)1000050No perfect solution. Best score: ~0.036
7x+x2-3x+x31000050Perfect solution in 3 generations.

That was more like it! :) It was time to try the Tic-tac-toe experiments with Charlie...