
Monday, April 11, 2011

Symbolic regression results

An important role of this blog is to act as a journal of my experiments. From now on I will seek to document my experiments in enough detail to allow others to compare results, and so that I can compare my own results at a later date. So here we go...

I managed to dig out some notes from my early Charlie tests - at first I experimented using the following setup:

Function nodes: Add, Subtract, Multiply and Divide
Terminal nodes: 0, 1, 2 and x
Crossover probability: 90%
Reproduction probability: 9%
Mutation probability: 1%
Mutation depth: 2
Max tree size (number of nodes): unlimited
Max tree depth: unlimited
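
For reference, here is a minimal sketch of how breeding probabilities like the ones above are typically applied when producing each new individual (hypothetical names, not Charlie's actual code):

    import random

    def breed(population, select, crossover, mutate):
        """Produce one offspring: 90% crossover, 9% reproduction, 1% mutation.
        `select` picks a parent (e.g. by tournament); `crossover` and `mutate`
        are the tree operators."""
        r = random.random()
        if r < 0.90:                   # crossover probability: 90%
            return crossover(select(population), select(population))
        elif r < 0.99:                 # reproduction (plain copy): 9%
            return select(population)
        else:                          # mutation probability: 1%
            return mutate(select(population))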

Below is a table of the experiments I ran with the above configuration. The function I was trying to evolve is on the left; population size and maximum generations are also listed, along with some comments on the results:

Function             Population Size   Generations   Result
7.2x^3 + x^2 - 3x    10000             50            No perfect solution and struggling.
7x^3 + x^2 - 3x      10000             50            Perfect solution in 16 generations.
7x + x^2 - 3x        10000             50            Perfect solution in 1 generation.
7.2x + x^2 - 3x      10000             50            No perfect solution.
7.2x                 10000             50            No perfect solution. Best fitness score: ~442.
7.2                  10000             50            No perfect solution.

I was surprised that the system struggled to generate a subtree equal to, or even close to, 7.2. It seemed to evolve integers just fine but struggled with fractions. I therefore tried introducing two further terminal nodes, the constants 4 and 8, in the hope of guiding the evolution toward the right ballpark figures. I repeated the last experiment above, but still got no perfect or even approximate solution:

Function             Population Size   Generations   Result
7.2                  10000             50            No perfect solution.

I got desperate and decided to introduce ephemeral constants, which had been on my to-do list anyway. I also removed the constants 4 and 8 to reduce the search space a bit again.
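
An ephemeral random constant is a terminal node that samples a random value once, when the node is first created, and then keeps that value fixed for the rest of its life. A minimal sketch of the idea (hypothetical names and value range, not Charlie's actual code):

    import random

    class EphemeralConstant:
        """Terminal node whose value is sampled once at creation and then
        frozen. The range [-10, 10] is an assumption for illustration."""
        def __init__(self):
            self.value = random.uniform(-10.0, 10.0)  # drawn once, never resampled

        def evaluate(self, x):
            return self.value  # ignores the input; always yields its constant

This resulted in the following figures: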

Function             Population Size   Generations   Result
7.2                  10000             50            No perfect solution, but very close (~0.0001).
7.2x^3 + x^2 - 3x    10000             50            No perfect solution, but no longer struggling.
7x^3 + x^2 - 3x      10000             50            No perfect solution.
7x + x^2 - 3x        10000             50            Perfect solution in 1 generation.
7.2x + x^2 - 3x      10000             50            No perfect solution.
7.2x                 10000             50            No perfect solution. Best fitness score: ~13.
7.2x                 100000            43            No perfect solution, but very close.
7.2x                 10000             100           No perfect solution. Best fitness score: ~12.
sqrt(x)              10000             50            No perfect solution. Best score: ~34.
7x + x^2 - 3x + x^3  10000             50            Perfect solution in 6 generations.

The results were better, but by no means amazing. I then came across a subtle bug that meant some arguments were not forwarded to subtrees; after fixing that, I got the following results:

Function             Population Size   Generations   Result
7.2                  10000             50            No perfect solution, but very close (~0.00001).
7.2x^3 + x^2 - 3x    10000             50            No perfect solution, but no longer struggling.
7x^3 + x^2 - 3x      10000             50            Perfect solution in 12 generations.
7x + x^2 - 3x        10000             50            Perfect solution in 1 generation.
7.2x + x^2 - 3x      10000             50            No perfect solution, but very close (~0.0001).
7.2x                 10000             50            No perfect solution, but very close (~10e-6).
7.2x                 100000            31            No perfect solution, but very close (~10e-6).
sqrt(x)              10000             50            No perfect solution. Best score: ~0.036.
7x + x^2 - 3x + x^3  10000             50            Perfect solution in 3 generations.

That was more like it! :) It was time to try the Tic-tac-toe experiments with Charlie...

Friday, February 25, 2011

Open BEAGLE

When I first started on my project a few years ago, I decided to save time by making use of an existing tree-based GP system. I researched the various alternatives, and by far the best framework I came across was Christian Gagné's Open BEAGLE. It is well designed, flexible, easily extensible and very efficient.

Open BEAGLE also has a Yahoo mailing list offering help and support along the way.

As an initial test of the system, and of my understanding of it, I played around with symbolic and numerical regression, where the input and output of a mathematical function are known but the function itself must be evolved.

All you need to do is loop over a series of input data (e.g. the integers 1 to 100), run some function (e.g. x^2 + x + 1) on the input, store the output for fitness evaluation, and then provide only the input data to your population, along with some basic operators (e.g. addition, subtraction, multiplication and division). Fitness is determined as the negated magnitude of the error between an individual's output and the known output: the closer the error is to zero, the fitter the individual. Thus, if the error is exactly zero, you have a perfect fitness case.
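
As a minimal sketch of that fitness scheme, treating an individual as a callable and using hypothetical names (plain Python, not Open BEAGLE's actual API):

    def target(x):
        return x * x + x + 1            # the known function, here x^2 + x + 1

    TRAINING_INPUTS = range(1, 101)     # the series of input data, e.g. 1 to 100

    def fitness(individual):
        """Negated total error against the known outputs; 0 is a perfect match."""
        return -sum(abs(individual(x) - target(x)) for x in TRAINING_INPUTS)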

If everything works, after a few generations of evolution you will end up with a program that matches the function exactly, e.g. for x^2 + x + 1.
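
Purely as an illustration (not actual output from a run), a perfect individual for x^2 + x + 1 built from the four basic operators could be the tree Add(Multiply(x, x), Add(x, 1)):

    def evolved(x):
        # Add(Multiply(x, x), Add(x, 1)) == x^2 + x + 1
        return (x * x) + (x + 1)

    # With the fitness sketch above, fitness(evolved) == 0: a perfect fitness case.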


Next, I tested whether this system could evolve more elaborate functions like 4x^3 + 2x - 27. I played around with how many generations were required and how large a population was necessary to get exact solutions. I also tried using a square root function without exposing a square root operator to the GP system, and was amazed when the GP system found very good approximate solutions within a few dozen generations, using a population of a few thousand individuals!