
Monday, April 11, 2011

Symbolic regression results

An important role of this blog is to act as a journal of my experiments. From now on I will document my experiments in enough detail to allow others to compare results, and so that I can compare results myself at a later date. So here we go...

I managed to dig out some notes from my early Charlie tests. At first I experimented with the following setup:

Function nodes:                    Add, Subtract, Multiply and Divide
Terminal nodes:                    0, 1, 2 and x
Crossover probability:             90%
Reproduction probability:          9%
Mutation probability:              1%
Mutation depth:                    2
Max tree size (number of nodes):   unlimited
Max tree depth:                    unlimited

Below is a table of the experiments I ran with the above configuration. The function I was trying to evolve is on the left; population size and maximum generations are also listed, along with some comments on the results:

Function              Population Size   Generations   Result
7.2x^3 + x^2 - 3x     10000             50            No perfect solution and struggling.
7x^3 + x^2 - 3x       10000             50            Perfect solution in 16 generations
7x + x^2 - 3x         10000             50            Perfect solution in 1 generation
7.2x + x^2 - 3x       10000             50            No perfect solution
7.2x                  10000             50            No perfect solution. Best fitness score: ~442
7.2                   10000             50            No perfect solution
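For context on what those fitness scores mean: in symbolic regression, a candidate expression is scored against the target function over a set of sample points. The exact sampling scheme and error measure Charlie uses are not stated here, so the following is only a sketch under assumed choices (sum of absolute errors over evenly spaced points):

```python
# Sketch of a symbolic-regression fitness measure: sum of absolute errors
# between candidate and target over sample points. The sampling range and
# error metric are assumptions for illustration, not Charlie's exact setup.
def fitness(candidate, target, points=None):
    if points is None:
        points = [i / 2.0 for i in range(-10, 11)]  # -5.0 .. 5.0 in 0.5 steps
    return sum(abs(candidate(x) - target(x)) for x in points)

target = lambda x: 7 * x + x**2 - 3 * x   # one of the targets in the table
perfect = lambda x: x**2 + 4 * x          # algebraically identical expression
assert fitness(perfect, target) < 1e-9    # a "perfect solution" scores ~0
```

Under this convention a perfect solution has fitness 0, and scores like ~442 mean the best individual was still far from the target across the sample points.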

I was surprised that the system struggled to generate a subtree equal or close to 7.2. It evolved integers just fine but struggled with fractions. I therefore introduced two further terminal nodes, the constants 4 and 8, in the hope of guiding the evolution toward the right ballpark. I repeated the last experiment above, but still got no perfect or even approximate solution:

Function              Population Size   Generations   Result
7.2                   10000             50            No perfect solution

I got desperate and decided to introduce ephemeral random constants, which were on my to-do list anyway. I also removed the constants 4 and 8 to shrink the search space again. This resulted in the following figures:

Function              Population Size   Generations   Result
7.2                   10000             50            No perfect solution, but very close (~0.0001)
7.2x^3 + x^2 - 3x     10000             50            No perfect solution but no longer struggling.
7x^3 + x^2 - 3x       10000             50            No perfect solution.
7x + x^2 - 3x         10000             50            Perfect solution in 1 generation
7.2x + x^2 - 3x       10000             50            No perfect solution
7.2x                  10000             50            No perfect solution. Best fitness score: ~13
7.2x                  100000            43            No perfect solution but very close
7.2x                  10000             100           No perfect solution. Best fitness score: ~12
sqrt(x)               10000             50            No perfect solution. Best score: ~34
7x + x^2 - 3x + x^3   10000             50            Perfect solution in 6 generations.
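The ephemeral random constants introduced here are terminals that draw a random value once, at node-creation time, and then keep it for the node's lifetime. A minimal sketch of the idea (the value range is my assumption, not necessarily Charlie's) looks like this:

```python
import random

# Ephemeral random constant (ERC): the value is drawn once when the terminal
# node is created, then frozen. The range [-10, 10] is an assumption.
class ERC:
    def __init__(self, rng=random):
        self.value = rng.uniform(-10.0, 10.0)  # fixed at creation time

    def evaluate(self, x):
        return self.value  # independent of the input variable

random.seed(42)
a = ERC()
assert a.evaluate(0.0) == a.evaluate(5.0)  # the value never changes
```

Because each new ERC node samples its own value, the population as a whole explores a continuum of constants, which is exactly what made fractional targets like 7.2 reachable.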

The results were better, but by no means amazing. I then came across a subtle bug that meant some arguments were not forwarded to sub-trees; after fixing it, I got the following results:

Function              Population Size   Generations   Result
7.2                   10000             50            No perfect solution, but very close (~0.00001)
7.2x^3 + x^2 - 3x     10000             50            No perfect solution but no longer struggling.
7x^3 + x^2 - 3x       10000             50            Perfect solution in 12 generations.
7x + x^2 - 3x         10000             50            Perfect solution in 1 generation
7.2x + x^2 - 3x       10000             50            No perfect solution, but very close (~0.0001)
7.2x                  10000             50            No perfect solution, but very close: ~10e-6.
7.2x                  100000            31            No perfect solution but very close: ~10e-6.
sqrt(x)               10000             50            No perfect solution. Best score: ~0.036
7x + x^2 - 3x + x^3   10000             50            Perfect solution in 3 generations.
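The argument-forwarding bug described above is the kind that arises when tree evaluation fails to pass the input variable down to child nodes, so deeper terminals never see the real x. A correct recursive evaluator (a sketch in my own terms, not Charlie's actual code) forwards x at every level:

```python
# Recursive tree evaluation for a GP expression tree. The key point: the
# input x must be forwarded into every subtree, or terminals deep in the
# tree evaluate against a stale or default value.
def evaluate(node, x):
    if node == "x":
        return x
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    l = evaluate(left, x)   # forwarding x here is the essence of the fix
    r = evaluate(right, x)
    if op == "add": return l + r
    if op == "sub": return l - r
    if op == "mul": return l * r
    return l / r if r != 0 else 1.0  # protected division, a common GP choice

# (7 * x) + (x * x), i.e. 7x + x^2
tree = ("add", ("mul", 7, "x"), ("mul", "x", "x"))
assert evaluate(tree, 2.0) == 18.0
```

A bug like this silently flattens large parts of the search space, which would explain why fixing it improved nearly every row in the table.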

That was more like it! :) It was time to try the Tic-tac-toe experiments with Charlie...

Tuesday, March 8, 2011

Introducing Charlie

A few years ago I decided that to fully understand the dynamics of genetic programming, I would have to implement my own framework. Beagle had proved incredibly useful, but I required full control of the entire software design, down to the minute details, so I wrote my own.

Named after a dog we had when I was a kid, and after a certain pioneer in the field of evolution, I called my new framework Charlie (if you were wondering why this blog was called CharlieAI, you now have your answer!).

I will spare you the implementation details and instead list the feature set I ended up with:

  • Tree-based genomes
  • Crossover, mutation and re-selection operators
  • Node and sub-tree mutation
  • Tournament-based selection
  • Ephemeral random constants
  • Population statistics
  • Lisp-style and graph in .gif file output (Graphviz)
  • Multi-threaded evaluation
  • Single and co-evolution
  • Hall-of-fame archives
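Of the features above, tournament-based selection is the easiest to sketch: draw k individuals from the population at random and keep the fittest. The tournament size and the lower-is-better fitness convention below are my assumptions, not necessarily Charlie's defaults:

```python
import random

# Tournament selection sketch: sample k candidates uniformly and return the
# one with the best (here: lowest) fitness. k=4 is an assumed tournament size.
def tournament_select(population, fitness, k=4, rng=random):
    contenders = rng.sample(population, k)
    return min(contenders, key=fitness)

random.seed(0)
pop = list(range(100))                                   # stand-in individuals
best = tournament_select(pop, fitness=lambda ind: ind)   # identity as fitness
assert best in pop
```

Tournament selection is popular in GP because the tournament size k gives a single, intuitive knob for selection pressure: larger tournaments favour the very fittest, smaller ones preserve diversity.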

This was all built over six months or so, but I only had a few hours of time to devote every week so the whole thing could easily have been done in something like two weeks of full-time work. I must admit, though, that it was nice to have ample thinking time between implementation steps, to ensure it all fitted well together and performed efficiently.

Next I will cover the results I got from running symbolic regression experiments with this framework. In the meantime, if you would like any more details on any of the above features or any aspects of the software, do let me know and I will attempt to cover it in a future post.