Saturday, May 7, 2011

Loops in Genetic Programming

Dealing with loops and other iterative constructs in GP is far from trivial. You have to deal with closure and you have to avoid infinite or lengthy fitness evaluations. It only takes a few nested loops of a million iterations each in your individuals to grind everything to (nearly) a halt.

This is perhaps why loops have not seen much use in the GP field. Experiments with Explicit For-loops in Genetic Programming by Vic Ciesielski and Xiang Li provides good coverage of related work.

I ended up using a binary variant of protected for-loops where the first sub tree provides the number of iterations, and the second sub tree, the body of the loop, is then iterated. I did not restrict the number of loop nodes per program, but instead enforced an overall limit of 1000 iterations per program.

To give the programs something to work with, I also introduced a few variable nodes to read and write the variables X and Y. X was always set to the current iteration of a loop.

This allowed me to evolve a perfect solution to the "100 First Even Numbers" sequence in just 8 generations amongst a population of 10,000! :) The run parameters as well as the tree of the perfect solution can be seen below:


Function nodesAdd, Subtract, Multiply, Divide, SetX, SetY, OutputNumber, Loop
Terminal nodes0, 1, 2, Ephemeral Constant (0-10), GetX, GetY
Crossover probability90%
Reproduction probability9%
Mutation probability1%
Mutation depth2
Max tree size (number of nodes)30
Max tree depth10


2 comments:

  1. Hi Stig,

    I don't understand this tree ... Could you explain it in a bit more detail? The stuff off to the left side has me particularly confused; I'm not seeing what constants like 9.02325 and 9.19243 have to with determining whether numbers are odd or even.

    ReplyDelete
  2. Hi Paul,

    Consider the second (bottom one) of the two LOOP nodes. The left sub-tree tells the loop how many times to iterate the right sub-tree. Therefore, all that the left sub-tree really provides here is a high number. So 100 or greater would do the trick in this case. As it happens, the number is much greater but in the fitness function anything after the first 100 output numbers are ignored.

    The right sub-tree is where the main logic happens. Based on X (which is the current iteration cycle, starting at 0) it outputs X+X+2, which will result in even numbers starting from 2 (another way to write that would be 2X+2).

    So there is a lot of "baggage" in this tree that doesn't do anything useful, but that is quite common in genetic programming. Useless nodes are referred to as introns and may actually have benefit in future generations even if dormant in the current one...

    I hope that helps.

    ReplyDelete