Re: Factor

Factor: the language, the theory, and the practice.

Genetic Hello World

Thursday, January 6, 2011

#math

A recent article called Genetic Algorithm for Hello World describes the basic concepts involved in programming genetic algorithms. The original implementation is in Javascript, and has a nice online simulator. I thought it would be fun to contribute an implementation in Factor.

Genetic algorithms are generally comprised of the following concepts:

  1. A target chromosome which expresses a possible solution to the problem
  2. A fitness function which takes a chromosome as input and returns a higher value for better solutions
  3. A population which is just a set of many chromosomes
  4. A selection method which determines how parents are selected for breeding from the population
  5. A crossover operation which determines how parents combine to produce offspring
  6. A mutation operation which determines how random deviations manifest themselves

First, we need to define the vocabularies that we will use and a namespace:

USING: fry kernel make math math.order ranges random
sequences ;

IN: hello-ga

Target

Our goal is to generate the target string “Hello World!”.

CONSTANT: TARGET "Hello World!"

Fitness

The fitness of a chromosome is the “sum of the character-wise differences between the chromosome and the target string”.

: fitness ( chromosome -- n )
    TARGET 0 [ - abs - ] 2reduce ;

Population

Our starting population will be made up by “creating 400 totally random 12 character strings”.

CONSTANT: POPULATION 400

: random-chromosome ( -- chromosome )
    TARGET length [ 256 random ] "" replicate-as ;

: random-population ( -- seq )
    POPULATION [ random-chromosome ] replicate ;

Selection

We select two parents to survive or breed into the next generation by “taking two members of the population at complete random and keep the fittest as the first parent, then do the same with another two members and keep the fittest as the other parent”.

: fittest ( parent1 parent2 -- parent1' parent2' )
    2dup [ fitness ] bi@ > [ swap ] when ;

: tournament ( seq -- parent )
    dup [ random ] bi@ fittest nip ;

: parents ( seq -- parent1 parent2 )
    dup [ tournament ] bi@ ;

Crossover

After choosing two parents, we want to “ensure their genes survive in the next generation”. Sometimes (10% chance) the parents survive into the next generation, but most frequently, we mix the parents by picking a split point and then making one child from the head of parent1 plus tail of parent2 and another from the head of parent2 plus tail of parent1.

CONSTANT: CHILDREN-PROBABILITY 0.9

: children? ( -- ? )
    0.0 1.0 uniform-random-float CHILDREN-PROBABILITY < ;

: head/tail ( seq1 seq2 n -- head1 tail2 )
    [ head ] [ tail ] bi-curry bi* ;

: tail/head ( seq1 seq2 n -- tail1 head2 )
    [ tail ] [ head ] bi-curry bi* ;

: children ( parent1 parent2 -- child1 child2 )
    TARGET length 1 - [1..b) random
    [ head/tail append ] [ tail/head prepend ] 3bi ;

Mutation

When a mutation occurs (20% chance), we “choose a random position and alter the character that’s there by up to 5 places”.

CONSTANT: MUTATION-PROBABILITY 0.2

: mutation? ( -- ? )
    0.0 1.0 uniform-random-float MUTATION-PROBABILITY < ;

: mutate ( chromosome -- chromosome' )
    dup length random over [ -5 5 [a..b] random + ] change-nth ;

Simulation

Perform a single generation by selecting the parents (or children) and allowing mutations to occur. We do this in such a way that the number of chromosomes in each generation remains constant.

: (1generation) ( seq -- child1 child2 )
    parents children? [ children ] when
    mutation? [ [ mutate ] bi@ ] when ;

: 1generation ( seq -- seq' )
    [ length 2 / ] keep
    '[ _ [ _ (1generation) , , ] times ] { } make ;

Computing all generations required to achieve the target string is fairly easy:

: finished? ( seq -- ? )
    TARGET swap member? ;

: all-generations ( seq -- seqs )
    [
        [ 1generation dup , dup finished? not ] loop drop
    ] { } make ;

Try It

Compute all generations for a random population.

IN: scratchpad random-population all-generations

See how many generations it took.

IN: scratchpad dup length .
56

See how the fitness of the best chromosome changes over the generations.

IN: scratchpad dup [ [ fitness ] [ max ] map-reduce ] map .
{
    -414
    -382
    -329
    -288
    -271
    -238
    -217
    -191
    -169
    -167
    -160
    -143
    -134
    -113
    -119
    -94
    -83
    -79
    -63
    -59
    -61
    -54
    -50
    -43
    -39
    -38
    -36
    -31
    -32
    -32
    -28
    -27
    -22
    -18
    -15
    -15
    -14
    -13
    -13
    -11
    -11
    -10
    -8
    -7
    -7
    -7
    -6
    -6
    -4
    -4
    -3
    -3
    -2
    -2
    -2
    0
}

The code is available on my GitHub.