Writing my first evoling algorithm

phillip1882 · March 16, 2009

so, i've been a programmer now going on 3 years, and i know how to program in c, java, and recently i've learned python. python so far is my personal favourite. i've been reading up on evolving algorithms, but most websites only give theory without much code. so i thought what the heck, let's try my hand at it. my problem to tackle, writing an evolving algorithm that can play the game baduk (also known as go) relatively well.

i've written most of the evolution part, and am now at the point of writing the fitness function, which is probably the most difficult part.

in case your interested, here's my code (its in python)

#interpret the rule correctly
def analizerule(boardstate,ruleset):
  stringp = ""
  stringr = bstr(ruleset) #convert the rule to a binary string.
  for i in range(0,42):
     stringp = stringp +decrypt(i%2,stringr[3*i:3*i+3])
     #convert the string into ops, funcs, and values.
  for i in range(0,len(stringp)):
     if stringp[i] = "S": #execute skip function
        stringp = remove(stringp,i) 
  j = 0
  for i in range(0,len(stringp)):
     if string[i] = "L": #execute loop function
        stringp = add(j,stringp,i):
        j = i
  for i in range(0,len(string)):
     if stringp = "G": #execute get function
        stringp = replace(stringp,i)
  for i in range(0,len(stringp),2): #perform the operations.
     if stringp[i] == "&":
        boardstate = boardstate & int(stringp[i+1])
     if stringp[i] == "|":
        boardstate = boardstate | int(stringp[i+1])
     if stringp[i] == "^":
        boardstate = boardstate ^ int(stringp[i+1])
     if stringp[i] == "<":
        boardstate = boardstate << int(stringp[i+1])
     if stringp[i] == ">":
        boardstate = boardstate >> int(stringp[i+1])
  return boardstate

def bstr(value):
  string = ""
  while value > 0:
     temp = value % 2
     value = (value -temp)/2
     string = str(temp) +string
  while len(string) < 126:
     string = "0" +string
  return string

def decrypt(check, code):
  if check == 0: 
     if code == "000":
        return "&"
     if code == "001":
        return "|"
     if code == "010":
        return "^"
     if code == "011":
        return "<"
     if code == "100":
        return ">"
     if code == "101":
        return "S"
     if code == "110":
        return "L"
     if code == "111":
        return "G"
  if check == 1:
     if code == "000":
        return "1"
     if code == "001":
        return "2"
     if code == "010":
        return "3"
     if code == "011":
        return "4"
     if code == "100":
        return "5"
     if code == "101":
        return "6"
     if code == "110":
        return "7"
     if code == "111":
        return "8"

def remove(string, position):
  numskip = int(string[position+1]) -1
  if position +2*numskip +1 > len(string):
     string = string[:position]
  else:
     string = string[:position] +string[position+2*numskip+1:]
  return string

def add(start, string, position):
  #there are two types of loops, local and master.
  if start -position == -2:
     #master loop copies all operations from begining.
     for i in range(0,int(string[position+1])):
        string = string[:position] +string[:position] +string[position+1:]
  else:
     #local copies all operations only to previous local.
     for i in range(0,int(string[position+1])-1):
        string = (string[start+2:position] +string[start+2:position]
                 +string[position+1:])
  return string

def replace(string, position):
  #gets the operation up to 8 indexes prior
  index = position -2*int(string[position+1])
  if index < 0:
     index = 0
  string = (string[:position] +string[index] 
           +string[index+1] +string[position+1:])
  return string

#initialize possible rules, and mutation rules
def iniruleset(rulenum):
  #initial rules are random, but get better with mutation rules
  rulearray = rulenum*[0]
  for i in range(0,rulenum):
     rulearray[i] = random.randint(0,2**127-1)
  return rulearray

def mutate(rulearray,fitness):
  total = 0
  temparray = len(rulearray)*[0]
  for i in range(0,len(fitness)):
     total = total +fitness[i]
  for i in range(0,len(fitness)):
     fitness[i] = fitness[i]/total
     #each fitness value become proportional to the total of all fitness values
  for i in range(0,30,2):
     #select two rules at random, the higher the fitness value, 
     #the greater the chances of being selected.
     roulet = random.uniform(0,total)
     j = 0
     while roulet > 0:
        if j == len(fitness):
           break
        roulet = roulet -fitness[j]
        j = j+1
     j = j-1
     select1 = j
     roulet = random.uniform(0,total)
     j = 0
     while roulet > 0:
        if j == len(fitness):
           break
        roulet = roulet -fitness[j]
        j = j+1
     j = j-1
     select2 = j
     exval = random.randint(0,126)
     #exchange a portion of the two rules' binary bits.
     if exval > 0 and exval < 126:
       temp1 = bistr(rulearray[select1])
       temp2 = bistr(rulearray[select2])
       temparray[i] = dstr(temp1[:exval]+temp2[exval:])
       temparray[i+1] = dstr(temp2[:exval] +temp1[exval:])
  for i in range(0,len(rulearray)):
     rulearray[i] = temparray[i]
  for i in range(0,len(rulearray)):
     #gives a 1 in 1000 chance of a bit being fliped.
     temp1 = bstr(rulearray[i])
     for j in range(0,126):
        check = random.randint(0,1000)
        if check == 1:
          temp1 = temp1[:j] +str(int(temp1[j]) ^ 1) +temp1[j+1:]
     rulearray[i] = dstr(temp1)

def bistr(value):
  string = ""
  while value >0:
     temp = value%2
     string = str(temp) +string
     value = (value-temp)/2
  while len(string) <126
     string = "0" +string
  return string

def dstr(string):
  value = 0
  while len(string) >0:
     value = value*2
     value = value +int(string[0])
     string = string[1:]
  return value

#yeah this obviously isn't the most efficient code, 
#there are several improvements that could be made.

Pyrotex · March 16, 2009

Hi there! :evil:

well, that looks like really nice code. Python is similar to Perl, which is one of my favorites.

However, the hard part is deciphering just what it is that you're trying to do in each module of code.

Some high-level comments would be appreciated. :phones:

Pyro

CraigD · March 16, 2009

Very interesting, and a neat programming project :thumbs_up A couple of decades ago, when I first became acquainted with the idea of “genetic programs”, I wrote a program to evolve the best algorithm for playing ordinary 3x3 tic-tack-toe. To my surprise, even though the rule for playing tic-tack-toe perfectly is simple, my simple program didn’t find it, but rather resulted in a population that continued shifting between similar imperfect rules that played well against one another until a random mutation beat them all, then quickly shifted to that random mutation, for millions of generations, never settling down. Even if I introduced an individual with The lessons I took from this were that not only the fitness evaluating algorithm (I just used # wins - # losses in a round in which each individual played a pair of games, moving first in one of them, against each other one), but the unchanging structure of the rules – the “genome” – is important, too. My genome was a simple array of [math]9^3=729[/math]values from 1-9 specifying the move to be played for any board, including many impossible ones.

Can you describe in English how the rules – the “ruleset” arguments to “analizerule()” – plays a turn of go? I can’t at a glance tell how they work – they look like they allow “boardstate” to be changed in any possible way, not just place a single stone. :confused:

phillip1882 · March 16, 2009

yeah that's what i initially wanted was any rule possible.

here's basically how the i hope the code will work. board state will be a big binary number, representing the state of the whole board in some way. rule set will also be a binary number representing a series of and, or, xor, shift left, or shift right in some order, which is applied to the board state. my fitness function will look at the input state of the board, compare it to the output state of the board and the more illegal moves it plays, the worse a score it gets. if / when i get a rule that doesn't play any illegal moves, ill generate a new population of rules for strategy, these rules will use my legal move rule in some way until they get to the end of the game, and the rule that uses the legal move rule the best will get a higher fitness value.

one way i suppose to speed up the process would be to pre-program the legal moves, after all i'm going to need to need the legal rules anyway to evaluate my first fitness function. but i want to teach the computer not only how to fish, but how to get a fishing pole, line, hook, and bait, as it were.

phillip1882 · March 16, 2009

some ways i hope to improve my code...

currently the operations are applied directly to board state from left to right. as a result, some situations will clearly not be possible. i'm thinking of changing my get function into a store funtion which evaluates a portion of the operations and save them to a temp variable.

another potential problem is my rules are always represented by 126 bits. i would like to make my rules able to more permanently increase or decrease their length.

and of course there's certainly the possibility that no matter how long my rule is, it never plays a game of go with entirely legal moves. one possible solution would be to pre-program the legal rules, and have board state represent only legal moves that can be played.

Pyrotex · March 16, 2009

Well, in Go, playing on ANY empty position is technically "legal". So, I'm not sure how you would teach it to play legally except to teach it to play on unoccupied positions.

You might want to use something other than binary for the board state. Each position on the board can actually have 3 states: black, white, empty. Personally, I would assign the values: -1, 1, 0.

That way, at any point in the game, you just have to add the states of all the board, to see which side is ahead. If it's negative, black is winning, if it's positive, white is winning.

I would also start with a variably defined board: with G positions on a side, making G-squared total number of positions. Begin G with a "small" odd number, say 7 or 11. This will make the games shorter, you will find your bugs faster, and it may also increase the rate of training. Then when the game looks stable, increase G in steps until you reach 19.

Something I would add is a "database" to keep track of certain patterns. One obvious pattern would be a set of "connected" positions. For every such set on the board, you would associate its owner (black or white) and its size (number of connected positions) and the last position added to that set.

So your naive program could see "the board" and could automatically keep track of connected sets, and could generate legal moves.

At first, the moves would be random (into empty positions), but you would want to have that selection algorithm be the "evolutionary learner" -- at some point, it would select not randomly but according to whatever it "sees" on the board and in the connected sets. Which is just another way of saying that at the end of each game, "evolution" uses the database of connected sets to modify the algorithm that selects the next move.

Another idea: Why not save a "snapshot" of the board (and the database) for each and every move? So the game can "remember" what it did. It would have access to its history for that one game.

The only Goal the game would have would be: force the total state of the board to be as positive as possible (assuming the Game plays white).

alexander · March 16, 2009

it would also be nice to see some self-modifying code :hihi: once the program creates an algorithm it produces children that follow that algorithm but can also analyze results to produce children that will use their new algo to continue playing the game...

phillip1882 · March 16, 2009

well intially it will be able to place multiple stones, replace a black stone with a white stone, possibly even play off the board. i want to teach it to place one stone at a time on empty squares, and that surrounding a group of stones captures them, etc.

you algorithm is a good idea but inherently flawed, as just because one side has more stones on the board doesn't necessaryly equal winning. admittedly though my algorithm may not be much better.

the database idea sounds good though. i haven't dealt with a database yet and really should learn how to do so.

basically i'm considering two possible representations for the board state.

each square takes 3 possible states, as you said, but why represent it as 1,0,-1. why not 01, 10, 00, and thus the whole board state be 722 bits?

another possibility would be this, 2 bits representing color or empty, then 4 bits representing number of that color in a row.

as for self modifying code, that's precicely what my code is. initial parents are random algorithms, these algorithms are evaluated for how well they play, and then children are produced based on parents performance.

alexander · March 17, 2009

but do children produce children, do they look at data gathered by the parents, and most of all, is there a chance of a mutation?

phillip1882 · March 17, 2009

here's basically what my mutation algorithm does. based on parents performance, two parents are selected, with the parents that perform the best getting a higher chance of being chosen than other parents. then a portion of the two parent algorithms are copied, with the remaining portion swaped between the two.

ie. lets say parent 1 is 101010101010 = skip 3 skip 3, and parent 2 is 110011001100 = loop 4 or 5,

and exval is 4, then the child code1 will become 101011001100 = skip 4 or 5, and child code2 will be 110010101010 = loop 3 skip 3. then each individual bit is given a 1 in a 1000 chance of being fliped. then the child algorithm become the new parent, and produce children based on their performance, and so on.

do child algorithms look at parents? no. this is why i like the database idea. if i can give children not only parent algorithms, but also parent data, and if parents do better, then keep parents, i think the improvement rate would be much faster.

alexander · March 18, 2009

its worth a try for sure :D

Sign In

Writing my first evoling algorithm

Recommended Posts

phillip1882

Link to comment

Share on other sites

Pyrotex

Link to comment

Share on other sites

CraigD

Link to comment

Share on other sites

phillip1882

Link to comment

Share on other sites

phillip1882

Link to comment

Share on other sites

Pyrotex

Link to comment

Share on other sites

alexander

Link to comment

Share on other sites

phillip1882

Link to comment

Share on other sites

alexander

Link to comment

Share on other sites

phillip1882

Link to comment

Share on other sites

alexander

Link to comment

Share on other sites

Join the conversation

Browse

Activity