Gaming O-F iterated prisoner's dillemma tournament - rules finalised, START YOUR AGENTS

Jarvitä

New member
Joined
Aug 5, 2008
Messages
2,030
Reaction score
2
Points
0
Location
Serface, Earth
For the uninitiated: http://en.wikipedia.org/wiki/Prisoner's_dillemma#The_iterated_prisoners.27_dilemma

I wrote a python script that pits two IPD strategies against each other. Each is aware of its, and its opponent's current score and action history, and must decide whether to defect or cooperate based on that.

The game framework looks like this:

Code:
#       Scoring:
#       Both agents cooperate:         10 points each
#       P1 cooperates, P2 defects:      1 point to P1, 15 points to P2
#       Above, reversed:                Above, reversed
#       Both agents defect:            5 points each
#
#       Each playing function receives as its arguments the current score and
#       Action history of both players.
#
#       The number of iterations is random so as to render constant defection
#       non-dominant. The final score is normalised by the number of turns.



import player1
import player2
import random    

turns = random.randint(5,1000)
p1score, p1hist, p2score, p2hist = 0, [2], 0, [2]
scorehist1 = [0]
scorehist2 = [0]
for i in range(0,turns-1):
    p1hist.append(2)
    p2hist.append(2)
    scorehist1.append(0)
    scorehist2.append(0)
def ipd():
    global p1score, p1hist, p2score, p2hist
    for i in range(0,turns):
        p1act = player1.play(i, p1hist, p2hist, p1score, p2score)
        p2act = player2.play(i, p2hist, p1hist, p2score, p1score)
        if(not(p1act) and not(p2act)):
                p1score += 10
                p2score += 10
        elif(not(p1act) and (p2act)):
                p1score += 1
                p2score += 15
        elif((p1act) and not(p2act)):
                p1score += 15
                p2score += 1
        elif((p1act) and (p2act)):
                p1score += 5
                p2score += 5
            
        p1hist[i] = p1act
        p2hist[i] = p2act
        scorehist1[i] = p1score
        scorehist2[i] = p2score
              
ipd()
print "Number of iterations: " + str(turns)
print "Player 1:"
print ""
print "Score: " + str(p1score/float(turns))
print ""
print "Player 2: "
print ""
print "Score: " + str(p2score/float(turns))

And the strategy function looks like this:

Code:
def play(i, myhist, oppshist, myscore, oppsscore):
	# Fill out your playing strategy here
	return 0

If people would like to play, announce your participation here and send your strategies via PM (if you don't know Python well enough, you can discuss your strategy in private and receive help in implementing it as a Python function). A tournament date will then be determined, and the results posted. If you wish, your strategy can be revealed afterwards.

The tournament will be structured as follows: Each valid submitted agent will play against all others, and the final ranking will be obtained by summing an agent's score over all matches it took part in.

Confirmed players
Urwumpe - agent submitted
Arrowstar - agent submitted
 
Last edited:
Ooo, sounds fun! If I write my strategy up in MATLAB, think you can port it to python easily enough?
 
I take part, finally something useful for the Python skills I acquired at work. :lol:
 
That...could be arranged, I guess.

I guess I could also use psuedocode, if that's easier for you? I've never worked with Python and I'd rather be able to participate now than in two weeks after I've found time to learn the basics. :)
 
What is the winning condition for the tournament? Just in case it makes no sense to write something more sophisticated than a defector. Is it total score gained or is it a series of duels?
 
If the player with the highest score wins, anything but defecting is a loosing move with the above suggested scoring system. There would have to be some additional conditions.
 
If the player with the highest score wins, anything but defecting is a loosing move with the above suggested scoring system. There would have to be some additional conditions.

That is why I ask. Depending on the winning conditions, I might need a different strategy for the agent to act. I could try to be a bit more cooperative, more defecting, depends all on how the score is finally determined.

If I just play to minimize how often I loose, I would always defect. But that isn't really fun.

Maybe try a betting style scoring: Both agents start with x credits, the game ends after n turns or after the first player is without credits. If both players cooperate, 10 credits from them are added to the jackpot, if one player defects, he gets the jackpot + 10 credits and the other looses 10 credits, if both defect, both players loose 10 credits. What is left from the jackpot after n turns is either split even between the players or simply lost.

The inevitable defecting player would then be punished just like a too cooperative player. And just playing for a set time would also fail. I can try implementing it, my Mother of All Algorithms algorithm is already done. :lol:
 
What is the winning condition for the tournament? Just in case it makes no sense to write something more sophisticated than a defector. Is it total score gained or is it a series of duels?

Even in a duel, a defector is defeated by just about any non-trivial agent that takes the precaution of defecting on the first turn.

I'm not sure about the exact structure of the tournament, but I've been considering playing everyone against everyone else and summing the combined score (that is, the sum of both players' score) for each match your agent took part in. It's open to suggestions, however.

Maybe try a betting style scoring: Both agents start with x credits, the game ends after n turns or after the first player is without credits. If both players cooperate, 10 credits from them are added to the jackpot, if one player defects, he gets the jackpot + 10 credits and the other looses 10 credits, if both defect, both players loose 10 credits. What is left from the jackpot after n turns is either split even between the players or simply lost.

That makes sense and is a trivial modification to the current framework. Updated OP.
 
Last edited:
Here is my suggestion as one alternative:

Code:
def ipd_urwumpe():
        global p1score, p1hist, p2score, p2hist, splitpot
        jackpot = 0
        for i in range(0,100):
                p1act = player1.play(i, p1hist, p2hist, p1score, p2score)
                p2act = player2.play(i, p2hist, p1hist, p2score, p1score)
                if(not(p1act) and not(p2act)):
                        p1score -= 10
                        p2score -= 10
                        jackpot += 20
                elif(not(p1act) and (p2act)):
                        p1score -= 10
                        p2score += jackpot + 10
                        jackpot = 0
                elif((p1act) and not(p2act)):
                        p1score += jackpot + 10
                        p2score -= 10
                        jackpot = 0
                elif((p1act) and (p2act)):
                        p1score -= 10
                        p2score -= 10
			
                p1hist[i] = p1act
                p2hist[i] = p2act

                if (p1score <= 0) or (p2score <=0):
                        break

        print "Jackpot = ", jackpot
        if splitpot:
                p1score += jackpot / 2
                p2score += jackpot / 2

The splitpot flag is for testing purposes, how much effect this end of game has... so far almost zero, since the jackpot is rarely really full here.
 
This only rewards defecting. In the original IPD, agents are also rewarded for cooperating, not just by increasing the reward of future defection, so that mutual cooperation is about equal in expected utility to risking defection.
 
Last edited:
This only rewards defecting. In the original IPD, agents are also rewarded for cooperating, not just by increasing the reward of future defection, so that mutual cooperation is about equal in expected utility to risking defection.

That is just half of the truth - the original IPD rewards defection as well, since defection offers you better chances to not lose.

In the best case, cooperation results in a tie with 1000 points. In the best case of defection, you get 2000 points and the other just 100. If both defect, you get a tie with 500. So, the defection strategies are clearly at advantage, since cooperation is not strongly enough rewarded, or mutual defection not strong enough punished.

That is why I had the "split jackpot" as option there, which simply means cooperation results in no loss, mutual defection results in complete loss. defection is a destructive strategy now, since you can make your opponent loose faster, but you can't get a high score by that.
 
But without a reward for cooperation, there is zero incentive to take the risk of cooperating. By turning it into a negative-sum game, you've reduced cooperating to however long you're willing to accumulate the jackpot, and the least risky strategy would be to let it accumulate for zero turns.
 
But without a reward for cooperation, there is zero incentive to take the risk of cooperating. By turning it into a negative-sum game, you've reduced cooperating to however long you're willing to accumulate the jackpot, and the least risky strategy would be to let it accumulate for zero turns.

well, lets assume the jackpot is accumulated until game over, without defection. Then both get their share of the jackpot back. You could of course also apply a 50% interest to the final jackpot - this would outweight any defection strategy
 
well, lets assume the jackpot is accumulated until game over, without defection. Then both get their share of the jackpot back. You could of course also apply a 50% interest to the final jackpot - this would outweight any defection strategy

It would also leave the match undecided, increasing the utility of defecting early, which maximizes at defecting after 0 turns. To maintain the dilemma, both players defecting should reset the jackpot in addition to the punishment.
 
It would also leave the match undecided, increasing the utility of defecting early, which maximizes at defecting after 0 turns. To maintain the dilemma, both players defecting should reset the jackpot in addition to the punishment.

That would balance too much against defection, since there is no incentive anymore to cheat a little bit especially late in the game.
 
OK, I feel the whole idea of jackpots and interests is steering too far away from the original problem. I propose the original format (+10 for mutual cooperation, +15/+1 for cooperate/defect, +5 for mutual defection) be retained, constant defection be rendered non-dominant by making the number of turns random within bounds, normalising the match score with the number of turns, and structuring the tournament such that each agent plays against every other one, and the final ranking be determined by summing the scores over all matches.
 
OK, then I can send you my agent
 
Back
Top