The Prisoner’s Dilemma – in a nutshell.
Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.
What is Game Theory? Game theory is the study of models of conflict, cooperation and interaction between rational decision-makers. A key idea in the study of Game Theory is the Nash Equilibrium (named after John Nash), which is a solution to a game involving two or more players who want the best outcome for themselves and must take account of the actions of others.
Specifically, if there is a set of ‘game’ strategies with the property that no ‘player’ can benefit by changing their strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute the Nash equilibrium. Assume, for example, there is a simple two-player game in which each Player (Bill and Ben) can adopt a ‘Friendly’ (smiles) or a ‘Hostile’ (scowls) approach. Now, depending on their respective actions, let’s say the game organiser awards monetary payoffs to each player.
An example of a payoff structure is shown in the next table and is known to each player.
| | Ben ‘Friendly’ | Ben ‘Hostile’ |
| --- | --- | --- |
| Bill ‘Friendly’ | 750 to Bill; 1000 to Ben | 25 to Bill; 2000 to Ben |
| Bill ‘Hostile’ | 1000 to Bill; 50 to Ben | 30 to Bill; 51 to Ben |
Now, what is Bill’s best response to each of Ben’s actions?
If Ben acts ‘Friendly’, Bill’s best response is to act ‘Hostile.’ This yields a payoff of 1000. If he had acted ‘Friendly’ he would have earned a payoff of only 750.
If Ben acts ‘Hostile’, Bill’s best response is again to act ‘Hostile’. He earns 30, instead of the payoff of 25 he would have earned by acting ‘Friendly.’
In both cases his best response is to act ‘Hostile’.
Now, what is Ben’s best response to each of Bill’s actions?
If Bill acts ‘Friendly’, Ben’s best response is to act ‘Hostile.’ This yields a payoff of 2000. If he had acted ‘Friendly’ he would have earned a payoff of only 1000.
If Bill acts ‘Hostile’, Ben’s best response is again to act ‘Hostile’. He earns 51, instead of the payoff of 50 he would have earned by acting ‘Friendly.’
In both cases his best response is to act ‘Hostile.’
A Nash Equilibrium exists when each player’s action is a best response to the action chosen by the other player.
Here, both Bill and Ben have the same best response, ‘Hostile’, to either action of their opponent. Both should therefore act ‘Hostile’, in which case Bill wins 30 and Ben wins 51.
But if both had been able to communicate and reach a joint, enforceable decision, they would both presumably have acted ‘Friendly.’
So, in conclusion, they would have been better off by smiling. Instead, they both scowled, which was the rational thing for them both to do, even though it was the less satisfactory outcome for both. A case of the best strategy being the worst strategy.
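The best-response reasoning for Bill and Ben can be sketched in a few lines of Python. This is only an illustrative sketch: the payoffs come from the table above, while the names `PAYOFFS`, `bill_best_response` and `ben_best_response` are invented for this example.

```python
# Payoffs from the table above, keyed by (Bill's action, Ben's action).
# Each value is (payoff to Bill, payoff to Ben).
PAYOFFS = {
    ("Friendly", "Friendly"): (750, 1000),
    ("Friendly", "Hostile"): (25, 2000),
    ("Hostile", "Friendly"): (1000, 50),
    ("Hostile", "Hostile"): (30, 51),
}
ACTIONS = ("Friendly", "Hostile")


def bill_best_response(ben_action):
    """Bill's payoff-maximising action when Ben's action is held fixed."""
    return max(ACTIONS, key=lambda bill: PAYOFFS[(bill, ben_action)][0])


def ben_best_response(bill_action):
    """Ben's payoff-maximising action when Bill's action is held fixed."""
    return max(ACTIONS, key=lambda ben: PAYOFFS[(bill_action, ben)][1])


for other in ACTIONS:
    print(f"If Ben acts {other}, Bill's best response is {bill_best_response(other)}")
    print(f"If Bill acts {other}, Ben's best response is {ben_best_response(other)}")

# A cell is a Nash equilibrium when each action is a best response to the other.
equilibria = [
    (bill, ben)
    for bill in ACTIONS
    for ben in ACTIONS
    if bill_best_response(ben) == bill and ben_best_response(bill) == ben
]
print(equilibria)  # [('Hostile', 'Hostile')]
```

Both players end up at ‘Hostile’/‘Hostile’ (30 and 51), even though ‘Friendly’/‘Friendly’ (750 and 1000) would pay each of them more.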
Let’s turn now to the world of espionage in seeking out a Nash equilibrium. Let’s assume that there are two possible codes, ‘A’ and ‘B’, and that Agent Anna and Agent Barbara can each select either of them. The payoff to each Agent from selecting non-matching codes is zero. An example of a payoff structure is shown in the table below and is known to each Agent.
| | Barbara uses Code ‘A’ | Barbara uses Code ‘B’ |
| --- | --- | --- |
| Anna uses Code ‘A’ | 1000 to Anna; 500 to Barbara | 0 to Anna; 0 to Barbara |
| Anna uses Code ‘B’ | 0 to Anna; 0 to Barbara | 500 to Anna; 1000 to Barbara |
So where is the Nash equilibrium?
Let’s look at the Top Left box. Here neither Agent Anna nor Agent Barbara can increase their payoff by choosing a different action to the current one. So there is no incentive for either Agent to switch given the strategy of the other Agent. So this is a Nash equilibrium.
How about Bottom right? This is the same. Again, neither Agent Anna nor Agent Barbara can increase their payoff by choosing a different action to the current one. So there is no incentive for either Agent to switch given the strategy of the other Agent. So this is also a Nash equilibrium.
How about Top right? By choosing to use Code B instead of Code A, Agent Anna obtains a payoff of 500 instead of zero, given Agent Barbara’s action. Similarly for Agent Barbara, who would gain by switching to Code A, given Agent Anna’s strategy. So this box (Agent Anna uses Code A and Agent Barbara uses Code B) is NOT a Nash equilibrium, as both Agents have an incentive to switch given what the other Agent is doing.
How about Bottom left? This is the same as Top right. There are again incentives to switch given what the other Agent is doing. So it is NOT a Nash equilibrium.
In conclusion, this game has two Nash equilibria – top left (both Agents use code A) and bottom right (both Agents use code B).
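The box-by-box check can be written as a small general routine. The sketch below is one possible way to do it, not a standard library function; the name `pure_nash_equilibria` and the dictionary layout are assumptions made for this illustration.

```python
from itertools import product


def pure_nash_equilibria(payoffs, row_actions, col_actions):
    """Return every cell where neither player gains by switching alone.

    payoffs maps (row_action, col_action) to (row_payoff, col_payoff).
    """
    equilibria = []
    for row, col in product(row_actions, col_actions):
        row_payoff, col_payoff = payoffs[(row, col)]
        # Can the row player do strictly better, holding the column fixed?
        row_gains = any(payoffs[(alt, col)][0] > row_payoff for alt in row_actions)
        # Can the column player do strictly better, holding the row fixed?
        col_gains = any(payoffs[(row, alt)][1] > col_payoff for alt in col_actions)
        if not row_gains and not col_gains:
            equilibria.append((row, col))
    return equilibria


# Code game from the table above:
# (Anna's code, Barbara's code) -> (payoff to Anna, payoff to Barbara).
CODE_GAME = {
    ("A", "A"): (1000, 500),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
    ("B", "B"): (500, 1000),
}

code_equilibria = pure_nash_equilibria(CODE_GAME, ("A", "B"), ("A", "B"))
print(code_equilibria)  # [('A', 'A'), ('B', 'B')]
```

The same function can be pointed at any two-player table of this kind by swapping in a different payoff dictionary.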
Let’s turn now to the classic ‘Live or Die’ problem. In this problem, there are two drivers, Peter and Paul. If both drive on the left of the road, or both drive on the right, they will be safe, whilst they will crash if one keeps to one side of the road and the other to the opposite side.
| | Paul drives on the left | Paul drives on the right |
| --- | --- | --- |
| Peter drives on the left | Safe, Safe | Crash, Crash |
| Peter drives on the right | Crash, Crash | Safe, Safe |
At Top left and at Bottom right, there is no incentive for either Driver to switch to the other side of the road given the driving strategy of the other driver. They will both be safe if they adopt this strategy. So both Top left and Bottom right are Nash equilibria.
In both other scenarios (Top right and Bottom left), there is a very strong incentive to switch to the other side given the driving strategy of the other Driver. So neither Top right nor Bottom left is a Nash equilibrium.
In summary, there are two Nash equilibria in the ‘Live or Die’ problem.
Now let’s consider the case of two companies, Alligator PLC and Crocodile PLC, who each have the option of using one of two emblems. Let’s call the first the Blue Badger Emblem and the other the Black Bull emblem.
| | Crocodile uses Black Bull emblem | Crocodile uses Blue Badger emblem |
| --- | --- | --- |
| Alligator uses Black Bull emblem | 1000 to Alligator; 500 to Crocodile | 500 to Alligator; 1000 to Crocodile |
| Alligator uses Blue Badger emblem | 500 to Alligator; 1000 to Crocodile | 1000 to Alligator; 500 to Crocodile |
Top left: Crocodile gains by switching from Black Bull to Blue Badger.
Top right: Alligator gains by switching from Black Bull to Blue Badger.
Bottom left: Alligator gains by switching from Blue Badger to Black Bull.
Bottom right: Crocodile gains by switching from Blue Badger to Black Bull.
So this game has no Nash equilibrium in which each company commits to a single emblem. There is always an incentive to switch.
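The four-way check for the emblem game can be reproduced mechanically. In this sketch the payoffs are taken from the table above; the helper name `profitable_switches` and the text format of its output are invented for the example.

```python
# (Alligator's emblem, Crocodile's emblem) -> (payoff to Alligator, payoff to Crocodile)
EMBLEM_GAME = {
    ("Black Bull", "Black Bull"): (1000, 500),
    ("Black Bull", "Blue Badger"): (500, 1000),
    ("Blue Badger", "Black Bull"): (500, 1000),
    ("Blue Badger", "Blue Badger"): (1000, 500),
}
EMBLEMS = ("Black Bull", "Blue Badger")


def profitable_switches(alligator, crocodile):
    """List who gains by switching emblem alone, starting from the given cell."""
    a_pay, c_pay = EMBLEM_GAME[(alligator, crocodile)]
    switches = []
    for alt in EMBLEMS:
        # Alligator changes emblem while Crocodile stays put.
        if EMBLEM_GAME[(alt, crocodile)][0] > a_pay:
            switches.append(f"Alligator: {alligator} -> {alt}")
        # Crocodile changes emblem while Alligator stays put.
        if EMBLEM_GAME[(alligator, alt)][1] > c_pay:
            switches.append(f"Crocodile: {crocodile} -> {alt}")
    return switches


report = {(a, c): profitable_switches(a, c) for a in EMBLEMS for c in EMBLEMS}
for cell, switches in report.items():
    print(cell, switches)
```

Every cell comes back with exactly one company that gains by switching, so none of the four boxes is stable.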
So how many Nash equilibria can there be in these sorts of game? Let us recall that if there is a set of ‘game’ strategies with the property that no ‘player’ can benefit by changing their strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute what is known as the ‘Nash equilibrium’.
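In symbols, the definition can be written compactly. The notation here is an assumption of this sketch rather than part of the original discussion: $s_i$ is the strategy of player $i$, $s_{-i}$ is the strategies of all the other players, $u_i$ is player $i$'s payoff, and starred strategies are those in the candidate equilibrium.

```latex
u_i(s_i^*, s_{-i}^*) \;\ge\; u_i(s_i, s_{-i}^*)
\quad \text{for every player } i \text{ and every alternative strategy } s_i .
```

Read aloud: holding everyone else's equilibrium strategies fixed, no player can raise their own payoff by deviating alone.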
There may be one (e.g. the Friendly/Hostile game). There may be more than one (e.g. Spy problem, ‘Live or Die’ problem). There may be none, at least when each player must commit to a single action (e.g. company emblems problem).
This leads us to the classic ‘Prisoner’s Dilemma’ problem. In this scenario, two prisoners, linked to the same crime, are offered a discount on their prison terms for confessing if the other prisoner continues to deny it, in which case the other prisoner will receive a much stiffer sentence. However, they will both be better off if both deny the crime than if both confess to it. The problem each faces is that they can’t communicate and strike an enforceable deal. The box diagram below shows an example of the Prisoner’s Dilemma in action.
| | Prisoner 2 Confesses | Prisoner 2 Denies |
| --- | --- | --- |
| Prisoner 1 Confesses | 2 years each | Freedom for P1; 8 years for P2 |
| Prisoner 1 Denies | 8 years for P1; Freedom for P2 | 1 year each |
The Nash Equilibrium is for both to confess, in which case they will both receive 2 years. But this is not the outcome they would have chosen if they could have agreed in advance to a mutually enforceable deal. In that case they would have chosen a scenario where both denied the crime and received 1 year each.
Note that the action that gave each of the prisoners the least jail time did not depend on what the other prisoner did. Confessing was what is called a ‘dominant strategy’ for each player: the strategy that gives the highest payoff whatever the other person does. Since each player had a dominant strategy, there was a single dominant strategy equilibrium.
Often there is no dominant strategy. We have already looked at such a situation. Driving on the right or on the left. If others drive on the right, your best response is to drive on the right too. If they drive on the left, your best response is to drive on the left. In the US, everyone driving on the right is an equilibrium, in the sense that no one would want to change their strategy given what others are doing. In game theory, if everyone is playing their best response to the strategies of everyone else, these strategies are, as we know, termed a Nash equilibrium. In Japan, though, Drive on the Left is a Nash equilibrium. So the Live or Die ‘game’ has two Nash equilibria but no dominant strategy equilibrium.
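The contrast between the Prisoner’s Dilemma and the driving game can be checked with a short sketch. Two assumptions are made purely for illustration: jail terms are entered as negative numbers so that a higher payoff is better, and ‘Safe’ and ‘Crash’ are scored as 1 and 0. The function name `dominant_strategy` is also invented here.

```python
def dominant_strategy(payoffs, own_actions, other_actions):
    """Return the action that is a best response to every action of the
    other player, or None if no single action has that property.

    payoffs maps (own_action, other_action) to the player's own payoff.
    """
    for candidate in own_actions:
        if all(
            payoffs[(candidate, other)] >= payoffs[(alt, other)]
            for other in other_actions
            for alt in own_actions
        ):
            return candidate
    return None


# Prisoner 1's payoffs: years in jail entered as negative numbers (assumption).
PRISONER = {
    ("Confess", "Confess"): -2,
    ("Confess", "Deny"): 0,
    ("Deny", "Confess"): -8,
    ("Deny", "Deny"): -1,
}

# Peter's payoffs in 'Live or Die': Safe = 1, Crash = 0 (assumption).
DRIVER = {
    ("Left", "Left"): 1,
    ("Left", "Right"): 0,
    ("Right", "Left"): 0,
    ("Right", "Right"): 1,
}

prisoner_choice = dominant_strategy(PRISONER, ("Confess", "Deny"), ("Confess", "Deny"))
driver_choice = dominant_strategy(DRIVER, ("Left", "Right"), ("Left", "Right"))
print(prisoner_choice)  # Confess
print(driver_choice)    # None
```

Because both games are symmetric, the same result holds for Prisoner 2 and for Paul.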
Many interactions do not have dominant strategy equilibria, but if we can find a Nash equilibrium, it gives us a prediction of what we should observe. A Nash equilibrium is a stable state of interacting participants in which none can gain by a change of strategy as long as the other participants keep their strategies unchanged. It is not necessarily the best outcome for the parties involved, but it is the outcome we would most likely predict. Once again, we find that the strategy that is individually rational in a world of self-interested people need not produce the outcome that is actually in their interest.
Perhaps the best example of an attempted real-life resolution to the Prisoner’s Dilemma was demonstrated in the TV quiz show ‘Golden Balls’. In the game, each of two players must select a ball which, unknown to the other player, is either a ‘Split’ or a ‘Steal’ ball. If both choose ‘Split’, they share the prize money. If both choose ‘Steal’, they each go away with nothing. If one chooses ‘Steal’ and one chooses ‘Split’, the contestant who chose ‘Steal’ wins all the money, and the contestant who chose ‘Split’ gets nothing. In this game, the Nash equilibrium among self-interested players is Steal-Steal, as ‘Steal’ weakly dominates ‘Split’: against ‘Split’ it wins all the money rather than sharing it, and against ‘Steal’ it loses nothing compared to choosing ‘Split’ (the player wins nothing either way). ‘Steal’ in the Golden Balls game is thus the equivalent of ‘Confess’ in the traditional Prisoner’s Dilemma game.
The YouTube video linked below is a classic demonstration of an attempt to resolve the dilemma.
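The comparison between ‘Steal’ and ‘Split’ in Golden Balls can be laid out as a small sketch. The jackpot of 100 is a hypothetical figure chosen only for illustration, and the names `WINNINGS` and `compare` are invented for this example.

```python
PRIZE = 100  # hypothetical jackpot, chosen only for illustration

# (own ball, other player's ball) -> own winnings
WINNINGS = {
    ("Split", "Split"): PRIZE / 2,
    ("Split", "Steal"): 0,
    ("Steal", "Split"): PRIZE,
    ("Steal", "Steal"): 0,
}
BALLS = ("Split", "Steal")


def compare(action, rival_action):
    """For each ball the other player might hold, report whether `action`
    pays more than, the same as, or less than `rival_action`."""
    result = {}
    for other in BALLS:
        diff = WINNINGS[(action, other)] - WINNINGS[(rival_action, other)]
        result[other] = "better" if diff > 0 else "same" if diff == 0 else "worse"
    return result


steal_vs_split = compare("Steal", "Split")
print(steal_vs_split)  # {'Split': 'better', 'Steal': 'same'}

# 'Steal' is never worse and sometimes better, so it dominates only weakly.
weakly_dominates = (
    "worse" not in steal_vs_split.values() and "better" in steal_vs_split.values()
)
print(weakly_dominates)  # True
```

Unlike ‘Confess’ in the prison-term table, where confessing is strictly better in both columns, ‘Steal’ only ties with ‘Split’ when the other player steals.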
Exercise
Is every Nash Equilibrium a Dominant Strategy Equilibrium? Is every Dominant Strategy Equilibrium a Nash Equilibrium? Illustrate your answer, using an example.
In the Golden Balls game, with no communication allowed outside the game format, is there a dominant strategy for each player? Is there a dominant strategy equilibrium? Is there a Nash equilibrium? If so, what is it?