Repeated Game Strategies – in a nutshell.

April 11, 2019

Further and deeper exploration of paradoxes and challenges of intuition and logic can be found in my recently published book, Probability, Choice and Reason.

If there is a set of ‘game’ strategies with the property that no ‘player’ can benefit by changing their strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute what is known as the ‘Nash equilibrium’.

This leads us to the classic ‘Prisoner’s Dilemma’ problem. In this scenario, two prisoners, linked to the same crime, are each offered a reduced prison term for confessing while the other prisoner continues to deny the crime, in which case the other prisoner will receive a much stiffer sentence. However, they will both be better off if both deny the crime than if both confess to it. The problem each faces is that they can’t communicate and strike an enforceable deal. The box diagram below shows an example of the Prisoner’s Dilemma in action.

	Prisoner 2 Confesses	Prisoner 2 Denies
Prisoner 1 Confesses	2 years each	Freedom for P1; 8 years for P2
Prisoner 1 Denies	8 years for P1; Freedom for P2	1 year each

The Nash Equilibrium is for both to confess, in which case they will both receive 2 years. But this is not the outcome they would have chosen if they could have agreed in advance to a mutually enforceable deal. In that case they would have chosen a scenario where both denied the crime and received 1 year each.

So a Nash equilibrium is a stable state involving interacting participants in which none can gain by a change of strategy as long as the strategies of the other participants remain unchanged. It is not necessarily the best outcome for the parties involved, but it is the outcome we would most likely predict.
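To make the definition concrete, here is a minimal Python sketch that checks each cell of the table above for the Nash property. The names (`YEARS`, `is_nash`) are purely illustrative, and outcomes are stored as years in prison, so each prisoner prefers the smaller number.

```python
from itertools import product

ACTIONS = ("Confess", "Deny")

# YEARS[(p1_action, p2_action)] = (years for Prisoner 1, years for Prisoner 2),
# taken from the table above. "Freedom" is recorded as 0 years.
YEARS = {
    ("Confess", "Confess"): (2, 2),
    ("Confess", "Deny"): (0, 8),
    ("Deny", "Confess"): (8, 0),
    ("Deny", "Deny"): (1, 1),
}

def is_nash(a1, a2):
    """True if neither prisoner can shorten their own sentence by switching alone."""
    p1_ok = all(YEARS[(a1, a2)][0] <= YEARS[(alt, a2)][0] for alt in ACTIONS)
    p2_ok = all(YEARS[(a1, a2)][1] <= YEARS[(a1, alt)][1] for alt in ACTIONS)
    return p1_ok and p2_ok

equilibria = [cell for cell in product(ACTIONS, ACTIONS) if is_nash(*cell)]
print(equilibria)  # [('Confess', 'Confess')]
```

Only the cell where both confess survives the check, even though both prisoners would serve less time in the cell where both deny.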

The Prisoner’s Dilemma is a one-stage game, however. What happens in games with more than one round, where players can learn from the previous moves of the other players?

Take the case of a 2-round game. The payoff from the game will equal the sum of the payoffs from both rounds.

The game starts with two players, each of whom is given £100 to place into a pot. They can then secretly choose to honour the deal or to cheat on the deal, by means of giving an envelope to the host containing the card ‘Honour’ or ‘Cheat’. If they both choose to ‘Honour’ the deal, an additional £100 is added to the pot, yielding each an additional £50. So they end up with £150 each. But if one honours the deal and the other cheats on the deal, the ‘Cheat’ wins the original pot (£200) and the ‘Honour’ player loses all the money in that round. A third outcome is that both players choose to ‘Cheat’, in which case each keeps the original £100. So in this round, the dominant strategy for each player (assuming no further rounds) is to ‘Cheat’, as this yields a higher payoff if the opponent ‘Honours’ the deal (£200 instead of £150) and a higher payoff if the opponent ‘Cheats’ (£100 instead of zero). The negotiated, mutually enforceable outcome, on the other hand, would be to agree to both ‘Honour’ the deal and go away with £150.
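The single-round reasoning can be checked mechanically. The following Python sketch encodes the winnings described above and looks for a move that is strictly better whatever the opponent does. The table `PAYOFF` and the function `dominant_move` are illustrative names, not part of any library.

```python
MOVES = ("Honour", "Cheat")

# PAYOFF[(my_move, opponent_move)] = my winnings in pounds for one round,
# as described in the text.
PAYOFF = {
    ("Honour", "Honour"): 150,
    ("Honour", "Cheat"): 0,
    ("Cheat", "Honour"): 200,
    ("Cheat", "Cheat"): 100,
}

def dominant_move():
    """Return the move that is strictly better against every opponent move, or None."""
    for mine in MOVES:
        alternatives = [m for m in MOVES if m != mine]
        if all(
            PAYOFF[(mine, opp)] > PAYOFF[(alt, opp)]
            for alt in alternatives
            for opp in MOVES
        ):
            return mine
    return None

print(dominant_move())  # Cheat
```

‘Cheat’ dominates in a single round, yet mutual ‘Cheat’ (£100 each) leaves both players worse off than mutual ‘Honour’ (£150 each).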

But how does this change in a 2-round game?

Actually, it makes no difference. In a 2-round game, the second round is the final round, in which you may as well ‘Cheat’, as there are no future rounds in which to reap the benefit of any goodwill earned by honouring the deal. Your opponent knows this, so you can assume that your opponent, who wishes to maximise his total payoff, will be hostile on the second move. He will assume the same about you.

Since you will both ‘Cheat’ on the second and final move, why be friendly on the first move?

So the dominant strategy is to ‘Cheat’ on the first round.

What if there are three rounds? The same applies. You know that your opponent will ‘Cheat’ on the final round and therefore the penultimate round as well. So your dominant strategy is to ‘Cheat’ on the first round, the second round and the final round. The same goes for your opponent. And so on. In any finite, pre-determined number of rounds, the dominant strategy in any round is to ‘Cheat.’
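This style of argument, working from the last round back to the first, is usually called backward induction. A rough Python sketch of it follows, using the winnings from the £100 game described earlier. The function name `backward_induction` and its return format are assumptions made for illustration.

```python
MOVES = ("Honour", "Cheat")

# PAYOFF[(my_move, opponent_move)] = my winnings in pounds for one round.
PAYOFF = {
    ("Honour", "Honour"): 150,
    ("Honour", "Cheat"): 0,
    ("Cheat", "Honour"): 200,
    ("Cheat", "Cheat"): 100,
}

def backward_induction(rounds):
    """Solve a fixed-length game from the last round back to the first.

    Returns the predicted move for each round and each player's total winnings.
    """
    future = 0  # winnings each player expects from the rounds already solved
    plan = []
    for _ in range(rounds):
        # Later play is already fixed, so it adds the same amount to every
        # option and the choice depends only on this round's payoff.
        best = {
            opp: max(MOVES, key=lambda mine: PAYOFF[(mine, opp)] + future)
            for opp in MOVES
        }
        # The same reply is best whatever the opponent does in this round.
        assert best["Honour"] == best["Cheat"]
        move = best["Honour"]
        plan.append(move)
        future += PAYOFF[(move, move)]
    return plan[::-1], future

print(backward_induction(3))  # (['Cheat', 'Cheat', 'Cheat'], 300)
```

Because play in later rounds does not depend on what happens now, each round reduces to the one-shot game, and the one-shot answer is ‘Cheat’.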

But what if the game involves an indeterminate number of moves? Suppose that after each move, you roll two dice. If you get a double-six, the game ends. With any other combination of numbers, you play another round. Keep playing until you get a double-six. Your score for the game is the sum of your payoffs across all the rounds played.
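To get a feel for how long such a game lasts, here is a small Python calculation. It assumes two fair six-sided dice, so the chance of a double-six on any roll is 1/36, and the number of rounds played follows a geometric distribution. The names `P_END` and `prob_round_is_played` are illustrative.

```python
from fractions import Fraction

P_END = Fraction(1, 36)  # chance of a double-six with two fair dice (assumed fair)

def prob_round_is_played(n):
    """Probability that round n is reached: no double-six in the first n - 1 rolls."""
    return (1 - P_END) ** (n - 1)

expected_rounds = 1 / P_END  # mean of a geometric distribution
print(expected_rounds)                   # 36
print(float(prob_round_is_played(37)))   # chance the game outlasts its average length
```

On average the game runs for 36 rounds, but at no point does either player know that the current round is the last, which is what blocks the backward reasoning used for a fixed number of rounds.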

This sort of game in fact mirrors many real-world situations. In real life, you often don’t know when the game will end.

What is the best strategy in repeated play? For the game outlined above, we shall denote ‘Honour the deal’ as a ‘Friendly’ move and ‘Cheat’ as a ‘Hostile’ move. But the notion of a Friendly or Hostile approach can take other guises in different games.

There are seven proposed strategies here.

Always Friendly. Be Friendly every time.
Always Hostile. Be Hostile every time.
Retaliate. Be Friendly as long as your opponent is Friendly, but if your opponent is ever Hostile, be Hostile from that point on.
Tit for tat. Be Friendly on the first move. Thereafter, do whatever your opponent did on the previous move.
Random. On each move, toss a coin. If Heads, be Friendly. If Tails, be Hostile.
Alternate. Be Friendly on even-numbered moves, and Hostile on odd-numbered moves, or vice-versa.
Fraction. Be Friendly on the first move. Thereafter, be Friendly if the fraction of times your opponent has been Friendly up to that point is greater than a half. Be Hostile if it is less than or equal to a half.
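The seven strategies can each be written as a small Python function. This is only a sketch under a few assumptions: each function receives the player's own past moves and the opponent's past moves as lists, ‘Alternate’ is taken in its ‘Friendly on even-numbered moves’ variant with moves numbered from 1, and ‘Fraction’ is read as Friendly when the opponent has been Friendly more than half the time and Hostile otherwise. All names are illustrative.

```python
import random

F, H = "Friendly", "Hostile"

def always_friendly(mine, theirs):
    return F

def always_hostile(mine, theirs):
    return H

def retaliate(mine, theirs):
    # Friendly until the opponent's first Hostile move, then Hostile for good.
    return H if H in theirs else F

def tit_for_tat(mine, theirs):
    # Friendly first, then copy the opponent's previous move.
    return theirs[-1] if theirs else F

def random_move(mine, theirs, rng=random):
    # A fair coin toss on every move.
    return rng.choice((F, H))

def alternate(mine, theirs):
    # Moves are numbered from 1; this variant is Friendly on even-numbered moves.
    move_number = len(mine) + 1
    return F if move_number % 2 == 0 else H

def fraction(mine, theirs):
    # Friendly first, then Friendly only if the opponent has been Friendly
    # on more than half of the moves so far.
    if not theirs:
        return F
    return F if 2 * theirs.count(F) > len(theirs) else H

print(tit_for_tat([F, F], [F, H]))  # Hostile
```

Note that ‘Retaliate’ and ‘Tit for tat’ differ only in memory: the first never forgives a Hostile move, while the second forgets everything except the most recent move.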

Which of these is the dominant strategy in this game of iterated play? Actually, there is no dominant strategy in an iterated game. But which strategy actually wins if every strategy plays every other strategy?

‘Always Hostile’ does best against ‘Always Hostile’, because every time you are Friendly against an ‘Always Hostile’ opponent, you are punished with the ‘sucker’ payoff.

‘Always Friendly’ does best against ‘Retaliate’, because the extra payoff you get from a Hostile move is eventually negated by the retaliation that follows.

Thus even the choice of whether to be Friendly or Hostile on the first move depends on the opponent’s strategy.

For every two distinct strategies, A and B, there is a strategy C against which A does better than B, and a strategy D against which B does better than A.

So which strategy wins when every strategy plays every other strategy in a tournament? This has been computer simulated many times. And the winner is Tit for Tat.
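For readers who want to experiment, here is a rough Python sketch of such a round-robin tournament. It is not a reproduction of any published simulation. It assumes the £100 game's winnings as payoffs, the double-six stopping rule, the seven strategies listed above (with ‘Alternate’ Friendly on even-numbered moves and ‘Fraction’ Friendly when the opponent has been Friendly more than half the time), and no matches of a strategy against itself. The ranking it prints depends on those choices, on the mix of strategies entered and on the random seed, so it need not match the result described above.

```python
import random

F, H = "Friendly", "Hostile"

# PAYOFF[(my_move, their_move)] = my winnings in pounds for one round.
PAYOFF = {(F, F): 150, (F, H): 0, (H, F): 200, (H, H): 100}

# Each strategy sees its own history, the opponent's history and a random source.
STRATEGIES = {
    "Always Friendly": lambda mine, theirs, rng: F,
    "Always Hostile": lambda mine, theirs, rng: H,
    "Retaliate": lambda mine, theirs, rng: H if H in theirs else F,
    "Tit for Tat": lambda mine, theirs, rng: theirs[-1] if theirs else F,
    "Random": lambda mine, theirs, rng: rng.choice((F, H)),
    "Alternate": lambda mine, theirs, rng: F if len(mine) % 2 == 1 else H,
    "Fraction": lambda mine, theirs, rng: (
        F if not theirs or 2 * theirs.count(F) > len(theirs) else H
    ),
}

def play_match(name_a, name_b, rng):
    """Play one game that ends on a double-six; return both players' totals."""
    strat_a, strat_b = STRATEGIES[name_a], STRATEGIES[name_b]
    hist_a, hist_b = [], []
    score_a = score_b = 0
    while True:
        move_a = strat_a(hist_a, hist_b, rng)
        move_b = strat_b(hist_b, hist_a, rng)
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        dice = (rng.randint(1, 6), rng.randint(1, 6))
        if dice == (6, 6):
            return score_a, score_b

def tournament(games_per_pair=200, seed=0):
    """Every strategy meets every other strategy; return average winnings per game."""
    rng = random.Random(seed)
    names = list(STRATEGIES)
    totals = dict.fromkeys(names, 0)
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            for _ in range(games_per_pair):
                score_a, score_b = play_match(name_a, name_b, rng)
                totals[name_a] += score_a
                totals[name_b] += score_b
    games_each = games_per_pair * (len(names) - 1)
    return {name: totals[name] / games_each for name in names}

for name, average in sorted(tournament().items(), key=lambda item: -item[1]):
    print(f"{name:16s} {average:9.1f}")
```

Changing the entries in `STRATEGIES`, for example by adding more Hostile or more Friendly strategies, is a quick way to see how sensitive the ranking is to the field.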

It’s true that Tit for Tat can never get a higher score than a particular opponent, but it wins tournaments where each strategy plays every other strategy. In particular, it does well against Friendly strategies, while it is not exploited by Hostile strategies. So you can trust Tit for Tat. It won’t take advantage of another strategy. Tit for Tat and its opponents both do best when both are Friendly. Look at it this way. There are two reasons for a player to be unilaterally Hostile: to take advantage of an opponent, or to avoid being taken advantage of by an opponent. Tit for Tat eliminates both reasons for being Hostile.
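The claim that Tit for Tat never outscores the opponent in front of it can be checked with a short Python sketch. It assumes the winnings of the £100 game as payoffs and treats the opponent simply as a sequence of moves, which covers any strategy the opponent might be following. The function name `tit_for_tat_vs` is illustrative.

```python
import random

F, H = "Friendly", "Hostile"

# PAYOFF[(my_move, their_move)] = my winnings in pounds for one round.
PAYOFF = {(F, F): 150, (F, H): 0, (H, F): 200, (H, H): 100}

def tit_for_tat_vs(opponent_moves):
    """Score Tit for Tat against a given sequence of opponent moves."""
    tft_score = opp_score = 0
    tft_move = F  # Friendly on the first move
    for opp_move in opponent_moves:
        tft_score += PAYOFF[(tft_move, opp_move)]
        opp_score += PAYOFF[(opp_move, tft_move)]
        tft_move = opp_move  # copy the opponent's move next time
    return tft_score, opp_score

rng = random.Random(0)
for _ in range(1000):
    length = rng.randint(1, 50)
    moves = [rng.choice((F, H)) for _ in range(length)]
    tft, opp = tit_for_tat_vs(moves)
    assert tft <= opp          # Tit for Tat never finishes ahead
    assert opp - tft <= 200    # and never trails by more than one 'sucker' round
print("checked 1000 random opponents")
```

Every time the opponent exploits Tit for Tat, Tit for Tat claims that round back the next time the opponent turns Friendly, so the gap between them is never more than a single round's worth.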

What accounts for Tit for Tat’s success, therefore, is its combination of being nice, retaliatory, forgiving and clear.

In other words, success in an evolutionary ‘game’ is correlated with the following characteristics:

Be willing to be nice: cooperate, never be the first to defect.

Don’t be played for a sucker: return defection for defection, cooperation for cooperation.

Don’t be envious: focus on how well you are doing, as opposed to ensuring you are doing better than everyone else.

Be forgiving if someone is willing to change their ways and co-operate with you. Don’t bear grudges for old actions.

Don’t be too clever or too tricky. Clarity is essential for others to cooperate with you.

As Robert Axelrod, who pioneered this area of game theory, put it in his book ‘The Evolution of Cooperation’, Tit for Tat’s “niceness prevents it from getting into unnecessary trouble. Its retaliation discourages the other side from persisting whenever defection is tried. Its forgiveness helps restore mutual cooperation. And its clarity makes it intelligible to the other player, thereby eliciting long-term cooperation.”

How about the bigger picture? Can Tit for Tat perhaps teach us a lesson in how to play the game of life? Yes, in my view it probably can.

Further Reading and Links

Axelrod, Robert (1984), The Evolution of Cooperation, Basic Books

Axelrod, Robert (2006), The Evolution of Cooperation (Revised ed.), Perseus Books Group

Axelrod, R. and Hamilton, W.D. (1981), The Evolution of Cooperation, Science, 211, 1390-96. http://www-personal.umich.edu/~axe/research/Axelrod%20and%20Hamilton%20EC%201981.pdf

https://en.wikipedia.org/wiki/The_Evolution_of_Cooperation


Prof. Leighton Vaughan Williams
