Eco514—Game Theory

Lecture 13: Repeated Games (2)

Marciano Siniscalchi
October 28, 1999

Introduction

[Again, by and large, I will follow OR, Chap. 8, so I will keep these notes to a minimum.]

Review of key definitions

Recall our three payoff aggregation criteria: discounting, i.e.

\[ (u_i^t)_{t \ge 1} \succ_i (w_i^t)_{t \ge 1} \iff \sum_{t \ge 1} \delta^{t-1} (u_i^t - w_i^t) > 0 \]

(also recall that the payoff profile corresponding to a stream $(u^t)$ is taken to be $(1-\delta) \sum_{t \ge 1} \delta^{t-1} u(a^t)$); limit of means:

\[ (u_i^t)_{t \ge 1} \succ_i (w_i^t)_{t \ge 1} \iff \liminf_{T \to \infty} \sum_{t=1}^{T} \frac{u_i^t - w_i^t}{T} > 0; \]

and overtaking:

\[ (u_i^t)_{t \ge 1} \succ_i (w_i^t)_{t \ge 1} \iff \liminf_{T \to \infty} \sum_{t=1}^{T} (u_i^t - w_i^t) > 0. \]

Also recall the definition of machines:

Definition 1 Fix a normal-form game G. A machine for Player $i \in N$ is a tuple $M_i = (Q_i, q_i^0, f_i, \tau_i)$, where:
(i) $Q_i$ is a finite set (whose elements should be thought of as labels);
(ii) $q_i^0$ is the initial state of the machine;
(iii) $f_i : Q_i \to A_i$ is the action function: it specifies what Player i does at each state; and
(iv) $\tau_i : Q_i \times A \to Q_i$ is the transition function: if action $a \in A$ is played and Player i's machine state is $q_i \in Q_i$, then at the next stage Player i's machine state will be $\tau_i(q_i, a)$.
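Definition 1 translates directly into code. Here is a minimal sketch (mine, not OR's) of a machine as a tuple $(Q_i, q_i^0, f_i, \tau_i)$, instantiated as the familiar grim-trigger machine for the repeated Prisoner's Dilemma that comes up below; all names and the encoding are illustrative assumptions.

```python
# A minimal sketch of Definition 1 (names are mine): a machine as a tuple
# (Q_i, q_i^0, f_i, tau_i), instantiated as grim trigger for the repeated
# Prisoner's Dilemma with actions "C" and "D".

from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

@dataclass(frozen=True)
class Machine:
    states: FrozenSet[str]                              # Q_i, a finite set of labels
    initial: str                                        # q_i^0
    action: Dict[str, str]                              # f_i: Q_i -> A_i
    transition: Callable[[str, Tuple[str, str]], str]   # tau_i: Q_i x A -> Q_i

# Grim trigger: play "C" in the normal state; after any defection, switch to
# the punishment state and play "D" forever.
grim = Machine(
    states=frozenset({"Norm", "Pun"}),
    initial="Norm",
    action={"Norm": "C", "Pun": "D"},
    transition=lambda q, a: "Norm" if q == "Norm" and a == ("C", "C") else "Pun",
)

# Two copies of the machine playing each other:
q1, q2 = grim.initial, grim.initial
for t in range(4):
    a = (grim.action[q1], grim.action[q2])
    print(t + 1, a)                       # ("C", "C") every period on the path
    q1, q2 = grim.transition(q1, a), grim.transition(q2, a)
```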

Note: OR actually allows for arbitrary state spaces. For most proofs, finite state spaces are enough. There is one exception, which I will note below.

Definition 2 Fix a normal-form game G. The minmax payoff for Player i, $v_i$, is defined by

\[ v_i = \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i}). \]

We are not going to worry about implementing mixtures of action profiles in this lecture, so I just remind you of the definition of enforceability:

Definition 3 Fix a normal-form game G. A payoff profile $u \in \mathbb{R}^N$ is enforceable (resp. strictly enforceable) if $u_i \ge v_i$ (resp. $u_i > v_i$) for all $i \in N$.

Perfect Folk Theorems

The strategies in the proof of the Nash folk theorem (say, with discounting) call for indefinite punishment following a deviation from the equilibrium. This is fine in games such as the Prisoner's Dilemma, in which the minmax actions are actually an equilibrium (indeed, in PD, they are the only equilibrium!). That is, the threat of indefinite punishment is indeed credible.

            A       D
     A    2, 3    1, 6
     D    0, 1    0, 1

Figure 1: Indefinite Punishment is not credible

However, consider the game in Figure 1. The Row player can hold the Column player down to his minimum payoff by choosing D, but D is strictly dominated for her. Hence, while (2, 3) can be enforced as a Nash equilibrium payoff profile in the infinitely repeated version of the game by assuming that players switch to (D, D) following a deviation, this would not work if we required subgame perfection, regardless of the payoff aggregation criterion.
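As a quick check of Definition 2 against Figure 1, here is a brute-force computation of the pure-action minmax payoffs (a sketch I added; the dictionary encoding of the payoff matrix is my own).

```python
# Sketch (mine): pure-action minmax payoffs (Definition 2) for Figure 1.
# u[(a_row, a_col)] = (Row's payoff, Column's payoff).
u = {("A", "A"): (2, 3), ("A", "D"): (1, 6),
     ("D", "A"): (0, 1), ("D", "D"): (0, 1)}
actions = ("A", "D")

# v_i = min over the opponent's actions of Player i's best-response payoff.
v_row = min(max(u[(ar, ac)][0] for ar in actions) for ac in actions)
v_col = min(max(u[(ar, ac)][1] for ac in actions) for ar in actions)

print(v_row, v_col)  # 1 1: each player minmaxes the other by playing D
```

Note that holding Column down to $v_{col} = 1$ requires Row to play the strictly dominated action D, which is exactly the credibility problem just discussed.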

A warm-up exercise: Perfect Folk Theorems for Limit-of-Means

Thus, we must be smarter than that. A key intuition is that, after all, a deviator need not be punished forever, but only long enough to wipe out any gains from his deviation.

Formally, fix a game $G = (N, (A_i, u_i)_{i \in N})$ and let $M = \max_{i \in N, a \in A} u_i(a)$. Fix an outcome $a^*$ of G; clearly, a one-time deviation by Player i cannot yield more than $M - u_i(a^*)$. Thus, if Player i deviates, we need only punish him so as to wipe out this payoff differential; clearly, how long this will take depends on the payoff aggregation criterion.

Intuitively, limit-of-means should make things very simple, because, following a deviation, punishers face only a finite number of periods in which they must forego their equilibrium payoff stream in order to punish Player i. Under limit-of-means aggregation, finite periods do not matter, so punishers are indifferent between punishing and not punishing.

There is only one subtlety: of course, the same argument applies to Player i: no one-time deviation can be profitable for her! So, what exactly are we trying to deter?

The answer is that, although no finite deviation from the infinite repetition of $a^*$ is profitable for Player i, the following strategy could be. Suppose that, in the putative equilibrium we wish to construct, a deviation is followed by L rounds of punishment (you can think of L = 1 if you wish). Thus, if Player i deviates once, she gets an extra payoff of (at most) $M - u_i(a^*)$, but then loses $u_i(a^*) - v_i$ utils in each of the subsequent L periods.

Now suppose L is small, so that $M - u_i(a^*) > L[u_i(a^*) - v_i]$. For example, in the game of Figure 1, suppose that i is the Column player, that $a^* = (A, A)$, and that L = 1. Then a deviation yields 3 utils, whereas one round of punishment costs 2 utils. Then, Player i can adopt the following strategy: deviate, then play a best-response to $p_{-i}$ (in Figure 1, play D) for L periods; then, as soon as play is supposed to return to $a^*$, deviate and best-respond to the minmax profile, and so on. This is a profitable deviation. [Observe that it is also a neat example of a game in which the one-deviation property fails!]

Thus, we must choose L large enough so that

\[ \forall i \in N: \quad M - u_i(a^*) < L\,[u_i(a^*) - v_i]. \]

In Figure 1, it is enough to choose L = 2 (see the numerical check after Proposition 0.1 below).

To complete the argument, we must specify what happens if more than one player deviates, or if somebody deviates from the punishment stage. As in the proof of the Nash Folk Theorem, multiple deviations lead to the lowest-index player being punished (they will not occur in equilibrium anyway, so players cannot hope to get away with deviating because somebody else will deviate, too).

Finally, if one or more punishers fail to punish, we simply disregard these further (unprofitable) deviations: again, in equilibrium they are not going to occur, so players cannot count on them to improve their predicament after a first deviation. This proves:

Proposition 0.1 [OR, Proposition 146.2] Fix a game G. Any feasible, strictly enforceable payoff profile of G is a subgame-perfect equilibrium payoff profile of the limit-of-means infinitely repeated version of G.
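To see where L = 2 comes from, here is a small check (my sketch, not part of OR's proof). It uses each player's actual one-shot deviation gain $\max_{a_i} u_i(a_i, a^*_{-i}) - u_i(a^*)$ in place of the uniform bound $M - u_i(a^*)$ in the displayed condition; the uniform bound is loose for the Row player, who cannot gain at all by deviating from $a^*$.

```python
# Sketch (mine): smallest L wiping out each player's one-shot deviation gain
# in Figure 1 with a* = (A, A). Payoff encoding as in the earlier sketch.
u = {("A", "A"): (2, 3), ("A", "D"): (1, 6),
     ("D", "A"): (0, 1), ("D", "D"): (0, 1)}
actions = ("A", "D")
v = (1, 1)              # pure-action minmax payoffs, computed earlier
a_star = ("A", "A")

def min_punishment_length(i):
    """Smallest L with gain_i < L * (u_i(a*) - v_i); assumes u_i(a*) > v_i."""
    if i == 0:  # Row deviates: vary her action, keep Column's fixed
        gain = max(u[(a, a_star[1])][0] for a in actions) - u[a_star][0]
    else:       # Column deviates
        gain = max(u[(a_star[0], a)][1] for a in actions) - u[a_star][1]
    loss_per_period = u[a_star][i] - v[i]   # foregone utils per punishment round
    L = 1
    while gain >= L * loss_per_period:
        L += 1
    return L

print(min_punishment_length(0), min_punishment_length(1))  # 1 and 2
```

So the binding constraint is the Column player's, and L = 2 suffices, as claimed.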

Machines

You will undoubtedly notice that describing these strategies verbally is awkward; doing so formally (as we have tried to do in the proof of the Nash Folk Theorem) is even worse. Machines can help. For instance, here is a set of machines that players can use to implement the strategies in the proof of Proposition 0.1: Player i uses machine $M_i = (Q, q^0, f_i, \tau)$ (where Q, $q^0$ and $\tau$ are common to all players) defined as follows.

First, $Q = \{N\} \cup \{P(j, t) : j \in N,\ 0 \le t \le L\}$. N is the normal state, in which $a^*$ is played. P(j, t) is the state in which j is punished, and t more rounds of punishment are required.

Second, $q^0 = N$.

Third, $f_i(N) = a^*_i$, $f_i(P(j, t)) = p_{-j,i}$ if $j \ne i$, and $f_j(P(j, t)) = r_j(p_{-j})$. This should be obvious.

Finally, $\tau(\cdot, \cdot)$ is such that we remain in N if nobody deviates, we switch from N to P(j, L) if j is the lowest-index deviator, we always move from P(j, t) to P(j, t-1) if $t \ne 0$, and we always move from P(j, 0) back to N. Easy!
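Here is the same machine in code, specialized to Figure 1 with $a^* = (A, A)$ and L = 2 (a sketch of mine, not OR's). One assumption: the punished Column player best-responds to D by playing D, as in the Discounting example below, though A would do equally well.

```python
# Sketch (mine): the machine M_i = (Q, q0, f_i, tau) above, for Figure 1 with
# a* = (A, A) and L = 2. States: "N" or ("P", j, t) with t rounds remaining.
L = 2
A_STAR = ("A", "A")

def f(i, q):
    """Action function f_i."""
    if q == "N":
        return A_STAR[i]                 # f_i(N) = a*_i
    # Punishment state ("P", j, t): the punisher plays p_{-j,i} = D, and the
    # punished player plays r_j(p_{-j}) = D (both A and D yield 1 against D;
    # the Discounting example below has the punished player choose D).
    return "D"

def tau(q, a):
    """Common transition function."""
    if q == "N":
        deviators = [i for i in (0, 1) if a[i] != A_STAR[i]]
        return ("P", min(deviators), L) if deviators else "N"
    _, j, t = q
    # Deviations during punishment are simply disregarded: just count down.
    return ("P", j, t - 1) if t != 0 else "N"

# Column (player 1) deviates once; thereafter both follow their machines.
q = "N"
for stage in range(6):
    a = ("A", "D") if stage == 0 else (f(0, q), f(1, q))
    print(stage + 1, q, a)
    q = tau(q, a)
# Output: N, then P(1,2), P(1,1), P(1,0) with profile (D, D), then N again.
```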

Discounting

The strategy profile thus constructed is not a subgame-perfect equilibrium of the game in Figure 1 if payoffs are aggregated via discounting, for any discount factor. Suppose the Column player deviates: then the Row player is supposed to choose D for 2 periods, and hence receive a flow payoff of 0 utils. She will then receive 2 forever after. However, if she deviates to A, nothing happens: the Column player continues to choose D, so again the Row player can choose A. Obviously, she prefers (1, 1, 2, 2, ...) to (0, 0, 2, 2, ...)!

Thus, the key intuition is that we must somehow ensure that punishers are willing to punish. In principle, one could think of punishing punishers, but this may fail to work with discounting: essentially, second deviations might require longer punishment periods (because the burden of carrying out the first punishment lasts for L periods, and not just one), third deviations might require even longer punishments, and so on. This is certainly the case in the game of Figure 1.

The alternative is to reward punishers. This leads to OR's Proposition 151.1. I am only going to offer a few comments on the proof. The "conciliation" states C(j) serve to reward punishers, in a way. However, this is subtle: we never go back to the Nash state C(0). What happens is, if j deviates and i punishes him, then after punishment we move to C(j), which i prefers to C(i). Otherwise, we go to C(i) and stay there until somebody else deviates.[1] In my opinion, this is also a punishment of sorts, but of course you are welcome to differ.

Also: the remark about the first condition on $\delta$ being sufficient has to do with the fact that, after L periods of punishment, we move to C(j); since $u_i(a(j)) < u_i(a(0))$ by assumption, this is actually a further punishment (which, er, actually reinforces my interpretation, but never mind that). The point is that this punishment may not be enough, or may come too late, so we make sure that Player j is hit really hard for L periods before switching to the conciliation phase.

Finally, the second condition on $\delta$ has the following interpretation: by punishing Player j, Player i potentially loses $M - u_i(p_{-j}, r_j(p_{-j}))$ for the L periods following Player j's deviation (be it a deviation from a(0) or from whatever j was supposed to play). On the other hand, after L rounds of punishment, the game will switch to the C(j) conciliation stage. Now, Player i prefers to be in the C(j) state rather than in the C(i) state, by assumption, so if the discount factor is large enough, she will not deviate.

[The only subtle point is that she may actually deviate at any $t \in \{1, \dots, L\}$ (where time is measured starting from the first stage after Player j's deviation). If she deviates at t, she will actually be held down to $v_i$ starting from t+1 and until $t + L \ge L + 1$; the condition in the text is thus stronger than it needs to be (it assumes a payoff of M from t+1 to L, and a punishment payoff of $u_i(a(i))$ from L+1 to t+L).]

[1] Again, let me repeat this because I first got it wrong in class (but then you guys spotted me!): whenever we are in state C(j), we remain there unless somebody deviates, and we move to the state P(k, L), in which Player k's punishment begins, if k deviates from a(j). Thus, what supports a(j) as a continuation equilibrium after a deviation by Player j is the threat of further punishment; a(j) need not be a Nash equilibrium per se.
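As a quick numerical sanity check of the Discounting example above (my sketch, not in OR): the Row player's gain from abandoning the punishment, stream (1, 1, 2, 2, ...) versus (0, 0, 2, 2, ...), is positive for every discount factor, confirming that no $\delta$ rescues the simple L-period punishment profile in Figure 1.

```python
# Sketch (mine): normalized discounted values of the two payoff streams from
# the Discounting example: (1, 1, 2, 2, ...) vs (0, 0, 2, 2, ...).

def value(head, tail, delta):
    """(1 - delta) * sum_{t>=1} delta^(t-1) u_t for the stream head + (tail repeated)."""
    total, disc = 0.0, 1.0
    for u in head:
        total += disc * u
        disc *= delta
    total += disc * tail / (1 - delta)    # exact geometric tail
    return (1 - delta) * total

for delta in (0.1, 0.5, 0.9, 0.99):
    gain = value((1, 1), 2, delta) - value((0, 0), 2, delta)
    print(delta, round(gain, 6))          # equals 1 - delta**2 > 0 for all delta < 1
```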