Resilient cooperators stabilize long-run cooperation in Prisoner’s Dilemma

Experimental design

Our experiment was designed to closely resemble a number of previous studies of finitely repeated Prisoner's Dilemma (PD)^2,3,10. Anonymous individuals were randomly paired to play a series of ten-round repeated games of PD, where in each round each player was required to choose one of two actions, cooperate (C) or defect (D), after which they received a payoff from the payoff matrix displayed in Table 1 (see also Supplementary Fig. 1 for screenshots). We note that the payoffs were chosen to satisfy the usual PD inequalities T>R>P>S (here T=7, R=5, P=3, S=1) and 2R>T+S; moreover, they were chosen to correspond to the normalized quantities g=(T−R)/(R−P)=1 and l=(P−S)/(R−P)=1, which are toward the low end of the normal range for previous studies^2,3,10,33,34,35,36. After each round, both players were shown the action of the other player, and each could see their own payoff as well as cumulative payoffs up to that game and for the entire experiment (see Supplementary Fig. 1). After each ten-round game players entered a virtual waiting room until all other games had completed (a counter informed players how many others were also waiting), at which point they were randomly reassigned to new partners and a new set of games commenced. This process was repeated 20 times over the course of a single session, where we again emphasize that players remained anonymous and unidentifiable throughout (see Supplementary Fig. 2 for a visual representation of a single day).

Table 1: Per round payoff for (Row Player, Column Player).
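As a sanity check on the payoffs stated above, the following minimal sketch encodes the per-round payoff matrix implied by T=7, R=5, P=3, S=1 and verifies the PD inequalities. The normalization g=(T−R)/(R−P), l=(P−S)/(R−P) is an assumption taken from the usual convention in this literature (it reproduces the stated values g=l=1); the function names are illustrative only.

```python
# Payoffs as stated in the text: temptation, reward, punishment, sucker.
T, R, P, S = 7, 5, 3, 1

# Per-round payoffs as (row player, column player),
# indexed by (row action, column action).
PAYOFFS = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}

def is_prisoners_dilemma(t, r, p, s):
    """True when T > R > P > S and 2R > T + S."""
    return t > r > p > s and 2 * r > t + s

def normalized_gain_loss(t, r, p, s):
    """Return (g, l) in units of R - P (assumed normalization)."""
    g = (t - r) / (r - p)  # gain from defecting on a cooperator
    l = (p - s) / (r - p)  # loss from cooperating with a defector
    return g, l

print(is_prisoners_dilemma(T, R, P, S))
print(normalized_gain_loss(T, R, P, S))
```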

Our experiment’s main point of departure from previous work was that rather than conducting our experiment for a single session we retained the same population of subjects for 20 such sessions, held at the same time on consecutive weekdays over the period 4 August – 31 August 2015. The experiment commenced with 113 subjects recruited in advance from Amazon’s Mechanical Turk. To minimize latency in the user interface and language barriers in delivering instructions, we restricted participation to residents of the US and Canada; however, the subject pool was otherwise diverse with respect to location (31 US states), age (18–61) and gender (47% female) (see Supplementary Figs 3 and 4 for more details of the player population). Also to minimize latency, we split the population into two sessions held each day at 13:00 hours EDT (n=56) and 15:00 hours EDT (n=57), respectively. Players were assigned randomly to a session at the outset of the experiment and were retained in that session for the duration of the experiment. Although there were some slight differences between the two sessions, behaviour—including attrition—was qualitatively indistinguishable, thus for all results stated in the main text we treat the two sessions as a single population (noting that pooling of subjects from multiple experimental sessions is a common practice in traditional lab experiments). Sessions lasted an average of 35 min and players were paid in proportion to their cumulative payoff. Players earned an average of $4.47 per session corresponding to an hourly wage of ∼$7.66, substantially higher than the self-reported average wage for tasks on Mechanical Turk³⁷. To minimize attrition, we also offered an additional one-time bonus of $20 for completing at least 18 of the 20 sessions, payable at the end of the experiment. 
Subjects who missed more than two sessions were excluded from the experiment and prevented from completing any remaining sessions, thereby forfeiting the bonus along with any unearned compensation. Of the initial population 94 subjects (83%) satisfied our completion criterion, earning an average variable compensation of $87.03 and $107.03 in total (we found no significant differences between dropouts and non-dropouts; see ‘Methods’ section for more details of recruiting and attrition). Over the course of the experiment these subjects played an average of 372 ten-round games each, making 3,720 individual decisions each for a total of 374,251 decisions collectively (see Supplementary Fig. 5 for a visual representation of the entire experiment).

Initial cooperation and unravelling

Figure 1a shows cooperation levels in rounds 1 (green), 8 (blue), 9 (purple) and 10 (red) over the course of the experiment. On day 1 the first round cooperation rates started at over 80%, a figure that is not unprecedented among previous studies³⁸, but is substantially higher than the usual range of 40–60% (refs 3, 9, 30, 33). There are a number of reasons why our set-up may have led to overall higher-than-typical cooperation. First, although previous work^39,40 has found that players recruited from MTurk cooperate at similar rates to those in lab studies, it is possible that the recent evolution of the MTurk community has resulted in a population that is more cooperative than the usual, also non-representative⁴¹, population of subjects present in traditional lab experiments. Second, prior work¹⁰ has noted that cooperation rates in finitely repeated games are sensitive to choices in the game matrix parameters g and l, where lower values correspond to more cooperation. As noted above, our values g=1 and l=1 were at the low end of previous studies, thus it is not surprising that we recover relatively high cooperation rates. Third, prior work¹⁰ has also shown that the duration of a finitely repeated game is highly predictive of initial cooperation levels. Our games, which were ten rounds long, were relatively long compared with previous experiments; thus once again it is not surprising that cooperation levels were relatively high. Moreover, analogous logic would suggest that the overall duration of the experiment could also be related to cooperation levels. Because our design required us to inform participants about the length of the experiment, this knowledge may also have led to more cooperative behaviour. Finally, although players were not explicitly told the size of the population with whom they were being matched, they could have inferred this information from the counter in the virtual waiting room. 
Likewise, they were not directly informed that they were playing with the same population every day but could have inferred as much from their instructions, and hence could have reasonably concluded that they would anonymously encounter the same players several times over the course of the experiment. It is plausible, therefore, that the general expectation of repeated interactions also facilitated cooperative behaviour.

In other respects Fig. 1 shows that early behaviour closely resembled results from similar previous experiments. Specifically, Fig. 1b shows that cooperation levels, which remained high during the early rounds of each repeated game, dropped to a relatively low level in the final rounds, exhibiting the so-called ‘end game’ effect predicted by the rational cooperation hypothesis¹. Moreover, between games cooperation levels exhibited the well documented ‘restart effect’⁴² in which cooperation jumps sharply from the last round of game j to the first round of game j+1. Other than the relatively high average level of cooperation, therefore, the dynamics of session play were qualitatively similar to previous experiments of comparable duration^2,3,10,38. Importantly, first session play also lends support to the rational cooperation hypothesis: cooperation levels in round 1 (green line) increased slightly over the course of the session, but decreased steadily for rounds 9 (purple line) and 10 (red line), consistent with previous claims of unravelling^2,10,31. Also importantly, Fig. 1a shows that the decrease in cooperation during rounds 9 and 10 continued for several days, but then slowed dramatically for the remainder of the experiment. Supporting this claim, Fig. 1c,d show that cooperation levels on days 11 and 20, respectively, continued to start high for each game and drop sharply as the end-game approached, but that there was much less change over the course of a session. Moreover, the relatively small decreases in rounds 9 and 10 cooperation that did occur over the course of a session largely ‘reset’ themselves at the start of the next session such that there was little change from day to day.

Unravelling stabilizes after several days

Figure 2 shows the same general trends in three different ways. First, Fig. 2a shows the average rate of cooperation by round, broken down by day. Consistent with the observations from Fig. 1, the pattern of cooperation at first changes from day to day, increasing in early rounds and decreasing in later rounds, but then appears to stabilize after several days (green through purple). Second, Fig. 2b shows the daily average of the game restart effect (that is, the difference in cooperation between round 10 of game j and round 1 of game j+1) over the course of the experiment. Again consistent with the results above, the restart effect increases sharply for the first several days as the end-game effect visible in Fig. 2a becomes more pronounced, but again it stabilizes after several days. Finally, Fig. 2c shows the session restart effect (as distinct from the game restart effect): the difference in cooperation levels for rounds 9 and 10, respectively, during game 1 of day d+1 compared with game 20 of day d (orange box plots). For comparison, Fig. 2c also shows the corresponding difference between successive games within the same session (green box plots). Whereas the across-game difference is slightly negative within a session, the across-session effect is large and positive (17.2 and 13.6% for rounds 9 and 10, respectively), largely accounting for the ‘reset’ effect noted above in Fig. 1.
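The two restart effects described above can be sketched as simple differences of cooperation rates. This is a minimal illustration, not the analysis code used for Fig. 2: the data layout (`coop_by_game[j][r]` holding the cooperation rate in round r+1 of game j+1), the sign convention (later minus earlier, so a jump in cooperation is positive) and the toy numbers are all assumptions made for the example.

```python
def game_restart_effect(coop_by_game):
    """Mean jump from round 10 of game j to round 1 of game j+1
    within one session."""
    jumps = [nxt[0] - cur[-1]
             for cur, nxt in zip(coop_by_game, coop_by_game[1:])]
    return sum(jumps) / len(jumps)

def session_restart_effect(day_d, day_next, round_number):
    """Change in cooperation for a given round between the last game
    of day d and the first game of day d+1."""
    return day_next[0][round_number - 1] - day_d[-1][round_number - 1]

# Hypothetical cooperation rates (two games per day, ten rounds each);
# these are NOT experimental values.
day1 = [[0.8] * 8 + [0.5, 0.3], [0.8] * 8 + [0.4, 0.2]]
day2 = [[0.9] * 8 + [0.6, 0.4], [0.9] * 8 + [0.5, 0.3]]

print(game_restart_effect(day1))
print(session_restart_effect(day1, day2, round_number=9))
print(session_restart_effect(day1, day2, round_number=10))
```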

**Figure 2: Stabilization of cooperation.**

Taken together, Figs 1 and 2 suggest that play can be broken into two phases: an ‘unravelling’ phase during which players start defecting on progressively earlier rounds, and a ‘stable’ phase during which unravelling abates. To examine this division more systematically, Fig. 3 shows the distribution of the round of first defection, r_d, for each day of the experiment. To identify the onset of a stable phase, we apply a two-sample Kolmogorov–Smirnov (K–S) test to successive days, finding that day-to-day changes are significant up to day 7 but not thereafter (see ‘Methods’ section for details). In addition, the onset of a ‘stable’ state at roughly day 7 can be inferred in at least two other ways: first, by noting the change of slope in the cooperation rates for rounds 9 and 10 (Fig. 1a); and second, by observing the between-game ‘restart effect’, which rises for the first several days and then stabilizes, again around day 7 (see Fig. 2b). Although these measures are less precise than the K–S test applied to the distribution of round of first defection, they both yield similar results. We therefore identify day 7 as the end of the unravelling phase (although we note that the precise day on which stabilization occurs is relatively unimportant for our results) and hereafter treat the period spanning days 7–20 as the stable phase.
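The day-to-day comparison described above can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the exact procedure is given in the ‘Methods’ section, the p-value here uses a standard asymptotic approximation to the Kolmogorov distribution, games without any defection are encoded as r_d=11, and the sample values are made up. Because r_d is discrete, the resulting p-values are generally conservative.

```python
import bisect
import math

def ks_two_sample(xs, ys):
    """Two-sample K-S statistic D and an asymptotic p-value."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        # Empirical CDFs at v: fraction of observations <= v.
        fx = bisect.bisect_right(xs, v) / n
        fy = bisect.bisect_right(ys, v) / m
        d = max(d, abs(fx - fy))
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:
        # The alternating series converges slowly here; p is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))

# Hypothetical rounds of first defection per day (11 = no defection).
days = [
    [3, 9, 9, 10, 10, 11, 11, 11] * 10,
    [8, 9, 9, 9, 10, 10, 11, 11] * 10,
    [8, 9, 9, 9, 10, 10, 11, 11] * 10,
]
for day, (prev, cur) in enumerate(zip(days, days[1:]), start=2):
    d, p = ks_two_sample(prev, cur)
    print(f"day {day - 1} vs day {day}: D={d:.3f}, p={p:.3f}")
```

In practice a library routine such as SciPy's `ks_2samp` would normally be preferred; the hand-rolled version is shown only to keep the example dependency-free.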

**Figure 3: Stabilization of defection.**

Figure 3 also reveals three additional trends of interest. First, during the unravelling phase the left-hand bar, comprising a small group of early defectors, largely disappears, consistent with the assertion¹⁰ that players first converge on one of a number of ‘threshold’ strategies. That is, they cooperate conditionally until some predetermined round r_i after which they defect unconditionally (one player continued to defect in all rounds throughout the experiment). Second, among initially cooperative players there is a drift toward earlier first defection, again consistent with the conjecture that rational players, having settled on a threshold strategy, begin to slowly unravel. Finally, however, Fig. 3 also provides some direct evidence for the existence of a significant minority of players who do not appear to follow the unravelling pattern. Specifically, we observe that fully cooperative games occurred at rates between 15 and 20% for the duration of the experiment. Since players were paired randomly, and a game where neither player defected requires both players to be conditional cooperators, a frequency of 16% of games with no defection implies a 40% frequency of conditional cooperators (0.4 × 0.4 = 0.16).
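The arithmetic behind this inference can be made explicit. Under the simplifying assumptions that partners are drawn independently from a large population and that every pairing of two conditional cooperators yields a fully cooperative game, the rate of fully cooperative games is p², so p is its square root. The helper name below is illustrative.

```python
import math

def implied_cc_fraction(fully_cooperative_rate):
    """Fraction p of conditional cooperators such that p**2 equals
    the observed rate of games with no defection."""
    return math.sqrt(fully_cooperative_rate)

for rate in (0.15, 0.16, 0.20):
    print(f"{rate:.2f} -> {implied_cc_fraction(rate):.3f}")
```

Applied to the observed 15 to 20% range, this gives an implied fraction of roughly 39 to 45%.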

Identification of resilient cooperators

Summarizing, Figs 1, 2, 3 suggest that, consistent with the rational cooperation hypothesis, a majority of players first converge onto one of a number of threshold rules, and then subsequently exhibit ‘unravelling’ as their thresholds creep earlier with experience. Strikingly, however, Figs 1, 2, 3 also suggest that a significant minority do not exhibit this pattern, but rather consistently behave like conditional cooperators. To test for these different player types more systematically, we exploit the roughly 3,720 observations per player to identify individual-level strategies as well as their evolution over time. Specifically, we estimate for each player i a unique strategy s_i(j) for each game j from among eleven predefined strategies: ten ‘threshold’ strategies T_x for each round x=1, …, 10, according to which a player conditionally cooperates up to round x−1 and then defects unilaterally from round x, and CC for players who conditionally cooperate for the duration of the game (see ‘Methods’ section for details). Figure 4a shows inferred strategies for the 94 players who completed the experiment: each row of 400 cells represents a single player i, where each cell is coloured to indicate i’s inferred strategy for a single game j. Figure 4a reveals three main results. First, consistent with previous work^2,3,10, the 11 predefined strategies account for a large fraction of all player-game observations; specifically, the fraction of ‘other’ strategies declines from about 19% on day 1 to <1% by day 7 (see Supplementary Fig. 6). Second, Fig. 4a shows that roughly 60% (n=58) of players exhibited behaviour consistent with the rational cooperation hypothesis: starting out playing CC but then switching to progressively less cooperative threshold strategies (that is, T₁₀, T₉, T₈, T₇). Third, however, almost 40% of players (n=36) displayed no such systematic unravelling tendency, consistently playing CC throughout the experiment. 
Figure 4b, which shows a histogram of the percentage of games in which each player played CC during the stable interval (days 7–20), confirms that these 36 players, who occupy the right-hand mode of the histogram, all played CC in at least 80% of games. Finally, Fig. 4c shows the average daily payoffs for the 36 players who played CC (blue line) versus that of the other players (red line): the two groups had similar payoffs on the first day, when all players were cooperating at similar rates; however, for all subsequent days CC players received lower payoffs than threshold players by a large and significant margin (|t|>5.3, P<10⁻⁶ for each day d≥2).
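To illustrate why CC players earn less than threshold players, the sketch below plays one ten-round game between two strategies using the payoffs T=7, R=5, P=3, S=1. It is an assumption-laden toy, not the inference procedure from the ‘Methods’ section: conditional cooperation is modelled as defecting for the rest of the game once either player has defected, and CC is encoded as a threshold of 11 that is never reached.

```python
PAYOFFS = {
    ("C", "C"): (5, 5),
    ("C", "D"): (1, 7),
    ("D", "C"): (7, 1),
    ("D", "D"): (3, 3),
}
ROUNDS = 10
CC = ROUNDS + 1  # conditional cooperator: threshold never reached

def play_game(threshold_a, threshold_b, rounds=ROUNDS):
    """Total payoffs (a, b) when a plays T_a and b plays T_b."""
    total_a = total_b = 0
    defected = False  # has either player defected so far?
    for r in range(1, rounds + 1):
        # Defect from one's own threshold round onward, or in
        # retaliation once any defection has occurred.
        a = "D" if defected or r >= threshold_a else "C"
        b = "D" if defected or r >= threshold_b else "C"
        pay_a, pay_b = PAYOFFS[(a, b)]
        total_a += pay_a
        total_b += pay_b
        defected = defected or "D" in (a, b)
    return total_a, total_b

print(play_game(CC, CC))  # two conditional cooperators
print(play_game(9, CC))   # T_9 against a conditional cooperator
print(play_game(9, 9))    # two T_9 players
```

In this toy setting a T₉ player earns 50 against a CC partner, who is left with 44, whereas two CC players earn 50 each and two T₉ players earn 46 each.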

On the basis of this evidence we conclude (a) that roughly 40% of players were ‘resilient cooperators’ who persistently behaved as conditional cooperators even at substantial cost to themselves; and (b) the remainder were ‘rational’ in that they cooperated only inasmuch as they believed it was in their selfish best interest to do so. We also confirm this behavioural classification of resilient cooperators with self-reported evidence from an exit survey conducted at the completion of the experiment; of the 94 subjects who completed the entire experiment, 38 reported that they had intentionally cooperated as long as their partner did, and had resisted the temptation to defect first. Moreover, they reported that they had maintained this strategy throughout the experiment even after perceiving others to have behaved selfishly (see ‘Methods’ section for more details of self-reported strategies). Importantly we found that 33 of the individuals whom we identified as conditional cooperators in this manner were also among the 36 individuals in the right-hand mode of Fig. 4b, indicating extremely high agreement between quantitative and qualitative classification schemes (see Supplementary Fig. 7 for additional analysis of resilient cooperators by gender and age, and Supplementary Fig. 8 for analysis by experience).

Resilient cooperators permanently stabilize cooperation

The existence of resilient cooperators in turn suggests an explanation for the observed slowdown in unravelling: as the rational players learned the true fraction of conditional cooperators in the population, they converged on a ‘partially unravelled’ state that balanced the risk of exploitation by other rational players with the potential gains from cooperation with CC players. If correct, this explanation would also suggest that the observed slowdown was permanent and that cooperation levels by the end of the experiment were close to their asymptotic limit. To test these related hypotheses we simulated an agent-based model comprising two types of agents: resilient cooperators who always play CC for the entire duration; and ‘rational’ players who continually update their beliefs about the distribution of player types in the population and then choose among available threshold strategies T_x so as to maximize their expected payoff given their beliefs. Specifically, in each game the rational players: (a) form beliefs about the strategies being played by other agents based on their past opponents’ play; (b) conditional on these beliefs, calculate their expected utility for each available strategy; and (c) stochastically update their current strategy in proportion to each potential strategy’s expected utility (see ‘Methods’ section for details). By systematically varying the fraction α of resilient cooperators we can explore their impact on unravelling.
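Steps (a) to (c) can be sketched for a single rational agent as follows. This is a simplified illustration rather than the model specified in the ‘Methods’ section: it assumes opponents' thresholds are observed directly (in a real game they are censored whenever the agent defects first), treats CC as an eleventh option with threshold 11, uses empirical frequencies as beliefs, and samples strategies with weights equal to expected payoff. All function names are hypothetical.

```python
import random

# Row player's per-round payoff, with T=7, R=5, P=3, S=1.
PAYOFFS = {("C", "C"): 5, ("C", "D"): 1, ("D", "C"): 7, ("D", "D"): 3}
ROUNDS = 10
CC = ROUNDS + 1
STRATEGIES = list(range(1, CC + 1))  # T_1 ... T_10, plus CC as 11

def payoff(own, other):
    """Own total payoff when playing threshold `own` against `other`."""
    total, defected = 0, False
    for r in range(1, ROUNDS + 1):
        a = "D" if defected or r >= own else "C"
        b = "D" if defected or r >= other else "C"
        total += PAYOFFS[(a, b)]
        defected = defected or "D" in (a, b)
    return total

def beliefs_from_history(observed):
    """(a) Empirical frequencies of opponents' thresholds
    (uniform when there is no history yet)."""
    if not observed:
        return {s: 1 / len(STRATEGIES) for s in STRATEGIES}
    return {s: observed.count(s) / len(observed) for s in STRATEGIES}

def expected_utilities(beliefs):
    """(b) Expected payoff of each strategy under the beliefs."""
    return {own: sum(p * payoff(own, other)
                     for other, p in beliefs.items())
            for own in STRATEGIES}

def choose_strategy(beliefs, rng=random):
    """(c) Sample a strategy with probability proportional to its
    expected utility."""
    eu = expected_utilities(beliefs)
    return rng.choices(list(eu), weights=list(eu.values()), k=1)[0]

history = [CC, CC, 9, 9, 10]  # hypothetical observed opponents
print(choose_strategy(beliefs_from_history(history), random.Random(0)))
```

Against a population believed to consist only of CC players, the highest expected payoff in this sketch belongs to T₁₀ (52 versus 50 for CC), while against a population of T₁ players it belongs to T₁, which is the tension the simulation explores as α varies.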

Figures 5a,b show the results of the simulation for α=0 and α=0.4, respectively, for N=100 agents. In the absence of resilient cooperators (Fig. 5a), rational players exhibit exactly the unravelling predicted by the rational cooperation hypothesis^1,2,10,31: over the course of 400 games, players unravel almost uniformly through T₁₀ all the way down to T₁, albeit progressively more slowly for lower thresholds. In contrast, when 40% of players are resilient cooperators (Fig. 5b), corresponding to what we observed in our experiment, unravelling is curtailed, with T₉ emerging as the modal strategy and significant fractions occupying T₁₀ and T₈. Encouragingly, Fig. 5b bears a close resemblance to Fig. 4a, suggesting that the entire distribution of steady-state strategies of agents in the simulation is similar to that for our experimental subjects.

**Figure 5: Resilient cooperators stabilize cooperation in an agent-based model.**

In addition to replicating the high-level results of our experiment, the learning model also makes two predictions. First, as shown in Fig. 6a, cooperation in rounds 8, 9 and 10 for the α=0.4 case remains stable for at least 4,000 games, ten times the length of our experiment. This result suggests that the apparent stabilization of cooperation that we observe in the experiment after 7 days is not simply a slowing down of the unravelling process, but an end to it. In other words, the model predicts that with sufficiently many resilient cooperators present in a population of rational cooperators, cooperation can be sustained indefinitely. Second, the model also makes a prediction about how many resilient cooperators are necessary to sustain cooperation even among rational cooperators. To show this result, we first define r_∞ as the average first round of defection r_d for rational players as it approaches its asymptotic limit (in practice we estimate r_∞ by running the simulations for at least 2,000 games). Figure 6b shows estimated r_∞ as a function of α along with the values of α≈0.4, r_∞≈8.2 obtained from our experiment (averaged over the stable phase, days 7–20). In addition to reinforcing the agreement between experiment and simulation noted above, Fig. 6b also predicts the full functional dependency of r_∞(α). Notably, r_∞ appears to undergo a sharp transition, resembling an epidemic threshold⁴³, at some critical value α_*≈0.1: for α<α_* unravelling progresses all the way to the beginning of the game (r_∞=1), whereas for α>α_*, r_∞ increases sharply and nonlinearly, eventually approaching r_∞=10 (that is, no unravelling) when α=1 (see Supplementary Figs 9 and 10 for robustness checks).

**Figure 6: Asymptotic behaviour of the simulation model.**
