Strategic Uncertainty in Markets for Nonrenewable Resources: A Level-k Approach

Existing models of nonrenewable resources assume that sophisticated agents compete with other sophisticated agents. This study instead uses a level-k approach to examine cases where the focal agent is uncertain about the strategy of his opponent or predicts that the opponent will act in a nonsophisticated manner. Level-0 players randomize uniformly across all possible actions, and level-k players best respond to the action of a level-(k − 1) player. We study a dynamic nonrenewable resource game with a large number of actions. We are able to solve for the level-1 strategy by reducing the averaging problem to an optimization problem against a single action. We show that lower levels of strategic reasoning are close to the Walras and collusive benchmarks, whereas higher level strategies converge to the Nash-Hotelling equilibrium. These results are then fitted to experimental data, suggesting that the level of sophistication of participants increased over the course of the experiment.


Introduction
Existing models of nonrenewable resource markets assume that optimizing agents compete against other optimizing agents in environments characterized by Cournot or Stackelberg competition (see [1][2][3][4][5] and many others). This implies that agents will, in the Nash equilibrium, have a perfect understanding of what their opponents will do. In reality, however, agents may experience considerable uncertainty about the strategy the opponent will follow. For example, agents may not have all the information (such as cost functions or stock sizes) required to calculate an opponent's Nash equilibrium strategy [6,7]. In such cases, they must form beliefs about the possible strategies chosen by the opponent and maximize their expected profits given these beliefs.
Similarly, even with perfect information, agents may in some cases expect their opponent not to follow the Nash-Hotelling equilibrium. For example, the opponent may face political pressure to maximize current revenue and produce at maximal capacity or otherwise be too focused on maximizing present revenue rather than discounted profits (see, e.g., [8][9][10]). In such cases, agents must again form beliefs about the possible strategies chosen by the opponent and maximize their expected profits given these beliefs.
We model the ensuing uncertainty about the opponent's chosen strategy using a level-k framework (see, e.g., [11][12][13][14][15][16]), a commonly used approach to model nonequilibrium opponents in behavioral economics. The framework starts by specifying the strategy for a level-0 player, who is assumed to choose a random production trajectory from the set of all possible trajectories. Level-k players then best respond to the strategy of a level-(k − 1) opponent. We use a standard linear demand function that allows us to compute the Nash, collusive, and Walrasian benchmarks. We then show that higher level strategies converge to the Nash-Hotelling equilibrium, while lower level strategies may closely approximate the Walras and collusive benchmarks. Finally, we fit the results to experimental data from van Veldhuizen and Sonnemans [17] to empirically estimate the distribution of types and find that participants appear to be using higher level strategies in later parts of the experiment.
The contribution of this paper is threefold. First, we add to the literature on nonrenewable resources by suggesting a novel way to analyze nonequilibrium behavior. Second, we contribute to the literature on level-k reasoning in behavioral economics by applying level-k reasoning to the novel setting of nonrenewable resources. Third, we fit our theoretical results to data from an existing laboratory experiment.

Model and Benchmarks
We assume that two players, player $k$ and player $k - 1$, are active in a nonrenewable resource market characterized by linear demand and Cournot competition. (The reason for giving players integer numbers as identifiers is that we will later assume that player $k$ plays the level-$k$ strategy.) Both players are assumed to start with a fixed stock of resources ($S_0$), which they can allocate over a discrete number of time periods ($T$). The quantity of the resource extracted by player $k$ in period $t$ is denoted as $q_t^k$. Further, let the set of quantities player $k$ allocates over the $T$ periods be denoted by $\mathbf{q}^k = (q_1^k, q_2^k, \ldots, q_T^k)$. We will refer to $\mathbf{q}^k$ as player $k$'s trajectory in the remainder of the text.
Given two trajectories $\mathbf{q}^k$ and $\mathbf{q}^{k-1}$ of players $k$ and $k - 1$, respectively, and assuming a linear demand function, the profit for player $k$ is defined as
$$\pi(\mathbf{q}^k, \mathbf{q}^{k-1}) = \sum_{t=1}^{T} \delta^{t-1} \left( a - b \left( q_t^k + q_t^{k-1} \right) \right) q_t^k. \tag{1}$$
Here, $a$ and $b$ are parameters of the demand function and $\delta$ is the discount factor. In line with the lab experiment, we will use symmetric firms and take $S_0 = 170$, $T = 6$, $b = 1$, $a = 372$, and $\delta = 1/(1 + r) = 1/1.1$. Player $k$'s problem is to choose a strategy that maximizes the sum of discounted profits, subject to the resource constraint:
$$\max_{\mathbf{q}^k} \ \pi(\mathbf{q}^k, \mathbf{q}^{k-1}) \quad \text{s.t.} \quad \sum_{t=1}^{T} q_t^k \le S_0, \quad q_t^k \ge 0 \quad (t = 1, \ldots, T). \tag{2}$$
Player $k - 1$ solves an analogous problem. There are three relevant benchmarks to consider: the Nash equilibrium, collusion, and Walras. For the Nash equilibrium, both players maximize their own profits given the strategy of the opponent. In the collusive case, the two players maximize joint profits, and in the Walras case, each player maximizes its profits while taking prices as given. Table 1 shows the relevant trajectories; for a more detailed derivation see van Veldhuizen and Sonnemans [17].
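The three benchmark trajectories can be cross-checked with a short script. The sketch below is not the derivation of van Veldhuizen and Sonnemans [17]; it assumes the profit specification (1) with zero extraction costs, uses each benchmark's per-firm first-order condition a − c·b·q_t = λ(1 + r)^{t−1} (c = 3 for Nash, c = 4 for collusion, c = 2 for Walras), and bisects on the initial shadow price λ until the stock is exhausted. The names `benchmark`, `trajectory`, and `CURVATURE` are hypothetical, and the quantities are continuous, so they need not match the rounded values reported in Table 1.

```python
"""Continuous Nash, collusive, and Walras benchmarks (illustrative sketch).

Assumptions: inverse demand p_t = a - b * (q1_t + q2_t), zero extraction cost,
profits discounted by delta**(t-1), symmetric firms, binding stock constraint.
"""
A, B, S0, T, R = 372.0, 1.0, 170.0, 6, 0.1

# Per-firm first-order condition: a - c * b * q_t = lam * (1 + r)**(t - 1),
# with c = 3 (Cournot-Nash), 4 (joint-profit collusion), 2 (price taking).
CURVATURE = {"nash": 3.0, "collusion": 4.0, "walras": 2.0}


def trajectory(lam, c):
    """Per-firm extraction path for a given initial shadow price lam."""
    return [max(0.0, (A - lam * (1 + R) ** t) / (c * B)) for t in range(T)]


def benchmark(name, tol=1e-10):
    """Bisect on the shadow price until the path uses exactly S0."""
    c = CURVATURE[name]
    lo, hi = 0.0, A  # total extraction decreases in lam and is zero at lam = A
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(trajectory(mid, c)) > S0:
            lo = mid  # extracting too much: raise the shadow price
        else:
            hi = mid
    return trajectory(0.5 * (lo + hi), c)


if __name__ == "__main__":
    for name in CURVATURE:
        print(name, [round(q, 2) for q in benchmark(name)])
```

The `max(0, ·)` term handles periods in which extraction stops; under these assumptions this happens only in the final period of the Walras path. The output also shows the ordering discussed next: collusion extracts less than Nash in early periods and Walras more.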
Two results are worth highlighting. First, all three benchmarks have decreasing trajectories. This follows directly from the standard result that the shadow price of the resource increases over time at the rate of interest [18]: since the relevant marginal revenue (the price, in the Walras case) must rise over time and decreases in the quantity extracted, extraction must fall over time. Second, relative to Nash, the collusive quantities are smaller in early periods and larger in late periods; the converse is true for Walras, which is again a standard result [19].

Level-𝑘 Approach
We now move away from the symmetric benchmarks discussed in the previous section in order to derive the optimal strategies for players of different levels $k$. As the first step, the level-k framework requires a level-0 strategy to be specified. A typical approach (see, e.g., [12][13][14]) is to assume that a level-0 player uniformly randomizes across all possible actions. In a dynamic game such as ours, randomization can occur at two levels: actions (i.e., the quantity chosen in period $t$) and trajectories (i.e., resource allocations over all periods). In addition, the standard framework does not tell us whether the resource constraint should be required to hold with equality. We will start by presenting the results for randomization over trajectories while allowing the resource constraint not to hold with equality. That is, we initially only require that each trajectory $\mathbf{q}^0 = (q_1^0, \ldots, q_T^0)$ satisfies
$$\sum_{t=1}^{T} q_t^0 \le S_0, \quad q_t^0 \ge 0 \quad (t = 1, \ldots, T).$$
We discuss the results of the other three cases in Section 5.
3.1. Level-0. We will start by considering the set of all possible trajectories of the level-0 player, denoted by $\mathcal{G}_0$. For the purpose of this derivation and to facilitate a comparison with the experimental data, we consider the case where players can only choose discrete extraction quantities. We therefore discretize the interval of possible quantities $Q := [0, S_0]$ into $n_q \in \mathbb{N} \setminus \{0, 1\}$ equidistant values with distance
$$\Delta := \frac{S_0}{n_q - 1}.$$
The discrete set of valid extraction quantities for a given period is then
$$\hat{Q} := \{0, \Delta, 2\Delta, \ldots, (n_q - 1)\Delta\} = \{i\Delta : i = 0, 1, \ldots, n_q - 1\}.$$
We denote the corresponding set of possible trajectories for the level-0 player that consist of quantities in $\hat{Q}$ by $\hat{\mathcal{G}}_0 \subseteq \mathcal{G}_0$.
Note that $\lim_{\Delta \to 0} \hat{\mathcal{G}}_0 = \mathcal{G}_0$. In the experiment we have that $\Delta = 1$, and therefore $\hat{\mathcal{G}}_0$ contains all trajectories whose extraction quantities are integers in $Q$.
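For very small grids, $\hat{\mathcal{G}}_0$ can be enumerated directly, which makes the counting arguments used in the proof of Proposition 1 easy to check. The sketch measures quantities in grid units (so the stock is M = S_0/Δ units) and compares the brute-force count with the stars-and-bars formula $\binom{M + T}{T}$; the function name `level0_trajectories` is hypothetical.

```python
from itertools import product
from math import comb


def level0_trajectories(m, T):
    """All trajectories (q_1, ..., q_T) on the grid {0, 1, ..., m} with sum <= m.

    Quantities are measured in grid units, i.e. m = S0 / Delta. This is brute
    force over (m + 1)**T candidates, so it is only usable for tiny m and T.
    """
    return [q for q in product(range(m + 1), repeat=T) if sum(q) <= m]


if __name__ == "__main__":
    trajs = level0_trajectories(2, 2)  # the S0 = 2, T = 2 example of Figure 1
    print(len(trajs), trajs)
    print(comb(2 + 2, 2))  # stars-and-bars count C(M + T, T) for M = 2, T = 2
```

For the experiment (S_0 = 170, Δ = 1, T = 6) the same formula gives C(176, 6) ≈ 3.8 × 10^10 trajectories, far too many to enumerate, which is why an averaging shortcut is useful.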
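We denote the cardinality of $\hat{\mathcal{G}}_0$ by $n_\ell := |\hat{\mathcal{G}}_0|$. We can now number the trajectories in $\hat{\mathcal{G}}_0$,
$$\hat{\mathcal{G}}_0 = \{\mathbf{q}^{0,0}, \mathbf{q}^{0,1}, \ldots, \mathbf{q}^{0,n_\ell - 1}\},$$
and add a corresponding index $\ell$ to the quantities defining each trajectory in $\hat{\mathcal{G}}_0$:
$$\mathbf{q}^{0,\ell} = (q_1^{0,\ell}, q_2^{0,\ell}, \ldots, q_T^{0,\ell}), \quad \ell = 0, 1, \ldots, n_\ell - 1.$$
3.2. Level-1. The next step is to derive the strategy of the level-1 player. The level-1 player's goal is to maximize the expected sum of his discounted profits $\pi$, conditional on his opponent playing the level-0 strategy. Player 1's objective $\Pi_1$ can be expressed as
$$\Pi_1(\mathbf{q}^1) = \frac{1}{n_\ell} \sum_{\ell=0}^{n_\ell - 1} \pi(\mathbf{q}^1, \mathbf{q}^{0,\ell}).$$
Then the optimization problem for level-1 reads as follows:
$$\max_{\mathbf{q}^1} \ \Pi_1(\mathbf{q}^1) \quad \text{s.t.} \quad \sum_{t=1}^{T} q_t^1 \le S_0, \quad q_t^1 \ge 0 \quad (t = 1, \ldots, T). \tag{9}$$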

Averaging Problem.
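Writing out $\Pi_1$ explicitly, using the definition of $\pi(\mathbf{q}^1, \mathbf{q}^0)$ given in (1), and changing the order of the two sums (the averaging over strategies and the sum over time periods), we can write
$$\Pi_1(\mathbf{q}^1) = \sum_{t=1}^{T} \delta^{t-1} \left( a - b \left( q_t^1 + \frac{1}{n_\ell} \sum_{\ell=0}^{n_\ell - 1} q_t^{0,\ell} \right) \right) q_t^1 = \sum_{t=1}^{T} \delta^{t-1} \left( a - b \left( q_t^1 + q_{\text{eff}}^t \right) \right) q_t^1, \qquad q_{\text{eff}}^t := \frac{1}{n_\ell} \sum_{\ell=0}^{n_\ell - 1} q_t^{0,\ell}.$$
This step works because $\pi$ is linear in the opponent's quantities. It allows us to accumulate the averaging process in an effective quantity $q_{\text{eff}}^t$ that expresses the average production of the level-0 player in period $t$.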
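The reduction rests only on π being linear in the opponent's quantities. A toy check with exact rational arithmetic (S_0 = 4 grid units, T = 3, and an arbitrary level-1 trajectory, all chosen purely for illustration) confirms that the average profit over all level-0 trajectories equals the profit against the period-wise average trajectory. The function names are hypothetical, and discounting by δ^{t−1} follows the form of (1) assumed above.

```python
from fractions import Fraction
from itertools import product

A, B = 372, 1
DELTA = Fraction(10, 11)  # delta = 1 / 1.1, kept exact


def profit(q_own, q_opp):
    """Discounted profit as in (1); enumerate() starts at 0, giving delta**(t-1)."""
    return sum(DELTA ** t * (A - B * (x + y)) * x
               for t, (x, y) in enumerate(zip(q_own, q_opp)))


def expected_profit(q_own, opponents):
    """Pi_1: average profit against a uniformly drawn opponent trajectory."""
    return sum(profit(q_own, q) for q in opponents) / len(opponents)


def effective_trajectory(opponents):
    """Period-wise average q_eff^t of the opponent trajectories."""
    n, T = len(opponents), len(opponents[0])
    return [Fraction(sum(q[t] for q in opponents), n) for t in range(T)]


if __name__ == "__main__":
    # Toy grid: S0 = 4 units, T = 3 periods (far smaller than the experiment).
    opponents = [q for q in product(range(5), repeat=3) if sum(q) <= 4]
    q1 = (2, 1, 1)
    print(expected_profit(q1, opponents) == profit(q1, effective_trajectory(opponents)))
```

For this toy grid the effective trajectory is (1, 1, 1), which already matches S_0/(T + 1) = 4/4 from Proposition 1.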
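Since the level-0 player is assumed to choose among the trajectories in $\hat{\mathcal{G}}_0$ in a uniformly random fashion, and $\hat{\mathcal{G}}_0$ is unchanged by any permutation of the periods, the averaging process of all the strategies is symmetric with respect to the time index. It follows that the value of $q_{\text{eff}}^t$ is independent of the period: $q_{\text{eff}}^t = q_{\text{eff}}$ for $t = 1, \ldots, T$.
Proposition 1. The value of $q_{\text{eff}}$ depends only on the number of periods and the total available resource per player:
$$q_{\text{eff}} = \frac{S_0}{T + 1}.$$
Proof. Let $M := S_0/\Delta = n_q - 1$, so that every trajectory in $\hat{\mathcal{G}}_0$ extracts a total of $s = m\Delta$ for some $m \in \{0, 1, \ldots, M\}$. We define the set $\hat{\mathcal{G}}_0^s \subset \hat{\mathcal{G}}_0$ that contains all trajectories with a total resource extraction of $s$: $\hat{\mathcal{G}}_0^s := \{\mathbf{q}^{0,\ell} : \sum_{t=1}^{T} q_t^{0,\ell} = s\}$.
The cardinality of $\hat{\mathcal{G}}_0^s$ is denoted by $n_{\hat{\mathcal{G}}_0^s} := |\hat{\mathcal{G}}_0^s|$. It equals the number of ways to distribute $m$ units of size $\Delta$ over $T$ periods and can be expressed using a binomial coefficient:
$$n_{\hat{\mathcal{G}}_0^s} = \binom{m + T - 1}{T - 1}.$$
Using the combinatorial identity $\sum_{n=0}^{N} \binom{n}{j} = \binom{N + 1}{j + 1}$, we can calculate the number of all possible strategies
$$n_\ell = \sum_{m=0}^{M} \binom{m + T - 1}{T - 1} = \binom{M + T}{T}. \tag{13}$$
We now rewrite $q_{\text{eff}}$ in terms of the sum over all elements of the matrix $(q_t^{0,\ell})$, grouping trajectories by their total extraction and using $m\binom{m + T - 1}{T - 1} = T\binom{m + T - 1}{T}$ together with the same identity:
$$q_{\text{eff}} = \frac{1}{T n_\ell} \sum_{\ell=0}^{n_\ell - 1} \sum_{t=1}^{T} q_t^{0,\ell} = \frac{\Delta}{T n_\ell} \sum_{m=0}^{M} m \binom{m + T - 1}{T - 1} = \frac{\Delta}{n_\ell} \sum_{m=0}^{M} \binom{m + T - 1}{T} = \frac{\Delta}{n_\ell} \binom{M + T}{T + 1}.$$
Using (13), we arrive at the following expression for $q_{\text{eff}}$:
$$q_{\text{eff}} = \Delta \, \frac{\binom{M + T}{T + 1}}{\binom{M + T}{T}} = \Delta \, \frac{M}{T + 1} = \frac{S_0}{T + 1}.$$
We emphasize that we obtained an expression for $q_{\text{eff}}$ that does not depend on $\Delta$, the step size of the discretization. It follows that we can now take the limit $\Delta \to 0$, $n_q \to \infty$ and recover the same result for the case of continuous allocation quantities ($\hat{\mathcal{G}}_0 \equiv \mathcal{G}_0$).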
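Proposition 1 can be checked numerically with the same grouping by total extraction that the proof uses. The sketch (hypothetical function name `q_eff`) evaluates the sum in exact arithmetic for the experimental parameters and for a grid twice as fine, illustrating that the result does not depend on Δ.

```python
from fractions import Fraction
from math import comb


def q_eff(M, T, step=Fraction(1)):
    """Average per-period level-0 quantity over all grid trajectories with sum <= M*step.

    Groups trajectories by their total m: there are C(m + T - 1, T - 1) with
    total m, and by symmetry each period carries 1/T of the extracted quantity.
    """
    n_total = comb(M + T, T)  # n_ell, number of trajectories
    extracted = sum(m * step * comb(m + T - 1, T - 1) for m in range(M + 1))
    return extracted / (T * n_total)


if __name__ == "__main__":
    print(q_eff(170, 6))                        # S0 = 170, Delta = 1: prints 170/7
    print(q_eff(340, 6, step=Fraction(1, 2)))   # same S0, Delta = 1/2: prints 170/7
```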

Solving the Level-1 Problem.
In the previous section, we have shown that problem (9), which requires the evaluation of a potentially large sum, can be reduced to a much simpler problem. In particular, we have shown that
$$\Pi_1(\mathbf{q}^1) = \pi(\mathbf{q}^1, \mathbf{q}_{\text{eff}}),$$
where
$$\mathbf{q}_{\text{eff}} := (q_{\text{eff}}, q_{\text{eff}}, \ldots, q_{\text{eff}}) \in \mathbb{R}^T.$$
Using this definition of $\Pi_1$, problem (9) is computationally equivalent to the original problem (2) in which the opponent is known to play a particular trajectory. We can now formulate this problem as a nonlinear program with one integer decision variable ($q_t^1$) for each of the 6 periods and find the globally best strategy by solving it numerically. We use the mixed-integer nonlinear programming (MINLP) solver SCIP [20] to solve the problem to global optimality. A phase space analysis of the level-1 problem can be found in Fügenschuh et al. [21].
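Because SCIP solves to global optimality, a global optimum is attained even when the problem is nonconvex, as is the case when producers choose between discrete extraction quantities, as in the experiment below. In the continuous case, the objective is concave in $\mathbf{q}^1$ (each period contributes a term $-b\delta^{t-1}(q_t^1)^2$ plus terms linear in $q_t^1$) and the feasible set is convex, so the problem is convex and the global optimum can also be found with a local NLP solver.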
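In the continuous case, the KKT conditions of this concave problem can be solved directly: in every active period, δ^{t−1}(a − b(2q_t + c_t)) equals the multiplier μ of the stock constraint, where c_t is the opponent's quantity. The sketch below swaps in a simple bisection on μ instead of a general NLP solver or SCIP, assumes the profit specification (1) with the parameters above, and drops integrality, so its output only approximates the integer level-1 trajectory in Table 1. The function name `best_response` is hypothetical.

```python
A, B, S0, T, R = 372.0, 1.0, 170.0, 6, 0.1


def best_response(opp, tol=1e-10):
    """Continuous best response to a known opponent path `opp` in problem (2).

    KKT conditions: q_t = max(0, (a - b*opp_t - mu*(1+r)**(t-1)) / (2b)),
    with the multiplier mu >= 0 chosen so that the stock constraint holds.
    """
    def path(mu):
        return [max(0.0, (A - B * c - mu * (1 + R) ** t) / (2 * B))
                for t, c in enumerate(opp)]

    if sum(path(0.0)) <= S0:          # stock constraint slack: mu = 0
        return path(0.0)
    lo, hi = 0.0, A                   # total extraction falls in mu; zero at mu = A
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(path(mid)) > S0:
            lo = mid                  # extracting too much: raise mu
        else:
            hi = mid
    return path(0.5 * (lo + hi))


if __name__ == "__main__":
    level1 = best_response([S0 / (T + 1)] * T)   # respond to q_eff = 170/7
    print([round(q, 2) for q in level1])
```

Against the constant level-0 quantity 170/7 this continuous version extracts roughly 59 units in period 1 and nothing in period 6, a trajectory tilted toward early periods.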

Solving for 𝑘 > 1.
We can now find solutions up to an arbitrary level-k using an iterative process. For each level $k$, we solve (2) using the trajectory computed at the previous level for the level-$(k - 1)$ player. We implemented this as a shell script that solves each level with an individual call to SCIP, our solver of choice, using the results of the previous level as input.
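The iteration over levels can also be sketched without SCIP. Because the objective of (2) is separable across periods and the periods are coupled only through the stock constraint, an exact integer best response can be computed by dynamic programming over the remaining stock. This is a swapped-in technique, not the authors' shell-script pipeline; it assumes Δ = 1, the profit specification (1), and the level-0 effective quantity 170/7 as the starting opponent, and all names are hypothetical.

```python
A, B, S0, T = 372, 1, 170, 6
DELTA = 1 / 1.1


def best_response_int(opp):
    """Exact integer best response to the opponent path `opp` (Delta = 1).

    The objective of (2) is separable across periods and coupled only by
    sum(q) <= S0, so V_t(s) = max_{0 <= q <= s} f_t(q) + V_{t+1}(s - q).
    """
    def f(t, q):  # discounted profit in period t (0-based) from extracting q
        return DELTA ** t * (A - B * (q + opp[t])) * q

    value = [0.0] * (S0 + 1)   # V_{T+1}(s) = 0: leftover stock is worthless
    policy = [None] * T        # policy[t][s] = optimal q_t when s units remain
    for t in reversed(range(T)):
        best_q, new_value = [], []
        for s in range(S0 + 1):
            q = max(range(s + 1), key=lambda x: f(t, x) + value[s - x])
            best_q.append(q)
            new_value.append(f(t, q) + value[s - q])
        policy[t], value = best_q, new_value
    path, stock = [], S0
    for t in range(T):         # roll the optimal policy forward from the full stock
        path.append(policy[t][stock])
        stock -= path[-1]
    return path


def level_k_paths(levels):
    """paths[k] is the level-k trajectory; level 0 is the effective quantity S0/(T+1)."""
    paths = [[S0 / (T + 1)] * T]
    for _ in range(levels):
        paths.append(best_response_int(paths[-1]))
    return paths


if __name__ == "__main__":
    for k, p in enumerate(level_k_paths(10)):
        print(k, p)
```

The dynamic program evaluates every feasible quantity for every stock level, so each best response is globally optimal for the integer problem, at a cost of O(T·S_0²) evaluations. Ties are broken toward the smaller quantity, which could differ from the tie-breaking of a MINLP solver.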

Computational Results
Table 1 displays the computational results. To allow for a better comparison with the experiment, we required all quantities to be integers for all levels except level-0 (which is an average). The results for level-k for $k \ge 1$ are identical if we start the iterations with a rounded effective quantity of 24 instead of the exact value of 170/7.
The level-0 player extracts on average a quantity of 170/7 ≈ 24.29 in each period. Relative to the Nash equilibrium, the level-0 player therefore on average shifts its production toward later periods. The level-1 player responds to this by moving production to earlier periods. As a result, the level-1 strategy is much closer to Walras than to the Nash equilibrium. Faced with the high initial extraction of the level-1 player, the level-2 player then responds by moving production back to later periods, and so on. This alternating process quickly converges to the Nash equilibrium quantities: at level-7, the solution has converged, that is, all players of level-k for $k \ge 7$ play exactly the same quantities.

Alternative Definitions of Level-0
In the previous sections, we assumed that the level-0 player picks any of the possible trajectories with equal probability. In this section, we consider the case where the player picks one of the possible actions (i.e., quantities) in each period with equal probability. As Figure 1 demonstrates for a stylized example, this leads to different probabilities for the strategies.
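Let us revisit the objective of the level-1 player for this definition of level-0. Denote by $P_\ell$ the probability with which the level-0 player ends up playing trajectory $\mathbf{q}^{0,\ell}$. Then
$$\Pi_1(\mathbf{q}^1) = \sum_{\ell=0}^{n_\ell - 1} P_\ell \, \pi(\mathbf{q}^1, \mathbf{q}^{0,\ell}) = \sum_{t=1}^{T} \delta^{t-1} \left( a - b \left( q_t^1 + q_{\text{eff2}}^t \right) \right) q_t^1, \qquad q_{\text{eff2}}^t := \sum_{\ell=0}^{n_\ell - 1} P_\ell \, q_t^{0,\ell}.$$
Again it is possible to condense the (weighted) averaging process in an effective quantity $q_{\text{eff2}}^t$. This is stated in the following proposition. Note that, in contrast to Proposition 1, the value of $q_{\text{eff2}}^t$ depends not only on the total available resource $S_0$, but also on the actual period $t$.
Proposition 2. For $t = 1, 2, \ldots, T$ it holds that
$$q_{\text{eff2}}^t = \frac{S_0}{2^t}.$$
Proof. By induction over the time period $t$. Denote by $R_t$ the residual resource at the start of period $t$ (so that $R_1 = S_0$) and by $\bar{R}_t$ its expected value. Given $R_t$, the level-0 player picks $q_t^0$ uniformly from $\{0, \Delta, \ldots, R_t\}$, so its conditional expectation is $R_t/2$; taking expectations over the tree (cf. Figure 1) gives $q_{\text{eff2}}^t = \bar{R}_t/2$. For $t = 1$ we have that
$$q_{\text{eff2}}^1 = \frac{1}{n_q} \sum_{i=0}^{n_q - 1} i\Delta = \frac{\Delta (n_q - 1)}{2} = \frac{S_0}{2}.$$
For the induction step, assume the proposition is true for some $t$. That means $q_{\text{eff2}}^t = S_0/2^t$, or equivalently $\bar{R}_t = S_0/2^{t-1}$. We show that the next period $t + 1$ leads to a division by 2 of the expected residual resource. In other words, it remains to show that $2\bar{R}_{t+1} = \bar{R}_t$. Since $R_{t+1} = R_t - q_t^0$,
$$\bar{R}_{t+1} = \bar{R}_t - q_{\text{eff2}}^t = \bar{R}_t - \frac{\bar{R}_t}{2} = \frac{\bar{R}_t}{2},$$
so that $q_{\text{eff2}}^{t+1} = \bar{R}_{t+1}/2 = \bar{R}_t/4 = S_0/2^{t+1}$, which completes the proof.
Figure 1: Probability tree for the case of randomizing over quantities for $S_0 = 2$ and $T = 2$. In the case where the level-0 player picks a strategy at random, all strategies would have the same probability of $P = 1/n_\ell = 1/6$. In the alternative case shown in this figure, the player picks between all available quantities in each period with the same probability. This leads to a different distribution of probabilities over the strategies, as shown in the tree ($P = 1/9$ for each trajectory with $q_1^0 = 0$, $P = 1/6$ for each trajectory with $q_1^0 = 1$, and $P = 1/3$ for the trajectory $(2, 0)$).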
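The probabilities in Figure 1 follow from multiplying the uniform choice probabilities along each branch of the tree. The sketch below (hypothetical function name `tree_probabilities`, quantities in grid units) enumerates the branches exactly for small grids.

```python
from fractions import Fraction


def tree_probabilities(m, T):
    """Probability of each trajectory when q_t is drawn uniformly from {0, ..., R_t}.

    m is the stock in grid units (S0 / Delta); R_t is the stock left at the start of t.
    """
    probs = {(): Fraction(1)}
    for _ in range(T):
        nxt = {}
        for path, p in probs.items():
            remaining = m - sum(path)
            for q in range(remaining + 1):
                nxt[path + (q,)] = p / (remaining + 1)
        probs = nxt
    return probs


if __name__ == "__main__":
    for path, p in sorted(tree_probabilities(2, 2).items()):
        print(path, p)
```

For S_0 = 2 and T = 2 it returns 1/9 for each trajectory starting with 0, 1/6 for each trajectory starting with 1, and 1/3 for (2, 0); the implied period means are 1 and 1/2, that is, S_0/2 and S_0/4 as in Proposition 2.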
Intuitively, the level-0 producer who uniformly randomizes over actions extracts on average half of his resource in period 1. This implies that at the start of period 2 he will on average only have half of his resources left. Randomizing again, he will then on average extract half of his remaining resource in period 2. This process continues until the final period.
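After deriving the general expression for the effective quantities for the alternative definition of level-0, we can compute $q_{\text{eff2}}^t$ for the six periods of our game with a resource stock $S_0 = 170$. By Proposition 2, $q_{\text{eff2}}^t = 170/2^t$, that is, 85, 42.5, 21.25, 10.625, 5.3125, and 2.65625 for $t = 1, \ldots, 6$. For the following levels, the procedure is the same as described in Sections 3.2.2 and 3.3.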
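Enumerating the tree is infeasible for S_0 = 170, but the effective quantities can be computed exactly by propagating the distribution of the residual stock R_t from period to period. A sketch, assuming integer grid units (Δ = 1); the function name and the `one` parameter (which switches between floats and exact fractions) are illustrative choices.

```python
from fractions import Fraction


def effective_quantities_alt(s0, T, one=1.0):
    """Expected level-0 extraction per period when q_t is uniform on {0, ..., R_t}.

    Propagates the distribution of the residual stock R_t (grid units, Delta = 1)
    instead of enumerating trajectories. Pass one=Fraction(1) for exact arithmetic.
    """
    residual = {s0: one}                   # P(R_1 = S0) = 1
    means = []
    for _ in range(T):
        mean, nxt = 0 * one, {}
        for r, p in residual.items():
            w = p / (r + 1)                # each q in 0..r is equally likely
            for q in range(r + 1):
                mean += w * q
                nxt[r - q] = nxt.get(r - q, 0 * one) + w
        means.append(mean)
        residual = nxt
    return means


if __name__ == "__main__":
    print(effective_quantities_alt(170, 6))
    print(effective_quantities_alt(10, 4, one=Fraction(1)))
```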
The results of these derivations are presented in Table 2. Relative to the Nash equilibrium, the level-0 player defined this way extracts a much larger fraction of his resource in period 1 and a much smaller fraction in periods 3 to 5. The level-1 player responds by moving his extraction from period 1 to periods 3 to 5, leading to a U-shaped trajectory. Level-2 best responds to level-1 by moving extraction back to period 1. As before, this leads to an alternating process that quickly converges to the Nash equilibrium trajectory.
Finally, we consider the effect of requiring the resource constraint to hold with equality. For the definition of level-0 discussed in the previous sections, deriving $q_{\text{eff}}$ for the case of full resource consumption (i.e., $\sum_{t=1}^{T} q_t^0 = S_0$) is much simpler. In this case we have the following.

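Proposition 3. If only trajectories satisfying $\sum_{t=1}^{T} q_t^0 = S_0$ are considered in $\hat{\mathcal{G}}_0$, the value of $q_{\text{eff}}$ is given as follows:
$$q_{\text{eff}} = \frac{S_0}{T}.$$
Proof. With $M := S_0/\Delta$, we have that
$$n_\ell = \binom{M + T - 1}{T - 1},$$
and following the idea of the proof of Proposition 1 (the set of trajectories is again symmetric in the periods, and every trajectory now extracts exactly $S_0$) we can write
$$q_{\text{eff}} = \frac{1}{T n_\ell} \sum_{\ell=0}^{n_\ell - 1} \sum_{t=1}^{T} q_t^{0,\ell} = \frac{n_\ell S_0}{T n_\ell} = \frac{S_0}{T}.$$
For our parameters, this changes the average level-0 trajectory to 170/6 ≈ 28.33 in each period, with only a minor effect on the level-1 and level-2 quantities and none for higher levels.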
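Proposition 3 can be sanity-checked on a small grid by brute force. The sketch (hypothetical function name, quantities in grid units) enumerates all trajectories that use exactly m units and checks both the count $\binom{m + T - 1}{T - 1}$ and the per-period average m/T.

```python
from fractions import Fraction
from itertools import product
from math import comb


def exhaustive_trajectories(m, T):
    """Integer trajectories that use exactly m grid units of the stock."""
    return [q for q in product(range(m + 1), repeat=T) if sum(q) == m]


if __name__ == "__main__":
    trajs = exhaustive_trajectories(7, 3)
    print(len(trajs), comb(7 + 3 - 1, 3 - 1))   # both counts agree
    print([Fraction(sum(q[t] for q in trajs), len(trajs)) for t in range(3)])
```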
For the alternative definition of level-0 discussed in this section, we can similarly require the level-0 player to extract his entire remaining resource in the last period.

Proposition 4. If only trajectories satisfying $\sum_{t=1}^{T} q_t^0 = S_0$ are considered in $\hat{\mathcal{G}}_0$, then for $t = 1, 2, \ldots, T - 1$ it holds that
$$q_{\text{eff2}}^t = \frac{S_0}{2^t},$$
and for the final period $t = T$,
$$q_{\text{eff2}}^T = \frac{S_0}{2^{T-1}},$$
since in period $T$ the level-0 player extracts the entire residual $R_T$, whose expected value is $S_0/2^{T-1}$ by the argument in the proof of Proposition 2. This leaves all the effective quantities unchanged (in comparison to the case where not all resources must be spent), except for the last one, $q_{\text{eff2}}^6$, whose value would be twice the value given in row $k = 0$ of Table 2 (i.e., equal to $q_{\text{eff2}}^5 \approx 5.31$). This small difference has only a minor effect on level-1 and no effect on higher levels. Overall, requiring the resource constraint to hold with equality therefore has little impact on our computational results.
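Proposition 4 can be checked the same way as Proposition 2, by propagating the residual-stock distribution, with the only change that the final period extracts everything that remains. A sketch with exact arithmetic, grid units with Δ = 1, and a hypothetical function name:

```python
from fractions import Fraction


def effective_quantities_exhaust(m, T):
    """Expected level-0 extraction per period under the tree of Figure 1, except
    that the entire residual stock is extracted in the final period T.

    m is the stock in grid units (Delta = 1); arithmetic is exact.
    """
    residual = {m: Fraction(1)}                   # distribution of R_t, R_1 = S0
    means = []
    for _ in range(T - 1):                        # periods 1, ..., T - 1
        mean, nxt = Fraction(0), {}
        for r, p in residual.items():
            w = p / (r + 1)                       # q uniform on {0, ..., r}
            for q in range(r + 1):
                mean += w * q
                nxt[r - q] = nxt.get(r - q, Fraction(0)) + w
        means.append(mean)
        residual = nxt
    means.append(sum(p * r for r, p in residual.items()))  # period T takes R_T
    return means


if __name__ == "__main__":
    print(effective_quantities_exhaust(16, 4))
```

With S_0 = 170 and T = 6 the final entry is 170/32 = 5.3125, twice the 170/64 = 2.65625 obtained without the exhaustion requirement.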

Experimental Data
The model studied in this paper was implemented in a laboratory experiment by van Veldhuizen and Sonnemans [17]. In their experiment, participants went through 10 repetitions of the same 6-period set-up. In each repetition, participants started with a limited resource they could use over the 6 periods of the game. The resource was then replenished at the start of the next repetition. We refer to their paper for more details on the experimental design and instructions. However, it is important to note that participants were rematched to a different opponent for every repetition (or round). Their first-period quantity in each of the 10 repetitions could therefore not be based on any knowledge of the prior behavior (i.e., the level-k) of their opponent and can hence serve as a proxy for their level in the cognitive hierarchy.
We classify all participants in the experiment by the level that most closely corresponds to their first-period choice of quantity, as per Table 1. We use the point prediction for the Nash equilibrium but allow quantities to deviate slightly from the point prediction of level-k and the other benchmarks, in order to capture all quantities that lie between Walras and collusion. However, the results presented below are robust to using just the point prediction for the level-k players as well. We only classify quantities that are either three units smaller than the collusive quantity or three units larger than the Walras quantity as level-0. We use the three-unit wiggle room in order not to immediately classify any deviation from Walras or collusion as level-0 behavior. Quantities between collusion and Walras could of course be generated by level-0 players as well but correspond more closely to the predictions for players on higher levels.
Table 3 presents the level implied by the chosen quantity of each participant in round 1 and round 10 of the experiment. The number of participants classified as level-0 decreases strongly from round 1 (24 participants) to round 10 (5 participants). The number of participants classified as level-1, level-2, and particularly level-3 increased correspondingly. Figure 2 presents the average level for each of the 10 rounds, where the Nash quantity is coded as level-5, and collusive/Walras quantities are coded as level-0. In line with Table 3, there is a clear and significant upward trend in the average level: the coefficient of a linear regression of participants' level on the round is significant ($p < 0.001$, standard errors clustered by participant). These results suggest that participants learned to make higher level decisions over successive repetitions of the experiment.
Classifying participants using the alternative results presented in Table 2 does not substantively affect our results. Using the results of Table 2, every producer classified as level-1 to level-4 in Table 3 is classified as one level higher. In addition, six producers in round 1 and two producers in round 10 would be reclassified from level-0 to level-1. The coefficient for round in the linear regression is still significant ($p < 0.001$).

Discussion
Existing models of nonrenewable resources assume that sophisticated agents compete with other sophisticated agents. This study instead uses a level-k approach to examine situations where the focal agent is uncertain about the strategy of his opponent or predicts that the opponent will act in a nonsophisticated manner. We modeled the uncertainty about the opponent's chosen strategy using a level-k framework.
Interestingly, when the level-0 player randomizes over all possible trajectories, the level-1 player's optimal strategy is quite close to the Walras benchmark. Intuitively, this player best responds to a random opponent, who on average underextracts the resource in early periods (relative to Nash). As a result, the level-1 player's best response is to instead overextract the resource in early periods. Similarly, the level-2 player best responds to level-1 and is therefore closer to the collusive quantity than to Nash. Thus, for producers who expect their opponents to randomize in this way (or expect the opponent to best respond to randomizers), it is not optimal to choose the Nash equilibrium strategy. Instead, it is optimal to choose a strategy that closely approximates Walras or collusion, respectively. Only higher levels of rationality more closely approximate the Nash equilibrium. We obtain similar results under three alternative definitions of the level-0 player, in the sense that quantities chosen by lower levels are likely to correspond more closely to Walras and collusion than to Nash and that higher levels converge to the Nash equilibrium trajectory.
We then applied these computational results to data from a laboratory experiment, which allowed us to classify participants by the level of reasoning implied by their choices in the first period.For the early part of the experiment, many participants were classified as level-0.However, by the time they had garnered some experience in the game, their implied level of rationality increased.This seems intuitive and in line with the idea that repeated exposure to the same problem increases the quality of participants' responses due to imitation, improved understanding of the game, and possibly updated beliefs about the likely type of the average opponent.
Throughout our analysis we have assumed that level-k best responds to level-(k − 1), as is the standard approach in the literature. An alternative approach (e.g., the cognitive hierarchy model of Camerer et al. [14]) assumes that higher levels form beliefs about the distribution of lower level types and best respond to the mix of these types. In our setting, this approach would change the trajectory of higher level types, with, for example, the level-2 player's trajectory lying somewhere between the level-1 and level-2 trajectories computed above. This approach will also not, in general, converge to the Nash equilibrium, provided that even high types assume that there is a nonnegligible fraction of level-0 and level-1 players.
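Because π in (1) is linear in the opponent's quantities, a best response to a belief over several lower-level types is simply a best response to their probability-weighted average trajectory, exactly as in the averaging step of Section 3.2.1. The sketch below illustrates this with continuous quantities and a hypothetical 50/50 belief over level-0 and level-1 for a cognitive-hierarchy level-2 player; the weights, the profit specification, and all names are assumptions for illustration, not estimates from the paper.

```python
A, B, S0, T, R = 372.0, 1.0, 170.0, 6, 0.1


def best_response(opp, tol=1e-10):
    """Continuous best response to a known opponent path (KKT + bisection on mu).

    Assumes the stock constraint binds, which holds for these parameters.
    """
    def path(mu):
        return [max(0.0, (A - B * c - mu * (1 + R) ** t) / (2 * B))
                for t, c in enumerate(opp)]

    lo, hi = 0.0, A
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(path(mid)) > S0:
            lo = mid
        else:
            hi = mid
    return path(0.5 * (lo + hi))


def mixture(paths, weights):
    """Belief-weighted average opponent path (profit is linear in it)."""
    return [sum(w * p[t] for p, w in zip(paths, weights)) for t in range(T)]


if __name__ == "__main__":
    level0 = [S0 / (T + 1)] * T
    level1 = best_response(level0)
    level2 = best_response(level1)
    # Hypothetical cognitive-hierarchy belief of a level-2 player: 50% level-0, 50% level-1.
    ch2 = best_response(mixture([level0, level1], [0.5, 0.5]))
    print(round(level1[0], 1), round(ch2[0], 1), round(level2[0], 1))
```

Under these assumptions the first-period quantity of the cognitive-hierarchy level-2 player lies between those of the level-1 and level-2 players.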
A further extension of our results would allow higher level players to update their beliefs about the distribution of lower level types based on the actions of their opponents.This could then make it optimal for the higher level types to masquerade as a lower level type, in order to induce lower level opponents to adopt a more favorable trajectory.Though a full analysis of this set-up is beyond the scope of the present paper, we consider this a promising extension for future work.
Finally, the results also illustrate the applicability of numerical methods in solving iterated problems such as ours. For continuous cases, analytic methods are able to compute the Nash equilibrium either directly or as the limit of a level-k model where $k$ converges to infinity. However, we are not aware of analytical methods that are able to directly compute the strategy for a given finite value of $k$, especially when the set of possible production quantities is discrete rather than continuous. By contrast, our paper illustrates that numerical solvers are able to provide the strategy for any level $k$. Numerical tools seem particularly well-suited in cases where the exact parameters of interest are known, such as the experimental data set analyzed in this paper.

Figure 2: The figure plots the average level implied by the quantities chosen by participants in each round of the experiment. Quantities are coded as per Table 3, except that the Nash equilibrium quantity is coded as level-5; collusive and Walras quantities are coded as level-0.

Table 1: The first three rows present the Nash, Walras, and collusive quantities for each period. The fourth row gives the average production quantity for a level-0 player for each period. The remaining rows give the trajectories for different level-k players.

Table 2: The first three rows present the Nash, Walras, and collusive quantities for each period as in Table 1. The fourth row gives the average production quantity for the alternative definition of level-0 for each period. The remaining rows give the trajectories for different level-k players.

Table 3: This table shows the number of participants who chose a particular extraction quantity in the first period of, respectively, the first and last (10th) repetition (or round) of the experiment. The corresponding levels follow from the analysis in Section 6.