A Differential Game of Transboundary Pollution Control and Ecological Compensation in a River Basin

This paper investigates a Stackelberg differential game between an upstream region and a downstream region for transboundary pollution control and ecological compensation (EC) in a river basin. The downstream region, as the leader, first chooses its abatement investment level and an ecological compensation rate to encourage the upstream region to invest in water pollution control. The upstream region, as the follower, then determines its abatement investment level to maximize its welfare. Furthermore, we take into account the effects of efficiency-improving and cost-reducing learning by doing that originate from the abatement investment activity of both regions. The results show the following. (i) There is an optimal ecological compensation rate under which a Pareto improvement can be obtained. (ii) Carrying out EC shifts some abatement investment from the downstream region to the upstream region. (iii) The efficiency-improving and cost-reducing learning by doing derived from the abatement investment activity of both regions can decrease the optimal ecological compensation rate, increase abatement investment, and improve social welfare.


Introduction
Nowadays, almost all human economic activities discharge large amounts of pollutants into water bodies. Carried by the water, unpurified pollutants flow from the upper regions of a basin to the downstream regions [1]. Therefore, the pollution problem in a river basin is in fact a transboundary pollution problem.
Many scholars have studied transboundary pollution control. For example, van der Ploeg and de Zeeuw [2] developed a differential game model of international transboundary pollution control and derived the cooperative and non-cooperative Nash equilibria in linear Markov perfect strategies. Long [3] studied a transboundary pollution differential game between two neighboring countries and found that the symmetric open-loop Nash equilibrium yields more pollution than the cooperative solution. Dockner and Van Long [4] modeled a simple dynamic game between two neighboring countries. They showed that when the governments of the two countries are restricted to linear strategies, non-cooperative behavior may result in overall losses for both countries. Petrosyan and Zaccour [5] discussed the fair distribution of the total cooperative cost incurred by countries in a cooperative game of transboundary pollution reduction. Jørgensen and Zaccour [6] focused on cooperative transboundary industrial pollution control and gave a payment distribution mechanism that supports the subgame consistent solution. Bertinelli et al. [7] showed that, compared with the open-loop strategy, the feedback strategy of transboundary pollution control may lead to less social waste. Li [8] was perhaps the first to introduce emission permits trading into the transboundary industrial pollution game, focusing on the non-cooperative and cooperative optimal emission paths of two neighboring countries. Chang et al. [9] investigated the optimal pollution abatement strategies of two countries involved in transboundary pollution under cooperative and non-cooperative games. The research mentioned above shows that cooperation is more efficient than non-cooperation in transboundary pollution control. Peng and Zhang [10] and Zeng et al. [11] also argued that water pollution often involves multiple administrative regions and is difficult to solve through the unilateral action of a single local government.
In recent years, ecological compensation (EC) has emerged as one of the cooperative transboundary pollution control mechanisms in many pollution control practices around the world, and it is widely advocated as a promising regulatory tool. For example, the Chinese government has issued a series of documents to promote EC. In 2005, the State Council of the People's Republic of China issued a document about EC. This document points out that EC mechanisms should be established as soon as possible. Recently, several more specific documents were issued by China's central government, such as the "Guiding Opinions on Establishing and Perfecting the Long-Term Mechanism of Ecological Compensation and Protection in the Yangtze River Economic Zone" (PRC and S.C.o.t., 2018) and the "Action Plan for Establishing a Market-Oriented and Diversified Compensation Mechanism for Ecological Protection" (PRC and S.C.o.t., 2019). In accordance with these policies, the EC system has been implemented in some river basins in China, such as the Xin An Jiang River, Chishui River, and Tuo Jiang River. In addition, the local governments in many other river basins in China, such as the Yangtze River and Xiang Jiang River, are planning to implement the EC system.
As practice expands, the study of EC has drawn the close attention of many scholars [12-16]. However, current research on EC mainly focuses on the concept, standard, and evaluation of EC [17]; with a few exceptions, important issues such as the interaction between the compensating region and the compensated region and the operating mechanism of EC have not received due attention. Recently, applying evolutionary game theory, Gao et al. [17] investigated the interaction between the upstream governments, the downstream governments, and the central government in the Eastern Route of the South-to-North Water Transfer Project in China. They showed that the implementation of the EC system in this project depends heavily on the supervision of the central government. Jiang et al. [18] developed a differential game model to study the optimal strategies of the upstream and downstream regions in a river basin. The authors showed that the EC criterion can effectively incentivize upstream regions to invest more in pollution abatement. Furthermore, Jiang et al. [19] applied a stochastic differential game approach to investigate transboundary pollution control under EC. Three key insights were found; the first is that, compared with the case in which the players play the Stackelberg non-cooperative game, the EC mechanism leads to a higher environmental quality.
In short, recent studies such as Gao et al. [17], Jiang et al. [18], and Jiang et al. [19] have shed light on the interaction between the compensating region and the compensated region. However, the existing studies have not yet covered the case in which both the upstream region and the downstream region invest in transboundary pollution control under EC, and the important effects of learning by doing on abatement investment have not been taken into account.
In fact, this paper can be viewed as an extension of Jiang et al. [19]. However, our paper differs from Jiang et al. [19] in three respects:
(i) In the model of Jiang et al. [19], only the upstream region in a river basin invests in pollution abatement, while the downstream region does not; its responsibility is simply to compensate the upstream region for its abatement investment. In our paper, both the upstream region and the downstream region in a river basin can invest in pollution abatement. This is because, in most cases, both regions bear an abatement responsibility even when EC is performed.
(ii) Unlike in Jiang et al. [19], in this paper the leader of the game is the downstream region, which first determines the ecological compensation rate and its own investment rate. Then the follower, i.e., the upstream region, decides its abatement investment under the given compensation rate.
There are perhaps two modes applied in EC practice. One is that of Jiang et al. [19], and the other is the one investigated in this paper. We find that the latter mode has already been implemented in some basins in China. For example, in the "Measures for the implementation of ecological compensation for water quality in Jinhua river basin" (issued in 2018 by the Jinhua city government, Zhejiang province, China), the compensation standard is first determined for different water qualities, and then the upstream region in a river basin can choose investment levels to meet the different water quality requirements raised by the downstream region. Under this mode, the compensation standard that reflects the willingness of the downstream region is given in advance; moreover, the upstream region has sufficient flexibility to choose its own optimal investment level according to the different compensation standards. Therefore, the implementation of this mode can fully reflect the will of both regions.
(iii) The effects of both efficiency-improving and cost-reducing learning by doing in the abatement investment of both regions are considered simultaneously. Since the pioneering work of Arrow [20], the positive effects of cost-reducing learning by doing have been widely studied [8, 21-25], whereas the positive effects of efficiency-improving learning by doing on investment have been little studied so far, although these effects are universal [26, 27]. To the best of our knowledge, this is the first time that the effects of both efficiency-improving and cost-reducing learning by doing are taken into account simultaneously in a pollution control game of a river basin.

Complexity
A differential game method is used to solve the Stackelberg game model, and some important managerial findings are obtained. In particular, we show that performing EC can improve both regions' welfare and lead to a Pareto improvement. The paper is organized as follows. The next section presents the basic model. The strategies of the upstream region and the downstream region are derived in Section 3. Section 4 presents numerical examples and policy implications. Section 5 concludes the paper.

The Game
Consider a simple but widely used model in which a river basin is divided into an upstream region (denoted by 1) and a downstream region (denoted by 2). Both regions produce an identical consumption good with an identical, fixed endowment of production factors and an identical, given technology over continuous time t ∈ (0, ∞). Production in both regions inevitably involves pollution externalities; let x_i(t) (i = 1, 2) denote the quantity of water pollutants discharged by region i. Following Moslener and Requate [28, 29], Chang and Wang [30], and Li and Guo [31], the revenue function of both regions is given by (1), where i = 1, 2 and A and B are constants.
Next, we examine the evolution of the stock of pollutants. Let s_1(t) and s_2(t) denote the stocks of pollutants of the two regions at time t. Without efficiency-improving learning by doing in abatement investment, and following Jiang et al. [18] and Li and Guo [31], the evolution of the negative environmental externality s_i(t) (i = 1, 2) is given by (2) and (3), respectively, where k_i(t), i = 1, 2, denotes the pollution abatement investment of region i; for simplicity, one unit of investment is assumed to abate one unit of pollution. The parameter η > 0 represents the decay rate of pollution, and the parameter ϕ > 0 is the transfer coefficient of pollution from the upstream region to the downstream region.
In this paper, we take into account efficiency-improving learning by doing in abatement investment. In the real world, this phenomenon is ubiquitous and is sometimes summarized as "skill comes by exercise" or "practice makes perfect"; in the study of transboundary pollution control in a river basin, however, it has received little attention so far. Grosse et al. [26] argued that performance improvements of individuals, groups, or organizations over time are a result of accumulated experience. Consequently, the dynamic equations (2) and (3) are augmented with the terms −bA_1(t) and −bA_2(t), which indicate that abatement efficiency, and hence the amount abated, increases with the accumulated experience A_1(t) and A_2(t) obtained from abatement investment. The parameter b > 0 denotes the marginal contribution of the accumulated experience A_1(t) and A_2(t) of both regions.
Now consider the abatement investment cost. Following Lambertini et al. [32] and Martín-Herrán et al. [33], we assume that the investment cost is increasing and concave. In addition, following Li [8] and Wei et al. [27], we assume that there is cost-reducing learning by doing in abatement investment, i.e., the investment cost falls with accumulated experience. In the resulting abatement investment cost of region i (i = 1, 2), the parameter β > 0 denotes the marginal contribution of knowledge accumulation to the reduction of the abatement investment cost.
We are now in a position to describe the evolution of the accumulated experience A_i(t) resulting from abatement investment. Following Li and Pan [34], Chang et al. [9], and Wei et al. [27], we assume a dynamic equation for A_i(t) in which the parameter μ > 0 stands for the learning rate of knowledge accumulation and the parameter c > 0 is the rate at which the memory of accumulated experience in abatement investment decays. Furthermore, following Lambertini et al. [32], Martín-Herrán et al. [33], Li and Guo [31], and Wei et al. [27], we assume a quadratic damage function of emissions.
A final but crucial assumption is that, as practiced in many river basins, the downstream region is willing to compensate the upstream region for its abatement investment at the compensation rate τ, in order to encourage the upstream region to invest in abatement and thereby reduce the transfer of pollution from the upstream region to the downstream region.
Therefore, the objective of the upstream region is to find the optimal emission level x_1(t) and the optimal abatement investment k_1(t) that maximize its discounted revenue flow over continuous time t ∈ [0, ∞); this problem is stated as (8). Similarly, the optimization problem of the downstream region is stated as (9).
The game between the upstream region and the downstream region is a Stackelberg game. The downstream region, as the leader, first decides the compensation rate τ, and then the upstream region, as the follower, determines its abatement investment level under the given compensation rate, as practiced in China. In addition, we assume that both regions apply open-loop strategies. In the next section, we apply backward induction to solve the optimization problems of both regions.
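The role of the learning parameters can be illustrated numerically. The sketch below is not the paper's model: it assumes the simplest linear forms consistent with the verbal description, namely experience accumulating as dA/dt = μk − cA and an upstream pollution stock evolving as ds/dt = x − k − bA − ηs, with constant emission x and constant investment k. These functional forms, the function names, and all numerical values are hypothetical.

```python
def simulate(x=10.0, k=2.0, mu=0.5, c=0.2, b=0.3, eta=0.1,
             dt=0.01, horizon=200.0):
    """Forward-Euler integration of the ASSUMED dynamics
         dA/dt = mu*k - c*A              (experience accumulation)
         ds/dt = x - k - b*A - eta*s     (upstream pollution stock)
    with constant emission x and constant abatement investment k."""
    A, s = 0.0, 0.0
    for _ in range(int(horizon / dt)):
        dA = mu * k - c * A
        ds = x - k - b * A - eta * s
        A += dt * dA
        s += dt * ds
    return A, s


def steady_state(x=10.0, k=2.0, mu=0.5, c=0.2, b=0.3, eta=0.1):
    """Closed-form rest point of the assumed system (set both derivatives to 0)."""
    A_bar = mu * k / c
    s_bar = (x - k - b * A_bar) / eta
    return A_bar, s_bar


if __name__ == "__main__":
    print("simulated   :", simulate())
    print("closed form :", steady_state())
    # Switching off efficiency-improving learning (b = 0) raises the stock.
    print("no learning :", steady_state(b=0.0))
```

Under these assumptions, the long-run experience level is μk/c, so a higher learning rate μ or a slower memory decay c lowers the long-run pollution stock for the same investment.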

Game Equilibrium
3.1. The Upstream Region's Optimal Strategy
We first consider the upstream region's optimal strategy under the given compensation rate τ set by the downstream region, assuming an open-loop strategy. Under this strategy, the upstream region commits to an abatement strategy that depends only on time t and is independent of the states s_1(t) and A_1(t). From (8), we form the current-value Hamiltonian (10), where λ_1(t) and λ_2(t) are dynamic costate variables measuring the shadow prices associated with the state equations for ṡ_1(t) and Ȧ_1(t), respectively. Using the first-order conditions, costate conditions, and state equations of the current-value Hamiltonian (10), one obtains the dynamic system (11).
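For readers less familiar with the method, the three sets of conditions named above take the following generic form for a discounted problem with one control u, one state z, instantaneous payoff F, state equation ż = f(z, u), and discount rate ρ. The symbols u, z, F, f, and ρ are placeholders introduced here for illustration and are not the paper's notation; an interior solution is assumed, and with two states, as in the upstream region's problem, there is one costate equation per state.

```latex
\begin{align}
  H &= F(z,u) + \lambda\, f(z,u)
      && \text{(current-value Hamiltonian)} \\
  \frac{\partial H}{\partial u} &= 0
      && \text{(first-order condition)} \\
  \dot{\lambda} &= \rho\,\lambda - \frac{\partial H}{\partial z}
      && \text{(costate condition)} \\
  \dot{z} &= f(z,u)
      && \text{(state equation)}
\end{align}
```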

To obtain general managerial insights, we focus on the steady-state equilibrium. If a steady-state equilibrium exists, we use the superscript "∧" to identify the equilibrium values. Solving (11) under steady-state conditions gives the equilibrium values k̂_1(τ), x̂_1(τ), ŝ_1(τ), and Â_1(τ). We now use Proposition 1 to analyze the stability properties of the steady-state equilibrium {k̂_1(τ), x̂_1(τ), ŝ_1(τ), Â_1(τ)}.

Proposition 1.
There exist admissible parameter constellations such that the steady-state equilibrium {k̂_1(τ), x̂_1(τ), ŝ_1(τ), Â_1(τ)} is a saddle point.
Proof. See the proof of Proposition 1 in the Appendix.

3.2. The Downstream Region's Optimal Strategy
We now investigate the downstream region's optimal open-loop strategy. The downstream region controls its output x_2(t), its abatement investment k_2(t), and the compensation rate τ to maximize its benefits. Note that when the downstream region sets the compensation rate, it must take into account the response of the upstream region. We obtain the current-value Hamiltonian (13) from (7). The dynamic system (14) is obtained by using the first-order conditions, costate conditions, and state equations of the current-value Hamiltonian (13).

Proposition 2.
There exist admissible parameter constellations such that the steady-state equilibrium of the downstream region's problem is a saddle point.
Proof. See the proof of Proposition 2 in the Appendix.
Next, we solve for the optimal compensation rate τ. From (7), we obtain the downstream region's value function (16) under steady-state equilibrium conditions.
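The backward-induction logic used in this section (first the follower's best response to τ, then the leader's choice of τ given that response) can be sketched in a deliberately simple static toy game. This is an illustration only and not the differential game solved in the paper: the payoff forms, the parameters a1, a2, and c, and all function names are hypothetical.

```python
def follower_best_response(tau, a1=1.0, c=2.0):
    """Upstream (follower) maximizes (a1 + tau)*k - (c/2)*k**2 over k.
    The first-order condition gives k = (a1 + tau) / c."""
    return (a1 + tau) / c


def follower_payoff(tau, a1=1.0, c=2.0):
    k = follower_best_response(tau, a1, c)
    return (a1 + tau) * k - 0.5 * c * k * k


def leader_payoff(tau, a1=1.0, a2=5.0, c=2.0):
    """Downstream (leader) earns a2 per unit of upstream abatement
    and pays tau per unit as compensation."""
    k = follower_best_response(tau, a1, c)
    return (a2 - tau) * k


def optimal_rate(a1=1.0, a2=5.0, c=2.0, steps=10000):
    """Grid search over tau in [0, a2], anticipating the follower's response."""
    grid = [a2 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: leader_payoff(t, a1, a2, c))


if __name__ == "__main__":
    tau_star = optimal_rate()
    print("grid optimum :", tau_star)
    print("closed form  :", (5.0 - 1.0) / 2)  # (a2 - a1) / 2 in this toy game
    print("leader payoff   without / with:", leader_payoff(0.0), leader_payoff(tau_star))
    print("follower payoff without / with:", follower_payoff(0.0), follower_payoff(tau_star))
```

In this toy setting the leader's optimum is interior, and both players are better off at the optimal rate than at τ = 0, which mirrors the Pareto-improvement result in spirit.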

Numerical Examples and Policy Implications
In Section 3, we have obtained the optimal strategies of the upstream region and the downstream region of a river basin. We perform some numerical analysis to gain more managerial insights. For this purpose, we assign the values of the basic parameters in Table 1.
Using the results of Section 3 and the basic dataset in Table 1, we obtain Figures 1 and 2. Figure 1 shows the following:
(i) There is an optimal ecological compensation rate.
(ii) The optimal ecological compensation rate decreases with the learning rate of knowledge accumulation resulting from abatement investment, the efficiency parameter of cost-reducing learning by doing, and the efficiency parameter of efficiency-improving learning by doing.
These findings mean that learning by doing in abatement investment partly substitutes for EC. A higher learning rate of knowledge accumulation, and a greater effectiveness of accumulated knowledge in driving down investment costs and emissions, lowers the optimal ecological compensation rate. This is because learning by doing in abatement investment decreases investment cost and improves investment efficiency, so it can partially replace ecological compensation as an incentive for the upstream region to increase investment.
(iii) The optimal ecological compensation rate increases with the transfer coefficient of pollution from the upstream region to the downstream region. This is because, if the transfer coefficient of pollution increases, the downstream region is willing to pay more compensation to the upstream region in exchange for lower upstream emissions, so as to avoid receiving more pollution.
Next, we use Figure 2 to display how social welfare varies with several key parameters with and without EC, where NU_11 and NU_21 are the social welfare of the upstream [...]
Figure 1: Paths of the optimal ecological compensation rate τ as it varies with the learning rate of knowledge accumulation resulting from abatement investment μ, the efficiency parameter of cost-reducing learning by doing β, the efficiency parameter of efficiency-improving learning by doing b, and the transfer coefficient of pollution from the upstream region to the downstream region ϕ, respectively.
Figure 2: Paths of the social welfare of both regions as it varies with the key parameters μ, β, and ϕ under the optimal ecological compensation rate, with and without ecological compensation, respectively.

In Figure 3, the curve MR_1 describes the marginal revenue of abatement investment by the upstream region, MR_2 is the marginal revenue that the downstream region derives from the external effect of the upstream region's investment, and MR denotes the total marginal revenue of both regions. The curve MC_1 is the marginal cost of investment of the upstream region. If the two regions do not cooperate, the upstream region's optimal investment level is k_11, whereas if they cooperate, it should be k_12. From Figure 3, one can see that when the investment level shifts from k_11 to k_12, the downstream region's profit increases by the area S_ABGH, the upstream region's gains decrease by S_CDF, and the total cooperative surplus is S_DEF. Therefore, the cooperation mechanism can be realized through the following EC scheme: the downstream region pays the upstream region a compensation X with S_CDF < X < S_ABGH. The upstream region will then shift its investment from k_11 to k_12, because its benefits increase by X − S_CDF > 0. In addition, the EC benefit that the downstream region retains satisfies 0 < S_ABGH − X < S_DEF. Therefore, EC can result in a Pareto improvement.
(ii) Learning by doing (including both efficiency-improving and cost-reducing learning by doing) can raise social welfare substantially: social welfare increases with the efficiency parameters of both efficiency-improving and cost-reducing learning by doing.
(iii) The social welfare of the upstream region increases, while that of the downstream region decreases, with the transfer coefficient of pollution from the upstream region to the downstream region ϕ, both with and without ecological compensation.
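The area argument around Figure 3 can be checked with a small worked example. The sketch below assumes linear curves, MR_1(k) = a1 − b1·k, MR_2(k) = a2 − b2·k, and MC_1(k) = m·k, and interprets S_ABGH as the area under MR_2 between k_11 and k_12 and S_CDF as the area between MC_1 and MR_1 over the same interval. The linear forms, this reading of the areas, and all parameter values are assumptions made for illustration, not taken from the paper.

```python
def ec_transfer_bounds(a1=10.0, b1=1.0, a2=6.0, b2=0.0, m=1.0):
    """Return (k11, k12, upstream_loss, downstream_gain, surplus) for
    ASSUMED linear curves MR1 = a1 - b1*k, MR2 = a2 - b2*k, MC1 = m*k."""
    # Non-cooperative level: MR1(k) = MC1(k)
    k11 = a1 / (b1 + m)
    # Cooperative level: MR1(k) + MR2(k) = MC1(k)
    k12 = (a1 + a2) / (b1 + b2 + m)

    dk = k12 - k11
    dk2 = k12 ** 2 - k11 ** 2

    # Integral of MR2 from k11 to k12 (read here as area S_ABGH)
    downstream_gain = a2 * dk - 0.5 * b2 * dk2
    # Integral of (MC1 - MR1) from k11 to k12 (read here as area S_CDF)
    upstream_loss = 0.5 * (m + b1) * dk2 - a1 * dk
    # Cooperative surplus (read here as area S_DEF)
    surplus = downstream_gain - upstream_loss
    return k11, k12, upstream_loss, downstream_gain, surplus


def is_pareto_improving(X, upstream_loss, downstream_gain):
    """A transfer X makes both regions better off iff loss < X < gain."""
    return upstream_loss < X < downstream_gain


if __name__ == "__main__":
    k11, k12, loss, gain, surplus = ec_transfer_bounds()
    print(f"k11={k11}, k12={k12}, loss={loss}, gain={gain}, surplus={surplus}")
    print("X = 12 Pareto improving?", is_pareto_improving(12.0, loss, gain))
```

With these hypothetical numbers, any transfer X strictly between the upstream loss and the downstream gain leaves both regions better off, and the two net gains sum to the cooperative surplus.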
Next, we use Figure 4 to show the path of abatement investment. From it, one can see the following:
(i) In contrast to the situation without EC, the implementation of EC makes the upstream region invest more in pollution abatement; the downstream region, however, invests less. This is because the pollutants discharged by the upstream region affect both the upstream and the downstream regions, while the pollutants discharged by the downstream region damage the downstream region only. Therefore, using EC to shift some of the investment in emission reduction from the downstream region to the upstream region increases the welfare of both regions and of society as a whole.
(ii) Learning by doing increases abatement investment. The reason is that under learning by doing it is not only cheaper but also more efficient to invest, so it is advantageous for both regions to invest more.
(iii) When the transfer coefficient of pollution increases, the abatement investment of the upstream region decreases while that of the downstream region increases. If EC is carried out, the upstream region is motivated to invest more in abatement, while the downstream region's investment decreases because it is willing to shift some of its investment to the upstream region for greater social welfare.

Conclusions
The main purpose of this paper is to investigate the abatement investment and ecological compensation strategies of the upstream and downstream regions in a river basin. To this end, a Stackelberg differential game model between the two regions is developed, and some significant results are found:
(i) There is an optimal ecological compensation rate. Under this compensation rate, both the upstream and downstream regions in a river basin can improve their welfare. Thus, a Pareto improvement is achieved.
(ii) In contrast to the situation without ecological compensation, the implementation of ecological compensation makes the upstream region increase its abatement investment; the downstream region, however, invests less in abatement.
(iii) The efficiency-improving and cost-reducing learning by doing in the abatement investment of both regions can decrease the optimal ecological compensation rate, increase abatement investment, and improve social welfare. Compared with the case in which both regions play non-cooperatively, when EC is performed learning by doing leads to larger increases in both regions' social welfare and in the upstream region's investment, but to smaller increases in the downstream region's investment.
(iv) When the transfer coefficient of pollution increases, the abatement investment of the upstream region decreases while the downstream region's abatement investment increases. In addition, carrying out ecological compensation shifts some abatement investment from the downstream region to the upstream region.
The present paper has discussed the optimal ecological compensation mechanism in non-cooperative transboundary pollution control in a river basin. Some meaningful results are obtained; meanwhile, one issue merits further investigation. This paper considers only one pollutant. In reality, however, production may emit more than one pollutant. Therefore, the question of which compensation mechanism should be adopted to address transboundary pollution with multiple pollutants is a direction for future research.
Appendix

Proof of Proposition 1. Using the parameters in Table 1, we obtain the matrix J_11. From |ΦE − J_11| = 0, we get a negative characteristic root: Φ_1 = −0.1721. So the steady state is a saddle-point equilibrium under the given set of parameters.
Then, performing a sensitivity analysis by changing the value of one parameter at a time by +10% while leaving the others unchanged, we find that the steady state remains a saddle-point equilibrium throughout. Therefore, the steady state is a saddle-point equilibrium in a neighborhood of the given set of parameters. This completes the proof.
Proof of Proposition 2. The proof is similar to that of Proposition 1 and is omitted.
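The kind of numerical check described in this proof can be sketched for a two-dimensional linearized system, where a steady state is a saddle point exactly when the two characteristic roots are real and of opposite sign. The matrix below is hypothetical and is not the paper's J_11; the sketch also perturbs matrix entries directly, whereas the proof above perturbs the model parameters from Table 1.

```python
import cmath


def eigenvalues_2x2(J):
    """Roots of |phi*E - J| = 0 for a 2x2 matrix J, via trace and determinant."""
    (a, b), (c, d) = J
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace - disc) / 2.0, (trace + disc) / 2.0


def is_saddle(J, tol=1e-12):
    """True if both roots are real, one negative and one positive."""
    low, high = eigenvalues_2x2(J)
    both_real = abs(low.imag) < tol and abs(high.imag) < tol
    return both_real and low.real < -tol and high.real > tol


def saddle_is_robust(J, rel=0.10):
    """Raise each entry by `rel` in turn, leaving the others unchanged,
    and check that the saddle property survives every perturbation."""
    for i in range(2):
        for j in range(2):
            perturbed = [list(row) for row in J]
            perturbed[i][j] *= 1.0 + rel
            if not is_saddle(perturbed):
                return False
    return True


if __name__ == "__main__":
    J_example = [[-0.1, 1.0], [0.5, 0.2]]  # hypothetical linearization
    print("roots :", eigenvalues_2x2(J_example))
    print("saddle:", is_saddle(J_example))
    print("robust:", saddle_is_robust(J_example))
```

For larger systems, such as the four-dimensional state and costate systems in this paper, the same idea applies, but the roots would have to be computed with a numerical eigenvalue routine rather than the closed form used here.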

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.