This paper analyzes a dynamic Stackelberg differential game model of transboundary water pollution abatement in a watershed and discusses the optimal decision-making problem under non-cooperative and cooperative differential games, taking into account the accumulation and depreciation effects of learning by doing in pollution abatement investment. We use dynamic optimization theory to derive the equilibrium solutions of the models. Using numerical simulation, we trace and analyze the optimal trajectories of each variable under a finite-planning horizon and in the long-term steady state. Under the finite-planning horizon, the longer the planning period, the lower the optimal emission rate in equilibrium. In the long-term steady state, cooperative decision making effectively reduces the amount of pollution emitted. Under the finite-planning horizon, the investment intensity of pollution abatement is higher in the non-cooperative game than in the cooperative game. Under the long-term steady state, the pollution abatement investment trajectory of the cooperative game is relatively stable and shows no obvious crowding-out effect: investment continues to rise, and its steady-state equilibrium level is higher than that under non-cooperative decision making. The decline in pollution stock under the finite-planning horizon is not significant. Under the long-term steady state, the trajectories of upstream and downstream pollution in the non-cooperative and cooperative models are similar, but the cooperative decision-making model is superior to the non-cooperative model in terms of the time needed to stabilize and the steady-state level.
Funding: National Social Science Foundation of China (18ZDA040); Ministry of Education of the People's Republic of China (17JJD790017); Evaluation Commission of Social Science Achievements of Hunan Province of China (XSP20ZDA007).

1. Introduction
In recent years, more and more experts and scholars have used dynamic optimal control theory to study various complex pollution control problems. The differential game method based on dynamic optimal control theory provides an effective research tool for transboundary pollution abatement, because it can analyze the interaction between the strategies of the participants and the dynamic trajectory of the pollution stock. Kilgour et al. [1] first used game theory to analyze the total amount control of pollution in river basins, taking total amount control of COD (chemical oxygen demand) as an example. Stimming [2] studied a differential game problem of participating enterprises under two kinds of environmental policies: pollution tax and emission permit. It was concluded that the total investment and emission amount under the feedback strategy were higher than those under the open-loop strategy. List and Mason [3] applied differential game theory to analyze the optimal institutional arrangement of transboundary pollution abatement under suboptimal conditions and established differential game models under cooperative and non-cooperative decision making, respectively. The results showed that if the emission yields of the two regions are asymmetric and the initial emission level is low, non-cooperative decision making will be better than cooperative decision making from the perspective of total income. Breton et al. [4] used a two-party finite-time differential game model to analyze the cooperative implementation of environmental projects. The key assumption of the model is that pollution abatement investments in one country can reduce the stock of pollution in other countries, and the cost of pollution damage is linear. Löfgren et al. [5] analyzed the impact of environmental pollution taxes with future uncertainties on manufacturers' pollution control investment decisions.
Yeung [6] assumed that different governments adopt cooperative and non-cooperative decision making to control environmental pollution and implement the policy of levying pollution tax, with the ultimate goal of maximizing social welfare. The social production department decides the output of the enterprise on its own, aiming at maximizing its own profits. The cooperative differential game of transboundary pollution is analyzed and discussed. In addition, a dynamic consistency of cooperative strategy is further studied, and the stochastic differential game analysis of this problem is extended. Antelo and Loureiro [7] established a three-stage game model of oligopoly competition pollution abatement investment and found that manufacturers will make different pollution investment decisions for the symmetry of pollution control technology information. In [8], a cooperative stochastic differential game of transboundary industrial pollution is presented, and a payment distribution mechanism is derived to maintain the subgame consistency. Additionally, there are several published studies of transboundary pollution problems from other views, such as renewable resource, clean technologies, harmonization of international and domestic law, abatement cost, R&D spillovers, and so on (for instance, [9–12]). These papers [8, 13, 14] took emission permit trading into account to study the problem of transboundary pollution. It has been shown in this literature that coordination of countries’ emission strategies leads to a lower total level of pollution and to a higher total welfare than when countries use non-cooperative emission strategies. Chang et al. [15] obtained optimal emission levels and abatement expenditures in a finite-horizon transboundary pollution game with emission trading between two regions. And the paper found that cooperation between the regions leads to increased abatement and lower emissions, resulting in a lower pollution stock. 
In another paper [16], Chang et al. presented a stochastic differential game to study this kind of problem. More generally, the process of emission permit price is assumed to be stochastic and to follow a geometric Brownian motion (GBM). All the results demonstrate that the stochastic emission permit prices can motivate the players to make more flexible strategic decisions in the games.
The results of theoretical research and empirical analysis have shown that the technology acquired through learning by doing may not last indefinitely, but may depreciate. And learning-by-doing “forgetting” or “depreciation” is of great significance to production planning and scheduling [17–21]. When using economics to analyze environmental problems, scholars usually assume that technology is constant or that technological progress is considered exogenous. Unlike traditional analysis, technological advancement in real production is an endogenous and dynamic process, which is influenced by factors such as regional government policies and knowledge acquisition. Jaffe et al. [22] argued that the relationship between technological change and the environment has an important impact on environmental problems. The positive interaction between technology accumulation and good environmental pollution abatement policies should be better understood. Kline and Rosenberg [23] studied a variety of technology industries and found that in some cases, learning by doing in production process contributed more to technological progress than the original development itself. Bramoullé and Olson [24] argued that the government's Pigouvian tax, an environmental regulation policy, could put pollution abatement on the right technological track. By using the effect of learning by doing over time on the distribution of pollution abatement among heterogeneous technologies, the best conditions for sharing all technologies could be determined. At the same time, it is found that the more mature pollution abatement technology is more inclined to adopt infant technology.
It is learning that accumulates pollution abatement experience and thus reduces pollution control investment costs. Due to the accumulation of pollution control experience, the cost of governance should be reduced, which is measured by cumulative governance. In fact, “learning by doing” from the experience of pollution control can promote the transformation of governance processes and technologies. These changes can reduce costs. However, participants’ net present value income includes not only production income and emission trading income but also pollution abatement costs. Some scholars use the value function of production to measure the learning value of learning by doing. It has been widely used in some operation research literature, such as [25–27]. Learning by doing is an important source of technological progress. Similarly, the technology of pollution abatement investment needs to be developed continuously in practice. In recent years, many authors have discussed the technology of pollution abatement investment [25] and the technology of investment accumulation [28]. Some studies have mentioned the environmental policy and abatement cost in the presence of learning by doing [24, 25, 29].
However, the existing literature rarely considers the dynamic effects of accumulation and depreciation of learning by doing when analyzing the trajectory of investment in transboundary pollution abatement. Our work is in the spirit of the studies by Li and Pan [25] and Zhong and Zhang [30], which were the first to investigate the effect of experience measured by the cumulative abatement investment from time 0 to t. In recent works [31–35], the investment cost falls as experience accumulates. In this paper, it is assumed that the accumulation of pollution abatement investment increases at a constant rate. As knowledge is forgotten, depreciated, or replaced by new knowledge, it depreciates at a certain rate. Similarly, in the actual production and application of pollution abatement technologies, what has been learned by doing is forgotten or replaced. In order to explore the strength of this forgetting or substitution effect on polluting enterprises in the process of pollution abatement investment, and its impact on the instantaneous emission rate and pollution stock, this article builds models, solves for the optimal values of the related variables, and analyzes them. Our main purpose is to study the pollution abatement investment decision under the accumulation and depreciation of knowledge. Therefore, we use a mature dynamic differential game model to keep the setting simple and analyze the changes of the variables more intuitively by numerical simulation.
The rest of this paper is organized as follows. In Section 2, we establish our basic dynamic general equilibrium model. The optimal Nash equilibrium solutions for the instantaneous emission rate, the investment intensity of pollution abatement, and the pollution stock in the upstream and downstream regions under the non-cooperative game and the cooperative game are presented in Sections 3 and 4, respectively. In Section 5, the optimal trajectories of each variable under the finite-planning horizon and the long-term steady-state equilibrium are simulated and analyzed numerically. Some discussions and further analysis are provided in Section 6. Finally, Section 7 concludes the paper.
2. The Basic Model
There are two adjacent regions ($n=1,2$) in a river basin, which we call upstream region 1 and downstream region 2, in our transboundary pollution model. It is assumed that both regions discharge organic pollutants into the river basin. For region $n$ ($n=1,2$), production always leads to a quantity of by-products, namely, emissions $E_n(t)$ ($n=1,2$). We assume that $Q_n(t)$ is the industrial output of the upstream and downstream regions, which indicates that at time $t$, the industrial production of region 1 and region 2 is $Q_1(t)$ and $Q_2(t)$. The instantaneous emissions of pollutants from industrial output are $E_n(t)$ ($n=1,2$), that is, at time $t$, the pollution emissions of region 1 and region 2 are $E_1(t)$ and $E_2(t)$, respectively. It is assumed that pollution emissions in the two regions are positively related to industrial production, and the instantaneous linear production function can be expressed as

$$Q_n(t) = Q_n\big(E_n(t)\big). \tag{1}$$
According to the literature [3, 4, 36], it is assumed that the regional industrial income function is $R_n(Q_n(t))$, which can be expressed by the following quadratic functional form in terms of emissions:

$$R_n\big(E_n(t)\big) = E_n(t)\left(A_n - \frac{1}{2}E_n(t)\right), \tag{2}$$

where $A_n$ ($n=1,2$) is a positive constant. Thus, the profit function $R_n(E_n(t))$ is expressed as a quadratic concave function of $E_n(t)$.
Environmental pollution damage cost is a linear function of the pollution stock $P_n(t)$; following [4], the cost function is $D_n(t) = D_n P_n(t)$ with $D_n \ge 0$. Here, $D_n$ indicates the degree of environmental damage per unit pollution stock to region $n$.
The investment intensity function of pollution abatement is $I_n(t)$ ($n=1,2$). Pollution abatement can be realized only when technology and labor are invested, so the regions face an abatement cost that decreases net revenue. Following [3], we assume that the upstream abatement investment cost can be described by the following quadratic form:

$$C_{11}(I_{11}) = \frac{1}{2}\mu_1 I_{11}^2(t), \quad 0 < \mu_1 < 1. \tag{3}$$
Equation (3) means that the marginal cost is increasing with respect to the level of pollution abatement. Likewise, the cost to downstream region 2 of investing in water pollution abatement in its own area, and the cost to downstream region 2 of investing in water pollution control in upstream region 1, can be expressed as follows:

$$C_{22}(I_{22}) = \frac{1}{2}\mu_2 I_{22}^2(t), \quad 0 < \mu_2 < 1, \tag{4}$$

$$C_{21}(I_{21}) = \frac{1}{2}\mu_3\left[\big(I_{22}(t) + I_{21}(t)\big)^2 - I_{22}^2(t)\right] = \mu_3 I_{22}(t) I_{21}(t) + \frac{1}{2}\mu_3 I_{21}^2(t), \quad 0 < \mu_3 < \mu_2 < 1, \tag{5}$$

where $I_{22}(t)$ indicates the investment intensity of downstream region 2 in its local area and $I_{21}(t)$ indicates the investment intensity of the downstream region in upstream region 1. This functional form captures the idea that the cost of region 2's investment in region 1 depends on what the downstream region is investing locally, because investors acquire “learning by doing” for the upstream region only after they have collected downstream investment experience in region 2.
Following [24, 35], the experience of applying pollution abatement technology $G_{nn}(t)$ is measured by the cumulative abatement from time 0 to $t$, that is,

$$G_{nn}(t) = G_{nn0} + g_{nn}\int_0^t I_{nn}(s)\,ds, \quad G_{nn}(0) = G_{nn0}, \tag{6}$$

where $G_{nn0}$ ($n=1,2$) denotes the initial experience level of applying pollution abatement technology. The positive parameter $g_{nn}$ ($n=1,2$) represents the difference between the two regions in their ability to accumulate experience. According to the learning-by-doing theory, the amount of cumulative experience will lead to a decline in the unit cost.
With the rapid development of society and technology, the accumulation of pollution abatement investment increases at a constant rate. And the amount of cumulative experience will lead to a decline in the unit cost. As knowledge is forgotten, depreciated, or replaced by new one, knowledge will depreciate at a certain rate. According to Jorgenson [37] and Griliches [38], a proportional or geometric depreciation rule seems to be a good choice to represent the depreciation of aggregate stock of knowledge.
To simplify, we apply the proportional and linear approach, using $z_{nn}$ ($n=1,2$) to represent the depreciation rate of the aggregate stock of knowledge evolving over time. Correspondingly, the learning-by-doing function changes once knowledge depreciation is taken into consideration:

$$\begin{aligned}
G_{11}(t) &= G_{10} + g_{11}\int_0^t I_{11}(s)\,ds - z_{11}\int_0^t G_{11}(s)\,ds,\\
G_{21}(t) &= G_{20} + g_{21}\int_0^t I_{21}(s)\,ds - z_{21}\int_0^t G_{21}(s)\,ds,\\
G_{22}(t) &= G_{20} + g_{22}\int_0^t I_{22}(s)\,ds - z_{22}\int_0^t G_{22}(s)\,ds.
\end{aligned} \tag{7}$$
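To illustrate how accumulation and depreciation interact in the learning-by-doing stock, the following minimal sketch integrates the differential form of the relation, $\dot{G}(t) = gI(t) - zG(t)$, for a constant investment intensity and compares it with the closed-form solution. The constant investment path and all numerical values (`G0`, `g`, `z`, `I`) are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def knowledge_closed_form(t, G0, g, z, I):
    """G(t) for dG/dt = g*I - z*G with constant investment I and G(0) = G0."""
    steady = g * I / z  # level at which accumulation and depreciation balance
    return steady + (G0 - steady) * math.exp(-z * t)

def knowledge_euler(t_end, G0, g, z, I, steps=100_000):
    """Forward-Euler integration of the same equation, used as a cross-check."""
    dt = t_end / steps
    G = G0
    for _ in range(steps):
        G += dt * (g * I - z * G)
    return G

# Hypothetical illustration values (not taken from the paper).
G0, g, z, I = 1.0, 0.5, 0.1, 2.0
closed = knowledge_closed_form(10.0, G0, g, z, I)
numeric = knowledge_euler(10.0, G0, g, z, I)
print(f"closed form: {closed:.4f}, Euler: {numeric:.4f}, long-run level: {g * I / z:.4f}")
```

Under these assumptions the stock rises toward the level $gI/z$ and never exceeds it: a larger depreciation rate $z$ lowers the experience that a given investment intensity can sustain.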
Here, we define the parameters $g_{11}, g_{21}, g_{22} > 0$ as the rates of knowledge accumulation under investment in pollution abatement technology and the parameters $z_{11}, z_{21}, z_{22} > 0$ as the depreciation rates of learning by doing. The investment intensity of pollution abatement in the upstream region is $I_{11}(t)$.
In this model, the stocks of pollution in the upstream and downstream regions of the basin at time $t$ are $P_1(t)$ and $P_2(t)$. The pollution stocks in the two regions can be expressed by the following two differential equations:

$$\begin{aligned}
\dot{P}_1(t) &= E_1(t) - I_{11}(t) - I_{21}(t) - (\theta_1 + d)P_1(t), & P_1(0) &= P_{10}, & P_1(t) &\ge 0,\\
\dot{P}_2(t) &= E_2(t) - I_{22}(t) - \theta_2 P_2(t) + dP_1(t), & P_2(0) &= P_{20}, & P_2(t) &\ge 0,
\end{aligned} \tag{8}$$

where $\theta_1 P_1(t)$ and $\theta_2 P_2(t)$ represent the amounts of pollution removed by self-purification. Natural water has a certain self-purification ability, which changes with time and temperature. Without loss of generality, it is assumed that the basin purifies water pollution at a rate given by the self-purification coefficient $\theta_n$ ($n=1,2$), with $\theta_1, \theta_2 \ge 0$. The term $dP_1(t)$ is the amount of pollution transferred from upstream to downstream, where $d$ is the transfer coefficient and $0 \le d \le 1$.
The number of initial emission permits is $E_{10}$ in region 1 and $E_{20}$ in region 2. It is assumed that the emission trading market is a fully competitive market and the price of emission trading rights is constant at $w$. If the amount of emission exceeds the initial allocation, the emission right can be purchased in the permit market. Conversely, if there is a surplus of emission rights, it can be reserved for the next year or sold in the emission permit trading market.
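Given the above assumptions, we can get the concrete expression of the income function $W_n(t)$ of the two regions:

$$\begin{aligned}
W_1(t) ={}& E_1(t)\left(A_1 - \frac{1}{2}E_1(t)\right) - w\big(E_1(t) - I_{11}(t) - I_{21}(t) - E_{10}\big) - \frac{1}{2}\mu_1 I_{11}^2(t) + \varepsilon_{11}\big(G_{11}(t) - G_{10}\big),\\
W_2(t) ={}& E_2(t)\left(A_2 - \frac{1}{2}E_2(t)\right) - w\big(E_2(t) - I_{22}(t) - E_{20}\big) - \frac{1}{2}\mu_2 I_{22}^2(t) - \frac{1}{2}\mu_3 I_{21}^2(t) - \mu_3 I_{21}(t) I_{22}(t)\\
& + \varepsilon_{21}\big(G_{21}(t) - G_{20}\big) + \varepsilon_{22}\big(G_{22}(t) - G_{20}\big).
\end{aligned} \tag{9}$$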
We define $\varepsilon_{11}, \varepsilon_{21}, \varepsilon_{22} > 0$ as the conversion rates of learning by doing. Then, $\varepsilon_{11}(G_{11}(t) - G_{10})$, $\varepsilon_{21}(G_{21}(t) - G_{20})$, and $\varepsilon_{22}(G_{22}(t) - G_{20})$ represent the cost savings brought by learning by doing, which can also be considered as the income from learning-by-doing conversion.
3. Non-Cooperative Game
The amount of pollution abatement investment provided by the downstream region in the same basin will affect the investment enthusiasm of the upstream region in pollution control and the amount of pollutant discharge in the region. In turn, the amount of pollutant discharge in the upstream region will affect the amount of pollutant discharge in the downstream region, which constitutes a dynamic game relationship between the two sides. Considering the decision making in continuous time, this constitutes a dynamic differential game relationship.
Under this model, both regions aim to maximize the net present value of their own long-term income. The pollution discharge in the upstream region will affect the income of the downstream region by affecting the pollution stock in the same water basin. The decision-making problem of independent discharge from the two regions constitutes a differential game problem with $E_1(t)$, $E_2(t)$, $I_{11}(t)$, $I_{21}(t)$, and $I_{22}(t)$ as control variables and $P_1(t)$, $P_2(t)$, $G_{11}(t)$, $G_{21}(t)$, and $G_{22}(t)$ as state variables, aiming at maximizing the net present value of their respective income.
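The current goal of region 1 is to maximize the expected present flow of instantaneous net revenue in terms of the emission path and the abatement level. We describe this issue as

$$J_1^N(t) = \max_{E_1, I_{21}, I_{11} \ge 0} \int_0^T e^{-rt}\Big[A_1 E_1(t) - \frac{1}{2}E_1^2(t) - w\big(E_1(t) - I_{11}(t) - I_{21}(t) - E_{10}\big) - \frac{1}{2}\mu_1 I_{11}^2(t) + \varepsilon_{11}\big(G_{11}(t) - G_{10}\big) - D_1 P_1(t)\Big]dt, \tag{10}$$

$$\text{s.t.}\quad \begin{aligned}
\dot{P}_1(t) &= E_1(t) - I_{11}(t) - I_{21}(t) - (\theta_1 + d)P_1(t), & P_1(0) &= P_{10}, & P_1(t) &\ge 0,\\
\dot{G}_{11}(t) &= g_{11} I_{11}(t) - z_{11} G_{11}(t), & G_{11}(0) &= G_{10}, & G_{11}(t) &\ge 0.
\end{aligned} \tag{11}$$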
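Likewise, the current goal of downstream region 2 is to maximize the expected present flow of instantaneous net revenue in terms of the emission path and the abatement level. The optimization problem of the downstream region can be given as

$$\begin{aligned}
J_2^N(t) = \max_{E_2, I_{21}, I_{22} > 0} \int_0^T e^{-rt}\Big[& A_2 E_2(t) - \frac{1}{2}E_2^2(t) - w\big(E_2(t) - I_{22}(t) - E_{20}\big) - \frac{1}{2}\mu_2 I_{22}^2(t) - \frac{1}{2}\mu_3 I_{21}^2(t) - \mu_3 I_{21}(t) I_{22}(t)\\
& + \varepsilon_{21}\big(G_{21}(t) - G_{20}\big) + \varepsilon_{22}\big(G_{22}(t) - G_{20}\big) - D_2 P_2(t)\Big]dt,
\end{aligned} \tag{12}$$

$$\text{s.t.}\quad \begin{aligned}
\dot{P}_1(t) &= E_1(t) - I_{11}(t) - I_{21}(t) - (\theta_1 + d)P_1(t), & P_1(0) &= P_{10}, & P_1(t) &\ge 0,\\
\dot{P}_2(t) &= E_2(t) - I_{22}(t) - \theta_2 P_2(t) + dP_1(t), & P_2(0) &= P_{20}, & P_2(t) &\ge 0,\\
\dot{G}_{21}(t) &= g_{21} I_{21}(t) - z_{21} G_{21}(t), & G_{21}(0) &= G_{20}, & G_{21}(t) &\ge 0,\\
\dot{G}_{22}(t) &= g_{22} I_{22}(t) - z_{22} G_{22}(t), & G_{22}(0) &= G_{20}, & G_{22}(t) &\ge 0.
\end{aligned} \tag{13}$$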
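The current value Hamiltonian function of equation (10) is

$$\begin{aligned}
H_1^N(t) ={}& A_1 E_1(t) - \frac{1}{2}E_1^2(t) - w\big(E_1(t) - I_{11}(t) - I_{21}(t) - E_{10}\big) - \frac{1}{2}\mu_1 I_{11}^2(t) + \varepsilon_{11}\big(G_{11}(t) - G_{10}\big) - D_1 P_1(t)\\
& + \lambda_1^N\big[E_1(t) - I_{11}(t) - I_{21}(t) - (\theta_1 + d)P_1(t)\big] + \lambda_2^N\big[g_{11} I_{11}(t) - z_{11} G_{11}(t)\big].
\end{aligned} \tag{14}$$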
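The current value Hamiltonian function of equation (12) is

$$\begin{aligned}
H_2^N(t) ={}& A_2 E_2(t) - \frac{1}{2}E_2^2(t) - w\big(E_2(t) - I_{22}(t) - E_{20}\big) - \frac{1}{2}\mu_2 I_{22}^2(t) - \frac{1}{2}\mu_3 I_{21}^2(t) - \mu_3 I_{21}(t) I_{22}(t)\\
& + \varepsilon_{21}\big(G_{21}(t) - G_{20}\big) + \varepsilon_{22}\big(G_{22}(t) - G_{20}\big) - D_2 P_2(t)\\
& + \lambda_3^N\big[E_2(t) - I_{22}(t) - \theta_2 P_2(t) + dP_1(t)\big] + \lambda_4^N\big[E_1(t) - I_{11}(t) - I_{21}(t) - (\theta_1 + d)P_1(t)\big]\\
& + \lambda_5^N\big[g_{21} I_{21}(t) - z_{21} G_{21}(t)\big] + \lambda_6^N\big[g_{22} I_{22}(t) - z_{22} G_{22}(t)\big],
\end{aligned} \tag{15}$$

where $\lambda_n^N$ ($n=1,2,\ldots,6$) are the dynamic adjoint variables associated with the state equations. The dual variables $\lambda_n^N$ ($n=1,2,\ldots,6$), also called shadow prices or costate variables, are Lagrange multipliers, which are the derivatives of the two players’ value functions, i.e., revenues, with respect to the corresponding state variables (the pollution stocks $P_n(t)$ and the knowledge stocks).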
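To maximize (14) and (15), the first-order conditions of the current value Hamiltonian functions are the following:

$$\frac{\partial H_1^N(t)}{\partial E_1(t)} = A_1 - w - E_1(t) + \lambda_1^N(t) = 0, \tag{16}$$

$$\frac{\partial H_1^N(t)}{\partial I_{11}(t)} = w - \mu_1 I_{11}(t) + \lambda_2^N(t) g_{11} - \lambda_1^N(t) = 0, \tag{17}$$

$$\frac{\partial H_2^N(t)}{\partial E_2(t)} = A_2 - w - E_2(t) + \lambda_3^N(t) = 0, \tag{18}$$

$$\frac{\partial H_2^N(t)}{\partial I_{22}(t)} = w - \mu_2 I_{22}(t) - \mu_3 I_{21}(t) + \lambda_6^N(t) g_{22} - \lambda_3^N(t) = 0, \tag{19}$$

$$\frac{\partial H_2^N(t)}{\partial I_{21}(t)} = -\mu_3 I_{22}(t) - \mu_3 I_{21}(t) + \lambda_5^N(t) g_{21} - \lambda_4^N(t) = 0. \tag{20}$$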
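The current value costate equations are

$$\dot{\lambda}_1^N(t) = r\lambda_1^N(t) - \frac{\partial H_1^N(t)}{\partial P_1(t)} = (r + \theta_1 + d)\lambda_1^N(t) + D_1, \tag{21}$$

$$\dot{\lambda}_2^N(t) = r\lambda_2^N(t) - \frac{\partial H_1^N(t)}{\partial G_{11}(t)} = (r + z_{11})\lambda_2^N(t) - \varepsilon_{11}, \tag{22}$$

$$\dot{\lambda}_3^N(t) = r\lambda_3^N(t) - \frac{\partial H_2^N(t)}{\partial P_2(t)} = (r + \theta_2)\lambda_3^N(t) + D_2, \tag{23}$$

$$\dot{\lambda}_4^N(t) = r\lambda_4^N(t) - \frac{\partial H_2^N(t)}{\partial P_1(t)} = (r + \theta_1 + d)\lambda_4^N(t) - d\lambda_3^N(t), \tag{24}$$

$$\dot{\lambda}_5^N(t) = r\lambda_5^N(t) - \frac{\partial H_2^N(t)}{\partial G_{21}(t)} = (r + z_{21})\lambda_5^N(t) - \varepsilon_{21}, \tag{25}$$

$$\dot{\lambda}_6^N(t) = r\lambda_6^N(t) - \frac{\partial H_2^N(t)}{\partial G_{22}(t)} = (r + z_{22})\lambda_6^N(t) - \varepsilon_{22}, \tag{26}$$

where the transversality conditions are $\lambda_n^N(T) = 0$, $n = 1, 2, \ldots, 6$.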
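As an illustration of how the costate equations and the transversality conditions shape the finite-horizon paths, the sketch below solves (21) and (23) in closed form. An equation of the form $\dot{\lambda} = a\lambda + D$ with $\lambda(T) = 0$ has the solution $\lambda(t) = -\frac{D}{a}\left(1 - e^{a(t-T)}\right)$. The emission rates then follow from rearranging the first-order conditions (16) and (18). The parameter values are those listed in Table 1 of Section 5 with $T = 10$; the function names are hypothetical helpers.

```python
import math

# Parameter values from Table 1 of the paper; T = 10 is the finite horizon of Section 5.1.
A1, A2, w, r = 8.5, 7.65, 0.2, 0.08
D1, D2 = 0.2, 0.24
theta1, theta2, d = 0.1, 0.11, 0.5
T = 10.0

def costate(t, a, D, T):
    """Closed-form solution of lam' = a*lam + D with terminal condition lam(T) = 0."""
    return -(D / a) * (1.0 - math.exp(a * (t - T)))

def emissions(t):
    """Emission rates implied by the first-order conditions (16) and (18)."""
    lam1 = costate(t, r + theta1 + d, D1, T)  # costate equation (21)
    lam3 = costate(t, r + theta2, D2, T)      # costate equation (23)
    return A1 - w + lam1, A2 - w + lam3

for t in (0.0, 5.0, 10.0):
    E1, E2 = emissions(t)
    print(f"t={t:4.1f}  E1={E1:.4f}  E2={E2:.4f}")
```

Because both costates are negative and rise to zero at $t = T$, the implied emission rates increase toward the end of the planning period, where they reach $A_1 - w$ and $A_2 - w$.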
Solving equations (16)–(20), we have

$$E_1^N(t) = A_1 - w + \lambda_1^N(t), \tag{27}$$

$$E_2^N(t) = A_2 - w + \lambda_3^N(t). \tag{28}$$
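In line with the actual situation, we focus on the case $E_1^N(t) > 0$ and $E_2^N(t) > 0$, namely, $0 < w < \min\{A_1 + \lambda_1^N(t),\, A_2 + \lambda_3^N(t)\}$. The corresponding investment intensities are

$$I_{11}^N(t) = \frac{w - \lambda_1^N(t) + \lambda_2^N(t) g_{11}}{\mu_1}, \tag{29}$$

$$I_{22}^N(t) = \frac{w - \lambda_3^N(t) + \lambda_4^N(t) - \lambda_5^N(t) g_{21} + \lambda_6^N(t) g_{22}}{\mu_2 - \mu_3}, \tag{30}$$

$$I_{21}^N(t) = \frac{\mu_3\big(\lambda_3^N(t) - w\big) - \mu_2\lambda_4^N(t) + \mu_2\lambda_5^N(t) g_{21} - \mu_3\lambda_6^N(t) g_{22}}{\mu_3(\mu_2 - \mu_3)}. \tag{31}$$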
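For readers who want the intermediate step, the following sketch shows how the expressions for $I_{22}^N(t)$ and $I_{21}^N(t)$ follow from the first-order conditions (19) and (20). It uses only symbols defined above, suppresses the time argument, and assumes $\mu_3 \neq 0$ and $\mu_2 \neq \mu_3$, which holds under $0 < \mu_3 < \mu_2 < 1$.

```latex
\begin{aligned}
\mu_2 I_{22} + \mu_3 I_{21} &= w - \lambda_3^N + \lambda_6^N g_{22}
  && \text{(rearranging (19))}\\
\mu_3 I_{22} + \mu_3 I_{21} &= \lambda_5^N g_{21} - \lambda_4^N
  && \text{(rearranging (20))}\\
(\mu_2 - \mu_3)\, I_{22} &= w - \lambda_3^N + \lambda_4^N - \lambda_5^N g_{21} + \lambda_6^N g_{22}
  && \text{(subtracting the second line from the first)}\\
I_{21} &= \frac{\lambda_5^N g_{21} - \lambda_4^N}{\mu_3} - I_{22}
  && \text{(from the second line)}\\
&= \frac{\mu_3(\lambda_3^N - w) - \mu_2 \lambda_4^N + \mu_2 \lambda_5^N g_{21} - \mu_3 \lambda_6^N g_{22}}{\mu_3(\mu_2 - \mu_3)}
  && \text{(substituting } I_{22}\text{)}
\end{aligned}
```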
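In the long run, $T \to \infty$, $\lim_{T\to\infty} I_n^N(t)$, $n = 11, 21, 22$, tends to a steady state. We apply the superscript “∧” to identify the non-cooperative equilibrium results. Equations (21)–(26) are standard first-order differential equations. Substituting their solutions into equations (29)–(31) and solving under steady-state equilibrium conditions, we get

$$\hat{I}_{11}^N(t) = \frac{1}{\mu_1}\left[w + \frac{g_{11}\varepsilon_{11}}{r + z_{11}} + \frac{D_1}{r + \theta_1 + d}\right], \tag{32}$$

$$\hat{I}_{22}^N(t) = \frac{1}{\mu_2 - \mu_3}\left[w - \frac{D_2 d}{(r + \theta_2)(r + \theta_1 + d)} + \frac{D_2}{r + \theta_2} - \frac{g_{21}\varepsilon_{21}}{r + z_{21}} + \frac{g_{22}\varepsilon_{22}}{r + z_{22}}\right], \tag{33}$$

$$\hat{I}_{21}^N(t) = \frac{1}{\mu_3(\mu_2 - \mu_3)}\left[-\mu_3 w - \frac{\mu_3 D_2}{r + \theta_2} + \frac{\mu_2 D_2 d}{(r + \theta_2)(r + \theta_1 + d)} + \frac{\mu_2 g_{21}\varepsilon_{21}}{r + z_{21}} - \frac{\mu_3 g_{22}\varepsilon_{22}}{r + z_{22}}\right]. \tag{34}$$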
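The steady-state computation can be sketched numerically as follows. The stationary costates are obtained by setting the right-hand sides of (21)–(26) to zero, and the investment rules (29)–(31) are then evaluated at those values. The pollution and cost parameters come from Table 1 of Section 5; the learning parameters $g$, $z$, and $\varepsilon$ are not listed there, so the values used below are purely hypothetical.

```python
# Parameter values from Table 1 of the paper.
w, r = 0.2, 0.08
D1, D2 = 0.2, 0.24
theta1, theta2, d = 0.1, 0.11, 0.5
mu1, mu2, mu3 = 0.4, 0.5, 0.3

# Hypothetical learning parameters: Table 1 does not list g, z or epsilon.
g11 = g21 = g22 = 0.5
z11 = z21 = z22 = 0.1
eps11 = eps21 = eps22 = 0.1

# Stationary costates: right-hand sides of (21)-(26) set to zero.
lam1 = -D1 / (r + theta1 + d)
lam2 = eps11 / (r + z11)
lam3 = -D2 / (r + theta2)
lam4 = d * lam3 / (r + theta1 + d)
lam5 = eps21 / (r + z21)
lam6 = eps22 / (r + z22)

# Investment rules (29)-(31) evaluated at the stationary costates.
I11 = (w - lam1 + lam2 * g11) / mu1
I22 = (w - lam3 + lam4 - lam5 * g21 + lam6 * g22) / (mu2 - mu3)
I21 = (mu3 * (lam3 - w) - mu2 * lam4 + mu2 * lam5 * g21
       - mu3 * lam6 * g22) / (mu3 * (mu2 - mu3))

# Residuals of the first-order conditions (17), (19) and (20); all should vanish.
res17 = w - mu1 * I11 + lam2 * g11 - lam1
res19 = w - mu2 * I22 - mu3 * I21 + lam6 * g22 - lam3
res20 = -mu3 * I22 - mu3 * I21 + lam5 * g21 - lam4
print(f"I11={I11:.4f}  I22={I22:.4f}  I21={I21:.4f}")
print(f"FOC residuals: {res17:.2e}, {res19:.2e}, {res20:.2e}")
```

Checking the residuals of the first-order conditions is a simple way to confirm that the computed investment levels are mutually consistent for any chosen parameter set.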
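Further, substituting equations (32)–(34) into equation (8), collating, and solving, we obtain the optimal trajectory of pollution stock in the upstream and downstream regions under the non-cooperative game as follows:

$$P_1^N(t) = \left[P_{10} - \frac{E_1^N(t) - I_{11}^N(t) - I_{21}^N(t)}{\theta_1 + d}\right]e^{-(\theta_1 + d)t} + \frac{E_1^N(t) - I_{11}^N(t) - I_{21}^N(t)}{\theta_1 + d}, \tag{35}$$

$$P_2^N(t) = \left[P_{20} - \frac{M_1}{\theta_2} - \frac{M_2}{\theta_2 - \theta_1 - d}\right]e^{-\theta_2 t} + \frac{M_1}{\theta_2} + \frac{M_2}{\theta_2 - \theta_1 - d}e^{-(\theta_1 + d)t}, \tag{36}$$

where

$$M_1 = E_2^N(t) - I_{22}^N(t) + \frac{E_1^N(t) - I_{11}^N(t) - I_{21}^N(t)}{\theta_1 + d}\,d, \qquad M_2 = \left[P_{10} - \frac{E_1^N(t) - I_{11}^N(t) - I_{21}^N(t)}{\theta_1 + d}\right]d.$$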
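The closed-form trajectories can be checked against a direct numerical integration of the state equations (8). The sketch below treats the net emission rates $c_1 = E_1 - I_{11} - I_{21}$ and $c_2 = E_2 - I_{22}$ as constants, as is the case at steady-state controls. The values of `c1` and `c2` are hypothetical; the remaining parameters are taken from Table 1 of Section 5.

```python
import math

# Parameter values from Table 1 of the paper.
theta1, theta2, d = 0.1, 0.11, 0.5
P10, P20 = 100.0, 110.0

# Hypothetical constant net emission rates (emissions minus abatement investment).
c1, c2 = 4.7, 3.5

def P1(t):
    """Upstream stock: closed-form solution of P1' = c1 - (theta1 + d) * P1."""
    a = theta1 + d
    return (P10 - c1 / a) * math.exp(-a * t) + c1 / a

def P2(t):
    """Downstream stock: closed-form solution of P2' = c2 - theta2 * P2 + d * P1."""
    a = theta1 + d
    M1 = c2 + d * c1 / a
    M2 = d * (P10 - c1 / a)
    k = M2 / (theta2 - a)  # requires theta2 != theta1 + d
    return ((P20 - M1 / theta2 - k) * math.exp(-theta2 * t)
            + M1 / theta2 + k * math.exp(-a * t))

def integrate(t_end, steps=200_000):
    """Forward-Euler integration of the two state equations, as a cross-check."""
    dt = t_end / steps
    p1, p2 = P10, P20
    for _ in range(steps):
        dp1 = c1 - (theta1 + d) * p1
        dp2 = c2 - theta2 * p2 + d * p1
        p1 += dt * dp1
        p2 += dt * dp2
    return p1, p2

p1_num, p2_num = integrate(10.0)
print(f"P1(10): closed={P1(10.0):.4f}  numeric={p1_num:.4f}")
print(f"P2(10): closed={P2(10.0):.4f}  numeric={p2_num:.4f}")
```

The same functions apply to the cooperative case by substituting the cooperative emission and investment levels for `c1` and `c2`.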
4. Cooperative Game
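Now, consider another model. Assume that the upstream and downstream regions of the basin reach an agreement, set up a joint decision-making department (or accept unified decision making by a higher management department), and jointly coordinate the pollution discharge strategies of the two regions with the goal of maximizing the total net present value of long-term income. Then, the decision-making problem of combined pollution discharge in the upstream and downstream regions constitutes a differential game problem, which takes the instantaneous emission discharge (emission discharge rate) as control variable and the pollution stock in the water area, investment in pollution abatement, and learning by doing as state variables, and in which the total net present value of the whole benefit of the basin is maximized. We describe this issue as

$$\begin{aligned}
J^C(t) = \max_{E_1, E_2, I_{11}, I_{21}, I_{22} > 0} \int_0^T e^{-rt}\Big[& A_1 E_1(t) + A_2 E_2(t) - \frac{1}{2}\big(E_1^2(t) + E_2^2(t)\big) - w\big(E_1(t) - I_{11}(t) - I_{21}(t) - E_{10}\big)\\
& - w\big(E_2(t) - I_{22}(t) - E_{20}\big) - \frac{1}{2}\mu_1\big(I_{11}(t) + I_{21}(t)\big)^2 - \frac{1}{2}\mu_2 I_{22}^2(t)\\
& + \varepsilon_{11}\big(G_{11}(t) - G_{10}\big) + \varepsilon_{21}\big(G_{21}(t) - G_{20}\big) + \varepsilon_{22}\big(G_{22}(t) - G_{20}\big) - D_1 P_1(t) - D_2 P_2(t)\Big]dt,
\end{aligned} \tag{37}$$

$$\text{s.t.}\quad \begin{aligned}
\dot{P}_1(t) &= E_1(t) - I_{11}(t) - I_{21}(t) - (\theta_1 + d)P_1(t), & P_1(0) &= P_{10}, & P_1(t) &\ge 0,\\
\dot{P}_2(t) &= E_2(t) - I_{22}(t) - \theta_2 P_2(t) + dP_1(t), & P_2(0) &= P_{20}, & P_2(t) &\ge 0,\\
\dot{G}_{11}(t) &= g_{11} I_{11}(t) - z_{11} G_{11}(t), & G_{11}(0) &= G_{10}, & G_{11}(t) &\ge 0,\\
\dot{G}_{21}(t) &= g_{21} I_{21}(t) - z_{21} G_{21}(t), & G_{21}(0) &= G_{20}, & G_{21}(t) &\ge 0,\\
\dot{G}_{22}(t) &= g_{22} I_{22}(t) - z_{22} G_{22}(t), & G_{22}(0) &= G_{20}, & G_{22}(t) &\ge 0.
\end{aligned} \tag{38}$$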
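The current value Hamiltonian function of equation (37) is

$$\begin{aligned}
H^C(t) ={}& A_1 E_1(t) + A_2 E_2(t) - \frac{1}{2}\big(E_1^2(t) + E_2^2(t)\big) - w\big(E_1(t) - I_{11}(t) - I_{21}(t) - E_{10}\big) - w\big(E_2(t) - I_{22}(t) - E_{20}\big)\\
& - \frac{1}{2}\mu_1\big(I_{11}(t) + I_{21}(t)\big)^2 - \frac{1}{2}\mu_2 I_{22}^2(t) + \varepsilon_{11}\big(G_{11}(t) - G_{10}\big) + \varepsilon_{21}\big(G_{21}(t) - G_{20}\big) + \varepsilon_{22}\big(G_{22}(t) - G_{20}\big)\\
& - D_1 P_1(t) - D_2 P_2(t) + \lambda_1^C\big[E_1(t) - I_{11}(t) - I_{21}(t) - (\theta_1 + d)P_1(t)\big] + \lambda_2^C\big[E_2(t) - I_{22}(t) - \theta_2 P_2(t) + dP_1(t)\big]\\
& + \lambda_3^C\big[g_{11} I_{11}(t) - z_{11} G_{11}(t)\big] + \lambda_4^C\big[g_{21} I_{21}(t) - z_{21} G_{21}(t)\big] + \lambda_5^C\big[g_{22} I_{22}(t) - z_{22} G_{22}(t)\big],
\end{aligned} \tag{39}$$

where $\lambda_n^C$ ($n=1,2,\ldots,5$) are the dynamic adjoint variables associated with the state equations. The dual variables $\lambda_n^C$ ($n=1,2,\ldots,5$), also called shadow prices or costate variables, are Lagrange multipliers, which are the derivatives of the joint value function, i.e., revenues, with respect to the corresponding state variables (the pollution stocks $P_n(t)$ and the knowledge stocks).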
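To maximize (39), the first-order conditions of the current value Hamiltonian function are the following:

$$\frac{\partial H^C(t)}{\partial E_1(t)} = A_1 - w - E_1(t) + \lambda_1^C(t) = 0, \tag{40}$$

$$\frac{\partial H^C(t)}{\partial E_2(t)} = A_2 - w - E_2(t) + \lambda_2^C(t) = 0, \tag{41}$$

$$\frac{\partial H^C(t)}{\partial I_{11}(t)} = w - \mu_1\big(I_{11}(t) + I_{21}(t)\big) + \lambda_3^C(t) g_{11} - \lambda_1^C(t) = 0, \tag{42}$$

$$\frac{\partial H^C(t)}{\partial I_{22}(t)} = w - \mu_2 I_{22}(t) + \lambda_5^C(t) g_{22} - \lambda_2^C(t) = 0, \tag{43}$$

$$\frac{\partial H^C(t)}{\partial I_{21}(t)} = w - \mu_1\big(I_{11}(t) + I_{21}(t)\big) + \lambda_4^C(t) g_{21} - \lambda_1^C(t) = 0. \tag{44}$$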
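The current value costate equations are

$$\dot{\lambda}_1^C(t) = r\lambda_1^C(t) - \frac{\partial H^C(t)}{\partial P_1(t)} = (r + \theta_1 + d)\lambda_1^C(t) + D_1 - d\lambda_2^C(t), \tag{45}$$

$$\dot{\lambda}_2^C(t) = r\lambda_2^C(t) - \frac{\partial H^C(t)}{\partial P_2(t)} = (r + \theta_2)\lambda_2^C(t) + D_2, \tag{46}$$

$$\dot{\lambda}_3^C(t) = r\lambda_3^C(t) - \frac{\partial H^C(t)}{\partial G_{11}(t)} = (r + z_{11})\lambda_3^C(t) - \varepsilon_{11}, \tag{47}$$

$$\dot{\lambda}_4^C(t) = r\lambda_4^C(t) - \frac{\partial H^C(t)}{\partial G_{21}(t)} = (r + z_{21})\lambda_4^C(t) - \varepsilon_{21}, \tag{48}$$

$$\dot{\lambda}_5^C(t) = r\lambda_5^C(t) - \frac{\partial H^C(t)}{\partial G_{22}(t)} = (r + z_{22})\lambda_5^C(t) - \varepsilon_{22}, \tag{49}$$

where the transversality conditions are $\lambda_n^C(T) = 0$, $n = 1, 2, \ldots, 5$.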
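Solving equations (40)–(44), we have

$$E_1^C(t) = A_1 - w + \lambda_1^C(t), \tag{50}$$

$$E_2^C(t) = A_2 - w + \lambda_2^C(t), \tag{51}$$

$$I_{11}^C(t) + I_{21}^C(t) = \frac{w - \lambda_1^C(t) + \lambda_3^C(t) g_{11}}{\mu_1}, \tag{52}$$

$$I_{22}^C(t) = \frac{w - \lambda_2^C(t) + \lambda_5^C(t) g_{22}}{\mu_2}. \tag{53}$$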
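Similarly, in the long run, $T \to \infty$, $\lim_{T\to\infty} I_n^C(t)$, $n = 11, 21, 22$, tends to a steady state. We apply the superscript “∧” to identify the cooperative equilibrium results. Equations (45)–(49) are standard first-order differential equations; solving them and then substituting the results into equations (52) and (53), we get

$$\hat{I}_{11}^C(t) + \hat{I}_{21}^C(t) = \frac{1}{\mu_1}\left[w + \frac{D_1}{r + \theta_1 + d} + \frac{D_2 d}{(r + \theta_2)(r + \theta_1 + d)} + \frac{g_{11}\varepsilon_{11}}{r + z_{11}}\right], \tag{54}$$

$$\hat{I}_{22}^C(t) = \frac{1}{\mu_2}\left[w + \frac{D_2}{r + \theta_2} + \frac{\varepsilon_{22} g_{22}}{r + z_{22}}\right]. \tag{55}$$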
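To make the comparison between the two regimes concrete, the sketch below evaluates the stationary costates of both games and the resulting steady-state emissions and upstream investment. Emissions follow (27)–(28) and (50)–(51); upstream investment follows (29) and (52). Parameter values are from Table 1 of Section 5, except the learning parameters `g11`, `z11`, `eps11`, `g22`, `z22`, and `eps22`, which are hypothetical because Table 1 does not list them.

```python
# Parameter values from Table 1 of the paper.
A1, A2, w, r = 8.5, 7.65, 0.2, 0.08
D1, D2 = 0.2, 0.24
theta1, theta2, d = 0.1, 0.11, 0.5
mu1, mu2 = 0.4, 0.5

# Hypothetical learning parameters (not listed in Table 1).
g11, z11, eps11 = 0.5, 0.1, 0.1
g22, z22, eps22 = 0.5, 0.1, 0.1

a1 = r + theta1 + d
a2 = r + theta2

# Non-cooperative stationary costates from (21) and (23).
lam1_N = -D1 / a1
lam3_N = -D2 / a2

# Cooperative stationary costates from (45) and (46).
lam2_C = -D2 / a2
lam1_C = (d * lam2_C - D1) / a1

# Steady-state emissions in both regimes.
E1_N, E2_N = A1 - w + lam1_N, A2 - w + lam3_N
E1_C, E2_C = A1 - w + lam1_C, A2 - w + lam2_C

learn11 = g11 * eps11 / (r + z11)
up_N = (w - lam1_N + learn11) / mu1   # upstream region's own investment I11, rule (29)
up_C = (w - lam1_C + learn11) / mu1   # combined upstream investment I11 + I21, rule (52)
I22_C = (w - lam2_C + g22 * eps22 / (r + z22)) / mu2  # rule (53)

print(f"E1: non-coop={E1_N:.4f}  coop={E1_C:.4f}")
print(f"E2: non-coop={E2_N:.4f}  coop={E2_C:.4f}")
print(f"upstream investment: I11 (non-coop)={up_N:.4f}  I11+I21 (coop)={up_C:.4f}")
print(f"I22 (coop)={I22_C:.4f}")
```

The cooperative upstream costate carries the extra term $-d\lambda_2^C$, which internalizes the damage that transferred pollution causes downstream. This lowers the steady-state upstream emission rate and raises upstream abatement investment relative to the non-cooperative rule, while the downstream emission rate is the same in both regimes.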
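Further, substituting equations (54) and (55) into equation (8), collating, and solving, we obtain the optimal trajectory of pollution stock in the upstream and downstream regions under the cooperative game as follows:

$$P_1^C(t) = \left[P_{10} - \frac{E_1^C(t) - I_{11}^C(t) - I_{21}^C(t)}{\theta_1 + d}\right]e^{-(\theta_1 + d)t} + \frac{E_1^C(t) - I_{11}^C(t) - I_{21}^C(t)}{\theta_1 + d}, \tag{56}$$

$$P_2^C(t) = \left[P_{20} - \frac{K_1}{\theta_2} - \frac{K_2}{\theta_2 - \theta_1 - d}\right]e^{-\theta_2 t} + \frac{K_1}{\theta_2} + \frac{K_2}{\theta_2 - \theta_1 - d}e^{-(\theta_1 + d)t}, \tag{57}$$

where

$$K_1 = E_2^C(t) - I_{22}^C(t) + \frac{E_1^C(t) - I_{11}^C(t) - I_{21}^C(t)}{\theta_1 + d}\,d, \qquad K_2 = \left[P_{10} - \frac{E_1^C(t) - I_{11}^C(t) - I_{21}^C(t)}{\theta_1 + d}\right]d. \tag{58}$$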
5. Numerical Simulation
Through the analysis above, we have obtained the optimal emission, pollution abatement investment, and pollution stock under the non-cooperative game and the cooperative game. In this section, we analyze the trajectory of each variable, identify the differences between the non-cooperative and cooperative models, and simulate the optimal emission decisions and the investment level of pollution abatement. The parameters used in the numerical examples are presented in Table 1, and we use the version 7.0 of Wolfram Mathematical Matlab to obtain the numerical solutions. The parameters are as follows [39].
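Table 1: The parameters used in the numerical example.

| Parameter | $A_1$ | $A_2$ | $w$ | $r$ | $D_1$ | $D_2$ | $E_{10}$ | $E_{20}$ |
|---|---|---|---|---|---|---|---|---|
| Value | 8.5 | 7.65 | 0.2 | 0.08 | 0.2 | 0.24 | 5 | 6 |

| Parameter | $P_{10}$ | $P_{20}$ | $\theta_1$ | $\theta_2$ | $d$ | $\mu_1$ | $\mu_2$ | $\mu_3$ |
|---|---|---|---|---|---|---|---|---|
| Value | 100 | 110 | 0.1 | 0.11 | 0.5 | 0.4 | 0.5 | 0.3 |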
5.1. Comparative Analysis of Optimal Emission Level
Drawing on [15], in this section, we simulate the dynamic trajectory of optimal emission in the basin at a finite-planning horizon, i.e., T = 10, as shown in Figures 1–3.
Figure 1: Optimal trajectory of pollution emission under non-cooperation when T = 10.
Figure 2: Optimal trajectory of pollution emission under cooperation when T = 10.
Figure 3: Optimal trajectory of pollution emission under cooperation and non-cooperation when the planning period T = 10.
Figure 1 simulates the dynamic trajectory of upstream and downstream pollution emissions in a non-cooperative decision-making model. Figure 2 shows the dynamic trajectory of pollution emissions when the upstream and downstream regions adopt a cooperative decision-making model. From the figures, we can easily find out that, excluding other interference factors, at a finite-planning horizon, the pollution emission level in the upstream of the basin is higher than that in the downstream, whether in the non-cooperative decision-making model or cooperative decision-making model. Assuming that the planning period is 10, game participants adjust their decision making with the change of time. In the initial stage, due to the pressure of environmental regulation, the pollution level of the upstream and downstream regions of the basin is relatively low. With the continuous promotion of a limited number of repeated games, the participants in the game gradually adopt a non-cooperative Nash equilibrium strategy to increase the level of pollution emissions. Figure 3 simulates the dynamic trajectory of pollution emission under the cooperative and non-cooperative decision-making model. The curve trajectory shows an upward trend and tends to be the same at the planning period T = 10. It shows that under the finite-planning horizon, whether the cooperative decision or the non-cooperative decision is adopted at the beginning stage, the final game result is the non-cooperative Nash equilibrium. This also verifies the previous conclusions.
When T⟶∞, we call it long-term steady state. In a differential game, the information that participants have at the beginning stage is similar to the finite-planning horizon, and they only know the initial state of the system. At different time t, participants in the game take different decisions. Therefore, the optimal strategy is a time-dependent dynamic strategy. Next, we simulate the dynamic trajectory of optimal emission in the basin at a long-term steady state as shown in Figures 4–6. Under the assumption of long-term steady state, Figure 4 simulates the dynamic trajectory of pollutant emission in the upstream and downstream regions of the basin in a non-cooperative game model. Figure 5 depicts the dynamic trajectory of pollution emissions in the cooperative game model.
Figure 4: Optimal trajectory of pollution emission under non-cooperation when T⟶∞.
Figure 5: Optimal trajectory of pollution emission under cooperation when T⟶∞.
Figure 6: Optimal trajectory of pollution emission under cooperation and non-cooperation when the planning period T⟶∞.
Different from the finite-planning horizon conditions, the pollution emission decision of the game participants under the long-term steady state gradually stabilizes with the passage of time, and the emission level or emission amount tends to be stable. Compared with the initial stage, the total emission has a downward trend. Figure 6 summarizes the pollution emission level under the cooperative and non-cooperative decision-making models. It can be seen that with the passage of time, the pollution emission level during the cooperative game shows a significant downward trend, and the cooperative emission in the stable state is lower than that in the non-cooperative decision-making model.
5.2. Comparative Analysis of Optimal Pollution Abatement Investment
Similar to the comparative analysis of the optimal pollution emission level, we simulate the dynamic trajectory of the optimal pollution abatement investment under the condition of finite-planning horizon.
Figures 7 and 8 show the dynamic trajectory of optimal pollution abatement investment in non-cooperative and cooperative decision-making models, respectively. In Figure 7, the investment intensity of downstream pollution abatement in the non-cooperative model is higher than that in upstream region. The investment level of upstream and downstream increases at first and then decreases, which shows that the decision game of non-cooperative investment at finite-planning horizon is unstable, and the later stage of limited repeated game tends to non-cooperative Nash equilibrium. Even under the upstream and downstream cooperative investment decision model, the upstream and downstream pollution abatement investment tends to non-cooperative Nash equilibrium, and the downward trend of investment level is more obvious. This can be seen from Figure 8. However, as far as the overall situation of the two investment decision-making methods is concerned, Figure 9 shows that the pollution abatement investment intensity in the non-cooperative decision-making model is significantly higher than that in the cooperative decision-making model, and the change of investment intensity is unstable.
Figure 7: Optimal trajectory of pollution abatement investment under non-cooperation when T = 10.
Figure 8: Optimal trajectory of pollution abatement investment under cooperation when T = 10.
Figure 9: Optimal trajectory of pollution abatement investment under cooperation and non-cooperation when the planning period T = 10.
From the perspective of "learning by doing," the accumulation effect of knowledge plays a positive role through most of the early stage of the non-cooperative investment decision-making model, while the depreciation effect becomes prominent at the end of the planning period. This can be seen from the shape of the curve in Figure 7. Similarly, Figure 8 shows that in the cooperative investment decision-making model the depreciation effect inhibits pollution abatement investment more strongly than the accumulation effect promotes it. Figure 9 reflects how, and to what degree, the accumulation and depreciation effects change under the two models. Furthermore, under the finite-planning horizon the level of non-cooperative pollution abatement investment is significantly higher than the cooperative level, that is, the accumulation effect of learning by doing is more pronounced than the depreciation effect.
Unlike under the finite-planning horizon, the pollution abatement investment trajectory in the long-term steady state becomes more stable over time, that is, it tends to a stable value. As shown in Figure 10, investment under the non-cooperative decision-making model increases in the short term. After reaching a certain level, it falls sharply and finally stabilizes. The investment trajectory of the cooperative decision-making model is very different: investment grows smoothly until it reaches a stable value (Figure 11). Figure 12 shows the significant difference between the investment trajectories of the two models. It is worth noting that in the long-term steady state the stable level of cooperative investment is higher than the non-cooperative level, that is, the accumulation effect of pollution abatement investment is stronger than the depreciation effect.
Figure 10: Optimal trajectory of pollution abatement investment under non-cooperation when the planning period T ⟶ ∞.
Figure 11: Optimal trajectory of pollution abatement investment under cooperation when the planning period T ⟶ ∞.
Figure 12: Optimal trajectory of pollution abatement investment under cooperation and non-cooperation when the planning period T ⟶ ∞.
5.3. Comparative Analysis of Optimal Pollution Stock
To study how the pollution stock changes in the upstream and downstream regions of the basin, and to see the movement of the stock level more clearly, we simulate the optimal pollution stock trajectories under non-cooperative and cooperative decision making, as shown in Figures 13 and 14, respectively. The stock levels change in a broadly similar way under the two models. Under the finite-planning horizon, in the non-cooperative model (Figure 13), the upstream pollution stock decreases significantly but rebounds at the end of the planning period, while the downstream stock increases at the beginning of the planning period and decreases later. In the cooperative model (Figure 14), the upstream pollution stock also decreases significantly and then stabilizes, while the downstream stock trends upward, that is, the amount of pollutants increases. This shows that regional pollution abatement decisions under the finite-planning horizon markedly improve upstream water quality but have the opposite effect downstream.
Figure 13: Optimal trajectory of pollution stock under non-cooperation when the planning period T = 10.
Figure 14: Optimal trajectory of pollution stock under cooperation when the planning period T = 10.
The change of water pollution stock in the basin also reflects the fact that the dynamic game under the finite-planning horizon results in a non-cooperative Nash equilibrium strategy. Figure 15 also shows that the pollution stock trajectory under both decision-making models is convex (U-shaped), that is, it first declines and then rises. On the whole, the non-cooperative decision-making model reduces the water pollution stock in the basin, while the cooperative model does the opposite.
Figure 15: Optimal trajectory of pollution stock under cooperation and non-cooperation when the planning period T = 10.
Because of the limited planning period and the dynamic game decision making, the fluctuation of pollutants in the basin under the finite-planning horizon is small, and it is difficult to reach the ultimate goal of regional water pollution control. As in the previous analysis, we also simulated the dynamic trajectory of pollution stock in the long-term steady state, as shown in Figures 16–18. Figures 16 and 17 show the dynamic trajectories of pollution stock under non-cooperative and cooperative game decisions, respectively (when T ⟶ ∞). The two trajectories are very similar. Upstream pollution declines sharply in both models, while the downstream pollution stock first rises and then falls. Eventually both stabilize, but at different stationary values.
Figure 16: Optimal trajectory of pollution stock under non-cooperation when the planning period T ⟶ ∞.
Figure 17: Optimal trajectory of pollution stock under cooperation when the planning period T ⟶ ∞.
Figure 18: Optimal trajectory of pollution stock under cooperation and non-cooperation when the planning period T ⟶ ∞.
To compare which decision-making model reduces the pollution stock in the river basin more effectively, we simulate the stock trajectories under the non-cooperative and cooperative models together (Figure 18). The cooperative decision-making model effectively reduces the stock of water pollution in the basin. Compared with the finite-planning horizon, the long-term steady state yields a better outcome of the dynamic game. That is, the cooperative game decision-making model under the long-term steady state can effectively reduce the stock of water pollution and thus improve the water environment of the basin.
6. Discussion
6.1. Further Analysis of the Optimal Emission Rate
Under a finite-planning horizon, i.e., finite T, the difference in the instantaneous emission rate of water pollution in the basin is
$$\Delta E_n(t) = E_n^N(t) - E_n^C(t) = \frac{D_2 d}{(r+\theta_2)(r+\theta_1+d)}\left[1 - e^{(r+\theta_1+d)(t-T)}\right] + \frac{D_2 d}{(r+\theta_2)(\theta_2-\theta_1-d)}\left[e^{(r+\theta_2)(t-T)} - e^{(r+\theta_1+d)(t-T)}\right]. \tag{59}$$
In this case, the difference between the instantaneous emission rates depends on the time $t$ and the length of the planning period $T$. $\partial E_n^N(t)/\partial T < 0$ and $\partial E_n^C(t)/\partial T < 0$ indicate that the longer the planning period is, the lower the equilibrium emission rate is. When $T \to \infty$, the equilibrium emission rate approaches the equilibrium solution of the long-term steady-state differential game.
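The behavior of (59) as the planning period grows can be checked numerically. The following is a minimal sketch, not the authors' simulation code: the function names and all parameter values are hypothetical and chosen only for illustration, and the sketch assumes θ2 ≠ θ1 + d so that the second term of (59) is defined.

```python
import math

def emission_gap(t, T, D2, d, r, theta1, theta2):
    """Equation (59): non-cooperative minus cooperative emission rate at time t."""
    a = r + theta1 + d   # exponent rate attached to the upstream stock
    b = r + theta2       # exponent rate attached to the downstream stock
    if math.isclose(a, b):
        raise ValueError("theta2 == theta1 + d makes the second term of (59) singular")
    first = D2 * d / (b * a) * (1.0 - math.exp(a * (t - T)))
    # b - a equals theta2 - theta1 - d, the denominator factor in (59)
    second = D2 * d / (b * (b - a)) * (math.exp(b * (t - T)) - math.exp(a * (t - T)))
    return first + second

def steady_state_gap(D2, d, r, theta1, theta2):
    """Equation (60): the limit of (59) as T -> infinity."""
    return D2 * d / ((r + theta2) * (r + theta1 + d))

# Hypothetical parameter values (not taken from the paper's Table 1).
params = dict(D2=2.0, d=0.3, r=0.05, theta1=0.1, theta2=0.2)

# Evaluate the gap at a fixed time t = 2 for increasingly long planning periods.
gaps = {T: emission_gap(2.0, T, **params) for T in (5, 10, 15, 200)}
limit = steady_state_gap(**params)
for T, gap in gaps.items():
    print(f"T={T:>3}: gap at t=2 is {gap:.4f} (steady-state limit {limit:.4f})")
```

Under these assumed values the gap vanishes at t = T and moves toward the constant in (60) as T increases, which is the convergence described in the text.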
Figures 19 and 20 depict the trajectory curve of optimal instantaneous emission rate under two decision models (non-cooperative decision model and cooperative decision model) during different planning periods (T = 5, T = 10, and T = 15).
Figure 19: Trajectory curve of optimal instantaneous emission rate under non-cooperation when the planning period T = 5, T = 10, and T = 15.
Figure 20: Trajectory curve of optimal instantaneous emission rate under cooperation when the planning period T = 5, T = 10, and T = 15.
When $T \to \infty$, $E_n^C(t) < E_n^N(t)$, and the difference in the instantaneous emission rate in the basin is
$$\Delta E_n(t) = E_n^N(t) - E_n^C(t) = \frac{D_2 d}{(r+\theta_2)(r+\theta_1+d)}. \tag{60}$$
The results show that in the cooperative decision making, the amount of pollution emission in both regions is smaller than that in non-cooperative decision making. Within a long-term steady state, the reduction of pollution emission is related to unit damage cost, discount rate, and self-purification capacity of the water body.
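The step from (59) to (60) can be written out explicitly. The sketch below assumes that $r+\theta_1+d>0$ and $r+\theta_2>0$ (as holds for positive discount and decay rates) and that $\theta_2 \neq \theta_1 + d$, so that for any fixed $t$ both exponentials in (59) vanish as $T \to \infty$:

```latex
\begin{aligned}
\lim_{T\to\infty} e^{(r+\theta_1+d)(t-T)} &= 0,
\qquad
\lim_{T\to\infty} e^{(r+\theta_2)(t-T)} = 0, \\
\lim_{T\to\infty} \Delta E_n(t)
&= \frac{D_2 d}{(r+\theta_2)(r+\theta_1+d)}\,[\,1-0\,]
 + \frac{D_2 d}{(r+\theta_2)(\theta_2-\theta_1-d)}\,[\,0-0\,] \\
&= \frac{D_2 d}{(r+\theta_2)(r+\theta_1+d)} .
\end{aligned}
```

Only the constant part of the first term of (59) survives, which is why the steady-state gap in (60) no longer depends on $t$ or $T$.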
6.2. Further Analysis of the Optimal Pollution Abatement Investment Level
From the perspective of the river basin, in the upstream region, when $\mu_2^2 > \mu_3(\mu_1+\mu_2)$, the investment intensity of pollution abatement satisfies $\partial I_{n1}^N(t)/\partial w > 0$, indicating that in the non-cooperative game over a finite-planning horizon, the higher the market price of emission rights, the larger the optimal pollution abatement investment. Conversely, when $\mu_2^2 < \mu_3(\mu_1+\mu_2)$, $\partial I_{n1}^N(t)/\partial w < 0$, indicating that the higher the market price of emission rights is, the smaller the optimal pollution abatement investment will be. In the downstream region, $\partial I_{22}^N(t)/\partial w > 0$, indicating that the higher the market price of emission rights is, the greater the optimal pollution abatement investment is. Under the cooperative investment decision-making model, $\partial I_{n1}^C(t)/\partial w > 0$ and $\partial I_{22}^C(t)/\partial w > 0$, which shows that the higher the market price of emission rights is, the greater the optimal investment intensity of pollution control is.
Under both the long-term steady state and the finite-planning horizon, both decision models have $\partial I_{n1}(t)/\partial \mu_1 < 0$ and $\partial I_{22}(t)/\partial \mu_2 < 0$, indicating that the smaller the cost of pollution abatement investment is, the greater the optimal investment intensity is. In addition, $\partial I_{21}^N(t)/\partial \mu_3 > 0$, so under the non-cooperative decision-making model the investment intensity increases with $\mu_3$: the higher the cost of the downstream participant's investment in the upstream region, the greater the optimal pollution abatement investment intensity.
Similarly, under both the finite-planning horizon and the long-term steady state, both game decision models have $\partial I_{nn}(t)/\partial g_{nn} > 0$ and $\partial I_{nn}(t)/\partial z_{nn} < 0$. This also verifies the accumulation effect and depreciation effect of pollution abatement investment. It is worth noting that under the finite-planning horizon, $\partial I_{21}^N(t)/\partial g_{22} < 0$, $\partial I_{21}^N(t)/\partial z_{22} > 0$, $\partial I_{22}^N(t)/\partial g_{21} < 0$, and $\partial I_{22}^N(t)/\partial z_{21} > 0$, indicating that there is a crowding-out effect of downstream pollution abatement investment on upstream investment. Further analysis found that $\partial I_n(t)/\partial r < 0$, $\partial I_n(t)/\partial D_n > 0$, and $\partial I_n(t)/\partial \theta < 0$: the lower the discount rate, the higher the environmental damage cost, and the smaller the self-purification ability of the water body, the greater the optimal pollution abatement investment.
The investment difference under the infinite-horizon condition is
$$\Delta I_n(t) = I_n^C(t) - I_n^N(t) = \frac{1}{\mu_1}\,\frac{D_2 d}{(r+\theta_2)(r+\theta_1+d)}, \tag{61}$$
which indicates that pollution abatement investment in the two regions is greater under cooperative decision making than under non-cooperative decision making. In the long-term steady state, the increase in investment intensity is related to the unit damage cost of pollution, the discount rate, the difference of investment cost in the upstream region, and the coefficient of self-purification capacity of the water body.
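How the steady-state investment gap in (61) responds to each parameter can be illustrated with a small sensitivity check. This is a minimal sketch under assumed values: the function names and the baseline parameters are hypothetical and are not taken from the paper.

```python
def investment_gap(mu1, D2, d, r, theta1, theta2):
    """Equation (61): cooperative minus non-cooperative steady-state investment."""
    return D2 * d / (mu1 * (r + theta2) * (r + theta1 + d))

# Hypothetical baseline values, for illustration only.
base = dict(mu1=1.5, D2=2.0, d=0.3, r=0.05, theta1=0.1, theta2=0.2)

def bumped(name, factor=1.10):
    """Return the gap after scaling one parameter by `factor`, others unchanged."""
    changed = dict(base)
    changed[name] *= factor
    return investment_gap(**changed)

baseline = investment_gap(**base)
for name in ("D2", "r", "mu1", "theta1", "theta2"):
    direction = "raises" if bumped(name) > baseline else "lowers"
    print(f"a 10% increase in {name} {direction} the investment gap")
```

Because (61) is a simple ratio, the directions do not depend on the particular baseline: the gap rises with the unit damage cost D2 and falls with the discount rate, the investment cost coefficient, and the self-purification coefficients.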
6.3. Further Analysis of Optimal Pollution Stock
The change of the pollution stock level depends on the initial emission and the investment intensity of pollution abatement during the game period. Under both the finite-planning horizon and the long-term steady state, both decision models can reduce the level of pollution stock. In the long-term steady state, for both the cooperative and the non-cooperative game, $\partial P_n^N(t)/\partial g_{nn} < 0$, $\partial P_n^C(t)/\partial g_{nn} < 0$, $\partial P_n^N(t)/\partial z_{nn} > 0$, and $\partial P_n^C(t)/\partial z_{nn} > 0$. This shows that the accumulation effect of pollution abatement investment reduces the pollution stock, while the depreciation effect increases it. Further analysis shows that $\partial P_2^N(t)/\partial g_{21} < 0$ and $\partial P_2^N(t)/\partial z_{21} > 0$, so the accumulation effect of the downstream region's pollution abatement investment in the upstream region also reduces the stock of downstream pollution, that is, the downstream environment benefits. Conversely, the depreciation effect is not conducive to the improvement of the downstream water environment.
Figures 21 and 22 depict the trajectory curves of the optimal pollution stock under the two decision models (non-cooperative and cooperative) for different planning periods (T = 5, T = 10, and T = 15). The longer the planning period, that is, the larger T is, the lower the level of water pollution in the river basin. When T ⟶ ∞, the equilibrium pollution stock approaches the equilibrium of the long-term steady-state differential game.
Figure 21: Trajectory curve of optimal pollution stock under non-cooperation when the planning period T = 5, T = 10, and T = 15.
Figure 22: Trajectory curve of optimal pollution stock under cooperation when the planning period T = 5, T = 10, and T = 15.
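The way an upstream and a downstream stock can settle to different steady values over a long horizon can be illustrated with a deliberately simplified simulation. This is a sketch under strong assumptions and is not the paper's model: emissions E1 and E2 are held constant, abatement investment is omitted, a share d of the upstream stock is assumed to flow downstream, and all names and parameter values are hypothetical.

```python
def simulate_stocks(E1, E2, theta1, theta2, d, P1_0, P2_0, horizon, dt=0.01):
    """Forward-Euler integration of an assumed two-region stock model.

    Assumed dynamics (a simplification, not the paper's full model):
        dP1/dt = E1 - (theta1 + d) * P1
        dP2/dt = E2 + d * P1 - theta2 * P2
    """
    P1, P2 = P1_0, P2_0
    path = [(0.0, P1, P2)]
    steps = int(round(horizon / dt))
    for k in range(1, steps + 1):
        dP1 = E1 - (theta1 + d) * P1
        dP2 = E2 + d * P1 - theta2 * P2
        P1, P2 = P1 + dt * dP1, P2 + dt * dP2
        path.append((k * dt, P1, P2))
    return path

def steady_state(E1, E2, theta1, theta2, d):
    """Stationary stocks obtained by setting both derivatives to zero."""
    P1_star = E1 / (theta1 + d)
    P2_star = (E2 + d * P1_star) / theta2
    return P1_star, P2_star

# Hypothetical values, for illustration only.
model = dict(E1=1.0, E2=1.0, theta1=0.1, theta2=0.2, d=0.3)
path = simulate_stocks(P1_0=10.0, P2_0=5.0, horizon=100.0, **model)
P1_star, P2_star = steady_state(**model)

t_end, P1_end, P2_end = path[-1]
t_peak, _, P2_peak = max(path, key=lambda row: row[2])
print(f"upstream:   {P1_end:.3f} at t={t_end:.0f} (stationary value {P1_star:.3f})")
print(f"downstream: {P2_end:.3f} at t={t_end:.0f} (stationary value {P2_star:.3f})")
print(f"downstream stock peaks at t={t_peak:.2f} with P2={P2_peak:.3f}")
```

With these assumed values the upstream stock falls monotonically, while the downstream stock first rises and then declines toward its own stationary value, which is qualitatively the pattern described for the long-term steady state.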
7. Conclusion
This paper analyzes a dynamic differential game model of watershed transboundary water pollution abatement and discusses the optimal decision-making problem under non-cooperative and cooperative differential games, taking into account the accumulation effect and depreciation effect of learning-by-doing pollution abatement investment. By solving the dynamic equations of the models, we obtain the Nash equilibrium solutions for the instantaneous emission rate, the investment intensity of pollution abatement, and the pollution stock in the upstream and downstream regions. Based on these results, the optimal trajectory curves of each variable under the long-term steady state and the finite-planning horizon were simulated and analyzed numerically. The results show that the trajectories of the instantaneous emission rate, pollution abatement investment, and pollution stock under the finite-planning horizon are quite different from those under the long-term steady state:
Under the finite-planning horizon, the game participants in the upstream and downstream regions choose the non-cooperative strategy at the end of the planning period, which leads to a significant increase in the instantaneous emission rate of each region. In terms of the amount of pollution emission, the longer the planning period is, the lower the optimal emission rate is in equilibrium. In the long-term steady state, whether the game is non-cooperative or cooperative, the instantaneous emission rate of each region tends to a stable value after repeated play, that is, the steady-state level. The long-term steady-state game under cooperative decision making can effectively reduce the amount of pollution emission.
The dynamics of water pollution abatement investment under the finite-planning horizon and the long-term steady state are quite different. Specifically, under the finite-planning horizon, the investment intensity of pollution abatement in the non-cooperative game is higher than that in the cooperative game. At the end of the independent investment decision, the game decision tends toward a non-cooperative Nash equilibrium state. At this time, the crowding-out effect of learning-by-doing pollution abatement investment is also relatively large. In the long-term steady state, the pollution abatement investment trajectory of the cooperative game is relatively stable and there is no obvious crowding-out effect. Therefore, investment continues to rise, and the optimal equilibrium level at steady state is higher than that under non-cooperative decision making.
Because water pollution flows strongly from upstream to downstream, the trajectories of pollution stock in the upstream and downstream regions have opposite trends, and this difference is particularly evident under the finite-planning horizon. The level of pollution in the upstream region decreases markedly, while the stock in the downstream region increases at first and then tends to be stable, so the total pollution stock level is relatively high. In the long-term steady state, the trajectories of upstream and downstream pollution in the non-cooperative and cooperative decision-making models are similar, but the cooperative model is superior to the non-cooperative model in terms of the time needed to stabilize and the steady-state level.
Transboundary water pollution control in river basins is a long-term and complex process. Environmental regulation policies in a short period of time or within a certain planning period can only play a role in local areas. Cooperative decision making within the region can obtain the optimal solution of the game.
Data Availability
The variable data used to support the findings of this study are included within the article (Table 1).
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This study was supported by the National Social Science Foundation of China (grant no. 18ZDA040), the Humanities and Social Science Foundation of the Ministry of Education of China (grant no. 17JJD790017), and the Evaluation Commission of Social Science Achievements of Hunan Province of China (grant no. XSP20ZDA007).