Dynamic Optimal Control of Transboundary Pollution Abatement under Learning-by-Doing Depreciation

This paper analyzes a dynamic Stackelberg differential game model of transboundary water pollution abatement in a watershed and discusses the optimal decision-making problem under non-cooperative and cooperative differential games, taking into account the accumulation effect and the depreciation effect of learning by doing in pollution abatement investment. We use dynamic optimization theory to derive the equilibrium solutions of the models. Through numerical simulation, we trace and analyze the optimal trajectories of each variable under a finite-planning horizon and in the long-term steady state. Under the finite-planning horizon, the longer the planning period is, the lower the optimal emission rate is in equilibrium. In the long-term steady state, the cooperative game effectively reduces pollution emissions. Under the finite-planning horizon, the investment intensity of pollution abatement in the non-cooperative game is higher than that in the cooperative game. In the long-term steady state, the pollution abatement investment trajectory of the cooperative game is relatively stable, with no obvious crowding-out effect: investment continues to rise, and its steady-state equilibrium level is higher than that under non-cooperative decision making. The decline in the pollution stock under the finite-planning horizon is not significant. In the long-term steady state, the upstream and downstream pollution trajectories in the non-cooperative and cooperative models are similar, but the cooperative decision-making model is superior to the non-cooperative model in terms of the time needed to stabilize and the steady-state level.


Introduction
In recent years, more and more scholars have used dynamic optimal control theory to study complex pollution control problems. The differential game method based on dynamic optimal control theory provides an effective research tool for transboundary pollution abatement, because it can analyze both the interaction between the participants' strategies and the dynamic trajectory of the pollution stock. Kilgour et al. [1] first used game theory to analyze total-amount control of pollution in river basins, taking total-amount control of COD (chemical oxygen demand) as an example. Stimming [2] studied a differential game among participating enterprises under two kinds of environmental policy, a pollution tax and emission permits, and concluded that total investment and emissions under the feedback strategy were higher than under the open-loop strategy. List and Mason [3] applied differential game theory to analyze the optimal institutional arrangement of transboundary pollution abatement under suboptimal conditions and established differential game models under cooperative and non-cooperative decision making, respectively. The results showed that if the emission yields of the two regions are asymmetric and the initial emission level is low, non-cooperative decision making is better than cooperative decision making from the perspective of total income. Breton et al. [4] used a two-party finite-time differential game model to analyze the cooperative implementation of environmental projects. The key assumptions of the model are that pollution abatement investments in one country can reduce the stock of pollution in other countries and that the cost of pollution damage is linear. Löfgren et al. [5] analyzed the impact of environmental pollution taxes with future uncertainties on manufacturers' pollution control investment decisions.
Yeung [6] assumed that different governments adopt cooperative and non-cooperative decision making to control environmental pollution and levy a pollution tax, with the ultimate goal of maximizing social welfare, while the production sector decides the output of each enterprise on its own, aiming at maximizing its own profits. The cooperative differential game of transboundary pollution is analyzed, the dynamic consistency of the cooperative strategy is studied, and the analysis is extended to a stochastic differential game. Antelo and Loureiro [7] established a three-stage game model of pollution abatement investment under oligopolistic competition and found that manufacturers make different pollution investment decisions depending on the symmetry of information about pollution control technology. In [8], a cooperative stochastic differential game of transboundary industrial pollution is presented, and a payment distribution mechanism is derived to maintain subgame consistency. Additionally, there are several published studies of transboundary pollution problems from other perspectives, such as renewable resources, clean technologies, harmonization of international and domestic law, abatement cost, and R&D spillovers (for instance, [9][10][11][12]). The papers [8,13,14] took emission permit trading into account to study the problem of transboundary pollution. This literature has shown that coordination of countries' emission strategies leads to a lower total level of pollution and to higher total welfare than non-cooperative emission strategies. Chang et al. [15] obtained optimal emission levels and abatement expenditures in a finite-horizon transboundary pollution game with emission trading between two regions and found that cooperation between the regions leads to increased abatement and lower emissions, resulting in a lower pollution stock.
In another paper [16], Chang et al. presented a stochastic differential game to study this kind of problem. More generally, the process of emission permit price is assumed to be stochastic and to follow a geometric Brownian motion (GBM). All the results demonstrate that the stochastic emission permit prices can motivate the players to make more flexible strategic decisions in the games.
The results of theoretical research and empirical analysis have shown that the technology acquired through learning by doing may not last indefinitely but may depreciate, and that this learning-by-doing "forgetting" or "depreciation" is of great significance to production planning and scheduling [17][18][19][20][21]. When using economics to analyze environmental problems, scholars usually assume that technology is constant or that technological progress is exogenous. In real production, however, technological advancement is an endogenous and dynamic process, which is influenced by factors such as regional government policies and knowledge acquisition. Jaffe et al. [22] argued that the relationship between technological change and the environment has an important impact on environmental problems. The positive interaction between technology accumulation and good environmental pollution abatement policies should be better understood. Kline and Rosenberg [23] studied a variety of technology industries and found that, in some cases, learning by doing in the production process contributed more to technological progress than the original development itself. Bramoullé and Olson [24] argued that the government's Pigouvian tax, an environmental regulation policy, could put pollution abatement on the right technological track. By using the effect of learning by doing over time on the distribution of pollution abatement among heterogeneous technologies, the best conditions for sharing all technologies could be determined. At the same time, it is found that the more mature pollution abatement technology is more inclined to adopt infant technology.
Learning accumulates pollution abatement experience and thus reduces pollution control investment costs. Because experience accumulates, the cost of governance, measured in terms of cumulative governance, should fall. In fact, "learning by doing" from the experience of pollution control can promote the transformation of governance processes and technologies, and these changes can reduce costs. However, participants' net present value income includes not only production income and emission trading income but also pollution abatement costs. Some scholars use the value function of production to measure the value of learning by doing, an approach that has been widely used in the operations research literature, such as [25][26][27]. Learning by doing is an important source of technological progress. Similarly, the technology of pollution abatement investment needs to be developed continuously in practice. In recent years, many authors have discussed the technology of pollution abatement investment [25] and the technology of investment accumulation [28]. Some studies have addressed environmental policy and abatement cost in the presence of learning by doing [24,25,29].
However, the existing literature rarely uses the dynamic effects of accumulation and depreciation of learning by doing to analyze the trajectory of investment in transboundary pollution abatement. Our work is in the spirit of the studies by Li and Pan [25] and Zhong and Zhang [30], the first to investigate the effect of experience measured by the cumulative abatement investment from time 0 to $t$. In recent works [31][32][33][34][35], the investment cost falls with accumulated experience. In this paper, it is assumed that the accumulation of pollution abatement investment increases at a constant rate. As knowledge is forgotten, depreciated, or replaced by new knowledge, it depreciates at a certain rate. Similarly, in the actual production and application of pollution abatement technologies, what has been learned by doing is forgotten or replaced. In order to explore the strength of this forgetting or substitution effect for polluting enterprises in the process of pollution abatement investment, and its impact on the instantaneous emission rate and the pollution stock, this article builds models, solves for the optimal values of the related variables, and analyzes them. Our main purpose is to study the pollution abatement investment decision under the accumulation and depreciation of knowledge. Therefore, we use a mature dynamic differential game model to simplify the problem and analyze the changes of the variables more intuitively by numerical simulation. The rest of this paper is organized as follows. In Section 2, we establish our basic dynamic general equilibrium model. The optimal Nash equilibrium solutions for the instantaneous emission rate, the investment intensity of pollution abatement, and the pollution stock in the upstream and downstream regions under the non-cooperative game and the cooperative game are presented in Sections 3 and 4, respectively.
In Section 5, the optimal trajectories of each variable under the finite-planning horizon and in the long-term steady-state equilibrium are simulated and analyzed numerically. Some discussion and further analysis are provided in Section 6. Finally, Section 7 concludes the paper.

The Basic Model
Our transboundary pollution model has two adjacent regions ($n = 1, 2$) in a river basin, which we call upstream region 1 and downstream region 2. It is assumed that both regions discharge organic pollutants into the river basin. For region $n$ ($n = 1, 2$), production always generates a quantity of by-products, namely, emissions $E_n(t)$. Let $Q_n(t)$ denote the industrial output of region $n$, so that at time $t$ the industrial outputs of region 1 and region 2 are $Q_1(t)$ and $Q_2(t)$, and the instantaneous pollutant emissions from this output are $E_1(t)$ and $E_2(t)$, respectively. It is assumed that pollution emissions in each region are positively related to industrial production, and the instantaneous linear production function can be expressed as
According to the literature [3,4,36], it is assumed that the regional industrial income function is $R_n(Q_n(t))$, which can be expressed by the following quadratic functional form in terms of emissions:
where $A_n$ ($n = 1, 2$) is a positive constant. Thus, the profit function $R_n(E_n(t))$ is a quadratic concave function of $E_n(t)$.
The environmental pollution damage cost is a linear function of the pollution stock $P_n(t)$; following [4], the cost function is $D_n(t) = D_n P_n(t)$ ($D_n \geq 0$), where $D_n$ indicates the degree of environmental damage per unit of pollution stock in region $n$. The investment intensity function of pollution abatement is $I_n(t)$ ($n = 1, 2$). Pollution abatement can be realized only when technology and labor are invested, so each region faces an abatement cost that decreases its net revenue. Following [3], we assume that the upstream abatement investment cost can be described by the following quadratic form:
Equation (3) means that the marginal cost is increasing with respect to the level of pollution abatement. The investment cost of water pollution abatement in downstream region 2 and the cost of its investment in water pollution control in upstream region 1 can be expressed as follows:
where $I_{22}(t)$ indicates the investment intensity of downstream region 2 in its own area and $I_{21}(t)$ indicates the investment intensity of the downstream region in upstream region 1. This functional form captures the idea that the cost of region 2's investment in region 1 depends on what the downstream region is investing locally, because investors acquire "learning by doing" for the upstream region only after they have accumulated investment experience in downstream region 2.
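The statement that a quadratic abatement cost implies an increasing marginal cost can be checked with a minimal sketch. The specific form $C(I) = (\mu/2) I^2$ and the value of the coefficient are assumptions made only for illustration; the paper's own equation (3) is not reproduced here, and the function names are hypothetical.

```python
def abatement_cost(intensity, mu):
    """Assumed quadratic abatement cost: C(I) = (mu / 2) * I**2."""
    return 0.5 * mu * intensity ** 2


def marginal_cost(intensity, mu):
    """Derivative of the assumed cost with respect to I: C'(I) = mu * I."""
    return mu * intensity


mu_1 = 2.0  # hypothetical cost coefficient, not a value from the paper
levels = [0.5, 1.0, 1.5, 2.0]
costs = [abatement_cost(i, mu_1) for i in levels]
marginals = [marginal_cost(i, mu_1) for i in levels]

# Each extra unit of abatement investment costs more than the previous one.
for i, c, m in zip(levels, costs, marginals):
    print(f"I = {i:.1f}  cost = {c:.2f}  marginal cost = {m:.2f}")
```

Under this assumed form, doubling the investment intensity quadruples the cost, which is why very high abatement levels are not chosen in equilibrium even when damage is costly.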
Following [24,35], the experience of applying pollution abatement technology, $G_{nn}(t)$, is measured by the cumulative abatement from time 0 to $t$, that is, $G_{nn}(t) = G_{nn0} + g_{nn}\int_0^t I_{nn}(s)\,ds$ (6), where $G_{nn0}$ ($n = 1, 2$) denotes the initial experience level of applying pollution abatement technology and $g_{nn}$ ($n = 1, 2$) is a positive parameter that represents the difference between the two regions' ability to accumulate experience. According to learning-by-doing theory, cumulative experience leads to a decline in the unit cost.
With the rapid development of society and technology, the accumulation of pollution abatement investment increases at a constant rate. As knowledge is forgotten, depreciated, or replaced by new knowledge, it depreciates at a certain rate. According to Jorgenson [37] and Griliches [38], a proportional or geometric depreciation rule seems to be a good choice to represent the depreciation of the aggregate stock of knowledge.
To simplify, we apply the proportional and linear approach, using $z_{nn}$ ($n = 1, 2$) to represent the depreciation rate of the aggregate stock of knowledge as it evolves over time. Correspondingly, the learning-by-doing function changes once knowledge depreciation is taken into consideration:
Here, we define the parameters $g_{11}, g_{21}, g_{22} > 0$ as the rates of knowledge accumulation under investment in pollution abatement technology and the parameters $z_{11}, z_{21}, z_{22} > 0$ as the depreciation rates of learning by doing. The investment intensity of pollution abatement in the upstream region is $I_{11}(t)$.
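To make the interplay of accumulation and proportional depreciation concrete, the following minimal sketch assumes that a single knowledge stock evolves as $\dot G(t) = g\,I(t) - z\,G(t)$ with constant investment intensity $I$. This functional form, the constant investment, and all parameter values are illustrative assumptions, not the paper's calibrated specification.

```python
import math


def knowledge_stock(t, g, z, investment, g0):
    """Closed-form solution of the assumed law dG/dt = g*I - z*G
    for constant investment I and initial experience G(0) = g0."""
    steady = g * investment / z
    return steady + (g0 - steady) * math.exp(-z * t)


# Hypothetical parameter values chosen only for illustration.
g, z = 0.5, 0.1            # accumulation rate and depreciation rate
investment, g0 = 2.0, 1.0  # constant investment intensity, initial experience
steady_state = g * investment / z

# Experience rises quickly at first, then flattens as forgetting offsets learning.
for t in (0, 5, 10, 50, 200):
    print(t, round(knowledge_stock(t, g, z, investment, g0), 4))
```

With $z = 0$ the stock would grow without bound under constant investment; with $z > 0$ it saturates at $gI/z$, so a higher depreciation rate lowers the experience level that any given investment path can sustain.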
In this model, the stocks of pollution in the upstream and downstream regions of the basin at time $t$ are $P_1(t)$ and $P_2(t)$. The pollution stocks in the two regions can be expressed by the following two differential equations:
where $\theta_1 P_1(t)$ and $\theta_2 P_2(t)$ represent the amounts of pollution removed by self-purification. Natural water has a certain self-purification ability, which varies with time and temperature. Without loss of generality, it is assumed that the basin purifies water pollution at a constant rate, given by the self-purification coefficient $\theta_n$ ($n = 1, 2$), with $\theta_1, \theta_2 \geq 0$. $dP_1(t)$ is the amount of pollution transferred from upstream to downstream, where $d$ is the transfer coefficient and $0 \leq d \leq 1$. The number of initial emission permits is $E_{10}$ in region 1 and $E_{20}$ in region 2. It is assumed that the emission trading market is fully competitive and that the price of emission rights is constant at $w$. If a region's emissions exceed its initial allocation, additional emission rights can be purchased in the permit market. Conversely, if there is a surplus of emission rights, it can be reserved for the next year or sold in the emission permit trading market.
Given the above assumptions, we can obtain the concrete expression of the income function $W_n(t)$ of the two regions:
We define $\varepsilon_{11}, \varepsilon_{21}, \varepsilon_{22} > 0$ as the conversion rates of learning by doing. The learning-by-doing terms in the income function represent the cost savings brought by learning by doing and can also be regarded as the income from learning-by-doing conversion.
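The upstream-to-downstream coupling of the two pollution stocks can be illustrated with a small simulation. The sketch assumes the simplified dynamics $\dot P_1 = E_1 - (\theta_1 + d) P_1$ and $\dot P_2 = E_2 + d P_1 - \theta_2 P_2$ with constant emissions and no abatement term; this form, the function name, and all numbers are assumptions for illustration rather than the paper's equation (8).

```python
def simulate_stocks(e1, e2, theta1, theta2, d, p1_0, p2_0, horizon, dt=0.01):
    """Forward-Euler integration of the assumed two-region stock dynamics.

    Upstream loses pollution to self-purification (theta1) and to transfer (d);
    downstream receives the transferred amount d * P1.
    """
    p1, p2 = p1_0, p2_0
    steps = int(round(horizon / dt))
    for _ in range(steps):
        dp1 = e1 - (theta1 + d) * p1
        dp2 = e2 + d * p1 - theta2 * p2
        p1 += dt * dp1
        p2 += dt * dp2
    return p1, p2


# Hypothetical parameter values, not taken from the paper's Table 1.
e1, e2 = 4.0, 3.0
theta1, theta2, d = 0.3, 0.2, 0.2

p1_end, p2_end = simulate_stocks(e1, e2, theta1, theta2, d,
                                 p1_0=20.0, p2_0=5.0, horizon=200.0)

# Analytical steady state of the assumed system, for comparison.
p1_star = e1 / (theta1 + d)
p2_star = (e2 + d * p1_star) / theta2
print(round(p1_end, 4), round(p2_end, 4), p1_star, p2_star)
```

In this sketch the downstream steady-state stock depends on upstream emissions through $d\,P_1$, which is the channel that makes the two regions' decisions interdependent.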

Non-Cooperative Game
The amount of pollution abatement investment provided by the downstream region affects both the upstream region's willingness to invest in pollution control and the amount of pollutant it discharges. In turn, the amount of pollutant discharged upstream affects the amount of pollutant discharged downstream, which creates a dynamic game relationship between the two sides. Because decisions are made in continuous time, this is a dynamic differential game.
Under this model, both regions aim to maximize the net present value of their own long-term income. Pollution discharge in the upstream region affects the income of the downstream region through the pollution stock in the shared water basin. The independent discharge decisions of the two regions constitute a differential game with $E_1(t)$, $E_2(t)$, $I_{11}(t)$, $I_{21}(t)$, and $I_{22}(t)$ as control variables and $P_1(t)$, $P_2(t)$, $G_{11}(t)$, $G_{21}(t)$, and $G_{22}(t)$ as state variables, in which each region maximizes the net present value of its own income. The current goal of region 1 is to maximize the expected present flow of instantaneous net revenue in terms of its emission path and abatement level. We describe this problem as
s.t.
Likewise, the current goal of downstream region 2 is to maximize the expected present flow of instantaneous net revenue in terms of its emission path and abatement level. The optimization problem of the downstream region can be given as
The current-value Hamiltonian function of equation (10) is
The current-value Hamiltonian function of equation (12) is
where $\lambda^N_n$ ($n = 1, 2, \ldots, 6$) are the dynamic adjoint variables associated with the state equation about $P_n(t)$. Here, the dual variables $\lambda^N_n$ ($n = 1, 2, \ldots, 6$), also called shadow prices or costate variables, are Lagrange multipliers; they are the derivatives of the two players' value functions, i.e., revenues, with respect to the pollution stock $P_n(t)$.
Solving equations (16)
According to the actual situation, here we focus on the
In the long run, as $T \to \infty$, $\lim_{T \to \infty} I^N_n(t)$ ($n = 11, 21, 22$) tends to a steady state. We apply the superscript "∧" to identify the non-cooperative equilibrium results. Equations (21)-(26) are standard first-order differential equations. Substituting into equations (29)-(31) and solving under the steady-state equilibrium conditions, we get
Further, substituting equations (32)-(34) into equation (8), collating, and solving, we obtain the optimal trajectories of the pollution stock in the upstream and downstream regions under the non-cooperative game as follows:
where
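For readers less familiar with the method, the generic structure of a current-value Hamiltonian and its necessary conditions can be stated as follows. This is the textbook form for a single state $x$, a single control $u$, instantaneous payoff $F$, state equation $\dot x = f(x,u)$, discount rate $r$, and a free terminal state with no salvage value; the symbols $F$, $f$, $x$, $u$, and $r$ are generic placeholders introduced here and are not the paper's specific functions.

```latex
\begin{align}
  H(x, u, \lambda) &= F(x, u) + \lambda\, f(x, u), \\
  \frac{\partial H}{\partial u} &= 0
      && \text{(optimality of the control)}, \\
  \dot{\lambda} &= r\lambda - \frac{\partial H}{\partial x}
      && \text{(costate equation)}, \\
  \dot{x} &= f(x, u), \quad x(0) = x_0
      && \text{(state equation)}, \\
  \lambda(T) &= 0
      && \text{(transversality, finite horizon } T\text{)}.
\end{align}
```

In a pollution game the state is a stock that causes damage, so the costate on the pollution stock is typically negative: it measures the discounted future loss caused by one additional unit of pollution.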

Cooperative Game
Now consider another model. Assume that the upstream and downstream regions of the basin reach an agreement, either setting up a joint decision-making department or delegating unified decision making to a higher management department, and jointly coordinate the pollution discharge strategies of the two regions with the goal of maximizing the total net present value of long-term income. Then, the joint pollution discharge decision of the upstream and downstream regions constitutes a differential game problem that takes the instantaneous emission rate as the control variable and the pollution stock in the water area, the investment in pollution abatement, and learning by doing as state variables, and that maximizes the total net present value of the benefit of the whole basin. We describe this problem as
s.t.
The current-value Hamiltonian function of equation (37) is
where $\lambda^C_n$ ($n = 1, 2, \ldots, 5$) are the dynamic adjoint variables associated with the state equation about $P_n(t)$. Here, the dual variables $\lambda^C_n$ ($n = 1, 2, \ldots, 5$), also called shadow prices or costate variables, are Lagrange multipliers; they are the derivatives of the two players' value functions, i.e., revenues, with respect to the pollution stock $P_n(t)$.
Solving equations (40)-(44), we have
Similarly, in the long run, as $T \to \infty$, $\lim_{T \to \infty} I^C_n(t)$ ($n = 11, 21, 22$) tends to a steady state. We apply the superscript "∧" to identify the cooperative equilibrium results. Equations (45)-(49) are standard first-order differential equations; solving them and then substituting the results into equations (52) and (53), we get
Further, substituting equations (54) and (55) into equation (8), collating, and solving, we obtain the optimal trajectories of the pollution stock in the upstream and downstream regions under the cooperative game as follows:
where

Numerical Simulation
Through the analysis above, we have obtained the optimal emissions, pollution abatement investment, and pollution stock under the non-cooperative game and the cooperative game. In this section, we analyze the trajectory of each variable, identify the differences between the non-cooperative and cooperative models, and simulate the optimal emission decisions and the investment level of pollution abatement. The parameters used in the numerical examples are presented in Table 1, and we use version 7.0 of Wolfram Mathematical Matlab to obtain the numerical solutions. The parameters are as follows [39].

Comparative Analysis of Optimal Emission Level.
Drawing on [15], in this section, we simulate the dynamic trajectory of optimal emissions in the basin over a finite-planning horizon, i.e., $T = 10$, as shown in Figures 1-3. Figure 1 simulates the dynamic trajectory of upstream and downstream pollution emissions in the non-cooperative decision-making model. Figure 2 shows the dynamic trajectory of pollution emissions when the upstream and downstream regions adopt the cooperative decision-making model. The figures show that, excluding other interference factors, over a finite-planning horizon the pollution emission level upstream is higher than that downstream in both the non-cooperative and the cooperative decision-making model. With a planning period of 10, game participants adjust their decisions over time. In the initial stage, due to the pressure of environmental regulation, the pollution level of the upstream and downstream regions of the basin is relatively low. As the finitely repeated game proceeds, the participants gradually adopt a non-cooperative Nash equilibrium strategy and increase the level of pollution emissions. Figure 3 compares the dynamic trajectories of pollution emissions under the cooperative and non-cooperative decision-making models. The trajectories trend upward and converge at the end of the planning period, $T = 10$. This shows that under the finite-planning horizon, whether the cooperative or the non-cooperative decision is adopted at the beginning, the final game result is the non-cooperative Nash equilibrium.
This also verifies the previous conclusions.
When $T \to \infty$, we call it the long-term steady state. In a differential game, the information that participants have at the beginning is similar to that under the finite-planning horizon: they only know the initial state of the system. At different times $t$, participants in the game take different decisions. Therefore, the optimal strategy is a time-dependent dynamic strategy. Next, we simulate the dynamic trajectory of optimal emissions in the basin in the long-term steady state, as shown in Figures 4-6. Figure 4 simulates the dynamic trajectory of pollutant emissions in the upstream and downstream regions of the basin in the non-cooperative game model. Figure 5 depicts the dynamic trajectory of pollution emissions in the cooperative game model. Unlike under the finite-planning horizon, the emission decisions of the game participants in the long-term steady state gradually stabilize over time, and the emission level tends to a stable value. Compared with the initial stage, total emissions trend downward. Figure 6 summarizes the pollution emission levels under the cooperative and non-cooperative decision-making models. Over time, the pollution emission level in the cooperative game shows a significant downward trend, and the cooperative emission level in the stable state is lower than that in the non-cooperative decision-making model.

Comparative Analysis of Optimal Pollution Abatement Investment.
As in the comparative analysis of the optimal pollution emission level, we simulate the dynamic trajectory of the optimal pollution abatement investment under the finite-planning horizon. Figures 7 and 8 show the dynamic trajectories of optimal pollution abatement investment in the non-cooperative and cooperative decision-making models, respectively. In Figure 7, the investment intensity of downstream pollution abatement in the non-cooperative model is higher than that in the upstream region. The investment levels of both regions first increase and then decrease, which shows that the non-cooperative investment game over a finite-planning horizon is unstable and that the later stage of the finitely repeated game tends to the non-cooperative Nash equilibrium. Even under the cooperative investment decision model, upstream and downstream pollution abatement investment tends to the non-cooperative Nash equilibrium, and the downward trend of the investment level is more obvious.
This can be seen from Figure 8. However, comparing the two investment decision-making methods overall, Figure 9 shows that the pollution abatement investment intensity in the non-cooperative decision-making model is significantly higher than that in the cooperative decision-making model, and that the change of investment intensity is unstable. From the perspective of "learning by doing," the accumulation effect of knowledge plays a positive role through most of the early stage of the non-cooperative investment decision-making model, while the depreciation effect becomes prominent at the end of the planning period. This can be seen from the trend of the curves in Figure 7. Similarly, Figure 8 shows that in the cooperative investment decision-making model the depreciation effect inhibits pollution abatement investment more strongly than the accumulation effect promotes it. The evolution and magnitude of the accumulation and depreciation effects under the two investment decision-making models are reflected in Figure 9. Furthermore, the investment level of non-cooperative pollution abatement under the finite-planning horizon is significantly higher than that of cooperative pollution abatement, that is, the accumulation effect of learning by doing is more pronounced than the depreciation effect.
Unlike under the finite-planning horizon, the pollution abatement investment trajectory in the long-term steady state becomes more stable over time, that is, it tends to a stable value. As shown in Figure 10, in the short term, pollution abatement investment under the non-cooperative decision-making model increases. After reaching a certain level, it falls sharply and finally stabilizes. The pollution abatement investment trajectory of the cooperative decision-making model is markedly different: investment grows smoothly until it reaches a stable value (Figure 11). Figure 12 shows the significant difference between the investment trajectories of the two decision models. It is worth noting that in the long-term steady state, the stable level of cooperative investment is higher than that of non-cooperative investment, that is, the accumulation effect of pollution abatement investment is stronger than the depreciation effect.

Comparative Analysis of Optimal Pollution Stock.
In order to study the change of the pollution stock in the upstream and downstream regions of the basin and to understand the movement of the stock level more clearly, we simulate the optimal pollution stock trajectories under non-cooperative and cooperative decision making, as shown in Figures 13 and 14, respectively. The change in the level of the pollution stock under the two decision-making models is relatively similar. Under the finite-planning horizon, in the non-cooperative decision-making model (Figure 13), the pollution stock in the upstream region decreases significantly but rebounds at the end of the planning period, while the pollution stock in the downstream region increases at the beginning of the planning period and decreases in the later stage. In the cooperative model (Figure 14), the upstream pollution stock also decreases significantly and tends to be stable in the later stage, while the downstream pollution stock shows an upward trend, that is, the amount of pollutants increases. This shows that regional pollution abatement decisions under the finite-planning horizon do more to improve upstream water quality, whereas the effect on the downstream region is the opposite.
The change of the water pollution stock in the basin also reflects the fact that the dynamic game under the finite-planning horizon follows a non-cooperative Nash equilibrium strategy. Figure 15 also shows that the trajectory of the pollution stock under the two decision-making models is U-shaped (convex), that is, it first declines and then rises. On the whole, the non-cooperative decision-making model reduces the water pollution stock in the basin, while the cooperative model does the opposite.
Because of the limited planning period and the dynamic game decision making, the fluctuation of pollutants in the basin under the finite-planning horizon is small, and it is difficult to reach the ultimate goal of regional water pollution control. As in the previous analysis, we also simulate the dynamic trajectory of the pollution stock in the long-term steady state, as shown in Figures 16-18. Figures 16 and 17 show the dynamic trajectories of the pollution stock under non-cooperative and cooperative game decisions, respectively (when $T = 10$). The trajectories in the two graphs are very similar. The upstream pollution stock declines sharply in both the non-cooperative and the cooperative decision-making model, while the downstream pollution stock first rises and then falls. Eventually both tend to be stable, but the stationary values are not the same.
In order to compare which game decision-making model has more significant effect on the reduction of pollution stock in the river basin, we simulate the stock dynamic trajectory under the non-cooperation and cooperation decision-making model (as shown in Figure 18). It was finally found that the cooperative decision-making model can     In this condition, the difference between the instantaneous emission rate is related to the time t and the length of planning period T. (zE N n (t)/zT) < 0, (zE C n (t)/zT) < 0 indicates that the longer the planning period is, the lower the equilibrium emission rate is. When T ⟶ ∞, the equilibrium emission rate approaches the equilibrium solution of the long-term steady-state differential game. Figures 19 and 20 depict the trajectory curve of optimal instantaneous emission rate under two decision models (non-cooperative decision model and cooperative decision model) during different planning periods (T � 5, T �10, and T �15).
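The dependence of the equilibrium emission rate on the planning period can be illustrated with a deliberately simplified one-region sketch. It assumes revenue $A E - E^2/2$, permit cost $w(E - E_0)$, linear damage $D P$, stock dynamics $\dot P = E - \theta P$, discount rate $r$, and terminal condition $\lambda(T) = 0$, which give $\lambda(t) = -\frac{D}{r+\theta}\left(1 - e^{-(r+\theta)(T-t)}\right)$ and $E(t) = A - w + \lambda(t)$. These functional forms and all numbers are assumptions for illustration; they omit abatement investment and learning by doing and are not the paper's full solution.

```python
import math


def shadow_price(t, horizon, damage, r, theta):
    """Costate of the assumed one-region problem with lambda(horizon) = 0."""
    rate = r + theta
    return -(damage / rate) * (1.0 - math.exp(-rate * (horizon - t)))


def emission_rate(t, horizon, a, w, damage, r, theta):
    """First-order condition of the assumed Hamiltonian: E = A - w + lambda."""
    return a - w + shadow_price(t, horizon, damage, r, theta)


# Hypothetical parameter values, not taken from the paper's Table 1.
a, w, damage, r, theta = 10.0, 1.0, 1.0, 0.05, 0.15
t = 2.0

# Emission rate at the same date t for three planning periods.
rates = {T: emission_rate(t, T, a, w, damage, r, theta) for T in (5, 10, 15)}

# Limit as the horizon grows without bound (long-term steady state).
steady = a - w - damage / (r + theta)

for T, e in rates.items():
    print(f"T = {T:2d}  E(t=2) = {e:.4f}")
print(f"T -> infinity  E = {steady:.4f}")
```

In this sketch a longer horizon makes the shadow price of pollution more negative at any given date, because more future damage is attributed to each unit emitted, so the emission rate falls with $T$ and approaches the steady-state value from above. The emission path also rises toward $A - w$ as $t$ approaches $T$, where future damage no longer counts.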
When $T \to \infty$, $E_n^C(t) < E_n^N(t)$, so the instantaneous emission difference $E_n^N(t) - E_n^C(t)$ in the basin area is positive. The results show that under cooperative decision making, the amount of pollution emission in both regions is smaller than under non-cooperative decision making. In the long-term steady state, the reduction of pollution emission is related to the unit damage cost, the discount rate, and the self-purification capacity of the water body.
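As a hedged illustration of why these three quantities appear, consider a stylized linear-state game that is an assumption of this note and not the model of the paper: region $n$ earns $a_n E_n - E_n^2/2$, bears damage $D_n P$, and the common stock follows $\dot{P} = E_1 + E_2 - \theta P$ with discount rate $r$. The symbols $a_n$ and the quadratic benefit are hypothetical. The steady-state emission rates and their difference are then:

```latex
\begin{align}
E_n^N &= a_n - \frac{D_n}{r + \theta}, \\
E_n^C &= a_n - \frac{D_1 + D_2}{r + \theta}, \\
E_n^N - E_n^C &= \frac{D_m}{r + \theta} > 0, \qquad m \neq n .
\end{align}
```

In this sketch the emission reduction achieved by cooperation grows with the other region's unit damage cost and shrinks as the discount rate or the self-purification coefficient increases.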

Further Analysis of the Optimal Pollution Abatement Investment Level
From the perspective of the river basin, in the upstream region, when $\mu_2^2 > \mu_3(\mu_1 + \mu_2)$, the investment intensity of pollution abatement satisfies $\partial I_{n1}^N(t)/\partial w > 0$, indicating that under non-cooperative game conditions within a finite-planning horizon, the higher the market price of emission rights, the larger the optimal pollution abatement investment. Conversely, when $\mu_2^2 < \mu_3(\mu_1 + \mu_2)$, we have $\partial I_{n1}^N(t)/\partial w < 0$, indicating that the higher the market price of emission rights, the smaller the optimal pollution abatement investment. Downstream, the investment in pollution abatement satisfies $\partial I_{22}^N(t)/\partial w > 0$, indicating that the higher the market price of emission rights, the greater the optimal pollution abatement investment. Under the cooperative game investment decision-making model, $\partial I_{n1}^C(t)/\partial w > 0$ and $\partial I_{22}^C(t)/\partial w > 0$, which shows that the higher the market price of emission rights, the greater the optimal investment intensity of pollution control.
Whether in the long-term steady state or under the finite-planning horizon, both decision models have $\partial I_{n1}(t)/\partial \mu_1 < 0$ and $\partial I_{22}(t)/\partial \mu_2 < 0$, indicating that the smaller the cost of pollution abatement investment, the greater the optimal investment intensity of pollution abatement. In addition, $\partial I_{21}^N(t)/\partial \mu_3 > 0$, and the difference of pollution abatement cost under the non-cooperative decision-making model is inversely proportional to the investment intensity, which shows that the higher the cost of downstream participants' investment in the upstream region, the greater the optimal investment intensity of pollution abatement.
Similarly, under both the finite-planning horizon and the long-term steady state, both game decision models have $\partial I_{nn}(t)/\partial g_{nn} > 0$ and $\partial I_{nn}(t)/\partial z_{nn} < 0$. This also verifies the accumulation effect and depreciation effect of pollution abatement investment. It is worth noting that under the finite-planning horizon, $\partial I_{21}^N(t)/\partial g_{22} < 0$, $\partial I_{21}^N(t)/\partial z_{22} > 0$, $\partial I_{22}^N(t)/\partial g_{21} < 0$, and $\partial I_{22}^N(t)/\partial z_{21} > 0$, indicating that downstream pollution abatement investment crowds out upstream investment. Further analysis found that $\partial I_n(t)/\partial r < 0$, $\partial I_n(t)/\partial D_n > 0$, and $\partial I_n(t)/\partial \theta < 0$; that is, the lower the discount rate, the higher the environmental damage cost, and the smaller the self-purification ability of the water body, the greater the optimal pollution abatement investment.
Under the infinite-horizon condition, the investment difference indicates that investment in pollution abatement in the two regions is greater under cooperative decision making than under non-cooperative decision making. In the long-term steady state, the increase in investment intensity is related to the unit damage cost of pollution, the discount rate, the difference of investment cost in the upstream region, and the coefficient of self-purification capacity of the water body.

Further Analysis of Optimal Pollution Stock
The change of the pollution stock level depends on the initial emission and the investment intensity of pollution abatement during the game period. Under both the finite-planning horizon and the long-term steady state, both decision models can reduce the level of pollution stock. In the long-term steady state, for both the cooperative and the non-cooperative game decision, $\partial P_n^N(t)/\partial g_{nn} < 0$, $\partial P_n^C(t)/\partial g_{nn} < 0$, $\partial P_n^N(t)/\partial z_{nn} > 0$, and $\partial P_n^C(t)/\partial z_{nn} > 0$. This shows that the accumulation effect of pollution abatement investment reduces the pollution stock, while the depreciation effect increases it. Further analysis shows that $\partial P_2^N(t)/\partial g_{21} > 0$ and $\partial P_2^N(t)/\partial z_{21} < 0$, which also shows that the accumulation effect of upstream pollution abatement investment in the downstream region of the basin will reduce the stock of downstream pollution, that is, the downstream environment will benefit. Conversely, the depreciation effect is not conducive to the improvement of the downstream water environment. Figures 21 and 22 depict the trajectory curves of the optimal pollution stock under the two decision models (non-cooperative and cooperative) for different planning periods (T = 5, T = 10, and T = 15). It can be seen that the longer the planning period, that is, the larger $T$ is, the lower the level of water pollution in the river basin. When $T \to \infty$, the equilibrium pollution stock approaches that of the long-term steady-state differential game.
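The effect of the planning period on the stock trajectory can be reproduced qualitatively with a short simulation. This is a sketch under assumed dynamics, not the model of the paper: a single stock follows `dP/dt = E(t) - theta*P`, and `E(t)` is the open-loop emission rate of a stylized linear-state game with payoff `a*E - E**2/2 - D*P`. The names `emission_rate`, `pollution_stock_path`, and `stock_at`, the initial stock `P0`, and all parameter values are hypothetical.

```python
import math


def emission_rate(t, T, a, D, r, theta):
    """Emission rate of the assumed stylized game (not the paper's solution)."""
    k = r + theta
    return a - D / k * (1.0 - math.exp(-k * (T - t)))


def pollution_stock_path(T, P0=40.0, a=10.0, D=2.0, r=0.05, theta=0.3, dt=0.01):
    """Forward-Euler integration of dP/dt = E(t) - theta*P on [0, T]."""
    steps = int(round(T / dt))
    P = P0
    path = [P]
    for i in range(steps):
        t = i * dt
        P += dt * (emission_rate(t, T, a, D, r, theta) - theta * P)
        path.append(P)
    return path


def stock_at(t, T, dt=0.01, **params):
    """Pollution stock at time t (0 <= t <= T) for planning period T."""
    return pollution_stock_path(T, dt=dt, **params)[int(round(t / dt))]


if __name__ == "__main__":
    for T in (5, 10, 15):
        print(T, round(stock_at(5, T), 3))
```

Because a longer horizon lowers the assumed emission rate at every date, the simulated stock at any common date is lower for larger `T`, which matches the qualitative pattern described for Figures 21 and 22. Raising `D`, as a cooperative planner who internalizes both regions' damages would, lowers the stock further.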

Conclusion
This paper analyzes a dynamic differential game model of watershed transboundary water pollution abatement and discusses the optimal decision-making problem under non-cooperative and cooperative differential games, in which the accumulation effect and depreciation effect of learning-by-doing pollution abatement investment are taken into account. By solving the dynamic equations of the models, the Nash equilibrium solutions for the instantaneous emission rate, the investment intensity of pollution abatement, and the pollution stock in the upstream and downstream regions are obtained. Based on these results, the optimal trajectory curves of each variable under the long-term steady state and the finite-planning horizon were simulated and analyzed numerically. The results show that the trajectories of the instantaneous emission rate, pollution abatement investment, and pollution stock under the finite-planning horizon are quite different from those under the long-term steady state:
(i) Under the finite-planning horizon, the game participants in the upstream and downstream regions choose the non-cooperative strategy at the end of the planning period, which leads to a significant increase in the instantaneous emission rate of each region. In terms of the amount of pollution emission, the longer the planning period is, the lower the optimal emission rate is in equilibrium. Whether the game is non-cooperative or cooperative, in the long-term steady state, after multiple games, the instantaneous emission rate of each region tends to a stable value, that is, the steady-state level. The long-term steady-state game under cooperative decision making can effectively reduce the amount of pollution emission.
(ii) The dynamic changes of water pollution abatement investment under the finite-planning horizon and the long-term steady state are quite different. Specifically, under the finite-planning horizon, the investment intensity of pollution abatement in the non-cooperative game is higher than that in the cooperative game. At the end of the independent investment decision, the game decision tends to a non-cooperative Nash equilibrium state. At this time, the crowding-out effect of learning-by-doing pollution abatement investment is also relatively large. In the long-term steady state, the pollution abatement investment trajectory of the cooperative game is relatively stable and there is no obvious crowding-out effect. Therefore, investment continues to rise, and the optimal equilibrium level at steady state is higher than that under non-cooperative decision making.
(iii) Due to the strong flow characteristics of water pollution, the trajectories of pollution stock in the upstream and downstream regions have opposite trends, and this difference is particularly evident under the finite-planning horizon. The level of pollution in the upstream region decreases markedly, while the stock in the downstream region increases at first and then tends to be stable. The total pollution stock level is therefore relatively high. In the long-term steady state, the trajectories of upstream and downstream pollution in the non-cooperative and cooperative decision-making models are similar, but the cooperative decision-making model is superior to the non-cooperative model in terms of the period of stabilization and the steady state.
Transboundary water pollution control in river basins is a long-term and complex process. Environmental regulation policies in a short period of time or within a certain planning period can only play a role in local areas. Cooperative decision making within the region can obtain the optimal solution of the game.

Data Availability
The variable data used to support the findings of this study are included within the article (Table 1).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.