The Dynamic Evolution of Firms’ Pollution Control Strategy under Graded Reward-Penalty Mechanism

Because pollution is an externality, firms lack sufficient incentive to reduce their emissions. It is therefore necessary to design a reasonable environmental regulation mechanism that effectively urges firms to control pollution. To this end, we divide firms into different grades according to their pollution level and construct an evolutionary game model to analyze the interaction between government regulation and firms’ pollution control under a graded reward-penalty mechanism. We then discuss the stability of firms’ pollution control strategy and derive the conditions under which firms are induced to control pollution. Our findings indicate that firms tend to control pollution after long-term repeated games if the government’s incentive level and monitoring frequency meet certain conditions. Otherwise, firms tend to discharge pollution that exceeds the stipulated standards. Consequently, to control pollution effectively, a government should adjust its incentive level and monitoring frequency reasonably.


Introduction
With economic development, the problem of environmental pollution is becoming more and more serious. Because pollution is an externality, however, relying solely on the market mechanism cannot effectively stimulate firms to control pollution and reduce pollution emission. Therefore, environmental regulation is necessary to solve the pollution problem. Many scholars have studied the pollution emission problem under environmental regulation. For example, Gryglewicz et al. [1] investigate firms' pollution control investment decisions under environmental regulation. D.-H. Kim and D. H. Kim [2] analyze the relationship between environmental regulation intensity and the level of illegal pollution emission. They find that severe environmental regulation can reduce the frequency of illegal pollution emission. Foulon et al. [3] empirically analyze the impact of government spot checks on firms' pollution emission and indicate that spot checks can reduce the occurrence of over-standard pollution emission to a certain extent. Flynn [4] discusses the problem of environmental regulation capture.
Yi [5] takes trans-boundary water pollution as an example to summarize the causes of, and solutions to, environmental regulation failure by local governments. Zang et al. [6] find that the game between the government and firms under asymmetric information may reduce the utility of environmental regulation for the government, so the government should innovate its regulation policy to improve regulation efficiency. However, the existing literature pays less attention to the design of the environmental regulation mechanism. In practice, environmental regulation is implemented by imposing fines on firms that exceed the pollution emission standard. This manner of regulation is too simple to achieve satisfactory results. On the one hand, dividing firms into only two groups according to the pollution emission standard may lead firms merely to meet the standard rather than strive to do better. Moreover, the pollution emission information collected through environmental monitoring cannot be fully utilized. On the other hand, a regime of all punishment and no reward leads firms to treat environmental regulation as a burden, so evasion of supervision, such as secret filming and cover-up, happens now and then. Considering these two aspects, we divide firms into different grades according to their pollution level and construct an evolutionary game model combined with the blame game [13] to analyze the interaction between government's regulation and firms' pollution control under the reward-penalty mechanism. Then, we discuss the stability of firms' pollution control strategy and derive the condition for inspiring firms to control pollution.
Optimization methods are more widely used in the field of resource and environment management. For example, Zhang et al. [7] investigate the optimal control strategy for regional water pollution by using the inexact two-stage programming model. Miao et al. [8] present an interval-fuzzy De Novo programming model to analyze the optimal allocation scheme for water resources in a watershed. Cai et al. [9, 10], Suo et al. [11], and Hu et al. [12] study the optimal design problem of regional energy management systems. We do not adopt optimization methods in this paper for two reasons. On the one hand, firms and government are boundedly rational, and it is very difficult for them to make optimal decisions (at least immediately); on the other hand, the optimal pollution emission control strategy derived from optimization methods can be implemented by way of total amount control at the regional level, but it lacks maneuverability at the enterprise level.

The Model
There are two ways in which firms deal with pollutants generated in the production process. One way is to spend a certain amount on treating pollutants and then discharge the treated pollutants; the other is to discharge raw pollutants directly. The government, as the environmental protection authority, needs to monitor firms' pollution emission. Because of cost limitations, however, it often monitors by random checks.

Reward-Penalty Mechanism.
In order to encourage firms to control pollution, we assume that the government not only punishes firms based on their pollution level but also rewards firms that meet the pollution emission standard. Suppose that the government divides firms into several grades according to their pollution level as follows. If a firm's pollution level is less than the environmental standard, it is assigned grade $g_1$. Otherwise, each time the firm's pollution level increases by a fixed amount, its grade increases by one, denoted as $g_2, \ldots, g_k, \ldots, g_K$ with $K \in \mathbb{N}^+$. Denote the set of pollution emission grades as $G = \{g_1, g_2, \ldots, g_K\}$. Accordingly, the pollution emission strategy set is denoted as $S = \{s_1, s_2, \ldots, s_K\}$ (see Figure 1). The government imposes a penalty $(g_k/g_1 - 1)F$ on firms that take strategy $s_k$, where $F$ is the punishment amount and $(g_k/g_1 - 1)$ determines the extent of punishment.
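The grading rule and the graded penalty described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `standard`, `step`, and `max_grade` parameters, and the convention that grade k has the numeric value k (so that the lowest grade equals 1 and the penalty multiplier is k - 1) are all hypothetical.

```python
import math


def pollution_grade(level, standard, step, max_grade):
    """Return the 1-based grade index for a pollution level.

    Grade 1 means the level is below the environmental standard; each
    further `step` of pollution raises the grade by one, capped at max_grade.
    """
    if level < standard:
        return 1
    # Assumption: grade 2 starts exactly at the standard.
    k = 2 + math.floor((level - standard) / step)
    return min(k, max_grade)


def graded_penalty(grade, fine):
    """Graded penalty (g_k / g_1 - 1) * F, assuming g_k = k and g_1 = 1."""
    lowest_grade = 1
    return (grade / lowest_grade - 1) * fine
```

Under these assumptions a firm in grade 1 pays no graded penalty, and the penalty grows linearly with the grade.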
For $N$ firms in the same area, each firm makes its decision freely. Let the strategy of the $i$th firm be $a(i)$ with $a(i) \in S$; then the strategy profile of the $N$ firms can be denoted as $A = \{a(1), a(2), \ldots, a(N)\}$. Let $a_{\min} = \min\{a(1), a(2), \ldots, a(N)\}$ and $a_{\max} = \max\{a(1), a(2), \ldots, a(N)\}$. The reward-penalty mechanism can be described as follows: a reward $R$ is given to firms that meet the environmental standard, and a penalty $R$ is imposed on the most serious polluters. The reward and penalty of firms that take strategy $s_k$ can then be indicated by the function $R \cdot I_k$, where $I_k$ is the reward-penalty indicator function defined in (1).

Evolutionary Game Model on Government's Monitoring and Firms' Pollution Control.
Firms make decisions freely according to the principle of maximizing their benefits. Let the proportion of firms that take strategy $s_k$ among all $N$ firms be $x_k$ in period $t$; the proportion vector $x = (x_1, x_2, \ldots, x_K)$ then depicts firms' pollution emission situation. If a firm takes strategy $s_k$, it obtains an additional benefit $\Gamma(s_k)$. If the government takes the strategy of monitoring, firms suffer the graded penalty $(g_k/g_1 - 1) \cdot F$ and gain the reward-penalty compensation $R \cdot I_k$. The utility function $U_m(s_k)$ (or $U_n(s_k)$) for firms taking strategy $s_k$ with (without) government monitoring is expressed in (2).
For the government, let the monitoring cost be $C$ and the monitoring probability be $p$ in period $t$. With respect to pollution level $g_k$, let the pollution control cost be $H(g_k)$ with government's monitoring and the negative impact be $D(g_k)$ without government's monitoring. The government's utility $V_m$ (or $V_n$) with (without) monitoring is defined in (3).

The Stability Analysis of Firm's Pollution-Emission Strategy
3.1. Replicator Dynamic Equation.
Firms' pollution emission is a long-term repeated process. Because of limited information and judgment, the government and firms cannot find the optimal strategy at the outset. In the process of the repeated game, they continually adjust their strategies and gradually find better ones.
The transformation process of the strategies of the government and firms can be described by the replicator dynamic equation.
From (6), the government's monitoring probability ($p$), reward strategy ($R \cdot I_k$), and penalty strategy ($(g_k/g_1 - 1) \cdot F$) affect the expected benefit of a firm's pollution emission strategy $s_k$, thus controlling the evolution dynamics of the proportion $x_k$ of strategy $s_k$. The greater the proportion of low-pollution firms, the better the pollution control effect.
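As an illustration of how such dynamics behave, the sketch below integrates a replicator system of the standard form dx_k/dt = x_k (U_k - Ū), where Ū is the population-average benefit. The standard form, the constant payoff values, the Euler step, and all function names are assumptions made for illustration; in the model itself the payoffs depend on the monitoring probability, the reward, and the graded penalty.

```python
def replicator_step(x, payoffs, dt=0.01):
    """One explicit Euler step of dx_k/dt = x_k * (U_k - average payoff)."""
    avg = sum(xi * ui for xi, ui in zip(x, payoffs))
    new = [xi + dt * xi * (ui - avg) for xi, ui in zip(x, payoffs)]
    # Renormalise to guard against floating-point drift off the simplex.
    total = sum(new)
    return [v / total for v in new]


def simulate(x0, payoffs, steps=20000, dt=0.01):
    """Iterate the replicator step from the initial proportions x0."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoffs, dt)
    return x


if __name__ == "__main__":
    # Hypothetical payoffs: the lowest-pollution strategy earns the most.
    print(simulate([1 / 3, 1 / 3, 1 / 3], [0.5, 0.2, 0.1]))
```

With these hypothetical constant payoffs, the proportion of the first strategy approaches 1, which matches the intuition that the highest-benefit strategy gradually takes over the population.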

The Stability Analysis of Firm's Pollution Emission Strategy.
For the government, let $f(p) = dp/dt = p(1 - p)(V_m - V_n) = 0$, where $V_m$ and $V_n$ denote the government's utility with and without monitoring; we get the following results. If $V_m = V_n$, any monitoring probability $p \in [0, 1]$ is an equilibrium state; if $V_m \neq V_n$, $p = 0$ or $p = 1$ is an evolutionary equilibrium state of the government's strategy. If $V_m < V_n$, $p = 0$ is the ESS, which indicates that the government eventually tends to take the nonmonitoring strategy if the benefit of nonmonitoring exceeds that of monitoring. If $V_m > V_n$, $p = 1$ is the ESS, which indicates that the government tends to monitor after long-term repeated games if monitoring has the greater benefit.
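The three cases can be checked numerically with a short sketch. It assumes the government's dynamics take the two-strategy replicator form dp/dt = p(1 - p)(V_m - V_n), with V_m and V_n held constant; the function name, the parameter names, and the Euler integration are hypothetical choices for illustration.

```python
def simulate_monitoring(p0, v_monitor, v_no_monitor, steps=50000, dt=0.01):
    """Euler integration of dp/dt = p * (1 - p) * (V_m - V_n)."""
    p = p0
    for _ in range(steps):
        p += dt * p * (1.0 - p) * (v_monitor - v_no_monitor)
        # Keep the probability inside [0, 1] despite rounding.
        p = min(1.0, max(0.0, p))
    return p
```

Starting from an interior point such as `p0 = 0.5`, the probability moves toward 1 when monitoring pays more, toward 0 when it pays less, and stays where it started when the two utilities are equal.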

For population evolution dynamics described by differential equations, we use the Jacobian matrix method to study the local stability of the equilibrium points. Denote $U(s_k)$ as $U_k$ for convenience, representing the average benefit of adopting strategy $s_k$. The Jacobian matrix is formed from the system $f_k(x_k) = dx_k/dt = 0$.
From Table 1, we can see that the evolutionary stability of an equilibrium state is determined by the benefit of its corresponding strategy. If there is a strategy $s_k$ whose benefit is higher than that of every other strategy, then after long-term repeated games firms tend to take strategy $s_k$ through continuous imitation and learning, and this strategy becomes the sole ESS. Any other strategy $s_j$ ($j \neq k$) or mixed strategy is not stable. If the benefits of all strategies are equal, the evolution of firms' strategies is more complex and bifurcation may appear. To control pollution effectively, the government should adjust the level of reward and penalty reasonably so that the pollution emission strategy that meets the pollution emission standard has the higher benefit; namely, $U(s_1) > \max\{U(s_2), \ldots, U(s_K)\}$.
For the government, the expected benefit of adopting the strategy of monitoring (nonmonitoring) is given by $V_m = -1 - x_2 + x_2 \cdot F$ ($V_n = x_2$) through formula (3). If $V_m > V_n$, namely, $x_2 > 1/(F - 2)$, the expected benefit of monitoring is greater than that of nonmonitoring, and the government tends to monitor after long-term repeated games. Thus, $p = 1$ becomes the government's ESS, which is conducive to fulfilling its duty and strictly enforcing the law. If $V_m < V_n$, namely, $x_2 < 1/(F - 2)$, the expected benefit of nonmonitoring is greater than that of monitoring, and the government tends not to monitor after long-term repeated games. Thus, $p = 0$ becomes the government's ESS, which leads to supervision failure and environmental degradation. If $V_m = V_n$, namely, $x_2 = 1/(F - 2)$, any $p \in [0, 1]$ is an equilibrium state, but it is not an evolutionarily stable strategy. In conclusion, if the proportion of firms that exceed the pollution emission standard is higher than the critical value $1/(F - 2)$, the government tends to monitor; otherwise, it tends not to monitor.
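The critical value can be verified with a small worked check. The sketch assumes the example payoffs have the form V_m = -1 - x_2 + F x_2 and V_n = x_2, with x_2 the proportion of over-standard firms and F > 2 the punishment amount; this form, the symbol names, and the function names are assumptions made for illustration.

```python
def monitoring_threshold(fine):
    """Critical proportion 1 / (F - 2) of over-standard firms (needs F > 2)."""
    if fine <= 2:
        raise ValueError("the threshold is only defined for fine > 2")
    return 1.0 / (fine - 2.0)


def government_prefers_monitoring(x2, fine):
    """Compare the assumed payoffs V_m = -1 - x2 + F * x2 and V_n = x2."""
    v_monitor = -1.0 - x2 + x2 * fine
    v_no_monitor = x2
    return v_monitor > v_no_monitor
```

For instance, with a hypothetical fine of 4 the threshold is 0.5: a proportion of 0.6 favors monitoring, whereas 0.4 does not.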
(2) If $U(s_2) > U(s_1)$ and $U(s_2) > 0$, namely, $p \cdot F + p \cdot R < 1$, the graded penalty and the reward-penalty compensation imposed by the government are together lower than the additional benefit obtained by firms that do not control pollution, and firms tend to exceed the pollution emission standard after long-term repeated games. Thus, $s_2$ becomes the ESS.
(3) If the aforesaid conditions are not satisfied, no strategy is an evolutionarily stable strategy. In this situation, government monitoring is not decisive, and firms' strategy is random. Therefore, the environmental pollution generated by firms is unpredictable. In order to control environmental pollution effectively, the government should adjust the reward/penalty strategy and increase the monitoring frequency (meeting $2p \cdot R + p \cdot F > 1$) to induce firms to control pollution.

Conclusions
In this paper, we divide firms into different grades according to their pollution level and construct an evolutionary game model to analyze the interaction between government regulation and firms' pollution control under the reward-penalty mechanism. Then, we discuss the stability of firms' pollution control strategy and derive the conditions that induce firms to control pollution. Our findings indicate that firms tend to control pollution after long-term repeated games if the government's incentive level and monitoring frequency meet certain conditions. In that case, the government can also effectively fulfill its duties and prevent environmental degradation. Otherwise, the benefit obtained by firms that exceed the pollution emission standard will be higher than the reward for pollution control, and over-standard pollution emission and environmental degradation will ultimately appear. Therefore, in order to control environmental pollution effectively, the government should adjust its incentive level and monitoring frequency reasonably.