Decision-Making of a Dual-Channel Closed-Loop Supply Chain in the Context Government Policy: A Dynamic Game Theory

To analyze the effect of government reward-penalty policies (RPPs) on the decisions of a dual-channel closed-loop supply chain (CLSC), this paper endogenizes government decision variables to maximize social welfare and builds four decision-making models (without RPP, with carbon emission RPP, with recycling amount RPP, and with double RPP) by using a Stackelberg dynamic game between the government and supply chain members. The research results show the following. (1) In the four models, there exist optimal prices and reward-penalty coefficients that maximize the supply chain members' profits and social welfare. (2) Compared with model W (the model without RPP), under most conditions, the three government RPPs decrease the demand for new products and increase the demand for remanufactured products. Compared with the case without RPP, the retailer's (R's) profit decreases, and when the carbon emission cap is very large and the lowest recycling amount is very small, the manufacturer's (M's) profit increases. (3) In most cases, the three government RPPs can effectively control the total carbon emission and increase social welfare, but they damage the benefits of retailers and consumers. As the carbon emission intensity of remanufactured products increases, the government can implement the double RPP, the carbon emission RPP, and the recycling amount RPP in turn.


Introduction
As the concepts of sustainable manufacturing and green supply chains have gained global acceptance, remanufacturing of used products has been recognized as a significant innovation in manufacturing and environmental protection [1][2][3][4][5]. Remanufacturing refers to a manufacturing process of specialized repair or upgrading that restores used products to a "like-new" condition [6]. More and more enterprises recycle and remanufacture used products through a dual channel consisting of an online direct channel and a traditional retail channel, which results in dual-channel closed-loop supply chains (CLSCs) [7]. CLSC management means that the supply chain members share their resources and recycle used products to reduce environmental pollution and changes in biodiversity over time [8][9][10][11][12][13][14]. Meanwhile, governments have put forward environmental policies to promote remanufacturing, including remanufacturing subsidy policies [15,16]. Government subsidy policy, which is considered efficient, holds a very important place in the development of remanufacturing. For example, the Chinese government has started to subsidize remanufacturers for each remanufactured truck engine sold [17]. Government policy is not only an attractive remanufacturing mechanism from an environmental perspective but also provides a substantial source of revenue [18]. Therefore, CLSC members are increasingly accepting remanufacturing and are willing to take responsibility for product recycling and remanufacturing. In a dual-channel CLSC, the decisions of manufacturers and retailers are influenced by the government subsidy policy, and the government plays a leading role. Therefore, it has become a critical issue to investigate the effect of the government subsidy policy on the decisions of a dual-channel CLSC [19].
On the other hand, for the sustainable development of the environment, governments levy taxes on excess carbon emission to control it; together with the government subsidy policy, this constitutes the government reward-penalty policy (RPP). Carbon emission tax policies include carbon tax regulation [20], cap-and-trade regulation [21], and carbon cap regulation [22]. In 2008, the first global trading platform for carbon emission began to run. After that, more and more carbon trading platforms appeared, including the UK Emissions Trading Group and National Trust of Australia [23][24][25]. Climate change is a defining issue of our time [26,27], as reflected in the Climate Action Summit held on September 23, 2019, and low-carbon emission reduction is an important goal of our country's enterprise management. In this process, governments, as social subjects, should take strong regulatory measures toward manufacturing companies and control carbon emission. Thus, governments reward or penalize enterprises according to whether they meet performance targets. China has also promulgated "the law of People's Republic of China on air pollution prevention and control," "the law of People's Republic of China on energy conservation," and "the measures for the administration for certification of energy-saving products." In 2012, Beijing, Shanghai, Chongqing, Guangdong, Hubei, and Shenzhen were approved to carry out pilot carbon emission trading.
In a dual-channel CLSC, game theory is often used to study the decisions of CLSC members. In particular, dynamic game theory, a classical theory, describes situations in which the participants act in order, i.e., the later mover observes the earlier mover's choice and then makes its own choice accordingly [28]. Once the influence of government policies is considered, the members of the dual-channel CLSC must change their traditional decision-making mode. Since the government plays an important role in the dual-channel CLSC, it should be regarded as a participant in the dynamic game model of the dual-channel CLSC. In a dynamic game model with government participation, the government's goal is to maximize social welfare [29]. We attempt to address this decision problem for a dual-channel CLSC by endogenizing government decisions in a dynamic game model.
More specifically, we analyze the impact of a single government RPP (carbon emission RPP or recycling amount RPP) and of a double government RPP (carbon emission and recycling amount RPP) on the decision-making of a dual-channel CLSC consisting of a manufacturer and a retailer. Four models are examined: (1) the decentralized model without RPP (model W), (2) the decentralized model with carbon emission RPP (model C), (3) the decentralized model with recycling amount RPP (model R), and (4) the decentralized model with double RPP (model D). In the dual-channel CLSC, the government first sets the intervention policy parameters to maximize social welfare. Then, the manufacturer wholesales new products to the retailer and collects the used products from the marketplace. The manufacturer is responsible for remanufacturing and sells remanufactured products to consumers through the online channel. The retailer subsequently retails the new products in the end market. In this decentralized dynamic model, the government is modeled as the leader, followed by the manufacturer and lastly by the retailer. This research attempts to address the following three questions: (1) How can the optimal pricing decision, demand, profit, carbon emission, consumer surplus, and social welfare for the dual-channel CLSC be derived in models W, C, R, and D based on a Stackelberg dynamic game? (2) What is the impact of the government RPP on the performance of the dual-channel CLSC? (3) What are the applicable conditions for each government RPP, and how should the government RPP parameters be set to maximize social welfare?
The remainder of this paper is organized as follows. Section 2 reviews related literature. In Section 3, we describe the problem and model assumptions. Section 4 gives the optimal decisions of a dual-channel CLSC in models W, C, R, and D based on the Stackelberg dynamic game. Model W is then taken as a benchmark, and the optimal decisions in models C, R, and D are each compared with those in model W to illustrate the impacts of government RPPs on the performance of the dual-channel CLSC. A numerical illustration is provided in Section 5. Section 6 concludes the paper.

Literature Review
Game-theoretic approaches have been applied extensively to dual-channel CLSCs from various perspectives, such as CLSC performance, pricing decisions, dual-channel competition, and the effect of government intervention policy. Based on the research questions in this paper, we focus our review on the applications of game theory in supply chains, the decision-making of dual-channel CLSCs, and the effect of government policies on supply chains.

2.1. The Applications of Game Theory in Supply Chains.
A critical issue in the supply chain is the competitive relationship among supply chain members, which has attracted considerable attention in academia and practice. Most existing studies use a game-theoretic approach to depict the competition among supply chain members, but limited research has been carried out on games whose players include governments [30][31][32]. For instance, an optimization fuzzy game model of three-player payoff has been developed for a green supply chain, and this model proposes a practical solution to increase the players' confidence in choosing green strategies [33]. Babu and Mohan [34] explained and predicted social and economic sustainability for a public health insurance supply chain using evolutionary game theory. Raj et al. [35] studied the coordination issues in a sustainable supply chain using a two-stage Stackelberg game-theoretic approach in which the supplier acts as the Stackelberg leader. Pricing, recycling, and green product decisions have been investigated in a multiproduct competitive three-echelon supply chain with one manufacturer and multiple suppliers and retailers, in which the latter two compete horizontally under a Nash equilibrium, while all three compete vertically under a Stackelberg equilibrium [36]. Guo et al. [37] proposed a two-echelon reverse supply chain, built a differential game model that introduces recycling publicity, and proposed coordination strategies between recyclers and processors in the collection of waste electrical and electronic equipment. Taleizadeh and Sadeghi [38] considered two collecting reverse supply chains and applied three game theory structures to obtain the optimal channel rewards: Nash, Nash-Stackelberg-first supply chain, and Nash-Stackelberg-second supply chain.
The above research indicates that game models of supply chain members are well suited to investigating competition in the supply chain, but the effect of government action on the CLSC remains an open issue. In practice, governments are usually more powerful than supply chain members, and therefore, the Stackelberg game is more suitable for characterizing the interaction between them [39].

2.2. The Decision-Making of Dual-Channel CLSCs.
An increasing number of researchers have paid attention to the decision-making problem of dual-channel CLSCs, focusing on pricing decisions, channel operational management, remanufacturing strategy, and coordination mechanisms. For a dual-channel CLSC, Xie et al. [40] combined the revenue-sharing contract in the forward channel with the channel investment cost-sharing contract and introduced the Stackelberg game to investigate the contract coordination mechanism. Giri and Dey [41] extended the model of Jafari et al. [42] with a backup supplier, considering the uncertainty in the collection of used products, and found that the performance of the supply chain is improved. Against the background of an online/offline dual channel, Xie et al. [43] developed a revenue-sharing mechanism by taking the relationship between the recycling rate and the recycling revenue-sharing ratio into consideration. Yi et al. [44] studied the optimal collection strategies for a retailer-oriented closed-loop supply chain with dual recycling channels in the construction machinery industry. Giri et al. [45] analytically derived the pricing and return product collection decisions for the supply chain under five different scenarios, viz., centralized, decentralized (Nash game), manufacturer-led, retailer-led, and third-party-led decentralized scenarios. Huang et al. [46] investigated the optimal strategies for a closed-loop supply chain with dual recycling channels, in which the manufacturer sells products via the retailer in the forward supply chain, while the retailer and the third party competitively collect used products in the reverse supply chain. It is worth noting that the government plays an important role in a dual-channel CLSC, and government policies affect the supply chain members' decisions [7]. This motivates us to investigate the game between the government and supply chain members [47].

2.3. The Effect of Government Policies on Supply Chains.
Many existing studies consider different government policies in a supply chain [48,49]. For example, He et al. [7] investigated the channel structure and pricing decisions of a dual-channel closed-loop supply chain (CLSC), where a manufacturer can distribute new products through an independent retailer and sell remanufactured products via a third-party firm or platform (3P) in the presence of a possible government subsidy. Wan and Hong [50] investigated the impacts of subsidy policies and transfer pricing policies on a closed-loop supply chain with dual collection channels. Noting that improving greening activities in manufacturing firms plays a major role in decreasing the hazardous environmental impacts of products and increasing social welfare (SW), Zand et al. [51] considered a dyadic online-to-offline (O2O) CLSC composed of a manufacturer and a retailer trading a single green product. Chen et al. [52] endogenized a government subsidy in a research joint venture and presented a three-player game in which a government determines the amount of subsidy for a supply chain consisting of a manufacturer and a retailer conducting a research joint venture into a sustainable product. Sinayi and Rasti-Barzoki [53] considered a two-tier model consisting of a supply chain and a government, in which the government acts at the higher level as the leader of the whole supply chain. At both levels, three dimensions of sustainability, namely, the economic, social, and environmental dimensions, are defined, and each is considered in the modeling. Cheng et al. [54] established a cooperation decision model for a mixed carbon trading and carbon tax policy in a two-stage S-M supply chain and investigated the influence of the mixed carbon policy, constrained by reduction targets, on supply chain price, productivity, profits, and carbon emission reduction rate.
The aforementioned papers all highlight that government policies play a significant role in dual-channel CLSCs. Therefore, our study investigates how the supply chain members' optimal decisions change with government policy parameters, especially when government policies are used to increase social welfare.

Problem Description and Model Assumptions
We consider a dual-channel CLSC consisting of a government (G), a manufacturer (M), and a retailer (R), who play a Stackelberg dynamic game. In the decentralized dynamic model, G first sets the intervention policy parameters to maximize social welfare. Then, M wholesales new products at a unit wholesale price $w_n$ to R. M collects used products at a price $A$ from the marketplace, and it is responsible for remanufacturing at a cost $c_r$. M then sells remanufactured products at a unit price $p_r$ to consumers through the online channel. R subsequently retails the new products at a unit price $p_n$ in the end market. In this three-player Stackelberg dynamic game, G is modeled as the leader, followed by M and lastly by R. The goal of M and R is to maximize their profits, and G's goal is to maximize social welfare. Our primary objective is to derive the effect of government RPPs on the decision-making of the dual-channel CLSC by using dynamic game theory.
Discrete Dynamics in Nature and Society
In this research, two common government policies (carbon emission RPP and recycling amount RPP) are considered. The operational structure of the dual-channel CLSC under the government RPP is shown in Figure 1. Four models are examined: (1) the decentralized model without RPP (model W), (2) the decentralized model with carbon emission RPP (model C), (3) the decentralized model with recycling amount RPP (model R), and (4) the decentralized model with double RPP (model D). Given the above model settings, we first derive the optimal pricing and the resulting demand, profit, carbon emission, consumer surplus, and social welfare in the four models W, C, R, and D. Then, the effects of the government RPPs are characterized based on the optimal results. Numerical experiments are carried out to compare and evaluate the performance of the carbon emission RPP, recycling amount RPP, and double RPP, thereby providing useful managerial insights for supply chain managers and governments.
This paper considers only a single product, which is either a new product or a remanufactured product. A consumer can hold only a new product or a remanufactured product. It is assumed that new and remanufactured products coexist in the same market [55,56] and that the remanufactured products are homogeneous. All recycled used products can be turned into remanufactured products, and one unit of recycled used product yields exactly one unit of remanufactured product. The market consists of two types of consumers: new product consumers and remanufactured product consumers. New product consumers do not have used products when they buy new products, while remanufactured product consumers have used products before they buy remanufactured products. New product consumers can buy new products directly, while remanufactured product consumers must sell their used products when they buy remanufactured products [8].
Based on the problem description, we employ the symbols and notations given in Table 1 throughout this paper.
To make the analysis tractable, we introduce the following assumptions in this research.

Assumption 1.
Consumers are heterogeneous in their willingness-to-pay $\theta$ for a new product, which is uniformly distributed between 0 and 1. Consumers' willingness-to-pay for a remanufactured product is a fraction $a$ of $\theta$, where $a \in (0, 1)$. The utilities that consumers receive from new and remanufactured products are $u_n = \theta - p_n$ and $u_r = a\theta - p_r$, respectively [3,8,57]. Following the utility maximization principle, if $u_n \ge \max\{u_r, 0\}$, consumers purchase the new product, resulting in the new product demand function $q_n = 1 - \frac{p_n - p_r}{1 - a}$. If $u_r \ge \max\{u_n, 0\}$, consumers purchase the remanufactured product, leading to the remanufactured product demand function $q_r = \frac{a p_n - p_r}{a(1 - a)}$ [8,58,59].
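The demand functions in Assumption 1 can be checked against the utility rule directly. The following is a minimal sketch, not part of the original analysis: the function names and the example prices ($p_n = 0.6$, $p_r = 0.3$, $a = 0.6$) are illustrative assumptions, and the closed forms are only valid when both products have positive demand.

```python
def demands(p_n, p_r, a):
    """Closed-form demands from Assumption 1 (valid when both products sell)."""
    q_n = 1 - (p_n - p_r) / (1 - a)
    q_r = (a * p_n - p_r) / (a * (1 - a))
    return q_n, q_r


def simulated_demands(p_n, p_r, a, steps=100_000):
    """Approximate the same demands by sweeping theta over a uniform grid on [0, 1]."""
    buy_new = 0
    buy_reman = 0
    for i in range(steps):
        theta = (i + 0.5) / steps      # midpoint of the i-th grid cell
        u_n = theta - p_n              # utility from a new product
        u_r = a * theta - p_r          # utility from a remanufactured product
        if u_n >= max(u_r, 0):
            buy_new += 1
        elif u_r >= max(u_n, 0):
            buy_reman += 1
    return buy_new / steps, buy_reman / steps


# Illustrative (hypothetical) prices and discount factor.
q_n, q_r = demands(p_n=0.6, p_r=0.3, a=0.6)
s_n, s_r = simulated_demands(p_n=0.6, p_r=0.3, a=0.6)
print(q_n, q_r)
print(s_n, s_r)
```

The grid sweep applies the purchase rule consumer by consumer, so agreement between the two pairs of numbers is a quick sanity check on the closed-form expressions.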
Assumption 2. It is common to simplify the modeling analysis by assuming the unit manufacturing cost of new products to be positive and that of remanufactured products to be zero [3,8,60,61]. The marginal manufacturing cost of new products is assumed to be $c < 1$ to ensure a positive demand for new products [8,62].
Assumption 3. To ensure profitable remanufacturing, A < ac is assumed.
The condition $A < ac$ encourages M to recycle and remanufacture used products to enjoy a cost advantage, so that it offers both new and remanufactured products [3,8,55].
Assumption 4. The unit carbon emission of a remanufactured product is less than that of a new product, and the carbon emission intensity of the remanufactured product $\lambda$ satisfies $\lambda \in (0, 1)$. The total amount of carbon emission increases linearly in production output. The unit carbon emission generated in producing remanufactured products is a certain percentage of that generated in producing new products [61]. As in similar models, the unit carbon emission generated in producing new products is normalized to 1, so the unit carbon emission generated in producing remanufactured products is $\lambda$ [61,62]. Thus, the total carbon emission of the two types of products is $e = q_n + \lambda q_r$. Following Cachon [63], Yenipazarli [29], Cao et al. [21], and Cao et al. [61], the total environmental cost is assumed to increase linearly in the total emission, i.e., it equals $ve$, where $v$ is the environmental cost coefficient.
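The emission and environmental cost expressions above translate into two one-line functions. This is only an illustrative sketch: the function names and the numerical values are assumptions chosen for the example, not values from the paper.

```python
def total_emission(q_n, q_r, lam):
    """Total carbon emission e = q_n + lam * q_r (new-product emission normalized to 1)."""
    return q_n + lam * q_r


def environmental_cost(q_n, q_r, lam, v):
    """Environmental cost v * e, linear in the total emission."""
    return v * total_emission(q_n, q_r, lam)


# Hypothetical demands and coefficients, for illustration only.
e = total_emission(q_n=0.25, q_r=0.25, lam=0.4)
cost = environmental_cost(q_n=0.25, q_r=0.25, lam=0.4, v=0.2)
print(e, cost)
```

Because $\lambda < 1$, shifting one unit of demand from new to remanufactured products always lowers $e$; whether a policy lowers $e$ overall depends on how much each demand moves.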

Equilibrium Analysis of Four Dynamic Models
In the following part, we will derive the optimal results for the four dynamic models W, C, R, and D and compare the optimal decisions in the models C, R, and D with those in the model W, respectively.

Model W: Decentralized Model without RPP.
In model W, G does not participate in the game; only M and R participate. M determines $w_n^W$ and $p_r^W$ to maximize its profit. Then, R determines $p_n^W$ to maximize its profit. M is the leader and R is the follower in a Stackelberg dynamic game. Hence, the model is formulated as optimization model (1). Using backward induction, we obtain the optimal results for model (1), as stated in Proposition 1.
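The backward induction for model W can be sketched numerically. The profit functions below are assumptions based on our reading of the problem description (R earns $(p_n - w_n) q_n$; M earns $(w_n - c) q_n + (p_r - A) q_r$); they are not quoted from model (1). The closed-form prices in `solve_closed_form` are our own solution of the resulting first-order conditions and should be read as a cross-check on the grid search, not as the statement of Proposition 1. All parameter values are hypothetical.

```python
def demands(p_n, p_r, a):
    """Demand functions from Assumption 1."""
    q_n = 1 - (p_n - p_r) / (1 - a)
    q_r = (a * p_n - p_r) / (a * (1 - a))
    return q_n, q_r


def retailer_price(w_n, p_r, a):
    """Stage 2: R's best response, from the first-order condition of (p_n - w_n) * q_n."""
    return (1 - a + p_r + w_n) / 2


def manufacturer_profit(w_n, p_r, a, c, A):
    """ASSUMED form of M's profit, evaluated at R's best response."""
    p_n = retailer_price(w_n, p_r, a)
    q_n, q_r = demands(p_n, p_r, a)
    return (w_n - c) * q_n + (p_r - A) * q_r


def solve_by_grid(a, c, A, grid=200):
    """Stage 1: M searches a price grid on [0, 1] x [0, 1] for its best (w_n, p_r)."""
    candidates = ((i / grid, j / grid)
                  for i in range(grid + 1) for j in range(grid + 1))
    w_n, p_r = max(candidates,
                   key=lambda wp: manufacturer_profit(wp[0], wp[1], a, c, A))
    return w_n, p_r, retailer_price(w_n, p_r, a)


def solve_closed_form(a, c, A):
    """Our hand-derived solution of M's two first-order conditions (an assumption)."""
    w_n = (1 + c) / 2
    p_r = (a + A) / 2
    p_n = (3 - a + A + c) / 4
    return w_n, p_r, p_n


# Hypothetical parameters satisfying c < 1 and A < a * c.
grid_solution = solve_by_grid(a=0.6, c=0.3, A=0.1)
closed_solution = solve_closed_form(a=0.6, c=0.3, A=0.1)
print(grid_solution)
print(closed_solution)
```

Solving the follower's problem first and substituting its best response into the leader's objective is the defining step of backward induction; the grid search simply replaces the leader's calculus with brute force.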

Proposition 1.
In model W, there exist optimal prices $w_n^{W*}$, $p_r^{W*}$, and $p_n^{W*}$, with corresponding optimal profits for M and R. Total social welfare consists of four parts: supply chain members' profits, consumer surplus, government expenditure, and environmental damage cost [29]. In model W, G does not participate in the game, so government expenditure is zero and social welfare reduces to the members' profits plus consumer surplus minus the environmental damage cost. Consumer surplus consists of the consumer surplus of new and remanufactured products [29]. The total carbon emission generated in producing new and remanufactured products in model W is denoted by $e^{W*}$ [29,61], and thus the environmental damage cost equals $v e^{W*}$. The optimal consumer surplus and the total social welfare in model W follow from these quantities. The proof is given in Appendix A.

Table 1: Symbols and notations.
$c$: Unit production cost of the new products
$p_n$ / $p_r$: Unit retail price of new/remanufactured products
$q_n$ / $q_r$: Demand for new/remanufactured products
$\lambda$: Carbon emission intensity of remanufactured products
$A$: Unit exogenous cost of recycling used products
$G$: Carbon emission cap set by G
$q_0$: Lower limit of recovery amount set by G
$t$: Reward-penalty coefficient of carbon emission, $t > 0$
$g$: Reward-penalty coefficient of recycling amount, $g > 0$
$\theta$: Consumer's willingness-to-pay for new products
$a$: Consumer's value discount for remanufactured products
$v$: Environmental damage coefficient
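The welfare decomposition described above can be illustrated as follows. This is a sketch under stated assumptions: consumer surplus is taken to be the integral of each buyer's utility over the uniform distribution of $\theta$, and government expenditure and environmental damage are assumed to enter welfare with a negative sign. Neither formula is quoted from the paper, and the prices and coefficients are hypothetical.

```python
def consumer_surplus(p_n, p_r, a):
    """ASSUMED surplus: integral of buyers' utilities over theta ~ U[0, 1].

    Valid when p_r / a <= (p_n - p_r) / (1 - a) <= 1, i.e. both products sell.
    """
    theta_hi = (p_n - p_r) / (1 - a)   # indifferent between new and remanufactured
    theta_lo = p_r / a                 # indifferent between remanufactured and nothing
    cs_new = (1 - theta_hi ** 2) / 2 - p_n * (1 - theta_hi)
    cs_reman = a * (theta_hi ** 2 - theta_lo ** 2) / 2 - p_r * (theta_hi - theta_lo)
    return cs_new + cs_reman


def social_welfare(profit_m, profit_r, surplus, gov_expenditure, v, e):
    """ASSUMED sign convention: profits + surplus - expenditure - damage cost v * e."""
    return profit_m + profit_r + surplus - gov_expenditure - v * e


# Hypothetical model-W style numbers (no government expenditure).
a, c, A, lam, v = 0.6, 0.3, 0.1, 0.4, 0.2
w_n, p_r, p_n = 0.65, 0.35, 0.7
q_n = 1 - (p_n - p_r) / (1 - a)
q_r = (a * p_n - p_r) / (a * (1 - a))
profit_m = (w_n - c) * q_n + (p_r - A) * q_r   # assumed form of M's profit
profit_r = (p_n - w_n) * q_n                   # assumed form of R's profit
welfare = social_welfare(profit_m, profit_r, consumer_surplus(p_n, p_r, a),
                         gov_expenditure=0.0, v=v, e=q_n + lam * q_r)
print(welfare)
```

Keeping the four components as separate arguments makes it easy to see which term a given policy moves, which is the comparison the later propositions carry out.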

Model C: Decentralized Model with Carbon Emission RPP.
In model C, G implements the carbon emission RPP and participates in the game. First, G determines the unit carbon emission reward-penalty coefficient $t^C$ by maximizing the total social welfare. Under this carbon emission RPP, M determines $w_n^C$ and $p_r^C$ to maximize its profit. Finally, R determines $p_n^C$ to maximize its profit. G, M, and R play a Stackelberg dynamic game in which G is the leader, followed by M and lastly by R. The three-echelon Stackelberg dynamic game model consisting of G, M, and R is formulated as optimization model (7). Using backward induction, we obtain the optimal results for model (7), as stated in Proposition 2.
Proposition 2. In model C, there exist an optimal reward-penalty coefficient $t^{C*}$ and optimal prices, with corresponding optimal demands and optimal profits for M and R. The total carbon emission, the optimal consumer surplus, and the total social welfare in model C follow from these results. The proof is given in Appendix B.1. The effects of the carbon emission RPP on promoting remanufacturing are first embodied in the demands for the two types of products, as shown in Proposition 3.
Proposition 3 gives the effects of the carbon emission RPP on the prices and demands in model C. The proof is given in Appendix B.2. Proposition 3 indicates that the retail price of new products in model C is higher than that in model W, while the demand for new products in model C is lower than that in model W. The demand for new products in model C decreases as the unit carbon emission reward-penalty coefficient increases: because the cost and the retail price of new products increase under the carbon emission RPP, the demand for new products decreases. We also find that the carbon emission RPP increases the retail price of remanufactured products because of the increased cost. When the carbon emission intensity of the remanufactured products is very small, the demand for remanufactured products in model C is higher than that in model W, and it increases with the unit carbon emission reward-penalty coefficient. However, the total market demand in model C is smaller than that in model W. Therefore, compared with the case without RPP, when the carbon emission intensity of the remanufactured product is very small, the carbon emission RPP can promote the recycling and remanufacturing of products, but it reduces the total market demand. The effect of the carbon emission RPP on the total carbon emission is given in Proposition 4.

Proposition 4.
In model C, the effects of the carbon emission RPP on the total carbon emission are given as $\partial e^{C*}/\partial t^{C*} < 0$ and $e^{C*} < e^{W*}$. The proof is given in Appendix B.3. Proposition 4 indicates that the total carbon emission decreases as the unit carbon emission reward-penalty coefficient increases and that it is lower than in the case without RPP. The reason is that the demand for new products decreases under the carbon emission RPP; the demand for remanufactured products increases only when their carbon emission intensity is very small, and even then the total market demand decreases. Thus, the total carbon emission is lower than in the case without RPP.
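The qualitative effects in Propositions 3 and 4 can be illustrated with a small numerical sketch. It rests on assumptions that are not taken from the paper: M is assumed to pay $t(e - \text{cap})$ (a reward when negative), so a new unit effectively costs $c + t$ and a remanufactured unit $A + t\lambda$, and the demand expressions are our own solution of the resulting Stackelberg game. Here $t$ is treated as a given parameter rather than G's optimal choice, and all numbers are hypothetical.

```python
def equilibrium(a, c, A, lam, t=0.0):
    """Equilibrium under an ASSUMED carbon RPP with coefficient t (t = 0 is model W)."""
    c_eff = c + t          # a new unit emits 1, so it carries an extra charge t
    A_eff = A + t * lam    # a remanufactured unit emits lam, so it carries t * lam
    p_n = (3 - a + A_eff + c_eff) / 4
    q_n = (1 - a + A_eff - c_eff) / (4 * (1 - a))
    q_r = (a * (1 - a + c_eff) + (a - 2) * A_eff) / (4 * a * (1 - a))
    return {"p_n": p_n, "q_n": q_n, "q_r": q_r, "e": q_n + lam * q_r}


a, c, A = 0.6, 0.3, 0.1          # hypothetical parameters with A < a * c
for lam in (0.2, 0.8):           # low and high emission intensity
    base = equilibrium(a, c, A, lam)
    policy = equilibrium(a, c, A, lam, t=0.05)
    print(lam,
          policy["p_n"] > base["p_n"],   # new-product price rises
          policy["q_n"] < base["q_n"],   # new-product demand falls
          policy["q_r"] > base["q_r"],   # remanufactured demand rises only for small lam
          policy["e"] < base["e"])       # total emission falls
```

Under these assumptions the cap only shifts M's profit by a constant, so it does not appear in the prices or demands; it matters for M's profit comparison, not for the demand effects shown here.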
The effects of the carbon emission RPP on the profits of R and M are described in Proposition 5.
Proposition 5 gives the effects of the carbon emission RPP on profits in model C. The proof is given in Appendix B.4. Proposition 5 shows that R's profit decreases as the unit carbon emission reward-penalty coefficient increases and that the carbon emission RPP lowers R's profit compared with the case without RPP. Because the carbon emission RPP leads M to increase the wholesale price of new products while the demand for new products decreases, R's profit decreases. From Proposition 5, we also find that when the carbon emission cap is very large, M's profit increases with the unit carbon emission reward-penalty coefficient and is higher than that in model W. Because the demand for remanufactured products increases under the carbon emission RPP, the positive effect of remanufactured products on M's profit outweighs the negative effect of new products, and thus M's profit increases.

Model R: Decentralized Model with Recycling Amount RPP.
In model R, G implements a recycling amount RPP and participates in the game. First, G determines the reward-penalty coefficient $g^R$ by maximizing the total social welfare. Under this recycling amount RPP, M determines $w_n^R$ and $p_r^R$ to maximize its profit. Finally, R determines $p_n^R$ to maximize its profit. G, M, and R play a Stackelberg dynamic game in which G is the leader, followed by M and lastly by R. The three-echelon Stackelberg game consisting of G, M, and R is formulated as optimization model (11). Using backward induction, we obtain the optimal results for model (11), as stated in Proposition 6.
Proposition 6. In model R, there exist an optimal reward-penalty coefficient $g^{R*}$ and optimal prices. The resulting demand for remanufactured products is $q_r^{R*} = \frac{a(1 - a + c) + (a - 2)(A - g^{R*})}{4a(1 - a)}$. The optimal profits of M and R, the total carbon emission, the optimal consumer surplus, and the total social welfare in model R follow from these results. The proof is given in Appendix C.1. The effects of the recycling amount RPP on promoting remanufacturing are first embodied in the demands for the two types of products, as shown in Proposition 7.
Proposition 7. In model R, the effects of the recycling amount RPP on the prices and demands are given as follows: (1) $p_n^{R*} < p_n^{W*}$, $q_n^{R*} < q_n^{W*}$, and $\partial q_n^{R*}/\partial g^{R*} < 0$.
The proof is given in Appendix C.2. Proposition 7 indicates that the retail prices of new and remanufactured products in model R are lower than those in model W. The demand for new products in model R is lower than that in model W, while the demand for remanufactured products in model R is higher than that in model W. As the unit recycling amount reward-penalty coefficient increases, the demand for new products decreases, but the demand for remanufactured products increases. The reason is that the cost decreases and M lowers the retail prices of the two types of products under the recycling amount RPP. Thus, the demand for remanufactured products increases in model R, while the demand for new products decreases. The total market demand increases, and the recycling amount RPP is conducive to promoting the recycling and remanufacturing of products.
The effect of the recycling amount RPP on the total carbon emission is given in Proposition 8.

Proposition 8.
In model R, the effect of the recycling amount RPP on the total carbon emission is as follows: if $\lambda < \frac{a}{2 - a}$, then $\partial e^{R*}/\partial g^{R*} < 0$ and $e^{R*} < e^{W*}$. The proof is given in Appendix C.3. Proposition 8 indicates that when the carbon emission intensity of the remanufactured products is very small, the total carbon emission decreases as the unit recycling amount reward-penalty coefficient increases, and it is lower than in the case without RPP. Although the increase in remanufactured product demand under the recycling amount RPP is larger than the decrease in new product demand, each remanufactured unit emits only a small fraction of what a new unit emits, and thus the total carbon emission is lower than in the case without RPP. The effects of the recycling amount RPP on the profits of R and M are described in Proposition 9.
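The threshold $\lambda < a/(2 - a)$ in Proposition 8 can be checked numerically. The remanufactured-product demand below follows the expression for $q_r^{R*}$ in Proposition 6; the new-product demand is our own derivation under the assumption that the recycling amount RPP lowers M's effective unit recycling cost from $A$ to $A - g$. The coefficient $g$ is treated as a given parameter rather than G's optimal choice, and all numbers are hypothetical.

```python
def demands_model_r(a, c, A, g):
    """Demands under an ASSUMED recycling RPP that lowers the recycling cost to A - g."""
    A_eff = A - g
    q_n = (1 - a + A_eff - c) / (4 * (1 - a))                      # our derivation
    q_r = (a * (1 - a + c) + (a - 2) * A_eff) / (4 * a * (1 - a))  # form of Proposition 6
    return q_n, q_r


def emission_change(a, c, A, lam, g):
    """Change in total emission e = q_n + lam * q_r relative to g = 0 (model W)."""
    q_n0, q_r0 = demands_model_r(a, c, A, 0.0)
    q_n1, q_r1 = demands_model_r(a, c, A, g)
    return (q_n1 + lam * q_r1) - (q_n0 + lam * q_r0)


a, c, A, g = 0.6, 0.3, 0.1, 0.05   # hypothetical parameters with A < a * c
threshold = a / (2 - a)            # condition stated in Proposition 8
for lam in (0.2, 0.8):
    delta_e = emission_change(a, c, A, lam, g)
    print(lam, lam < threshold, delta_e < 0)
```

In this sketch the sign of the emission change flips exactly at the threshold, because remanufactured demand rises by $(2 - a)/a$ units for every unit of new-product demand lost.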
Proposition 9 gives the effects of the recycling amount RPP on profits in model R. The proof is given in Appendix C.4. Proposition 9 shows that R's profit decreases as the unit recycling amount reward-penalty coefficient increases and that the recycling amount RPP lowers R's profit compared with the case without RPP. Because the retail price and demand of new products decrease under the recycling amount RPP, R's profit decreases. From Proposition 9, we also find that when the lowest recycling amount is very small, M's profit increases with the unit recycling amount reward-penalty coefficient and is higher than that in model W. Although the retail prices of the two types of products and the demand for new products decrease under the recycling amount RPP, the demand for remanufactured products and the total market demand increase. The negative effect of new products on M's profit is smaller than the positive effect of remanufactured products, and thus M's profit increases.

Model D: Decentralized Model with Double RPP.
In model D, G implements a double RPP consisting of a carbon emission RPP and a recycling amount RPP, and it participates in the game. First, G determines the unit carbon emission reward-penalty coefficient $t^D$ and the unit recycling amount reward-penalty coefficient $g^D$ by maximizing the total social welfare. Under this double RPP, M determines $w_n^D$ and $p_r^D$ to maximize its profit. Finally, R determines $p_n^D$ to maximize its profit. G, M, and R play a Stackelberg dynamic game in which G is the leader, followed by M and lastly by R. The three-echelon Stackelberg dynamic game consisting of G, M, and R is formulated as optimization model (15). Using backward induction, we obtain the optimal results for model (15), as stated in Proposition 10.

Proposition 10. In model D, there exist optimal reward-penalty coefficients $t^{D*}$ and $g^{D*}$ and optimal prices, with corresponding optimal profits for M and R. The total carbon emission, the optimal consumer surplus, and the total social welfare in model D follow from these results. The proof is given in Appendix D.1. The effects of the double RPP on promoting remanufacturing are first embodied in the demands for the two types of products, as shown in Proposition 11.
Proposition 11. In model D, the effects of the double RPP on the prices and demands are given as follows:
The proof is given in Appendix D.2. Proposition 11 indicates that the double RPP decreases the demand for new products, and that this demand decreases further as the carbon emission and recycling amount reward-penalty coefficients increase. When the environment damage coefficient is very small, the retail price of new products in model D is lower than that in model W.
The total carbon emission is strictly controlled under the double RPP, which has a strong negative impact on new products. The demand for remanufactured products in model D increases as the recycling amount reward-penalty coefficient increases. When the carbon emission intensity of remanufactured products is very small, the retail price of remanufactured products in model D is lower than that in model W, while the demand for remanufactured products in model D is higher than that in model W. The demand for remanufactured products in model D also increases as the carbon emission reward-penalty coefficient increases, and the total market demand in model D is higher than in the case without RPP. Since the double RPP promotes product recycling, the positive effect on remanufactured products is larger than the negative effect on new products. The effect of the double RPP on the total carbon emission is given in Proposition 12.
The proof is given in Appendix D.3. Proposition 12 indicates that the total carbon emission decreases as the carbon emission reward-penalty coefficient increases. When the carbon emission intensity of remanufactured products is very small, the total carbon emission decreases as the recycling amount reward-penalty coefficient increases, and the total carbon emission in model D is lower than that in model W. Because the demand for remanufactured products increases and the demand for new products decreases under the double RPP, the total carbon emission decreases. The effects of the double RPP on the profits of R and M are described in Proposition 13.
Proposition 13. In model D, the effects of the double RPP on profits are given as follows:
The proof is given in Appendix D.4. Proposition 13 indicates that R's profit decreases as the reward-penalty coefficients increase, and that it is lower than in the case without RPP. Since the double RPP increases the wholesale price of new products and decreases the demand for new products, it damages R's profit. We also find that when the carbon emission cap is very big, M's profit increases as the unit carbon emission reward-penalty coefficient increases. When the lowest recycling amount is very small, M's profit increases as the unit recycling amount reward-penalty coefficient increases. The demand for remanufactured products and the total market demand increase under the double RPP. When the carbon emission cap is very big and the lowest recycling amount is very small, M can obtain more rewards through product recycling and remanufacturing.
From the above propositions, we can find that the three government RPPs can promote product recycling and remanufacturing, i.e., decreasing the demand for new products and increasing the demand for remanufactured products.
Compared with the case without RPP, the three government RPPs decrease the total carbon emission and protect the environment. They damage R's profit but improve M's profit. The comparative effects of the three government RPPs on supply chain profit, consumer surplus, and social welfare are given in the following numerical analysis.

Numerical Experiment
To describe more intuitively the impact of the three government RPPs on the optimal decisions of supply chain members, the supply chain profit, consumer surplus, and total social welfare, a numerical simulation analysis is given in this part. The relevant parameters are chosen to illustrate the managerial insights and are set as c = 0.3, A = 0.1, a = 0.5, v = 0.5, $q_0$ = 1, and E = 1. The carbon emission intensity of remanufactured products is set to λ ∈ [0.4, 0.8]. We first show the impact of λ on the optimal reward-penalty coefficients in Figure 2. The impacts of λ on price, demand, profit, carbon emission, consumer surplus, and social welfare in the four models are described in Figures 3-8.
It can be found from Figure 2 that, as λ increases, the reward-penalty coefficient $t^C$ increases, while the reward-penalty coefficients $g^R$, $t^D$, and $g^D$ decrease. When the carbon emission intensity of remanufactured products, λ, is very small, the reward-penalty coefficient $t^D$ is the highest, while when λ is very big, the reward-penalty coefficient $t^C$ is the highest. The reward-penalty coefficient $g^R$ in model R is the lowest, but the reward-penalty coefficients $t^D$ and $g^D$ in model D are not the lowest. To ensure that the reward-penalty coefficients are positive, λ must be lower than 0.45.
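The bound λ < 0.45 can be reproduced for the recycling amount coefficient from the closed form of $g^{R*}$ in Appendix C (equation (C.2)). The following is a minimal sketch under the parameter values of this section; the function names `g_r_star` and `lambda_threshold` are hypothetical, and the sketch only covers $g^R$, since the closed forms of the other coefficients are not reproduced here.

```python
def g_r_star(lam, a=0.5, A=0.1, c=0.3, v=0.5):
    """Optimal recycling amount reward-penalty coefficient, equation (C.2)."""
    numerator = A * (a - 4) + a * (1 + 3 * c - a) + 4 * v * (a + lam * (a - 2))
    return numerator / (4 - 3 * a)


def lambda_threshold(a=0.5, A=0.1, c=0.3, v=0.5):
    """Value of lambda at which g_r_star changes sign.

    Obtained by setting the numerator of (C.2) to zero and solving for lambda;
    valid for a < 2 and v > 0, where g_r_star is decreasing in lambda.
    """
    constant = A * (a - 4) + a * (1 + 3 * c - a) + 4 * v * a
    return constant / (4 * v * (2 - a))


threshold = lambda_threshold()
print(f"g_R* at lambda = 0.40: {g_r_star(0.40):.4f}")
print(f"sign change at lambda = {threshold:.4f}")
print(f"g_R* at lambda = 0.60: {g_r_star(0.60):.4f}")
```

With these parameters the coefficient is positive below the threshold and negative above it, which agrees with the later remark that $g^R$ is negative when λ is very big.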
That is because G maximizes the total social welfare: when the carbon emission intensity of remanufactured products is very big, G increases the reward-penalty coefficient $t^C$ in model C in order to control the total carbon emission. However, G decreases the reward-penalty coefficient $g^R$ in model R.
From Figure 3, we can find that, as λ increases, the retail prices of new and remanufactured products increase in models C and R, and the retail price of remanufactured products increases in model D. However, the retail price of new products in model D first decreases and then increases as λ increases. The retail price of new products is the highest in model C, and it is the lowest in model R when λ is very small. The retail prices of new products in models C and D are higher than those in model W. When λ is very small, the retail price of remanufactured products in model D is the lowest, while when λ is very big, the retail price of remanufactured products in model R is the highest. That is because G increases the reward-penalty coefficient $t^C$ in model C, which increases M's cost. M then increases the retail price of remanufactured products and the wholesale price of new products, and R increases the retail price of new products.
It is found from Figure 4 that, with the increase of λ, the demands of new products increase in models C, R, and D, while the demands of remanufactured products decrease in models C, R, and D. When λ is very small, the demand for new products in model D is the lowest, but the demand for remanufactured products in model D is the highest. However, when λ is very big, the demand for new products in model R is the highest, but the demand for remanufactured products in model R is the lowest.
That is because, when the carbon emission intensity of remanufactured products, λ, increases, the three government RPPs decrease the demand for remanufactured products but increase the demand for new products. When λ is very small, the double RPP leads to the lowest demand for new products and the highest demand for remanufactured products in model D. However, when λ is very big, the difference in carbon emission between new and remanufactured products is very small, and G has no incentive to set a higher recycling amount reward-penalty coefficient. As a result, under the recycling amount RPP the demand for new products is the highest and the demand for remanufactured products is the lowest in model R.
From Figure 5, we can find that, as λ increases, the profits of M and R increase in model R, and the profits of M and R first decrease and then increase in model D; in model C, M's profit first decreases and then increases, while R's profit first increases and then decreases. When λ is very small, M's profit in model C is the highest, and it is the lowest in model C, but R's profit in model W is the highest, and it is the lowest in model D. When λ is very big, both M's profit and R's profit are the highest in model R. In the four models, M's profits are all higher than R's profits. This is because, when λ is very big, the reward-penalty coefficient $g^R$ is negative, and the recycling amount RPP increases the prices and demands of new and remanufactured products, which can increase the profits of M and R. G's limits on carbon emission and recycling amount damage the price and demand of new products and promote the price and demand of remanufactured products. Thus, compared with the case without RPP, the three government RPPs increase M's profit but decrease R's profit.
From Figure 6, we can find that when λ ∈ [0.4, 0.45], the total carbon emission in model R is higher than that in model W, but in other cases, the total carbon emission under the three government RPPs is lower than that in model W. From Proposition 8, when λ < a/(2 − a) ≈ 0.33, the total carbon emission in model R is lower than that in model W. We also find that when λ is very small, the total carbon emission in model D is the lowest, but when λ is very big, the total carbon emission in model C is the lowest. Therefore, when the carbon emission intensity of remanufactured products, λ, meets certain conditions, the three government RPPs can effectively control the total carbon emission. When the difference in carbon emission between new and remanufactured products is very big, the double RPP is the most effective in controlling the total carbon emission; otherwise, the carbon emission RPP is the most effective.
Figure 7 shows that the consumer surplus in model C is the lowest. When λ is very small, the consumer surplus in model D is the highest; otherwise, the consumer surplus in model W is the highest. Under most conditions, the three government RPPs damage consumer interests and surplus. In particular, the carbon emission RPP controls the total carbon emission most effectively, but the consumer surplus is the lowest under this RPP, because the retail prices of new and remanufactured products are higher under the carbon emission RPP than under the other government RPPs. However, when λ is very small, the retail prices of new and remanufactured products in models R and D are lower than those in model W, and thus the consumer surplus in models R and D is higher than that in model W.
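The pattern reported for model R in Figure 6 can be checked against the closed forms in Appendix C. The sketch below is only an illustration under assumptions that are not stated in this section: the total carbon emission is taken as $q_n + λ q_r$ (the emission of a new product normalized to 1), and g = 0 stands in for model W. The function names are hypothetical.

```python
def demands_r(g, a=0.5, A=0.1, c=0.3):
    """Demands for new and remanufactured products in model R (Appendix C.1)."""
    q_n = (A - a - c + 1 - g) / (4 * (1 - a))
    q_r = (a * (1 - a + c) + (a - 2) * (A - g)) / (4 * a * (1 - a))
    return q_n, q_r


def g_r_star(lam, a=0.5, A=0.1, c=0.3, v=0.5):
    """Optimal recycling amount reward-penalty coefficient, equation (C.2)."""
    numerator = A * (a - 4) + a * (1 + 3 * c - a) + 4 * v * (a + lam * (a - 2))
    return numerator / (4 - 3 * a)


def emission(g, lam):
    """Assumed total emission: one unit per new product, lam per remanufactured one."""
    q_n, q_r = demands_r(g)
    return q_n + lam * q_r


def emission_gap(lam):
    """Emission in model R at the optimal g minus emission at g = 0 (assumed model W)."""
    return emission(g_r_star(lam), lam) - emission(0.0, lam)


for lam in (0.30, 0.42, 0.50):
    gap = emission_gap(lam)
    label = "higher" if gap > 0 else "lower"
    print(f"lambda = {lam:.2f}: model R emission is {label} than the benchmark")
```

Under these assumptions the gap is negative for λ below a/(2 − a), positive between that value and 0.45, and negative again once $g^{R*}$ turns negative, which matches the description of Figure 6 and the condition in Proposition 8. The value λ = 0.30 lies outside the plotted range and is included only to illustrate Proposition 8.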
From Figure 8, we can find that, as λ increases, the total social welfare in model W decreases, while the total social welfare in models C, R, and D first decreases and then increases. When λ is very small, the total social welfare in model D is the highest, and G can implement the double RPP; when λ is very big, the total social welfare in model R is the highest, and G can implement the recycling amount RPP; under other conditions, the total social welfare in model C is the highest, and G can implement the carbon emission RPP. In each case the total social welfare is higher than that in model W. That is to say, the three government RPPs can all increase the total social welfare. That is because, under most conditions, although the three government RPPs decrease the consumer surplus compared with the case without RPP, they increase the profits of M and of the supply chain and decrease the total carbon emission, and thus increase the total social welfare. When the difference in carbon emission between the two products is very big, G can control the total carbon emission only by setting the lowest recycling amount.
To analyze the impact of RPPs on supply chain members, we choose model D as the object of analysis. In the following part, the parameter λ is set to λ = 0.4. From Figure 2, the optimal reward-penalty coefficients are $t^{C*}$ = 0.18, $g^{R*}$ = 0.06, $t^{D*}$ = 0.3, and $g^{D*}$ = 0.192. To illustrate the impacts of the reward-penalty coefficients $t^D$ and $g^D$ on profit, total carbon emission, and social welfare, they are varied over $t^D$ ∈ [0, 0.5] and $g^D$ ∈ [0, 0.3]. The effects of the double RPP are illustrated in Figures 9-11. From Figure 9, we can find that, as $t^D$ increases, M's profit increases, but R's profit decreases. Under most conditions, M's profit is higher than R's profit. Only when $t^D$ is very small and $g^D$ is very big is R's profit higher than M's profit. The reason is that the larger the reward-penalty coefficient $t^D$, the greater the damage to the demand for new products and the greater the benefit to the demand for remanufactured products. Thus, the double RPP benefits M but damages R.
It can be found from Figure 10 that the total carbon emission decreases as $t^D$ increases, but it increases as $g^D$ increases. When the reward-penalty coefficient $t^D$ is very high, the demand for new products decreases while the demand for remanufactured products increases, and thus the total carbon emission decreases. Combining this with Figure 6, we obtain that the total carbon emission in model D is always lower than that in model W. That is to say, the double RPP is effective in controlling the total carbon emission.
From Figure 11, it can be found that the total social welfare is a concave function of the two reward-penalty coefficients, and thus there exists a unique optimal reward-penalty coefficient combination ($t^{D*}$ = 0.3, $g^{D*}$ = 0.192) that maximizes the total social welfare. Considering the supply chain members' profits, consumer surplus, government expenditure, and environment damage cost, G must set appropriate reward-penalty coefficients to balance these four aspects and maximize the total social welfare.

Conclusions
This paper studies the role of government policies in a dual-channel CLSC and constructs four dynamic game models (i.e., model W, model C, model R, and model D) by considering the game among M, R, and G. We first derive the optimal pricing decision, consumer demand, profit, carbon emission, consumer surplus, and social welfare for the dual-channel CLSC in models W, C, R, and D. Second, we take model W as the benchmark model and compare the optimal results in the other three models with the optimal values of model W to illustrate the impacts of the government RPPs on the performance of the dual-channel CLSC.
Third, through numerical simulation analysis, the optimal results in the four models are compared, and model D is taken as an example to analyze how to set the reward-penalty coefficients of the government RPPs by maximizing social welfare. The results show that (1) in the four models, there exist optimal prices and reward-penalty coefficients that maximize the supply chain members' profits and social welfare. (2) Compared with model W, under most conditions, the three government RPPs decrease the demand for new products and increase the demand for remanufactured products. When the difference in carbon emission between new and remanufactured products is very small, the three government RPPs can decrease the total carbon emission compared with the case without RPP. Compared with the case without RPP, R's profit decreases, and when the carbon emission cap is very big and the lowest recycling amount is very small, M's profit increases. (3) In most cases, the three government RPPs can effectively control the total carbon emission and increase the social welfare, but they damage the benefits of retailers and consumers. When the carbon emission intensity of remanufactured products is very small, G can implement the double RPP; when it is very big, G can implement the recycling amount RPP; otherwise, G can implement the carbon emission RPP.
In this paper, we do not consider the recycling channels or the recycling and remanufacturing costs of used products. These factors influence the decisions of supply chain members in a dual-channel CLSC and affect how governments set the parameters of their RPPs, which is worthy of further research. In addition, the three government RPPs damage the benefits of the retailer and consumers, so how to design a more appropriate government RPP that balances the various interests is also a problem for further study. Finally, how to determine the optimal reward-penalty coefficients when governments implement RPPs under a limited capital budget is also worth further investigation.

(B.2)
Substituting the above results into the total social welfare maximizing problem, we can derive the optimal carbon emission reward-penalty coefficient as follows:
Substituting $t^{C*}$ into $w_n^C$, $p_n^C$, $p_r^C$, $q_n^C$, $q_r^C$, $\pi_m^C$, $\pi_r^C$, $e^C$, $C_s^C$, and $T_s^C$ gives the optimal price, demand, profit, carbon emission, consumer surplus, and social welfare.

C. Proof of Propositions in Model R
C.1. Proof of Proposition 6 in Model R. Similar to the proof of Proposition 1, given the recycling amount reward-penalty coefficient $g^R$, we have the prices and demands as follows: $w_n^R = (1 + c)/2$, $p_r^R = (a + A - g^R)/2$, $p_n^R = (A - a + c + 3 - g^R)/4$, $q_n^R = (A - a - c + 1 - g^R)/(4(1 - a))$, and $q_r^R = (a(1 - a + c) + (a - 2)(A - g^R))/(4a(1 - a))$. Hence, we have the profits of M and R, the total carbon emission, and consumer surplus as follows: (C.1)
Substituting the above results into the social welfare maximizing problem, we can derive the optimal recycling amount reward-penalty coefficient as follows:
$$g^{R*} = \frac{A(a - 4) + a(1 + 3c - a) + 4v[a + \lambda(a - 2)]}{4 - 3a}. \tag{C.2}$$
Substituting $g^{R*}$ into w R