Dynamic Nonlinear Pricing Model Based on Adaptive and Sophisticated Learning

Existing dynamic pricing models that take consumers' learning behavior into account generally assume that consumers learn by either reinforcement learning or belief-based learning. Nevertheless, abundant empirical evidence from behavioral game theory indicates that consumers' learning is better described as a process of mixed learning. In particular, for experience goods, a consumer's purchase decision is based not only on his own previous purchase behavior (adaptive learning) but also on that of other consumers (sophisticated learning). Assuming that consumers are both adaptive and sophisticated learners, we study a dynamic pricing model for repeated decision problems in a duopoly market. Specifically, we build a dynamic game model based on the sophisticated experience-weighted attraction (SEWA) learning model and analyze the existence of the equilibrium. Finally, we use numerical results to show the characteristics of, and the differences between, the steady-state solutions of models with adaptive consumers and models with sophisticated consumers.


Introduction
For experience goods, consumers can learn about their own preferences for a product only after experiencing it. Studies have shown that a learning process does exist when consumers buy experience goods and that consumers tend to be boundedly rational for goods bought repeatedly [1]. Learning models describe consumers' behavior by assuming that consumers apply simple learning rules when they make repeated decisions. Most previous research on dynamic pricing of experience goods describes consumers' learning process with an adaptive learning model, in which consumers' purchase decisions are based mainly on their own previous purchase behavior and usage experience. However, a large body of empirical evidence indicates that sophisticated consumers also exist in the market. Unlike adaptive consumers, sophisticated consumers can use other consumers' purchase information to maximize their own utility. Hence, their learning process differs from that of adaptive consumers, and the way they learn over repeated purchases affects their purchase behavior [2]. With the development of information technology and e-commerce, information is easy to obtain and is shared worldwide, which increases the number of sophisticated consumers in the market, so it is more realistic to consider consumers' sophisticated behavior when studying dynamic pricing problems. Therefore, we develop a sophisticated learning model based on adaptive learning and apply it to dynamic pricing problems.
Most previous literature on dynamic pricing of experience goods has assumed fully rational consumers [3][4][5]. But consumers' boundedly rational behavior has a significant impact on market demand and corporate profits [6]. Moreover, consumers' learning behavior deserves more attention when they buy experience goods. Several kinds of learning models describe consumers' learning process. Brandts and Holt [7] propose a Bayesian learning model when studying signaling games. Sarin and Vahid [8] develop belief learning models. Börgers and Sarin [9] propose a reinforcement learning model. Oyarzun and Sarin [10] study the reinforcement learning model in risky decisions. All these learning models assume that consumers follow a single learning process. However, a substantial body of empirical evidence suggests that learning is often a mixed process. Camerer and Ho [11] develop the experience-weighted attraction (EWA) learning model, a mixed learning model that combines the reinforcement learning model with the belief learning model. Furthermore, they show that the EWA learning model fits remarkably better than the existing learning models in several different classes of games. Ho et al. [12] simplify the EWA model further and put forward the self-tuning EWA learning model, which has the same predictive power as the EWA learning model. But the EWA model is still an adaptive learning model, which cannot describe sophisticated consumers' learning process. Camerer et al. [2] propose the sophisticated experience-weighted attraction (SEWA) model, based on the EWA learning model, to describe sophisticated consumers' learning process.
It is generally accepted that learning models are used to explain behavior in games and individual decision making, but a lot of evidence suggests that learning models can also explain actual choice behavior [13,14]. So far, many scholars have introduced learning models into their research on dynamic pricing problems. Some of them use learning models to solve monopoly pricing problems for experience goods [15]. But in general, the market for experience goods is highly competitive. Chintagunta and Rao [16] consider the reinforcement learning model in a duopoly dynamic pricing model. Based on their study, Hopkins [17] compares and analyzes the steady states of the reinforcement learning model and the belief learning model in dynamic pricing. All of these studies assume that consumers' purchase behavior is a process of adaptive learning, but Camerer et al. [2] suggest that sophisticated learning by consumers is more realistic. So it is necessary to take sophisticated consumers' learning process into consideration in dynamic pricing. In their experimental study of the dynamic pricing of luxury goods, Amaldoss and Jain [18] found that the sophisticated experience-weighted attraction learning model (SEWA) explains customers' actual purchase behavior better.
Therefore, we study a dynamic pricing model of experience goods with both adaptive and sophisticated consumers and try to illustrate more clearly the effect of sophisticated consumers on the market structure. However, because of the particular nature of experience goods, consumers cannot perceive the utility of a product before purchasing it; thus, there are limitations when the learning processes of adaptive and sophisticated consumers are described with the EWA model and the SEWA model, respectively. To address these problems, we simplify the EWA model and the SEWA model on the basis of the belief learning proposed by Sarin and Vahid [8] and the reinforcement learning proposed by Börgers and Sarin [9]. When studying dynamic oligopoly problems, in addition to determining the learning model of consumers' behavior, we also need to choose a suitable method of equilibrium analysis. There are several equilibrium concepts: open-loop equilibrium, closed-loop equilibrium, and Markov perfect equilibrium. We mainly use open-loop equilibrium because it allows qualitative analysis of the steady-state solutions and their stability and because the analysis is simple [19]. The main contributions of this paper are as follows: (1) we propose a dynamic pricing model based on adaptive learning and sophisticated learning; (2) we perform a numerical comparison of the dynamic pricing equilibrium solutions under sophisticated and adaptive learning. It can be concluded that, for nonlinear dynamic pricing problems, both models have groups of equilibrium solutions. Compared with the dynamic pricing model with adaptive consumers, however, when the model with sophisticated consumers reaches a symmetric Nash equilibrium, the difference between the two companies' market shares decreases.
The rest of this paper is organized as follows. The next section introduces the duopoly dynamic pricing model based on the adaptive learning of EWA and analyzes the characteristics of its steady states. In Section 3, we study the duopoly dynamic pricing model based on the sophisticated learning model and analyze the properties of its steady-state solutions numerically. In Section 4, we summarize and conclude.

Dynamic Pricing Model of Duopoly Based on Adaptive Learning
We consider an infinite-horizon duopoly dynamic pricing model with adaptive consumers. In a market with repeated purchases, assume that there are two firms, firm 1 and firm 2, producing two brands of an experience good (such as daily necessities). There are many consumers in the market, and each one chooses exactly one brand to purchase in every period. When deciding which brand to buy in a period, adaptive consumers update their propensity for each brand according to their previous experience. We use EWA to describe consumers' learning process. The adaptive learning model based on EWA is composed of three elements. The first is consumers' propensity for the two goods, which we denote by $A^a = (A^a_1, A^a_2)$; it is determined by the goods' prices and by consumers' evaluation of the quality or goodwill of the goods, $A^a_i = \theta^a_i - p_i$. The prices of the two goods are $p = (p_1, p_2)$ with $p_i \in [0, \infty)$, $i = 1, 2$. Consumers' evaluation of the quality or goodwill of the goods is denoted by $\theta^a = (\theta^a_1, \theta^a_2)$. We assume that consumers' propensity for a good satisfies $A^a_i \in [0, \infty)$; thus, $\theta^a_i \ge p_i$ and $\theta^a_i \in [0, \infty)$. The second is the choice rule, which gives the probability of choosing each good on the basis of the propensities. The last is the updating rule, which describes how consumers update their propensities in each period. Table 1 lists all of the symbols used in this paper.

Choice Probabilities.
Consumers' preference for good $i$ affects the probability of choosing it. The more a consumer prefers a certain good, the more likely he is to choose it. That means $P^a_i$ should increase monotonically with $A^a_i$ and decrease monotonically with $A^a_j$ (where $i \ne j$). Several probability functions meet these requirements, such as Logit, power, and probit. Previous studies show that the Logit form fits better than the others [20]. So we use the Logit function, which is commonly used in studies of brand choice under risk and uncertainty [21,22]. Based on the Logit rule, the probability of a consumer choosing good 1 from the first firm is
$$P^a_1(\theta^a, p) = \frac{\exp(\lambda A^a_1)}{\exp(\lambda A^a_1) + \exp(\lambda A^a_2)} = \frac{1}{1 + \exp\left(-\lambda(\theta^a - p)\right)}. \quad (1)$$
The probability that a consumer chooses good 2 is $P^a_2 = 1 - P^a_1$. Here $\lambda$ is the consumers' degree of optimization, and the probability that a consumer chooses the optimal product increases with $\lambda$. If $\lambda = 0$, the consumer chooses each good with equal probability, and if $\lambda = \infty$, he chooses the optimal product with probability 1. $\theta^a = \theta^a_1 - \theta^a_2$ is the relative quality or goodwill of the two goods for adaptive consumers; $p = p_1 - p_2$ is the relative price of the two goods. According to (1), the consumers' choice probability $P^a_i$ ($i = 1, 2$) is nonlinear, and the degree of its nonlinearity depends on the optimization parameter $\lambda$. For example, if $\lambda$ is very small, $P^a_1(\theta^a, p) \approx 1/2 + \lambda(\theta^a - p)/4$ approximates a linear function.
We treat the consumers in the market as a whole: the number of consumers is normalized to $N = 1$, so the total demand for good $i$ is $N \cdot P^a_i = P^a_i$. Thus, the probability that a consumer chooses good $i$ can be regarded as the market demand for good $i$.
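The behavior of the Logit rule and of its small-parameter linearization can be sketched numerically. The snippet below is a minimal illustration, assuming the logistic form $1/(1 + \exp(-\lambda(\theta - p)))$ in the relative quality $\theta$ and relative price $p$; the function names `logit_share` and `linear_share` and the sample inputs are hypothetical and chosen only for this sketch.

```python
import math


def logit_share(theta, p, lam):
    """Probability of choosing good 1 under the assumed Logit rule.

    theta: relative quality estimate (good 1 minus good 2)
    p:     relative price (good 1 minus good 2)
    lam:   optimization parameter (lam = 0 gives equal probabilities)
    """
    return 1.0 / (1.0 + math.exp(-lam * (theta - p)))


def linear_share(theta, p, lam):
    """First-order approximation of logit_share around theta - p = 0."""
    return 0.5 + lam * (theta - p) / 4.0


if __name__ == "__main__":
    # Illustrative inputs only: the approximation degrades as lam grows.
    for lam in (0.1, 1.0, 5.0):
        exact = logit_share(0.3, 0.1, lam)
        approx = linear_share(0.3, 0.1, lam)
        print(f"lam={lam}: exact={exact:.4f}, linear={approx:.4f}")
```

For small values of the optimization parameter the two functions nearly coincide, while for large values the exact share saturates toward 0 or 1 and the linear form does not.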

Updating Rules.
Prices for experience goods are usually clearly marked on the shelves. Thus, what the consumer has to learn is neither the prices nor their distribution but the quality or goodwill of the experience goods. Consumers receive a payoff from a product only after experiencing it. For this case of asymmetric information, Erev and Roth [13] propose a reinforcement learning model with updating rule (2), where $\theta^a_i(t)$ and $\theta^a_j(t)$ are the consumers' estimates of the quality or goodwill of good $i$ and good $j$ at time $t$, respectively, and $\tilde{\pi}_i(t)$ is the utility acquired by consuming good $i$ at time $t$. A consumer's choice behavior is random under the given Logit choice rule. In this context we assume that the consumer's experience is also random, so $\tilde{\pi}_i(t)$ is a random variable. We assume that $q_i$ is the average value of $\tilde{\pi}_i(t)$. $\phi$ is the "recency" parameter, $0 < \phi \le 1$. $\phi = 1$ means that only the last period is remembered. If $\phi$ approaches zero, previous experience has a great effect on the present belief. In the reinforcement learning model, good $i$ is the good purchased in the previous period and good $j$ is the good not purchased in the previous period.
Sarin and Vahid [8] propose a "belief-based" learning model with updating rule (3), where $\theta^a_i(t)$, $\theta^a_j(t)$, $\tilde{\pi}_i(t)$, $\phi$, $i$, and $j$ are the same as in (2). However, considerable empirical evidence suggests that learning is often a mixed process. Therefore, in order to describe the consumer's learning process accurately, we combine rules (2) and (3) in a way similar to Camerer and Ho [11] and obtain an adaptive learning model based on EWA. Its updating rule is (4), where $\phi_1$ is the "recency" parameter of the preference for purchased goods and $\phi_2$ is the "recency" parameter of the preference for goods not purchased, $0 < \phi_1, \phi_2 \le 1$. $i$ and $j$ are the same as in (2).
Assumption 1. The "recency" parameters $\phi_1$, $\phi_2$ can be seen as memory coefficients, and consumers retain a stronger impression of the purchased good than of the nonpurchased good. Therefore, we assume that $\phi_1 \ge \phi_2$ and $0 \le \delta = \phi_2/\phi_1 \le 1$. $\delta = 0$ means that the EWA model reduces to the belief learning model, and $\delta = 1$ means that the EWA model reduces to the reinforcement learning model.
The following Proposition 2 shows the continuous time updating rule of adaptive consumers.

Proposition 2. The continuous time updating rule of adaptive consumers is θ
(5)
Proof. See the Appendix.

Equilibrium Analysis of Dynamic Pricing Model.
We suppose that the two firms produce at a constant marginal cost at each time $t$ and that the marginal cost for both brands is normalized to zero. Then the maximization of firm $i$'s long-term profit, stated in (6), is subject to the consumers' updating rule. For a myopic firm, the short-term profit is $p_i(t) P^a_i(t)$, which is the value function. In order to analyze the existence of a Nash equilibrium of our model, we further make the following general assumption on the value function.
Proof. Taking the first derivative of $p_i P^a_i$ with respect to $p_i$ yields
$$\frac{\partial (p_i P^a_i)}{\partial p_i} = P^a_i \left[1 - \lambda (1 - P^a_i) p_i\right].$$
Let $h_i = 1 - \lambda (1 - P^a_i) p_i$ and consider the following two cases.
(2) There exists $\hat{p}_i$ such that $h_i(\hat{p}_i) = 0$. The second derivative of $p_i P^a_i$ with respect to $p_i$ evaluated at $\hat{p}_i$ is given by (9), where the third equality follows from $h_i(\hat{p}_i) = 0$. The preceding shows that $\hat{p}_i$ is a local maximum of $p_i P^a_i$ and that there does not exist an interior minimum for $p_i \in [0, \infty)$. It then follows that $\hat{p}_i$ is unique, because otherwise there would have to be an interior minimum of $p_i P^a_i$. Consequently, the function $p_i P^a_i$ increases for $p_i \in [0, \hat{p}_i)$ and decreases for $p_i \in (\hat{p}_i, \infty)$, and it is therefore strongly quasiconcave.
Lemma 5. There exists at least one Nash equilibrium in the dynamic pricing model of duopoly based on adaptive learning (see (6)).
Proof. From Theorem 2.1 in Vives [23], if the strategy sets are nonempty, convex, and compact, and firm $i$'s instantaneous profit $p_i P^a_i$ is continuous in the prices of all firms and quasiconcave in its own price, then a Nash equilibrium exists. In our model, although each firm chooses a price from $[0, \infty)$, which is not compact, firm $i$'s instantaneous profit $p_i P^a_i$ is uniformly bounded, which allows us to construct an equivalent model by restricting firm $i$ to choose a price from a nonempty, convex, and compact set. In addition, firm $i$'s instantaneous profit $p_i P^a_i$ is continuous and quasiconcave in $p_i$ ($i = 1, 2$). So there exists a Nash equilibrium in the dynamic pricing model of duopoly based on adaptive learning.
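The quasiconcavity used in this argument can be checked numerically. The following is a minimal sketch, assuming the Logit share in the form $1/(1 + \exp(-\lambda[(\theta_i - p_i) - (\theta_j - p_j)]))$; the helper names (`share`, `profit`, `turning_points`, `scan`), the grid, and the parameter values are illustrative assumptions rather than quantities taken from the paper. It scans a firm's instantaneous profit over a price grid and counts how often the profit switches between rising and falling; a strongly quasiconcave function with an interior maximum switches exactly once.

```python
import math


def share(p_own, p_other, theta_own, theta_other, lam):
    """Assumed Logit probability that a consumer picks the firm charging p_own."""
    gap = lam * ((theta_own - p_own) - (theta_other - p_other))
    return 1.0 / (1.0 + math.exp(-gap))


def profit(p_own, p_other, theta_own, theta_other, lam):
    """Instantaneous profit p_i * P_i with marginal cost normalized to zero."""
    return p_own * share(p_own, p_other, theta_own, theta_other, lam)


def turning_points(values):
    """Count sign changes in the successive differences of a sequence."""
    signs = [b > a for a, b in zip(values, values[1:]) if b != a]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def scan(p_other=1.0, theta_own=0.0, theta_other=0.0, lam=2.0,
         step=0.01, p_max=10.0):
    """Return (number of turning points, grid price with the highest profit)."""
    grid = [k * step for k in range(int(round(p_max / step)) + 1)]
    profits = [profit(p, p_other, theta_own, theta_other, lam) for p in grid]
    best = grid[max(range(len(grid)), key=profits.__getitem__)]
    return turning_points(profits), best


if __name__ == "__main__":
    n_turns, best_price = scan()
    print("turning points:", n_turns, "grid maximizer:", best_price)
```

With the default illustrative inputs (equal quality estimates, rival price 1, optimization parameter 2), the grid maximizer sits where $1 - \lambda(1 - P_i)p_i$ vanishes, which is the first-order condition of the myopic problem.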
By analyzing (6) and (5), we obtain the following steady states of the Nash equilibrium.
Proposition 6. The steady states of the open-loop Nash equilibrium satisfy the following (10) and (11):
Proof. See the Appendix.
According to (10) and (11), the consumer's choice probability $P^a_i$ is nonlinear, so the steady-state equations are also nonlinear, and there may be multiple steady-state solutions. In addition, it can be seen from (10) and (11) that if  is close to zero, there is only one steady-state solution.
The following proposition shows the relationship between the myopic optimal price and the optimal steady-state price. We denote the firms' myopic optimal price by $\hat{p}_i(\theta^a)$; it solves $\hat{p}_i(\theta^a) = 1/(\lambda(1 - P^a_i))$. (It has been established by Caplin and Nalebuff [24] that the Nash equilibrium in an oligopoly with Logit demand functions exists and is unique.)
Proposition 7. If $(\theta^a, p^*_1, p^*_2)$ is the solution of (10) and (11), then along any path of the optimal price each firm's optimal price satisfies $p^*_i(\theta^a) \le \hat{p}_i(\theta^a)$, $i = 1, 2$ (equality holds when $\phi_2 = 0$).
Proof. See the Appendix.
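The myopic price condition can be illustrated with a short computation. The sketch below assumes the Logit share in relative terms and the first-order condition $1 - \lambda(1 - P_i)p_i = 0$ for each firm; it finds each firm's best response by bisection and iterates best responses until the prices settle. The function names, the bracketing strategy, and the iteration count are assumptions made for this illustration, not the authors' procedure.

```python
import math


def share1(theta, p1, p2, lam):
    """Assumed Logit share of firm 1; theta is quality of good 1 minus good 2."""
    return 1.0 / (1.0 + math.exp(-lam * (theta - (p1 - p2))))


def best_response(theta_own, p_other, lam, tol=1e-12):
    """Solve 1 - lam * (1 - P) * p = 0 for the own price p by bisection.

    theta_own is the relative quality seen from the responding firm's side.
    """
    def h(p):
        return 1.0 - lam * (1.0 - share1(theta_own, p, p_other, lam)) * p

    lo, hi = 0.0, 1.0 / lam
    while h(hi) > 0.0:  # expand the bracket until h changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def myopic_equilibrium(theta, lam, iters=200):
    """Iterate best responses; firm 2 sees relative quality -theta."""
    p1 = p2 = 1.0 / lam
    for _ in range(iters):
        p1 = best_response(theta, p2, lam)
        p2 = best_response(-theta, p1, lam)
    return p1, p2


if __name__ == "__main__":
    print(myopic_equilibrium(0.0, 2.0))
    print(myopic_equilibrium(0.5, 2.0))
```

When the two goods are valued equally, each firm's share is one half, so the condition reduces to a common price of $2/\lambda$; for unequal valuations the firm with the higher estimated quality charges the higher myopic price.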

Numerical Results.
The steady-state prices satisfy (10) and (11). Because $P^a_i$ is a nonlinear function of $\theta^a$ and $p$, we rewrite (10) and (11) as nonlinear equations in $\theta^a$ and $p$ and obtain the steady-state solutions by solving these equations. In fact, (10) and (11) have several different sets of steady-state solutions. For instance, assume that $q_1 = q_2 = q$; that is, the two goods are identical. If $q = 2$,  = 1, $\lambda = 2$, and $\delta = 0.5$, there are three groups of steady-state solutions for $(\theta^a, p_1, p_2)$, namely $(-0.6866, 0.2398, 0.5338)$, $(0, 0, 0)$, and $(0.6866, 0.5338, 0.2398)$. As shown in Figure 1, the curve of (10) and the curve of (11) have three intersections. The corresponding probabilities are $P^a_1 = 0.31$, $P^a_1 = 0.5$, and $P^a_1 = 0.69$, respectively. If $q = 2$,  = 1, $\lambda = 2$, and $\delta = 1$ ($\delta = 1$ means that the adaptive learning model based on EWA reduces to the reinforcement learning model), there are also three groups of steady-state solutions for $(\theta^a, p_1, p_2)$, namely $(-1.103, 0.196, 0.679)$, $(0, 0, 0)$, and $(1.103, 0.679, 0.196)$. As shown in Figure 2, the curve of (10) and the curve of (11) have three intersections. The corresponding probabilities are $P^a_1 = 0.224$, $P^a_1 = 0.5$, and $P^a_1 = 0.776$, respectively. If $q = 2$,  = 1, $\lambda = 2$, and $\delta = 0$ ($\delta = 0$ means that the adaptive learning model based on EWA reduces to the belief learning model), we can see directly that (10) and (11) have only one steady-state solution, $(\theta^a, p_1, p_2) = (0, 0, 0)$, and the corresponding probability $P^a_1$ is 0.5.
Based on Figures 1 and 2, if $0 < \delta \le 1$ and $\lambda$ is large enough, there are always three groups of steady-state solutions, and the greater $\delta$ is, the more divergent the three steady-state solutions are. If $\theta^a(0)$ is a small positive number, then as long as firm 1 charges a price lower than the myopic optimal price, the third equilibrium solution emerges, with $P^a_1 = 0.69$ and $P^a_1 = 0.776$, respectively. In other words, by selecting a lower initial price, the firm can attract some naive consumers and gain a greater market share. Moreover, comparing Figure 1 with Figure 2, the three solutions in Figure 2 are more divergent, which indicates that the two firms' market shares differ more in the two asymmetric steady states. In order to illustrate the relationship between the number of steady-state solutions and $\delta$, we obtain Figure 3 with $q = 2$,  = 1, and $\lambda = 2$. We can therefore conclude that, compared with the adaptive learning model based on EWA, the reinforcement learning model makes consumers more likely to choose the goods they are already familiar with. As a result, consumers can gradually become trapped in products of inferior quality.
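The reported probabilities can be cross-checked against the reported steady-state triples. The snippet below is a consistency check under two assumptions that are not confirmed by the text itself: that each triple lists the relative quality estimate followed by the two prices, and that the choice rule is the logistic form $1/(1 + \exp(-\lambda(\theta - (p_1 - p_2))))$ with the optimization parameter equal to 2. The names `LAMBDA`, `share1`, and `REPORTED` are hypothetical.

```python
import math

LAMBDA = 2.0  # assumed value of the optimization parameter


def share1(theta, p1, p2, lam=LAMBDA):
    """Assumed Logit probability of choosing good 1."""
    return 1.0 / (1.0 + math.exp(-lam * (theta - (p1 - p2))))


# (steady-state triple, probability of choosing good 1 as reported in the text)
REPORTED = [
    ((-0.6866, 0.2398, 0.5338), 0.31),
    ((0.0, 0.0, 0.0), 0.5),
    ((0.6866, 0.5338, 0.2398), 0.69),
    ((-1.103, 0.196, 0.679), 0.224),
    ((1.103, 0.679, 0.196), 0.776),
]

if __name__ == "__main__":
    for (theta, p1, p2), reported in REPORTED:
        computed = share1(theta, p1, p2)
        print(f"triple=({theta}, {p1}, {p2}): "
              f"computed={computed:.4f}, reported={reported}")
```

Under these assumptions the computed shares agree with the reported values to the precision given in the text, which supports reading the triples in this way but does not establish it.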
What happens if one firm holds a quality advantage? We assume that $q_1 \ne q_2$ with $q_2 = 2$,  = 1, $\lambda = 2$, and $\delta = 0.5$; the resulting relationship between $q_1$ and $\theta^a$ is illustrated in Figure 4. From Figure 4 we can see that there is one steady state when $q_1 < 1.946$ or $q_1 > 2.1005$, and there are three steady states when $1.946 \le q_1 \le 2.1005$. This illustrates that dominance by a low-quality firm remains possible if the initial value of goodwill $\theta^a(0)$ is sufficiently close to the appropriate steady state.

Dynamic Pricing Model of Duopoly Based on Sophisticated Learning
With the development of information technology and e-commerce, access to information is getting easier. Some consumers may update their beliefs about the quality or goodwill of experience goods not only through their own experience but also by taking advantage of other consumers' purchase information. We call these consumers sophisticated consumers. The way they learn over repeated purchases affects their purchase behavior, and consumers' behavior in turn affects the market structure. Thus, taking sophisticated consumers into account when studying dynamic pricing problems is more realistic. As before, there are two firms, 1 and 2, producing two brands of an experience good. At each point in time the consumer seeks to buy one unit of the good, either from firm 1 or from firm 2. Here we assume that there are two types of consumers in the market: adaptive consumers, whose proportion is $1 - \alpha$, and sophisticated consumers, whose proportion is $\alpha$.
Like the adaptive learning model based on EWA, the sophisticated learning model is composed of three elements. The first is consumers' propensity for the two goods, which we denote by $A^s = (A^s_1, A^s_2)$. The prices of the two goods are $p = (p_1, p_2)$, and consumers' evaluation of the quality or goodwill of the goods is denoted by $\theta^s = (\theta^s_1, \theta^s_2)$. The second is the choice probabilities. The last is the updating rule. The significant difference between the sophisticated learning model and the adaptive learning model lies in the updating rule. We describe this difference in detail in Section 3.2.

Choice Probabilities.
As for the adaptive consumers, the sophisticated consumers' choice probability is given by
$$P^s_1(\theta^s, p) = \frac{1}{1 + \exp\left(-\lambda(\theta^s - p)\right)} \quad (12)$$
and $P^s_2 = 1 - P^s_1$, where $\theta^s = (\theta^s_1, \theta^s_2)$ is the sophisticated consumers' estimate of the quality or goodwill of the goods and $\theta^s = \theta^s_1 - \theta^s_2$ is the relative quality or goodwill of the two goods for sophisticated consumers. According to (12), the consumers' choice probability $P^s_i$ ($i = 1, 2$) is nonlinear. The nonlinearity arises from the nonlinearity of the demand function $P^s_1(\theta^s, p)$, and its degree depends on the optimization parameter $\lambda$. For example, if $\lambda$ is very small, $P^s_1(\theta^s, p) \approx 1/2 + \lambda(\theta^s - p)/4$ approximates a linear function.
We treat the consumers in the market as a whole ($N = 1$), as in Section 2.1. So the total demand of the market is

Updating Rules.
Sophisticated consumers are easily influenced by other consumers when they make purchase decisions. We therefore assume that, for a consumer, the more a good is bought, the better its quality is judged to be; conversely, the less it is purchased, the poorer its quality is judged to be. Accordingly, we assume that the credit evaluation of sophisticated consumers after purchasing good 1 is where   1 represents sophisticated consumers' strategy for purchasing good 1,    denotes other consumers' purchase strategy,  is the number of customers buying good 1, and  is the number who buy good 2. So the overall number of customers in the market is  =  + . Treating the customers in the market as a whole, we get  = 1.
According to the SEWA model proposed by Camerer et al. [2], we have where Similarly, when consumers do not buy good 1, Then we obtain the updating rule of sophisticated consumers as follows.

Proposition 8. The continuous updating rule of sophisticated consumers is
The proof of Proposition 8 is similar to the proof of Proposition 2. If $\alpha = 0$, there are only adaptive consumers in the market, namely, the experience inspired model. If $\alpha = 1$, there are only sophisticated consumers in the market, namely, the AQRE model.  = / 1, and it is apparent that sophisticated consumers become adaptive consumers if  = 0; the larger  is, the greater the effect of other consumers on sophisticated consumers' purchase decisions.

Equilibrium Analysis of Dynamic Pricing Model.
We again suppose that the two firms produce at a constant marginal cost and that the marginal cost for both brands is normalized to zero. Based on the assumptions in Sections 3.1 and 3.2, the maximization of the firm's long-term profit is subject to the updating rule of sophisticated consumers. By analyzing (17) and (18), we obtain the following steady states of the Nash equilibrium.

Proposition 9. The steady states of the open-loop Nash equilibrium satisfy (19).
Proof. See the Appendix.
Equations (19) are the conditions that steady-state solutions must satisfy. According to these equations, consumers' choice probability $P^s_i$ is nonlinear, so the steady-state equations are also nonlinear, and there may be multiple steady-state solutions.
Compared with the steady states of the dynamic pricing model in the last section, which considers only adaptive consumers, the difference between the two firms' market shares is smaller in the steady states obtained in this section. When there are sophisticated consumers in the market, their degree of cognition is mainly reflected in the value of  in the model. So we next analyze the relationship between the steady-state solution and the value of . In the steady-state solutions, the relationship among   1 ,   1 , and  is shown in Figures 5 and 6.
As we can see from Figures 5 and 6, the steady states gradually converge as  increases. That is to say, when consumers perceive products of the same quality differently, the disparity in market shares is smaller in the steady state under sophisticated learning than under adaptive learning. Compared with a market in which there are only adaptive consumers, when there are also sophisticated consumers and $\theta^a(0)$, $\theta^s(0)$ are small but positive, firm 1 should choose a lower price, which places it in the third stable state, but the market share of firm 1 is smaller. This indicates that when there are sophisticated consumers in the market, a firm that simply raises consumers' initial evaluation of its product earns less than it would if there were only adaptive consumers in the market. Therefore, when faced with sophisticated consumers, firms should not only raise consumers' initial evaluation of the product but also improve product quality in order to increase market share.

Conclusion
This paper considers dynamic pricing problems for experience goods in a duopoly market. First, we presented the adaptive learning model based on the EWA model and applied it to dynamic pricing problems. With the concept of open-loop equilibrium, we obtained groups of steady-state solutions using nonlinear dynamic programming. In the analysis of the steady-state solutions, we find that there is a dominant company in the symmetrical steady-state solutions. Second, we put forward a sophisticated learning model on the basis of previous studies and applied it to dynamic pricing as well.
The dynamic pricing model with sophisticated learning also has multiple steady-state solutions, and there must be one dominant firm when a symmetric steady-state solution exists, which is similar to the adaptive learning dynamic pricing model based on EWA. But in the dynamic pricing model based on sophisticated learning, the difference in market share between the two firms decreases when the steady-state solution is symmetric. This shows that, compared with adaptive consumers, sophisticated consumers are less easily locked into the habit of purchasing inferior goods. Therefore, when there are sophisticated consumers in the market, firms should not only improve consumers' initial evaluation but also devote themselves to improving product quality in order to occupy the dominant position in the market. When most consumers in the market are adaptive, firms can successfully capture some naive consumers merely by increasing consumers' initial evaluation of their products.
This paper studies dynamic pricing problems for experience goods in a duopoly market. However, there are many firms producing experience goods in the market. Hence, future research can consider the case of many firms. Meanwhile, we assume that consumers are homogeneous and do not consider heterogeneous consumers. Furthermore, in reality, decision makers always face imprecise and vague operational conditions [25]. Uncertainties have been tackled in many ways, and fuzzy set theory has a long history of handling imprecise values [26].

Appendix
Proof of Proposition 2. Because the evolution of consumers' actual preferences is random, firms need to use a stochastic dynamic optimization method to predict consumer preferences. There are many stochastic dynamic optimization methods [27], and in this paper we assume that firms use stochastic approximation theory [28]. First, we calculate the expected change of $\theta^a_i$: Stochastic approximation, which has been widely used in the recent literature on learning, shows that if $\phi_1$ is small, the solution of the original stochastic difference equation is closely approximated by the solution of the following parallel continuous time system, the differential equation (5) [29].
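The idea behind the stochastic approximation step can be illustrated with a toy recursion. The sketch below does not use the updating rule of Proposition 2; it assumes a simple exponential-smoothing update $x_{n+1} = x_n + \phi(\tilde{\pi}_n - x_n)$ with noisy payoffs of mean $q$, whose mean dynamics in rescaled time $\tau = \phi n$ follow $\dot{x} = q - x$. The function names, the uniform noise, and the parameter values are assumptions chosen for illustration.

```python
import math
import random


def simulate(q, phi, steps, seed=0):
    """Run the stochastic difference equation from x = 0 with a small step phi."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        payoff = q + rng.uniform(-1.0, 1.0)  # noisy payoff with mean q
        x += phi * (payoff - x)
    return x


def ode_solution(q, tau):
    """Solution of dx/dtau = q - x with x(0) = 0."""
    return q * (1.0 - math.exp(-tau))


if __name__ == "__main__":
    q, phi = 2.0, 0.01
    for steps in (100, 300, 1000):
        tau = phi * steps
        print(f"tau={tau:.1f}: simulated={simulate(q, phi, steps):.3f}, "
              f"ode={ode_solution(q, tau):.3f}")
```

With a small step size the simulated path stays close to the deterministic trajectory, which is the sense in which a continuous time system can stand in for the stochastic difference equation.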

Assumption 3. The following assumption holds for all $i$, $j$.

(A.8)
Because the steady-state meets the following condition: θ