Microstructure Models with Short-Term Inertia and Stochastic Volatility

Partially observed microstructure models, containing stochastic volatility, dynamic trading noise, and short-term inertia, are introduced to address the following questions: (1) Do the observed prices exhibit statistically significant inertia? (2) Is stochastic volatility (SV) still evident in the presence of dynamical trading noise? (3) If stochastic volatility and trading noise are present, which SV model matches the observed price data best? Bayes factor methods are used to answer these questions with real data, which allows us to consider volatility models with very different structures. Nonlinear filtering techniques are utilized to compute the Bayes factor on tick-by-tick data and to estimate the unknown parameters. It is shown that our price data sets all exhibit strong evidence of both inertia and Heston-type stochastic volatility.


Introduction
Financial analysts list speculation, finiteness of assets, interest rates, tick size, price inertia, price clustering, belief heterogeneity, asymmetric information, greed and fear, and so forth as causes for price fluctuations over time. Yet, popular models like geometric Brownian motion (GBM) (e.g., Black and Scholes [1], Merton [2]) or the Cox-Ross-Rubinstein model [3] try to handle all these factors in an overly simple framework, resulting in unnatural phenomena like the volatility smile. Consequently, stochastic volatility, which has been observed in real prices, is often added to the price value evolution (e.g., Heston [4], Jackwerth and Rubinstein [5], Hull and White [6], and Nelson [7]) to avoid the volatility smile. However, which stochastic volatility model fits the market data best?
Nowadays, many authors discuss the misspecification of stochastic price-volatility models (including the Heston model, which we show favorably herein) so often that one is led to wonder whether there are missing ingredients to these very simple models. Even combined stochastic value-volatility models do not address tick size, price inertia, price clustering, hidden liquidity, and fear-greed cycles that traders, especially high frequency traders, must deal with. To handle these issues, one is drawn to tick-by-tick microstructure models and left with the perplexing question: How should one model price inertia in continuous time? We use the term price inertia instead of the related term price momentum because we are not weighting transaction prices by volume. Fractional Brownian motion (FBM), best known for its long memory properties, exhibits inertia and has been used to model markets (Mandelbrot [8], Shiryaev [9]) even though these models allow arbitrage strategies. We speculate that FBM's success in modeling observed data is more attributable to inertia than long memory. However, we introduce an alternative inertia process and show that this new process better satisfies the desired properties of inertia than FBM. We then show strong statistical evidence of price inertia that lasts for hours or days using Bayes estimates and Bayes factor on real price data. We do not consider the possibility of arbitrage nor determine derivative prices for our models but rather leave these interesting mathematical finance questions to the experts. (See Capinski and Zastawniak [10] for an excellent introduction to these types of questions and to mathematical finance in general.) Also, we leave the difficult task of obtaining theoretical error bounds for our particle filter methods to other works. (See, e.g., Kouritzin and Zeng [11] and Del Moral et al. [12] for related work on approximate filters.) Our focus is solely on modeling observed stock price data and the methodology of determining which of a class of models best fits the observed data.
High frequency data contains complete market participant trading activities (Engle [13]) and is modeled using microstructure (Black [14], Chan and Lakonishok [15], Hasbrouck [16,17], Engle and Russell [18], Engle [13], and Bandi and Russell [19]). Unlike the macrostructure market, the trading noise in the microstructure market is not negligible; thus, the intrinsic asset value is not readily discernible. In this paper, we introduce a class of dynamic microstructure models, where the transaction price is formulated as a distorted and color-noise corrupted variant of the intrinsic asset value with the intrinsic asset value being a traditional stochastic value-volatility process. Indeed, we view the transaction price data as random counting-measure observations of intrinsic value corrupted by microstructure trading noise with such things as inertia and fear-greed cycles built in. However, trading noise sources themselves introduce volatility to transaction prices. This raises the question, "Do we need to model stochastic volatility explicitly in the presence of dynamic microstructure trading noise?" We will give strong evidence of the presence of stochastic volatility through Bayes factor methods and stochastic filtering theory. Moreover, we also utilize model selection to provide strong evidence of Heston-type volatility over competing stochastic volatility models based on the observed transaction data in a microstructure market. This suggests that the common viewpoint of the Heston model being highly misspecified might be better stated as follows: overly simplistic macrostructure-only models are underspecified. Bayes factor (see, e.g., Kass and Raftery [20]) is our preferred model selection method since it provides statistical comparisons in real time as to which model best fits the market data while allowing the stochastic value-volatility (signal) models to be singular to one another. Indeed, to use the Bayes factor method, we need only to be able to transform all microstructure asset-price observation models of interest into the same canonical process via Girsanov-type measure change.
Previously, Zeng [21] studied a filtering equation for inferring the intrinsic value process in a microstructure model while Xiong and Zeng [22] proposed a branching particle approximation to this equation. Kouritzin and Zeng [23] derived a Bayes factor equation and discussed the Bayesian model selection problem to determine whether financial data, such as stock prices, display jump-type stochastic volatility. However, all these works are based on a restricted microstructure model and thus cannot be applied to our general setting. Moreover, our problems of showing statistical evidence of inertia and determining which of the classical stochastic volatility models best represents real data in the presence of microstructure noise were not considered. We also propose a new inertia process, explain its role in modeling prices, and show its statistical significance with real tick-by-tick data.
Section 2 is devoted to explaining our model. First, our five standard value-volatility models (GBM, Hull-White, Log Ornstein-Uhlenbeck, continuous GARCH, and Simplified Heston) are given, followed by our microstructure inertia process and its properties and then the other components of our dynamic microstructure model. Together the value-volatility and microstructure components form our price evolution model, which, at the end of Section 2, is interpreted as a filtering model. In Section 3, we discuss model calibration and fair price/value estimation through Bayesian filter estimation. A filtering equation and a branching particle filter approximation algorithm are first given and explained. Then, their use to identify parameters and come up with initial state estimates is discussed. Finally, numeric parameter and initial state estimates for each model are given. As a byproduct, it is demonstrated that proper modeling and estimation of fair price (as is done herein) can provide information about overbought conditions and help avoid financial loss (see Figure 4). Section 4 is dedicated to Bayesian model selection. We first motivate the use of Bayes factor for model selection and explain how to estimate Bayes factor from unnormalized particle filters. Then, we establish strong statistical evidence of inertia and Heston-type volatility in all our price data through model selection, using the Bayes factor method to test which fair price-volatility model and what amount of inertia best fit the observed price data.

The Partially Observed Market Model
In this section, we build our stochastic model that has macrostructure and microstructure components and interpret this model in terms of a signal that needs to be estimated in real time and observations which are used to form the signal estimates. The macrostructure model consists of fair price, volatility, and related parameters and will be denoted by (X, θ) in the sequel, with X consisting of price and volatility and θ being the parameters for this model. Unlike macrostructure models, we do not assume access to (X, θ), but rather we take it to be part of the signal to be estimated. Indeed, a model would be judged to be better if the macrostructure price (which represents a "fair" price) is quite different than the observed price and we can use filtering to determine overbought and oversold situations.
The microstructure price construction converts the macrostructure model into the observed price. Such things as inertia (or momentum), fear-greed cycles, and whole-price clustering (or rounding), which are not part of the fair price, are incorporated into the microstructure model. A distinguishing feature of our microstructure is its dynamic state: to allow the microstructure to influence price over a period of time, so that the observed microstructure price can differ from fair price significantly, one needs to add and then estimate a microstructure state. In particular, the inertia process, characterized by a parameter ℎ, is introduced to capture price inertia that might be caused by hidden liquidity, various reaction and access times to information, as well as momentum traders themselves. This inertia process is not Markov, so we will have to consider the historical version Ẑℎ of this state. Further, Ẑℎ is also unobservable and hence must be added to the signal along with the microstructure parameters, and all must be estimated as nuisance parameters.
The nondynamic part of the microstructure noise consists of rounding and clustering noise. It is widely observed in markets that more trades occur at more even prices like whole nickel or whole dollar levels. Therefore, to match observed prices well, we should have a mechanism to convert evenly distributed raw prices into whole-price-biased observed prices. This is done by binning raw prices into five sets depending on how even they are and then randomly moving raw prices in the less even bins to nearby prices in the more even bins in order to match the observed prices.
The observations then become the marked counting process of the number of trades that occur at the various prices. We will later use these observations to select and calibrate models and to estimate the augmented signal, which consists of the macrostructure state and parameters (X, θ), the microstructure parameters, and the historical inertia state Ẑℎ. The whole point of the microstructure is to allow the macrostructure price to distinguish itself from the observations and rather to represent fair value. We then use filtering on asset prices to estimate implied value (hereafter called fair price) and thereby judge whether an asset is overbought or oversold.
For any stochastic process , its natural filtration, defined as F   ≐ {  : 0 ≤  ≤ }, represents the information in  up to time .N 0 denotes the set of nonnegative integers and, for any Polish space , () is the set of all bounded measurable R-valued functions on .

Common Macrostructure State Models.
We use a macrostructure model  = (, ) for the unobservable fair price together with its volatility and parameters.Here,  ∈ R   is the macrostructure financial state (fair price plus volatility) with macrostructure parameter  ∈ R   for some   ,   ∈ N 0 .We let  be a probability distribution on R   +  , take A to be a generator with domain D(A) ⊂ (R   +  ), and assume (, ) satisfies the martingale problem.Definition 1. (, ) is the unique solution of the R   + valued martingale problem for A with initial distribution .That is, is {F ,  }-martingale for each  ∈ D(A).Moreover, if ( X, θ) also satisfies (i) and (ii), then (, ) and ( X, θ) have the same finite dimensional distributions.
Remark 2. While θ does not vary in time, we include it in our macrostructure model to be estimated because it is still unknown. Nevertheless, the operator A does not act on the variable θ since dθ/dt = 0 for our fixed parameters.
The martingale problem formulation (2) (see Stroock and Varadhan [24], Ethier and Kurtz [25] for more details) is general enough to cover most interesting financial models. In this paper, the macrostructure state X consists of two components: the fair price and the stochastic volatility (if any). The most common example in finance is the "geometric Brownian motion" (GBM) utilized in the classical Black-Scholes option pricing formula. Throughout this section, the two driving noise processes are independent standard Brownian motions, and generators act on functions of a generic point (price, volatility, parameter) of the joint state-parameter space.
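As a concrete illustration of the GBM case named above, the following minimal sketch simulates a GBM path with the exact log-normal step. It assumes the textbook form dS = mu S dt + sigma S dB; the names `simulate_gbm`, `s0`, `mu`, `sigma`, `horizon`, and `n_steps` are hypothetical and are not notation from this paper.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, horizon, n_steps, seed=None):
    """Simulate one GBM path on [0, horizon] with n_steps equal steps.

    Assumes the textbook SDE dS = mu*S dt + sigma*S dB (illustrative
    notation only). Uses the exact solution over each step, so there is
    no discretization bias at the grid points.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # standard normal increment
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    path = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, horizon=1.0,
                        n_steps=252, seed=42)
    print(len(path), min(path) > 0.0)
```

Because the volatility is constant here, any volatility variation seen in data must come from elsewhere, which is why this model can serve as the no-stochastic-volatility baseline in comparisons.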
Example 3 (GBM model; see Black and Scholes [1], Merton [2]).We have that with parameters  = (, ), corresponds to our martingale problem with the generator In GBM model, the volatility  is a constant.To account for the "volatility smile" commonly observed in market option prices (see Jackwerth and Rubinstein [5] for a detailed survey), the GBM model is generalized to stochastic volatility (SV) models, where  itself is replaced by a stochastic process { 1/2  ,  ≥ 0}.Some of the popular SV models include the following.

Mathematical Problems in Engineering
Example 6 (continuous GARCH model; see Nelson [7]). We have that with parameters  = (, ], , ) and generator
Example 7 (simplified Heston model; see Heston [4]). We have that with parameters  = (, ], , ) and generator
We label this example as simplified because we do not allow the two driving Brownian motions to be correlated as Heston did. There is no mathematical issue in including this correlation, but it would add a parameter to the model, which increases computation time. The Heston model already performed the best without this parameter. GBM (with microstructure) plays a special role in our study as it is our no-stochastic-volatility model. We will compare our other models against it on real data to determine whether stochastic volatility is present. In summary, refer to Table 1.
Remark 8. The continuous GARCH model is the continuous-time limit of many classical GARCH-type discrete-time processes (Nelson [7], Drost and Werker [27]). We did not consider jumping stochastic volatility models (e.g., Elliott et al. [28], Kouritzin and Zeng [23], Duffie et al. [29], Eraker et al. [30], and Eraker [31]) or models where the driving Brownian motions are correlated, due to our need to dedicate our limited computer resources to handling our complicated (non-Markov) microstructure with inertia. Still, we want to emphasize that the computational complexity we experienced is fundamental to the fact that we are using non-Markov (inertia) models and has little to do with our particular methods. Indeed, our Bayes factor filtering methods are what make the computations possible on an inexpensive contemporary desktop computer.

Construction of Microstructure Price.
The fair price-volatility models account for the random variances of the intrinsic asset value; thus, the selection of a proper SV model is crucial for investing, derivative pricing, and hedging. On the other hand, microstructure noise (see Black [14], Hansen and Lunde [32], Duan and Fulop [33], etc.) causes random perturbations of transaction price from its intrinsic value and the disregard of such trading noise introduces severe bias into stochastic volatility estimation (see Duan and Fulop [33]). We incorporate microstructure trading noise into traditional fair price-volatility models and use statistical filtering to reveal such things as short-term inertia in the trading noise and stochastic volatility in the intrinsic value.

Table 1 (fragment; only the last two rows survive): GARCH, model (4), two-component state, four parameters, generator A(4); Heston, model (5), two-component state, four parameters, generator A(5).
In microstructure markets, the price changes occur only at irregularly spaced transaction times with total trading intensity a(t) (see Engle [13]). Here, we assume a(t) is just a time-varying measurable function as the empirical analysis illustrates that there is no need to consider more general structures. At each transaction time, the transaction price is formulated through (13), in which some nonlinear random field models the trading noise. Formulation (13) is similar to that of Hasbrouck [16], where the intrinsic value is the permanent component while the random field introduces the transitory component. The empirical evidence reported by Hansen and Lunde [32] suggests strongly that the trading noise is serially correlated. Similar results can be found in Aït-Sahalia et al. [34]. Indeed, there exist situations in which the trading noise variance estimate is zero if the trading noise is simply assumed to be independent (see Duan and Fulop [33]). This does not mean there is no trading noise but rather that the trading noise is autocorrelated. To characterize this correlation, Hansen and Lunde [32] assume the trading noise to be some Gaussian random sequence with stationary covariance and finite dependence. However, this model is most suitable for low-frequency data and ignores many crucial microstructure effects. We build correlation into our microstructure information noise through inertia and mean-reversion while utilizing microstructure rounding and clustering noise to explain the discreteness and whole-price biasing.

Inertia.
The idea of momentum or inertia has been used in many studies (see Jegadeesh and Titman [35], Moskowitz and Grinblatt [36], Grundy and Martin [37], Grundy et al. [38], etc.). Basically, there is the tendency for a stock to continue to move in one direction. To illustrate our approach, we introduce the following definition.
is called the inertia function.
The idea behind our definition is that, for inertia, we should expect a small increment of the process just after a given time and a small increment just before a slightly earlier time to have the same sign. We strengthen this condition to a limit condition on the inertia function. Many processes have inertia. However, to model the stock price effect of the information reaching all market participants, we want the following five properties: (1) the process is Gaussian and driftless and its variance at time t is proportional to t, so it resembles Brownian motion; (2) the inertia function is finite, not infinite, indicating that the influence of past values on the immediate future is not too strong; (3) the process makes sense from informational and hidden liquidity points of view; more precisely, it can explain well the price effects due to the reactions of all market participants to information and rumor being diffused and simulated over a period of time as well as due to the purchases or sales of an agent spreading out a large change in his/her position over time; (4) the process is easy to simulate using, for example, the Gaussian property; (5) the process is easy to analyze.
Neither a Brownian motion nor more generally a square integrable martingale has inertia. Brownian motion with drift where ℎ ∈ (0, 1) is the Hurst parameter. Therefore, Thus, the inertia function of FBM is infinity for all times if ℎ > 1/2 (and is −∞ if ℎ < 1/2). Neither case satisfies our five properties. Still, standard representations of FBM motivate the creation of driftless inertia by convolving a Brownian motion with the desired impulse response for information dissemination. With this in mind, we consider the following inertia process.
Remark 11. The inertia process is a weighted average of the historical information (the first term) and fundamental information (the second term). In fact, the tanh term, with time scaled by Δ, can be viewed as the impulse response on price created by market participants receiving and simulating the "information," and Δ determines the diffusion speed in the market. This formulation captures the idea that news or rumor and its ramifications require time to be fully disseminated and understood. When ℎ = 1, it represents the case of only historical information resulting in the strongest inertia in prices. Alternatively, we can use inertia to explain "hidden liquidity." If everybody knew that an agent was going to make a big change in a position, then the price would immediately jump. However, if the agent breaks up the desired change into small transactions, then it takes time for this extra buying or selling pressure to be recognized in the market. In this case, ℎ = 1 represents the case where all changes in position are done over a period of time and Δ represents the time to effect 58% of the positional change.
Note that the inertia process is a centered Gaussian process such that the autocovariance is positive for any pair of times. In particular, Thus, its variance divided by time converges to 1 as time tends to ∞ with speed determined by Δ. (Hence, informational noise increases at the same asymptotic rate as Brownian motion.) Moreover, and, using standard antiderivatives, Thus, the inertia function of our inertia process is and this happens quickly for small Δ. We can thus verify that the inertia process, defined in (18), satisfies our five desired properties. One can also look upon Δ as the time for new information to be disseminated to fifty-eight percent of the market. Below, we consider three different dissemination times: Δ = 40 minutes, Δ = 2 hours, and Δ = 1/2 day on real stock data. Finally, the fact that the inertia process is Gaussian eases its simulation greatly.
where  ℎ,Δ is the dynamical part of the microstructure through which inertia is introduced (with our inertia process  ℎ ) and  = (, ). The case  ℎ,Δ ≡ 0 is of particular importance in the sequel as it represents the nondynamical microstructure case and is used as a calibration model. The information noise consists of two parts: is a sequence of independent standard Gaussian random variables,  > 0;  ℎ is an Ornstein-Uhlenbeck-like (O-U-like) inertia velocity process with mean-reverting parameter  > 0. Here,  ℎ , , and  are independent and  0 is a constant.  ℎ provides an intuitive continuous-time model that accommodates the joint presence of inertia and mean-reversion. Our information noise is more reasonable than that of Zeng [21] in that (1) we preclude the possibility of negative prices by using multiplicative noise; (2) the stochastic inertia process  ℎ captures the empirical feature of the inertia observed in transaction prices (e.g., Jegadeesh and Titman [35]); (3) the mean-reverting structure of  ℎ when combined with the inertia captures the cyclic property of prices (e.g., Black [14]).  ℎ is not a Markov process, so we introduce its historical process as which is Markovian. Moreover, Ẑℎ  ∈ [0, ], the space of all continuous functions on [0, ], since the paths of  ℎ are continuous. Consequently, we augment the state vector to be where  = (, ) is the microstructure noise parameter set. The advantage of this formulation is that we can estimate Ẑℎ and thus  ℎ jointly with other components using particle filtering methods. The generalized state incorporates fair price, volatility, parameters, and the historical trading noise Ẑℎ while keeping the tractability of a Markovian framework.
Remark 12. We include neither ℎ nor Δ in the model parameters but rather consider different models corresponding to different values of ℎ and Δ as well as different SV models 1-5. Indeed, we will provide evidence of inertia in the sequel by using Bayesian methods to select a model with a large value of ℎ based upon tick-by-tick stock data.
[39]). Since we are concerned with price clustering for decimal pricing in stock markets, we let  = 100.
It is well documented that there is price clustering to more whole prices. To quantify this price clustering, we examine the price behavior for three NYSE-listed stocks over April 2010 (Figure 1 and Table 2). (In a larger study, we considered eight NYSE stocks in different sectors. However, we only report on three here to conserve space. The results for the other five were similar in nature.) The transaction data of these stocks shows there is modest clustering at multiples of 5 cents as shown in Figure 1, plotted in terms of pennies. Supposing the raw price Y   falls in the interval [  − 1/2,   + 1/2), then if there was no clustering noise, the trading price    would just be   . Thus, Equivalently, we can write  in terms of the historical process as where Π   is the projection onto time   ; that is, Clearly, (  | , , ) is a smooth function of (, , ) for each fixed   .
To build the observed whole-price bias into our model, we introduce five sets of tick levels, ordered from least to most even. The first set consists of the integers in (0, 100] that are not multiples of 5 and the second set consists of the integers in (0, 100] that are multiples of 5 but not of 25; the third, fourth, and fifth sets contain successively more even levels. While the raw price will be uniformly distributed over the union of the five sets (or rather the continuous interval (0, 100]), the observed price model must bias the second set over the first, the third set over either the second or the first, and so forth. We distribute the observed price randomly over the union of the five sets based upon the raw price in a biased manner favoring the more whole-price ticks in the second through fifth sets. In particular, if the fractional part of the raw price rounded to the nearest cent is in the first set, then the observed value will either stay at the same price or, with a fixed probability, move to the closest multiple of 5 cents, that is, the closest tick level in the second through fifth sets. Then, if the fractional part of the price is in the second set, it will either stay at the same level or, with a fixed probability, move to the closest tick level in the third through fifth sets. Finally, if the fractional part of the price is in the third set, then it will move to the closest tick level in the fourth set with one fixed probability, move to the closest tick level in the fifth set with a second fixed probability, or otherwise stay at the same level. In summary, the transition probability function is obtained iteratively by cases according to the set containing the fractional part of the price. Moreover, we have to handle the case of a zero tick level separately to avoid negative prices.
Remark 13. Our clustering setup is designed to work well for intrinsic prices over $1. For real penny stocks, our setup would introduce positive bias and should be modified slightly.
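The clustering mechanism described above can be sketched as follows. This is only an illustration under stated assumptions: the text spells out the first two sets, so the third, fourth, and fifth sets are assumed here to be the quarter levels (25 and 75 cents), the half dollar, and the whole dollar; the cascade through the sets is one reading of "obtained iteratively"; and the probability names `alpha`, `beta`, `gamma4`, and `gamma5`, as well as the guard against non-positive prices, are hypothetical.

```python
import random


def evenness_bin(n_cents):
    """Return 1..5 for the evenness class of a price given in whole cents.

    Bins 1 and 2 follow the text; bins 3-5 are an ASSUMPTION
    (quarters, half dollar, whole dollar).
    """
    r = n_cents % 100
    if r % 5 != 0:
        return 1
    if r % 25 != 0:
        return 2
    if r in (25, 75):
        return 3
    if r == 50:
        return 4
    return 5  # r == 0, a whole-dollar price


def _keep_positive(current, target):
    # Hypothetical guard: never move a price to zero or below.
    return target if target > 0 else current


def cluster_price(n_cents, alpha, beta, gamma4, gamma5, rng):
    """Randomly move a raw price (in cents) toward more even tick levels."""
    n = n_cents
    if evenness_bin(n) == 1 and rng.random() < alpha:
        n = _keep_positive(n, 5 * ((n + 2) // 5))      # nearest multiple of 5
    if evenness_bin(n) == 2 and rng.random() < beta:
        n = _keep_positive(n, 25 * ((n + 12) // 25))   # nearest multiple of 25
    if evenness_bin(n) == 3:
        u = rng.random()
        if u < gamma4:
            n = 100 * (n // 100) + 50                  # nearest half dollar
        elif u < gamma4 + gamma5:
            n = _keep_positive(n, 100 * ((n + 50) // 100))  # nearest dollar
    return n


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical probabilities, not the calibrated values of Table 3.
    moved = [cluster_price(n, 0.1, 0.1, 0.05, 0.05, rng)
             for n in range(1001, 1101)]
    print(sum(1 for m in moved if m % 5 == 0))
```

Working in whole cents rather than with the fractional part alone lets "closest tick level" cross a dollar boundary, for example from 10.02 down to 10.00.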
Using relative frequency analysis on the aggregate of our three stocks, we found the values presented in Table 3.
The large degree of clustering exhibited, especially to the whole dollar, might be considered surprising. However, earlier studies of Huang and Stoll [39], Chung et al. [40], and Chung et al. [41] also showed significant clustering. Moreover, the degree of price clustering in NYSE is weaker than that of NASDAQ. For example, Barclay [42] examined 472 stocks from NASDAQ before and after their listing in NYSE or American Stock Exchange (AMEX): before the listing, the average fraction of even-eighths (0, 1/4, 1/2, 3/4) is 78% while thereafter it drops to about 56%.

Nonlinear Filtering Model.
Our price process can be formulated as a marked point process ⃗ : a sequence of random vectors ⃗  = (  ,    ,  ≥ 1), where   ∈ [0, ] denotes the time of th-trade and    the corresponding trading price.Accordingly, the mark space of ⃗  is (, E), where  = N 0 and E is all its subsets.Here,  ∈  corresponds to the th-tick level /.For each  ∈ E, we associate the counting process to count the trades in tick level set  up to time .In particular, for  ∈ , denotes the total trades at th-tick level / until time .Equivalently, we can introduce the random counting measure The natural filtration, that is, information content, of  is Now, we assume the following.(C1) The total trade process   =   () admits an intensity () for some positive measurable function .
Therefore, using the conditional probabilities defined in the previous subsection, we find that   () has intensity To simplify the notation, we rewrite (44) as   =  ⋅   . For our present work, we estimated the total intensity function a(t) from intertrade data allowing for intraday variation. Figure 2 is the intertrade duration histogram of our 3 NYSE-listed stocks averaged over all times of the day. We divided the intertrade data into half-hour periods over the course of the day and took a to be constant over these half-hour periods:

$$a(t) = \frac{\text{Average number of trades in period}}{1800\ \text{seconds}} \tag{45}$$
for t in that daily period.
(C2) There exist some positive constants ,  such that  ≤ a(t) ≤  for all t.
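The half-hour averaging in (45) can be sketched as below. This is a minimal illustration, assuming trade times are supplied as seconds since the session open, one list per trading day; the names `estimate_intensity`, `intensity_at`, and `session_length` are hypothetical.

```python
import math

PERIOD_SECONDS = 1800.0  # half-hour periods, as in (45)


def estimate_intensity(trade_times_by_day, session_length):
    """Piecewise-constant intensity: trades per second in each half hour.

    trade_times_by_day: list of lists of trade times (seconds since open).
    session_length: length of the trading session in seconds.
    Returns one rate per half-hour period, averaged over the days given.
    """
    n_periods = int(math.ceil(session_length / PERIOD_SECONDS))
    counts = [0] * n_periods
    for day in trade_times_by_day:
        for t in day:
            idx = min(int(t // PERIOD_SECONDS), n_periods - 1)
            counts[idx] += 1
    n_days = len(trade_times_by_day)
    return [c / n_days / PERIOD_SECONDS for c in counts]


def intensity_at(t, rates):
    """Evaluate the estimated a(t) at time t (seconds since open)."""
    idx = min(int(t // PERIOD_SECONDS), len(rates) - 1)
    return rates[idx]


if __name__ == "__main__":
    # Two hypothetical days of a one-hour session.
    days = [[10.0, 20.0, 1900.0], [30.0, 2000.0, 2100.0, 3599.0]]
    rates = estimate_intensity(days, session_length=3600.0)
    print(rates, intensity_at(1900.0, rates))
```

A piecewise-constant estimate of this kind is bounded above, and bounded away from zero whenever every half-hour period contains at least one trade, which is the situation condition (C2) describes.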
Based on representation (40), ( 44), (, , , Ẑℎ ; ) is framed by a partial-observation model, where (, , , Ẑℎ ) is the state (signal), which is partially observed through the infinite dimensional counting process .One difficulty in calibrating these models is that their transition probability functions are usually unknown in closed form, so maximum likelihood estimation (MLE) methods are difficult to use (see Aït-Sahalia and Kimmel [43] for further details).Instead, we use Bayesian filtering because (1) Bayes estimates do not require the availability or regularity of the full likelihood functions; (2) Bayes estimates can be computed recursively for our tick-by-tick data; (3) Bayesian hypothesis tests can be conducted through Bayes factor, which is the ratio of marginal likelihoods and is easily computed even when the signals are of different dimension or, more generally, singular to each other.

Model Calibration
Our foremost goal is to contribute to the process of model building for financial markets both by suggesting elements to be included in the models and proposing methods to select models based on real observation data. To be able to do this effectively, we need to be able to tune each possible model effectively to get good prior (probability distribution) estimates for the complete signal (, , , Ẑℎ ) before the test period. We do this through nonlinear filtering and in particular through particle filtering. In this section, we first introduce the filtering equations for our problem. Then, we introduce a branching particle filter algorithm that is an approximation to the unnormalized filter and can be implemented on a computer. Next, we explain how we did the calibration (i.e., came up with this prior distribution) and finally we give the results for the models of interest herein.

Nonlinear Filtering Equations.
The available information about (  , , , Ẑℎ  ) is the observation filtration F   ⊂ F  , defined in (43), and the primary goal of nonlinear filtering is to characterize the conditional distribution or, equivalently, for  ∈ (R   +  +2 × [0, ]).Here,  = (, ), Ẑℎ is the long memory portion of our information noise and (, ) is the state and parameter of our fair price-volatility martingale problem.
Remark 14. Actually, we often only want to estimate P[(  , ) ∈ ⋅ | F   ], but there is no simple recursive formula for this marginal. The filter is naturally model dependent, so we can produce different filtering processes for each model, that is, for each SV choice (1-5), each value of Δ, and each value of ℎ in our inertia process.
Suppose   is a positive constant for each  ∈ N 0 such that  ≐ ∑ ∞ =0   < ∞, and consider the continuous-time likelihood function is a martingale under Condition (C2) and Q, defined by is called the reference measure.Under Q, the observations are just a Poisson measure, independent of the state vector (, , , Ẑℎ ), with mean measure ( × (0, ]) = ∑ ∈   × (0, ].To make the likelihoods more manageable in the particle filters to follow, we choose  to be a long time average value (1/) ∫  0 () of () and  →   to be highest where the trades will be more concentrated.Bayes Theorem (see Bremaud [44], p. 165) then links the desired (real-world) conditional distribution   with the unnormalized filter   by where the unnormalized filter   is defined by for all  ∈ (R   +  +2 ⊗ [0, ]).Now, we can give the evolution equation for   .
Theorem 15. Under (C1) and (C2), the unnormalized filter   is the unique measure-valued solution of the stochastic filtering equation for  > 0 and  ∈ D(A).
This theorem is a modest generalization of prior results and can be obtained in much the same manner as results in Kouritzin and Zeng [23] and Xiong and Zeng [22]. Here, A is the generator of the joint martingale problem to (, , , Ẑℎ ) obtained from A, the generator of state (, ) and A  , the generator of the historical process Ẑℎ . We do not need an explicit formula for A. Instead, we can use particle filters to approximate   .
Henceforth, it is convenient to think of the reference measure Q as the standard measure from which we can construct the measure P ,,,ℎ,Δ corresponding to model  ∈ {1, . . ., 5} with parameters  and microstructure with parameters , ℎ, and Δ.

Particle Filter.
The weighted filter is the simplest of particle filters. The idea behind the weighted filter is that, by the independence of signal (, , , Ẑℎ ) from the observations  under Q, we can create an infinite collection of particles {  }  =1 = {(  ,   ,   , Ẑℎ, )}  =1 , each having the same law as (, , , Ẑℎ ), that are also independent of the observations. Then, it follows from the law of large numbers that for Q-almost all  we have the weak convergence of finite measures

Unfortunately, it is well known that the weighted particle filter may not work well for a fixed number of particles . Roughly speaking, most of the particles diffuse away, do not track the signal well, are assigned low likelihoods, and do not really affect the average  ,  . Meanwhile, very few particles match the observations better and have likelihoods that are orders of magnitude higher than those of the majority of particles.

,  essentially becomes an average over too few particles to reflect   well.
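The degeneracy described above can be illustrated with a toy experiment. The following is a minimal sketch, not the filter used in this paper: it assumes a Gaussian random-walk signal with Gaussian observation noise (rather than the Poisson-measure observations of our model), and it uses the effective sample size, a standard diagnostic that is not part of our algorithm. The function name and all parameter values are hypothetical.

```python
import numpy as np


def effective_sample_sizes(n_particles=1000, n_obs=50, obs_std=0.5, seed=0):
    """Track the effective sample size of a purely weighted particle filter.

    Toy assumption: the signal is a Gaussian random walk observed in
    Gaussian noise. Particles are simulated without looking at the
    observations and are only reweighted by their likelihoods.
    """
    rng = np.random.default_rng(seed)

    # Hidden signal and its noisy observations.
    signal = np.cumsum(rng.normal(size=n_obs))
    obs = signal + obs_std * rng.normal(size=n_obs)

    # Independent particle paths with the same law as the signal.
    particles = np.cumsum(rng.normal(size=(n_particles, n_obs)), axis=1)

    # Accumulated log-likelihood of each particle path (up to a constant).
    log_w = np.cumsum(-0.5 * ((obs - particles) / obs_std) ** 2, axis=1)

    ess = np.empty(n_obs)
    for k in range(n_obs):
        # Normalize in a numerically stable way before computing the ESS.
        w = np.exp(log_w[:, k] - log_w[:, k].max())
        w /= w.sum()
        ess[k] = 1.0 / np.sum(w ** 2)
    return ess


if __name__ == "__main__":
    ess = effective_sample_sizes()
    print("ESS after first observation:", ess[0])
    print("ESS after last observation: ", ess[-1])
```

An effective sample size far below the number of particles indicates that the weighted average is dominated by a handful of particles, which is the motivation for the resampling step that follows.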
We also initialize the number of particles to N 0 =  and particle likelihoods all to A 0 = 1.

Particle Weights and Average Weight.
We simulate using the reference measure Q and we incorporate the observations based upon (48). At the th observation (  ,    ), the th particle's weight is multiplied by where   =    . Hence, the th particle's weight becomes and the average weight is 57) by continuous paths. Here,    depends on the observation  and the increment of likelihood ratio of measure P over measure Q defined by (48) given the simulated particle path realized on the interval [ −1 ,   ). These weights do not depend upon the parameters  directly. This is common and is why the observations are often called partial observations. We can still estimate  and include these parameters as part of the particles' states since they do affect stock price , which is observed in the presence of noise and distortion. The weights are stored along with the states of particles before resampling.

Resampling.
After weighting, we resample the particles, pruning the unlikely ones and duplicating the better ones in an unbiased manner. In particular, we let    be a ( L  /A  − ⌊ L  /A  ⌋)-Bernoulli random variable independent of everything and produce ⌊ L  /A  ⌋ +    particles at location P   . We then give all the particles weight   and let

3.2.5. Unnormalized Filter.
Now, we can estimate the unnormalized filter at the th observation time,    , by

The actual algorithm that was implemented is as follows.
Repeat. For  = 0, 1, 2, . . ., do ; (4) average weight: A +1 =    +1 (1); (5) repeat: for  = 1, 2, . . ., N  do (a) offspring number:

Remark 17. (i) We extract our estimate before resampling to avoid excess noise. (ii) The key step is (5), which determines the new number of particles N +1 and weights L  +1 in an unbiased manner. The result is zero or more particles, all having the average weight, at the same location as the parent. (iii) The particle evolution would typically be done via Euler's or Milstein's method.
Since the above algorithm produces unbiased resampling of the weighted particle filter, it is quite reasonable to believe the following result.The technicality of this result's proof would detract from our applications so it is omitted.
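The resampling step described above (an offspring number equal to the integer part of the weight-to-average ratio plus a Bernoulli draw on the fractional part) can be sketched as follows. This is an illustrative sketch rather than the implemented code: the function name is hypothetical, and taking the average over the current particle count is an assumption made here for simplicity.

```python
import numpy as np


def branch_resample(weights, rng):
    """Unbiased branching resampling of weighted particles.

    Particle j with weight L_j produces floor(L_j / A) + Bernoulli(frac)
    offspring, where A is the average weight and frac is the fractional
    part of L_j / A. Every offspring receives weight A.

    Assumption: A is the mean over the current particles.
    """
    weights = np.asarray(weights, dtype=float)
    avg = weights.mean()                        # average weight A
    ratio = weights / avg                       # L_j / A
    base = np.floor(ratio).astype(int)          # guaranteed offspring
    frac = ratio - base                         # Bernoulli success probability
    extra = rng.random(weights.size) < frac     # one possible extra offspring
    offspring = base + extra.astype(int)

    # Index of the parent of each surviving particle; the offspring sit at
    # the parent's location, so the caller copies states via this index.
    parents = np.repeat(np.arange(weights.size), offspring)
    new_weights = np.full(parents.size, avg)
    return parents, new_weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parents, new_weights = branch_resample([0.25, 0.25, 3.0, 0.5], rng)
    print("parents:", parents)
    print("weights:", new_weights)
```

Since the expected number of offspring of particle j times the common weight A equals its original weight, the expected total weight at each location is unchanged, while the number of particles is allowed to fluctuate.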

Calibration and Historical Training.
To keep the problem size manageable, we just used the clustering parameter estimates of , ,  1 , and  2 given above as the actual values throughout our simulations.
One is often faced with the problem of estimating initial distributions for fair price, volatility, and the parameters prior to filtering over the time interval of interest (April 2010 here). Our approach was to make arbitrary assignments very far in the past (January 3, 2000, to be precise) and then do an excessive amount of prior particle filtering, relying on the ability of the filter to forget its starting point and to produce reasonable distributions at a much later point, April 1, 2010. (See, e.g., Ocone and Pardoux [49], Delyon and Zeitouni [50], and Atar [51] for mathematical results regarding this phenomenon.) This had to be done for every model, namely, every combination of our three stocks, five SV models, and multiple microstructure models, characterized by inertia parameters. Our main purpose in this historical training was to get a starting joint distribution for (, , , Ẑℎ ) as of April 1, 2010, under each model combination. Due to the large number of cases this produced, we first display and discuss two models: the nondynamical microstructure Heston case and the median inertia dynamical case where ℎ = 1/2 and Δ = 7200 s (i.e., 2 hrs) in the inertia microstructure model. Also, to ensure that  and  did not converge to a single value, we made them vary slightly in a random manner; that is, we replaced the equation  = 0 with   = V  for a very low variance Brownian motion V.
In Figure 3, we illustrate our prior filtering of PepsiCo.The choppiest curve is the actual stock price while the smoothest curve is the filter's fair price estimate [  | F   ] using the Heston SV model with (median) microstructure inertia.The middle curve is the filter's fair price estimate [  | F   ] using the Heston SV model without dynamics in the microstructure; that is,  ℎ = 0.These curves go beyond April 1, 2010.However, the required initial distributions were taken from the filter at that point.
Notice from Figure 3 that the implied fair price process estimate is far less volatile in the presence of dynamical microstructure than without. This lower volatility for fair price is highly desirable. It does not make sense that the fair price of a stock should fluctuate dramatically from day to day or within a day in the absence of an event; rather, these short-term fluctuations are better explained by trading noise. Moreover, the fair price estimate is a mathematically better-founded counterpart of the moving averages that are used to judge value and momentum, and so fair price estimates should inherit the smooth nature of such moving averages.

Numerical Results
The data is one month (April 2010) of transaction prices of our three NYSE-listed stocks. Our filter produces Bayes estimates of the macro- and microparameter vectors  and , respectively. These estimates in the nondynamical microstructure case (i.e., using the simpler form in (24)) for PepsiCo are as shown in Table 4. All parameters are estimated using time in seconds. Our PepsiCo Bayes estimates in the median inertia case are as shown in Table 5.
While it is difficult to read much from these numbers, we can see that the main volatility parameters ], , and  are mostly smaller when dynamics is included in the microstructure.This further justifies our conjecture that at least some stochastic volatility is better replaced by microstructure with dynamics.
Figures 4 and 5 show the conditional expectation fair price estimation for Goldman Sachs and PepsiCo, respectively, in the cases of no dynamics and median inertia dynamics for each of our SV models. There are a total of eleven curves in each figure. The most volatile curve is the stock price itself over this month. The smoothest curves, somewhat separated from the stock price, are the fair price estimates using the five SV models with (median inertia) dynamical microstructure. The remaining five curves (that hug the stock price in Figures 4 and 5) are our fair price estimates for our five SV models with nondynamical microstructure. In this last case, the microstructure does not have the power to separate the fair price and actual stock price to any large degree. It is important to realize that these pictures are really just a one-month snapshot of a much bigger multiyear filtering process. This explains why many of the fair price processes are significantly different from the actual stock price on April 1, 2010: the filter is estimating that the difference is due to the microstructure. It is apparent that adding dynamics to the microstructure allows the estimated fair price process to differ significantly from the stock price. Indeed, there is a significant correction of all three stock prices (especially Goldman Sachs) towards the estimated fair price of the models with (median inertia) dynamical microstructure. This produces a compelling reason to use models with microstructure dynamics. You would be estimating that the stocks were significantly overvalued before the correction if you used the model with microstructure dynamics and this could be

The filters provide conditional distributions and estimates for more than just fair price and parameters. Table 6 shows the average volatility estimates without microstructure dynamics (see (24)) and with (the best performing) microstructure inertia using the simplified Heston SV model. We only highlighted Heston here because (1) we will show evidence below that Heston performs the best and (2) the volatility estimates of the other SV models behave similarly. The amount of stochastic volatility estimated when there is (median inertia) dynamics in the microstructure shrank to a couple of percent of what it was without. This strongly suggests that by far the primary use of stochastic volatility is as a proxy for microstructure with dynamics and further raises the question of whether stochastic volatility is needed in the presence of microstructure dynamics.
The final and most difficult quantity the filter estimates (in the dynamical microstructure case) is the historical noise. For practical purposes, we cannot let the historical path go back all the way to the year 2000, but we found that there is not much loss if we just update discrete samples over the previous three years, which is still a tremendous amount of data. Also, we cannot plot these historical paths so we just plot the projection onto the current time; that is, we just plot  ℎ  even though we must propagate the Markov process Ẑℎ  in the filter. Figure 6 shows the noise estimate for PepsiCo. In this graph, we look at the effect of inertia. The curves where ℎ = 0 represent the no-inertia case, so  0  is just an Ornstein-Uhlenbeck process. Conversely, the case ℎ = 1 represents the one hundred percent inertia case and  1   is not Markov. We see from these graphs that the amount of estimated noise is very similar, indicating that the amount of inertia modeled might not be that significant. However, the noise processes where ℎ = 1 are far smoother due to the inertia. Below, we will produce strong evidence that inertia is important and find that the best ℎ is in the range [0.4, 1], depending upon the stock. We compare the behavior of our models in terms of the SV models and the inertia parameters ℎ and Δ within the Bayesian model selection framework in the following section.

Evidence for Inertia and Stochastic Volatility
The main objective of this section is to use the Bayes factor for model selection in microstructure markets.
To use the Bayes factor method, we need only to be able to transform all observation models of interest into the same canonical process via Girsanov measure change.The signal models can be singular to one another.Kouritzin and Zeng [23] discuss the Bayesian model selection problem.However, their equations do not apply to our models.

Model Selection and Bayes Factor.
Consider our five SV macrostructure fair price-volatility models where the generators of the martingale problem to  () are, respectively, A () for  = 1, 2, 3, 4, 5. Normally, we would have to consider a multitude of parameters  resulting in a plethora of models. However, by our calibration process we have reduced the setting to one parameter set per martingale problem so we have a base of five models. We still have to consider the various choices for our inertia. For simplicity, we restrict ourselves to three distinct values for Δ and eleven choices for ℎ, and we use the calibration process to estimate the other microstructure parameters . Therefore, we have a total of 5 × 3 × 11 = 165 models to test. The likelihood of  being produced by model (, ℎ, Δ) up until time  is

Here, () is the counting measure on  = N 0 and the same observations and observation rate information are used for all models. One can think of  (,ℎ,Δ)  as the likelihood ratio of the model  (,ℎ,Δ) with distribution P (,ℎ,Δ) characterized by (, ℎ, Δ) to the simple (or null) model  0 with distribution Q where the observation prices just arrive according to a Poisson measure with intensity measure () = ∫    (), that is, with rate independent of any macrostructure model and independent of any microstructure state. In other words, ( (,ℎ,Δ)  ) −1 = (Q/P (,ℎ,Δ) )| F  then transforms the observations into the same Poisson measure with intensity measure () = ∫    () regardless of (, ℎ, Δ). Unfortunately,  (,ℎ,Δ)  depends upon  ()   ,  ℎ,Δ  , which are unknown, so we cannot select models via the likelihood.
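The size of the model family can be checked by enumerating it directly. The sketch below is illustrative only: the SV models are referred to by index, the Δ labels and ℎ grid are those used in the numerical experiments of this section, and the function names and the dictionary of log integrated likelihoods passed to `best_model` are hypothetical.

```python
from itertools import product

# Five SV models, referred to by index only.
SV_MODELS = (1, 2, 3, 4, 5)
# Ingestion times, kept as labels.
DELTAS = ("30 mins", "2 hrs", "1/2 day")
# Inertia magnitudes h = 0, 0.1, ..., 1.
H_VALUES = tuple(k / 10 for k in range(11))


def model_grid():
    """Return every (SV model, h, Delta) combination to be compared."""
    return list(product(SV_MODELS, H_VALUES, DELTAS))


def best_model(log_integrated_likelihoods):
    """Pick the model key with the largest log integrated likelihood.

    The argument is a hypothetical mapping from (model, h, Delta) to the
    logarithm of the integrated likelihood estimated by a filter.
    """
    return max(log_integrated_likelihoods, key=log_integrated_likelihoods.get)


if __name__ == "__main__":
    print(len(model_grid()))  # 5 * 11 * 3 combinations
```

Each combination requires its own filter run over the same observations, which is why the calibration step that fixes one parameter set per model matters for keeping the comparison tractable.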

Bayes Factor.
The available information in a microstructure market is the observation process , which represents the cumulative transaction records throughout all tick price levels. The normalized filter  (,ℎ,Δ)  ,  = 1, 2, 3, 4, 5, ℎ ∈ [0, 1], Δ > 0, satisfies where and  (,ℎ,Δ)  (1) is the integrated (or marginal) likelihood of  for model (, ℎ, Δ). Now, we use the Bayes factor to compare models. The Bayes factor determines which model best fits the observed data by doing pairwise comparisons. We define the Bayes factor of model  (,ℎ,Δ) to the null model by the conditional likelihood: which is consistent with more basic definitions of the Bayes factor. It then follows that the Bayes factors for two models, characterized by ( 1 , ℎ 1 , Δ 1 ) and ( 2 , ℎ 2 , Δ 2 ), are the ratios , , with the integrated likelihoods  1  (1) = (1) that can be approximated using the algorithm of Section 3.2.5. Kass and Raftery [20] demonstrate how to interpret the Bayes factor, as shown in Table 7.
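A pairwise comparison of two models from their integrated likelihoods might be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the computation is done on the log scale because integrated likelihoods over a month of tick data may be too extreme for floating point, and the categories are the 2 ln B scale of Kass and Raftery; whether Table 7 uses exactly this scale is an assumption.

```python
def log_bayes_factor(log_lik_1, log_lik_2):
    """Log Bayes factor of model 1 against model 2.

    Inputs are logarithms of the two integrated likelihoods, so the
    ratio of likelihoods becomes a difference.
    """
    return log_lik_1 - log_lik_2


def evidence_label(log_bf):
    """Interpret a log Bayes factor on the 2 ln B scale.

    Assumption: categories follow Kass and Raftery's 2 ln B thresholds.
    """
    two_ln_b = 2.0 * log_bf
    if two_ln_b < 0.0:
        return "favors the second model"
    if two_ln_b < 2.0:
        return "not worth more than a bare mention"
    if two_ln_b < 6.0:
        return "positive"
    if two_ln_b <= 10.0:
        return "strong"
    return "very strong"


if __name__ == "__main__":
    # Hypothetical log integrated likelihoods for two models.
    lbf = log_bayes_factor(-100.0, -104.0)
    print(lbf, evidence_label(lbf))
```

Because the Bayes factor of each model against the null model shares the same denominator, the comparison of two models reduces to the difference of their log integrated likelihoods, as used above.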

Numerical Results on Stochastic Volatility.
First, we consider the problem of selecting the best of our five fair price-volatility models, and the resulting partially observed market models, ( () , Ẑℎ,Δ ,  (,ℎ,Δ) , ; ) . We compare these five models to determine which can best represent the market data. More precisely, we run all unnormalized filters as explained in Section 3.2 with the optimal parameters discovered and reported earlier. Then, we choose Model  if  (,ℎ,Δ)  is the largest. Naturally, this corresponds to the model whose Bayes factor ends up greater than one when compared to any other model. While we have five basic models, we also consider different market ingestion times Δ and inertia magnitude parameters ℎ for each model.
Using GBM with nondynamic microstructure (i.e.,  ℎ = 0) as the benchmark, we determine which combination of SV model and inertia parameters outperforms GBM most. We first focus on the candidate models (Examples 3-7). In each case, we pick the inertia parameters from the sets Δ ∈ {30 mins, 2 hrs, 1/2 day} and ℎ ∈ {0, 0.1, 0.2, . . ., 0.9, 1} that would yield the highest Bayes factor against the calibration model. The data consists of the transaction prices of PepsiCo, IBM, and Goldman Sachs for April 2010. Figure 7 and Table 8 summarize the Bayes factor performance. The Bayes factors computed in this table give strong evidence (based upon the Kass and Raftery criterion mentioned before) for the Heston model based on a full month of real tick-by-tick stock price data. Indeed, as we will see below, there would still be strong evidence supporting Heston if we used different values of ℎ and Δ. It is also interesting that the order of the models did not change over our three stock selections, with Heston always being preferred and GBM always performing the worst. Recall that all models are tuned to have their best parameters  and .

Numerical Results on Inertia.
Next, we look at the ingestion time Δ using nondynamic microstructure Heston as the calibration model. Figure 8 and Table 9 show the effect of varying Δ over {30 mins, 2 hrs, 1/2 day} for ℎ ∈ {0, 0.1, 0.2, . . ., 0.9, 1} fixed to give the highest Bayes factor. There is a drop in the Bayes factor values from the model determination experiment, which is entirely due to the change of calibration model from GBM with nondynamic microstructure to Heston with nondynamic microstructure. Our results show that the best ingestion times for Goldman Sachs, PepsiCo, and International Business Machines stocks are, respectively, 1/2 day, 2 hours, and 1/2 day. The fact that the data supports long-time ingestion might add merit to the case of the momentum trader.

Finally, we investigate the optimal amount of inertia. Figure 9 and Table 10 show the effect of varying the amount of inertia ℎ over {0, 0.1, 0.2, . . ., 0.9, 1} for Δ ∈ {30 mins, 2 hrs, 1/2 day} fixed to give the highest Bayes factor. The table shows that inertia is important. In fact, the best ℎ was always at least ℎ = 0.4 and was even ℎ = 1 in the case of IBM, meaning that for IBM all microstructure dynamics should be driven by the inertia process.

Conclusions
Herein, we considered five popular SV models to represent intrinsic or fair price and stochastic volatility of this price. These SV models are free of inertia or momentum. We then added microstructure noise with possible dynamics and inertia to these SV models to accommodate trading noise, trend following, information dispersion, and the slow unwinding of big positions. We used Bayesian model selection techniques to determine which of these combined models fits real NYSE data best. In the process of selecting the best model we also investigated characteristics like microstructure dynamics, inertia, and stochastic volatility. For the stock data considered, we can conclude the following:
(1) Bayesian model selection through particle filtering provides a computationally effective means to identify the best finance models on real tick-by-tick data.
(2) The SV and inertia components of the financial models compared can be singular to each other as long as the microstructure can be changed into the same canonical Poisson measure process for all models considered.
(3) There is strong evidence of dynamical microstructure noise.
(4) Adding dynamics to the microstructure allowed much greater deviations of price from intrinsic value, which can be detected by filtering and used as a warning sign to investors and traders.
(5) The simplified Heston stochastic volatility model with microstructure dynamics and significant inertia performed the best in all cases.
(6) There is strong statistical evidence that such simplified Heston stochastic volatility models with microstructure dynamics and inertia match the data better than the classical geometric Brownian motion.
(7) The amount of inertia ℎ and the time it lasted Δ varied a little from stock to stock but in all cases there was significant inertia that lasted for hours.
More complicated SV models can be investigated in our future work.One could also postulate more complicated microstructure dynamics and consider additional real data analysis.

Figure 3: Long-term value estimation of PEP.
t of various models ( * indicates models without dynamics) Log O-U *

Table 1 Name
2.3.2. Information Noise and Augmented State. Hitherto, we have focused on constructing inertia processes. Now, we include all informational noise into asset prices. Information noise is introduced to represent trading noises due to things like inertia, fear-greed cycles, belief heterogeneity, and asymmetric information. For the th-transaction occurring at   , the raw price Y   is defined by ln

Table 2 NYSE
with no clustering noise given    = ,    =  would be

Table 9: Bayes factor for ingestion time determination, April 2010.