On the Existence of Strongly Consistent Indirect Estimators When the Binding Function Is Compact Valued

We provide sufficient conditions for the definition and the existence of strongly consistent indirect estimators when the binding function is a compact valued correspondence. We use conditions that concern the asymptotic behavior of the epigraphs of the criteria involved, a relevant notion of continuity for the binding correspondence as well as an indirect identification condition that restricts the behavior of the aforementioned correspondence. These are generalizations of the analogous results in the relevant literature and hence permit a broader scope of statistical models. We examine simple examples involving Levy and ergodic conditionally heteroskedastic processes.


Introduction
Indirect estimators (henceforth IE) are multistep M-estimators defined in the context of (semi-)parametric inference. They are minimizers of criteria (inversion criteria) that are functions of an auxiliary estimator, itself derived as an extremum estimator. The latter minimizes a criterion function (auxiliary criterion) that partially reflects the structure of a possibly misspecified auxiliary statistical model. The inversion criterion is usually a (possibly stochastic) distance function evaluated on the auxiliary estimator as well as on some functional approximation of a mapping between the statistical models involved, termed the binding function. This mapping is constructed by some limiting argument concerning the auxiliary criterion. The IE is finally defined by minimization of the inversion criterion. This definition is conceptually justified by properties of the binding function that guarantee indirect identification and the subsequent use of the analogy principle. Given the auxiliary criterion, differences between IEs hinge on differences in the distance functions, and/or the approximations of the binding function, and/or the optimization errors involved.
In the present paper, we are concerned with the existence of strongly consistent IE, allowing for cases where the binding function is compact valued (hence possibly multivalued). We therefore perform our study in a more general framework than the ones employed in the relevant literature.
Our motivation lies in cases where the auxiliary criterion is a quasilikelihood function involving a class of stationary-ergodic volatility processes defined by some GARCH or SV type models that represent the statistical model at hand and an auxiliary class of stationary-ergodic invertible processes living in the premises of a possibly misspecified analogous model. In such frameworks the limit criterion could assume extended real values due to the possible existence of parameter values that imply nonexistence of relevant moments. Furthermore, since the limit criterion is a statistical divergence between the two classes of processes, the binding function can in principle be multivalued due to the geometry involved. Notice that even when this is true, it is possible to study single valued reductions of it via measurable selections. This could, however, imply stricter conditions for indirect identification. Furthermore, since these frameworks are generally suboptimal with respect to asymptotic efficiency, actual multivaluedness could lead to efficiency gains.
This study has the form of a calculus of escalating weak sufficient conditions that enable the definition of IE in this framework and the proof of existence of strongly consistent ones. First, using mild assumptions on the structure of the auxiliary criterion functions, we work with a notion of convergence of the relevant sequence of criterion functions that is weaker than uniform convergence. This is termed epiconvergence and essentially concerns the almost sure asymptotic behavior of their epigraphs. By construction it is suitable for the study of the asymptotic behavior of their minimizers. This form of convergence has been extensively studied in the statistical literature (see among others [18][19][20]) and it enables the definition of the binding function and the determination of the limiting relation between this function and the auxiliary estimator.
Then we strengthen our assumptions in order to obtain a form of continuity of the binding function that enables the definition of IE derived from this function and the verification of their existence. Finally, the imposition of a relatively weak condition of indirect identification on the behavior of the binding function, along with the limiting relation already established, enables the proof of the existence of strongly consistent IE via the same limit arguments used for the pseudoconsistency of the auxiliary estimator and the subsequent definition of the binding function. This framework readily enables the description of conditions concerning the behavior of any approximation of the binding function that could also be used for the definition of IE in a similar manner.
Hence we manage to extend the framework for the definition of IE in a threefold manner. We allow for the auxiliary and/or the inversion criteria and/or their appropriate limits to assume extended real values. We study their asymptotic behavior via the use of the weakest known topology associated with convergence of minimizers, and we allow for the binding function to be a correspondence with values in the collection of nonempty and compact subsets of the relevant parameter space. This incorporates the definitions used in the existing literature but simultaneously enlarges the set of statistical models that are in accordance with these conventions.
The structure of the paper is as follows. We formulate our setup and define and study the asymptotic behavior of the auxiliary estimator, the binding correspondence, and finally the IE. We then illustrate some of our results with a set of simple examples. We conclude by posing some questions for future research. In the appendix we briefly describe some general notions that are used in the main body.

Assumptions and Main Results
2.1. General Setup. We construct our framework and describe the underlying statistical problem. Let the triad $(\Omega, \mathcal{J}, P)$ denote a complete probability space. Let also $(\Theta, d_\Theta)$ and $(B, d_B)$ denote two compact separable metric spaces, and let $\mathcal{B}(\Theta)$ and $\mathcal{B}(B)$ denote the corresponding Borel algebras. $\overline{\mathbb{R}}$ denotes the extended real line, and $\mathcal{B}(\overline{\mathbb{R}})$ the corresponding Borel algebra.
The auxiliary criterion is a function   (, , ) : Ω × Θ ×  → R, which is of the form   (  , ) with   :   × → R, for   : Ω × Θ →   , for   some appropriate space, usually homeomorphic to R  for some  > 0.   represents the sample.  reflects part of the structure of an auxiliary model, a statistical model defined on the measurable space (  , B(  )), with  as its parameter space (e.g., it can be a likelihood function or a GMM type criterion; see Section 3) (which in general is a correspondence   P(  ), with P(  ) the set of probability measures on   ).  (⋅, ) is measurable for any  and thereby represents the underlying statistical model which is essentially the set { ∘  −1  (⋅, ),  ∈ Θ}.These two models need not coincide.
We abbreviate with $P$ a.s. any statement that concerns elements of $\mathcal{J}$ of unit probability. When not necessary, we avoid denoting the potential dependence of those elements on the parameters.
In the following we provide an escalating description of a set of sufficient conditions that enable, first, the existence of the auxiliary estimator; second, the construction of the binding function and the description of the asymptotic relation between the two; third, an appropriate form of continuity of the binding function, which along with the previous enables the definition of the IE; and finally consistency.

Definition and Existence of the Auxiliary Estimator.
We begin with a weak sufficient assumption on the behavior of $c_n$ that enables the definition and the existence of the auxiliary estimator. It consists of a joint measurability condition, a continuity condition with respect to $\beta$ that holds pointwise with respect to $\theta$ and $P$ a.s., and a condition that facilitates minimization. All these conditions are weak enough so that their verification is easy in many cases. Recall that a function with values in the extended real line is called proper if it does not take the value $-\infty$ and its image contains at least one real number. It is called inf-compact if its level sets are compact (for $y \in \mathbb{R}$, the level set of $f : X \to \overline{\mathbb{R}}$ with respect to $y$, denoted $\mathrm{Level}_{\le y}(f)$, is defined by $\{x \in X : f(x) \le y\}$). Inf-compactness follows trivially when the function is lsc and its domain is compact.
and proper $P$ a.s., for all $\theta \in \Theta$. (It is obvious that the element of $\mathcal{J}$ of unit probability with respect to which Assumption 1 holds can depend on $\theta$.)
In our examples presented in Section 3, the issue of joint measurability is handled easily due to the fact that the $c_n$'s considered are in fact Carathéodory functions, that is, jointly continuous (with respect to $(\theta, \beta)$) and pointwise measurable. Separability of $\Theta \times B$ and Lemma 4.51 of Aliprantis and Border [21] imply the required measurability. Properness is an ad hoc consideration that is easily established in many cases. For instance, when $c_n$ has the form of a quasilikelihood function it $P$ a.s. does not attain extended values. This is also the case in instances where $c_n$ has the form of a hemimetric (see Appendix) as in Section 2 due to the compactness of its arguments.
Remark 2. The joint measurability and the pointwise semicontinuity imply that $c_n(\cdot, \theta, \cdot)$ is a normal integrand (see Definition 3.5 and Proposition 3.6 in Chapter 5 of [22]). The compactness of $B$ then implies that $c_n$ is inf-compact $P$ a.s. for all $\theta \in \Theta$.
We are now ready to define the auxiliary estimator and prove its existence. We remind the reader that the Appendix provides the definition of a measurable compact valued correspondence.
Definition 3. The auxiliary correspondence  #  (, ,   ) satisfies where   is a  a.s.nonnegative random variable defined on Ω.
Notice that the dependence of  #  on  is due to the assumed dependence of   on .In practice  #  is evaluated by the minimization appearing in the first equality of the previous display.
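As a rough illustration of the approximate minimization behind the auxiliary correspondence, the following Python sketch computes an approximate argmin set on a finite grid. It assumes that the set in Definition 3 has the usual ε-argmin form (all points whose criterion value lies within the optimization error of the infimum), which is in line with the later use of ε-arg min in this section. The function `eps_argmin`, the grid, and the quadratic criterion are hypothetical and are not taken from the paper.

```python
def eps_argmin(criterion, grid, eps):
    """Return the grid points whose criterion value is within eps of the grid minimum.

    Assumption: this mirrors an eps-argmin set {b : c(b) <= inf c + eps}
    restricted to a finite grid; it is an illustration, not the paper's definition.
    """
    if eps < 0:
        raise ValueError("the optimization error must be nonnegative")
    values = [criterion(b) for b in grid]
    lowest = min(values)
    return [b for b, v in zip(grid, values) if v <= lowest + eps]


# Hypothetical auxiliary criterion: squared distance from 0.3 on a grid over [0, 1].
grid = [k / 10 for k in range(11)]
crit = lambda b: (b - 0.3) ** 2

exact = eps_argmin(crit, grid, 0.0)      # only the exact grid minimizer
approx = eps_argmin(crit, grid, 0.02)    # a larger set that still contains it
```

A smaller optimization error gives a smaller set, so `exact` is contained in `approx`; this is the monotonicity in the error that the consistency arguments later rely on.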

Epilimits and Existence of a Fell Consistent Auxiliary Correspondence.
Assumption 1 is not sufficient for the construction of the binding function as an appropriate limit of the auxiliary correspondence. The following assumption facilitates the investigation of the issue of (pseudo-)consistency for the auxiliary correspondence and thereby permits this construction. It indicates the almost sure epiconvergence of the auxiliary criterion to a proper, semicontinuous asymptotic counterpart. The first part is essentially the sequential characterization of this form of convergence. For the topological definition of epiconvergence along with some properties please see the Appendix. For the equivalence between the topological and the sequential definitions see among others [20]. The second part imposes on the limit criterion a property that is not generally invariant with respect to epiconvergence.
Assumption 5. There exists a function $c : \Theta \times B \to \overline{\mathbb{R}}$ such that (1) $\forall \theta \in \Theta$, $P$ a.s. for any  :  [23]). In the considered cases $c$ is proper since it is either an expectation that cannot assume the value $-\infty$, and there exists at least some parameter value for which it is finite, or it is defined by composition with a (hemi-)metric and assumes the value 0 for at least one parameter value (see, for example, [24] Part 1, (ii) in association with Part 2 of the proof of Theorem 5.3.1, where $c_n$ is a quasilikelihood function and $\Theta$ coincides with $B$ for the former or Section 2 for the latter case). Inf-compactness follows from the compactness of $\Theta$.
The assumption essentially enables the use of the fact that the argmin correspondence is upper continuous as a function defined on the relevant space of lsc functions equipped with the topology of epiconvergence. Analogous assumptions have been used for the establishment of strong consistency of various estimators; see among others [18][19][20][25]. Hence it makes possible the definition of the binding function as $b(\theta) \doteq \arg\min_{\beta \in B} c(\theta, \beta)$. The following proposition establishes its existence and is essentially similar to Proposition 4.

Proposition 7.
Under Assumptions 1 and 5 the binding correspondence $b$ is nonempty compact valued.
Proof. It follows from Remark 6.
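To see why epiconvergence, rather than pointwise convergence, is the notion tied to minimizers, consider the following minimal Python sketch. The tent-shaped criteria `f(n, x)` are a standard textbook style example and are purely hypothetical; they do not come from the paper. Pointwise they converge to the zero function, whose argmin is the whole domain, while their epilimit equals -1 at 0 and 0 elsewhere, so its argmin is {0}, which is where the minimizers 1/n accumulate.

```python
def f(n, x):
    """Hypothetical criterion: a dip of depth 1 centered at 1/n with half-width 1/n."""
    return -max(0.0, 1.0 - n * abs(x - 1.0 / n))


def grid_argmin(n, grid):
    """Return the grid point that minimizes f(n, .)."""
    return min(grid, key=lambda x: f(n, x))


grid = [k / 1000 for k in range(-1000, 1001)]
ns = [10, 100, 1000]

minimizers = [grid_argmin(n, grid) for n in ns]      # these move toward 0
minima = [f(n, x) for n, x in zip(ns, minimizers)]   # each minimum equals -1
values_at_zero = [f(n, 0.0) for n in ns]             # pointwise, f(n, 0) is (numerically) 0
```

The minimal values stay at -1 and the minimizers approach 0, even though the sequence evaluated at 0 itself stays at 0. The pointwise limit therefore carries no information about the location of the minimizers, whereas the epilimit does.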
Both the auxiliary and the binding correspondence will be used for the definition of the IE via some intuition that utilizes an analogy principle. The following result explores their asymptotic relation. Its construction requires the hemimetric $\delta$ defined by $\delta(K_1, K_2) = \inf\{\varepsilon > 0 : K_2 \subset K_1^{\varepsilon}\}$, where $K_1^{\varepsilon} = \{x \in X : d(x, K_1) < \varepsilon\}$, $d(x, K_1) \doteq \inf_{y \in K_1} d(x, y)$, and $K_1$, $K_2$ are nonempty closed subsets of some metric space $(X, d)$. For several continuity and measurability properties of $\delta$ see the Appendix.
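The hemimetric defined above can be computed directly for finite sets, which makes its asymmetry easy to see. The Python sketch below is only an illustration for finite subsets of the real line; the function name `hemimetric` and the example sets are hypothetical.

```python
def hemimetric(k1, k2):
    """One-sided distance: how far k2 sticks out of k1, for finite sets of reals.

    For finite sets, inf{eps > 0 : k2 is inside the open eps-neighborhood of k1}
    equals the max over y in k2 of the min over x in k1 of |x - y|.
    """
    if not k1 or not k2:
        raise ValueError("both sets must be nonempty")
    return max(min(abs(x - y) for x in k1) for y in k2)


a = [0.0, 1.0]
b = [0.0]

d_ab = hemimetric(a, b)   # b is contained in a, so nothing sticks out
d_ba = hemimetric(b, a)   # the point 1.0 of a lies at distance 1 from b
```

The value is zero exactly when the second set is contained in the first (for closed sets), so a zero distance does not force the two sets to be equal. This is why the identification condition used later is phrased in terms of the roots of the hemimetric.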
The first and last implications of the following proposition are already well known. Its second implication is a partial generalization of Theorems 7.30 and 7.32 of [26] in our setting. Given the definitions of the upper and the Fell topology in the Appendix, the first essentially establishes the upper pseudoconsistency and the other two the Fell pseudoconsistency of $\beta_n^{\#}$ with respect to $b$.

Proposition 8. Under Assumptions 1 and 5 the following holds:
(1) for any
For its proof we will use the following lemmata. Let   ≑ inf  (, ), which is well defined due to Remark 6 and the compactness of $B$. Recall that for (ϝ  ) a sequence of nonempty sets, Li ϝ  is the set comprised of the limit points of any possible sequence (  ) such that   ∈ ϝ  , and Ls ϝ  is the one comprised of the analogous cluster points. Also Level ≤ (  (, ⋅)) ≑ { ∈  : (, ) ≤ } for  ∈ R.

Lemma 9. Under Assumptions 1 and 5,
Proof. Consider the family of $\theta$-parameterized correspondences epi  (, ) ≑ epi(  (, , ⋅)). Due to the fact that   is locally compact, epi  (, ) is a random closed set in the sense of the previous paragraph, that is, a B(  )/J ⊗ B(Θ)-measurable correspondence. Hence epi  (, ⋅) is a B(  )/J-measurable correspondence due to the measurability of the relevant projection. Now due to Assumption 5 (see Section 4) we have that for large   and for all   in an element of is open in the relevant product topology. Hence inf    (, , ) ≤ inf  (, ) for all   described previously.
The next result will be used for the proof of Proposition 8(2) where that last inequality follows from Lemma 9. This establishes that for any nonnegative random variable Now (1) follows from the fact that  -arg min   ⊆    -arg min   if   ≤    . For (2) notice that, from the definition of the Fell topology in Section 4 for any   > 0, we have that for large  , is compact in the relevant product topology. This implies that lim inf  inf    (, ) ≥    $P$ a.s. and, in conjunction with Lemma 9, that inf    (, ) →    $P$ a.s. Then using Lemma 10 set  *  =  *  −   which is obviously measurable and converges to zero $P$ a.s. (3) follows from (1), the single valuedness of  (), and the compactness of  .
The proof of Lemma 10 implies that the sequence $(\varepsilon_n^*)_{n \in \mathbb{N}}$ that appears in Proposition 8(2) is nonunique. However, this implication does not hold for every sequence of nonnegative random variables that $P$ a.s. converge to zero, and this is the reason why in what follows we can only prove the existence of strongly consistent indirect estimators among the set of the ones to be defined.

Upper Hemicontinuity of the Binding Correspondence.
Proposition 8 enables the use of the hemimetric distance $\delta(b(\theta), \beta_n^{\#})$ as the inversion criterion. The following assumption concerns the upper continuity of the binding correspondence, which along with the relevant properties of the hemimetric $\delta$ would imply the analogous continuity property for the particular inversion criterion and thereby facilitate the issue of existence and consistency of the IE to be defined.
Assumption 11. $b$ is upper hemicontinuous, that is, for any $\theta$ and $\theta^* \to \theta$, $\delta(b(\theta), b(\theta^*)) \to 0$.
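The following Python sketch illustrates the condition in Assumption 11 on a toy correspondence with finitely many values. Everything in it is hypothetical: `b_set_valued` is two-valued at 0 and single valued elsewhere, while `b_selection` is a single valued selection of it that jumps at 0. The check evaluates the one-sided distance from the value at 0 to the values along a sequence decreasing to 0.

```python
def hemimetric(k1, k2):
    """Max over y in k2 of the distance from y to the finite set k1."""
    return max(min(abs(x - y) for x in k1) for y in k2)


def b_set_valued(theta):
    """Hypothetical correspondence: two-valued at 0, single valued elsewhere."""
    if theta > 0:
        return [1.0]
    if theta < 0:
        return [-1.0]
    return [-1.0, 1.0]


def b_selection(theta):
    """A single valued selection of b_set_valued that jumps at 0."""
    return [1.0] if theta > 0 else [-1.0]


thetas = [10.0 ** (-k) for k in range(1, 6)]   # a sequence decreasing to 0

upper_gaps = [hemimetric(b_set_valued(0.0), b_set_valued(t)) for t in thetas]
selection_gaps = [hemimetric(b_selection(0.0), b_selection(t)) for t in thetas]
```

For the set valued map the gaps are all zero, so the condition holds at 0, whereas for the selection they stay at 2. In this toy case, keeping both limit points in the value at 0 is exactly what restores upper hemicontinuity, which gives some intuition for why allowing compact valued binding correspondences can be less restrictive than working with a selection.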
The following proposition provides sufficient conditions for this to hold. It essentially strengthens Assumption 5 in that it requires that the relevant $P$ a.s. epiconvergence be continuous on Θ. Notice that its requirements are also stricter with respect to measurability compared to the ones in Assumption 5, since the former requires that for any $\theta$ the relevant set of $P$ unit probability does not depend on the sequence that converges to $\theta$. replacing   ((),  #  (,   )) when this occurs. This means that the premises of Proposition 12 render  #  strongly continuously (upper in the first case or Fell in the second and third cases of Proposition 8)  ()-consistent. Furthermore the compactness of Θ then implies that these notions are also uniform with respect to Θ.
Remark 13. Proposition 12(1)-(2) would obviously be implied if   (, , ) is $P$ a.s. jointly continuous and converges jointly uniformly $P$ a.s. to  (, ). Since we allow    and/or   to assume extended values, the relevant notion of uniform convergence must also be extended as in Definition 7.12 of [26]. The following lemma provides a set of even weaker sufficient conditions than extended jointly uniform $P$ a.s. convergence when    has the form of an arithmetic mean with respect to stationary and ergodic processes.
Proof. Proposition 12(1) follows from the fact that the assumption framework of the lemma implies condition C 0 and thereby Theorem 2.3 of [23], which implies the joint $P$ a.s. epiconvergence of    to   0 . For Proposition 12(2) notice that the separability of   and the $P$ a.s. continuity of   0 imply the existence of a countable dense   * such that for any   and any   > 0 there exists a   * ∈   * such that lim sup By assumption the subset of Ω of $P$ unit probability can be chosen independent of   and   can be chosen arbitrarily small. Hence, Proposition 12(2) would be implied for    =  , if, for any  ,   * ∈   * , $P$ a.s. and any due to the countability of   * . Notice that ((  (,  * ) ∧ −) ∨ ) ∈Z is also stationary-ergodic for any   > 0 (see, e.g., Proposition 2.1.1 of [24]), and hence the uniform version of Birkhoff's LLN implies that converges $P$ a.s. to ∫  −  0 (, ,  * )() for any   > 0. Due to the separability of R the subset of Ω of $P$ unit probability can be chosen independent of  . Hence from Definition 7.12 of [26] we obtain (i).
This lemma explores sufficient conditions for the required continuity of   solely via restrictions on the behavior of    which in applications is generally more analytically tractable than  . Furthermore it combines joint epiconvergence with pointwise (on  ) extended uniform (with respect to  ) almost sure convergence. Finally notice that an analogous result would also hold if ergodicity is replaced by any kind of mixing condition that would justify the LLNs used in the previous proof or implied in Remark 13.

Definition, Existence, and Consistency of the Indirect Estimator.
We are now ready to define the IE and explore the issues of its existence and consistency. Proposition 8 along with the measurability of the auxiliary correspondence and the upper hemicontinuity of $b$ facilitate the use of the hemimetric distance $\delta(b(\theta), \beta_n^{\#}(\theta_0))$ for some distinguished $\theta_0 \in \Theta$, for the definition of the IE and the subsequent existence argument. Again an almost surely nonnegative random variable will assume the role of the "optimization error" in this second step of the estimation procedure.
where  #  is a nonnegative random variable defined on Ω.
The dependence of  #  on  0 follows from the analogous dependence of  #  which in turn follows from the dependence of    on  0 . Obviously in practice this dependence is not "visible."
We are initially concerned with the question of existence of the IE. We again suppress the dependence of    on   when there is no risk of confusion.
The fundamental selection theorem (Theorem 2.13 of [22]) would also enable the definition of the IE as a measurable function with values in Θ. Having established existence we turn to the issue of consistency. We need an assumption of indirect identification that is essentially derived from the form of the roots of the hemimetric used. Notice that this assumption along with the proof of the following proposition justifies Definition 15 by an analogy principle. Mathematically, neither the definition nor the existence argument requires the following assumption in order to be valid.
Remark 18. This condition is weaker than a condition of the form "If $\theta \neq \theta_0 \Rightarrow b(\theta_0) \cap b(\theta) = \emptyset$" and stronger than a condition of the form "If $\theta \neq \theta_0 \Rightarrow b(\theta_0) \neq b(\theta)$." The latter cannot be used due to the properties of the hemimetric upon which the definition of the IE is based. In the case that the binding correspondence is single valued, these become equivalent. This also makes evident the claim that if the auxiliary estimator is defined by a measurable selection of $\beta_n^{\#}$ the corresponding identification condition cannot be weaker than the one above.
The main result of the current section follows. It merely concerns the existence of strongly consistent IE inside the established framework. Denote by  ( 0 ) the arg min Θ   ((), ( 0 )) which is nonempty and compact due to the compactness of Θ, the properness of   , the hemicontinuity of  , and Lemma A.  2) correspond to the fact that the statistical model is only indirectly set identified given this framework. They are trivial when  ( 0 ) = Θ, and the closer to zero   ({ 0 }, ( 0 )) is, the more informative they become. We once again point out that the implications of Proposition 19(1) and (1*) merely explore the issue of the existence of strongly consistent estimators among those that comply with Definition 15. The properties of the    function along with Proposition 8 do not permit a stronger result without strengthening the assumption framework. Finally notice that this framework enables both the definition and the result on consistency of the IE to be derived via the use of the exact same notions that were used for the analogous results concerning the auxiliary one.

Extension.
In most cases $b$ is analytically unknown even if several of its properties, such as some of the ones discussed above, can be established. In these cases the estimators defined in Definition 15 are infeasible. However, it may be the case that (possibly) stochastic and algorithmically feasible approximations of the unknown $b$ can be used for the construction of several other classes of feasible IE. In such a context the results derived previously could be used to describe properties of such approximations that would imply that these estimators are well defined and that strongly consistent ones exist among them. Let $b_n(\omega, \theta)$ denote such an approximation. We readily obtain the following result for the indirect estimator defined by the substitution of $b$ with $b_n$ in Definition 15.

Proposition 20. Consider the IE defined by
where   ,  #  as before.
( +   ( () ,  ( 0 )) , $P$ a.s., where the first inequality follows from the triangle inequality and the second from the definition of   and the $P$ a.s. Fell convergence of  #  to  ( 0 ) and Lemma A.4. Due to the $P$ a.s. pointwise with respect to   Fell convergence of    to   we have that lim sup  (  (, ), ()) = 0 $P$ a.s. and therefore we obtain the needed result.
Notice that this proposition generalizes the results of Propositions 16 and 19. For a simple example consider the case where $b_n = \beta_n^{\#}$. This is possible when, by some sort of resampling technique (e.g., bootstrap or Monte Carlo), realizations of the    random elements are available to the practitioner for any $\theta$ and thereby so is $\beta_n^{\#}(\theta, \varepsilon_n)$ for any $\theta$ and some optimization error $\varepsilon_n$ independent of $\theta$. Then a feasible IE can be defined by the approximate minimization of the hemimetric distance $\delta(\beta_n^{\#}(\theta, \varepsilon_n), \beta_n^{\#}(\theta_0, \varepsilon_n))$ with respect to $\theta$. In this case the joint measurability of $\beta_n^{\#}$ would follow from the joint measurability of $c_n$ and $\varepsilon_n$, the separability of $B$, and the subsequent joint measurability of the relevant projection. The $P$ a.s. upper hemicontinuity of $\beta_n^{\#}$ would follow from an easy extension of the implication Proposition 8(1) if Assumption 1 is strengthened so that the mapping $\theta \to \mathrm{epi}\, c_n(\omega, \theta)$ is $P$ a.s. Fell continuous. The $P$ a.s. joint continuity of $c_n$ would suffice. Then the $P$ a.s. pointwise Fell convergence to $b$ would follow as in Proposition 8(2) or (3), and the $P$ a.s. continuous w.r.t. $\theta$ upper convergence to $b$ would follow if Proposition 12 holds with the set of unit probability independent of $\theta$. It would suffice that $c_n$ is $P$ a.s. jointly continuous and converges to $c$ jointly uniformly.
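The resampling idea described above can be sketched in a few lines of Python. The sketch assumes a hypothetical toy model in which observations are `exp(theta + z) - 1` with standard normal shocks `z`, a least squares auxiliary criterion whose minimizer is the sample mean, and a single valued auxiliary estimator, so that the hemimetric reduces to an absolute difference. All function names, the grid, the sample sizes, and the seeds are illustrative assumptions rather than part of the paper.

```python
import math
import random


def draw_shocks(n, seed):
    """Draw n standard normal shocks with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def sample_from_model(theta, shocks):
    """Hypothetical structural model: x_i = exp(theta + z_i) - 1."""
    return [math.exp(theta + z) - 1.0 for z in shocks]


def auxiliary_estimate(sample):
    """Minimizer of (1/n) * sum (x_i - beta)^2 over beta, i.e. the sample mean."""
    return sum(sample) / len(sample)


def feasible_indirect_estimate(beta_data, theta_grid, sim_shocks):
    """Grid minimizer of |beta_sim(theta) - beta_data|, with shocks held fixed across theta."""
    def distance(theta):
        return abs(auxiliary_estimate(sample_from_model(theta, sim_shocks)) - beta_data)
    return min(theta_grid, key=distance)


theta_0 = 0.1
observed = sample_from_model(theta_0, draw_shocks(20000, seed=1))
beta_data = auxiliary_estimate(observed)

theta_grid = [k / 50 for k in range(-50, 51)]     # grid over [-1, 1]
sim_shocks = draw_shocks(20000, seed=2)           # reused for every theta
theta_hat = feasible_indirect_estimate(beta_data, theta_grid, sim_shocks)
```

Holding the simulated shocks fixed across the grid keeps the simulated auxiliary estimate continuous in the parameter, which is in the spirit of the joint continuity requirement discussed above. The grid search plays the role of an approximate minimization with a nonzero optimization error.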

Examples
In this section we consider four simple examples that illustrate some of the previous results. The first concerns the case of a linear semiparametric model, the second a model comprised of Lévy processes, and the final two emerge in the context of conditionally heteroskedastic ones. In each of these, Θ is a compact subset of R   and $B$ a compact subset of R  . In the second and the fourth ones the binding function is actually single valued (hence a fortiori compact valued) and 1-1, enabling the direct application of Proposition 19(2*). The first and second examples include cases in which the IE can be interpreted as performing "inconsistency" correction to the auxiliary one.
Example 21 (semiparametric linear model with linear auxiliary). Consider the   ×   and   ×   dimensional random matrices  () and  (), respectively, where   ≥   ≥  . Suppose that (  /) →     , (  /) →     , (  /) →      $P$ a.s., where rank(    ) = rank(    ) =   and   ≤   ≑ rank(    ) ≤  . For   an   × 1 random vector, let the underlying statistical model be the set of "regressions"  (, ) =   +  ,   ∈ Θ. For   a large enough compact and convex subset of R   and any   ∈  , let   (, , ) = (1/)( − )  ( − ), which clearly satisfies Assumption 1 due to continuity with respect to   and the compactness of  . Obviously,    is constructed by the auxiliary set of regressions with respect to  . Proposition 4 ensures the existence of    which in the light of the previous can be interpreted as an OLSE in the context of the auxiliary model. Let P : R   →      be the (generally nonlinear) projection defined by the optimization problem arg min for   in R  . P is well defined due to the compactness and the convexity of   and the linearity and continuity of     and continuous. Furthermore, for any   ∈ col(    ), consider the linear system      =  , which is always satisfied by any member of the coset   +   − , where   is a matrix of rank   and   − is a   − -dimensional subspace of R  , which is trivial if and only if   =   whereas   =   −1    and maximal in the case that   =  . (2) implies that any IE defined by Definition 15 can be perceived as an "inconsistency corrector" of the underlying OLSE for  .
We now consider the estimation of the drift of a continuous time càdlàg process.
Example 22 (the drift of a Lévy process with bounded jumps). Let   denote a standard Brownian motion and V a finite measure on the Borel algebra of R − {0}, such that Obviously V is a Lévy measure (see paragraph 1.2.4 of [27]). For   = 1 consider the stochastic process on R + defined by the following Lévy-Itô decomposition (see Theorem 2.4.16 of [27]) where   denotes the independence to ) the existence of which is established by Theorem 2.3.6 of [27]. Let the underlying statistical model be the set of the previous stochastic processes and for   a large enough compact subset of R and any   ∈  , let   (, , ) = (1/) ∑  =1 (  − ) 2 , where    = exp(   −   −1 ) − 1. This can be perceived to emerge as an approximate likelihood function of the auxiliary model that contains the relevant discretizations of the processes that satisfy the SDE for each   ∈  . Obviously Assumption 1 is satisfied, due to continuity with respect to   and the compactness of  . Proposition 4 ensures the existence of    which in the light of the previous can be interpreted as an (approximate) MLE in the context of the auxiliary model. Furthermore since and for Due to the definition of   the process   is i.i.d. and this, along with the compactness of Θ and   and the existence of the previous moments, implies the joint uniform $P$ a.s. convergence of    to which implies both Assumptions 5 and 11 (via Proposition 12 and Remark 13).
In this case Assumption 17 applies and therefore Proposition 19(2 * ) implies that any IE defined by Definition 15 is consistent for any  0 ∈ Θ.When V([ 1 ,  2 ]) = 0 (and therefore V = 0) whereas  = 1 the IE can be perceived as an "inconsistency corrector" of the underlying MLE for the estimation of the drift of a geometric Brownian motion (see, e.g., paragraph 6.1.1 of [13]).
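A small numerical sketch can make the "inconsistency corrector" reading concrete in the jump-free case. The code below assumes, purely for illustration, that the increments of the process are i.i.d. normal with mean `theta` and unit variance; the exact drift parameterization of the example is not recoverable from the text, so this is an assumption. Under it, the least squares auxiliary estimator is the sample mean of `exp(increment) - 1`, its limit is `exp(theta + 1/2) - 1`, and the indirect estimator inverts this map. The names `binding` and `invert_binding` are hypothetical.

```python
import math
import random


def binding(theta):
    """Assumed binding function: E[exp(theta + z) - 1] for z standard normal."""
    return math.exp(theta + 0.5) - 1.0


def invert_binding(beta):
    """Inverse of the assumed binding function, defined for beta > -1."""
    return math.log(1.0 + beta) - 0.5


rng = random.Random(0)
theta_0 = 0.1
n = 200000
x = [math.exp(theta_0 + rng.gauss(0.0, 1.0)) - 1.0 for _ in range(n)]

beta_hat = sum(x) / n                 # auxiliary (least squares) estimator
theta_hat = invert_binding(beta_hat)  # indirect estimator: solves binding(theta) = beta_hat
```

In this sketch `beta_hat` settles near `binding(theta_0)`, which is far from `theta_0`, while `theta_hat` settles near `theta_0`. Since the assumed binding function is single valued and 1-1, minimizing the distance between `binding(theta)` and `beta_hat` amounts to this inversion whenever `beta_hat` lies in the range of the binding function.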
For the last pair of examples, let   : Ω → R Z be an i.i.d. (doubly infinite) sequence of random variables, with   0 = 0 and   2 0 = 1. Consider a random element   2 : Θ × Ω → (R + ) Z , with the product space Θ × Ω equipped with B(Θ)⊗J, with   2  () independent of (  ) ≥ , for all   ∈ Z, for all   ∈ Θ. Analogously, define the random element   : Θ × Ω → (R) Z as Then for all   ∈ Θ, (  ()) ∈Z is called a conditionally heteroskedastic process, while the random element (  ()()) ∈Z,∈Θ is called a conditionally heteroskedastic model. Our examples will solely concern ergodic heteroskedastic models. (The establishment of the ergodicity is initiated by the analogous establishment for ( 2  ()) ∈Z for all   ∈ Θ. Sufficient conditions for that are described and employed in a variety of heteroskedastic models in Chapter 5 of [24] via Theorem 5.2.1. Then the ergodicity of (  ()) ∈Z and ( 2  ()) ∈Z for all   ∈ Θ follows from the definition of  ,  , the previous assumption, and Proposition 2.2.1 of [24].)
Consider the random vector  () = ( 2  ()) ∈{1,...,} and the   × 2 dimensional random matrices jointly measurable with respect to J⊗B(Θ), where   > 2 and ergodic for any   ∈ Θ. For   = ( let   (, , ) = ‖(1/)  ()(() − ())‖ which clearly satisfies Assumption 1 due to joint continuity with respect to (, ), the compactness of  , the joint measurability, and the fact that    is defined via composition with a norm. This consideration is motivated by the AR(1) representation of the ARCH(1) process with respect to the martingale difference noise V   = ( 2  − 1) 2  () (see, e.g., [28]), and    can be perceived to emerge from an auxiliary model that consists of the set of "auxiliary" regression functions of   on  , along with the instrumental variables appearing in the columns of  , where the  th element in any column is clearly orthogonal to V   for   ≤  . Proposition 4 ensures the existence of    which in the light of the previous sentence can be interpreted as an IV estimator in the context of the auxiliary model. Due to the compactness of  , the definition of the ARCH(1)  ( .
In fact a simple calculation shows that where proj  2 denotes projection to the  2 -axis and mid() denotes the midpoint of the smallest interval that contains .Notice that in our case mid(proj  2 (  )) is well defined due to the fact that   is  a.s.compact valued hence its proj  2 (  ) is a  a.s.compact subset of the real line.Finally and due to the fact that bootstrap resampling techniques are readily available in the context of this model, Proposition 20 implies also the analogous properties for IE defined by   when this equals the auxiliary estimator derived from bootstrap resampling for any .
The final example is about an asymmetric heteroskedastic process.
Example 24 (−   is the quasilikelihood function of an approximate to QARCH(1) model). Let |  0 | < 1 and   > 0, and consider for   ∈ Θ = [−, 0]   0 ,   0 > 0,   =   0 + ( 2 /4 0 ), and   0 < exp(−2 ln | 0 |) the stochastic difference equation For any   ∈ Θ, the previous defines a unique stationary and ergodic QARCH(1) volatility process with existing log moments that is uniformly bounded from below away from zero (see Lemmas 2.1 and 3.4 and Remark R.2 of Arvanitis and Louka [29]). Notice that Jensen's inequality allows   0 ≥ 1 which in turn implies that   2  () = +∞. For   * ∈   = [0, ] consider the process defined by where ℎ  (,  * ) is well defined due to the definition of   and it is stationary and ergodic with existing log moments due to the previous and Proposition 2.1.1 of [24]. Now consider where −   can be considered as an approximation of (a monotonic transformation of) the conditional quasilikelihood function of the auxiliary conditionally heteroskedastic model defined by 3 and  . Also the ergodicity of (  ) for any (,  * ) follows from the previous and Proposition 2.1.1 of [24]. (In practice   (, ,  * ) is unknown but approximated by an analogous ĉ (, ,  * ) dependent on nonergodic solutions of the stochastic difference equation that defines ℎ based on arbitrary initial conditions. In this case, due to ergodicity, Proposition 5.2.12 of [24] for   an arbitrary member of the partition. Notice also that Notice that  (0,  * ) is uniquely minimized at   * = 0 (see, e.g., Part 1 of the proof of Theorem 5.3.1 of [24] to obtain the analogous arguments along with the fact that ((+ = 1 if and only if   * = 0). When   ̸ = 0 then  (, −) <  (, 0) due to the fact that and that when   > 0, then ln(1 + ) <  . Furthermore, using the fact that by 3 h is $P$ a.s. twice differentiable with respect to   * for   * ̸ = 0 and since establishing along with the previous that  () = {−}. This validates simultaneously both Assumptions 11 and 17.
Notice that Assumption 11 could also be verified by the use of Lemma 14 due to the continuity of  2  and ℎ  with respect to the parameters, the existence of log moments, and the fact that ℎ  is uniformly bounded from below away from zero. Then Proposition 19(2*) implies that any IE defined by Definition 15 is consistent for any  0 ∈ [−, 0].
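The lower bound on the volatility process used in Example 24 can be checked numerically. The sketch below assumes the standard QARCH(1) recursion h_t = omega + theta * y_{t-1} + alpha0 * y_{t-1}^2 with omega = omega0 + theta^2 / (4 * alpha0) and i.i.d. standard normal innovations. The symbol names, the Gaussian innovations, the zero initial condition, and the function name `simulate_qarch1` are all assumptions made for illustration, since the displayed equation is not reproduced in the text. Under these assumptions, completing the square gives h_t = omega0 + alpha0 * (y_{t-1} + theta / (2 * alpha0))^2, so h_t is bounded below by omega0.

```python
import random


def simulate_qarch1(theta, omega0, alpha0, n, seed=0):
    """Simulate an assumed QARCH(1) recursion (hypothetical parametrization).

    y_t = sqrt(h_t) * z_t,  z_t i.i.d. N(0, 1)  (assumption)
    h_t = omega + theta * y_{t-1} + alpha0 * y_{t-1}**2
    omega = omega0 + theta**2 / (4 * alpha0)
    """
    rng = random.Random(seed)
    omega = omega0 + theta ** 2 / (4.0 * alpha0)
    y_prev = 0.0  # arbitrary initial condition, so the path is not the stationary solution
    y_path, h_path = [], []
    for _ in range(n):
        h = omega + theta * y_prev + alpha0 * y_prev ** 2
        y_prev = (h ** 0.5) * rng.gauss(0.0, 1.0)
        y_path.append(y_prev)
        h_path.append(h)
    return y_path, h_path


# Illustrative parameter values only; they are not taken from the paper.
y, h = simulate_qarch1(theta=-0.8, omega0=1.0, alpha0=0.5, n=2000, seed=1)
lowest_volatility = min(h)  # should not fall below omega0, up to rounding
```

The simulated path starts from an arbitrary initial condition, which mirrors the remark in the example that in practice one works with nonergodic solutions of the recursion.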

Conclusions
In this paper we generalize the definition of IE and address the questions of existence and strong consistency. We allow for cases where the binding function is a compact valued correspondence. We have used conditions that concern the asymptotic behavior of the epigraphs of the criteria involved in the relevant procedures, a relevant notion of continuity for the binding correspondence, and an indirect identification condition that restricts the behavior of the aforementioned correspondence. These results are generalizations of the analogous ones in the relevant literature and hence permit a broader scope of statistical models.
First, notice that our framework could still be extended in the following manner. The established results would remain almost intact if the underlying parameter spaces were only locally compact under more restrictive assumptions on the behavior of the criteria involved. In such a case Proposition 4.2.1.(i) of [30] would permit the validity of the results,

  (, ) ≤   (,   ) +   (  ,   ) +   (,   ) establishing that lim inf    (  ,   ) ≥   (, ). Note that despite the fact that the image of  may include infinities, epi() is by definition a subset of  × R. If and only if  is lower semicontinuous (lsc) we have that, due to Proposition A.2 of [22], epi() ∈ ( × R) with respect to the obvious product topology. Hence any relevant lsc function can be identified with its epigraph, which in turn lies in a space endowed with the Fell topology, which in turn implies a notion of convergence. It is easy to see that uniform convergence implies epiconvergence (see, e.g., Remark 6 above). Furthermore, the relevant set of lsc functions is closed with respect to the Fell topology. This notion is particularly suitable for the description of the asymptotic behavior of the set of minimizers of sequences of lsc functions (see Theorem 3.4 of [22] along with Theorem 7.1.4 of [30], Definition D.1 and Proposition D.2 of [22]).
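The link between uniform convergence, epiconvergence, and the behavior of minimizers can be illustrated with a small numerical sketch. The functions below are hypothetical and chosen only for illustration: f_n(x) = |x - 1/n| + 1/n on the compact interval [-1, 1] converges uniformly to f(x) = |x|, since the sup distance is at most 2/n, and so it also epiconverges. The grid search in `argmin_on_grid` is an assumed stand-in for exact minimization.

```python
def argmin_on_grid(f, lo, hi, m=2001):
    """Return a grid point in [lo, hi] that minimizes f over m equally spaced points."""
    grid = [lo + (hi - lo) * i / (m - 1) for i in range(m)]
    return min(grid, key=f)


def f_limit(x):
    """Limit criterion f(x) = |x|, uniquely minimized at 0."""
    return abs(x)


def make_f_n(n):
    """Hypothetical criterion f_n(x) = |x - 1/n| + 1/n, uniquely minimized at 1/n."""
    return lambda x: abs(x - 1.0 / n) + 1.0 / n


# Approximate minimizers of f_n for increasing n, and of the limit f.
minimizers = [argmin_on_grid(make_f_n(n), -1.0, 1.0) for n in (1, 10, 100, 1000)]
limit_minimizer = argmin_on_grid(f_limit, -1.0, 1.0)
```

In this toy case the approximate minimizers of f_n move toward the minimizer of f as n grows, which is the kind of behavior that epiconvergence on a compact space is meant to capture.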

A.3. Closed and Compact Valued Correspondences-Random Closed Sets.
A closed valued correspondence is by definition a representation of an underlying function  from a set Ω to  0 () (i.e., a closed valued multifunction with domain the set Ω), when this is considered as a relation in Ω × . The benefit of not directly working with the underlying function is the fact that we can consider the graph of the correspondence as the set {(, ) :  ∈ ()} which resides in Ω ×  instead of the set {(, ϝ) : ϝ = ()} inside Ω ×  0 (). When () is compact for any , then the correspondence is termed compact valued. In the following we do not make an explicit distinction between the correspondence and the underlying multifunction.
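The two ways of viewing the graph can be made concrete on a finite toy example. In the sketch below a correspondence is stored as a dictionary from points of a finite domain to finite sets of values. Finite sets are compact, so this is a compact valued correspondence. The names `graph_as_relation` and `graph_as_function` and the sample data are hypothetical and serve only as illustration.

```python
def graph_as_relation(corr):
    """Graph as a set of (point, value) pairs, a subset of the product of domain and value space."""
    return {(w, x) for w, values in corr.items() for x in values}


def graph_as_function(corr):
    """Graph of the underlying set valued function, a set of (point, set) pairs."""
    return {(w, frozenset(values)) for w, values in corr.items()}


# Toy correspondence: "a" is mapped to a two point set, "b" to a singleton.
corr = {"a": {0, 1}, "b": {2}}

relation_graph = graph_as_relation(corr)   # pairs (point, element of the value set)
function_graph = graph_as_function(corr)   # pairs (point, whole value set)
```

The relation view keeps every object inside the product of the domain and the value space, whereas the function view requires working with sets of sets, which is the inconvenience the text refers to.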
The Borel σ-algebra on  0 () generated by T  will be abbreviated by B(  ) and is usually termed the Effros algebra (see Paragraph 1.1 of [22]). If (Ω, J) is a measurable space, then  is a random closed set if and only if { ∈ Ω : () ∈ ϝ} ∈ J for any ϝ ∈ B(  ). Analogously we abbreviate by B(  ) the Borel σ-algebra on  0 () generated by T  and by B(  ×  ) the Borel σ-algebra on  0 () ×  0 () generated by the product topology described in Lemma A.6. Finally denote by B(R) the Borel σ-algebra of the extended real numbers with respect to the usual topology.
Lemma A.9. If  is compact, separable, and metrized by , then   is B(R)/B(  ) ⊗ B(  ) measurable.
Proof. The separability of  implies the separability of ( 0 (), T  ) and ( 0 (), T  ), for if {  ,  = 0, 1, ...} is dense in  then the countable subset of  0 (), {{  },  = 0, 1, ...}, intersects any basic open set with respect to either topology. This implies the separability of  0 () ×  0 () when equipped with the topology discussed in Lemma A.6. This in turn implies that the Borel σ-algebra with respect to the product topology on  0 () ×  0 () coincides with B(  ) ⊗ B(  ) by Lemma 1.4.1 of [33]. The rest follows by Lemma A.6 along with the fact that the sets in the subbase of the upper topology of R generate B(R). (It is also possible to prove that in the context of separability B(  ) = B(  ).)

Definition A.8.
A sequence (  ) of lsc functions epiconverges to  (    → ) if and only if epi(  ) → epi() with respect to the Fell topology.