Decision-theoretical formulation of the calibration problem

The choice of calibration policy is of basic importance in analytical chemistry. A prototype of the practical calibration problem is formulated as a mathematical task and a Bayesian solution of the resulting decision problem is presented. The optimum feedback calibration policy can then be found by dynamic programming. The underlying parameter estimation and filtering are solved by updating relevant conditional distributions. In this way: the necessary information is specified (for instance, the need for knowledge of the probability distribution of unknown samples is clearly recognized as the conceptually unavoidable informational source); the relationship of the information gained from a calibration experiment to the ultimate goal of calibration, i.e., to the estimation of unknown samples, is explained; an ideal solution is given which can serve for comparing various ways of calibration; and a consistent and conceptually simple guideline is given for using decision theory when solving problems of analytical chemistry containing uncertain data. The abstract formulation is systematically illustrated by an example taken from gas chromatography.


Introduction
The correct choice of calibration method is of great importance in analytical chemistry and many papers can be found on this topic (see, for example, Hilton et al. [1], Kateman [2], Hunter and Lamboy [3]). The usual formulation of the calibration problem considers calibration as a separate preliminary phase of the measurement process which results in an estimate of some parameters of the calibration graph. In the course of the real measurement this calibration result is used only for the determination of the desired quantities from the measurement results.
When analysing this seemingly simple (at least from the statistical point of view) problem in depth, we found that the full range of statistical decision theory is needed to formulate it. Much remains to be done in order to construct a feasible suboptimum calibration policy, as the optimum one is the well-known, non-feasible dual control design.
The paper is organized as follows. Based on the practical calibration task, the exact formulation of the problem is given as a special case of the statistical decision model. The solution takes the form of a dynamic-programming optimization. The identification and filtering problems which the optimization requires are discussed in the final section before the conclusions.
The concepts in the paper are illustrated by a simple calibration example arising in gas chromatography. The parts related to this example are printed in italics.
The practical calibration problem

A measurement device evaluates a set of samples. The result of the tth measurement is denoted by y(t), where t is called (discrete) time. The goal of the measurements is the estimation of a related quantity, say x(t). The mapping

C: x(t) --> y(t) (1)

characterizes the measurement device and it is stochastic in nature. The measurement process is to be designed in such a way that the relation (1) is: as deterministic as possible; (almost) invertible, allowing the construction of the backward mapping

C^-1: y(t) --> x(t) (2)

simple enough to be practically feasible; and stable through the time course.
In automated analytical measurements of large sequences of samples, the above two phases cannot be separated artificially. In automated laboratories it is common practice to insert known samples into a sequence of unknown samples in order to control the validity of the previous calibration and/or to re-calibrate the measurement system. There is a need to determine the optimum way of performing such checking and re-calibration. The aim of this paper is to propose a consistent formulation of this problem which enables the optimum calibration policy to be established.
Fairly often the deterministic constituent of the mapping C is an (almost) linear function characterized by an absolute term and by a slope. The calibration problem arises when this pair or, more generally, a finite-dimensional vector of parameters w fully determining the mapping C is unknown.
The calibration is usually performed in such a way that to the sequence of samples with unknown values of x(t), a portion is added with known values of x(t) (control samples). The results of the calibration measurements are then used to specify C more completely (by estimating the vector of unknown parameters and by substituting its point estimate for the unknown parameter vector). In gas chromatography, the tth measurement result y(t) can be the area or the height of the peak corresponding to the component to be determined. The estimated quantity x(t) is the concentration of that component.
The mapping C is the relationship between the measured peak area or the peak height and the concentration of the component. It is usually assumed to be a linear function (the case of linear calibration), i.e., of the form

y(t) = ax(t) + b + e(t)

where a and b are constants and e(t) is a random measurement error with zero mean.
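To make the model concrete, the following minimal sketch simulates such a linear measurement channel; the numerical values of a, b and the dispersion r are illustrative assumptions, not values taken from the paper.

```python
# A minimal sketch of the linear calibration model y(t) = a*x(t) + b + e(t).
# The values of a, b and the error dispersion r are illustrative assumptions,
# not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
a, b, r = 2.0, 0.5, 0.01        # slope, absolute term, error dispersion

def measure(x):
    """Simulated peak area (or height) produced by a sample of concentration x."""
    return a * x + b + rng.normal(0.0, np.sqrt(r))

print(measure(1.0))             # one noisy measurement of a known sample
```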
The unknown vector parameter w consists of a, b and of the parameter(s) characterizing the error distribution (typically, the dispersion and the mutual correlation of measurement errors for different samples). See [4] for more detailed information about this topic. The vital part of the solution, i.e., Bayesian identification (presented in control engineering terms), can be found in Peterka [5].
Our explanation consists essentially of the specification of the key concepts of decision theory, such as the sample space, the state space and the decision space, in terms of the optimum calibration policy design. We start with a description of the variables met throughout calibration. They are embedded in three (overlapping) 'spaces' according to the way in which they influence the calibration and measurement process.

Sample space
A sequence of outputs y(1...M) = (y(1), y(2), ..., y(M)) measured within a given finite-time horizon M and the control samples x(ti) at checking time instants ti ∈ {1, 2, ..., M} are the sources of measured information for the calibration task; they form the so-called sample space. We assume that both the measured outputs y(t) and the inputs x(ti) are real scalars.
In our example, the sample space consists of all measured data, i.e., of the peak areas or the peak heights and of the concentrations of the control samples.

State space

The unknown sample values x(1...M) and the possible random mappings C form the so-called state space of the calibration problem. To avoid the subtleties of non-parametric estimation, we assume that the mapping C is completely characterized by a finite-dimensional vector of unknown parameters w, i.e., the state space consists of the possible values of x(1...M) and w.
Note that the control samples {x(ti)} belong both to the sample and state spaces, and are the directly measurable state entries.
The concentrations of the unknown and of the control samples are the x-component of the state space. The random mapping C is 'parametrized' by assuming that the case of linear calibration is relevant for gas chromatography and that the measurement noise e(1...M) has normal, mutually uncorrelated entries with a common dispersion r. The parameter vector completing the state of the problem is then w = (a, b, r).
Decision space

The triplet of decisions u(t) = (u1(t), u2(t), u3(t)) has to be accepted at each moment t. The entries are given the following meaning:

u1(t): a known sample will be/will not be added (3)
u2(t): the value of the known sample chosen (4)
u3(t): an estimate of the unknown quantity x(t) (5)

Any reasonable calibration policy has to balance its contribution to the measurement precision against its cost (in terms of time, money, energy, etc.). For the automated design of the calibration, the degree of (dis)balance must be quantified. The following two elements of the discussed decision model are concerned with this aspect.

Loss function
The loss function J is a scalar function of the data d(1...M), where d(t) = (y(t), u(t)), and of the state x(1...M) (the parameter w is not an argument of the loss function as it is of secondary interest in the case treated).
The following additive form of the loss is appropriate for the majority of practical cases:

J(d(1...M), x(1...M)) = Σ q(d(t), x(t)) (6)

where the sum runs over t = 1, ..., M. The partial loss related to the tth measurement, q(t) = q(d(t), x(t)), should usually be a smooth function of both arguments and a function bounded from below (hence it can be scaled to be non-negative).
The partial loss can often be structured as

q(d(t), x(t)) = c(u2(t)) + l(u3(t), x(t)) (7)

where: the first term, given by the function c(.), which depends generally on the value of the control sample u2(t), accounts for the overall 'price' of the control sample (both the price of preparing a control sample and the loss in measuring productivity); and the second term, given by the function l(.,.), should reflect the loss due to the mis-determination of the unknown sample. It is reasonable to assume that this function has its absolute minimum for the correctly determined value of the inspected sample, i.e., for u3(t) = x(t).
Under the above widely acceptable assumptions, the loss l(.,.) measuring the estimation error of an unknown sample can be approximated (using Taylor's expansion and proper scaling) by the quadratic function c_q (u3(t) - x(t))^2 with an appropriate weight c_q.
Taking into account that the function c(u2(t)) can often be approximated by a constant, the partial loss can be written in the simple form

q(t) = c u1(t) + (u3(t) - x(t))^2 (8)

where the constant c now has the meaning of a relative price of the control sample versus the price of a particular error in the determination of the estimated concentration.
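As an illustration, the simplified partial loss can be coded directly; the indicator convention for u1(t) and the numerical value of c below are assumptions made for the sketch.

```python
# Sketch of the simplified partial loss (8). The indicator convention for
# u1(t) and the numerical value of c are assumptions made for illustration.
def partial_loss(u1, u3, x, c=0.2):
    """u1: 1 if a control sample is inserted, 0 otherwise;
    u3: the estimate of the unknown quantity; x: its true value."""
    return c * u1 + (u3 - x) ** 2

print(partial_loss(u1=1, u3=0.0, x=0.0))   # pure control-sample cost
print(partial_loss(u1=0, u3=1.1, x=1.0))   # pure estimation error
```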

Criterion
The loss function assigns values to the various calibration and estimation outcomes after the measurement; it is therefore not suitable for an a priori comparison of possible calibrations and hence not suitable for design purposes. For this reason, at the design stage we shall minimize the expected value of the above loss function, taking as the optimized criterion

J̄ = E{J} (10)

In this notation, the definition (10) denotes the expected value of the loss function with no available measured information, i.e., with prior information only.
The expectation has to convert the loss function into a performance index which can be evaluated a priori; it has to 'remove' all the random and unknown quantities which prevent an a priori evaluation of the consequences of the proposed decision. In this respect the unknown and random constituents of the relation u(t) --> y(t) are operationally indistinguishable. This supports the basic Bayesian postulate, which states that random and unknown quantities should be described in the same probabilistic way. This means that the expectation is applied both to random quantities in the usual sense and also to unknown parameters.
In the illustrative example, the expectation is taken with respect to the measurement errors e(1...M), the unknown sample values and the unknown parameters w. This Bayesian formulation has the significant advantage that no policy exists which would achieve a uniformly smaller loss (irrespective of the state realization) than does the Bayesian calibration policy.
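Since the criterion is an expected value taken over all random and unknown quantities, it can be approximated a priori, for any fixed policy, by straightforward Monte Carlo averaging. The sketch below evaluates one such fixed (non-optimized) policy under assumed distributions; all numerical values are illustrative.

```python
# Sketch: the criterion (10) can be approximated a priori, for any fixed
# policy, by Monte Carlo averaging of the loss over draws of the random and
# unknown quantities. The distributions, the policy and all numbers below
# are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
a, b, r = 2.0, 0.5, 0.01      # assumed true calibration graph and dispersion
xbar, R, c, M = 1.0, 5.0, 0.2, 20

def run_policy():
    """Fixed policy: calibrate once, then estimate by inverting the graph."""
    loss = c                                       # one control sample
    for _ in range(M - 1):
        x = rng.normal(xbar, np.sqrt(r * R))       # unknown sample
        y = a * x + b + rng.normal(0.0, np.sqrt(r))
        u3 = (y - b) / a                           # estimate of x
        loss += (u3 - x) ** 2
    return loss

print(np.mean([run_policy() for _ in range(1000)]))  # approx. c + (M-1)*r/a**2
```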

Admissible decision rules
The last but very important part of the formalization specifies the admissible decision rules. When making decisions under uncertainty, the information available for each decision is the dominant optimization restriction.
The particular entry ui(t) can be chosen by a (generally random) mapping to the set Ui from its information set:

u1(t): d(1...t-1) --> U1 (13)
u2(t): (d(1...t-1), u1(t)) --> U2 (14)
u3(t): (d(1...t-1), u1(t), u2(t), y(t)) --> U3 (15)

The above restrictions formalize that the different entries of u(t) can use different information sets. This fact can be explained easily if one takes into account the interpretations of the entries (see the discussion of the decision space). Note that this structure is not usual for standard formulations of decision tasks.
Having measured the peak area or peak height data up to the time t-1, we have to decide first whether a known or an unknown sample will be measured next [u1(t)]. The measured data can be used for this decision.
If we have decided to measure a known sample, i.e., if we know not only the data up to the time t-1 but also the value of u1(t), we have to choose the value of the known sample concentration u2(t).
Selecting an unknown sample [u1(t) = 0], its estimation, i.e., the choice of u3(t), is additionally based on the measured sample value y(t).
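The nesting of the information sets (13)-(15) can be made explicit by giving each decision its own function signature, as in the following sketch; the decision rules inside are crude placeholders, not the optimum policy derived in the paper.

```python
# Sketch of the nested information sets (13)-(15): each decision function
# sees strictly more information than the previous one. The decision rules
# inside are crude placeholders, not the optimum policy.
def choose_u1(past_data):
    """u1(t) from d(1...t-1) only: e.g., recalibrate every fifth measurement."""
    return 1 if len(past_data) % 5 == 0 else 0

def choose_u2(past_data, u1):
    """u2(t) may additionally use u1(t): e.g., a fixed control concentration."""
    return 1.0 if u1 == 1 else None

def choose_u3(past_data, u1, u2, y, a=2.0, b=0.5):
    """u3(t) may additionally use y(t): e.g., invert an assumed known graph."""
    return (y - b) / a if u1 == 0 else None

print(choose_u3([], u1=0, u2=None, y=2.5))   # estimate x from a measured area
```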

Summary of the formal calibration problem
Having specified all the elements of the formal calibration problem, we are ready to summarize its formulation as follows: among the admissible decision rules (13), (14) and (15), find the calibration policy which minimizes the expected value (10) of the additive loss function (6).

Conceptual solution of the calibration problem
The most effective procedure for the design of the optimum calibration policy, with the information restrictions of the type (13), (14) and (15) on the admissible decision rules, is dynamic programming. We shall write it in the form specific to our case.
It can be shown that the optimum calibration policy can be taken as a non-randomized one. For any fixed data from the admissible information set, the optimally chosen decision is the minimizing argument in the following functional (Bellman function):

V(d(1...t-1)) = min E{q(d(t), x(t)) + V(d(1...t)) | d(1...t-1), u(t)}, V(d(1...M)) = 0

where the minimum is taken over the admissible decision rules.
When illustrating the above optimization, we assume that the necessary moments of the treated random variables are available. Their construction is described in the next section. The Bellman functions are functions of the data with respect to which the expectation and minimization should be performed backwards for t = M, M-1, .... A more complete discussion of various aspects of the design, as well as further references, can be found in Kárný et al. [6].
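The backward character of the recursion can be illustrated on a drastically simplified toy problem in which the whole information state is condensed into a single scalar (an assumed posterior variance of the slope); the dynamics and costs below are invented for illustration and do not replace the exact, generally non-feasible, dual-control design.

```python
# A toy sketch of the backward Bellman recursion over an 'information state'.
# The whole information state is condensed into one scalar s, an assumed
# posterior variance of the slope: calibrating (u1 = 1) costs c and shrinks s,
# measuring an unknown sample (u1 = 0) incurs an expected squared error
# proportional to s. Dynamics and costs are invented for illustration.
import numpy as np

M, c = 10, 0.2                        # horizon, relative control-sample cost
grid = np.linspace(0.01, 1.0, 100)    # discretized posterior variance s

def shrink(s):
    """Assumed effect of one calibration measurement on the variance."""
    return s / (1.0 + s)

V = np.zeros(len(grid))               # terminal condition V(M) = 0
policy = []
for t in range(M, 0, -1):
    V_cal = c + np.interp(shrink(grid), grid, V)   # u1 = 1: pay c, shrink s
    V_est = grid + V                               # u1 = 0: expected error ~ s
    policy.append(V_cal < V_est)                   # calibrate where cheaper
    V = np.minimum(V_cal, V_est)

mask = policy[-1]                     # decision rule at t = 1
print("calibrate at t=1 for s >=", grid[mask].min() if mask.any() else "never")
```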

State estimation for calibration purposes
The dynamic programming consists of a sequence of taking the conditional expectation and its minimization. Let us study which information is needed for these evaluations. The state estimation task will be formulated and also solved in this way. It splits into the measurement-output prediction, which relies on the parameter estimation (also called identification), and into the estimation of the unknown samples (called filtering). It exemplifies that model building and state estimation have to supply the predictive c.p.d.f.

p(y(t) | t-1; u1(t), u2(t)) (26)

and the filtering c.p.d.f.

p(x(t) | t-1; y(t), u1(t), u2(t)) (27)

It is advantageous to deal with the joint c.p.d.f.

p(x(t), y(t) | t-1; u1(t), u2(t)) (28)

which can be viewed as the outer probabilistic description of the calibration graph. The term 'outer' stresses that the description is based on the prior information and the measured data only. This model can be expressed as follows:

p(x(t), y(t) | t-1; u1(t), u2(t))
= ∫ p(x(t), y(t), w | t-1; u1(t), u2(t)) dw
= ∫ p(y(t) | t-1; x(t), w, u1(t), u2(t)) p(x(t) | t-1; w, u1(t), u2(t)) p(w | t-1; u1(t), u2(t)) dw
= ∫ p(y(t) | x(t), w) p(x(t) | w, u1(t), u2(t)) p(w | t-1) dw (29)

The first two equalities in the above equation are implied by elementary calculus with c.p.d.f.s. The simplifications in the last step are a consequence of assumptions which are discussed separately for each of the three factors involved. These c.p.d.f.s have definite meanings, reflecting the modelling of the measurement process, the choice of the measured samples and the Bayesian accumulation of experience.
The probabilistic model of the measurement process. The output y(t) of the measurement is assumed to be fully determined by the value of the measured sample x(t) and by the parameters w. No additional information for the output prediction can be gained by observing the data d(1...t-1) and u1(t), u2(t). In probabilistic terms, the pair (x(t), w) is a sufficient statistic for the output prediction. It should be noted that the samples could be assumed to be independent of w, i.e., of the measuring device. Sometimes, however, it is advantageous to relate the parameters of the p.d.f. describing the distribution of x(t) to the precision of the measuring device. Moreover, by extending the parameter vector w by parameters characterizing the distribution of the unknown samples, incomplete knowledge of their distribution can also be faced. For simplicity, this approach will not be followed here.
The choice of the measured samples. The newly introduced c.p.d.f. p(x(t) | w, u1(t), u2(t)) simply describes the distribution of an unknown sample typical for the measured concentrations. For instance, p(x(t) | w) = N_x(t)(x̄, rR) can often be used. The mean x̄ and the multiple R of the measuring device precision are either known or they extend the vector of unknown parameters of the problem.

The Bayesian accumulation of experience. The posterior c.p.d.f. of the parameters is updated recursively by the Bayes rule

p(w | t) ∝ p(y(t) | t-1; w, u1(t), u2(t)) p(w | t-1) (33)

where ∝ denotes proportionality up to a w-independent factor and the recursion starts with a user-supplied prior p.d.f. p(w) = p(w | 0).
The model required for this recursion [note that x(t) is missing in the condition] is given by

p(y(t) | t-1; w, u1(t), u2(t)) = ∫ p(y(t) | x(t), w) p(x(t) | w, u1(t), u2(t)) dx(t) (34)

where the above-discussed models and simplifications have been used.
Applying the identification equations to our particular calibration case, we assume known parameters (x̄, R) of the c.p.d.f. p(x(t) | w) describing the distribution of the measured samples. In the calibration phase [u1(t) = 1], the sample value is known and equal to u2(t), so that equation (34) reduces to

p(y(t) | t-1; w, u1(t) = 1, u2(t)) = N_y(t)(ax(t) + b, r) = N_y(t)(au2(t) + b, r) (35)

In the estimation phase [u1(t) = 0], the assumed normality of p(x(t) | w) and equation (34) lead to

p(y(t) | t-1; w, u1(t) = 0, u2(t)) = N_y(t)(ax̄ + b, r(Ra^2 + 1))

i.e., the studied c.p.d.f. is again normal, with the unknown value of the sample x(t) replaced by the population mean x̄. Compared with the calibration step, the dispersion is increased, depending on the slope a and the dispersion ratio R.
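A small sketch of the two predictive p.d.f.s: for fixed w = (a, b, r) and assumed population moments (x̄, R), it returns the mean and dispersion of y(t) in the calibration and estimation phases, mirroring equation (35) and its estimation-phase counterpart; all numbers are illustrative.

```python
# Sketch of the two predictive p.d.f.s: for fixed w = (a, b, r) and assumed
# population moments (xbar, R) it returns the mean and dispersion of y(t)
# in the calibration and estimation phases. All numbers are illustrative.
a, b, r = 2.0, 0.5, 0.01
xbar, R = 1.0, 5.0

def predictive(u1, u2=None):
    """Mean and dispersion of y(t) before it is measured."""
    if u1 == 1:                                   # calibration phase, eq. (35)
        return a * u2 + b, r
    return a * xbar + b, r * (R * a**2 + 1.0)     # estimation phase

print(predictive(1, u2=0.5))   # known sample: dispersion r
print(predictive(0))           # unknown sample: inflated dispersion
```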
Having the c.p.d.f. p(y(t) | t-1; w, u1(t), u2(t)), we are able to apply the Bayes equation (33). It is computationally advantageous to start it with a self-reproducing prior p.d.f. p(w | 0) which guarantees p(w | t) to have a fixed functional form. In the assumed normal case, the self-reproducing c.p.d.f. of the unknown parameters is characterized by a finite-dimensional statistic, namely by an information matrix V(t) and by ν(t) degrees of freedom. These quantities are to be given initial conditions which express the user's available prior knowledge. The discussion of this choice is outside the scope of this paper and the reader is referred to Kárný [7].

The evaluation of the integral (29) would provide the solution of the experience accumulation in the calibration task. It is, however, numerically difficult owing to the dependence of the matrix V on the unknown parameter a. This is a consequence of the fact that not only control but also unknown samples are used for gaining information about the measuring device. We shall not enter into the computational details. Instead, the independence of the information matrix of the slope a is reached by the crude approximation

V(t, a) ≈ V(t, â(t-1)) (41)

where â(t-1) is an estimate of the slope a based on the data d(1...t-1). Assuming this simplified case, the required c.p.d.f. (29) becomes a two-dimensional Student distribution [8], characterized by ν(t) degrees of freedom, by the expected value E{[y(t), x(t)] | t-1; u1(t), u2(t)} and by the covariance matrix cov{[y(t), x(t)] | t-1; u1(t), u2(t)}, both expressed in terms of the entries of the information matrix V(t-1, â(t-2)), with Vyx denoting the upper-left (2,2) submatrix of V. The evaluation of the above characteristics can be shown to be formally equivalent to the well-known recursive least squares. Note that the measured data enter the second moment in a very non-linear fashion (cf. the discussion of dynamic programming).
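The recursive least-squares character of the accumulation of statistics can be sketched as follows; the regressor convention, the prior initialization of V and the simulated noise level are assumptions made for the illustration, and only calibration measurements (known x) are processed, which sidesteps the dependence of V on the unknown slope a discussed above.

```python
# Sketch of the recursive least-squares accumulation that the evaluation of
# the posterior moments is formally equivalent to. The regressor convention,
# the prior initialization of V and the simulated noise are assumptions, and
# only calibration measurements (known x) are processed, which removes the
# dependence of V on the unknown slope a discussed in the text.
import numpy as np

V = np.eye(3) * 1e-4        # information matrix for the data vector [y, x, 1]
nu = 1.0                    # degrees of freedom

def update(y, x):
    """Accumulate one calibration measurement into the statistics (V, nu)."""
    global V, nu
    z = np.array([y, x, 1.0])
    V += np.outer(z, z)
    nu += 1.0

def estimates():
    """Point estimates of the slope a and the absolute term b from V."""
    Vxx = V[1:, 1:]         # information about the regressor (x, 1)
    Vxy = V[1:, 0]          # cross terms with the output y
    ahat, bhat = np.linalg.solve(Vxx, Vxy)
    return ahat, bhat

rng = np.random.default_rng(1)
for x in (0.2, 0.5, 1.0, 2.0):                       # four control samples
    update(2.0 * x + 0.5 + rng.normal(0.0, 0.1), x)  # simulated responses
print(estimates())                                   # close to (a, b) = (2.0, 0.5)
```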

Conclusions
The performed formalization of the calibration problem could be useful in the following directions: it clarifies the necessary models, assumptions and informational sources for the systematic solution of the problem; and it gives a conceptual solution of the problem which combines all informational sources (field and expert knowledge together with data) in a consistent way.
Application of the described methodology is far from trivial. It gives, however, guidelines both to experts in chemistry (which models are necessary and in which form) and to the associated mathematicians (the set of equations to be handled numerically is uniquely given). In our experience these clarifications are of such importance that they justify adding another paper to the literature dealing with calibration problems.