Automatic detection of the autocorrelation-type measurement error component


Introduction
Automatic detection of measurement errors is extremely important in automatic analysis of large analytical sample sequences. Error detection usually involves samples from known, control solutions which are regularly introduced into a sequence of unknown samples. The control can then be performed using proven graphical methods [1] based on the sequence of the measured control values during a longer period (day or shift). Numerical methods of detection are becoming increasingly important in microcomputer-based laboratory monitoring systems. This paper discusses one component of the measurement error: the so-called 'autocorrelation-type' error. This error component appears when the measured values of subsequent samples influence each other; it is a frequent error in analytical measurements. Its main sources are either sorption phenomena or inertia effects in the measurement or registration devices.
The main purpose of this paper is to propose an efficient numerical method for the detection of autocorrelation-type measurement error components.
First, several models of the measurement error in automatic analysis are investigated. Based on these, the proposed numerical method is described. Finally, the results of validating the proposed method and of comparing it with the graphical LAG-1 method are presented for control sample sequences.

Materials and experimental methods
The sample sequences investigated were control samples with 0.5, 1.0 and 3.0 g/l Merck-quality ethyl alcohol concentrations. They were analysed with a Perkin-Elmer F42 gas chromatograph, which was equipped with an automatic headspace sampler. The sample sequences contained control samples introduced in a random manner in order to produce the autocorrelation-type error component. Flushing was used to remove the remaining part of the sample from the sampling capillary, and LAB pipettes were used for sampling. The internal standard method was used for evaluation, and a solution of 0.5 g/l 1-propanol was added to each sample for this purpose. To avoid systematic error due to sampling, the internal standard was added to the sample with the same pipette and pipette tip in all cases.
The sample sequences were repeatedly measured under different conditions. The parameters varied were sampling with the same or different pipettes and tips, and the duration of the flush.
Measurements were performed under the following conditions:
(1) Sequence: sampling with the same pipette (the first source of autocorrelation-type error) and with the same tip (the second source of autocorrelation-type error), 0 s flush (no flush) (the third source of autocorrelation-type error);
(2) Sequence: sampling with the same pipette but three different tips, 0 s flush;
(3) Sequence: sampling with the same pipette and tip, 15 s flush (a weak source of autocorrelation-type error);
(4) Sequence: sampling with three different pipettes and three different tips, 15 s flush.
The evaluation of the chromatograms was performed by two separate methods: based on the ratio of the peak areas, or on the ratio of the peak heights, of ethyl alcohol and the internal standard.
Measurement errors in gas chromatography have been thoroughly investigated [2], and several attempts have also been made to construct a generally valid measurement error process model [3, 4].
The measured value sequence, as well as the measurement error sequence, can be modelled using discrete time stochastic processes (stochastic sequences). It can be shown that under relatively weak assumptions (no discontinuity in the trajectories of the investigated processes with probability 1, and the Markov property), the continuous time measurement error and measured value processes (in the case of continuous-flow samples) can be described by Ito processes [5]. The discrete time model, i.e. the measurement error and measured value sequences, can be derived from this model by applying equidistant sampling [6]. This discrete time model takes the form:

$$\xi_t = m(\xi_{t-1}, t) + \beta(\xi_{t-1}, t)\,\varepsilon_t$$

There are three different terms on the right-hand side of equation (5). The term $A(\xi_{t-1}, \xi_t)$ is taken in the linear form:

$$A(\xi_{t-1}, \xi_t) = a_0 + a\,(\xi_t - \xi_{t-1}) \qquad (6)$$

Substituting equation (6) into equation (5) gives:

$$\xi_t = a_0 + a\,(\xi_t - \xi_{t-1}) + M t + S \varepsilon_t + x_t \qquad (7)$$

Applying the definition of the measurement error sequence, $\tilde{x}_t = \xi_t - x_t$, results in the following equation:

$$\tilde{x}_t = a_0 + M t + a\,(\xi_t - \xi_{t-1}) + S \varepsilon_t \qquad (8)$$

Note that in almost all practical cases, the standard deviation of the random measurement error component ($S$) and the mean of the measurement error ($a_0 + M t$) depend on the true value $x_t$:

$$S = S(x_t) \quad \text{and} \quad a_0 + M t = A_0(x_t) \qquad (9)$$

Detection of the autocorrelation-type error component
Detection requires testing the hypothesis on the presence of the autocorrelation-type measurement error component based on the mathematical model (equation (5)).
The absence of the autocorrelation-type component is mathematically equivalent to the equality $a = 0$.
It can also be seen from equation (8) that, if $a = 0$, the measurement error $\tilde{x}_t$ does not depend on the difference $(\xi_t - \xi_{t-1})$; this is referred to as hypothesis 1. In order to test this hypothesis, the domain of the possible values of the difference $(\xi_t - \xi_{t-1})$ must be divided by the points:

$$\delta_1 < \delta_2 < \dots < \delta_l \qquad (10)$$

With the help of these points, hypothesis 1 can be approximated by a set of other hypotheses (hypothesis 2, equation (11)). It is important to note that the set of hypothesis 2 is only an approximation of hypothesis 1, but it has a great practical advantage in this form. As can be seen from equation (11), the hypothesis can be easily tested with known statistical tests (F-test and t-test), and the necessary values in the condition can be easily computed from the measured value sequence, $\xi_t$, itself. The only problem from the computational viewpoint is how the values of the measurement error $\tilde{x}_t$ can be computed.
For this purpose, the control samples can be used because their true values, x, are assumed to be known. In this case it is assumed that the error of the control liquid sample preparation is negligible compared to the effects of the other measurement error components.
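As an illustration of this step, the following minimal Python sketch computes the measurement error of each control sample as the measured value minus the known true value, and then groups the errors by the true concentration of the current and of the previous sample. The function names `control_errors` and `group_by_previous`, the grouping key, and the numerical data are assumptions made for this example only; they are not taken from the paper or its tables.

```python
from collections import defaultdict


def control_errors(measured, true_values):
    """Measurement error of each control sample: measured minus true value."""
    if len(measured) != len(true_values):
        raise ValueError("measured and true_values must have the same length")
    return [m - x for m, x in zip(measured, true_values)]


def group_by_previous(errors, true_values):
    """Group errors by (current, previous) true concentration.

    The first sample has no predecessor and is therefore skipped.
    """
    groups = defaultdict(list)
    for t in range(1, len(errors)):
        key = (true_values[t], true_values[t - 1])
        groups[key].append(errors[t])
    return dict(groups)


# Invented example data in g/l (hypothetical, for illustration only).
true_values = [0.5, 3.0, 0.5, 1.0, 0.5, 0.5]
measured = [0.50, 3.01, 0.56, 1.02, 0.51, 0.50]

errors = control_errors(measured, true_values)
groups = group_by_previous(errors, true_values)
```

In this invented sequence, the error of a 0.5 g/l sample measured directly after a 3.0 g/l sample is stored under the key `(0.5, 3.0)`, so errors that follow different predecessors can be compared group by group.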
In the case of automatic analysis of large analytical sample sequences, control samples occur rarely in the sample sequences compared to the unknown samples. Thus the points in equation (10) must be chosen very carefully in order to have enough samples for testing each hypothesis in set 2 (equation (11)) for a good approximation of hypothesis 1. In order to verify the numerical method, sample sequences containing only control samples were used in the authors' experiments. This results in an increased number of suitable samples for the measurement error and in an obvious choice of the points in equation (10) for the control samples with 0.5, 1.0 and 3.0 g/l ethyl alcohol.

The proposed method and its application to control sample sequences
In order to show how tests for the set of hypotheses given in equation (11) can be performed easily, the computed quantities needed for the numerical method have been collected and arranged in tables 1-4, according to the four sample sequences described previously. The quantities in tables 1-4 have been computed from the measured values as follows.
A row in a table belongs to a given control sample concentration and to a given evaluation method (for example, 0.5 g/l ethyl alcohol concentration and ethyl alcohol/internal standard peak area ratio evaluation).
The test of the hypothesis 2 set for each row is done in three steps:
(1) The number, the empirical mean and the empirical variance of the samples with the given concentration are computed and placed in the fourth column. After this, these samples are divided into three groups according to the previous sample concentration. The above characteristics (number, empirical mean and empirical variance) of each group are computed and put into columns one to three, respectively.
(2) F-tests [7] can be used for testing hypothesis 2 (equation (11)) for the variances. It is sufficient, however, to perform the test for the ratio of the columns with maximal and minimal variances. The computed F-value, together with the result of the hypothesis test, is put in the fifth column. The result of a test is positive ('+' sign) if the hypothesis has been accepted (not rejected) at the given significance level.
(3) If the result of the F-test is positive, the two-sample t-test can be applied to discover whether hypothesis 2 (equation (11)) holds for the mean values. In this case, it is also sufficient to perform the test for the columns with minimal and maximal means, applying the empirical variance of the fourth column as a common variance. The computed t-value and the result of the test can be found in the last column.
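The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `summarize` and `autocorrelation_check`, the dictionary layout, and the decision to pass the critical values `f_crit` and `t_crit` as parameters (to be looked up for the chosen significance level and degrees of freedom) are choices made for illustration, and the example data are invented.

```python
import math
import statistics


def summarize(values):
    """Return (count, empirical mean, empirical variance) of one group."""
    return len(values), statistics.mean(values), statistics.variance(values)


def autocorrelation_check(groups, f_crit, t_crit):
    """Three-step check for one table row (hypothetical helper).

    groups: dict mapping previous-sample concentration -> measured values of
            the current control concentration (at least two values per group).
    f_crit, t_crit: critical values supplied by the caller.
    """
    # Step 1: characteristics of all samples ("fourth column") and of each group.
    all_values = [v for values in groups.values() for v in values]
    _, _, common_var = summarize(all_values)
    stats = [summarize(values) for values in groups.values()]

    # Step 2: F-test on the ratio of the largest to the smallest group variance.
    variances = [s[2] for s in stats]
    f_value = max(variances) / min(variances) if min(variances) > 0 else math.inf
    f_ok = f_value <= f_crit

    # Step 3: only if step 2 is positive, t-test between the groups with the
    # smallest and largest means, using the overall variance as common variance.
    t_value, t_ok = None, None
    if f_ok:
        n_lo, mean_lo, _ = min(stats, key=lambda s: s[1])
        n_hi, mean_hi, _ = max(stats, key=lambda s: s[1])
        t_value = (mean_hi - mean_lo) / math.sqrt(common_var * (1 / n_lo + 1 / n_hi))
        t_ok = t_value <= t_crit

    return {"F": f_value, "F_ok": f_ok, "t": t_value, "t_ok": t_ok,
            "detected": not (f_ok and t_ok)}


# Invented peak ratio values, grouped by the previous sample concentration (g/l).
example_groups = {
    0.5: [1.00, 1.02, 0.98],
    1.0: [1.01, 0.99, 1.03],
    3.0: [1.10, 1.12, 1.08],
}
# Illustrative critical values only; take them from tables for the actual case.
report = autocorrelation_check(example_groups, f_crit=19.0, t_crit=2.78)
```

Passing the critical values in, rather than computing them, keeps the sketch free of external dependencies and mirrors the use of statistical tables; a row is flagged (`detected` is true) as soon as either the F-test or the t-test is negative.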

Conclusions
By applying the numerical method and the graphical (LAG-1) method for detecting autocorrelation-type measurement errors, it was found that both methods give the same results.
In the case of the peak height ratio evaluation method, an autocorrelation-type measurement error component was present under all of the measurement conditions investigated.
When applying the peak area ratio evaluation method, sequences three and four have been shown to have no autocorrelation-type measurement error component according to the numerical (see tables 3 and 4) and the graphical methods. This indicates that the circumstances of the flush have much more influence on the measurement error than do the other measurement circumstances (pipettes).
From the data of the numerical method (tables 1 and 2), it can also be seen that the empirical variances of the peak height ratios are much smaller than those of the peak area ratios. This fact is in good agreement with previous investigations [8]. As a consequence, the use of the peak height ratio allows the detection of smaller autocorrelation-type measurement error components than would be possible with the peak area ratio. At the same time, it can also be seen that the empirical variance of the samples is much more strongly influenced by the difference between the current and previous measured values than is their empirical mean, i.e. the autocorrelation-type measurement error component appears with much greater sensitivity in the empirical variance.