Statistical Analysis of Time Series with Scaling Indices

Statistical techniques based on scaling indices are applied to detect and investigate patterns in empirically given time series. The key idea is to use the distribution of scaling indices obtained from a delay representation of the empirical time series to distinguish between random and non-random components. Statistical tests for this purpose are designed and applied to specific examples. It is shown that a selection of subseries by scaling indices can significantly enhance the signal-to-noise ratio as compared to that of the total time series.


INTRODUCTION
If the behavior of a system evolving in time shows features that are "irregular" in some sense, the observed data series invites statistical modeling as a stochastic or random process. In such cases it is often useful and possible to conceptually distinguish between some null mode of behavior on one side and deviations from that null mode on the other. The notion of a null mode refers to some knowledge about the basic physical (null) characteristics of the system, which can be understood in terms of random or deterministic mechanisms. If there is no such knowledge, the situation is more difficult insofar as one has to start with assumptions as to model classes that may be appropriate to reproduce the observed data.
There are many approaches that have been developed in this spirit for many different purposes. Their success generally depends on both the nature of the deviations and that of the null mode. Deviations of a pre-conceived sort, such as a shift or an abrupt change in the mean or trend of the process, can be detected efficiently if one uses a correspondingly tailored procedure, and if the actual deviation is of the anticipated sort. On the other hand, if the goal is to detect any deviation, or deviations one has only a vague notion of, then less specific procedures of a wider scope may be more appropriate, even at the risk of losing precision with respect to the distinction between different types of specific deviations.
In many practical cases of pattern detection and pattern recognition, the basic problem is to distinguish a (more or less) faint regular pattern, the deviation, in front of a random background, the null mode. In this framework, the most ambitious situation for modeling purposes is met if neither the type of regularity nor the type of randomness is known. This is the kind of situation for which a data-analytic method called scaling index analysis is particularly helpful. It aims at detecting non-random deviations from a random null mode and is applicable if the random process is (approximately) stationary in the null mode. Non-random contributions can be due to some (weak) regular signal or pattern superimposed on or replacing the null process, or due to increased correlations between successive measurements reducing the local variability of the time series. Using scaling indices, one can study this time variability geometrically by comparing the density gradients in suitably constructed point sets on different distance scales. In physical contexts, scaling indices or "crowding indices" [1] originate from the dimensional analysis of single fractals and multifractals ([2][3][4]; for an overview see [5]). For a mathematically oriented perspective see [6].
Scaling index analysis has primarily been used as a data-analytic tool so far, i.e., as a numerical algorithm extracting potentially interesting features embedded in a random background (see, e.g., [7][8][9][10][11]). The crucial concept in a scaling index analysis is a so-called Hα histogram (see below) whose properties can be investigated and discriminated against the known or expected null mode histogram. This is no problem as long as the deviations from the null behavior are marked enough to be easily visible. If this is not the case, it may be difficult to distinguish systematic but small deviations from deviations due to random fluctuations produced by the null process. Then a statistical test with an appropriate discriminant statistic is required, in which the error of the first kind (falsely classifying a time series as "deviant" although it is in null mode) is well controlled. If deviations occur only sporadically, it may also be interesting to single out those time intervals in which "something deviant seems to go on".
The goal of this paper is to propose methods for (i) the construction of statistical tests based on scaling indices, and (ii) the identification of conspicuous parts of the observed time series, selected by particular ranges of scaling indices. In this way, we focus on a precise statistical formulation of scaling index analyses which was not addressed in the earlier publications listed above. Section 2.1 indicates the basic definitions and procedures for a scaling index analysis, Section 2.2 describes the test construction, and Section 2.3 gives a detailed example. Section 3 addresses how scaling index analyses can be used to enhance the signal-to-noise ratio by selecting and investigating those parts of a time series that are responsible for deviations from a random null process. Section 4 discusses and summarizes the main results.

Basic Definitions

Let {x(t): t = 1,…,T} denote the observed time series and z(t) = (x(t),…,x(t+d−1)) its delay representation in embedding dimension d. For every t and pre-defined radii r₁, r₂ (0 < r₁ < r₂), let a scaling index α(t) be defined as

    α(t) = [log N₂(z(t)) − log N₁(z(t))] / [log r₂ − log r₁],    (1)

where N₁,₂(z_ref) = #{t′: |z(t′) − z_ref| ≤ r₁,₂} denotes the number of all points z(t′), t′ = 1,…,T−d+1, whose Euclidean distance to the reference point z_ref does not exceed r₁,₂.
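The computation behind Eq. (1) is straightforward to sketch. The following is a minimal Python illustration with a brute-force neighbour count; the function names (delay_embed, scaling_indices) are ours, not from the paper.

```python
import numpy as np

def delay_embed(x, d):
    """Delay representation z(t) = (x(t), ..., x(t+d-1)), t = 1, ..., T-d+1."""
    x = np.asarray(x, dtype=float)
    return np.array([x[t:t + d] for t in range(len(x) - d + 1)])

def scaling_indices(x, d, r1, r2):
    """Scaling index alpha(t) of every delay vector, cf. Eq. (1)."""
    z = delay_embed(x, d)
    alphas = np.empty(len(z))
    for i, zref in enumerate(z):
        dist = np.linalg.norm(z - zref, axis=1)   # Euclidean distances to z_ref
        n1 = np.count_nonzero(dist <= r1)         # N1(z(t)), >= 1 (the point itself)
        n2 = np.count_nonzero(dist <= r2)         # N2(z(t))
        alphas[i] = (np.log(n2) - np.log(n1)) / (np.log(r2) - np.log(r1))
    return alphas
```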
Any scaling index analysis is based on the distribution of the scaling indices. This distribution can be represented by a histogram, the so-called Hα histogram. Its integrated version, the (cumulative) empirical distribution function (edf) of the scaling indices (si), is more appropriate for our purposes because of its improved statistical stability. H(α) will be referred to as the si-edf associated with {x(t)}.
The basic idea is to compare this si-edf H(α) with an appropriate reference distribution function Href characterizing the null process. In cases in which a continuous time process can be characterized by a multifractal measure, the limiting Href for T→∞, then r₁,₂→0, increases in the range of α corresponding to the (set of) dimension(s) on which the measure is concentrated. (For single fractals, the increase of Href reduces to a Heaviside step function corresponding to a δ-function-type histogram.) If the null process is random, the range of α over which Href increases keeps growing toward larger α with increasing embedding dimension d. By contrast, for regular or chaotic deterministic processes the increase of the reference distribution stays in the same range of α for increasing d, provided that d is sufficiently greater than the "true" (set of) dimension(s) of the process.
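This different d-dependence of random and deterministic processes can be checked numerically. The following sketch (reusing scaling_indices from above) compares i.i.d. Gaussian noise with a sampled sine wave, whose delay vectors lie on a one-dimensional curve; the radii are ad hoc choices adapted to each process, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=2000)                  # random, null-type process
sine = np.sin(0.37 * np.arange(2000))          # deterministic, dimension ~ 1

for d in (2, 4, 6):
    a_noise = scaling_indices(noise, d, r1=1.5, r2=3.0)
    a_sine = scaling_indices(sine, d, r1=0.05, r2=0.15)
    # Median alpha grows with d for the noise but stays near 1 for the sine.
    print(d, round(np.median(a_noise), 2), round(np.median(a_sine), 2))
```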
In the case of a fixed finite observation segment and fixed radii, Href is generically broadened. As a consequence, sharp discriminations as in the limits T→∞, r₁,₂→0 become impossible (even for single fractals). Other mechanisms (random, chaotic, or regular) typically produce clusters and voids, resulting in a scaling index distribution whose tails are more pronounced.
The distribution of the scaling indices also provides a clue to more refined analyses. For example, it allows one to sort out, and study in their own right, those parts of the time series corresponding to small scaling indices, which are of special interest as potential carriers of signals or patterns. Properly assessing the statistical significance of effects is particularly important in this kind of refined analysis.
The main ingredients of the statistical framework adopted here are as follows.
(a) It is assumed that the null mode behavior of the system under study can be described by a stochastic process with distribution 𝒟₀, so that the null hypothesis reads ℋ₀: 𝒟 = 𝒟₀, where 𝒟 denotes the distribution of the process {x(t)} under study. (b) A sample of n calibration time series {x_cal,i(t): t = 1,…,T} (i = 1,…,n), recorded while the system is known to be in null mode, is assumed to be available.

Some remarks: First, we drop the notational distinction between a time series considered as a sequence of measurement data and considered as a stochastic process. Secondly, it should be noted that the null hypothesis does not impose any restriction on the internal structure of the null process. However, our proposals make sense only if the process is stationary (or at least nearly so). Thirdly, 𝒟 denotes the distribution of the whole stochastic process {x(t)}, not the (marginal) distribution of a single random variable x(t) at fixed t (and likewise for 𝒟₀ and {x_cal,i(t)}). Finally, the "true" null distribution 𝒟₀ need not be known, but it is assumed that a reasonable model is available for the null process which allows us to generate a sample of auxiliary time series {x_aux,j(t): t = 1,…,T} (j = 1,…,m) with a distribution close to 𝒟₀.
Both calibration data and auxiliary data can be considered as surrogate data [13], though of different types. Usually, surrogate data are generated to simulate the distribution of the test statistic under the null hypothesis. To serve this purpose, calibration data from a real physical system are available in our case. The auxiliary data are generated for a different purpose, namely to construct a test statistic adapted to the null process (see Section 2.2). Furthermore, in our approach the distribution 𝒟₀ as a whole should be matched. This is in contrast to the typical use of surrogate data (e.g., based on randomized Fourier phases [14]), where only some characteristics of 𝒟₀ are taken into account.

Test Construction
As indicated above, the test of the null hypothesis ℋ₀ proposed here is based on the difference

    Δ(α) = H(α) − Href(α)

between the si-edf of the time series under study and the reference distribution of the null process. The null hypothesis is rejected if |Δ(α)| exceeds a suitably constructed deviation profile c(α) for some α ∈ [α_min, α_max], where c(α) is chosen such that the error probability

    P₀(|Δ(α)| > c(α) for some α ∈ [α_min, α_max])    (5)

does not exceed a prescribed level p₀. The notation P₀ indicates that the probability of the event in brackets is computed with respect to the distribution 𝒟₀ of the null process.
The deviation profile c(α) has to account for the strong α-dependence of the variance (under ℋ₀) of the random variables Δ(α). Typically this variance is much larger for α-values in the central part of a scaling index distribution than in its tails. If c were chosen as a constant, possible weak systematic deviations in the left wing of H(α) would be concealed by large chance fluctuations in the center. For instance, this would be the case for a classical Kolmogorov–Smirnov type test, which rejects ℋ₀ if max|Δ(α)| > c′ or, in terms of the partial maxima, if M(α) > c′ for some α. Using partial maxima M(α) instead of the (sometimes strongly fluctuating) function |Δ(α)| has a certain regularizing effect, but the essential point remains the α-dependence of the profile. Two versions of such a profile are considered here.

Version (A):

    c_A(α) = γ_A μ(α),    μ(α) = m⁻¹ Σ_{j=1}^m M_aux,j(α),    (6)

where M_aux,j(α) = max_{α_min ≤ α′ ≤ α} |Δ_aux,j(α′)| are the partial maxima computed from the auxiliary data. The factor γ_A > 0 is determined (e.g., by trial and error) such that the (empirical) profile crossing probability

    P_A = n⁻¹ #{i ≤ n: m_cal,i(α) > c_A(α) for some α ∈ [α_min, α_max]} ≈ p₀,    (7)

with m_cal,i(α) = max_{α_min ≤ α′ ≤ α} |Δ_cal,i(α′)| calculated from the calibration data. The relative frequency P_A represents a bootstrap estimate of the corresponding error probability in (5). Of course, the number of calibration data and auxiliary data should be large enough to make this a stable estimate.

Version (B):

    c_B(α) = μ(α) + γ_B q(α),    (8)
where μ is the same as in (A), and q(α) is computed as the root of the empirical variance

    q(α)² = m⁻¹ Σ_{j=1}^m M_aux,j(α)² − μ(α)²    (9)

from the auxiliary data. The factor γ_B is determined such that the empirical profile crossing probability P_B, defined analogously to P_A, satisfies (7).
Other variants of this approach in addition to these two versions may also be worth considering.
For example, one could use the empirical standard deviations

    s_Δ,aux(α) = √(m⁻¹ Σ_{j=1}^m Δ_aux,j(α)²)

to construct a preliminary profile and then correct it by a factor γ_Δ, with c_Δ = γ_Δ s_Δ,aux, such that the analogue of (7) holds. The null hypothesis ℋ₀ is then rejected if |Δ(α)| > c_Δ(α) for some α ∈ [α_min, α_max].
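To make the construction concrete, here is a minimal sketch of Version (A) under the reconstruction given above: the profile shape μ(α) is taken from the auxiliary partial maxima, and γ_A is tuned by bisection (one possible stand-in for the trial-and-error adjustment) until the profile crossing probability on the calibration data matches the nominal level p₀. The grid, the function names, and the bisection step are our choices.

```python
import numpy as np

def si_edf(alphas, grid):
    """si-edf H(alpha) evaluated on a fixed grid of alpha values."""
    return np.searchsorted(np.sort(alphas), grid, side="right") / len(alphas)

def crossing_prob(deltas, profile):
    """Fraction of series whose |Delta(alpha)| exceeds the profile somewhere."""
    return np.mean([(np.abs(d) > profile).any() for d in deltas])

def version_a_profile(aux_edfs, cal_edfs, href, p0=0.05):
    """Deviation profile c_A(alpha) = gamma_A * mu(alpha), cf. (6) and (7)."""
    d_aux = aux_edfs - href                      # Delta_aux,j(alpha) on the grid
    d_cal = cal_edfs - href                      # Delta_cal,i(alpha) on the grid
    m_aux = np.maximum.accumulate(np.abs(d_aux), axis=1)   # partial maxima M_aux,j
    mu = m_aux.mean(axis=0)                      # mu(alpha)
    lo, hi = 0.1, 10.0                           # bracket for gamma_A
    for _ in range(40):                          # bisection on gamma_A
        gamma = 0.5 * (lo + hi)
        if crossing_prob(d_cal, gamma * mu) > p0:
            lo = gamma                           # profile too low: raise gamma
        else:
            hi = gamma
    return gamma * mu
```

A series with si-edf H is then classified as deviant if |H(α) − Href(α)| exceeds the returned profile anywhere on the grid; since μ(α) is nondecreasing, this is equivalent to the crossing criterion formulated with partial maxima.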

An Example
Details of the proposed test construction for a specific example are illustrated in Figures 1-6. We had n = 1320 calibration time series (as described in [9]) at our disposal, each of length T = 10000. As a working hypothesis it is assumed that the x_cal,i(t) (t = 1,…,T; i = 1,…,n) are independent random variables, identically distributed according to a binomial distribution B(n_b, π_b) with n_b = 200 trials and a probability of success π_b = 1/2. A number of m = 2500 auxiliary time series (roughly twice as many as the calibration series) were generated according to this "binomial model". The embedding dimension is d = 4, and the radii are r₁ = 4.6, r₂ = 12.7. The occupation numbers are integers, and the scaling indices obtained from (1) are therefore generically discrete. In the left tail of H, points with small occupation numbers N₁,₂(z(t)) produce discrete steps which are clearly visible in Figure 1. The central part of the distribution appears to be continuous, however, since the corresponding occupation numbers are large.
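For orientation, the setup of this example can be reproduced with the sketches given earlier (scaling_indices from Section 2.1, si_edf from Section 2.2); only the series length is shortened here, since the brute-force neighbour count is quadratic in T.

```python
import numpy as np

rng = np.random.default_rng(1)
x_null = rng.binomial(n=200, p=0.5, size=2000)    # "binomial model" B(200, 1/2)
alphas = scaling_indices(x_null, d=4, r1=4.6, r2=12.7)
grid = np.linspace(0.0, 8.0, 401)                 # alpha grid for the si-edf
H = si_edf(alphas, grid)                          # discrete steps in the left tail
```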
The difference Δ(α) of an individual si-edf H and Href is shown in Figure 2. The empirical standard deviation of the calibration data, s_Δ,cal(α) = √(n⁻¹ Σ_{i=1}^n Δ_cal,i(α)²), is also plotted in Figure 2 for comparison. For B(182, 0.55) a prominent deviation from zero can be found within the range 3 < α < 4, whereas for α > 4, Δ is of the same order as s_Δ,cal. The partial maxima M(0, α), with lower limit α_min = 0, are shown in Figure 3 for the same data sets. The notation M(α_min, α) instead of M(α) indicates the interval over which the maximum is taken. The same notation will be adopted for the empirical error probabilities in (7), e.g., P_A = P_A(α_min, α_max). It might be interesting to mention that Ref.
[9] discusses a concrete (real-world) example in which a similar, truncated form of Version (A) is used to detect non-random contributions in a random time series. In contrast to the detailed procedure proposed here, the procedure of [9] includes no adjustment of γ_A, and the profile c_A(α) is simply taken to be μ(α) (i.e., γ_A = 1).

ENHANCING THE SIGNAL-TO-NOISE RATIO
Let us assume that a random null process is sporadically replaced by, or superimposed with, some faint regular or deterministic signal. Is it possible to identify those subseries of the entire time series which contain the "deviant" non-random contributions? This section deals with the question of whether those subseries can be found by selection procedures based on scaling indices. If so, it is an attractive idea to construct tests of the null hypothesis based on the selected subseries and to study their statistical characteristics. Some preliminary steps in this direction are included at the end of this section; details will be addressed in future work.
Obviously there are other conceivable methods to detect deviations from randomness in a time series; e.g., any type of autocorrelation analysis or, equivalently, Fourier analysis. If one focuses on the selection of local subseries containing those deviations rather than a global characterization of them, wavelet techniques might be good candidates for alternative approaches. This, however, would be beyond the scope of the present paper.
Our main purpose in this paper is to make a concrete proposal for a selection method based on scaling index distributions and to investigate, by simulation, whether it yields a higher signal-to-noise ratio in the selected subseries than in the original time series. In this context, the signal-to-noise ratio is defined as the relative number of data points x(t) due to the non-random signal.
Since non-random signals are basically expected to be associated with small scaling indices, the key idea is to select all data points x(t) whose scaling index α(t) falls below some threshold value A₁. In order to avoid the discreteness problems discussed in Section 2.3, a lower threshold value A₀ is required in addition. Since α(t) depends on the d-tuple z(t) = (x(t),…,x(t+d−1)) rather than on x(t) alone, selecting a data point at time t in principle implies selecting the subsequent d−1 data points as well. Therefore the selected subseries consists of data points x(t) at instants belonging to a set S which can be formally defined as

    S = {t: A₀ ≤ α(t′) ≤ A₁ for some t′ with t−d+1 ≤ t′ ≤ t}.

To study the performance of the selection procedure, auxiliary time series are generated according to the binomial model of Section 2.3. These auxiliary time series are then perturbed by "signals" in such a way that at randomly selected times τ₁ < … < τ_ν the segment x(τ_j),…,x(τ_j+L−1) of {x(t)} is replaced by some L-tuple ξ = (ξ₀,…,ξ_{L−1}) fixed ad hoc. (For τ_{j+1} − τ_j < L, each segment is overwritten to the extent to which it overlaps with the subsequent segment.) The perturbed data series are then used as the time series {x(t)} under study. Within this setting, the signal-to-noise ratio is defined as #R/T for the full time series, and as #(R ∩ S)/#S for the subseries {x(t): t ∈ S}. Here R denotes the set of all instants t for which the original x value has been replaced by a ξ value.

Table I summarizes the results for various ξ-tuples of different length and form. Each entry is based on 20 time series generated as described above, with the same embedding dimension d and radii r₁,₂ as in Section 2.3. The largest enhancements of the signal-to-noise ratio, up to a factor of 19, are obtained for ξ-values close to the center of the state distribution (x ≈ 100).

This behavior can be understood from the shift of the scaling indices caused by the replacements. Let N₁,₂ denote the occupation numbers for the unperturbed series and N₁,₂ + δN₁,₂ the corresponding quantities for the perturbed series, in which a (small) number ν = fT of ξ-tuples replaces the original z-tuples at randomly selected times. It is easy to see that δα ≤ 0, such that the scaling indices always decrease under such replacements. A rough quantitative estimate of that shift can be obtained on noting that a fraction f of the (T − N₁,₂) tuples z(t) outside the ball of radius r₁,₂ around ξ is expected to be replaced by ξ-tuples. This gives the estimate δN₁,₂ ≈ f(T − N₁,₂); expanding the logarithms in (1) then shows that δα is roughly proportional to −fT(1/N₁ − 1/N₂), provided that [f/(1−f)] T/N₁,₂ is sufficiently small. Therefore the (negative) shift δα increases in magnitude for decreasing N₁. At first sight this seems to contradict the claim that the best enhancement is achieved for ξ-values close to 100, where N₁ is large (and the shift δα correspondingly small). However, the statistical variability of Δ(α) decreases sharply with decreasing α in the left wing of H(α). On the other hand, in rarefied regions of the embedding space, where N₁ is relatively small, the (expected) α shift is large, so that α tends to fall below the threshold A₀.
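A sketch of the selection step, following the definition of S above; the thresholds in the usage comment are placeholders, not values from Table I.

```python
import numpy as np

def select_instants(alphas, d, T, a0, a1):
    """Instants t covered by some delay vector z(t') with a0 <= alpha(t') <= a1."""
    hits = np.flatnonzero((alphas >= a0) & (alphas <= a1))
    S = set()
    for t in hits:                     # selecting z(t) selects x(t), ..., x(t+d-1)
        S.update(range(t, min(t + d, T)))
    return sorted(S)

def snr(R, instants):
    """Signal-to-noise ratio: fraction of the given instants carrying signal."""
    return len(R.intersection(instants)) / len(instants)

# Usage (placeholder thresholds A0 = 2.0, A1 = 3.0), given a signal set R:
# snr_full = snr(R, range(T))
# snr_sel  = snr(R, select_instants(alphas, d, T, 2.0, 3.0))
```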
The perturbed time series have a mean and a standard deviation different from those of the unperturbed null series (a difference to which standard t- and F-tests are sensitive; cf. the remarks below). As a further type of perturbation, reordered time series have been generated. The same selection procedure as for Tables I and II has been applied. As in the case of replacement, reordering shows that the selection of subseries with small α can lead to an enhancement of the signal-to-noise ratio, up to a factor of 21.
The reordered time series has the same state distribution as the original time series but different transition probabilities. In particular, the elements x(t) are statistically dependent; sequential correlations (order) are thereby introduced into the generated time series. These correlations are checked with a runs test (Tab. IV). Comparing P_total and P_selected in Table IV shows that the runs test is more sensitive when applied to the selected subseries of the reordered time series than to the whole reordered series. Of course, the same qualifying remarks as made above concerning t- and F-tests apply with respect to P_selected.
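A self-contained runs test (Wald–Wolfowitz, dichotomized at the median, normal approximation) is easy to state; applying it once to the whole series and once to the selected subseries mirrors the comparison of P_total and P_selected in Table IV. The implementation below is our sketch, not the paper's.

```python
import numpy as np
from math import erf, sqrt

def runs_test_pvalue(x):
    """Two-sided p-value of the runs test for sequence x (normal approximation)."""
    b = np.asarray(x) > np.median(x)                 # dichotomize at the median
    n1, n2 = int(b.sum()), int((~b).sum())
    runs = 1 + int(np.count_nonzero(b[1:] != b[:-1]))
    mean = 1 + 2.0 * n1 * n2 / (n1 + n2)             # expected number of runs
    var = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / sqrt(var)
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
```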
DISCUSSION

As a consequence of (i), the γ-adjustments have to be computed anew (on the basis of the given calibration data) if the test is to be applied to another null process. Further massive computation is needed for the construction of the deviation profiles.

The null processes considered in our examples are all of the i.i.d. type. It is expected that qualitatively similar results would be obtained for state (null) distributions different from the binomial distribution B(200, 1/2) (our main example), provided they are not too long-tailed. The situation must be more complex with null processes exhibiting statistical dependencies between states, e.g., ARMA processes containing deterministic components.
The power of the test depends on the specific type of deviation from the null hypothesis. If the process {x(t)} under study is i.i.d. and only changes of the state distribution of {x(t)} are considered, there are well-known tests which are more efficient than the one proposed here. Since H(α) is invariant under translations of the form x(t) → x(t) + constant, the test has no power at all against mean shifts in the state distribution. It does have some power against scale dilations (variance changes), which cannot compete, however, with the power of the (variance-ratio) F-test, which is especially tailored to deviations of that kind. For instance, for time series generated according to the binomial model of Section 2.3 with B(193, 0.482), corresponding to a reduction of the variance by 3.63%, our test rejects the null hypothesis in about 20% of the data series, whereas the F-test (one-sided, level 0.05) has a rejection rate of about 50%.
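The quoted variance reduction is easy to verify: a B(n, p) distribution has variance np(1−p), so the perturbed model falls about 3.6% short of the null variance of 50.

```python
v0 = 200 * 0.5 * 0.5          # variance of B(200, 1/2): 50.0
v1 = 193 * 0.482 * 0.518      # variance of B(193, 0.482): about 48.19
print(1.0 - v1 / v0)          # about 0.036, i.e. the 3.63% reduction
```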
More interesting situations arise if the time series is not stationary and/or its elements are correlated, particularly when this is the case only sporadically within the observed segment. Corresponding types of deviations can be studied with random time series (of the i.i.d. type) modified by various kinds of perturbations at randomly selected instants (or intervals). For example, tuples of L subsequent x-values may be replaced by some ad hoc fixed L-tuple ξ, or the time series may be reordered in such a way that the ξ's appear at randomly selected instants (or intervals).
As can be seen from Tables I and III, the test proposed in Section 2.2 is the more sensitive the less "complex" the non-random process is. For instance, a deterministic process with a small number of different states around the median of the x(t)'s of the random (null) process, and with a simple temporal order of the tuples, results in a high detection rate. The test works equally well when the empirical distribution of the perturbed x(t)'s is the same as for the unperturbed ones, i.e., when the perturbation consists in a reordering of the time series. If the perturbation consists of a replacement by a fixed L-tuple ξ, the power tends to be the higher the closer the ξ-values are to the median of the x(t)'s. This may be contrasted with the behavior of standard tests, which often react sensitively to outliers but draw little information from data points close to the center of the state distribution. Summing up, the test proposed in this paper is well suited for time series in which (1) states generated by a deterministic process with low complexity are to be separated from random elements, and (2) the deterministic elements are close to the median of the random elements. Within the investigated time series, subseries can be selected for which the signal-to-noise ratio is enhanced compared to the total time series.