Dependent Functional Data

This paper reviews recent research on dependent functional data. After providing an introduction to functional data analysis, we focus on two types of dependent functional data structures: time series of curves and spatially distributed curves. We review statistical models, inferential methodology, and possible extensions. The paper is intended to provide a concise introduction to the subject with plentiful references.


Introduction
Functional data analysis (FDA) is a relatively new branch of statistics, going back to the early 1990s, but its mathematical foundations are rooted in much earlier developments in the theory of operators in a Hilbert space and in functional analysis. In the most basic setting, the sample consists of N curves X_1(t), X_2(t), . . . , X_N(t), t ∈ T. The set T is typically an interval of the line. In increasingly many applications, it is however a subset of the plane, or a sphere, or even a 3D region. In those cases, the data are surfaces over a region, or more general functions over some domain, hence the term functional data. This survey is concerned mostly with the analysis of curves, but some references to more general functions are given in Section 1.1.
Functional data are high-dimensional data, as, in a statistical model, each function X_n consists of infinitely many values X_n(t), t ∈ T. In traditional statistics, the data consist of a sample of scalars or vectors. For example, for each survey participant, we may record age, gender, income, and education level. The data point thus has dimension four; it is a vector with quantitative and categorical entries. High-dimensional data typically have dimension comparable to or larger than the sample size. As they are often analyzed using regression models in which the sample size is denoted by n and the number of explanatory variables by p, high-dimensional data often fall into the "large p, small n" paradigm, but clearly they form a much broader class, with a great deal of work focusing on covariance matrices based on a sample of N p-dimensional vectors. A distinctive feature of functional data is that the curves or surfaces are assumed to be smooth in some sense; if t_1 is close to t_2, the values X_n(t_1) and X_n(t_2) should be similar. In the "large p, small n" paradigm, there need not be any natural ordering of the covariates or any natural measure of distance between them. The analysis often focuses on the selection of a small number of relevant covariates (the variable selection problem). In FDA, the analysis involves obtaining a smooth, low-dimensional representation of each curve.
FDA views each curve in a sample as a separate statistical object. In this sense, FDA is part of object data analysis, in which data points are not scalars or vectors but structures modeled by complex mathematical objects, for example, by graphs. Some references are given in Section 1.1.
However, even curves are far more complicated structures than scalars or vectors. The curves are characterized not only by magnitude but also by shape. The shape of a random curve plays a role analogous to the dependence between the coordinates of a random vector. Human growth curves provide a well-known example. Suppose that there are N randomly selected subjects of the same gender. Let X_k(t_j) be the height of the kth subject measured at time t_j from birth. The points t_j are different for different subjects. Using methods of FDA, we can construct continuous and differentiable curves X_k(t), 0 ≤ t ≤ T. The shapes and magnitudes of these curves give us an idea about the variability in the process of growth, rather than just about the variability of the final height, which can be assessed using the scalars X_k(T), 1 ≤ k ≤ N.
Some data can be very naturally viewed as curves. For example, if the height measurements are available at a fairly regular and sufficiently dense grid of times t_j, it is easy to visualize them as curves, even though it is not immediately obvious how to compute derivatives of such curves. In many situations, the points t_j are extremely dense. For example, physical instruments may return an observation every five seconds, so in a day we will have 17,280 values t_j. A day is a natural time domain in many applications, and the problem is to replace the 17,280 values X_n(t_j) available in day n by a smaller, more manageable set of numbers. This is generally possible due to the assumption of some smoothness. At the other extreme are sparse longitudinal data. Such data often arise in medical research. For example, a measurement can be made on a patient only several times during the course of treatment. Yet we know that the quantity that is measured exists at any time, so it is a curve that is observed only at a few sparse time points. References to the relevant functional methodology are given in Section 1.1.
Growth curves or sparse observations on a sample of patients can be viewed as independent curves drawn from a population of interest. A large body of research in FDA has been motivated by various problems arising in such a setting. At the same time, many functional data sets, most notably in physical and environmental sciences, arise from long records of observations. An example is presented in Figure 1, which shows seven consecutive functional observations (curves). These curves show a very rough periodic pattern, but modeling periodicity is difficult, as this pattern is, in fact, severely disturbed several times a month by ionospheric storms. The 24 h period must however enter into any statistical model, as it is caused by the rotation of the Earth. It is thus natural in this context to treat the long continuous record as consisting of consecutive curves, each defined over a 24 h time interval. Space physics researchers have long been associating the occurrence of enhancements on a given day with physical phenomena in near-Earth space. This gives additional support to treating these data as a time series of curves of evolving shape, which we will call a functional time series. Similar functional series arise, for example, in urban pollution monitoring studies. A functional time series does not however need to arise from cutting a continuous time record into adjacent pieces of natural length. Figure 2 shows a functional time series of curves representing centered Eurodollar futures prices. For each day, the argument is not the time within that day, but the time to the expiration of a contract; more details are given in Chapter 14 of Horváth and Kokoszka [1]. The curves show how the prices of contracts with various expiration horizons evolve from day to day.
Functional time series focus on temporal, or otherwise sequential, dependence between curves. In many applications, the curves are available at points in space. Such spatially indexed functional data will be denoted X(s_k; t), where s_1, s_2, . . . , s_N are locations in some region and X(s_k; t) is the value of the function X(s_k) at time t. To compare such data structures to functional time series, consider two ways of looking at pollution data. If we consider only one location and are interested in the day-to-day pattern of pollution, we may consider curves X_n(t), where n is the day index and t is the time within the day. In many studies, we are however interested in long-term trends in pollution over a region, for example, a large city. In such a case, we may wish to smooth out the day-to-day variability and consider the curves X(s_k; t), where s_k is the location of a measurement station within the city and t is time within a period of a few decades. Such a data structure is difficult to show in one graph. The spatial locations s_k can be shown on a map. The curves X(s_k; t) must be shown on a separate graph, as in Figure 3. As for functional time series, the main feature of such data is that the characteristics of the curves depend on the distance between them.
Of course, one can consider a more complex structure in which one studies spatially indexed functional time series, with the data being X_n(s_k; t); for a fixed s_k, we have a functional time series, and for a fixed n, a functional spatial random field. We briefly discuss such a setting in Section 5.
The focus of this paper is on the recent developments in the research on dependent functional data. Section 3 considers functional time series, and Section 4 spatially indexed functions. The required background is provided in Section 2. A summary, which includes the discussion of some problems of current interest, is given in Section 5.

Further Reading
In this section, we discuss several important topics mentioned above, mostly by directing the reader to appropriate references.

Monographs on FDA
The two editions of the book of Ramsay and Silverman [2] have done a lot to introduce FDA to the statistics community and beyond. The monograph introduces most of the fundamental ideas of FDA, including penalized smoothing, data registration, functional linear regression, functional principal components, and the functional analysis of derivatives. Ramsay and Silverman [3] elaborate on many data examples in Ramsay and Silverman [2] by providing detailed case studies, while Ramsay et al. [4] focus on the computational aspects of the analyses introduced in Ramsay and Silverman [2]. These books are concerned with iid samples of curves.
The book of Ferraty and Vieu [5] covers nonparametric methods, with special emphasis on nonparametric prediction and classification. It has several chapters on dependent (α-mixing) curves. It studies both practical and abstract aspects of the problems. The collections Ferraty and Romain [6] and Ferraty [7] contain contributions from prominent researchers illustrating the increasingly many facets of FDA. Ferraty and Romain [6] has a smaller number of detailed papers, while Ferraty [7] has a large number of short communications. Shi and Choi [8] consider Gaussian regression models.
The monograph of Bosq [9] is highly recommended to anyone seeking a fast and rigorous introduction to the theory of functional time series. It contains all practically necessary facts from the theory of probability in Hilbert and Banach spaces, and a great deal of relevant theory of linear functional time series. Bosq and Blanke [10] elaborate on some aspects of this theory in an abstract way.
To fully understand some theoretical aspects of FDA, a good background in the theory of Hilbert spaces is needed. There are many introductory textbooks; my personal favorites are Riesz and Nagy [11], Akhiezer and Glazman [12], and Debnath and Mikusinski [13]. There are fewer books on probability in Hilbert and Banach spaces; I have benefited from the monographs of Araujo and Giné [14], Linde [15], and Vakhania et al. [16].
The monograph of Horváth and Kokoszka [1] elaborates on many topics mentioned in this paper, especially those related to the functional principal components.

Basis Expansions
In this paper, we focus on procedures for densely observed curves. For such data, the functional objects needed to perform the calculations can be created using the R package fda. We will not address the details of the calculations, but note that all necessary background is given in Ramsay et al. [4]. In the following, we often refer to the choice of a basis and the number of basis functions. We now briefly discuss this point.
Functions are observed at discrete sampling values t_j, j = 1, . . . , J, which may or may not be equally spaced. We work with N functions. These data are converted to functional objects. In order to do this, we need to specify a basis. A basis is a system of basis functions, a linear combination of which defines the functional objects. The elements of a basis may or may not be orthogonal. We express a functional observation X_n as

X_n(t) = Σ_{k=1}^K c_{nk} φ_k(t),    (1.1)

where the φ_k, k = 1, . . . , K, are the basis functions. One of the advantages of this approach is that instead of storing all the data points, one stores the coefficients of the expansion, that is, the c_{nk}. This step thus involves an initial dimension reduction and some smoothing. All subsequent computations are performed on the matrices built from the coefficients c_{nk}. The number K of basis functions impacts the performance of some procedures, but others are fairly insensitive to its choice. For the data studied in this paper, we generally choose K so that the plotted functional objects resemble the original data with some smoothing that eliminates the most obvious noise. The two most commonly used basis systems are the Fourier and the B-spline bases. The Fourier basis is usually used for periodic, or nearly periodic, functions with no strong local features and a roughly constant curvature. It is inappropriate for data with discontinuities in the function itself or in low-order derivatives. The B-spline basis is typically used for nonperiodic, locally smooth data. The B-spline basis functions are not orthogonal.
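To make the basis-expansion step concrete, the following is a minimal sketch in Python using SciPy's least-squares spline fitting (the paper works with the R package fda; the synthetic curve, the grid size J = 200, and the choice K = 15 below are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# One noisy curve observed on a dense grid, reduced to K = 15 cubic
# B-spline coefficients c_{nk} by least squares, as in expansion (1.1).
rng = np.random.default_rng(0)
J, k = 200, 3                                   # grid size, spline degree
t = np.linspace(0.0, 1.0, J)
x_obs = np.sin(2 * np.pi * t) + 0.5 * t**2 + 0.1 * rng.standard_normal(J)

K = 15                                          # number of basis functions
# Full knot vector: repeated boundary knots plus equispaced interior knots;
# the number of coefficients is len(knots) - k - 1 = K.
interior = np.linspace(0.0, 1.0, K - k + 1)[1:-1]
knots = np.concatenate((np.repeat(0.0, k + 1), interior, np.repeat(1.0, k + 1)))

spl = make_lsq_spline(t, x_obs, knots, k=k)
print(len(spl.c))                               # 15 coefficients replace 200 values
x_smooth = spl(t)                               # the smoothed functional object
```

The curve is now stored as 15 numbers rather than 200, and the fitted spline removes most of the observation noise.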

Sparsely Observed Functions
As noted above, in some applications, the data are available only at a few sparsely distributed points t_j, which may be different for different curves, and the data may exhibit nonnegligible measurement errors. Yao et al. [17] introduce methodology to deal with such data; smoothing with a basis expansion is at best inappropriate, and often not feasible. The essence of the alternative approach is explained in Müller [18]. While there are various elaborations, see, for example, Hall et al. [19], the idea is that smoothing must be applied to all observations collectively, not to individual trajectories (which basically do not exist for sparse data). The mean function is first obtained by smoothing the mean of all trajectories. Then the functional principal components are calculated as the eigenfunctions of a covariance kernel obtained by surface smoothing. Extensions of this technique have been applied in many settings; see, for example, Müller.

Extensions beyond Curves
Various brain scans can be viewed as functions over a spatial domain. The analysis of such data is discussed in several papers; see, for example, Reiss. The study of the brain also provides motivation to study complex statistical objects. Aydin et al. [29] model blood vessels in the brain as trees. Kenobi et al. [30] and Crane and Patrangenaru [31] provide references to data analysis on manifolds.

A Modeling Framework for Functional Data Analysis
This paper focuses on inferential procedures for functional data. To derive them, a statistical model for the data is needed. In particular, the curves must be viewed as elements of some space. Several modeling frameworks have been proposed, including semimetric spaces, see Ferraty and Vieu [5], Sobolev spaces, see, for example, Li and Hsing [32], and more general Besov spaces, see, for example, Abramovich and Angelini [33] and Pensky and Sapatinas [34]. Sobolev spaces are Hilbert spaces in which derivatives enter into the definition of the inner product. They are used to study procedures which involve smoothing by penalizing functions with large integrated derivatives of higher order (typically order two). Besov spaces have been used to study wavelet expansions of functions; they are typically Banach spaces. In this paper, we consider only the simplest framework, in which the curves are assumed to belong to the space L²(T) of square integrable functions on T, and, to be more specific, we assume that T = [0, 1]. All results we state in the following that involve inner products hold in an arbitrary separable Hilbert space. However, when we use integrals and functions with an argument like t or s, the formulas assume that the Hilbert space is L²([0, 1]). This space is sufficient to describe procedures based on functional means and variances; most inferential procedures for scalar data use only means and variances. The corresponding functional framework is introduced in Section 2.1. In Section 2.2, we define the functional principal components, which provide a dimension reduction beyond that obtained via the basis expansion (1.1); K is typically around one hundred, whereas in our applications the number of functional principal components needed to effectively approximate the data is a single-digit number, very often 2, 3, or 4.
In Sections 2.1 and 2.2, we define the required first- and second-order characteristics (functional parameters) of a random function X. We cannot start with definitions based on a random sample because, for dependent data, sample statistics often need to be defined in a different way to ensure that they estimate the corresponding population (theoretical) parameters.

The Hilbert Space Model
The space L² = L²([0, 1]) is the set of measurable real-valued functions x defined on [0, 1] satisfying ∫₀¹ x²(t) dt < ∞. The space L² is a separable Hilbert space with the inner product

⟨x, y⟩ = ∫ x(t) y(t) dt.    (2.1)

An integral sign without the limits of integration is meant to denote the integral over the whole interval [0, 1]. We view a random curve X = {X(t), t ∈ [0, 1]} as a random element of L² equipped with the Borel σ-algebra. We say that X is integrable if E‖X‖ = E[(∫ X²(t) dt)^{1/2}] < ∞. If X is integrable, there is a unique function μ ∈ L² such that E⟨y, X⟩ = ⟨y, μ⟩ for any y ∈ L². It follows that μ(t) = E[X(t)] for almost all t ∈ [0, 1]. The expectation commutes with bounded operators; that is, if Ψ ∈ L and X is integrable, then EΨ(X) = Ψ(EX). If X is square integrable, that is, E‖X‖² = E∫ X²(t) dt < ∞, and EX = μ, the covariance operator of X is defined by

C(y) = E[⟨X − μ, y⟩ (X − μ)], y ∈ L².    (2.3)

It is easy to see that C is an integral operator with the kernel c(t, s) = E[(X(t) − μ(t))(X(s) − μ(s))], that is, C(y)(t) = ∫ c(t, s) y(s) ds.

One can show that a bounded operator C is the covariance operator of some square integrable random function taking values in L² if and only if it is symmetric, positive-definite, and its eigenvalues satisfy Σ_{j=1}^∞ λ_j < ∞. To understand these statements, some background on operators in a Hilbert space is needed. The facts we now state will be used in the remainder of the paper.

Operators in a Hilbert Space
Consider an arbitrary separable Hilbert space H with inner product ⟨·, ·⟩, which generates the norm ‖·‖, and denote by L the space of bounded (continuous) linear operators on H with the norm

‖Ψ‖_L = sup{‖Ψ(x)‖ : ‖x‖ ≤ 1}.

An operator Ψ ∈ L is said to be compact if there exist two orthonormal bases {v_j} and {f_j}, and a real sequence {λ_j} converging to zero, such that

Ψ(x) = Σ_{j=1}^∞ λ_j ⟨x, v_j⟩ f_j, x ∈ H.

The λ_j may be assumed positive because one can replace f_j by −f_j if needed. A compact operator admitting such a representation with Σ_{j=1}^∞ λ_j² < ∞ is said to be a Hilbert–Schmidt operator. The space S of Hilbert–Schmidt operators is a separable Hilbert space with the scalar product

⟨Ψ₁, Ψ₂⟩_S = Σ_{i=1}^∞ ⟨Ψ₁(e_i), Ψ₂(e_i)⟩,    (2.7)

where {e_i} is an arbitrary orthonormal basis; the value of (2.7) does not depend on its choice. One can show that ‖Ψ‖_L ≤ ‖Ψ‖_S. An operator Ψ ∈ L is said to be symmetric if ⟨Ψ(x), y⟩ = ⟨x, Ψ(y)⟩ for all x, y ∈ H, and positive-definite if ⟨Ψ(x), x⟩ ≥ 0 for all x ∈ H. An operator with the last property is sometimes called positive semidefinite, and the term positive-definite is then used when ⟨Ψ(x), x⟩ > 0 for x ≠ 0. A symmetric positive-definite Hilbert–Schmidt operator Ψ admits the decomposition

Ψ(x) = Σ_{j=1}^∞ λ_j ⟨x, v_j⟩ v_j, x ∈ H,    (2.10)

with orthonormal v_j, which are the eigenfunctions of Ψ, that is, Ψ(v_j) = λ_j v_j. The v_j can be extended to a basis by adding a complete orthonormal system in the orthogonal complement of the subspace spanned by the original v_j. The v_j in (2.10) can thus be assumed to form a basis, but some λ_j may be zero.
An important class of operators in L² is formed by the integral operators defined by

Ψ(x)(t) = ∫ ψ(t, s) x(s) ds, x ∈ L²,

with a real kernel ψ(·, ·). Such operators are Hilbert–Schmidt if and only if ∬ ψ²(t, s) dt ds < ∞. If ψ(s, t) = ψ(t, s) and ∬ ψ(t, s) x(t) x(s) dt ds ≥ 0, the integral operator Ψ is symmetric and positive-definite, and it follows from (2.10) that

ψ(t, s) = Σ_{j=1}^∞ λ_j v_j(t) v_j(s).

Functional Principal Components
Suppose X is a square integrable random function in L². To lighten the notation, assume that μ = 0. It is clear how to modify all formulas by subtracting the mean function μ from X. The eigenfunctions of the covariance operator defined by (2.3) are called the functional principal components (FPCs). The FPCs v_j, j ≥ 1, are thus defined by

C(v_j) = λ_j v_j.

The v_j are orthogonal, and we assume that they are normalized to unit norm. The v_j are defined only up to a sign. The random function X admits the Karhunen–Loève expansion

X(t) = Σ_{j=1}^∞ ξ_j v_j(t), ξ_j = ⟨X, v_j⟩,

and the covariance kernel c(·, ·) admits the expansion

c(t, s) = Σ_{j=1}^∞ λ_j v_j(t) v_j(s).

The random variables ξ_j are called the scores of X.
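A minimal numerical sketch of the empirical FPCs may be helpful. All data below are synthetic, and the discretization of the covariance kernel on a grid (eigendecomposition of the matrix of kernel values times the grid step) is an assumed, standard approximation:

```python
import numpy as np

# Empirical FPCs of N curves sampled on a grid of J points, obtained from
# the eigendecomposition of the discretized sample covariance kernel.
rng = np.random.default_rng(1)
N, J = 300, 101
t = np.linspace(0.0, 1.0, J)
dt = t[1] - t[0]

# Synthetic sample: two orthonormal modes of variation plus small noise.
xi1 = 2.0 * rng.standard_normal(N)
xi2 = 0.7 * rng.standard_normal(N)
X = (xi1[:, None] * np.sqrt(2) * np.sin(np.pi * t)
     + xi2[:, None] * np.sqrt(2) * np.sin(2 * np.pi * t)
     + 0.05 * rng.standard_normal((N, J)))

Xc = X - X.mean(axis=0)              # center: work with X - mu_hat
C = (Xc.T @ Xc) / N * dt             # discretized covariance kernel c(t, s)
lam, V = np.linalg.eigh(C)           # eigh returns eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]
V = V / np.sqrt(dt)                  # normalize so that int v_j^2(t) dt = 1

scores = Xc @ V * dt                 # scores xi_j = <X_n - mu, v_j>
explained = lam / lam.sum()          # variance decomposition via the lambda_j
print(explained[:2].sum() > 0.99)    # two components explain almost everything
```

The vector `explained` is the sample analogue of the decomposition of variance through the eigenvalues λ_j discussed below.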

An interpretation of the v_j is often based on the following decomposition of variance:

E‖X‖² = Σ_{j=1}^∞ λ_j = Σ_{j=1}^∞ E[ξ_j²].

Another interpretation is the following. Fix p and consider the expected squared error E‖X − Σ_{j=1}^p a_j u_j‖², where u_1, u_2, . . . , u_p are orthonormal functions in L² and the a_j are scalars. The above error is minimized if u_j = v_j and a_j = ⟨X, v_j⟩.

Functional Time Series
Functional time series (FTSs) were introduced in Section 1, where several examples were given. Using the framework of Section 2.1, we will understand by an FTS a sequence of curves X_n = {X_n(t), 0 ≤ t ≤ 1}. Notice that n is the time index, which typically refers to a day or a year, and t is the time within that unit. In most textbooks and papers on time series analysis, the index t denotes time, but in FDA it denotes the argument of a function, so we use n to denote time and N to denote the number of available curves; N corresponds to the length of a time series.
The main idea behind functional time series modeling is that in many situations the time record can be split into natural intervals, and instead of modeling periodicity, we treat the curve in each interval as a whole observational unit. There are proponents and opponents of this approach. In several applications, it has been shown that the functional approach yields superior results, see, for example, Antoniadis et al. [35]; a few examples are also given in Bosq [9]. But it is clear that in many cases the more standard time series techniques are competitive, if not superior. Functional techniques can be expected to be definitely more useful if the daily or annual curves are not defined on a grid of equidistant time points. An interesting example of such a setting is studied by Liebl [36], who considers curves defined on different intervals on different days. The daily domain interval is defined by the minimum, m_n, and the maximum, M_n, of the electricity demand on day n. For the demand h ∈ [m_n, M_n], X_n(h) is the price of electricity.
Most methods described in this section are based on the FPCs. In certain applications, different orthonormal systems, similar in spirit to the FPCs but better suited to a specific application, can be used. In addition to the predictive factors of Kargin and Onatski [37], we note the factors introduced by Bathia et al. [38], who considered the problem of estimating the dimension d assuming that the functional time series takes values in a finite dimensional subspace of L².
The remainder of this section is organized as follows. We first review the autoregressive functional model which has been, by far, most extensively used and studied. We then turn to more general ways of describing temporal dependence between functions. We conclude with a discussion of some recent developments.
A reader seeking a good reference to the fundamental concepts of time series analysis is referred to Brockwell and Davis.

Functional Autoregressive Process
The theory of autoregressive and more general linear processes in Hilbert and Banach spaces is developed in the monograph of Bosq [9], so here we focus only on some recent research. A sequence {X_n, −∞ < n < ∞} of mean zero elements of L² follows a functional AR(1) model if

X_n = Ψ(X_{n−1}) + ε_n,    (3.1)

where Ψ ∈ L and {ε_n, −∞ < n < ∞} is a sequence of iid mean zero errors in L² satisfying E‖ε_n‖² < ∞.
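Such a series is easy to simulate on a grid. In the sketch below, the Gaussian-bump kernel of Ψ, the white-noise-in-t errors (real functional errors would themselves be smooth curves), and the grid sizes are illustrative assumptions; the integral defining Ψ(X) is approximated by a Riemann sum:

```python
import numpy as np

# Simulate a functional AR(1) series X_n = Psi(X_{n-1}) + eps_n on a grid.
rng = np.random.default_rng(2)
N, J = 200, 50
t = np.linspace(0.0, 1.0, J)
dt = t[1] - t[0]

# Kernel of Psi, scaled so that ||Psi||_L <= ||Psi||_S = 0.5 < 1
# (a sufficient condition for a stationary solution to exist).
psi = np.exp(-((t[:, None] - t[None, :]) ** 2))
psi *= 0.5 / (np.sqrt((psi ** 2).sum()) * dt)

X = np.zeros((N, J))
for n in range(1, N):
    eps = 0.3 * rng.standard_normal(J)
    X[n] = psi @ X[n - 1] * dt + eps     # Psi(X)(t) = int psi(t, s) X(s) ds
print(X.shape)                           # (200, 50)
```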

Estimation
A popular approach to the estimation of the autoregressive operator Ψ involves the FPCs, see Section 2.2. Observe first that, by (3.1) and the independence of ε_n and X_{n−1},

E[⟨X_{n−1}, x⟩ X_n] = Ψ(E[⟨X_{n−1}, x⟩ X_{n−1}]).

Define the lag-1 autocovariance operator by

C₁(x) = E[⟨X_{n−1}, x⟩ X_n],

and denote with the superscript ᵀ the adjoint operator. The identity above states that C₁ = ΨC, and so C₁ᵀ = CΨᵀ because, by a direct verification,

C₁ᵀ(x) = E[⟨X_n, x⟩ X_{n−1}].    (3.4)

We would like to obtain an estimate of Ψ by using a finite sample version of the relation Ψ = C₁C⁻¹. The operator C does not however have a bounded inverse on the whole of H. To see this, recall that C admits representation (2.10), which implies that C⁻¹(x) = Σ_{j=1}^∞ λ_j⁻¹ ⟨x, v_j⟩ v_j, an operator which is unbounded because λ_j → 0. This makes it difficult to estimate the bounded operator Ψ using the relation Ψ = C₁C⁻¹. A practical solution is to use only the p most important EFPCs v̂_j, and to define

Ĉ_p⁻¹(x) = Σ_{j=1}^p λ̂_j⁻¹ ⟨x, v̂_j⟩ v̂_j.    (3.6)

The operator Ĉ_p⁻¹ is defined on the whole of L², and it is bounded if λ̂_j > 0 for j ≤ p. By judiciously choosing p, we hope to find a balance between retaining the relevant information in the sample and the danger of working with the reciprocals of small eigenvalues λ̂_j. In formula (3.6), the v̂_j are the empirical (sample) FPCs and the λ̂_j are their eigenvalues, both defined by the relation Ĉ(v̂_j) = λ̂_j v̂_j, where

Ĉ(x) = N⁻¹ Σ_{n=1}^N ⟨X_n, x⟩ X_n    (3.7)

defines the empirical covariance operator. Direct calculations, see Chapter 13 of Horváth and Kokoszka [1], lead to the estimator

Ψ̂_p(x) = Σ_{j=1}^p Σ_{i=1}^p λ̂_j⁻¹ ( (N − 1)⁻¹ Σ_{n=1}^{N−1} ⟨X_n, v̂_j⟩ ⟨X_{n+1}, v̂_i⟩ ) ⟨x, v̂_j⟩ v̂_i.    (3.8)

If the operator Ψ is a kernel operator defined by Ψ(x)(t) = ∫ ψ(t, s) x(s) ds, then the estimator (3.8) is a kernel operator with the kernel

ψ̂_p(t, s) = Σ_{j=1}^p Σ_{i=1}^p λ̂_j⁻¹ ( (N − 1)⁻¹ Σ_{n=1}^{N−1} ⟨X_n, v̂_j⟩ ⟨X_{n+1}, v̂_i⟩ ) v̂_j(s) v̂_i(t).

In some simulated cases, the surfaces ψ̂_p(t, s) may not closely resemble the true surface ψ(t, s), see Chapter 13 of Horváth and Kokoszka [1].
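The estimation steps can be sketched numerically in the spirit of the estimator (3.8). The simulated model (a rank-one kernel, errors built from a sine basis) and the choices N = 2000, J = 100, p = 3 are assumptions made purely for this illustration:

```python
import numpy as np

# FPC-based estimation of the autoregressive kernel psi(t, s) of a
# simulated FAR(1) series; all model choices are illustrative.
rng = np.random.default_rng(3)
N, J, p = 2000, 100, 3
t = np.linspace(0.0, 1.0, J)
dt = t[1] - t[0]
e = np.sqrt(2.0) * np.sin(np.pi * np.arange(1, 4)[:, None] * t)  # e_1,e_2,e_3

psi = 0.7 * np.outer(e[0], e[0])               # true kernel, ||Psi|| = 0.7 < 1
X = np.zeros((N, J))
for n in range(1, N):
    eps = (np.array([1.0, 0.5, 0.25]) * rng.standard_normal(3)) @ e
    X[n] = psi @ X[n - 1] * dt + eps           # FAR(1) recursion

# Eigenelements of the (discretized) empirical covariance operator.
C = (X.T @ X) / N * dt
lam, V = np.linalg.eigh(C)
lam, V = lam[::-1][:p], V[:, ::-1][:, :p] / np.sqrt(dt)

scores = X @ V * dt                            # <X_n, v_j>
G = scores[:-1].T @ scores[1:] / (N - 1)       # G[j,i] ~ E<X_n,v_j><X_{n+1},v_i>
psi_hat = ((V / lam) @ G @ V.T).T              # sum_{i,j} G[j,i]/lam_j v_i(t)v_j(s)

err = np.sqrt(((psi_hat - psi) ** 2).sum()) * dt   # Hilbert-Schmidt distance
print(err < 0.35)                              # rough recovery of the kernel
```

Note that the sign ambiguity of the empirical FPCs cancels, because each v̂_j enters the estimated kernel twice (through the scores and through the expansion).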

Prediction
The most direct use of the functional AR(1) model is to predict the curve X_{n+1}, and the most obvious method is to use the predictor X̂_{n+1} = Ψ̂_p(X_n), where Ψ̂_p is the estimator (3.8). For kernel operators, this predictor is given explicitly by

X̂_{n+1}(t) = ∫ ψ̂_p(t, s) X_n(s) ds,    (3.10)

where ψ̂_p is the estimated kernel. Kargin and Onatski [37] proposed a more sophisticated method that uses the so-called predictive factors rather than the FPCs. Predictive factors are interesting in that they are functions which can be used to expand the curves, very much like the FPCs are used, but they define directions in the space L² which are most relevant for prediction. Using a finite sample implementation, Didericksen et al. [43] found that it does not lead to more accurate predictions. In general, predicted curves that use formula (3.10) tend to be closer to the mean curve and smoother than the actual observations. These predictions are however significantly better than just using the mean curve. A more detailed study is given in Didericksen et al. [43] and in Chapter 13 of Horváth and Kokoszka [1].

Order Determination
A generalization of the functional AR(1) model (3.1) is the FAR(p) model

Z_i = Σ_{j=1}^p Φ_j(Z_{i−j}) + ε_i.    (3.12)

We now use Z instead of X, because the letter X will be used below to denote a different quantity. The work of Kokoszka and Reimherr [44] shows how to determine the order p and how to estimate the operators Φ_j. The idea is to write (3.12) as a fully functional linear model. We start by expressing Φ_j(Z_{i−j}), with Φ_j(x)(t) = ∫ φ_j(t, s) x(s) ds, as an integral over the interval [(j − 1)/p, j/p]. Setting x := (s + j − 1)/p, a change of variables yields

Φ_j(Z_{i−j})(t) = ∫₀¹ φ_j(t, s) Z_{i−j}(s) ds = ∫_{(j−1)/p}^{j/p} p φ_j(t, px − j + 1) Z_{i−j}(px − j + 1) dx.    (3.13)

Denoting by I_j the indicator function of the interval [(j − 1)/p, j/p], we obtain

Σ_{j=1}^p Φ_j(Z_{i−j})(t) = ∫₀¹ Σ_{j=1}^p p φ_j(t, px − j + 1) I_j(x) Z_{i−j}(px − j + 1) dx.    (3.14)

Next we define

X_i(x) = Σ_{j=1}^p Z_{i−j}(px − j + 1) I_j(x), 0 ≤ x ≤ 1,    (3.15)

that is, X_i is obtained by stacking the p lagged curves Z_{i−1}, . . . , Z_{i−p} into a single function on [0, 1]. Setting Y_i = Z_i, we have

Y_i = Ψ(X_i) + ε_i,

where Ψ is an integral Hilbert–Schmidt operator with the kernel ψ, that is,

ψ(t, x) = Σ_{j=1}^p p φ_j(t, px − j + 1) I_j(x).    (3.17)

Thus, if we can estimate Ψ, then we can estimate each of the Φ_j. An estimator of Ψ can be constructed by any method valid for the fully functional linear model, for example, using the basis or the FPC expansions, see Chapter 8 of Horváth and Kokoszka [1].
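The change of variables behind the stacking construction can be checked numerically. The sketch below uses p = 2 made-up kernels φ_1, φ_2 and two made-up lagged curves, and verifies on a grid that a single integral operator with the stacked kernel of (3.17) reproduces the FAR(p) sum:

```python
import numpy as np

# Verify: sum_j int phi_j(t,s) Z_{i-j}(s) ds == int psi(t,x) X_i(x) dx,
# where X_i stacks the lagged curves and psi stacks the kernels.
p, J = 2, 200
t = np.linspace(0.0, 1.0, J)
dt = t[1] - t[0]

phi = [np.outer(np.sin(np.pi * t), np.cos(np.pi * t)),   # phi_1(t, s)
       0.5 * np.outer(t, 1.0 - t)]                        # phi_2(t, s)
Z = [np.sin(2 * np.pi * t), np.cos(3 * np.pi * t)]        # Z_{i-1}, Z_{i-2}

# Left-hand side: the FAR(p) sum of integral operators.
lhs = sum(phi[j] @ Z[j] * dt for j in range(p))

# Stacked curve X_i on [0,1] (segment j carries Z_{i-j}) and stacked
# kernel psi(t, x) = p * phi_j(t, p*x - j + 1) on the same segment.
x = np.concatenate([(j + t) / p for j in range(p)])       # grid for X_i
X_i = np.concatenate(Z)
psi = np.hstack([p * phi[j] for j in range(p)])
rhs = psi @ X_i * (dt / p)                                # dx = dt / p

print(np.allclose(lhs, rhs))                              # True
```

The factor p in the kernel exactly compensates the compression of each lagged curve into an interval of length 1/p, which is why both sides agree.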
Determining the order p also uses representation (3.17). The FAR(p − 1) model will be rejected in favor of FAR(p) if an estimate of Φ_p is large in an appropriate sense. Kokoszka and Reimherr [44] developed a multistage procedure based on this idea, but it is too complex to be described here.

Approximable Functional Time Series
For many functional time series, it is not clear what specific model they follow, and for many statistical procedures, it is not necessary to assume a specific model. It is however important to know what the effect of temporal dependence on a given procedure is. In this section, we describe a very general notion of dependence, L^p-m-approximability, which is convenient to use in the framework of functional data; we restrict ourselves to this framework. Several related useful results, including a functional central limit theorem in the space D[0, 1], can be found in the references given below. Throughout this section, H denotes the function space L² = L²([0, 1]), to avoid confusion with the space L^p of scalar random variables.

The content of Definition 3.1 can be summarized as follows. A sequence {X_n} of random elements of H with ν_p(X_n) = (E‖X_n‖^p)^{1/p} < ∞ is called L^p-m-approximable if each X_n admits the representation X_n = f(ε_n, ε_{n−1}, . . .), for a measurable function f and iid elements ε_i, and if

Σ_{m=1}^∞ ν_p(X_m − X_m^{(m)}) < ∞,    (3.21)

where the approximating sequences are defined by

X_n^{(m)} = f(ε_n, ε_{n−1}, . . . , ε_{n−m+1}, ε*_{n−m}, ε*_{n−m−1}, . . .),    (3.22)

with {ε*_i} an independent copy of {ε_i}. We call this method of defining the X_n^{(m)} the coupling construction. Since this modification leaves condition (3.21) unchanged, we will assume from now on that the X_n^{(m)} are defined by (3.22).

Example 3.2 (functional autoregressive process). Suppose that Ψ ∈ L satisfies ‖Ψ‖_L < 1. Let ε_n ∈ L²_H be iid with mean zero. Then there is a unique stationary sequence of random elements X_n ∈ L²_H such that

X_n(t) = Ψ(X_{n−1})(t) + ε_n(t).

One can verify that this sequence is L^p-m-approximable whenever ν_p(ε_n) < ∞.
Several other models are L^p-m-approximable; examples are given in Chapter 16 of Horváth and Kokoszka [1]. But we emphasize that the main role of L^p-m-approximability is to provide a convenient nonparametric framework for quantifying the temporal dependence of curves. In this sense, it is similar to various mixing conditions, but there is no obvious relationship between it and the traditional mixing conditions, as defined by Doukhan [49] or Bradley [50]. It is not related to the conditions of Doukhan and Louhichi [51] either. It is similar to the conditions of Wu [52, 53], in that it assumes that the observed functions are nonlinear moving averages (Bernoulli shifts) of iid unobservable errors.
Recently, Lian [54] extended the concept of L^p-m-approximability by replacing the L^p norm with a more general Orlicz norm. If ψ is a convex function on [0, ∞) with ψ(0) = 0, then we define

ν_ψ(X) = inf{c > 0 : E ψ(‖X‖/c) ≤ 1}.    (3.24)

If ψ(x) = x^p, then ν_ψ = ν_p. The remaining elements of Definition 3.1 are extended without difficulty.

Convergence of the Eigenfunctions and Eigenvalues
An important result valid for L 4 -m-approximable FTSs is that the empirical eigenfunctions and eigenvalues converge at the same rate as for the iid functions. This property allows us to extend many results established for functional random samples to FTSs. We state this result in Theorem 3.3, in which v j and λ j are the eigenelements of the empirical covariance operator 3.7 .
Theorem 3.3. Suppose that {X_n} ⊂ L⁴_H is an L⁴-m-approximable sequence such that the largest d eigenvalues λ_j of its covariance operator are positive and distinct. Then, for 1 ≤ j ≤ d,

lim sup_{N→∞} N E‖ĉ_j v̂_j − v_j‖² < ∞ and lim sup_{N→∞} N E|λ̂_j − λ_j|² < ∞,    (3.25)

where ĉ_j = sign(⟨v̂_j, v_j⟩) accounts for the sign indeterminacy of the v̂_j.
Kokoszka and Reimherr [55] showed that, under the assumptions of Theorem 3.3, the v̂_j are asymptotically normal. For linear functional time series, this asymptotic normality is implicit in the work of Mas [56].

The Long-Run Variance
A central concept in time series analysis is the long-run variance (LRV), which replaces the variance in many well-known formulas valid for iid observations. Let {X_n} be a scalar stationary sequence. Its long-run variance is defined as σ² = Σ_{j∈Z} γ_j, where γ_j = Cov(X_0, X_j), provided this series is absolutely convergent. In that case, Var(X̄_N) ∼ σ²/N. The concept of the LRV extends to stationary random vectors. It turns out that L²-m-approximability implies that the LRV exists and can be consistently estimated for vector-valued sequences. Details are given in Chapter 16 of Horváth and Kokoszka [1]. In the functional context, it is more natural to work with long-run variance kernels. The following results were established by Horváth et al. [57].
Theorem 3.4. Suppose that {X_n} is a mean zero L²-m-approximable FTS. Then the series

σ(t, s) = Σ_{j∈Z} E[X_0(t) X_j(s)]

converges in L²([0, 1] × [0, 1]) (hence σ(·, ·) is square integrable). Moreover, N^{−1/2} Σ_{n=1}^N X_n converges in distribution to Z, where Z is a mean zero Gaussian element of L²([0, 1]) with the covariance function E[Z(t)Z(s)] = σ(t, s).
To apply Theorem 3.4, we must estimate the kernel σ(·, ·). We now turn to this objective. Let K be a kernel (weight) function defined on the line and satisfying the following conditions: K(0) = 1, K is continuous and bounded, and

K(u) = 0 if |u| > c, for some c > 0.    (3.28)

Condition (3.28) is assumed only to simplify the proofs; a sufficiently fast decay could be assumed instead. Next we define the empirical (sample) autocovariance functions

γ̂_j(t, s) = N⁻¹ Σ_{n=1}^{N−j} (X_n(t) − X̄_N(t)) (X_{n+j}(s) − X̄_N(s)).

The estimator for σ(t, s) is given by

σ̂(t, s) = γ̂_0(t, s) + Σ_{j≥1} K(j/h) (γ̂_j(t, s) + γ̂_j(s, t)),

where h = h_N is the smoothing bandwidth satisfying h_N → ∞ and h_N/N → 0. Then, under the conditions on the kernel K(·) and the bandwidth h = h_N stated above, σ̂ converges to σ in probability in L²([0, 1] × [0, 1]).
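The kernel estimator σ̂(t, s) can be sketched on a grid as follows. The dependent curves (a pointwise AR(1) model), the Bartlett weights K(j/h) = 1 − j/(h + 1), and the bandwidth h = N^{1/3} are illustrative, commonly used assumptions, not choices prescribed by the paper:

```python
import numpy as np

# Lag-window estimator of the long-run variance kernel sigma(t, s).
rng = np.random.default_rng(7)
N, J = 500, 30
X = np.zeros((N, J))
for n in range(1, N):
    X[n] = 0.5 * X[n - 1] + 0.2 * rng.standard_normal(J)

Xc = X - X.mean(axis=0)                  # subtract the sample mean curve
h = int(N ** (1 / 3))                    # bandwidth: h -> inf, h / N -> 0

def gamma_hat(j):
    """Empirical autocovariance kernel gamma_j(t, s) on the grid."""
    return Xc[: N - j].T @ Xc[j:] / N

sigma_hat = gamma_hat(0)
for j in range(1, h + 1):
    g = gamma_hat(j)
    sigma_hat += (1.0 - j / (h + 1)) * (g + g.T)   # Bartlett weight K(j/h)

# Here the coordinates evolve independently, so sigma(t, s) concentrates
# on the diagonal; each diagonal entry estimates 0.04 / (1 - 0.5)^2 = 0.16.
print(sigma_hat.shape)                   # (30, 30)
```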

Self-Normalization
Estimation of the LRV, even for scalar time series, is difficult due to the selection of the bandwidth h. Data-driven methods exist, most notably those based on the theory of Andrews [58], but they do postulate a specific form of dependence. Moreover, some tests in which the LRV is estimated by such data-driven procedures suffer from the problem of nonmonotonic power, see Shao and Zhang [59]. Block bootstrap and subsampling offer useful tools for solving many problems of time series analysis, but they again depend on bandwidth-type block sizes whose selection is not obvious, even though some recommendations for the selection of such block sizes exist, see, for example, Bühlmann. The idea of self-normalization is to replace the estimated LRV by a normalizer which does not involve a bandwidth. For a scalar stationary sequence {X_n} with mean μ, suppose that the partial sums satisfy the functional central limit theorem

N^{−1/2} Σ_{j=1}^{⌊Nr⌋} (X_j − μ) → σ W(r), 0 ≤ r ≤ 1,

in the Skorokhod space. The parameter σ² is the LRV: σ² = Σ_{j∈Z} γ_j. With the normalizer D_N = N^{−2} Σ_{n=1}^N (Σ_{j=1}^n (X_j − X̄_N))², we then have

N (X̄_N − μ)² / D_N → W²(1) / ∫₀¹ (W(r) − r W(1))² dr,    (3.37)

where W is a standard Brownian motion (Wiener process).
To see why 3.37 holds, set B_N(u) = N^{−1/2}(S_{⌊Nu⌋} − ⌊Nu⌋μ) and observe that

N^{1/2}(X̄_N − μ) = B_N(1) → σW(1). (3.40)

Next, observe that S_k − (k/N)S_N = (S_k − kμ) − (k/N)(S_N − Nμ), so that

D_N = N^{−1} Σ_{k=1}^{N} [B_N(k/N) − (k/N)B_N(1)]², (3.41)

a Riemann sum approximation to an integral over u ∈ [0, 1]. Consequently,

D_N → σ² ∫_0^1 (W(u) − uW(1))² du. (3.42)

The convergences in 3.40 and 3.42 are joint, so 3.37 follows.
The key point is the cancellation of σ² when 3.40 is divided by the square root of 3.42. Relation 3.42 shows that D_N is an inconsistent estimator of σ². The distribution of the right-hand side of 3.37 can, however, be simulated, and the critical values can be obtained with arbitrary precision. Relation 3.37 can be used to construct a confidence interval for μ without estimating the LRV. Such a construction does not require the selection of a bandwidth parameter, as needed in the kernel estimation of σ².
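The construction can be sketched as follows (an illustrative implementation; in practice the critical values of the limit in 3.37 are obtained by simulating the Brownian functional on a fine grid):

```python
import numpy as np

def self_normalized_stat(x, mu=0.0):
    """T_N = sqrt(N) * (xbar - mu) / sqrt(D_N), with
    D_N = N^{-2} * sum_k (S_k - (k/N) * S_N)^2.
    No bandwidth or LRV estimate is involved."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)
    d = np.sum((s - (np.arange(1, n + 1) / n) * s[-1]) ** 2) / n**2
    return np.sqrt(n) * (x.mean() - mu) / np.sqrt(d)

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
# The statistic is scale free: sigma cancels between numerator and sqrt(D_N)
print(round(self_normalized_stat(x) / self_normalized_stat(10 * x), 2))  # 1.0
```

The last line verifies the cancellation numerically: multiplying the data by any constant rescales the numerator and √D_N identically, so the statistic is unchanged.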
Zhang et al. 66 extend the idea of self-normalization to the detection of change points in FTSs. A normalization analogous to D_N is not suitable for the change point problem; see Shao and Zhang 59. It must be modified to take a possible change point into account. The details are too complex to be discussed here. The change point problem is discussed further in Section 3.3.

Two-Sample Problems
In a two-sample problem, we consider two samples X_1, . . . , X_N and X*_1, . . . , X*_M, which may be realizations of two functional time series, or of a single series but taken over different time intervals. The statistical problem consists in testing whether some specified characteristics of the two samples are equal. To illustrate, suppose we want to test whether the mean functions are equal. We assume the model

X_i(t) = μ(t) + ε_i(t), X*_j(t) = μ*(t) + ε*_j(t), (3.43)

with mean zero errors ε_i and ε*_j.
We wish to test the null hypothesis H_0: μ = μ* against the alternative that H_0 is false. One can also test for the equality of the covariance operators, or of any other quantities of interest. In the FDA setting, testing for the equality of the mean functions, the FPCs, and the covariance operators has received a good deal of attention.
An important contribution has been made by Benko et al. 67 who developed bootstrap procedures for testing the equality of mean functions, the FPCs, and the eigenspaces spanned by them. Horváth et al. 57 developed asymptotic tests for the equality of mean functions. Laukaitis and Račkauskas 68 considered the model X_{g,i}(t) = μ_g(t) + ε_{g,i}(t), g = 1, 2, . . . , G, with innovations ε_{g,i} and group means μ_g, and test H_0: μ_1 = μ_2 = · · · = μ_G.
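For intuition, a simple permutation test for H_0: μ = μ* can be sketched as follows (this toy version assumes iid curves within each sample and exchangeability under H_0; it is neither the bootstrap procedure of Benko et al. 67 nor the asymptotic test of Horváth et al. 57):

```python
import numpy as np

rng = np.random.default_rng(2)

def l2_mean_distance(a, b):
    """Integrated squared distance between the two sample mean curves;
    curves are stored row-wise on a common grid."""
    return np.mean((a.mean(axis=0) - b.mean(axis=0)) ** 2)

def perm_test(a, b, nperm=500):
    """Permutation p-value for H0: equal mean functions."""
    obs = l2_mean_distance(a, b)
    pooled = np.vstack([a, b])
    n = len(a)
    count = 0
    for _ in range(nperm):
        idx = rng.permutation(len(pooled))
        if l2_mean_distance(pooled[idx[:n]], pooled[idx[n:]]) >= obs:
            count += 1
    return (count + 1) / (nperm + 1)

t = np.linspace(0, 1, 50)
a = np.sin(2 * np.pi * t) + rng.standard_normal((30, 50))
b = np.sin(2 * np.pi * t) + 1.0 + rng.standard_normal((30, 50))  # shifted mean
print(perm_test(a, b))   # small p-value: the mean functions differ
```

For dependent curves, a naive permutation destroys the serial dependence and is invalid; block resampling or the asymptotic approach would be needed.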

The Change Point Problem
When curves form a time series, it is typically assumed that, possibly after some transformation, they have the same distribution in L². If some aspect of this distribution changes, inference becomes invalid. For example, if the mean changes at some time point, the sample mean does not estimate any population parameter. Change point analysis is a very broad field of statistics with applications in industrial quality control, the study of economic, financial, and environmental time series, and many other areas. Many monographs are available, including Brodsky and Darkhovsky 76, Basseville and Nikiforov 77, Csörgő and Horváth 78, Chen and Gupta 79, and Basseville et al. 80.
The idea of change point analysis is simple, and we explain it in the functional context using the change in the mean function as an example. We observe functions X_i, 1 ≤ i ≤ N, and we test the null hypothesis

H_0: EX_1 = EX_2 = · · · = EX_N.

The simplest alternative is that of a single change point:

H_A: there is k* such that X_i = μ_1 + Y_i for i ≤ k* and X_i = μ_2 + Y_i for i > k*,

where the Y_i have the same distribution with mean zero, and μ_1(·) ≠ μ_2(·). Note that under H_0 we do not specify the value of the common mean, and under H_A we specify neither μ_1(·) nor μ_2(·). It is also important to distinguish between a change point problem and the problem of testing the equality of means. In the latter setting, it is known to which population or group each observation belongs. In the change point setting, we do not have any a priori partition of the data into sets with possibly different means. The change can occur at any point; we want to test whether it occurs, and, if it does, to estimate the point of change.
The simplest, and in many settings most effective, change point detection procedures are based on cumulative sums (CUSUM procedures). In the above setting, denote by μ̂_k(t) = k^{−1} Σ_{i≤k} X_i(t) the sample mean of the first k curves and by μ̂_N(t) the sample mean of all N curves. If the mean is constant, the difference Δ_k(t) = μ̂_k(t) − μ̂_N(t) is small for all 1 ≤ k < N and all t ∈ [0, 1]. However, Δ_k(t) can become large due to chance variability if k is close to 1 or to N. It is therefore usual to work with the sequence

P_k(t) = N^{−1/2} Σ_{i=1}^{k} [X_i(t) − X̄_N(t)] = N^{−1/2} k Δ_k(t), (3.47)

in which the variability at the end points is attenuated: the variance of P_k is proportional to the parabolic weight function (k/N)(1 − k/N). If the mean changes, the difference P_k(t) is large for some values of k and of t. Zhang et al. 66 use self-normalization to deal with temporal dependence and study change points in the mean and in the covariance structure. Aston and Kirch 83 consider change point detection in a dependent sequence of functions assuming that the alternative is not a single change point but a change interval, the so-called epidemic alternative. The problem of detecting a change in the mean of a sequence of Banach-space valued random elements under an epidemic alternative is theoretically studied by Račkauskas and Suquet 84, who propose a statistic based on increasingly fine dyadic partitions of the index interval, and derive its limit, which is nonstandard.
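A bare-bones CUSUM detector for a change in the mean function can be sketched as follows (curves are stored on a common grid; all numerical choices are illustrative, and the refinements needed under temporal dependence are omitted):

```python
import numpy as np

rng = np.random.default_rng(3)

def cusum_change_point(X):
    """CUSUM for a change in the mean function. X has shape (N, m):
    N curves on a grid of m points. Returns the candidate change point
    maximizing the integrated squared CUSUM process."""
    N = X.shape[0]
    S = np.cumsum(X - X.mean(axis=0), axis=0)  # partial sums of centered curves
    stat = np.sum(S**2, axis=1) / N            # integrate P_k(t)^2 over the grid
    return int(np.argmax(stat[:-1])) + 1, stat

N, m, kstar = 100, 40, 60
X = rng.standard_normal((N, m))
X[kstar:] += 1.0                               # mean shifts after curve 60
k_hat, stat = cusum_change_point(X)
print(k_hat)   # close to the true change point 60
```

In practice the CUSUM statistic is projected on estimated FPCs and normalized by the long-run covariance of the scores before critical values are applied; the sketch above only locates the maximum.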

Other Temporal Dependence Structures
The most extensively used and studied model for FTSs is the FAR(1) model. Some nonlinear ARCH-type models are introduced in Hörmann et al. 85. Similar models are used by Kokoszka and Reimherr 86 in a simulation study. Kokoszka and Reimherr 86 study the predictability (not the prediction) of the cumulative intraday returns (CIDRs) defined by R_n(t_j) = 100[ln P_n(t_j) − ln P_n(t_1)], j = 2, . . . , m, n = 1, . . . , N, where P_n(t_j), n = 1, . . . , N, j = 1, . . . , m, is the price of a financial asset at time t_j on day n. Kokoszka et al. 87, see also Kokoszka and Zhang 88, study the dependence of the CIDRs defined above on other factors, including the CIDRs of market indexes and energy futures. They consider a factor model of the form

R_n(t) = Σ_{k=1}^{K} β_k f_{k,n}(t) + ε_n(t), (3.49)

in which the factor curves f_{k,n} are observed, for example, the CIDRs of market indexes.
A different class of functional factor models was recently proposed by Hays et al. 89. These models, introduced to forecast yield curves, say X_n, are of the form

X_n(t) = Σ_{k=1}^{K} β_{nk} F_k(t) + ε_n(t). (3.50)

The factors F_k do not depend on n and are orthonormal functions to be estimated. The dynamics are in the coefficients β_{nk}, which are assumed to follow Gaussian autoregressive processes (the ε_n are also Gaussian). Model 3.50 could be termed a statistical factor model. It is designed for temporal forecasting, while model 3.49 is designed for regression-type prediction in which the correlation structure of the factors plays a major role. Gabrys et al. 90 developed a test to determine whether the error functions ε_n in the fully functional linear model Y_n(t) = ∫ ψ(t, s) X_n(s) ds + ε_n(t) are correlated. Benhenni et al. 91 consider a nonparametric regression Y_n = r(X_n) + ε_n in which the X_n are functions and the Y_n and ε_n are scalars. They allow the ε_n to have either short or long memory.
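The statistical-factor idea behind model 3.50 can be illustrated as follows (a sketch only: orthonormal factors are extracted by an SVD, that is, by principal components, and the score dynamics are fit by least squares; this is not the estimation method of Hays et al. 89, and the model choices below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic curves X_n(t) = beta_n1*F1(t) + beta_n2*F2(t) + noise,
# with AR(1) score dynamics; all names here are illustrative.
t = np.linspace(0, 1, 60)
F_true = np.vstack([np.sin(np.pi * t), np.cos(np.pi * t)])
beta = np.zeros((200, 2))
for n in range(1, 200):
    beta[n] = 0.8 * beta[n - 1] + rng.standard_normal(2)
X = beta @ F_true + 0.1 * rng.standard_normal((200, 60))

# Orthonormal factor estimates via SVD (principal components of the curves)
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
F_hat = Vt[:2]                            # estimated factor curves
scores = Xc @ F_hat.T                     # estimated beta_nk series
phi = np.polyfit(scores[:-1, 0], scores[1:, 0], 1)[0]  # AR(1) fit, first score
print(round(phi, 2))                      # near the true persistence 0.8
```

Forecasting X_{n+1} then amounts to forecasting each score series with its fitted autoregression and recombining with the estimated factor curves.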
A different framework is developed by Battey and Sancetta 92 who assume that the FTS X_n is a Markov process and are concerned with the estimation of the conditional probabilities P(X_n(t) ≤ x(t) | X_{n−1} = x_0), where x and x_0 are deterministic functions.

Geostatistical Functional Data
An interesting class of functional data consists of curves observed at several spatial locations, as already mentioned in Section 1. For example, X(s_k; t) can be the concentration of a pollutant at location s_k measured at time t. In this section, we thus assume that the data consist of N curves X(s_k; t), t ∈ [0, 1], 1 ≤ k ≤ N. The analysis of such a data structure must draw heavily on the concepts and tools of spatial statistics. In fact, the random field X defined above is a special case of a spatiotemporal process. In this paper, we cannot provide the details of the tools of spatial statistics that we use. The required background, and much more, is given, for example, in Schabenberger and Gotway 93, Gelfand et al. 94, and Sherman 95. We will, however, introduce the concepts to the extent needed to convey a good idea of the main problems and solutions. This is done in Section 4.1. Then, in Section 4.2, we consider the fundamental problem of the estimation of the mean function. Section 4.3 points to research on functional kriging. We note that there are functional data structures beyond those discussed here; we refer to Delicado et al. 96, who also discuss Bayesian approaches.

Basic Concepts of Spatial Statistics
A sample of spatial data is {X(s_k), s_k ∈ S, k = 1, 2, . . . , N}. The X(s_k) are now scalars. The spatial domain S is typically a subset of the two-dimensional plane or of the sphere. The observed value X(s_k) is viewed as a realization of a random variable. Just as in time series analysis, stationary random fields play a fundamental role in modeling spatial data. To define arbitrary shifts, we must assume that S is either the whole Euclidean space R^d or the whole sphere; for a stationary field, the covariance function C(h) = Cov(X(s), X(s + h)) does not depend on s. If C(h) depends only on the length ‖h‖ of h, we say that the random field is isotropic. The covariance function of an isotropic random field is typically parametrized as C(h) = σ² φ(‖h‖). The function φ is called the correlation function. For example, the exponential correlation function is given by φ(h) = exp{−h/ρ}. The main idea of spatial modeling is that some sort of empirical estimate of C(h) is first obtained, which is typically a very irregular, noisy function. Then a suitable parametric covariance function C(h) is fitted to ensure that the resulting covariance matrix {C(s_k − s_ℓ), 1 ≤ k, ℓ ≤ N} is positive definite. There are many approaches and nontrivial issues involved in this process.
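The last point can be checked directly: a valid parametric model guarantees a positive definite covariance matrix at any configuration of sites (the exponential model and the parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
# Exponential correlation phi(h) = exp(-h/rho); C(h) = sigma2 * phi(h)
sites = rng.uniform(0, 10, size=(25, 2))             # 25 random plane locations
h = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
sigma2, rho = 2.0, 3.0
C = sigma2 * np.exp(-h / rho)                        # covariance matrix
# A valid covariance matrix has only positive eigenvalues
print(np.linalg.eigvalsh(C).min() > 0)   # True
```

An irregular empirical estimate of C, plugged in directly, would generally fail this eigenvalue check, which is why a parametric model is fitted first.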

Estimation of the Mean Function and of the FPCs
We assume that the functions X(s_k; ·) are observations of a stationary and isotropic spatial random field taking values in the space L². In particular, the mean function μ(t) = EX(s; t) does not depend on the location s. The estimation of this mean function is the most important first step of the inference for spatially distributed functional data. If the curves X_k are iid or form a time series, the usual sample mean is used. But if the curves are available at spatial locations, the curves at locations close to each other should be given smaller weights because they are very similar and contribute roughly the same information. This suggests that μ could be estimated by

μ̂(t) = Σ_{n=1}^{N} w_n X(s_n; t), (4.4)

with the weights w_n defined to minimize E‖Σ_{n=1}^{N} w_n X(s_n) − μ‖² subject to the condition Σ_{n=1}^{N} w_n = 1. Using the method of the Lagrange multiplier, it is not difficult to find a closed-form expression for the weights w_n. This expression involves unknown covariances. Gromenko et al. 97 proposed an iterative procedure for the estimation of the weights w_n and of the covariances. They showed that the weighted average 4.4 is a significantly better estimator than the usual sample average.
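The Lagrange-multiplier argument gives, in the scalar analogue, the weights w = C^{-1}1/(1ᵀC^{-1}1), where C is the matrix of covariances between sites and 1 the vector of ones. A small illustration (the site configuration and the exponential covariance are our choices, not those of Gromenko et al. 97) shows how clustered sites are downweighted:

```python
import numpy as np

def optimal_weights(C):
    """Minimize w' C w subject to 1'w = 1 (Lagrange multiplier):
    w = C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(C.shape[0])
    u = np.linalg.solve(C, ones)
    return u / u.sum()

# Three sites: two nearly coincident, one far away (exponential covariance)
d = np.array([[0.0, 0.1, 5.0],
              [0.1, 0.0, 5.0],
              [5.0, 5.0, 0.0]])
C = np.exp(-d / 2.0)
w = optimal_weights(C)
print(np.round(w, 3))   # the two close sites split weight; the isolated one gets more
```

The two nearly coincident sites carry almost the same information, so each receives roughly half the weight of the isolated site, exactly the behavior described above.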

Estimation of the FPCs
If the random functions X(s_k; ·) are realizations of a stationary L²-valued random field, then they have the same FPCs v_j. Estimation of the v_j in the spatial setting involves issues similar to those arising in the estimation of the mean function μ, but the notation and the approaches are more complex. They are discussed in Gromenko et al. 97.

Kriging and Other Research
A very important problem is to predict a curve at a specified location using the curves at available locations. This problem was addressed in Nerini et al. 104 and Giraldo et al. 105. Spatial prediction of this type is called kriging. Theoretical background on the kriging of scalar fields is given in Stein 106. The book of Wackernagel 107 gives an accessible introduction to kriging emphasizing multidimensional observations, a setting related to the kriging of curves. Just as in the case of estimating the mean function, there are several approaches to kriging. The approach advocated by Giraldo et al. 105 is "most functional" in that it treats the curves as whole data objects. Similarly to 4.4, it predicts the unobserved function X(s_0) by a linear combination

X̂(s_0; t) = Σ_{n=1}^{N} w_n X(s_n; t).

The estimation of the weights w_n requires some work. Giraldo et al. 108 consider the problem of determining clusters in spatially correlated functional data. Härdle and Osipienko 109 show how spatial analysis of functional data can be used to calculate more accurate prices of weather derivatives.
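In the scalar case, the ordinary kriging weights solve a bordered linear system that enforces Σ w_n = 1. A minimal sketch (the one-dimensional configuration and exponential covariance are illustrative; this is not the functional procedure of Giraldo et al. 105):

```python
import numpy as np

def ordinary_kriging_weights(C, c0):
    """Ordinary kriging weights: solve [C 1; 1' 0][w; lam] = [c0; 1],
    where C is the covariance matrix of the observed sites and c0 the
    vector of covariances between the sites and the target s0."""
    n = len(c0)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    b = np.append(c0, 1.0)
    return np.linalg.solve(A, b)[:n]

# Hypothetical 1-D configuration, exponential covariance, target s0 = 2.5
sites = np.array([0.0, 1.0, 3.0, 6.0])
s0 = 2.5
C = np.exp(-np.abs(sites[:, None] - sites[None, :]) / 2.0)
c0 = np.exp(-np.abs(sites - s0) / 2.0)
w = ordinary_kriging_weights(C, c0)
print(np.round(w, 3))   # weights sum to 1; the nearest site (3.0) dominates
```

The prediction is then Σ w_n X(s_n); in the functional version the same linear-combination idea is applied to whole curves, with trace-covariances replacing scalar covariances.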
An important emerging class of functional models are those with a hierarchical structure and correlation at some levels, similar to the spatial correlations discussed in this section. Such models have applications in the analysis of medical experiments in which tissue samples are taken at several locations in an organ of a subject. Staicu et al. 110 propose fast methods for the estimation of such models.

Summary and Future Directions
We have reviewed recent developments in the analysis of dependent functions. We considered two data structures: functional time series and spatially distributed functions. Functional time series are collections of curves X n , where n can be interpreted as time, most often day or year. The central point in the analysis and modeling of such data is to take into account the dependence of the curves X n . Spatially distributed functions, or geostatistical functional data, consist of curves X s k available at locations s k . The main issue is to take into account the uneven distribution in space of the points s k .
Inference for data of both types assumes that the data are stationary, possibly after some transformation. At present, there are no suitable tests of stationarity for such functional data. Second-order stationarity of FTSs could potentially be tested using spectral methods. In many settings, hybrid data of the form X_n(s_k; t) can be considered. They could, for example, be the maximum temperature or the count of flu cases on day 1 ≤ t ≤ 365 of year n at location s_k. Such a data structure could be useful to study long-term trends or changes in the annual pattern of a variable of interest over a region. There has not been much work on functional data of this form; Odei et al. 116 consider Bayesian modeling of snow melt curves. In their setting, X_n(s_k; t) is the amount of snow melt water on day 1 ≤ t ≤ 365 of year n at a location s_k in Utah.
For long records of geophysical, weather, or environmental data, a serious problem is posed by long segments of missing observations. For example, if X(s_k; t) is the rainfall on day t at location s_k, there may be whole years of missing data at certain locations (when a station is closed), and these segments will be different at different locations. An important challenge is to develop useful approaches that allow us to pool information from the curves at all locations when some of them have long missing segments. This problem is somewhat related to the problem of dealing with sparse functional data mentioned in Section 1.1, but it adds the complication of spatial dependence.
The functions discussed in this paper, whether observed consecutively over time or at spatial locations, are assumed to be smooth, so that methods relying on basis and FPC expansions can be applied. Some functions do not fall into this category and may exhibit sharp spikes and flat regions. There has not been much work on time series or spatial fields of functions of this type. Timmermans and von Sachs 117 propose an exploratory tool that aims at detecting the closeness of curves whose significant sharp features might not be well aligned.
The study of extremes involves work with point processes. For example, P_n(s_k) could be a point process of threshold exceedances in year n at location s_k. No systematic modeling framework for such point-process valued time series is available. The estimation of exceedance probabilities is studied in Draghicescu and Ignaccolo 118.
In summary, the study of dependent functional data has reached a level of maturity that makes it a useful subfield of FDA, but many important problems remain to be addressed. It is hoped that this paper has provided a useful introduction into this area.