Dependent Functional Data
Piotr Kokoszka
Department of Statistics, Colorado State University, Fort Collins, CO 80523, USA
ISRN Probability and Statistics, vol. 2012, Article ID 958254, doi:10.5402/2012/958254. Review Article.
Copyright © 2012 Piotr Kokoszka. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper reviews recent research on dependent functional data. After providing an introduction to functional data analysis, we focus on two types of dependent functional data structures: time series of curves and spatially distributed curves. We review statistical models, inferential methodology, and possible extensions. The paper is intended to provide a concise introduction to the subject with plentiful references.

1. Introduction

Functional data analysis (FDA) is a relatively new branch of statistics, going back to the early 1990s, but its mathematical foundations are rooted in much earlier developments in the theory of operators in a Hilbert space and in functional analysis. In the most basic setting, the sample consists of N curves $X_1(t), X_2(t), \dots, X_N(t)$, $t \in \mathcal{T}$. The set $\mathcal{T}$ is typically an interval of the line. In increasingly many applications, it is however a subset of the plane, or a sphere, or even a 3D region. In those cases, the data are surfaces over a region, or more general functions over some domain, hence the term functional data. This survey is concerned mostly with the analysis of curves, but some references to more general functions are given in Section 1.1.

Functional data are high-dimensional data, as, in a statistical model, each function $X_n$ consists of infinitely many values $X_n(t)$, $t \in \mathcal{T}$. In traditional statistics, the data consist of a sample of scalars or vectors. For example, for each survey participant, we may record age, gender, income, and education level. The data point thus has dimension four; it is a vector with quantitative and categorical entries. High-dimensional data typically have dimension comparable to or larger than the sample size. As they are often analyzed using regression models in which the sample size is denoted by n and the number of explanatory variables by p, high-dimensional data often fall into the “large p, small n” paradigm, but clearly they form a much broader class, with a great deal of work focusing on covariance matrices based on a sample of n p-dimensional vectors. A distinctive feature of functional data is that the curves or surfaces are assumed to be smooth in some sense; if $t_1$ is close to $t_2$, the values $X_n(t_1)$ and $X_n(t_2)$ should be similar. In the “large p, small n” paradigm, there need not be any natural ordering of the covariates or any natural measure of distance between them. The analysis often focuses on the selection of a small number of relevant covariates (the variable selection problem). In FDA, the analysis involves obtaining a smooth, low-dimensional representation of each curve.

FDA views each curve in a sample as a separate statistical object. In this sense, FDA is part of the object data analysis in which data points are not scalars or vectors, but structures which are modeled by complex mathematical objects, for example, by graphs. Some references are given in Section 1.1.

However, even curves are far more complicated structures than scalars or vectors. The curves are characterized not only by magnitude but also by shape. The shape of a random curve plays a role analogous to the dependence between the coordinates of a random vector. Human growth curves provide a well-known example. Suppose that there are N randomly selected subjects of the same gender. Let $X_k(t_j)$ be the height of the kth subject measured at time $t_j$ from birth. The points $t_j$ are different for different subjects. Using methods of FDA, we can construct continuous and differentiable curves $X_k(t)$, $0 \le t \le T$. The shapes and magnitudes of these curves give us an idea about the variability in the process of growth, rather than just about the variability of the final height, which can be assessed using the scalars $X_k(T)$, $1 \le k \le N$.

Some data can be very naturally viewed as curves. For example, if the height measurements are available at a fairly regular and sufficiently dense grid of times $t_j$, it is easy to visualize them as curves, even though it is not immediately obvious how to compute derivatives of such curves. In many situations, the points $t_j$ are extremely dense. For example, physical instruments may return an observation every five seconds, so in a day, we will have 17,280 values $t_j$. A day is a natural time domain in many applications, and the problem is to replace the 17,280 values $X_n(t_j)$ available in day n by a smaller, more manageable set of numbers. This is generally possible due to the assumption of some smoothness. At the other extreme are sparse longitudinal data. Such data often arise in medical research. For example, a measurement can be made on a patient only several times during the course of treatment. Yet we know that the quantity that is measured exists at any time, so it is a curve that is observed only at a few sparse time points. References to the relevant functional methodology are given in Section 1.1.

Growth curves or sparse observations on a sample of patients can be viewed as independent curves drawn from a population of interest. A large body of research in FDA has been motivated by various problems arising in such a setting. At the same time, many functional data sets, most notably in physical and environmental sciences, arise from long records of observations. An example is presented in Figure 1, which shows seven consecutive functional observations (curves). These curves show a very rough periodic pattern, but modeling periodicity is difficult, as this pattern is, in fact, severely disturbed several times a month due to ionospheric storms. The 24 h period must however enter into any statistical model as it is caused by the rotation of the Earth. It is thus natural in this context to treat the long continuous record as consisting of consecutive curves, each defined over a 24 h time interval. Space physics researchers have long associated the enhancements occurring on a given day with physical phenomena in near-Earth space. This gives additional support to treating these data as a time series of curves of evolving shape, which we will call a functional time series. Similar functional series arise, for example, in urban pollution monitoring studies.

The horizontal component of the magnetic field measured in one minute resolution at Honolulu magnetic observatory from 1/1/2001 00:00 UT to 1/7/2001 24:00 UT.

A functional time series does not however need to arise from cutting a continuous time record into adjacent pieces of natural length. Figure 2 shows a functional time series of curves representing (centered) Eurodollar futures prices. For each day, the argument is not time within that day, but the time to the expiration of a contract; more details are given in Chapter 14 of Horváth and Kokoszka. The curves show how the prices of contracts with various expiration horizons evolve from day to day.

Eurodollar futures curves on ten consecutive days.

Functional time series focus on temporal or otherwise sequential dependence between curves. In many applications, the curves are available at points in space. Such spatially indexed functional data will be denoted $X(s_k, t)$, where $s_1, s_2, \dots, s_N$ are locations in some region and $X(s_k, t)$ is the value of the function $X(s_k)$ at time t. To compare such data structures to functional time series, consider two ways of looking at pollution data. If we consider only one location and are interested in the day-to-day pattern of pollution, we may consider curves $X_n(t)$, where n is the day index and t is the time within the day. In many studies, we are however interested in long-term trends in pollution over a region, for example, a large city. In such a case, we may wish to smooth out the day-to-day variability and consider the curves $X(s_k, t)$, where $s_k$ is the location of a measurement station within the city and t is time over a span of a few decades. Such a data structure is difficult to show in one graph. The spatial locations $s_k$ can be shown on a map. The curves $X(s_k, t)$ must be shown on a separate graph, as in Figure 3. As for functional time series, the main feature of such data is the dependence between the curves, which depends on the distance between them.

F2-layer critical frequency curves at three locations. Top to bottom (latitude in parentheses): Yakutsk (62.0), Yamagawa (31.2), and Manila (14.7). The functions exhibit a latitudinal trend in amplitude.

Of course, one can consider a more complex structure in which one may study spatially indexed functional time series, with the data being $X_n(s_k, t)$; for a fixed $s_k$, we have a functional time series, and for a fixed n, we have a functional spatial random field. We briefly discuss such a setting in Section 5.

The focus of this paper is on the recent developments in the research on dependent functional data. Section 3 considers functional time series, and Section 4 spatially indexed functions. The required background is provided in Section 2. A summary, which includes the discussion of some problems of current interest, is given in Section 5.

In this section, we discuss several important topics mentioned above, mostly by directing the reader to appropriate references.

1.1.1. Monographs on FDA

The two editions of the book of Ramsay and Silverman  have done a lot to introduce FDA to the statistics community and beyond. The monograph introduces most of the fundamental ideas of FDA including penalized smoothing, data registration, functional linear regression, functional principal components and functional analysis of derivatives. Ramsay and Silverman  elaborate on many data examples in Ramsay and Silverman  by providing detailed case studies, while Ramsay et al.  focus on the computational aspects of the analyses introduced in Ramsay and Silverman . These books are concerned with iid samples of curves.

The book of Ferraty and Vieu  covers nonparametric methods with special emphasis on nonparametric prediction and classification. It has several chapters on dependent curves (α-mixing). It studies both practical and abstract aspects of the problems. The collections Ferraty and Romain  and Ferraty  contain contributions from prominent researchers illustrating the increasingly many facets of FDA. Ferraty and Romain  has a smaller number of detailed papers, while Ferraty  has a large number of short communications. Shi and Choi  consider Gaussian regression models.

The monograph of Bosq  is highly recommended to anyone seeking a fast and rigorous introduction to the theory of functional time series. It contains all practically necessary facts from the theory of probability in Hilbert and Banach spaces, and a great deal of relevant theory of linear functional time series. Bosq and Blanke  elaborate on some aspects of this theory in an abstract way.

To fully understand some theoretical aspects of FDA, a good background in the theory of Hilbert spaces is needed. There are many introductory textbooks; my personal favorites are Riesz and Nagy, Akhiezer and Glazman, and Debnath and Mikusinski. There are fewer books on probability in Hilbert and Banach spaces; I have benefited from the monographs of Araujo and Giné, Linde, and Vakhaniia et al.

The monograph of Horváth and Kokoszka  elaborates on many topics mentioned in this paper, especially those related to the functional principal components.

1.1.2. Basis Expansions

In this paper, we focus on procedures for densely observed curves. For such data, functional objects needed to perform the calculations can be created using the R package fda. We will not address the details of calculations, but note that all necessary background is given in Ramsay et al. In the following, we often refer to the choice of a basis and the number of basis functions. We now briefly discuss this point.

Functions are observed at discrete sampling values $t_j$, $j = 1, \dots, J$, which may or may not be equally spaced. We work with N functions. These data are converted to functional objects. In order to do this, we need to specify a basis. A basis is a system of basis functions, a linear combination of which defines the functional objects. The elements of a basis may or may not be orthogonal. We express a functional observation $X_n$ as (1.1) $X_n(t) \approx \sum_{k=1}^{K} c_{nk}\phi_k(t)$, where the $\phi_k$, $k = 1, \dots, K$, are the basis functions. One of the advantages of this approach is that instead of storing all the data points, one stores the coefficients of the expansion, that is, the $c_{nk}$. This step thus involves an initial dimension reduction and some smoothing. All subsequent computations are performed on the matrices built from the coefficients $c_{nk}$. The number K of the basis functions impacts the performance of some procedures, but others are fairly insensitive to its choice. For the data studied in this paper, we generally choose K so that the plotted functional objects resemble the original data with some smoothing that eliminates the most obvious noise. The two most commonly used basis systems are the Fourier and the B-spline bases. The Fourier basis is usually used for periodic, or nearly periodic, functions with no strong local features and a roughly constant curvature. It is inappropriate for data with discontinuities in the function itself or in low-order derivatives. The B-spline basis is typically used for nonperiodic locally smooth data. The B-spline basis functions are not orthogonal.
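To make the basis-expansion step concrete, the following sketch (in Python, with hypothetical names such as fourier_basis; the paper itself works with the R package fda) fits the coefficients $c_{nk}$ by least squares to curves observed on a common grid and reconstructs the smoothed functional objects.

```python
import numpy as np

def fourier_basis(t, K):
    """Evaluate K Fourier basis functions (constant, sines, cosines) on a grid t in [0, 1]."""
    B = np.ones((len(t), K))
    for k in range(1, K):
        freq = (k + 1) // 2
        if k % 2 == 1:
            B[:, k] = np.sqrt(2) * np.sin(2 * np.pi * freq * t)
        else:
            B[:, k] = np.sqrt(2) * np.cos(2 * np.pi * freq * t)
    return B  # J x K matrix of basis values

# Illustrative data: N noisy curves observed on a grid of J points.
rng = np.random.default_rng(0)
N, J, K = 50, 200, 15
t = np.linspace(0.0, 1.0, J)
X = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((N, J))

# Least-squares coefficients c_{nk} of the expansion X_n(t) ~ sum_k c_{nk} phi_k(t).
B = fourier_basis(t, K)                       # J x K
C = np.linalg.lstsq(B, X.T, rcond=None)[0].T  # N x K coefficient matrix
X_smooth = C @ B.T                            # smoothed functional objects on the grid
```

All subsequent computations can then operate on the N x K coefficient matrix rather than on the raw observations.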

1.1.3. Sparsely Observed Functions

As noted above, in some applications, the data are available only at a few sparsely distributed points $t_j$, which may be different for different curves, and the data may exhibit nonnegligible measurement errors. Yao et al. introduce methodology to deal with such data; smoothing with a basis expansion is at best inappropriate, and often not feasible. The essence of the alternative approach is explained in Müller. While there are various elaborations, see, for example, Hall et al., the idea is that smoothing must be applied to all observations collectively, not to individual trajectories (which basically do not exist for sparse data). The mean function is first obtained by smoothing the mean of all trajectories. Then the functional principal components are calculated as the eigenfunctions of a covariance kernel obtained by surface smoothing. Extensions of this technique have been applied in many settings, see, for example, Müller and Stadtmüller, Müller et al., Yao and Müller, and Di and Crainiceanu.

1.1.4. Software

All procedures described in this paper can be implemented in readily available statistical software. Ramsay et al.  provide a solid introduction to computational issues for functional data, as well as numerous examples. Their book describes the R package fda and analogous Matlab code. Clarkson et al.  describe the implementation in S+. Many specialized R packages now exist. Techniques for sparse data are implemented in the Matlab package PACE developed at the University of California at Davis, available at http://anson.ucdavis.edu/~mueller/data/software.html, at the time of writing.

1.1.5. Extensions beyond Curves

Various brain scans can be viewed as functions over a spatial domain. Analysis of such data is discussed in several papers, see, for example, Reiss and Ogden and Zipunnikov et al. Guillas and Lai consider functional regression models with surfaces as regressors, and Morris et al. propose a Bayesian framework for the analysis of images.

The study of the brain also provides motivation to study complex statistical objects. Aydin et al.  model blood vessels in the brain as trees. Kenobi et al.  and Crane and Patrangenaru  provide references to data analysis on manifolds.

2. A Modeling Framework for Functional Data Analysis

This paper focuses on inferential procedures for functional data. To derive them, a statistical model for the data is needed. In particular, the curves must be viewed as elements of some space. Several modeling frameworks have been proposed, including semimetric spaces, see Ferraty and Vieu, Sobolev spaces, see, for example, Li and Hsing, and more general Besov spaces, see, for example, Abramovich and Angelini and Pensky and Sapatinas. Sobolev spaces are Hilbert spaces in which derivatives enter into the definition of the inner product. They are used to study procedures which involve smoothing by penalizing functions with large integrated derivatives of higher order (typically order two). Besov spaces have been used to study wavelet expansions of functions; they are typically Banach spaces. In this paper, we consider only the simplest framework in which the curves are assumed to belong to the space $L^2(\mathcal{T})$ of square integrable functions on $\mathcal{T}$, and, to be more specific, we assume that $\mathcal{T} = [0,1]$. All results we state in the following that involve inner products hold in an arbitrary separable Hilbert space. However, when we use integrals and functions with an argument like t or s, the formulas assume that the Hilbert space is $L^2([0,1])$. This space is sufficient to describe procedures based on functional means and variances; most inferential procedures for scalar data use only means and variances. The corresponding functional framework is introduced in Section 2.1. In Section 2.2, we define the functional principal components which provide a dimension reduction beyond that obtained via the basis expansion (1.1); K is typically around one hundred, whereas in our applications the number of the functional principal components needed to effectively approximate the data is a single-digit number, very often 2, 3, or 4.

In Sections 2.1 and 2.2, we define the required first- and second-order characteristics (functional parameters) of a random function X. We cannot start with definitions based on a random sample because for dependent data sample statistics often need to be defined in a different way to ensure that they estimate the corresponding population (theoretical) parameters.

2.1. The Hilbert Space Model

The space $L^2 = L^2([0,1])$ is the set of measurable real-valued functions x defined on [0,1] satisfying $\int_0^1 x^2(t)\,dt < \infty$. The space $L^2$ is a separable Hilbert space with the inner product (2.1) $\langle x, y\rangle = \int x(t)y(t)\,dt$. An integral sign without the limits of integration is meant to denote the integral over the whole interval [0,1].

We view a random curve $X = \{X(t),\ t \in [0,1]\}$ as a random element of $L^2$ equipped with the Borel σ-algebra. We say that X is integrable if $E\|X\| = E[\int X^2(t)\,dt]^{1/2} < \infty$. If X is integrable, there is a unique function $\mu \in L^2$ such that $E\langle y, X\rangle = \langle y, \mu\rangle$ for any $y \in L^2$. It follows that $\mu(t) = E[X(t)]$ for almost all $t \in [0,1]$. The expectation commutes with bounded operators, that is, if Ψ is a bounded operator and X is integrable, then $E\Psi(X) = \Psi(EX)$.

If X is square integrable, that is, (2.2) $E\|X\|^2 = E\int X^2(t)\,dt < \infty$, and $EX = \mu$, the covariance operator of X is defined by (2.3) $C(y) = E[\langle X - \mu, y\rangle (X - \mu)]$, $y \in L^2$. It is easy to see that (2.4) $C(y)(t) = \int c(t,s)y(s)\,ds$, where $c(t,s) = E[(X(t) - \mu(t))(X(s) - \mu(s))]$.

One can show that a bounded operator C is a covariance operator of some square integrable random function taking values in $L^2$ if and only if it is symmetric positive-definite and its eigenvalues satisfy $\sum_{j=1}^{\infty}\lambda_j < \infty$. To understand these statements, some background on operators in a Hilbert space is needed. The facts we now state will be used in the remainder of the paper.

2.1.1. Operators in a Hilbert Space

Consider an arbitrary separable Hilbert space H with inner product $\langle\cdot,\cdot\rangle$ which generates the norm $\|\cdot\|$, and denote by $\mathcal{L}$ the space of bounded (continuous) linear operators on H with the norm (2.5) $\|\Psi\|_{\mathcal{L}} = \sup\{\|\Psi(x)\| : \|x\| \le 1\}$. An operator $\Psi \in \mathcal{L}$ is said to be compact if there exist two orthonormal bases $\{v_j\}$ and $\{f_j\}$, and a real sequence $\{\lambda_j\}$ converging to zero, such that (2.6) $\Psi(x) = \sum_{j=1}^{\infty}\lambda_j\langle x, v_j\rangle f_j$, $x \in H$. The $\lambda_j$ may be assumed positive because one can replace $f_j$ by $-f_j$, if needed. A compact operator is said to be a Hilbert-Schmidt operator if $\sum_{j=1}^{\infty}\lambda_j^2 < \infty$. The space $\mathcal{S}$ of Hilbert-Schmidt operators is a separable Hilbert space with the scalar product (2.7) $\langle\Psi_1, \Psi_2\rangle_{\mathcal{S}} = \sum_{i=1}^{\infty}\langle\Psi_1(e_i), \Psi_2(e_i)\rangle$, where $\{e_i\}$ is an arbitrary orthonormal base; the value of (2.7) does not depend on it. One can show that $\|\Psi\|_{\mathcal{S}}^2 = \sum_{j\ge 1}\lambda_j^2$ and $\|\Psi\|_{\mathcal{L}} \le \|\Psi\|_{\mathcal{S}}$.

An operator $\Psi \in \mathcal{L}$ is said to be symmetric if (2.8) $\langle\Psi(x), y\rangle = \langle x, \Psi(y)\rangle$, $x, y \in H$, and positive-definite if (2.9) $\langle\Psi(x), x\rangle \ge 0$, $x \in H$. (An operator with the last property is sometimes called positive semidefinite, and the term positive-definite is used when $\langle\Psi(x), x\rangle > 0$ for $x \ne 0$.)

A symmetric positive-definite Hilbert-Schmidt operator Ψ admits the decomposition (2.10) $\Psi(x) = \sum_{j=1}^{\infty}\lambda_j\langle x, v_j\rangle v_j$, $x \in H$, with orthonormal $v_j$ which are the eigenfunctions of Ψ, that is, $\Psi(v_j) = \lambda_j v_j$. The $v_j$ can be extended to a basis by adding a complete orthonormal system in the orthogonal complement of the subspace spanned by the original $v_j$. The $v_j$ in (2.10) can thus be assumed to form a basis, but some $\lambda_j$ may be zero.

An important class of operators in $L^2$ are the integral operators defined by (2.11) $\Psi(x)(t) = \int\psi(t,s)x(s)\,ds$, $x \in L^2$, with the real kernel $\psi(\cdot,\cdot)$. Such operators are Hilbert-Schmidt if and only if (2.12) $\iint\psi^2(t,s)\,dt\,ds < \infty$, in which case (2.13) $\|\Psi\|_{\mathcal{S}}^2 = \iint\psi^2(t,s)\,dt\,ds$.

If $\psi(s,t) = \psi(t,s)$ and $\iint\psi(t,s)x(t)x(s)\,dt\,ds \ge 0$, the integral operator Ψ is symmetric and positive-definite, and it follows from (2.10) that (2.14) $\psi(t,s) = \sum_{j=1}^{\infty}\lambda_j v_j(t)v_j(s)$ in $L^2([0,1]\times[0,1])$.

2.2. Functional Principal Components

Suppose X is a square integrable random function in $L^2$. To lighten the notation, assume that $\mu = 0$. It is clear how to modify all formulas by subtracting the mean function μ from X. The eigenfunctions of the covariance operator defined by (2.3) are called the Functional Principal Components (FPCs). The FPCs $v_j$, $j \ge 1$, are thus defined by (2.15) $C(v_j) = \lambda_j v_j$. The $v_j$ are orthogonal, and we assume that they are normalized to unit norm. The $v_j$ are defined only up to a sign.

The random function X admits the Karhunen–Loève expansion: (2.16) $X(t) = \sum_{j=1}^{\infty}\xi_j v_j(t)$, $\xi_j = \langle X, v_j\rangle$, and the covariance kernel $c(\cdot,\cdot)$ admits the expansion: (2.17) $c(t,s) = \sum_{j=1}^{\infty}\lambda_j v_j(t)v_j(s)$, $0 \le t, s \le 1$. The random variables $\xi_j$ are called the scores of X.

An interpretation of the $v_j$ is often based on the following decomposition of variance: (2.18) $E\|X\|^2 = \sum_{j=1}^{\infty}E[\langle X, v_j\rangle^2] = \sum_{j=1}^{\infty}\langle C(v_j), v_j\rangle = \sum_{j=1}^{\infty}\lambda_j$. Another interpretation is the following. Fix p and consider the expected squared error $E\|X - \sum_{j=1}^{p}a_j u_j\|^2$, where $u_1, u_2, \dots, u_p$ are orthonormal functions in $L^2$ and the $a_j$ are coefficients. The above error is minimized if $u_j = v_j$ and $a_j = \langle X, v_j\rangle$.
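For densely observed curves, the empirical FPCs can be obtained by discretizing the covariance operator. The following minimal Python sketch (the function empirical_fpcs and the simulated data are illustrative assumptions, not code from the cited software) approximates the integral operator with Riemann-sum quadrature weights and returns eigenvalues, eigenfunctions, and scores.

```python
import numpy as np

def empirical_fpcs(X, t, d=3):
    """Estimate the first d functional principal components from curves
    stored in the rows of X (N x J), observed on the grid t in [0, 1]."""
    N, J = X.shape
    w = (t[-1] - t[0]) / (J - 1)          # quadrature weight of the Riemann sum
    Xc = X - X.mean(axis=0)               # center: subtract the sample mean function
    c_hat = Xc.T @ Xc / N                 # J x J estimate of the covariance kernel c(t, s)
    # Discretized covariance operator: C(v)(t) = int c(t, s) v(s) ds  ~  (c_hat * w) v
    evals, evecs = np.linalg.eigh(c_hat * w)
    order = np.argsort(evals)[::-1]       # eigh returns eigenvalues in ascending order
    lam = evals[order][:d]                # eigenvalues lambda_1 >= ... >= lambda_d
    v = evecs[:, order][:, :d] / np.sqrt(w)   # rescale so that int v_j^2(t) dt = 1
    scores = Xc @ v * w                   # Karhunen-Loeve scores xi_{nj} = <X_n - mean, v_j>
    return lam, v, scores

# Example use on synthetic curves:
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
X = (rng.standard_normal((200, 1)) * np.sin(np.pi * t)
     + 0.5 * rng.standard_normal((200, 1)) * np.cos(2 * np.pi * t))
lam, v, scores = empirical_fpcs(X, t, d=2)
```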

3. Functional Time Series

Functional time series (FTSs) were introduced in Section 1, where several examples were given. Using the framework of Section 2.1, we will understand by a FTS a sequence of curves $X_n = \{X_n(t),\ 0 \le t \le 1\}$. Notice that n is the time index, which typically refers to day or year, and t is the time within that unit. In most textbooks and papers on time series analysis, the index t denotes time, but in FDA it denotes the argument of a function, so we use n to denote time and N to denote the number of available curves; N corresponds to the length of a time series.

The main idea behind functional time series modeling is that in many situations the time record can be split into natural intervals, and instead of modeling periodicity, we treat the curve in each interval as a whole observational unit. There are proponents and opponents of this approach. In several applications, it has been shown that the functional approach yields superior results, see, for example, Antoniadis et al.; a few examples are also given in Bosq. But it is clear that in many cases the more standard time series techniques are competitive, if not superior. Functional techniques can be expected to be definitely more useful if the daily or annual curves are not defined on a grid of equidistant time points. An interesting example of such a setting is studied by Liebl, who considers curves defined on different intervals on different days. The daily domain interval is defined by the minimum, $m_n$, and maximum, $M_n$, of the electricity demand on day n. For demand $h \in [m_n, M_n]$, $X_n(h)$ is the price of electricity.

Most methods described in this section are based on the FPCs. In certain applications, different orthonormal systems, similar in spirit to the FPCs but better suited to a specific application, can be used. In addition to the predictive factors of Kargin and Onatski, we note the factors introduced by Bathia et al., who considered the problem of estimating the dimension d assuming that the functional time series takes values in a finite dimensional subspace of $L^2$.

The remainder of this section is organized as follows. We first review the autoregressive functional model which has been, by far, most extensively used and studied. We then turn to more general ways of describing temporal dependence between functions. We conclude with a discussion of some recent developments.

A reader seeking a good reference to the fundamental concepts of time series analysis is referred to Brockwell and Davis  or Shumway and Stoffer . More advanced treatments of the theory of linear time series are given in Brockwell and Davis  and Anderson .

3.1. Functional Autoregressive Process

The theory of autoregressive and more general linear processes in Hilbert and Banach spaces is developed in the monograph of Bosq, so here we focus only on some recent research. A sequence $\{X_n,\ -\infty < n < \infty\}$ of mean zero elements of $L^2$ follows a functional AR(1) model if (3.1) $X_n = \Psi(X_{n-1}) + \varepsilon_n$, where $\Psi \in \mathcal{L}$ and $\{\varepsilon_n,\ -\infty < n < \infty\}$ is a sequence of iid mean zero errors in $L^2$ satisfying $E\|\varepsilon_n\|^2 < \infty$.

3.1.1. Estimation

A popular approach to the estimation of the autoregressive operator Ψ involves the FPCs, see Section 2.2. Observe first that (3.2) $E[\langle X_n, x\rangle X_{n-1}] = E[\langle\Psi(X_{n-1}), x\rangle X_{n-1}]$, $x \in L^2$. Define the lag-1 autocovariance operator by (3.3) $C_1(x) = E[\langle X_n, x\rangle X_{n+1}]$ and denote with superscript T the adjoint operator. Then, $C_1^T = C\Psi^T$ because, by a direct verification, $C_1^T(x) = E[\langle X_n, x\rangle X_{n-1}]$, that is, (3.4) $C_1 = \Psi C$. We would like to obtain an estimate of Ψ by using a finite sample version of the relation $\Psi = C_1 C^{-1}$. The operator C does not however have a bounded inverse on the whole of H. To see it, recall that C admits representation (2.10), which implies that $C^{-1}(C(x)) = x$, where (3.5) $C^{-1}(y) = \sum_{j=1}^{\infty}\lambda_j^{-1}\langle y, v_j\rangle v_j$, an operator which is unbounded because $\lambda_j \to 0$. This makes it difficult to estimate the bounded operator Ψ using the relation $\Psi = C_1 C^{-1}$. A practical solution is to use only the first p most important EFPCs $\hat{v}_j$, and to define (3.6) $\widehat{C}_p^{-1}(x) = \sum_{j=1}^{p}\hat{\lambda}_j^{-1}\langle x, \hat{v}_j\rangle\hat{v}_j$. The operator $\widehat{C}_p^{-1}$ is defined on the whole of $L^2$, and it is bounded if $\hat{\lambda}_j > 0$ for $j \le p$. By judiciously choosing p, we hope to find a balance between retaining the relevant information in the sample and the danger of working with the reciprocals of small eigenvalues $\hat{\lambda}_j$. In formula (3.6), the $\hat{v}_j$ are the empirical (or sample) FPCs and the $\hat{\lambda}_j$ are their eigenvalues, both defined by the relation $\widehat{C}(\hat{v}_j) = \hat{\lambda}_j\hat{v}_j$, where (3.7) $\widehat{C}(x) = N^{-1}\sum_{i=1}^{N}\langle X_i - \bar{X}_N, x\rangle(X_i - \bar{X}_N)$, $x \in L^2$, defines the empirical covariance operator. Direct calculations, see Chapter 13 of Horváth and Kokoszka, lead to the estimator (3.8) $\widehat{\Psi}_p(x) = \frac{1}{N-1}\sum_{k=1}^{N-1}\sum_{j=1}^{p}\sum_{i=1}^{p}\hat{\lambda}_j^{-1}\langle x, \hat{v}_j\rangle\langle X_k, \hat{v}_j\rangle\langle X_{k+1}, \hat{v}_i\rangle\hat{v}_i$. If the operator Ψ is a kernel operator defined by $\Psi(x)(t) = \int\psi(t,s)x(s)\,ds$, then the estimator (3.8) is a kernel operator with the kernel (3.9) $\hat{\psi}_p(t,s) = \frac{1}{N-1}\sum_{k=1}^{N-1}\sum_{j=1}^{p}\sum_{i=1}^{p}\hat{\lambda}_j^{-1}\langle X_k, \hat{v}_j\rangle\langle X_{k+1}, \hat{v}_i\rangle\hat{v}_j(s)\hat{v}_i(t)$.

In some simulated cases, the surfaces $\hat{\psi}_p(t,s)$ may not closely resemble the true surface $\psi(t,s)$, see Chapter 13 of Horváth and Kokoszka.

3.1.2. Prediction

The most direct use of the functional AR(1) model is to predict the curve $X_{n+1}$, and the most obvious method is to use the predictor $\widehat{X}_{n+1} = \widehat{\Psi}_p(X_n)$, where $\widehat{\Psi}_p$ is the estimator (3.8). For kernel operators, this predictor is given explicitly by (3.10) $\widehat{X}_{n+1}(t) = \int\hat{\psi}_p(t,s)X_n(s)\,ds = \sum_{k=1}^{p}\left(\sum_{\ell=1}^{p}\hat{\psi}_{k\ell}\langle X_n, \hat{v}_\ell\rangle\right)\hat{v}_k(t)$, where (3.11) $\hat{\psi}_{ji} = \hat{\lambda}_i^{-1}(N-1)^{-1}\sum_{n=1}^{N-1}\langle X_n, \hat{v}_i\rangle\langle X_{n+1}, \hat{v}_j\rangle$. Kargin and Onatski proposed a more sophisticated method that uses the so-called predictive factors rather than the FPCs. Predictive factors are interesting in that they are functions which can be used to expand the curves, very much like the FPCs are used, but they define directions in the space $L^2$ which are most relevant for prediction. Using a finite sample implementation, Didericksen et al. found that it does not lead to more accurate predictions. In general, predicted curves that use formula (3.10) tend to be closer to the mean curve and smoother than the actual observations. These predictions are however significantly better than just using the mean curve. A more detailed study is given in Didericksen et al. and in Chapter 13 of Horváth and Kokoszka.
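A minimal numerical sketch of the estimator (3.9) and the predictor (3.10) is given below (Python; the function name fit_far1_and_predict and the simulated curves are illustrative assumptions, not the implementation used in the cited studies).

```python
import numpy as np

def fit_far1_and_predict(X, t, p=3):
    """Estimate the FAR(1) autoregression kernel via the first p empirical FPCs
    (formula (3.9)) and return the one-step predictor of the next curve (3.10)."""
    N, J = X.shape
    w = (t[-1] - t[0]) / (J - 1)
    mu = X.mean(axis=0)
    Xc = X - mu
    # Empirical FPCs of the sample covariance operator.
    evals, evecs = np.linalg.eigh((Xc.T @ Xc / N) * w)
    order = np.argsort(evals)[::-1][:p]
    lam = evals[order]
    v = evecs[:, order] / np.sqrt(w)          # J x p, each column one FPC
    scores = Xc @ v * w                       # N x p, scores <X_k - mean, v_j>
    # psi_hat_{ij} = lambda_j^{-1} (N-1)^{-1} sum_k <X_k, v_j> <X_{k+1}, v_i>
    psi = (scores[1:].T @ scores[:-1]) / ((N - 1) * lam[np.newaxis, :])
    # Kernel estimate psi_p(t, s) = sum_{i,j} psi_{ij} v_i(t) v_j(s)
    kernel = v @ psi @ v.T                    # J x J array, rows indexed by t, columns by s
    # One-step prediction of the curve following the last observation.
    x_next = mu + kernel @ (Xc[-1] * w)       # approximates  int psi(t, s) X_N(s) ds
    return kernel, x_next

# Example with arbitrary simulated curves (random walks, for illustration only):
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 80)
X = np.cumsum(rng.standard_normal((150, 80)), axis=1) / np.sqrt(80)
kernel, x_next = fit_far1_and_predict(X, t, p=3)
```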

3.1.3. Order Determination

A generalization of the functional AR(1) model (3.1) is the FAR(p) model (3.12) $Z_n = \sum_{j=1}^{p}\Phi_j(Z_{n-j}) + \varepsilon_n$. We now use Z instead of X, because the letter X will be used below to denote a different quantity. The work of Kokoszka and Reimherr shows how to determine the order p, and how to estimate the operators $\Phi_j$. The idea is to write (3.12) as a fully functional linear model. We start by expressing $\Phi_j(Z_{i-j})$ as an integral over the interval $((j-1)/p, j/p]$. Setting $x := (s+j-1)/p$, a change of variables yields (3.13) $[\Phi_j(Z_{i-j})](t) = \int_0^1\phi_j(t,s)Z_{i-j}(s)\,ds = \int_{(j-1)/p}^{j/p}\phi_j(t, xp-(j-1))Z_{i-j}(xp-(j-1))\,p\,dx$. Denoting by $I_j$ the indicator function of the interval $((j-1)/p, j/p]$, we obtain (3.14) $\sum_{j=1}^{p}[\Phi_j(Z_{i-j})](t) = \int_0^1\sum_{j=1}^{p}I_j(x)\phi_j(t, xp-(j-1))Z_{i-j}(xp-(j-1))\,p\,dx$. Next we define (3.15) $X_i(s) = \sum_{j=1}^{p}Z_{i-j}(sp-(j-1))I_j(s)$, $\psi(t,s) = p\sum_{j=1}^{p}\phi_j(t, sp-(j-1))I_j(s)$. Setting $Y_i = Z_i$, we have (3.16) $Y_i = \Psi(X_i) + \varepsilon_i$, where Ψ is an integral Hilbert-Schmidt operator with the kernel ψ, that is, (3.17) $Y_i(t) = \int\psi(t,s)X_i(s)\,ds + \varepsilon_i(t)$.

Thus, if we can estimate Ψ, then we can estimate each of the Φj. An estimator of Ψ can be constructed by any method valid for the fully functional linear model, for example using the basis or the FPCs expansions, see Chapter 8 of Horváth and Kokoszka .

Determining the order p also uses representation (3.17). The FAR(p-1) model will be rejected in favor of FAR(p) if an estimate $\widehat{\Phi}_p$ of $\Phi_p$ is large in an appropriate sense. Kokoszka and Reimherr developed a multistage procedure based on this idea, but it is too complex to be described here.
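Assuming a common dense grid, the construction (3.15) amounts to placing the p lagged curves side by side on the subintervals of [0,1]. The short sketch below (Python; the helper name stack_lagged_curves is an assumption) builds the regressor and response arrays for the model (3.16), after which any estimator for the fully functional linear model can be applied to the pairs $(X_i, Y_i)$.

```python
import numpy as np

def stack_lagged_curves(Z, p):
    """Build the regressor curves X_i of (3.15) by placing the lagged curves
    Z_{i-1}, ..., Z_{i-p} side by side on the subintervals ((j-1)/p, j/p] of [0, 1].
    Z is an N x J array of discretized curves; returns (X, Y) for the model (3.16)."""
    N, J = Z.shape
    # Row i of X holds Z_{i-1} on the first segment, Z_{i-2} on the second, and so on.
    X = np.concatenate([Z[p - j: N - j] for j in range(1, p + 1)], axis=1)  # (N-p) x (p*J)
    Y = Z[p:]                                                               # responses Z_i
    return X, Y
```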

3.2. Approximable Functional Time Series

For many functional time series, it is not clear what specific model they follow, and for many statistical procedures, it is not necessary to assume a specific model. It is however important to know what the effect of temporal dependence on a given procedure is. In this section, we describe a very general notion of dependence, which is convenient to use in the framework of functional data. We restrict ourselves to this framework. Several related useful results, including a functional (in D([0,1])) central limit theorem, were established by Aue et al. [45, 46] for vector-valued time series. Sharp invariance principles and additional references are given in Berkes et al. . The exposition that follows is based on Hörmann and Kokoszka .

For $p \ge 1$, we denote by $L^p = L^p(\Omega, \mathcal{A}, P)$ the space of (classes of) real-valued random variables such that $\|X\|_p = (E|X|^p)^{1/p} < \infty$. Further, we let $L_H^p = L_H^p(\Omega, \mathcal{A}, P)$ be the space of $H = L^2$ valued random functions X such that (3.18) $\nu_p(X) = (E\|X\|^p)^{1/p} = \left(E\left[\int X^2(t)\,dt\right]^{p/2}\right)^{1/p} < \infty$. In this section, we use H to denote the function space $L^2 = L^2([0,1])$ to avoid confusion with the space $L^p$ of scalar random variables.

Definition 3.1.

A sequence $\{X_n\} \subset L_H^p$ is called $L^p$-m-approximable if each $X_n$ admits the representation (3.19) $X_n = f(\varepsilon_n, \varepsilon_{n-1}, \dots)$, where the $\varepsilon_i$ are iid elements taking values in a measurable space S, and f is a measurable function $f: S^{\infty} \to H$. Moreover, we assume that if $\{\varepsilon_i'\}$ is an independent copy of $\{\varepsilon_i\}$ defined on the same probability space, then, letting (3.20) $X_n^{(m)} = f(\varepsilon_n, \varepsilon_{n-1}, \dots, \varepsilon_{n-m+1}, \varepsilon_{n-m}', \varepsilon_{n-m-1}', \dots)$, we have (3.21) $\sum_{m=1}^{\infty}\nu_p(X_m - X_m^{(m)}) < \infty$.

Definition 3.1 implies that $\{X_n\}$ is strictly stationary. It is clear from the representation of $X_n$ and $X_n^{(m)}$ that $E\|X_m - X_m^{(m)}\|^p = E\|X_1 - X_1^{(m)}\|^p$, so that condition (3.21) could be formulated solely in terms of $X_1$ and the approximations $X_1^{(m)}$. Obviously, the sequence $\{X_n^{(m)},\ n \in \mathbb{Z}\}$ as defined in (3.20) is not m-dependent. To this end, we need to define for each n an independent copy $\{\varepsilon_k^{(n)}\}$ of $\{\varepsilon_k\}$ (this can always be achieved by enlarging the probability space) which is then used instead of $\{\varepsilon_k'\}$ to construct $X_n^{(m)}$, that is, we set (3.22) $X_n^{(m)} = f(\varepsilon_n, \varepsilon_{n-1}, \dots, \varepsilon_{n-m+1}, \varepsilon_{n-m}^{(n)}, \varepsilon_{n-m-1}^{(n)}, \dots)$. We call this method the coupling construction. Since this modification leaves condition (3.21) unchanged, we will assume from now on that the $X_n^{(m)}$ are defined by (3.22). Then, for each $m \ge 1$, the sequences $\{X_n^{(m)},\ n \in \mathbb{Z}\}$ are strictly stationary and m-dependent, and each $X_n^{(m)}$ is equal in distribution to $X_n$.

The following example gives a good feel for what Definition 3.1 means.

Example 3.2 (functional autoregressive process).

Suppose that $\Psi \in \mathcal{L}$ satisfies $\|\Psi\|_{\mathcal{L}} < 1$. Let $\varepsilon_n \in L_H^2$ be iid with mean zero. Then there is a unique stationary sequence of random elements $X_n \in L_H^2$ such that (3.23) $X_n(t) = \Psi(X_{n-1})(t) + \varepsilon_n(t)$. The AR(1) sequence (3.23) admits the expansion $X_n = \sum_{j=0}^{\infty}\Psi^j(\varepsilon_{n-j})$, where $\Psi^j$ is the jth iterate of the operator Ψ. We thus set $X_n^{(m)} = \sum_{j=0}^{m-1}\Psi^j(\varepsilon_{n-j}) + \sum_{j=m}^{\infty}\Psi^j(\varepsilon_{n-j}^{(n)})$. It is easy to verify that for every $A \in \mathcal{L}$, $\nu_p(A(Y)) \le \|A\|_{\mathcal{L}}\nu_p(Y)$. Since $X_m - X_m^{(m)} = \sum_{j=m}^{\infty}(\Psi^j(\varepsilon_{m-j}) - \Psi^j(\varepsilon_{m-j}^{(m)}))$, it follows that $\nu_p(X_m - X_m^{(m)}) \le 2\sum_{j=m}^{\infty}\|\Psi\|_{\mathcal{L}}^j\nu_p(\varepsilon_0) = O(1)\nu_p(\varepsilon_0)\|\Psi\|_{\mathcal{L}}^m$. By assumption, $\nu_2(\varepsilon_0) < \infty$, and therefore $\sum_{m=1}^{\infty}\nu_2(X_m - X_m^{(m)}) < \infty$, so condition (3.21) holds with any $p \ge 2$, as long as $\nu_p(\varepsilon_0) < \infty$.

Several other models are Lp-m-approximable, examples are given in Chapter 16 of Horváth and Kokoszka . But we emphasize that the main role of Lp-m-approximability is to provide a convenient nonparametric framework for quantifying temporal dependence of curves. In this sense it is similar to various mixing conditions, but there is no obvious relationship between it and the traditional mixing conditions, as defined by Doukhan  or Bradley . It is not related to the conditions of Doukhan and Louhichi  either. It is similar to the conditions of Wu [52, 53], in that it assumes that the observed functions are nonlinear moving averages (Bernoulli shifts) of iid unobservable errors.

Recently, Lian extended the concept of $L^p$-m-approximability by replacing the $L^p$ norm by a more general Orlicz norm. If ψ is a convex function on $[0,\infty)$ with $\psi(0) = 0$, then we define (3.24) $\nu_\psi(X) = \inf\{C > 0 : E[\psi(\|X\|/C)] \le 1\}$. If $\psi(x) = x^p$, then $\nu_\psi = \nu_p$. The remaining elements of Definition 3.1 are extended without difficulty.

3.2.1. Convergence of the Eigenfunctions and Eigenvalues

An important result valid for L4-m-approximable FTSs is that the empirical eigenfunctions and eigenvalues converge at the same rate as for the iid functions. This property allows us to extend many results established for functional random samples to FTSs. We state this result in Theorem 3.3, in which v^j and λ^j are the eigenelements of the empirical covariance operator (3.7).

Theorem 3.3.

Suppose that $\{X_n\} \subset L_H^4$ is an $L^4$-m-approximable sequence such that the largest d eigenvalues $\lambda_j$ of its covariance operator are positive and distinct. Then, for $1 \le j \le d$, (3.25) $\limsup_{N\to\infty} N\,E[|\lambda_j - \hat{\lambda}_j|^2] < \infty$ and $\limsup_{N\to\infty} N\,E[\|\hat{c}_j\hat{v}_j - v_j\|^2] < \infty$, where $\hat{c}_j = \operatorname{sign}(\langle\hat{v}_j, v_j\rangle)$ accounts for the fact that the $v_j$ are defined only up to a sign.

Kokoszka and Reimherr  showed that under the assumptions of Theorem 3.3 the v^j are asymptotically normal. For linear functional series this asymptotic normality is implicit in the work of Mas .

3.2.2. The Long-Run Variance

A central concept in time series analysis is the long-run variance (LRV), which replaces the variance in many well-known formulas valid for iid observations. Let $\{X_n\}$ be a scalar stationary sequence. Its long-run variance is defined as $\sigma^2 = \sum_j\gamma_j$, where $\gamma_j = \operatorname{Cov}(X_0, X_j)$, provided this series is absolutely convergent. In that case, $\operatorname{Var}[\bar{X}_N] \sim \sigma^2/N$. The concept of the LRV is extended to stationary random vectors. It turns out that $L^2$-m-approximability implies that the LRV exists and can be consistently estimated for vector-valued sequences. Details are given in Chapter 16 of Horváth and Kokoszka. In the functional context, it is more natural to work with long-run variance kernels. The following results were established by Horváth et al.

Theorem 3.4.

Suppose that $\{X_n\}$ is an $L^2$-m-approximable sequence of functions in $L^2([0,1])$ with $\mu = EX_n$. Then the infinite sum (3.26) $\sigma(t,s) = \sum_{h=-\infty}^{\infty}E[\{X_0(t)-\mu(t)\}\{X_h(s)-\mu(s)\}]$ converges in $L^2([0,1]\times[0,1])$ (hence $\sigma(\cdot,\cdot)$ is square integrable). Moreover, (3.27) $N^{-1/2}\sum_{n=1}^{N}[X_n - \mu] \xrightarrow{d} Z$, where Z is a mean zero Gaussian element of $L^2([0,1])$ with the covariance function $E[Z(t)Z(s)] = \sigma(t,s)$.

To apply Theorem 3.4, we must estimate the kernel $\sigma(\cdot,\cdot)$. We now turn to this objective. Let K be a kernel (weight function) defined on the line and satisfying the following conditions: (3.28) $K(0) = 1$, K is continuous and bounded, and $K(u) = 0$ if $|u| > c$, for some $c > 0$. Condition (3.28) is assumed only to simplify the proofs; a sufficiently fast decay could be assumed instead. Next, we define the empirical (sample) autocovariance functions (3.29) $\hat{\gamma}_i(t,s) = \frac{1}{N}\sum_{j=i+1}^{N}(X_j(t) - \bar{X}_N(t))(X_{j-i}(s) - \bar{X}_N(s))$, $0 \le i \le N-1$, where $\bar{X}_N(t) = N^{-1}\sum_{1\le i\le N}X_i(t)$. The estimator of $\sigma(t,s)$ is given by (3.30) $\hat{\sigma}_N(t,s) = \hat{\gamma}_0(t,s) + \sum_{i=1}^{N-1}K\left(\frac{i}{h}\right)\left(\hat{\gamma}_i(t,s) + \hat{\gamma}_i(s,t)\right)$, where $h = h(N)$ is the smoothing bandwidth satisfying (3.31) $h(N) \to \infty$, $h(N)/N \to 0$, as $N \to \infty$.
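A minimal sketch of the estimator (3.30) for discretized curves is given below (Python; the Bartlett kernel $K(u) = (1-|u|)_+$ is used as one kernel satisfying (3.28) with c = 1, and the function name is an assumption).

```python
import numpy as np

def lrv_kernel_estimate(X, h):
    """Estimate the long-run covariance kernel sigma(t, s) of (3.26) as in (3.30),
    using the Bartlett kernel K(u) = max(0, 1 - |u|) and bandwidth h.
    X is an N x J array of discretized curves (rows are the curves X_1, ..., X_N)."""
    N, J = X.shape
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / N                        # lag-0 term gamma_0(t, s)
    for i in range(1, N):
        k = max(0.0, 1.0 - i / h)                # K(i/h); other kernels can be plugged in
        if k == 0.0:
            break
        gamma_i = Xc[i:].T @ Xc[:-i] / N         # gamma_i(t, s) of (3.29)
        sigma += k * (gamma_i + gamma_i.T)
    return sigma                                 # J x J discretization of sigma(t, s)

# A typical bandwidth choice such as h = N**(1/3) satisfies h -> infinity, h/N -> 0.
```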

Theorem 3.5.

Suppose that the functional time series $\{X_n\}$ is $L^2$-m-approximable and (3.32) $\lim_{m\to\infty} m\left[E\int(X_n(t) - X_n^{(m)}(t))^2\,dt\right]^{1/2} = 0$. Then, under the conditions on the kernel $K(\cdot)$ and the bandwidth $h = h(N)$ stated above, (3.33) $\iint(\hat{\sigma}_N(t,s) - \sigma(t,s))^2\,dt\,ds \xrightarrow{P} 0$.

3.2.3. Self-Normalization

Estimation of the LRV, even for scalar time series, is difficult due to the selection of the bandwidth h. Data-driven methods exist, most notably those based on the theory of Andrews, but they do postulate a specific form of dependence. Moreover, some tests in which the LRV is estimated by such data-driven procedures suffer from the problem of nonmonotonic power, see Shao and Zhang. Block bootstrap and subsampling offer useful tools for solving many problems of time series analysis, but they again depend on bandwidth-type block sizes whose selection is not obvious. Even though some recommendations for the selection of such block sizes exist, see, for example, Bühlmann and Künsch, Politis et al., and Bickel and Sakov, this remains a difficult practical problem. An improved approach has recently been proposed by Shao and Politis. A different idea, self-normalization, was proposed by Shao [64, 65]. We explain it now briefly using a simple example involving a scalar time series.

Suppose that $\{X_n\}$ is a stationary time series such that (3.34) $N^{-1/2}\sum_{1\le k\le Nr}(X_k - \mu) \xrightarrow{d} \sigma W(r)$, $0 \le r \le 1$, in the Skorokhod space. The parameter $\sigma^2$ is the LRV: (3.35) $\sigma^2 = \lim_{N\to\infty} N\operatorname{Var}(\bar{X}_N) = \sum_h\gamma(h)$. Set (3.36) $D_N = N^{-2}\sum_{n=1}^{N}\left\{\sum_{j=1}^{n}(X_j - \bar{X}_N)\right\}^2$. Then, (3.34) implies (3.37) $\frac{N(\bar{X}_N - \mu)^2}{D_N} \xrightarrow{d} \frac{W^2(1)}{\int_0^1 B^2(r)\,dr}$, where $B(r) = W(r) - rW(1)$ is a Brownian bridge.

To see why (3.37) holds, set (3.38) $S_N(r) = N^{-1/2}\sum_{1\le k\le Nr}(X_k - \mu)$, $0 \le r \le 1$, and observe that (3.39) $S_N(1) = N^{1/2}(\bar{X}_N - \mu)$, so that (3.40) $N(\bar{X}_N - \mu)^2 = S_N^2(1) \xrightarrow{d} \sigma^2 W^2(1)$. Next, observe that (3.41) $\sum_{j=1}^{n}(X_j - \bar{X}_N) = \sum_{j=1}^{n}(X_j - \mu) - n(\bar{X}_N - \mu)$, so that $N^{-1/2}\sum_{j=1}^{n}(X_j - \bar{X}_N) = S_N(n/N) - (n/N)S_N(1)$. Consequently, (3.42) $D_N = N^{-1}\sum_{n=1}^{N}\left\{S_N\left(\frac{n}{N}\right) - \frac{n}{N}S_N(1)\right\}^2 \xrightarrow{d} \sigma^2\int_0^1\{W(r) - rW(1)\}^2\,dr$. The convergences in (3.40) and (3.42) are joint, so (3.37) follows.

The key point is the cancellation of $\sigma^2$ when (3.40) is divided by (3.42). Relation (3.42) shows that $D_N$ is an inconsistent estimator of $\sigma^2$. The distribution of the right-hand side of (3.37) can however be simulated, and the critical values can be obtained with arbitrary precision. Relation (3.37) can be used to construct a confidence interval for μ without estimating the LRV. Such a construction does not require the selection of a bandwidth parameter in the kernel estimates of $\sigma^2$.
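The following sketch (Python; the function names are illustrative) computes the self-normalized statistic on the left-hand side of (3.37) and approximates the critical values of its limit by Monte Carlo simulation of the Brownian functionals.

```python
import numpy as np

def self_normalized_stat(x, mu0):
    """Self-normalized statistic N * (mean(x) - mu0)^2 / D_N of (3.37);
    its null limit W^2(1) / int_0^1 B^2(r) dr does not involve the long-run variance."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    xbar = x.mean()
    partial = np.cumsum(x - xbar)                # partial sums: sum_{j<=n} (x_j - xbar)
    D_N = np.sum(partial ** 2) / N ** 2          # normalizer D_N of (3.36)
    return N * (xbar - mu0) ** 2 / D_N

def critical_value(alpha=0.05, reps=5000, grid=1000, seed=0):
    """Monte Carlo approximation of the (1 - alpha) quantile of W(1)^2 / int B(r)^2 dr."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        steps = rng.standard_normal(grid) / np.sqrt(grid)
        W = np.cumsum(steps)                               # Brownian motion on a grid of [0, 1]
        B = W - np.linspace(1 / grid, 1, grid) * W[-1]     # Brownian bridge W(r) - r W(1)
        stats[r] = W[-1] ** 2 / np.mean(B ** 2)
    return np.quantile(stats, 1 - alpha)
```

A confidence interval for μ is then obtained by inverting the inequality self_normalized_stat(x, mu0) <= critical_value(alpha) with respect to mu0, with no bandwidth selection involved.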

Zhang et al. extend the idea of self-normalization to the detection of change points in FTSs. A normalization analogous to $D_N$ is not suitable for the change point problem, see Shao and Zhang. It must be modified to take into account a possible change point. The details are too complex to be discussed here. The change point problem is discussed in Section 3.3.

3.3. Other Research Directions

3.3.1. Two-Sample Problems

In a two-sample problem, we consider two samples $X_1, \dots, X_N$ and $X_1^*, \dots, X_M^*$, which may be realizations of two functional time series, or of a single series, but taken over different time intervals. The statistical problem consists in testing if some specified characteristics of the samples are equal. To illustrate, suppose we want to test if the mean functions are equal. We assume the model (3.43) $X_i(t) = \mu(t) + \varepsilon_i(t)$, $1 \le i \le N$, $X_i^*(t) = \mu^*(t) + \varepsilon_i^*(t)$, $1 \le i \le M$. We wish to test the null hypothesis $H_0: \mu = \mu^*$ against the alternative that $H_0$ is false. We can also test for the equality of the covariance operators, or any other quantities that might be of interest. In the FDA setting, testing for the equality of means, the FPCs, and the covariance operators has received a fair amount of attention.

An important contribution has been made by Benko et al., who developed bootstrap procedures for testing the equality of mean functions, the FPCs, and the eigenspaces spanned by them. Horváth et al. developed asymptotic tests for testing the equality of mean functions. Laukaitis and Račkauskas considered the model $X_{g,i}(t) = \mu_g(t) + \varepsilon_{g,i}(t)$, $g = 1, 2, \dots, G$, with innovations $\varepsilon_{g,i}$ and group means $\mu_g$, and test $H_0: \mu_1(t) = \dots = \mu_G(t)$. Other related contributions are Cuevas et al., Delicado, and Ferraty et al.

Regarding testing the equality of the covariance operators, Panaretos et al.  developed a test assuming the data are iid and Gaussian. Their work was extended by Fremdt et al.  and Kraus and Panaretos  to non-Gaussian settings. Horváth et al.  studied two sample problems for functional regression models.

3.3.2. The Change Point Problem

When curves form a time series, it is typically assumed that, possibly after some transformation, they have the same distribution in $L^2$. If some aspects of this distribution change, inference will become invalid. For example, if the mean changes at some time point, the sample mean will not estimate any population parameter. Change point analysis is a very broad field of statistics with applications in industrial quality control, the study of economic, financial and environmental time series, and many other areas. Many monographs are available, including Brodsky and Darkhovsky, Basseville and Nikiforov, Csörgő and Horváth, Chen and Gupta, and Basseville et al.

The idea of change point analysis is simple, and we explain it in the functional context using the change in the mean function as an example. We observe functions $X_i$, $1 \le i \le N$, and we test the null hypothesis (3.44) $H_0: EX_1 = EX_2 = \dots = EX_N$. The simplest alternative is that of a single change point: (3.45) $H_A: X_i(t) = \begin{cases}\mu_1(t) + Y_i(t), & 1 \le i \le k^*,\\ \mu_2(t) + Y_i(t), & k^* < i \le N,\end{cases}$ where the $Y_i$ have the same distribution with mean zero, and $\mu_1(\cdot) \ne \mu_2(\cdot)$.

Note that under H0, we do not specify the value of the common mean, and under HA we do not specify μ1(·) nor μ2(·). It is also important to distinguish between a change point problem and the problem of testing for the equality of means. In the latter setting, it is known which population or group each observation belongs to. In the change point setting, we do not have any a priori partition of the data into several sets with possibly different means. The change can occur at any point, and we want to test if it occurs or not, and, if it does, to estimate the point of change.

The simplest, and in many settings most effective, change point detection procedures are based on cumulative sums (CUSUM procedures). In the above setting, denote (3.46) $\hat{\mu}_k(t) = \frac{1}{k}\sum_{1\le i\le k}X_i(t)$, $\tilde{\mu}_k(t) = \frac{1}{N-k}\sum_{k<i\le N}X_i(t)$. If the mean is constant, the difference $\Delta_k(t) = \hat{\mu}_k(t) - \tilde{\mu}_k(t)$ is small for all $1 \le k < N$ and all $t \in [0,1]$. However, $\Delta_k(t)$ can become large due to chance variability if k is close to 1 or to N. It is therefore usual to work with the sequence (3.47) $P_k(t) = \sum_{1\le i\le k}X_i(t) - \frac{k}{N}\sum_{1\le i\le N}X_i(t) = \frac{k(N-k)}{N}\left[\hat{\mu}_k(t) - \tilde{\mu}_k(t)\right]$, in which the variability at the end points is attenuated by a parabolic weight function. If the mean changes, the difference $P_k(t)$ is large for some values of k and of t. Since the observations are in an infinite dimensional space, projections on the FPCs are used to construct a test statistic and an estimator of a change point. The details are explained in Berkes et al., who assume that the $X_i$ are independent; an extension to approximable sequences is developed in Hörmann and Kokoszka. Theoretical properties of the estimator of a change point are studied in Aue et al. [45, 46].
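A bare-bones sketch of the CUSUM approach is given below (Python; it only locates the maximizer of the squared norm of the projected process $P_k$, and omits the eigenvalue normalization, the limit distribution, and the critical values of Berkes et al.).

```python
import numpy as np

def cusum_change_point(X, t, d=2):
    """Functional CUSUM sketch: project the process P_k(t) of (3.47) on the first d
    empirical FPCs and return the index maximizing the (unnormalized) CUSUM norm.
    X is an N x J array of discretized curves on the grid t."""
    N, J = X.shape
    w = (t[-1] - t[0]) / (J - 1)
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh((Xc.T @ Xc / N) * w)
    order = np.argsort(evals)[::-1][:d]
    v = evecs[:, order] / np.sqrt(w)               # first d empirical FPCs
    scores = Xc @ v * w                            # N x d projections <X_i - mean, v_j>
    # CUSUM of the scores: for each k, sum_{i<=k} scores_i - (k/N) sum_{i<=N} scores_i
    csum = np.cumsum(scores, axis=0)
    k = np.arange(1, N + 1)[:, None]
    P = csum - (k / N) * csum[-1]
    stat = np.sum(P ** 2, axis=1) / N              # squared norm of the projected P_k
    k_hat = int(np.argmax(stat)) + 1               # candidate change point (1-based)
    return k_hat, stat
```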

There have been several extensions to more complex settings. Horváth et al.  consider the change point in the covariance operator of the FAR(1) process. Zhang et al.  use self-normalization to deal with temporal dependence and study change points in mean and in the covariance structure. Aston and Kirch  consider change point detection in a dependent sequence of functions assuming that the alternative is not a single change point but a change interval, the so-called epidemic alternative. The problem of detecting a change in the mean of a sequence of Banach-space valued random elements under an epidemic alternative is theoretically studied by Račkauskas and Suquet  who propose a statistic based on increasingly fine dyadic partitions of the index interval, and derive its limit, which is nonstandard.

3.3.3. Other Temporal Dependence Structures

The most extensively used and studied model for FTS is the FAR(1) model. Some nonlinear ARCH-type models are introduced in Hörmann et al. Similar models are used by Kokoszka and Reimherr in a simulation study. Kokoszka and Reimherr study the predictability (not the prediction) of the cumulative intraday returns (CIDRs) defined by (3.48) $R_n(t_j) = 100[\ln P_n(t_j) - \ln P_n(t_1)]$, $j = 2, \dots, m$, $n = 1, \dots, N$, where $P_n(t_j)$, $n = 1, \dots, N$, $j = 1, \dots, m$, is the price of a financial asset at time $t_j$ on day n.

Kokoszka et al., see also Kokoszka and Zhang, study the dependence of the CIDRs defined above on other factors, including the CIDRs of market indexes and energy futures. They consider a factor model of the form (3.49) $R_n(t) = \beta_0(t) + \sum_{j=1}^{p}\beta_j F_{nj}(t) + \varepsilon_n(t)$. A different class of functional factor models was recently proposed by Hays et al. These models, introduced to forecast yield curves, say $X_n$, are of the form (3.50) $X_n(t) = \sum_{k=1}^{K}\beta_{nk}F_k(t) + \varepsilon_n(t)$. The factors $F_k$ do not depend on n and are orthonormal functions to be estimated. The dynamics are in the coefficients $\beta_{nk}$, which are assumed to follow Gaussian autoregressive processes (the $\varepsilon_n$ are also Gaussian). Model (3.50) could be termed a statistical factor model. It is designed for temporal forecasting, while model (3.49) is designed for regression-type prediction in which the correlation structure of the factors plays a major role.

Gabrys et al. developed a test to determine if the errors $\varepsilon_n$ in the fully functional linear model $Y_n(t) = \int\psi(t,s)X_n(s)\,ds + \varepsilon_n(t)$ are correlated. Benhenni et al. consider a nonparametric regression $Y_n = r(X_n) + \varepsilon_n$ in which the $X_n$ are functions and the $Y_n$ (and $\varepsilon_n$) are scalars. They allow the $\varepsilon_n$ to have either short or long memory.

A different framework is developed by Battey and Sancetta, who assume that the FTS $X_n$ is a Markov process and are concerned with the estimation of the conditional probabilities $P(X_n(t) \le x(t) \mid X_{n-1} = x_0)$, where x and $x_0$ are deterministic functions.

4. Geostatistical Functional Data

An interesting class of functional data are curves observed at several spatial locations, as already mentioned in Section 1. For example, $X(s_k; t)$ can be the concentration of a pollutant at location $s_k$ measured at time t. In this section, we thus assume that the data consist of N curves $X(s_k; t)$, $t \in [0,1]$, $1 \le k \le N$. The analysis of such a data structure must draw heavily on the concepts and tools of spatial statistics. In fact, the random field X defined above is a special case of a spatiotemporal process. In this paper, we cannot provide the details of the tools of spatial statistics that we use. The required background, and much more, is given, for example, in Schabenberger and Gotway, Gelfand et al., and Sherman. We will however introduce the concepts to the extent needed to get a good idea of the main problems and solutions. This is done in Section 4.1. Then, in Section 4.2, we consider the fundamental problem of the estimation of the mean function. Section 4.3 points to research on functional kriging. We note that there are functional data structures beyond those discussed here; we refer to Delicado et al., who also discuss Bayesian approaches.

4.1. Basic Concepts of Spatial Statistics

A sample of spatial data is $\{X(s_k),\ s_k \in S,\ k = 1, 2, \dots, N\}$. (The $X(s_k)$ are now scalars.) The spatial domain S is typically a subset of the two-dimensional plane or sphere. The observed value $X(s_k)$ is viewed as a realization of a random variable. Just as in time series analysis, stationary random fields play a fundamental role in modeling spatial data. To define arbitrary shifts, we must assume that S is either the whole Euclidean space $\mathbb{R}^d$ or the whole sphere. The random field $\{X(s),\ s \in S\}$ is then strictly stationary if (4.1) $\{X(s_1+h), X(s_2+h), \dots, X(s_m+h)\} \stackrel{d}{=} \{X(s_1), X(s_2), \dots, X(s_m)\}$ for any points $s_1, s_2, \dots, s_m \in S$ and any shift h. The covariance function is then defined by (4.2) $C(h) = \operatorname{Cov}(X(s), X(s+h))$. If $C(h)$ depends only on the length $\|h\|$ of h, we say that the random field is isotropic. The covariance function of an isotropic random field is typically parametrized as (4.3) $C(h) = \sigma^2\phi(h)$, $h \ge 0$, $\phi(0) = 1$, where h now denotes the distance between two locations. The function φ is called the correlation function. For example, the exponential correlation function is given by $\phi(h) = \exp\{-(h/\rho)\}$. The main idea of spatial modeling is that some sort of empirical estimate of $C(h)$ is first obtained, which is typically a very irregular, noisy function. Then a suitable parametric covariance function $C(h)$ is fitted to ensure that the resulting covariance matrix $\{C(\|s_k - s_\ell\|),\ 1 \le k, \ell \le N\}$ is positive definite. There are many approaches and nontrivial issues involved in this process.
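As a simple illustration (Python; the helper names are assumptions), the sketch below builds the covariance matrix implied by the isotropic exponential model and fits its parameters to empirical covariances by a crude least-squares grid search, standing in for proper variogram fitting.

```python
import numpy as np

def exponential_covariance(locs, sigma2=1.0, rho=1.0):
    """Covariance matrix {C(||s_k - s_l||)} under the isotropic exponential model
    C(h) = sigma^2 * exp(-h / rho); locs is an N x 2 array of planar coordinates."""
    diff = locs[:, None, :] - locs[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))          # matrix of pairwise distances
    return sigma2 * np.exp(-h / rho)

def fit_exponential(h_emp, c_emp):
    """Least-squares fit of (sigma^2, rho) to empirical covariances c_emp observed
    at distances h_emp; a crude grid search, not a production variogram fitter."""
    best = (np.inf, None, None)
    for rho in np.linspace(0.1, 10.0, 200):
        phi = np.exp(-h_emp / rho)
        sigma2 = max((phi @ c_emp) / (phi @ phi), 1e-12)   # profile out sigma^2
        err = np.sum((c_emp - sigma2 * phi) ** 2)
        if err < best[0]:
            best = (err, sigma2, rho)
    return best[1], best[2]
```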

4.2. Estimation of the Mean Function and of the FPCs

We assume that the functions $X(s_k, \cdot)$ are observations of a stationary and isotropic spatial random field taking values in the space $L^2$. In particular, the mean function $\mu(t) = EX(s; t)$ does not depend on the location s. The estimation of this mean function is the most important first step of the inference for spatially distributed functional data. If the curves $X_k$ are iid or form a time series, the usual sample mean is used. But if the curves are available at spatial locations, the curves at locations which are close to each other should be given smaller weights because they are very similar and contribute roughly the same information. This suggests that μ could be estimated by (4.4) $\hat{\mu}_N = \sum_{n=1}^{N}w_n X(s_n)$, with the weights $w_n$ defined to minimize $E\|\sum_{n=1}^{N}w_n X(s_n) - \mu\|^2$ subject to the condition $\sum_{n=1}^{N}w_n = 1$. Using the method of the Lagrange multiplier, it is not difficult to find a closed form expression for the weights $w_n$. This expression involves the unknown covariances (4.5) $C_{k\ell} = E[\langle X(s_k) - \mu, X(s_\ell) - \mu\rangle]$. Gromenko et al. proposed an iterative procedure for the estimation of the weights $w_n$ and the covariances $C_{k\ell}$. They showed that the weighted average (4.4) is a significantly better estimator than the usual sample average. Hörmann and Kokoszka showed that the sample average is not a consistent estimator of μ if there are clusters of points. The clusters were defined using a modification of a metric introduced by Park et al. Gromenko and Kokoszka refined the approach of Gromenko et al. and applied it to testing whether the mean functions of the curves over two disjoint regions differ.
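Given an estimate of the matrix $C = (C_{k\ell})$, the constrained minimization has the classical Lagrange-multiplier solution sketched below (Python; this shows only the final weighting step, not the iterative estimation of C proposed by Gromenko et al.).

```python
import numpy as np

def optimal_mean_weights(C):
    """Weights minimizing E|| sum_n w_n X(s_n) - mu ||^2 subject to sum_n w_n = 1,
    given the N x N matrix C of covariances C_kl = E<X(s_k) - mu, X(s_l) - mu>.
    The Lagrange-multiplier solution is w = C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(C.shape[0])
    Cinv_ones = np.linalg.solve(C, ones)
    return Cinv_ones / (ones @ Cinv_ones)

# In practice C is unknown; one alternates between estimating the covariances from
# residuals around the current estimate of mu and recomputing the weights.
```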

The approaches outlined above use parametric model fitting to obtain the covariances $C_{k\ell}$. It is well known that if the number of spatial locations is small, fewer than about 20-30, then the optimization needed to fit a parametric model may fail or the fit may be poor. Gromenko and Kokoszka addressed this issue by developing a nonparametric approach. They showed that in relevant settings (all with $10 \le N \le 40$), the nonparametric approach is superior to the parametric approach even excluding the cases when the latter does not converge. The method of Gromenko and Kokoszka is a usable refinement and extension of the ideas of Hall et al. and Hall and Patil.

4.2.1. Estimation of the FPCs

If the random functions $X(s_k, \cdot)$ are realizations of a stationary $L^2$-valued random field, then they have the same FPCs $v_j$. Estimation of the $v_j$ in the spatial setting involves issues similar to those arising in the estimation of the mean function μ, but the notation and approaches are more complex. They are discussed in Gromenko et al. and Hörmann and Kokoszka.

4.3. Kriging and Other Research

A very important problem is to predict a curve at a specified location using the curves at available locations. This problem was addressed in Nerini et al. and Giraldo. Spatial prediction of this type is called kriging. Theoretical background to kriging of scalar fields is given in Stein. The book of Wackernagel gives an accessible introduction to kriging emphasizing multidimensional observations, a setting related to kriging of curves. Just as in the case of estimating the mean function, there are several approaches to kriging. The approach advocated by Giraldo et al. is “most functional” in that it treats the curves as whole data objects. Similarly to (4.4), it predicts the unobserved function $X(s_0)$ by a linear combination (4.6) $\widehat{X}(s_0) = \sum_{n=1}^{N}w_n X(s_n)$, $\sum_{n=1}^{N}w_n = 1$, with the weights $w_n$ minimizing (4.7) $E\int\left[X(s_0; t) - \sum_{n=1}^{N}w_n X(s_n; t)\right]^2 dt$. The estimation of the weights $w_n$ requires some work.
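For orientation, the sketch below (Python) solves the ordinary-kriging-type linear system that such sum-constrained weights satisfy once the relevant covariances are available; it is only an illustration of the structure of the problem, not the trace-variogram-based estimation procedure of Giraldo et al.

```python
import numpy as np

def kriging_weights(C, c0):
    """Weights for predicting X(s_0) by sum_n w_n X(s_n) under the sum-to-one
    constraint, given the trace-covariances C_kl = E<X(s_k) - mu, X(s_l) - mu>
    and c0_k = E<X(s_k) - mu, X(s_0) - mu>.  A sketch of the constrained linear
    system, assuming C and c0 have already been estimated."""
    N = C.shape[0]
    A = np.zeros((N + 1, N + 1))
    A[:N, :N] = C
    A[:N, N] = 1.0       # Lagrange multiplier column for the constraint sum w = 1
    A[N, :N] = 1.0
    b = np.concatenate([c0, [1.0]])
    sol = np.linalg.solve(A, b)
    return sol[:N]       # the kriging weights w_1, ..., w_N
```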

Giraldo et al. consider the problem of determining clusters in spatially correlated functional data. Härdle and Osipienko show how spatial analysis of functional data can be used to calculate more accurate prices of weather derivatives.

An emerging important class of functional models are those with a hierarchical structure and correlation at some levels, similar to the spatial correlations discussed in this section. Such models have applications in the analysis of medical experiments in which tissue samples are taken at several locations in an organ of a subject. Staicu et al. propose fast methods for the estimation of such models.

5. Summary and Future Directions

We have reviewed recent developments in the analysis of dependent functions. We considered two data structures: functional time series and spatially distributed functions. Functional time series are collections of curves Xn, where n can be interpreted as time, most often day or year. The central point in the analysis and modeling of such data is to take into account the dependence of the curves Xn. Spatially distributed functions, or geostatistical functional data, consist of curves X(sk) available at locations sk. The main issue is to take into account the uneven distribution in space of the points sk.

Inference for data of both types assumes that the data are stationary, possibly after some transformation. At present there are no suitable tests of stationarity for such functional data. Second-order stationarity of FTS could potentially be tested using spectral methods. For scalar time series the relevant references are Grenander and Rosenblatt , Granger and Hatanaka , Priestley et al. , and Dwivedi and Subba Rao . The recent work of Panaretos and Tavakoli  develops the fundamentals of the spectral theory of FTS.

In many settings, hybrid data of the form $X_n(s_k, t)$ can be considered. It could, for example, be the maximum temperature or the count of flu cases on day t, $1 \le t \le 365$, of year n at location $s_k$. Such a data structure could be useful to study long-term trends or changes in the annual pattern of a variable of interest over a region. There has not been much work on functional data of this form; Odei et al. consider Bayesian modeling of snow melt curves. In their setting, $X_n(s_k, t)$ is the amount of snow melt water on day t, $1 \le t \le 365$, of year n at a location $s_k$ in Utah.

For long records of geophysical, weather, or environmental data, a serious problem is posed by long segments of missing observations. For example, if $X(s_k, t)$ is the rainfall on day t at location $s_k$, there may be whole years of missing data at certain locations (when a station is closed), and these segments will be different at different locations. An important challenge is to develop useful approaches that allow us to pool together information from curves at all locations when some of them have long missing segments. This problem is somewhat related to the problem of dealing with sparse functional data mentioned in Section 1.1, but it adds the complication of spatial dependence.

The functions discussed in this paper, whether those observed consecutively over time or at spatial locations, are assumed to be smooth, so that methods relying on basis and FPC expansions can be applied. Some functions do not fall into this category and may exhibit sharp spikes and flat regions. There has not been much work on time series or spatial fields of functions of this type. Timmermans and von Sachs propose an exploratory tool that aims at detecting the closeness of curves whose significant sharp features might not be well aligned.

The study of extremes involves work with point processes. For example, $P_n(s_k)$ could be a point process of threshold exceedances in year n at location $s_k$. No systematic modeling framework for such point-process-valued time series is available. The estimation of exceedance probabilities is studied in Draghicescu and Ignaccolo.

In summary, the study of dependent functional data has reached a level of maturity that makes it a useful subfield of FDA, but many important problems remain to be addressed. It is hoped that this paper has provided a useful introduction into this area.

Acknowledgments

This work was partially completed at the Institute for Mathematical Sciences, National University of Singapore, 2012. The research was partially supported by the NSF Grant DMS-0931948.