This paper reviews recent research on dependent functional data. After providing an introduction to functional data analysis, we focus on two types of dependent functional data structures: time series of curves and spatially distributed curves. We review statistical models, inferential methodology, and possible extensions. The paper is intended to provide a concise introduction to the subject with plentiful references.

Functional data analysis (FDA) is a relatively new branch of statistics, going back to the early 1990s, but its mathematical foundations are rooted in much earlier developments in the theory of operators in a Hilbert space and in functional analysis. In the most basic setting, the sample consists of

Functional data are

FDA views each curve in a sample as a separate statistical object. In this sense, FDA is part of the

However, even curves are far more complicated structures than scalars or vectors. The curves are characterized not only by magnitude but also by shape. The shape of a random curve plays a role analogous to the dependence between the coordinates of a random vector. Human growth curves provide a well-known example. Suppose that there are

Some data can be very naturally viewed as curves. For example, if the height measurements are available at a fairly regular and sufficiently dense grid of times

Growth curves or sparse observations on a sample of patients can be viewed as independent curves drawn from a population of interest. A large body of research in FDA has been motivated by various problems arising in such a setting. At the same time, many functional data sets, most notably in physical and environmental sciences, arise from long records of observations. An example is presented in Figure

The horizontal component of the magnetic field measured in one minute resolution at Honolulu magnetic observatory from 1/1/2001 00:00 UT to 1/7/2001 24:00 UT.
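Cutting a long record into adjacent curves of natural length can be sketched as follows. This is only a minimal illustration: the function name `split_into_curves` and the synthetic `record` are assumptions introduced here, standing in for a one-minute-resolution record covering seven days, as in the figure above.

```python
def split_into_curves(record, points_per_curve):
    """Cut a long scalar record into adjacent curves of equal length.

    Trailing observations that do not fill a complete curve are dropped.
    """
    if points_per_curve <= 0:
        raise ValueError("points_per_curve must be positive")
    n_curves = len(record) // points_per_curve
    return [
        record[i * points_per_curve:(i + 1) * points_per_curve]
        for i in range(n_curves)
    ]


# Hypothetical stand-in for a one-minute-resolution record covering 7 days.
MINUTES_PER_DAY = 24 * 60
record = [float(t % MINUTES_PER_DAY) for t in range(7 * MINUTES_PER_DAY)]

# Each element of daily_curves is one day of observations, treated as one curve.
daily_curves = split_into_curves(record, MINUTES_PER_DAY)
```

Each daily curve is then treated as a single observational unit, so the seven-day record becomes a functional time series of length seven.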

A functional time series does not, however, need to arise from cutting a continuous time record into adjacent pieces of natural length. Figure

Eurodollar futures curves on ten consecutive days.

Functional time series focus on temporal or otherwise sequential dependence between curves. In many applications, the curves are available at points in space. Such spatially indexed functional data will be denoted

F2-layer critical frequency curves at three locations. Top to bottom (latitude in parentheses): Yakutsk (62.0), Yamagawa (31.2), and Manila (14.7). The functions exhibit a latitudinal trend in amplitude.

Of course, one can consider a more complex structure in which one may study spatially indexed functional time series, with the data being

In this section, we discuss several important topics mentioned above, mostly by directing the reader to appropriate references.

The two editions of the book of Ramsay and Silverman [

The book of Ferraty and Vieu [

The monograph of Bosq [

To fully understand some theoretical aspects of FDA, a good background in the theory of Hilbert spaces is needed. There are many introductory textbooks; my personal favorites are Riesz and Nagy [

The monograph of Horváth and Kokoszka [

In this paper, we focus on procedures for densely observed curves. For such data, functional objects needed to perform the calculations can be created using the

Functions are observed at discrete sampling values

As noted above, in some applications, the data are available only at a few sparsely distributed points

All procedures described in this paper can be implemented in readily available statistical software. Ramsay et al. [

Various brain scans can be viewed as functions over a spatial domain. Analysis of such data is discussed in several papers; see, for example, Reiss and Ogden [

The study of the brain also provides motivation to study complex statistical objects. Aydin et al. [

This paper focuses on

In Sections

The space

We view a random curve

If

One can show that a bounded operator

Consider an arbitrary separable Hilbert space

An operator

A symmetric positive-definite Hilbert-Schmidt operator

An important class of operators in

If

Suppose

The random function

An interpretation of the

Functional time series (FTSs) were introduced in Section

The main idea behind functional time series modeling is that in many situations the time record can be split into natural intervals, and instead of modeling periodicity, we treat the curve in each interval as a whole observational unit. There are proponents and opponents of this approach. In several applications, it has been shown that the functional approach yields superior results; see, for example, Antoniadis et al. [

Most methods described in this section are based on the FPCs. In certain applications, different orthonormal systems, similar in spirit to the FPCs but better suited to a specific application, can be used. In addition to the predictive factors of Kargin and Onatski [
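To make the role of the FPCs concrete, the following minimal sketch approximates the first empirical FPC of curves observed on a common grid. It rests on assumptions introduced here for illustration: the function name `first_fpc` is hypothetical, the eigenfunction is found by plain power iteration on the sample covariance, and the Euclidean inner product on the grid replaces the inner product of square-integrable functions (for an equispaced grid the two differ only by a constant factor).

```python
import math


def first_fpc(curves, n_iter=200):
    """Approximate the first empirical FPC of curves on a common grid.

    Power iteration is applied to the sample covariance of the discretized
    curves. Returns (eigenvalue, eigenvector), with the eigenvector
    normalized to unit Euclidean length.
    """
    n = len(curves)
    m = len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]

    # Constant starting vector; power iteration fails if this start happens
    # to be orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(m)] * m
    eigenvalue = 0.0
    for _ in range(n_iter):
        # Apply the covariance operator: C v = (1/n) * sum_i <x_i, v> x_i.
        scores = [sum(x[j] * v[j] for j in range(m)) for x in centered]
        w = [
            sum(scores[i] * centered[i][j] for i in range(n)) / n
            for j in range(m)
        ]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
        eigenvalue = norm  # ||C v|| approaches the largest eigenvalue
    return eigenvalue, v


# Toy data: three curves that vary along a single direction (1, 2, 2) / 3.
toy_curves = [[-1.0, -2.0, -2.0], [0.0, 0.0, 0.0], [1.0, 2.0, 2.0]]
eigenvalue, fpc = first_fpc(toy_curves)
```

In practice one would rely on an established eigendecomposition routine and retain several components; the sketch only shows what is being computed.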

The remainder of this section is organized as follows. We first review the autoregressive functional model which has been, by far, most extensively used and studied. We then turn to more general ways of describing temporal dependence between functions. We conclude with a discussion of some recent developments.

A reader seeking a good reference to the fundamental concepts of time series analysis is referred to Brockwell and Davis [

The theory of autoregressive and more general linear processes in Hilbert and Banach spaces is developed in the monograph of Bosq [

A popular approach to the estimation of the autoregressive operator

In some simulated cases, the surfaces

The most direct use of the functional AR(1) model is to predict the curve

A generalization of the functional AR(1) model (

Thus, if we can estimate

Determining the order

For many functional time series, it is not clear what specific model they follow, and for many statistical procedures, it is not necessary to assume a specific model. It is, however, important to know how temporal dependence affects a given procedure. In this section, we describe a very general notion of dependence, which is convenient to use in the framework of functional data. We restrict ourselves to this framework. Several related useful results, including a functional (in

For

A sequence

Definition

The following example gives a good feel for what Definition

Suppose that

Several other models are

Recently Lian [

An important result valid for

Suppose that

Kokoszka and Reimherr [

A central concept in time series analysis is the long run variance (LRV), which replaces the variance in many well-known formulas valid for iid observations. Let

Suppose that

To apply Theorem

Suppose that the functional time series

Estimation of the LRV, even for scalar time series, is difficult due to the selection of the bandwidth
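The role of the bandwidth is easiest to see in the scalar case. The sketch below is a generic kernel estimator with Bartlett weights, offered as an illustration rather than as the estimator of any paper cited here; the function names and the sample series are assumptions. In the functional setting, such an estimator could be applied, for example, to the scores of the curves with respect to a fixed basis function.

```python
def autocovariance(x, lag):
    """Sample autocovariance of the scalar series x at a nonnegative lag."""
    n = len(x)
    mean = sum(x) / n
    return sum(
        (x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)
    ) / n


def bartlett_lrv(x, bandwidth):
    """Kernel estimate of the long run variance with Bartlett weights.

    Lag h receives weight 1 - h / bandwidth for h < bandwidth and weight 0
    otherwise, so bandwidth = 1 reduces to the sample variance.
    """
    if bandwidth < 1:
        raise ValueError("bandwidth must be at least 1")
    lrv = autocovariance(x, 0)
    for lag in range(1, min(bandwidth, len(x))):
        weight = 1.0 - lag / bandwidth
        lrv += 2.0 * weight * autocovariance(x, lag)
    return lrv


# Hypothetical short series, used only to show the calls.
series = [1.0, 2.0, 3.0, 4.0]
variance_only = bartlett_lrv(series, 1)   # no autocovariances included
with_two_lags = bartlett_lrv(series, 3)   # lags 1 and 2 downweighted
```

A small bandwidth ignores dependence at larger lags, while a large one adds noisy autocovariance estimates; this trade-off is what makes the selection difficult.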

Suppose that

To see why (

The key point is the cancelation of

Zhang et al. [

In a two-sample problem, we consider two samples

An important contribution has been made by Benko et al. [

Regarding testing the equality of the covariance operators, Panaretos et al. [

When curves form a time series, it is typically assumed that, possibly after some transformation, they have the same distribution in

The idea of change point analysis is simple, and we explain it in the functional context using the change in mean function as an example. We observe functions

Note that under

The simplest, and in many settings most effective, change point detection procedures are based on cumulative sums (CUSUM procedures). In the above setting, denote
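A CUSUM computation for a change in the mean function can be sketched as follows. The sketch makes several assumptions of its own: the curves are stored as equal-length lists on a common equispaced grid, the squared norm is approximated by a grid average, and the name `cusum_change_point` is hypothetical. It returns only the location of the maximum and provides no critical values, so it is not a complete test.

```python
def cusum_change_point(curves):
    """Return (k_hat, norms) for curves given as equal-length lists.

    norms[k - 1] is the squared L2 norm (approximated by a grid average) of
    S_k = sum_{i <= k} X_i - (k / N) * sum_{i <= N} X_i, scaled by 1 / N.
    k_hat is the k with the largest scaled norm, so the mean is estimated
    to change between curve k_hat and curve k_hat + 1.
    """
    n = len(curves)
    m = len(curves[0])
    total = [sum(curve[j] for curve in curves) for j in range(m)]
    partial = [0.0] * m
    norms = []
    for k in range(1, n + 1):
        curve = curves[k - 1]
        partial = [partial[j] + curve[j] for j in range(m)]
        s_k = [partial[j] - (k / n) * total[j] for j in range(m)]
        norms.append(sum(v * v for v in s_k) / (m * n))
    k_hat = max(range(1, n + 1), key=lambda k: norms[k - 1])
    return k_hat, norms


# Toy example: the mean function jumps from 0 to 1 after the third curve.
toy_curves = [[0.0] * 4 for _ in range(3)] + [[1.0] * 4 for _ in range(3)]
k_hat, norms = cusum_change_point(toy_curves)
```

The centered partial sums vanish at the end of the sample by construction, and under a single change in the mean they tend to be largest near the change point.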

There have been several extensions to more complex settings. Horváth et al. [

The most extensively used and studied model for FTS is the FAR(1) model. Some nonlinear ARCH-type models are introduced in Hörmann et al. [

Kokoszka et al. [

Gabrys et al. [

A different framework is developed by Battey and Sancetta [

An interesting class of functional data are curves observed at several spatial locations, as already mentioned in Section

A sample of spatial data is

We assume that the functions

The approaches outlined above use parametric model fitting to obtain the covariances

If the random functions

A very important problem is to predict a curve at a specified location using the curves at available locations. This problem was addressed in Nerini et al. [

Giraldo et al. [

An important emerging class of functional models consists of those with hierarchical structure and correlation at some levels, similar to the spatial correlations discussed in this section. Such models have applications in the analysis of medical experiments in which tissue samples are taken at several locations in an organ of a subject. Staicu et al. [

We have reviewed recent developments in the analysis of dependent functions. We considered two data structures: functional time series and spatially distributed functions. Functional time series are collections of curves

Inference for data of both types assumes that the data are stationary, possibly after some transformation. At present there are no suitable tests of stationarity for such functional data. Second-order stationarity of FTS could potentially be tested using spectral methods. For scalar time series the relevant references are Grenander and Rosenblatt [

In many settings, hybrid data of the form

For long records of geophysical, weather, or environmental data, long segments of missing observations are a serious problem. For example, if

The functions discussed in this paper, whether observed consecutively over time or at spatial locations, are assumed to be smooth, so that methods relying on basis and FPC expansions can be applied. Some functions do not fall into this category and may exhibit sharp spikes and flat regions. There has not been much work on time series or spatial fields of functions of this type. Timmermans and Von Sachs [

The study of extremes involves work with point processes. For example

In summary, the study of dependent functional data has reached a level of maturity that makes it a useful subfield of FDA, but many important problems remain to be addressed. It is hoped that this paper has provided a useful introduction to this area.

This work was partially completed at the Institute for Mathematical Sciences, National University of Singapore, 2012. The research was partially supported by the NSF Grant DMS-0931948.