Time-course expression profiles and spectrum analysis methods have been applied to detect transcriptional periodicities, which are valuable patterns for identifying genes associated with cell cycle and circadian rhythm regulation. However, most of the proposed methods are subject to working restrictions and, to some extent, high false-positive rates. Additionally, in some experiments, arbitrarily irregular sampling times, high noise, and small sample sizes make accurate detection a challenging task. A novel scheme for detecting periodicities in time-course expression data is proposed, in which a real-valued iterative adaptive approach (RIAA), originally proposed for signal processing, is applied for periodogram estimation. The inferred spectrum is then analyzed using Fisher’s hypothesis test. With a proper

Patterns of periodic gene expression have been found to be associated with essential biological processes such as cell cycle and circadian rhythm [

Signal processing in the frequency domain simplifies the analysis, and a growing number of studies have demonstrated the power of spectrum analysis in the detection of periodic genes. Considering the common issues of missing values and noise in microarray experiments, Ahdesmäki et al. proposed a robust detection method incorporating the fast Fourier transform (FFT) with a series of data preprocessing and hypothesis testing steps [

While numerous methods have been developed for detecting periodicities in gene expression, most of these methods suffer from false positive errors and working restrictions to a certain extent, particularly when the time-course data contain limited time points. In addition, no algorithm seems available to resolve all of these challenges. Microarray as well as other high-throughput experiments, due to high manufacturing and preparation costs, have common characteristics of small sample size [

Recently, Stoica et al. developed a novel nonparametric method, termed the “real-valued iterative adaptive approach (RIAA),” specifically for spectral analysis with nonuniformly sampled data [

In this study, we found that the RIAA algorithm can provide robust spectral estimates for the detection of periodic genes regardless of the sampling strategies adopted in the experiments or the nonperiodic nature of noise present in the measurement process. We show through simulations that RIAA can outperform the existing algorithms, particularly when the data are highly irregularly sampled and when the sampling time points cover very few cycles. These characteristics of RIAA match the needs of time-course gene expression data analysis. This paper is organized as follows. In Section

RIAA is an iterative algorithm developed for finding the least-squares periodogram using a weighting function. The essential mathematics involved in RIAA is introduced in this section, with time-course expression data as the algorithm input; for more details regarding RIAA, readers are encouraged to consult the original paper by Stoica et al. [
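For orientation, the unweighted least-squares periodogram that RIAA refines can be sketched in a few lines. This is a minimal illustration, not the RIAA algorithm itself: it omits the iterative weighting. The function name `ls_periodogram`, the `center` option, and the normalization by the number of samples are assumptions made for this sketch. At each grid frequency, a cosine and a sine term are fitted to the samples by ordinary least squares, and the power is taken as the mean squared value of the fitted sinusoid.

```python
import numpy as np

def ls_periodogram(t, y, freqs, center=True):
    """Unweighted least-squares periodogram for (possibly irregular) times t.

    Hypothetical helper for illustration; the normalization is an assumption.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if center:
        y = y - y.mean()  # remove the constant offset before fitting
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Design matrix with one cosine and one sine column at this frequency
        A = np.column_stack((np.cos(w * t), np.sin(w * t)))
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        fitted = A @ coef
        # Mean squared value of the fitted sinusoid at the sampling times
        power[k] = fitted @ fitted / len(y)
    return power
```

Because the fit is carried out directly at the sampling times, no interpolation onto a regular grid is needed, which is why least-squares approaches suit irregularly sampled profiles.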

Suppose that the signals associated with the periodic gene expressions are composed of noise and sinusoidal components. Let

Let

The second term in the above equation is data independent and can be omitted from the minimization operation. Hence, the criterion (

We further apply

The target of interest to the fitting problem now becomes

After

Prior to implementation of RIAA for periodogram estimation, the observation interval

The observation interval

To ensure that the smallest frequency separation in time-course expression data with regular or irregular sampling can be adequately detected, the grid size

The following notations are introduced for the implementation of RIAA at a specific frequency

RIAA's salient feature is the addition of a weighting matrix

Assuming that

In Stoica et al. [

From (

The iteration for estimating spectrum starts with initial estimates

The phase of the cosine function

Use (

Obtain

and

At the

as the first iteration.

Terminate simply after 15 iterations (

for

Figure

The scheme of the process for detecting periodicities in time-course expression data.

After the spectrum of time-course expression data is obtained via periodogram estimation, a Fisher's statistic
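As an illustration of this testing step, the classical form of Fisher's g-statistic and its exact p-value can be sketched as follows. The function name `fisher_g_test` is hypothetical. The closed-form p-value assumes Gaussian white noise and independent spectral ordinates, so for a spectrum estimated on an arbitrary frequency grid it should be read as an approximation rather than as the exact procedure used in this study.

```python
from math import comb, floor

def fisher_g_test(power):
    """Return Fisher's g-statistic and its classical exact p-value.

    `power` holds the spectral estimates at n frequencies (hypothetical input).
    """
    p = [float(v) for v in power]
    n = len(p)
    g = max(p) / sum(p)  # share of total power held by the largest peak
    upper = min(n, floor(1.0 / g))
    # P(G > g) under the white-noise null hypothesis (Fisher, 1929)
    pval = sum(
        (-1) ** (j - 1) * comb(n, j) * (1.0 - j * g) ** (n - 1)
        for j in range(1, upper + 1)
    )
    return g, min(max(pval, 0.0), 1.0)
```

A spectrum dominated by a single peak yields a large g and a small p-value, whereas a flat spectrum yields g = 1/n and a p-value of 1.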

In real data analysis, deviation might be invoked for the estimation of

Simulations are applied to evaluate the performance of RIAA. The simulation models and sampling strategies used for simulations are described in the following paragraphs.

Three models, one for periodic signals and two for nonperiodic signals, are considered as transcriptional signals. Since periodic genes are transcribed in an oscillatory manner, the expression levels
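A periodic test signal of this kind can be generated as in the sketch below. It assumes a single cosine with additive Gaussian noise; the function name `simulate_periodic_profile` and the parameter names and default values are illustrative assumptions, not the settings used in the simulations reported here.

```python
import numpy as np

def simulate_periodic_profile(t, amplitude=1.0, freq=0.1, phase=0.0,
                              noise_sd=0.5, rng=None):
    """Cosine signal plus Gaussian noise, sampled at the times in t.

    All parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(t, dtype=float)
    signal = amplitude * np.cos(2.0 * np.pi * freq * t + phase)
    return signal + rng.normal(0.0, noise_sd, size=t.shape)
```

The time vector `t` may be regular or irregular, so the same generator can be reused across sampling strategies.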

Additionally, as visualized by Chubb et al., gene transcription can be nonperiodically activated with irregular intervals in a living eukaryotic cell, like pulses turning on and off rapidly and discontinuously [

As for the choices of sampling time points

ROC curves are applied for performance comparison. To this end, 10,000 periodic signals were generated using (
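The construction of an ROC curve from detection scores can be sketched as follows. It assumes each simulated signal has been reduced to a single score for which larger values indicate stronger evidence of periodicity (for example, a test statistic); the function names `roc_from_scores` and `auc` are hypothetical.

```python
import numpy as np

def roc_from_scores(pos_scores, neg_scores):
    """False- and true-positive rates obtained by sweeping a score threshold.

    pos_scores: scores of truly periodic signals.
    neg_scores: scores of nonperiodic signals.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Visit thresholds from the strictest to the most permissive
    thresholds = np.unique(np.concatenate((pos, neg)))[::-1]
    fpr, tpr = [0.0], [0.0]
    for th in thresholds:
        tpr.append(float(np.mean(pos >= th)))
        fpr.append(float(np.mean(neg >= th)))
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

An area close to 1 indicates that periodic and nonperiodic signals are well separated by the score, while an area near 0.5 corresponds to chance-level detection.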

Two yeast cell cycle experiments synchronized using an alpha-factor, one conducted by Spellman et al. [

RIAA performed well in the conducted simulations. As shown in Figure

(a) A time-course periodic signal with frequency

ROC curves strongly illustrate the performance of RIAA. In Figures

The ROC curves derived from simulations with 24 sampling time points, signal amplitude

The ROC curves derived from simulations with 24 sampling time points, signal amplitude

Figure

The intersection of preserved genes and the benchmark sets using RIAA, LS, and DLS algorithms. (a), (b), and (c) reveal the analysis results when dataset alpha was applied. (d), (e), and (f) reveal the analysis results when dataset alpha 38 was applied.

In this study, rigorous simulations specifically designed to conform to real experiments reveal that RIAA can outperform the classical LS and modified DLS algorithms when the sampling time points are highly irregular and when the number of cycles covered by the sampling times is very limited. These characteristics, as also claimed in the original study by Stoica et al. [

The intersection of detected candidates and proposed periodic genes in the real data analysis (Figure

Besides the comparison of these algorithms, it is interesting to note that the bio-like sampling strategy could lead to better detection of periodicities than the regular sampling strategy (as shown in Figures

The authors would like to thank the members of the Genomic Signal Processing Laboratory, Texas A&M University, for helpful discussions and valuable feedback. This work was supported by the National Science Foundation under Grant no. 0915444. The RIAA MATLAB code is available at