Vector Radix 2 × 2 Sliding Fast Fourier Transform

. The two-dimensional (2D) discrete Fourier transform (DFT) in the sliding window scenario has been successfully used for numerous applications requiring consecutive spectrum analysis of input signals. However, the results of conventional sliding DFT algorithms are potentially unstable because of the accumulated numerical errors caused by recursive strategy. In this letter, a stable 2D sliding fast Fourier transform (FFT) algorithm based on the vector radix (VR) 2 × 2 FFT is presented. In the VR-2 × 2 FFT algorithm, each 2D DFT bin is hierarchically decomposed into four sub-DFT bins until the size of the sub-DFT bins is reduced to 2 × 2; the output DFT bins are calculated using the linear combination of the sub-DFT bins. Because the sub-DFT bins for the overlapped input signals between the previous and current window are the same, the proposed algorithm reduces the computational complexity of the VR-2 × 2 FFT algorithm by reusing previously calculated sub-DFT bins in the sliding window scenario. Moreover, because the resultant DFT bins are identical to those of the VR-2 × 2 FFT algorithm, numerical errors do not arise; therefore, unconditional stability is guaranteed. Theoretical analysis shows that the proposed algorithm has the lowest computational requirements among the existing stable sliding DFT algorithms.


Introduction
The two-dimensional (2D) discrete Fourier transform (DFT) has been widely used for spectrum analysis of 2D input signals in the field of signal processing.The vector radix (VR) 2 × 2 FFT [1] is one of the most practical approaches to performing the 2D DFT.The VR-2 × 2 FFT algorithm hierarchically decomposes each DFT bin into sub-DFT bins until the size of the DFT bins becomes 2 × 2. Because the DFT bins can be efficiently obtained from the sub-DFT bins using a butterfly structure, the computational cost of the DFT can be significantly reduced.Additionally, various VR FFT algorithms including VR-4 × 4, split VR, and VR-2 2 × 2 2 have been introduced to further enhance the efficiency of the VR-2 × 2 FFT by more finely decomposing the DFT bins [2][3][4][5].
The existing VR-based 2D FFT algorithms are widely applied in various fields of signal processing with satisfactory computing speed.However, these algorithms have computational redundancies when the transform window is shifted to the next sample because the input signals in the previous and current windows overlap.To reduce the redundancies in the sliding window scenario, numerous algorithms have been introduced over the past several years.The sliding DFT (SDFT) algorithm significantly reduces the computational load of the 1D DFT for the shifted window using the circular shift property [6].The hopping DFT (HDFT) applies the SDFT to the hopping window scenario; HDFT can reduce the number of complex multiplications and additions in intermediate calculations [7].Recently, the 2D SDFT proposed in [8] further reduced the number of computations of the SDFT using recursive strategy for a 2D input signal.The moving FFT (MFFT) algorithm proposed in [9] introduced a fast implementation of the 2D FFT by updating the previously calculated FFT bins when the current window is shifted.It is noteworthy that the recursive strategy adopted in the aforementioned sliding algorithms significantly reduces the number of complex multiplications and additions.However, the recursive implementation causes error propagation.Because, in practice, the complex numbers used in the DFT are represented in floating-point format with finite precision, the output DFT bins of the sliding algorithm and those of the traditional algorithm are not the same, although they are mathematically equivalent.Furthermore, the errors accumulate and, thus, can increase in the worst case.
To address this issue, several stable DFT algorithms have been proposed.The stable SDFT proposed in [10], called rSDFT, reduces the numerical error by multiplying the twiddle factor by the damping factor , where 0 ≪  < 1.
In the modulated SDFT (mSDFT) [11], the unstable twiddle factor is multiplied recursively by the previous output DFT bin, which is considered to be the cause of numerical errors.Therefore, the mSDFT removes the twiddle factor from the feedback loop, thereby guaranteeing unconditional stability.Recently, the guaranteed-stable SDFT (gSDFT) was proposed for fast, stable DFT [12].The gSDFT not only guarantees the stability by removing the twiddle factor from the feedback of the resonator but also reduces the computational requirements by adopting the butterfly structure of the traditional FFT algorithm to obtain the bins of the updating vector transform.Nevertheless, the output DFT bins still contain the numerical error because the recursive strategy is used.On the other hand, the sliding FFT (SFFT) algorithm introduced by Farhang-Borojueny and Gazor in [13] does not adopt the recursive strategy.The SFFT calculates the bins of the shifted window by exploiting delayed intermediate calculations of the previous window.Therefore, the SFFT has the advantage that numerical errors do not occur as the transform window slides.However, because the SFFT is designed to handle 1D input data, its computational complexity can be further reduced for the 2D input data.
In this letter, we propose a novel stable, fast 2D sliding FFT algorithm.Because the 2D butterfly structure used in the VR-2 × 2 FFT algorithm is comprised of the 1D butterfly structure, the proposed VR-2 × 2 SFFT adopts the concept of the SFFT to guarantee the stability.In addition, we consider the 2D sliding window to move in only one direction, that is, column-wise or row-wise, to reduce the computational complexity.The rest of this letter is organized as follows.
In Section 2, we first analyze the computational relationship between the sub-DFT bins and the DFT bins of the VR-2 × 2 FFT.Then, we explain how the proposed VR-2 × 2 SFFT algorithm reduces the amount of computation in the sliding window scenario while guaranteeing the computational stability.Section 3 presents the performance of the VR-2 × 2 SFFT by analyzing its arithmetic complexity and stability, making a comparison with those of the conventional algorithms.Finally, conclusions are given in Section 4.
It is obvious that the conventional butterfly diagram represents the computational relations between DFT bins and sub-DFT bins.However, because the butterfly diagram has been developed to illustrate the data flow of the 1D FFT, it is difficult to present both the 2D spatial position of the input X(k, l) Unlike for the conventional 2D butterfly diagram, the 2D spatial location of the input sub-DFT bins and the output DFT bins can be clearly illustrated, as in Figure 2. In addition, because the X-shaped pairs of arrows do not overlap with each other, it is easier to discriminate the relations between input and output DFT bins.However, the diagram still becomes complicated as the amount of data increases.To make the diagram simpler, let us omit the X-shaped arrows and twiddle factors   ,   , and  + from Figure 2.Then, the 3D diagram of the VR-2 × 2 FFT can be simplified as shown in Figure 3.
The decimation procedure repeats log 2  times until the size of   is reduced to 2 × 2. Let   denote the th decimation stage, where  is an integer in the range of [1, log 2 ], and the input data bins appear at stage  1 .The sizes of   and  at stage   are 2 −1 × 2 −1 and 2  × 2  , respectively.The twiddle factors multiplied with  00 (, ),  01 (, ),  10 (, ), and  11 (, ), ,  = 0, 1, . . ., 2 −1 , at   are 1,   ,   , and  + , respectively.An example of the twiddle factors required for the 8 × 8-point VR-2 × 2 FFT is shown in Figure 4. Now, let us consider the computational relationship between the input data and the values of   (, ).As in (2), the VR-2 × 2 FFT algorithm computes the DFT bins using the decomposed 2 × 2   , which means that the input data give hierarchical dependencies to the output DFT bins through all decimation stages.This technique can best be explained via an example.Figure 5 shows a flow graph of the 8 × 8-point VR-2 × 2 FFT algorithm.For the 8 × 8-point VR-2 × 2 FFT, three decimation X(k, l)  Based on this observation, we derive the proposed algorithm, which can reduce the computational complexity of the VR-2 × 2 FFT in the sliding window scenario.Assume that all calculations in the structure shown in Figure 5 have already been performed for the previous data and that the results were stored.Furthermore, assume that the 8 × 8 window on the input data shifts by one column and that the same process must be performed at the current state.
Figure 6 shows 2D flow graphs of the VR-2 × 2 SFFT at both the previous and the current state by projecting the 3D flow graph in Figure 5 onto the column-layer plane.Denote a set of data of the th column in the th layer by    .Then, the number of data corresponding to   is 2 −1 .Figure 6(a) shows that the 8 × 8-point VR-2 × 2 FFT is performed using the input data from the 0th to the 7th column from the previous state.All of the calculated data in  2 and  3 represented by gray circles are stored.At the current state, the input data of the 0th column are removed from the window and those of the 8th column are newly included.After the input data of the current window are arranged in bit-reverse order, they are transformed using the 8 × 8-point VR-2 × 2 FFT.
Here, note that the data in {1st, 5th}, {3rd, 7th}, and {2nd, 6th} columns are paired, becoming the input of  1 at the current state, as they were at the previous state.Then, the values of { 1 2 ,  5 2 ,  3 2 ,  7 2 ,  2 2 ,  6 2 } in  2 and { 1 3 ,  5 3 ,  3 3 ,  7 3 } in  3 at the previous state are the same as those at the current state, and, thus, they can be reused.As a result, only the data indicated by white circles in Figure 6(b) need to be calculated in the sliding scenario.
In addition, the computations for the data indicated by white circles can be further reduced, based on the fact that the 2D window is shifted column-wise.The structure of the 2D butterfly is composed of two stages, where each stage has two 1D butterflies, as shown in Figure 7.    Input data Output DFT bins In the first stage, two sets of the data {, } and {, } are the inputs of the single butterfly.The resultant {  ,   } and {  ,   } are used as the input to the butterflies in the second stage.Here, it can be observed that the data set {, } is not used to calculate {  ,   }.This means that if the data {  ,   } have already been calculated, they do not need to be recalculated.Then, the complexity of one complex multiplication (  ) and two complex additions (  's) can be reduced.
In Figure 6(b), the input data of the 4th and 8th columns are used together as the input to stage  1 .Because the calculations for the data in the 4th column were already performed for the previous state, the results can be reused for the current state.Next,  2 2 and  6 2 are paired with  4 2 and  8 2 , respectively, as the input at stage  2 , and the 2D butterfly is performed for each pair.Because  2 2 and  6 2 have also been calculated, the number of computations for the 2D butterflies can be reduced.The complexity reductions can be achieved at stage  3 in the same manner.
Consequently, each 2D butterfly of the proposed algorithm requires only two   's and six   's, whereas that of the VR-2 × 2 FFT algorithm needs three   's and eight   's.Output DFT bins Output DFT bins  Moreover, because the output DFT bins of the proposed algorithm are identical to those of the VR-2 × 2 FFT algorithm, numerical errors do not accumulate.Therefore, the proposed VR-2 × 2 SFFT algorithm can guarantee stability.Next, we analyze the computational complexity of the proposed VR-2 × 2 SFFT and compare it to that of the conventional algorithms.

Complexity and Stability Analysis of the VR-2 × 2 SFFT Algorithm
( As mentioned in Section 2, a 2 × 2 butterfly needs two   's and six   's, and the VR-2 × 2 SFFT requires  2 −   's and 3( 2 − )   's for  ×  input data.Because the proposed VR-2 × 2 SFFT computes only some of the butterflies among those of the VR-2 × 2 FFT structure, the computational requirements of the proposed algorithm are lower than those of the VR-2 × 2 FFT algorithm.A computational comparison of various window sizes is shown in Table 1, where one   is counted as four real multiplications (  's) and two real additions (  's) and one   is counted as two   's.The 2D MFFT [9], SFFT [13], and 2D SDFT [8] are chosen for the existing fast DFT/FFT algorithms.In addition, the rSDFT [10], mSDFT [11], and gSDFT [12] are chosen for the existing stable DFT algorithms.Considering that the SFFT, rSDFT, mSDFT, and gSDFT are proposed for the 1D input signal, we assume that each 1D transform is horizontally performed on the 2D input signal, and, then, the 1D FFT is vertically applied to the results.All the algorithms listed in Table 1 were implemented using the ANSI-C code and the performance was evaluated on a 3.3-GHz CPU with 8 GB of RAM.In our simulation, the size of transform window was set to 16 × 16 and we measured the processing time required for performing the sliding transform process 10 6 times.For each algorithm, we measured the processing time 10 times and averaged the measured values.The resultant values are listed in Table 2.
The complexity of the proposed VR-2 × 2 SFFT is higher than that of the 2D SDFT and 2D MFFT.However, the proposed algorithm achieves ( 2 ) complexity in both   and   , which is the lowest among the existing stable DFT algorithms.Further, we present the memory requirements of all the algorithms in Table 2.For each algorithm, we examined the amount of required memory for performing the × sliding transform.For the sake of the clarity, the memory to store the input and output signals is not considered.In Table 2, we see that the memory requirements of the algorithms vary depending on the window size.In general, 2D SDFT and 2D MFFT need a relatively small amount of memory as compared to the other algorithms.Note that the size of the transform window is usually much smaller than those of the input and output signals.Therefore, in general, the amount of memory required to performing the sliding transform may be not a big burden for real-world applications.

Stability Analysis.
We investigated the stability of the proposed VR-2 × 2 SFFT algorithm using a complex test signal, which was zero-mean Gaussian noise with a standard deviation of one.The simulation was performed in 64-bit double-precision arithmetic;  was set to 16.In our simulation, the numerical errors are generated by the recursive strategy of the sliding DFT/FFT algorithms.Therefore, we evaluate the stability of the algorithm by computing the differences between the output DFT bins of the sliding algorithm and those of the reference algorithm.The reference algorithm denotes the original algorithm from which the sliding algorithm was derived, for example, the reference algorithm of the VR-2 × 2 SFFT is the VR-2 × 2 FFT.All algorithms were tested over 10 6 iterations.The error   at time index  is calculated as where  Reference  (, ) represents the (, )th DFT bin of the reference algorithm and  Algorithm  (, ) is the bin of the test algorithm.
Then, it is observed that the errors in the 2D MFFT and the 1D rSDFT + 1D FFT are accumulated as  increases, as shown in Figure 8.For the rSDFT, the damping factor  is set to 1−10 −12 .In our simulation, the average   of the 2D MFFT is 4.41×10 −9 and that of the 1D rSDFT + 1D FFT is 3.53×10 −9 .In terms of the average increasing ratio of   , the 2D MFFT and 1D rSDFT + 1D FFT are 6.36 × 10 −15 and 5.14 × 10 −15 , respectively.
The measured   values of the 1D mSDFT + 1D FFT, 1D gSDFT + 1D FFT, and VR-2 × 2 SFFT are presented in Figure 9.The errors of the 1D mSDFT + 1D FFT and the 1D gSDFT + 1D FFT did not accumulate; however, they fluctuated in the range of   algorithm are exactly the same as those of the VR-2 × 2 FFT, and errors do not accumulate.Therefore, the proposed VR-2 × 2 SFFT outperforms the other algorithms in terms of stability.

Conclusion
In this letter, a new stable SFFT based on the VR-2 × 2 FFT algorithm was presented for 2D input data.We first analyzed the computational relationship between the sub-DFT bins of the structure of the VR-2 × 2 FFT.Then, we adopt the concept of the 1D SFFT, which calculates the bins of the shifted window by exploiting the delayed intermediate calculations of the previous window.The proposed VR-2 × 2 SFFT algorithm achieves ( 2 ) complexity, which is the lowest among the existing stable DFT algorithms.Because the output DFT bins of the proposed method are exactly the same as those of the VR-2 × 2 FFT algorithm, the numerical errors do not accumulate in the sliding transform process.

Figure 3 :
Figure 3: Simplified version of the 3D butterfly diagram.
stages, that is,  1 ,  2 , and  3 , are required, and, thus, there are four data layers, as shown in Figure5.Let   ,  = 1, 2, 3, 4, denote the th data layer, where  1 and  4 consist of the input data and the output DFT bins, respectively.Note that the input data are arranged in bit-reverse order.First, the 2 × 2 input data indicated by black circles in  1 are linearly combined at stage  1 to obtain the corresponding 2 × 2 DFT bins indicated by gray circles in  2 .Then, 4 × 4 data including the resultant 2 × 2 DFT bins in  2 are used at stage  2 to calculate 4 × 4 DFT bins indicated by gray circles in  3 .Finally, the 8 × 8 DFT bins in  4 are obtained at stage  3 using all of the data in  3 .Here, it is noteworthy that the data indicated by white circles in  1 and  2 are not involved in calculating the sub-DFT bins indicated by gray circles in  2 and  3 .

Table 1 :
Computational requirements of the 2D DFT/FFT algorithms for  ×  input data in the sliding window scenario.

Table 2 :
Comparison of the processing time and the additional memory requirements.