With modern global navigation satellite system (GNSS) signals, the FFT-based parallel code search acquisition must handle the frequent sign transitions due to the data or the secondary code. The straightforward solution to this problem consists of doubling the length of the FFTs, which significantly increases the complexity. The authors previously proposed a method that reduces the complexity without impairing the probability of detection; in particular, it led to a 50% memory reduction for an FPGA implementation. In this paper, the authors propose another approach, namely, the splitting of a large FFT into three or five smaller FFTs, which provides better performance and higher flexibility. For an FPGA implementation, compared to the previously proposed approach, and at the expense of a slight increase of the logic and multiplier resources, the splitting into three and five allows, respectively, a reduction of 40% and 64% of the memory and of 25% and 37.5% of the processing time. Moreover, with the splitting into three FFTs, the algorithm is applicable for sampling frequencies up to 24.576 MHz for L5 band signals, against 21.846 MHz with the previously proposed algorithm. The algorithm is applied here to the GPS L5 and Galileo E5a, E5b, and E1 signals.
The question of computing a circular correlation between a local code replica and an incoming code having a bit sign transition is a recurrent problem in global navigation satellite system (GNSS) [
The straightforward solution to this problem is to at least double the length of the sequences, by using more samples of the input signal (to observe at least two code periods and thus be sure to observe one code period that is free of sign transition) and by zero-padding the local code replica [
Note that this straightforward solution also solves other problems. (1) Still with the PCS, the length of the sequences may need to be increased to satisfy a constraint on the FFT length. For example, if the FFT length must be a power of two and if one code period corresponds to 4000 samples, directly zero-padding both the incoming and local sequences to get sequences of 4096 samples will result in losses (in general, the zeros will not be inserted at the same position inside the received and local codes; see [
In this paper, we propose a method that reduces the zero-padding in order to improve the efficiency. The method is based on the fact that an
The rest of the paper is organized as follows. Section
The signal received from a GNSS satellite contains a spreading code, whose beginning is unknown, and an unknown residual carrier frequency due to the Doppler effect. The aim of the acquisition is to determine the code delay and the carrier frequency for all visible satellites [
As mentioned in the introduction, the PCS performs a circular correlation between a local code replica of the satellite searched and the received signal using FFTs, usually over the primary code period. Then, extra coherent integration or noncoherent integration can be performed, as shown in Figure
Illustration of the parallel code search (PCS) acquisition. The
Illustration of the problem due to a transition coming from the secondary code or the data. The values in the boxes indicate the chip number. (a) The incoming primary code starts with the first chip; the correlation at the correct alignment is maximum, as it would be without data or secondary code. (b) The incoming primary code does not start at the first chip (usual case); the correlation at the correct alignment is reduced. (c) In the worst case, the incoming primary code starts at the middle of a period; the correlation at the correct alignment is very close to 0.
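The three cases of this illustration can be reproduced numerically. The following is a minimal sketch, not the authors' implementation: it assumes one sample per chip, a random ±1 sequence standing in for a real primary code, and a hypothetical code length of 1023; the helper names are illustrative only.

```python
import numpy as np

def circular_correlation(received, local):
    """Circular cross-correlation computed with FFTs, as in the PCS."""
    return np.fft.ifft(np.fft.fft(received) * np.conj(np.fft.fft(local)))

rng = np.random.default_rng(0)
n = 1023                                  # hypothetical code length (assumption)
code = rng.choice([-1.0, 1.0], size=n)    # random stand-in for a primary code

def received_period(delay, sign_after_transition):
    """One period of received signal whose code period starts `delay` samples in.

    The first `delay` samples are the end of the previous code period; the
    remaining samples are the start of the next period, multiplied by the
    sign of the new data bit or secondary code chip.
    """
    tail = code[n - delay:]
    head = sign_after_transition * code[:n - delay]
    return np.concatenate([tail, head])

def magnitude_at_alignment(delay, sign=-1.0):
    """Correlation magnitude at the correct code delay."""
    corr = circular_correlation(received_period(delay, sign), code)
    return abs(corr[delay])

# (a) code starts at the first sample, (b) intermediate delay, (c) mid-period
for delay in (0, 256, 511):
    print(delay, round(magnitude_at_alignment(delay)))
```

With a sign change at the code boundary, the magnitude at the correct alignment equals |2·delay − n| in this model, so it is maximum for a delay of 0 and almost vanishes when the boundary falls in the middle of the period.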
As mentioned in the introduction, the straightforward solution to this problem is to at least double the length of the sequences, by using more samples of the input signal and by zero-padding the local code [
Straightforward solution to the bit sign transition problem. The magnitude of the first peak is always maximum, whereas the second peak can be reduced because of the sign transitions.
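The behaviour of the two peaks can be checked with a short simulation. This is a sketch under assumptions: one sample per chip, a random ±1 sequence instead of a real primary code, a hypothetical code length of 1023, and FFTs of arbitrary length (no power-of-two constraint); the function names are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1023                                  # hypothetical code length (assumption)
code = rng.choice([-1.0, 1.0], size=n)    # random stand-in for a primary code

def two_periods(delay, signs):
    """Two code periods of received signal, first code boundary at `delay`.

    `signs` holds the signs of the three consecutive code periods that
    overlap the observation window (previous, complete, next).
    """
    s_prev, s_full, s_next = signs
    return np.concatenate([s_prev * code[n - delay:],
                           s_full * code,
                           s_next * code[:n - delay]])

def zero_padded_correlation(received):
    """Circular correlation of 2n received samples with a zero-padded local code."""
    local = np.concatenate([code, np.zeros(n)])
    return np.fft.ifft(np.fft.fft(received) * np.conj(np.fft.fft(local)))

delay = 511
corr = np.abs(zero_padded_correlation(two_periods(delay, (1.0, -1.0, -1.0))))
first_peak = corr[delay]         # complete code period, free of sign transition
second_peak = corr[delay + n]    # wraps around the window, mixes two signs
print(round(first_peak), round(second_peak))
```

Only the first half of the output needs to be kept, since the first peak always falls there.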
The L5, E5a, and E5b signals have a spreading code period of 1 ms, and the usual minimum sampling frequency considered is 20.46 MHz (assuming a complex sampling since the signal bandwidth is 20.46 MHz) [
Note that the sampling frequencies given here are those required for the acquisition. These frequencies can be the ones used by the RF front end, but it is also possible to have an RF front end using a higher sampling frequency and to have a resampling performed before the acquisition block (see [
In the literature, there are different propositions to overcome this problem other than the straightforward solution. For example, some authors proposed performing averaging in the Doppler search space [
In a recent paper [
In summary, with the 5FFT solution proposed in [
In the next section, a different approach is presented, still with the same goal, namely, computing the correlation exactly while reducing the complexity by decreasing the amount of zero-padding.
It is known that an FFT can be computed by combining the results of several smaller FFTs, at the expense of an increase in the number of operations (see [
The discrete Fourier transform (DFT) of a sequence
Then, if we divide the output sequence
Computation of an FFT of
A similar development can be done for the computation of an
With the development of the previous section, which computes one FFT using three FFTs, a 49 152-point FFT can be computed using three 16 384-point FFTs. Using this approach for the acquisition of the L5, E5a, E5b, and E1 signals, the sequences need to be zero-padded only up to 49 152 samples (much less than 65 536). This reduction of the zero-padding directly reduces the complexity of the acquisition, as shown in the next section, with great benefits for its implementation. In the following, this solution is called the 9FFT solution.
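The splitting of one FFT into several smaller ones can be sketched as follows. This is an illustration under an assumption: it uses a standard decimation-in-frequency decomposition (the output is divided into interleaved subsequences), which may differ in detail from the exact equations of the paper. The function name `fft_split` is hypothetical.

```python
import numpy as np

def fft_split(x, parts):
    """Compute a (parts * N)-point FFT using `parts` N-point FFTs.

    Assumed decomposition: with x_p[n] = x[n + p*N],
    X[parts*k + r] = FFT_N{ W_total^(r*n) * sum_p x_p[n] * W_parts^(p*r) }[k],
    where W_M = exp(-2j*pi/M).
    """
    x = np.asarray(x, dtype=complex)
    total = x.size
    if total % parts != 0:
        raise ValueError("length must be a multiple of the number of parts")
    n_small = total // parts
    blocks = x.reshape(parts, n_small)      # blocks[p, n] = x[n + p*n_small]
    n = np.arange(n_small)
    out = np.empty(total, dtype=complex)
    for r in range(parts):
        # Combination of the input blocks with the parts-th roots of unity
        weights = np.exp(-2j * np.pi * r * np.arange(parts) / parts)
        combined = weights @ blocks
        # Multiplication with the complex exponentials before the small FFT
        twiddle = np.exp(-2j * np.pi * r * n / total)
        out[r::parts] = np.fft.fft(combined * twiddle)
    return out

# Small-scale check (3 x 16 points instead of 3 x 16 384)
rng = np.random.default_rng(0)
x = rng.standard_normal(48) + 1j * rng.standard_normal(48)
print(np.allclose(fft_split(x, 3), np.fft.fft(x)))
```

The same function with `parts=5` covers the splitting into five FFTs; the inverse FFT can be split in the same way by conjugating the exponentials.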
This method works if the initial length of the sequence is lower than or equal to 49 152, which corresponds to a maximum sampling frequency of 24.576 MHz for the L5, E5a, and E5b signals and to 6.144 MHz for the E1 signal. Therefore, the range of applicability of this solution is larger than that of the 5FFT solution (21.846 MHz for the L5, E5a, and E5b signals, 5.4615 MHz for the E1 signal [
Similarly, it would be possible to compute a 40 960-point FFT using five 8192-point FFTs. The sequences would need to be zero-padded only up to 40 960; that is, only 20 zeros would be padded. However, this method works only if the initial length of the sequence is lower than or equal to 40 960, which corresponds to a maximum sampling frequency of 20.48 MHz for the L5, E5a, and E5b signals and of 5.120 MHz for the E1 signal. Therefore, the range of applicability in this case is very small, and it implies a loss for the E1 signal. In the following, this solution is called the 15FFT solution.
Recall that the sampling frequencies given are those at the acquisition stage. The actual sampling frequency used by the RF front end can be higher if a resampling is performed before the acquisition.
In our case, there is no benefit in splitting further. An FFT can also be split into seven smaller FFTs; however, seven 4096-point FFTs give a 28 672-point FFT, which is much shorter than our signal (40 920 or 49 104 samples), and seven 8192-point FFTs give a 57 344-point FFT, which is much longer than our signal. Such a solution would therefore be less efficient (because there would be more zero-padding and because the combinations of the FFT inputs would become much more complex). Its only advantage is that it could be used for higher sampling frequencies, for example, up to 28.672 MHz for the L5, E5a, and E5b signals.
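The maximum sampling frequencies quoted for the split solutions follow from requiring that two code periods fit in the total FFT length. The sketch below reproduces them, assuming code periods of 1 ms for the L5, E5a, and E5b signals and 4 ms for the E1 signal; the function name is hypothetical.

```python
def max_sampling_frequency_mhz(small_fft_length, n_split, code_period_ms):
    """Highest sampling frequency for which two code periods fit in the FFT."""
    total_length = small_fft_length * n_split
    samples_per_ms = total_length / (2 * code_period_ms)
    return samples_per_ms / 1000.0

for label, length, split in (("3 x 16 384", 16384, 3), ("5 x 8192", 8192, 5)):
    l5 = max_sampling_frequency_mhz(length, split, code_period_ms=1)
    e1 = max_sampling_frequency_mhz(length, split, code_period_ms=4)
    print(f"{label}: L5/E5a/E5b {l5:.3f} MHz, E1 {e1:.3f} MHz")
```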
In this section, we compare the 3FFT solution, the 5FFT solution (proposed in [
Schematic of the different solutions: (a) 3FFT solution (straightforward solution), (b) 5FFT solution (proposed in [
First, we review the sampling frequency range applicable to the different solutions. Then, we compare the complexity by evaluating the number of operations, which is of interest for a receiver based on a digital signal processor. Finally, we compare the resources for an FPGA-based receiver. Recall that the 5FFT solution has a complexity similar to the algorithms proposed in [
As mentioned in the previous sections, each algorithm can be applied for a certain range of sampling frequency. These ranges are summarized in Table
Range of applicability of the different solutions.
Solution  FFT length  Sampling frequency range for the L5, E5a, and E5b signals  Sampling frequency range for the E1 signal
3FFT solution  65 536  20.46 MHz–32.768 MHz  6.138 MHz–8.192 MHz
5FFT solution  32 768  20.46 MHz–21.846 MHz  5.4615 MHz
9FFT solution  16 384  20.46 MHz–24.576 MHz  6.138 MHz–6.144 MHz
15FFT solution  8192  20.46 MHz–20.480 MHz  5.120 MHz
A
Note, however, that these ranges are for an exact computation of the code correlation; a higher sampling frequency can be used at the cost of a potential loss. For example, a sampling frequency of 26 MHz could be used with the 9FFT solution. In this case, two code periods correspond to 52 000 samples, and by removing the last 2848 samples the circular correlation can be performed on the 49 152 remaining samples. This can lead to a small loss, depending on the received code delay. If the delay of the code is between 0 and 23 152 samples (49 152 − 26 000), there is no loss. If the delay is higher, a portion of the local code is not aligned with the received code, which introduces a loss. In the worst case of this example, 2848 samples are not aligned, leading to a loss of about 1 dB. Of course, the higher the sampling frequency, the higher the potential loss.
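The figures of this example can be checked with a few lines. The sketch assumes that the worst-case loss is the ratio between the number of aligned samples and the number of samples in one code period, expressed as an amplitude ratio in dB; this model is an assumption that happens to reproduce the value of about 1 dB, and the function name is hypothetical.

```python
import math

def truncation_figures(fs_hz, code_period_s, fft_length):
    """Samples removed, largest lossless delay, and worst-case loss in dB."""
    samples_per_period = round(fs_hz * code_period_s)
    removed = max(0, 2 * samples_per_period - fft_length)
    max_lossless_delay = fft_length - samples_per_period
    aligned = samples_per_period - removed
    loss_db = 20.0 * math.log10(aligned / samples_per_period)
    return removed, max_lossless_delay, loss_db

removed, max_delay, loss_db = truncation_figures(26e6, 1e-3, 49152)
print(removed, max_delay, round(loss_db, 2))
```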
To evaluate the implementations’ complexity, the number of complex multiplications and additions is computed. For this, we consider that an
The summary of the complexity analysis is given in Table
Number of operations of the different solutions.
Solution  Number of complex multiplications  Number of complex additions
3FFT solution  1 638 400  3 145 728
5FFT solution  1 294 336 (−21%)  2 457 600 (−21.9%)
9FFT solution  1 228 800 (−25%)  2 408 448 (−23.4%)
15FFT solution  1 036 288 (−36.8%)  2 187 264 (−30.5%)
The percentage is the reduction compared to the 3FFT solution.
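The totals of the 3FFT and 5FFT solutions can be reproduced with a simple model. The sketch below rests on two assumptions that are inferred from the table and not stated here: an N-point FFT is counted as (N/2)·log2(N) complex multiplications and N·log2(N) complex additions, and the 3FFT and 5FFT solutions need, respectively, one and two pointwise products of N samples in addition to the FFTs. The extra combinations of the 9FFT and 15FFT solutions are not modelled.

```python
def fft_operations(n):
    """Assumed radix-2 count of complex operations for an n-point FFT."""
    stages = n.bit_length() - 1          # log2(n) for a power of two
    return (n // 2) * stages, n * stages

def solution_operations(n_ffts, fft_length, n_pointwise_products):
    """Complex multiplications and additions of a solution (assumed model)."""
    mults, adds = fft_operations(fft_length)
    total_mults = n_ffts * mults + n_pointwise_products * fft_length
    total_adds = n_ffts * adds
    return total_mults, total_adds

ref_mults, ref_adds = solution_operations(3, 65536, 1)    # 3FFT solution
new_mults, new_adds = solution_operations(5, 32768, 2)    # 5FFT solution
print(ref_mults, ref_adds, new_mults, new_adds)
print(round(100 * (1 - new_mults / ref_mults), 1),
      round(100 * (1 - new_adds / ref_adds), 1))
```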
Therefore, for a receiver based on a digital signal processor, the 9FFT solution is not significantly better than the 5FFT solution, but the 15FFT solution is (20% fewer multiplications and 11% fewer additions).
To compare the implementations on an FPGA, we compare the resources in terms of logic elements, memory size, and number of multipliers, when the implementations provide approximately the same processing time. For this evaluation, we consider the Altera Stratix V FPGA family [
The resources of the different elements of the implementations have been evaluated separately. The FFT resources have been estimated after place and route of a design containing one FFT with the Quartus II software 14.0, and the adders and multipliers have been estimated using models given in [
For the 3FFT solution, we consider the implementation given by Figure
For the 5FFT solution, the time-multiplexing implementation is given in Figure
FPGA implementation of the solution proposed in [
For the 9FFT solution, the time-multiplexing implementation is given in Figure
FPGA implementation of the proposed solution.
The same approach is considered for the 15FFT solution. In this case, the FFT length is divided by eight compared to the 3FFT solution, and the FFTs are used five times to get a complete correlation result; therefore, the processing time is 62.5% of that of Figures
The resources used by each implementation are given in Table
Comparison of the resources for the different algorithms using the Altera FFT for the L5, E5a, E5b, and E1 signals, considering time multiplexing.
Implementation  Logic usage (ALM)  Memory usage (M20K)  Multipliers usage (DSP blocks)
3FFT solution (Figure 
Total  8760  1824  38
5FFT solution (Figure 
Total  8274  792  38
Difference with 3FFT solution  −486  −1032  0
Proposed 9FFT solution (Figure 
Total  8805  484  46
Difference with 3FFT solution  +45  −1340  +8
Difference with 5FFT solution  +531  −308  +8
Proposed 15FFT solution
Total  8645  286  52
Difference with 3FFT solution  −115  −1538  +14
Difference with 5FFT solution  +371  −506  +14
Details of the resources for the different algorithms using the Altera FFT for the L5, E5a, E5b, and E1 signals, considering time multiplexing.
Implementation / Function  Logic usage (ALM)  Memory usage (M20K)  Multipliers usage (DSP blocks)
3FFT solution
3 FFTs (65 536 points)  3 × 2920  3 × 608  3 × 12
1 multiplier  0  0  2
Total  8760  1824  38
5FFT solution
3 FFTs (32 768 points)  3 × 2758  3 × 264  3 × 12
1 multiplier  0  0  2
Total  8274  792  38
Proposed 9FFT solution
3 FFTs (16 384 points)  3 × 2883  3 × 140  3 × 12
4 multipliers  0  0  4 × 2
Combinations for   28  0  0
Combinations for   64  0  1
Combinations for   64  0  1
2 memories (32 768 points)  0  2 × 32  0
Total  8805  484  46
Proposed 15FFT solution
3 FFTs (8192 points)  3 × 2695  3 × 74  3 × 12
4 multipliers  0  0  4 × 2
Combinations for   112  0  0
Combinations for   224  0  4
Combinations for   224  0  4
4 memories (16 384 points)  0  4 × 16  0
Total  8645  286  52
In Table
With the proposed 9FFT solution, the memory is again reduced, by 38.9% compared to the 5FFT solution; however, the logic is slightly higher (+6.4%, in part because the 16 384-point FFT requires more logic than the 32 768-point FFT), and the number of DSP blocks is also higher (+21.1%, because of the multiplication with the complex exponentials and the product between the FFT outputs). Nevertheless, the memory saving is more valuable than the increase of the DSP blocks (except in cases where the overall system implemented in the FPGA strongly limits the number of DSP blocks available). Indeed, the minimum number of DSP blocks in a Stratix V FPGA is 256 [
With the proposed 15FFT solution, the memory is further reduced, by 63.9% compared to the 5FFT solution, but the number of DSP blocks is again increased. However, the number of additional DSP blocks represents at most 5.5% of the total available in a Stratix V FPGA, whereas the number of M20K blocks saved represents between 20% and 53% of the total available.
It can be noted that, compared to the 3FFT algorithm, the memory used by the 9FFT and 15FFT algorithms is, respectively, 3.8 and 6.4 times smaller, which represents a huge amount of memory saved, and allows the implementation of the algorithm in the smallest of the Stratix V FPGAs.
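The relative figures discussed in this comparison can be recomputed from the totals of the resource table. The following sketch only performs this arithmetic; the dictionary layout and function name are illustrative.

```python
# Totals taken from the resource comparison table (ALM, M20K, DSP blocks)
totals = {
    "3FFT": {"alm": 8760, "m20k": 1824, "dsp": 38},
    "5FFT": {"alm": 8274, "m20k": 792, "dsp": 38},
    "9FFT": {"alm": 8805, "m20k": 484, "dsp": 46},
    "15FFT": {"alm": 8645, "m20k": 286, "dsp": 52},
}

def relative_change_percent(new, reference):
    """Signed change of `new` with respect to `reference`, in percent."""
    return 100.0 * (new - reference) / reference

for name in ("9FFT", "15FFT"):
    vs_5fft = {key: round(relative_change_percent(totals[name][key],
                                                  totals["5FFT"][key]), 1)
               for key in ("alm", "m20k", "dsp")}
    memory_ratio = round(totals["3FFT"]["m20k"] / totals[name]["m20k"], 1)
    print(name, vs_5fft, "memory ratio vs 3FFT:", memory_ratio)
```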
This paper discussed the problem of the complexity of a circular correlation computed by FFTs when two code periods are used. This problem appears in the parallel code search acquisition of GNSS signals, when we want to manage the sign transitions due to data or secondary code or when we need to zero-pad the signals to reach a specific FFT length (e.g., a power of two).
With the structure of the new GPS L5 and Galileo E5a, E5b, and E1 BOC(1,1) signals, the straightforward solution to this problem uses three 65 536-point FFTs (3FFT solution) for sampling frequencies between 20.46 and 32.768 MHz for the L5, E5a, and E5b signals and between 6.138 and 8.192 MHz for the E1 signal processed as BOC(1,1). However, for sampling frequencies in the lower part of these ranges, this solution is not computationally efficient because it implies a lot of zero-padding.
Previously, in [
In this paper, we proposed two algorithms that still compute the output samples exactly, one using nine 16 384-point FFTs (9FFT solution) and one using fifteen 8192-point FFTs (15FFT solution). These algorithms exploit the fact that an FFT can be computed using
Compared to the 5FFT solution, the 9FFT solution has a slightly lower number of operations, and for an FPGA implementation the processing time is reduced by 25% and the memory is reduced by about 40% (which means a reduction of about 73% compared to the straightforward algorithm). In return, there is a small increase of the logic and DSP blocks (+6% and +21%, resp.), but, relative to the resources available in an FPGA, the memory saved represents more than the DSP blocks lost. Moreover, this algorithm can be used for sampling frequencies between 20.46 and 24.576 MHz for the L5, E5a, and E5b signals and between 6.138 and 6.144 MHz for the E1 signal, that is, larger ranges than with the 5FFT solution. Therefore, this algorithm is overall more efficient and more versatile than the 5FFT solution.
Regarding the 15FFT algorithm, compared to the 5FFT solution, the number of operations is reduced (20% fewer multiplications and 11% fewer additions), which is interesting for software-defined, DSP-based receivers. For an FPGA implementation, the processing time is reduced by 37.5% and the memory is reduced by about 64%, for an increase of the logic and DSP blocks, but again this increase is small compared to the resources available in an FPGA. However, the sampling frequency range is more limited since it should be between 20.46 and 20.48 MHz for the L5, E5a, and E5b signals and at most 5.12 MHz for the E1 signal, implying additional losses due to the higher code step in the acquisition.
In conclusion, two algorithms were proposed that significantly reduce the processing time and the memory resources compared to previously proposed algorithms, one providing better performance than the other but for a limited range of sampling frequencies. Therefore, the context will indicate which one is the most interesting. Note also that the mentioned sampling frequencies are those at the acquisition stage; the actual sampling frequency used by the RF front end can be higher if a sample rate conversion is performed before the acquisition.
When computing an FFT of
The direct computation of these combinations implies 4 complex multipliers and 6 complex adders, that is, 16 real multipliers and 20 real adders (assuming that a complex multiplication requires 4 real multiplications and 2 additions).
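The conversion from complex to real operators stated above can be written out as follows, using the same convention (one complex multiplication costs 4 real multiplications and 2 real additions, and one complex addition costs 2 real additions). The function name is hypothetical.

```python
def real_operator_count(complex_multipliers, complex_adders):
    """Real multipliers and adders needed for the given complex operators."""
    real_multipliers = 4 * complex_multipliers
    real_adders = 2 * complex_multipliers + 2 * complex_adders
    return real_multipliers, real_adders

print(real_operator_count(4, 6))
```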
However, it is possible to exploit the fact that
There is a third option to reduce the complexity, by exploiting the fact that
First, implementing (
For the local code replica
It is also possible to exploit more the characteristics of the signals, although this would not allow a significant reduction of the resources. Indeed, if the second half of
When computing an FFT of
Regarding the FPGA implementation, now neither the real nor the imaginary part of the complex exponentials is a rational number. Therefore, 16 real multipliers and 52 real adders are needed for the combinations of sections
For
The details of the FPGA resources for the different solutions are given in Table
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors thank Marc-Antoine Fortin for his suggestions that improved this paper.