Efficient Parallel Carrier Recovery for Ultrahigh Speed Coherent QAM Receivers with Application to Optical Channels



Introduction
The recent emergence of the updated standards IEEE 802.3 for 40 and 100 gigabit per second (Gb/s) Ethernet and G.709 for 40 and 100 Gb/s optical transport network (OTN), as well as the first commercially available devices implementing these data rates, reflects the rapid growth in bandwidth demand over the last decade [1,2].
The projected increase in bandwidth demand (e.g., ≥100 Gb/s) has set the basis for the next generation of Ethernet and OTN and has therefore renewed interest in coherent detection and spectrally efficient modulation techniques such as M-ary phase-shift keying (M-PSK) and M-ary quadrature amplitude modulation (M-QAM). More precisely, the combination of intradyne coherent detection, polarization-division multiplexing (PDM), 16-QAM, and electronic dispersion compensation (EDC) [3,4] achieves a good tradeoff among complexity, spectral efficiency, and minimization of nonlinear distortions, and it makes it possible to compensate with zero penalty the main fiber channel impairments [3] (i.e., polarization mode dispersion (PMD) and chromatic dispersion (CD) [5]). In particular, intradyne detection is preferred over the alternative heterodyne or homodyne architectures because it replaces complex optical phase-locked loops (PLLs) with more robust and easier to implement digital carrier recovery (CR) techniques. Together, these aspects result in an improved receiver sensitivity in comparison to intensity modulation direct detection (IM/DD) schemes [6,7].
In this context, carrier phase recovery (CPR) fulfils a fundamental role in coherent optical receivers [3,8]. Feedforward phase estimation schemes such as the Viterbi-Viterbi (VV) [9] or blind phase search (BPS) [10] algorithms have been proposed for optical coherent receivers because of their good laser linewidth tolerance and feasibility for parallel implementation. More specifically, significant amounts of CD lead to an enhancement of the phase noise introduced by the local oscillator and a lower tolerance with respect to carrier frequency offsets. In these feedforward CPR schemes, a perfect compensation of carrier frequency offset is assumed. However, this condition may not always be satisfied in practice. In fact, it has been shown that the phase error variance increases with the frequency offset, degrading the performance of the feedforward phase estimation stage [11]. Feedforward techniques to estimate and compensate frequency offset have been investigated in previous works [12][13][14][15]. Moreover, parallel architectures of these techniques are feasible for implementation in high-speed receivers. In particular, [15] has been conceived as a data-aided (DA) algorithm that uses training sequences to extend the capture range to nearly $1/(2T)$, with $T$ being the symbol duration, whereas [12][13][14] are non-data-aided (NDA) algorithms with a capture range close to $1/(8T)$ for the 16-QAM scheme.
Although accurate frequency offset estimation and compensation can be carried out by well-known techniques, a static frequency offset has been assumed in all these proposals. As has been recently demonstrated, transmitter or local oscillator laser frequency instability caused by mechanical vibrations significantly degrades the performance of feedforward CPR algorithms [16]. Other effects such as power supply noise may also introduce laser frequency fluctuations, which can be modeled as a frequency modulation with a sinusoid of large amplitude (e.g., ∼250 MHz) and low frequency (e.g., ≤35 kHz) [16]. The effectiveness of frequency offset estimation techniques, such as those mentioned earlier, is limited by the large amplitude of the modulation signal (i.e., by the large laser frequency change rate). Recent publications have proposed architectures for compensation of laser frequency fluctuations when quadrature phase-shift keying (QPSK) modulation is used [2,17,18]. For example, a two-stage carrier recovery parallel architecture based on a low-latency parallel DPLL and the feedforward VV CPR algorithm has been proposed in [17]. This technique offers an excellent tradeoff between complexity and performance for coherent QPSK receivers in the presence of laser phase noise, sinusoidal frequency jitter, and frequency offset. In this work, we generalize the technique introduced in [17] for application to M-QAM optical receivers.
As mentioned before, feedforward CPR blocks based on the VV or BPS algorithms achieve good laser linewidth tolerance and overcome some of the latency-related limitations [8]. We show here that traditional decision directed DPLLs [19] offer advantages in some aspects of the operation of CPR, for example, the tracking of the large amplitude sinusoidal carrier frequency jitter experienced by typical lasers. A traditional PLL is often modeled as a linear filter, an assumption that is useful to compute the small signal transfer function [19]. However, the PLL is actually a nonlinear filter, which precludes the use of the unfolding techniques discussed by Parhi in [20], since these are applicable only to strictly linear filters. Therefore, a different approach to reduce the latency of the PLL parallel implementation must be found.
In the present work we introduce a new parallel carrier recovery algorithm which combines a novel low-latency parallel DPLL with a traditional feedforward CPR algorithm. The new low-latency parallel DPLL is used to compensate not only frequency offset but also frequency fluctuations. The proposed DPLL approach takes as much processing as possible out of the feedback loop in order to simplify the loop and reduce its latency. Then, the bottleneck of the critical PLL feedback path is broken by using a novel approximation to the DPLL computation, which provides a capture range and bandwidth close to those achieved by serial DPLLs [17,21].
Computer simulations demonstrate that the degradations caused by frequency offset and laser frequency fluctuations can be eliminated with the proposed parallel carrier recovery technique. Unlike the superscalar parallelization (SP) methods [22][23][24][25], the technique proposed here does not require training symbols to avoid the acquisition problem. Moreover, the buffers required by the SP scheme are completely avoided in our approach.
The remainder of the paper is organized as follows. Section 2 presents the system model and analyzes the effects of the carrier frequency fluctuations on the receiver performance. Section 3 describes the two-stage carrier recovery technique. Section 4 introduces the new low-latency parallel DPLL, while numerical results are shown and discussed in Section 5. Finally, conclusions are drawn in Section 6.

System Model
Figure 1 shows a simplified block diagram of the coherent receiver with electronic dispersion compensation. The sample at the equalizer output, $r_k$, is given by (1) in terms of the $k$th transmitted symbol, $a_k$, and the total phase noise, $\phi_k$. The component $z_k$ in (1) represents the amplified spontaneous emission (ASE) noise sample, which is modeled as a white complex Gaussian random variable with power $\sigma^2$ [3]. The equalized output signal (1) can be rewritten as
$$r_k = |r_k|\, e^{j\psi_k}, \tag{2}$$
where $|r_k|$ and $\psi_k$ are the magnitude and the phase of the complex sample $r_k$, respectively. In M-PSK and M-QAM systems, the symbol information is contained totally or partially in the phase of $r_k$, respectively. The received phase $\psi_k$ can be expressed as
$$\psi_k = \theta_k + \Omega_c k + \Delta\Omega_k + \phi_k, \tag{3}$$
where $\theta_k$ is the phase of the transmitted symbol $a_k$ and $\Omega_c$ is the angular carrier frequency offset given by $\Omega_c = 2\pi f_c T$, with $f_c$ and $T$ being the carrier frequency offset and the symbol duration, respectively. The term $\Delta\Omega_k$ represents the phase change generated by frequency fluctuations. In this work we assume that the carrier is modulated by a sinusoidal interfering signal; therefore
$$\Delta\Omega_k = \frac{A_p}{\Delta f_c}\,\sin\!\left(2\pi\,\Delta f_c\, k T\right), \tag{4}$$
where $A_p$ and $\Delta f_c$ are the amplitude and frequency of the modulation tone.
The component $\phi_k$ is the total phase noise given by
$$\phi_k = \phi^{(\mathrm{laser})}_k + \phi^{(\mathrm{ASE})}_k, \tag{5}$$
where $\phi^{(\mathrm{laser})}_k$ and $\phi^{(\mathrm{ASE})}_k$ are the laser phase noise and the ASE generated phase noise, respectively. Laser phase noise is modeled as a Wiener process as follows:
$$\phi^{(\mathrm{laser})}_k = \sum_{i=-\infty}^{k} \nu_i, \tag{6}$$
where the $\nu_i$'s are independent, identically distributed Gaussian random variables with zero mean and variance $\sigma_\nu^2 = 2\pi T \Delta\nu$, with $\Delta\nu$ being the laser linewidth [8].
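As a rough illustration of the phase terms described in this section, the following Python sketch generates a frequency-offset ramp, a sinusoidal frequency-jitter phase, and a Wiener laser phase noise sequence. It assumes that the sinusoidal frequency modulation integrates to a phase of the form (amplitude / tone frequency) · sin(2π · tone frequency · kT); the function names, parameter names, and default values (taken from the example figures quoted in the text) are hypothetical and not part of the original work.

```python
import math
import random


def wiener_step_std(linewidth_hz, symbol_rate_hz):
    """Standard deviation of one laser phase-noise increment: sqrt(2*pi*linewidth*T)."""
    return math.sqrt(2.0 * math.pi * linewidth_hz / symbol_rate_hz)


def phase_terms(n, symbol_rate_hz=32e9, linewidth_hz=250e3,
                tone_amp_hz=250e6, tone_freq_hz=35e3,
                freq_offset_hz=0.0, seed=0):
    """Return a list of (offset_phase, jitter_phase, laser_phase) tuples in radians."""
    rng = random.Random(seed)
    T = 1.0 / symbol_rate_hz
    sigma = wiener_step_std(linewidth_hz, symbol_rate_hz)
    laser = 0.0
    terms = []
    for k in range(n):
        # Wiener process: running sum of i.i.d. zero-mean Gaussian increments.
        laser += rng.gauss(0.0, sigma)
        # Static frequency offset appears as a linear phase ramp.
        offset = 2.0 * math.pi * freq_offset_hz * T * k
        # Assumed form: integral of a sinusoidal frequency deviation of peak tone_amp_hz.
        jitter = (tone_amp_hz / tone_freq_hz) * math.sin(
            2.0 * math.pi * tone_freq_hz * k * T)
        terms.append((offset, jitter, laser))
    return terms
```

Under these assumptions, the jitter term advances by roughly 2π · 250 MHz / 32 GHz ≈ 0.049 rad per symbol near the zero crossings of the tone, which is much larger than the laser phase noise step of about 0.007 rad.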
2.1. Feedforward CPR.
Typical carrier recovery techniques for coherent optical receivers combine a frequency offset compensation stage followed by a feedforward phase estimation block based on the well-known VV or BPS algorithms (see Figure 2) [13]. Once the frequency offset is removed, the VV or BPS block estimates and compensates the phase noise.
Figure 3 shows a simplified block diagram of the VV algorithm implementation. The VV block estimates the phase noise based on the $M$th power of the received signal as follows:
$$\hat{\phi}_k = U\!\left(\vartheta_k\right), \tag{7}$$
where $U$ is the unwrap function and $\vartheta_k$ is the output of the VV estimator given by
$$\vartheta_k = \frac{1}{M}\,\arg\!\left(\sum_{i=-(N-1)/2}^{(N-1)/2} r_{k+i}^{M}\right), \tag{8}$$
with $N$ being an odd integer which represents the VV estimator length (see [8] for more details).
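The Mth-power estimation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of Figure 3: it assumes unit-amplitude M-PSK symbols located at `offset + 2πm/M`, truncates the averaging window at the edges of the block, and uses hypothetical function names.

```python
import cmath
import math


def vv_estimate(samples, M=4, N=5, offset=math.pi / 4):
    """Sliding-window Mth-power phase estimates, wrapped to an interval of width 2*pi/M."""
    half = N // 2
    # The Mth power leaves a known term M*offset; remove it before averaging.
    derot = cmath.exp(-1j * M * offset)
    powered = [s ** M * derot for s in samples]
    estimates = []
    for k in range(len(samples)):
        window = powered[max(0, k - half):k + half + 1]  # truncated at block edges
        estimates.append(cmath.phase(sum(window)) / M)
    return estimates


def unwrap(phases, period):
    """Remove jumps that are multiples of `period` between consecutive estimates."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= period * round(d / period)
        out.append(out[-1] + d)
    return out
```

For QPSK, `unwrap(vv_estimate(rx), math.pi / 2)` returns a phase track without the π/2 jumps introduced by the fourth-power operation.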
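An alternative to the VV estimator is the so-called BPS algorithm shown in Figure 4. The BPS block estimates the phase noise as follows:
$$\hat{\phi}_k = \arg\min_{\varphi_b} \sum_{n=-(N-1)/2}^{(N-1)/2} d\!\left(r_{k+n}, \varphi_b\right), \tag{9}$$
where $\varphi_b$ is the test phase defined as
$$\varphi_b = \frac{b}{B}\,\frac{\pi}{2}, \qquad b = 0, 1, \ldots, B-1, \tag{10}$$
where $B$ is the number of phases to be tested; the term $d(r_k, \varphi_b)$ is given by
$$d\!\left(r_k, \varphi_b\right) = \left| r_k e^{-j\varphi_b} - Q\!\left(r_k e^{-j\varphi_b}\right) \right|^2, \tag{11}$$
where $Q(\cdot)$ is the slicer function and $N$ is, again, the estimator length (see [10] for more details).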
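The blind phase search can be sketched as follows for 16-QAM. This is only an illustrative sketch: it assumes a square constellation with levels ±1 and ±3, test phases uniformly spaced over [0, π/2), a squared-distance metric, and an averaging window truncated at the block edges; the function names are hypothetical.

```python
import cmath
import math

LEVELS = (-3.0, -1.0, 1.0, 3.0)  # assumed 16-QAM amplitude levels per axis


def slicer(x):
    """Nearest 16-QAM constellation point to the complex sample x."""
    re = min(LEVELS, key=lambda level: abs(x.real - level))
    im = min(LEVELS, key=lambda level: abs(x.imag - level))
    return complex(re, im)


def bps_estimate(samples, B=32, N=9):
    """Return, for every sample, the test phase minimizing the windowed distance."""
    test_phases = [b * (math.pi / 2) / B for b in range(B)]
    # dist[k][b]: squared distance to the closest symbol after derotating by test phase b.
    dist = []
    for r in samples:
        row = []
        for phi in test_phases:
            y = r * cmath.exp(-1j * phi)
            row.append(abs(y - slicer(y)) ** 2)
        dist.append(row)
    half = N // 2
    estimates = []
    for k in range(len(samples)):
        lo, hi = max(0, k - half), min(len(samples), k + half + 1)
        sums = [sum(dist[i][b] for i in range(lo, hi)) for b in range(B)]
        best = min(range(B), key=sums.__getitem__)
        estimates.append(test_phases[best])
    return estimates
```

The sketch makes the cost visible: every sample is rotated and sliced B times, which is why the computational complexity is higher than that of the Mth-power estimator.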
Both VV and BPS techniques efficiently compensate the effects of the laser phase noise. In particular, the VV architecture is preferred for M-PSK modulation schemes because of the uniform angular spacing and constant modulus of their symbols. Although there exist alternatives that enable the VV to operate with M-QAM schemes [26], the BPS algorithm is preferred for M-QAM because it performs better in the presence of laser phase noise, in spite of its greater computational complexity.

Effects of Frequency Fluctuations.
Mechanical vibrations cause small deformations of electronic components, such as the laser cavity, leading to frequency fluctuations (see [16] and references therein). As noted in the Introduction, these fluctuations can be described as a frequency modulation with a sinusoidal signal of large amplitude (e.g., $A_p \sim 250$ MHz) and low frequency (e.g., $\Delta f_c \le 35$ kHz). Without loss of generality, we consider in this work differential QPSK and 16-QAM differentially encoded in quadrant. Figures 5 and 6 show the optical signal-to-noise ratio (OSNR) penalty at a bit-error-rate (BER) of $10^{-3}$ versus the tone amplitude $A_p$ for $\Delta f_c = 35$ kHz. We use the feedforward VV and BPS CPR schemes depicted in Figures 3 and 4, respectively, with $1/T = 32$ giga-samples per second (Gs/s), laser linewidth $\Delta\nu = 250$ kHz, and several values of the estimator length, $N$. Perfect estimation of the frequency offset is assumed. At the selected symbol rate, and within the jitter tone amplitude range of concern, QPSK does not show a significant penalty when the averaging block length is properly chosen. On the other hand, the performance in the 16-QAM case deteriorates significantly as the amplitude of the frequency modulation tone increases, which agrees with the results reported in [16]. Notice also that the value of the estimator length that minimizes the penalty depends on the tone amplitude. This fact suggests the need for an automatic adjustment algorithm for $N$.
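A back-of-the-envelope calculation suggests why the estimator length matters here. Treating the tone amplitude as the peak frequency deviation (an assumption of this sketch) and writing $A_p$ for that amplitude, $T$ for the symbol duration, and $N$ for the estimator length, the phase can drift across one averaging window by up to approximately

$$\Delta\psi_{\max} \approx 2\pi A_p T N, \qquad 2\pi A_p T = 2\pi\,\frac{250\ \text{MHz}}{32\ \text{GHz}} \approx 0.049\ \text{rad per symbol},$$

so that a window of $N = 21$ symbols can span about $21 \times 0.049 \approx 1.03$ rad. Feedforward estimators assume that the carrier phase is roughly constant within the window, so long windows (good for averaging noise) conflict with large tone amplitudes, which is consistent with an optimum $N$ that depends on the tone amplitude.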

Carrier Recovery with Compensation of Frequency Fluctuations
Based on the results shown in Section 2.2, we conclude that the tracking of frequency fluctuations becomes an essential task in ultrahigh speed intradyne coherent optical receivers. Towards this end, a two-stage carrier recovery algorithm is proposed in this work (see Figure 7). The first CPR stage is based on a low-latency parallel DPLL, which is used to compensate not only frequency offset but also carrier frequency fluctuations. The second CPR stage is based on the well-known VV [9] or BPS [10] algorithm, which operates on the signal demodulated by the DPLL. The second CPR stage is mainly used to compensate the laser phase noise. Parallel architectures for both stages must be provided for multigigabit applications. Feedforward phase estimation schemes such as VV or BPS are attractive for high-speed coherent receivers owing to their good laser linewidth tolerance and feasibility for parallel implementation. Nevertheless, the low-latency parallel DPLL proposed in [17] has been designed for the QPSK format. In the following section, we generalize the scheme introduced in [17] for application to M-QAM.

3.1. Phase Domain Digital PLL.
We consider a phase domain DPLL in order to reduce computational complexity. The domain change replaces complex multipliers with real adders, which makes it possible to increase the processing rate of the system, a fundamental aspect in multigigabit communications where high processing rates are required.
In a decision directed carrier recovery loop (see Figure 8), the symbol information is first removed [19]. In QPSK receivers, this operation can be easily carried out in the phase domain as follows:
$$\tilde{\phi}_k = \left(\psi_k\right)_{\pi/2}, \tag{12}$$
where $\psi_k$ is the received phase and $(\cdot)_x$ denotes modulus $x$. In the absence of phase noise and frequency deviations, notice that $\tilde{\phi}_k = (\theta_k)_{\pi/2} = \pi/4$ for all $k$, where $\theta_k$ is the phase of the transmitted symbol. A similar approach can be adopted for M-QAM. For example, for 16-QAM the symbol phase $\theta_k$ reduced to the first quadrant results in $(\theta_k)_{\pi/2} \in \{\arctan(1/3), \pi/4, \arctan(3)\}$. Figure 9 depicts the entire QPSK and 16-QAM constellations in the complex plane, where the two axis labels stand for the real and imaginary axes, respectively. Moreover, the shaded areas in Figure 9 highlight the quadrant reduction given by (12).

The phase at the numerically controlled oscillator (NCO) output of a type II second-order DPLL (see Figure 8), $\eta_k$, is given by (13), where all addition operations in the following analysis are modulus $2\pi$, and the constants $K^{(p)}$ and $K^{(i)}$ are the loop proportional and integral gains, respectively. The phase error $e_k$ is given by (14) in terms of $\rho_k$, the symbol phase of the transmit symbol reduced to the first quadrant; that is, $\rho_k = (\theta_k)_{\pi/2}$. Finally, the term $\varepsilon_{k-1}$ in (13) is the accumulated phase error given by (15).

Since the symbol phase is not known a priori at the receiver, we use a tentative decision of the transmit symbol to estimate the phase $\rho_k$ as follows:
$$\hat{\rho}_k = D\!\left(|r_k|, \tilde{\theta}_k\right), \tag{16}$$
where $\tilde{\theta}_k$ is the phase of the demodulated received sample, reduced to the first quadrant; that is,
$$\tilde{\theta}_k = \left(\psi_k - \eta_k\right)_{\pi/2} = \left(\tilde{\phi}_k - \eta_k\right)_{\pi/2}. \tag{17}$$
Note that $(x + y)_z = \left((x)_z + (y)_z\right)_z$; therefore, since $\tilde{\phi}_k = (\psi_k)_{\pi/2}$, we can get (17). For example, for QPSK
$$D\!\left(|r_k|, \tilde{\theta}_k\right) = \frac{\pi}{4}, \tag{18}$$
while for 16-QAM the decision rule is given by (19). Figure 10 shows the 16-QAM constellation reduced to the first quadrant of the complex plane and the decision boundaries according to (19).
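To make the phase-domain loop more concrete, the following Python sketch implements a serial decision-directed type II loop operating on phases reduced to the first quadrant. It is a sketch under stated assumptions and not the authors' implementation: the update form (NCO phase plus a proportional term and an integral term driven by the accumulated error), the gain values, and the amplitude thresholds used to separate the 16-QAM rings (midpoints between the radii √2, √10, and √18 of a ±1, ±3 constellation) are all assumptions, and the function names are hypothetical.

```python
import math

HALF_PI = math.pi / 2

# Assumed ring thresholds for a 16-QAM constellation with levels +/-1 and +/-3.
R1 = (math.sqrt(2.0) + math.sqrt(10.0)) / 2.0
R2 = (math.sqrt(10.0) + math.sqrt(18.0)) / 2.0


def decide_16qam(mag, theta):
    """Tentative symbol phase in the first quadrant from magnitude and reduced phase."""
    if R1 <= mag < R2:
        # Middle ring: nondiagonal symbols, the reduced phase selects between them.
        return math.atan(1.0 / 3.0) if theta < math.pi / 4 else math.atan(3.0)
    # Inner and outer rings: diagonal symbols.
    return math.pi / 4


def serial_dpll(phases, mags, kp=2 ** -4, ki=2 ** -9):
    """Track the carrier phase; returns the phase-error sequence and the final NCO phase."""
    nco = 0.0
    acc = 0.0  # accumulated phase error (integral path)
    errors = []
    for psi, mag in zip(phases, mags):
        theta = (psi - nco) % HALF_PI       # demodulated phase reduced to first quadrant
        rho = decide_16qam(mag, theta)      # tentative decision on the symbol phase
        e = theta - rho                     # decision-directed phase error
        acc += e
        nco += kp * e + ki * acc            # proportional plus integral update
        errors.append(e)
    return errors, nco
```

With a constant frequency offset at the input, the integral path absorbs the phase ramp and the phase error decays towards zero, which is the behavior expected from a type II loop.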

Evaluation of DPLL for Tracking Frequency Fluctuations.
The effectiveness of the decision directed DPLL to track frequency fluctuations is analyzed in the following. In our carrier recovery scheme, the serial DPLL is used for compensation of frequency offset and fluctuations, while a feedforward CPR block based on the BPS algorithm is used for phase noise estimation. This carrier recovery architecture will be denoted as S-DPLL + BPS.
Figure 11 shows the OSNR penalty versus the modulation tone amplitude, $A_p$, for $\Delta f_c = 35$ kHz and $\Delta\nu = 250$ kHz. The BPS filter length is $N = 21$, while the number of test phases is $B = 32$. Note that the performance degradation caused by the carrier frequency fluctuation is eliminated with the new combined S-DPLL + BPS carrier recovery technique.
Figure 12 presents the tolerance of the BPS and S-DPLL + BPS architectures to the laser phase noise in the presence of a frequency modulation tone with $A_p = 140$ MHz and $\Delta f_c = 35$ kHz. These schemes are compared with the BPS algorithm without frequency fluctuations (i.e., $A_p = 0$), which is used as a benchmark. It is interesting to highlight the significant degradation caused by the frequency fluctuations in the solution based solely on the BPS algorithm. Again, notice that the effects of the carrier frequency fluctuations are mitigated by using the proposed S-DPLL + BPS carrier recovery algorithm.

New Low Latency Parallel DPLL for M-QAM
The maximum clock frequency of complex digital signal processors in state-of-the-art 28 nm CMOS technology is limited to less than 1 GHz. Thus, the use of parallel processing techniques is mandatory for the implementation of multigigabit per second receivers. Unfortunately, the nonlinear nature of the DPLL precludes the use of the unfolding techniques [20]. Since low latency is a key factor in tracking frequency fluctuations, we develop a new approach to reduce the latency in the parallel implementation of the DPLL.
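As a rough worked example of this constraint, take the 32 Gs/s symbol rate used elsewhere in the text and a clock limit of 1 GHz. The symbols $P$ (parallelization factor), $T$ (symbol duration), and $f_{\mathrm{clk}}$ (clock frequency) are notation assumed for this sketch:

$$P \ge \frac{1/T}{f_{\mathrm{clk}}} = \frac{32\ \text{GHz}}{1\ \text{GHz}} = 32, \qquad f_{\mathrm{clk}} = \frac{1/T}{P} = \begin{cases} 500\ \text{MHz}, & P = 64,\\ 400\ \text{MHz}, & P = 80. \end{cases}$$

Each clock cycle then spans $P$ symbol periods (2 ns for $P = 64$ and 2.5 ns for $P = 80$), so any feedback path that needs one or more clock cycles sees the carrier phase evolve over at least that many symbols before a correction can be applied.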

Parallel Type II DPLL for M-QAM.
From (13) it is possible to show that […] where […] with the symbol phase at instant $k+1$ given by (16) and […]. For the type II second-order DPLL, the steady-state error is zero (i.e., $\lim_{k\to\infty} e_k = 0$) [19]. Thus, assuming that the bandwidth of the loop is low-to-moderate such that $K^{(p)} \ll 1$, the contribution of the term $K^{(p)} e_k$ can be neglected; therefore the phase error (21) results in […] where $\hat{\rho}_{k+1}$ is given by $\hat{\rho}_{k+1} = D(|r_{k+1}|, \tilde{\theta}_{k+1})$ with […]. Furthermore, since the accumulated phase error varies slowly with time (i.e., $\varepsilon_k \approx \varepsilon_{k-1}$), from (20) and (23) we can obtain […] where $\hat{\rho}_{k+i}$ is given by $\hat{\rho}_{k+i} = D(|r_{k+i}|, \tilde{\theta}_{k+i})$ with […].

Let $P$ be the parallelization factor. Following a similar analysis, it is possible to derive that […]. A type II DPLL can be considered as two separate feedback loops: the proportional and integral loops (see Figure 13). Thus, the NCO output (29) can be rewritten as […] where $\eta^{(p)}_{k+i}$ and $\eta^{(i)}_{k+i}$ are the NCO components due to the proportional and integral paths, respectively […] (31). From (12) and (17), note that […]. Thus, expression (31) can be rewritten as […]. From (32) and (34), note that (28) can be rewritten as […]. For example, from (18) and (33) the NCO output (33) for QPSK reduces to [17] […].

Unfortunately, for M-QAM, (33) is still too complex to be implemented with digital signal processors in state-of-the-art 28 nm CMOS technology, owing to the complexity required to carry out in one clock cycle the computation of the function $\hat{\rho}_{k+i} = D(|r_{k+i}|, \tilde{\theta}_{k+i})$ and then the last summation in (33). This problem can be mitigated if the terms $D(|r_{k+i}|, \tilde{\theta}_{k+i})$ are precomputed by using the NCO output of the previous clock cycle; that is, […] (37). As we shall show later, the performance degradation caused by (37) is negligible in practical situations (e.g., 16-QAM with $P \le 80$). For 16-QAM, this behavior can be understood from the facts that (i) only the nondiagonal symbols use $\tilde{\theta}_{k+i}$ (see (19)) and (ii) laser frequency fluctuations are slow compared to the baud rate.

Integral Loop.
On the other hand, from (29) and Figure 13, we can also derive the NCO component due to the integral path as follows: […]. The accumulated phase error can be expressed as […]. Based on (12), (14), (30), (34), and (38), the accumulated phase error can be evaluated as ε = ( φ − […] (40)

Parallel Architecture of the New DPLL.
A parallel implementation of the type II DPLL can be easily achieved as depicted in Figure 15. Term  =  with  being a positive integer represents the latency required to compute all the operations of the integral path (e.g., the phase error computation (PEC) defined in (40)). Since the latency in this path is not as critical as in the proportional loop, its effect on the DPLL performance will be negligible, as we will show in the next section. Similarly to the proportional gain $K^{(p)}$, the integral gain $K^{(i)}$ is assumed to be a power of 2 (i.e., $K^{(i)} = 2^{-n_i}$ with $n_i$ being a positive integer). Figure 16(a) shows a possible implementation of the block " −1 ", and Figure 16(b) depicts an example of a tentative implementation of the "  " block based on look-up tables for 16-QAM.
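Two hardware-friendly properties mentioned in the text, additions taken modulo 2π and loop gains restricted to powers of 2, map naturally onto fixed-point arithmetic. The sketch below illustrates the idea in Python; the 12-bit phase word and all function names are assumptions made for this example and are not specified in the original.

```python
import math

BITS = 12                 # assumed phase word length (not specified in the text)
FULL = 1 << BITS          # integer value that represents 2*pi
MASK = FULL - 1


def to_fixed(phase_rad):
    """Quantize a phase in radians to an unsigned BITS-bit word (2*pi maps to FULL)."""
    return round(phase_rad / (2.0 * math.pi) * FULL) & MASK


def add_mod(a, b):
    """Modulo-2*pi addition: the overflow of the phase word is simply discarded."""
    return (a + b) & MASK


def to_signed(x):
    """Interpret a phase word as a signed error in [-FULL/2, FULL/2)."""
    return x - FULL if x >= FULL // 2 else x


def scale_pow2(err, shift):
    """Multiply a signed error by 2**-shift using an arithmetic right shift."""
    return err >> shift
```

For instance, `add_mod(to_fixed(1.5 * math.pi), to_fixed(math.pi))` wraps around to the word for π/2 without any explicit modulo operation, and a gain of 2⁻⁴ costs only a shift (note that Python's `>>` rounds negative values towards minus infinity).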

Numerical Results
In this section we evaluate the effectiveness of the proposed two-stage CPR. We use 16-QAM differentially encoded in quadrant on a nondispersive noisy channel with $1/T = 32$ Gs/s. The OSNR at a given bit-error-rate (i.e., BER of $10^{-3}$) […] 17. The loop filter gains were selected in order to obtain maximum bandwidth with 0.5 dB maximum peaking (see Table 1). For the optical system considered here, these values of bandwidth and peaking provide a good tradeoff between capture range and the residual phase noise power at the input of the slicer (see Figure 1).
Because frequency offset values in intradyne receivers (i.e., ±5 GHz; see [28]) exceed the maximum theoretical limit of $1/(8T)$ [27] that can be reached by decision directed algorithms at the considered symbol rate, typical intradyne coherent optical receivers are provided with a coarse carrier frequency recovery (CCFR) stage [2] that reduces this frequency offset, ideally to zero, or at least to values within the theoretical range. However, the residual frequency offset after CCFR can surpass the tolerance of CPR algorithms such as the VV and the one considered in this work, that is, BPS. The capture range of the proposed P-DPLL is ∼ ±4 GHz, which is close to the maximum theoretical frequency offset value for the given symbol rate (i.e., $1/(8T) = 4$ GHz). Gear shifting is applied to the proportional and integral gains during the capture period.
Figure 18(a) shows the BPS CPR tolerance to the joint effect of the laser phase noise and the sinusoidal frequency tone amplitude, $A_p$. Figure 18(b) depicts the performance of the combined P-DPLL + BPS architecture with $P = 64$ under the same conditions. It is interesting to note in Figure 18(b) the significant improvement in sinusoidal frequency tolerance of the combined architecture in relation to the single stage CPR based solely on BPS. This improvement is evidenced by the increase of the contour line slope, with the contour lines becoming parallel to the $A_p$ axis (i.e., independent of $A_p$).
Figure 19 complements the current study for several values of the parallelization factor under the conditions detailed earlier. In particular, Figure 19 shows the performance of the two-stage CPR architecture (DPLL in conjunction with the BPS algorithm) using the 16-QAM scheme. From the present study it is possible to derive Figure 20, which demonstrates the efficiency of the proposed approximation for the parallelization of the DPLL. Even though the 16-QAM format appears to be sensitive to the parallelization factor, the performance remains constant over a wide range of parallelization factors, and the penalty increases only for large values of the laser linewidth (i.e., $\Delta\nu$).
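The $1/(8T)$ figure quoted above can be checked with a short calculation. The argument below is a standard one for constellations with $\pi/2$ rotational symmetry and is offered here only as a sketch: a decision directed phase detector cannot distinguish rotations that differ by a multiple of $\pi/2$, so the phase advance per symbol caused by a frequency offset $f$ must stay within $\pm\pi/4$:

$$\left| 2\pi f T \right| < \frac{\pi}{4} \;\Longrightarrow\; |f| < \frac{1}{8T} = \frac{32\ \text{GHz}}{8} = 4\ \text{GHz}.$$

A frequency offset of ±5 GHz therefore lies outside this range at 32 Gs/s, which is why a coarse frequency recovery stage is needed ahead of the decision directed loop.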

Impact of Decision Errors.
The impact of the decision errors in terms of the variance of the estimated phase is analyzed for two different PLLs with the same bandwidth against the modified Cramer-Rao bound (MCRB) [29]. The Cramer-Rao lower bound (CRLB) can be considered as a fundamental limit on the performance that a linearized system can reach in the absence of decision errors [30]. In other words, the optimum theoretical bound is achieved under the simplifying assumption that the additive noise does not affect the receiver decisions about the data symbols. Simulation results for (i) the serial DPLL (S-DPLL) and (ii) the parallel DPLL (P-DPLL) with a parallelization factor of $P = 80$ are shown in Figure 21(a).
At the OSNR regime of interest in the application considered in our work (i.e., PDM-16-QAM, $1/T = 32$ Gs/s, BER < $10^{-2}$ → OSNR > 18 dB), it can be observed that the phase noise variance in the proposed parallel DPLL is slightly higher than that experienced in a serial DPLL. Nevertheless, notice that the impact of this phase variance increase on the performance in terms of bit-error-rate (BER) is practically negligible (see Figure 21(b)). Finally, it is important to highlight that catastrophic errors caused by cycle slips are avoided in the proposed carrier recovery architecture by using differential 16-QAM [11].

Conclusion
A new DPLL-based carrier recovery architecture for high speed optical coherent receivers has been introduced in this paper. The proposed parallel scheme builds upon a novel DPLL computation, which breaks the bottleneck of the feedback path and leads to a simple parallel implementation. Furthermore, it has also been demonstrated that the new parallel DPLL can provide a bandwidth and capture range similar to those achieved by the serial DPLL.
The proposed two-stage carrier recovery architecture, based on a low-latency parallel DPLL and a feedforward BPS phase estimator, offers a low complexity, high performance, integrated solution for frequency and phase compensation in coherent optical systems. This solution outperforms previously proposed architectures when all optical channel impairments present in real applications, including laser phase noise, sinusoidal frequency jitter, and frequency offset, are accounted for in the modeling.

Figure 1: Simplified block diagram of the coherent receiver with equalization.

Figure 4: Block diagram of the blind phase search algorithm.

Figure 7: Simplified block diagram of the two-stage carrier recovery technique.

Figure 14: Implementation of the low latency parallel proportional loop for M-QAM.

Figure 15: Implementation of the low latency parallel type II DPLL.

Figure 17: Frequency response of the serial and low-latency parallel DPLL.

Figure 21: (a) Decision errors impact on the P-DPLL performance against the modified Cramer-Rao lower bound, $1/T = 32$ Gs/s, using 16-QAM scheme. (b) Bit-error-rate of the DPLLs under analysis.