Maximum Likelihood Sequence Detection Receivers for Nonlinear Optical Channels

The space-time whitened matched filter (ST-WMF) maximum likelihood sequence detection (MLSD) architecture has been recently proposed (Maggio et al., 2014). Its objective is to reduce implementation complexity in transmissions over nonlinear dispersive channels. The ST-WMF-MLSD receiver (i) drastically reduces the number of states of the Viterbi decoder (VD) and (ii) offers a smooth trade-off between performance and complexity. In this work the ST-WMF-MLSD receiver is investigated in detail. We show that the space compression of the nonlinear channel is an instrumental property of the ST-WMF-MLSD which results in a major reduction of the implementation complexity in intensity modulation and direct detection (IM/DD) fiber optic systems. Moreover, we assess the performance of ST-WMF-MLSD in IM/DD optical systems with chromatic dispersion (CD) and polarization mode dispersion (PMD). Numerical results for a 10 Gb/s, 700 km IM/DD fiber-optic link with 50 ps differential group delay (DGD) show that the number of states of the VD in ST-WMF-MLSD can be reduced ∼4 times compared to an oversampled MLSD. Finally, we analyze the impact of imperfect channel estimation on the performance of the ST-WMF-MLSD. Our results show that the performance degradation caused by channel estimation inaccuracies is low (∼0.2 dB) and similar to that of existing MLSD schemes.


Introduction
Maximum likelihood sequence detection (MLSD) receivers for nonlinear channels have been extensively investigated in the literature (e.g., [1,2] and references therein). Their ability to achieve optimal performance in the presence of additive white Gaussian noise (AWGN) has always been of great theoretical and practical interest. The theoretical interest lies in that it provides a performance bound for any reception scheme. The practical interest stems from its ability to actually achieve optimal or nearly optimal performance in transmissions over nonlinear channels.
Much of the early work in this area has been focused on the compensation of nonlinearities in satellite communications [1][2][3][4][5]. A traditional architecture for the optimal nonlinear receiver consists of a matched-filter bank (MFB) followed by a maximum likelihood sequence detector (MLSD) [1].
Owing to the correlation among the spatial noise components at the MFB outputs, an MLSD with non-Euclidean metrics must be used. The use of oversampling in combination with MLSD (OS-MLSD) has also been proposed to implement the optimal receiver in the presence of nonlinearities (see [2] and references therein). Since the complexity of both MFB and OS-MLSD-based receivers grows exponentially with the channel memory, their practical application in transmissions over highly dispersive channels has been limited. Despite this fact, MLSD-based receivers are still preferred over decision feedback equalizers (DFE) in applications such as multigigabit intensity modulation/direct detection (IM/DD) fiber optic systems for the following two reasons.
(i) Performance: DFE suffers from severe performance limitations in moderate to high dispersion single-mode fiber links as a result of its inability to compensate nonlinear ISI [6,7]. Although nonlinear DFE (NL-DFE) structures have also been considered in the literature (e.g., [8,9]), their performance still degrades significantly at long fiber lengths (e.g., ≥500 km). On the other hand, MLSD can operate with a constant penalty of around 3 dB with respect to back-to-back (B2B) at virtually any distance [10,11].
(ii) High-speed implementation: future generations of communication systems will operate at multigigabit per second data rates on highly dispersive channels [12]. In commercial applications, the digital receiver is often implemented as a monolithic chip in CMOS technology [12]. The maximum clock frequency of state-of-the-art complex digital signal processors in 28 nm CMOS technology is limited to frequencies lower than ∼1 GHz. Therefore, in order to achieve multigigabit per second data rates, parallel processing techniques are required [12]. Although the complexity of serial implementations of DFE grows linearly with channel memory, all presently known parallel processing implementations require that the bottleneck created by the feedback loop be broken using techniques such as the ones proposed in [13][14][15], whose complexity grows exponentially with the channel memory. Therefore, in high-speed applications, the complexity of both DFE and MLSD increases with the channel memory in a similar way.
From the above it is clear that complexity reduction of MLSD is crucial for many practical applications. In IM/DD optical channels, the receiver must compensate the linear fiber dispersion as well as nonlinearities caused by lasers, optical modulators, the fiber Kerr effect, photo-detectors, and other components of the link. Chromatic dispersion (CD) and polarization-mode dispersion (PMD), in combination with the quadratic response of the photo-detector, are major factors that limit the reach and drive the complexity of optical-transmission systems at data rates ≥10 Gb/s [16,17]. With traditional implementations based on oversampling techniques, an 8192-state MLSD is required to compensate 700 km of fiber at 10 Gb/s [10]. This is prohibitive in current CMOS technology (see endnote 1).
A new MLSD receiver architecture for nonlinear channels has been proposed in [18]. The major breakthrough of this proposal consists of a novel representation of the received signal obtained by a Gram-Schmidt-like orthogonalization of the kernels of a Volterra series expansion of the channel. This procedure yields a special form of space-time whitened matched filter (ST-WMF) [19] whose baud-rate-sampled outputs are sufficient statistics with independent noise components in both space and time. Combined with the minimum phase property of the response of each branch, the ST-WMF provides an effective way to reduce the complexity of MLSD in nonlinear channels. The ST-WMF-MLSD technique offers a smooth trade-off between performance and complexity. As complexity is progressively reduced, performance degrades in a graceful manner. Numerical results in [18] demonstrate that the number of states of the VD in ST-WMF-MLSD required on a 10 Gb/s, 700 km IM/DD fiber-optic link without PMD can be reduced 8 times compared to an oversampled MLSD.
Further contributions on the ST-WMF-MLSD receiver and its performance on IM/DD optical links are provided in this work. First, the space channel compression achieved by the ST-WMF-MLSD is analyzed in detail. Space channel compression is important because it is the property that enables the major complexity reductions of the proposed architecture. Second, the performance evaluation of the ST-WMF-MLSD receiver is extended by addressing the combined effect of CD and PMD. Numerical results confirm that the ST-WMF-MLSD remains an attractive solution in the presence of these combined impairments. Finally, the impact of channel estimation inaccuracies on the ST-WMF-MLSD performance is assessed. Accuracy and speed of channel estimation are particularly important when tracking nonstationary channels. Nonstationarity in optical channels results from PMD and random rotations of the laser state of polarization. Our results show that the performance degradation caused by imperfect channel estimation is low (∼0.2 dB) and similar to that of existing MLSD schemes.
This paper is organized as follows. The nonlinear channel model and the ST-WMF-MLSD architecture are described in Sections 2 and 3, respectively. The performance evaluation of the ST-WMF-MLSD under different channel conditions, as well as its robustness in the presence of imperfect channel knowledge, is presented and discussed in Sections 4 and 5. Finally, conclusions are drawn in Section 6.

Nonlinear Channel Model
The noisy received signal is given by
$$r(t) = s(t) + z(t), \qquad (1)$$
where $s(t)$ is the noise-free signal and $z(t)$ is the noise component, which is assumed to be a white Gaussian process with power spectral density $N_0$. The component $s(t)$ can be expressed in terms of its Volterra-series expansion [20,21]. For example, neglecting the DC term of the expansion, we get
$$s(t) = \sum_{n} a_n f_0(t - nT) + \sum_{m=1}^{N-1} \sum_{n} a_n a_{n-m} f_m(t - nT), \qquad (2)$$
where $f_0(t)$ is the linear kernel, $f_m(t)$ with $m > 0$ is the $m$th second-order kernel, $a_n$ is the $n$th symbol at the input of the nonlinear channel, $1/T$ is the symbol rate, and $N$ is the total number of kernels (see endnote 2).
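To make the channel model concrete, the following minimal sketch synthesizes the noise-free signal of a second-order Volterra expansion of the kind described above. The function name `volterra_signal`, the argument `sps` (samples per symbol), and the zero initial condition for symbols before the start of the block are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def volterra_signal(symbols, kernels, sps):
    """Noise-free output of a second-order Volterra model (illustrative sketch).

    symbols : 1-D array with the transmitted symbols a_n
    kernels : list of 1-D arrays; kernels[0] is the linear kernel f_0 and
              kernels[m] (m > 0) is the second-order kernel f_m, all sampled
              at `sps` samples per symbol period T
    sps     : samples per symbol (hypothetical oversampling factor)
    Symbols before the start of the block are assumed to be zero.
    """
    a = np.asarray(symbols, dtype=float)
    length = len(a) * sps + max(len(f) for f in kernels) - 1
    s = np.zeros(length)
    for m, f in enumerate(kernels):
        # Path input: a_n for the linear path, a_n * a_{n-m} for path m > 0.
        if m == 0:
            x = a.copy()
        else:
            x = np.zeros_like(a)
            x[m:] = a[m:] * a[:-m]
        # Place the path symbols on the oversampled grid, then filter them.
        up = np.zeros(len(a) * sps)
        up[::sps] = x
        y = np.convolve(up, np.asarray(f, dtype=float))
        s[:len(y)] += y
    return s
```

With all second-order kernels set to zero, the output reduces to an ordinary linear pulse-amplitude-modulated signal, which is a convenient sanity check.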

Space Channel Compression.
The model orthogonalization procedure described in Section 2.1 can be performed in $N!$ different ways, with $N$ being the number of kernels. For example, consider a Volterra model with $N = 3$. One way to obtain the orthogonalized model is to select the kernels as shown in Figure 2. In this case, the pulse $f_0(t)$ is selected as the pivoting response $h_0(t)$ for the first orthogonalization step. Then, $f^{(0)}_1(t)$ and $f^{(0)}_2(t)$ are obtained as the components of $f_1(t)$ and $f_2(t)$ orthogonal to $h_0(t) = f_0(t)$. For the second step, $f^{(0)}_1(t)$ is selected as the pivoting response, so $f^{(1)}_2(t)$ results as the component of $f^{(0)}_2(t)$ orthogonal to $h_1(t) = f^{(0)}_1(t)$. This ordering can be identified by the set of indexes $\mathcal{I} = \{0, 1, 2\}$.
A different result is achieved by selecting the pivoting responses as illustrated in Figure 3. In this situation, $f_1(t)$ is selected as the pivoting response for the first step, and $f^{(0)}_1(t)$ and $f^{(0)}_2(t)$ are the components of $f_0(t)$ and $f_2(t)$ orthogonal to $h_0(t) = f_1(t)$. According to Figure 3, $f^{(0)}_2(t)$ (rather than the response selected in Figure 2) is chosen as the pivoting response for the second orthogonalization step. Then, $f^{(1)}_2(t)$ is obtained as the component of $f^{(0)}_1(t)$ orthogonal to $h_1(t) = f^{(0)}_2(t)$. The index set corresponding to the procedure depicted in Figure 3 is $\mathcal{I} = \{1, 2, 0\}$.
Note that the sets of responses $\{h_m(t)\}_{m=0}^{2}$ and the associated input sequences resulting from the procedures described in Figures 2 and 3 are two different expansions of the same signal $s(t)$. Any one of the $N!$ possible sets can be selected in the orthogonalization process. Space compression of the channel can be achieved by concentrating the maximum of the channel energy on a reduced set of paths, independent of the distribution of their individual energy, as explained in the following discussion.
Let $M$ be a given number of paths used to model the channel (note that if $M = N$, any orthogonalization process $\mathcal{I}$ represents the behavior of the channel exactly). To reduce the complexity of the receiver, it is desirable for $M$ to be as small as possible. On the other hand, to minimize the performance degradation (or the inaccuracy of the channel representation), the orthogonalization process should be carried out in such a way that most of the signal energy is concentrated on these $M$ paths. From the above, for a given value of $M$, the optimum set $\mathcal{I}_{op}$ for space compression (i.e., minimal channel distortion modeling) should meet the following condition:
$$\mathcal{I}_{op} = \arg\max_{\mathcal{I}} \sum_{m=0}^{M-1} \mathcal{E}^{(m)}_{\mathcal{I}}, \qquad (17)$$
where $\mathcal{E}^{(m)}_{\mathcal{I}}$, given by (18), is the energy in the $m$th path for the set $\mathcal{I}$, and $E\{\cdot\}$ denotes expectation. If $M = N$, notice that any orthogonalization process $\mathcal{I}$ satisfies condition (17). Otherwise, if $M < N$, the criterion guarantees that the signal energy contained in the $M$ paths taken together is maximum, independent of the individual energy distribution among them. The optimum channel expansion that satisfies (17) can be found by an exhaustive search. Instead, we propose to carry out the orthogonalization process so as to meet criterion (19). This criterion can be met by selecting at each orthogonalization step (the $m$th step, with $m \in \{1, 2, \ldots, N\}$) the pivoting response $h_{m-1}(t)$ as the response with the highest energy among the remaining responses (those orthogonal to $\mathcal{H}_k$ for all $k = 0, \ldots, m-2$). That is, at the first step we select the pivoting response $h_0(t)$ as the Volterra kernel with the highest energy in $\{f_m(t)\}_{m=0}^{N-1}$. At the second step we select the pivoting response $h_1(t)$ as the response with the highest energy in the set of responses orthogonal to $\mathcal{H}_0$ (i.e., $f^{(0)}_m(t)$, with $m = 1, 2, \ldots, N-1$). This procedure is repeated in the same way at each step, where the best pivoting response at each stage is selected by an exhaustive search among all the pivot candidates at that stage. The minimization of the energy of the orthogonal responses (see (3), (4), and associated discussion) ensures that condition (19) is met. As we shall show later, condition (19) gives rise to space channel compression, which can be exploited to reduce complexity.
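The step-by-step pivot selection described above can be illustrated with a pivoted Gram-Schmidt procedure. The sketch below is a deliberate simplification and not the procedure of Section 2.1: it treats each sampled response as a plain vector and removes projections onto the pivot itself, whereas the paper orthogonalizes against the spaces $\mathcal{H}_k$. It also ranks responses by pulse energy only, without any input-sequence factor. The function name `greedy_orthogonalize` is hypothetical.

```python
import numpy as np

def greedy_orthogonalize(kernels):
    """Pivoted Gram-Schmidt on sampled kernels (rows of `kernels`).

    Simplifying assumption: responses are plain vectors, so orthogonality is
    enforced against each pivot vector only, not against the space spanned by
    its symbol-period shifts.
    Returns the pivot order, the orthogonal responses, and their energies.
    """
    residual = np.array(kernels, dtype=float)
    remaining = list(range(residual.shape[0]))
    order, responses, energies = [], [], []
    while remaining:
        # Select the remaining response with the highest energy as the pivot.
        candidate_energy = [np.sum(residual[i] ** 2) for i in remaining]
        pivot = remaining[int(np.argmax(candidate_energy))]
        h = residual[pivot].copy()
        e_h = float(np.sum(h ** 2))
        order.append(pivot)
        responses.append(h)
        energies.append(e_h)
        remaining.remove(pivot)
        # Remove from every other response its projection onto the pivot.
        if e_h > 0.0:
            for i in remaining:
                residual[i] -= (residual[i] @ h) / e_h * h
    return order, np.array(responses), np.array(energies)
```

For example, with the three toy kernels `[1, 0, 0]`, `[2, 1, 0]`, and `[0, 0, 0.5]`, the second kernel has the highest energy and is chosen first, giving the pivot order `[1, 2, 0]`. Because projections can only remove energy, the pivot energies returned by this sketch are non-increasing.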

Example.
As an example, we present results for a 10 Gb/s, 700 km IM/DD fiber-optic link (details about the simulated system are given in Section 4.1). Without loss of generality, we assume that $a_n \in \{\pm 1\}$. Let $S_m$ be the normalized cumulative path energy of the traditional Volterra-series expansion (see (2)), given by (20), where $E_s$ is the total energy of the signal component $s(t)$, given by (21). On the other hand, the normalized cumulative path energy of the orthogonalized Volterra-series expansion is defined as
$$S_m = \frac{1}{E_s} \sum_{k=0}^{m} \mathcal{E}^{(k)}_{\mathcal{I}}, \qquad (22)$$
where $\mathcal{E}^{(k)}_{\mathcal{I}}$ is given by (18) with $\mathcal{I}$ obtained according to the criteria defined by (19). In all cases, note that $S_m \le 1$ for all $m$, with $S_{N-1} = 1$.
Figure 4 shows the normalized cumulative path energy $S_m$ for the traditional (20) and orthogonalized (22) Volterra-series representations with $N = 8$. For the orthogonal representation, the pivoting responses are selected according to the criteria in (19). We observe that most of the energy of the nonlinear signal is concentrated on the first two paths of the orthogonalized Volterra-series expansion (i.e., $\mathcal{E}^{(0)}_{\mathcal{I}} + \mathcal{E}^{(1)}_{\mathcal{I}}$ represents ∼99.4% of the total signal energy). The traditional Volterra model requires five paths to capture a similar level of nonlinear signal energy. We highlight that the increase of $S_m$ from −0.33 dB to −0.12 dB at $m = 0$ is due to the mean-square value of the input sequence of the first path, which appears as a factor ≥ 1 in (18). Notice that this factor is a measure of the correlation between all nonlinear kernels $f_m(t)$ with $m > 0$ and the linear kernel $f_0(t)$ (e.g., it equals 1 if $f_0(t)$ and $f_m(t)$ are orthogonal pulses for all $m$, and $N$ if they are the same pulse for all $m$). Thus, we conclude the following for the channel analyzed in the example.
(i) Most of the energy of the nonlinear kernels in the traditional Volterra model is contained in their projection onto the linear signal space spanned by the set $\{h_0(t - nT)\} = \{f_0(t - nT)\}$.
(ii) A channel model with 2 paths captures practically all the signal energy.
(iii) Since $\mathcal{E}^{(0)}_{\mathcal{I}}$ is ∼97.5% of the total signal energy (see Figure 4), a receiver that considers only the first channel path should, as we shall show later, be able to achieve good performance.
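The example quotes cumulative energies both in dB and as percentages of the total signal energy. The short helper below, with hypothetical function names, converts between the two so the figures can be cross-checked. It only restates the definition of the decibel and is not part of the paper's method.

```python
import math

def energy_fraction_to_db(fraction):
    """Express a normalized cumulative energy (0 < fraction <= 1) in dB."""
    return 10.0 * math.log10(fraction)

def db_to_energy_fraction(level_db):
    """Convert a normalized cumulative energy in dB back to a fraction."""
    return 10.0 ** (level_db / 10.0)

# A first path holding 97.5% of the energy sits at about -0.11 dB, close to
# the -0.12 dB read from Figure 4; -0.33 dB corresponds to roughly 92.7%.
first_path_db = energy_fraction_to_db(0.975)
traditional_fraction = db_to_energy_fraction(-0.33)
```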

MLSD Receiver for Nonlinear Channels with AWGN
The MLSD receiver chooses the sequence $\{a_n\}$ that minimizes the metric (23). Using (13) and (16) in (23), we get (24), where $\Re\{\cdot\}$ denotes the real part, while the matched-filter output samples are given by (25). From (24), notice that the cost function can be computed for every candidate symbol sequence $\{a_n\}$ from the samples at the outputs of the matched filters given by (25).
From (1), equation (25) can be rewritten as (26), with the terms defined in (27). From (15), note that (28) holds, with the quantities defined in (29). Since the noise is assumed Gaussian, from (28) we conclude that the noise components at the output of the proposed MFB are spatially independent. Therefore, the bank of matched filters $\{h^*_m(-t)\}_{m=0}^{N-1}$ followed by a baud-rate sampler is called the space-whitened matched filter (S-WMF) bank.

Space-Time Whitened Matched Filter MLSD Receiver.
From the above, we observe that the matched filter bank derived from the new expansion of the nonlinear channel gives rise to spatially independent noise components. In order to simplify the implementation of the sequence detector, a space and time whitening filter bank is derived.

From (32), (36), and (39) it can be shown that MLSD reduces to minimizing the metric (40). Let $\mathbf{z}_n$ be the $M$-dimensional vector with the noise components of the baud-rate samples $\mathbf{r}_n$. From (15) and (30), the power spectral density of $\mathbf{z}_n$ results in $\mathbf{S}_z = N_0 \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. Therefore, the minimization of (40) can be easily implemented by using a Viterbi detector with multidimensional Euclidean branch metrics. Figure 5 shows a block diagram of the ST-WMF-MLSD receiver with dimension $M$ (i.e., $M$ is the number of filters in the bank). If $M = N$, all the paths of the nonlinear channel are used by the receiver.
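As an illustration of a Viterbi detector with multidimensional Euclidean branch metrics, the sketch below decodes vector-valued baud-rate samples over a finite-memory channel. It is a generic textbook formulation under stated assumptions, not the receiver of Figure 5. The callable `branch_output` is a hypothetical stand-in for the noise-free channel model, and the symbols preceding the block are assumed to equal the first element of `alphabet`.

```python
import itertools
import numpy as np

def viterbi_euclidean(r, branch_output, memory, alphabet=(-1.0, 1.0)):
    """MLSD via the Viterbi algorithm with vector Euclidean branch metrics.

    r             : array of shape (num_symbols, M) with the samples r_n
    branch_output : callable taking the tuple (a_n, a_{n-1}, ..., a_{n-memory})
                    and returning the noise-free M-dimensional sample
                    (hypothetical channel model supplied by the caller)
    memory        : channel memory in symbols; the trellis has
                    len(alphabet) ** memory states
    Assumption: symbols before the block are equal to alphabet[0].
    """
    r = np.asarray(r, dtype=float)
    states = list(itertools.product(alphabet, repeat=memory))
    index = {s: i for i, s in enumerate(states)}
    cost = np.full(len(states), np.inf)
    cost[index[tuple([alphabet[0]] * memory)]] = 0.0
    paths = [[] for _ in states]
    for r_n in r:
        new_cost = np.full(len(states), np.inf)
        new_paths = [None] * len(states)
        for s, state in enumerate(states):
            if not np.isfinite(cost[s]):
                continue  # state not reachable yet
            for a in alphabet:
                branch = (a,) + state  # (a_n, a_{n-1}, ..., a_{n-memory})
                y = np.asarray(branch_output(branch), dtype=float)
                metric = cost[s] + np.sum((r_n - y) ** 2)
                nxt = index[branch[:memory]]
                # Keep only the survivor with the lowest accumulated metric.
                if metric < new_cost[nxt]:
                    new_cost[nxt] = metric
                    new_paths[nxt] = paths[s] + [a]
        cost, paths = new_cost, new_paths
    return np.array(paths[int(np.argmin(cost))])
```

The number of states, `len(alphabet) ** memory`, is the quantity the paper uses as its complexity measure, so shortening the effective channel memory reduces the work of this loop exponentially.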

Complexity Considerations.
The ST-WMF-MLSD consists of a filter bank followed by a Viterbi decoder (VD) and a channel estimation stage. Although the computational load of the latter may be important, its complexity is not critical since it can be implemented at a low frequency rate (see endnote 3). On the other hand, the dimension of the front-end ($M$) is expected to be very low in general as a result of the spatial energy compression achieved by the ST-WMF-MLSD. Thus, $M \le 2$ is a reasonably good estimate for applications in IM/DD optical systems, and the computational complexity of the filter bank is reduced to that of 2 linear filters. Therefore, the implementation complexity of ST-WMF-MLSD is dominated by the VD. Practical aspects related to high-speed implementations of the VD have been widely investigated in the literature (e.g., see [12,22] for more details). The computational load (usually measured in number of multiplications and comparisons) and storage requirements of the VD depend directly on the number of states. Therefore, we adopt the number of states of the VD as the measure of complexity in order to compare the MLSD architectures investigated in this work.

Performance Evaluation in IM/DD Optical Systems
Next we analyze the proposed ST-WMF-MLSD receiver in transmissions over IM/DD fiber-optic systems with on-off keying (OOK) modulation. We focus on two key aspects of ST-WMF-MLSD: its performance (in comparison with current solutions based on OS-MLSD), and its ability to reduce complexity (e.g., number of states of the VD). Complexity reduction is possible owing to (i) the minimum-phase property of the equivalent channel response provided by the ST-WMF and (ii) condition (19). The latter gives rise to space compression, which reduces the ST-WMF dimension, $M$. This is achieved by using the most significant $M$ paths of the nonlinear channel.
Figure 6 depicts the optical system under consideration. The transmitter modulates the intensity of the transmitted signal using NRZ-OOK modulation. The standard single mode fiber (SMF) introduces CD and PMD, as well as attenuation. Optical amplifiers are deployed periodically along the fiber to compensate the attenuation, also introducing amplified spontaneous emission (ASE) noise in the signal. ASE noise is modeled as AWGN in the optical domain. The received optical signal is filtered by an optical filter and then converted to a current with a PIN diode or avalanche photodetector. The resulting photocurrent is filtered by an electrical filter. The noise component after the electrical filtering is non-Gaussian and signal-dependent [16]. Therefore, the electrical signal is first processed by a memoryless nonlinear transformation. It has been found that after a square root transformation, the noise can be assumed Gaussian and signal-independent [23,24]. Furthermore, channel nonlinearities can also be reduced by using the square root transformation [25], which improves the space compression used to reduce the receiver dimension (i.e., most of the channel energy is concentrated on the linear kernel).
The split-step Fourier method [26] is used to compute the propagation of optical signals through the fiber. Oversampled linear and nonlinear kernels are extracted from the electrical signal after the square root transformation. The oversampling factor $R = T/T_s$ depends on various parameters of the communication system (e.g., optical power, fiber length, etc.). In our case, we have found that an oversampling factor of $R = 16$ is enough to accurately model the system; that is, no improvement was observed by increasing the sampling rate for the entire set of conditions considered in this work. Then, we compute the sampled responses $h_m(t)$ and the associated coefficients according to (17), and the symbol-rate channel response matrix can be easily obtained from (38). Since the noise after the square root transformation is approximately Gaussian and signal-independent [27], the theory proposed in [28] is used to evaluate the bit error probability (see endnote 4). All the kernels of the nonlinear channel are used to compute the error probability, independent of the receiver dimension $M$. The data rate is $1/T = 10$ Gb/s and the transmitted pulse shape has an unchirped Gaussian envelope $e^{-t^2/(2T_0^2)}$ with $T_0 = 36$ ps. We use a Lorentzian optical filter and a four-pole Butterworth electrical filter with bandwidths of 15 and 10 GHz, respectively. The fiber dispersion is $D = 17$ ps/(nm·km).
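The sketch below illustrates two ingredients of the simulation setup described above: sampling the Gaussian pulse envelope on a grid with spacing $T_s = T/R$, and applying the memoryless square-root transformation to a photocurrent. The function names, the `span_symbols` parameter, and the clipping of negative photocurrent values to zero before the square root are assumptions made for illustration. The fiber propagation and filtering stages are not modeled.

```python
import numpy as np

def gaussian_pulse(symbol_rate_hz=10e9, t0_s=36e-12, oversampling=16, span_symbols=4):
    """Samples of exp(-t^2 / (2 T0^2)) on a grid with spacing T_s = T / R."""
    t_s = 1.0 / (symbol_rate_hz * oversampling)   # sample period T_s
    half = span_symbols * oversampling // 2        # samples on each side of t = 0
    t = np.arange(-half, half + 1) * t_s
    return t, np.exp(-t ** 2 / (2.0 * t0_s ** 2))

def sqrt_transform(photocurrent):
    """Memoryless square-root transformation.

    Assumption: negative values (possible after electrical filtering of a
    noisy signal) are clipped to zero before taking the square root.
    """
    return np.sqrt(np.clip(np.asarray(photocurrent, dtype=float), 0.0, None))
```

With the default values (10 Gb/s and 16 samples per symbol) the sample spacing is 6.25 ps.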

IM/DD Systems with CD and PMD.
Typically, the performance of equalization stages in fiber optic transmissions is evaluated by using the optical signal-to-noise ratio (OSNR) required to achieve a given BER (e.g., see [10,11]). The target BER is around $10^{-3}$, which corresponds to the value required at the input of the forward error correction (FEC) code to achieve the error rate expected by the application (e.g., ∼$10^{-15}$) [29]. Figure 7 depicts the OSNR penalty with respect to a B2B system at a bit-error rate (BER) of $10^{-3}$ versus the number of states of the VD for $L = 700$ km. We present results for ST-WMF-MLSD with $M = 1$ (i.e., only one filter in the bank). For OS-MLSD with 2 samples/bit, the reduction of states is achieved by truncation and optimization of the sampling phase in order to minimize BER (8 uniformly distributed phases in the interval $T/2$ were tested). From Figure 7, we verify that the number of states of the VD at a penalty of 4.6 dB can be reduced from 2048 to 256 with ST-WMF-MLSD and $M = 1$. Notice that this performance is achieved by using a VD with one sample per bit [11]. Furthermore, we emphasize that these benefits greatly outweigh the extra complexity required by the linear filter and the channel estimator (implementation details are omitted here due to space limitations). From Figure 7 we can also see that the performance degradation incurred with $M = 1$ with respect to the ideal MLSD is only ∼0.3 dB. This result can be understood from the analysis of Section 2.2, where it has been observed that 97.5% of the total signal energy is contained in the first channel path.
Next, the performance of ST-WMF-MLSD in the presence of CD and first-order PMD (i.e., differential group delay or DGD) is investigated. Figure 8 depicts the OSNR penalty versus the number of states of the VD for a 700 km fiber link with two values of DGD: 25 and 50 ps. We evaluate the OS-MLSD with 2 samples/bit and the ST-WMF-MLSD with $M = 2$. As expected, both receivers tend to the same performance as the number of states of the VD increases. We also note that the benefits of the ST-WMF-MLSD diminish as the DGD increases (see endnote 5). Nevertheless, from Figure 8 notice that the number of states of the VD at a penalty of 6 dB can be reduced 8 and 4 times with ST-WMF-MLSD at 25 and 50 ps DGD, respectively. These results show that the ST-WMF-MLSD is still an attractive solution to reduce complexity in transmissions over fiber-optic channels in the presence of PMD.

Impact of the Imperfect Channel Knowledge
So far, the performance evaluation of the ST-WMF-MLSD has assumed perfect knowledge of the channel. In the following, we analyze the impact of channel estimation inaccuracy on the performance of the ST-WMF-MLSD architecture in transmissions over IM/DD optical systems. This study will show that the performance degradation in ST-WMF-MLSD receivers with $M = 1$ caused by imperfect channel estimation is low (∼0.2 dB) and similar to that of OS-MLSD receivers in the $L = 700$ km fiber link used in the example of Figure 7.
The estimation of the oversampled linear and nonlinear kernels is required to implement both MLSD-based receivers. Let $R = T/T_s$ be the oversampling factor. Based on the polyphase filter representation of the oversampled channel response (see Figure 9), the received samples can be expressed as in (43), where $s^{(i)}_n = s(nT + iT_s)$. From (43), a simple estimator of the oversampled linear and nonlinear kernels can be implemented with an averaging filter as in (44), which depends on the length of the averaging filter, denoted here by $N_a$, an arbitrary starting time index, and the detected symbols $\hat{a}_n$ (see endnote 6). The accuracy of the channel estimation given by (44) depends on the precision of the decisions $\hat{a}_n$, the length of the averaging filter $N_a$, and the channel noise power. We consider that decisions provided by the forward error correction (FEC) decoder are available; therefore the effect of decision errors can be neglected (i.e., $\hat{a}_n = a_n$). We highlight that this assumption is still valid if pre-FEC decisions are used, as a result of the low bit error rates experienced in this link (e.g., ∼$10^{-3}$). From the above, we conclude that the quality of the estimates (44) mainly depends on the filter length $N_a$ and the channel noise power. The precision of (44) improves as the value of $N_a$ increases. On the other hand, the maximum value of $N_a$ is imposed by the speed of temporal variations of the fiber optic channel. As a result of its dependence on stress and vibrations, as well as on random changes in the state of polarization of the laser, PMD is nonstationary. Fluctuations with a time scale of hundreds of microseconds have been considered in previous works (e.g., [30]). Therefore, the response time of channel estimation algorithms for PMD mitigation must be less than 1 ms (in practice a response time less than 100 μs is required [27]). This imposes, for example, that the bandwidth of the averaging filter (∼$1/(4 N_a T)$) should be ≥20 kHz in order to efficiently track the channel variation.
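To illustrate how an averaging-filter estimator of this kind behaves, the sketch below estimates the taps of a baud-rate linear kernel by averaging products of known symbols and received samples. It is one plausible data-aided form under stated assumptions (independent, equiprobable symbols in {+1, −1}, a single polyphase branch, linear kernel only) and does not reproduce the paper's expression (44). The function names are hypothetical. A second helper evaluates the approximate averaging-filter bandwidth, 1/(4 × filter length × T), quoted in the text.

```python
import numpy as np

def estimate_linear_kernel(r, a, num_taps, n0, length):
    """Data-aided averaging estimate of a baud-rate linear kernel (sketch).

    r, a     : received samples r_n and known (or detected) symbols a_n,
               with a_n in {+1, -1} assumed independent and equiprobable
    num_taps : number of kernel taps to estimate
    n0       : first time index used (requires n0 >= num_taps - 1)
    length   : number of symbols averaged (averaging-filter length)
    """
    r = np.asarray(r, dtype=float)
    a = np.asarray(a, dtype=float)
    n = np.arange(n0, n0 + length)
    # Tap k is the average of a_{n-k} * r_n; cross terms average out.
    return np.array([np.mean(a[n - k] * r[n]) for k in range(num_taps)])

def averaging_bandwidth_hz(length, symbol_period_s):
    """Approximate bandwidth 1 / (4 * length * T) of the averaging filter."""
    return 1.0 / (4.0 * length * symbol_period_s)
```

For a 10 Gb/s link (T = 100 ps), an averaging length of 10^5 symbols gives a bandwidth of about 25 kHz, and the 20 kHz requirement corresponds to a length of at most 125,000 symbols.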
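The received signal seen by an MLSD receiver in the presence of imperfect knowledge of the channel dispersion can be expressed as
$$r^{(i)}_n = s^{(i)}_n + z^{(i)}_n = \hat{s}^{(i)}_n - \hat{z}^{(i)}_n + z^{(i)}_n, \qquad (45)$$
where $s^{(i)}_n = s(nT + iT_s)$, $\hat{s}^{(i)}_n = \hat{s}(nT + iT_s)$ is the synthesized signal obtained from (44), and $\hat{z}^{(i)}_n = \hat{s}(nT + iT_s) - s(nT + iT_s)$ is the estimation error component.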
Figure 10(a) shows the SNR penalty caused by imperfect channel estimation as a function of the averaging-filter length $N_a$, obtained from computer simulations. We consider $1/T = 10$ GHz, $R = 16$, and the fiber link with $L = 700$ km, as used in Figure 7. The SNR penalty $\Delta\mathrm{SNR}$ caused by imperfect channel estimation is computed from (46), where $\sigma^2_z$ is the channel noise power required to achieve a BER = $10^{-3}$ with an unconstrained-complexity OS-MLSD receiver (i.e., the VD uses as many states as required) and $\sigma^2_{\hat{z}}$ is the variance of the estimation error component (see endnote 7). Notice that the penalty is ∼0.4 dB for $N_a = 10^5$. This value of $N_a$ corresponds to a bandwidth of ∼$1/(4 N_a T) = 25$ kHz (see Figure 10(b)). Assuming that the estimation error is white Gaussian noise with power $\sigma^2_{\hat{z}}$, from Figure 10(a) we infer that the SNR penalty caused by imperfect channel knowledge in OS-MLSD receivers with $N_a = 10^5$ should be ≲ $\Delta\mathrm{SNR}$ ∼ 0.4 dB.
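The relationship between estimation error power and SNR penalty can be sketched numerically. The helper below assumes that the estimation error behaves as additional white noise whose power simply adds to the channel noise power, so that the penalty is 10 log10(1 + error power / channel noise power). This is an illustrative assumption and not necessarily the exact expression (46) used in the paper. The function names are hypothetical.

```python
import math

def snr_penalty_db(channel_noise_power, estimation_error_power):
    """SNR penalty in dB, assuming the two noise powers add (assumption)."""
    return 10.0 * math.log10(1.0 + estimation_error_power / channel_noise_power)

def error_ratio_for_penalty(penalty_db):
    """Error-to-noise power ratio that yields a given penalty under the same assumption."""
    return 10.0 ** (penalty_db / 10.0) - 1.0

# Under this assumption, a 0.4 dB penalty corresponds to an estimation error
# power of roughly 10% of the channel noise power.
ratio_at_0p4_db = error_ratio_for_penalty(0.4)
```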
Figure 11 depicts the OSNR penalty at BER = $10^{-3}$ versus the number of states of the Viterbi detector (VD) with $L = 700$ km. We present results with perfect knowledge of the fiber dispersion (denoted as $N_a = \infty$, where $N_a$ is the averaging-filter length) and for imperfect channel estimation with $N_a = 10^5$. We see that the mean penalty caused by inaccuracies of channel estimation agrees with that expected from Figure 10 with $N_a = 10^5$ (i.e., ∼0.14 dB < $\Delta\mathrm{SNR}$ ∼ 0.4 dB). Furthermore, we observe that the impact of imperfect channel knowledge on the performance is similar in both MLSD receivers (i.e., ∼0.14 dB and ∼0.18 dB for OS and ST-WMF, resp.). This result can be understood from the fact that the filters of the first branch, including $h^*_0(-t)$, are computed from the samples of the estimated linear kernel $\hat{f}_0(t)$. Taking into account that the energy of the linear component is significantly higher than that of the nonlinear kernels [25] (see Section 2.2), an accurate estimation of $h_0(t)$ can be achieved for the channel considered. Then, we infer that the energy loss of the signal component at the output of the first branch will be small. Therefore, based on (45), notice that the performance of OS-MLSD and ST-WMF-MLSD with $M = 1$ should degrade in a similar way.

Conclusions
New results on the recently proposed ST-WMF-MLSD nonlinear receiver have been presented in this paper. These results are the following: (1) the space compression property of the factorization introduced in [18] has been analyzed in detail; (2) the performance of the ST-WMF-MLSD in IM/DD fiber optic systems in the combined presence of CD and PMD has been evaluated; and (3) it has been shown that the performance degradation caused by imperfect channel estimation and tracking is low and similar to that of existing MLSD schemes. These features make the ST-WMF-MLSD a good architecture for receivers for long distance IM/DD fiber-optic links.

Endnotes
1. Some commercial implementations of VDs at 10 Gb/s with 4, 8, and 16 states are available in 90 nm CMOS technology [31][32][33]. It should be possible to implement VDs with 64 or 128 states by using the available 28 nm technology.
2. The Volterra model can be easily extended to include higher order terms (e.g., third-order kernels, driven by products of three symbols such as $a_n a_{n-m} a_{n-l}$, could be included in the expansion). To keep the notation simple, only second-order kernels are used in the derivations throughout this paper. However, numerical results incorporate nonlinear kernels of order higher than two.
3. As a result of its dependence on stress and vibrations, as well as on random changes in the state of polarization of the laser, PMD is nonstationary. Fluctuations with a time scale of a few milliseconds have been observed in PMD measurements [27]. Thus, the response time of the channel estimation schemes for PMD mitigation must be less than 1 ms. In practice, a response time less than 100 μs is required. Therefore, the channel estimation stage could be easily implemented by using current technology.
4. We did not run Monte Carlo simulations of the communication systems. The numerical results are semianalytic, in the sense that they are theoretical estimates assisted by numerical simulations. The probability of error is estimated according to the theory proposed in [28]; that is, the first terms of the union bound (those with the lowest distance) are used to estimate the probability of error. Numerical simulations are used to find the distances of the error events, their Hamming weights, and a priori probabilities. These results are then introduced in the formulas for the probability of error estimation as reported in [28].
5. In order to explain this result, consider a fiber channel with DGD = 100 ps only (i.e., no CD). Since DGD ∼ $T$, the IM/DD system approximately behaves as a duobinary channel; therefore the time compression achieved by the ST-WMF-MLSD will be negligible.
6. In order to improve the channel tracking capability, an efficient implementation of the channel estimator with the well-known LMS algorithm may be preferred [12]. Nevertheless, the objective is to shed light on the impact of imperfect knowledge of the fiber dispersion on the orthogonalized Volterra model. Therefore, practical aspects of the receiver architecture (e.g., buffers, number of taps of the WF, finite precision arithmetic effects, etc.) are not considered.
7. The mean penalty of 20 runs with different seeds of the random number generator is presented.

Figure 1: MISO model of the nonlinear channel: (a) based on a Volterra representation; (b) based on the spatially compressed orthogonal representation.

Figure 3 (panel label): $f_1(t)$ selected as the pivot for the 1st step.

Figure 4: Normalized cumulative path energy of the nonlinear IM/DD fiber-optic link with $L = 700$ km and $1/T = 10$ Gb/s.

Figure 7: OSNR penalty with respect to B2B at BER = $10^{-3}$ versus number of states of the VD for $L = 700$ km and DGD = 0 ps.

Figure 9: Example of the polyphase filter representation of an oversampled linear channel with $R = T/T_s$.

Figure 10: SNR penalty caused by imperfect channel estimation versus length of the averaging filter (a) and the bandwidth of the averaging filter (b).

Figure 11: OSNR penalty at BER = $10^{-3}$ versus number of states of the VD with $L = 700$ km.