Reflection Reduction on DDR3 High-Speed Bus by Improved PSO

Signal integrity, one of the key design issues in high-speed digital systems, is often seriously degraded by signal reflection caused by impedance mismatch in the DDR3 bus. In this paper, a novel optimization method is proposed to reduce the impedance mismatch and thereby the signal reflection. Specifically, by including the via parasitics, an equivalent model of DDR3 high-speed signal transmission is established, based on the match between the on-die-termination (ODT) value of DDR3 and the characteristic impedance of the transmission line. Additionally, an improved particle swarm optimization algorithm with adaptive perturbation is presented to solve the impedance mismatch problem (IPSO-IMp) based on the above model. The algorithm dynamically judges the particles' state and applies a perturbation strategy when local aggregation is detected, so that local optima are avoided and the search ability is reactivated. IPSO-IMp achieves higher accuracy than the standard algorithm, and its speed increases by nearly 33%. Finally, the simulation results verify that the solution clearly decreases the signal reflection, with the signal transmission quality improving by 1.3 dB compared with the existing method.


Introduction
As the most popular memory, DDR3 offers faster speed, a higher data rate, and a lower operating voltage than DDR2, with data rates up to 1.6 Gbps or even higher at an operating voltage of 1.5 V. However, the higher data transmission rate places stricter requirements on the interconnect interface design. During high-speed signal transmission, sudden changes of the transient impedance make the signal line impedance discontinuous, which results in signal reflections and thus substantial overshoot, undershoot, and ringing. Therefore, researching the signal reflection of the DDR3 bus has become a key component in the design of high-speed digital systems.
There has been considerable research on optimizing DDR3 bus signal quality. Jagdale et al. [1] studied signal reflection in high-speed PCB design and discussed several main factors leading to signal discontinuity. Their experiments concluded that the ODT and the characteristic impedance of the transmission line play an important role in reducing signal reflection. Considering the influence of power consumption, timing, and voltage amplitude, Mintarno and Ji [2] showed that in high-speed memory bus design, ODT optimization can substantially reduce power consumption while drastically improving signal integrity. For the extension to multiple DDR3 modules on a computer motherboard, Lin et al. [3,4] applied the particle swarm optimization (PSO) algorithm to motherboard routing and the selection of the ODT value, taking into account both the impedance discontinuity caused by multiple ports and the crosstalk caused by adjacent transmission lines. The experiments show that the transmission quality of the signal is enhanced while signal reflection is reduced.
The Scientific World Journal

The research above shows that in DDR3 bus design, the discontinuity of characteristic impedance is the primary factor causing signal reflection, and that an appropriate ODT value can effectively decrease the impedance discontinuity of the signal transmission line. Jagdale et al. [1] and Mintarno and Ji [2] verified the effect of the ODT value, which is of great significance in this field, yet specific optimization strategies are not explicitly given. Thus, the key point in DDR3 high-speed bus design is how to accurately evaluate the impedance matching between the ODT value and the routing. Through theoretical derivation, Lin et al. [3,4] proposed an approach to optimize ODT and routing parameters using PSO, which obtains good results. Nevertheless, such studies usually focus on the effects of the transmission line itself on the signal but neglect the discontinuity caused by the via. The via is a typical discontinuity for high-speed signal transmission in printed circuit boards [5,6]. It hardly affects signal transmission at low frequency. However, as the frequency goes up to the GHz range, the impact of the via on signal integrity must be considered [7].
Considering multiple factors inevitably makes the circuit optimization more difficult. Particle swarm optimization (PSO) [8], a random search algorithm based on group cooperation, can be used to solve complex multidimensional optimization problems in various fields [9]. Each particle in the swarm represents a possible solution to the optimization problem. Over the iterations, each particle constantly adjusts its position according to the local optimal solution obtained from its own motion and the global optimal solution obtained from group interaction, and gradually approaches the optimal solution [10]. The PSO algorithm introduced in [4] provided a feasible scheme to solve the multiparameter optimization problem. However, as an evolutionary optimization algorithm, the standard PSO has two operations, exploration and exploitation, and it easily becomes trapped in a local optimum as the iterations increase. Consequently, the convergence speed is reduced.
Li et al. [11] proposed an enhanced PSO algorithm specifically for electromagnetic field problems. To increase the convergence speed, the PSO algorithm is improved in many aspects, including a perturbation strategy for the global optimal value to avoid the local optimum. Yet, the solution reduces the initial efficiency to some extent. Melin et al. [12] used fuzzy logic to improve the convergence and diversity of the swarm, which improves the performance of PSO. Maldonado et al. [13] described the design of a type-2 average fuzzy system on FPGA and its optimization using particle swarm optimization for the speed regulation of a DC motor, which implements the PSO optimization of interval type-2 fuzzy controllers for FPGA applications. By analyzing the movement behavior of the particles, Ying-Qiu et al. [14] used a method that dynamically adjusts the boundary of the search space to trace the particle positions and prevent particles from gathering locally. This strategy relieves premature convergence while maintaining initial efficiency, improving the accuracy of the algorithm. However, the algorithm handles stagnating particles with a random processing mechanism, which brings uncertainty into the optimization process and easily introduces invalid operations.
In this paper, the key point is how to improve efficiency while maintaining the high initial performance. The major contributions of this paper are as follows: (1) an improved optimization strategy, which combines the characteristics of embedded hardware, is proposed in this paper for design; (2) we propose an improved particle swarm optimization algorithm with adaptive perturbation, which optimizes the routing length of the DDR3 signal, the characteristic impedance, the via parasitic impedance, and the impact of ODT on high-frequency signal quality; (3) the experiments demonstrate that the proposed strategy can effectively improve the impedance continuity of the transmission line and reduce the reflection effects of high-speed signals. The rest of this paper is organized as follows. In Section 2, the transmission line and via models are described. In Section 3, an equivalent model of DDR3 high-speed signal transmission that includes the via parasitics is first presented. Then, an improved particle swarm optimization algorithm with adaptive perturbation is presented to solve the impedance mismatch problem (IPSO-IMp) based on the above model. The experimentation and simulation are shown in Section 4, and the conclusions are summarized in Section 5.

Figure 1: Transmission line equivalent model.

Transmission Line Model.
At high frequency, interconnection lines exhibit the characteristics of transmission lines. Thus, a distributed model of a lossy transmission line formed by cascaded RLC units can be used to approximate the interconnection lines, as shown in Figure 1, where $R$, $L$, and $C$, respectively, represent the resistance, inductance, and capacitance of the transmission line per unit length and $Z_S$ and $Z_L$ denote the driver impedance and load impedance, respectively.
The driver and the load produce no signal reflection when $Z_S = Z_L = Z_0$, where $Z_0$ is the characteristic impedance of the line, which is difficult to achieve in practical applications. In practice, a reflected signal exists in the interconnection line, which leads to overshoot and undershoot of the output voltage. The overshoot decreases the stability of the circuit, while the undershoot may cause slight pulse interference, disturb the dynamic energy distribution, and even cause false triggering, which may result in serious logic and timing errors.
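The matching condition above can be illustrated with the standard voltage reflection coefficient at a termination, $\Gamma = (Z_L - Z_0)/(Z_L + Z_0)$. The following is a minimal sketch; the 50 ohm line impedance and the termination values are illustrative assumptions, not values taken from this paper.

```python
def reflection_coefficient(z_load: complex, z0: complex) -> complex:
    """Voltage reflection coefficient at a termination z_load on a line of impedance z0."""
    return (z_load - z0) / (z_load + z0)


if __name__ == "__main__":
    z0 = 50.0  # assumed characteristic impedance in ohms (illustrative)
    for z_load in (50.0, 40.0, 60.0, 120.0):  # illustrative termination values
        gamma = reflection_coefficient(z_load, z0)
        print(f"Z_L = {z_load:6.1f} ohm -> Gamma = {gamma:+.3f}")
```

A matched termination gives a coefficient of zero, and the magnitude grows as the termination moves away from the line impedance, which is the effect the ODT selection tries to minimize.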

Via Model.
A via is the conductor connecting traces on different signal layers of a multilayer PCB. Studies [5,15,16] have shown that both the via diameter and the pad size affect impedance continuity. In the via equivalent model (Figure 2), $R_{via}$, $L_{via}$, and $C_{via}$, respectively, represent the parasitic resistance, parasitic inductance, and parasitic capacitance. Their values mainly depend on the via radius $r_{via}$ and the via length $h_{via}$, with the specific calculation formulas given in [17].
Since parasitic capacitance, parasitic inductance, and parasitic resistance exist in the via, the via introduces impedance discontinuities on a high-frequency signal transmission line, which results in signal reflection. Figure 3 illustrates the via effects on signal reflection.
It is easily seen that the via effects on signal reflection become more pronounced when the frequency exceeds 1 GHz. The line between the DSP processor and the DDR3 pins inevitably has at least two vias, and the data rate can reach 1.6 Gb/s or even higher; therefore, the effect of the via's parasitic resistance, inductance, and capacitance on the high-frequency signal cannot be ignored. The resulting discontinuity leads to signal reflection, which should be considered in the design of the DDR3 high-speed bus interconnection [17].
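The formulas of [17] are not reproduced in this text. As a rough illustration of how the via geometry drives the parasitics, the sketch below uses the widely quoted closed-form approximations from Johnson and Graham's high-speed digital design text; this choice is an assumption and may differ from the formulas in [17]. The pad diameter, antipad diameter, and relative permittivity are hypothetical inputs that do not appear in this paper.

```python
import math

MIL_PER_INCH = 1000.0
MM_PER_INCH = 25.4


def via_inductance_nh(h_via_mm: float, r_via_mil: float) -> float:
    """Approximate via inductance in nH: L = 5.08 h [ln(4h/d) + 1], h and d in inches."""
    h_in = h_via_mm / MM_PER_INCH
    d_in = 2.0 * r_via_mil / MIL_PER_INCH  # via diameter from the radius
    return 5.08 * h_in * (math.log(4.0 * h_in / d_in) + 1.0)


def via_capacitance_pf(h_via_mm: float, pad_diam_mil: float,
                       antipad_diam_mil: float, eps_r: float = 4.4) -> float:
    """Approximate via capacitance in pF: C = 1.41 eps_r T D1 / (D2 - D1), inches.

    pad_diam_mil (D1), antipad_diam_mil (D2), and eps_r are hypothetical inputs.
    """
    t_in = h_via_mm / MM_PER_INCH
    d1_in = pad_diam_mil / MIL_PER_INCH
    d2_in = antipad_diam_mil / MIL_PER_INCH
    return 1.41 * eps_r * t_in * d1_in / (d2_in - d1_in)


if __name__ == "__main__":
    # 6 mil radius and 1.6 mm length; pad and antipad sizes are illustrative.
    print(f"L_via ~ {via_inductance_nh(1.6, 6.0):.2f} nH")
    print(f"C_via ~ {via_capacitance_pf(1.6, 24.0, 40.0):.2f} pF")
```

Under these approximations, a longer via raises both the inductance and the capacitance, while a larger radius lowers the inductance.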

High-Speed Bus Structure and Model.
In this section, the transmission coefficient and reflection coefficient for the DDR3 bus transmission line are obtained through the $S$ parameters. For a general design, we suppose that the transmission line has two vias; Figure 4 shows the high-speed interconnect structure between the DSP processor and DDR3. For simplicity, the parameters are denoted as follows: (i) the two vias connecting the three transmission lines are denoted by Via$_1$ and Via$_2$; (ii) the lengths of the three transmission lines are, respectively, denoted by $l_1$, $l_2$, and $l_3$; (iii) the characteristic impedances of the three transmission lines are, respectively, denoted by $Z_1$, $Z_2$, and $Z_3$; (iv) the internal impedances of the source and load are denoted by $Z_S$ and $Z_L$; (v) the power of the source end and load end is, respectively, denoted by $P_S$ and $P_L$.
According to electromagnetic theory, each transmission line is subject to interference from the surrounding transmission lines, especially the adjacent ones. Therefore, to approximate the actual circuit, we take three adjacent transmission lines as the object of study in this paper, provided that (1) the length of a transmission line is $l$ and (2) the resistance, inductance, capacitance, mutual inductance, and mutual capacitance per unit length are $R$, $L$, $C$, $L_m$, and $C_m$, respectively. According to the transmission line equivalent model given in Figure 1, the driver and victim transmission model can be approximated by the model shown in Figure 5, which consists of two driver lines and one victim line.
The first RLC unit of the victim line is analyzed first. According to basic circuit theory, we obtain (3) and (4), and $Z_0$ and $\gamma_0$ can be defined as in (5). When

$$Z = \frac{(R + j\omega L)^2 - 2(j\omega L_m)^2}{R + j\omega (L - 2L_m)}\,\Delta x, \qquad Y = \left[ j\omega C + 2 j\omega C_m (1 - k) \right] \Delta x,$$

where $\Delta x$ is the length of one RLC unit and $k$ is the ratio of the driver-line voltage to the victim-line voltage, we get $Z_0$ and $\gamma_0$ from (5) as given in (6). The voltage and current of the victim line can then be expressed as in (7). Formula (7) is an ABCD matrix revealing the relationship between the input and output of the first RLC unit of the victim line. A victim line of length $l$ can be seen as a cascade of $n$ RLC units, with the relationship between input and output given by (8). When $n$ tends to positive infinity, the parameters of the ABCD matrix for the whole line are obtained from (8) as (9), where $Z_0$ and $\gamma_0$ are, respectively, the characteristic impedance and propagation constant of the transmission lines when mutual interaction is considered; they are expressed in (10). Given that (1) the via impact on the high-speed bus is considered, (2) the interaction among vias is ignored, and (3) the structure in Figure 4 is replaced by the via equivalent model in Figure 2 and the transmission line equivalent model in Figure 1, the relationship between input and output for the entire high-speed line is expressed by (11). The $S$ parameters can be obtained from the ABCD matrix. Hence, when writing to DDR3, we get from formula (11) the reflection coefficient $S_{11}$ and transmission coefficient $S_{21}$ for the whole transmission line, as given in (12). Similarly, the reflection coefficient $S_{22}$ and transmission coefficient $S_{12}$ for the whole transmission line can be obtained when reading from DDR3.
Equation (12) shows that the impedance matching optimization for the DDR3 high-speed bus involves many parameters. To apply the PSO algorithm, we need to establish constraints on these parameters, that is, to construct a fitness function.
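The cascade described in this section (line, via, line, via, line) can be sketched numerically with standard two-port theory. The code below is an illustration under stated assumptions, not the paper's equations (7)-(12): the via is modeled as a hypothetical series R + jwL followed by a shunt C, a single real reference impedance `z_ref` is used at both ports, and all numeric values are illustrative.

```python
import cmath
import math
from functools import reduce


def matmul(m1, m2):
    """Multiply two 2x2 matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))


def cascade(*sections):
    """ABCD matrix of sections connected in series, listed from source to load."""
    return reduce(matmul, sections)


def line_abcd(z0, gamma, length):
    """ABCD matrix of a uniform line (characteristic impedance z0, propagation constant gamma)."""
    gl = gamma * length
    return ((cmath.cosh(gl), z0 * cmath.sinh(gl)),
            (cmath.sinh(gl) / z0, cmath.cosh(gl)))


def via_abcd(r_via, l_via, c_via, omega):
    """Hypothetical via model: series R + jwL followed by a shunt C to ground."""
    z = r_via + 1j * omega * l_via
    y = 1j * omega * c_via
    return ((1 + z * y, z), (y, 1))


def abcd_to_s(m, z_ref):
    """Convert an ABCD matrix to (S11, S12, S21, S22) for equal real port impedances."""
    (a, b), (c, d) = m
    den = a + b / z_ref + c * z_ref + d
    s11 = (a + b / z_ref - c * z_ref - d) / den
    s12 = 2 * (a * d - b * c) / den
    s21 = 2 / den
    s22 = (-a + b / z_ref - c * z_ref + d) / den
    return s11, s12, s21, s22


if __name__ == "__main__":
    omega = 2 * math.pi * 1.6e9
    gamma = 1j * omega / 1.5e8  # lossless line, assumed phase velocity of 1.5e8 m/s
    via = via_abcd(0.01, 1.3e-9, 0.6e-12, omega)  # illustrative parasitics
    total = cascade(line_abcd(50.0, gamma, 0.01), via,
                    line_abcd(45.0, gamma, 0.02), via,
                    line_abcd(50.0, gamma, 0.01))
    s11, s12, s21, s22 = abcd_to_s(total, 50.0)
    print(f"|S11| = {abs(s11):.3f}, |S21| = {abs(s21):.3f}")
```

Multiplying the section matrices in source-to-load order mirrors the way the whole-line relationship is assembled from the line and via models; unequal source and load impedances would need the generalized conversion instead of the equal-impedance one used here.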

Fitness Function Construction.
Due to the lossy transmission, the signal of the sending end cannot be delivered to the receiving end completely. Moreover, because the transmitted and reflected signals coexist, improving the quality of the received signal requires reducing the reflected signal and enhancing the transmitted signal. Thus, the fitness function for the DDR3 bus should be defined according to both the transmitted and the reflected signal. Provided that the transmission line is lossless when writing to DDR3, we obtain the following according to conservation of energy:

$$|S_{11}|^2 + |S_{21}|^2 = 1. \tag{13}$$

Equation (13) indicates that the signal transmitted to the receiving end achieves maximum energy when the reflection coefficient $S_{11}$ tends to 0 and $S_{21}$ tends to 1. Thus, the fitness function for writing can be constructed as in (14). Similarly, when reading from DDR3, the fitness function is obtained as in (15). From (14) and (15), we get the fitness function (16) for both read and write states, where $f$ is the signal sampling frequency. When $f = 2.0$ GHz and (16) attains its minimum value, we get the minimum reflected signal and achieve impedance matching at 2.0 GHz. To achieve impedance matching over the bandwidth from 0 to 2.0 GHz, the fitness function [4] is acquired from (16) as in (17), where $f_k$ denotes a frequency point chosen from 0 to 2.0 GHz. On the premise of bandwidth impedance matching, this paper takes $N = 80$ and $f_0 = 25$ MHz to guarantee accuracy and operating efficiency, which gives (18). Equation (18) is the fitness function for DDR3 signal optimization in the bandwidth 0 ∼ 2.0 GHz. The function contains 12 parameters, as listed in Table 1.
It should be noted that the values of $l_1$, $l_2$, and $l_3$ depend on the size of the PCB. In this paper, the maximum value is 2000 mil and the minimum value is 0 mil. To facilitate the initialization of the optimization, the ranges of $Z_1$, $Z_2$, $Z_3$, $r_{via}$, and $h_{via}$ are defined in Table 1. The specifications for DDR3 SDRAM were defined by the Joint Electron Device Engineering Council (JEDEC). The values of the four source and load resistances for writing and reading follow the JEDEC standard. Equation (18) attains a minimum value by optimizing these parameters, which enables impedance matching for DDR3 bus transmission in the bandwidth 0 ∼ 2.0 GHz. In this case, the reflected signal is minimum and the transmitted signal is maximum; that is, the optimal solution is obtained.
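The exact forms of (14)-(18) are not reproduced in this text, so the following is only a sketch of how a bandwidth fitness of this kind could be evaluated. It assumes a per-frequency cost of $|S_{11}| + (1 - |S_{21}|)$ for writing plus $|S_{22}| + (1 - |S_{12}|)$ for reading, averaged over $N = 80$ points $f_k = k f_0$ with $f_0 = 25$ MHz. The cost form, the grid starting at $k = 1$, and the `s_params` callback are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

SParams = Tuple[complex, complex, complex, complex]  # (S11, S12, S21, S22)

F0_HZ = 25e6   # frequency step f_0 from the text
N_POINTS = 80  # number of frequency points N from the text


def frequency_grid(f0: float = F0_HZ, n: int = N_POINTS) -> List[float]:
    """Frequency points f_k = k * f0 for k = 1..n (80 * 25 MHz = 2.0 GHz)."""
    return [k * f0 for k in range(1, n + 1)]


def fitness(params: Sequence[float],
            s_params: Callable[[Sequence[float], float], SParams],
            freqs: Optional[Sequence[float]] = None) -> float:
    """Average read-plus-write mismatch cost over the band (assumed cost form)."""
    freqs = frequency_grid() if freqs is None else freqs
    total = 0.0
    for f in freqs:
        s11, s12, s21, s22 = s_params(params, f)
        write_cost = abs(s11) + (1.0 - abs(s21))  # reflection up, transmission down
        read_cost = abs(s22) + (1.0 - abs(s12))
        total += write_cost + read_cost
    return total / len(freqs)


if __name__ == "__main__":
    def perfect(_params, _f):
        # Ideal two-port: no reflection, full transmission in both directions.
        return (0j, 1 + 0j, 1 + 0j, 0j)

    print(fitness([0.0] * 12, perfect))
```

A perfectly matched, lossless path scores zero under this cost, and any reflection or transmission loss at any sampled frequency raises the score, which is the behavior the minimization relies on.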
The above discussion shows that the problem is transformed into the optimization of a nonlinear function of multiple parameters. More precisely, 12 parameters need to be optimized in this problem. Because the standard PSO algorithm converges prematurely and easily falls into a local optimum, we present an improved particle swarm optimization algorithm for solving the impedance mismatch problem (IPSO-IMp).

IPSO for Impedance Mismatch Problem (IPSO-IMp).
In this section, the IPSO-IMp is proposed and described. Analyzing the particle movement during PSO optimization shows that the main reason particles fall into a local optimum is that the global optimal particle depends too heavily on the individual optimal solutions [14]. When the individual particles gather in a relatively concentrated area, their impact on the global optimal solution becomes slight. In fact, the leader should be responsible for the swarm movement in addition to reflecting the individual influence. Based on this, the proposed algorithm introduces the disturbance mechanism of [11] to perturb the particle swarm. In [11], however, the disturbance is carried out through the entire evolutionary process, which not only increases the computational complexity but also slows the initial evolution. Therefore, this paper proposes an adaptive method to judge dynamically whether the particles are in a local aggregation state. Only when local aggregation is detected is the perturbation strategy applied, so that the local optimum is avoided and the search ability is reactivated. As a result, both the convergence speed and the accuracy are improved. The improved method is as follows.

Particle Representation
Position. In the IPSO-IMp, the position of particle $i$ is represented by

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{i12}), \tag{19}$$

where $X_i$ is defined in the 12-dimensional space composed of the parameters in Table 1 and $j$ ($1 \le j \le 12$) is the index of the parameter. To ensure the local searching behavior and the population diversity, the initial values are randomly generated within the ranges in Table 1. In addition, because the four parameters with $9 \le j \le 12$ are discrete, each particle is defined as in (20). By these definitions of the particle position, each particle represents a feasible solution of the impedance mismatch problem in the IPSO-IMp.
Velocity. The velocity of a particle is defined as the change of its position. The velocity vector of each particle is represented by

$$V_i = (v_{i1}, v_{i2}, \ldots, v_{i12}), \tag{21}$$

where $v_{ij}$ is defined as the change of $x_{ij}$ ($1 \le j \le 12$).

Velocity Updating. The parameters have different ranges: the parameters with $1 \le j \le 8$ are continuous, while the four parameters with $9 \le j \le 12$ are discrete. Thus, the velocity is updated according to (22), in which the rule for the discrete parameters ($9 \le j \le 12$) applies a condition of the form $(p_{best} < x < g_{best}) \,\|\, (g_{best} < x < p_{best})$. In (22), $t$ denotes the iteration. $v_{ij}^{t}$ and $v_{ij}^{t+1}$ represent the velocity of particle $i$ at iterations $t$ and $t + 1$, respectively. $c_1$ and $c_2$ are the weights of the local optimum and the global optimum. $r_1$ and $r_2$ are random numbers. $p_{best}$ denotes the local optimum found by particle $i$ until iteration $t$. $g_{best}$ denotes the global optimum found by the neighbors of particle $i$. $x_{ij}^{t}$ represents the position of particle $i$ at iteration $t$.

Position Updating.
According to the velocity updating, the position updating in this paper adopts Algorithm 1, where $t$ denotes the iteration.
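Neither (22) nor the body of Algorithm 1 is reproduced in this text, so the following is a generic sketch of one PSO step for a particle with both continuous and discrete parameters, not the paper's exact rule. It assumes the textbook velocity update without an inertia term, clamping of continuous parameters to their range, and snapping of discrete parameters to the nearest allowed value. The function name, the default weights, and the lists of allowed values are hypothetical.

```python
import random


def update_particle(x, v, pbest, gbest, bounds, choices, c1=2.0, c2=2.0, rng=random):
    """One PSO step; returns (new_x, new_v).

    bounds:  (low, high) per continuous parameter (the first len(bounds) entries of x).
    choices: allowed values per discrete parameter (the remaining entries of x).
    """
    n_cont = len(bounds)
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        # Textbook update: pull toward the personal best and the global best.
        vj = v[j] + c1 * r1 * (pbest[j] - x[j]) + c2 * r2 * (gbest[j] - x[j])
        xj = x[j] + vj
        if j < n_cont:
            low, high = bounds[j]
            xj = min(max(xj, low), high)  # keep continuous parameters in range
        else:
            allowed = choices[j - n_cont]
            xj = min(allowed, key=lambda a: abs(a - xj))  # snap to nearest allowed value
        new_v.append(vj)
        new_x.append(xj)
    return new_x, new_v


if __name__ == "__main__":
    bounds = [(0.0, 2000.0)] * 8                         # illustrative continuous ranges
    choices = [(20.0, 30.0, 40.0, 60.0, 120.0)] * 4      # illustrative discrete values
    x = [500.0] * 8 + [40.0] * 4
    best = [800.0] * 8 + [60.0] * 4
    print(update_particle(x, [0.0] * 12, best, best, bounds, choices,
                          rng=random.Random(1)))
```

Snapping after the move keeps every particle a feasible parameter set at each iteration, at the cost of losing the fractional part of the discrete coordinates between steps.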

Avoid Local Optimal. According to the dynamic behavior of the particles, the optimal solution tends to a specific value after several iterations, which indicates that the particle swarm has been or will be trapped in a local optimal state along its existing trajectory. The global optimal value, which reflects the current state of the particle swarm, can be used to trigger the adaptive perturbation. Since the proposed fitness function aims to find the minimum value over successive iterations, relation (23) holds, where $g_{best,t}$ is the global optimum in iteration $t$. If $g_{best,t+1} = g_{best,avg}$, the algorithm is probably stagnant, which means that the particles cannot escape from the local optimum. Thus, the particle perturbation mechanism is required for intervention.
In summary, the algorithm IPSO-IMp includes the following steps.
Step 1. Define each particle in the 12-dimensional space by taking the 12 parameters of the fitness function as elements, and set the particle swarm size.
Step 2. Initialize the particle population randomly within the constraint ranges of the 12 parameters.
Step 3. Calculate the fitness value of each particle according to the fitness function established by (18) and record the individual and global optimal values.
Step 4. Update the velocity and position of each particle in accordance with (22) and Algorithm 1, aiming to seek the minimum value of the fitness function.
Step 5. Calculate the fitness value of each particle again according to (18) and update the global optimal value.
Step 6. According to the global optimal value, judge the particle aggregation state by (23). If $g_{best,t+1} = g_{best,avg}$ is satisfied, the particles are trapped in a local optimum, and the perturbation mechanism [11] is adopted to stimulate particle energy dynamically. If $g_{best,t+1} \neq g_{best,avg}$ is satisfied, the individual optimum is updated.
Step 7. Judge whether the iterative precision or the maximum number of iterations has been reached. If not, return to Step 4. If so, the optimal solution is obtained and the optimization process is finished.
Algorithm 1: Pseudocode for the position update procedure.
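Steps 1 to 7 can be sketched as the loop below. This is a simplified illustration under several assumptions, not the authors' implementation: all parameters are treated as continuous, an inertia weight `w` is added for stability, $g_{best,avg}$ is taken as the mean of the last `window` global best values, and the perturbation is a Gaussian kick applied to every particle. The names `ipso_sketch`, `window`, `tol`, and `sigma` are hypothetical.

```python
import random


def ipso_sketch(fitness, bounds, swarm=30, iters=200, c1=1.5, c2=1.5, w=0.7,
                window=5, tol=1e-12, sigma=0.1, seed=0):
    """PSO with a perturbation that fires only when the global best stagnates."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Steps 1-3: build the swarm, initialize randomly, record the optima.
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm)]
    vs = [[0.0] * dim for _ in range(swarm)]
    pbest = [x[:] for x in xs]
    pbest_f = [fitness(x) for x in xs]
    g = min(range(swarm), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    history = [gbest_f]
    for _ in range(iters):
        # Steps 4-5: move every particle and refresh the optima.
        for i in range(swarm):
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vs[i][j] = (w * vs[i][j] + c1 * r1 * (pbest[i][j] - xs[i][j])
                            + c2 * r2 * (gbest[j] - xs[i][j]))
                xs[i][j] = min(max(xs[i][j] + vs[i][j], lo), hi)
            f = fitness(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
        # Step 6: stagnation test against the recent average of the global best.
        recent = history[-window:]
        stagnant = (len(recent) == window
                    and abs(gbest_f - sum(recent) / window) <= tol)
        if stagnant:
            for i in range(swarm):
                for j, (lo, hi) in enumerate(bounds):
                    kick = rng.gauss(0.0, sigma * (hi - lo))
                    xs[i][j] = min(max(xs[i][j] + kick, lo), hi)
        history.append(gbest_f)
    # Step 7 is reduced here to a fixed iteration budget.
    return gbest, gbest_f, history


if __name__ == "__main__":
    sphere = lambda x: sum(t * t for t in x)  # simple stand-in for the fitness (18)
    best, best_f, hist = ipso_sketch(sphere, [(-5.0, 5.0)] * 3)
    print(best_f)
```

Because the kick is applied only after the global best has stopped improving, the early iterations run as ordinary PSO, which matches the stated goal of keeping the initial efficiency while still escaping local aggregation later.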

Experimentation and Simulation
The simulation experiment is conducted on the interconnect bus between a multicore DSP processor and DDR3, taking the IBIS models of the TI TMS320C6678 and the Samsung K4B4G1646B as simulation models, respectively. Considering the typical PCB routing and the signal quality, the transmission line connecting the DSP and DDR3 is abstracted into 2 vias and 3 sections of transmission line. The lengths of these transmission lines are $l_1$, $l_2$, and $l_3$, respectively, and their impedances are $Z_1$, $Z_2$, and $Z_3$, respectively.
To be consistent with the actual design, the specific parameters for simulation are listed in Table 2.

Optimization Result.
To illustrate the performance of the IPSO-IMp algorithm, the standard PSO, IPSO [11], and IPSO-IMp are, respectively, used to optimize the 12 parameters of the fitness function established in this paper. The comparison is shown in Table 3. IPSO-IMp and IPSO show an obvious advantage in both speed and accuracy over the standard PSO. The standard PSO and IPSO-IMp perform roughly equally in the early stage of the optimization. Yet, the standard PSO is in a state of local aggregation during the later stage. In that case, particle movement is restricted to a small local area, so the optimal value of the region cannot be found. Because of effective perturbation, the IPSO algorithm keeps the particles active in the later stage. A performance comparison chart of the three optimization algorithms is shown in Figure 6.
The IPSO-IMp algorithm guarantees the efficiency of the early optimization. In the later optimization, once the particles are found to be in local aggregation, the movement perturbation is launched, which reactivates the particles so that they find better values nearby and reach the global optimum. Although the optimization results of IPSO and IPSO-IMp are approximately equal, the iteration speed of IPSO-IMp is nearly 12% faster than that of IPSO. That is because the latter introduces perturbation too early, which increases the number of iterations. Compared with the standard PSO algorithm, IPSO-IMp improves the accuracy significantly and increases the speed by nearly 33%. In addition, the same circuit is optimized using the approach given in [4] to illustrate the advantages of our method. Similarly, the routing between the DSP and DDR3 is divided into 3 sections. The optimization results are shown in Table 4.
When the radius and the length of the via are less than or equal to 6 mil and 1.6 mm, respectively, the impedance continuity of the signal transmission changes only slightly. For the actual 12-layer design, taking the board fabrication technology into account, the via radius $r_{via}$ is set to 6 mil and the via length $h_{via}$ is set to 1.6 mm; that is, the board thickness is 1.6 mm.

Validations and Analysis.
ADS simulation software is used to verify the validity of the optimized parameters from two aspects, the frequency domain and the time domain. Reading and writing circuits are designed separately in our experiments, with parameters configured according to Table 4. The values of the routing parameters and via parameters in both circuits are identical, while the source resistance and the load resistance are assigned their best values for reading and writing.

Frequency Domain Analysis.
During the transmission of a high-speed signal, the transmission coefficient should be close to 0 dB if the impedance of the transmission path is completely matched. Yet, due to the influence of many factors such as routing density and the via parasitic effect, impedance discontinuity emerges in the transmission path, leading to signal reflection and thus signal loss. Simulation results in the frequency domain are shown in Figure 7. The preoptimized design (before OPT) does not consider the various factors causing impedance discontinuity, which mainly comes from improper selection of the transmission line values, via parameters, and ODT; thus, the signal at the receiving end suffers from serious loss. According to the simulation result after optimization, our method (proposed) clearly reduces signal loss compared with the strategy presented in [4]. Our method takes the parasitic effect of the via as one of the optimization parameters, while [4] considers the transmission line only, without taking the via into account. At low frequency (<1 GHz), the via effect is not obvious and the signal quality of the two methods is roughly equal. Yet, the proposed optimization strategy displays distinct advantages as the frequency increases, especially beyond 1.5 GHz. At 1.6 GHz, the proposed method achieves better signal quality than [4] by 1.3 dB. Furthermore, the experimental results show that the advantage becomes more obvious as the signal frequency increases.
There are two main reasons. Firstly, the parasitic effect of the via is not obvious at low frequency, but its impact on signal reflection becomes apparent as the frequency increases. With a proper selection of the via parameters in the signal optimization, the characteristic impedance of the transmission line is matched, which improves the impedance continuity. Secondly, IPSO-IMp obtains better accuracy than the standard PSO.
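To put the 1.3 dB figure in linear terms, the conversion below can be used. It is a small sketch that assumes the improvement refers to the transmission magnitude in the usual $20 \log_{10}$ convention; the function names are hypothetical.

```python
import math


def s21_to_db(s21: complex) -> float:
    """Transmission magnitude in dB (0 dB means the full signal amplitude arrives)."""
    return 20.0 * math.log10(abs(s21))


def db_to_amplitude_ratio(db: float) -> float:
    """Voltage (amplitude) ratio corresponding to a difference in dB."""
    return 10.0 ** (db / 20.0)


def db_to_power_ratio(db: float) -> float:
    """Power ratio corresponding to a difference in dB."""
    return 10.0 ** (db / 10.0)


if __name__ == "__main__":
    print(f"1.3 dB -> amplitude x{db_to_amplitude_ratio(1.3):.2f}, "
          f"power x{db_to_power_ratio(1.3):.2f}")
```

Under that assumption, 1.3 dB corresponds to roughly 1.16 times the received amplitude, or about 1.35 times the received power.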

Time Domain Analysis.
To analyze the effectiveness of our optimization strategy in the time domain, this section presents the eye diagram simulation of the DDR3 reading and writing circuits. The data rate is set to 1.6 Gbps and each data line transmits random data. The simulated eye diagram before optimization is shown in Figure 8. Overshoots and undershoots can be observed, which indicates that signal reflection exists due to impedance discontinuities and that the quality of the eye diagrams is poor. Figure 9 shows the simulated eye diagram when using the strategy in [4]. The quality is effectively improved compared with that before optimization. The signal reflection is significantly weakened, with the eye height increased by 155 mV on average for reading and writing. The simulated eye diagram of the DDR3 reading and writing circuits obtained with the proposed approach is given in Figure 10. As shown, the eye quality is improved even more clearly. Compared with the former optimization, the eye height is improved by 173 mV on average. The detailed results of the eye diagram simulation are listed in Table 5.

Conclusions
This paper presented an enhanced optimization strategy for DDR3 high-speed bus design, which aims to reduce the signal reflection. The strategy incorporates the parasitic effects of the via into the equivalent circuit model through theoretical derivation. Additionally, we proposed an improved particle swarm algorithm to optimize the parameters in high-speed bus design. The frequency domain and time domain experiments demonstrate that the proposed strategy can effectively improve the impedance continuity of the transmission line and reduce the reflection effects of high-speed signals. The superiority becomes even more obvious as the data transfer rate increases. The strategy provides a reference for the design of DDR4 and even higher speed buses.