An Implement of FPGA Based PCI Controller Device and Improvement of DDA Arc Interpolation

This paper develops a new kind of PCI slave device that serves as the motion controller of a biaxial motion control system. Instead of a traditional PCI interface chip, the controller embeds a deeply customized PCI interface block, which improves the overall performance of the device. In addition, we improve the widely used DDA arc interpolation algorithm in both accuracy and stability and integrate it into the device, so that the moving parts can follow nonlinear curved paths. The controller has been successfully applied to a surface mount machine that was also developed by our lab, where it satisfies the accuracy and velocity requirements of the machine and shows good reliability and stability.


Introduction
A biaxial motion control system is a kind of electromechanical system widely used in areas such as industrial manufacturing and commodity production [1]. Plane coordinate plotters and surface mount machines are two representative kinds of biaxial systems. In general, such systems share a requirement for high speed and accuracy.
For example, the plane coordinate plotter described in [2] is a typical biaxial system. It consists of a motion platform, turn-screws, stepping motors, a plotting cursor, a microcontroller, and a computer interface. During plotting, the microcontroller moves the cursor to a given position according to the instructions from the computer [2]. To draw the desired curve quickly and accurately, both high-speed data transfer and reliability must be guaranteed. The surface mount machine is also a sophisticated biaxial motion system [3], with even more demanding performance requirements.
In this paper, we propose an implementation of a motion control board intended specifically for biaxial motion systems. The board is designed as a slave device conforming to the PCI bus protocol, allowing fast data transactions between the upper control master and the slave device at a peak rate of up to 132 MB/s (32-bit bus at a 33 MHz clock).
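As a quick check of the quoted peak rate, the figure follows if one 32-bit data phase completes on every clock of an ideal burst with no wait states. This is an idealization assumed here for illustration; real transfers also spend clocks on address phases and turnaround cycles.

```latex
\[
\frac{32\ \text{bit}}{8\ \text{bit/byte}} \times 33 \times 10^{6}\ \text{s}^{-1}
= 4\ \text{B} \times 33 \times 10^{6}\ \text{s}^{-1}
= 132\ \text{MB/s}
\]
```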
In terms of board-level design, the traditional scheme usually includes a microcontroller, an FPGA, and an extra PCI interface device, such as the PLX9054 [4, 5]. However, this kind of design is not compact enough. Instead of an independent PCI interface chip, we embed the PCI protocol decoding block, written in Verilog HDL, in an FPGA device on the board. This removes a large number of board traces, which improves the reliability of the board and cuts the total cost. More importantly, the board is easy to update to a new design version and can be deeply customized to customer demands. Furthermore, the compact design is easier to protect from imitation and plagiarism.
Another important issue for a biaxial motion system is arc motion, that is, how to move the cursor along a curve rather than a line [6]. This requires a systematic motion algorithm to coordinate the velocities and positions of the motors of the two axes. To solve this problem, an algorithm called the digital differential analyser (DDA) has been proposed [7, 8]. DDA is often used as a motion interpolation algorithm. When moving the cursor of a biaxial motion system, for example an X-Y plotter, DDA breaks the track into a series of micro parts. Each part is covered by one step of the cursor, which is realised by moving the cursor a micro distance along one axis. As a result, a curve is covered in a sequence of micro steps. The basic principle of DDA arc interpolation is discussed later.
Although the basic problem of arc motion has been solved by the DDA arc interpolation algorithm, other problems such as motion stability and smoothness still affect its performance. For example, this kind of DDA algorithm, which we call traditional DDA, tends to cause unwanted sawteeth on the moving track and often produces large errors when the arc radius is large. To overcome these defects, we modified the algorithm so that the moving parts of a motion system track a given curve within tolerable errors.
The improved DDA algorithm is written in Verilog HDL and embedded in an FPGA device on the board.

Device System Structure
The overall structure of the motion control board is shown in Figure 1. The board communicates with the upper system through the PCI system bus, which offers a large data transmission bandwidth [9]. Meanwhile, the board works under the control of the microcontroller unit (MCU) and drives the motors with pulse signals produced by the DDA interpolation block or distributed by the upper system.
2.1. FPGA Device. As shown in Figure 1, most of the functional logic modules are written in Verilog HDL and implemented in the FPGA device. As a result, off-chip routing is greatly reduced and the reliability of the entire board is improved.

PCI Protocol Decoding Block.
PCI is a multiplexed bus, which makes decoding the PCI protocol relatively complicated. In general, the main task of the protocol decoding block is to separate the address and data carried on the multiplexed AD pins [10], as explained in detail later.
2.1.2. DDA Interpolation Block. The DDA interpolation block used in this board is written in Verilog HDL and implemented as a module in the FPGA device. Thus, the speed of operation is higher and fewer resources are occupied. This part is also discussed in detail later.

Functioning Registers.
By writing data to the functioning registers during I/O transactions, the system is able to control the board device in different modes, which makes the board system rather flexible to use.

DATA FIFO.
Since transactions on PCI bus are far faster than those on back-end bus [11], a FIFO or RAM block is necessary to serve as a buffer in order to balance the speed difference.
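The buffering role described above can be illustrated with a small software model. This is only a behavioural sketch in Python, not the FPGA implementation: the class name `BoundedFifo`, its `depth` parameter, and the `push`/`pop` interface are hypothetical names chosen for illustration.

```python
from collections import deque


class BoundedFifo:
    """Behavioural model of a fixed-depth FIFO with full/empty flags (hypothetical)."""

    def __init__(self, depth):
        if depth <= 0:
            raise ValueError("depth must be positive")
        self.depth = depth
        self._words = deque()

    @property
    def empty(self):
        return len(self._words) == 0

    @property
    def full(self):
        return len(self._words) == self.depth

    def push(self, word):
        """Fast (PCI) side: returns False when full, so the writer has to wait."""
        if self.full:
            return False
        self._words.append(word)
        return True

    def pop(self):
        """Slow (back-end) side: returns None when there is nothing to read."""
        if self.empty:
            return None
        return self._words.popleft()
```

When `push` returns False, the fast side has to stall until the slower back-end has drained at least one word.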

Instruction Register and Pulse Output Register.
The instruction register stores the control instructions delivered from the system. According to these instructions, a series of frequency-controlled pulses is produced for the X-axis and Y-axis motor drivers, driving the cursor to the designated position.

Level Conversion Chips.
The signal level on the PCI bus is 5 V-TTL, while it is 3.3 V-CMOS on the pins of the FPGA device. Thus, bidirectional bus switches such as the 74CBT3384 are needed to serve as level converters between the two levels.

Microcontroller Unit.
The microcontroller unit (MCU) serves as the central controller, making the system function according to the program stored in the chip.

Other Essential Components.
Components on board also include CAN controller and other bus connectors.

PCI Bus Protocol Decoding Block
Although some PCI interface chips, such as PCI9054 [12], are available for this system, we use the PCI protocol decoding block embedded in FPGA device as the PCI bus interface.

Functions of Decoding Block
3.1.1. Device Configuration. The PCI protocol decoding block provides the necessary information to the PCI master system when the system boots, including the device ID, vendor ID, resource requirements, and function options [13]. After this, the base address of the memory resource assigned by the system is written back into the PCI device and stored by the decoding block.
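The base-address handshake can be modelled in a few lines. The sketch below assumes a single 32-bit, non-prefetchable memory base address register (BAR) and follows the conventional PCI sizing procedure, in which the host writes all ones and reads back a mask whose hardwired-zero low bits encode the requested size. The class `MemoryBar`, the function `probe_size`, and the 4 KiB size and address used in the example are illustrative assumptions, not values from the board.

```python
class MemoryBar:
    """Device-side model of one 32-bit memory BAR (illustrative sketch)."""

    def __init__(self, size):
        # The requested size must be a power of two of at least 16 bytes.
        if size < 16 or size & (size - 1):
            raise ValueError("size must be a power of two >= 16")
        self.mask = ~(size - 1) & 0xFFFFFFFF  # writable address bits
        self.value = 0

    def write(self, data):
        # Low bits are hardwired to zero, so only the masked bits are stored.
        self.value = data & self.mask

    def read(self):
        # Flag bits 3:0 read as 0: 32-bit, non-prefetchable memory space.
        return self.value


def probe_size(bar):
    """Host-side sizing: write all ones, read back, invert, add one."""
    bar.write(0xFFFFFFFF)
    readback = bar.read()
    return (~(readback & 0xFFFFFFF0) & 0xFFFFFFFF) + 1


bar = MemoryBar(4096)
size = probe_size(bar)      # host learns the resource requirement
bar.write(0xFEB01234)       # host assigns a base address (hypothetical value)
base = bar.read()           # device keeps only the aligned base
```

After sizing, the value left in the BAR is the aligned base address that the decoding block stores for later address comparison.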

Address Decoding.
When an access occurs on the PCI bus, the PCI device should check whether it is being addressed by the system [14]. If the access address on the bus falls within the address space of the back-end device, the device should respond to the system immediately (usually within 3 cycles according to the PCI protocol [15]). Wait cycles are inserted to cover data latency [16].
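A minimal sketch of the hit test follows, assuming the assigned window is a power-of-two size and aligned to that size, as is the case for a memory BAR. The function name `address_hit` and the example addresses are hypothetical.

```python
def address_hit(address, bar_base, bar_size):
    """Return True when `address` falls inside the window claimed by the device.

    Assumes `bar_size` is a power of two and `bar_base` is aligned to it,
    so the comparison reduces to matching the upper address bits.
    """
    if bar_size <= 0 or bar_size & (bar_size - 1):
        raise ValueError("bar_size must be a power of two")
    mask = ~(bar_size - 1) & 0xFFFFFFFF
    return (address & mask) == (bar_base & mask)


hit = address_hit(0xFEB01004, 0xFEB01000, 0x1000)
miss = address_hit(0xFEB02000, 0xFEB01000, 0x1000)
```

Comparing only the upper address bits is cheap to do in logic, since it needs a single equality comparator per window.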

Protocol Timing
There are 3 types of transactions on the PCI bus. A configuration transaction usually occurs as soon as the control board is inserted into the system motherboard, while I/O transactions are used for parameter settings. A memory transaction takes place during a data transfer operation.
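On the bus, the transaction type is signalled by the 4-bit command on the C/BE# lines during the address phase. The sketch below maps the six command codes relevant to the three transaction types named above; the table `PCI_COMMANDS` and the function `classify_command` are illustrative names, and the remaining PCI commands (for example burst variants of memory reads) are deliberately left out.

```python
# Command codes as driven on C/BE#[3:0] during the address phase.
PCI_COMMANDS = {
    0b0010: ("io", "read"),
    0b0011: ("io", "write"),
    0b0110: ("memory", "read"),
    0b0111: ("memory", "write"),
    0b1010: ("configuration", "read"),
    0b1011: ("configuration", "write"),
}


def classify_command(cbe):
    """Return (space, direction) for a supported command, else None."""
    return PCI_COMMANDS.get(cbe & 0xF)


kind = classify_command(0b1010)
```

A target that receives a command it does not support simply does not claim the transaction.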

Embedded DDA Interpolation Algorithm
The digital differential analyser (DDA), usually serving as an interpolation algorithm, is widely applied in modern numerical control systems [17]. It is used when shifting the moving parts, or cursors, of a system to a designated position along given tracks, especially curved tracks. Generally speaking, the paths of cursors controlled by numerical signals will not perfectly match the given continuous track. Therefore, the main issue is how to plan a path for a cursor that approaches the given track as closely as possible. Based on integral theory, the DDA arc interpolation algorithm breaks a continuous track into a series of discrete points that the cursor of a numerical system is able to reach [18].

Basic Principle of DDA Arc Interpolation.
The DDA interpolation algorithm for an arc in Cartesian coordinates is shown in Figure 3 [19]. To simplify the analysis, we take the first quadrant as an example. According to Figure 3, we have [19]

$$\frac{V}{R} = \frac{V_x}{Y_i} = \frac{V_y}{X_i} = K,$$

where $(X_0, Y_0)$ is the starting point and $(X_e, Y_e)$ is the ending point, while $(X_i, Y_i)$ stands for the current position of the cursor. $R$ is the radius of the arc, while $V$ is the tangential velocity.
$V_x$ and $V_y$, respectively, stand for the velocity along the $X$-axis and the $Y$-axis, and $K$ is a proportional constant if we assume that the tangential velocity $V$ of the moving part is constant. Therefore, we have [18]

$$V_x = K Y_i, \qquad V_y = K X_i.$$

Accumulating these components over the $n$ steps that it takes the cursor to reach the ending point $(X_e, Y_e)$ starting from $(X_0, Y_0)$ gives the summation form (4) [17]. According to formula (4), we get the DDA arc interpolation algorithm [18]. In order to describe the algorithm briefly, we define two arithmetic expressions as follows:

$$\delta(P) = \begin{cases} 1, & P \text{ is true}, \\ 0, & P \text{ is false}, \end{cases}$$

where $P$ is a logic expression. Also, we define

$$\operatorname{mod}(a, b) = r,$$

where $r$ is the remainder when $a$ is divided by $b$.
At the beginning, the $X$-axis integrand register is loaded with $Y_0$, while the $Y$-axis integrand register is loaded with $X_0$. At the same time, the accumulators of the two axes are usually half-loaded [20]. In other words, the highest bit is set to "1" while the other bits remain "0". These are the initial conditions. The integral clock then drives each accumulator to add the value stored in the corresponding integrand register, producing overflow pulses, which drive the integrand registers to update their values with the new current coordinate $(X_i, Y_i)$. This gives the recursion formulae (8). The algorithm keeps iterating until the error check registers indicate that the cursor has reached the ending point, or is within tolerable error of it, after which the iteration stops. Thus, the ending condition can be expressed as

$$\text{End} = \delta(X_i = X_e,\ Y_i = Y_e).$$

When End equals "1", the recursion stops. The points $(X_i, Y_i)$, which form the path of the cursor, are the desired result.
It is worth mentioning that left-shift normalization is often used to maintain velocity stability [21]; it is not discussed in depth here.
We call this kind of DDA algorithm, summarized in Figure 4, the traditional DDA algorithm. The logic structure of the traditional DDA arc interpolation block embedded in the FPGA is shown in Figure 5.
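The accumulate-and-overflow procedure described above can be sketched in software. The Python below is a minimal illustration under stated assumptions, not the Verilog block of Figure 5: it assumes a counterclockwise arc in the first quadrant centred on the origin, integer coordinates, and registers that are `bits` wide, and it stops each axis individually once that axis reaches its end coordinate. The function name `dda_arc_ccw` and its parameters are hypothetical.

```python
def dda_arc_ccw(x0, y0, xe, ye, bits):
    """Traditional DDA arc interpolation (sketch).

    Assumptions: counterclockwise arc in the first quadrant, centre at the
    origin, integer coordinates, `bits`-wide registers.
    """
    q = 1 << bits                          # accumulators overflow at 2**bits
    if not (0 <= xe <= x0 < q and 0 <= y0 <= ye < q):
        raise ValueError("expected a first-quadrant CCW arc that fits the registers")
    x, y = x0, y0
    acc_x = acc_y = q >> 1                 # half-loaded: only the highest bit is set
    path = [(x, y)]
    for _ in range(4 * q * q):             # generous guard against a stuck axis
        if x == xe and y == ye:            # ending condition
            return path
        jx, jy = y, x                      # X integrand holds Y_i, Y integrand holds X_i
        step_x = step_y = 0
        if x != xe:                        # this axis still has distance to cover
            acc_x += jx
            if acc_x >= q:                 # overflow pulse on the X axis
                acc_x -= q
                step_x = 1
        if y != ye:
            acc_y += jy
            if acc_y >= q:                 # overflow pulse on the Y axis
                acc_y -= q
                step_y = 1
        if step_x or step_y:
            x -= step_x                    # counterclockwise: X decreases
            y += step_y                    # counterclockwise: Y increases
            path.append((x, y))
    raise RuntimeError("end point not reached")


small_path = dda_arc_ccw(4, 0, 0, 4, bits=3)
```

For a larger arc, a call such as `dda_arc_ccw(100, 0, 0, 100, bits=7)` can be used in the same way. Stopping each axis separately is one common way to guarantee that the end point is reached; the error check registers mentioned above may behave differently.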

Improved DDA Arc Interpolation.
The traditional DDA arc interpolation algorithm summarized in Figure 4 has some problems. The most serious one is that when the radius of the arc is far larger than the step length, the errors can be intolerable. For instance, when the step length is 1 and the radius is 100, the simulation result is shown in Figure 6.
One way to solve this problem is to multiply the integrand by an appropriate weighting factor before adding it to the corresponding accumulator. Here, we define the weighting factor as $\lambda$, and the recursion formulae (8) are changed accordingly. When $\lambda = 0.125$, with the other conditions the same as in Figure 6, the simulation result is shown in Figure 7, from which it can be concluded that a suitably small $\lambda$ improves the performance of DDA when the radius is large. We call this kind of DDA algorithm weighted DDA arc interpolation.
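The effect of the weighting factor can be tried out with a small software model. The sketch below is an illustration under assumptions, not the formulae of the paper: it assumes a counterclockwise first-quadrant arc centred on the origin, `bits`-wide accumulators that are half-loaded at start, and per-axis stopping at the end coordinate. The function name `weighted_dda_arc_ccw` and the parameter `weight` (standing for the weighting factor) are hypothetical. Exact fractions are used so that fractional accumulator contents are not rounded.

```python
from fractions import Fraction


def weighted_dda_arc_ccw(x0, y0, xe, ye, bits, weight):
    """Weighted DDA arc interpolation (sketch): add weight * integrand per tick."""
    q = 1 << bits
    lam = Fraction(weight)                 # exact value, e.g. 0.125 -> 1/8
    if not 0 < lam <= 1:
        raise ValueError("weight must be in (0, 1]")
    if not (0 <= xe <= x0 < q and 0 <= y0 <= ye < q):
        raise ValueError("expected a first-quadrant CCW arc that fits the registers")
    x, y = x0, y0
    acc_x = acc_y = Fraction(q, 2)         # half-loaded accumulators
    path = [(x, y)]
    for _ in range(int(4 * q * q / lam)):  # guard against a stuck axis
        if x == xe and y == ye:
            return path
        jx, jy = y, x                      # integrands sampled before this tick's steps
        step_x = step_y = 0
        if x != xe:
            acc_x += lam * jx              # weighted accumulation on the X axis
            if acc_x >= q:
                acc_x -= q
                step_x = 1
        if y != ye:
            acc_y += lam * jy              # weighted accumulation on the Y axis
            if acc_y >= q:
                acc_y -= q
                step_y = 1
        if step_x or step_y:
            x -= step_x
            y += step_y
            path.append((x, y))
    raise RuntimeError("end point not reached")


half_weight_path = weighted_dda_arc_ccw(4, 0, 0, 4, bits=3, weight=0.5)
```

In this model a smaller weight means more clock ticks per output step, so each tick advances the arc by a smaller angle, at the cost of a lower feed rate for the same clock.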
Another problem is that even when the errors are small enough, a great number of "sawteeth" can be seen on the path, as shown in Figure 8, which may cause constant mechanical shocks to the system. The main reason is that the accumulators of the 2 axes function separately. Thus, the motion on each axis proceeds separately as well, unless both accumulators overflow at the same time.
Generally, a sawtooth consists of a y-axis step motion closely followed by an x-axis step motion, as shown in Figure 9. A sawtooth always occurs when the accumulator of one axis has overflowed while the other is going to overflow in the next step.
Therefore, we can perform the addition on the latter accumulator in advance in order to produce an early overflow, and the recursion formulae are rewritten accordingly. As a result, the cursor takes one "combined" step instead of two separate steps, eliminating the sawtooth in advance. With the same conditions as in Figure 7, the simulation result of the improved DDA arc interpolation is shown in Figure 10. The detailed comparison of the simulation results between the typical DDA algorithm and the improved one is shown in Figure 11, from which it is clear that most of the sawteeth on the path are eliminated. We call this kind of algorithm the improved DDA algorithm. In order to implement the improved DDA algorithm on the FPGA device, the overflow conditions have to be changed, and the accumulator registers need 2 extra bits. One saves the carry bit caused by overflow, while the other is a sign bit, since the value of an accumulator can be negative. The logic structure of the improved algorithm is shown in Figure 12.
In order to measure the performance of the algorithms, we define the path variance $\sigma_{\mathrm{arc}}$ of an interpolated arc in terms of $R$, the radius of the curve path, and $n$, the total number of interpolating steps from the start point to the end point.
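One plausible reading of such a metric is the mean squared radial deviation of the interpolated points from the ideal arc. The sketch below implements that reading as an assumption; the exact definition used for the reported figures may differ. The function name `path_variance` is hypothetical, and the arc is assumed to be centred on the origin.

```python
import math


def path_variance(path, radius):
    """Mean squared radial deviation of path points from a circle of `radius`.

    Assumed definition: (1/n) * sum((sqrt(x**2 + y**2) - radius)**2),
    with the arc centred on the origin.
    """
    if not path:
        raise ValueError("path must contain at least one point")
    return sum((math.hypot(x, y) - radius) ** 2 for x, y in path) / len(path)


on_circle = path_variance([(3, 4), (5, 0), (0, 5)], 5)
off_circle = path_variance([(6, 0), (4, 0)], 5)
```

Points that lie exactly on the circle contribute zero, so a lower value indicates a path that stays closer to the ideal arc.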
We computed the path variance for the typical DDA algorithm under the conditions of Figure 6, for the weighted DDA algorithm of Figure 7, and for the sawteeth-eliminating DDA of Figure 10, all under the same conditions. From the data, the improved DDA algorithm has the smallest path variance. Thus, it can be concluded that our sawteeth-eliminating DDA algorithm performs better than the traditional one.

Pull-Up Resistors.
Every control signal pin should be assigned a pull-up resistor so that these pins do not float when not driven. A PCI slave device developer need not take care of this, since it has been done on the system motherboard.

Signal Integrity Test on PCI Pins.
A simple series of signal integrity tests has been conducted on the waveforms at the PCI pins (the golden fingers) of the PCI device board [22, 23]. The first test drives a 20 MHz square wave from the board onto one golden finger and observes it with an oscilloscope (20 MHz is the maximum frequency that the board can generate, which is still close to the 33 MHz PCI clock). The result is shown in Figure 13, from which it can be seen that the waveform is clean, with a steep rising edge and acceptable overshoot. The second test drives a 20 MHz square wave onto one finger while measuring an adjacent finger. This test judges how much interference a high-frequency signal on one pin can cause on other PCI pins, especially adjacent ones. The result is shown in Figure 14. The waveform on the measured finger (the lower one) resembles that on the driven finger (the upper one), but its amplitude is far smaller and does not reach the threshold level. Therefore, although electromagnetic interference does appear on pins adjacent to the output, it does no harm to the functions of the system.
The third test drives 2 waveforms onto 2 adjacent fingers and observes the signal integrity of both in order to measure the coupling interference. The result is shown in Figure 15. Both waveforms remain clean, without significant distortion or interference. From the three tests, we conclude that the signal integrity of the board is good and that the harmful impact of electromagnetic interference is negligible.

Conclusion
In this paper, a motion control board for biaxial motion systems is presented. The design scheme of the control board is reasonable and satisfies the requirements of biaxial motion systems. The PCI protocol decoding block is self-designed and functions well. More importantly, we improved the typical DDA arc interpolation algorithm, broadened its application, and reduced its negative effect, the sawteeth. The board has been put into use by our lab, with good results.

Figure 1: Systematic structure of the control board.

Figure 2: State transition of the PCI protocol decoding block.

Figure 6: Huge errors of traditional DDA algorithm.

Figure 7: Better performance of weighted DDA algorithm.

Figure 12: Logic structure of improved DDA arc interpolation.
The device PCB has four layers: top layer, bottom layer, power plane, and ground plane. The power plane, in particular, should be divided into several power districts. Where possible, high-speed signal traces should not cross between two different power districts; otherwise, adjust the direction of the slit to minimize the impact.
5.1.3. Power Decoupling. Every Vcc pin of every digital chip is assigned a decoupling capacitor connected to ground, and every power pin is allocated a 0.047 μF electrolytic capacitor and a 0.01 μF nonpolar capacitor. In addition, the pads and vias of the decoupling capacitors are placed no farther than 0.25 inches from the corresponding Vcc pins or "golden fingers", and the trace width is larger than 0.02 inches.