The main purpose of this paper is to develop a new kind of PCI slave device that serves as a motion controller for biaxial motion control systems. The controller takes a different approach to PCI device design: it embeds a deeply customized PCI interface block in place of a traditional PCI interface chip, which is intended to improve the overall performance of the device. In addition, we improve the widely used DDA arc interpolation algorithm in both accuracy and stability and integrate it into the device, so that the moving parts can follow nonlinear curved paths. The controller has been applied on a surface mount machine that was also developed by our lab, where it performs well and satisfies the machine's accuracy and velocity requirements. It has also proved reliable and stable.
A biaxial motion control system is a kind of electromechanical system widely used in areas such as industrial manufacturing and commodity production [
For example, a plane coordinate plotter mentioned in [
In this paper, an implementation of a motion control board specifically intended for biaxial motion systems is proposed. The board is designed as a slave device conforming to the PCI bus protocol, allowing fast data transactions between the upper-level control master and the slave device, with a theoretical peak rate of 132 MB/s (32-bit bus at a 33 MHz clock).
At the board level, the traditional scheme usually includes a microcontroller, an FPGA, and a separate PCI interface device, such as the PLX9054 [
Another important issue that must be carefully considered for a biaxial motion system is arc motion, that is, how to move the cursor along a curve rather than a line [
Although the basic problem of arc motion has been solved by the DDA arc interpolation algorithm, other problems such as motion stability and smoothness still limit its performance. For example, this algorithm, which we call traditional DDA, tends to produce unwanted sawteeth on the track and often produces large errors when the arc radius is large. To overcome these defects, we modified the algorithm so that the moving parts of a motion system can track a given curve within tolerable error.
The improved DDA algorithm is written in Verilog HDL and embedded in an FPGA device on the board.
The entire system structure of the motion control board is shown in Figure
Systematic structure of the control board.
As shown in Figure
PCI is a multiplexed bus, which makes decoding the PCI protocol relatively complicated. In general, the main task of the protocol decoding block is to separate address and data from the multiplexed AD pins [
The DDA interpolation block used on this board is written in Verilog HDL and implemented as a module in the FPGA device. Implemented this way, it runs faster and occupies fewer resources. This block is discussed in detail below.
By writing data to the function registers during I/O transactions, the system can operate the board in different modes, which makes the board flexible to use.
Since transactions on PCI bus are far faster than those on back-end bus [
The instruction register stores the control instructions delivered from the system and thus produces a series of frequency-controlled pulses according to the instructions to
The signal level on the PCI bus is 5 V TTL, while the pins of the FPGA device use 3.3 V CMOS. Bidirectional bus switches such as the 74CBT3384 are therefore needed as level converters between the two.
The microcontroller unit (MCU) serves as the central controller, making the system operate according to the program stored in the chip.
Components on the board also include a CAN controller and other bus connectors.
Although some PCI interface chips, such as PCI9054 [
The PCI protocol decoding block provides the necessary information to the PCI master system when the system starts up, including device ID, vendor ID, resource requirements, and function options [
When an access occurs on the PCI bus, the PCI device should check whether it is the target of that access [
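As an illustrative sketch only (a behavioral model in Python, not the Verilog used on the board), the configuration response and the address check can be modeled as follows. The vendor ID, device ID, and window size are hypothetical placeholders. The register offsets (0x00 for the ID dword, 0x04 for the Command register, 0x10 for the first base address register) follow the standard PCI configuration header, and the model assumes a single I/O window.

```python
# Behavioral model of a PCI target's configuration space (sketch only).
VENDOR_ID = 0x1234      # hypothetical placeholder
DEVICE_ID = 0x5678      # hypothetical placeholder
BAR0_SIZE = 0x100       # hypothetical: 256-byte I/O window (power of two)
MASK32 = 0xFFFFFFFF


class ConfigSpace:
    def __init__(self):
        self.command = 0    # Command register, bit 0 = I/O space enable
        self.bar0 = 0       # base address assigned by the host

    def read_dword(self, offset):
        if offset == 0x00:                  # Device ID (high) | Vendor ID (low)
            return (DEVICE_ID << 16) | VENDOR_ID
        if offset == 0x04:                  # Status (left at 0 here) | Command
            return self.command
        if offset == 0x10:                  # BAR0; bit 0 = 1 marks an I/O BAR
            return self.bar0 | 0x1
        return 0                            # unimplemented registers read as 0

    def write_dword(self, offset, value):
        if offset == 0x04:
            self.command = value & 0xFFFF
        elif offset == 0x10:
            # Address bits below the window size are hard-wired to 0, which is
            # how the host discovers the resource requirement.
            self.bar0 = value & MASK32 & ~(BAR0_SIZE - 1)

    def is_hit(self, io_address):
        """True when an I/O access falls inside the window claimed by BAR0."""
        enabled = bool(self.command & 0x1)
        return enabled and self.bar0 <= io_address < self.bar0 + BAR0_SIZE


def probe_bar0_size(cfg):
    """Host-side sizing: write all ones, read back, decode the window size."""
    cfg.write_dword(0x10, MASK32)
    readback = cfg.read_dword(0x10)
    return (~(readback & 0xFFFFFFFC) & MASK32) + 1
```

In this model the device claims an access only after the host has both assigned a base address and set the I/O enable bit, which mirrors the order in which a host normally configures a device.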
Decoding the PCI timing is the key function of the decoding block, since a PCI device must behave strictly according to PCI timing.
The protocol decoding block avoids bus collisions by handling the enable signals of the PCI bus appropriately.
The kernel of the PCI protocol decoding block is a state machine. The states involved in a data transaction are listed here, and the state transitions of the decoding block are shown in Figure
State transition of the PCI protocol decoding block.
The slave device is idle, waiting for an access initiated by the system.
The system has initiated a configuration access to PCI device, including configuration read and write operations, and is waiting for response.
The system has initiated an I/O access to PCI device, including I/O read and write operations, and is waiting for response.
The system has initiated a memory access to PCI device, including memory read and write operations, and is waiting for response.
This state is specially inserted between address cycle and the first data cycle in a read access on PCI bus to avoid bus collision.
A configuration transmission is taking place.
A nonconfiguration transmission is taking place.
A PCI access is ending. All control signals will be disabled and all S/T/S signals will be released in one cycle.
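The states listed above can be sketched as a small behavioral model. This is a hedged illustration in Python rather than the Verilog state machine on the board: the state names are invented for readability, and the transitions are inferred from the state descriptions, not copied from the state transition figure. The sketch also assumes that each wait state lasts exactly one clock. The command codes are the standard PCI bus commands carried on C/BE[3:0]# during the address cycle.

```python
from enum import Enum, auto


class State(Enum):          # names are illustrative, not taken from the paper
    IDLE = auto()
    CFG_WAIT = auto()
    IO_WAIT = auto()
    MEM_WAIT = auto()
    TURNAROUND = auto()     # read-only gap between address and first data cycle
    CFG_XFER = auto()
    DATA_XFER = auto()      # non-configuration (I/O or memory) transfer
    END = auto()


# PCI bus command codes on C/BE[3:0]# during the address cycle: (kind, is_read)
COMMANDS = {
    0b0010: ("io", True),
    0b0011: ("io", False),
    0b0110: ("mem", True),
    0b0111: ("mem", False),
    0b1010: ("cfg", True),
    0b1011: ("cfg", False),
}
WAIT_STATE = {"cfg": State.CFG_WAIT, "io": State.IO_WAIT, "mem": State.MEM_WAIT}


class Decoder:
    def __init__(self):
        self.state = State.IDLE
        self.is_read = False
        self.xfer = State.DATA_XFER

    def clock(self, cmd=None, selected=False, last_phase=False):
        """Advance one PCI clock and return the new state."""
        s = self.state
        if s is State.IDLE:
            if selected and cmd in COMMANDS:
                kind, self.is_read = COMMANDS[cmd]
                self.state = WAIT_STATE[kind]
        elif s in (State.CFG_WAIT, State.IO_WAIT, State.MEM_WAIT):
            self.xfer = State.CFG_XFER if s is State.CFG_WAIT else State.DATA_XFER
            # Reads pass through a turnaround cycle to avoid bus collision.
            self.state = State.TURNAROUND if self.is_read else self.xfer
        elif s is State.TURNAROUND:
            self.state = self.xfer
        elif s in (State.CFG_XFER, State.DATA_XFER):
            if last_phase:
                self.state = State.END
        elif s is State.END:
            self.state = State.IDLE     # signals released, ready for next access
        return self.state


def trace(cmd, data_phases=1):
    """Return the state names visited by one access with the given command."""
    d = Decoder()
    names = [d.clock(cmd=cmd, selected=True).name]
    while d.state is not State.IDLE:
        in_xfer = d.state in (State.CFG_XFER, State.DATA_XFER)
        if in_xfer:
            data_phases -= 1
        names.append(d.clock(last_phase=in_xfer and data_phases <= 0).name)
    return names
```

For example, `trace(0b1010)` walks a configuration read through the wait, turnaround, transfer, and end states, while a write such as `trace(0b0011)` skips the turnaround state.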
An access on the PCI bus consists of three parts: one address cycle, at least one data cycle, and several wait cycles. The address appears on the AD pins during the address cycle, while the data appears during the data cycles. Wait cycles are inserted for data latency [
There are three types of transactions on the PCI bus. A configuration transaction usually occurs as soon as the control board is inserted into the system motherboard, I/O transactions are used for parameter settings, and memory transactions take place during data transfer operations.
Digital differential analyser (DDA), usually serving as an interpolation algorithm, is widely applied in modern numerical control systems [
DDA interpolation algorithm for an arc in Cartesian coordinate is shown in Figure
DDA arc interpolation in
Considering this
According to formula (
At the beginning, the
The algorithm keeps iterating until the error check registers indicate that the cursor has reached the ending point, or is within tolerable error of it, after which the iteration stops. Thus, the ending condition can be expressed as
When End equals “1,” the recursion stops. And the points
It is worth mentioning that left-shifting normalizing is often used to maintain velocity stability [
Given starting point (8,0), ending point (0,8), and
Example of DDA arc interpolation.
As mentioned in Figure
Logic structure of traditional DDA arc interpolation.
For the traditional DDA arc interpolation algorithm mentioned in Figure
Huge errors of traditional DDA algorithm.
One way to solve this problem is to select an appropriate weighting factor to be multiplied by the integrand before being added into the corresponding accumulator. Here, we define the weighting factor as
When
Better performance of weighted DDA algorithm.
Another problem is that even though the errors are small enough, a great number of “sawteeth” can be seen on the path as shown in Figure
The sawteeth of DDA.
Generally, a sawtooth consists of a
Sawtooth along the arc path.
Therefore, we can perform the addition on the latter accumulator ahead of time so that it overflows early. Thus, the recursion formulae can be rewritten as
As a result, the cursor takes one “combined” step instead of two separate steps, thus eliminating the sawtooth in advance. With the same conditions as Figure
Excellent performance of improved DDA algorithm.
Detailed comparison between traditional and improved DDA.
In order to implement this improved DDA algorithm on an FPGA device, the overflow conditions have to be changed, and each accumulator register needs 2 extra bits. One stores the carry bit caused by overflow, and the other is a sign bit, since the value of an accumulator can be negative. The logic structure of the improved algorithm is shown in Figure
Logic structure of improved DDA arc interpolation.
In order to measure the performance of the two algorithms, we define path variance
For the typical DDA algorithm, with conditions in Figure
For the weighted DDA algorithm, as shown in Figure
And for the sawteeth-eliminating DDA as shown in Figure
The data show that the improved DDA algorithm has the smallest path variance. We therefore conclude that the sawteeth-eliminating DDA algorithm performs better than the traditional one.
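A comparison of this kind can be scripted. The sketch below is only a stand-in: it assumes a simple metric, the mean squared radial deviation of the interpolated points from the ideal arc, which is not necessarily the definition of path variance used in this paper. The function name and parameters are hypothetical.

```python
import math


def radial_variance(points, radius, center=(0.0, 0.0)):
    """Mean squared distance between each point and the ideal circle.

    Assumed stand-in metric: a smaller value means the path hugs the arc
    more closely."""
    cx, cy = center
    errors = [math.hypot(px - cx, py - cy) - radius for px, py in points]
    return sum(e * e for e in errors) / len(errors)
```

With a metric of this form, two interpolated paths for the same arc can be compared directly by passing their point lists and the arc radius.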
The maximum power allowed for a 32-bit PCI device is 25 W. And the power system of PCI bus is more complex than other kinds of buses, which has 6 different power connectors (+3.3 V, +5 V, +
The device PCB has four layers: top layer, bottom layer, power plane, and ground plane. The power plane in particular should be divided into several power regions. Where possible, high-speed signal traces should not cross between two different power regions; otherwise, the direction of the split should be adjusted to minimize the impact.
Every Vcc pin of every digital chip is assigned a decoupling capacitor connected to the ground. And every power pin is allocated a 0.047
The clock signal of PCI device is based on reflected wave effect rather than incident wave effect. Therefore, the trace length of the PCI clock signal is 2.5 inches and
The signal trace length on a 32-bit PCI slave device should not exceed 1.5 inches.
Every control signal pin should be assigned a pull-up resistor so that the pin does not float when it is not driven. A PCI slave device developer need not handle this, since the pull-ups are already provided on the system motherboard.
A simple series of tests on signal integrity of waveforms on PCI pins (the golden fingers) has been conducted on the PCI device board [
No. 1 SI test on PCI pins.
The second test drives a 20 MHz square wave onto one finger while measuring the adjacent finger. It assesses how much interference a high-frequency signal on one pin causes on other PCI pins, especially the adjacent ones. The result is shown in Figure
No. 2 SI test on PCI pins.
The third test drives two waveforms onto two adjacent fingers and observes the signal integrity of both in order to measure the coupling interference. The result is shown in Figure
No. 3 SI test on PCI pins.
In this paper, a motion control board is presented. Its design scheme is sound and satisfies the requirements of biaxial motion systems. The PCI protocol decoding block is self-designed and functions well. More importantly, we improved the typical DDA arc interpolation algorithm, broadened its application, and reduced its main negative effect, the sawteeth. The board is currently in use in our lab, with good results.
The authors declare that there is no conflict of interests regarding the publication of this paper.