A Low Jitter, Low Power, CMOS 1.25-3.125Gbps Transceiver
Ahmed Younis, Charlie Boecker, Kazi Hossain, Firas Abughazaleh, Bodhi Das, Yiqin Chen, Moises Robinson, Scott Irwin and Bernie Grung
RocketChips, a Xilinx Company
7901 Xerxes Avenue South, Suite 316, Minneapolis, MN 55431
Phone: 512-306-7292 x549
Abstract
This paper describes a high-speed CMOS transceiver that runs at up to 3.125Gbps from a 1.8V power supply. The chip includes a 10/20:1 full-duplex serializer/deserializer (SERDES), novel clock and data recovery circuits, and high-speed differential I/Os. Special techniques have been used to increase the jitter tolerance and to reduce the output jitter. The chip has been fabricated in the TSMC 0.18µm 1P6M digital process and consumes less than 175mW when running at 2.5Gbps, with 26ps of deterministic jitter and less than 3.9ps of random jitter.
The receiver also has a programmable 10/20-bit parallel interface. It accepts a high-speed serial data stream, recovers the clock and data, and outputs a 10-bit or 20-bit parallel word together with the recovered clock at 1/10th or 1/20th of the input data rate. It also provides symbol (byte) alignment based on the comma symbol of positive disparity as defined for 8B/10B encoded data. The receiver operates at serial data rates of 1.25, 2.5, and 3.125Gbps, which can be programmed as 10x or 20x REFCLK. The receiver has a fully monolithic PLL that requires no external filter components, and it uses innovative techniques and circuits to reduce the jitter and increase the jitter tolerance.
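The comma-based symbol alignment described above can be illustrated with a minimal behavioural sketch. It is not the chip's logic: the 7-bit comma pattern `0011111`, the string representation of the bit stream, and the helper names are assumptions made for illustration, and which comma polarity is called "positive" depends on the convention in use.

```python
from typing import List, Optional

# Assumed 7-bit comma pattern (hypothetical default). One of the two K28.5
# encodings is 0011111010, which begins with this pattern.
COMMA = "0011111"
WORD_BITS = 10


def find_comma_offset(bits: str, comma: str = COMMA) -> Optional[int]:
    """Return the index of the first comma in the serial stream, or None."""
    idx = bits.find(comma)
    return None if idx < 0 else idx


def align_words(bits: str, comma: str = COMMA,
                word_bits: int = WORD_BITS) -> List[str]:
    """Slice the stream into words so that the comma starts a word.

    Bits before the first comma and any incomplete trailing word are dropped.
    """
    start = find_comma_offset(bits, comma)
    if start is None:
        return []
    usable = len(bits) - start
    end = start + usable - usable % word_bits
    return [bits[i:i + word_bits] for i in range(start, end, word_bits)]
```

For example, `align_words("101" + "0011111010" + "1010101010" + "01")` discards the three leading bits and the two trailing bits and returns the two aligned 10-bit words.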
1. Introduction
Many applications, such as point-to-point internet connections, serial backplane connections over FR4 material, and hard disk drives, require high-bandwidth, low-cost, low-power transceivers. This paper describes a low-power, low-jitter CMOS transceiver that runs at up to 3.125Gbps and is compatible with many industry standards.
2. Design
The transmitter, shown in Figure 1.a), has a programmable 10/20-bit parallel interface. It accepts a 10-bit or 20-bit parallel data word on each rising edge of a reference clock (REFCLK) and serializes the data at 10 or 20 times REFCLK by using both edges of a Voltage Controlled Oscillator (VCO) clock that runs at 5x or 10x REFCLK. A duty cycle distortion correction circuit (DCDC), shown in Figure 1.a), corrects any distortion in the VCO clocks caused by process, voltage, or temperature variations. The device operates at serial data rates of 1.25, 2.5, and 3.125Gbps. It uses a fully monolithic Phase Locked Loop (PLL) that requires no external filter components. The output driver generates differential serial data outputs and uses pre-emphasis to reduce intersymbol interference (ISI) in cable environments.
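The relationship between the serial rate, REFCLK, and the transmitter VCO clock follows directly from the description above: one parallel word per REFCLK cycle and two bits per VCO cycle. A small sketch of that arithmetic is given below; the function name and the returned dictionary keys are illustrative assumptions.

```python
def tx_clocks(serial_rate_bps: float, word_bits: int) -> dict:
    """Derive REFCLK and TX VCO frequencies for a given serial rate.

    One word of `word_bits` is accepted per REFCLK cycle, and both VCO clock
    edges are used, so the VCO runs at half the serial bit rate.
    """
    if word_bits not in (10, 20):
        raise ValueError("the interface is 10 or 20 bits wide")
    refclk_hz = serial_rate_bps / word_bits
    vco_hz = serial_rate_bps / 2.0
    return {
        "refclk_hz": refclk_hz,
        "vco_hz": vco_hz,
        "vco_multiple": vco_hz / refclk_hz,  # 5 for 10-bit, 10 for 20-bit
    }


if __name__ == "__main__":
    for rate in (1.25e9, 2.5e9, 3.125e9):
        for width in (10, 20):
            print(rate, width, tx_clocks(rate, width))
```

At 2.5Gbps with the 10-bit interface this gives a 250MHz REFCLK and a 1.25GHz VCO clock, which matches the 5x multiple stated in the text.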
Figure 1. Block diagram of a) the transmitter and b) the receiver.
There are two loops in the receiver: the coarse loop and the fine loop. The coarse loop PLL locks to REFCLK, whereas the fine loop PLL locks to the input serial data. The fine loop is enabled after the receiver has locked to REFCLK. A Frequency Difference detector circuit (FD) detects locking of the PLL to REFCLK, and it enables the fine loop and disables the coarse loop when the VCO frequency is within a certain percentage of the REFCLK frequency (2% in this design). If the PLL loses lock to the input data during operation, the FD re-enables the coarse loop to lock to REFCLK, after which it again disables the coarse loop and enables the fine loop to drive the receiver to lock to the input data.
The design of the coarse loop is similar to a conventional PLL. It consists of a Phase-Frequency Detector (PFD), a Charge Pump (CP), a 2nd-order Loop Filter (LF), a 10-stage VCO, and a divider, as shown inside the solid box in Figure 1.b). The 10-stage VCO generates a set of 20 clocks that are time-shifted from one another by 1/20th of the VCO period. Only one of these clocks is fed to the PFD, which compares it with REFCLK. Based on the difference between the two clocks, the PFD generates up and dn signals that drive the CP. The CP in turn sources current to or sinks current from the LF, which averages this current on a capacitor and generates a voltage that adjusts the frequency of the VCO.
The fine loop employs the 2x oversampling method indicated in Figure 2. This method is similar to that used by Fiedler et al. [1] and Gu et al. [2]. The fine loop also employs an analog phase correction method, which is unique to our design [3]. The fine loop VCO operates at 1/5th of the fundamental frequency of the input serial data, which is 1/10th of the bit rate. For example, when the data rate is 2.5Gbps (a fundamental frequency of 1.25GHz), the VCO outputs run at 1.25GHz/5 = 250MHz. The phase detector has 10 data output pairs and 10 phase output pairs; the former provide full digital signal levels and the latter provide analog signals.
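The hand-off between the coarse and fine loops, and the spacing of the 20 VCO clock phases, can be sketched as follows. This is a behavioural illustration only: the function names are hypothetical, the FD is modelled as a simple window comparison without hysteresis, and the comparison is assumed to be made on the divided VCO clock.

```python
LOCK_WINDOW = 0.02  # 2% frequency window quoted in the text


def select_loop(f_vco_divided_hz: float, f_ref_hz: float,
                window: float = LOCK_WINDOW) -> str:
    """Return which loop the FD would enable: 'coarse' or 'fine'.

    Assumption: the FD compares the divided VCO clock against REFCLK and
    enables the fine loop only while the relative difference is in the window.
    """
    rel_diff = abs(f_vco_divided_hz - f_ref_hz) / f_ref_hz
    return "fine" if rel_diff <= window else "coarse"


def phase_spacing_s(f_vco_hz: float, n_phases: int = 20) -> float:
    """Time shift between adjacent VCO clock phases (1/20th of the period)."""
    return 1.0 / (f_vco_hz * n_phases)


if __name__ == "__main__":
    print(select_loop(240e6, 250e6))   # 4% off: coarse loop stays enabled
    print(select_loop(248e6, 250e6))   # 0.8% off: fine loop takes over
    print(phase_spacing_s(250e6))      # sampling step at 2.5Gbps operation
```

With a 250MHz VCO the phase spacing is 200ps, which is half of the 400ps bit period at 2.5Gbps and is therefore consistent with 2x oversampling (one data sample and one phase sample per bit).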
Figure 2. a) 1010 pattern. b) 8B10B pattern. Large dots denote data samples and small dots denote phase samples.
A block diagram of the fine loop is given inside the dashed box in Figure 1.b). The basic elements of the fine loop are: (1) a VCO, (2) a Phase Detector (PD) circuit, (3) a transconductance (gm) circuit, and (4) a Loop Filter (LF) circuit. The VCO and LF are shared with the coarse loop, as shown in Figure 1.b). The fine loop tracks incoming phase variations in the received coded Non Return to Zero (NRZ) signal. This tracking occurs after the coarse loop has brought the VCO close to the correct frequency. The fine loop integrates the phase error and forces it toward zero. The coded NRZ signal can have a run length as large as 5, which means that there can be 5 bit times between transitions of the incoming signal, and therefore 5 bit times between phase detector updates. A positive phase error means that the VCO is lagging behind the input. If the transitions are perfect ramps between −V and +V, then the samples taken at or near the transitions measure the time error of the VCO, which is proportional to the phase error. The gm circuit converts this error into a proportional current that feeds the loop filter, which in turn increases or decreases the VCO frequency.
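The idea that a sample taken on an ideal ramp measures the timing error can be illustrated with a simple behavioural model. It is not the patented analog circuit: the linear ramp model, the sign convention for falling edges, the way phase samples are indexed, and all names and values are assumptions for illustration.

```python
from typing import Sequence


def ramp_sample(dt_s: float, v_swing: float, t_rise_s: float,
                rising: bool) -> float:
    """Voltage sampled dt_s after the centre of an ideal -V..+V linear ramp."""
    x = max(-0.5, min(0.5, dt_s / t_rise_s))  # clamp to the ramp extent
    v = 2.0 * v_swing * x
    return v if rising else -v


def phase_error_voltage(data_samples: Sequence[int],
                        phase_samples: Sequence[float]) -> float:
    """Accumulate sign-corrected phase samples over all data transitions.

    phase_samples[i] is assumed to be taken midway between data_samples[i]
    and data_samples[i + 1]. Positions without a transition carry no phase
    information and are skipped. A positive result means the sampling clock
    (the VCO) is late with respect to the data.
    """
    err = 0.0
    for i in range(len(data_samples) - 1):
        before, after = data_samples[i], data_samples[i + 1]
        if before == after:
            continue
        sign = 1.0 if after > before else -1.0
        err += sign * phase_samples[i]
    return err


def gm_current(err_v: float, gm_s: float) -> float:
    """Convert the error voltage into a loop-filter current (i = gm * v)."""
    return gm_s * err_v
```

For a clock that samples 20ps late on 100ps ramps with V = 0.4V, a rising edge yields +0.16V and a falling edge yields −0.16V; after sign correction both contribute positively, so the error is positive and the resulting current would speed up the VCO in a loop wired with that polarity.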
3. Jitter and Noise Minimization
Special techniques were used to reduce the power supply noise and the jitter of the chip.
Power Supply Noise Reduction
A decoupling capacitor structure has been used in this SERDES to eliminate the power supply oscillations caused by the combination of power supply inductance and on-chip capacitance, and to provide instantaneous current for device switching. From the point of view of power supply noise, the circuit blocks of the entire transceiver were grouped according to a) how much noise the individual blocks generate, b) how susceptible they are to noise, and c) how close they are to each other in terms of signal flow and layout. Each group is provided with its own decoupling filter structure as well as a power supply Kelvin connection.
An array of NMOS capacitors in parallel was used as the decoupling capacitor structure. The capacitors were placed under the power buses to avoid occupying extra area. A parasitic gate resistance and a parasitic channel resistance appear in series with each of the transistors [4]. The transistor models do not represent these parasitic resistances reliably; hence they have been inserted explicitly in the simulation as resistances in series with the transistors used as decoupling capacitors. Other parasitics, such as the metal routing resistance and the metal-to-metal capacitance between the two power supplies, have also been added to the simulation. Figure 3.a) shows the simulation results for one of the circuit groups before the decoupling capacitor structure was added; it shows 200mV of noise on the power supply. Figure 3.b) shows the results after the decoupling capacitors were added, which reduced the noise to less than 20mV.
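To give a feel for the quantities involved, the sketch below evaluates the textbook resonance of a supply inductance with an on-chip capacitance, the quality factor of a series RLC model, and the noise reduction implied by the simulated 200mV and 20mV figures. The series RLC model and every component value in the example are assumptions chosen for illustration; none are taken from the design.

```python
import math


def resonance_hz(l_h: float, c_f: float) -> float:
    """Resonant frequency of an ideal LC pair: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def series_q(r_ohm: float, l_h: float, c_f: float) -> float:
    """Quality factor of a series RLC network: sqrt(L/C) / R.

    A larger series resistance (for example the gate and channel resistance
    of a MOS decoupling capacitor) lowers Q and damps the supply ringing.
    """
    return math.sqrt(l_h / c_f) / r_ohm


def noise_reduction_db(before_v: float, after_v: float) -> float:
    """Noise reduction expressed in decibels."""
    return 20.0 * math.log10(before_v / after_v)


if __name__ == "__main__":
    # Hypothetical values: 2nH supply inductance, 100pF decoupling, 1 ohm.
    print(resonance_hz(2e-9, 100e-12))
    print(series_q(1.0, 2e-9, 100e-12))
    print(noise_reduction_db(0.200, 0.020))  # figures quoted in the text
```

The reduction from 200mV to 20mV corresponds to a factor of ten, or 20dB.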
Assuming that the x dimension of the wire routing is much larger than the y dimension (which is the case in our design), the VCO layout of Figure 4.a) has a signal-path length mismatch ratio of 10:1, while the layout of Figure 4.b) has a mismatch ratio of 2:1.
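The mismatch ratios quoted above can be reproduced approximately by counting cell pitches between consecutive ring stages. The sketch below does this for a straight-line placement and for one interleaved placement. The interleaved order used here is an assumption that happens to give a 2:1 ratio; the paper does not state the actual cell order of Figure 4.b).

```python
from typing import List, Sequence


def hop_lengths(placement: Sequence[int]) -> List[int]:
    """Wire length, in cell pitches, from each ring cell to its successor.

    placement[k] is the ring cell (numbered 1..n) sitting in physical slot k.
    The last cell feeds back to cell 1.
    """
    n = len(placement)
    slot = {cell: k for k, cell in enumerate(placement)}
    return [abs(slot[c % n + 1] - slot[c]) for c in range(1, n + 1)]


def mismatch_ratio(placement: Sequence[int]) -> float:
    """Ratio of the longest to the shortest stage-to-stage wire."""
    hops = hop_lengths(placement)
    return max(hops) / min(hops)


# Traditional layout: cells in ascending order along one row.
TRADITIONAL = list(range(1, 11))
# Assumed interleaved layout: the ring is folded back on itself.
INTERLEAVED = [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]

if __name__ == "__main__":
    print(hop_lengths(TRADITIONAL), mismatch_ratio(TRADITIONAL))
    print(hop_lengths(INTERLEAVED), mismatch_ratio(INTERLEAVED))
```

With this simple pitch count the straight-line placement gives a 9:1 ratio, close to the roughly 10:1 figure in the text, and the folded placement gives 2:1.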
Duty Cycle Distortion Correction Circuit (DCDC)
The transmitter includes a DCDC circuit to correct the duty cycle distortion (DCD) in the VCO clocks. It works by integrating the voltage waveform at the output of a buffer. If the waveform spends more time high than low, a feedback loop corrects this by shifting the common-mode input of the buffer. Two considerations are important when designing the DCDC circuit:
Figure 3. Power supply simulation a) without the decoupling structure, and b) with the decoupling structure.
Figure 4. a) Traditional VCO layout and b) New VCO layout.
Figure 5. Effect of DCDC circuit. a) DCDC circuit is OFF. b) DCDC circuit is ON.
VCO Jitter Minimization
Because the VCO is the major jitter contributor to the overall system, special techniques were followed in the design and layout of the VCO blocks. Mismatch among the loads of the VCO delay cells results in jitter. Traditionally, a VCO delay line is laid out as shown in Figure 4.a), where the cells are arranged in ascending or descending order and the output of each delay cell goes to the input of the next, except for the last cell in the line, whose output returns to the input of the first cell. From Figure 4.a) we can see that the length of the signal path between any two consecutive cells is the same, except between cell 1 and cell 10, where it is 10 times longer than the rest. Figure 4.b) shows the new VCO layout, in which the signal path is more uniform among the cells and hence has less mismatch.
1. The feedback loop gain determines how close the duty cycle of the corrected clocks can come to the ideal 50%. If the gain is too low, only a small part of the DCD is corrected. If the gain is high, the DCD can be almost entirely corrected.
2. The bandwidth of the DCDC circuit should be kept much lower than the PLL loop bandwidth so that it does not interfere with the locking behavior of the PLL. If the DCDC circuit responds too fast, its correction can look like a phase shift in the clock that is used to lock the PLL, which may cause the loop to fail to lock.
Figure 5 shows the simulation results for a VCO clock at 2.5Gbps operation when a) the DCDC circuit is OFF and b) the DCDC circuit is ON.
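The two design considerations above can be illustrated with a linearized, discrete-time model of a duty-cycle correction loop. This is only a sketch under stated assumptions: the first-order filter, the linear relation between the correction and the duty cycle, and the parameter names `loop_gain`, `sensitivity`, and `alpha` are all hypothetical and do not describe the actual circuit.

```python
def settle_duty(d0: float, loop_gain: float, sensitivity: float = 1.0,
                alpha: float = 0.01, steps: int = 5000) -> float:
    """Return the duty cycle after a first-order correction loop settles.

    d0          uncorrected duty cycle (0..1)
    loop_gain   DC gain from duty-cycle error to the correction signal
    sensitivity duty-cycle change per unit of correction (assumed linear)
    alpha       per-step filter coefficient; smaller means lower bandwidth
                (keep alpha * (1 + loop_gain * sensitivity) < 2 for stability)
    """
    correction = 0.0
    duty = d0
    for _ in range(steps):
        duty = d0 - sensitivity * correction
        target = loop_gain * (duty - 0.5)
        correction += alpha * (target - correction)  # slow low-pass update
    return duty


if __name__ == "__main__":
    for gain in (1.0, 10.0, 100.0):
        print(gain, settle_duty(0.60, gain))
```

In this model the residual error is the original error divided by (1 + loop_gain * sensitivity), so a 60% duty cycle settles to 55% with a gain of 1 and to about 50.1% with a gain of 100, while a smaller `alpha` makes the loop respond more slowly.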
4. Testing Results
A die photo of the SERDES is shown in Figure 6. Figure 7 shows the eye diagram of the transmitter running at 2.5Gbps. Eye widths have been consistently measured at nearly 330ps. Figure 8 shows that the chip has around 3.9ps of random jitter when running at 2.5Gbps. At 3.125Gbps the chip had 2.5ps of rms jitter. Lab measurements show that the SERDES exhibits less than 30ps p-p of deterministic jitter at all speeds. The SERDES has been tested at higher speeds and was found to work properly up to 3.45Gbps, although the transmitter could run up to 4.3Gbps. At 2.5Gbps, two SERDES chips were able to communicate with each other reliably over a 60-inch FR4 trace with 30% pre-emphasis. The pre-emphasis can be set to 0, 10%, 20%, or 30%. Periodic jitter tolerance measurements at 2.5Gbps show that the SERDES tolerates 1.0UI up to 5MHz, 0.85UI at 10MHz, and 0.8UI at 20MHz. Table 1 summarizes the measurement results of the SERDES at two speeds: 2.5Gbps and 3.125Gbps.
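The measured numbers can be put in terms of the unit interval (UI) with a short calculation. The sketch below converts the eye width to a fraction of a UI and also estimates a total jitter figure with the common dual-Dirac approximation TJ = DJ + 2·Q·RJ. The dual-Dirac model and the BER target of 1e-12 are assumptions that do not come from the paper, so the estimate is not directly comparable with the eye width read from the scope.

```python
Q_BER_1E12 = 7.034  # Gaussian Q factor for a bit error ratio of 1e-12


def unit_interval_ps(rate_gbps: float) -> float:
    """Bit period in picoseconds."""
    return 1000.0 / rate_gbps


def eye_closure_ui(eye_width_ps: float, rate_gbps: float) -> float:
    """Fraction of the unit interval that is not open."""
    ui = unit_interval_ps(rate_gbps)
    return (ui - eye_width_ps) / ui


def total_jitter_ps(dj_pp_ps: float, rj_rms_ps: float,
                    q: float = Q_BER_1E12) -> float:
    """Dual-Dirac estimate of total jitter: DJ(p-p) + 2 * Q * RJ(rms)."""
    return dj_pp_ps + 2.0 * q * rj_rms_ps


if __name__ == "__main__":
    print(unit_interval_ps(2.5))        # bit period at 2.5Gbps
    print(eye_closure_ui(330.0, 2.5))   # closure of the measured eye
    print(total_jitter_ps(26.0, 3.9))   # estimate from the quoted DJ and RJ
```

At 2.5Gbps the unit interval is 400ps, so the measured 330ps eye width corresponds to an opening of about 0.825UI.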
Table 1. Performance Summary
Power Supply: 1.8V
Core Area:
Jitter: rms: 3.9ps, p-p: 26ps
Data Rates: 1.25, 2.5 and 3.125 Gbps
Figure 8. Random Jitter measurement at 2.5Gbps operation.
Figure 6. Die photo for the Transceiver.
The authors would like to thank Matthew Shafer, Mike Bohnert and Jari Vahe for testing the chip, and Stephen Anderson, Shahriar Rokhsaz, Jinghui Lu, Mike Gaboury, Michael Nix, Eric Groen, Matthew Bibee and Wayne Walters for system level and design discussions.
Figure 7. Transmitter eye at 2.5Gbps.
[1] A. Fiedler, R. Mactaggart, J. Welch, and S. Krishnan, "A 1.0625Gbps Transceiver with 2x-Oversampling and Transmit Signal Pre-Emphasis", ISSCC Digest of Technical Papers, FP15.1, Feb. 1997.
[2] R. Gu, J.M. Tran, H.-C. Lin, A.-L. Yee, and M. Izzard, "A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver", ISSCC Digest of Technical Papers, WA20.4, Feb. 1999.
[3] U.S. Patent Application entitled "Phase Lock Loop and Transconductance Circuit for Clock Recovery", submitted Dec. 19, 1999.
[4] P. Larsson, "Parasitic Resistance in an MOS Transistor Used as On-Chip Decoupling Capacitance", IEEE Journal of Solid-State Circuits, Vol. 32, April 1997.