## TP 15.3 A 90mW 4Gb/s Equalized I/O Circuit with Input Offset Cancellation

Ming-Ju Edward Lee, William Dally, Patrick Chiang

Computer Systems Laboratory, Stanford Univ., Stanford, CA

Recently described CMOS serial links operate at multi-gigabit-per-second signaling rates over several meters of cable [1,2]. However, these links require large amounts of power and chip area, making them unsuitable for applications requiring hundreds of I/Os per chip. The best previously published power and area above 4Gb/s in CMOS are 310mW and 0.6mm<sup>2</sup> [3]. Integrating a hundred of these I/Os would burn more than 30W of power and consume 60mm<sup>2</sup> of chip area. The 4Gb/s transceiver described here dissipates only 90mW and requires less than 0.1mm<sup>2</sup> of chip area. It achieves low power and low area using an input-multiplexed transmitter architecture, a regulated CMOS inverter-based delay-locked loop (DLL), and receiver offset calibration.

The transmitter, shown in Figure 15.3.1, consists of a 4:1 input multiplexer, a source-coupled preamplifier, a source-coupled current driver, and an integral termination resistor. The series nFET multiplexer, sequenced by 4 evenly spaced clocks generated by the transmitter DLL, serializes a four-bit-wide input stream onto the input of the preamplifier. A pFET clamp on the multiplexer output limits its swing to 0.6V and extends its bandwidth beyond the unity-gain bandwidth of the process. The preamplifier provides a 3x impedance decrease, and the output stage generates a balanced differential current into the line with 20mA peak magnitude. The transmitter includes a two-tap finite impulse response (FIR) pre-emphasis filter to overcome frequency-dependent attenuation of the transmission medium [1,3]. The equalizer handles up to 10dB of frequency-dependent attenuation, enough to drive over 10m of 24AWG twisted pair at 4Gb/s.
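The behavior of a two-tap pre-emphasis filter can be illustrated with a short numerical sketch. The text does not give the tap weights or their normalization, so the form used here, y[n] = (1 - alpha)·x[n] - alpha·x[n-1] with the peak output held at the 20mA drive level, is an assumption chosen for illustration. The function names `alpha_for_boost_db` and `preemphasize` are hypothetical.

```python
PEAK_CURRENT_MA = 20.0  # peak differential drive current quoted in the text


def alpha_for_boost_db(boost_db):
    """Second-tap weight for a given boost of alternating data over DC.

    Assumed normalization: alternating data sees gain 1, repeated data
    (DC) sees gain 1 - 2*alpha, so boost = 1 / (1 - 2*alpha).
    """
    dc_gain = 10 ** (-boost_db / 20.0)
    return (1.0 - dc_gain) / 2.0


def preemphasize(bits, alpha):
    """Return drive currents (mA) for a bit sequence through the 2-tap FIR.

    Bits map to symbols -1/+1. The line is assumed to have idled at the
    value of the first bit before the sequence starts.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]
    for x in symbols:
        # Main tap minus a scaled copy of the previous symbol.
        out.append(PEAK_CURRENT_MA * ((1.0 - alpha) * x - alpha * prev))
        prev = x
    return out


if __name__ == "__main__":
    alpha = alpha_for_boost_db(10.0)  # 10dB, the attenuation quoted in the text
    print("alpha:", round(alpha, 4))
    print([round(i, 2) for i in preemphasize([0, 1, 1, 1, 0, 0, 1], alpha)])
```

Under this assumed normalization, a bit that follows a transition is driven at the full 20mA, while a repeated bit is driven at a reduced level (about 6.3mA for a 10dB boost), which pre-compensates for a channel that attenuates high frequencies more than low ones.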
This input-multiplexed transmitter has two main advantages over conventional output-multiplexed designs [1,2]. First, it greatly reduces chip area since only a single large output driver is required. Second, the clock load, and hence clock buffer area, power, and jitter, are greatly reduced. The clock load of the transmitter is 60fF per phase. An output-multiplexed architecture with comparable drive requires more than 10x this number.

The transceiver uses a CMOS inverter-based DLL with a regulated power supply, as shown in Figure 15.3.2. In addition to lower power consumption (no static current in the delay line), this DLL offers lower supply sensitivity of jitter than one using source-coupled stages, owing to the regulated supply. The true and complement delay lines are cross-coupled with weak inverters to minimize skew. Delay is controlled by adjusting the supply voltage through the linear regulator. A sequential phase-only detector is used to lock the loop to 180° phase shift. The 1GHz input clock is AC-coupled to center the common-mode input to the delay line and avoid phase imbalances.

The receiver, shown in Figure 15.3.3, uses the StrongARM sense amplifier [4] with capacitively trimmed offset voltage. Serial data is demultiplexed directly at the receiver input with 4 sense amplifiers, which are sequenced by 4 clock phases generated by the receiver DLL. The sense amplifiers are trimmed by placing binary-weighted pMOS capacitors on the two integrating nodes. Digitally adjusting the capacitance while shorting the inputs unbalances the amplifier to cancel the offset voltage. Canceling the offset in this manner saves power by allowing the input amplifiers to use small devices and by enabling the link to operate reliably with a low signal swing. Simulation shows that the capacitors introduce up to ±120mV of offset in 8mV steps. With bang-bang control, the worst-case offset after cancellation is 8mV for any untrimmed offset below 120mV.
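The bang-bang offset trim described above can be sketched as a simple search loop. This is a behavioral model under stated assumptions, not the chip's control logic: the comparator is idealized and noiseless, the trim is modeled as a signed code of up to 15 steps of 8mV (giving the ±120mV range from the text), and the names `comparator` and `calibrate` are hypothetical.

```python
STEP_MV = 8.0   # trim step quoted in the text
MAX_CODE = 15   # assumed: 15 steps x 8mV = 120mV range per polarity


def comparator(residual_offset_mv):
    """Idealized sense-amplifier decision with its inputs shorted."""
    return 1 if residual_offset_mv > 0 else 0


def calibrate(untrimmed_offset_mv):
    """Step the trim code one LSB at a time until the decision flips.

    Returns (code, residual_offset_mv). The residual is within one step
    of zero whenever the untrimmed offset is inside the trim range.
    """
    code = 0
    previous = None
    while True:
        residual = untrimmed_offset_mv + code * STEP_MV
        decision = comparator(residual)
        if previous is not None and decision != previous:
            return code, residual  # decision flipped: zero was just crossed
        next_code = code - 1 if decision else code + 1
        if abs(next_code) > MAX_CODE:
            return code, residual  # trim range exhausted
        code = next_code
        previous = decision


if __name__ == "__main__":
    for offset in (20.0, -60.0, 119.0):
        print(offset, calibrate(offset))
```

Because the loop stops on the first decision flip, the residual can be anywhere within one step of zero, which is consistent with the 8mV worst-case figure quoted in the text.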
The simulated input sensitivity of the receiver is 0.8mV.

The transceiver is implemented in a 0.25µm CMOS technology. Figure 15.3.7 shows the chip micrograph. The test chip contains a single link with one DLL each on the transmitter and receiver side, plus supporting test circuits. Full clock recovery is not implemented. The active area of the transceiver and DLLs is 0.08mm<sup>2</sup>. An additional 0.13mm<sup>2</sup> is occupied by the offset-calibration control logic. Operating from a 2.5V supply at 4Gb/s with a swing of 20mV and a bit error rate (BER) well below 10<sup>-14</sup>, the entire link dissipates 88mW without pre-emphasis and 98mW with pre-emphasis. Figure 15.3.6b shows the power consumption vs. signal swing of the transmitter. Simulation indicates that 7mW of the link power is due to the PRBS generator and checker. Figure 15.3.8 summarizes the measured performance of the test chip.

Figure 15.3.4a shows the eye diagram for the transmitter sending a 2<sup>20</sup>-1 pseudo-random bit sequence (PRBS) at 4Gb/s, with no pre-emphasis and terminations at both ends of the line. Figure 15.3.4b shows that the frequency-dependent attenuation of a 1m by 7mil 0.5oz. stripguide with GETEK dielectric (about 10dB of frequency-dependent attenuation) causes sufficient ISI to completely close the eye. Enabling pre-emphasis cancels this ISI, opening the eye at the far end of the line as shown in Figure 15.3.4c. Figure 15.3.4d shows the jitter histogram of the transmitter with a quiet supply (16.4ps p-p, 2ps RMS), and Figure 15.3.4e shows the jitter histogram with a 500mV 1MHz square wave superimposed on the supply (56.9ps p-p, 18ps RMS). The measured jitter sensitivity is 0.08ps/mV p-p (simulated sensitivity to supply noise with a 200ps edge rate is 0.12ps/mV in the typical corner).

Figure 15.3.5 shows the voltage swing vs. timing margin for the receiver. The solid line is with offset calibration and the dashed line without. The plot shows that the untrimmed receiver offset of the particular chip tested is only 20mV. However, the calculated 3σ offset is 60mV.
Figure 15.3.5 shows that offset calibration reduces the offset to <8mV and adds 40ps of timing margin.

Figure 15.3.6a shows power vs. maximum speed for the transmitter and the DLL with supplies ranging from 2.5V to 2.85V. The transmitter and DLL operate up to 4.6Gb/s at 2.5V, and 5.3Gb/s at 2.85V. The receiver, not shown on the plot, is limited to 4Gb/s because of a speed path in the PRBS checker.

## Acknowledgements:

The authors thank J. Poulton, M. Horowitz, G.-Y. Wei and K.-Y. Chang for discussions and D. Liu and J. Kim for CAD support. This work was supported in part by the Defense Advanced Research Projects Agency under ARPA order E253 and monitored by the U.S. Army under contract DABT63-96-C-0039.

## References:

[1] Dally, W. and J. Poulton, "Transmitter Equalization for 4Gb/s Signaling," Hot Interconnects, 1996.

[2] Gu, R. et al., "A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver," ISSCC Digest of Technical Papers, pp. 352-353, Feb., 1997.

[3] Fiedler, A. et al., "A 1.0625Gb/s Transceiver with 2x-Oversampling and Transmit Signal Pre-Emphasis," ISSCC Digest of Technical Papers, pp. 238-239, Feb., 1997.

[4] Montanaro, J. et al., "A 160MHz, 32b, 0.5W CMOS RISC Microprocessor," IEEE JSSC, vol. 31, no. 11, Nov., 1996.

| Parameter | Value |
| --- | --- |
| Process | 0.25µm CMOS n-well |
| Nominal Supply | 2.5V |
| Max Speed | Tx: 5.3Gb/s; DLL: 5.3Gb/s; Rx: 4Gb/s |
| Power at 2.5V, 4Gb/s, 20mV Swing | Tx Driver & Test Logic (w/out Eq.): 20mW; Tx Driver & Test Logic (w/ Eq.): 30mW; 2 DLL: 50mW |
| BER at 4Gb/s, 20mV Swing | <10<sup>-14</sup> |
| Active Area | Tx Driver & Test Logic: 160 x 200µm<sup>2</sup>; DLL: 120 x 200µm<sup>2</sup>; Rx Input Samplers & Test Logic: 250 x 100µm<sup>2</sup>; OC Synthesized Logic: 820 x 160µm<sup>2</sup> |
| Min Swing at Gb/s | 8mV (for BER <10<sup>-3</sup>) |
| Jitter (quiet supply) at 4Gb/s | 16.5ps (p-p) |
| Jitter Supply Sensitivity at 4Gb/s | 0.08ps/mV |

Figure 15.3.8: Performance summary.

Figure 15.3.1: Transmitter.

Figure 15.3.2: Delay-locked loop.

Figure 15.3.3: Receiver.

Figure 15.3.4: Eye diagrams & jitter histograms.

Figure 15.3.5: Rx timing margin.

Figure 15.3.6: Power consumption.

Figure 15.3.7: See page 463.