## 29.4 A 16Gb/s 3.6pJ/b Wireline Transceiver with Phase Domain Equalization Scheme: Integrated Pulse Width Modulation (iPWM) in 65nm CMOS

Ashwin Ramachandran, Arun Natarajan, Tejasvi Anand

Oregon State University, Corvallis, OR

Asymmetric links, such as memory interfaces and display drivers, require the transmitter to perform the necessary equalization, while the receiver remains simple and has minimal or no equalization capability. Traditionally, FFE-based equalization techniques on power-efficient voltage-mode drivers have been used on the transmit end. Based on the FFE tap resolution requirement, the output driver and pre-driver are divided into multiple segments. Although such a segmented FFE implementation helps to maintain a constant output termination impedance (50 $\Omega$) across all tap settings, it comes at the cost of (a) increased signaling power, and (b) increased switching power, since multiple segments are required to achieve the desired linearity [1]. Phase-domain equalization techniques, such as pulse width modulation (PWM), can equalize the channel without increasing signaling power or segmenting the output driver. However, PWM encoding requires the insertion of a precise narrow pulse in every data bit, which necessitates very wide bandwidth in the high-speed data path, resulting in poor energy efficiency [2] and difficulty in scaling PWM encoding to higher data rates [3]. For example, creating a 10% duty cycle on a 64Gb/s PWM data stream would require a pulse width of about 1.5ps with less than 1ps of rise/fall time at the transmitter output. Other phase-domain pre-emphasis techniques are ineffective for high-loss channels [4]. In view of these limitations, we present a new phase-domain equalization technique, integrated pulse width modulation (iPWM), in a 16Gb/s transceiver that can equalize 19dB of channel loss while consuming 57.3mW of power. Compared to state-of-the-art PWM designs, the proposed iPWM scheme achieves 36× better energy efficiency for the same data rate [2], and 3.2× higher data rate for the same energy efficiency [3].
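The scaling problem of conventional PWM can be illustrated with a short calculation. The sketch below assumes that a "10% duty cycle" means a narrow pulse lasting 10% of one bit period; the function names and the list of data rates other than 64Gb/s are illustrative choices, not taken from the paper.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Bit period (unit interval) in picoseconds for a data rate in Gb/s."""
    return 1e3 / data_rate_gbps


def pwm_pulse_width_ps(data_rate_gbps: float, duty_fraction: float) -> float:
    """Width of the narrow pulse inserted in every bit, in picoseconds.

    Assumption: the pulse occupies `duty_fraction` of one unit interval.
    """
    return duty_fraction * unit_interval_ps(data_rate_gbps)


if __name__ == "__main__":
    # Illustrative data rates; only the 64Gb/s case is quoted in the text.
    for rate in (5, 16, 32, 64):
        ui = unit_interval_ps(rate)
        pulse = pwm_pulse_width_ps(rate, 0.10)
        print(f"{rate:>2} Gb/s: UI = {ui:6.2f} ps, 10% pulse = {pulse:5.2f} ps")
```

At 64Gb/s the unit interval is 15.625ps, so the 10% pulse comes out at roughly 1.56ps, in line with the approximate figure quoted above.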

Figure 29.4.1 shows the proposed iPWM concept. The iPWM modulation is built on the observation that it is possible to reduce post-cursor ISI by reducing the pulse width of the transmitted data bits. Pulse width reduction can be achieved by scaling and adding up the widths by which individual bits are reduced, and applying the total at the end of a run of consecutive identical digits (CIDs). Consequently, narrow pulse widths do not need to be precisely reproduced at the transmitter output. Importantly, the bandwidth requirement on the high-speed data path in the case of iPWM is the same as that of NRZ modulation, which helps in increasing the data rate without exponentially increasing the switching power. Furthermore, the conventional PWM scheme has a lower limit on the minimum pulse width that can be transmitted because of the bandwidth limitation on the data path, which often results in over-equalizing a low-loss channel. In contrast, the proposed iPWM scheme can change the pulse width of CIDs with high precision (~3ps in this work), which helps to equalize a wide range of channel losses. The equalizing nature of iPWM can be visualized from the measured power spectral density of transmitted data (see Fig. 29.4.1), which demonstrates the amplification of the high-frequency component of the spectrum.
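A minimal behavioral sketch of this idea follows. It is not the authors' model: it assumes that a run of 2 identical bits is shortened by `alphas[0]` steps, a run of 3 by `alphas[0] + alphas[1]`, and a run of 4 or more by the sum of all three, with each step equal to the approximate 3ps resolution. The function name `ipwm_edge_times` and the choice to pull only the trailing edge of a run earlier are hypothetical.

```python
from itertools import groupby

STEP_PS = 3.0  # approximate tuning resolution quoted in the text


def ipwm_edge_times(bits, ui_ps, alphas):
    """Return the transition times (ps) of an idealized iPWM waveform.

    bits   : sequence of 0/1 data bits
    ui_ps  : unit interval in picoseconds
    alphas : (a1, a2, a3) integer codes for the three coefficients

    Assumed model: the per-bit reductions accumulate over a run of
    identical bits and are applied once, at the trailing edge of the run.
    """
    edges = []
    t_nominal = 0.0  # NRZ time at which the current run would end
    for _, run in groupby(bits):
        run_length = len(list(run))
        t_nominal += run_length * ui_ps
        # A run of n bits enables min(n - 1, 3) cascaded stages.
        active_taps = min(run_length - 1, len(alphas))
        reduction = STEP_PS * sum(alphas[:active_taps])
        edges.append(t_nominal - reduction)  # trailing edge arrives early
    return edges[:-1]  # the final run has no following transition


if __name__ == "__main__":
    data = [1, 1, 1, 0, 1, 0, 0]
    print(ipwm_edge_times(data, 62.5, (0, 0, 0)))  # plain NRZ edges
    print(ipwm_edge_times(data, 62.5, (2, 1, 1)))  # CID runs shortened
```

In this model only the edge positions move, so no pulse narrower than one unit interval minus the accumulated reduction ever has to be produced, which mirrors the NRZ-like bandwidth argument made above.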

Figure 29.4.2 shows the block diagram of the proposed transceiver. The transmitter consists of a 32b wide parallel PRBS generator, a 32:4 multiplexer, a 4-tap integrated pulse width modulator, and a source-series terminated output driver. The receiver consists of a two-stage continuous-time linear equalizer (CTLE), amplifiers, quarter-rate samplers, a 4:32 de-multiplexer, and a PRBS checker. An off-chip PLL is used to provide a half-rate clock to the transceiver. The iPWM has 3 independently tunable coefficients: $\alpha_1$, $\alpha_2$ and $\alpha_3$. The XOR and XNOR logic that generates the necessary control signals for iPWM is implemented at lower data rates (before the 4:1 mux), which helps to reduce the switching power.

Figure 29.4.3 shows the block diagram of the proposed integrated pulse width modulation circuit. Tunable coefficients $\alpha_1$, $\alpha_2$ and $\alpha_3$ are implemented in 3 independent iPWM logic stages and are enabled with control signals Ctrl1, Ctrl2, and Ctrl3, respectively. These stages are cascaded, so the pulse width modification by the n<sup>th</sup> stage accumulates with the modification done by the (n+1)<sup>th</sup> stage. Such cascading results in a modular design, which can be scaled to accommodate more taps to equalize even higher channel loss. The Ctrl1, Ctrl2, and Ctrl3 signals are asserted high in the presence of 2, 3 and 4 consecutive identical digits in the data stream, respectively. When a Ctrl signal goes high, it limits the maximum output swing on the Dint node by pulling it towards V<sub>DD</sub>/2 (see Fig. 29.4.3). As a result, Dint settles to either V<sub>DD</sub> – $\delta V$ (for logic 1) or $\delta V$ (for logic 0). A lower swing on Dint reduces the time needed to transition from 0-to-1 or 1-to-0, resulting in reduced CID pulse width. The CID pulse width reduction can be controlled using a binary weighted 4b control word ($\alpha_n$[n]) with a resolution of approximately 3ps.
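The control-signal behavior described above can be sketched in software. The following is an illustrative bit-level model only, assuming that Ctrl1, Ctrl2 and Ctrl3 are derived by XNOR-comparing adjacent bits and that the 4b code maps linearly to a width reduction at about 3ps per LSB; the function names `cid_controls` and `code_to_reduction_ps` are hypothetical and the on-chip logic may be organized differently.

```python
def xnor(a: int, b: int) -> int:
    """1 when the two bits are equal, 0 otherwise."""
    return 1 - (a ^ b)


def cid_controls(bits):
    """Return one (ctrl1, ctrl2, ctrl3) tuple per bit.

    ctrlN is 1 at bit k when bit k and the N bits before it are identical,
    i.e. bit k is at least the (N+1)-th bit of a run of identical digits.
    """
    controls = []
    for k in range(len(bits)):
        # same[j]: bits k-j and k-j-1 exist and are equal
        same = [
            k - j >= 1 and xnor(bits[k - j], bits[k - j - 1]) == 1
            for j in range(3)
        ]
        ctrl1 = int(same[0])
        ctrl2 = int(same[0] and same[1])
        ctrl3 = int(same[0] and same[1] and same[2])
        controls.append((ctrl1, ctrl2, ctrl3))
    return controls


def code_to_reduction_ps(code: int, lsb_ps: float = 3.0) -> float:
    """Map a binary weighted 4b code to a width reduction (assumed linear)."""
    if not 0 <= code <= 15:
        raise ValueError("a 4b control word must be in the range 0..15")
    return code * lsb_ps


if __name__ == "__main__":
    for bit, ctrl in zip([0, 1, 1, 1, 1, 0], cid_controls([0, 1, 1, 1, 1, 0])):
        print(bit, ctrl)
    print(code_to_reduction_ps(0b0101))  # reduction in ps for code 5
```

Because each flag depends only on comparisons between neighboring bits, this kind of logic can be evaluated on the parallel data before final serialization, which is consistent with the low-rate implementation mentioned for the XOR and XNOR logic.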

Figure 29.4.4 shows the iPWM in action with measured data at the channel output. Without iPWM, the ISI caused by 2, 3 and 4 CIDs results in bits that the receiver fails to detect. As the values of coefficients $\alpha_1$, $\alpha_2$ and $\alpha_3$ are increased, the width of CIDs is reduced. Consequently, the ISI caused by CIDs is also reduced, resulting in horizontal and vertical eye opening.

Figure 29.4.5 shows the far-end output eye openings and BER bathtub curves with iPWM for two different channel loss profiles with a PRBS7 data pattern. Operating at 16Gb/s, the proposed transmitter achieves 80mV<sub>pp</sub> (single-ended) and 28mV<sub>pp</sub> (single-ended) vertical eye openings with channels 1 and 2, respectively. For BER<10<sup>-12</sup>, the proposed transmitter achieves horizontal eye openings of 24ps (0.38UI) and 16ps (0.26UI) with channels 1 and 2, respectively. The receiver performance is measured with various CTLE equalization settings. Operating at 16Gb/s, the receiver alone can equalize more than 3dB of channel loss and, for BER<10<sup>-12</sup>, achieves horizontal eye openings of 42ps (0.67UI) and 12ps (0.19UI) with and without CTLE equalization, respectively.
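As a quick consistency check on the UI figures quoted above, assuming only that one unit interval is the reciprocal of the 16Gb/s data rate (the symbol $T_{\mathrm{UI}}$ is introduced here for illustration):

$$
T_{\mathrm{UI}} = \frac{1}{16\,\mathrm{Gb/s}} = 62.5\,\mathrm{ps}, \qquad
\frac{24\,\mathrm{ps}}{T_{\mathrm{UI}}} \approx 0.38, \quad
\frac{16\,\mathrm{ps}}{T_{\mathrm{UI}}} \approx 0.26, \quad
\frac{42\,\mathrm{ps}}{T_{\mathrm{UI}}} \approx 0.67, \quad
\frac{12\,\mathrm{ps}}{T_{\mathrm{UI}}} \approx 0.19
$$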

Operating at 16Gb/s, the complete transceiver consumes 57.3mW of power from 0.9V/1V/1.1V supplies, can equalize more than 19dB of loss, and occupies an active area of 0.21mm<sup>2</sup>. The performance of the proposed transceiver is compared with the state-of-the-art in Fig. 29.4.6. This work proposes a new phase-domain equalization technique and demonstrates a working prototype at 16Gb/s. The micrograph of the chip, which is fabricated in 65nm CMOS, is shown in Fig. 29.4.7.
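The energy efficiency in the title follows directly from these numbers. Using $E_{\mathrm{bit}}$, $P$ and $f_{\mathrm{bit}}$ as illustrative symbols for energy per bit, total power and data rate:

$$
E_{\mathrm{bit}} = \frac{P}{f_{\mathrm{bit}}} = \frac{57.3\,\mathrm{mW}}{16\,\mathrm{Gb/s}} \approx 3.58\,\mathrm{pJ/b} \approx 3.6\,\mathrm{pJ/b}
$$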

## Acknowledgment:

We thank Berkeley Design Automation for providing the Analog FastSPICE (AFS) simulator and Kai Zhan at Oregon State University for help in testing.

## References:

[1] Y. Lu, et al., "Design and analysis of energy-efficient reconfigurable preemphasis voltage-mode transmitters," *IEEE JSSC*, vol. 48, no. 8, pp. 1898-1909, Aug. 2013.

[2] H. Cheng and A. C. Carusone, "A 32/16 Gb/s 4/2-PAM transmitter with PWM preemphasis and 1.2 Vpp per side output swing in 0.13-µm CMOS," in Proc. *IEEE CICC*, Sept. 2008, pp. 635-638.

[3] S. Saxena, et al., "A 5 Gb/s energy-efficient voltage-mode transmitter using time-based de-emphasis," *IEEE JSSC*, vol. 49, no. 8, pp. 1827-1836, Aug. 2014.

[4] J. F. Buckwalter, et al., "Phase and amplitude pre-emphasis techniques for low-power serial links," *IEEE JSSC*, vol. 41, no. 6, pp. 1391-1399, June 2006.

[5] T. Musah, et al., "A 4–32 Gb/s bidirectional link with 3-tap FFE/6-tap DFE and collaborative CDR in 22 nm CMOS," *IEEE JSSC*, vol. 49, no. 12, pp. 3079-3090, Dec. 2014.



and mathematical representation.



Figure 29.4.3: Block diagram of the proposed integrated pulse width modulation logic and timing diagram in the presence of consecutive identical digits.





Figure 29.4.4: Measured channel output demonstrating ISI reduction in the presence of CIDs as the iPWM is turned on and coefficients are increased.

Figure 29.4.5: Insertion loss profiles of the channels. BER bathtub curves at the channel outputs with equalized 16Gb/s PRBS7 data. Equalized eye diagrams at the far-end channel output.



Figure 29.4.6: Performance summary and comparison with state-of-the-art designs.

Figure 29.4.7: Micrograph of the proposed transceiver (1.2mm dimension marked on the image).