## 8.7 A 4-to-10.5Gb/s 2.2mW/Gb/s Continuous-Rate Digital CDR with Automatic Frequency Acquisition in 65nm CMOS

Guanghua Shu<sup>1</sup>, Woo-Seok Choi<sup>1</sup>, Saurabh Saxena<sup>1</sup>, Tejasvi Anand<sup>1</sup>, Amr Elshazly<sup>2</sup>, Pavan Kumar Hanumolu<sup>1</sup>

<sup>1</sup>University of Illinois, Urbana, IL, <sup>2</sup>Intel, Hillsboro, OR

Continuous-rate clock-and-data recovery (CDR) circuits with automatic frequency acquisition offer flexibility in both optical and electrical communication networks, and minimize cost with a single-chip multi-standard solution. The two major challenges in the design of such a CDR are: (a) extracting the bit-rate from the incoming random data stream, and (b) designing a wide-tuning-range low-noise oscillator. Among all available frequency detectors (FDs), the stochastic divider-based approach has the widest frequency acquisition range and is well suited for sub-rate CDRs [1]. However, its accuracy strongly depends on input transition density ($0 \leq \rho \leq 1$), with any deviation of $\rho$ from 0.5 (50% transition density) causing $2\times(\rho-0.5)\times10^{6}$ ppm of frequency error. In this paper, we present an automatic frequency-acquisition scheme that has unlimited range and is immune to variations in transition density. Implemented using a conventional bang-bang phase detector (BBPD), it requires minimum additional hardware and is applicable to sub-rate CDRs as well. Instead of using multiple LC oscillators that are carefully designed to cover a wide frequency range [2,3], a ring-oscillator-based fractional-N PLL is used as a digitally controlled oscillator (DCO) to achieve both wide range and low noise, and to decouple the tradeoff between jitter transfer (JTRAN) bandwidth and ring-oscillator-noise suppression.
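To give a feel for how sensitive the stochastic divider-based FD is to transition density, the error expression quoted above can be evaluated directly. This is only an arithmetic sketch; the function name is hypothetical and not taken from [1].

```python
def divider_fd_error_ppm(rho: float) -> float:
    """Frequency error (ppm) of a stochastic divider-based FD.

    Evaluates 2 * (rho - 0.5) * 1e6, the expression quoted in the text,
    for a transition density rho between 0 and 1.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("transition density must lie in [0, 1]")
    return 2.0 * (rho - 0.5) * 1e6


if __name__ == "__main__":
    for rho in (0.4, 0.5, 0.6):
        print(f"rho = {rho:.2f}: error = {divider_fd_error_ppm(rho):+.0f} ppm")
```

Even a modest shift of $\rho$ from 0.5 to 0.6 corresponds to an error of roughly 200,000 ppm under this expression, which is why immunity to transition density matters.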

Figure 8.7.1 depicts the developed digital CDR architecture. It is composed of a frequency-locked loop (FLL) and a delay- and phase-locked loop (D/PLL). Both the FLL and D/PLL are updated using early/late (E/L) signals provided by the BBPD. In the FLL, the frequency-detection logic (FDL) block operates on E/L signals and drives the DCO to within the pull-in range of the D/PLL through accumulator ACC<sub>F</sub>. Both loss-of-lock detection (LOLD) and lock detection (LD), which are needed to ensure seamless switching between data rates, are also implemented in the FDL. LOLD triggers a new frequency acquisition when the error ( $\Delta F = F_{DCO} - F_{DIN}$ ) between DCO frequency and data rate exceeds 1000ppm. Lock is declared when  $\Delta F$  is smaller than 500ppm.
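The two thresholds form a hysteresis band: lock is declared below 500ppm and only lost above 1000ppm. A minimal behavioral sketch of that decision is shown here. It assumes the frequency error is known directly, whereas the actual FDL infers it from E/L statistics; all names are hypothetical.

```python
LOCK_PPM = 500.0           # lock is declared below this error
LOSS_OF_LOCK_PPM = 1000.0  # a new acquisition starts above this error


def next_lock_state(locked: bool, f_dco_hz: float, f_din_hz: float) -> bool:
    """Return the new lock flag given the current flag and both frequencies."""
    err_ppm = abs(f_dco_hz - f_din_hz) / f_din_hz * 1e6
    if locked:
        # Stay locked until the error exceeds the loss-of-lock threshold.
        return err_ppm <= LOSS_OF_LOCK_PPM
    # While acquiring, require the tighter lock threshold.
    return err_ppm < LOCK_PPM
```

An error of 700ppm therefore keeps an already locked loop locked, but is not small enough to declare lock during acquisition.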

The D/PLL is composed of a digital DLL and a digital PLL, and can be viewed as the digital equivalent of the architecture reported in [2]. Similar to its analog counterpart, the digital D/PLL features a decoupled JTRAN bandwidth and jitter tolerance (JTOL) corner frequency, and exhibits well-controlled JTRAN bandwidth even in the presence of BBPD gain variations caused by input jitter [3]. Unlike [2], our D/PLL does not need large on-chip capacitors and the DCO is implemented using a fractional-N PLL employing a single ring oscillator instead of multiple LC oscillators. Additionally, to maximize JTOL, the digitally controlled delay line (DCDL) is biased at its mid-delay point in steady state by the path containing gain block  $K_0$  and accumulator ACC<sub>0</sub>. The path containing divide-by-H and accumulator ACC<sub>H</sub> is used to prevent false locking as discussed later.

The principle behind the BBPD-based frequency detector (FD) is illustrated in Fig. 8.7.2. Consider the transfer function (TF) of a conventional BBPD in which the output changes sign at $\Delta \Phi = n\pi$ for all integer values of *n*. Due to this, the BBPD output is typically considered to be valid only if $\Delta \Phi$ lies between $-\pi$ and $\pi$. This condition is violated in the presence of frequency error since the phase error accumulates indefinitely, causing the BBPD to periodically switch between runs of consecutive E and L signals as indicated in Fig. 8.7.2. However, the number of consecutive E (or L) signals is dictated by the magnitude of $\Delta F$, with a larger $\Delta F$ resulting in fewer consecutive E (or L) signals and vice versa. Based on this behavior, we seek to estimate the magnitude of $\Delta F$ by measuring the number of consecutive E (or L) signals. Inside the FD, an accumulator, $ACC_{E/L}$, integrates the BBPD output and is reset whenever the BBPD output changes sign. The peak value of $ACC_{E/L}$ can be calculated to be $N_{P} = \rho F_{DIN}/(2|\Delta F|)$ and therefore, $|\Delta F| = \rho F_{DIN}/(2N_{P})$. During frequency acquisition, $\Delta F$ is reduced to be within the pull-in range ($\Delta F_P$) of the D/PLL by increasing the DCO frequency until $N_{P}$ exceeds the desired threshold $N_{TH} = \rho F_{DIN}/(2\Delta F_P)$. This is implemented by incrementing the DCO frequency control accumulator, $ACC_{F}$, if $ACC_{E/L}$ is less than $N_{TH}$ when the BBPD output changes sign (see Fig. 8.7.2). Lock is declared as soon as $ACC_{E/L}$ exceeds $N_{TH}$. Note that this frequency detection scheme does not provide the sign of $\Delta F$. The DCO is therefore reset to its lowest frequency at the start of the acquisition process, so that $\Delta F$ is guaranteed to be always negative. This also prevents harmonic locking.
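The relationship between run length and frequency error can be checked with a small behavioral model. The sketch below assumes an idealized, jitter-free BBPD whose output sign flips every half unit interval (UI) of accumulated phase error, and random data in which each bit has a transition with probability $\rho$. The function names, the random data model, and the seed are illustrative assumptions, not details from the paper.

```python
import random
from statistics import mean


def el_run_peaks(delta_f_ppm: float, rho: float, n_bits: int, seed: int = 0) -> list:
    """Peak ACC_E/L values seen while the phase error drifts at delta_f_ppm."""
    rng = random.Random(seed)
    drift_ui = abs(delta_f_ppm) * 1e-6  # phase slip per bit, in UI
    phase_ui = 0.0
    acc = 0          # models ACC_E/L
    prev_sign = 0    # 0 means no BBPD output seen yet
    peaks = []
    for _ in range(n_bits):
        phase_ui += drift_ui
        if rng.random() >= rho:
            continue  # no data transition, so the BBPD gives no E/L output
        # The BBPD sign changes every half UI (every pi of phase error).
        sign = 1 if (phase_ui % 1.0) < 0.5 else -1
        if prev_sign != 0 and sign != prev_sign:
            peaks.append(acc)  # record the peak, then reset on sign change
            acc = 0
        acc += 1
        prev_sign = sign
    return peaks


def predicted_peak(delta_f_ppm: float, rho: float) -> float:
    """N_P = rho * F_DIN / (2 * |dF|), with dF expressed in ppm of F_DIN."""
    return rho * 1e6 / (2.0 * abs(delta_f_ppm))


if __name__ == "__main__":
    for ppm in (500.0, 1000.0, 2000.0):
        peaks = el_run_peaks(ppm, rho=0.5, n_bits=200_000)
        print(ppm, round(mean(peaks), 1), predicted_peak(ppm, 0.5))
```

In this model the average peak tracks the predicted $N_P$ closely, and doubling the frequency error roughly halves the run length, which is the property the FD exploits.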

The accuracy of the frequency detection scheme depends on $\rho$ and data/clock jitter ($\Phi_j$), as quantified by the tabulated frequency error in Fig. 8.7.2. However, setting N<sub>TH</sub> corresponding to $\rho = 1$ (i.e., N<sub>TH</sub> = $F_{DIN}/(2\Delta F_P)$) ensures that the residual frequency error will always be smaller than $\Delta F_P$ for any $\rho$. For example, N<sub>TH</sub> = 500 ensures that the DCO is always locked to within 1000ppm of the target data rate. Interestingly, $\Phi_j$ improves accuracy as it is equivalent to setting a larger N<sub>TH</sub> with $\Phi_j = 0$. Very large $\Phi_j$ can cause false updates of the DCO frequency, which can be prevented by not incrementing ACC<sub>F</sub> when the peak value of ACC<sub>E/L</sub> is smaller than its previous peak. Potential false locking caused by some degenerate input patterns manifests as a reduced ACC<sub>E/L</sub> count. Therefore, separately counting the number of transitions using divider H and ACC<sub>H</sub>, and comparing the result to ACC<sub>E/L</sub>, can detect false locking. Under this condition, incrementing the DCO frequency will pull the CDR away from false locking.
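The threshold choice can be illustrated numerically. The sketch below sizes N<sub>TH</sub> for $\rho = 1$ and then steps an idealized DCO upward from its lowest frequency until the predicted run length reaches the threshold. It ignores jitter and uses the closed-form run length instead of real E/L samples; the 1MHz step and the 4GHz starting frequency are arbitrary illustrative values, and the function names are hypothetical.

```python
def n_th_for_pull_in(pull_in_ppm: float) -> float:
    """Threshold N_TH = F_DIN / (2 * dF_P), sized for rho = 1."""
    return 1e6 / (2.0 * pull_in_ppm)


def residual_error_ppm(rho: float, n_th: float) -> float:
    """Largest |dF| (ppm) at which the run length reaches n_th."""
    return rho * 1e6 / (2.0 * n_th)


def acquire(f_din_hz: float, f_start_hz: float, f_step_hz: float,
            n_th: float, rho: float) -> float:
    """Increase the DCO frequency until the ideal run length reaches n_th."""
    f_dco = f_start_hz
    while f_dco < f_din_hz:
        n_p = rho * f_din_hz / (2.0 * (f_din_hz - f_dco))
        if n_p >= n_th:
            break  # lock would be declared here
        f_dco += f_step_hz  # models one increment of ACC_F
    return f_dco


if __name__ == "__main__":
    n_th = n_th_for_pull_in(1000.0)
    for rho in (0.25, 0.5, 1.0):
        f = acquire(10e9, 4e9, 1e6, n_th, rho)
        print(rho, (10e9 - f) / 10e9 * 1e6, residual_error_ppm(rho, n_th))
```

With N<sub>TH</sub> = 500, lower transition densities stop closer to the target, so the worst case is $\rho = 1$ at 1000ppm, in line with the example in the text.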

The DCO is implemented using a fractional-N PLL as shown in Fig. 8.7.3. The frequency control word (FCW), provided by the CDR logic, tunes the fractional-N PLL output frequency by varying its feedback divider from 4 to 15. When operated with a 500MHz reference clock, this translates to a wide DCO tuning range of 5.5GHz (2 to 7.5GHz). Since more than 2× frequency range is achieved, lower data rates can be easily accommodated using a divider chain [2]. Further, using a high-frequency reference clock extends the PLL bandwidth to adequately suppress ring-oscillator phase noise while maintaining the same quantization error of the fractional divider [4], and provides the freedom to use a small JTRAN bandwidth to filter input noise without degrading performance due to DCO phase noise. An on-chip digital multiplying DLL (MDLL) generates the 500MHz reference clock from a 50MHz crystal. It is important to note that the crystal oscillator does not aid frequency acquisition, as its frequency has no relation to the input data rate. The fractional-N PLL helps suppress oscillator phase noise and can be eliminated if the ring oscillator meets the JGEN specification (the FCW drives the ring oscillator directly in that case). Compared to using multiple LC oscillators [2,3], this approach covers a wide range with only one ring oscillator and has a linear relationship between FCW and data rate. A second-order  $\Delta\Sigma$  modulator is used to truncate the FCW and drive the feedback divider. A 2<sup>nd</sup>-order loop filter along with the 3<sup>rd</sup> pole located at the drain of current-source transistor,  $M_1$, is used to suppress  $\Delta\Sigma$ truncation error.
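The linear FCW-to-frequency mapping and the role of the $\Delta\Sigma$ modulator can be sketched as follows. The paper does not state the modulator topology, so a MASH 1-1 structure is assumed here purely for illustration. The 16-bit fractional word and all names are hypothetical as well.

```python
F_REF_HZ = 500e6          # reference clock from the MDLL
DIV_MIN, DIV_MAX = 4, 15  # feedback divider range stated in the text


def dco_frequency_hz(fcw: float) -> float:
    """Average DCO frequency for a (possibly fractional) divide ratio."""
    if not DIV_MIN <= fcw <= DIV_MAX:
        raise ValueError("FCW outside the feedback divider range")
    return fcw * F_REF_HZ


def mash11_divider_sequence(fcw: float, n_cycles: int, frac_bits: int = 16) -> list:
    """Integer divide ratios from an assumed MASH 1-1 second-order modulator."""
    n_int = int(fcw)
    mod = 1 << frac_bits
    frac_word = round((fcw - n_int) * mod)  # quantized fractional part
    acc1 = acc2 = carry2_prev = 0
    ratios = []
    for _ in range(n_cycles):
        acc1 += frac_word
        carry1, acc1 = divmod(acc1, mod)   # first-stage overflow
        acc2 += acc1                       # second stage integrates the residue
        carry2, acc2 = divmod(acc2, mod)
        # Second-stage carry is differentiated to shape its error.
        ratios.append(n_int + carry1 + carry2 - carry2_prev)
        carry2_prev = carry2
    return ratios


if __name__ == "__main__":
    seq = mash11_divider_sequence(10.3, 10_000)
    avg = sum(seq) / len(seq)
    print(sorted(set(seq)), avg * F_REF_HZ / 1e9, "GHz")
```

The divider only ever sees integer ratios near the target, while their long-term average sets the fractional frequency. The loop filter then removes the high-frequency truncation error.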

The prototype CDR is implemented in a 65nm CMOS process, occupies an active area of 1.63mm<sup>2</sup>, and is packaged in a QFN88 package. At 10Gb/s, the CDR consumes 22.5mW and achieves a BER <  $10^{-12}$ . Measured residual frequency error versus locking threshold N<sub>TH</sub> (Fig. 8.7.4) shows that the FLL is immune to transition density. With N<sub>TH</sub> > 600, the frequency error is always less than 500ppm. Jitter transfer curves measured with different input jitter amplitudes illustrate that JTRAN bandwidth is independent of jitter amplitude even when using a BBPD (Fig. 8.7.5). The measured JTOL plot in Fig. 8.7.5 indicates a corner frequency of about 9MHz, which is much larger than the JTRAN bandwidth of 0.2MHz. From 1.1 to 2.5MHz, JTOL is limited by the DCDL range, and low-frequency JTOL is restricted to 2UI<sub>pp</sub> due to instrument limitation. Figure 8.7.6 tabulates the performance summary and the comparison. Compared to the results cited in the table, this work achieves the highest power efficiency and the lowest jitter while using ring-based oscillators. The die micrograph is shown in Fig. 8.7.7.

## Acknowledgment:

Intel Labs University Research Office, Kawasaki Microelectronics America, Inc., and NSF under CAREER Award EECS-0954969 supported this work. Berkeley Design Automation provided Analog Fast Spice (AFS) simulator. Twisted Traces Inc. and Seong-Joong Kim provided testing assistance.

## References:

[1] R. Inti, et al., "A 0.5-to-2.5-Gb/s reference-less half-rate digital CDR with unlimited frequency acquisition range and improved input duty-cycle error tolerance," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2011, pp. 438-439.

[2] D. Dalton, et al., "A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate read back," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 230-231.

[3] J. Kenney, et al., "A 9.95-11.1-Gb/s XFP transceiver in 0.13-µm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2006, pp. 232-233.

[4] D. Park, S. Cho, "A 14.2mW 2.55-to-3GHz cascaded PLL with reference injection, 800MHz delta-sigma modulator and 255fs<sub>rms</sub> integrated jitter in 0.13µm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2012, pp. 344-346.



Figure 8.7.5: ... and jitter tolerance with PRBS7 data (BER threshold of $10^{-12}$).

Figure 8.7.6: CDR performance summary and comparison.

