# A 4-to-10.5 Gb/s Continuous-Rate Digital Clock and Data Recovery With Automatic Frequency Acquisition

Guanghua Shu, *Student Member, IEEE*, Woo-Seok Choi, Saurabh Saxena, Mrunmay Talegaonkar, Tejasvi Anand, *Student Member, IEEE*, Ahmed Elkholy, *Student Member, IEEE*, Amr Elshazly, *Member, IEEE*, and Pavan Kumar Hanumolu, *Member, IEEE* 

Abstract-A continuous-rate digital clock and data recovery (CDR) with automatic frequency acquisition is presented. The proposed automatic frequency acquisition scheme, implemented using a conventional bang-bang phase detector (BBPD), requires minimal additional hardware, is immune to input data transition density, and is applicable to subrate CDRs. A ring-oscillator-based two-stage fractional-N phase-locked loop (PLL) is used as a digitally controlled oscillator (DCO) to achieve wide frequency range and low noise, and to decouple the tradeoff between jitter transfer (JTRAN) bandwidth and ring oscillator noise suppression in conventional CDRs. The CDR is implemented using a digital D/PLL architecture to decouple JTRAN bandwidth from jitter tolerance (JTOL) corner frequency, eliminate jitter peaking, and remove JTRAN dependence on BBPD gain. Fabricated in a 65 nm CMOS process, the prototype CDR achieves error-free operation $(BER < 10^{-12})$ from 4 to 10.5 Gb/s with pseudorandom binary sequence (PRBS) data ranging from PRBS7 to PRBS31. The proposed automatic frequency acquisition scheme always locks the CDR loop within 1000 ppm residual frequency error in the worst case. At 10 Gb/s, the CDR consumes 22.5 mW and achieves a recovered clock long-term jitter of 2.2 $\rm ps_{rms}/24.0 \; ps_{pp}$ with PRBS31 input data. The measured JTRAN bandwidth and JTOL corner frequency are 0.2 and 9 MHz, respectively.

*Index Terms*—Active repeater, automatic frequency acquisition, continuous-rate receivers, decouple jitter transfer (JTRAN)/jitter generation (JGEN), decouple JTRAN/jitter tolerance (JTOL), digital clock and data recovery (CDR), fractional-N phase-locked loop (PLL), high-speed serial link, jitter peaking, multiplying delay-locked loop, optical links, reference-less frequency-locked loop, supply regulator, wide-range digitally controlled oscillator (DCO).

## I. INTRODUCTION

CONTINUOUS-RATE clock-and-data recovery (CDR) circuits capable of operating across a wide range of data rates offer flexibility in both optical and electrical communication networks. They can help satisfy specifications of

Manuscript received July 12, 2015; revised September 26, 2015; accepted October 20, 2015. Date of publication December 22, 2015; date of current version January 29, 2016. This paper was approved by Associate Editor Jack Kenney.

G. Shu, W.-S. Choi, S. Saxena, M. Talegaonkar, T. Anand, A. Elkholy, and P. K. Hanumolu are with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, IL 61801 USA (e-mail: gshu2@illinois.edu).

A. Elshazly is with Intel Corporation, Hillsboro, OR 97124 USA (e-mail: shazly@ieee.org).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2015.2497963

multiple standards using a single-chip solution and can reduce cost when implemented using a minimal number of external components such as capacitors and voltage-controlled crystal oscillators. However, it is very difficult to meet these requirements using the classical analog CDR architecture depicted in Fig. 1 [1], [2]. First, extracting the bit rate (frequency information) from an incoming random data stream is difficult because of the limited range of conventional frequency detectors. Second, the design of a wide-tuning-range low-noise oscillator in a power- and area-efficient manner is challenging. Third, jitter transfer (JTRAN) and jitter tolerance (JTOL) characteristics are set by the same loop parameters (as explained below), which complicates the CDR design, especially in the context of repeater applications. Stringent jitter peaking requirements in such applications also mandate a large loop filter capacitor that is difficult to integrate on chip [3]. Finally, the low JTRAN bandwidth required in many standards such as SONET increases jitter generation (JGEN) due to inadequate suppression of oscillator phase noise. Alternatively, this translates to increased oscillator power dissipation. These issues are further elaborated below, starting with frequency acquisition.

Automatic frequency acquisition loops are typically implemented using either a rotational frequency detector (RFD) or a quadricorrelator frequency detector (QFD) [3]–[7]. The main limitation of these frequency detectors is their limited frequency acquisition range, which is usually less than 50% of the target frequency. Therefore, dedicated coarse frequency detectors are necessary to extend the range for continuous-rate applications [3]. Recently, a divider-based stochastic reference clock generator (SRCG) approach that provides unlimited frequency acquisition range (it can lock to any frequency within the tuning range of the oscillator) was reported in [8] and [9]. However, the accuracy with which the oscillator is tuned to the data rate strongly depends on the input data transition density $\rho$, where $0 \le \rho \le 1$. Any deviation of $\rho$ from 0.5 (a transition density of 50%) causes $2 \times (\rho - 0.5) \times 10^6$ ppm of residual frequency error. For instance, a 7-bit pseudorandom binary sequence (PRBS7) data pattern (with $\rho \approx 0.504$) causes about 8000 ppm frequency error, which is larger than the pull-in range of most conventional CDRs. In this paper, we present an automatic frequency acquisition scheme that: 1) is insensitive to transition density; 2) can achieve unlimited frequency acquisition range; and 3) is amenable to subrate CDR architectures.
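The PRBS7 figure quoted above can be checked numerically. The sketch below is illustrative only: it assumes the common PRBS7 generator polynomial $x^7 + x^6 + 1$ and counts transitions over one periodic 127-bit pattern, then applies the SRCG error expression from the text. The function names are hypothetical and are not taken from [8] or [9].

```python
def prbs7_bits(seed=0x7F):
    """One full 127-bit period of PRBS7 (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    bits = []
    for _ in range(127):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def transition_density(bits):
    """Fraction of bit boundaries with a transition, treating the pattern as periodic."""
    n = len(bits)
    transitions = sum(bits[i] != bits[(i + 1) % n] for i in range(n))
    return transitions / n


def srcg_residual_error_ppm(rho):
    """Residual frequency error 2 x (rho - 0.5) x 1e6 ppm, as stated in the text."""
    return 2.0 * (rho - 0.5) * 1e6


rho = transition_density(prbs7_bits())
print(f"rho = {rho:.4f}, SRCG residual error = {srcg_residual_error_ppm(rho):.0f} ppm")
```

One period contains 64 transitions in 127 bits, so $\rho = 64/127 \approx 0.504$ and the resulting error is just under 8000 ppm, in line with the value cited above.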

0018-9200 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.



Fig. 1. Block diagram of a continuous-rate CDR with automatic frequency acquisition.



Fig. 2. (a) Analog D/PLL-based CDR using a large loop filter capacitor. (b) JTRAN and JTOL characteristics.

Achieving wide tuning range and low noise simultaneously is a challenging design task. Ring oscillators can provide wide frequency range, but their phase noise is not adequate for high-performance CDR applications [8]. On the other hand, LC oscillators offer excellent phase noise performance, but their tuning range is limited. Carefully designed multiple LC tanks can cover a wide frequency range [3], [10] at the expense of excessive power and area consumption. In this paper, we embed a wide-tuning-range ring oscillator in a fractional-N phase-locked loop (FNPLL) and use the FNPLL as a digitally controlled oscillator (DCO) to achieve both wide range and low noise. The FNPLL-based DCO also helps decouple the tradeoff between JTRAN bandwidth and JGEN due to ring oscillator noise in conventional CDRs.

In addition to limited frequency acquisition range and finite tuning range of the oscillator, classical CDRs also suffer from two other design tradeoffs. On one hand, the JTRAN bandwidth and JTOL corner frequency of a classical second-order CDR cannot be chosen independently as both of them are dictated by the higher of the two closed-loop poles [3]. This is undesirable because the JTRAN bandwidth cannot be lowered without degrading JTOL. Intrinsic peaking resulting from placing the loop-stabilizing zero in the feed-forward path is also problematic, especially in repeater applications. The delay/PLL (D/PLL) architecture reported in [3], [10]-[14] and shown in Fig. 2(a) removes the closed-loop zero and avoids jitter peaking. Furthermore, JTRAN bandwidth and JTOL corner frequency are decoupled, with the JTRAN bandwidth governed by the lower pole (mainly from the PLL) and the JTOL corner frequency set by the higher pole (mainly from the DLL) [3]. On



Fig. 3. BBPD behavior in the presence of a frequency error.

the other hand, classical CDRs suffer from conflicting bandwidth requirements to meet JGEN and JTRAN specifications. Minimizing the amount of input jitter transferred to the CDR output (recovered clock, RCK) requires a low JTRAN bandwidth, while a high JTRAN bandwidth is needed to suppress oscillator noise, which is a major contributor to CDR JGEN. Hence, improving JGEN with a low JTRAN bandwidth requires a low-noise oscillator that consumes significant power and occupies large area [3], [10]. In this paper, a digital D/PLL architecture is proposed to overcome the JTOL/JTRAN/JGEN tradeoffs.

The rest of this paper is organized as follows. The automatic frequency acquisition is detailed in Section II. The overall digital CDR architecture with proposed wide-range low-noise DCO is discussed in Section III followed by circuit implementation details of the proposed CDR in Section IV. The measured results are presented in Section V, and a summary of the key contributions is given in Section VI.

### II. AUTOMATIC FREQUENCY ACQUISITION

## A. Review of BBPD Operation

The proposed frequency detection scheme uses the properties of a conventional bang-bang phase detector (BBPD), so it is instructive to first review the basic operation of a BBPD. A BBPD detects the sign of the phase error $\Delta \Phi$ between the incoming random data DIN and the RCK. Based on the sign of the phase error, the BBPD provides early or late (E/L) information for the CDR loop to achieve phase locking. The input-output transfer function of a BBPD, depicted in Fig. 3, illustrates that the output changes sign whenever the input phase error crosses $n\pi$ radians. Due to this behavior, the BBPD output is usually considered to be valid only when $\Delta \Phi$ lies between $-\pi$ and $\pi$. This condition is violated in the presence of frequency error since the phase error accumulates indefinitely, causing the BBPD to produce E and L signals alternately.

However, taking a closer look at the BBPD behavior reveals some interesting properties (Fig. 3). We note that within each $\pi$ interval of $\Delta\Phi$, the BBPD outputs either consecutive E or consecutive L signals, and the number of consecutive E (or L) signals $N_P$ is inversely proportional to the frequency difference ($\Delta F$) between DIN and RCK. In other words, if the number of consecutive E/L signals is $N_P = N_{P1}$ when $\Delta F = \Delta F_1$, then $N_P = N_{P2} > N_{P1}$ when the frequency error $\Delta F_2$ is slightly smaller than $\Delta F_1$. This



Fig. 4. Principle of proposed frequency acquisition scheme. (a) Block diagram of the BBPD-based FLL. (b) Illustration of frequency acquisition process.

is simply because it takes longer for the phase error to accumulate  $\pi$  radians with smaller frequency error. Similarly, an even smaller frequency difference  $\Delta F_3$  results in even larger number  $N_{P3}$  that is greater than both  $N_{P1}$  and  $N_{P2}$ . The key observation is that the frequency difference  $\Delta F_n$  is inversely proportional to the number of consecutive E/L signals  $N_{Pn}$ . This relationship is used in the proposed frequency acquisition scheme as discussed next.
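The inverse relationship between run length and frequency error can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' circuit: the BBPD is idealized as the sign of the phase error within one UI, every bit is assumed to carry a transition ($\rho = 1$), and the phase error advances by the relative frequency error each bit. The names `bbpd_sign` and `run_lengths` are hypothetical.

```python
def bbpd_sign(phase_ui):
    """Idealized BBPD: +1 (E) in the first half UI, -1 (L) in the second half."""
    return 1 if (phase_ui % 1.0) < 0.5 else -1


def run_lengths(rel_freq_error, n_bits):
    """Lengths of completed runs of identical BBPD outputs (one output per bit)."""
    runs = []
    count = 0
    prev = None
    phase = 0.0  # phase error in UI; one UI corresponds to 2*pi radians
    for _ in range(n_bits):
        phase += rel_freq_error
        sign = bbpd_sign(phase)
        if sign == prev:
            count += 1
        else:
            if prev is not None:
                runs.append(count)  # the previous run has ended
            prev, count = sign, 1
    return runs


for err in (0.01, 0.005):
    runs = run_lengths(err, 2000)
    mean = sum(runs) / len(runs)
    print(f"dF/F = {err}: mean run length = {mean:.1f}")
```

Halving the relative frequency error roughly doubles the run length, which is the property the frequency detection logic relies on.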

## B. Principle of Proposed Frequency Acquisition

The block diagram of the proposed BBPD-based frequency locking loop (FLL) is shown in Fig. 4(a). Using E/L outputs of the BBPD, frequency detection logic (FDL) generates frequency error information, which is integrated by the accumulator  $ACC_F$  and used to update DCO frequency ( $F_{DCO}$ ). The process of frequency acquisition is illustrated in Fig. 4(b). At the beginning of frequency acquisition, DCO is reset to its lowest frequency. Using an accumulator,  $ACC_{E/L}$ , FDL accumulates E/L signals from BBPD until the sign of BBPD output changes polarity. When the sign changes,  $ACC_{E/L}$  resets and starts accumulating a new set of consecutive E/L information again. FDL increments accumulator  $ACC_F$  and updates the DCO frequency  $F_{\rm DCO}$  when BBPD output changes sign and  $N_P < N_{\rm TH}$  (the locking threshold). Lock detector declares frequency lock when  $N_P$  becomes greater than or equal to  $N_{\rm TH}$ . After that, the phase tracking loop takes over and achieves phase locking.
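The acquisition procedure described above can be sketched as a behavioral model. The code below is an idealized illustration, not the implemented FDL: it assumes a full-rate, jitter-free BBPD modeled as the sign of the phase error, random transitions with probability $\rho$ per bit, no demultiplexing, and a DCO that starts 20000 ppm below the data rate. The 50 ppm step is taken from the measurement section; all function and parameter names are hypothetical.

```python
import random


def acquire_frequency(rho, n_th=500, step_ppm=50.0, start_error_ppm=20000.0,
                      seed=1, max_bits=5_000_000):
    """Idealized FLL model: returns (residual relative error, bits elapsed) at lock.

    The relative error is (F_DIN - F_DCO) / F_DIN. It starts positive because
    the DCO is reset below the data rate and is only ever stepped upward.
    """
    rng = random.Random(seed)
    error = start_error_ppm * 1e-6
    step = step_ppm * 1e-6
    phase = 0.0      # DIN-to-RCK phase error in UI
    prev_sign = 0    # 0 means no BBPD output has been seen yet
    count = 0        # plays the role of ACC_E/L: length of the current run
    for bit in range(1, max_bits + 1):
        phase = (phase + error) % 1.0
        if rng.random() >= rho:
            continue                       # no data transition, no BBPD output
        sign = 1 if phase < 0.5 else -1    # E in one half UI, L in the other
        if sign == prev_sign:
            count += 1
        else:
            # Sign change: `count` holds N_P of the run that just ended.
            if prev_sign != 0 and count < n_th:
                error -= step              # ACC_F increments, F_DCO steps up
            prev_sign, count = sign, 1
        if count >= n_th:
            return error, bit              # lock detector fires (N_P >= N_TH)
    raise RuntimeError("no lock within max_bits")


for rho in (0.5, 1.0):
    err, bits = acquire_frequency(rho)
    print(f"rho = {rho}: residual error = {err * 1e6:.0f} ppm after {bits} bits")
```

In this model the loop stops stepping as soon as one run reaches $N_{\rm TH}$ outputs, so the DCO approaches the data rate from below and never overshoots it. The bit counts are a property of the simplified model only and should not be read as a prediction of the measured acquisition time.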

In practice, jitter $(\Phi_j)$ may cause false updates of the DCO frequency since the sign of the BBPD output alternates when the phase relationship between DIN and RCK is within the jittery region [Fig. 5(b)]. However, the jittery region provides no valid information about the frequency error; thus, false updates can be prevented by not incrementing $ACC_F$ when the peak value of $ACC_{E/L}$ is smaller than its previous peak. Another common

(a) Without jitter:

$$N_{\rm TH} = N_P = \rho \frac{F_{\rm DIN}}{\Delta F} \frac{\pi}{2\pi} \;\Rightarrow\; \frac{\Delta F}{F_{\rm DIN}} = \frac{\rho}{N_P} \frac{\pi}{2\pi} = \frac{\rho}{2N_P}$$

| $\Delta F/F_{\rm DIN}$ (ppm) | $\rho = 0.1$ | $\rho = 0.5$ | $\rho = 1.0$ |
|---|---|---|---|
| $N_{\rm TH} = 250$ | 200 | 1000 | 2000 |
| $N_{\rm TH} = 500$ | 100 | 500 | 1000 |
| $N_{\rm TH}$ (value illegible) | 66 | 330 | 660 |

(b) With jitter $\Phi_j$:

$$N_{\rm TH} = N_P = \rho \frac{F_{\rm DIN}}{\Delta F} \frac{\pi - \Phi_j}{2\pi} \;\Rightarrow\; \frac{\Delta F}{F_{\rm DIN}} = \frac{\rho}{N_P} \frac{\pi - \Phi_j}{2\pi}$$

| $\Delta F/F_{\rm DIN}$ (ppm), $N_{\rm TH} = 500$ | $\rho = 0.1$ | $\rho = 0.5$ | $\rho = 1.0$ |
|---|---|---|---|
| $\Phi_j = 0$ | 100 | 500 | 1000 |
| With jitter ($\Phi_j \neq 0$) | 75 | 375 | 750 |

Fig. 5. Residual frequency error dependence on transition density. (a) w/o jitter. (b) w/ jitter.

issue in automatic frequency acquisition is harmonic locking, where the steady-state DCO frequency equals K times the data rate. In this design, starting the DCO from its lowest frequency ensures that the DCO locks to the target frequency before it reaches any harmonic frequencies, thus avoiding the harmonic-lock problem.

## C. Analysis of Proposed Frequency Acquisition

The number of consecutive E/L signals $(N_P)$ depends not only on the frequency error $\Delta F$ but also on the transition density $\rho$ and jitter $\Phi_j$. First, consider the case without jitter as shown in Fig. 5(a), where $F_{\text{DIN}}$ is the input data rate. One data bit of DIN spans $2\pi$ radians, and the BBPD output changes sign when the RCK and DIN phase difference exceeds $\pi$ radians. In each $\pi$ radians, the number of consecutive E/L signals is

$$N_P = \rho \frac{F_{\rm DIN}}{\Delta F} \frac{\pi}{2\pi}.$$
 (1)

Therefore, the relative frequency error $\left(\frac{\Delta F}{F_{\text{DIN}}}\right)$, $N_P$, and $\rho$ are related by

$$\frac{\Delta F}{F_{\rm DIN}} = \frac{\rho}{2N_P}.$$
(2)

Tabulating the above equation for different values of  $N_P$  and  $\rho$  reveals that the relative frequency error is bounded within 1000 ppm for any transition density  $\rho$  between 0 and 1 when the locking threshold  $N_{\rm TH} = N_P$  is set to 500. In other words, residual frequency error in the proposed frequency acquisition scheme can be made to be well within the pull-in range of a CDR, independent of the input transition density.

As shown in Fig. 5(b), the effect of input data jitter $\Phi_j$ can be incorporated into the relative frequency error expression as follows:

$$\frac{\Delta F}{F_{\rm DIN}} = \frac{\rho}{N_P} \frac{\pi - \Phi_j}{2\pi}.$$
(3)



Fig. 6. Residual frequency error comparison between proposed scheme and SRCG in [8].



Fig. 7. Digital implementation of a D/PLL CDR.

Interestingly, as long as the jitter is not so large as to close the eye, jitter reduces residual frequency error compared to the case when there is no jitter. In other words, increasing jitter has the same effect as making the locking threshold larger.
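Equations (2) and (3) can be tabulated directly. The sketch below evaluates (3), which reduces to (2) when $\Phi_j = 0$. The jitter value of $\pi/4$ rad is an assumption made here for illustration; it happens to reproduce the 75/375/750 ppm entries of Fig. 5(b), but the paper does not state the value used. The function name is hypothetical.

```python
import math


def residual_error_ppm(rho, n_p, jitter_rad=0.0):
    """Relative frequency error from Eq. (3), in ppm; Eq. (2) when jitter_rad = 0."""
    return rho / n_p * (math.pi - jitter_rad) / (2.0 * math.pi) * 1e6


densities = (0.1, 0.5, 1.0)
for n_th in (250, 500):
    row = [round(residual_error_ppm(rho, n_th)) for rho in densities]
    print(f"N_TH = {n_th}, no jitter: {row}")

# Assumed jitter of pi/4 rad, chosen only for illustration.
row = [round(residual_error_ppm(rho, 500, math.pi / 4)) for rho in densities]
print(f"N_TH = 500, jitter = pi/4: {row}")
```

With $N_{\rm TH} = 500$ the error stays at or below 1000 ppm for every $\rho$ in the table, and any nonzero jitter below $\pi$ only lowers it.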

Compared to the frequency acquisition based on SRCG in [8], the proposed scheme is much less sensitive to input transition density as shown in Fig. 6. With PRBS7 input data ( $\rho \approx 0.504$ ), the residual frequency error is as high as 8000 ppm in [8], while the error is stable around 500 ppm (with  $N_{\rm TH} = 500$ ) for any PRBS sequence in the proposed scheme.

## III. OVERALL CDR ARCHITECTURE

A simplified block diagram of the proposed digital D/PLL CDR architecture is shown in Fig. 7 [15]. It consists of three loops: 1) a frequency-locked loop (FLL); 2) a delay-locked loop (DLL); and 3) a PLL. Using the half-rate BBPD outputs, as described earlier, the FLL brings the DCO frequency to within 500 ppm of the target frequency (half of the data rate). The DLL adjusts the phase of the input data using a digitally controlled delay line (DCDL) and locks it to that of the RCK; the DLL by itself can therefore be viewed as a Type-I CDR. The PLL integrates the BBPD output using accumulator $ACC_I$ and drives the DCO toward frequency lock. This behavior is analogous to that of the integral control path in a classical Type-II CDR. In other words, the DLL and PLL implement the proportional and integral control portions of the CDR, respectively.

Similar to its analog D/PLL counterpart shown in Fig. 2, the proposed digital CDR also decouples the tradeoff between JTRAN bandwidth and JTOL corner frequency. However, implementing the loop filter in the digital domain eliminates the large loop filter capacitor needed in the analog D/PLL. It is also interesting to note that the JTRAN bandwidth of the D/PLL is



Fig. 8. Detailed block diagram of the proposed CDR.

governed only by the ratio of DCO and DCDL gains [10], [12]. As a result, JTRAN is independent of BBPD gain and hence it does not depend on input jitter. This is a considerable advantage compared to conventional bang–bang CDRs.

The detailed schematic of the proposed CDR is shown in Fig. 8 [15]. Input data DIN is buffered using a two-stage limiting amplifier before being fed to the DCDL. The BBPD output is demultiplexed by a factor of four in the DLL after carefully evaluating the tradeoff between the increased loop delay caused by a larger demultiplexing factor and the increased power dissipation of $ACC_P$ at a smaller demultiplexing ratio. It is important to reduce loop latency because large loop delay severely limits JTOL performance [16]. By contrast, the loop latency is not as critical in the PLL. Therefore, the BBPD output is demultiplexed by a factor of 32 in the integral path and the FDL to reduce digital logic power. The outputs of $ACC_I$ and $ACC_F$ are summed to generate the frequency control word (FCW) for the DCO. The fractional-N PLL-based DCO provides four equally spaced sampling clock phases (RCK) for the half-rate BBPD.

Because the CDR is designed to operate across a very wide range of data rates, it is susceptible to false locking. We propose a false-locking prevention scheme that is based on the observation that the sum of early and late outputs of the BBPD must equal the number of input data transitions in the frequency-locked state. The number of data transitions $(N_{\rm DT})$, counted using divider H and accumulator $ACC_H$, is compared to the number of E/L outputs $(N_{\rm E/L})$ provided by $ACC_{E/L}$. If $N_{\rm DT} \neq N_{\rm E/L}$, the FDL continues to increase the frequency and drives the DCO away from false locking. Both loss-of-lock detection (LOLD) and lock detection (LD) are implemented to ensure seamless switching between data rates. Furthermore, in order to maximize JTOL performance, the DCDL is biased at its mid-delay point in steady state by the path containing gain block $K_O$ (with a value of 1/16) and accumulator $ACC_O$. Since the average input to $ACC_O$ is zero in steady state, the DCDL operates around its mid-delay point and provides the maximum possible delay range of about $\pm 100$ ps. This technique is fairly straightforward to realize in a digital implementation compared to an analog D/PLL [3], where an extra $g_m$ control path is required to properly bias the delay line and has to be always on to compensate for the capacitor leakage.

#### **IV. CIRCUIT IMPLEMENTATION**

Thanks to the mostly digital nature of the proposed CDR, a large number of circuit blocks are fully synthesized using



Fig. 9. Schematic of the digitally controlled delay line (DCDL).

standard cells. The half-rate BBPD is implemented using a conventional Alexander phase detector with improved sense-amplifier flip-flops as data and edge samplers [12], [17]. The front-end limiting amplifier incorporates two CML stages and a CML-to-CMOS conversion stage [12]. Offset correction is performed by independently controlling the positive- and negative-side termination voltages. A minimum input swing of 15 mV is required to achieve BER < $10^{-12}$. The design details of other critical analog building blocks, including the DCDL and the ring-oscillator-based fractional-N PLL used as the DCO, are presented next.

## A. Digitally Controlled Delay Line

The schematic of the DCDL is shown in Fig. 9. A two-stage limiting amplifier converts the low-swing input data to full-swing CMOS levels and feeds it to a delay line controlled by code $D_P$. The delay line is implemented using a cascade of 16 pseudodifferential CMOS delay stages that provide a total delay of about 200 ps, which is 2 $\mathrm{UI}_\mathrm{pp}$ at a 10 Gb/s input data rate. Delay tuning is performed by varying the output capacitance of the delay stages. The DCDL control encoder is designed to distribute the desired delay equally among all delay stages to improve the linearity from digital control code to output delay [18]. Compared to the CML-based delay buffers used in [10], the CMOS delay stages consume lower power and occupy smaller area. For instance, the 17-stage CML-based delay line in [10] consumes about 60 mW while achieving a delay of about 150 ps, whereas the proposed CMOS delay line dissipates only about 5 mW while providing 200 ps delay. However, the finite bandwidth of the CMOS delay stages adds intersymbol interference (ISI) to the input data, and their high sensitivity to power supply noise increases jitter. Extensive transistor-level simulations indicated that, with the 16-stage DCDL, the ISI degradation can be limited to within 5% UI with 10 Gb/s PRBS31 input data at the worst-case process, supply voltage, and temperature (PVT) condition (about 1% UI additional ISI in the nominal condition). Supply noise sensitivity is reduced by powering the delay line using a linear low-dropout regulator operating from a 1.2 V supply voltage. The simulated power supply rejection ratio of the regulator is about -20 dB at 10 MHz.
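As a quick arithmetic check of the delay figures above, the sketch below converts the roughly 200 ps total delay into unit intervals across the supported data rates and derives the average per-stage delay. It assumes the total delay is independent of data rate, which the text does not state explicitly; the names are hypothetical.

```python
TOTAL_DELAY_PS = 200.0  # approximate total DCDL delay quoted in the text
NUM_STAGES = 16         # number of CMOS delay stages


def delay_range_ui(data_rate_gbps, total_delay_ps=TOTAL_DELAY_PS):
    """Total delay range expressed in unit intervals at the given data rate."""
    ui_ps = 1000.0 / data_rate_gbps  # one bit period in picoseconds
    return total_delay_ps / ui_ps


print(f"average delay per stage: {TOTAL_DELAY_PS / NUM_STAGES:.1f} ps")
for rate in (4.0, 10.0, 10.5):
    print(f"{rate} Gb/s: {delay_range_ui(rate):.2f} UI")
```

Under this assumption the same delay line spans 2 UI at 10 Gb/s but only 0.8 UI at 4 Gb/s, since the bit period grows as the data rate drops.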



Fig. 10. Schematic of ring oscillator-based DCO implemented using a fractional-N PLL.

## B. Digitally Controlled Oscillator

Ring oscillators have wide tuning range and can provide multiple phases, but their relatively poor phase noise limits their usage in many applications. This is especially the case in a D/PLL-based CDR because the DCO phase noise suppression bandwidth (which is equal to the JTRAN bandwidth) is much lower than that of a conventional CDR. In view of this, we use a ring-oscillator-based fractional-N PLL as a DCO wherein the output frequency is varied by controlling the feedback division ratio using the FCW, as illustrated in Fig. 10. Since the ring oscillator is embedded inside the PLL, its phase noise is suppressed by the feedback loop with much higher bandwidth. The FCW is equal to the sum of the control words generated by the frequency acquisition control path, $D_F$, and the integral path, $D_I$. Because the clock domain ($CLK_{CDR}$) in which the FCW is generated has no fixed phase relationship with the clock domain ($CLK_{FB}$) in which the $\Delta\Sigma$ modulator operates, the FCW is synchronized to $CLK_{FB}$ by the synchronization block shown in Fig. 11. Metastability is mitigated as long as the frequency of $CLK_{FB}$ is higher than twice that of $CLK_{CDR}$.

The fractional-N PLL is implemented using the charge-pump-based delta-sigma ($\Delta\Sigma$) architecture [19]. In addition to



Fig. 11. FCW synchronization. (a) Schematic of synchronizer. (b) Timing diagram.

a phase frequency detector (PFD), loop filter, charge pump, and a voltage-controlled oscillator (VCO), it consists of a 4-to-15 multimodulus divider that is dithered by a $\Delta\Sigma$ modulator. The $\Delta\Sigma$ modulator truncates the 17-bit $FCW_{SYN}$ (which is equal to the sum of the FLL and integral control words, $D_F$ and $D_I$, respectively) and generates a sequence of integers ranging from 4 to 15, with a running average equal to the desired fractional division ratio. The quantization error introduced by the $\Delta\Sigma$ modulator is suppressed by the low-pass filtering action of the PLL feedback loop. While it is possible to reduce the impact of quantization error on output phase noise to negligible levels by reducing the PLL bandwidth, the contribution of VCO phase noise then increases, resulting in a conflicting noise-bandwidth tradeoff. Consequently, choosing a PLL bandwidth that adequately suppresses both the $\Delta\Sigma$ quantization error and the VCO phase noise becomes very challenging.
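The way a $\Delta\Sigma$ modulator turns a fractional control word into a sequence of integer divide values can be sketched with a first-order accumulator. This is an illustration under assumptions: the paper does not give the modulator order or the FCW bit allocation, so a first-order modulator and a hypothetical split of the 17-bit word into 4 integer bits and 13 fractional bits are used here.

```python
def delta_sigma_divide_values(fcw, frac_bits=13, n_cycles=None):
    """First-order delta-sigma: integer divide values whose average is fcw / 2**frac_bits.

    The 4.13 fixed-point split of the 17-bit word is an assumption for illustration.
    """
    n_int = fcw >> frac_bits                 # integer part of the division ratio
    frac = fcw & ((1 << frac_bits) - 1)      # fractional part, in LSBs
    modulus = 1 << frac_bits
    if n_cycles is None:
        n_cycles = modulus                   # one full accumulator period
    acc = 0
    values = []
    for _ in range(n_cycles):
        acc += frac
        if acc >= modulus:                   # accumulator overflow: divide by N + 1
            acc -= modulus
            values.append(n_int + 1)
        else:                                # otherwise divide by N
            values.append(n_int)
    return values


fcw = round(9.75 * (1 << 13))                # target division ratio of 9.75
values = delta_sigma_divide_values(fcw)
print(sorted(set(values)), sum(values) / len(values))
```

Over one accumulator period the divider toggles only between 9 and 10, yet the average equals the fractional ratio exactly. Higher-order modulators spread the output over more integer values and push the quantization error to higher frequencies, where the loop filter attenuates it.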

In this work, a two-stage architecture is employed to alleviate this tradeoff [20]. The first stage, implemented using a digital multiplying DLL (MDLL) [21], multiplies a 50 MHz crystal oscillator output and generates a 500 MHz output clock that acts as the reference clock to the second-stage $\Delta\Sigma$ fractional-N PLL. Because the oversampling ratio of the $\Delta\Sigma$ modulator is increased by a factor of 10, the PLL bandwidth can be increased to adequately suppress ring oscillator phase noise without increasing the contribution of $\Delta\Sigma$ truncation error to output jitter [20], [22]. An additional pole located at the drain of the current-source transistor is introduced to further suppress the $\Delta\Sigma$ truncation error. It is important to note that the crystal oscillator does not aid frequency acquisition, as its frequency has no relation to the input data rate.
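The frequency plan of the two-stage DCO can be worked out from the numbers in the text. The sketch below assumes half-rate operation (DCO frequency equal to half the data rate, as stated for the FLL target) and the 50 MHz to 500 MHz MDLL multiplication; the function names are hypothetical.

```python
F_CRYSTAL_HZ = 50e6   # crystal reference
MDLL_MULT = 10        # first-stage multiplication (50 MHz -> 500 MHz)


def dco_frequency_hz(data_rate_bps):
    """Half-rate CDR: the DCO runs at half the data rate."""
    return data_rate_bps / 2.0


def fnpll_division_ratio(data_rate_bps):
    """Feedback division ratio of the second-stage fractional-N PLL."""
    f_ref = F_CRYSTAL_HZ * MDLL_MULT
    return dco_frequency_hz(data_rate_bps) / f_ref


for rate_gbps in (4.0, 6.0, 10.0, 10.5):
    ratio = fnpll_division_ratio(rate_gbps * 1e9)
    print(f"{rate_gbps} Gb/s -> DCO {rate_gbps / 2:.2f} GHz, division ratio {ratio:.2f}")
```

Across 4 to 10.5 Gb/s the required ratio runs from 4 to 10.5, which falls inside the 4-to-15 range of the multimodulus divider.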

The digital MDLL is adopted for reference multiplication due to its superior phase noise performance compared to a conventional PLL [21], [23]. As shown in Fig. 12, each rising edge of the input reference clock ($F_{\text{REF}}$) replaces every 10th rising edge of the VCO output, which resets phase noise accumulation and thus yields good phase noise performance. The frequency of the VCO is tuned by an integral path consisting of a BBPD that detects the phase difference between the oscillator output and the input reference clock, an accumulator (ACC), and a $\Delta\Sigma$ digital-to-analog converter (DAC) clocked at 125 MHz that drives the oscillator. A fourth-order low-pass filter is used to suppress the truncation error of the digital $\Delta\Sigma$ modulator.

In the fractional-N PLL, a single four-stage pseudodifferential ring oscillator is chosen to support a data rate range from 4 to 10.5 Gb/s. Since more than a $2 \times$ range is achieved, lower data rates can be supported by using dividers [3]. The control voltage $V_{\rm C}$ needs to swing by more than 300 mV to support such a wide frequency tuning range. In order to improve the linearity of the charge pump across a large control voltage range, a feedback loop is used to adaptively adjust the bias for the up current source. This adaptive biasing control reduces the reference spur by about 3 dB, and is also effective in suppressing in-band fractional spurs. With a PLL bandwidth of about 5 MHz, a minimum of 7 dB in-band fractional spur suppression is observed, as shown in Fig. 13. The intuition behind this improvement is that the adaptation loop is fast enough to track the control voltage variation caused by an in-band fractional spur and thereby suppress the spur level, whereas for high-frequency perturbations the adaptation loop cannot respond fast enough, so the spur levels remain the same. Further, transistors $M_1$ and $M_2$ are included to minimize the current mismatch due to charge sharing [24]. To account for the drop across $M_3$, transistors $M_4$ and $M_5$ are introduced, which also improve the current-mirroring accuracy [25]. The loop filter shares the same supply with the oscillator to reduce the supply noise sensitivity. The overall power consumption of the DCO is about 7.5 mW, of which the MDLL and PLL consume 2.5 and 5 mW, respectively.

## V. EXPERIMENTAL RESULTS

The prototype CDR was fabricated in a 65 nm CMOS process and occupies an active area of 1.63 mm<sup>2</sup>. The chip micrograph is shown in Fig. 14. The die was packaged in an 88-pin QFN (QFN88) package. The area and power breakdown of the prototype CDR are shown in Fig. 15. The DCO, including the MDLL and fractional-N PLL, takes about one half of the area and one third of the power at a 10 Gb/s input data rate. Compared to using multiple LC tanks, the proposed DCO is more efficient in both area and power [3], [10]. Because the area of the DCO is dominated by the loop filter capacitors in the MDLL and fractional-N PLL, recently reported digital implementations could further reduce the DCO area. In this section, we report the performance of the standalone DCO followed by the complete CDR results.

## A. DCO Results

Fig. 12. Block diagram of digital MDLL.

Fig. 13. Charge pump with adaptive biasing. (a) Circuit schematic. (b) Measured in-band fractional spur performance.

Fig. 14. Die micrograph.

Fig. 15. Power and area breakdown of the prototype CDR.

Fig. 16. Measured MDLL output power spectrum.

The fixed 50 MHz reference clock to the DCO was provided by an off-chip crystal with RMS jitter of 813 fs integrated from 1 kHz to 20 MHz. A power spectrum analyzer (PSA E4440A) and a signal source analyzer (SSA E5052B) were used to measure spectrum and phase noise performance, respectively. The measured operating range of the DCO is 2 to 7 GHz. We present measurement results obtained at an output frequency of 5 GHz, which corresponds to 10 Gb/s CDR operation. Fig. 16 illustrates the power spectrum of the MDLL at an output frequency of 500 MHz. The reference spur is about $-57$ dB, which translates to a deterministic jitter of 0.28 ps [26]. The measured MDLL and DCO output phase noise plots are shown in Fig. 17. The phase noise of the MDLL at 1 MHz frequency offset from 500 MHz carrier frequency is $-126$ dBc/Hz and the integrated jitter from 1 kHz to 40 MHz is 1.06 $\rm ps_{rms}$. The phase noise of the overall DCO (measured at the output of FNPLL) at 1 MHz frequency offset is $-104$ dBc/Hz and the integrated jitter from 1 kHz to 40 MHz is 1.41 $\rm ps_{rms}$. With a fractional division ratio of 99.998 (output frequency at 4.9999 GHz), the worst-case integrated jitter of the DCO is 2.30 $\rm ps_{rms}$. The 20 dB increase in phase noise from the MDLL output to DCO output is due to frequency multiplication by about 10 in the FNPLL.
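The 20 dB figure follows from the standard rule that an ideal frequency multiplication by $N$ raises phase noise by $20\log_{10}N$. The worked check below uses only the numbers quoted above; the symbols $\Delta\mathcal{L}$, $\mathcal{L}_{\rm MDLL}$, $\mathcal{L}_{\rm DCO}$, and $f_{\rm out}$ are introduced here for illustration, and the ideal-multiplier assumption ignores any noise the FNPLL itself contributes.

$$
\Delta\mathcal{L}_{\rm ideal} = 20\log_{10}(10) = 20\ \text{dB}, \qquad
\mathcal{L}_{\rm DCO} - \mathcal{L}_{\rm MDLL} = -104 - (-126) = 22\ \text{dB}
$$

$$
f_{\rm out} = 99.998 \times 50\ \text{MHz} = 4999.9\ \text{MHz} = 4.9999\ \text{GHz}
$$

The measured difference at 1 MHz offset is therefore within about 2 dB of the ideal scaling.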

## B. FLL Results

Fig. 17. Measured phase noise performance of FNPLL (DCO).

Fig. 18. Measured frequency acquisition behavior from initial reset condition to 6 Gb/s data rate.

The transient behavior of the frequency acquisition process is captured with the SSA E5052B and the result is shown in Fig. 18. Note that the DCO resets to its lowest frequency at the beginning of the acquisition and the FLL monotonically increases the DCO frequency until it locks to the desired data rate of 6 Gb/s. The update step size of the DCO frequency in this design is fixed at about 50 ppm, which results in a frequency acquisition time of about 230 $\mu$s. Faster acquisition can be achieved by controlling the update step size adaptively according to the residual frequency error, which is readily available in the form of a digital code.
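To give a feel for why a fixed step makes acquisition slow, the sketch below counts the updates needed to sweep from the reset frequency to the lock frequency. It is a rough illustration under stated assumptions, not the implemented FLL: it assumes the sweep starts at 2 GHz (the lower end of the measured DCO range), that each update scales the current frequency by the step size, and that the half-rate DCO settles at 3 GHz for 6 Gb/s data. The function names and the adaptive rule (`gain` times the remaining error, floored at the fixed step) are hypothetical.

```python
import math


def fixed_step_updates(f_start_hz, f_target_hz, step_ppm=50.0):
    """Count multiplicative updates of step_ppm needed to reach f_target_hz."""
    step = step_ppm * 1e-6
    return math.ceil(math.log(f_target_hz / f_start_hz) / math.log1p(step))


def adaptive_step_updates(f_start_hz, f_target_hz, min_step_ppm=50.0, gain=0.5):
    """Hypothetical adaptive rule: step = gain * remaining fractional error,
    never smaller than min_step_ppm."""
    f = f_start_hz
    updates = 0
    while f < f_target_hz:
        error = (f_target_hz - f) / f              # remaining fractional error
        step = max(gain * error, min_step_ppm * 1e-6)
        f *= 1.0 + step
        updates += 1
    return updates


if __name__ == "__main__":
    # Assumed sweep: 2 GHz reset frequency to 3 GHz (half of 6 Gb/s).
    print("fixed 50 ppm step:", fixed_step_updates(2e9, 3e9))
    print("adaptive step:    ", adaptive_step_updates(2e9, 3e9))
```

Under these assumptions the fixed-step sweep needs on the order of 8000 updates, while the adaptive rule needs only a few tens, which is the motivation for scaling the step with the residual error.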

Fig. 19. Measured frequency acquisition behavior when the data rate is switched from 6 to 9.5 Gb/s.

Fig. 20. Measured residual frequency error versus locking threshold $N_{\rm TH}$ at different transition densities.

The lock detector declares frequency locking when the number of consecutive E/L signals reaches the locking threshold $N_{\rm TH}$. Thereafter, the D/PLL takes over the control and achieves phase locking. The seamless data rate switching capability of the CDR is verified by changing the input data rate from 6 to 9.5 Gb/s and measuring the acquisition behavior (see Fig. 19). When the data rate is switched, the loss-of-lock detector (LOLD) detects the frequency difference and triggers a new frequency acquisition process by resetting the DCO to its lowest frequency and activating the FLL. As illustrated in Fig. 19, the FLL relocks to the new data rate (9.5 Gb/s), thus validating the proposed continuous-rate CDR's ability to detect data rate switching automatically. Note that the transient time while locking to a new data rate is dominated by the loss-of-lock detection time. This long time is due to the LOLD choice in this particular design, which adopts a 27-bit counter for better detection accuracy of frequency error before initiating a reacquisition. Fig. 6 suggests a possible method to reduce the LOL detection time. Note that a frequency error of about 1000 ppm leads to a peak $ACC_{E/L}$ value of about 250. Therefore, reacquisition can be initiated when this condition is detected, thereby drastically reducing the LOLD time to the order of a few microseconds. Under this condition, the transient time for locking to a new data rate will be dominated by the reacquisition time, which is about 600 µs in this design.

Fig. 21. Measured residual frequency error versus locking threshold $N_{\rm TH}$ at different input jitter amplitudes with PRBS7 input data.

Fig. 22. Measured JTRAN with different input jitter amplitudes.

The sensitivity of the proposed frequency acquisition scheme to variations in input transition density is quantified by plotting the residual frequency error $\Delta F$ versus locking threshold $N_{\rm TH}$ for different transition densities ranging from $\rho = 1$ to $\rho = 0.32$ (see Fig. 20). $\Delta F$ is the difference between the DCO frequency after the FLL has locked and the desired DCO frequency (equal to half the data rate). As expected, based on the analysis in Section II, $\Delta F$ is maximum when $\rho = 1$ and monotonically decreases for smaller values of $\rho$. Furthermore, for $N_{\rm TH}$ greater than 500, $\Delta F$ is less than 1000 ppm, independent of the transition density. Because the pull-in range of the D/PLL is more than 1000 ppm, the proposed CDR's frequency acquisition behavior is not affected by the transition density, unlike the scheme in [8]. While it may appear that $\Delta F$ can be reduced to arbitrarily small values simply by setting $N_{\rm TH}$ to be very large, in practice the FLL may not achieve locking for too large an $N_{\rm TH}$, since there may not be $N_{\rm TH}$ consecutive E/L signals within the frequency update period. To avoid this, $N_{\rm TH}$ should be set only as large as needed for the resulting $\Delta F$ to be well within the pull-in range of the CDR. Fig. 21 shows the residual frequency error $\Delta F$ versus locking threshold $N_{\rm TH}$ at different input jitter amplitudes with PRBS7 input data. With $N_{\rm TH} = 500$, the residual frequency error is less than 500 ppm for input jitter less than 0.3 UI. Note that, with 0.3 UI of input jitter, the frequency acquisition process is less robust when $N_{\rm TH}$ is 700, because the region that yields consecutive E/L signals is greatly reduced.
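The trend with transition density can be reproduced with a simple first-order model. This is an illustrative sketch, not the Section II analysis: it assumes that, with a fractional frequency error $|\Delta f|$, the BBPD decisions keep the same sign for half a beat period (about $1/(2|\Delta f|)$ UI) and that only a fraction $\rho$ of those UIs carry a data transition. The function names are hypothetical, and jitter and the finite frequency update period are ignored.

```python
def consecutive_el_count(freq_error_ppm, rho=1.0):
    """Approximate run of same-sign E/L decisions in half a beat period."""
    return rho / (2.0 * abs(freq_error_ppm) * 1e-6)


def residual_error_ppm(n_th, rho=1.0):
    """Largest frequency error (ppm) at which the run length still reaches n_th."""
    return rho / (2.0 * n_th) * 1e6


if __name__ == "__main__":
    for rho in (1.0, 0.5, 0.32):
        print(f"rho = {rho:4.2f}: residual error at N_TH = 500 is about "
              f"{residual_error_ppm(500, rho):.0f} ppm")
```

In this model the residual error at a given $N_{\rm TH}$ is largest for $\rho = 1$ and shrinks as $\rho$ decreases, and $N_{\rm TH} = 500$ with $\rho = 1$ corresponds to 1000 ppm, in line with the measured behavior described above. With $\rho = 0.5$, the same expression gives a run length of about 250 at 1000 ppm, which is of the same order as the peak $ACC_{E/L}$ value quoted in the LOLD discussion.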

## C. CDR Results

Fig. 23. Measured JTOL with PRBS7 input data at 10 and 4 Gb/s.

Fig. 24. Measured RCK jitter with PRBS31 input data at: (a) 4 Gb/s and (b) 10 Gb/s.

The bit error rate (BER) performance of the CDR was characterized with different PRBS sequences using an Agilent BERT N4901B. The input phase modulation needed to measure JTRAN and JTOL was provided by an Agilent E4433B RF signal generator, and the RCK jitter was measured using a DSA8200 sampling oscilloscope. The CDR achieves error-free operation $(BER < 10^{-12})$ across data rates ranging from 4 to 10.5 Gb/s. The channel used for characterizing the CDR contains a 1 m coaxial SMA cable, a 2 inch on-board FR4 PCB trace, and the parasitics associated with the QFN88 package. The overall loss is about 5–6 dB at 5 GHz. The magnitude response of the measured JTRAN function $\Phi_{\rm REF}(s)/\Phi_{\rm DIN}(s)$ is shown in Fig. 22. Because JGEN due to oscillator phase noise is greatly suppressed by the wide-bandwidth fractional-N PLL, a very low JTRAN bandwidth was chosen to suppress input jitter. The measured JTRAN bandwidth is about 0.2 MHz. JTRAN was also measured with different input jitter amplitudes ranging from 0.01 UI to more than 0.2 UI (more than $20\times$ variation) and the results are shown in Fig. 22. As expected, the JTRAN bandwidth is almost independent of input jitter even while using a BBPD [10], [27], [28]. No JTRAN peaking was observed at any input jitter amplitude.

The measured JTOL at 10 and 4 Gb/s with PRBS7 input data is shown in Fig. 23. The JTOL corner frequency is about 9 MHz at 10 Gb/s (4 MHz at 4 Gb/s), which is much larger than the JTRAN bandwidth of 0.2 MHz. Thus, the proposed digital D/PLL preserves the benefit of decoupled JTRAN bandwidth and JTOL corner frequency present in its analog counterpart [10]. JTOL is limited by the DCDL range in the 1.1–2.5 MHz frequency band at 10 Gb/s (0.8–2.0 MHz at 4 Gb/s) [10], while the low-frequency JTOL is restricted to 2 UI<sub>pp</sub> at 10 Gb/s (1.2 UI<sub>pp</sub> at 4 Gb/s) due to instrument limitation. The measured long-term absolute jitter of the RCK when the CDR is operating with PRBS31 input data is 2.9 $\rm ps_{rms}$/25.1 $\rm ps_{pp}$ at 4 Gb/s and 2.2 $\rm ps_{rms}$/24.0 $\rm ps_{pp}$ at 10 Gb/s (see Fig. 24).

TABLE I CDR Performance Summary and Comparison With the State-of-the-Art Designs

|                             | [3]        | [10]       | [29]      | [7]       | [30]      | [8]       | This work |
|-----------------------------|------------|------------|-----------|-----------|-----------|-----------|-----------|
| Technology                  | 0.35 µm    | 0.13 µm    | 0.18 µm   | 65 nm     | 65 nm     | 0.13 µm   | 65 nm     |
| Supply (V)                  | 3.3        | 3.3/1.8    | 1.8       | 1.0       | 1.2/0.8   | 1.2       | 1.2/1.0   |
| FD type                     | RFD        | Counter    | Linear PD | DQFD      | SRCG      | DLL       | BBPD      |
| Data rate (Gb/s)            | 0.0125–2.7 | 9.95–11.3  | 8.2–10.3  | 8.5–11.5  | 0.5–2.5   | 0.65–8    | 4–10.5    |
| Acq. time (µs)              | < 800      | N/A        | < 200     | < 400     | N/A       | N/A       | < 600     |
| Architecture                | Full-rate  | Half-rate* | Full-rate | Full-rate | Half-rate | Full-rate | Half-rate |
| JTRAN (MHz)                 | 0.5        | 1.2        | 4**       | N/A       | N/A       | N/A       | 0.2       |
| Oscillator                  | LC         | LC         | LC        | LC        | Ring      | Ring      | Ring      |
| Jitter ($\rm ps_{rms}/ps_{pp}$) | 0.4/8.0 | 0.5/4.5    | 0.4/12.3  | 0.21/N/A  | 5.4/44.0  | 9.7/53.3  | 2.2/24.0  |
| Power (mW @ Gb/s)           | 775 @ 2.5  | 400 @ 11.4 | 174 @ 10.3 | 60 @ 11.5 | 6.1 @ 2  | 88.6 @ 8  | 22.5 @ 10 |
| FoM (mW/Gb/s)               | 310        | 35.1       | 16.8      | 5.22      | 3.05      | 11.1      | 2.25      |
| Area (mm<sup>2</sup>)       | 9.0        | 8.0        | 0.54      | 0.35      | 0.39      | 0.11      | 1.63      |

\*Requires a reference clock for acquisition.

\*\*Inferred from JTOL result.

The performance summary of the proposed CDR and its comparison to state-of-the-art designs are shown in Table I. Only the proposed scheme and [29] can perform frequency acquisition without using an explicit frequency detector. However, [29] is not suited for digital implementation and is not amenable to subrate CDR architectures. Further, the linear PD used in [29] is not the preferred choice at high data rates. The proposed CDR achieves the best power efficiency and lowest jitter among CDRs implemented with ring oscillators [8], [30]. Compared to the LC oscillator-based CDRs in [3], [10], and [29], the power efficiency is superior, but the jitter is higher.
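The power-efficiency comparison rests on the figure of merit in Table I, which is power divided by data rate. As a small cross-check, the sketch below recomputes the FoM row from the "Power (mW @ Gb/s)" row; the entries are copied from the table in column order, and the variable and function names are illustrative.

```python
# "Power (mW @ Gb/s)" and "FoM (mW/Gb/s)" rows of Table I, in column order.
# The last entry in each list is this work.
POWER_ROW = ["775@2.5", "400@11.4", "174@10.3", "60@11.5", "6.1@2", "88.6@8", "22.5@10"]
FOM_ROW = [310, 35.1, 16.8, 5.22, 3.05, 11.1, 2.25]


def fom_mw_per_gbps(entry):
    """Parse a 'power@rate' table entry and return power / data rate."""
    power_mw, rate_gbps = (float(x) for x in entry.split("@"))
    return power_mw / rate_gbps


computed = [fom_mw_per_gbps(entry) for entry in POWER_ROW]

if __name__ == "__main__":
    for entry, got, listed in zip(POWER_ROW, computed, FOM_ROW):
        print(f"{entry:>10}  computed {got:7.2f}  listed {listed}")
```

The recomputed values agree with the listed FoM row to within rounding, and the last column (this work) has the lowest mW/Gb/s of the designs compared.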

## VI. SUMMARY

A continuous-rate CDR with automatic frequency acquisition and a ring-oscillator-based wide-range low-noise DCO is presented. Frequency detection is performed by using only the early/late outputs provided by a conventional BBPD. It is based on the simple observation that frequency error is inversely proportional to the number of consecutive early/late signals. Hence, frequency acquisition is achieved by adjusting the DCO frequency until the number of consecutive early/late signals reaches the desired threshold. In contrast to the divider-based SRCG scheme [8], the proposed method can lock the CDR to within 1000 ppm of the data rate independent of input data transition density.

A digital D/PLL CDR architecture is proposed to reduce the area penalty of the large loop filter capacitors present in the analog counterpart. The digital implementation preserves the benefits of the analog D/PLL CDR such as decoupled JTRAN bandwidth and JTOL corner frequency. Furthermore, JTRAN peaking and JTRAN bandwidth dependence on BBPD gain are also eliminated. A ring-oscillator-based fractional-N PLL is used as a DCO to achieve both wide range and low noise. This DCO also helps to alleviate the conflict between JGEN and JTRAN bandwidth in conventional CDRs. Fabricated in 65 nm CMOS technology, the prototype CDR operates without any errors from 4 to 10.5 Gb/s. At 10 Gb/s, the CDR consumes 22.5 mW and achieves a JTRAN bandwidth of 0.2 MHz and a JTOL corner frequency of 9 MHz. The proposed DCO has an operating range of 2 to 7 GHz and provides an RCK with 2.2 $\rm ps_{rms}$ jitter with a 10 Gb/s PRBS31 input data sequence.

#### REFERENCES

- [1] L. M. DeVito, "A versatile clock recovery architecture and monolithic implementation," in *Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design*, B. Razavi, Ed. Piscataway, NJ, USA: IEEE Press, 1996, pp. 405–442.
- [2] W. Yin, R. Inti, A. Elshazly, M. Talegaonkar, B. Young, and P. Hanumolu, "A TDC-less 7-mW 2.5-Gb/s digital CDR with linear loop dynamics and offset-free data recovery," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 3163–3173, Dec. 2011.
- [3] D. Dalton et al., "A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2713–2725, Dec. 2005.
- [4] D. Richman, "Color-carrier reference phase synchronization accuracy in NTSC color television," *Proc. IRE*, vol. 42, no. 1, pp. 106–133, Jan. 1954.
- [5] F. M. Gardner, "Properties of frequency difference detectors," *IEEE Trans. Commun.*, vol. COM-33, no. 2, pp. 131–138, Feb. 1985.
- [6] A. Pottbacker, U. Langmann, and H. Schreiber, "A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1747–1751, Dec. 1992.
- [7] N. Kocaman *et al.*, "An 8.5–11.5-Gb/s SONET transceiver with referenceless frequency acquisition," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1875–1884, Aug. 2013.
- [8] R. Inti, W. Yin, A. Elshazly, N. Sasidhar, and P. Hanumolu, "A 0.5-to-2.5-Gb/s reference-less half-rate digital CDR with unlimited frequency acquisition range and improved input duty-cycle error tolerance," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 3150–3162, Dec. 2011.
- [9] J. Han, J. Yang, and H.-M. Bae, "Analysis of a frequency acquisition technique with a stochastic reference clock generator," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 6, pp. 336–340, Jun. 2012.
- [10] J. Kenney *et al.*, "A 9.95–11.3-Gb/s XFP transceiver in 0.13-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2901–2910, Dec. 2006.
- [11] T. Lee and J. Bulzacchelli, "A 155-MHz clock recovery delay- and phase-locked loop," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1736–1746, Dec. 1992.
- [12] G. Shu *et al.*, "A reference-less clock and data recovery circuit using phase-rotating phase-locked loop," *IEEE J. Solid-State Circuits*, vol. 49, no. 4, pp. 1036–1047, Apr. 2014.
- [13] J. Kenney *et al.*, "A 6.5-Mb/s to 11.3-Gb/s continuous-rate clock and data recovery," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2014, pp. 1–4.
- [14] H. Won et al., "A 0.87 W transceiver IC for 100 Gigabit ethernet in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 399–413, Feb. 2015.

- [15] G. Shu, W.-S. Choi, S. Saxena, T. Anand, A. Elshazly, and P. Hanumolu, "A 4-to-10.5-Gb/s 2.2-mW/Gb/s continuous-rate digital CDR with automatic frequency acquisition in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 150–151.
- [16] M.-J. Park and J. Kim, "Pseudo-linear analysis of bang-bang controlled timing circuits," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 6, pp. 1381–1394, Jun. 2013.
- [17] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. Ming-Tak Leung, "Improved sense-amplifier-based flip-flop: Design and measurements," *IEEE J. Solid-State Circuits*, vol. 35, no. 6, pp. 876–884, Jun. 2000.
- [18] A. Elkholy, A. Elshazly, S. Saxena, G. Shu, and P. Hanumolu, "A 20-to-1000 MHz 14 ps peak-to-peak jitter reconfigurable multi-output all-digital clock generator using open-loop fractional dividers in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 272–273.
- [19] T. Riley, M. Copeland, and T. Kwasniewski, "Delta-sigma modulation in fractional-N frequency synthesis," *IEEE J. Solid-State Circuits*, vol. 28, no. 5, pp. 553–559, May 1993.
- [20] D. Park and S. Cho, "A 14.2-mW 2.55-to-3-GHz cascaded PLL with reference injection, 800-MHz delta-sigma modulator and 255 fs<sub>rms</sub> integrated jitter in 0.13 μm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 344–346.
- [21] A. Elshazly, R. Inti, B. Young, and P. Hanumolu, "Clock multiplication techniques using digital multiplying delay-locked loops," *IEEE J. Solid-State Circuits*, vol. 48, no. 6, pp. 1416–1428, Jun. 2013.
- [22] R. Nandwana *et al.*, "A calibration-free fractional-N ring PLL using hybrid phase/current-mode phase interpolation method," *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 882–895, Apr. 2015.
- [23] S. Ye, L. Jansson, and I. Galton, "A multiple-crystal interface PLL with VCO realignment to reduce phase noise," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1795–1803, Dec. 2002.
- [24] P. Larsson, "A 2–1600-MHz CMOS clock recovery PLL with low-Vdd capability," *IEEE J. Solid-State Circuits*, vol. 34, no. 12, pp. 1951–1960, Dec. 1999.
- [25] A. Arakali, S. Gondi, and P. Hanumolu, "Low-power supply-regulation techniques for ring oscillators in phase-locked loops using a split-tuned architecture," *IEEE J. Solid-State Circuits*, vol. 44, no. 8, pp. 2169–2181, Aug. 2009.
- [26] W. Yin, R. Inti, A. Elshazly, B. Young, and P. Hanumolu, "A 0.7-to-3.5-GHz 0.6-to-2.8-mW highly digital phase-locked loop with bandwidth tracking," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 1870–1880, Aug. 2011.
- [27] J. Sonntag and J. Stonick, "A digital clock and data recovery architecture for multi-gigabit/s binary links," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1867–1875, Aug. 2006.
- [28] G. Shu *et al.*, "A 5-Gb/s 2.6-mW/Gb/s reference-less half-rate PRPLL-based digital CDR," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2013, pp. 278–279.
- [29] S. Huang, J. Cao, and M. Green, "An 8.2-to-10.3-Gb/s full-rate linear reference-less CDR without frequency detector in 0.18 μm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 152–153.
- [30] S.-K. Lee, Y.-S. Kim, H. Ha, Y. Seo, H.-J. Park, and J.-Y. Sim, "A 650-Mb/s-to-8-Gb/s referenceless CDR circuit with automatic acquisition of data rate," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2009, pp. 184–185.



**Guanghua Shu** (S'10) received the M.S. degree in microelectronics from Fudan University, Shanghai, China, in 2011. He is pursuing the Ph.D. degree in electrical and computer engineering at the University of Illinois at Urbana-Champaign, Champaign, IL, USA.

In the summer of 2014, he was a Research Intern at Xilinx, San Jose, CA, USA, developing power- and area-efficient parallel link architectures. He worked on 56 Gb/s wireline receivers at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, in the Mixed-Signal Communication IC Design Group in the fall of 2014 and summer of 2015. His research interests include energy-efficient wireline communication systems, clocking circuits, power converters, data converters, and ultralow-power circuits for biomedical applications.

Mr. Shu serves as a Reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I&II (TCAS-I&II), the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS (T-VLSI), and the International Symposium on Circuits and Systems (ISCAS). He was the recipient of the Dissertation Completion Fellowship (2015–2016) from the University of Illinois and the SSCS Predoctoral Achievement Award (2014–2015).



**Woo-Seok Choi** received the B.S. and M.S. degrees in electrical engineering and computer science from Seoul National University, Seoul, South Korea, in 2008 and 2010, respectively. Currently, he is pursuing the Ph.D. degree in electrical and computer engineering at the University of Illinois at Urbana-Champaign, Champaign, IL, USA.

His research interests include designing power-efficient high-speed serial links, low-power analog-to-digital converters, and interface circuits for capacitive sensors.



**Saurabh Saxena** received the B.Tech. degree in electrical engineering and the M.Tech. degree in microelectronics and VLSI design from the Indian Institute of Technology Madras, Chennai, India, in 2009, as part of the dual-degree program. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of Illinois at Urbana-Champaign, Champaign, IL, USA.

His research interests include data converters, high-speed I/O interfaces, and clocking circuits.



**Mrunmay Talegaonkar** received the B.Tech. degree in electrical engineering and the M.Tech. degree in microelectronics and VLSI design from the Indian Institute of Technology Madras, Chennai, India, in 2007, as part of the dual-degree program. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of Illinois at Urbana-Champaign, Champaign, IL, USA.

Between 2007 and 2009, he worked as a Design Engineer with Analog Devices, Bangalore, India, where he was involved in the design of digital-to-analog converters. From 2009 to 2010, he was a Project Associate with the Indian Institute of Technology Madras, working on high-speed clock and data recovery circuits. His research interests include high-speed I/O interfaces and clocking circuits.



**Tejasvi Anand** (S'12) received the M.Tech. degree (with distinction) in electronics design and technology from the Indian Institute of Science, Bangalore, India, in 2008. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of Illinois at Urbana-Champaign, Champaign, IL, USA.

He worked at IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, with the RF Circuits and Systems Group in the summer of 2015. From 2008 to 2010, he worked as an Analog Design Engineer with Cosmic Circuits (now Cadence), Bangalore, India. His research interests include wireline communication, frequency synthesizers, and sensors with an emphasis on energy efficiency.

Mr. Anand serves as a Reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He was the recipient of the 2014–2015 IEEE Solid-State Circuits Society Predoctoral Achievement Award, the 2015 Broadcom Foundation University Research Competition Award (BFURC), the 2015 M. E. Van Valkenburg Graduate Research Award from the University of Illinois, the 2013 Analog Devices Outstanding Student Designer Award, and the 2009 CEDT Design (Gold) Medal from the Indian Institute of Science, Bangalore, India.



**Ahmed Elkholy** (S'08) received the B.Sc. (Hons.) and M.Sc. degrees in electrical engineering from Ain Shams University, Cairo, Egypt, in 2008 and 2012, respectively. He is pursuing the Ph.D. degree in electrical and computer engineering at the University of Illinois, Urbana-Champaign, Champaign, IL, USA.

Currently, he is a Research Assistant with the University of Illinois. From 2008 to 2012, he was an Analog/Mixed-Signal Design Engineer at Si-Ware Systems, Cairo, Egypt, designing high-performance clocking circuits and LC-based reference oscillators.

His research interests include frequency synthesizers, high-speed serial links, and low-power data converters.

Mr. Elkholy serves as a Reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I, and the IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS. He received the Edward N. Rickert Engineering Fellowship from Oregon State University (2012–2013), and the best M.Sc. degree award from Ain Shams University in 2012.



**Amr Elshazly** (S'04–M'13) received the B.Sc. (Hons.) and M.Sc. degrees from Ain Shams University, Cairo, Egypt, in 2003 and 2007, respectively, and the Ph.D. degree from Oregon State University, Corvallis, OR, USA, in 2012, all in electrical engineering.

He is currently a Design Engineer at Intel Corporation, Hillsboro, OR, USA, developing high-performance high-speed I/O circuits and architectures for next generation process technologies. From 2004 to 2006, he was a VLSI Circuit Design Engineer at AIAT, Inc., Leander, TX, USA, working on the design of several RF building blocks such as PLLs, FM receivers, and LNAs. From 2006 to 2007, he was with Mentor Graphics, Inc., Cairo, Egypt, designing multistandard clock and data recovery circuits for high-speed serial links. His research interests include high-speed serial links, frequency synthesizers, digital phase-locked loops, multiplying delay-locked loops, clock and data recovery circuits, data converter techniques, and low-power mixed-signal circuits.

Dr. Elshazly serves as a Reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I & II, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, the IEEE International Symposium on Circuits and Systems, IEEE International Conference of Electronic Circuits Systems, and IEEE Asian Solid State Circuits Conference. He was the recipient of the Analog Devices Outstanding Student Designer Award in 2011, the Center for Design of Analog-Digital Integrated Circuits (CDADIC) Best Poster Award in 2012, and the Graduate Research Assistant of the year Award in 2012 from the College of Engineering at the Oregon State University.



**Pavan Kumar Hanumolu** (S'99–M'07) received the Ph.D. degree from the School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA, in 2006, where he subsequently served as a faculty member till 2013.

He is currently an Associate Professor with the Department of Electrical and Computer Engineering and a Research Associate Professor with the Coordinated Science Laboratory, University of Illinois, Urbana-Champaign, Champaign, IL, USA. His research interests include energy-efficient integrated circuit implementation of analog and digital signal processing, sensor interfaces, wireline communication systems, and power conversion.

Dr. Hanumolu currently serves as an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS, and is a technical program committee member of the VLSI Circuits Symposium, and the IEEE International Solid-State Circuits Conference. He was the recipient of the National Science Foundation CAREER Award in 2010.