# A 5 Gb/s, 10 ns Power-On-Time, 36 $\mu$ W Off-State Power, Fast Power-On Transmitter for Energy Proportional Links

Tejasvi Anand, Student Member, IEEE, Amr Elshazly, Member, IEEE, Mrunmay Talegaonkar, Student Member, IEEE, Brian Young, Student Member, IEEE, and Pavan Kumar Hanumolu, Member, IEEE

*Abstract*—A fast power-on transmitter architecture that enables energy proportional communication for server and mobile platforms is presented. The proposed architecture and circuit techniques achieve fast power-on capability in voltage mode output driver by using fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. The proposed transmitter demonstrates energy proportional operation over wide variations of link utilization, and is therefore suitable for energy efficient links.

Fabricated in 90 nm CMOS technology, the voltage mode driver and the clock multiplier achieve power-on-time of only 2 ns and 10 ns, respectively. By using highly scalable digital architecture with accurate frequency pre-setting and instantaneous phase acquisition, the prototype MDLL-based clock multiplier achieves 10 ns (3 reference cycles) power-on-time, 2 ps<sub>rms</sub> long-term absolute jitter at 2.5 GHz output frequency. The proposed fast power-on transmitter architecture consumes 4.8 mW/36  $\mu$ W on/off-state power from 1.1 V supply, has 10 ns total power-on time, and achieves 100× effective data rate scaling (5 Gb/s-0.048 Gb/s), while scaling the power and energy efficiency by only 50× (4.8 mW–0.095 mW) and 2× (1–2 pJ/Bit), respectively. The proposed transmitter occupies an active die area of 0.3 mm<sup>2</sup>.

*Index Terms*—Burst mode, digital regulator, energy efficient, energy proportional, fast power-on, I/O, multiplying delay locked loop (MDLL), serial link, transmitter.

## I. INTRODUCTION

Ever-increasing demand for higher bandwidth in mobile devices and high performance servers has been the main driving force behind energy efficient links. Innovation in circuit and signaling techniques together with voltage and process scaling enabled almost 15% energy efficiency improvement each year over the past decade. However, despite this achievement, with the ongoing push to achieve data throughputs in the 1 TB/s range, the serial link power is becoming unacceptably large. Further increase in the data rates is limited by the thermal constraints of the package [1]. Therefore, in order to push the bandwidth without hitting the power wall, a paradigm shift in the design of serial links is needed.

Manuscript received November 27, 2013; revised June 15, 2014; accepted July 21, 2014. Date of publication August 28, 2014; date of current version September 22, 2014. This paper was approved by Associate Editor Jack Kenney. This work was supported in part by the Intel Labs University Research Office, SRC under task ID:1836.125, 1836.129, and the National Science Foundation under CAREER EECS-0954969.

T. Anand, M. Talegaonkar, and P. K. Hanumolu are with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: tanand3@illinois.edu).

A. Elshazly is with Intel Corporation, Hillsboro, OR 97124 USA.

B. Young is with Marvell, Corvallis, OR 97333 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2014.2345764

A serial link in applications such as a memory controller or an Ethernet interface is used only when there is a request to access memory due to a miss in the last level cache or a request to download a web page. Thus, instantaneous bandwidth demand (utilization) of the link varies over time. As shown in Fig. 1(a), when a conventional link is idle, it consumes idle power, which is a significant portion of the total operating power. As a result, the energy efficiency of such a link, as quantified by the energy-per-bit metric, degrades when the link utilization is low. Techniques such as dynamic power management [2], [3] achieve significant power saving by scaling the supply voltage to meet the link bandwidth demand. However, any change in the link rate requires dropping the existing link and re-negotiating new data rates between transmitter and receiver, which is time consuming. Moreover, these techniques offer only limited link rate scaling.

In view of these drawbacks, power-cycling or burst-mode communication where a link is powered-on only when needed has recently emerged as an attractive means to scale the link power based on its utilization levels [4]-[8]. In these systems, the link is powered down when idle and powered back up instantaneously when data is ready to be transferred, resulting in the most energy efficient use of link bandwidth. In such a system, the power consumption scales linearly with link utilization, so that the energy consumption becomes proportional to the transferred data, as shown in Fig. 1(b). This mode of operation is commonly referred to as energy proportional operation [9]. In these energy proportional links (EPL), the energy-per-bit of the link stays constant across all utilization levels (see Fig. 1(b)) and is achieved without compromising their active link power. For example, a serial interface with a 5 ns fast power-on receiver [6] achieves 1.4 pJ/bit energy efficiency.



Fig. 1. Bandwidth availability, usage scenarios together with power consumption and energy-per-bit. (a) Conventional link. (b) Proposed energy proportional link (EPL).

In order to achieve such performance, these links need to be powered on/off in no time, consume near zero power in the off-state, and incur minimum energy overhead during on-to-off and off-to-on state transitions. In practice, it is very difficult to meet these requirements. Phase locking the clock multiplier on power-on and bringing voltage regulator to a steady state are some of the big challenges in achieving these goals.

In this paper, we present a fast power-on transmitter consisting of a voltage mode driver and a clock multiplier. Fabricated in 90 nm CMOS process, the prototype transmitter achieves  $100 \times$  effective data rate range (5 Gb/s–0.048 Gb/s) while scaling the power by  $50 \times (4.8 \text{ mW}-0.095 \text{ mW})$  and energy efficiency by only  $2 \times (1-2 \text{ pJ/Bit})$ . Such energy proportional operation is achieved by using a fast power-on voltage mode driver and multiplying delay locked loop (MDLL) based digital clock multiplier. In this work, wide effective data rate range is achieved by duty cycling the transmitter at a fixed data rate of 5 Gb/s and not by changing active data rate. By adopting a digital voltage regulator, the prototype voltage mode driver achieves 2 ns power-on time, less than 11  $\mu$ W off-state power, 32 pJ energy overhead for on/off transition, and 2.6 mW on-state power at 5 Gb/s output data rate. By employing a highly scalable digital architecture with accurate frequency pre-setting and instantaneous phase acquisition, the prototype  $8 \times / 16 \times$  clock multiplier achieves 10 ns (3 reference cycles) power-on time, 2  $ps_{rms}$  long-term absolute jitter, less than 25  $\mu$ W off-state power, 12 pJ energy overhead for on/off transition, and 2.2 mW on-state power at 2.5 GHz output frequency [10].

The rest of the paper is organized as follows. Section II discusses the effect of circuit non-idealities on the energy proportional behavior of links. Section III discusses the building blocks of a conventional transmitter and their limitations for energy proportional link application. Section IV introduces the proposed transmitter architecture. Circuit details of the clock multiplier are discussed in Section V. Section VI presents the measured results. Section VII concludes the paper.

## II. EFFECT OF NON-IDEALITIES ON ENERGY PROPORTIONAL LINKS

Practical energy proportional links have finite power-on time, non-zero off-state power, and finite power-cycling energy. Effect of these parameters on the link's energy efficiency (energy-per-bit) can be mathematically captured as:

$$\frac{Energy}{Bit} = \frac{P_{ON}T_{ON} + P_{ON}T_{POWER-ON} + P_{OFF}T_{OFF} + E_{ON-OFF}}{\#Bits}$$
(1)

where  $P_{ON}$  is the on-state power,  $P_{OFF}$  is the off-state power,  $T_{ON}$  is the on-state time,  $T_{OFF}$  is the off-state time,  $T_{POWER-ON}$  is the power-on time and  $E_{ON-OFF}$  is the energy consumed during on/off transition.
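As a rough illustration of Eq. (1), the sketch below evaluates the energy-per-bit for a single burst period. The default parameter values are the measured figures reported elsewhere in this paper (4.8 mW on-state power, 36 $\mu$W off-state power, 10 ns power-on time, 44 pJ power-cycling energy, 5 Gb/s line rate). The function name and the assumption that one burst period equals $T_{ON} + T_{POWER-ON} + T_{OFF}$ are illustrative choices, not part of the original design flow.

```python
def energy_per_bit(bits_per_burst, effective_rate_bps,
                   line_rate_bps=5e9, p_on=4.8e-3, p_off=36e-6,
                   t_power_on=10e-9, e_on_off=44e-12):
    """Evaluate Eq. (1) over one burst period; returns joules per bit.

    Assumption: one burst period is T_ON + T_POWER-ON + T_OFF, and the
    effective data rate is bits_per_burst divided by that period.
    """
    t_on = bits_per_burst / line_rate_bps           # time spent sending data
    period = bits_per_burst / effective_rate_bps    # one full burst period
    t_off = period - t_on - t_power_on              # remaining time is off-state
    if t_off < 0:
        raise ValueError("effective rate too high for power cycling")
    energy = (p_on * t_on + p_on * t_power_on
              + p_off * t_off + e_on_off)
    return energy / bits_per_burst


# 128-byte bursts at three effective data rates
for rate in (48e6, 500e6, 2.5e9):
    epb = energy_per_bit(128 * 8, rate)
    print(f"{rate / 1e9:.3f} Gb/s -> {epb * 1e12:.2f} pJ/bit")
```

With these inputs the model gives roughly 1.8 pJ/bit at 48 Mb/s for 128-byte bursts, in the same range as the measured 2 pJ/bit; a first-order model of this kind is not expected to match measurements exactly.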

Finite power-on time is due to the time required to charge/discharge bias nodes, and time needed for the clock multiplier to achieve frequency and phase lock. Assuming the link consumes approximately peak power during power-on transition, this wasted energy equates to  $P_{ON}T_{POWER-ON}$ . The effect of power-on time on link efficiency is shown in Fig. 2(a). For a constant burst length, power-on energy ( $P_{ON}T_{POWER-ON}$ ) increases the energy-per-bit by a fixed amount across all link utilization values.

Fig. 2(b) illustrates the effect of static off-state power on link energy-efficiency. At lower link utilization  $(T_{OFF} \gg T_{ON})$ , even with a small off-state power  $(P_{OFF})$ , the off-state energy  $(P_{OFF}T_{OFF})$  starts to dominate link energy-efficiency. Therefore, the off-state power must be close to zero to achieve constant energy-per-bit at extremely low data rates.

Power-cycling energy is the energy consumed in charging/discharging nodes in each power cycle or data burst event. Therefore, more frequent data bursts incur larger energy penalty. Fig. 2(c) shows the effect of burst length on energy-per-bit for a fixed effective data rate. When the data is transferred in smaller bursts, energy spent in powering-on the link $(P_{ON}T_{POWER-ON})$ and power cycling energy $(E_{ON-OFF})$ becomes comparable to the on-state energy $(P_{ON}T_{ON})$ and consequently leads to increased energy-per-bit.

Fig. 2. (a) Effect of power-on time on link energy efficiency. (b) Effect of static off-state power on link energy efficiency. (c) Effect of data burst length on link energy efficiency.

A plot of energy-per-bit versus utilization captures the most essential features of energy proportional links. With the help of this plot, energy consumption estimates can be made for a given link usage scenario of an application. Thus, it forms one of the important metrics to characterize and compare such links.

## III. LIMITATIONS OF CONVENTIONAL TRANSMITTER FOR USE IN ENERGY PROPORTIONAL LINKS

### A. Output Driver

Voltage-mode (VM) drivers dissipate a quarter of the power as compared to the current-mode logic (CML) output drivers [11]. However, voltage regulators required to set output swing and termination impedance cannot be powered-on instantaneously. Keeping these regulators always on severely impacts energy-per-bit at lower data rates [5]. Digital voltage regulators [12], [13] provide a means to power-on/off rapidly while consuming no static power in the off-state. The pass transistor in the digital regulator also helps in power-gating the logic, resulting in low leakage power in the off-state. For these reasons, digital regulators are employed in the proposed VM driver.

### B. Clock Multiplier

The long locking time of conventional clock multipliers implemented using phase-locked loops (PLLs) presents the biggest bottleneck in achieving energy proportional operation. Increasing the PLL bandwidth reduces the locking time. However, to ensure loop stability, loop bandwidth cannot exceed one tenth the reference frequency [14]. As a result, even if the VCO frequency is precisely set digitally, the sluggish phase acquisition limits the phase locking time to, at best, a few hundred nanoseconds [5], [15]. Techniques such as dynamic phase error compensation [16], edge-missing compensation [17], and hybrid PLLs [18] improve the phase acquisition time. By calibrating the phase of the feedback clock, a best power-on time of forty reference cycles has also been reported [19]. However, such improvements are inadequate for achieving energy proportional operation.



Fig. 3. Schematic diagram of the proposed fast power-on transmitter.

Multiplying injection locked oscillator (MILO) provides a means to reduce power-on time. By increasing MILO's bandwidth with stronger injection strength, the locking time can be reduced. However, a wide bandwidth results in large spurs at the injection frequency [20]. Filtering of MILO's output with a second injection locked oscillator (ILO) could reduce these spurs. However, it comes at the cost of extra power [21].

Multiplying delay locked loop (MDLL) provides a means to overcome the drawbacks of MILOs. In MDLL, every *N*th VCO edge is replaced by the clean reference edge by opening up the ring oscillator for a brief period using a narrow pulse [22]–[24]. This edge replacement results in instantaneous phase locking, which is independent of the bandwidth. Inserting a clean reference edge every reference cycle resets all accumulated jitter in the VCO and results in superior jitter performance [25], [26]. These features make MDLL a suitable candidate for fast power-on applications.
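The jitter-resetting effect of edge replacement can be illustrated with a simple behavioral sketch. The model below is an assumption made for illustration only: each oscillator period is given independent Gaussian jitter, the reference is treated as ideal, and the function name and the 0.1 ps per-cycle jitter value are hypothetical. It compares the accumulated timing error of a free-running ring oscillator with that of an oscillator whose every Nth edge is replaced by a clean reference edge.

```python
import math
import random


def rms_accumulated_jitter(cycles, n, sigma_ps, edge_replacement, seed=0):
    """RMS accumulated timing error (ps) over `cycles` oscillator periods.

    Assumptions: white Gaussian period jitter of `sigma_ps` per cycle and
    an ideal (jitter-free) reference edge replacing every n-th edge.
    """
    rng = random.Random(seed)
    error_ps = 0.0
    total_sq = 0.0
    for k in range(cycles):
        error_ps += rng.gauss(0.0, sigma_ps)      # jitter added this period
        if edge_replacement and (k + 1) % n == 0:
            error_ps = 0.0                        # clean reference edge inserted
        total_sq += error_ps ** 2
    return math.sqrt(total_sq / cycles)


free_running = rms_accumulated_jitter(8000, 8, 0.1, edge_replacement=False)
mdll = rms_accumulated_jitter(8000, 8, 0.1, edge_replacement=True)
print(f"free-running: {free_running:.2f} ps rms, with edge replacement: {mdll:.2f} ps rms")
```

In this model the free-running error grows as a random walk, while the edge-replaced error is bounded by what can accumulate within one reference cycle.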

## IV. PROPOSED FAST POWER-ON TRANSMITTER ARCHITECTURE

Fig. 3 shows the block diagram of the proposed transmitter. It consists of a fast power-on clock multiplier, a 2:1 latch-based multiplexer, and a voltage mode driver output stage. The clock multiplier, implemented using a digital multiplying delay-locked loop (MDLL), generates a 2.5 GHz output from a 312.5 MHz reference clock. The PRBS9 generator outputs data at 2.5 Gb/s, which is serialized and transmitted at 5 Gb/s with 250 mV differential peak-to-peak output swing.

The proposed voltage mode driver consists of a replica bias, pre-driver, and output driver. The replica bias circuit generates reference voltages  $V_{REPLICA}$  and  $V_{SWING}$  for the pre-driver and output driver regulators, respectively.  $V_{REPLICA}$  needed to create 50 ohm output impedance is generated by enclosing a replica of the output driver [27], which is 1/16th the original size, in a closed loop.  $V_{SWING}$  sets the differential output driver swing. Pre-driver and output driver regulators use the replica bias output to generate virtual supply voltages for the pre-driver  $(V_{PRE})$  and output driver  $(V_{DRV})$ , respectively. These regulators are implemented using digital feedback loops, which help in storing states during power-off event and restoring them during power-on.

Fig. 4 shows the schematic of the proposed voltage regulator. It consists of a clocked comparator, an accumulator, and a resistive DAC.  $V_{OUT}$  is compared with  $V_{REF}$  and the output is fed to the 12 bit accumulator. The accumulated output drives a 7 bit resistive DAC to minimize error between  $V_{REF}$  and  $V_{OUT}$ . The lower 5-LSBs of the accumulator output are ignored to reduce voltage ripple due to loop delay. A 20 pF decoupling capacitor at the output of transistor  $M_1$  helps in suppressing noise on the output voltage.
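The regulation loop described above can be sketched behaviorally as follows. This is a minimal model under stated assumptions, not the circuit itself: the resistive DAC and pass transistor are idealized so that $V_{OUT}$ equals the DAC code times one LSB, loop delay is ignored, and the function name, full-scale voltage, and reference value are hypothetical. It shows the comparator driving a 12 bit accumulator whose upper 7 bits set the DAC, and how a saved accumulator value restores the output immediately after a power cycle.

```python
def simulate_regulator(v_ref, steps, acc=0, v_full_scale=1.0,
                       acc_bits=12, dac_bits=7):
    """Bang-bang digital regulation loop.

    Returns (final accumulator value, list of V_OUT samples).
    Assumption: V_OUT = DAC code * LSB (ideal DAC and pass transistor).
    """
    shift = acc_bits - dac_bits                  # the 5 LSBs are ignored
    acc_max = (1 << acc_bits) - 1
    lsb = v_full_scale / ((1 << dac_bits) - 1)   # DAC step size in volts
    trace = []
    for _ in range(steps):
        v_out = (acc >> shift) * lsb             # DAC driven by upper 7 bits
        trace.append(v_out)
        acc += 1 if v_out < v_ref else -1        # clocked comparator decision
        acc = max(0, min(acc_max, acc))          # accumulator saturates
    return acc, trace


# Cold start: the loop ramps up until V_OUT is within one DAC LSB of V_REF
saved_acc, cold = simulate_regulator(v_ref=0.5, steps=3000)
print(f"settled V_OUT = {cold[-1]:.4f} V")

# Power-on with the stored state: the very first sample is already in place
_, warm = simulate_regulator(v_ref=0.5, steps=10, acc=saved_acc)
print(f"first V_OUT after restore = {warm[0]:.4f} V")
```

The cold start needs thousands of comparator clocks in this model, whereas the restored start is within one LSB on the first sample, which is the property the fast power-on scheme relies on.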

### A. Power-On Transient Response of the Transmitter

Fig. 5 shows the transient response of the transmitter. When the START signal is de-asserted, the pre-driver and output driver regulator output nodes, $V_{PRE}$ and $V_{DRV}$, are discharged to ground. The gates of transistors $M_1$ and $M_2$ are pulled low (see Fig. 3), the clock to all comparators is gated, and the bias current of the replica bias block is turned off. In this state, only the leakage current in digital logic contributes to off-state power consumption. During the off-state, regulator states are saved in their respective accumulators.

When the START signal is asserted, the DACs rapidly restore the accumulator's state and set the gate voltages of transistors $M_1$ and $M_2$, which quickly bring $V_{PRE}$ and $V_{DRV}$ to the desired value. The 20 pF decoupling capacitor charges up in 2 ns after which the driver is ready for transmitting data. The MDLL based clock multiplier, which takes around 10 ns (3 reference cycles) to start, limits the transmitter start-up time.

Fig. 4. Schematic diagram of the proposed digital voltage regulator.

Fig. 5. Transient response of the proposed fast power-on transmitter.

### B. Effect of Power Supply Droop on Output Driver

Burst mode operation requires fast load transient response linear regulators [28]–[30] to power the transmitter. When the output driver is powered-on, output of the regulator droops momentarily before regaining its original value. The amount of droop and transient response time are a function of loop dynamics of external regulator and current step. In order to capture this effect, output driver including the multiplexer is simulated with a fast load transient response regulator. Fig. 6 shows the current step, power-supply droop and the output eye diagram at various time instants. Droop in the power supply increases output jitter which will eventually reduce the sampling margin on the receiver.

### C. Fast Power-On Clock Multiplier

Fig. 7 shows the block diagram of the proposed MDLL based digital clock multiplier. It employs a split tuned architecture in which a frequency locked loop (FLL) drives the ring oscillator close to frequency lock, and an integral control brings the oscillator frequency to the desired output frequency. The proportional path ensures stability by periodically resetting the oscillator output phase with the input reference clock phase using edge replacement logic (ERL). The frequency locked loop consists of a frequency detector, a 12 bit accumulator clocked at $F_{REF}/256$, a fast settling DAC, and a constant current source $I_1$. The integral path uses a bang-bang phase detector, a 13 bit accumulator, and an 8 bit DAC. The phase detector, consisting of two D flip-flops clocked at the reference frequency, produces lead/lag phase information with a 1 bit output. The accumulator clocked at $F_{REF}/16$ integrates the sub-sampled bang-bang phase detector output.

The proportional path consists of a programmable divider and edge replacement logic (ERL). The edge replacement logic generates a narrow SEL signal pulse, which opens up the ring oscillator momentarily and passes the clean reference edge. In order to have a perfect edge replacement, care is taken to generate the SEL signal with sharp rise and fall times.

The digital accumulators store the frequency information of the oscillator in the digital form during the power-off state. They are synthesized with high  $V_{th}$  devices to reduce leakage. Four LSBs from the frequency locked loop accumulator and 5 LSBs from the fine integral path accumulator are ignored to avoid ripple on control voltage node,  $V_{CTRL}$ , due to loop delay. In order to reduce the power-on time penalty caused by slow settling transients, the DAC bias circuitry is not turned-off in the power-off state. The bias voltages are maintained at the expense of small power penalty during the off-state. When the MDLL is powered-on, the frequency information is rapidly restored to the oscillator using fast Nyquist-rate DACs, thus bringing the oscillator to frequency lock quickly. Once frequency lock is achieved, the rising edge of the reference replaces the Nth oscillator edge thus achieving instantaneous phase lock.

Periodic edge replacement results in current being drawn periodically from $V_{CTRL}$, thereby causing a supply ripple at reference frequency and its harmonics. Despite the pseudo differential nature of the VCO, the current drawn by the VCO is not constant, which causes ripple on $V_{CTRL}$. Deterministic jitter (DJ) resulting from $V_{CTRL}$ ripple can be reduced with a decoupling capacitor. However, a large decoupling capacitor increases the time constant on the $V_{CTRL}$ node, thereby increasing the time it takes for the frequency to settle to the right value, which eventually increases power-on time. In order to quantify this trade-off between DJ and power-on time, a bank of programmable decoupling capacitors ($C_D$ bank) is added on the $V_{CTRL}$ node and the measured results are presented in Section VI.

Fig. 6. Simulated power supply droop on the output driver.

Fig. 7. Schematic diagram of the proposed fast power-on MDLL based digital clock multiplier.

### D. Power-On Transient Response of the Clock Multiplier

Fig. 8 shows the power-on transient response of the proposed clock multiplier. When the multiplier is powered-off, the REF signal is gated, the SEL signal is asserted high, and the VCO stops oscillating. Once the VCO is open, it no longer sinks current. This causes the $V_{CTRL}$ node to charge up to $V_{DD}$, which eventually shuts down current source $I_1$. When the multiplier is powered-on, the SEL signal is de-asserted, and the VCO starts to oscillate. The $V_{CTRL}$ node then settles to the desired value with a finite time constant. During the time when the $V_{CTRL}$ node is settling, the VCO oscillates at a higher frequency, which causes the rising edge of the divider output COUNT signal to appear earlier than desired. On the rising edge of the COUNT signal, the SEL signal is asserted high, and the VCO opens up and waits for the rising edge of the REF signal to pass through. During this wait period, the VCO stops and the $V_{CTRL}$ node again rises towards $V_{DD}$. On the subsequent rising edge of the reference, the SEL signal is de-asserted, and the VCO begins to oscillate again.

Fig. 8. Transient response of the proposed fast power-on clock multiplier.

Fig. 9. Schematic diagram of the voltage-controlled oscillator.

In the second reference cycle,  $V_{CTRL}$  node again settles to the desired value but the initial high oscillation frequency again causes rising edge of the divider output COUNT signal to appear earlier than desired. However, this time it appears closer to the rising edge of reference signal, which causes smaller disturbance on the  $V_{CTRL}$  node. In the third reference cycle, disturbance on the  $V_{CTRL}$  node is even smaller and the multiplier is close to achieving frequency and phase lock. Thus, in the proposed MDLL architecture, the power-on time is mainly limited by the time constant on the  $V_{CTRL}$  node.

## V. CLOCK MULTIPLIER BUILDING BLOCKS

### A. Voltage Controlled Oscillator (VCO)

The schematic diagram of the VCO is shown in Fig. 9. It consists of five stages connected in a ring configuration. One of these five stages is an inverting multiplexer and the remaining four are cross coupled inverters. The $OUT_{CLK-ADV}$ signal is tapped from the middle of the VCO. This signal is used to meet timing requirements in the edge replacement logic. The REF signal to the multiplexer is driven by a VCO delay cell, shown as a shaded cell in Fig. 9. Passing the REF signal through the VCO buffer matches the rise time of the REF edge with the $OUT_{CLK}$ edge, resulting in a lower reference spur at the output. The choice of five stages in the oscillator was made carefully to achieve sharp rise and fall times, which helps in reducing deterministic jitter caused by imperfect reference edge replacement. Resistive cross coupling on the multiplexer was avoided to reduce noise coupling from the $OUT_{CLK}$ edge to the clean REF edge during the edge replacement operation. The choice of pseudo differential stages was made to achieve smaller self induced ripple on the $V_{CTRL}$ node, resulting in better jitter performance with a smaller decoupling capacitor. When simulated at 2.5 GHz, the VCO consumes approximately 550 $\mu$A from a 1.1 V supply.

Fig. 10. Simulated power supply induced jitter (PSIJ) of VCO (including DAC) and MDLL.

Power supply induced jitter (PSIJ) of the VCO (including DACs) and MDLL is a strong function of decoupling capacitor  $C_D$  on the  $V_{CTRL}$  node. Simulated  $PSIJ_{pk-pk}$  value for  $C_D = 0$  pF and 20 pF is shown in Fig. 10. At 100 MHz sinusoidal supply disturbance, simulated  $PSIJ_{pk-pk}$  of MDLL for  $C_D = 0$  pF and 20 pF is 2 ps/mV and 0.053 ps/mV respectively. At 1 MHz sinusoidal supply disturbance, simulated  $PSIJ_{pk-pk}$  of MDLL for  $C_D = 0$  pF and 20 pF is 2 ps/mV and 0.053 ps/mV.

### B. Edge Replacement Logic (ERL)

Edge replacement logic is responsible for generating a narrow pulse to pass the clean reference edge every Nth VCO cycle. The width of this pulse is typically $T_{VCO}/2$. Conventional select logic requires a synchronous thermometric counter running at VCO frequency to generate periodic pulses of one VCO period width [22]. However, running a synchronous counter at VCO frequency results in large power dissipation. Moreover, the initial VCO frequency is required to be higher than $N \cdot F_{REF}$, which prevents this circuit [22] from being used in fast power-on applications where the VCO frequency is very close to $N \cdot F_{REF}$ during power-on.

During normal operation, the SEL signal must be de-asserted after the rising edge of REF signal and before the falling edge of the VCO output, and within time  $T_{VCO}/4$  in the best case. The select logic in [26] avoids thermometric counter, but suffers from  $T_{REF\uparrow-SEL\downarrow}$  timing constraint, which is difficult to meet at higher VCO frequencies. In this work, the proposed ERL employs ripple counter to reduce power consumption, and uses advanced reference signal to overcome the  $T_{REF\uparrow-SEL\downarrow}$  timing constraint.

Fig. 11 shows the schematic and timing diagram of the proposed ERL circuit. When the COUNT signal is low, the output of the first stage, $STG1_{OUT}$, is pre-charged to logic high. After the completion of N VCO cycles, the COUNT signal is asserted high. The SEL signal is asserted high on the falling edge of $OUT_{CLK-ADV}$. The SEL signal opens up the oscillator and waits for the REF rising edge to pass. The $REF_{ADV}$ signal is generated by tapping the REF signal before the delayed REF goes into the VCO. On the rising edge of $REF_{ADV}$, the output of Stage1 is discharged and the SEL signal is de-asserted. The time by which the $REF_{ADV}$ signal must be advanced with respect to the REF signal is given by the following equation:

$$T_{REFADV\uparrow-REF\uparrow} = T_{REFADV\uparrow-SEL\downarrow} - T_{VCO}/4 \quad (2)$$

where  $T_{REFADV\uparrow-SEL\downarrow}$  is the time between  $REF_{ADV}$  rising edge to SEL falling edge.
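As a worked example of (2) under assumed numbers: at the 2.5 GHz output frequency, $T_{VCO} = 400$ ps, so $T_{VCO}/4 = 100$ ps. If the logic delay from the $REF_{ADV}$ rising edge to the SEL falling edge were 150 ps (a hypothetical value chosen only for illustration; the paper does not report it), the required advance would be:

$$T_{REFADV\uparrow-REF\uparrow} = 150\ \text{ps} - \frac{400\ \text{ps}}{4} = 50\ \text{ps}$$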

### C. DAC

Delta-sigma DACs followed by a post filter offer a compact way to achieve high-resolution frequency control of the VCOs. However, the large time constant of the post-filter increases the frequency settling time. The proposed Nyquist-rate DAC and its timing diagram are shown in Fig. 12. The DAC is implemented using a thermometer-coded current-mode architecture to ensure monotonicity and fast settling. Single ended source switched PMOS current elements are used to minimize area. By employing current mode Nyquist-rate DACs, the use of a low bandwidth post-filter is avoided, thereby achieving high bandwidth to rapidly set the VCO frequency during power-on/off events. When the system is powered-off, the clock to the accumulator is gated and the accumulator holds its state. De-asserting the START signal causes the output of the binary to thermometric logic to go down to zero, which shuts down the DAC. When the system is powered-on, the previous state of the DAC is restored and $I_{OUT}$ quickly reaches the desired value.

The choice of DAC's resolution and frequency tuning range is governed by tolerable frequency quantization error. Frequency quantization error results in accumulation of VCO phase for one reference cycle, which results in deterministic jitter and consequently reference spurs. For a given DAC resolution and frequency tuning range, deterministic jitter  $\phi_{DJ}$  can be estimated mathematically as follows:

$$\phi_{DJ} = \frac{F_{TUNE} \cdot N \cdot T_{VCO}}{2^{DAC-BIT} \cdot F_{VCO}} \tag{3}$$

where $F_{TUNE}$ is the frequency tuning range, N is the frequency multiplication factor, DAC-BIT is the size of the DAC, $F_{VCO}$ is the VCO frequency, and $T_{VCO}$ is the VCO period. Using (3), a plot of deterministic jitter as a function of frequency tuning range for various DAC sizes is shown in Fig. 13. Increasing the DAC resolution on the one hand reduces the frequency quantization error and on the other hand increases the area and parasitic capacitance on $V_{CTRL}$ (the virtual supply node of the VCO), which eventually increases the power-on time of the MDLL. Design of a fast power-on MDLL with a wide frequency tuning range VCO remains a challenging problem. In the proposed architecture, the 8 bit integral path DAC provides up to 125 MHz of tuning range.

Fig. 11. Schematic and timing diagram of the proposed edge replacement logic (ERL).

Fig. 12. Schematic and timing diagram of the Nyquist-rate DAC.

Fig. 13. Simulated deterministic jitter as a function of frequency tuning range for N=8 and $F_{VCO}=2.5$ GHz.
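For a quick numerical feel for Eq. (3), the following sketch evaluates the deterministic jitter estimate for the operating point quoted in the text (N = 8, $F_{VCO}$ = 2.5 GHz, 125 MHz tuning range) across several DAC sizes. The function name is illustrative; only the formula and the operating point come from the paper.

```python
def deterministic_jitter(f_tune_hz, n, f_vco_hz, dac_bits):
    """Deterministic jitter estimate of Eq. (3), in seconds."""
    t_vco = 1.0 / f_vco_hz
    return f_tune_hz * n * t_vco / (2 ** dac_bits * f_vco_hz)


# Sweep DAC size at N = 8, F_VCO = 2.5 GHz, 125 MHz tuning range
for bits in (6, 7, 8, 9, 10):
    dj = deterministic_jitter(125e6, 8, 2.5e9, bits)
    print(f"{bits:2d} bit DAC -> {dj * 1e12:.3f} ps")
```

For the 8 bit DAC with 125 MHz tuning range, the estimate evaluates to 0.625 ps, and each additional bit of resolution halves it.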

## VI. MEASUREMENT RESULTS

The die micrograph of the prototype transmitter, implemented in 90 nm CMOS process, is shown in Fig. 14. It occupies an active area of $0.3 \text{ mm}^2$ of which the voltage mode driver occupies 0.14 mm<sup>2</sup> (330 $\mu$m × 430 $\mu$m) and the MDLL occupies 0.16 mm<sup>2</sup> (450 $\mu$m × 350 $\mu$m). The chip is packaged in a 48 pin QFN plastic package.

Fig. 14. Die micrograph of the proposed transmitter.

Fig. 15. Measured power breakup of the proposed transmitter.

Fig. 16. Measured power-on/off transient of the proposed MDLL for multiplication factors of 8 and 16.

A wide range of measurements was conducted to quantify the trade-offs between performance and power-on time. For all the measurements, a multiplication ratio of 8, a reference frequency of 312.5 MHz, and a 0 pF decoupling capacitor $(C_D)$ for the MDLL are used unless otherwise stated. At 5 Gb/s, the transmitter consumes 4.8 mW (excluding PRBS generators) with the voltage mode driver consuming 2.6 mW from a 1 V supply and the MDLL consuming 2.2 mW from a 1.1 V supply. Fig. 15 shows the on-state and off-state power break-up of the transmitter. In the power-off state the transmitter consumes 36 $\mu$W of which 11 $\mu$W is consumed by the voltage mode driver and 25 $\mu$W by the MDLL. Off-state power in the voltage mode driver is largely due to leakage in digital circuits such as accumulators and multiplexers. In the MDLL, out of the measured 2.2 mW on-state power, 1.38 mW is consumed in the digital logic and the remaining 0.82 mW is consumed in the DACs and the oscillator. In the off-state, out of the measured 25 $\mu$W, 17 $\mu$W is consumed in bias circuits and leakage in the DAC's decoder logic, and 8 $\mu$W is due to leakage in the rest of the logic circuits.

Fig. 16 shows the captured transient of the 2.5 GHz clock waveform, while power cycling the MDLL for a multiplication factor of 8 (312.5 MHz reference) and 16 (156.25 MHz reference). Fig. 17 plots the measured peak period jitter versus power-on time. In both cases, the MDLL locks in approximately 3 reference cycles. Mathematically, it can be shown that a small time constant on the $V_{CTRL}$ node makes the settling time, in terms of reference cycles, independent of the multiplication factor N.
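The lock time in absolute terms follows directly from the reference period. The short sketch below converts the "approximately 3 reference cycles" observation into nanoseconds for both multiplication factors; the function name is illustrative, and the assumption that exactly three cycles are needed is a simplification of the measured behavior.

```python
def lock_time_ns(f_out_hz, n, ref_cycles=3):
    """Lock time in ns, assuming lock after `ref_cycles` reference periods."""
    f_ref_hz = f_out_hz / n          # reference frequency for multiplication factor n
    return ref_cycles / f_ref_hz * 1e9


for n in (8, 16):
    print(f"N = {n:2d}: about {lock_time_ns(2.5e9, n):.1f} ns")
```

For N = 8 this gives about 9.6 ns, consistent with the 10 ns power-on time quoted for the MDLL; for N = 16 the same three-cycle assumption corresponds to about 19.2 ns.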

Fig. 18 shows the captured $DATA_{OUT}$ and START signals while power cycling the transmitter. The delay difference between the START signal captured on the CSA8200 and the START signal which is applied on the transmitter input is 1 ns, as seen in the power-off event. The measured power-on time of the voltage mode driver is around 2 ns and is dominated by the time needed to charge the 20 pF decoupling capacitors of the output driver and pre-driver regulators. The voltage mode driver thus does not limit the power-on time of the whole transmitter. The power-on time is limited by the MDLL's power-on time, which is 10 ns. The measured energy overhead of power cycling is 44 pJ of which 32 pJ is consumed in the voltage mode driver and the remaining 12 pJ in the MDLL. Charging and discharging of capacitors on the $V_{DRV}$ and $V_{PRE}$ nodes (see Fig. 3) are the major contributors to this overhead.

Fig. 17. Measured period jitter of the proposed MDLL for multiplication factors of 8 and 16.

Fig. 18. Measured power-on/off transient of the proposed transmitter.

Fig. 19. Measured power consumption and energy-per-bit of the proposed transmitter as a function of effective data rate for various burst lengths in bytes.

Fig. 20. Measured eye diagram of the proposed transmitter at 5 Gb/s with PRBS9 output data.

Fig. 19 plots the power consumption and energy efficiency of the transmitter versus effective data rate for different data burst lengths (in bytes). Ideally, the power consumption must scale linearly with the data rate, as shown by the dashed line in the power consumption versus effective data rate plot. However, the power overhead due to power cycling and finite power-on time increases the slope of the power versus effective data rate curve at smaller burst lengths (4 bytes). For longer burst lengths, the power overhead due to power cycling is a smaller portion of the total power consumed during data transmission. Therefore, energy proportional behavior that is closer to the ideal case is achieved. For a 128 byte packet size, the power consumption varies from 4.8 mW to 0.095 mW ($50 \times$ change) and the energy efficiency varies from 1 pJ/bit to 2 pJ/bit ($2 \times$ change) when the effective data rate varies from 5 Gb/s to 48 Mb/s ($100 \times$ change). The 32 byte packet size data burst reaches the break-even power point at 3.33 Gb/s, and any increase in the bandwidth demand beyond this point must be met by keeping the transmitter in an always-on state.
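The qualitative shape of the power versus effective data rate curves can be illustrated with a first-order duty-cycle model. The sketch below is an assumption-laden illustration, not the procedure used to produce the measured curves: it assumes one burst per period, charges on-state power during both the 10 ns power-on time and the burst, and adds the 44 pJ cycling overhead once per burst. Whether the measured overhead already includes the energy spent during power-on is not stated here, so the model is not expected to reproduce the measured break-even point or Fig. 19 exactly. All function names are hypothetical.

```python
P_ON_W = 4.8e-3        # on-state power quoted in the text
P_OFF_W = 36e-6        # off-state power quoted in the text
T_POWER_ON_S = 10e-9   # transmitter power-on time
E_CYCLING_J = 44e-12   # energy overhead per on/off cycle
PEAK_RATE_BPS = 5e9


def power_cycled_w(effective_rate_bps, burst_bytes):
    """Average power when the link is power cycled once per burst.

    Returns infinity if the burst plus power-on time does not fit
    in one period, i.e. power cycling is not feasible at this rate.
    """
    bits = 8 * burst_bytes
    t_period = bits / effective_rate_bps
    t_active = T_POWER_ON_S + bits / PEAK_RATE_BPS
    if t_active >= t_period:
        return float("inf")
    duty = t_active / t_period
    return P_OFF_W + (P_ON_W - P_OFF_W) * duty + E_CYCLING_J / t_period


def link_power_w(effective_rate_bps, burst_bytes):
    """Pick the cheaper of power cycling and staying always-on."""
    return min(power_cycled_w(effective_rate_bps, burst_bytes), P_ON_W)


def energy_per_bit_j(effective_rate_bps, burst_bytes):
    """Energy per delivered bit at the given effective data rate."""
    return link_power_w(effective_rate_bps, burst_bytes) / effective_rate_bps


for burst in (4, 32, 128):
    for rate in (48e6, 1e9, 5e9):
        print(
            f"burst={burst:3d} B, rate={rate / 1e9:6.3f} Gb/s: "
            f"{link_power_w(rate, burst) * 1e3:6.3f} mW, "
            f"{energy_per_bit_j(rate, burst) * 1e12:5.2f} pJ/bit"
        )
```

Under these assumptions the model shows the same trends as the measurements: short bursts pay the fixed power-on and cycling cost more often per delivered bit, so their curves rise more steeply and reach the always-on power at a lower effective data rate than long bursts.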

Fig. 20 shows the captured transmitter output eye diagram with PRBS9 data. The differential output swing is 250 mV<sub>ppd</sub> and the measured long-term jitter is 4 ps<sub>rms</sub> and 26.3 ps<sub>pk-pk</sub> with 100 k hits. Fig. 21 shows the MDLL output phase noise plot at 2.5 GHz. The measured phase noise at 1 MHz offset is -116.9 dBc/Hz and the jitter obtained by integrating the phase noise from 3.125 kHz to 100 MHz is 752 fs. The trade-off between power-on time and jitter performance is measured using the programmable capacitor bank $C_D$ (see Fig. 7). Fig. 22 shows the measured period jitter for different decoupling capacitor values. These measurements were conducted by enabling the capacitors one at a time while the MDLL is power cycled. As expected, a large decoupling capacitor increases the power-on time by increasing the time for the control voltage $V_{CTRL}$ to settle. The measured settling time is 256 ns for a 20 pF capacitor and 10 ns for a 0 pF capacitor.
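For readers unfamiliar with how an integrated-jitter number is obtained from a phase noise plot, the sketch below applies the standard relation $\sigma_t = \sqrt{2\int L(f)\,df}\,/\,(2\pi f_0)$ with a simple trapezoidal integration. The phase noise profile used here is a hypothetical flat -130 dBc/Hz curve chosen only to exercise the function; it is not the measured data of Fig. 21, and the function name is illustrative.

```python
import math


def rms_jitter_s(offsets_hz, noise_dbc_hz, carrier_hz):
    """Integrate single-sideband phase noise L(f) and convert to RMS jitter.

    offsets_hz   : ascending offset frequencies in Hz
    noise_dbc_hz : L(f) in dBc/Hz at each offset
    carrier_hz   : carrier (output) frequency in Hz
    """
    if len(offsets_hz) != len(noise_dbc_hz) or len(offsets_hz) < 2:
        raise ValueError("need at least two matching samples")
    linear = [10.0 ** (level / 10.0) for level in noise_dbc_hz]
    area = 0.0
    # Trapezoidal rule on a linear frequency axis; real, log-spaced
    # measurement data would need much denser sampling than this.
    for i in range(1, len(offsets_hz)):
        width = offsets_hz[i] - offsets_hz[i - 1]
        area += 0.5 * (linear[i] + linear[i - 1]) * width
    phase_rms_rad = math.sqrt(2.0 * area)  # factor 2 counts both sidebands
    return phase_rms_rad / (2.0 * math.pi * carrier_hz)


# Hypothetical flat profile over the integration band used in the text.
OFFSETS_HZ = [3.125e3, 1e6, 100e6]
FLAT_DBC_HZ = [-130.0, -130.0, -130.0]

jitter_s = rms_jitter_s(OFFSETS_HZ, FLAT_DBC_HZ, carrier_hz=2.5e9)
print(f"integrated jitter: {jitter_s * 1e15:.0f} fs")
```

With measured data, the same function would be called with the offset and dBc/Hz columns exported from the phase noise analyzer.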




Fig. 21. Measured phase noise spectrum of the proposed MDLL at 2.5 GHz output frequency.



Fig. 22. Measured peak period jitter as a function of time for different decoupling capacitor ( $C_D$ ) settings.

Fig. 23 shows the measured MDLL jitter performance for the two extreme capacitor values. The measured long-term absolute jitter over 100 k hits is 1.1 ps<sub>rms</sub>/10 ps<sub>pk-pk</sub> for a 20 pF decoupling capacitor and 2 ps<sub>rms</sub>/18.6 ps<sub>pk-pk</sub> for a 0 pF capacitor. A large decoupling capacitor filters the noise from the current sources and the supply, thereby achieving superior jitter performance. A large decoupling capacitor also helps in reducing reference spurs. Fig. 24 shows the measured MDLL output spectrum for the two capacitor values.

The measured reference spur for decoupling capacitors of 20 pF and 0 pF is -51.5 dB and -43.5 dB, respectively. We believe that the sharing of ground pins between the ripple counter, digital logic, and VCO is the main reason for this spur and for the spurs appearing at subharmonic reference frequencies.

TABLE I Performance Comparison of the Proposed Fast-On Transmitter With State-of-the-Art Designs

|                                  | This Work                  | [4] VLSI'11 | [5] JSSC'10                |
|----------------------------------|----------------------------|-------------|----------------------------|
| Technology                       | 90 nm                      | 40 nm       | 40 nm                      |
| Supply (V)                       | 1 (VM driver), 1.1 (MDLL)  | N/A         | 1.1                        |
| Peak data rate (Gb/s)            | 5                          | 2.5-5.6     | 2.7-4.3                    |
| Power-on time (ns)               | 10                         | 8           | 241.8                      |
| Power-on time (reference cycles) | 3                          | 5.6         | 130                        |
| Ref. freq. (MHz)                 | 312.5                      | 700         | 537.5                      |
| On-state power (mW)              | 4.8\*                      | 13.4\*\*    | 14.2\*\*\*                 |
| Off-state power                  | 36 $\mu$W                  | 0 mW        | 50 $\mu$W<sup>†</sup>      |
| Output swing (mV)                | $250(\text{Diff}_{pk-pk})$ | N/A         | $200(\text{Diff}_{pk-pk})$ |

† 0.4 mW for 8 channels reported

\*Output driver and clock power at 5 Gb/s

\*\*Output driver, clock and receiver power at 5.6 Gb/s

\*\*\*Output driver, clock and receiver power at 4.3 Gb/s

Energy proportional performance of the standalone MDLL is measured separately. Fig. 25 shows the measured power consumption and energy-per-cycle versus utilization when the MDLL is power cycled at four different on/off periods. The power scales linearly with utilization. The average energy overhead of the on-to-off and off-to-on transitions is 12 pJ. A non-zero y-intercept indicates finite turn-on time and non-zero off-state power. In a conventional multiplier, the energy-per-cycle increases at lower utilization. The proposed MDLL achieves almost constant energy-per-cycle when power cycled with a cycle time of 1600 ns. For this case, the energy-per-cycle changes from 0.88 pJ to 1 pJ when the utilization changes from 100% to 9%. When the MDLL is power cycled with a cycle time of 400 ns, the energy-per-cycle changes from 0.88 pJ to 1.4 pJ when the utilization changes from 100% to 7.5%.
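The dependence of the MDLL energy-per-cycle on utilization and power-cycling period can be sketched with a simple energy balance. The model below is a simplification under stated assumptions: utilization is taken as the fraction of each period the MDLL is on, every output cycle during the on-time counts as useful, and the 12 pJ transition energy is paid once per period. It is intended to show the trend only and is not fitted to the measured curves of Fig. 25. The function name is hypothetical.

```python
P_ON_W = 2.2e-3          # MDLL on-state power
P_OFF_W = 25e-6          # MDLL off-state power
E_TRANSITION_J = 12e-12  # average on/off transition energy per period
F_OUT_HZ = 2.5e9         # output clock frequency


def energy_per_cycle_j(utilization, period_s):
    """Energy per delivered output clock cycle under power cycling."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if utilization == 1.0:
        # Always on: no transitions and no off-state interval.
        return P_ON_W / F_OUT_HZ
    t_on = utilization * period_s
    t_off = period_s - t_on
    energy = P_ON_W * t_on + P_OFF_W * t_off + E_TRANSITION_J
    delivered_cycles = F_OUT_HZ * t_on
    return energy / delivered_cycles


for period_ns in (400, 1600):
    for util in (1.0, 0.5, 0.09, 0.075):
        e = energy_per_cycle_j(util, period_ns * 1e-9)
        print(f"period={period_ns:4d} ns, utilization={util:5.3f}: "
              f"{e * 1e12:.2f} pJ/cycle")
```

At full utilization the model returns 2.2 mW / 2.5 GHz = 0.88 pJ per cycle. At low utilization, a shorter period spreads the fixed transition energy over fewer delivered cycles, which is why the 400 ns case degrades faster than the 1600 ns case.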

Fig. 23. Measured long term jitter histogram of the proposed MDLL for 0 pF and 20 pF decoupling capacitor cases.

Fig. 24. Measured spectrum of the proposed MDLL for 0 pF and 20 pF decoupling capacitor cases.

Fig. 25. Measured MDLL power consumption and energy-per-cycle as a function of utilization for various power-cycling periods.

Table I compares the proposed transmitter with state-of-the-art designs. The proposed voltage mode driver compares favorably and achieves the smallest power-on time. The transmitter as a whole achieves the smallest power-on time in terms of reference cycles. The proposed transmitter is compared with prior art using plots of normalized energy-per-bit versus effective data rate and normalized power versus effective data rate. A burst length of 8 bytes is used in this comparison. The comparison plots were obtained from the power-on time, off-state power, and on-state power of the prior art and of the proposed transmitter. Energy-per-bit is normalized such that the proposed transmitter and the prior art each have unity energy efficiency at their respective peak data rates. Power is normalized such that the proposed transmitter and the prior art each have unity power at their respective peak data rates. Fig. 26 shows the comparison plots of normalized power versus effective data rate and energy-per-bit versus effective data rate. At an effective data rate of 10 Mb/s, the normalized energy-per-bit of the proposed transmitter, [5], and [4]<sup>1</sup> is approximately 5.2, 3.6 and 1.7, respectively. Table II compares the proposed MDLL with the state-of-the-art fast power-on frequency multipliers. The proposed MDLL achieves the smallest power-on time (3 reference cycles) and competitive power efficiency (0.88 mW/GHz).
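The two derived metrics used in Table II, power-on time in reference cycles and power efficiency in mW/GHz, follow directly from the other table entries. The sketch below recomputes them from the Table II values; the function names and the tuple layout are illustrative. Entries for which the table lists N/A are left out of the corresponding calculation.

```python
def reference_cycles(power_on_time_ns, f_ref_mhz):
    """Power-on time expressed in reference clock cycles."""
    return power_on_time_ns * f_ref_mhz / 1e3


def mw_per_ghz(power_mw, f_out_ghz):
    """Power efficiency of a clock multiplier."""
    return power_mw / f_out_ghz


# (label, power-on time in ns, reference frequency in MHz), from Table II
POWER_ON = [
    ("This work", 10.0, 312.5),
    ("[19]", 100.0, 390.0),
    ("[21]", 12.65, 790.0),
    ("[4]", 8.0, 700.0),
    ("[15]", 1830.0, 100.0),
    ("[5]", 241.8, 537.5),
]

# (label, power in mW, output frequency in GHz), from Table II.
# [21] is evaluated at 3.16 GHz, as noted in the table footnote.
EFFICIENCY = [
    ("This work", 2.2, 2.5),
    ("[19]", 64.0, 25.0),
    ("[21]", 96.0, 3.16),
    ("[15]", 3.4, 3.2),
]

for label, t_ns, f_ref in POWER_ON:
    print(f"{label:10s} {reference_cycles(t_ns, f_ref):7.1f} reference cycles")

for label, p_mw, f_out in EFFICIENCY:
    print(f"{label:10s} {mw_per_ghz(p_mw, f_out):7.2f} mW/GHz")
```

The recomputed values agree with the tabulated ones to within rounding.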

<sup>1</sup>The flat normalized energy-per-bit versus effective data rate plot (see Fig. 26) for [4], even at relatively low data rates, is due to the 0 mW off-state power reported in [4]. However, a non-zero (even a few microwatts) off-state power could increase the energy-per-bit at lower effective data rates, as seen in the energy-per-bit versus data rate plot in [4].

Fig. 26. Normalized power consumption and energy-per-bit of the proposed transmitter and prior-art as a function of effective data rate for 8 byte burst length.

TABLE II Performance Comparison of the Proposed Fast-On Clock Multiplier With State-of-the-Art Designs

|                                  | This Work                     | [19] VLSI'13 | [21] CICC'12         | [4] VLSI'11      | [15] ISSCC'12          | [5] JSSC'10 |
|----------------------------------|-------------------------------|--------------|----------------------|------------------|------------------------|-------------|
| Technology                       | 90 nm                         | 40 nm        | 65 nm                | 40 nm            | 22 nm                  | 40 nm       |
| Supply (V)                       | 1.1                           | N/A          | 1.1                  | N/A              | 1                      | 1.1         |
| Output frequency (GHz)           | 2.5                           | 25           | 2.3-4                | 2.8              | 3.2                    | 4.3         |
| Reference frequency (MHz)        | 312.5                         | 390          | 790<sup>†</sup>      | 700              | 100                    | 537.5       |
| Long-term jitter (rms/pp) (ps)   | 2/18.6 (0 pF), 1.1/10 (20 pF) | N/A          | N/A                  | N/A              | 6/N/A                  | N/A         |
| Power efficiency (mW/GHz)        | 0.88                          | 2.56         | 30.4<sup>†</sup>     | 4.8\*            | 1.06                   | N/A         |
| Power (mW)                       | 2.2                           | 64           | 96<sup>†</sup>       | 13.44            | 3.4                    | N/A         |
| Power-on time                    | 10 ns                         | 100 ns       | 12.65 ns<sup>†</sup> | 8 ns<sup>◊</sup> | $1.83\mu s^{\ddagger}$ | 241.8 ns    |
| Power-on time (reference cycles) | 3                             | 40           | 10                   | 5.6              | 183                    | 130         |
| Area (mm<sup>2</sup>)            | 0.16                          | 0.1          | 0.149                | N/A              | 0.017                  | N/A         |
| Architecture                     | MDLL                          | PLL          | MILO                 | MILO             | PLL                    | PLL         |

† Results reported at 3.16 GHz

\* Individual MILO power is not reported in [4]

◊ Power-on time of the transmitter including MILO

‡ Power-on time is reported at 3 GHz

## VII. CONCLUSION

Despite being energy efficient at their peak data rates, conventional links suffer from efficiency degradation at lower link utilization. Power cycling is employed to achieve constant energy-per-bit operation across all utilization levels of a link. In this work, we have presented a fast power-on transmitter architecture, which demonstrates energy proportional operation over wide variations of link utilization, and is therefore suitable for energy efficient links. The transmitter architecture combines architectural and circuit design techniques to achieve fast power-on capability in the voltage mode output driver by using a fast digital regulator, and in the MDLL-based clock multiplier by accurate frequency pre-setting and instantaneous phase acquisition. An improved edge replacement logic circuit for the MDLL was presented to ease timing requirements. The prototype fast power-on transmitter was fabricated in 90 nm CMOS technology and occupies an active die area of 0.3 mm<sup>2</sup>. The proposed fast power-on transmitter architecture achieves 10 ns total power-on time, which is limited by the clock multiplier, and consumes 4.8 mW/36 $\mu$W on/off-state power from a 1.1 V supply. The voltage mode driver and the clock multiplier achieve power-on times of only 2 ns and 10 ns, respectively. The transmitter achieves $100 \times$ effective data rate scaling (5 Gb/s–0.048 Gb/s), while scaling the power and energy efficiency by only $50 \times$ (4.8 mW–0.095 mW) and $2 \times$ (1–2 pJ/bit), respectively.

#### ACKNOWLEDGMENT

The authors thank Berkeley Design Automation for providing the Analog Fast Spice (AFS) simulator and the anonymous reviewers for their helpful feedback.

#### REFERENCES

- [1] F. O'Mahony, G. Balamurugan, J. Jaussi, J. Kennedy, M. Mansuri, S. Shekhar, and B. Casper, "The future of electrical I/O for microprocessors," in *Proc. IEEE Symp. VLSI Design Automation and Test* (VLSI-DAT), 2009, pp. 31–34.
- [2] G. Balamurugan, J. Kennedy, G. Banerjee, J. Jaussi, M. Mansuri, F. O'Mahony, B. Casper, and R. Mooney, "A scalable 5–15 Gbps, 14–75 mW low-power I/O transceiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 1010–1019, Apr. 2008.
- [3] M. Mansuri, J. Jaussi, J. Kennedy, T. Hsueh, S. Shekhar, G. Balamurugan, F. O'Mahony, C. Roberts, R. Mooney, and B. Casper, "A scalable 0.128-to-1 Tb/s 0.8-to-2.6 pJ/b 64-lane parallel I/O in 32 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, 2013, pp. 402–403.
- [4] J. Zerbe, B. Daly, W. Dettloff, T. Stone, W. Stonecypher, P. Venkatesan, K. Prabhu, B. Su, J. Ren, B. Tsang, B. Leibowitz, D. Dunwell, A. Carusone, and J. Eble, "A 5.6 Gb/s 2.4 mW/Gb/s bidirectional link with 8 ns power-on," in *Proc. IEEE Symp. VLSI Circuits*, 2011, pp. 82–83.
- [5] B. Leibowitz, R. Palmer, J. Poulton, Y. Frans, S. Li, J. Wilson, M. Bucher, A. Fuller, J. Eyles, M. Aleksic, T. Greer, and N. Nguyen, "A 4.3 GB/s mobile memory interface with power-efficient bandwidth scaling," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 889–898, Apr. 2010.
- [6] F. O'Mahony, J. Jaussi, J. Kennedy, G. Balamurugan, M. Mansuri, C. Roberts, S. Shekhar, R. Mooney, and B. Casper, "A 47 × 10 Gb/s 1.4 mW/Gb/s parallel interface in 45 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2828–2837, Dec. 2010.
- [7] Energy Efficient Ethernet Task Force, 2010, IEEE P802.3az [Online]. Available: http://grouper.ieee.org/groups/802/3/az
- [8] K. Christensen, P. Reviriego, B. Nordman, M. Bennett, M. Mostowfi, and J. Maestro, "IEEE 802.3az: The road to energy efficient ethernet," *IEEE Commun. Mag.*, vol. 48, no. 11, pp. 50–56, Nov. 2010.
- [9] L. Barroso and U. Holzle, "The case for energy-proportional computing," *Computer*, vol. 40, no. 12, pp. 33–37, Dec. 2007.
- [10] T. Anand, M. Talegaonkar, A. Elshazly, B. Young, and P. Hanumolu, "A 2.5 GHz 2.2 mW/25 μW on/off-state power 2 ps<sub>rms</sub>-long-term-jitter digital clock multiplier with 3-reference-cycles power-on time," in *IEEE ISSCC Dig. Tech. Papers*, 2013, pp. 256–257.
- [11] H. Hatamkhani, K.-L. J. Wong, R. Drost, and C.-K. K. Yang, "A 10-mW 3.6-Gbps I/O transmitter," in *Proc. IEEE Symp. VLSI Circuits*, 2003, pp. 97–98.
- [12] Y. Okuma, K. Ishida, Y. Ryu, X. Zhang, P.-H. Chen, K. Watanabe, M. Takamiya, and T. Sakurai, "0.5-V input digital LDO with 98.7% current efficiency and 2.7-μA quiescent current in 65 nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, 2010, pp. 1–4.
- [13] M. Onouchi, K. Otsuga, Y. Igarashi, T. Ikeya, S. Morita, K. Ishibashi, and K. Yanagisawa, "A 1.39-V input fast-transient-response digital LDO composed of low-voltage MOS transistors in 40-nm CMOS process," in *Proc. IEEE Asian Solid-State Circuits Conf.*, 2011, pp. 37–40.
- [14] P. Hanumolu, M. Brownlee, K. Mayaram, and U.-K. Moon, "Analysis of charge-pump phase-locked loops," *IEEE Trans. Circuits Syst. I*, vol. 51, no. 9, pp. 1665–1674, Sep. 2004.
- [15] N. August, H. Jin Lee, M. Vandepas, and R. Parker, "A TDC-less ADPLL with 200-to-3200 MHz range and 3 mW power dissipation for mobile SoC clocking in 22 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, 2012, pp. 246–248.

- [16] W.-H. Chiu, Y.-H. Huang, and T.-H. Lin, "A dynamic phase error compensation technique for fast-locking phase-locked loops," *IEEE J. Solid-State Circuits*, vol. 45, no. 6, pp. 1137–1149, Jun. 2010.
- [17] T.-H. Chien, C.-S. Lin, Y. Z. Juang, C.-M. Huang, and C.-L. Wey, "An edge-missing compensator for fast-settling wide-locking-range PLLs," in *IEEE ISSCC Dig. Tech. Papers*, 2009, pp. 394–395, 395a.
- [18] K. Woo, Y. Liu, E. Nam, and D. Ham, "Fast-lock hybrid PLL combining fractional-N and integer-N modes of differing bandwidths," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 379–389, Feb. 2008.
- [19] R. Navid, M. Hekmat, F. Aryanfar, J. Wei, and V. Gadde, "A 25 GHz 100 ns lock time digital LC PLL with an 8-phase output clock," in *Proc. IEEE Symp. VLSI Circuits*, 2013, pp. 196–197.
- [20] M. Izad and C.-H. Heng, "A pulse shaping technique for spur suppression in injection-locked synthesizers," *IEEE J. Solid-State Circuits*, vol. 47, no. 3, pp. 652–664, Mar. 2012.
- [21] D. Dunwell, A. Carusone, J. Zerbe, B. Leibowitz, B. Daly, and J. Eble, "A 2.3–4 GHz injection-locked clock multiplier with 55.7% lock range and 10-ns power-on," in *Proc. IEEE Custom Integr. Circuits Conf.* (CICC), 2012, pp. 1–4.
- [22] R. Farjad-rad, W. Dally, H.-T. Ng, R. Senthinathan, M.-J. Lee, R. Rathi, and J. Poulton, "A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1804–1812, Dec. 2002.
- [23] S. Ye, L. Jansson, and I. Galton, "A multiple-crystal interface PLL with VCO realignment to reduce phase noise," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1795–1803, Dec. 2002.
- [24] G.-Y. Wei, J. Stonick, D. Weinlader, J. Sonntag, and S. Searles, "A 500 MHz MP/DLL clock generator for a 5 Gb/s backplane transceiver in 0.25 μm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, 2003, vol. 1, pp. 464–465.
- [25] A. Elshazly, R. Inti, B. Young, and P. Hanumolu, "Clock multiplication techniques using digital multiplying delay-locked loops," *IEEE J. Solid-State Circuits*, vol. 48, no. 6, pp. 1416–1428, Jun. 2013.
- [26] B. Helal, M. Straayer, G.-Y. Wei, and M. Perrott, "A highly digital MDLL-based clock multiplier that leverages a self-scrambling time-to-digital converter to achieve subpicosecond jitter performance," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 855–863, Apr. 2008.
- [27] J. Poulton, R. Palmer, A. Fuller, T. Greer, J. Eyles, W. Dally, and M. Horowitz, "A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 12, pp. 2745–2757, Dec. 2007.
- [28] Y. Lu, W.-H. Ki, and C. Yue, "A 0.65 ns-response-time 3.01 ps FOM fully-integrated low-dropout regulator with full-spectrum power-supply-rejection for wideband communication systems," in *IEEE ISSCC Dig. Tech. Papers*, 2014, pp. 306–307.
- [29] J. Bulzacchelli, Z. Toprak-Deniz, T. Rasmus, J. Iadanza, W. Bucossi, S. Kim, R. Blanco, C. Cox, M. Chhabra, C. LeBlanc, C. Trudeau, and D. Friedman, "Dual-loop system of distributed microregulators with high dc accuracy, load response time below 500 ps, and 85-mV dropout voltage," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 863–874, 2012.
- [30] P. Hazucha, T. Karnik, B. Bloechel, C. Parsons, D. Finan, and S. Borkar, "Area-efficient linear regulator with ultra-fast load regulation," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 933–940, 2005.



**Tejasvi Anand** (S'12) is currently pursuing the Ph.D. degree at the University of Illinois, Urbana-Champaign, IL, USA. He received the M.Tech. degree (first class with distinction) in electronics design and technology from the Indian Institute of Science, Bangalore, India, in 2008.

From 2008 to 2010, he worked as an Analog Design Engineer at Cosmic Circuits (now a part of Cadence), Bangalore, on the design of analog-to-digital converters. From 2010 to 2011, he worked as a Project Associate at Indian Institute of Science, Bangalore, where he was involved in the design of neural recording system and RF building blocks. His research interests are in energy efficient high-speed wireline communication systems, frequency synthesizers, data converters and energy efficient sensors.

Mr. Anand received the Analog Devices Outstanding Student Designer Award in 2013.



**Amr Elshazly** (S'04–M'13) received the B.Sc. (Hons.) and M.Sc. degrees from Ain Shams University, Cairo, Egypt, in 2003 and 2007, respectively, and the Ph.D. degree from Oregon State University, Corvallis, OR, USA, in 2012, all in electrical engineering.

He is currently a Design Engineer at Intel Corporation, Hillsboro, OR, USA, developing high-performance high-speed I/O circuits and architectures for next generation process technologies. From 2004 to 2006, he was a VLSI Circuit Design Engineer at AIAT, Inc. working on the design of RF building blocks. From 2006 to 2007, he was with Mentor Graphics Inc., Cairo, designing multi-standard clock and data recovery circuits. His research interests include high-speed serial-links, frequency synthesizers, digital phase-locked loops, multiplying delay-locked loops, clock and data recovery circuits, data converter techniques, and low-power mixed-signal circuits.

Dr. Elshazly received the Analog Devices Outstanding Student Designer Award in 2011, the Center for Design of Analog-Digital Integrated Circuits (CDADIC) Best Poster Award in 2012, and the Graduate Research Assistant of the year Award in 2012 from the College of Engineering at the Oregon State University. He serves as a reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I AND II, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, IEEE International Symposium on Circuits and Systems, IEEE International Conference of Electronic Circuits Systems, and IEEE Asian Solid-State Circuits Conference.



**Mrunmay Talegaonkar** received the B.Tech. degree in electrical engineering and the M.Tech. degree in microelectronics and VLSI Design from Indian Institute of Technology Madras, Chennai, India, in 2007. He is currently pursuing the Ph.D. degree at University of Illinois, Urbana-Champaign, IL, USA.

Between 2007 and 2009, he worked as a design engineer at Analog Devices, Bangalore, India, where he was involved in design of digital-to-analog converters. During 2009–2010, he was a project associate at Indian Institute of Technology Madras, working on high speed clock and data recovery circuits. From 2010 to 2013, he was a research assistant, working on high speed links, at Oregon State University, Corvallis, OR, USA. His research interests include high speed I/O interfaces and clocking circuits.



**Brian Young** received the B.S. degree in electrical engineering from the Pennsylvania State University in 2000 and the Ph.D. degree from Oregon State University in 2013.

From 2000 to 2003, he was with the Timing Solutions Operation of Motorola Semiconductor in Chandler, AZ. From 2003 to 2007, he was with AMI Semiconductor in Lower Gwynedd, PA. Since 2013, he has been with Marvell Semiconductor, Corvallis, OR. His research interests include time-based data converters, digital PLLs, and high performance mixed-signal circuits.

Dr. Young has received the 2010 and 2013 Analog Devices Outstanding Student Designer Award.



**Pavan Kumar Hanumolu** (S'99–M'07) is currently an Associate Professor in the Department of Electrical and Computer Engineering and a Research Associate Professor with the Coordinated Science Laboratory at the University of Illinois, Urbana-Champaign, IL, USA. He received the Ph.D. degree from the School of Electrical Engineering and Computer Science at Oregon State University, Corvallis, in 2006, where he subsequently served as a faculty member till 2013. Dr. Hanumolu's research interests are in energy-efficient integrated circuit implementation of analog and digital signal processing, sensor interfaces, wireline communication system, and power conversion.

Dr. Hanumolu received the National Science Foundation CAREER Award in 2010. He currently serves as an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS, and is a technical program committee member of the VLSI Circuits Symposium, and IEEE International Solid-State Circuits Conference.