## 28.4 A 20 Gb/s/pin 1.18pJ/b 1149µm² Single-Ended Inverter-based 4-tap Addition-Only Feed-Forward Equalization Transmitter with Improved Robustness to Coefficient Errors in 28nm CMOS

Changjae Moon, Jaeyoung Seo, Myungguk Lee, Iksu Jang, Byungsub Kim

Pohang University of Science and Technology, Pohang, Korea

This paper presents an inverter-based 4-tap addition-only feed-forward equalization (FFE) transmitter (TX) for compact and power-efficient single-ended memory interfaces. The area and power efficiency of I/Os is increasingly important because I/Os are becoming faster, more complex, and larger, whereas the area and power budgets remain limited. Source-series termination (SST) drivers, which are widely used in FFE TXs, provide good linearity and good impedance matching; however, their linear resistors occupy too much area and add significant parasitic capacitance, thereby dissipating additional power [1] (Fig. 28.4.1). Inverters, on the other hand, occupy a small area and are power-efficient, but matching their impedance and controlling their output voltage is difficult because of their nonlinear behavior. For example, replacing the SST drivers in a conventional FFE (C-FFE) with inverters causes the pull-up PMOSs and pull-down NMOSs to turn on simultaneously during FFE tap subtraction. In this case, the output voltage is not well defined, since it is sensitive to supply noise, mismatch, and process variation. To overcome these issues, a 4-tap addition-only FFE (A-FFE) is proposed: it improves area, power consumption, voltage swing, and robustness to coefficient errors (Fig. 28.4.1).

The proposed A-FFE is compared to the C-FFE and the coefficient-error-robust FFE (B-FFE) [2] in Fig. 28.4.2. Functionally, the three produce identical outputs if their coefficients are related by the formulas shown in Fig. 28.4.2. Unlike the other FFEs, A-FFE generates its output without using tap subtraction. Figure 28.4.3 shows the tap output waveforms and the overall A-FFE TX waveform for a single-bit pattern (...0001000...). The main-cursor tap is always enabled, while the other taps are enabled only when necessary; furthermore, all enabled taps have the same polarity, which avoids the need for tap subtraction. Figure 28.4.3 also shows formulas expressing all outputs of the C-FFE and the A-FFE in terms of their tap weights. Whereas C-FFE taps subtract in some cases, A-FFE taps are always additive (Fig. 28.4.3).
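The functional equivalence can be checked numerically. The following is a minimal sketch, not the authors' model: it assumes bipolar symbols (-1/+1), a C-FFE whose pre- and postcursor taps are all subtracted from the main tap, and the A-FFE coefficient relations from Fig. 28.4.2 ($A_0 = 2W_0$, $A_1 = W_1 - W_0 - W_2 - W_3$, $A_2 = 2W_2$, $A_3 = 2W_3$). The weights in `W` and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical C-FFE tap weights (W0 pre, W1 main, W2 1st post, W3 2nd post),
# in arbitrary integer units so that the comparison below is exact.
W = (3, 9, 4, 1)


def affe_coeffs(w):
    """A-FFE coefficients derived from C-FFE weights (relations of Fig. 28.4.2)."""
    w0, w1, w2, w3 = w
    return (2 * w0, w1 - w0 - w2 - w3, 2 * w2, 2 * w3)


def cffe_output(bits, w):
    """C-FFE: every tap always drives; non-main taps are subtracted (assumed signs)."""
    d_next, d_now, d_prev1, d_prev2 = bits  # each symbol is -1 or +1
    w0, w1, w2, w3 = w
    return -w0 * d_next + w1 * d_now - w2 * d_prev1 - w3 * d_prev2


def affe_output(bits, a):
    """A-FFE: main tap always on; another tap adds only when its bit differs."""
    d_next, d_now, d_prev1, d_prev2 = bits
    a0, a1, a2, a3 = a
    out = a1 * d_now                  # main tap, always enabled
    if d_next != d_now:
        out += a0 * d_now             # precursor tap, same polarity as main
    if d_prev1 != d_now:
        out += a2 * d_now             # 1st postcursor tap
    if d_prev2 != d_now:
        out += a3 * d_now             # 2nd postcursor tap
    return out


A = affe_coeffs(W)
patterns = list(product((-1, 1), repeat=4))
mismatches = [b for b in patterns if cffe_output(b, W) != affe_output(b, A)]
print("A-FFE coefficients:", A)
print("patterns checked:", len(patterns), "mismatches:", len(mismatches))
```

Under these assumptions the two structures agree on all 16 four-bit patterns, and the example weights give a small main coefficient and larger pre- and 1st-postcursor coefficients.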

The addition-only property of A-FFE has three advantages: (1) it allows inverter drivers to be used as FFE taps; (2) it saves the power that tap subtraction would otherwise waste; and (3) it is robust to coefficient errors. Since A-FFE does not subtract taps, the inverter tap drivers never pull the output up and down simultaneously. In A-FFE operation, either the NMOS or the PMOS transistors drive the channel; since the channel has a linear characteristic ($50\Omega$), the output voltage can be controlled by configuring the strengths of the inverter banks. Therefore, A-FFE can employ area- and power-efficient inverter drivers. By avoiding tap subtraction, A-FFE also saves the power that would be consumed unnecessarily while subtracting taps, as B-FFE does [2]. If pull-up and pull-down transistors turn on simultaneously, the current flowing from the power supply to ground does not contribute to signaling and is thus wasted. A-FFE also suppresses the received error signal caused by coefficient errors by exploiting the channel loss: as in the B-FFE [2], the coefficient errors are modulated up to higher frequencies, where the intrinsic channel loss attenuates them.
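Advantages (2) and (3) can be illustrated with a simplified behavioral model. This sketch is an assumption-laden illustration, not a circuit simulation: tap contributions are treated as signed drive strengths, "wasted drive" is taken as the strength that cancels between opposing taps, and the weights are hypothetical. It also assumes a C-FFE whose non-main taps are all subtracted, with A-FFE coefficients derived from the relations in Fig. 28.4.2.

```python
from itertools import product

W = (3, 9, 4, 1)  # hypothetical C-FFE weights: pre, main, 1st post, 2nd post
A = (2 * W[0], W[1] - W[0] - W[2] - W[3], 2 * W[2], 2 * W[3])


def cffe_terms(bits, w):
    """Signed contribution of each C-FFE tap (all taps always drive)."""
    d_next, d_now, d_prev1, d_prev2 = bits
    return [-w[0] * d_next, w[1] * d_now, -w[2] * d_prev1, -w[3] * d_prev2]


def affe_terms(bits, a):
    """Signed contribution of each A-FFE tap (0 when the tap is disabled)."""
    d_next, d_now, d_prev1, d_prev2 = bits
    return [
        a[0] * d_now if d_next != d_now else 0,
        a[1] * d_now,
        a[2] * d_now if d_prev1 != d_now else 0,
        a[3] * d_now if d_prev2 != d_now else 0,
    ]


def has_contention(terms):
    """True when some taps pull up while others pull down."""
    return any(t > 0 for t in terms) and any(t < 0 for t in terms)


def wasted_drive(terms):
    """Drive strength that cancels internally instead of reaching the output."""
    return sum(abs(t) for t in terms) - abs(sum(terms))


patterns = list(product((-1, 1), repeat=4))
cffe_contention = sum(has_contention(cffe_terms(b, W)) for b in patterns)
affe_contention = sum(has_contention(affe_terms(b, A)) for b in patterns)
cffe_waste = max(wasted_drive(cffe_terms(b, W)) for b in patterns)
affe_waste = max(wasted_drive(affe_terms(b, A)) for b in patterns)

# Effect of a +1 error on the precursor coefficient during a constant run of ones.
ones = (1, 1, 1, 1)
cffe_dc_error = sum(cffe_terms(ones, (W[0] + 1,) + W[1:])) - sum(cffe_terms(ones, W))
affe_dc_error = sum(affe_terms(ones, (A[0] + 1,) + A[1:])) - sum(affe_terms(ones, A))

print("patterns with contention (C-FFE, A-FFE):", cffe_contention, affe_contention)
print("worst-case wasted drive (C-FFE, A-FFE):", cffe_waste, affe_waste)
print("DC error from precursor offset (C-FFE, A-FFE):", cffe_dc_error, affe_dc_error)
```

In this model the precursor error reaches the C-FFE output even when the data is constant, whereas in the A-FFE it appears only when the precursor tap is enabled, that is, around data transitions. This is one way to read the statement that coefficient errors are modulated up to higher frequencies.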

To verify the proposed concept, a 4-tap A-FFE TX was designed for a single-ended memory interface utilizing inverter drivers. Figure 28.4.4 shows the schematic diagram of the TX. The TX is designed using a half-rate architecture consisting of a full-rate A-FFE summing driver, serializing 2:1 multiplexers (MUXs), half-rate decoding blocks, a duty cycle corrector (DCC), and latch-based half-rate shift registers. The A-FFE summing driver consists of three strong drivers and two weak drivers. A strong driver is an inverter bank whose driving strength can be digitally configured; its pull-up and pull-down strengths can be configured independently. A weak driver is a current-starved inverter whose strength can be configured by the tail current source. Because the main and 2<sup>nd</sup> postcursor tap coefficients of A-FFE are much smaller than those of the pre- and postcursor taps, they were implemented with weak drivers for precise tap control. The large pre- and postcursor taps were implemented with strong drivers for sufficient driving strength. The main tap is always enabled, while the other taps turn on only as necessary. A booster tap is implemented with a strong driver to compensate for the inverter's nonlinear behavior when the TX output voltage is close to the supply or ground. For example, if the TX output voltage approaches the supply, the pull-up strengths of the strong drivers are reduced because of the reduced drain voltage, and the TX output voltage does not increase sufficiently. In this case, the booster tap is enabled to raise the TX output voltage to the appropriate level. Likewise, the booster tap also helps in pulling down the TX output when necessary. As the TX output impedance is not $50\Omega$, signal integrity problems may arise. Based on the relaxed impedance matching method [3], the signal integrity problems are resolved by matching the far-end terminal to $50\Omega$. This technique can improve the voltage swing of the inverters. The half-rate digital inputs are serialized by the re-timing 2:1 MUXs and then fed to the A-FFE TX summing drivers. While the inputs ($D_{o1}$ and $D_{e1}$) of the main tap are fed directly from the shift register, the inputs to the other taps are generated by the decoding blocks, which are composed of simple digital gates. The half-rate shift register uses dynamic latches for area and power efficiency.
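The decoding function described above can be sketched behaviorally. This is a hypothetical full-rate model, not the authors' gate-level design: it assumes that each non-main tap is enabled by an XOR of the current bit with the corresponding neighboring bit, which is the condition implied by the addition-only operation, and that the current bit selects pull-up or pull-down. The half-rate even/odd split, the 2:1 MUXs, and the booster tap are not modeled, and the boundary handling is an arbitrary choice.

```python
def decode_tap_enables(bits):
    """Return, for each bit, the drive direction and the enabled non-main taps.

    bits: list of 0/1 full-rate data. Bits outside the list are assumed to
    repeat the nearest edge value (hypothetical boundary handling).
    """
    def at(i):
        return bits[min(max(i, 0), len(bits) - 1)]

    decoded = []
    for n, d_now in enumerate(bits):
        decoded.append({
            "pull": "up" if d_now else "down",   # polarity shared by all taps
            "pre": at(n + 1) ^ d_now,            # next bit differs
            "post1": at(n - 1) ^ d_now,          # previous bit differs
            "post2": at(n - 2) ^ d_now,          # bit two UIs earlier differs
        })
    return decoded


# Single-bit pattern (...0001000...) used for the single-bit response.
single_bit = [0, 0, 0, 1, 0, 0, 0]
for n, row in enumerate(decode_tap_enables(single_bit)):
    enabled = ["main"] + [tap for tap in ("pre", "post1", "post2") if row[tap]]
    print(n, single_bit[n], row["pull"], enabled)
```

For the single-bit pattern, this model enables the precursor tap one UI before the pulse, all taps during the pulse, and the two postcursor taps in the following two UIs, each with the polarity of the current bit.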

The test chip was fabricated in a 28 nm CMOS technology and tested with a 1.1V supply and a PRBS31 pattern. The TX output was connected to an oscilloscope through a PCB trace and an SMA cable. The loss of the PCB trace was measured to be 15dB at 10GHz. The TX achieves a maximum speed of 20Gb/s/pin with a 55.1mV eye height and a 0.44UI eye width while consuming 1.18pJ/b (Fig. 28.4.5). When the booster tap is disabled, the eye height is reduced to 30.9mV (Fig. 28.4.4). This result shows that the booster tap improves the eye height by 78%, thereby compensating for the nonlinearity of the inverters. The eye sensitivity [2] was also measured for all coefficients. The worst eye sensitivity is observed when the pre-cursor tap coefficient is reduced by 20%: the eye height decreases by only 13.6% (from 55.1mV to 47.6mV), which corresponds to an eye sensitivity of 68%. This result verifies that A-FFE is robust to coefficient errors. The energy consumption of the entire TX and of the A-FFE TX summing driver was also measured for various data transition probabilities. Since the TX is mostly composed of digital circuits and all A-FFE taps except the main tap are turned on only when needed, the energy consumption increases linearly with the probability of a data transition. This result shows that the proposed A-FFE saves unnecessary power consumption in the idle

The performance of the proposed TX is summarized and compared with similar prior art [1, 2, 4] in Fig. 28.4.6. The proposed A-FFE architecture completely removes tap subtraction from a 4-tap FFE. Note that a 2-tap B-FFE never subtracts taps, but a 4-tap B-FFE may require tap subtraction. The addition-only architecture allows inverter drivers to be used in a 4-tap TX FFE. Because area-efficient inverter drivers are employed, the TX occupies only 1149µm². The TX occupies a smaller area than the passive equalization TX [4], although the TX in [4] consumes less energy. Figure 28.4.7 shows the die micrograph. Other prior art [1] achieves a lower speed (18Gb/s) than the proposed TX. The measured eye sensitivity of the proposed TX is similar to that of the best prior art [2].
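The headline measurement figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text. It assumes that eye sensitivity is the relative eye-height change divided by the relative coefficient change, which is consistent with the quoted 13.6% and 68% values but is not defined explicitly here. The TX power is derived from the energy per bit and the data rate and is not a figure reported in the text.

```python
# Values quoted in the measurement results.
eye_with_booster_mv = 55.1
eye_without_booster_mv = 30.9
eye_with_coeff_error_mv = 47.6
coeff_error = 0.20            # pre-cursor coefficient reduced by 20%
energy_pj_per_bit = 1.18
data_rate_gbps = 20.0

# Eye-height improvement from the booster tap.
booster_gain = eye_with_booster_mv / eye_without_booster_mv - 1.0

# Relative eye-height reduction caused by the coefficient error.
eye_drop = (eye_with_booster_mv - eye_with_coeff_error_mv) / eye_with_booster_mv

# Assumed definition: relative eye change per relative coefficient change.
eye_sensitivity = eye_drop / coeff_error

# pJ/b multiplied by Gb/s gives mW (1e-12 J/b * 1e9 b/s = 1e-3 W).
tx_power_mw = energy_pj_per_bit * data_rate_gbps

print(f"booster improvement: {booster_gain:.1%}")
print(f"eye-height drop:     {eye_drop:.1%}")
print(f"eye sensitivity:     {eye_sensitivity:.2f}")
print(f"derived TX power:    {tx_power_mw:.1f} mW")
```

The results round to the 78% improvement, 13.6% eye-height reduction, and 0.68 eye sensitivity stated in the text, and the derived TX power at 20Gb/s is about 23.6mW.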

## Acknowledgement:

This work was supported in part by the Commercialization Promotion Agency for R&D Outcomes (COMPA) grant funded by the Korea government (MSIT) (No. 20211100); in part by the BK21 FOUR Project of NRF for the Department of Electrical Engineering, POSTECH; in part by the Institute for Information and Communications Technology Promotion (IITP), Korea (No. 2019001394); in part by the design and application of next generation non-volatile memory hierarchy Cluster Academia Collaboration Program funded by Samsung Electronics; and in part by Samsung Electronics Co., Ltd (IO201211-08055-01). The authors would like to thank the IC Design Education Center (IDEC) for tool support.

## References:

- [1] S. Lee et al., "An 8 nm 18 Gb/s/pin GDDR6 PHY with TX Bandwidth Extension and RX Training Technique," *ISSCC*, pp. 338-339, 2020.
- [2] S. Han et al., "A Coefficient-Error-Robust FFE TX with 230% Eye Variation Improvement Without Calibration in 65 nm CMOS Technology," *ISSCC*, pp. 50-51, 2014.
- [3] M. Choi et al., "An FFE Transmitter Which Automatically and Adaptively Relaxes Impedance Matching," *IEEE JSSC*, vol. 53, no. 6, pp. 1780-1792, June 2018.
- [4] B. Dehlaghi et al., "A 0.3 pJ/bit 20 Gb/s/Wire Parallel Interface for Die-to-Die Communication," *IEEE JSSC*, vol. 51, no. 11, pp. 2690-2701, Nov. 2016.

Figure 28.4.2 (diagram and caption not recovered): coefficient relations among the C-FFE TX, B-FFE TX, and A-FFE TX, with C-FFE tap weights $W_0, W_1, W_2, W_3 > 0$ and $W_1$ the main cursor.

A-FFE TX: $A_0 = 2W_0$, $A_1 = W_1 - W_0 - W_2 - W_3$, $A_2 = 2W_2$, $A_3 = 2W_3$

B-FFE TX: $B_0 = W_1 + W_3 - W_0 - W_2$, $B_1 = 2(W_1 + W_3 - W_2)$, $B_2 = 2(W_2 - W_3)$, $B_3 = 2W_3$



Figure 28.4.1: Comparison of 4-tap TX FFE implementation options: SST-based C-FFE, an inverter-based C-FFE, and the proposed inverter-based A-FFE.



Figure 28.4.3: Single-bit response of A-FFE TX, and comparison table between C-FFE and A-FFE.



Figure 28.4.4: TX schematic diagram.



Figure 28.4.5: Measured eye diagrams with and without a 20% error on the most sensitive tap coefficient (top). Measured A-FFE energy consumption (bottom-left). Driver energy consumption versus probability of data transition (bottom-right).

|                                      | ISSCC'20 [1]                                          | ISSCC'14 [2] | JSSC'16 [4]                                          | This work                           |
|--------------------------------------|-------------------------------------------------------|--------------|------------------------------------------------------|-------------------------------------|
| Technology (nm)                      | 8                                                     | 65           | 28 FD-SOI                                            | 28 LPP                              |
| Supply voltage (V)                   | VDDQ = 1.35,<br>VDD = 0.85                            | 1.3          | N/A                                                  | 1.1                                 |
| Single/Differential                  | Single                                                | Differential | Single                                               | Single                              |
| Driver Type                          | Voltage-mode driver<br>+ capacitive-peaking<br>driver | CML          | High-impedance<br>inverter + RC high-pass<br>filter  | Inverter + current-starved inverter |
| Equalization (TX)                    | 1-tap de-emphasis,<br>Edge boost, FEXT EQ             | 4-tap BFFE   | Passive EQ                                           | 4-tap AFFE                          |
| FFE Tap Addition-Only                | N/A                                                   | No           | N/A                                                  | Yes                                 |
| Data pattern                         | N/A                                                   | N/A          | PRBS 7                                               | PRBS 31                             |
| Data rate (Gb/s)                     | 18                                                    | 8            | 20                                                   | 20                                  |
| Channel loss (dB)                    | 10                                                    | 25           | 10.7                                                 | 15 (PCB trace only)                 |
| Worst eye sensitivity                | N/A                                                   | 0.56         | N/A                                                  | 0.68                                |
| Energy efficiency (TX) (pJ/b)        | N/A                                                   | N/A          | 0.14                                                 | 1.18                                |
| Area (µm²)                           | 4151250*                                              | 2128         | 4556                                                 | 1149                                |

\* Area includes PLL, CA slice, and data slice (16 bit).

## **ISSCC 2022 PAPER CONTINUATIONS**



Figure 28.4.7: Chip micrograph.