

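#### Attribution-NonCommercial-NoDerivs 2.0 Korea

#### Provided that you comply with the conditions below, you are free to

• copy, distribute, transmit, display, perform, and broadcast this work.

#### You must comply with the following conditions:

Attribution. You must credit the original author.

Noncommercial. You may not use this work for commercial purposes.

No Derivative Works. You may not alter, transform, or build upon this work.

- For any reuse or distribution, you must make clear the license terms that apply to this work.
- These conditions do not apply if you obtain separate permission from the copyright holder.

Your rights as a user under copyright law are not affected by the above.

This is a human-readable summary of the Legal Code.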






## **Doctoral Dissertation**

# Design of Compact and Energy-Efficient Inverter-Based High-Speed Transmitter

Changjae Moon (문 창 재)

Department of Electrical Engineering

Pohang University of Science and Technology

2025






# Design of Compact and Energy-Efficient Inverter-Based High-Speed Transmitter

by

Changjae Moon

Department of Electrical Engineering

Pohang University of Science and Technology

A dissertation submitted to the faculty of the Pohang University of Science and Technology in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering

Pohang, Korea

12. 20. 2024

Approved by

Byungsub Kim (Signature)

Academic advisor



# Design of Compact and Energy-Efficient Inverter-Based High-Speed Transmitter

# Changjae Moon

The undersigned have examined this dissertation and hereby certify that it is worthy of acceptance for a doctoral degree from POSTECH.

12. 20. 2024

Committee Chair Byungsub Kim


Member Hong-Jun Park

Member Jae-Yoon Sim

Member Ho-jin Song

Member Jaeyoung Seo

DEE

문창재 Changjae Moon

20192387

Design of Compact and Energy-Efficient Inverter-Based High-Speed Transmitter.

Department of Electrical Engineering, 2025, 92p

Advisor: Byungsub Kim.

Text in English.

#### **ABSTRACT**

This thesis introduces design techniques for developing compact and energy-efficient inverter-based high-speed transmitters for memory interfaces. The research has two main components: a novel addition-only feed-forward equalization (A-FFE) transmitter (TX) architecture, and a four-level pulse-amplitude modulation (PAM4) transmitter with crosstalk compensation (XTC) implemented with inverter-based XTC taps.

The first component presents an inverter-based 4-tap A-FFE TX designed for compact and power-efficient single-ended interfaces. Conventional FFE (C-FFE) TXs typically employ source-series terminated (SST) drivers; however, their linear resistors consume substantial area and introduce significant parasitic capacitance, compromising both power efficiency and output bandwidth. To address these challenges, we developed the A-FFE architecture, which eliminates subtractions between FFE taps and enhances robustness to coefficient quantization errors. These improvements facilitate the implementation of area- and power-efficient inverter drivers in the FFE. The prototype, fabricated in 28-nm LP CMOS technology, demonstrates a data rate of 20 Gb/s/pin, achieving an eye height of 55.1 mV and an eye width of 0.44 UI over a PCB trace with 15 dB loss, with an energy efficiency of 1.18 pJ/b and a worst eye sensitivity of 68 %. Notably, reducing the most sensitive FFE coefficient by 20 % resulted in only a 13.6 % decrease in eye opening. The resistor-free design, utilizing inverter drivers, achieves a compact layout of just 1149 µm².

The second component introduces a PAM4 TX with XTC designed for short-reach memory interfaces. The design incorporates efficient encoders and transition detectors to recognize crosstalk-inducing patterns and control inverter-based XTC taps accordingly. Precise gain and delay control in the XTC system minimizes compensation errors arising from mismatches between victim and aggressor channels. Implemented in 28-nm LP CMOS technology, the TX operates at 16 Gb/s and, with XTC enabled, increases eye height and eye width by 203 % and 396 %, respectively. This area-efficient design, employing inverter-based XTC taps, requires only 0.0067 mm², resulting in an area per data rate of 0.00042 mm²/(Gb/s).

# **Contents**

| Chapter | Section | Title | Page |
|---------|---------|-------|------|
| I. | | Introduction | 1 |
| | 1.1 | Motivation | |
| | 1.2 | Application and Problem Overview | |
| | 1.3 | Thesis Contributions | |
| | 1.4 | Thesis Organization | |
| II. | | Background | 7 |
| | 2.1 | Resistive Termination Techniques for *LC*-Dominant Channels | 7 |
| | 2.1.1 | An Intuitive Analytical Channel Model | 7 |
| | 2.1.2 | Resistive Termination Techniques for *LC*-Dominant Channels | 13 |
| | 2.2 | Feed-Forward Equalization (FFE) | |
| | 2.3 | Coefficient-Error-Robust FFE (B-FFE) | |
| | 2.4 | Far-End Crosstalk (FEXT) | |
| III. | | Design of a Single-Ended Inverter-based Addition-Only Feed-Forward Equalization Transmitter | 27 |
| | 3.1 | Overview | 28 |
| | 3.2 | Architecture | 36 |
| | 3.3 | Robustness to Quantization Errors of Coefficients | 45 |
| | 3.4 | Transmitter Design | 49 |
| | 3.5 | Measurement Results | 54 |
| | 3.6 | Summary | 59 |
| IV. | | Design of Compact Single-ended PAM4 Transmitters with Inverter-based Crosstalk Compensation for Memory Interfaces | |
| | 4.1 | Overview | 62 |
| | 4.2 | Architecture | 66 |
| | 4.3 | Transmitter Design | 68 |
| | 4.4 | Measurement Results | 75 |
| | 4.5 | Summary | 82 |
| V. | | Conclusion | 84 |
| | | Summary (in Korean) | |
| | | References | 88 |

# **List of Tables**

| Table | Title | Page |
|-------|-------|------|
| 3.1 | A-FFE Sub-filter Outputs | 38 |
| 3.2 | Tap Coefficients of C-FFE and A-FFE for Various Channel Losses | 38 |
| 3.3 | FFE Output Formulas of C-FFE and A-FFE in terms of FFE Coefficients | 44 |
| 3.4 | Performance Summary and Comparison | 61 |
| 4.1 | Performance Summary and Comparison With Other Reported Transmitter-side XTC Designs | 83 |

# **List of Figures**

| 1.1 | Data rate trend of DRAM interfaces [1]                                        | 2     |
|-----|-------------------------------------------------------------------------------|-------|
| 1.2 | Channel loss of DRAM interfaces [1]                                           | 3     |
| 2.1 | A schematic diagram of a wireline channel model                               | 8     |
| 2.2 | Intuitive circuit models of wireline channels having (a) voltage-mode         |       |
|     | and (b) current-mode TXs with RXs.                                            | 9     |
| 2.3 | (a) A cross-sectional view of an example LC-dominant interconnect.            |       |
|     | (b) The simulated characteristic impedance of the LC-dominant inter-          |       |
|     | connect versus the frequency.                                                 | 13    |
| 2.4 | The simulated $LC$ -dominant channel (Fig. 3(a)) loss versus the fre-         |       |
|     | quency when the channel length is 10 cm. Dominant sources of the              |       |
|     | loss are marked in (b).                                                       | 16    |
| 2.5 | An example of an interconnect with a CML TX and a voltage-mode                |       |
|     | RX. Parasitic capacitors Cpar are present at the TX output and RX             |       |
|     | input [21]                                                                    | 19    |
| 2.6 | The magnitude of transfer function $( V_RX(l,f)/I_TX(f) )$ of the in-         |       |
|     | terconnect (Fig. 11) with various $Z0 = 50 \Omega$ , Cpar = 500 fF, and vari- |       |
|     | ous RTX and RRX configuration [21]                                            | 19    |
| 2.7 | Comparison of transceiver architectures: (a) conventional transmitter         |       |
|     | without FFE and (b) N-tap FFE transmitter [22]. For a single-bit trans-       |       |
|     | mission case $(x[n] = 1)$ , simulation results show the signal propaga-       |       |
|     | tion through a low-pass channel h(t), from transmitted signal x(t) to         |       |
|     | received signal r(t).                                                         | 20    |
| 2.8 | Block diagrams of an FFE TX and the B-FFE TX [26]                             | 22    |
|     | / :                                                                           | 5. 5. |

| 2.9  | Error spectra of FFE and B-FFE for one bit pusle [26]                      | 22 |  |  |  |  |  |
|------|----------------------------------------------------------------------------|----|--|--|--|--|--|
| 2.10 | O A schematic diagram of a two-channel model for the FEXT model 2          |    |  |  |  |  |  |
| 3.1  | Comparison of 4-tap TX FFE design options : (a) SST-based C-FFE,           |    |  |  |  |  |  |
| J.1  | (b) an inverter-based C-FFE, and (c) the proposed inverter-based A-        |    |  |  |  |  |  |
|      |                                                                            | 20 |  |  |  |  |  |
| 2.2  | FFE                                                                        | 28 |  |  |  |  |  |
| 3.2  | Schematic of (a) an SST driver and (b) an inverter driver. The output      |    |  |  |  |  |  |
|      | swing amplitudes and average power consumption of the drivers are          |    |  |  |  |  |  |
|      | shown.                                                                     | 30 |  |  |  |  |  |
| 3.3  | The 20 Gb/s eye diagrams of an inverter-based 4-tap C-FFE TX (a)           |    |  |  |  |  |  |
|      | without and (b) with a 20 $\%$ error on the most sensitive tap coefficient |    |  |  |  |  |  |
|      | (the main-cursor). The 20 Gb/s eye diagrams of the inverter-based          |    |  |  |  |  |  |
|      | 4-tap A-FFE TX (c) without and (d) with a 20 % error on the most           |    |  |  |  |  |  |
|      | sensitive tap coefficient (1st post-cursor)                                | 31 |  |  |  |  |  |
| 3.4  | Output voltage histograms of (a) the inverter-based 4-tap C-FFE TX         |    |  |  |  |  |  |
|      | and (b) the inverter-based 4-tap A-FFE TX when the input data pattern      |    |  |  |  |  |  |
|      | $(D_{pre}, D_{main}, D_{post1}, D_{post2})$ is (-1, -1, -1, -1)            | 32 |  |  |  |  |  |
| 3.5  | Current flows of (a) an inverter-based C-FFE TX and (b) the corre-         |    |  |  |  |  |  |
|      | sponding proposed inverter-based A-FFE TX for the same FFE oper-           |    |  |  |  |  |  |
|      | ation. The C-FFE TX is subtracting FFE taps. The average power             |    |  |  |  |  |  |
|      | consumptions of the drivers are also shown                                 | 33 |  |  |  |  |  |
| 3.6  | (a) An example design of the 4-tap B-FFE architecture. (b) The single-     |    |  |  |  |  |  |
|      | bit response and tap driver outputs of the 4-tap B-FFE example             | 35 |  |  |  |  |  |
| 3.7  | Block diagrams of (a) an N-tap C-FFE TX and (b) an N-tap A-FFE TX.         | 37 |  |  |  |  |  |
| 3.8  | Example designs of the identical 4-tap FFE employing (a) C-FFE and         |    |  |  |  |  |  |
|      | (b) A-FFE architectures. The single-bit responses and tap driver out-      |    |  |  |  |  |  |
|      | puts of (c) the C-FFE and (d) the A-FFE examples                           | 42 |  |  |  |  |  |

| 3.9  | Error signals when C-FFE and A-FFE TXs transmit a single bit pulse       |      |
|------|--------------------------------------------------------------------------|------|
|      | at 20 Gb/s and there are 20 % quantization errors on (a) (m+1)-th C-     |      |
|      | FFE tap coefficient, (b) (m+2)-th A-FFE tap coefficient, (c) (m+3)-th    |      |
|      | A-FFE tap coefficient, and (d) (m+4)-th A-FFE tap coefficient. A 1st-    |      |
|      | order RC channel with a loss of 15 dB at Nyquist frequency and a time    |      |
|      | constant of 88 ps is employed for the simulations                        | 46   |
| 3.10 | The frequency-domain transmitted and received error signals of the       |      |
|      | C-FFE and the A-FFE caused (a) by 20 % quantization errors on the        |      |
|      | (m+1)-th C-FFE tap coefficient and (m+2)-th A-FFE tap coefficient,       |      |
|      | (b) by 20 % quantization errors on the (m+1)-th C-FFE tap coefficient    |      |
|      | and (m+3)-th A-FFE tap coefficient, (c) by 20 % quantization errors      |      |
|      | on the (m+1)-th C-FFE tap coefficient and (m+4)-th A-FFE tap coef-       |      |
|      | ficient, respectively.                                                   | 48   |
| 3.11 | A schematic diagram of the implemented 4-tap A-FFE TX                    | 50   |
| 3.12 | (a) The single-bit responses of the 4-tap A-FFE at TX output. (b) The    |      |
|      | single-bit responses of the 4-tap and 3-tap A-FFE at RX input            | 51   |
| 3.13 | Simulated and measured TX outputs with and without enabling a booster    |      |
|      | tap driver at 20 Gb/s with 1.1 V supply: (a) a simulated one bit pulse   |      |
|      | response, (b) a measured eye diagram with a disabled booster tap         |      |
|      | driver, (c) a measured eye diagram with a enabled booster tap driver     | 51   |
| 3.14 | Histograms of the A-FFE TX's output impedance and simulated eye          |      |
|      | diagrams of TX output across corners: (a) the pull-up impedance, (b)     |      |
|      | the pull-down impedance, and (c) simulated eye diagrams                  | 53   |
| 3.15 | A schematic diagram of clocking circuits for serializing 2:1 MUXs        | 54   |
| 3.16 | A die micrograph.                                                        | 56   |
| 3.17 | The test setup.                                                          | 57   |
| 3.18 | Measured eye diagrams of the 4-tap A-FFE TX: (a) without and (b)         |      |
|      | with a 20 $\%$ error on the most sensitive tap (1st pre-tap) coefficient | 58   |
|      |                                                                          |      |
|      | VIII (S)                                                                 |      |
|      | – VIII –                                                                 |      |
|      | ection @ postech                                                         |      |
| COII | color w pooleon                                                          | 4.84 |

| The measured A-FFE TX energy consumption versus probability of          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |  |  |  |  |  |
|-------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|--|
| data transition: the energy consumption of (a) the total TX circuit and |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |  |  |  |  |  |
| (b) the only drivers                                                    | 58                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |  |  |
| The power breakdown of the A-FFE TX                                     | 59                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |  |  |
| PAM4 crosstalk compensation                                             | 63                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |  |  |
| Crosstalk compensation for (a) un-skewed and (b) skewed lanes           | 65                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |  |  |
| The overall architecture of the proposed TX                             | 67                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |  |  |
| The schematic diagram of the transmitter and the S-parameters of the    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |  |  |  |  |  |
| interconnects                                                           | 68                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |  |  |
| A schematic diagram of the 4:2 MUX with the encoder for XTC seg-        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |  |  |  |  |  |
| ments                                                                   | 70                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |  |  |
| Encoding table for XTC segments, and the Boolean expressions of the encoded input (INR, INF) to the four XTC segments (SEG1, SEG2, SEG3, and SEG4) | 71 |  |  |  |  |  |
| (a) A schematic diagram of the DCDL and (b) delay increments of the DCDL versus fine and coarse digital codes under different corners and temperatures | 73 |  |  |  |  |  |
| (a) Retimer and 2-to-1 MUX. (b) 2-to-1 MUX timing | 74 |  |  |  |  |  |
| (a) A chip microphotograph and (b) the power breakdown of the proposed PAM4 TX with XTC taps | 76 |  |  |  |  |  |
| A measurement setup. | 77 |  |  |  |  |  |
| A measured FEXT eye diagram when two TXs at the aggressor channels generate 16 Gb/s PAM4 signals. | 77 |  |  |  |  |  |
| Measured single-bit responses at the aggressor and FEXT pulses at the victim (a) without and (b) with XTC. | 78 |  |  |  |  |  |
| Measured PAM4 TX eye diagrams with and without XTC taps, delay control, and aggressors. | 80 |  |  |  |  |  |

| 4.14 | Measured PAM4 TX eye diagrams of the victim and two aggressors without XTC taps and delay control | 81 |
|------|------|----|

## I. Introduction

The increasing demand for high-performance memory interfaces has been driven by the growth of computing technologies and data-intensive applications. Developments in areas such as artificial intelligence, machine learning, and multimedia require memory interfaces that offer high speed, low latency, and energy efficiency. Single-ended interfaces have been widely adopted in modern memory systems like Graphics Double Data Rate (GDDR) memory and High Bandwidth Memory (HBM) due to their high pin efficiency, area efficiency, and low power consumption [1].

As illustrated in Fig. 1.1, the required data rates for dynamic random-access memory (DRAM) interfaces have been rising and are expected to surpass 16 Gb/s, particularly for GDDR modules. This increase has led to greater channel losses, exceeding 10 dB as shown in Fig. 1.2. Such losses amplify the effects of inter-symbol interference (ISI) and reduce the sampling margin. To mitigate these challenges, transmitter (TX) equalizers such as feed-forward equalizers (FFE) are commonly employed in memory interfaces [2], [3], [4].

Designing high-speed memory interfaces presents two main challenges. First, overcoming the bandwidth limitations of transmission channels necessitates effective equalization techniques. Second, maintaining signal integrity becomes more difficult due to increasing crosstalk interference in parallel interfaces. Traditional approaches to these issues often involve trade-offs among performance, power consumption, and area efficiency.

### 1.1 Motivation

As data rates surpass tens of gigabits per second, TX architectures encounter significant challenges in maintaining signal integrity while adhering to power and area constraints. Traditional high-speed TX designs often utilize source-series terminated (SST) drivers because of their linearity and impedance matching properties. However, SST drivers have limitations in modern memory interfaces. Their termination resistors consume considerable area, and their inherent parasitic capacitance degrades the output bandwidth at higher data rates, diminishing their effectiveness in advanced applications.

Figure 1.1: Data rate trend of DRAM interfaces [1].

Inverter-based drivers offer an alternative approach for memory interfaces, providing advantages in area efficiency and power consumption [5]. Their compact design is particularly beneficial for memory systems that require numerous I/O channels within a limited silicon area. The simpler circuit structure of inverter-based drivers also results in reduced parasitic capacitance. Despite these benefits, implementing effective equalization techniques with inverter-based drivers poses technical challenges, especially in terms of maintaining linearity and controlling output impedance [3], [4].

Figure 1.2: Channel loss of DRAM interfaces [1].

To achieve higher data rates without increasing channel bandwidth, four-level pulse-amplitude modulation (PAM4) signaling has been adopted in memory interfaces. PAM4 transmits two bits per symbol by employing four voltage levels, effectively doubling the data rate compared to binary signaling. This makes PAM4 suitable for applications where bandwidth limitations are a primary concern. However, the multilevel nature of PAM4 introduces additional design considerations, such as reduced voltage margins and increased sensitivity to noise.
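As a minimal sketch of the two-bits-per-symbol idea, the following Python snippet maps bit pairs onto four normalized levels. The natural-binary weighting and the function name `pam4_encode` are assumptions made for illustration; the mapping actually used by a given interface may differ (for example, Gray coding).

```python
def pam4_encode(bits):
    """Map a flat bit sequence to normalized PAM4 levels in {0, 1/3, 2/3, 1}.

    Assumption (not from the text): natural-binary weighting,
    level = (2 * MSB + LSB) / 3.
    """
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    # Consume the bits two at a time: first bit is the MSB, second the LSB.
    return [(2 * bits[i] + bits[i + 1]) / 3 for i in range(0, len(bits), 2)]


if __name__ == "__main__":
    bits = [0, 0, 0, 1, 1, 0, 1, 1]
    symbols = pam4_encode(bits)
    # Eight bits become four symbols, so the symbol rate is half the bit rate.
    print(len(bits), "bits ->", len(symbols), "symbols:", symbols)
```

Because each symbol carries two bits, the same symbol rate delivers twice the bit rate, while adjacent levels are separated by only one-third of the full swing.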

In short-reach memory interfaces, crosstalk noise has become a more significant issue than channel loss-induced ISI. Due to the short channel lengths in memory modules, far-end crosstalk (FEXT) emerges as the main source of signal degradation [6]. This problem is exacerbated in PAM4 systems, where the eye height is one-third of that in binary signaling. The reduced voltage margins between PAM4 levels increase the system's sensitivity to crosstalk interference, necessitating effective compensation.


### 1.2 Application and Problem Overview

High-speed memory interfaces face specific challenges in implementing signaling techniques. The requirement for area efficiency in parallel I/O channels, combined with the need to maintain signal integrity at higher data rates, imposes significant design constraints. Traditional inverter-based FFE implementations involve tap subtraction operations, which add complexity and can affect reliability, particularly when precise coefficient control is needed to sustain signal quality [3], [4].

Implementing PAM4 signaling introduces additional considerations for signal integrity management. The varying crosstalk patterns associated with different PAM4 voltage level combinations necessitate more sophisticated cancellation techniques [6]. Current approaches to crosstalk cancellation often employ complex circuits that consume considerable area and power, presenting challenges for integration into dense memory interfaces.

Designing efficient TXs requires careful balancing of signal integrity, power consumption, and area efficiency. While inverter-based architectures offer promising solutions to meet these requirements, new methods are needed to overcome traditional limitations in equalization and signal conditioning applications.

### 1.3 Thesis Contributions

This thesis introduces two TX architectures that address challenges in high-speed memory interfaces: an addition-only feed-forward equalizing (A-FFE) TX and a PAM4 TX with crosstalk compensation (XTC). Both architectures focus on achieving compact and energy-efficient designs using inverter-based implementations.

The first contribution is a 4-tap A-FFE TX designed for single-ended interfaces. Unlike conventional FFE TXs that use SST drivers with area-consuming linear resistors and high parasitic capacitance, the proposed A-FFE architecture eliminates tap subtraction operations. This simplification enhances robustness to coefficient quantization errors and reduces circuit complexity. Implemented in 28-nm LP CMOS technology, the prototype operates at 20 Gb/s per pin and achieves an eye height of 55.1 mV and an eye width of 0.44 unit interval (UI) over a printed circuit board (PCB) trace with 15 dB loss. The design demonstrates a power efficiency of 1.18 pJ per bit and shows a 68 % worst-case eye sensitivity, with only a 13.6 % reduction in eye opening when the most sensitive FFE coefficient is decreased by 20 %. The inverter-based, resistor-free design occupies a compact area of 1,149 µm².

The second contribution is a PAM4 TX with crosstalk compensation tailored for short-reach memory interfaces. The design includes efficient encoders and transition detectors for crosstalk pattern recognition, along with inverter-based XTC tap control. The architecture features precise gain and delay control mechanisms to minimize compensation errors between victim and aggressor channels. Fabricated in 28-nm LP CMOS technology, the TX operates at 16 Gb/s and demonstrates improvements in eye height and width by 203 % and 396 %, respectively, when crosstalk compensation is enabled. The area-efficient design occupies 0.0067 mm², achieving an area per data rate of 0.00042 mm²/Gbps.

These architectures provide practical solutions for implementing high-speed memory interfaces, emphasizing area and power efficiency through inverter-based designs.

### 1.4 Thesis Organization

The rest of this thesis is organized as follows:

Chapter 2 provides a theoretical background on wireline channel behavior and FFE techniques used for ISI compensation in high-speed transmitters. It reviews the limitations of conventional FFE (C-FFE) transmitters, particularly those employing SST drivers with linear resistors. The chapter also discusses prior work on relaxed impedance matching techniques and coefficient-error-robust FFE (B-FFE) architectures. Additionally, it presents a theoretical analysis of FEXT in high-speed memory interfaces, which is crucial for understanding the challenges addressed in later chapters.

Chapter 3 introduces the design and implementation of the addition-only feed-forward equalizing (A-FFE) transmitter architecture. This chapter explains how eliminating subtractions between FFE taps enhances robustness to coefficient quantization errors and enables the use of area- and power-efficient inverter-based drivers. The inverter-based 4-tap A-FFE transmitter is detailed, including its circuit design, implementation in 28-nm LP CMOS technology, and performance evaluation. Experimental results demonstrate the transmitter's ability to achieve 20 Gb/s per pin with improved power efficiency and a compact area.

Chapter 4 presents a PAM4 transmitter with crosstalk compensation (XTC) for short-reach memory interfaces. The chapter outlines the challenges of implementing PAM4 signaling in the presence of crosstalk and describes the proposed solution using inverter-based XTC taps. It covers the design of efficient encoders and transition detectors for crosstalk pattern recognition, as well as mechanisms for precise gain and delay control to minimize compensation errors between victim and aggressor channels. Measurement results from the 28-nm LP CMOS prototype demonstrate significant improvements in signal integrity with the XTC enabled.

Chapter 5 summarizes the contributions of the thesis and provides concluding remarks. It reflects on how the proposed inverter-based transmitter architectures address the challenges in high-speed memory interfaces and suggests potential directions for future research in this area.

This dissertation is based on the papers published in *IEEE Journal of Solid-State Circuits* [4] and *IEEE Transactions on Circuits and Systems II: Express Briefs* [6]. The material has been reused in accordance with the *IEEE* reuse policy, and proper credit has been given to the original sources as follows: *Copyright* © 2024, *IEEE*.

## II. Background

### 2.1 Resistive Termination Techniques for *LC*-Dominant Channels

Resistive termination with 50-$\Omega$ resistors is a simple impedance matching method widely used in conventional 50-$\Omega$ applications. However, strict 50-$\Omega$ matching over-constrains the transceiver (TRX) design and prevents improvement beyond the matching constraint. Because design challenges have grown significantly over decades of advancement in wireline technology, the room for improvement beyond the traditional 50-$\Omega$ matching constraint is increasingly attractive.

In *LC*-dominant applications, a non-conventional choice of resistive termination can deliberately reshape the channel behavior and improve link performance, such as eye height or power efficiency, beyond what the impedance matching constraint allows.

We review several design techniques for resistive termination from the perspective of channel behavior.

#### 2.1.1 An Intuitive Analytical Channel Model

In this section, we explain an intuitive analytical channel model in the frequency domain. The model helps designers understand the trade-offs introduced by resistive termination, so that they can design the termination appropriately while considering both the channel behavior and the overall link performance.



Figure 2.1: A schematic diagram of a wireline channel model.

##### 2.1.1.1 An Approximate Transfer Function Model

A wireline channel is typically modeled as a transmission line (TL) driven by a transmitter (TX) and terminated by a receiver (RX) (Fig. 2.1). The TX is modeled as a Thévenin-equivalent circuit with a voltage source $V_{TX}(f)$ and an impedance $Z_{TX}(f)$. The RX is modeled as an input impedance $Z_{RX}(f)$. The interconnect of length $l$ is modeled with *RLGC* parameters, where $R$, $L$, $G$, and $C$ denote the per-unit-length resistance, inductance, conductance, and capacitance, respectively. $V(z, f)$ and $I(z, f)$ are the voltage and the current along the channel at a distance $z$ from the TX, respectively (Fig. 2.1). The channel's rigorous transfer function (2.4) can be derived by solving the telegrapher's equations (2.1) with boundary conditions (2.2) and (2.3) [9].

$$-\frac{\partial}{\partial z} \begin{bmatrix} V(z,f) \\ I(z,f) \end{bmatrix} = \begin{bmatrix} 0 & R+j2\pi fL \\ G+j2\pi fC & 0 \end{bmatrix} \begin{bmatrix} V(z,f) \\ I(z,f) \end{bmatrix}$$
(2.1)

$$V_{TX}(f) = V(0, f) + I(0, f)Z_{TX}(f)$$
(2.2)

$$V_{RX}(l,f) = I_{RX}(l,f)Z_{RX}(f). \tag{2.3}$$



Figure 2.2: Intuitive circuit models of wireline channels having (a) voltage-mode and (b) current-mode TXs with RXs.

$$\frac{V_{RX}(l,f)}{V_{TX}(f)} = \frac{Z_C(f)Z_{RX}(f)}{\left(Z_C(f)^2 + Z_{TX}(f)Z_{RX}(f)\right)\sinh(\gamma(f)l) + \left(Z_C(f)Z_{TX}(f) + Z_C(f)Z_{RX}(f)\right)\cosh(\gamma(f)l)}, \tag{2.4}$$

where  $Z_C(f)$  and  $\gamma(f)$  are the characteristic impedance and propagation constant of the interconnect, respectively.

$$Z_C(f) = \sqrt{\frac{R + j2\pi fL}{G + j2\pi fC}} \tag{2.5}$$

$$\gamma(f) = \sqrt{(R + j2\pi f L)(G + j2\pi f C)} \tag{2.6}$$

Although the formula (2.4) is rigorously derived without approximation, it is too complex to provide design intuition. Instead, equation (2.4) can be rewritten in the equivalent but more intuitive form (2.7).

$$\frac{V_{RX}(l,f)}{V_{TX}(f)} = \frac{Z_C(f)}{Z_{TX}(f) + Z_C(f)} 2e^{-l\gamma(f)} \frac{Z_{RX}(f)}{Z_C(f) + Z_{RX}(f)} \frac{1}{1 - \eta(f)},$$
 (2.7)

where  $\eta(f) = \Gamma_{TX}(f)\Gamma_{RX}(f)e^{-2l\gamma(f)}$ ,  $\Gamma_{TX}(f) = (Z_{TX}(f) - Z_C(f))/(Z_{TX}(f) + Z_C(f))$ , and  $\Gamma_{RX}(f) = (Z_{RX}(f) - Z_C(f))/(Z_{RX}(f) + Z_C(f))$ .  $\Gamma_{TX}(f)$  and  $\Gamma_{RX}(f)$  are the reflection coefficients at the TX and the RX, respectively.  $\eta(f)$  is the round-trip gain of a reflected wave traveling back and forth between the TX and the RX.

The term $1/(1-\eta(f))$ in (2.7) complicates the channel behavior if $\eta(f) \neq 0$. A complex channel behavior is not preferred in wireline applications because it requires complex hardware to implement broadband signaling. For this reason, resistive terminations are usually designed to simplify the channel behavior over a broad frequency range. Because $|\eta(f)|$ is the product of three terms, $|\Gamma_{TX}(f)|$, $|\Gamma_{RX}(f)|$, and $|e^{-2l\gamma(f)}|$, all of which are usually smaller than 1, designers can easily make $|\eta(f)| \ll 1$ in most practical designs, in which strong reflections are avoided. For instance, if $|\Gamma_{TX}(f)| = 0.5$, $|\Gamma_{RX}(f)| = 0.5$, and $|e^{-2l\gamma(f)}| = 0.5$, then $|\eta(f)| = 0.5 \times 0.5 \times 0.5 = 0.125$, which is still small even though the reflection coefficients are large. Traditionally, the condition $|\eta(f)| \ll 1$ is enforced by impedance matching ($|\Gamma_{TX}(f)| = |\Gamma_{RX}(f)| = 0$). Therefore, we may assume (2.8) and approximate (2.7) by (2.9), an intuitive closed-form transfer function of a wireline channel:

$$|\eta(f)| = |\Gamma_{TX}(f)\Gamma_{RX}(f)e^{-2l\gamma(f)}| \ll 1, \tag{2.8}$$

$$\frac{V_{RX}(l,f)}{V_{TX}(f)} \approx \frac{Z_C(f)}{Z_{TX}(f) + Z_C(f)} 2e^{-l\gamma(f)} \frac{Z_{RX}(f)}{Z_C(f) + Z_{RX}(f)},$$
 (2.9)

Equation (2.9) accurately describes the transfer function if the validity condition (2.8) is satisfied [2].
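As a quick numerical check of this approximation, the following Python sketch evaluates the rigorous transfer function (2.4), the round-trip gain $\eta(f)$, and the approximation (2.9) at a single frequency. The line parameters, the length, and the termination values are illustrative assumptions and are not taken from this work.

```python
import cmath
import math

def line_constants(f, R, L, G, C):
    """Characteristic impedance (2.5) and propagation constant (2.6)."""
    series = complex(R, 2 * math.pi * f * L)
    shunt = complex(G, 2 * math.pi * f * C)
    return cmath.sqrt(series / shunt), cmath.sqrt(series * shunt)

def tf_exact(zc, gamma, length, z_tx, z_rx):
    """Rigorous transfer function (2.4)."""
    gl = gamma * length
    den = ((zc**2 + z_tx * z_rx) * cmath.sinh(gl)
           + zc * (z_tx + z_rx) * cmath.cosh(gl))
    return zc * z_rx / den

def round_trip_gain(zc, gamma, length, z_tx, z_rx):
    """Round-trip gain eta(f) defined below (2.7)."""
    g_tx = (z_tx - zc) / (z_tx + zc)
    g_rx = (z_rx - zc) / (z_rx + zc)
    return g_tx * g_rx * cmath.exp(-2 * length * gamma)

def tf_approx(zc, gamma, length, z_tx, z_rx):
    """Approximate transfer function (2.9)."""
    return (zc / (z_tx + zc)) * 2 * cmath.exp(-length * gamma) * (z_rx / (zc + z_rx))

# Hypothetical example values (assumptions for illustration only).
F = 1e9                                    # frequency, Hz
R, L, G, C = 20.0, 250e-9, 0.0, 100e-12    # per-unit-length RLGC, Z0 is about 50 ohm
LENGTH = 0.1                               # interconnect length, m
Z_TX, Z_RX = 65.0, 80.0                    # mismatched resistive terminations, ohm

zc, gamma = line_constants(F, R, L, G, C)
exact = tf_exact(zc, gamma, LENGTH, Z_TX, Z_RX)
approx = tf_approx(zc, gamma, LENGTH, Z_TX, Z_RX)
eta = round_trip_gain(zc, gamma, LENGTH, Z_TX, Z_RX)
rel_error = abs(approx - exact) / abs(exact)

print(f"|eta| = {abs(eta):.4f}, relative error of (2.9) = {rel_error:.4f}")
```

Because (2.7) equals (2.9) divided by $1-\eta(f)$, the relative error of (2.9) with respect to the rigorous result is exactly $|\eta(f)|$, which is why (2.8) serves as the validity condition.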

The approximate transfer function (2.9) can be intuitively modeled using an equivalent voltage-controlled voltage source (VCVS) model of a transmission line (Fig. 2.2). The circuit model of the channel in Fig. 2.2(a) is obtained by replacing the transmission line in Fig. 2.1 with the equivalent VCVS model, which has input and output impedances of  $Z_C(f)$  and a gain of  $2e^{-l\gamma(f)}$ .  $Z_C(f)$  is the characteristic impedance (2.5), and the channel's length l is included in the VCVS gain  $2e^{-l\gamma(f)}$ .

If a current-mode TX or a current-mode RX is used, the model can be easily modified. The current-mode TX can be modeled with a Norton-equivalent current source $I_{TX}(f)$ with an impedance $Z_{TX}(f)$ (Fig. 2.2(b)) instead of the Thévenin-equivalent circuit of the voltage-mode TX in Fig. 2.2(a). If a current-mode RX is used instead of the voltage-mode RX, the input current into the same RX impedance $Z_{RX}(f)$ can be used as the received signal instead of the voltage across $Z_{RX}(f)$ (Fig. 2.2).

The transfer function $V_{RX}(l,f)/I_{TX}(f)$ of the channel with a current-mode TX and a voltage-mode RX is

$$\frac{V_{RX}(l,f)}{I_{TX}(f)} \approx \frac{Z_{TX}(f)Z_C(f)}{Z_{TX}(f) + Z_C(f)} 2e^{-l\gamma(f)} \frac{Z_{RX}(f)}{Z_C(f) + Z_{RX}(f)}$$
(2.10)

With a voltage-mode TX and a current-mode RX, the transfer function $I_{RX}(l,f)/V_{TX}(f)$ is

$$\frac{I_{RX}(l,f)}{V_{TX}(f)} \approx \frac{Z_C(f)}{Z_{TX}(f) + Z_C(f)} 2e^{-l\gamma(f)} \frac{1}{Z_C(f) + Z_{RX}(f)}$$
(2.11)

With a current-mode TX and a current-mode RX, the transfer function $I_{RX}(l,f)/I_{TX}(f)$ is

$$\frac{I_{RX}(l,f)}{I_{TX}(f)} \approx \frac{Z_{TX}(f)Z_C(f)}{Z_{TX}(f) + Z_C(f)} 2e^{-l\gamma(f)} \frac{1}{Z_C(f) + Z_{RX}(f)}$$
(2.12)

The transfer functions (2.9)-(2.12) also hold when the TX and RX impedances are not purely resistive [10]. If there are parasitic capacitors or inductors at the TX and RX, the parasitic elements can be added to the circuit model (Fig. 2.2), and the validity condition (2.8) must be modified accordingly by using Thévenin's and Norton's theorems [10].
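The four variants differ only by a source conversion at the TX and a load conversion at the RX. As a short consistency check, using only the Thévenin-Norton equivalence $V_{TX}(f) = I_{TX}(f)Z_{TX}(f)$ and Ohm's law $I_{RX}(l,f) = V_{RX}(l,f)/Z_{RX}(f)$, equations (2.10)-(2.12) follow from (2.9):

$$\frac{V_{RX}(l,f)}{I_{TX}(f)} = Z_{TX}(f)\,\frac{V_{RX}(l,f)}{V_{TX}(f)}, \qquad \frac{I_{RX}(l,f)}{V_{TX}(f)} = \frac{1}{Z_{RX}(f)}\,\frac{V_{RX}(l,f)}{V_{TX}(f)}, \qquad \frac{I_{RX}(l,f)}{I_{TX}(f)} = \frac{Z_{TX}(f)}{Z_{RX}(f)}\,\frac{V_{RX}(l,f)}{V_{TX}(f)}.$$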

#### 2.1.1.2 Design Intuition for Termination Impedances

The channel models provide two simple, intuitive, separate formulas to understand how the TX and the RX impedance affect the channel behavior. In equation (2.9), the transfer function can be interpreted as a product of the three terms: 1) the voltage division between the TX impedance and the channel's characteristic impedance  $Z_C(f)/(Z_{TX}(f)+Z_C(f))$ ; 2) the interconnect's attenuation  $(2e^{-l\gamma(f)})$  modeled by a VCVS; and 3) the voltage division between the channel's characteristic impedance and the RX impedance  $Z_{RX}(f)/(Z_C(f)+Z_{RX}(f))$  (Fig. 2.2).

Because this transfer function model has three separate terms for the TX, the interconnect, and the RX, this formula provides a clear intuition of how each of these components affects the channel behavior. The impedance  $Z_{TX}(f)$  of the TX appears only in the first voltage division term at the TX, whereas the RX impedance  $Z_{RX}(f)$  appears only in the third voltage division term at the RX. The length l of the interconnect and the propagation constant  $\gamma(f)$  only appear in the second interconnect's attenuation term.

Therefore, during the TX impedance design, a designer can focus only on the $Z_C(f)/(Z_{TX}(f)+Z_C(f))$ term to consider how $Z_{TX}(f)$ affects the channel behavior, without considering the other two terms, which are independent of $Z_{TX}(f)$. Similarly, when designing the RX impedance, the designer can focus only on the $Z_{RX}(f)/(Z_C(f)+Z_{RX}(f))$ term without considering the other two terms. Therefore, if the validity condition (2.8) is satisfied, we can interpret these terms as the interconnect-TX and interconnect-RX interaction terms, respectively. A similar discussion holds for the other three transfer function cases (2.10)-(2.12).

Figure 2.3: (a) A cross-sectional view of an example *LC*-dominant interconnect. (b) The simulated characteristic impedance of the *LC*-dominant interconnect versus the frequency.

#### 2.1.2 Resistive Termination Techniques for *LC*-Dominant Channels

In this section, we review several resistive termination techniques from the perspective of *LC*-dominant channel behavior in the frequency domain. First, we explain the characteristics of an *LC*-dominant channel using the intuitive model explained in Section 2.1.1. We then review several resistive termination techniques, with emphasis on the relaxed impedance matching technique, which enables designs beyond the limits of the impedance matching constraint.

#### 2.1.2.1 Characteristics of *LC*-Dominant Channels

Typical interconnects are LC-dominant channels. Ideally, they do not suffer from channel losses because the resistance R and conductance G parameters are negligible in their ideal RLGC model. However, in reality, skin effect and dielectric loss cause significant channel loss.

Skin effect is a phenomenon in which the current concentrates near the conductor's surface at high frequencies because of the magnetic field induced in the conductor. This effect slows the phase velocity and increases the series impedance and attenuation. The skin depth $\delta_s$ is the depth at which the current density falls to $e^{-1}$ of its surface value, and can be expressed as

$$\delta_s = \sqrt{\frac{\rho}{\pi f \mu}} \tag{2.13}$$

where  $\rho$ ,  $\mu$ , and f are the electrical resistivity, permeability, and the frequency, respectively [11], [12], [13]. The surface impedance  $Z_s$  is given by:

$$Z_s = (1+j)\sqrt{\pi f \mu \rho} = R_s \sqrt{f}\,(1+j)$$
 (2.14)

where  $R_s = \sqrt{\pi\mu\rho}$  is the skin effect parameter [12], [13].
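For a sense of scale, the sketch below evaluates the skin depth (2.13) and the surface impedance in the form $R_s\sqrt{f}(1+j)$, which is the form that the series term (2.16) uses. The copper resistivity and the 1-GHz evaluation frequency are assumed values chosen for illustration.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m

def skin_depth(rho, f, mu=MU_0):
    """Skin depth (2.13) in metres."""
    return math.sqrt(rho / (math.pi * f * mu))

def surface_impedance(rho, f, mu=MU_0):
    """Surface impedance R_s * sqrt(f) * (1 + j), in ohms per square."""
    r_s = math.sqrt(math.pi * mu * rho)  # skin effect parameter R_s
    return r_s * math.sqrt(f) * complex(1.0, 1.0)

RHO_CU = 1.68e-8  # assumed resistivity of copper, ohm*m
F = 1e9           # assumed evaluation frequency, Hz

delta = skin_depth(RHO_CU, F)
z_s = surface_impedance(RHO_CU, F)

print(f"skin depth: {delta * 1e6:.2f} um")
print(f"surface impedance: {z_s.real * 1e3:.2f} + j{z_s.imag * 1e3:.2f} mohm/sq")
```

With these assumed values the skin depth is roughly 2 µm, and the real part of the surface impedance equals $\rho/\delta_s$, the sheet resistance of a conductor one skin depth thick.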

Dielectric loss refers to the energy dissipation within the dielectric material of the transmission line. This loss is primarily characterized by the loss tangent $\tan \delta_d$, the ratio of the energy lost in the material to the energy stored in it; equivalently, it is the ratio of the imaginary part to the real part of the dielectric constant [11], [12]. Materials with higher loss tangents experience greater dielectric losses, leading to increased signal attenuation and reduced transmission efficiency. The dielectric loss parameter $G_d$ is given by:

$$G_d = 2\pi C \tan \delta_d \tag{2.15}$$

where C is the per-unit-length capacitance [11].

To model the skin effect and dielectric loss, the $R+j2\pi fL$ and $G+j2\pi fC$ terms in (2.1) are modified as

$$R + j2\pi f L = R_0 + R_s \sqrt{f} + j(2\pi f L + R_s \sqrt{f})$$
 (2.16)

$$G + j2\pi fC = G_0 + fG_d + j2\pi fC. \tag{2.17}$$

 $R_0$  and  $G_0$  are the static resistance and the conductance parameters, respectively.

In LC-dominant channels, the inductance L and capacitance C parameters are dominant in the transmission line while  $R_0$  and  $G_0$  are negligible [14]:

$$2\pi f L \gg R_s \sqrt{f} \gg R_0, \quad 2\pi f C \gg G_d f \gg G_0.$$
 (2.18)

With conditions (2.18), (2.16) can be simplified to (2.19).

$$R + j2\pi fL \approx R_0 + R_s \sqrt{f} + j2\pi fL \tag{2.19}$$

With (2.18) and (2.19), the characteristic impedance (2.20) and propagation constant (2.21) of an LC-dominant channel can be derived [14]:

$$Z_{C,LC}(f) \approx Z_0 = \sqrt{L/C}, \tag{2.20}$$

$$\gamma_{LC}(f) \approx j2\pi\sqrt{LC}f + \frac{R_s}{2Z_{C,LC}}\sqrt{f} + \frac{G_dZ_{C,LC}}{2}f. \tag{2.21}$$

Even with skin effect and dielectric loss, the characteristic impedance  $Z_{C,LC}(f)$  of an LC-dominant interconnect is almost the same constant resistance as the characteristic impedance  $Z_0 = \sqrt{L/C}$  of the ideal transmission line (2.20). In a 2-D field solver simulation of an example LC-dominant microstrip line (Fig. 2.3(a)),  $Z_{C,LC}(f) \approx 50~\Omega$  across a very wide frequency range (Fig. 2.3(b)). Therefore, even with skin effect and dielectric loss, a typical LC-dominant interconnect with  $Z_{C,LC}(f) \approx 50~\Omega$  can be resistively terminated with 50- $\Omega$  resistors like an ideal 50- $\Omega$  transmission line.
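The quality of the approximations (2.20) and (2.21) can be checked numerically. The sketch below compares them with the exact $Z_C(f)$ and $\gamma(f)$ obtained from (2.5) and (2.6) using the loss terms (2.16) and (2.17). All parameter values are hypothetical and were chosen only so that condition (2.18) holds at the evaluation frequency.

```python
import cmath
import math

def exact_constants(f, L, C, R0, Rs, G0, Gd):
    """Z_C and gamma from (2.5) and (2.6) with the loss terms (2.16) and (2.17)."""
    rs_f = Rs * math.sqrt(f)
    series = complex(R0 + rs_f, 2 * math.pi * f * L + rs_f)
    shunt = complex(G0 + Gd * f, 2 * math.pi * f * C)
    return cmath.sqrt(series / shunt), cmath.sqrt(series * shunt)

def lc_dominant_constants(f, L, C, Rs, Gd):
    """Approximations (2.20) and (2.21) for an LC-dominant channel."""
    z0 = math.sqrt(L / C)
    alpha = Rs / (2 * z0) * math.sqrt(f) + Gd * z0 / 2 * f   # attenuation, Np/m
    beta = 2 * math.pi * math.sqrt(L * C) * f                # phase constant, rad/m
    return z0, complex(alpha, beta)

# Hypothetical parameters (assumptions for illustration only).
L_PUL = 250e-9                        # H/m
C_PUL = 100e-12                       # F/m
R0, G0 = 1.0, 0.0                     # static resistance (ohm/m) and conductance (S/m)
RS = 1e-3                             # skin effect parameter, ohm/(m*sqrt(Hz))
GD = 2 * math.pi * C_PUL * 0.02       # (2.15) with an assumed loss tangent of 0.02
F = 1e9                               # Hz

zc_exact, gamma_exact = exact_constants(F, L_PUL, C_PUL, R0, RS, G0, GD)
z0, gamma_lc = lc_dominant_constants(F, L_PUL, C_PUL, RS, GD)

alpha_err = abs(gamma_lc.real - gamma_exact.real) / gamma_exact.real
beta_err = abs(gamma_lc.imag - gamma_exact.imag) / gamma_exact.imag

print(f"Z_C exact = {zc_exact:.3f}, Z_0 = {z0:.3f}")
print(f"attenuation error = {alpha_err:.2%}, phase-constant error = {beta_err:.2%}")
```

With these assumed values both the attenuation and the phase constant agree with the exact result to within a few percent, and the characteristic impedance stays within about 1 Ω of $Z_0$.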



Figure 2.4: The simulated LC-dominant channel (Fig. 2.3(a)) loss versus the frequency when the channel length is 10 cm. Dominant sources of the loss are marked in (b).

Although skin effect and dielectric loss barely change  $Z_{C,LC}(f)$ , they greatly increase the channel loss. By substituting (2.20) and (2.21) into (2.9), we can approximate transfer function  $V_{RX}(l,f)/V_{TX}(f)$  of an LC-dominant channel with skin effect and dielectric loss:

$$\begin{split} \frac{V_{RX}(l,f)}{V_{TX}(f)} &\approx \frac{Z_0}{Z_{TX}(f) + Z_0}\, 2e^{-l(j2\pi\sqrt{LC}f)}\, \frac{Z_{RX}(f)}{Z_0 + Z_{RX}(f)} \times e^{-l\left(\frac{R_s}{2Z_0}\right)\sqrt{f}}\, e^{-l\left(\frac{G_dZ_0}{2}\right)f} \\ &= TF_{wo}(f) \times e^{-l\left(\frac{R_s}{2Z_0}\right)\sqrt{f}}\, e^{-l\left(\frac{G_dZ_0}{2}\right)f}, \end{split} \tag{2.22}$$

where $TF_{wo}(f)$ is the transfer function of an LC-dominant channel without skin effect and dielectric loss. Note that the skin-effect loss contribution $e^{-l(\frac{R_s}{2Z_0})\sqrt{f}}$ and the dielectric-loss contribution $e^{-l(\frac{G_dZ_0}{2})f}$ both multiply the transfer function $TF_{wo}(f)$ in (2.22). In a typical LC-dominant channel, the skin-effect loss in dB is approximately proportional to the length l and $\sqrt{f}$, while the dielectric loss in dB is roughly proportional to both l and f. In the simulation of an example LC-dominant channel, the loss-versus-frequency plot has a square-root curvature at low frequencies (< 40 MHz) because the skin effect primarily contributes to the channel loss. On the other hand, the plot is linear at high frequencies (> 40 MHz) because the dielectric loss becomes dominant (Fig. 2.4). The loss is usually compensated by equalization in a conventional link design.
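The two loss exponents in (2.22) can be separated and compared directly. The sketch below converts each exponent to dB and computes the frequency at which they are equal, $f_c = (R_s/(G_d Z_0^2))^2$, which follows from setting $l\frac{R_s}{2Z_0}\sqrt{f} = l\frac{G_dZ_0}{2}f$. The parameter values are hypothetical, so the resulting crossover differs from the one observed in the simulated channel of Fig. 2.4.

```python
import math

NEPER_TO_DB = 20 / math.log(10)  # 1 Np is about 8.686 dB

def skin_loss_db(f, length, Rs, z0):
    """Skin-effect loss from (2.22), in dB."""
    return NEPER_TO_DB * length * Rs / (2 * z0) * math.sqrt(f)

def dielectric_loss_db(f, length, Gd, z0):
    """Dielectric loss from (2.22), in dB."""
    return NEPER_TO_DB * length * Gd * z0 / 2 * f

def crossover_frequency(Rs, Gd, z0):
    """Frequency at which the two loss exponents in (2.22) are equal."""
    return (Rs / (Gd * z0**2)) ** 2

# Hypothetical parameters (assumptions for illustration only).
Z0 = 50.0                              # ohm
RS = 1e-3                              # ohm/(m*sqrt(Hz))
GD = 2 * math.pi * 100e-12 * 0.02      # (2.15) with assumed C = 100 pF/m, tan(delta_d) = 0.02
LENGTH = 0.1                           # m

f_c = crossover_frequency(RS, GD, Z0)
for f in (f_c / 100, f_c, f_c * 100):
    skin = skin_loss_db(f, LENGTH, RS, Z0)
    diel = dielectric_loss_db(f, LENGTH, GD, Z0)
    print(f"f = {f:.3e} Hz: skin = {skin:.3f} dB, dielectric = {diel:.3f} dB")
```

Below the crossover the skin-effect term dominates and the loss follows $\sqrt{f}$; above it the dielectric term dominates and the loss grows linearly with f.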

Note that the condition (2.18) for an LC-dominant channel depends on various factors: the dimensions and materials of the interconnects as well as the operating frequency. As long as (2.18) is satisfied, cables, microstrip lines, or strip lines can be considered LC-dominant channels. Typically, most large-scale interconnects such as cables, backplane traces, PCB traces, and package interconnects can be considered LC-dominant channels. However, depending on the line widths and spacings as well as the operating frequency, scaled interconnects such as interposer and on-chip wires can also be LC-dominant channels.

#### 2.1.2.2 The Conventional Impedance Matching Technique

Impedance matching is the most popular and conservative termination technique in order to ensure signal integrity [15], [16], [17], [18], [19]. In this technique, the TX and the RX are terminated using resistors with the characteristic impedance of the channel:  $Z_{TX}(f) = R_{TX} = Z_0$ ;  $Z_{RX}(f) = R_{RX} = Z_0$ ;  $Z_{C,LC}(f) = Z_0$ . Because the characteristic impedance of an LC-dominant channel is approximately a constant resistance in a wide frequency range as expressed in (2.20), it theoretically ensures "zero" signal reflection:  $\Gamma_{TX}(f) = 0$  and  $\Gamma_{RX}(f) = 0$ . As a result, the validity condition (2.8) is satisfied ( $|\Gamma_{TX}(f)\Gamma_{RX}(f)e^{-2l\gamma(f)}| = 0 \ll 1$ ), and a simple transfer function (2.23) can be theoretically derived from equation (2.9) as

$$\frac{V_{RX}(l,f)}{V_{TX}(f)} = \frac{1}{2}e^{-l\gamma(f)},$$
(2.23)

where $\gamma(f) = \gamma_{LC}(f)$ as given in (2.21). Therefore, in order to provide the best signal integrity by making the theoretical channel behavior as simple as possible, designers use this technique in a wide range of applications, from 224 Gb/s long-reach differential wireline transceivers [15], [16] to short-reach single-ended on-/off-package links at 20-40 Gb/s data rates [17], [18], [19].

#### 2.1.2.3 The Relaxed Impedance Matching Technique

Relaxed impedance matching is a method that deliberately creates small mismatches in termination impedances to enhance the performance of high-speed TRXs in various aspects [3], [4], [5], [20], [21]. We will explain the relaxed impedance matching in comparison with the impedance matching.

In the traditional impedance matching design, it is noticeable that the spectral shape is solely determined by the propagation constant  $\gamma(f)$  because the other terms are not functions of f in (2.23).

If  $R_{TX}$  and  $R_{RX}$  (not necessarily equal to  $Z_0$ ) satisfy the validity condition (2.8), then the transfer function (2.9) becomes

$$\frac{V_{RX}(l,f)}{V_{TX}(f)} \approx \frac{Z_0}{R_{TX} + Z_0} 2e^{-l\gamma(f)} \frac{R_{RX}}{Z_0 + R_{RX}}.$$
 (2.24)

Because $\gamma(f) = \gamma_{LC}(f)$ is the only frequency-dependent term in (2.24), as in the impedance matching case (2.23), the transfer functions (2.23) and (2.24) have identical spectral shapes and differ only in magnitude. As a result, as long as $R_{TX}$ and $R_{RX}$ satisfy (2.8), the signal integrity penalty is negligible. This constraint (2.8) is more relaxed than impedance matching, and it is therefore called "relaxed impedance matching" [21]. The design space set by the relaxed impedance matching constraint (2.8) is larger than the one set by impedance matching. Therefore, the link design can be improved with relaxed impedance matching.
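To illustrate the trade-off, the sketch below evaluates the frequency-independent factor of (2.24) and an upper bound on $|\eta(f)|$ from (2.8) for a matched and a mismatched termination pair. The bound assumes $|e^{-2l\gamma(f)}| \le 1$, and the resistor values are illustrative assumptions applied here to the voltage-mode case.

```python
def reflection(z, z0):
    """Reflection coefficient of a resistive termination z on a line of impedance z0."""
    return (z - z0) / (z + z0)

def flat_gain(r_tx, r_rx, z0):
    """Frequency-independent factor of (2.24): the two voltage dividers and the factor 2."""
    return z0 / (r_tx + z0) * 2.0 * r_rx / (z0 + r_rx)

def eta_bound(r_tx, r_rx, z0):
    """Upper bound on |eta| in (2.8), assuming |exp(-2*l*gamma)| <= 1."""
    return abs(reflection(r_tx, z0) * reflection(r_rx, z0))

Z0 = 50.0  # assumed characteristic impedance, ohm

matched_gain = flat_gain(50.0, 50.0, Z0)
matched_eta = eta_bound(50.0, 50.0, Z0)
relaxed_gain = flat_gain(65.0, 80.0, Z0)   # hypothetical mismatched pair
relaxed_eta = eta_bound(65.0, 80.0, Z0)

print(f"matched: gain factor = {matched_gain:.3f}, |eta| bound = {matched_eta:.3f}")
print(f"relaxed: gain factor = {relaxed_gain:.3f}, |eta| bound = {relaxed_eta:.3f}")
```

In this example the mismatched pair raises the gain factor slightly above the matched value of 0.5 while the bound on $|\eta(f)|$ stays near 0.03, so (2.8) is still comfortably satisfied.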

Choi et al. proposed an adaptive relaxed impedance matching technique that automatically adjusts the TX impedance $R_{TX}$ to maximize the eye opening for an arbitrary characteristic impedance $Z_0$ of the interconnect and an arbitrary RX impedance $R_{RX}$ [20], [21]. A 12-Gb/s current-mode logic (CML) style FFE TX was designed (Fig. 2.5). Because the TX is current-mode, a TX impedance larger than 50 $\Omega$ can improve the signal amplitude, as (2.10) indicates. Even with parasitic capacitances of 500 fF at both ends, an example relaxed impedance matching configuration of the TRX ($R_{TX}$ = 65 $\Omega$, $R_{RX}$ = 80 $\Omega$) improves the swing with a slightly steeper channel roll-off (Fig. 2.6).

Figure 2.5: An example of an interconnect with a CML TX and a voltage-mode RX. Parasitic capacitors $C_{par}$ are present at the TX output and RX input [21].

Figure 2.6: The magnitude of the transfer function ($|V_{RX}(l,f)/I_{TX}(f)|$) of the interconnect (Fig. 2.5) with $Z_0$ = 50 $\Omega$, $C_{par}$ = 500 fF, and various $R_{TX}$ and $R_{RX}$ configurations [21].

Figure 2.7: Comparison of transceiver architectures: (a) conventional transmitter without FFE and (b) N-tap FFE transmitter [22]. For a single-bit transmission case (x[n] = 1), simulation results show the signal propagation through a low-pass channel h(t), from transmitted signal x(t) to received signal r(t).

## 2.2 Feed-Forward Equalization (FFE)

This section provides a brief overview of inter-symbol interference (ISI), which limits the bandwidth of data transmission, and feed-forward equalization (FFE), a technique commonly used to mitigate ISI and increase data rates beyond these limitations. In wireline data communication, the low-pass filter (LPF) characteristics inherent in interconnects constrain the maximum achievable data rate. These LPF characteristics are primarily due to channel loss mechanisms such as the skin effect and dielectric loss, as shown in equation (2.22).

Fig. 2.7(a) depicts a basic TRX without FFE. The TX's output signal x(t) corresponds to the transmitted bit sequence x[n], where each bit is represented by either 1 or -1, signifying bits '1' and '0', respectively [22]. The RX determines the received bit r[n] by slicing the incoming signal r(t).

When a single bit is transmitted  $(\cdots 01000\cdots)$ , the transmitted square pulse (solid line in Fig. 2.7(a)) becomes dispersed due to the LPF characteristics of the channel. This dispersion results in the received signal (marked by 'x' in Fig. 2.7(a)) exhibiting a long tail caused by ISI. For the RX to make a correct bit decision, the bit interval T must be longer than the duration of this ISI tail, which in turn limits the maximum data rate.

To overcome this limitation and enhance the data rate, an N-tap FFE transmitter is often utilized, as shown in Fig. 2.7(b). This setup includes a shift register composed of flip-flops or latches that generates the FFE taps, that is, the N consecutive transmitted bits $[x_{[n]}\, x_{[n-1]} \cdots x_{[n-N+1]}]^T$. Each tap is weighted by the corresponding FFE coefficient in $[w_{[0]}\, w_{[1]} \cdots w_{[N-1]}]^T$, and the weighted taps are summed to produce the transmitter output voltage: $x[n] = w_{[0]}x_{[n]} + w_{[1]}x_{[n-1]} + \ldots + w_{[N-1]}x_{[n-N+1]}$.

Fig. 2.7(a) also illustrates the output of a 4-tap FFE transmitter (dotted line) and the corresponding received signal (marked by circles) for the transmission of a single bit  $(\cdots 01000\cdots)$ . The transmitter output x(t) includes one pre-cursor  $w_{[0]}$ , one main cursor  $w_{[1]}$ , and two post-cursor square pulses  $w_{[2]}$  and  $w_{[3]}$ , with amplitudes proportional to their respective FFE coefficients. These pre- and post-cursor pulses are designed to cancel out the ISI in the received signal at the RX. As a result, ISI can be eliminated through FFE to improve signal integrity.
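The cancellation mechanism can be illustrated with a toy example. The sketch below is not the 4-tap configuration described above: it assumes a hypothetical sampled pulse response with a geometric post-cursor tail and a 2-tap FFE (main cursor plus one post-cursor), chosen so that the cancellation is exact and easy to verify by hand.

```python
def convolve(a, b):
    """Discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out

# Hypothetical UI-spaced pulse response of a low-pass channel:
# main cursor 1.0 followed by an ISI tail that halves every UI.
CHANNEL = [0.5 ** n for n in range(8)]

# Hypothetical 2-tap FFE: main cursor and one negative post-cursor.
WEIGHTS = [1.0, -0.5]

# For an isolated unit pulse, the TX output samples equal the weights,
# so the received samples are the weights convolved with the channel.
unequalized = CHANNEL
equalized = convolve(WEIGHTS, CHANNEL)

print("without FFE:", [round(v, 4) for v in unequalized])
print("with FFE:   ", [round(v, 4) for v in equalized])
```

Each post-cursor sample of the channel is cancelled by the delayed, negatively weighted copy of the pulse. The only residue is the last sample, which comes from truncating the tail to eight samples.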

## 2.3 Coefficient-Error-Robust FFE (B-FFE)

Massively parallel links, including on-chip interconnections [23], [24], and silicon interposers [25], are increasingly essential for meeting the demands of high data throughput within limited power budgets. A notable challenge with these systems is the significant hardware overhead required for calibrating a vast number of I/O channels to compensate for coefficient errors caused by nanoscale variations. To address this, a 4-tap coefficient-error-robust FFE transmitter (TX), known as B-FFE, has been proposed [26], [27].

Figure 2.8: Block diagrams of an FFE TX and the B-FFE TX [26].

Figure 2.9: Error spectra of FFE and B-FFE for a one-bit pulse [26].

The B-FFE TX architecture utilizes the inherent channel loss to mitigate signal perturbations resulting from coefficient errors, while functioning identically to a conventional FFE under ideal conditions. Without coefficient errors, the B-FFE can be designed to replicate the behavior of any standard FFE TX. The innovation in the B-FFE TX lies in its digital transition-detection (TD) filter, which detects transitions in the incoming data stream and generates a corresponding transition signal: '1' for a transition from '-1' to '1', '-1' for a transition from '1' to '-1', and '0' when no transition occurs.

This transition signal is processed through a series of unit interval (UI) delay elements. Each delayed transition signal is weighted by a coefficient  $a_k$  (for  $k \neq 0$ ) and combined with the incoming data weighted by  $a_0$  to produce the output voltage (Fig. 2.8). By configuring the FFE coefficients  $w_k$  and the B-FFE coefficients  $a_k$  such that  $a_0 = \sum_i w_i$  and  $a_{k\neq 0} = -2\sum_i w_i$ , the output voltages of both the B-FFE and conventional FFE TXs can be made identical (Fig. 2.8).
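The equivalence between the two architectures can be checked numerically. The sketch below rests on several assumptions that the text above does not spell out: the transition signal is taken as $t[n] = (x[n]-x[n-1])/2$, the coefficient $a_k$ (for $k \ge 1$) is assumed to weight $t[n-k+1]$, and the summation limits are assumed to be $a_0 = \sum_{i=0}^{N-1} w_i$ and $a_k = -2\sum_{i=k}^{N-1} w_i$. The tap weights are hypothetical.

```python
import random

def ffe_output(bits, w, idle=-1.0):
    """Conventional FFE: y[n] = sum_k w[k] * x[n-k], with an idle level before the data."""
    n_taps = len(w)
    x = [idle] * n_taps + list(bits)
    return [sum(w[k] * x[n_taps + n - k] for k in range(n_taps))
            for n in range(len(bits))]

def bffe_coefficients(w):
    """Assumed mapping: a[0] = sum of all w; a[k] = -2 * sum of w[k:] for k >= 1."""
    return [sum(w)] + [-2.0 * sum(w[k:]) for k in range(1, len(w))]

def bffe_output(bits, a, idle=-1.0):
    """B-FFE sketch: y[n] = a[0]*x[n] + sum_{k>=1} a[k] * t[n-k+1]."""
    n_taps = len(a)
    x = [idle] * n_taps + list(bits)
    # t[i] is the transition at x[i]: +1 for -1 -> 1, -1 for 1 -> -1, 0 otherwise.
    t = [0.0] + [(x[i] - x[i - 1]) / 2.0 for i in range(1, len(x))]
    return [a[0] * x[n_taps + n]
            + sum(a[k] * t[n_taps + n - k + 1] for k in range(1, n_taps))
            for n in range(len(bits))]

random.seed(0)
bits = [random.choice([-1.0, 1.0]) for _ in range(200)]
w = [-0.1, 0.7, -0.15, -0.05]          # hypothetical 4-tap FFE coefficients
a = bffe_coefficients(w)

y_ffe = ffe_output(bits, w)
y_bffe = bffe_output(bits, a)
max_diff = max(abs(p - q) for p, q in zip(y_ffe, y_bffe))
print("B-FFE coefficients:", [round(v, 3) for v in a])
print("largest output difference:", max_diff)
```

Under these assumptions the two outputs agree to within floating-point rounding, because each delayed bit $x[n-k]$ can be written as $x[n]$ minus twice the sum of the transitions between them.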

Despite their nominally identical behavior, the B-FFE exhibits superior tolerance to coefficient errors compared to the conventional FFE. Coefficient errors, arising from factors such as mismatch, process and temperature variations, or supply voltage fluctuations, can be modeled as additive constants ( $\Delta w_k$  and  $\Delta a_k$ ) to the nominal coefficients ( $w_k$  and  $a_k$ ) (Fig. 2.9). Since both TX architectures are linear time-invariant (LTI) systems, these coefficient errors introduce perturbations in the pulse response, manifested as error pulses proportional to the errors at their respective tap positions.

In the conventional FFE TX, the error pulse introduced by a coefficient error is square-shaped and contains significant low-frequency components. Given that the low-pass filter (LPF) characteristics of the channel (e.g., -25 dB loss at a Nyquist frequency of 4 GHz) do not significantly attenuate low-frequency signals, these error pulses persist through the channel and impact the receiver performance (Fig. 2.9).

In contrast, the error pulses associated with the B-FFE TX, except for those from the first tap ( $\Delta a_0$ ), possess higher frequency content due to the modulation effect of the TD filter (Fig. 2.9). The LPF nature of the channel effectively attenuates these high-frequency error pulses, reducing their impact at the receiver and enhancing the coefficient error tolerance of the B-FFE TX. Errors from the first tap ( $\Delta a_0$ ) are generally less significant, as this tap typically has a smaller magnitude in lossy channels, determining the DC level rather than contributing substantially to signal transitions (Fig. 2.9).

By capitalizing on the channel's filtering characteristics, the B-FFE TX architecture effectively suppresses the adverse effects of coefficient errors without necessitating extensive calibration procedures. This approach offers a practical solution for reducing hardware overhead in systems with a large number of parallel I/Os, improving overall system efficiency and reliability in high-speed data transmission applications [26], [27].

## 2.4 Far-End Crosstalk (FEXT)

Far-End Crosstalk (FEXT) is a phenomenon observed in multi-conductor transmission lines, where a signal propagating along one conductor induces an undesired voltage at the far end of an adjacent conductor. This effect arises due to electromagnetic coupling between conductors and is significant in high-speed digital circuits and communication systems, as it can degrade signal integrity and overall system performance.

To quantitatively describe FEXT, consider two parallel lossless transmission lines of length  $\ell$ , designated as Line 1 and Line 2 (Fig. 2.10). Line 1 carries an input signal  $V_{\rm in}(t)$ , while Line 2 is initially unexcited. Each line is characterized by per-unit-length self-inductance L and capacitance C. Mutual inductance M and mutual capacitance  $C_m$  represent the electromagnetic coupling between the lines due to their proximity (Fig. 2.10).



Figure 2.10: A schematic diagram of a two-channel model for the FEXT model.

The voltage and current along the coupled lines are governed by the coupled telegrapher's equations [9]:

$$\begin{split} \frac{\partial V_1}{\partial x} &= -L \frac{\partial I_1}{\partial t} - M \frac{\partial I_2}{\partial t}, \\ \frac{\partial V_2}{\partial x} &= -M \frac{\partial I_1}{\partial t} - L \frac{\partial I_2}{\partial t}, \\ \frac{\partial I_1}{\partial x} &= -C \frac{\partial V_1}{\partial t} - C_m \frac{\partial V_2}{\partial t}, \\ \frac{\partial I_2}{\partial x} &= -C_m \frac{\partial V_1}{\partial t} - C \frac{\partial V_2}{\partial t}. \end{split}$$
(2.25)

In these equations,  $V_1(x,t)$  and  $I_1(x,t)$  are the voltage and current on Line 1, while  $V_2(x,t)$  and  $I_2(x,t)$  are the voltage and current on Line 2 (Fig. 2.10). The mutual inductance M and mutual capacitance  $C_m$  terms account for the coupling effects causing crosstalk.

Assuming weak coupling ( $M \ll L, C_m \ll C$ ) and neglecting higher-order terms, the induced voltage at the far end of Line 2 due to FEXT can be approximated. The FEXT voltage  $V_{\text{FEXT}}(t)$  at position  $x=\ell$  is given by:

$$V_{\text{FEXT}}(t) = \left(\frac{1}{2} \left(\frac{C_m}{C} - \frac{M}{L}\right) T_d\right) \frac{dV_{\text{in}}(t - T_d)}{dt},$$

where  $T_d=\frac{\ell}{v_p}$  is the propagation delay along the lines, with the phase velocity  $v_p=\frac{1}{\sqrt{LC}}$ . The term  $\frac{dV_{\rm in}(t-T_d)}{dt}$  represents the time derivative of the input voltage delayed by the propagation time, reflecting the effect of the signal's transition arriving at the far end.

This expression illustrates that the FEXT voltage is directly proportional to the rate of change of the input signal, the length of the parallel coupling  $\ell$ , and the difference in coupling coefficients  $\left(\frac{C_m}{C} - \frac{M}{L}\right)$ . The coupling coefficients  $\frac{C_m}{C}$  and  $\frac{M}{L}$  represent the normalized mutual capacitance and inductance, respectively. When these two terms are equal, the capacitive and inductive coupling effects can cancel each other, potentially reducing FEXT to zero. This condition emphasizes the importance of balanced line design in minimizing crosstalk. For example, in microstrip lines, the inductive coupling (M/L) tends to be more significant than the capacitive coupling  $(C_m/C)$  due to the presence of air around the exposed side of the microstrip line.
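A minimal numerical sketch of this expression is given below for an input edge that rises linearly, so that $dV_{\rm in}/dt$ equals the swing divided by the rise time during the edge. All line parameters, the swing, and the rise time are hypothetical values chosen for illustration, with $M/L > C_m/C$ as in the microstrip case mentioned above.

```python
import math

def fext_coefficient(c_mut, c_self, m_mut, l_self):
    """Coupling factor (1/2) * (C_m/C - M/L) of the FEXT expression."""
    return 0.5 * (c_mut / c_self - m_mut / l_self)

def fext_peak(v_swing, t_rise, length, l_self, c_self, m_mut, c_mut):
    """Peak FEXT voltage for a linear edge: dV/dt = v_swing / t_rise."""
    t_d = length * math.sqrt(l_self * c_self)  # T_d = length / v_p, v_p = 1/sqrt(LC)
    return fext_coefficient(c_mut, c_self, m_mut, l_self) * t_d * v_swing / t_rise

# Hypothetical per-unit-length parameters (assumptions for illustration only).
L_SELF, C_SELF = 250e-9, 100e-12    # H/m, F/m
M_MUT, C_MUT = 25e-9, 5e-12         # M/L = 0.1, C_m/C = 0.05
LENGTH = 0.05                       # coupled length, m
V_SWING, T_RISE = 1.0, 100e-12      # V, s

peak = fext_peak(V_SWING, T_RISE, LENGTH, L_SELF, C_SELF, M_MUT, C_MUT)
balanced = fext_peak(V_SWING, T_RISE, LENGTH, L_SELF, C_SELF, M_MUT, 10e-12)

print(f"FEXT peak: {peak * 1e3:.1f} mV")
print(f"FEXT peak with C_m/C = M/L: {balanced * 1e3:.3f} mV")
```

The negative sign in the first case reflects inductive coupling dominating over capacitive coupling, and the second case shows the cancellation that occurs when the two normalized coupling terms are equal.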

Design considerations to mitigate FEXT involve controlling the factors present in the expression. Reducing the signal's slew rate, meaning decreasing  $\frac{dV_{\rm in}}{dt}$ , can lower the induced FEXT voltage. Increasing the physical separation between the conductors decreases both  $C_m$  and M, thereby reducing electromagnetic coupling. Employing differential signaling techniques can help cancel out common-mode interference, as the opposing currents in differential pairs generate electromagnetic fields that tend to cancel. Implementing proper shielding and grounding practices provides isolation between conductors, further diminishing crosstalk effects.

Understanding the mathematical basis of FEXT is crucial for designing high-speed communication systems and printed circuit boards (PCBs) where signal integrity is paramount. By analyzing how FEXT depends on physical parameters and signal characteristics, it becomes possible to make informed design choices that minimize interference and enhance overall system performance. Recognizing that FEXT increases with longer coupling lengths and faster signal transitions highlights the need for careful layout and signal management in high-speed designs.

# III. Design of a Single-Ended Inverter-based Addition-Only Feed-Forward Equalization Transmitter

This chapter presents an inverter-based 4-tap addition-only feed-forward equalizing (A-FFE) transmitter (TX) for compact and power-efficient single-ended interfaces. Source-series terminated (SST) drivers are widely used in conventional FFE (C-FFE) TXs. However, linear resistors in C-FFE SST TXs occupy too much area and add significant parasitic capacitance, degrading power efficiency and output bandwidth. To overcome these problems, we propose a new FFE architecture dubbed A-FFE that completely eliminates subtractions between FFE taps and improves the robustness to quantization errors of coefficients. These advantages allow the use of area-and-power-efficient inverter drivers in the FFE. An inverter-based 4-tap A-FFE TX was designed and fabricated in a 28-nm CMOS process. The TX achieved a data rate of 20 Gb/s/pin, an eye height of 55.1 mV, and an eye width of 0.44 UI over a PCB trace with 15 dB loss while consuming 1.18 pJ/b, with a worst-case eye sensitivity of 68 %: the eye opening decreased by only 13.6 % when the most sensitive FFE coefficient was reduced by 20 %. Because it uses area-efficient inverter drivers without resistors, the TX occupies only 1149 µm².

The rest of this chapter is organized as follows. Section 3.1 introduces the motivation behind the proposed A-FFE architecture, highlighting the limitations of conventional FFE transmitters with linear resistors, including area overhead, parasitic capacitance, and power inefficiency, which necessitate a new design approach. Section 3.2 mathematically explains the A-FFE architecture. Section 3.3 theoretically analyzes the robustness to quantization errors of coefficients compared with the C-FFE. Section 3.4 describes the circuit design of the proposed inverter-based A-FFE TX. Section 3.5 shows the experimental results and a comparison with prior art. Finally, Section 3.6 concludes this chapter with a summary.

Figure 3.1: Comparison of 4-tap TX FFE design options: (a) an SST-based C-FFE, (b) an inverter-based C-FFE, and (c) the proposed inverter-based A-FFE.

#### 3.1 Overview

The demand for area-and-power-efficient input/output (I/O) circuits has consistently increased in single-ended high-speed interfaces such as graphics double data rate (GDDR) [28]. A feed-forward equalization (FFE) transmitter (TX) is a key circuit to overcome the bandwidth limitations in such applications [2], [3]. As the design constraints on area and power consumption are becoming more stringent, the efficiency of FFE drivers must be improved.

Source-series terminated (SST) drivers are commonly adopted in conventional FFE (C-FFE) TXs. A differential SST driver theoretically consumes only a quarter-power of a differential current-mode logic (CML) driver for the same output swing because termination resistors are inserted in series rather than parallel [29]. While an SST driver offers good linearity and impedance matching, the termination resistor occupies too much area and adds significant parasitic capacitance, dissipating additional power and degrading the output bandwidth as in Fig. 3.1(a).

On the other hand, an inverter driver [5] has a small area, good power efficiency, and a large output voltage swing because it does not have a series resistor for termination, as in Fig. 3.1(b). An inverter driver may suffer from a signal integrity problem because the driver's impedance is not necessarily 50 $\Omega$ and changes with the output voltage level [5]. This problem can be mitigated by relaxed impedance matching [3], [5], [20], [21], [30], [31], [32]. Utilizing only the receiver-side termination of 50 $\Omega$ improves the voltage swing and driver area at the cost of a negligible penalty in signal integrity [5].

Fig. 3.2 shows schematics of an SST driver and an inverter driver. A single-ended SST driver with matched termination on both the RX and TX sides consumes $(VDD^2)/(4Z_0)$, and the output swing amplitude of the TX is VDD/2, where $Z_0$ is the characteristic impedance of the channel. In contrast, a single-ended inverter driver with matched RX termination and without matched TX termination consumes $(VDD^2)/(2(R_{TX} + Z_0))$, and the output swing amplitude of the TX is $(VDD \cdot Z_0)/(R_{TX} + Z_0)$, where $R_{TX}$ is the impedance of the TX [5]. Therefore, an inverter output driver whose impedance is lower than $Z_0$ has a larger output swing at the cost of more power than the conventional SST driver [5].
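As a quick numerical illustration of the two pairs of formulas above, the following minimal sketch evaluates the swing and average power of both drivers. The supply voltage, the 50 Ω channel, and the 25 Ω inverter output impedance are illustrative assumptions and are not design values from this work; the function names are likewise hypothetical.

```python
def sst_driver(vdd, z0):
    """Single-ended SST driver with matched TX and RX termination."""
    swing = vdd / 2                      # output swing amplitude, VDD/2
    power = vdd ** 2 / (4 * z0)          # average power, VDD^2 / (4 Z0)
    return swing, power


def inverter_driver(vdd, z0, r_tx):
    """Single-ended inverter driver with RX-side termination only."""
    swing = vdd * z0 / (r_tx + z0)       # VDD * Z0 / (R_TX + Z0)
    power = vdd ** 2 / (2 * (r_tx + z0)) # VDD^2 / (2 (R_TX + Z0))
    return swing, power


VDD = 1.1    # V, assumed supply for illustration
Z0 = 50.0    # ohm, assumed channel characteristic impedance
R_TX = 25.0  # ohm, hypothetical inverter output impedance (< Z0)

sst_swing, sst_power = sst_driver(VDD, Z0)
inv_swing, inv_power = inverter_driver(VDD, Z0, R_TX)

print(f"SST:      swing = {sst_swing * 1e3:.0f} mV, power = {sst_power * 1e3:.2f} mW")
print(f"Inverter: swing = {inv_swing * 1e3:.0f} mV, power = {inv_power * 1e3:.2f} mW")
```

Setting `r_tx` equal to `z0` makes the inverter expressions collapse to the SST ones, which is a convenient sanity check on the formulas.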

Figure 3.2: Schematic of (a) an SST driver and (b) an inverter driver. The output swing amplitudes and average power consumption of the drivers are shown.

However, the prior single-ended inverter-based transmitter [5] does not contain an FFE because two major disadvantages of the inverter-based C-FFE TX (Fig. 3.1(b)) have not been solved: 1) the inverter-based C-FFE is very sensitive to quantization errors of FFE coefficients; 2) the inverter-based C-FFE TX consumes large power in FFE tap subtraction.

First, the inverter-based C-FFE TX is very sensitive to quantization errors of tap coefficients. Fig. 3.3 shows 20 Gb/s eye diagrams simulated with a 20 dB loss channel. Fig. 3.3(a) and Fig. 3.3(b) show eye diagrams of an inverter-based 4-tap C-FFE TX without and with a 20 % error in the size of the tap driver of the most sensitive FFE coefficient. The eye height is almost 100 mV without the quantization error (Fig. 3.3(a)). However, when the most sensitive tap coefficient (the main tap) is reduced by 20 %, the eye height decreases by 76 % (Fig. 3.3(b)), showing that the inverter-based C-FFE TX is highly vulnerable to quantization errors of tap coefficients.

Figure 3.3: The 20 Gb/s eye diagrams of an inverter-based 4-tap C-FFE TX (a) without and (b) with a 20 % error on the most sensitive tap coefficient (the main-cursor). The 20 Gb/s eye diagrams of the inverter-based 4-tap A-FFE TX (c) without and (d) with a 20 % error on the most sensitive tap coefficient (1st post-cursor).

Moreover, the output of the inverter-based C-FFE TX cannot be accurately controlled because of the tap subtraction. Owing to the nonlinear characteristics of inverter drivers, the tap coefficients and the output voltage are affected nonlinearly by errors in the tap coefficients. Fig. 3.4 shows output voltage histograms when the input data pattern $(D_{pre}, D_{main}, D_{post1}, D_{post2})$ is (-1, -1, -1, -1). The histograms were acquired by Monte Carlo simulation with 1000 samples.



Figure 3.4: Output voltage histograms of (a) the inverter-based 4-tap C-FFE TX and (b) the inverter-based 4-tap A-FFE TX when the input data pattern  $(D_{pre}, D_{main}, D_{post1}, D_{post2})$  is (-1, -1, -1, -1).



Figure 3.5: Current flows of (a) an inverter-based C-FFE TX and (b) the corresponding proposed inverter-based A-FFE TX for the same FFE operation. The C-FFE TX is subtracting FFE taps. The average power consumptions of the drivers are also shown.

The output voltage and the 3 $\sigma$  output voltage variation of the C-FFE TX are 454.2 mV and 61.77 mV, respectively (Fig. 3.4(a)). Because both PMOS and NMOS transistors of inverter-based drivers are turned on, the C-FFE TX's output voltage is sensitive to changes in the characteristics of PMOS and NMOS transistors due to the nonlinearity of the inverter-based drivers. Therefore, the output voltage cannot be controlled accurately.

Second, the inverter-based C-FFE TX consumes large power due to FFE tap subtraction. For example, when pull-up PMOSs and pull-down NMOSs turn on simultaneously during FFE tap subtraction (Fig. 3.5(a)), the current flowing from the power supply to the ground is wasted as it does not contribute to pull-up or pull-down of the output signal. Consequently, unnecessary driver power is consumed.

To address the problems above, Han proposed a coefficient-error-robust FFE (B-FFE) [26], [27]. B-FFE employs a transition detection filter to improve robustness against FFE coefficient errors (Fig. 3.6(a)) and to reduce the power consumed by tap subtraction (Fig. 3.6(b)). However, the prior B-FFE work has three problems: 1) it still has subtractions between its taps; 2) the sum of the magnitudes of the B-FFE's coefficients is larger than that of the C-FFE, and thus the tap driver sizes are unnecessarily larger than in the ideal design; 3) B-FFE was not demonstrated with a voltage-mode driver, which is appropriate for single-ended design. Instead, the prior B-FFE was designed only with a CML-type current-mode driver. Therefore, additional improvements are required for single-ended inverter-based FFE TXs.

In this chapter, we propose a new FFE architecture [3], [4] (Fig. 3.1(c)) that solves the aforementioned problems of the inverter-based FFE. For convenience, the proposed FFE will be referred to as A-FFE (addition-only FFE). If there is no quantization error in the FFE coefficients, we can always find an A-FFE whose input/output response is mathematically identical to that of any C-FFE or B-FFE. However, A-FFE differs from the other two FFEs in that its output is produced only by addition of taps in most practical applications. With quantization errors of FFE coefficients, the error signals caused by the coefficient errors are also effectively suppressed by the channel loss, as in B-FFE [26], [27]. Owing to this merit, the eye height of an inverter-based A-FFE TX does not decrease seriously with quantization errors of the FFE coefficients (Fig. 3.3(d)). Without quantization errors, its eye height is almost the same as that of a C-FFE TX (Fig. 3.3(c)). When the most sensitive coefficient ($1^{st}$ post-cursor) has an error of -20 %, the eye height decreases by only 14 % (Fig. 3.3(d)). Also, the A-FFE TX is more robust than the C-FFE TX to tap coefficient errors caused by process variation and mismatch. When the input data pattern ($D_{pre}$, $D_{main}$, $D_{post1}$, $D_{post2}$) is (-1, -1, -1, -1), the output voltages of the C-FFE TX and the A-FFE TX have nearly the same average values of 454.2 mV and 450.7 mV, respectively. However, the 3σ output voltage variation of the A-FFE TX is 3.8 times smaller than that of the C-FFE TX (Fig. 3.4(b)) because only the PMOS or only the NMOS transistors of the A-FFE TX are turned on at a time.

Figure 3.6: (a) An example design of the 4-tap B-FFE architecture. (b) The single-bit response and tap driver outputs of the 4-tap B-FFE example.

In addition, the A-FFE has no subtraction between FFE taps and thus consumes less power. For example, when a pseudorandom binary sequence-31 (PRBS-31) data pattern is transmitted, the proposed FFE driver consumes only about 30 % of the average power of the conventional driver (Fig. 3.5).

#### 3.2 Architecture

Fig. 3.7 illustrates the architectures of an N-tap C-FFE TX and the corresponding N-tap A-FFE TX. The A-FFE is composed of a shift register consisting of N delay units (D), an adder, and N-1 simple digital sub-filters, whereas the corresponding C-FFE does not have sub-filters. In Fig. 3.7, x[n] is the digital data input whose value is either '1' or '-1', representing a binary '1' or '0', respectively. v[n] is the output of the FFE TX. k is the tap position index of the shift register and also corresponds to the elapsed delay time from the input x[n] to the (k+1)-th data tap x[n-k] of the A-FFE. For convenience, we will use 'm' as the tap position index of the main tap; x[n-m] is the main data tap in Fig. 3.7. For the A-FFE to produce the same output as the C-FFE, both must have the same main tap position. In the A-FFE, a sub-filter is assigned to every pair of the main data tap x[n-m] and a non-main data tap x[n-k], where $k \neq m$. It takes these two inputs and produces one output b[n-k]. The A-FFE output is the weighted sum of all sub-filter output taps (b[n-k]), where k=0,...,N-1.

There are two types of digital sub-filters (Fig. 3.7(b)): a difference filter (red) and an average filter (blue). The output of a difference filter is the difference between its two inputs divided by 2. The output of the average filter is the average of its two inputs. Table 3.1 presents the mapping between the inputs (x[n-k] and x[n-m]) and outputs of the A-FFE sub-filters. To make the outputs of the A-FFE and the C-FFE identical, a difference filter must be used for the main data tap x[n-m] and the (k+1)-th data tap x[n-k] of the A-FFE if the (k+1)-th coefficient $w_k$ of the corresponding C-FFE is negative. In contrast, if the (k+1)-th coefficient $w_k$ of the C-FFE is positive, the average filter must be used for x[n-m] and x[n-k] of the A-FFE. Note that b[n-m] = x[n-m], which is the output of the average filter with two identical inputs of the main data tap (x[n-m] and x[n-m]). Including b[n-m], we will simply refer to the sub-filter output tap as b[n-k], where $k=0,\ldots,N-1$.
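The two sub-filters described above can be sketched in a few lines. This is a minimal illustration, assuming the difference is taken as main tap minus non-main tap (the ordering that reproduces Table 3.1); the function names are hypothetical.

```python
def difference_filter(x_k, x_m):
    """Half the difference between the main tap x[n-m] and the non-main tap x[n-k]."""
    return 0.5 * (x_m - x_k)


def average_filter(x_k, x_m):
    """Average of the main tap x[n-m] and the non-main tap x[n-k]."""
    return 0.5 * (x_m + x_k)


# Enumerate every input pair, keyed as (x[n-k], x[n-m]) like the rows of Table 3.1.
table = {}
for x_k in (-1, +1):
    for x_m in (-1, +1):
        table[(x_k, x_m)] = (difference_filter(x_k, x_m), average_filter(x_k, x_m))
        print(f"x[n-k]={x_k:+d} x[n-m]={x_m:+d} "
              f"difference={table[(x_k, x_m)][0]:+.0f} average={table[(x_k, x_m)][1]:+.0f}")
```

Every non-zero output has the sign of the main tap x[n-m], which is the property the addition-only argument later relies on.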

Figure 3.7: Block diagrams of (a) an N-tap C-FFE TX and (b) an N-tap A-FFE TX. Legend: a blue weight of +0.5 is applied to the non-main tap when the corresponding C-FFE coefficient is positive, and a red weight of -0.5 when it is negative; x[n-k] with $k \neq m$ is a non-main tap, x[n-m] is the main tap, and b[n-m] = x[n-m].

Table 3.1: A-FFE Sub-filter Outputs

| x[n-k] | x[n-m] | Difference filter output (b[n-k]) | Average filter output (b[n-k]) |
|--------|--------|-----------------------------------|--------------------------------|
| -1     | -1     | 0                                 | -1                             |
| -1     | +1     | +1                                | 0                              |
| +1     | -1     | -1                                | 0                              |
| +1     | +1     | 0                                 | +1                             |

Table 3.2: Tap Coefficients of C-FFE and A-FFE for Various Channel Losses

| Channel Loss | C-FFE's Tap Coefficients (W<sub>pre</sub>, W<sub>main</sub>, W<sub>post1</sub>, W<sub>post2</sub>) | A-FFE's Tap Coefficients (A<sub>pre</sub>, A<sub>main</sub>, A<sub>post1</sub>, A<sub>post2</sub>) |
|-----------------|------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
| 20 dB           | -0.16, +0.54, -0.28, +0.02                                                                                 | +0.32, +0.08, +0.56, +0.04                                                                                    |
| 25 dB           | -0.18, +0.52, -0.28, +0.02                                                                                 | +0.36, +0.04, +0.56, +0.04                                                                                    |
| 30 dB           | -0.19, +0.5, -0.29, +0.02                                                                                  | +0.38, +0, +0.58, +0.04                                                                                       |

All sub-filter output taps b[n-k] are multiplied by the corresponding A-FFE coefficients  $a_k$  and then added up. The summation result is the output v[n] of the A-FFE, as illustrated in Fig. 3.7(b).

We can always find the A-FFE coefficients $a_k$ $(k=0,\ldots,N-1)$ that make both FFE outputs mathematically identical. For simple derivation, we will use the following vector variables to describe the architectures of A-FFE and C-FFE. $\underline{b}$ is the column vector of the sub-filter output taps of A-FFE: $\underline{b} = [b_{[n]}\; b_{[n-1]}\; \cdots\; b_{[n-m]}\; \cdots\; b_{[n-N+1]}]^T$.

$\underline{x}$ is the column vector of the data taps: $\underline{x} = [x_{[n]}\; x_{[n-1]}\; \cdots\; x_{[n-N+1]}]^T$. Note that both C-FFE and A-FFE have the same $\underline{x}$ for the same data inputs. $\underline{w}$ is the column vector of the normalized C-FFE coefficients: $\underline{w} = [w_0\; w_1\; \cdots\; w_{N-1}]^T$, $\sum_{k=0}^{N-1}|w_k|=1$. $\underline{a}$ is the column vector of the A-FFE coefficients: $\underline{a} = [a_0\; a_1\; \cdots\; a_{N-1}]^T$. Because the output $b_{[n-k]}$ of the (k+1)-th sub-filter of A-FFE can be expressed as

$$b_{[n-k]} = 0.5(x_{[n-m]} + \frac{w_k}{|w_k|}x_{[n-k]}), \tag{3.1}$$

$\underline{b}$ can be described in terms of $\underline{x}$ and the signs of the elements of $\underline{w}$ using an $N \times N$ matrix $\mathbf{A}$ as

$$\underline{b} = \mathbf{A}\underline{x},\tag{3.2}$$

where

$$\mathbf{A} = \begin{bmatrix} 0.5 & \cdots & 0 & 0.5 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & -0.5 & 0.5 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0.5 & -0.5 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0.5 & 0 & \cdots & 0.5 \end{bmatrix}$$

$$= 0.5 \begin{bmatrix} 1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & -1 & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & -1 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 1 \end{bmatrix} + 0.5 \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}$$

$$= 0.5(\mathbf{W_{sign}} + \mathbf{C}). \tag{3.3}$$

In (3.3), $\mathbf{W_{sign}}$ is an $N \times N$ diagonal matrix of which the (k+1)-th diagonal element is the sign of the corresponding (k+1)-th C-FFE coefficient $w_k$. $\mathbf{C}$ is an $N \times N$ matrix in which the (m+1)-th column is filled with '1's and all other elements are '0's. From Fig. 3.7(b), the output v[n] of the A-FFE can be expressed in terms of $\underline{a}$ and $\underline{b}$, and then reformulated in terms of $\underline{a}$, $\underline{x}$, and $\mathbf{A}$ as

$$v_{[n]} = \underline{a}^T \underline{b} = \underline{a}^T (\mathbf{A} \underline{x}) = (\underline{a}^T \mathbf{A}) \underline{x} \tag{3.4}$$

by using (3.2). By assuming that the C-FFE and the A-FFE have the identical output  $v_{[n]}$ , the same output  $v_{[n]}$  of C-FFE can be also expressed in terms of  $\underline{w}$  and  $\underline{x}$  as

$$v_{[n]} = \underline{w}^T \underline{x}. \tag{3.5}$$

From (3.4) and (3.5), the outputs of both FFEs are identical if

$$\underline{w} = \mathbf{A}^T \underline{a}. \tag{3.6}$$

By using (3.6), we can always find the C-FFE that produces the same output as any given A-FFE.
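The mapping in (3.6) can be checked numerically with a small sketch. It builds $\mathbf{A}$ from the coefficient signs and the main tap index as in (3.3), then evaluates $\mathbf{A}^T \underline{a}$ for the 20 dB row of Table 3.2. The helper names are hypothetical, and plain Python lists are used instead of a matrix library to keep the example self-contained.

```python
def build_subfilter_matrix(signs, m):
    """Matrix A of (3.3): row k holds 0.5*sign(w_k) at column k and 0.5 at column m."""
    n = len(signs)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        A[k][k] += 0.5 * signs[k]   # 0.5 * W_sign contribution
        A[k][m] += 0.5              # 0.5 * C contribution (main-tap column)
    return A


def affe_to_cffe(a, signs, m):
    """Evaluate w = A^T a, i.e. w_j = sum_k A[k][j] * a[k], as in (3.6)."""
    A = build_subfilter_matrix(signs, m)
    n = len(a)
    return [sum(A[k][j] * a[k] for k in range(n)) for j in range(n)]


# 20 dB row of Table 3.2; tap order is (pre, main, post1, post2), so m = 1.
signs = [-1, +1, -1, +1]          # signs of the C-FFE coefficients
a = [0.32, 0.08, 0.56, 0.04]      # A-FFE coefficients
w = affe_to_cffe(a, signs, m=1)
print([round(v, 2) for v in w])
```

The main-tap row of the matrix evaluates to 1 because the main coefficient is positive, which matches the row of $\mathbf{A}$ shown above.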

More explicit closed-form formulas of the A-FFE coefficients can be derived from (3.6) in terms of the C-FFE coefficients. The column vector of C-FFE coefficients  $\underline{w}$  can be expressed as

$$\underline{w} = \mathbf{W_{sign}} \underline{w_{abs}} \tag{3.7}$$

where  $\underline{w_{abs}}$  is a column vector of the absolute values of the C-FFE coefficients:  $\underline{w_{abs}} = [|w_0||w_1||w_2|\cdots|w_{N-1}|]^T$ . By substituting (3.3) and (3.7) into (3.6), we can derive (3.8) because  $\mathbf{W_{sign}}$  is a symmetric matrix.

$$\mathbf{W_{sign}}\underline{w_{abs}} = 0.5(\mathbf{W_{sign}} + \mathbf{C}^T)\underline{a} \tag{3.8}$$

By left-multiplying both sides of (3.8) by $\mathbf{W_{sign}}$, (3.9) is obtained.

$$\mathbf{W_{sign}}^2 \underline{w_{abs}} = 0.5 (\mathbf{W_{sign}}^2 + \mathbf{W_{sign}} \mathbf{C}^T) \underline{a} \tag{3.9}$$

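Because $\mathbf{W_{sign}}\mathbf{C}^T = \mathbf{C}^T$ (the only non-zero row of $\mathbf{C}^T$ is the (m+1)-th row, and the main tap coefficient $w_m$ is positive) and $\mathbf{W_{sign}}^2 = \mathbf{I}$, where $\mathbf{I}$ is the $N \times N$ identity matrix, (3.9) can be simplified to (3.10).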

$$\underline{w_{abs}} = 0.5(\mathbf{I} + \mathbf{C}^{T})\underline{a} = 0.5 \begin{bmatrix}
1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
1 & \cdots & 1 & 2 & 1 & \cdots & 1 \\
0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 1
\end{bmatrix} \underline{a} \tag{3.10}$$

By solving (3.10) for each element of $\underline{a}$, we can get the explicit closed-form formulas for the A-FFE's coefficients in terms of the absolute values of the C-FFE coefficients as

$$a_{k\neq m} = 2|w_k|, \quad a_m = w_m - \sum_{k=0, k\neq m}^{N-1} |w_k|. \tag{3.11}$$

By using equation (3.6) or (3.11), for the given C-FFE design, we can always find the A-FFE design that produces the identical output and vice versa. Although the structures and coefficients of the two FFEs acquired by (3.6) or (3.11) are different, both FFEs have mathematically identical outputs  $v_{[n]}$  if there is no coefficient error. Therefore, the mapping between C-FFE and A-FFE always exists.
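The closed-form conversion in (3.11) is short enough to evaluate directly. The sketch below applies it to the C-FFE coefficients listed in Table 3.2 and compares the result with the A-FFE coefficients in the same table; the function and variable names are hypothetical.

```python
def cffe_to_affe(w, m):
    """Closed-form A-FFE coefficients of (3.11) from C-FFE coefficients w and main index m."""
    a = [2 * abs(wk) for wk in w]                                   # a_k = 2|w_k| for k != m
    a[m] = w[m] - sum(abs(wk) for k, wk in enumerate(w) if k != m)  # a_m = w_m - sum |w_k|
    return a


# Rows of Table 3.2: channel loss in dB -> (C-FFE coefficients, A-FFE coefficients).
# Tap order is (pre, main, post1, post2), so the main tap index is m = 1.
TABLE_3_2 = {
    20: ([-0.16, 0.54, -0.28, 0.02], [0.32, 0.08, 0.56, 0.04]),
    25: ([-0.18, 0.52, -0.28, 0.02], [0.36, 0.04, 0.56, 0.04]),
    30: ([-0.19, 0.50, -0.29, 0.02], [0.38, 0.00, 0.58, 0.04]),
}

converted = {loss: cffe_to_affe(w, m=1) for loss, (w, _) in TABLE_3_2.items()}
for loss, a in converted.items():
    print(f"{loss} dB:", ", ".join(f"{abs(v):.2f}" for v in a))
```

At 30 dB the main A-FFE coefficient reaches zero, which is the boundary case $w_m = 0.5$ discussed in the text.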

Equations (3.1) and (3.11) also prove that A-FFE does not have tap subtraction in most practical applications, where the main tap coefficient $(w_m)$ of the corresponding C-FFE is not smaller than 0.5. The A-FFE output $v_{[n]}$ is a weighted sum of the sub-filter output taps $b_{[n-k]}$: $v_{[n]} = \sum_{k=0}^{N-1} b_{[n-k]}a_k$. Therefore, if all non-zero terms $b_{[n-k]}a_k$ have the same sign, there is no analog subtraction between taps. We can prove this proposition by showing, using (3.1), that all non-zero $b_{[n-k]}$ have the same sign and, using (3.11), that $a_k \geq 0$ for all $k = 0, \ldots, N-1$ in practically interesting applications $(w_m \geq 0.5)$. According to (3.1), whose results are also listed in Table 3.1, all non-zero sub-filter output taps $b_{[n-k]}$, including k = m, have the same sign as the main data tap $x_{[n-m]} = b_{[n-m]}$. Because $a_{k \neq m} = 2|w_k|$ in (3.11), it is trivial that $a_k \geq 0$ for all $k \neq m$. Because we are mostly interested in practical single-ended channels for which the C-FFE's main tap coefficient $w_m \geq 0.5$, $a_m = w_m - \sum_{k=0, k \neq m}^{N-1} |w_k| \geq 0$ from (3.11): because the C-FFE's coefficients are normalized $(\sum_{k=0}^{N-1} |w_k| = 1)$, $w_m \geq 0.5 \geq \sum_{k=0, k \neq m}^{N-1} |w_k|$ if $w_m \geq 0.5$. Therefore, equation (3.11) proves that $a_k \geq 0$ for all k in practical applications. Because all non-zero $b_{[n-k]}$ have the same sign and $a_k \geq 0$ for all k, there is no analog subtraction between the taps of A-FFE in most practical applications. Table 3.2 shows the tap coefficients of C-FFE and A-FFE for various channel losses. In the simulation, when the PCB channel loss is 30 dB, the C-FFE's main tap coefficient is 0.5. Therefore, the A-FFE TX can be used when the channel loss is less than 30 dB.

Figure 3.8: Example designs of the identical 4-tap FFE employing (a) C-FFE and (b) A-FFE architectures. The single-bit responses and tap driver outputs of (c) the C-FFE and (d) the A-FFE examples.

For better understanding, Fig. 3.8 shows example designs of a 4-tap FFE employing the C-FFE and A-FFE architectures, together with the waveforms of their single-bit responses and their tap drivers' outputs. The A-FFE's coefficients are acquired from the C-FFE's coefficients by equation (3.11). Although the tap driver outputs of the C-FFE and the A-FFE differ in Fig. 3.8(c) and (d), both FFEs have identical single-bit responses. However, the operation of the A-FFE differs from that of the C-FFE in that the A-FFE produces the output voltage by only adding the tap drivers' outputs (Fig. 3.8(d)), whereas the C-FFE has analog subtractions between the tap drivers' outputs (Fig. 3.8(c)). Note also that the A-FFE's tap drivers, except the main one, are enabled only when necessary for FFE operation, whereas all the tap drivers of the C-FFE are always enabled. Table 3.3 summarizes the formulas of the outputs of C-FFE and A-FFE in terms of their tap coefficients. For every input pattern, the A-FFE has no analog subtraction, whereas the C-FFE does for all patterns except the two in which every term has the same sign (-1 +1 -1 +1 and +1 -1 +1 -1). Therefore, the A-FFE avoids the unnecessary power consumed by analog subtraction between the tap drivers' outputs.
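The pattern-by-pattern comparison can be reproduced with a brute-force enumeration. The sketch below uses the 20 dB coefficients of Table 3.2 as an example, evaluates the per-tap terms of both architectures for all 16 input patterns, and flags a pattern as containing subtraction when its terms have mixed signs. All names are hypothetical, and ideal linear tap drivers are assumed.

```python
from itertools import product

# 20 dB row of Table 3.2, tap order (pre, main, post1, post2); main tap index M = 1.
W = [-0.16, 0.54, -0.28, 0.02]       # C-FFE coefficients
A_COEF = [0.32, 0.08, 0.56, 0.04]    # A-FFE coefficients from (3.11)
M = 1


def cffe_terms(x):
    """Per-tap contributions w_k * x[n-k] of the C-FFE."""
    return [wk * xk for wk, xk in zip(W, x)]


def affe_terms(x):
    """Per-tap contributions a_k * b[n-k] of the A-FFE, with b[n-k] from (3.1)."""
    terms = []
    for k, (wk, ak) in enumerate(zip(W, A_COEF)):
        sign = 1 if wk >= 0 else -1
        b = 0.5 * (x[M] + sign * x[k])   # average filter if sign > 0, difference filter otherwise
        terms.append(ak * b)
    return terms


def has_subtraction(terms):
    """True when positive and negative contributions coexist in one output."""
    return any(t > 0 for t in terms) and any(t < 0 for t in terms)


results = []
for x in product((-1, +1), repeat=4):
    c, a = cffe_terms(x), affe_terms(x)
    results.append((x, sum(c), sum(a), has_subtraction(c), has_subtraction(a)))

for x, c_out, a_out, c_sub, a_sub in results:
    print(x, f"C-FFE={c_out:+.2f}", f"A-FFE={a_out:+.2f}",
          f"C-FFE subtracts={c_sub}", f"A-FFE subtracts={a_sub}")
```

The two outputs agree for every pattern, while only the C-FFE column reports mixed-sign terms.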

Table 3.3: FFE Output Formulas of C-FFE and A-FFE in terms of FFE Coefficients

| D<sub>pre</sub> D<sub>main</sub> D<sub>post1</sub> D<sub>post2</sub> | C-FFE sum (coefficients: -W<sub>pre</sub>, +W<sub>main</sub>, -W<sub>post1</sub>, +W<sub>post2</sub>) | A-FFE sum (coefficients: +A<sub>pre</sub>, +A<sub>main</sub>, +A<sub>post1</sub>, +A<sub>post2</sub>) |
|--------------------------------------------------------------------------|------------------------------------------------------------------------------|------------------------------------------------------------------------------|
| -1 -1 -1 -1                                                              | +W <sub>pre</sub> -W <sub>main</sub> +W <sub>post1</sub> -W <sub>post2</sub> | -A <sub>main</sub> -A <sub>post2</sub>                                       |
| -1 -1 -1 +1                                                              | +W <sub>pre</sub> -W <sub>main</sub> +W <sub>post1</sub> +W <sub>post2</sub> | -A <sub>main</sub>                                                           |
| -1 -1 +1 -1                                                              | +W <sub>pre</sub> -W <sub>main</sub> -W <sub>post1</sub> -W <sub>post2</sub> | -A <sub>main</sub> -A <sub>post1</sub> -A <sub>post2</sub>                   |
| -1 -1 +1 +1                                                              | +W <sub>pre</sub> -W <sub>main</sub> -W <sub>post1</sub> +W <sub>post2</sub> | -A <sub>main</sub> -A <sub>post1</sub>                                       |
| -1 +1 -1 -1                                                              | +W <sub>pre</sub> +W <sub>main</sub> +W <sub>post1</sub> -W <sub>post2</sub> | +A <sub>pre</sub> +A <sub>main</sub> +A <sub>post1</sub>                     |
| -1 +1 -1 +1                                                              | +W <sub>pre</sub> +W <sub>main</sub> +W <sub>post1</sub> +W <sub>post2</sub> | +A <sub>pre</sub> +A <sub>main</sub> +A <sub>post1</sub> +A <sub>post2</sub> |
| -1 +1 +1 -1                                                              | +W <sub>pre</sub> +W <sub>main</sub> -W <sub>post1</sub> -W <sub>post2</sub> | +A <sub>pre</sub> +A <sub>main</sub>                                         |
| -1 +1 +1 +1                                                              | +W <sub>pre</sub> +W <sub>main</sub> -W <sub>post1</sub> +W <sub>post2</sub> | +A <sub>pre</sub> +A <sub>main</sub> +A <sub>post2</sub>                     |
| +1 -1 -1 -1                                                              | -W <sub>pre</sub> -W <sub>main</sub> +W <sub>post1</sub> -W <sub>post2</sub> | -A <sub>pre</sub> -A <sub>main</sub> -A <sub>post2</sub>                     |
| +1 -1 -1 +1                                                              | -W <sub>pre</sub> -W <sub>main</sub> +W <sub>post1</sub> +W <sub>post2</sub> | -A <sub>pre</sub> -A <sub>main</sub>                                         |
| +1 -1 +1 -1                                                              | -W <sub>pre</sub> -W <sub>main</sub> -W <sub>post1</sub> -W <sub>post2</sub> | -A <sub>pre</sub> -A <sub>main</sub> -A <sub>post1</sub> -A <sub>post2</sub> |
| +1 -1 +1 +1                                                              | -W <sub>pre</sub> -W <sub>main</sub> -W <sub>post1</sub> +W <sub>post2</sub> | -A <sub>pre</sub> -A <sub>main</sub> -A <sub>post1</sub>                     |
| +1 +1 -1 -1                                                              | -W <sub>pre</sub> +W <sub>main</sub> +W <sub>post1</sub> -W <sub>post2</sub> | +A <sub>main</sub> +A <sub>post1</sub>                                       |
| +1 +1 -1 +1                                                              | -W <sub>pre</sub> +W <sub>main</sub> +W <sub>post1</sub> +W <sub>post2</sub> | +A <sub>main</sub> +A <sub>post1</sub> +A <sub>post2</sub>                   |
| +1 +1 +1 -1                                                              | -W <sub>pre</sub> +W <sub>main</sub> -W <sub>post1</sub> -W <sub>post2</sub> | +A <sub>main</sub>                                                           |
| +1 +1 +1 +1                                                              | -W <sub>pre</sub> +W <sub>main</sub> -W <sub>post1</sub> +W <sub>post2</sub> | +A <sub>main</sub> +A <sub>post2</sub>                                       |

#### 3.3 Robustness to Quantization Errors of Coefficients

A-FFE suppresses the dominant error signals resulting from quantization errors of tap coefficients by utilizing channel loss like B-FFE [26], [27]. These errors are modulated to higher frequencies by A-FFE's difference filters. The modulated high-frequency error signals are more attenuated by the channel loss than the C-FFE error signals that are not modulated to a higher frequency.

Fig. 3.9 shows the output error signals of both a C-FFE and an A-FFE caused by 20 % coefficient errors when each TX transmits a one-bit pulse. We assumed that low-bit tap control could be employed for practical use. The coefficient errors due to quantization of FFE coefficients can be modeled as an additive constant on the nominal coefficient, as shown in Fig. 3.9. In C-FFE, the main tap coefficient is usually the largest, and therefore its quantization error contributes dominantly to the error signal for the same percentage of error (Fig. 3.8(c)). Fig. 3.9(a) shows the output error signal of the C-FFE when its main tap coefficient $w_m$ has a +20 % error. On the other hand, in A-FFE, the pre-cursor and post-cursor coefficients at the output taps of the difference filters near the main tap are large for de-emphasis, and thus their contributions to the error signals are dominant (Fig. 3.8(d)). Fig. 3.9(b)-(d) depict the output error signals when the A-FFE has +20 % errors on the $1^{\rm st}$ ($a_{m+1}$), $2^{\rm nd}$ ($a_{m+2}$), and $3^{\rm rd}$ ($a_{m+3}$) post-cursor coefficients at the output taps of the difference filters, respectively.

We can express the C-FFE's additive TX output error signal  $w_m$-TX(t) due to errors on the main tap coefficient  $w_m$ as

$$w_{m}TX(t) = \begin{cases} \Delta w_m & 0 \le t < T \\ 0 & \text{otherwise} \end{cases} \tag{3.12}$$



Figure 3.9: Error signals when C-FFE and A-FFE TXs transmit a single bit pulse at 20 Gb/s and there are 20 % quantization errors on (a) (m+1)-th C-FFE tap coefficient, (b) (m+2)-th A-FFE tap coefficient, (c) (m+3)-th A-FFE tap coefficient, and (d) (m+4)-th A-FFE tap coefficient. A 1st-order RC channel with a loss of 15 dB at Nyquist frequency and a time constant of 88 ps is employed for the simulations.

The A-FFE's TX output error signal  $a_{m+k}$-TX(t) caused by the post-cursor de-emphasis (using a difference filter) coefficient  $a_{m+k}$  can be expressed as

$$a_{m+k}TX(t) = \begin{cases} \Delta a_{m+k} & 0 \le t < T \\ 0 & T \le t < kT \\ -\Delta a_{m+k} & kT \le t < (k+1)T \\ 0 & (k+1)T \le t \end{cases} \tag{3.13}$$

$\Delta w_m$  and  $\Delta a_{m+k}$  are the additive errors on  $w_m$  and  $a_{m+k}$, respectively. T is the symbol period. Because  $w_m$-TX(t) is a square pulse (Fig. 3.9(a)), its spectrum has a large energy concentration at low frequencies (Fig. 3.10).  $a_{m+k}$-TX(t) is composed of one positive pulse and one negative pulse having the same magnitude  $\Delta a_{m+k}$  (Fig. 3.9(b)-(d)). Therefore, it has much lower energy at low frequencies than the C-FFE's error signal (Fig. 3.10).

Fig. 3.10 depicts the near-end (at the TX output) and the far-end (at the RX input) error signals of the A-FFE and C-FFE in the frequency domain. The C-FFE has a +20 % quantization error on the main tap  $w_m$ . The A-FFE has a +20 % quantization error on the  $1^{\rm st}$  ( $a_{m+1}$ ),  $2^{\rm nd}$  ( $a_{m+2}$ ), and  $3^{\rm rd}$  ( $a_{m+3}$ ) post-cursor de-emphasis (using difference filters) coefficients, respectively, in Fig. 3.10(a)-(c).

The Fourier transform of  $w_m$-TX(t) is

$$w_{m}TX(f) = \Delta w_{m}T\,\mathrm{sinc}(Tf)e^{-j\pi Tf} \tag{3.14}$$

The Fourier transform of  $a_{m+k}$-TX(t) is

$$a_{m+k}TX(f) = \Delta a_{m+k}T\,\mathrm{sinc}(Tf)e^{-j\pi Tf}D(f) \tag{3.15}$$

where  $D(f)=0.5(e^{j\pi Tf}-e^{j2\pi(k-0.5)Tf})e^{(-jTf)}$  is the transfer function of the difference filter. Therefore, the difference filter, acting as a high-pass filter, attenuates the low-frequency components of  $a_{m+k}$-TX(t) (Fig. 3.10). Note that the A-FFE's frequency-domain transmitted error signal has much smaller low-frequency components than the C-FFE's.



Figure 3.10: The frequency-domain transmitted and received error signals of the C-FFE and the A-FFE caused (a) by 20 % quantization errors on the (m+1)-th C-FFE tap coefficient and (m+2)-th A-FFE tap coefficient, (b) by 20 % quantization errors on the (m+1)-th C-FFE tap coefficient and (m+3)-th A-FFE tap coefficient, (c) by 20 % quantization errors on the (m+1)-th C-FFE tap coefficient and (m+4)-th A-FFE tap coefficient, respectively.


Because the channel is linear time-invariant (LTI), we can derive the far-end received C-FFE and A-FFE error signals as

$$w_m RX(f) = H(f)w_m TX(f) \tag{3.16}$$

and

$$a_{m+k}RX(f) = H(f)a_{m+k}TX(f) \tag{3.17}$$

respectively, where H(f) is the transfer function of the channel. The A-FFE's difference filter suppresses the low-frequency error components, while the low-pass channel suppresses the high-frequency error components, as in the B-FFE [26], [27] (Fig. 3.10(a)-(c)). In this manner, the A-FFE suppresses the error signals due to errors of the pre-cursor coefficients at the output taps of difference filters. On the other hand, the received C-FFE error signal is larger because its low-frequency component is hardly attenuated by the channel. Therefore, A-FFE is more robust to quantization errors of coefficients than C-FFE.
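The frequency-domain argument can be illustrated with a short numerical sketch. It is built under stated assumptions: the error spectra are obtained by transforming the time-domain pulses of (3.12) and (3.13) directly (one pulse on [0, T), and the same pulse minus a copy delayed by kT), the channel is the 1st-order RC low-pass with the 88 ps time constant quoted for Fig. 3.9, and both coefficient errors are normalized to 1. The function names are hypothetical.

```python
import cmath
import math

T = 50e-12     # symbol period at 20 Gb/s
TAU = 88e-12   # RC time constant quoted in the Fig. 3.9 caption


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def cffe_err_tx(f, delta=1.0):
    """Spectrum of a single pulse of height delta on [0, T), as in (3.12)."""
    return delta * T * sinc(T * f) * cmath.exp(-1j * math.pi * T * f)


def affe_err_tx(f, k, delta=1.0):
    """Spectrum of (3.13): the pulse on [0, T) minus the same pulse delayed by k*T."""
    return cffe_err_tx(f, delta) * (1 - cmath.exp(-2j * math.pi * k * T * f))


def channel(f):
    """Assumed 1st-order RC low-pass channel H(f) = 1 / (1 + j 2 pi f tau)."""
    return 1 / (1 + 2j * math.pi * f * TAU)


print(f"Channel loss at Nyquist: {-20 * math.log10(abs(channel(10e9))):.1f} dB")
for f in (0.1e9, 1e9, 5e9, 10e9):
    c_rx = abs(channel(f) * cffe_err_tx(f)) / T
    a_rx = abs(channel(f) * affe_err_tx(f, k=1)) / T
    print(f"f = {f / 1e9:4.1f} GHz: received |C-FFE error| = {c_rx:.3f}, "
          f"received |A-FFE error| = {a_rx:.3f}")
```

In this model the A-FFE error vanishes at DC and stays small at low frequencies, where the channel attenuates least, while the C-FFE error is largest exactly there.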

The A-FFE's robustness advantage regarding quantization errors of tap coefficients over the C-FFE becomes greater if the channel loss is larger. With a larger channel loss, the C-FFE eye size is smaller while the tap coefficients and thus their errors become larger. Therefore, the C-FFE suffers more from the quantization errors with a larger channel loss [26], [27]. On the other hand, with a larger channel loss, not only the quantization errors but also the attenuation of the error signals become larger in A-FFE. Therefore, A-FFE becomes much more robust to quantization errors of tap coefficients than C-FFE if the channel loss is large.

#### 3.4 Transmitter Design

To verify the proposed FFE architecture, a 4-tap A-FFE TX was designed for a single-ended memory interface using inverter drivers. Fig. 3.11 shows a schematic diagram of the 4-tap A-FFE TX. The TX adopts a half-rate architecture and consists of latch-based half-rate shift registers, half-rate digital decoding blocks, serializing 2:1 multiplexers (MUXs), full-rate tap drivers, and a clocking circuit (Fig. 3.11).

Figure 3.11: A schematic diagram of the implemented 4-tap A-FFE TX.

The 4-tap A-FFE is composed of the 1<sup>st</sup> pre-tap, the main tap, the 1<sup>st</sup> post-tap, and the polarity-switchable 2<sup>nd</sup> post-tap to operate on a 15 dB PCB trace. The sign of the 2<sup>nd</sup> post-tap is controlled by the sign control bits of the XOR gates in the decoding block. The sizes of tap drivers are carefully designed to provide the driving strength needed for the desired data rate and the target channel loss.

The sub-filters of the A-FFE can be easily implemented by digital logic gates in the decoding block (Fig. 3.11). The inputs of the main tap and digital outputs of the decoding block are serialized by retiming 2:1 MUXs and then fed to the A-FFE TX tap drivers.

There are two types of tap drivers: strong and weak. A strong driver is an inverter bank that can provide large driving strength, while a weak driver is a current-starved inverter whose strength can be precisely controlled by the tail current sources (Fig. 3.11) [33]. To allow for a high output voltage swing, we used non-cascode current sources in our prototype chip.

Figure 3.12: (a) The single-bit responses of the 4-tap A-FFE at TX output. (b) The single-bit responses of the 4-tap and 3-tap A-FFE at RX input.

Figure 3.13: Simulated and measured TX outputs with and without enabling a booster tap driver at 20 Gb/s with 1.1 V supply: (a) a simulated one-bit pulse response, (b) a measured eye diagram with a disabled booster tap driver, (c) a measured eye diagram with an enabled booster tap driver.

Because the magnitudes of the 1<sup>st</sup> pre-tap and 1<sup>st</sup> post-tap coefficients of the A-FFE are much larger than the main and 2<sup>nd</sup> post-tap coefficients, they are implemented with strong drivers for sufficient driving strength. In our proof-of-concept design, the 1<sup>st</sup> pre-tap and 1<sup>st</sup> post-tap drivers are binary 6-bit and 7-bit inverter banks, respectively, with digitally configurable driving strength. These configurations are chosen to produce the same coefficient error percentage to verify robustness to the quantization error of the tap coefficient in measurement. The NAND gates and NOR gates control pull-up and pull-down strengths of the strong drivers, respectively.

Because the A-FFE's coefficients of the main tap and the  $2^{nd}$  post-tap are smaller than the coefficients of the other taps, they must be precisely controlled. Therefore, these taps were implemented with the weak drivers. However, when only the main-tap driver is turned on, the strength of the A-FFE's output driver is small. In Table 3.3, the input data patterns ( $D_{pre}$ ,  $D_{main}$ ,  $D_{post1}$ ,  $D_{post2}$ ) for which only the main-tap driver drives the output while the other FFE units are off are (-1, -1, -1, 1) and (1, 1, 1, -1). Fig. 3.12 shows the post-layout simulated single-bit responses at the TX output and the RX input with a 20 dB loss channel and output capacitors (200 fF for ESD and 100 fF for the pad). Although the limited driving strength of the main-tap driver may cause some post-cursor ISI (Fig. 3.12(a)), the 2nd post-tap driver can effectively compensate for the induced ISI (Fig. 3.12(b)).

The strong drivers behave nonlinearly due to the change of the TX output impedance [23]. For example, as the TX output voltage approaches the supply, the pull-up strength of the strong driver diminishes due to the reduced drain voltage (Fig. 3.13(a)), so the TX output voltage does not rise sufficiently high. A booster tap driver is a strong driver that compensates for this nonlinear strength degradation of the inverter when the TX output voltage is close to the supply or ground (Fig. 3.11). In this case, the booster tap is activated to raise the TX output voltage to the appropriate level (Fig. 3.13(a)). In this example, the PMOS booster tap turns on when both the 1<sup>st</sup> pre-tap and the 1<sup>st</sup> post-tap are activated simultaneously to increase the output voltage. Likewise, the NMOS booster tap helps the tap drivers pull the TX output down when needed.



Figure 3.14: Histograms of the A-FFE TX's output impedance and simulated eye diagrams of TX output across corners: (a) the pull-up impedance, (b) the pull-down impedance, and (c) simulated eye diagrams.

Fig. 3.14 shows histograms of the A-FFE TX's output impedance and the 20 Gb/s eye diagrams simulated with a 20 dB loss channel at multiple corners. The histograms were acquired by Monte Carlo simulation with 1000 samples. Thanks to the low impedance of the switch-MOS, when all taps are turned on, the pull-up and pull-down impedances of the A-FFE TX are 7.86  $\Omega$  and 8.27  $\Omega$ , respectively, which are very small compared with the conventional 50  $\Omega$ . The 3 $\sigma$  variations of the pull-up and pull-down impedances are 0.84  $\Omega$  and 0.93  $\Omega$ , respectively. The eye height is reduced by the variation of the output impedance without changing the tap coefficients (Fig. 3.14(c)). The maximum reduction rate of the eye height is 11 %. Despite the variations, we achieve an eye diagram with a large eye height owing to the high TX output voltage swing (Fig. 3.14(c)). The receiver-side termination of 50  $\Omega$  relaxes the impedance matching constraints, even with changes in the transmitter impedance caused by PVT variation [5].

Figure 3.15: A schematic diagram of clocking circuits for serializing 2:1 MUXs.

The clocking circuit consists of a duty cycle corrector (DCC), a digitally controlled delay line (DCDL), and clock drivers for the serializing 2:1 MUXs (Fig. 3.15). The DCC consists of an always-on inverter and 4-bit coarse and 3-bit fine tri-state inverter banks with adjustable rise/fall delays. The DCDL is used to compensate for the skew between CLK\_OUT and CLKB\_OUT (Fig. 3.15). The DCDL consists of two inverters, with MOS capacitor banks inserted between the inverter stages.

#### 3.5 Measurement Results

To verify that an A-FFE TX can employ inverter drivers and improve both power efficiency and robustness against quantization errors of tap coefficients, we fabricated the proposed 4-tap A-FFE TX in 28-nm CMOS technology. Fig. 3.16 shows the TX's die micrograph. Due to the absence of termination resistors, the TX driver and the TX core occupy only  $336 \, \mu m^2$  and  $1149 \, \mu m^2$, respectively. The test environment of the TX is shown in Fig. 3.17. The chip was tested with a supply voltage of 1.1 V and a PRBS-31 data pattern produced by an on-chip PRBS generator [34]. The output data of the TX is applied to the oscilloscope via a PCB trace, an SMA cable, and a bias-tee. The PCB trace loss is measured to be 15 dB at 10 GHz (Fig. 3.17).

Fig. 3.13(b) and (c) show the measured eye diagrams of the TX without and with enabling a booster tap, respectively. The TX achieves a data rate of 20 Gb/s. The eye height and width are 55.1 mV and 0.44 UI, respectively, as shown in Fig. 3.13(c). With the disabled booster tap, the eye height is reduced to 30.9 mV, as shown in Fig. 3.13(b). Therefore, the eye height is improved by 78 % by the booster tap, which compensates for the non-linear behavior of the inverter drivers.

To evaluate the A-FFE's robustness against quantization errors of coefficients, we measured eye sensitivities [23]. The eye sensitivity is the percentage of eye size reduction divided by the percentage of a coefficient reduction [23]. Fig. 3.18 shows the measured eye diagrams without and with a 20 % error on the most sensitive tap coefficient. The errors of the strong and the weak drivers are given by changing the number of enabled inverters and the strength of current sources, respectively. The eye height was the smallest when the pre-cursor tap coefficient was reduced by 20 %. The measured worst eye sensitivity is 0.68 (Fig. 3.18). In simulations, the eye sensitivities of the 4-tap inverter-based C-FFE TX and the 4-tap inverter-based A-FFE TX are 3.8 and 0.7, respectively (Fig. 3.3). These data show that A-FFE is much more robust against quantization errors of coefficients than C-FFE.
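The eye sensitivity definition above can be written as a short calculation. This is a minimal sketch; the function name and the eye heights in the usage line are hypothetical and are not measured values from this work.

```python
def eye_sensitivity(eye_nominal_mv, eye_with_error_mv, coeff_error_pct):
    """Percentage of eye-size reduction divided by the percentage of
    coefficient reduction, following the definition used with [23]."""
    eye_reduction_pct = 100.0 * (eye_nominal_mv - eye_with_error_mv) / eye_nominal_mv
    return eye_reduction_pct / coeff_error_pct

if __name__ == "__main__":
    # Hypothetical example: a 100 mV eye shrinking to 86.4 mV under a 20 % coefficient error
    print(round(eye_sensitivity(100.0, 86.4, 20.0), 2))
```

A sensitivity below 1 means the eye shrinks by a smaller percentage than the coefficient error that caused it.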

Fig. 3.19 presents the measured A-FFE TX energy consumption versus data transition probability. The energy consumption of the A-FFE TX drivers depends on the transition probability. When a data transition occurs, the difference filters become active and generate the 1st post-cursor and 1st pre-cursor signals; if no data transition occurs, these filters are deactivated. In Fig. 3.19(b), without data transition (0 % probability), the A-FFE TX drivers consume only 10 % of the power that they consume at a 25 % data transition probability. Excluding the main tap and the 2nd post-tap, all A-FFE taps are activated only when a data transition occurs. As a result, the energy consumption increases linearly with the probability of data transition (Fig. 3.19). Therefore, in the idle state, the A-FFE TX avoids unnecessary power dissipation.

Figure 3.16: A die micrograph.
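The linear dependence described above can be captured by a simple normalized model. This is only a sketch: it uses the two reported points (10 % of the reference driver energy at 0 % transition probability, and the reference itself at 25 %) and assumes the straight line holds at other probabilities; the function name is illustrative.

```python
def driver_energy_ratio(p_transition, idle_ratio=0.10, p_ref=0.25):
    """Driver energy normalized to its value at p_ref, assuming a linear
    increase with the data transition probability."""
    return idle_ratio + (1.0 - idle_ratio) * p_transition / p_ref

if __name__ == "__main__":
    for p in (0.0, 0.125, 0.25, 0.5):
        print(f"p={p:5.3f}  normalized driver energy={driver_energy_ratio(p):.2f}")
```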

Fig. 3.20 shows the power breakdown of the TX. The power consumption of each sub-circuit is estimated from the measured TX power by prorating the simulation results. The TX consumes 1.18 pJ/bit at the maximum speed of 20 Gb/s. The clocking circuit (clock drivers, DCC, and DCDL) accounts for the largest portion (43 %) of the TX power consumption whereas the decoding block consumes the smallest portion (3 %). Because the decoding block is placed before the serializers, its size and power consumption are small. The clock drivers, the DCC, and the DCDL account for 22 %, 9 %, and 12 % of the TX power consumption, respectively (Fig. 3.20). Excluding the power consumption of the DCC and the DCDL, the TX consumes 0.93 pJ/bit at a data rate of 20 Gb/s (Fig. 3.20). The power consumption of the DCDL increases the power consumption of the entire clocking circuit. Due to the limited design time, we reused an existing clocking circuit that was not optimized for this prototype chip. An optimized clocking circuit would have consumed less power.

Figure 3.17: The test setup.
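The arithmetic behind these figures can be checked with a few lines. This is a minimal sketch using only the numbers quoted above; the variable names are illustrative.

```python
ENERGY_PJ_PER_BIT = 1.18   # measured TX energy efficiency
DATA_RATE_GBPS = 20.0      # maximum data rate
SHARE = {"clock drivers": 0.22, "DCC": 0.09, "DCDL": 0.12}

# pJ/bit multiplied by Gb/s gives mW
total_power_mw = ENERGY_PJ_PER_BIT * DATA_RATE_GBPS

# The three clocking sub-blocks add up to the clocking circuit share
clocking_share = sum(SHARE.values())

# Energy per bit after removing the DCC and DCDL contributions
energy_without_dcc_dcdl = ENERGY_PJ_PER_BIT * (1 - SHARE["DCC"] - SHARE["DCDL"])

if __name__ == "__main__":
    print(f"total power: {total_power_mw:.1f} mW")
    print(f"clocking share: {clocking_share:.0%}")
    print(f"energy without DCC and DCDL: {energy_without_dcc_dcdl:.2f} pJ/bit")
```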

A performance summary and comparison with prior works are shown in Table 3.4. The proposed A-FFE architecture entirely removes subtractions between 4-tap cursors whereas the C-FFE [2] and the B-FFE [27] do not. Since the A-FFE has an addition-only property, the 4-tap A-FFE TX can use inverter drivers. Like the B-FFE [27], which employs difference filters, the TX is robust to quantization errors of coefficients. Because it uses area-efficient inverter drivers, the TX occupies only 1149  $\mu$m<sup>2</sup>, a smaller area than the passive-equalization TX [32]. The TX achieves a data rate of 20 Gb/s/pin, whereas the TX in [28] achieves a slower data rate of 18 Gb/s/pin.

Figure 3.18: Measured eye diagrams of the 4-tap A-FFE TX: (a) without and (b) with a 20 % error on the most sensitive tap ( $1^{st}$  pre-tap) coefficient.

Figure 3.19: The measured A-FFE TX energy consumption versus probability of data transition: the energy consumption of (a) the total TX circuit and (b) the drivers only.

Figure 3.20: The power breakdown of the A-FFE TX.

### 3.6 Summary

In this chapter, we proposed an addition-only FFE (A-FFE) TX architecture. The addition-only property of the A-FFE architecture has three advantages. First, it allows inverter drivers to be used as FFE taps. Because the inverter drivers do not include termination resistors, the TX can fit in a very small area. Second, it avoids the unnecessary power consumption caused by tap subtractions. Finally, the A-FFE is robust to quantization errors of tap coefficients because the error signals are suppressed by the channel loss.

To verify the A-FFE architecture, we designed an inverter-based 4-tap A-FFE TX in 28-nm bulk technology. The test chip achieves a data rate of 20 Gb/s/pin. The TX drivers consist of inverter banks and current-starved inverters without termination resistors. Therefore, the TX occupies a small area of 1149  $\mu m^2$ . Furthermore, the TX consumes low power when the probability of data transition is low. The A-FFE TX drivers consume 90 % less power when the data transition probability is 0 % than when it is 25 %. The TX achieves an eye sensitivity of 0.68 and an energy efficiency of 1.18 pJ/bit.

Table 3.4: Performance Summary and Comparison

|                               | ISSCC 2020 [28]                                    | ISSCC 2022 [2]                                            | TCAS I 2022 [5]           | JSSC 2016 [32]                                       | JSSC 2016 [27]             | This work                              |
|-------------------------------|----------------------------------------------------|-----------------------------------------------------------|---------------------------|------------------------------------------------------|----------------------------|----------------------------------------|
| Technology                    | 8 nm                                               | 65 nm                                                     | 28 nm LPP                 | 28 nm FD-SOI                                         | 65 nm                      | 28 nm LPP                              |
| Supply voltage (V)            | VDDQ = 1.35,<br>VDD = 0.85                         | 1                                                         | 1.13                      | N/A                                                  | 1.3                        | 1.1                                    |
| Single/Differential           | Single                                             | Single                                                    | Single                    | Single                                               | Differential               | Single                                 |
| Driver Type                   | Voltage-mode driver +<br>capacitive-peaking driver | Capacitive driver with a ground-forcing biasing technique | Low-impedance<br>inverter | High-impedance<br>Inverter + RC high-<br>pass filter | CML                        | Inverter + current<br>starved inverter |
| Equalization (TX)             | 1-tap de-emphasis, Edge<br>boost, FEXT EQ          | 2-tap FFE                                                 | X                         | Passive EQ                                           | 4-tap B-FFE                | 4-tap A-FFE                            |
| FFE Tap Addition-Only         | X                                                  | X                                                         | N/A                       | N/A                                                  | X                          | O                                      |
| Data pattern                  | N/A                                                | PRBS 7                                                    | PRBS 31                   | PRBS 7                                               | N/A                        | PRBS 31                                |
| Data rate (Gb/s)              | 18                                                 | 12                                                        | 20                        | 20                                                   | 8                          | 20                                     |
| Channel loss (dB)             | 10                                                 | N/A                                                       | 8                         | 10.7                                                 | 25                         | 15 (PCB trace only)                    |
| Worst eye sensitivity         | N/A                                                | N/A                                                       | N/A                       | N/A                                                  | 0.56                       | 0.68                                   |
| Output Swing at TX Output     | N/A                                                | N/A                                                       | N/A                       | N/A                                                  | N/A                        | 870 mV <sub>pp</sub>                   |
| Output Swing at Far-End       | N/A                                                | N/A                                                       | $850~\mathrm{mV_{pp}}$    | $118^{\rm b}{\rm mV_{pp}}$                           | $131^{b}\mathrm{mVd_{pp}}$ | 253 mV <sub>pp</sub>                   |
| Eye Height                    | 130 mV (WRITE), 110 mV (READ)                      | 36 <sup>b</sup> mV                                        | 234 mV                    | 24 <sup>b</sup> mV                                   | 50 mV                      | 55.1 mV                                |
| Energy efficiency (TX) (pJ/b) | N/A                                                | 0.264                                                     | 1.18                      | 0.14                                                 | N/A                        | 1.18                                   |
| Area (µm²)                    | 4151250 <sup>a</sup>                               | 3045                                                      | 1126                      | 4556                                                 | 2128                       | 1149                                   |

<sup>a</sup>Area includes PLL, CA slice, and data slice (16 bit).

<sup>b</sup>Estimated from an eye diagram.

# IV. Design of Compact Single-ended PAM4 Transmitters with Inverter-based Crosstalk Compensation for Memory Interfaces

This chapter presents a four-level pulse-amplitude modulation (PAM4) transmitter (TX) with crosstalk compensation (XTC) for short-reach memory interfaces. Simple encoders and transition detectors detect the data pattern causing crosstalk and appropriately activate inverter-based XTC taps. With gain and delay control of XTC, compensation error due to the mismatch between the victim and aggressor channels was minimized. The TX was fabricated in 28 nm LP CMOS and tested at 16 Gb/s. With XTC, the eye height and width were improved by 203 % and 396 %, respectively. Because it uses area-efficient inverter-based XTC taps, the TX occupies only 0.0067 mm², achieving an area per data rate of 0.00042 mm²/Gbps.

The rest of this chapter is organized as follows. Section 4.1 introduces the challenges and design considerations for single-ended PAM4 memory interfaces in massively parallel short-reach applications, focusing on FEXT compensation techniques that address the increased sensitivity to crosstalk compared with traditional NRZ signaling. Section 4.2 explains the overall architecture of the PAM4 XTC TX. Section 4.3 describes the circuit design of the proposed PAM4 XTC TX. Section 4.4 shows the experimental results and a comparison with prior works.

#### 4.1 Overview

With the increasing demand for high-speed, low-latency, and low-power memory access, single-ended four-level pulse-amplitude modulation (PAM4) memory interfaces utilizing massively parallel short-reach interconnects such as interposers and high bandwidth memory (HBM) are becoming attractive [35]. Because many short interconnects are densely placed in parallel in such applications, the channel loss is typically small, while the far-end crosstalk (FEXT) is large. By taking advantage of the small channel loss, PAM4 signaling can double the data rate per pin at the cost of a reduced eye opening compared with non-return-to-zero (NRZ) signaling.

Figure 4.1: PAM4 crosstalk compensation.

However, the large FEXT is more problematic in PAM4 than in NRZ because the eye height is less than 1/3 of NRZ's while the peak crosstalk amplitude is the same as NRZ's (Fig. 4.1). In addition, active cancellation of PAM4 FEXT is more difficult than that of NRZ FEXT because the PAM4 FEXT amplitude dynamically changes with the data pattern (Fig. 4.1). Furthermore, PAM4 FEXT compensation requires more accurate matching of driver strengths and interconnect skews between the victim and the aggressors than NRZ FEXT compensation [36], [37] (Fig. 4.1). Because variations in driver strengths and interconnect delays cause larger mismatches between the crosstalk compensation (XTC) signal and the FEXT in PAM4 signaling than in NRZ signaling, residual FEXTs can be large enough to close the eye opening even after crosstalk compensation in PAM4 signaling (Fig. 4.1). In addition, the PAM4 FEXT compensation circuit must fit in a tight area because numerous inputs/outputs (I/Os) must be integrated in a limited area in such massively parallel short-reach memory interface applications. Because of all these challenges and requirements, designing a single-ended PAM4 interface circuit for massively parallel short-reach memory interfaces is difficult.

In Fig. 4.2(a), two signals are shown propagating through multiple lanes with equal channel lengths. The crosstalk coupled from the aggressor to the victim lane is perfectly corrected by the XTC signal at time  $t_0$ . However, when the channel lengths differ, the crosstalk signal reaches the receiver terminal at a different time from the XTC signal that corrects it due to channel skew ( $t_{\rm skew}$ ). In the scenario depicted in Fig. 4.2(b), where the aggressor lane is shorter than the victim lane, the XTC signal arrives at time  $t_0$ , while the crosstalk signal arrives at  $t_0 - t_{\rm skew}$ . This mismatch results in residual crosstalk that is not fully canceled out. Therefore, to achieve perfect crosstalk cancellation, it is necessary to compensate for the time skew between the aggressor lane and the victim lane and to perform gain control so that the XTC signal has the same magnitude as the crosstalk signal but the opposite polarity.
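The effect of skew and gain mismatch on the residual crosstalk can be sketched numerically. The example below is illustrative only: it assumes a hypothetical triangular FEXT pulse with unit peak and a 40 ps half-width, and the function names are not taken from this work.

```python
def fext_pulse(t_ps, half_width_ps):
    """Hypothetical triangular FEXT pulse with unit peak at t = 0."""
    return max(0.0, 1.0 - abs(t_ps) / half_width_ps)

def residual_peak(skew_ps, half_width_ps=40, gain=1.0, span_ps=200):
    """Peak of |FEXT - gain * XTC| when the FEXT arrives skew_ps earlier
    than the XTC pulse (integer skew, evaluated on a 1 ps grid)."""
    worst = 0.0
    for t in range(-span_ps, span_ps + 1):
        fext = fext_pulse(t + skew_ps, half_width_ps)   # arrives at t0 - t_skew
        xtc = gain * fext_pulse(t, half_width_ps)       # arrives at t0
        worst = max(worst, abs(fext - xtc))
    return worst

if __name__ == "__main__":
    for skew in (0, 5, 10, 20, 40):
        print(f"skew={skew:2d} ps  residual peak={residual_peak(skew):.3f}")
    print(f"20 % gain error, no skew: {residual_peak(0, gain=0.8):.3f}")
```

With this pulse shape the residual grows in proportion to the skew until the two pulses no longer overlap, which is why both delay and gain control are needed.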

At the receiver (RX), the XTCs in [36], [37] use high-pass RC (resistor-capacitor) filters to cancel NRZ FEXT. However, due to the connection of the termination resistor and the RC high-pass filter, the RX input impedance does not match 50  $\Omega$, causing signal integrity problems. Moreover, the RC filter resistor introduces additional parasitic capacitance, which degrades both the bandwidth of the high-pass filter and the area efficiency. [36] demonstrated delay adjustment for NRZ FEXT cancellation by RC-delay control to compensate for delay mismatches between the victim and aggressors. [37] demonstrated by simulation that a decision-feedback crosstalk canceller (DFXC) reduces the sensitivity to the delay mismatches. However, these techniques were introduced only for NRZ FEXT cancellation [36], [37].

Figure 4.2: Crosstalk compensation for (a) un-skewed and (b) skewed lanes.

At the transmitter (TX), an XTC signal is added to the victim signal during the transition time of the aggressor signal to reduce the FEXT [7], [8], [34], [38]. Also, [39] employed capacitive coupling, whose characteristics oppose those of inductive coupling, to eliminate crosstalk-induced jitter (CIJ). However, [7], [8], [39] used current-mode logic (CML) drivers, which are power-hungry. In [38], the XTC utilized de-emphasis source-series termination (SST) drivers, which decreases the output swing. [8] utilized capacitive-peaking XTC. However, the capacitors occupy additional TX area [8], [39]. In [33], a full-rate clock is used to generate a return-to-zero (RZ) transition at the pulse generator input to produce narrow XTC pulses, and thus [33] is not efficient for reaching high data rates. In addition, none of these techniques [7], [8], [33], [38], [39] demonstrated delay adjustment circuits for FEXT cancellation, although delay matching is critical in the compensation of PAM4 FEXT.

We present an area-efficient single-ended PAM4 TX with inverter-based XTC for short-reach memory interfaces. Each TX consists of one main tap and two XTC taps to cancel FEXT signals from two aggressors. For precise crosstalk cancellation required in PAM4 signaling, each XTC tap employs accurate delay and gain control circuits. Because compact inverter-based XTC taps are utilized in our design, our TX achieved the smallest area occupancy of 0.0067 mm<sup>2</sup>.

#### 4.2 Architecture

Fig. 4.3 presents the overall high-speed communication system with crosstalk compensation, incorporating both PAM4 transmission and XTC control. The system consists of three main components: transmitters (TX1-TX3), cross-coupled channels, and receivers (RX1-RX3). Each transmitter incorporates one main driver for PAM4 signals and two XTC taps with dedicated delay control (green) and gain control (blue) circuits. The signals traverse cross-coupled channels characterized by their transfer functions H(f). For example,  $H_{\rm ch1}(f)$  is the transfer function of the first channel, with  $\Delta H_{\rm ch1}(f)$  accounting for variations due to channel length mismatch. The crosstalk between channels is described by transfer functions such as  $H_{XT_{21}}(f)$, which represents the FEXT affecting the second channel's output from the first channel's input. The variation in these crosstalk transfer functions, denoted as  $\Delta H_{XT_{21}}(f)$, occurs due to changes in both the source and destination channel characteristics ( $\Delta H_{\rm ch1}(f)$  and  $\Delta H_{\rm ch2}(f)$ ). Each XTC tap generates compensation signals (XTC1-XTC6) to cancel the corresponding FEXT signals (FEXT1-FEXT6), with the delay and gain control ensuring precise crosstalk cancellation despite variations in driver strength and interconnect characteristics.

Figure 4.3: The overall architecture of the proposed TX.

Figure 4.4: The schematic diagram of the transmitter and the S-parameters of the interconnects.

# 4.3 Transmitter Design

To verify the proposed concept, three TXs were designed for three single-ended interconnects (Fig. 4.4). A TX consists of one main tap and two XTC taps as well as clock circuits. The main tap is composed of 11 LSB segments and 20 MSB segments, which take inputs of 4 LSB bits and 4 MSB bits, respectively, at every quarter-rate clock cycle (Fig. 4.4). Because all segment designs are identical, the main tap produces the PAM4 signal through the 1:2 strength ratio between the LSB and MSB segments. Each segment consists of a 4:2 MUX, a 2:1 MUX, a pre-driver, and an SST driver. The quarter-rate input 4 bits are serialized to a full-rate bit stream by the 4:1 serialization of the two MUXs and then fed to the pre-driver followed by the SST driver. The strength of the SST driver can be statically controlled by 4-bit transistor banks. One additional segment is assigned to the LSB to improve the ratio of level mismatch (RLM). Because the loss of the short-reach interconnect is small (Fig. 4.4), equalization was not employed.

Fig. 4.5 shows a schematic diagram of the 4:2 MUX with the encoder for XTC segments. The encoder is implemented by simple digital logic gates. The primary design goal for the serializer is to satisfy the critical timing path, defined by:

$$t_{\text{CK-Q}} + t_{\text{LOGIC}} + t_{\text{SETUP}} < 0.5T_{\text{CK4}} \tag{4.1}$$

where  $t_{\rm CK-Q}$ ,  $t_{\rm LOGIC}$ ,  $t_{\rm SETUP}$ , and  $T_{\rm CK4}$  are the clock-to-q delay of the retimer, the largest digital logic delay in the encoder, the MUX setup time, and the divided clock period, respectively. In the post-layout simulation with a supply voltage of 1 V, a baud rate of 8 GBaud, and the NN corner,  $t_{\rm CK-Q}$ ,  $t_{\rm LOGIC}$ ,  $t_{\rm SETUP}$ , and  $T_{\rm CK4}$  are 13.4 ps, 26 ps, 3 ps, and 500 ps, respectively. Using Eq. (4.1), we calculated a delay margin of 207.6 ps. We have verified that the encoder's logic delay meets the timing constraint in all corners. Placing the encoder before the 4:2 MUX provides a larger delay margin than placing it before the 2:1 MUX.
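The margin quoted above follows directly from Eq. (4.1). The short sketch below reproduces the arithmetic with the NN-corner values; the function name is illustrative.

```python
def serializer_margin_ps(t_ck_q, t_logic, t_setup, t_ck4):
    """Slack of Eq. (4.1): 0.5 * T_CK4 - (t_CK-Q + t_LOGIC + t_SETUP), in ps."""
    return 0.5 * t_ck4 - (t_ck_q + t_logic + t_setup)

# NN corner, 1 V supply, post-layout values quoted in the text (ps)
margin = serializer_margin_ps(t_ck_q=13.4, t_logic=26.0, t_setup=3.0, t_ck4=500.0)

if __name__ == "__main__":
    print(f"delay margin: {margin:.1f} ps")
```

A positive result means the constraint is met; the same function can be evaluated with the delays of other corners.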

An inverter-based XTC design is proposed to cancel a PAM4 FEXT (Fig. 4.4). Because a PAM4 FEXT is a narrow pulse of which amplitude differs from an NRZ FEXT (Fig. 4.1), we adopted the narrow pulse generation method from the prior XTC for an NRZ FEXT [33] and appropriately modified it for faster PAM4 FEXT cancellation.



Figure 4.5: A schematic diagram of the 4:2 MUX with the encoder for XTC segments.




$M_n$ and $L_n$ are the MSB and LSB bits at time index $n$. XT is the minimum magnitude of the FEXT.

Figure 4.6: Encoding table for XTC segments, and the Boolean expressions of the encoded input (INR, INF) to the four XTC segments (SEG1, SEG2, SEG3, and SEG4).

In [33], high-speed RZ control signals were generated for XTC operation at the speed of 4 Gb/s. Because the pulse widths of these RZ signals must be narrower than 1 UI, producing such a narrow pulse is not energy-efficient at our target speed of 16 Gb/s. Therefore, in our design, we appropriately modified the technique to produce high-speed NRZ control signals instead of RZ control signals. The proposed XTC tap consists of an encoder and four identical XTC segments (Fig. 4.4).

Each segment is composed of two 4:2 MUXs and two 2:1 MUXs, one rising transition detector, one falling transition detector, and a bank of inverter-based XTC drivers. Because there are seven possible voltage values (-3XT, -2XT, -XT, 0, XT, 2XT, 3XT) of a PAM4 FEXT depending on the aggressor data pattern, the four XTC segments are controlled differently to produce the opposite pulse according to the encoding table (Fig. 4.6). The input quarter-rate 8 bits are encoded to the quarter-rate 32 control bits for the XTC segments. Each XTC segment has an input of quarter-rate 8 bits (4 bits for pull-up and 4 bits for pull-down). The four MUXs serialize them to feed two full-rate bits (INR and INF) to the rising and falling transition detectors, respectively. The Boolean expressions of these inputs are summarized for each XTC segment in Fig. 4.6. For faster speed, energy efficiency, and design simplicity, a narrow XTC pulse is generated by the NRZ-based transition detectors instead of the RZ-based pulse generators [33] while four XTC segments are utilized. To achieve energy-efficient operation at a higher speed, we employ a half-rate clock instead of a full-rate clock, and the encoder guarantees the NRZ transitions at the inputs of the transition detectors that are necessary for narrow pulse generation (Fig. 4.4). The pulse width is also statically controlled for precise FEXT cancellation at higher speed (Fig. 4.4).
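The seven FEXT values can be reproduced by enumerating all aggressor symbol transitions. This is a minimal sketch that assumes the FEXT amplitude is proportional to the aggressor's level step, in units of XT, with an arbitrary sign convention; it does not reproduce the encoding table of Fig. 4.6, and the names are illustrative.

```python
from itertools import product

PAM4_LEVELS = (0, 1, 2, 3)  # level index = 2 * MSB + LSB

def fext_units(prev_level, next_level):
    """FEXT amplitude in units of XT, assumed proportional to the
    aggressor's level step (sign convention is arbitrary here)."""
    return next_level - prev_level

def xtc_units(prev_level, next_level):
    """Compensation pulse: same magnitude as the FEXT, opposite polarity."""
    return -fext_units(prev_level, next_level)

fext_values = sorted({fext_units(p, n) for p, n in product(PAM4_LEVELS, repeat=2)})

if __name__ == "__main__":
    print(fext_values)  # [-3, -2, -1, 0, 1, 2, 3]
```

Sixteen possible transitions collapse into seven distinct amplitudes, so the compensation strength must follow the data pattern rather than take a single fixed value as in NRZ.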

The XTC driver is a bank of four binary-weighted inverters with foot switches that allow driver strength control. To minimize the timing error, no buffer is inserted between a transition detector and an XTC driver. For the delay control of the XTC signal, the last re-timing clock of the XTC tap can be adjusted by the digitally controlled delay line (DCDL) (Fig. 4.7). Fig. 4.7 presents a schematic diagram of the DCDL and the delay increments of the DCDL under different corners, supply voltages, and temperatures. The DCDL consists of three inverters with MOS capacitor banks inserted between the inverter stages. The delay is coarsely controlled by 3 binary bits and finely controlled by 4 binary bits.

Fig. 4.8(a) and (b) show the proposed 2-to-1 MUX and its timing diagram. By using a DCDL, the delay of CK2\_XTC, which is the MUX selection clock, can be adjusted. The critical timing constraint is defined by:

$$t_{\text{CK-Q}} + t_{\text{INV}} + t_{\text{SETUP}} < T_{\text{d}} < t_{\text{CK-Q}} + t_{\text{INV}} + 0.5T_{\text{CLK}} - t_{\text{HOLD}} \tag{4.2}$$

where $t_{\text{CK-Q}}$, $t_{\text{INV}}$, $t_{\text{SETUP}}$, $t_{\text{HOLD}}$, $T_{\text{CLK}}$, and $T_{\text{d}}$ are the clock-to-Q delay of the retimer, the inverter buffer delay, the MUX setup time, the MUX hold time, the clock period, and the DCDL delay, respectively.



Figure 4.7: (a) A schematic diagram of the DCDL and (b) delay increments of the DCDL versus fine and coarse digital codes under different corners and temperatures.




Figure 4.8: (a) Retimer and 2-to-1 MUX. (b) 2-to-1 MUX timing.


In the post-layout simulation with a supply voltage of 1 V, a symbol rate of 8 GBaud, and the NN corner, $t_{\rm CK-Q}$, $t_{\rm INV}$, $t_{\rm SETUP}$, $t_{\rm HOLD}$, and $T_{\rm CLK}$ are 13.4 ps, 5 ps, 3 ps, 3 ps, and 250 ps, respectively. $T_{\rm d}$ can be adjusted from a minimum of 27.7 ps to a maximum of 54.2 ps. From Equation (4.2), $T_{\rm d}$ must be between 21.4 ps and 140.4 ps. Therefore, the entire DCDL delay range satisfies the timing constraint and does not affect the sampling margin. We have verified that the DCDL delay meets the timing constraint in all corners.
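The bounds quoted above can be reproduced directly from Equation (4.2). The following is a minimal sketch that uses only the NN-corner values given in the text; the function and variable names are hypothetical.

```python
def dcdl_window_ps(t_ck_q: float, t_inv: float, t_setup: float,
                   t_hold: float, t_clk: float) -> tuple[float, float]:
    """Return the (lower, upper) bounds on the DCDL delay from Eq. (4.2), in ps."""
    lower = t_ck_q + t_inv + t_setup
    upper = t_ck_q + t_inv + 0.5 * t_clk - t_hold
    return lower, upper


# Post-layout values quoted in the text (1 V supply, NN corner), in ps.
lower, upper = dcdl_window_ps(t_ck_q=13.4, t_inv=5.0, t_setup=3.0,
                              t_hold=3.0, t_clk=250.0)

# Adjustable DCDL delay range quoted in the text, in ps.
td_min, td_max = 27.7, 54.2

# The whole adjustable range must sit strictly inside the allowed window.
range_is_valid = lower < td_min and td_max < upper

print(f"allowed window: {lower:.1f} ps to {upper:.1f} ps")
print(f"setup-side margin: {td_min - lower:.1f} ps")
print(f"hold-side margin: {upper - td_max:.1f} ps")
print(f"DCDL range valid: {range_is_valid}")
```

The check confirms that both ends of the DCDL range fall inside the window, with the tighter margin on the setup side.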

#### 4.4 Measurement Results

The TXs were fabricated in 28 nm LP CMOS technology. Fig. 4.9 shows the die micrograph and the power breakdown of the TX. The TX consumes 1.6 pJ/bit/lane at the maximum speed of 16 Gb/s, and the XTC circuit accounts for 10.8 % of the TX power consumption. One TX occupies only 0.0067 mm<sup>2</sup>, the smallest area among similar prior works. The TXs were tested over 4-cm PCB traces with a PRBS31 pattern. The channel spacing and width are 602 µm and 430 µm, respectively. The measured S-parameters of the signal (S52) and FEXT (S51) paths of the PCB traces are -1.1 dB and -18 dB, respectively, at 4 GHz (Fig. 4.4). Because the parasitic capacitances from pads, ESDs, solder bumps, etc. are included in this measurement, the overall channel characteristics would be worse than the S-parameters in Fig. 2. The test environment of the TX is shown in Fig. 4.10: the two aggressors are located on either side of the victim channel. Fig. 4.11 illustrates the measured FEXT eye diagram when the two TXs at the adjacent aggressor channels generate 16 Gb/s PAM4 signals. Although the FEXT path of the channel attenuates the signal by 18 dB, the measured peak-to-peak FEXT of 234 mV indicates that two aggressors transmitting PRBS31 patterns can cause significant crosstalk interference when their crosstalk signals accumulate. This peak-to-peak FEXT of 234 mV is the largest among the prior works that clearly reported FEXT voltage amplitudes. Because the theoretical eye opening without inter-symbol interference (ISI) is only about 166.7 mV in PAM4 signaling, this level of FEXT can completely close the eye diagram.
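The comparison between the accumulated FEXT and the ideal PAM4 eye can be checked numerically. This is a minimal sketch: the 500 mV received swing is an assumption inferred from the quoted 166.7 mV eye (three equal eyes), not a value stated in the text, and the helper names are hypothetical.

```python
def db_to_voltage_ratio(db: float) -> float:
    """Convert a gain in dB to a linear voltage ratio."""
    return 10 ** (db / 20)


def pam4_ideal_eye_mv(swing_mv: float) -> float:
    """Ideal PAM4 eye height without ISI: the swing is split into three equal eyes."""
    return swing_mv / 3


# Measured FEXT path gain at 4 GHz, from the text.
fext_ratio = db_to_voltage_ratio(-18.0)  # about 0.126 of the aggressor amplitude

# Assumed received swing (hypothetical, inferred from 3 x 166.7 mV).
ideal_eye_mv = pam4_ideal_eye_mv(500.0)

# Measured peak-to-peak FEXT from two PRBS31 aggressors, from the text.
fext_pp_mv = 234.0

# The eye is considered closed when the FEXT swing exceeds the ideal eye height.
fext_closes_eye = fext_pp_mv > ideal_eye_mv

print(f"FEXT voltage ratio: {fext_ratio:.3f}")
print(f"ideal PAM4 eye: {ideal_eye_mv:.1f} mV")
print(f"FEXT exceeds ideal eye: {fext_closes_eye}")
```

Even though a single aggressor couples only about one eighth of its amplitude, the accumulated peak-to-peak FEXT from two aggressors exceeds the ideal eye height under this assumption.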





Figure 4.9: (a) A chip microphotograph and (b) the power breakdown of the proposed PAM4 TX with XTC taps.
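The energy efficiency quoted in the text can be converted to an approximate per-lane power. The sketch below uses only the reported 1.6 pJ/bit/lane, 16 Gb/s, and 10.8 % XTC share; the function name is hypothetical and the result is an estimate, not a measured breakdown.

```python
def lane_power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Per-lane power in mW: pJ/bit multiplied by Gb/s gives mW."""
    return energy_pj_per_bit * data_rate_gbps


# Values reported in the text.
total_power_mw = lane_power_mw(energy_pj_per_bit=1.6, data_rate_gbps=16.0)
xtc_share = 0.108  # XTC circuit share of the TX power

xtc_power_mw = total_power_mw * xtc_share

print(f"TX power per lane: {total_power_mw:.1f} mW")
print(f"XTC power per lane: {xtc_power_mw:.2f} mW")
```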




Figure 4.10: A measurement setup.



Figure 4.11: A measured FEXT eye diagram when two TXs at the aggressor channels generate 16 Gb/s PAM4 signals.



Figure 4.12: Measured single-bit responses at the aggressor and FEXT pulses at the victim (a) without and (b) with XTC.

Fig. 4.12(a) and (b) present the measured FEXT pulses without and with the XTC, respectively, when the TX at the aggressor channel generates a single-bit pulse at a symbol rate of 8 GBaud. Without the XTC, the peak-to-peak voltage of the FEXT pulse caused by a single-bit aggressor pulse is measured to be 108.9 mV. When the XTC gain is precisely adjusted, the peak-to-peak voltage of the FEXT is reduced to 19.4 mV, an 82 % reduction.
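The quoted FEXT reduction follows from the two measured peak-to-peak values. A minimal sketch, with hypothetical function names, that also expresses the same reduction in dB:

```python
import math


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


def reduction_db(before: float, after: float) -> float:
    """The same reduction expressed as a voltage ratio in dB."""
    return 20.0 * math.log10(before / after)


# Measured peak-to-peak FEXT pulse without and with XTC, in mV (from the text).
fext_without_xtc_mv = 108.9
fext_with_xtc_mv = 19.4

fext_reduction_pct = reduction_percent(fext_without_xtc_mv, fext_with_xtc_mv)
fext_reduction_db = reduction_db(fext_without_xtc_mv, fext_with_xtc_mv)

print(f"FEXT reduction: {fext_reduction_pct:.1f} %")
print(f"FEXT reduction: {fext_reduction_db:.1f} dB")
```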

The far-end eye diagram of the victim was measured with a high-speed oscilloscope under various conditions; in this chapter, we report the smallest eye opening among the three PAM4 eyes. In our proof-of-concept design, the XTC driver strengths for gain control and the DCDLs for timing control are set manually without any adaptive algorithm. The TX achieved a maximum data rate of 16 Gb/s with an eye height of 40.3 mV and an eye width of 0.352 UI while canceling the FEXT from the two neighboring aggressors (Fig. 4.13). However, the upper and middle eye heights are smaller than the bottom eye height due to ISI caused by parasitic capacitance.

When the XTC taps were turned off, the eye height and width were reduced to 13.3 mV and 0.071 UI, respectively. This result shows that the proposed crosstalk compensation improves the eye height and width by 203 % and 396 %, respectively. Without the aggressors, the eye height and width were measured to be 60.2 mV and 0.376 UI, respectively.

We emulated a delay-mismatch scenario to demonstrate the importance of delay matching for FEXT cancellation in practical applications, where the lengths of many interconnects can hardly be identical because routing paths differ under practical constraints such as different bump locations and different signaling layers. In the worst delay-mismatch scenario, the skews between the victim and the two aggressors were adjusted to produce the smallest eye. When the XTC taps were turned off in this scenario, the eye diagram was completely closed (Fig. 4.13). Even when the gains of the XTC taps were adjusted for the best eye opening in this scenario, the eye diagram was almost closed (Fig. 4.13). This result clearly shows that the eye can be almost closed no matter how precisely the XTC gains are controlled unless the delay mismatches between the victim and the aggressors are appropriately adjusted. However, the eye diagrams of the aggressors remained open because the FEXT magnitude between the two outermost channels is -30.9 dB, which is smaller than the FEXT magnitude of -18 dB between adjacent channels (Fig. 4.14).
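The 203 % and 396 % improvements reported for the case with matched delays can be reproduced from the measured eye openings with and without the XTC taps. This is a minimal sketch with hypothetical names; it also shows how much of the aggressor-free eye is recovered by the XTC.

```python
def improvement_percent(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Measured eye openings from the text.
height_xtc_off_mv, height_xtc_on_mv, height_no_aggr_mv = 13.3, 40.3, 60.2
width_xtc_off_ui, width_xtc_on_ui, width_no_aggr_ui = 0.071, 0.352, 0.376

height_gain_pct = improvement_percent(height_xtc_off_mv, height_xtc_on_mv)
width_gain_pct = improvement_percent(width_xtc_off_ui, width_xtc_on_ui)

# Fraction of the aggressor-free eye that remains when XTC is on.
height_recovered = height_xtc_on_mv / height_no_aggr_mv
width_recovered = width_xtc_on_ui / width_no_aggr_ui

print(f"eye height improvement: {height_gain_pct:.0f} %")
print(f"eye width improvement: {width_gain_pct:.0f} %")
print(f"height recovered vs. no aggressors: {height_recovered:.1%}")
print(f"width recovered vs. no aggressors: {width_recovered:.1%}")
```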

The performance of the proposed TX and the prior works is summarized and compared in Table 4.1. Among the similar prior works that clearly reported FEXT amplitudes, the proposed TX compensates for the largest peak-to-peak FEXT, 234 mV from two aggressors. Apart from [33] and [39], only our TX compensates for the crosstalk from both aggressors, while the others [7], [8], [38] compensate for only one aggressor. Although the prior work [33] compensates for crosstalk from 4 aggressors, it can only compensate for NRZ crosstalk at a maximum speed of 4 Gb/s. Among the similar prior works [7], [8], [33], [38], [39], only our TX employs an XTC timing control circuit.

Although the TX designs in [38], [7], and [8] achieve faster speeds than the proposed TX, they occupy about 3.55 times, 35.4 times, and 12.7 times the chip area of the proposed design, respectively, because their XTCs employ large SST drivers, large CML drivers, and large capacitors, respectively. In contrast, the proposed TX uses area-efficient inverter banks to generate the XTC signals.
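The area ratios can be checked against the per-lane areas listed in Table 4.1. The sketch below uses only those table values; the dictionary layout and names are hypothetical.

```python
# TX area per lane in mm^2, taken from Table 4.1.
PROPOSED_AREA_MM2 = 0.0067
PRIOR_AREA_MM2 = {
    "[38]": 0.0238,  # SST-driver-based XTC
    "[7]": 0.237,    # bias-current (CML) based XTC
    "[8]": 0.085,    # capacitance-based XTC
}

# Ratio of each prior design's area to the proposed TX area.
area_ratios = {ref: area / PROPOSED_AREA_MM2 for ref, area in PRIOR_AREA_MM2.items()}

for ref, ratio in area_ratios.items():
    print(f"{ref}: {ratio:.2f}x the area of the proposed TX")
```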



Figure 4.13: Measured PAM4 TX eye diagrams with and without XTC taps, delay control, and aggressors.



Figure 4.14: Measured PAM4 TX eye diagrams of the victim and two aggressors without XTC taps and delay control.


As a result, the proposed TX occupies the smallest area of 0.0067 mm<sup>2</sup> and achieves the best area efficiency (area per data rate) of 0.00042 mm<sup>2</sup>/Gbps among the prior works [7], [8], [33], [38], [39]. In future advanced memory packaging applications where silicon area for I/O circuits is a critically limited resource, the proposed TX design can achieve the highest data rate for a given silicon area.
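The area-efficiency comparison can be reproduced from the area and data-rate rows of Table 4.1. This is a minimal sketch using only those table values; the data structure and names are hypothetical.

```python
# (TX area per lane in mm^2, data rate in Gb/s per lane), from Table 4.1.
DESIGNS = {
    "This work": (0.0067, 16.0),
    "[39]": (0.1505, 5.0),
    "[38]": (0.0238, 25.0),
    "[7]": (0.237, 112.0),
    "[8]": (0.085, 64.0),
    "[33]": (0.0077, 4.0),
}

# Area per data rate (mm^2/Gbps): lower is better.
area_per_gbps = {name: area / rate for name, (area, rate) in DESIGNS.items()}

# Data rate per unit area (Gb/s/mm^2): the rate achievable for a given silicon area.
gbps_per_mm2 = {name: rate / area for name, (area, rate) in DESIGNS.items()}

best_design = min(area_per_gbps, key=area_per_gbps.get)

for name in sorted(area_per_gbps, key=area_per_gbps.get):
    print(f"{name}: {area_per_gbps[name]:.5f} mm^2/Gbps, "
          f"{gbps_per_mm2[name]:.0f} Gb/s/mm^2")
print(f"best area efficiency: {best_design}")
```

Ranking by area per data rate rather than by raw data rate is what supports the claim about the highest data rate for a given silicon area.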

#### 4.5 Summary

In this chapter, we proposed an area-efficient PAM4 TX with compact inverter-based XTCs for short-reach memory interfaces. To address the delay mismatches between the victim and the aggressors, we also introduced, for the first time, delay adjustment circuits for PAM4 FEXT compensation.

To verify the proposed XTC scheme, three transmitters were designed and fabricated in 28 nm LP CMOS technology. The TXs successfully transmitted a PRBS31 pattern at 16 Gb/s while compensating for PAM4 FEXT signals produced by two aggressors. The compensated peak-to-peak PAM4 FEXT signal was measured to be 234 mV, which is the largest among the prior works that clearly reported the peak-to-peak FEXT amplitude. At 16 Gb/s, the TX achieved an energy efficiency of 1.6 pJ/bit/lane with horizontal and vertical eye openings of 0.352 UI and 40.3 mV, respectively. With gain and delay control of the XTC, the eye height and width were improved by 203 % and 396 %, respectively. With the worst delay mismatch, the eye diagram was almost closed no matter how precisely the XTC gain was optimized, showing the importance of delay adjustment for XTC. Owing to the compact inverter-based XTC design, the proposed TX occupies the smallest area of only 0.0067 mm² and achieves the best area efficiency per data rate (0.00042 mm²/Gbps). Therefore, in future advanced memory packaging applications where silicon area for I/O is a limited resource, the proposed compact TX design can achieve the fastest data rate for a given silicon area cost.

Table 4.1: Performance Summary and Comparison With Other Reported Transmitter-side XTC Designs

| | This Work | TCASI'14 [39] | TCASI'16 [38] | ISSCC'24 [7] | ISSCC'24 [8] | ISSCC'20 [33] |
|---|---|---|---|---|---|---|
| Technology (nm) | 28 LP | 130 | 65 | 28 | 28 | 65 |
| Modulation | PAM4 | NRZ | NRZ | PAM4 / NRZ | PAM4 / NRZ | NRZ |
| Data Rate (Gb/s/lane) | 16 | 5 | 25 | 112 (PAM4), 56 (NRZ) | 64 (PAM4), 32 (NRZ) | 4 |
| Supply (V) | 1 | 1.2 | 1.2 | N/A | N/A | 1.2 |
| Single/Differential | Single | Single | Single | Single | Single | Single |
| XTC Type | FIR-XTC | Capacitive Coupling | FIR-XTC | FIR-XTC | Merged C-peaking XTC | FIR-XTC |
| Number of Aggressor Channels | 2 | 2 | 1 | 1 | 1 | 4 |
| Pin Efficiency | 200% | 100% | 100% | 200% | 200% | 100% |
| Architecture | XTC | XTC | XTC + FFE | XTC + FFE | XTC + FFE | XTC + FFE |
| Worst Peak-to-Peak FEXT Amplitude (mV) | 234 | N/A | N/A | N/A | N/A | N/A |
| Channel Loss (dB) | -1.1 | N/A | -8.9 | | | -25.6 |
| FEXT (dB) | -18 | N/A | N/A | | -15.8 | -31.8 |
| Channel Loss-to-FEXT Ratio (dB) | 16.9 | N/A | N/A | | 8.4 | 6.2 |
| XTC Timing Control | Yes (DCDL Control) | N/A | N/A | N/A | N/A | N/A |
| XTC Gain Control | Yes (Inverter Bank Control) | Yes (Level Switching Buffer Control) | Yes (SST Driver Control) | Yes (Bias Current Control) | Yes (Capacitance Control) | Yes (Inverter Bank Control) |
| PRBS | 31 | | | N/A | N/A | 7 |
| Horizontal Eye Opening w/o XTC (UI) | 0 (w/o XTC & Delay Control) | 0.479 | 0.28<sup>d</sup> | 0 (PAM4), 0 (NRZ) | 0 (PAM4), 0.32 (NRZ) | 0 |
| Horizontal Eye Opening w/ XTC (UI) | 0.352 (w/ XTC & Delay Control) | 0.569 | 0.444<sup>d</sup> | 0.31 (PAM4), 0.42 (NRZ) | 0.36 (PAM4) | 0.4<sup>a</sup> (w/ XTC+FFE) |
| Vertical Eye Opening w/o XTC (mV) | 0 (w/o XTC & Delay Control) | N/A | 85<sup>d</sup> | 0 (PAM4), 0 (NRZ) | 0 (PAM4), 100 (NRZ) | 0 |
| Vertical Eye Opening w/ XTC (mV) | 40.3 (w/ XTC & Delay Control) | N/A | 180<sup>d</sup> | 18<sup>a</sup> (PAM4), 20.4<sup>a</sup> (NRZ) | 96 (PAM4), 180 (NRZ) | 26<sup>a</sup> (w/ XTC+FFE) |
| Inductor-less | YES | YES | YES | NO | YES | YES |
| Energy Efficiency (pJ/bit/lane) | 1.6 | 1.6<sup>b</sup> | 0.87 | 1.55 | 1.27 | 1.4 |
| TX Area / Lane (mm²/lane) | 0.0067 | 0.1505<sup>c</sup> | 0.0238 | 0.237 | 0.085 | 0.0077 |
| TX Area / Data Rate (mm²/Gbps) | 0.00042 | 0.0301 | 0.00095 | 0.00212 | 0.00133 | 0.00193 |

<sup>a</sup> Estimated from measured eye diagrams. <sup>b</sup> Including the RX power. <sup>c</sup> Estimated from the chip microphotograph. <sup>d</sup> Data rate is 20 Gb/s.

#### V. Conclusion

In conclusion, this thesis presents comprehensive design techniques for developing compact and energy-efficient inverter-based high-speed TXs tailored for advanced memory interfaces. The research encompasses two pivotal components: the introduction of an addition-only feed-forward equalizing (A-FFE) TX architecture, and the development of a four-level pulse-amplitude modulation (PAM4) TX featuring inverter-based crosstalk compensation (XTC) taps.

The A-FFE TX architecture eliminates subtractions between FFE taps, enabling the use of inverter drivers without the need for termination resistors. This approach significantly reduces both the area and power consumption of the TX while enhancing robustness to coefficient quantization errors, as error signals are suppressed by channel loss. A prototype of the inverter-based 4-tap A-FFE TX was implemented in 28-nm bulk CMOS technology, achieving a data rate of 20 Gb/s per pin. The TX occupies a compact area of 1149 µm² and demonstrates an energy efficiency of 1.18 pJ/bit with an eye sensitivity of 0.68. Furthermore, it exhibits low power consumption when the data transition probability is low, consuming 90 % less power when the transition probability drops from 25 % to 0 %.

The PAM4 TX with inverter-based XTC taps addresses the challenges of crosstalk in short-reach memory interfaces. Efficient encoders and transition detectors identify crosstalk-inducing patterns, enabling precise control of the XTC taps. The inclusion of delay adjustment circuits for PAM4 far-end crosstalk (FEXT) compensation is a novel contribution, mitigating errors caused by delay mismatches between victim and aggressor channels. Implemented in 28 nm LP CMOS technology, the TX operates at 16 Gb/s and, with XTC enabled, achieves significant performance improvements: eye height and width increase by 203 % and 396 %, respectively. The design compensates for a peak-to-peak PAM4 FEXT signal of 234 mV, the largest reported among comparable works. The TX's compact inverter-based XTC design occupies only 0.0067 mm<sup>2</sup>, achieving an area efficiency of 0.00042 mm<sup>2</sup>/Gbps.

Collectively, these innovations demonstrate the efficacy of inverter-based design approaches in achieving high-performance, area-efficient, and energy-efficient TXs for high-speed memory interfaces. By eliminating termination resistors and carefully managing crosstalk through gain and delay control, the proposed TXs offer robust signal integrity in a compact form factor. This is particularly advantageous for future advanced memory packaging applications where silicon area for I/O is a constrained resource. The techniques presented herein provide a pathway to achieving higher data rates within limited silicon areas, contributing to the advancement of high-speed memory interface technology.

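# Summary (translated from Korean)

This dissertation proposes design techniques for compact and energy-efficient inverter-based high-speed transmitters for memory interfaces. The research is divided into two main components: a study of a novel addition-only feed-forward equalizing (A-FFE) transmitter (TX) architecture, and the development of a four-level pulse-amplitude modulation (PAM4) transmitter with crosstalk compensation (XTC) implemented using inverter-based XTC taps.

The first component proposes an inverter-based 4-tap A-FFE TX designed for compact and power-efficient single-ended interfaces. Conventional FFE (C-FFE) TXs typically use source-series-terminated (SST) drivers. However, their linear resistors occupy considerable area and introduce large parasitic capacitance, degrading power efficiency and output bandwidth. To solve these problems, we developed an A-FFE architecture that eliminates subtraction between FFE taps and improves robustness to coefficient quantization errors. These improvements enable area- and power-efficient inverter drivers to be used in the FFE. A prototype fabricated in a 28-nm LP CMOS process achieved a data rate of 20 Gb/s/pin, an eye height of 55.1 mV, and an eye width of 0.44 UI over a 15 dB PCB trace, while maintaining an energy efficiency of 1.18 pJ/b and a worst-case eye sensitivity of 68 %. In particular, when the most sensitive FFE coefficient was reduced by 20 %, the eye opening decreased by only 13.6 %. The resistor-less design using inverter drivers achieved a compact layout of only 1149 µm².

The second component proposes a PAM4 TX with XTC designed for short-reach memory interfaces. The design includes efficient encoders and transition detectors that recognize crosstalk-inducing patterns and control the inverter-based XTC taps accordingly. Precise gain and delay control of the XTC system minimizes compensation errors caused by mismatches between the victim and aggressor channels. Implemented in a 28 nm LP CMOS process, the TX operates at 16 Gb/s and shows significant performance improvements, with the eye height and width increasing by 203 % and 396 %, respectively, when XTC is enabled. The area-efficient design using the proposed inverter-based XTC taps occupies only 0.0067 mm², and its area per data rate is only 0.00042 mm²/Gbps.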

#### References

- [1] T. M. Hollis et al. "Recent evolution in the DRAM interface: Milemarkers along memory lane,". *IEEE Solid-State Circuits Mag*, 11(2):14–30, Spring 2019.
- [2] S. Lee et al. "A 78.8 fJ/b/mm 12.0 Gb/s/wire capacitively driven on-chip link over 5.6 mm with an FFE-combined ground-forcing biasing technique for DRAM global bus line in 65 nm CMOS,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 65:454–455, Feb. 2022.
- [3] C. Moon et al. "A 20 Gb/s/pin 1.18 pJ/b 1149 µm<sup>2</sup> single-ended inverter-based 4-tap addition-only feed-forward equalization transmitter with improved robustness to coefficient errors in 28 nm CMOS,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 450–451, Feb. 2022.
- [4] C. Moon et al. "A single-ended inverter-based addition-only feed-forward equalization transmitter,". *IEEE J. Solid-State Circuits*, 59(11):3741–3751, Nov. 2024.
- [5] M. Lee et al. "A compact single-ended inverter-based transceiver with swing improvement for short-reach links,". *IEEE Trans. Circuits Syst. I, Reg. Papers*, 69(9):3679–3688, Sep. 2022.
- [6] C. Moon et al. "3 × 16 Gb/s compact single-ended PAM4 transmitters with inverter-based crosstalk compensation for memory interfaces,". *IEEE Trans. Circuits Syst. II, Exp. Briefs*, 71(12):4884–4888, Dec. 2024.
- [7] L. Zhong et al. "A 112 Gb/s/pin single-ended crosstalk-cancellation transceiver with 31dB loss compensation in 28 nm CMOS,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 134–135, Feb. 2024.

- [8] W. Wu et al. "A 64 Gb/s/pin PAM4 single-ended transmitter with a merged pre-emphasis capacitive-peaking crosstalk-cancellation scheme for memory interfaces in 28 nm CMOS,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 450–451, Feb. 2024.
- [9] H. Johnson and M. Graham. "High-speed Signal Propagation: Advanced Black Magic,". NJ: Prentice Hall, 2003.
- [10] M. Choi et al. "An approximate closed-form channel model for diverse interconnect applications,". *IEEE Trans. Circuits Syst. I, Reg. Papers*, 61(10):3034–3043, Oct. 2014.
- [11] E. Sayre. "Understanding skin effect and dielectric loss material effects on digital interconnects,". *in Proc. DesignCon*, Santa Clara, CA, USA, 2018.
- [12] A. Deutsch. "Electrical characteristics of interconnections for high-performance systems,". *Proc. IEEE*, 86(2):315–355, Feb. 1998.
- [13] S. Ramo and J. R. Whinnery. "Fields and waves in modern radio,". *New York: John Wiley & Sons*, 1944.
- [14] M. Choi et al. "Analytical formulas for tradeoff among channel loss, length, and frequency of RC- and LC-dominant single-ended interconnects for fast equalized link tradeoff estimation,". *IEEE Trans. Compon., Packag. Manuf. Technol*, 5(10):1497–1506, Oct. 2015.
- [15] J. Kim et al. "A 224 Gb/s DAC-based PAM-4 transmitter with 8-tap FFE in 10 nm CMOS,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 126–127, Feb. 2021.
- [16] Y. Segal et al. "A 1.41 pJ/b 224 Gb/s PAM-4 SerDes receiver with 31 dB loss compensation,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 114–115, Feb. 2022.

- [17] Y. Wei et al. "NVLink-C2C: A coherent off package chip-to-chip interconnect with 40 Gbps/pin single-ended signaling,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 160–161, Feb. 2023.
- [18] J. Wilson et al. "A 1.17 pJ/b 25 Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16 nm CMOS using a process- and temperature-adaptive voltage regulator,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 276–277, Feb. 2018.
- [19] J. Poulton et al. "A 0.54 pJ/b 20 Gb/s ground-referenced single-ended short-haul serial link in 28 nm CMOS for advanced packaging applications,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 404–405, Feb. 2013.
- [20] M. Choi et al. "An FFE TX with 3.8x eye improvement by automatic impedance adaptation for universal compatibility with arbitrary channel and RX impedances,". *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, pages 58–59, Jun. 2017.
- [21] M. Choi et al. "An FFE transmitter which automatically and adaptively relaxes impedance matching,". *IEEE J. Solid-State Circuits*, 53(6):1780–1792, Feb. 2018.
- [22] S. Han et al. "GUI-enhanced layout generation of FFE SST TXs for fast high-speed serial link design,". *IEEE ACM/IEEE Design Automation Conference (DAC)*, July. 2020.
- [23] B. Kim et al. "An energy-efficient equalized transceiver for RC-dominant channels,". *IEEE J. Solid-State Circuits*, 45(6):1186–1197, Jun. 2010.
- [24] B. Kim et al. "A 4Gb/s/ch 356fj/b 10mm equalized on-chip interconnect with nonlinear charge-injecting transmitter filter and transimpedance receiver in 90nm CMOS technology,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 66–67, Feb. 2009.

- [25] B. Kim et al. "A 10-Gb/s compact low-power serial I/O with DFE-IIR equalization in 65-nm CMOS,". *IEEE J. Solid-State Circuits*, 44(12):3526–3538, Jun. 2010.
- [26] S. Han et al. "A coefficient-error-robust FFE TX with 230 *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 50–51, Feb. 2014.
- [27] S. Han et al. "A coefficient-error-robust feed-forward equalizing transmitter for eye-variation and power improvement,". *IEEE J. Solid-State Circuits*, 51(8):1902–1914, Aug. 2016.
- [28] S.-M. Lee et al. "An 8 nm 18 Gb/s/pin GDDR6 PHY with TX bandwidth extension and RX training technique,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 338–339, Feb. 2020.
- [29] C. Menolfi et al. "A 16Gb/s source-series terminated transmitter in 65nm CMOS SOI,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 446–447, Feb. 2007.
- [30] J. Seo et al. "A 20-Gb/s/pin 0.0024-mm2 single-ended DECS TRX with CDR-less self-slicing/auto-deserialization to improve tolerance on duty cycle error and RX supply noise for DCC/CDR-less short-reach memory interfaces,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 456–457, Feb. 2022.
- [31] J. Seo et al. "A 20-Gb/s/pin compact single-ended DCC-less DECS transceiver with CDR-less RX front-end for on-chip links,". *IEEE J. Solid-State Circuits*, 58(11):3253–3265, Nov. 2023.
- [32] Behzad Dehlaghi and Anthony Chan Carusone. "A 0.3 pJ/bit 20 Gb/s/wire parallel interface for die-to-die communication,". *IEEE J. Solid-State Circuits*, 51(11):2690–2701, Nov. 2016.

- [33] H.-G. Ko et al. "An 8 Gb/s/um FFE-combined crosstalk-cancellation scheme for HBM on silicon interposer with 3D-staggered channels,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 128–129, Feb. 2020.
- [34] M. Lee et al. "A 10-GHz multi-purpose reconfigurable built-in self-test circuit for high-speed links,". *IEEE Asian Solid-State Circuits Dig. Tech. Papers*, pages 76–77, Nov. 2017.
- [35] J. Jin et al. "A 4nm 16 Gb/s/pin single-ended PAM4 parallel transceiver with switching-jitter compensation and transmitter optimization,". *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, pages 404–405, Feb. 2023.
- [36] Y.-U. Jeong et al. "Single-ended receiver-side crosstalk cancellation with independent gain and timing control for minimum residual FEXT,". *IEEE Trans. Circuits Syst. I: Reg. Papers*, 70(12):4793–4803, Dec. 2023.
- [37] C. Aprile et al. "An eight-lane 7-Gb/s/pin source synchronous single-ended RX with equalization and far-end crosstalk cancellation for backplane channels,". *IEEE J. Solid-State Circuits*, 53(3):861–872, Mar. 2018.
- [38] S. Yuan et al. "A 70 mW 25 Gb/s quarter-rate SerDes transmitter and receiver chipset with 40 dB of equalization in 65 nm CMOS technology,". *IEEE Trans. Circuits Syst. I: Reg. Papers*, 63(7):939–949, Jul. 2016.
- [39] Kyu-Dong Hwang and Lee-Sup Kim. "A 5 Gbps 1.6 mW/Gbps/CH adaptive crosstalk cancellation scheme with reference-less digital calibration and switched termination resistors for single-ended parallel interface,". *IEEE Trans. Circuits Syst. I: Reg. Papers*, 61(10):3016–3024, Oct. 2014.

# Acknowledgements

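It took six years in the integrated M.S./Ph.D. program before I could write the acknowledgements of this doctoral dissertation. Writing them out one character at a time brings back many feelings. I would like to thank the many people who helped and encouraged me during my doctoral studies. I cannot fit everyone on these pages, so I will thank in person those I could not mention here.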

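First, I thank my advisor, Professor Byungsub Kim, who guided me sometimes strictly and sometimes gently. From the attitude and mindset of a researcher to how to write a good paper, I was able to grow a great deal thanks to his meticulous guidance and passion. Even after graduation, I will strive to keep improving without forgetting his words and advice. Next, I thank Professor Hong-Jun Park, Professor Jae-Yoon Sim, and Professor Ho-jin Song, who gladly agreed to serve on my dissertation committee. I was fortunate to learn from Professor Park and Professor Sim through several courses. Professor Song, although his research field differs from mine, readily accepted and reviewed my dissertation with great insight and knowledge, for which I am grateful. Finally, I thank Dr. Jaeyoung Seo, my lab senior, close older friend, and external committee member. Thanks to the advice of the professors and Dr. Seo, I was able to improve this dissertation and broaden my research perspective, and I will work diligently to do my part as a researcher.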

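I sincerely thank the members of our BEVIL lab, who were with me throughout graduate school and helped me so much. Sueun, who was the lab head when I first joined: thank you for always welcoming me warmly and patiently answering my many questions. Seungho, who quietly and diligently did his work: thank you for all the advice you gave me on my research. Myeongguk, I asked you about and relied on you for so many things I did not know, whether it was my first tape-out, research, or experiments. I am always grateful. Since we are in the same high-speed I/O field, I hope we get a chance to work together. I thank Jaeyoung, a dependable senior who is also like a friend. Drinking together and talking about research and all sorts of small things made graduate school fly by. Thank you for coming straight down from Suwon to Pohang without a moment's hesitation when I asked you to serve on my committee. I will become a fine engineer so that your signature on this dissertation is never tarnished. (Ko) Jaehyun, who works hard across many areas: I learned a lot and had fun when we did our first tape-out together. It feels like only yesterday that we went to Jeju after the tape-out; time really flies. I enjoy working out together and talking with you these days. I believe you will do well wherever you go. Soongyu, whom I consider the most creative and smartest person in the lab: I hope you wrap up your current research well and pursue your dream. If you start a company, I will join later. I am always rooting for you. Iksu, who works quietly and hard: I hope this chip wraps up well and brings good results. It was great to share so many conversations while working in the same research area. Our conference trips to Jeju and Hawaii are still vivid in my memory, and I am glad we share so many good memories. Chanhyeong, with your ever-hearty laugh: it feels like not long ago that we went to the military training center together, and it is already 2025. I hope your research concludes well. Minsu, talented in so many ways: I hope your research bears good fruit too. Changyun from Section 6: it was good to spend such a long time in the same lab with a junior from my own undergraduate section. A lot has happened in the lab, but I believe you will achieve what you want. Junung, who researches and works hard: I hope your research wraps up well too. Being the same age made lab life more fun. I am rooting for Arsenal to finish first. Hyoseok, always positive, and Wonjun, full of enthusiasm for research: I hope you both achieve everything you want. Jiyun and (Jeong) Hojun of our I/O team: you work really hard and with passion, so I believe you will achieve great research results. Let's work hard together during my one-year postdoc. Changhun, (Ahn) Jihun, and Anik: although our time together was short, I was glad to get to know you and share lab life with you. Yejun and Junbeom, who defended alongside me: well done. I believe you will do well out in the world if you bring the same passion you showed in the lab. My cohort, Seongyu and (Park) Jihun: thank you. Without you, I could not have adapted to the lab and gotten through graduate school this well. It feels like not long ago that we drank together and cheered each other on. Let's all meet up in Seoul. Jaeik, Jaewoo, and Seongmin, who have graduated: you worked hard in the lab, and let's meet when I move up there.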

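I also thank those who helped me outside the lab during my doctoral studies. Youngjae, the eldest, who as president of 스터즈 dependably holds everyone together; (Park) Jaehyun, always positive, full of laughter, and a workout companion; Yeontae, who stayed late in the lab with me and is my exercise partner; and Hyeonseo, who tells such entertaining stories: thank you, my seniors in 스터즈, for the joy and happiness you gave me during graduate school. (Kim) Hojun, my cohort-mate: we entered graduate school together and are now graduating together. It was good to share so many conversations while researching in the same field. Let's keep in touch outside the lab too. Taeyeop, with whom I have shared middle school, high school, university, and graduate school, and Minseok, with whom I have shared the same cram school, high school, university, and graduate school: we have really been together a long time. Thank you always. I am too reserved to express it well in words, but I am glad we built so many memories together. I enjoyed sharing worries whenever they came up, and occasionally eating, having coffee, and going to coin karaoke together. Let's keep spending time like this after we go out into the world. Dasol, Chan, Wonjong, Seunghun, Hyeongwoo, Seongung, and Insu from Section 6: thank you. I think I was lucky to make such good seniors and friends as an undergraduate. I hope you all finish your degrees well, and let's keep in touch out in the world. I am always rooting for you.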

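Above all, I thank my family, who have supported me with so much devotion and love to bring me to where I am today. Without my mother's love and encouragement during graduate school, I could not have finished my degree well. I am who I am today because of you, Mother; thank you for always quietly standing behind me, loving me, and cheering me on. I am a reserved son who is not good at expressing feelings, but I want to say it through this dissertation: Mother, I truly love you. I will be a better son from now on. Changha, my younger brother, who looked after the family more than I did while I was in graduate school: I could never say it in words, but I am always grateful. You entered working life before me, and although I am the older brother, I feel there is so much I can learn from you. If hard times come, tell me and let's overcome them together. I hope our brotherly bond lasts forever. Changha, congratulations on moving to the company you wanted. I hope you do not stop there but keep taking on paths closer to your dream. I will take on challenges alongside you. Changhwan, our youngest: your final year of high school must have been so hard. You really did well. I am always sorry that, living on the mainland, I could not see you often or properly play the role of eldest brother. Whenever something worries you, tell me and let's solve it together. I dedicate the honor of receiving this doctoral degree to my family. Thank you. I love you.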



## Curriculum Vitae

Name: Changjae Moon

#### **Education**

- 2019. 3. – 2025. 2. Department of Electrical Engineering, Pohang University of Science and Technology (Ph.D.)
- 2014. 3. – 2018. 8. Department of Electrical Engineering, Pohang University of Science and Technology (B.S.)

#### **Experience**

- 2024. 10. – 2024. 11. Technical Research Personnel, alternative military service
- 2022. 02. – 2022. 02. Presented at 2022 IEEE International Solid-State Circuits Conference (ISSCC)

