# A Single-Ended Inverter-Based Addition-Only Feed-Forward Equalization Transmitter

Changjae Moon, Graduate Student Member, IEEE, Jaeyoung Seo, Myungguk Lee, Graduate Student Member, IEEE, Iksu Jang, Graduate Student Member, IEEE, and Byungsub Kim, Senior Member, IEEE

Abstract—This article presents an inverter-based four-tap addition-only feed-forward equalizing (A-FFE) transmitter (TX) for compact and power-efficient single-ended interfaces. Source-series terminated (SST) drivers are widely used in conventional FFE (C-FFE) TXs. However, linear resistors in C-FFE SST TXs occupy too much area and add significant parasitic capacitance, degrading power efficiency and output bandwidth. To overcome these problems, we propose a new feed-forward equalization (FFE) architecture dubbed A-FFE that completely eliminates subtractions between FFE taps and improves the robustness to quantization errors of coefficients. These advantages of the proposed architecture allow area-and-power-efficient inverter drivers to be used in FFE. An inverter-based four-tap A-FFE TX was designed and fabricated in a 28-nm CMOS process. The TX achieved a data rate of 20 Gb/s/pin, an eye height of 55.1 mV, and an eye width of 0.44 UI with a 15-dB PCB trace, while consuming 1.18 pJ/b and achieving a worst-case eye sensitivity of 68%. The eye opening decreased by only 13.6% when the most sensitive FFE coefficient was reduced by 20%. Because it uses area-efficient inverter drivers without resistors, the TX occupies only 1149 $\mu$m<sup>2</sup>.

Index Terms—Addition-only feed-forward equalization (A-FFE), eye sensitivity, high-speed interface, relaxed impedance matching, robustness against quantization errors, single-ended signaling.

Manuscript received 4 December 2023; revised 13 March 2024; accepted 28 April 2024. Date of publication 14 May 2024; date of current version 24 October 2024. This article was approved by Associate Editor Sam Palermo. This work was supported in part by the Institute of Information and Communications Technology Planning and Evaluation Grant funded by the Korean Government under Grant 2022-0-01171; in part by the Next-Generation Intelligence Semiconductor Foundation under Grant RS-2023-00258227; in part by the National Research and Development Program through the NRF of Korea funded by the Ministry of Science and ICT under Grant 2020M3H2A107804514; in part by the Department of Electrical Engineering, POSTECH, through the BK21 FOUR Project of NRF; in part by the Design and Application of Next Generation Non-Volatile Memory Hierarchy Cluster Academia Collaboration Program funded by Samsung Electronics; and in part by Samsung Electronics Company Ltd. under Grant IO201211-08055-01. (Corresponding author: Byungsub Kim.)

Changjae Moon, Myungguk Lee, and Iksu Jang are with the Department of Electrical Engineering, Pohang University of Science and Technology, Pohang 37673, South Korea (e-mail: moonchangiae@postech.ac.kr).

Jaeyoung Seo is with Samsung Electronics, Hwaseong 18448, South Korea. Byungsub Kim is with the Department of Electrical Engineering, the Department of Convergence IT Engineering, the Department of Semiconductor Engineering, and the Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang 37673, South Korea, and also with the Institute for Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 03722, South Korea (e-mail: byungsub@postech.ac.kr).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2024.3397046.

Digital Object Identifier 10.1109/JSSC.2024.3397046

#### I. INTRODUCTION

The demand for area-and-power-efficient input/output (I/O) circuits has consistently increased in single-ended high-speed interfaces such as graphics double data rate (GDDR) [1]. A feed-forward equalization (FFE) transmitter (TX) is a key circuit for overcoming the bandwidth limitations in such applications [2], [3]. As the design constraints on area and power consumption become more stringent, the efficiency of FFE drivers must be improved.

Source-series terminated (SST) drivers are commonly adopted in conventional FFE (C-FFE) TXs. A differential SST driver theoretically consumes only a quarter of the power of a differential current-mode logic (CML) driver for the same output swing because termination resistors are inserted in series rather than in parallel [11]. While an SST driver offers good linearity and impedance matching, the termination resistor occupies too much area and adds significant parasitic capacitance, dissipating additional power and degrading the output bandwidth, as shown in Fig. 1(a).

On the other hand, an inverter driver [6] has a small area, good power efficiency, and a large output voltage swing because it does not have a series resistor for termination, as in Fig. 1(b). An inverter driver may suffer from a signal integrity problem because the driver's impedance is not necessarily 50 $\Omega$, and it changes with the output voltage level [6]. This problem can be easily solved by relaxed impedance matching [3], [4], [5], [6], [7], [8]. Utilizing only the receiver-side termination of 50 $\Omega$ improves the voltage swing and driver area at the cost of a negligible penalty in signal integrity [6].

Fig. 2 shows the schematics of an SST driver and an inverter driver. A single-ended SST driver with matched termination on both RX and TX consumes an average power of $VDD^2/(4Z_0)$, and the output swing amplitude of the TX is $VDD/2$, where $Z_0$ is the characteristic impedance of the channel. In contrast, a single-ended inverter driver with matched RX termination and without matched TX termination consumes an average power of $VDD^2/[2(R_{TX} + Z_0)]$, and the output swing amplitude of the TX is $(VDD \cdot Z_0)/(R_{TX} + Z_0)$, where $R_{TX}$ is the impedance of the TX [6]. Therefore, an inverter output driver whose impedance is lower than $Z_0$ has a larger output swing than the conventional SST driver at the cost of more power [6].
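As a quick numerical illustration of the two driver expressions above, the following Python sketch evaluates the average power and the output swing of both drivers. The values VDD = 1.0 V, $Z_0$ = 50 $\Omega$, and $R_{TX}$ = 30 $\Omega$, as well as the function names, are assumptions chosen only for illustration; they are not taken from the fabricated design.

```python
def sst_driver(vdd, z0):
    """Average power (W) and output swing (V) of a single-ended SST driver
    with matched termination on both TX and RX."""
    power = vdd ** 2 / (4 * z0)
    swing = vdd / 2
    return power, swing


def inverter_driver(vdd, z0, r_tx):
    """Average power (W) and output swing (V) of a single-ended inverter
    driver with matched RX termination and TX impedance r_tx."""
    power = vdd ** 2 / (2 * (r_tx + z0))
    swing = vdd * z0 / (r_tx + z0)
    return power, swing


# Hypothetical operating point (assumed values, not from the paper).
VDD, Z0, R_TX = 1.0, 50.0, 30.0

p_sst, s_sst = sst_driver(VDD, Z0)
p_inv, s_inv = inverter_driver(VDD, Z0, R_TX)

print(f"SST:      {p_sst * 1e3:.2f} mW, swing {s_sst:.3f} V")
print(f"Inverter: {p_inv * 1e3:.2f} mW, swing {s_inv:.3f} V")
```

With these assumed values, the inverter driver gives a 0.625-V swing at 6.25 mW, compared with 0.5 V at 5 mW for the SST driver. When $R_{TX} = Z_0$, both expressions coincide.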

However, the prior single-ended inverter-based TX [6] does not contain an FFE because two major disadvantages of the inverter-based C-FFE TX [Fig. 1(b)] have not been solved: 1) the inverter-based C-FFE is very sensitive to quantization errors of FFE coefficients and 2) the inverter-based C-FFE TX consumes large power in FFE tap subtraction.

0018-9200 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Fig. 1. Comparison of four-tap TX FFE design options. (a) SST-based C-FFE, (b) inverter-based C-FFE, and (c) proposed inverter-based A-FFE.

Fig. 2. Schematic of (a) SST driver and (b) inverter driver. The output swing amplitudes and average power consumption of the drivers are shown.

First, the inverter-based C-FFE TX is very sensitive to quantization errors of tap coefficients. Fig. 3 shows 20-Gb/s eye diagrams simulated with a 20-dB loss channel. Fig. 3(a) and (b) show the eye diagrams of an inverter-based four-tap C-FFE TX without and with a 20% error in the size of the tap driver of the most sensitive FFE coefficient. The eye height is almost 100 mV without the quantization error [Fig. 3(a)]. However, when the most sensitive tap coefficient (the main tap) is reduced by 20%, the eye height decreases by 76% [Fig. 3(b)], showing that the inverter-based C-FFE TX is highly vulnerable to quantization errors of tap coefficients.

Moreover, the output of the inverter-based C-FFE TX cannot be accurately controlled due to the tap subtraction. Because of the nonlinear characteristics of inverter drivers, the tap coefficients and the output voltage are nonlinearly affected by errors of tap coefficients. Fig. 4 shows output voltage histograms when the input data pattern ($D_{\text{pre}}$, $D_{\text{main}}$, $D_{\text{post1}}$, $D_{\text{post2}}$) is (-1, -1, -1, -1). The histograms were acquired by Monte Carlo simulation with 1000 samples. The average output voltage and the $3\sigma$ output voltage variation of the C-FFE TX are 454.2 and 61.77 mV, respectively [Fig. 4(a)]. Because both PMOS and NMOS transistors of inverter-based drivers are turned on, the C-FFE TX's output voltage is sensitive to changes in the characteristics of PMOS and NMOS transistors due to the nonlinearity of the inverter-based drivers. Therefore, the output voltage cannot be controlled accurately.

Fig. 3. 20-Gb/s eye diagrams of an inverter-based four-tap C-FFE TX (a) without and (b) with a 20% error on the most sensitive tap coefficient (the main cursor). The 20-Gb/s eye diagrams of the inverter-based four-tap A-FFE TX (c) without and (d) with a 20% error on the most sensitive tap coefficient (1st post-cursor).

Fig. 4. Output voltage histograms of (a) inverter-based four-tap C-FFE TX and (b) inverter-based four-tap A-FFE TX, when the input data pattern ($D_{\text{pre}}$, $D_{\text{main}}$, $D_{\text{post1}}$, $D_{\text{post2}}$) is (-1, -1, -1, -1).

Fig. 5. Current flows of (a) inverter-based C-FFE TX and (b) corresponding proposed inverter-based A-FFE TX for the same FFE operation. The C-FFE TX is subtracting FFE taps. The average power consumptions of the drivers are also shown.

Second, the inverter-based C-FFE TX consumes large power due to FFE tap subtraction. For example, when pull-up PMOSs and pull-down NMOSs turn on simultaneously during FFE tap subtraction [Fig. 5(a)], the current flowing from the power supply to the ground is wasted as it does not contribute to pull-up or pull-down of the output signal. Consequently, unnecessary driver power is consumed.

Fig. 6. (a) Example design of the four-tap B-FFE architecture. (b) Single-bit response and tap driver outputs of the four-tap B-FFE example.

To address the problems above, Han et al. [9], [10] proposed a coefficient-error-robust FFE (B-FFE). B-FFE employs a transition detection filter to improve robustness against FFE coefficient errors [Fig. 6(a)] and also to reduce the power consumption due to tap subtraction [Fig. 6(b)]. However, the prior B-FFE work has three problems: 1) it still has subtractions between its taps; 2) the sum of the magnitudes of the B-FFE's coefficients is larger than that of the C-FFE, and thus, the tap driver sizes are unnecessarily larger than in the ideal design; and 3) B-FFE was not demonstrated with a voltage-mode driver, which is appropriate for single-ended design. Instead, the prior B-FFE was designed only with a CML-type current-mode driver. Therefore, additional improvements are required for single-ended inverter-based FFE TXs.

In this article, we propose a new FFE [3] architecture [Fig. 1(c)] that can solve the aforementioned problems of the inverter-based FFE. For convenience, the proposed FFE will be referred to as addition-only FFE (A-FFE). If there is no quantization error of FFE coefficients, we can always find an A-FFE architecture whose input/output response is mathematically identical to that of any C-FFE or B-FFE. However, A-FFE differs from the other two FFEs in that its output can be produced only by the addition of taps in most practical applications. With the quantization errors of FFE coefficients, error signals caused by FFE coefficient errors are also effectively suppressed by the channel loss as in B-FFE [9], [10]. Due to this merit of the A-FFE, the eye height of an inverter-based A-FFE TX does not seriously decrease with the quantization errors of FFE coefficients [Fig. 3(d)]. Without the quantization errors, its eye height is almost the same as that of a C-FFE TX [Fig. 3(c)]. When the most sensitive coefficient (1st post-cursor) has an error of -20%, the eye height decreases by only 14% [Fig. 3(d)]. Also, the A-FFE TX is more robust to tap coefficient errors due to process variation and mismatch than the C-FFE TX. The output voltages of the C-FFE TX and A-FFE TX have nearly the same average values of 454.2 and 450.7 mV, respectively, when the input data pattern ($D_{\text{pre}}$, $D_{\text{main}}$, $D_{\text{post1}}$, $D_{\text{post2}}$) is (-1, -1, -1, -1). However, the $3\sigma$ output voltage variation of the A-FFE TX is 3.8 times smaller than that of the C-FFE TX [Fig. 4(b)] because only the PMOS or only the NMOS transistors of the A-FFE TX are turned on.

TABLE I
A-FFE SUB-FILTER OUTPUTS

| $x_{[n-k]}$, $x_{[n-m]}$ | Difference filter output ($b_{[n-k]}$) | Average filter output ($b_{[n-k]}$) |
|--------------------------|----------------------------------------|-------------------------------------|
| -1, -1                   | 0                                      | -1                                  |
| -1, +1                   | +1                                     | 0                                   |
| +1, -1                   | -1                                     | 0                                   |
| +1, +1                   | 0                                      | +1                                  |

In addition, the A-FFE has no subtraction between FFE taps and thus consumes less power. For example, when a pseudorandom binary sequence-31 (PRBS-31) data pattern is transmitted, the proposed FFE driver consumes only about 30% of the average power of the conventional driver (Fig. 5).

The rest of this article is organized as follows. Section II mathematically explains the A-FFE architecture. Section III theoretically analyzes the robustness to quantization errors of coefficients compared with the C-FFE. Section IV describes the circuit design of the proposed inverter-based A-FFE TX. Section V shows the experimental results and a comparison with the prior art. Section VI provides the conclusion.

#### II. ARCHITECTURE

Fig. 7 illustrates the architectures of an N-tap C-FFE TX and the corresponding N-tap A-FFE TX. The A-FFE is composed of a shift register consisting of N delay units (D), an adder, and N-1 simple digital sub-filters, whereas the corresponding C-FFE does not have sub-filters. In Fig. 7, $x_{[n]}$ is the digital data input whose value is either "1" or "-1," representing a binary number of "1" or "0," respectively. $v_{[n]}$ is the output of the FFE TX. k is the tap position index of the shift register and also corresponds to the elapsed delay time from the input $x_{[n]}$ to the (k+1)th data tap $x_{[n-k]}$ of the A-FFE. For convenience, we will use "m" as the tap position index of the main tap; $x_{[n-m]}$ is the main data tap in Fig. 7. For the A-FFE to produce the same output as the C-FFE, they must have the same main tap position. In the A-FFE, a sub-filter is assigned to every pair of the main data tap $x_{[n-m]}$ and a non-main data tap $x_{[n-k]}$, where $k \neq m$. It takes these two inputs and produces one output $b_{[n-k]}$. The A-FFE output is the weighted sum of all sub-filter output taps ($b_{[n-k]}$), where $k = 0, \ldots, N - 1$.

There are two types of digital sub-filters [Fig. 7(b)]: a difference filter (red) and an average filter (blue). The output of a difference filter is half the difference between its two inputs (the main data tap minus the non-main data tap; see Table I). The output of the average filter is the average of its two inputs. Table I presents the mapping between inputs ($x_{[n-k]}$ and $x_{[n-m]}$) and outputs of the A-FFE sub-filters. To make the outputs of the A-FFE and the C-FFE identical, a difference filter must be used for the main data tap $x_{[n-m]}$ and the (k+1)th data tap $x_{[n-k]}$ of the A-FFE if the (k+1)th coefficient $w_k$ of the corresponding C-FFE is negative. In contrast, if the (k+1)th coefficient $w_k$ of the C-FFE is positive, the average filter must be used for $x_{[n-m]}$ and $x_{[n-k]}$ of the A-FFE. Note that $b_{[n-m]} = x_{[n-m]}$, which is the output of the average filter with two identical inputs of the main data tap ($x_{[n-m]}$ and $x_{[n-m]}$). Including $b_{[n-m]}$, we will simply refer to the sub-filter output tap as $b_{[n-k]}$, where $k=0,\ldots,N-1$. All sub-filter output taps $b_{[n-k]}$ are multiplied by the corresponding A-FFE coefficients $a_k$ and then added up. The summation result is the output $v_{[n]}$ of the A-FFE, as illustrated in Fig. 7(b).

Fig. 7. Block diagrams of (a) N-tap C-FFE TX and (b) N-tap A-FFE TX. In (b), $x_{[n-m]}$ is the main tap and $x_{[n-k]}$, $k \neq m$, is a non-main tap. A red sub-filter weights the non-main tap by -0.5 (used if the k-th C-FFE coefficient is negative); a blue sub-filter weights it by +0.5 (used if the k-th C-FFE coefficient is positive); $b_{[n-m]} = x_{[n-m]}$.
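The two sub-filters and the selection rule described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the orientation of the difference (main tap minus non-main tap) is inferred from Table I.

```python
from itertools import product


def difference_filter(x_k, x_m):
    """Half the difference between the main tap x_m and the non-main tap x_k."""
    return (x_m - x_k) / 2


def average_filter(x_k, x_m):
    """Average of the main tap x_m and the non-main tap x_k."""
    return (x_m + x_k) / 2


def sub_filter(x_k, x_m, w_k):
    """Pick the sub-filter from the sign of the C-FFE coefficient w_k:
    difference filter if w_k is negative, average filter otherwise."""
    if w_k < 0:
        return difference_filter(x_k, x_m)
    return average_filter(x_k, x_m)


# Reproduce the rows of Table I: (x_k, x_m, difference output, average output).
table = [
    (x_k, x_m, difference_filter(x_k, x_m), average_filter(x_k, x_m))
    for x_k, x_m in product((-1, 1), repeat=2)
]

for row in table:
    print(row)
```

For every input pair, exactly one of the two filters returns a non-zero value, and that value has the sign of the main tap.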

We can always find the A-FFE coefficients $a_k$ ($k = 0, \ldots, N-1$) that make both FFE outputs mathematically identical. For simple derivation, we will use the following vector variables to describe the architectures of A-FFE and C-FFE. $\underline{b}$ is the column vector of the sub-filter output taps of A-FFE: $\underline{b} = [b_{[n]} \ b_{[n-1]} \ \cdots \ b_{[n-m]} \ \cdots \ b_{[n-N+1]}]^T$. $\underline{x}$ is the column vector of the data taps: $\underline{x} = [x_{[n]} \ x_{[n-1]} \ \cdots \ x_{[n-N+1]}]^T$. Note that both C-FFE and A-FFE have the same $\underline{x}$ for the same data inputs. $\underline{w}$ is the column vector of the normalized C-FFE coefficients: $\underline{w} = [w_0 \ w_1 \ \cdots \ w_{N-1}]^T$ and $\sum_{k=0}^{N-1} |w_k| = 1$. $\underline{a}$ is the column vector of the A-FFE coefficients: $\underline{a} = [a_0 \ a_1 \ \cdots \ a_{N-1}]^T$. Because the output $b_{[n-k]}$ of the (k+1)th sub-filter of A-FFE can be expressed as follows:

$$b_{[n-k]} = 0.5 \left( x_{[n-m]} + \frac{w_k}{|w_k|} x_{[n-k]} \right) \tag{1}$$

$\underline{b}$ can be described in terms of $\underline{x}$ using an $N \times N$ matrix $\mathbf{A}$ as follows:

$$\underline{b} = \mathbf{A}\underline{x} \tag{2}$$

where

$$\mathbf{A} = \begin{bmatrix} 0.5 & \cdots & 0 & 0.5 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & -0.5 & 0.5 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0.5 & -0.5 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0.5 & 0 & \cdots & 0.5 \end{bmatrix}$$

$$= 0.5 \left( \begin{bmatrix} 1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & -1 & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & -1 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 1 \end{bmatrix} + \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix} \right)$$

$$= 0.5(\mathbf{W_{sign}} + \mathbf{C}). \tag{3}$$

In (3), $\mathbf{W_{sign}}$ is an $N \times N$ diagonal matrix of which the (k+1)th diagonal element is the sign of the corresponding (k+1)th C-FFE coefficient $w_k$. $\mathbf{C}$ is an $N \times N$ matrix in which the (m+1)th column vector is filled with "1"s, and the other elements are "0"s. From Fig. 7(b), the output $v_{[n]}$ of the A-FFE can be expressed in terms of $\underline{a}$ and $\underline{b}$, and then can be reformulated in terms of $\underline{a}$, $\underline{x}$, and $\mathbf{A}$ as follows:

$$v_{[n]} = \underline{a}^{\mathrm{T}}\underline{b} = \underline{a}^{\mathrm{T}}(\mathbf{A}\underline{x}) = (\underline{a}^{\mathrm{T}}\mathbf{A})\underline{x} \tag{4}$$

by using (2). By assuming that the C-FFE and the A-FFE have the identical output  $v_{[n]}$ , the same output  $v_{[n]}$  of C-FFE can be also expressed in terms of  $\underline{w}$  and  $\underline{x}$  as follows:

$$v_{[n]} = \underline{w}^{\mathrm{T}} \underline{x}. \tag{5}$$

From (4) and (5), the outputs of both FFEs are identical if

$$\underline{w} = \mathbf{A}^{\mathrm{T}} \underline{a}. \tag{6}$$

By using (6), we can always find the C-FFE that produces the same output as any A-FFE.
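As a small numerical check of (3) and (6), the following Python sketch builds $\mathbf{A}$ for a four-tap example and evaluates $\underline{w} = \mathbf{A}^{\mathrm{T}}\underline{a}$. The helper names `build_A` and `a_to_w` are hypothetical; the sign pattern and the A-FFE coefficients are taken from the 20-dB row of Table II, with the main tap at index m = 1.

```python
def build_A(signs, m):
    """Build A = 0.5 * (W_sign + C) from (3) as a list of rows.

    signs[k] is the sign (+1 or -1) of the C-FFE coefficient w_k and
    m is the index of the main tap (signs[m] is assumed to be +1).
    """
    n = len(signs)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        A[k][k] += 0.5 * signs[k]  # 0.5 * W_sign: sign of w_k on the diagonal
        A[k][m] += 0.5             # 0.5 * C: ones in the (m+1)th column
    return A


def a_to_w(A, a):
    """Evaluate w = A^T a from (6)."""
    n = len(a)
    return [sum(A[k][j] * a[k] for k in range(n)) for j in range(n)]


SIGNS = [-1, +1, -1, +1]          # signs of (W_pre, W_main, W_post1, W_post2)
M = 1                             # main tap index
a = [0.32, 0.08, 0.56, 0.04]      # A-FFE coefficients, 20-dB row of Table II

A = build_A(SIGNS, M)
w = a_to_w(A, a)
print([round(v, 4) for v in w])
```

The result should agree, up to floating-point rounding, with the C-FFE coefficients listed in the same row of Table II.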

More explicit closed-form formulas of the A-FFE coefficients can be derived from (6) in terms of the C-FFE coefficients. The column vector of C-FFE coefficients  $\underline{w}$  can be expressed as follows:

$$\underline{w} = \mathbf{W_{sign}} \underline{w_{abs}} \tag{7}$$

where $\underline{w_{abs}}$ is a column vector of the absolute values of the C-FFE coefficients: $\underline{w_{abs}} = [|w_0| \ |w_1| \ |w_2| \ \cdots \ |w_{N-1}|]^T$. By substituting (3) and (7) into (6), we can derive (8) because $\mathbf{W_{sign}}$ is a symmetric matrix:

$$\mathbf{W_{sign}} \underline{w_{abs}} = 0.5(\mathbf{W_{sign}} + \mathbf{C}^{\mathrm{T}})\underline{a}. \tag{8}$$

By multiplying both sides of (8) by $\mathbf{W_{sign}}$ from the left, (9) is acquired:

$$\mathbf{W}_{\text{sign}}^2 \underline{w_{\text{abs}}} = 0.5 (\mathbf{W}_{\text{sign}}^2 + \mathbf{W}_{\text{sign}} \mathbf{C}^{\text{T}}) \underline{a}. \tag{9}$$

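Because $\mathbf{W}_{\text{sign}}^2 = \mathbf{I}$, where $\mathbf{I}$ is the $N \times N$ identity matrix, and $\mathbf{W}_{\text{sign}}\mathbf{C}^{\text{T}} = \mathbf{C}^{\text{T}}$ (the only non-zero row of $\mathbf{C}^{\text{T}}$ is the (m+1)th row, which is scaled by the positive sign of the main tap coefficient $w_m$), (9) can be simplified to the following equation: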

$$\underline{w_{\text{abs}}} = 0.5 (\mathbf{I} + \mathbf{C}^{\text{T}}) \underline{a}$$

$$= 0.5 \begin{bmatrix}
1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
1 & \cdots & 1 & 2 & 1 & \cdots & 1 \\
0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\underline{a}. \tag{10}$$

By solving (10) for each element of $\underline{a}$, we can get the explicit closed-form formulas for A-FFE's coefficients in terms of the absolute values of C-FFE coefficients as follows:

$$a_{k\neq m} = 2|w_k|, \quad a_m = w_m - \sum_{\substack{k=0\\k\neq m}}^{N-1} |w_k|. \tag{11}$$

By using (6) or (11), for the given C-FFE design, we can always find the A-FFE design that produces the identical output and vice versa. Although the structures and coefficients of the two FFEs acquired by (6) or (11) are different, both FFEs have mathematically identical outputs  $v_{[n]}$  if there is no coefficient error. Therefore, the mapping between C-FFE and A-FFE always exists.
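The mapping in (11) and the claimed equivalence of the two outputs can be checked with a short Python sketch. The function names are hypothetical, the example coefficients are the 20-dB row of Table II, and the tap vector is ordered as (pre, main, post1, post2) with the main tap at index m = 1; the check simply enumerates all 16 data patterns.

```python
from itertools import product


def c_ffe_to_a_ffe(w, m):
    """Map C-FFE coefficients w to A-FFE coefficients a using (11)."""
    a = [2 * abs(wk) for wk in w]
    a[m] = w[m] - sum(abs(wk) for k, wk in enumerate(w) if k != m)
    return a


def c_ffe_output(w, x):
    """C-FFE output: weighted sum of the data taps, as in (5)."""
    return sum(wk * xk for wk, xk in zip(w, x))


def a_ffe_output(a, w, m, x):
    """A-FFE output: weighted sum of the sub-filter outputs from (1)."""
    total = 0.0
    for k, (ak, wk) in enumerate(zip(a, w)):
        sign = 1 if wk >= 0 else -1
        b_k = 0.5 * (x[m] + sign * x[k])  # equals x[m] when k == m
        total += ak * b_k
    return total


W_20DB = [-0.16, 0.54, -0.28, 0.02]  # C-FFE coefficients, Table II (20 dB)
M = 1

a = c_ffe_to_a_ffe(W_20DB, M)

# Largest output mismatch over all 16 patterns of (pre, main, post1, post2).
max_diff = max(
    abs(c_ffe_output(W_20DB, x) - a_ffe_output(a, W_20DB, M, x))
    for x in product((-1, 1), repeat=len(W_20DB))
)

print([round(v, 4) for v in a], max_diff)
```

The computed coefficients should match the A-FFE column of Table II, and the mismatch should vanish up to floating-point rounding.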

Equations (1) and (11) also prove that A-FFE does not have tap subtraction in most practical applications where the main tap coefficient ($w_m$) of the corresponding C-FFE is not smaller than 0.5. The A-FFE output $v_{[n]}$ is a weighted sum of the sub-filter output taps $b_{[n-k]}$: $v_{[n]}=\sum_{k=0}^{N-1}b_{[n-k]}a_k$. Therefore, if all non-zero terms $b_{[n-k]}a_k$ have the same sign, there is no analog subtraction between taps. We can prove this proposition by showing that all non-zero $b_{[n-k]}$ have the same sign using (1) and that $a_k \ge 0$ for all $k = 0, \ldots, N-1$ in practically interesting applications ($w_m \ge 0.5$) using (11). According to (1), whose results are also listed in Table I, all non-zero sub-filter output taps $b_{[n-k]}$, including $k = m$, have the same sign as the main data tap $x_{[n-m]} = b_{[n-m]}$. Because $a_{k\neq m} = 2|w_k|$ in (11), it is trivial that $a_k \ge 0$ for all $k \ne m$. We are mostly interested in practical single-ended channels for which the C-FFE's main tap coefficient satisfies $w_m \ge 0.5$. Because the C-FFE's coefficients are normalized ($\sum_{k=0}^{N-1} |w_k| = 1$), $w_m \ge 0.5$ implies $\sum_{k=0, k \ne m}^{N-1} |w_k| = 1 - w_m \le 0.5 \le w_m$, and thus $a_m = w_m - \sum_{k=0, k \ne m}^{N-1} |w_k| \ge 0$ from (11). Therefore, (11) proves that $a_k \ge 0$ for all $k$ in practical applications. Because all non-zero $b_{[n-k]}$ have the same sign and $a_k \ge 0$ for all $k$, there is no analog subtraction between taps of A-FFE in most practical applications. Table II shows the tap coefficients of C-FFE and A-FFE for various channel losses. In the simulation, when the PCB channel loss is 30 dB, the size of the C-FFE's main tap is 0.5. Therefore, A-FFE TX can be used when the channel loss is less than 30 dB.

TABLE II
TAP COEFFICIENTS OF C-FFE AND A-FFE FOR
VARIOUS CHANNEL LOSSES

| Channel Loss | C-FFE's Tap Coefficients (W<sub>pre</sub>, W<sub>main</sub>, W<sub>post1</sub>, W<sub>post2</sub>) | A-FFE's Tap Coefficients (A<sub>pre</sub>, A<sub>main</sub>, A<sub>post1</sub>, A<sub>post2</sub>) |
|--------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| 20 dB        | -0.16, +0.54, -0.28, +0.02                                                                           | +0.32, +0.08, +0.56, +0.04                                                                           |
| 25 dB        | -0.18, +0.52, -0.28, +0.02                                                                           | +0.36, +0.04, +0.56, +0.04                                                                           |
| 30 dB        | -0.19, +0.5, -0.29, +0.02                                                                            | +0.38, +0, +0.58, +0.04                                                                              |

Fig. 8. Example designs of the identical four-tap FFE employing (a) C-FFE and (b) A-FFE architectures. The single-bit responses and tap driver outputs of (c) C-FFE and (d) A-FFE examples.
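The no-subtraction argument can also be checked by brute force. The Python sketch below enumerates every data pattern and reports whether any two tap contributions $b_{[n-k]}a_k$ have opposite signs. It is only an illustration: the function names and the tolerance `EPS` are assumptions, and the coefficient set `WEAK_MAIN` with $w_m < 0.5$ is a hypothetical counterexample that does not appear in Table II.

```python
from itertools import product

EPS = 1e-12  # contributions smaller than this are treated as zero


def tap_contributions(w, m, x):
    """Per-tap A-FFE contributions a_k * b[n-k] for the tap vector x."""
    others = sum(abs(wk) for k, wk in enumerate(w) if k != m)
    terms = []
    for k, wk in enumerate(w):
        a_k = w[m] - others if k == m else 2 * abs(wk)  # (11)
        sign = 1 if wk >= 0 else -1
        b_k = 0.5 * (x[m] + sign * x[k])                # (1)
        terms.append(a_k * b_k)
    return terms


def has_subtraction(w, m):
    """True if some data pattern yields contributions of opposite signs."""
    for x in product((-1, 1), repeat=len(w)):
        terms = tap_contributions(w, m, x)
        if any(t > EPS for t in terms) and any(t < -EPS for t in terms):
            return True
    return False


# C-FFE coefficients from Table II, keyed by channel loss in dB.
TABLE_II = {
    20: [-0.16, 0.54, -0.28, 0.02],
    25: [-0.18, 0.52, -0.28, 0.02],
    30: [-0.19, 0.50, -0.29, 0.02],
}
results = {loss: has_subtraction(w, 1) for loss, w in TABLE_II.items()}

# Hypothetical coefficient set with w_m < 0.5 (not from the paper).
WEAK_MAIN = [-0.25, 0.45, -0.28, 0.02]
weak_result = has_subtraction(WEAK_MAIN, 1)

print(results, weak_result)
```

For the three rows of Table II no pattern produces opposing contributions, whereas the hypothetical set with $w_m = 0.45$ yields a negative $a_m$ and therefore a subtraction.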

For better understanding, Fig. 8 shows example designs of four-tap FFE employing C-FFE and A-FFE architectures, and the waveforms of their single-bit responses as well as their tap drivers' outputs. A-FFE's coefficients are acquired from the C-FFE's coefficients by (11). Although C-FFE and A-FFE have different outputs of tap drivers in Fig. 8(c) and (d), both



Fig. 9. Error signals when C-FFE and A-FFE TXs transmit a single bit pulse at 20 Gb/s, and there are 20% quantization errors on (a) (m+1)th C-FFE tap coefficient, (b) (m+2)th A-FFE tap coefficient, (c) (m+3)th A-FFE tap coefficient, and (d) (m+4)th A-FFE tap coefficient. A 1st-order RC channel with a loss of 15 dB at Nyquist frequency and a time constant of 88 ps is employed for the simulations.

TABLE III
FFE OUTPUT FORMULAS OF C-FFE AND A-FFE IN TERMS
OF FFE COEFFICIENTS

| D <sub>pre</sub> D <sub>main</sub> D <sub>post1</sub> D <sub>post2</sub> | CFFE sum (Coeff : -W <sub>pre</sub> ,<br>+W <sub>main</sub> , -W <sub>post1</sub> , +W <sub>post2</sub> ) | AFFE sum (Coeff : +A <sub>pre</sub> ,<br>+A <sub>main</sub> , +A <sub>post1</sub> , +A <sub>post2</sub> ) |
|--------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| -1 -1 -1 -1                                                              | +W <sub>pre</sub> -W <sub>main</sub> +W <sub>post1</sub> -W <sub>post2</sub>                              | -A <sub>main</sub> -A <sub>post2</sub>                                                                    |
| -1 -1 -1 +1                                                              | +W <sub>pre</sub> -W <sub>main</sub> +W <sub>post1</sub> +W <sub>post2</sub>                              | -A <sub>main</sub>                                                                                        |
| -1 -1 +1 -1                                                              | +W <sub>pre</sub> -W <sub>main</sub> -W <sub>post1</sub> -W <sub>post2</sub>                              | -A <sub>main</sub> -A <sub>post1</sub> -A <sub>post2</sub>                                                |
| -1 -1 +1 +1                                                              | +W <sub>pre</sub> -W <sub>main</sub> -W <sub>post1</sub> +W <sub>post2</sub>                              | -A <sub>main</sub> -A <sub>post1</sub>                                                                    |
| -1 +1 -1 -1                                                              | +W <sub>pre</sub> +W <sub>main</sub> +W <sub>post1</sub> -W <sub>post2</sub>                              | +A <sub>pre</sub> +A <sub>main</sub> +A <sub>post1</sub>                                                  |
| -1 +1 -1 +1                                                              | +W <sub>pre</sub> +W <sub>main</sub> +W <sub>post1</sub> +W <sub>post2</sub>                              | +A <sub>pre</sub> +A <sub>main</sub> +A <sub>post1</sub> +A <sub>post2</sub>                              |
| -1 +1 +1 -1                                                              | +W <sub>pre</sub> +W <sub>main</sub> -W <sub>post1</sub> -W <sub>post2</sub>                              | +A <sub>pre</sub> +A <sub>main</sub>                                                                      |
| -1 +1 +1 +1                                                              | +W <sub>pre</sub> +W <sub>main</sub> -W <sub>post1</sub> +W <sub>post2</sub>                              | +A <sub>pre</sub> +A <sub>main</sub> +A <sub>post2</sub>                                                  |
| +1 -1 -1 -1                                                              | -W <sub>pre</sub> -W <sub>main</sub> +W <sub>post1</sub> -W <sub>post2</sub>                              | -A <sub>pre</sub> -A <sub>main</sub> -A <sub>post2</sub>                                                  |
| +1 -1 -1 +1                                                              | -W <sub>pre</sub> -W <sub>main</sub> +W <sub>post1</sub> +W <sub>post2</sub>                              | -A <sub>pre</sub> -A <sub>main</sub>                                                                      |
| +1 -1 +1 -1                                                              | -W <sub>pre</sub> -W <sub>main</sub> -W <sub>post1</sub> -W <sub>post2</sub>                              | -Apre -Amain -Apost1 -Apost2                                                                              |
| +1 -1 +1 +1                                                              | -W <sub>pre</sub> -W <sub>main</sub> -W <sub>post1</sub> +W <sub>post2</sub>                              | -Apre -Amain -Apost1                                                                                      |
| +1 +1 -1 -1                                                              | -W <sub>pre</sub> +W <sub>main</sub> +W <sub>post1</sub> -W <sub>post2</sub>                              | +A <sub>main</sub> +A <sub>post1</sub>                                                                    |
| +1 +1 -1 +1                                                              | -W <sub>pre</sub> +W <sub>main</sub> +W <sub>post1</sub> +W <sub>post2</sub>                              | +A <sub>main</sub> +A <sub>post1</sub> +A <sub>post2</sub>                                                |
| +1 +1 +1 -1                                                              | -W <sub>pre</sub> +W <sub>main</sub> -W <sub>post1</sub> -W <sub>post2</sub>                              | +A <sub>main</sub>                                                                                        |
| +1 +1 +1 +1                                                              | -W <sub>pre</sub> +W <sub>main</sub> -W <sub>post1</sub> +W <sub>post2</sub>                              | +A <sub>main</sub> +A <sub>post2</sub>                                                                    |

FFEs have identical single-bit responses. However, the A-FFE operates differently from the C-FFE: the A-FFE produces the output voltage only by adding the tap drivers' outputs [Fig. 8(d)], whereas the C-FFE performs analog subtractions between the tap drivers' outputs [Fig. 8(c)]. Note also that the A-FFE's tap drivers, except the main one, are enabled only when necessary for FFE operation, whereas all the tap drivers of the C-FFE are always enabled. Table III summarizes the formulas of the outputs of the C-FFE and the A-FFE in terms of their tap coefficients. For all input patterns, the A-FFE has no analog subtraction whereas the C-FFE does. Therefore, the A-FFE avoids the unnecessary power consumed by analog subtraction between the tap drivers' outputs.
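As an illustrative check of Table III, the following Python sketch compares the two output formulas for all 16 input patterns. The enable rules in `a_ffe_terms` and the coefficient mapping in `c_to_a` are inferred here from the rows of Table III and are assumptions of this sketch, not the implemented decoding logic; the weights in `W` are hypothetical integers in arbitrary units.

```python
from itertools import product

def c_ffe_output(d, w):
    """C-FFE: every tap always drives, so terms of opposite sign are subtracted."""
    d_pre, d_main, d_post1, d_post2 = d
    w_pre, w_main, w_post1, w_post2 = w
    return -d_pre * w_pre + d_main * w_main - d_post1 * w_post1 + d_post2 * w_post2

def a_ffe_terms(d, a):
    """A-FFE: contributions of the enabled taps only (rules inferred from Table III)."""
    d_pre, d_main, d_post1, d_post2 = d
    a_pre, a_main, a_post1, a_post2 = a
    terms = [d_main * a_main]            # the main tap is always enabled
    if d_pre != d_main:                  # pre-tap fires on a transition
        terms.append(d_main * a_pre)
    if d_post1 != d_main:                # 1st post-tap fires on a transition
        terms.append(d_main * a_post1)
    if d_post2 == d_main:                # 2nd post-tap, with the polarity seen in Table III
        terms.append(d_main * a_post2)
    return terms

def c_to_a(w):
    """Map C-FFE weights to A-FFE weights so that both outputs coincide."""
    w_pre, w_main, w_post1, w_post2 = w
    return (2 * w_pre, w_main - w_pre - w_post1 - w_post2, 2 * w_post1, 2 * w_post2)

W = (4, 14, 8, 1)      # hypothetical (W_pre, W_main, W_post1, W_post2)
A = c_to_a(W)          # -> (A_pre, A_main, A_post1, A_post2)

for d in product((+1, -1), repeat=4):
    terms = a_ffe_terms(d, A)
    same_output = sum(terms) == c_ffe_output(d, W)
    addition_only = all(t * d[1] > 0 for t in terms)   # every term has the sign of D_main
    print(d, terms, same_output, addition_only)
```

With these assumed rules, every enabled A-FFE term carries the sign of the main-tap data, so the terms only add, while the summed output equals the C-FFE output for each pattern.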

# III. ROBUSTNESS TO QUANTIZATION ERRORS OF COEFFICIENTS

The A-FFE suppresses the dominant error signals resulting from quantization errors of tap coefficients by exploiting channel loss, as the B-FFE does [9], [10]. These errors are modulated to higher frequencies by the A-FFE's difference filters. The modulated high-frequency error signals are attenuated more strongly by the channel loss than the C-FFE error signals, which are not modulated to a higher frequency.

Fig. 9 shows the output error signals of a C-FFE and an A-FFE caused by 20% coefficient errors when each TX transmits a 1-bit pulse. We assumed that low-bit tap control would be employed in practice. The coefficient error due to quantization of an FFE coefficient can be modeled as an additive constant on the nominal coefficient, as shown in Fig. 9. In the C-FFE, the main tap coefficient is usually the largest, and therefore, for the same percentage error, the quantization error of the main tap coefficient contributes dominantly to the error signal [Fig. 8(c)]. Fig. 9(a) shows the output error signal of the C-FFE when its main tap coefficient $w_m$ has a +20% error. On the other hand, in the A-FFE, the pre-cursor and post-cursor coefficients at the output taps of the difference filters near the main tap are large for de-emphasis, and thus, their contributions to the error signals are dominant [Fig. 8(d)]. Fig. 9(b)–(d) depicts the output error signals when the A-FFE has +20% errors on the 1st $(a_{m+1})$, 2nd $(a_{m+2})$, and 3rd $(a_{m+3})$ post-cursor coefficients at the output taps of the difference filters, respectively.

We can express the C-FFE's additive TX output error signal $w_m$\_TX(t) due to errors on the main tap coefficient $w_m$ as follows:

$$w_{m}\_\mathrm{TX}(t) = \begin{cases} \Delta w_{m}, & 0 \le t < T \\ 0, & \text{otherwise.} \end{cases}$$
 (12)

The A-FFE's TX output error signal $a_{m+k}$\_TX(t) caused by the post-cursor de-emphasis (using a difference filter) coefficient $a_{m+k}$ can be expressed as follows:

$$a_{m+k}\_\mathrm{TX}(t) = \begin{cases} \Delta a_{m+k}, & 0 \le t < T \\ 0, & T \le t < kT \\ -\Delta a_{m+k}, & kT \le t < (k+1)T \\ 0, & (k+1)T \le t \end{cases}$$
 (13)

where $\Delta w_m$ and $\Delta a_{m+k}$ are the additive errors on $w_m$ and $a_{m+k}$, respectively, and T is the symbol period.

Fig. 10. Frequency-domain transmitted and received error signals of the C-FFE and the A-FFE caused by (a) 20% quantization errors on the (m + 1)th C-FFE tap coefficient and (m + 2)th A-FFE tap coefficient, (b) 20% quantization errors on the (m + 1)th C-FFE tap coefficient, and (c) 20% quantization errors on the (m + 1)th C-FFE tap coefficient, respectively.

Because $w_m$\_TX(t) is a square pulse [Fig. 9(a)], its spectrum has a large energy concentration at low frequencies (Fig. 10). $a_{m+k}$\_TX(t) is composed of one positive pulse and one negative pulse having the same magnitude $\Delta a_{m+k}$ [Fig. 9(b)–(d)]. Therefore, it has much lower energy at low frequencies than the C-FFE's error signal (Fig. 10).
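The time-domain error signals in (12) and (13) can be sketched numerically as follows. This is a minimal illustration with a normalized symbol period and a hypothetical error magnitude of 0.2; the function names `w_err`, `a_err`, and `area` are introduced here for illustration only. The area under each waveform equals its spectrum at f = 0, which is why the two-pulse A-FFE error has no DC content.

```python
def w_err(t, dw, T=1.0):
    """Eq. (12): a single square pulse of height dw during the first symbol."""
    return dw if 0 <= t < T else 0.0

def a_err(t, da, k, T=1.0):
    """Eq. (13): +da during the first symbol and -da during the symbol starting at k*T."""
    if 0 <= t < T:
        return da
    if k * T <= t < (k + 1) * T:
        return -da
    return 0.0

def area(f, t_end, n=1000):
    """Midpoint-rule integral of f over [0, t_end); equals the spectrum at f = 0."""
    dt = t_end / n
    return sum(f((i + 0.5) * dt) for i in range(n)) * dt

DELTA = 0.2  # hypothetical additive coefficient error, arbitrary units

print("C-FFE error area:", area(lambda t: w_err(t, DELTA), 5.0))
for k in (1, 2, 3):
    print(f"A-FFE error area, k={k}:", area(lambda t, k=k: a_err(t, DELTA, k), 5.0))
```

The C-FFE error integrates to roughly DELTA times T, whereas the positive and negative A-FFE pulses cancel for every k.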

Fig. 10 depicts the near-end (at the TX output) and the far-end (at the RX input) error signals of the A-FFE and C-FFE in the frequency domain. The C-FFE has a +20% quantization error on the main tap $w_m$. The A-FFE has a +20% quantization error on the 1st $(a_{m+1})$, 2nd $(a_{m+2})$, and 3rd $(a_{m+3})$ post-cursor de-emphasis (using difference filters) coefficients, respectively, as shown in Fig. 10(a)–(c).

The Fourier transform of $w_m$\_TX(t) is

$$w_{m}\_\mathrm{TX}(f) = \Delta w_{m}\, T \operatorname{sinc}(Tf)\, e^{-j\pi Tf}.$$
 (14)

The Fourier transform of $a_{m+k}$\_TX(t) is

$$a_{m+k}\_\mathrm{TX}(f) = \Delta a_{m+k}\, T \operatorname{sinc}(Tf)\, e^{-j\pi Tf} D(f)$$
 (15)

where $D(f) = 0.5(e^{j\pi Tf} - e^{-j2\pi(k-0.5)Tf})e^{-j\pi Tf}$ is the transfer function of the difference filter. Therefore, the difference filter, acting as a high-pass filter, attenuates the low-frequency components of $a_{m+k}$\_TX(f) (Fig. 10). Note that the A-FFE's frequency-domain transmitted error signal has much smaller low-frequency components than the C-FFE's.
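The high-pass behavior of the difference filter can be checked with a short Python sketch. It assumes the simplified form D(f) = 0.5(1 − e^{−j2πkTf}), i.e., a difference with a copy delayed by k symbols as described by (13), and a symbol period of 50 ps corresponding to 20 Gb/s; the function names and the error magnitude are hypothetical.

```python
import cmath
import math

T = 50e-12  # symbol period at 20 Gb/s

def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def w_err_spectrum(f, dw, T):
    """Eq. (14): spectrum of the square error pulse of height dw."""
    return dw * T * sinc(T * f) * cmath.exp(-1j * math.pi * T * f)

def diff_filter(f, k, T):
    """Assumed D(f) = 0.5 * (1 - exp(-j*2*pi*k*T*f)); |D(f)| = |sin(pi*k*T*f)|."""
    return 0.5 * (1 - cmath.exp(-2j * math.pi * k * T * f))

def a_err_spectrum(f, da, k, T):
    """Eq. (15): the square-pulse spectrum shaped by the difference filter."""
    return w_err_spectrum(f, da, T) * diff_filter(f, k, T)

for f_ghz in (0.0, 1.0, 5.0, 10.0):
    f = f_ghz * 1e9
    mags = [abs(diff_filter(f, k, T)) for k in (1, 2, 3)]
    print(f"{f_ghz:5.1f} GHz  |D| for k=1,2,3:", [round(m, 3) for m in mags])
```

Under this assumption |D(f)| is zero at DC for every k and first reaches its maximum of 1 at f = 1/(2kT), so a larger k moves the passband of the error toward lower frequencies.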

Because the channel is linear time-invariant (LTI), we can derive the far-end received C-FFE and A-FFE error signals as follows:

$$w_{m}\_\mathrm{RX}(f) = H(f)\, w_{m}\_\mathrm{TX}(f)$$
 (16)

and

$$a_{m+k}\_\mathrm{RX}(f) = H(f)\, a_{m+k}\_\mathrm{TX}(f)$$
 (17)

respectively, where H(f) is the transfer function of the channel. The A-FFE's difference filter suppresses low-frequency error components while the low-pass channel suppresses high-frequency error components, as in the B-FFE [9], [10] [Fig. 10(a)–(c)]. In this manner, the A-FFE suppresses the error signals due to errors of the pre-cursor coefficients at the output taps of difference filters. On the other hand, the received C-FFE error signal is larger because its low-frequency component is barely attenuated by the channel. Therefore, the A-FFE is more robust to quantization errors of coefficients than the C-FFE.
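To illustrate (16) and (17), the sketch below estimates what fraction of each error signal's energy survives a low-pass channel. The channel is an assumption of this sketch: a hypothetical single-pole response with a 1.8-GHz pole, chosen only because it gives about 15 dB of loss at 10 GHz; it is not the measured PCB trace. The A-FFE spectrum uses the assumed magnitude |D(f)|² = sin²(πkTf), and all function names are illustrative.

```python
import math

T = 50e-12    # symbol period at 20 Gb/s
FC = 1.8e9    # hypothetical channel pole frequency (assumption of this sketch)

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def channel_gain_sq(f):
    """|H(f)|^2 of the assumed single-pole low-pass channel."""
    return 1.0 / (1.0 + (f / FC) ** 2)

def channel_loss_db(f):
    return -10.0 * math.log10(channel_gain_sq(f))

def c_psd(f):
    """|w_m_TX(f)|^2 up to a constant factor: squared sinc of the square pulse."""
    return sinc(T * f) ** 2

def a_psd(f, k):
    """|a_{m+k}_TX(f)|^2 up to a constant factor, with |D(f)|^2 = sin^2(pi*k*T*f)."""
    return sinc(T * f) ** 2 * math.sin(math.pi * k * T * f) ** 2

def surviving_fraction(psd, f_max=8.0 / T, n=20000):
    """Received energy divided by transmitted energy (midpoint rule over [0, f_max])."""
    df = f_max / n
    tx = rx = 0.0
    for i in range(n):
        f = (i + 0.5) * df
        p = psd(f)
        tx += p
        rx += p * channel_gain_sq(f)
    return rx / tx

frac_c = surviving_fraction(c_psd)
frac_a = {k: surviving_fraction(lambda f, k=k: a_psd(f, k)) for k in (1, 2, 3)}

print(f"assumed loss at 10 GHz: {channel_loss_db(10e9):.1f} dB")
print(f"C-FFE error energy surviving the channel: {frac_c:.3f}")
for k, frac in frac_a.items():
    print(f"A-FFE error energy surviving the channel, k={k}: {frac:.3f}")
```

Because the ratio is taken per signal, the result does not depend on how the coefficient errors are scaled; in this model a smaller share of the A-FFE error energy reaches the receiver than of the C-FFE error energy for each k.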



Fig. 11. Schematic of the implemented four-tap A-FFE TX.

The A-FFE's robustness advantage over the C-FFE regarding quantization errors of tap coefficients becomes greater as the channel loss increases. With a larger channel loss, the C-FFE eye size is smaller while the tap coefficients, and thus their errors, become larger. Therefore, the C-FFE suffers more from the quantization errors with a larger channel loss [9], [10]. On the other hand, with a larger channel loss, not only the quantization errors but also the attenuation of the error signals becomes larger in the A-FFE. Therefore, the A-FFE becomes much more robust to quantization errors of tap coefficients than the C-FFE if the channel loss is large.

# IV. TRANSMITTER DESIGN

To verify the proposed FFE architecture, a four-tap A-FFE TX was designed for a single-ended memory interface using inverter drivers. Fig. 11 shows a schematic of the four-tap A-FFE TX. The TX adopts a half-rate architecture and consists of latch-based half-rate shift registers, half-rate digital decoding blocks, serializing 2:1 multiplexers (MUXs), full-rate tap drivers, and a clocking circuit (Fig. 11).

The four-tap A-FFE is composed of the 1st pre-tap, the main tap, the 1st post-tap, and the polarity-switchable 2nd post-tap to operate on a 15-dB PCB trace. The sign of the 2nd post-tap is controlled by the sign control bits of the XOR gates in the decoding block. The sizes of tap drivers are carefully designed to provide the driving strength needed for the desired data rate and the target channel loss.

The sub-filters of the A-FFE can be easily implemented by digital logic gates in the decoding block (Fig. 11). The inputs of the main tap and digital outputs of the decoding block are serialized by retiming 2:1 MUXs and then fed to the A-FFE TX tap drivers.

Fig. 12. (a) Single-bit responses of the four-tap A-FFE at TX output. (b) Single-bit responses of the four-tap and three-tap A-FFE at RX input.

The tap drivers are of two types: strong and weak drivers. A strong driver is an inverter bank that can provide large driving strength, while a weak driver is a current-starved inverter whose strength can be precisely controlled by the tail current sources (Fig. 11) [14]. To allow for a high output voltage swing, we used non-cascode current sources in our prototype chip.

Because the magnitudes of the 1st pre-tap and 1st post-tap coefficients of the A-FFE are much larger than the main and 2nd post-tap coefficients, they are implemented with strong drivers for sufficient driving strength. In our proof-of-concept design, the 1st pre-tap and 1st post-tap drivers are binary 6- and 7-bit inverter banks, respectively, with digitally configurable driving strength. These configurations are chosen to produce the same coefficient error percentage to verify robustness to the quantization error of the tap coefficient in measurement. The NAND gates and NOR gates control the pull-up and pull-down strengths of the strong drivers, respectively.

Because the A-FFE coefficients of the main tap and the 2nd post-tap are smaller than the coefficients of the other taps, they must be precisely controlled. Therefore, these taps were implemented with the weak drivers. However, when only the main tap driver is turned on, the strength of the A-FFE's output driver is small. In Table III, the input data patterns ($D_{\rm pre}$, $D_{\rm main}$, $D_{\rm post1}$, $D_{\rm post2}$) for which only the main-tap driver is on while the remaining tap drivers are off are (-1, -1, -1, 1) and (1, 1, 1, -1). Fig. 12 shows the single-bit response at the TX output and RX input with a 20-dB loss channel and output capacitors (200 fF for ESD and 100 fF for the pad) in the post-layout simulation. Although the limited driving strength of the main-tap driver may cause some post-cursor ISI [Fig. 12(a)], the 2nd post-tap driver can effectively compensate for the induced ISI [Fig. 12(b)].

The strong drivers behave nonlinearly due to the change of the TX output impedance [12]. For example, as the TX output voltage approaches the supply, the pull-up strength of the strong driver diminishes due to the reduced drain voltage [Fig. 13(a)]. Therefore, the TX output voltage is not sufficiently high. A booster tap driver is a strong driver that compensates for the nonlinear strength degradation of the inverter when the TX output voltage is near the supply or ground (Fig. 11). In this case, the booster tap is activated to raise the TX output voltage to the appropriate level [Fig. 13(a)]. In this example, the PMOS booster tap turns on when both the 1st pre-tap and the 1st post-tap are activated simultaneously to increase the output voltage. Likewise, the NMOS booster tap helps the tap drivers pull the TX output down when needed.

Fig. 13. Simulated and measured TX outputs with and without enabling a booster tap driver at 20 Gb/s with 1.1-V supply. (a) Simulated 1-bit pulse response, (b) measured eye diagram with a disabled booster tap driver, and (c) measured eye diagram with an enabled booster tap driver.

Fig. 14. Histograms of the A-FFE TX's output impedance and simulated eye diagrams of TX output across corners. (a) Pull-up impedance, (b) pull-down impedance, and (c) simulated eye diagrams.

Fig. 14 shows histograms of the A-FFE TX's output impedance and the 20-Gb/s eye diagrams simulated with a 20-dB loss channel at multiple corners. The histograms were acquired by Monte Carlo simulation with 1000 samples. Thanks to the low impedance of the switch-MOS, when all taps are turned on, the pull-up and pull-down impedances of the A-FFE TX are 7.86 and 8.27 $\Omega$, respectively, which are much smaller than the conventional 50 $\Omega$. The 3$\sigma$ variations of the pull-up and pull-down impedances are 0.84 and 0.93 $\Omega$, respectively. The eye height is reduced by the variation of the output impedance when the tap coefficients are left unchanged [Fig. 14(c)]. The maximum reduction of the eye height is 11%. Despite the variations, the eye diagrams retain a large eye height owing to the high TX output voltage swing [Fig. 14(c)]. The receiver-side termination of 50 $\Omega$ allows for relaxed impedance matching constraints, even with changes in TX impedance caused by PVT variation [6].



Fig. 15. Schematic of clocking circuits for serializing 2:1 MUXs.



Fig. 16. Die micrograph.

The clocking circuit consists of a duty cycle corrector (DCC), a digitally controlled delay line (DCDL), and clock drivers for serializing 2:1 MUXs (Fig. 15). The DCC consists of an always-on inverter and 4-bit coarse and 3-bit fine tri-state inverter banks with adjustable rise/fall delays. The DCDL is used to compensate for the skew between CLK\_OUT and CLKB\_OUT (Fig. 15). The DCDL consists of two inverters, each with MOS capacitor banks that are inserted between the inverter stages.

# V. MEASUREMENT RESULTS

To verify that an A-FFE TX can employ inverter drivers and improve robustness against quantization errors of tap coefficients and power efficiency, we fabricated the proposed four-tap A-FFE TX in 28-nm CMOS technology. Fig. 16 shows the TX's die micrograph. Due to the absence of termination resistors, the TX driver and the TX core occupy only 336 and 1149  $\mu$ m², respectively. A test environment of the TX is shown in Fig. 17. The chip was tested with a supply voltage of 1.1 V and the PRBS-31 data pattern. The PRBS-31 data are produced by using an on-chip PRBS generator [13]. The output data of the TX is applied to the oscilloscope via a PCB trace, an SMA cable, and a bias-tee. The PCB trace loss is measured to be 15 dB at 10 GHz (Fig. 17).

Fig. 13(b) and (c) shows the measured eye diagrams of the TX without and with the booster tap enabled, respectively. The TX achieves a data rate of 20 Gb/s. The eye height and width are 55.1 mV and 0.44 UI, respectively, as shown in Fig. 13(c). With the booster tap disabled, the eye height is reduced to 30.9 mV, as shown in Fig. 13(b). Therefore, the booster tap, which compensates for the nonlinear behavior of the inverter drivers, improves the eye height by 78%.
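The 78% figure follows directly from the two measured eye heights. A minimal arithmetic check is given below; the variable names are illustrative only.

```python
eye_with_booster_mv = 55.1      # measured eye height, booster tap enabled [Fig. 13(c)]
eye_without_booster_mv = 30.9   # measured eye height, booster tap disabled [Fig. 13(b)]

# Relative improvement, referred to the eye height without the booster tap.
improvement = (eye_with_booster_mv - eye_without_booster_mv) / eye_without_booster_mv
print(f"eye-height improvement: {improvement:.1%}")   # prints 78.3%
```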



Fig. 17. Test setup.



Fig. 18. Measured eye diagrams of the four-tap A-FFE TX (a) without and (b) with a 20% error on the most sensitive tap (1st pre-tap) coefficient.



Fig. 19. Measured A-FFE TX energy consumption versus probability of data transition: the energy consumption of (a) total TX circuit and (b) only drivers.

To evaluate the A-FFE's robustness against quantization errors of coefficients, we measured eye sensitivities [12]. The eye sensitivity is the percentage of eye size reduction divided by the percentage of coefficient reduction [12]. Fig. 18 shows the measured eye diagrams without and with a 20% error on the most sensitive tap coefficient. The errors of the strong and the weak drivers are introduced by changing the number of enabled inverters and the strength of the current sources, respectively. The eye height was the smallest when the pre-cursor tap coefficient was reduced by 20%. The measured worst eye sensitivity is 0.68 (Fig. 18). In simulations, the eye sensitivities of the four-tap inverter-based C-FFE TX and the four-tap inverter-based A-FFE TX are 3.8 and 0.7, respectively (Fig. 3). These data show that the A-FFE is much more robust against quantization errors of coefficients than the C-FFE.
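The definition of eye sensitivity can be written as a one-line function. The sketch below applies it to the 13.6% eye-opening reduction for a 20% coefficient reduction reported in the abstract; the function name is introduced here for illustration.

```python
def eye_sensitivity(eye_reduction_pct, coeff_reduction_pct):
    """Percentage of eye size reduction divided by percentage of coefficient reduction."""
    return eye_reduction_pct / coeff_reduction_pct

# Measured worst case: 13.6% eye reduction for a 20% reduction of the 1st pre-tap coefficient.
measured = eye_sensitivity(13.6, 20.0)
print(round(measured, 2))   # 0.68
```

A sensitivity below 1 means that the eye shrinks by a smaller percentage than the coefficient error that caused it.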

Fig. 19 presents the measured A-FFE TX energy consumption versus data transition probability. The energy consumption of the A-FFE TX drivers varies with the transition probability. When a data transition occurs, the difference filters become active and generate the 1st post-cursor and 1st pre-cursor signals; if no data transition occurs, these filters are deactivated. In Fig. 19(b), without data transitions (0% probability), the A-FFE TX drivers consume only 10% of the power consumed at a 25% data transition probability. Excluding the main tap and the 2nd post-tap, all A-FFE taps are activated only when a data transition occurs. As a result, the energy consumption increases linearly with the probability of data transition (Fig. 19). Therefore, in the idle state, the A-FFE TX can save unnecessary power dissipation.

TABLE IV
PERFORMANCE SUMMARY AND COMPARISON

| | ISSCC 2020 [1] | ISSCC 2022 [2] | TCAS I 2022 [6] | JSSC 2016 [8] | JSSC 2016 [10] | This work |
|---|---|---|---|---|---|---|
| Technology | 8 nm | 65 nm | 28 nm LPP | 28 nm FD-SOI | 65 nm | 28 nm LPP |
| Supply voltage (V) | VDDQ = 1.35, VDD = 0.85 | 1 | 1.13 | N/A | 1.3 | 1.1 |
| Single/Differential | Single | Single | Single | Single | Differential | Single |
| Driver type | Voltage-mode driver + capacitive-peaking driver | Capacitive driver with a ground-forcing biasing technique | Low-impedance inverter | High-impedance inverter + RC high-pass filter | CML | Inverter + current-starved inverter |
| Equalization (TX) | 1-tap de-emphasis, edge boost, FEXT EQ | 2-tap FFE | X | Passive EQ | 4-tap B-FFE | 4-tap A-FFE |
| FFE tap addition-only | X | X | N/A | N/A | X | O |
| Data pattern | N/A | PRBS 7 | PRBS 31 | PRBS 7 | N/A | PRBS 31 |
| Data rate (Gb/s) | 18 | 12 | 20 | 20 | 8 | 20 |
| Channel loss (dB) | 10 | N/A | 8 | 10.7 | 25 | 15 (PCB trace only) |
| Worst eye sensitivity | N/A | N/A | N/A | N/A | 0.56 | 0.68 |
| Output swing at TX output | N/A | N/A | N/A | N/A | N/A | 870 mV<sub>pp</sub> |
| Output swing at far end | N/A | N/A | 850 mV<sub>pp</sub> | 118<sup>b</sup> mV<sub>pp</sub> | 131<sup>b</sup> mVd<sub>pp</sub> | 253 mV<sub>pp</sub> |
| Eye height | 130 mV (WRITE), 110 mV (READ) | 36<sup>b</sup> mV | 234 mV | 24<sup>b</sup> mV | 50 mV | 55.1 mV |
| Energy efficiency (TX) (pJ/b) | N/A | 0.264 | 1.18 | 0.14 | N/A | 1.18 |
| Area (μm²) | 4151250<sup>a</sup> | 3045 | 1126 | 4556 | 2128 | 1149 |

<sup>a</sup>Area includes PLL, CA slice, and data slice (16 bit).

<sup>b</sup>Estimated from an eye diagram.

Fig. 20. Power breakdown of the A-FFE TX.

Fig. 20 shows the power breakdown of the TX. The power consumption of each sub-circuit is estimated from the measured TX power by prorating the simulation results. The TX consumes 1.18 pJ/bit at the maximum speed of 20 Gb/s. The clocking circuit (clock drivers, DCC, and DCDL) accounts for the largest portion (43%) of the TX power consumption whereas the decoding block consumes the smallest portion (3%). Because the decoding block is placed before the serializers, its size and power consumption are small. The clock drivers, the DCC, and the DCDL account for 22%, 9%, and 12% of the TX power consumption, respectively (Fig. 20). Excluding the power consumption of the DCC and the DCDL, the TX consumes 0.93 pJ/b at a data rate of 20 Gb/s (Fig. 20). The power consumption of the DCDL increases the power consumption of the entire clocking circuit. Due to the limited design time, we utilized an existing clocking circuit that was not optimized for this prototype chip. If we had used an optimized clocking circuit, the power consumption would have been smaller.
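The power figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the reported energy per bit, data rate, and percentage shares; the total power in milliwatts is derived here from those numbers, and the variable names are illustrative.

```python
energy_pj_per_bit = 1.18     # measured TX energy efficiency
data_rate_gbps = 20          # data rate in Gb/s

# pJ/b multiplied by Gb/s gives mW (1e-12 J/b * 1e9 b/s = 1e-3 W).
total_power_mw = energy_pj_per_bit * data_rate_gbps

# Shares of the TX power reported for the clocking sub-circuits (Fig. 20).
share = {"clock drivers": 0.22, "DCC": 0.09, "DCDL": 0.12}
clocking_share = sum(share.values())

# Energy per bit if the DCC and DCDL contributions are excluded.
energy_without_dcc_dcdl = energy_pj_per_bit * (1 - share["DCC"] - share["DCDL"])

print(f"total TX power: {total_power_mw:.1f} mW")
print(f"clocking share: {clocking_share:.0%}")
print(f"energy without DCC and DCDL: {energy_without_dcc_dcdl:.2f} pJ/b")
```

The three clocking shares sum to the stated 43%, and removing the DCC and DCDL shares from 1.18 pJ/b gives the stated 0.93 pJ/b.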

A performance summary and comparison with prior works are shown in Table IV. The proposed A-FFE architecture entirely removes subtractions between the four taps whereas the C-FFE [2] and the B-FFE [10] do not. Since the A-FFE has an addition-only property, the four-tap A-FFE TX can use inverter drivers. Like the B-FFE [10], which employs difference filters, the TX is robust to quantization errors of coefficients. Because it uses area-efficient inverter drivers, the TX occupies only 1149 $\mu$m<sup>2</sup>. The TX occupies a smaller area than the passive-equalization TX [8]. The TX achieves a data rate of 20 Gb/s/pin whereas the TX in [1] achieves a lower data rate of 18 Gb/s/pin.

# VI. CONCLUSION

In this article, we proposed an A-FFE TX architecture. The addition-only property of the A-FFE architecture has three advantages. First, it allows inverter drivers to be used as FFE taps. Because the inverter drivers do not include termination resistors, the TX can fit in a very small area. Second, it avoids the unnecessary power consumed by tap subtractions. Finally, the A-FFE is robust to quantization errors of tap coefficients because the error signals are suppressed by the channel loss.

To verify the A-FFE architecture, we designed an inverter-based four-tap A-FFE TX in 28-nm bulk technology. The test chip achieves a data rate of 20 Gb/s/pin. The TX drivers consist of inverter banks and current-starved inverters without termination resistors. Therefore, the TX occupies a small area of 1149 $\mu$m<sup>2</sup>. Furthermore, the TX consumes low power when the probability of data transition is low. The A-FFE TX drivers consume 90% less power when the data transition probability is 0% than when it is 25%. The TX achieves an eye sensitivity of 0.68, and its energy efficiency is 1.18 pJ/bit.

#### ACKNOWLEDGMENT

The authors would like to thank the IC Design Education Center (IDEC) for tool support.

#### REFERENCES

- [1] S.-M. Lee et al., "An 8 nm 18 Gb/s/pin GDDR6 PHY with TX bandwidth extension and RX training technique," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 338–339.
- [2] S. Lee, J. Yun, and S. Kim, "A 78.8fJ/b/mm 12.0 Gb/s/Wire capacitively driven on-chip link over 5.6 mm with an FFE-combined ground-forcing biasing technique for DRAM global bus line in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 65, Feb. 2022, pp. 454–456.
- [3] C. Moon, J. Seo, M. Lee, I. Jang, and B. Kim, "A 20 Gb/s/pin 1.18pJ/b 1149 μm<sup>2</sup> single-ended inverter-based 4-tap addition-only feed-forward equalization transmitter with improved robustness to coefficient errors in 28nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2022, pp. 450–451.
- [4] M. Choi et al., "An FFE transmitter which automatically and adaptively relaxes impedance matching," *IEEE J. Solid-State Circuits*, vol. 53, no. 6, pp. 1780–1792, Jun. 2018.
- [5] M. Choi et al., "An FFE TX with 3.8x eye improvement by automatic impedance adaptation for universal compatibility with arbitrary channel and RX impedances," in *Proc. Symp. VLSI Circuits*, Jun. 2017, pp. 58–59.
- [6] M. Lee, P. K. Kaur, J. Seo, S. Han, and B. Kim, "A compact single-ended inverter-based transceiver with swing improvement for short-reach links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 69, no. 9, pp. 3679–3688, Sep. 2022.
- [7] J. Seo, S. Lee, M. Lee, C. Moon, and B. Kim, "A 20-Gb/s/pin 0.0024-mm² single-ended DECS TRX with CDR-less self-slicing/auto-deserialization to improve tolerance on duty cycle error and RX supply noise for DCC/CDR-less short-reach memory interfaces," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 65, Feb. 2022, pp. 1–3.
- [8] B. Dehlaghi and A. C. Carusone, "A 0.3 pJ/bit 20 Gb/s/wire parallel interface for die-to-die communication," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2690–2701, Nov. 2016.
- [9] S. Han, S. Lee, M. Choi, J.-Y. Sim, H.-J. Park, and B. Kim, "A coefficient-error-robust FFE TX with 230% eye-variation improvement without calibration in 65 nm CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 50–51.
- [10] S. Han, S. Lee, M. Choi, J.-Y. Sim, H.-J. Park, and B. Kim, "A coefficient-error-robust feed-forward equalizing transmitter for eye-variation and power improvement," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1902–1914, Aug. 2016.
- [11] C. Menolfi et al., "A 16 Gb/s source-series terminated transmitter in 65 nm CMOS SOI," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2007, pp. 446–447.
- [12] B. Kim and V. Stojanovic, "An energy-efficient equalized transceiver for RC-dominant channels," *IEEE J. Solid-State Circuits*, vol. 45, no. 6, pp. 1186–1197, Jun. 2010.
- [13] M. Lee, S. Han, J.-Y. Sim, H.-J. Park, and B. Kim, "A 10-GHz multipurpose reconfigurable built-in self-test circuit for high-speed links," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Nov. 2017, pp. 76–77.
- [14] H.-G. Ko, S. Shin, J. Oh, K. Park, and D.-K. Jeong, "An 8Gb/s/μm FFE-combined crosstalk-cancellation scheme for HBM on silicon interposer with 3D-staggered channels," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 128–129.



Changjae Moon (Graduate Student Member, IEEE) received the B.S. degree in electronic and electrical engineering from Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2018, where he is currently pursuing the combined M.S. and Ph.D. degrees.

His research interests include high-speed links and signal integrity.

Mr. Moon received the Corporate Special Awards at the Korea Semiconductor Design Contest in 2022 and 2023.



**Jaeyoung Seo** received the B.S., M.S., and Ph.D. degrees in electrical engineering from Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2015, 2017, and 2023, respectively.

Since 2023, he has been a Staff Engineer with Samsung Electronics, Hwaseong, South Korea. His research interests include high-speed serial/parallel links, signal/power integrity, and interconnect modeling.

Dr. Seo received several honorable awards. He was a co-recipient of the 19th and 23rd Korean Solid-State Circuits Design Competition Awards. He received the Kim Bum Man Best Dissertation Award from the Department of Electrical Engineering, POSTECH, in 2023.



Myungguk Lee (Graduate Student Member, IEEE) received the B.S. degree in electronic engineering from the Kumoh National Institute of Technology, Gumi, South Korea, in 2015, and the M.S. degree in electrical engineering from Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2017, where he is currently pursuing the Ph.D. degree.

His research interests include high-speed links and signal integrity.

Dr. Lee was a recipient of the Gold Award from the Korea Semiconductor Design Contest in 2017 and the Corporate Special Award in 2022 and 2023.



**Iksu Jang** (Graduate Student Member, IEEE) received the B.S. degree in electrical and computer engineering from Ajou University, Suwon, South Korea, in 2018, and the M.S. degree from Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2020, where he is currently pursuing the Ph.D. degree in electrical engineering.

His current research interest is high-speed link circuits.



Byungsub Kim (Senior Member, IEEE) received the B.S. degree in electrical engineering from Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2000, and the M.S. and Ph.D. degrees in electrical engineering and computer science from Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2004 and 2010, respectively.

He was an Analog Design Engineer with Intel Corporation, Hillsboro, OR, USA, from 2010 to 2011. In 2012, he joined the Faculty of the Department of Electrical Engineering, POSTECH, where he is currently a Professor.

Dr. Kim received several honorable awards. He was a recipient of the IEEE JOURNAL OF SOLID-STATE CIRCUITS Best Paper Award in 2009 and the Analog Device Inc. Outstanding Student Designer Award from MIT in 2009. He was a co-recipient of the Beatrice Winner Award for Editorial Excellence at the 2009 IEEE International Solid-State Circuits Conference. For several years, he has served as a Technical Program Committee Member for the IEEE International Solid-State Circuits Conference and the IEEE Asian Solid-State Circuits Conference.