# Imperial College London



# Global Interconnections in FPGAs: Modeling and Performance Analysis

Terrence Mak, Crescenzo D'Alessandro, Pete Sedcole, Peter Y.K. Cheung, Alex Yakovlev and Wayne Luk

- Department of Electrical and Electronic Engineering, Imperial College London, UK
- Department of Computing, Imperial College London, UK
- School of Electrical, Electronic and Computer Engineering, Newcastle University, UK

### Outline

- Introduction
  - Interconnect scaling problem
- Background
  - Sakurai's approximation
  - Deodhar and Davis's Method
- FPGA Interconnection Model
  - Modeling
  - Delay and Throughput Derivation
- Comparison with SPICE
- FPGA Experiments
  - Wave-pipelining in a Virtex-5 Device
- Conclusion

#### **Global Interconnection Problems in FPGAs**

- Technology scaling
  - The outlook for long interconnections is grim
  - Worse in FPGAs
- Demand for high-bandwidth interconnections
  - Rent's rule predicts this demand
  - Network-on-Chip architecture
- Interconnect fringing

[Figure: Delay for Metal 1 and Global Wiring versus Feature Size. Horizontal axis: Process Technology Node (nm), with ticks at 250, 180, 130, 90, 65, 45 and 32. Legend entries recovered: Local wire, Gate delay.]

#### Gate delay gets better, wire delay gets worse



#### **New Signaling Techniques**

- Emerging techniques
  - Interconnect wave-pipelining (Dobkin *et al.*, Deodhar *et al.*, Xu *et al.*)
  - Phase-encoding (D'Alessandro and Yakovlev)
  - LVDS (Lee *et al.*)
- Can these techniques be adapted to FPGAs?
  - (Bad news) Interconnection nightmare
  - (Good news) Buffers are already inserted at switching points

#### Previous Work (I)

Sakurai's Closed-Form Approximation

$$v_i = 1 - \sum_{j=1}^{\infty} k_{i,j} e^{-t_{v_i}/\sigma_{i,j}}$$

$$\sigma_i = R_i^d C_i^d + R_i^d C_i^s + R_i^s C_i^d + 0.4 R_i^s C_i^s$$

$$k_{i} = 1.01 \frac{R_{i}^{d}C_{i}^{s} + R_{i}^{s}C_{i}^{d} + R_{i}^{s}C_{i}^{s}}{R_{i}^{d}C_{i}^{s} + R_{i}^{s}C_{i}^{d} + \frac{\pi}{4}R_{i}^{s}C_{i}^{s}}$$
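The closed-form expressions above can be evaluated directly. The sketch below is a minimal illustration, not the authors' code: it assumes that the superscripts $d$ and $s$ denote lumped driver/buffer quantities and wire-segment quantities respectively (the slides do not define them), and it keeps only the first exponential term of the sum, so that $v \approx 1 - k e^{-t/\sigma}$ and therefore $t \approx \sigma \ln\left(\frac{k}{1 - v}\right)$. The function names and all component values are hypothetical.

```python
import math

def sakurai_params(r_d, c_d, r_s, c_s):
    """Return (sigma, k) for one segment from the two closed-form expressions.

    Assumption: r_d, c_d are lumped driver/buffer values and r_s, c_s are the
    total resistance and capacitance of the wire segment.
    """
    sigma = r_d * c_d + r_d * c_s + r_s * c_d + 0.4 * r_s * c_s
    numerator = r_d * c_s + r_s * c_d + r_s * c_s
    denominator = r_d * c_s + r_s * c_d + (math.pi / 4) * r_s * c_s
    k = 1.01 * numerator / denominator
    return sigma, k

def time_to_reach(v, sigma, k):
    """Time for the normalized output to reach level v (0 < v < 1).

    Uses only the first exponential term: v = 1 - k * exp(-t / sigma).
    """
    if not 0.0 < v < 1.0:
        raise ValueError("v must lie strictly between 0 and 1")
    return sigma * math.log(k / (1.0 - v))

# Hypothetical values chosen only for illustration.
r_d, c_d = 1e3, 10e-15      # 1 kOhm, 10 fF
r_s, c_s = 500.0, 100e-15   # 500 Ohm, 100 fF
sigma, k = sakurai_params(r_d, c_d, r_s, c_s)
t50 = time_to_reach(0.5, sigma, k)
print(f"sigma = {sigma:.3e} s, k = {k:.4f}, 50% delay = {t50:.3e} s")
```

Note that $k$ is bounded: it tends to 1.01 when the wire's own $R^s C^s$ product is negligible and to $1.01 \cdot 4/\pi$ when it dominates.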



#### Previous Work (II)

Deodhar and Davis's Method



[Figure: diagram annotated with the pulse width (PW)]



- Throughput = 1/PW
- Assumption
  - All interconnect segments and buffers are the same

#### Interconnections in FPGAs

- Highly constrained routing network versus flexible wire design in ASICs
  - Interconnections are built from wire segments
  - Buffered at switching points
  - Interconnections are unknown until a circuit has been downloaded
- A structural model to analyze interconnection performance in FPGAs

#### **FPGA Global Interconnection Modeling**



#### **Delay Derivation**



- Total delay

$$T_n^{\text{Delay}} = \sigma_n \ln\left(\frac{\gamma_n k_n}{\gamma_n - v_n}\right) + \sum_{i=1}^{n-1} \sigma_i \ln\left(\frac{\gamma_i k_i}{\gamma_i - 0.5}\right) + \sum_{i=1}^n \delta_i$$

- First term: delay of the last segment
- Second term: total delay of the first $n-1$ segments
- Third term: delay of the $n$ buffers
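The delay expression can be evaluated term by term. The following is a minimal sketch under stated assumptions: the per-segment parameters $\sigma_i$, $k_i$, $\gamma_i$ and buffer delays $\delta_i$ are supplied as equal-length lists ordered from segment 1 to segment $n$, and the function name and numeric values are hypothetical rather than taken from the slides.

```python
import math

def total_delay(sigma, k, gamma, delta, v_n):
    """Evaluate the total delay of an n-segment buffered interconnection.

    sigma, k, gamma, delta: per-segment lists (index 0 is segment 1).
    v_n: level that the last segment must reach.
    """
    n = len(sigma)
    if n == 0 or not (len(k) == len(gamma) == len(delta) == n):
        raise ValueError("all parameter lists must be non-empty and equal in length")

    # Delay of the last segment, which is evaluated at v_n.
    last = sigma[-1] * math.log(gamma[-1] * k[-1] / (gamma[-1] - v_n))

    # Delay of the first n-1 segments, each evaluated at the 0.5 level.
    earlier = sum(
        s * math.log(g * kk / (g - 0.5))
        for s, kk, g in zip(sigma[:-1], k[:-1], gamma[:-1])
    )

    # Delay of the n buffers.
    buffers = sum(delta)
    return last + earlier + buffers

# Hypothetical example: three identical segments.
n = 3
delay = total_delay(
    sigma=[100e-12] * n,
    k=[1.05] * n,
    gamma=[1.0] * n,
    delta=[20e-12] * n,
    v_n=0.5,
)
print(f"total delay = {delay:.3e} s")
```

With identical segments and $v_n = 0.5$, all three segment terms coincide, which gives a quick sanity check on an implementation.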

#### **Throughput Derivation**



#### A Note About Throughput

- Throughput is a function of $v_1$, which can only be computed backward from $v_n$
- If all interconnect segments are the same

$$v_{n-1} = \frac{1}{2 - v_n}$$

- A more general expression is

$$v_{i-1} = \frac{\gamma_{i-1}}{\gamma_{i-1}k_{i-1}(2\gamma_{i-1}-1)\left(\frac{\gamma_i - v_i}{\gamma_i k_i}\right)^{\frac{\sigma_i}{\sigma_{i-1}}} + 1}$$

1. Larger parameter space to explore the interconnect throughput
2. Can be used for FPGA interconnection
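The backward computation from $v_n$ to $v_1$ can be sketched as a short loop over the general expression. This is an illustration only; the function names and numeric values are hypothetical. As a consistency check, setting every $\gamma_i = 1$ and making all segments identical reduces the general expression to $v_{i-1} = 1/(2 - v_i)$, which matches the identical-segment case.

```python
def previous_level(v_i, sigma_prev, k_prev, gamma_prev, sigma_i, k_i, gamma_i):
    """Compute v_{i-1} from v_i using the general expression."""
    ratio = ((gamma_i - v_i) / (gamma_i * k_i)) ** (sigma_i / sigma_prev)
    return gamma_prev / (gamma_prev * k_prev * (2 * gamma_prev - 1) * ratio + 1)

def back_propagate(v_n, sigma, k, gamma):
    """Return [v_1, ..., v_n], working backward from the last segment.

    sigma, k, gamma: per-segment lists (index 0 is segment 1).
    """
    n = len(sigma)
    if n == 0 or not (len(k) == len(gamma) == n):
        raise ValueError("all parameter lists must be non-empty and equal in length")
    levels = [0.0] * n
    levels[-1] = v_n
    for i in range(n - 1, 0, -1):
        levels[i - 1] = previous_level(
            levels[i],
            sigma[i - 1], k[i - 1], gamma[i - 1],
            sigma[i], k[i], gamma[i],
        )
    return levels

# Hypothetical example: three identical segments with gamma = 1.
levels = back_propagate(0.9, sigma=[100e-12] * 3, k=[1.05] * 3, gamma=[1.0] * 3)
print(levels)
```

Because each level depends only on the one after it, heterogeneous segments (for example, a mix of FPGA wire types) are handled by passing different per-segment parameters.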

#### Max Throughput versus Min Delay



- SPICE simulation based on the 90 nm PTM model
- Cadence Spectre and Analog Design Environment
- Up to 4 types of interconnects (Single, Double, Hex, 24-long)

#### **Throughput Ratio**



#### **FPGA Virtex-5 Experiments**





- Built-in self-test (BIST) system (Wong *et al.*, FPT'07)
- Run-time reconfigurable frequency and phase sweep
- MicroBlaze processor for control, measurement and interfacing
- Xilinx 65 nm Virtex-5 device

#### Result (Frequency-phase profile)



#### **FPGA Architecture Modification**

Adding more buffers to the long wires



#### Conclusion

- An FPGA global interconnection model
- Studied the delay and throughput
- Experiments on a Virtex-5 FPGA
- Implications
  - Interconnect wave-pipelining can be realized in FPGAs for higher throughput
  - On-FPGA serialization
  - FPGA architecture modification for interconnect wave-pipelining