AutoBridge

Latest

  • [02/20/2022] We have decided to maintain AutoBridge only as a plug-in of the TAPA workflow. The TAPA framework provides a stable and robust environment for AutoBridge across different HLS versions. TAPA is easy and natural to use if you are familiar with the HLS dataflow coding style.

  • [01/06/2022] We are integrating AutoBridge and TAPA to create a robust workflow. Currently AutoBridge relies on hacking the RTL generated by Vivado HLS, which makes it fragile. Using the open-source TAPA compiler as the frontend instead will make the floorplanning-pipelining flow much more robust. The integration of AutoBridge and TAPA is still in progress, but feel free to contact me if you want to try it out. We will provide as much help as needed to make your design work!

  • [01/06/2022] With the help of AutoBridge and TAPA, Serpens achieves 270 MHz on the Alveo U280 while using 24 HBM channels, whereas a normal Vitis flow fails in routing. Serpens is an HBM-based accelerator for sparse matrix-vector multiplication (SpMV). At this frequency, Serpens achieves a 3.79X performance improvement over the previous state-of-the-art, GraphLily.

  • [01/06/2022] With the help of AutoBridge and TAPA, Sextans achieves 260 MHz on the Alveo U250 while using 4 DDR channels, whereas a normal Vitis flow achieves only 190 MHz.

  • [12/20/2021] We just open-sourced RapidStream, a follow-up work of AutoBridge. This time we parallelize the placement and routing of each slot based on the floorplanning by AutoBridge. Check out how we achieve 5-7X speedup over Vivado!

  • A new implementation is ready! Check the example in AutoBridge/in-develop/test/autosa_cnn_13x8/.

  • The user interface has been significantly simplified. To invoke the new AutoBridge, just write a simple config file like this:

{
  "Board" : "U250",
  "HLSProjectPath" : "./kernel3",
  "HLSSolutionName" : "solution",
  "TopName" : "kernel3",

  "FloorplanMethod": "IterativeDivisionToHalfSLR",
  "AreaUtilizationRatio" : 0.7,

  "BundleToDDRMapping" : {
    "gmem_A": 0,
    "gmem_B": 1,
    "gmem_C": 2
  },

  "LoggingLevel" : "DEBUG"
}
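
Before handing such a file to the tool, it can help to sanity-check it. The sketch below is not part of AutoBridge: check_config is a hypothetical helper, and treating every key of the example as required is an assumption made only for illustration.

```python
import json

# Keys copied from the example config above. Treating all of them as
# required is an assumption of this sketch, not a documented rule.
REQUIRED_KEYS = {
    "Board", "HLSProjectPath", "HLSSolutionName", "TopName",
    "FloorplanMethod", "AreaUtilizationRatio", "BundleToDDRMapping",
}
SUPPORTED_BOARDS = {"U250", "U280"}  # the two boards named in this README


def check_config(text):
    """Parse a config string and return a list of problems (empty if none)."""
    config = json.loads(text)
    problems = ["missing key: " + key
                for key in sorted(REQUIRED_KEYS - set(config))]
    if config.get("Board") not in SUPPORTED_BOARDS:
        problems.append("unsupported board: {!r}".format(config.get("Board")))
    ratio = config.get("AreaUtilizationRatio", 0)
    if not 0 < ratio <= 1:
        problems.append("AreaUtilizationRatio must be in (0, 1]")
    return problems


EXAMPLE = """
{
  "Board" : "U250",
  "HLSProjectPath" : "./kernel3",
  "HLSSolutionName" : "solution",
  "TopName" : "kernel3",
  "FloorplanMethod": "IterativeDivisionToHalfSLR",
  "AreaUtilizationRatio" : 0.7,
  "BundleToDDRMapping" : {"gmem_A": 0, "gmem_B": 1, "gmem_C": 2},
  "LoggingLevel" : "DEBUG"
}
"""

print(check_config(EXAMPLE))
```

An empty list means the file parsed and passed these basic checks; it says nothing about whether the HLS project itself is usable.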

About

  • What: AutoBridge is a floorplanning tool for Vivado HLS dataflow designs.

  • Why: Co-optimizing HLS compilation and placement brings new opportunities to improve the final achievable frequency.

  • How: Pre-determine the rough location of each module during HLS compilation, so that:

    • long interconnects can be adequately pipelined by the HLS scheduler.

    • the Vivado placer is prevented from packing the logic too densely.

  • In our experiments with a total of 43 design configurations, we improve the average frequency from 147 MHz to 297 MHz.

    • Notably, in 16 experiments we make the originally unroutable designs achieve 274 MHz on average.
  • The pre-print manuscript of our paper can be found at https://vast.cs.ucla.edu/sites/default/files/publications/AutoBridge_FPGA2021.pdf

  • Projects using AutoBridge:

  • Motivating Examples:

    • Comparison of a stencil accelerator on Xilinx U280. From routing failure to 297 MHz.

      • Each color represents a module.
      • AutoBridge ensures a clean separation of logic in different regions to minimize unnecessary die crossing.
    • Comparison of a systolic array on Xilinx U250. From 158 MHz to 316 MHz.

      • Note that Vivado will try to pack things together to avoid die crossing as much as possible.
      • Instead, we ensure a balanced resource utilization across the whole device to reduce local congestion.
      • Meanwhile, the global connections will be adequately pipelined.
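
To make the idea of a utilization-capped floorplan concrete, here is a toy two-way split written for this README. It is not the AutoBridge floorplanner: split_in_two, the module names, the areas, and the wire widths are all invented for illustration, and the exhaustive search only works for a handful of modules.

```python
from itertools import product


def split_in_two(areas, edges, capacity, max_ratio):
    """Assign each module to region 0 or 1, minimizing the wires that cross.

    areas:     {module: area}
    edges:     [(module_a, module_b, wire_width), ...]
    capacity:  area available in each region
    max_ratio: fraction of a region's capacity that may be used
    Returns (crossing_width, {module: region}), or None if nothing fits.
    """
    modules = sorted(areas)
    limit = capacity * max_ratio
    best = None
    # Exhaustive search over all 2^n assignments.
    for bits in product((0, 1), repeat=len(modules)):
        region = dict(zip(modules, bits))
        used = [0.0, 0.0]
        for module, side in region.items():
            used[side] += areas[module]
        if max(used) > limit:
            continue  # one region would be packed too densely
        crossing = sum(w for a, b, w in edges if region[a] != region[b])
        if best is None or crossing < best[0]:
            best = (crossing, region)
    return best


# Hypothetical four-module pipeline: load -> pe0 -> pe1 -> store.
areas = {"load": 30, "pe0": 40, "pe1": 40, "store": 30}
edges = [("load", "pe0", 512), ("pe0", "pe1", 64), ("pe1", "store", 512)]
print(split_in_two(areas, edges, capacity=100, max_ratio=0.75))
```

In this toy setting, the connections whose endpoints land in different regions are the ones that would need extra pipelining, and the utilization cap is what keeps either region from being overfilled.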

Requirements

  • Python 3.6+ and Pip
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.6
sudo apt install python3-pip
  • Pyverilog
python3.6 -m pip install pyverilog
  • Iverilog
sudo apt install iverilog
  • Python mip
python3.6 -m pip install mip
  • It is highly recommended to install the Gurobi solver, which is free for academic use and easy to install.

  • Package for Alveo U250 and U280 FPGA

  • Xilinx Vivado HLS and Xilinx Vitis

    • Note that the original experiments are based on version 2019.2.
    • So far the floorplanner works well with designs compiled by the latest Vitis HLS 2020.2. However, Vitis HLS 2020.2 seems to have a bug in creating the "xo" object (Step 3). We are contacting Xilinx to confirm.
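
A quick way to see which of these requirements are visible from the current environment is sketched below. The names are assumptions: pyverilog and mip are the import names of the pip packages above, and iverilog, vivado_hls, and v++ are the executables these tools usually place on PATH. Adjust them to your installation.

```python
import importlib.util
import shutil

# Assumed import names and executable names; edit to match your setup.
PYTHON_MODULES = ("pyverilog", "mip")
EXECUTABLES = ("iverilog", "vivado_hls", "v++")


def missing_requirements():
    """Return the names of modules and executables that cannot be found."""
    missing = [m for m in PYTHON_MODULES
               if importlib.util.find_spec(m) is None]
    missing += [exe for exe in EXECUTABLES if shutil.which(exe) is None]
    return missing


print(missing_requirements() or "all requirements found")
```

The check only confirms that something with the expected name exists; it does not verify versions.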

Introduction

Despite an increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in the achievable clock frequency between an HLS-generated design and an optimized handcrafted RTL. In particular, the difficulty in accurately estimating the interconnect delay at the HLS level is a key factor that limits the timing quality of the HLS outputs. Unfortunately, this problem becomes even worse when large HLS designs are implemented on the latest multi-die FPGAs, where die-crossing interconnects incur a high delay penalty.

To tackle this challenge, we propose AutoBridge, an automated framework that couples a coarse-grained floorplanning step with pipelining during HLS compilation.

  • First, our approach provides HLS with a view on the global physical layout of the design; this allows HLS to more easily identify and pipeline the long wires, especially those crossing the die boundaries.
  • Second, by exploiting the flexibility of HLS pipelining, the floorplanner is able to distribute the design logic across multiple dies on the FPGA device without degrading clock frequency; this avoids the aggressive logic packing on a single die, which often results in local routing contention that eventually degrades timing.
  • Since pipelining may introduce additional latency, we further present analysis and algorithms to ensure the added latency will not hurt the overall throughput.
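
The last point can be illustrated with a small sketch. It is not the algorithm from the paper: it simply pads every edge of an acyclic dataflow graph up to the longest-path arrival time of its destination, so that reconvergent paths see the same latency. It makes no attempt to minimize the number of added registers, and balance_latency and the example graph are hypothetical.

```python
from collections import defaultdict


def balance_latency(edges):
    """Return extra delay per edge so that reconvergent paths match.

    edges: {(src, dst): pipeline stages added by the floorplan}
    The graph is assumed to be acyclic.
    """
    succs = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for (src, dst) in edges:
        succs[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Longest-path arrival time of each node, visited in topological order.
    arrival = {n: 0 for n in nodes}
    ready = sorted(n for n in nodes if indegree[n] == 0)
    while ready:
        src = ready.pop()
        for dst in succs[src]:
            arrival[dst] = max(arrival[dst], arrival[src] + edges[(src, dst)])
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)

    # Pad every edge up to the arrival time of its destination.
    return {(s, d): arrival[d] - arrival[s] - lat
            for (s, d), lat in edges.items()}


# A -> B -> D and A -> C -> D; only the A -> B wire was pipelined (2 stages).
added = {("A", "B"): 2, ("B", "D"): 0, ("A", "C"): 0, ("C", "D"): 0}
print(balance_latency(added))
```

Here the two stages added on A -> B are matched by two stages of padding on C -> D, so both inputs of D stay aligned.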

Currently AutoBridge supports two FPGA devices: the Alveo U250 and the Alveo U280. Users can customize the tool to support other FPGA boards as well.

Inputs

To use the tool, the user needs to prepare a Vivado HLS project that has already been C-synthesized.

To invoke AutoBridge, the following parameters should be provided by the user:

  • project_path: Directory of the HLS project.

  • top_name: The name of the top-level function of the HLS design.

  • DDR_enable: A vector representing which DDR controllers the design will connect to. In U250 and U280, each SLR of the FPGA contains the IO bank for one DDR controller that can be instantiated. For example,

      DDR_enable = [1, 0, 0, 1]

means that there are four SLRs (U250) and that the DDR controllers on SLR 0 and SLR 3 are instantiated (SLR 0 is the bottom one), while those on SLR 1 and SLR 2 are not. This parameter affects the floorplanning step, as we must not use the area reserved for DDR controllers.

  • DDR_loc_2d_y: A dictionary recording the y-dim location of user-specified modules. For each IO module in the design (one that connects directly to peripheral IPs such as a DMA or DDR controller), the user must explicitly tell the tool in which region the module should be placed, according to the location of the target peripheral IP (which usually has a fixed location). For example,
      DDR_loc_2d_y['foo'] = 1

means that the module (HLS function) foo must be placed in SLR 1 of the FPGA.

  • DDR_loc_2d_x: A dictionary recording the x-dim location of user-specified modules. By default we split each SLR by half. For example,
      DDR_loc_2d_x['bar'] = 1

means that the module (HLS function) bar must be placed in the right half of the FPGA (1 for the right half and 0 for the left half).

  • max_usage_ratio_2d: A 2-dimensional vector specifying the maximum resource utilization ratio for each region. For example,
      max_usage_ratio_2d = [ [0.85, 0.6], [0.85, 0.6], [0.85, 0.85], [0.85, 0.6] ]

means that there are 8 regions in total (2x4): at most 85% of the available resources on the left half of SLR 0 can be used, 60% of the right half of SLR 0, 85% of both the left and right halves of SLR 2, etc.
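
The parameters above have to agree with one another: one row of max_usage_ratio_2d per SLR, y locations that name an existing SLR, and x locations of 0 or 1. A minimal consistency check, assuming that layout, could look like the following. check_inputs is a hypothetical helper and not part of the tool.

```python
def check_inputs(DDR_enable, DDR_loc_2d_x, DDR_loc_2d_y, max_usage_ratio_2d):
    """Return a list of inconsistencies between the four inputs."""
    problems = []
    num_slr = len(DDR_enable)  # one entry per SLR, bottom SLR first
    if len(max_usage_ratio_2d) != num_slr:
        problems.append("max_usage_ratio_2d needs one row per SLR")
    for row in max_usage_ratio_2d:
        # Each row holds [left half, right half] of one SLR.
        if len(row) != 2 or not all(0 < r <= 1 for r in row):
            problems.append("bad ratio row: {}".format(row))
    for name, y in DDR_loc_2d_y.items():
        if not 0 <= y < num_slr:
            problems.append("{}: SLR {} does not exist".format(name, y))
    for name, x in DDR_loc_2d_x.items():
        if x not in (0, 1):
            problems.append("{}: x must be 0 (left) or 1 (right)".format(name))
    return problems


# The example values used in this section (U250, four SLRs).
DDR_enable = [1, 0, 0, 1]
DDR_loc_2d_y = {"foo": 1}
DDR_loc_2d_x = {"bar": 1}
max_usage_ratio_2d = [[0.85, 0.6], [0.85, 0.6], [0.85, 0.85], [0.85, 0.6]]

print(check_inputs(DDR_enable, DDR_loc_2d_x, DDR_loc_2d_y, max_usage_ratio_2d))
```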

Outputs

The tool will produce:

  • A new RTL file corresponding to the top HLS function that has been additionally pipelined based on the floorplanning results.

  • A tcl script containing the floorplanning information.

Usage

  • Step 1: compile your HLS design using Vivado HLS.

  • Step 2: invoke AutoBridge to generate the floorplan file and transform the top RTL file.

  • Step 3: pack the output from Vivado HLS and AutoBridge together into an xo file.

  • Step 4: invoke Vitis for implementation.

Reference scripts for steps 1, 3, and 4 are provided in the reference-scripts folder. For step 2, we attach the AutoBridge script along with each benchmark design.

Issues

  • Use mip version 1.8.1.

  • If the mip package complains that multiprocessing cannot be found, please upgrade pyverilog to the latest release. Alternatively, running the program a second time may work.

  • In the divide-and-conquer approach, if a region is packed close to max_usage_ratio, the next split may fail because a function cannot be split across two sub-regions. The current workaround is to increase max_usage_ratio slightly.

  • Function names in the HLS program should not contain "fifo" or "FIFO".
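
The naming restriction is easy to check up front. The sketch below assumes you already have the list of function names in your HLS program (collecting them is left to you); offending_names is a hypothetical helper and the names in the example are invented.

```python
# The two substrings this README says must not appear in function names.
FORBIDDEN = ("fifo", "FIFO")


def offending_names(function_names):
    """Return the names that contain one of the forbidden substrings."""
    return [name for name in function_names
            if any(sub in name for sub in FORBIDDEN)]


print(offending_names(["load_A", "fifo_writer", "PE_FIFO_drain", "store_C"]))
# prints ['fifo_writer', 'PE_FIFO_drain']
```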

FPGA'21 Artifact Review

The experiment results for all benchmarks in our submission to FPGA'21 are available at: https://ucla.box.com/s/5hpgduqrx93t2j4kx6fflw6z15oylfhu

Currently only a subset of the benchmark source code is open-sourced here, as some designs are not published yet and will be updated later.
