Extending High-Level Synthesis for Task-Parallel Programs
TAPA
TAPA is a dataflow HLS framework that features fast compilation and an expressive programming model, and generates high-frequency FPGA accelerators.
High-Frequency
- TAPA explicitly decouples communication and computation for better quality of results (QoR).
- TAPA integrates the AutoBridge floorplanner to optimize the RTL generation process.
- TAPA achieves 2× higher frequency on average compared to Vivado. [1]
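To illustrate what decoupling communication from computation means, here is a minimal conceptual sketch in plain Python. It is not TAPA's API, which is C++; the names `load`, `compute`, `store`, and `run_pipeline` are hypothetical. Each task runs concurrently and talks to its neighbors only through bounded FIFOs, so the compute task never touches memory directly.

```python
import queue
import threading

# Unique end-of-stream marker (an assumption of this sketch).
END = object()


def load(memory, out_q):
    """Communication task: stream words from 'memory' into a FIFO."""
    for word in memory:
        out_q.put(word)
    out_q.put(END)


def compute(in_q, out_q):
    """Computation task: reads and writes FIFOs only, never memory."""
    while (word := in_q.get()) is not END:
        out_q.put(word * 2)
    out_q.put(END)


def store(in_q, result):
    """Communication task: drain a FIFO back into 'memory' (a list here)."""
    while (word := in_q.get()) is not END:
        result.append(word)


def run_pipeline(memory, depth=2):
    """Connect the three tasks with bounded FIFOs and run them in parallel."""
    a = queue.Queue(maxsize=depth)
    b = queue.Queue(maxsize=depth)
    result = []
    tasks = [
        threading.Thread(target=load, args=(memory, a)),
        threading.Thread(target=compute, args=(a, b)),
        threading.Thread(target=store, args=(b, result)),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return result
```

Because the tasks share nothing but the FIFOs, each one can be placed and pipelined independently, which is the property a floorplanner can exploit.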
Speed
- TAPA compiles 7× faster than Vitis HLS. [2]
- TAPA provides 3× faster software simulation than Vitis HLS. [2]
- TAPA provides 8× faster RTL simulation than Vitis.
- [in-progress] TAPA is integrating RapidStream, which is up to 10× faster than Vivado. [3]
Expressiveness
- TAPA extends the Vitis HLS syntax for richer expressiveness at the C++ level.
- TAPA provides dedicated APIs for arbitrary external memory access patterns.
- TAPA allows users to explicitly specify parallelism.
- In addition to static burst analysis, TAPA supports runtime burst detection by transparently merging small memory transactions into large bursts.
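The idea behind merging small memory transactions into bursts can be sketched in software as follows. This is an illustrative model only, not TAPA's hardware implementation; the function name `merge_bursts` and the default `max_burst_len` of 256 are assumptions made for this example.

```python
def merge_bursts(addresses, max_burst_len=256):
    """Coalesce a stream of word addresses into (start, length) bursts.

    A burst is extended while the next address is exactly one past its end
    and the burst is still shorter than max_burst_len (an assumed limit);
    otherwise a new burst is started.
    """
    bursts = []
    for addr in addresses:
        if bursts:
            start, length = bursts[-1]
            if addr == start + length and length < max_burst_len:
                bursts[-1] = (start, length + 1)
                continue
        bursts.append((addr, 1))
    return bursts
```

For example, `merge_bursts([0, 1, 2, 3, 8, 9, 20])` returns `[(0, 4), (8, 2), (20, 1)]`: seven single-word requests become three transactions.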
HBM-Specific Optimizations
- TAPA significantly reduces the area overhead of HBM interface IPs compared to Vitis HLS.
- TAPA includes an automated design space exploration tool to balance the resource pressure and the wire pressure for HBM FPGAs.
- TAPA automatically selects the physical channel for each top-level argument of your accelerator.
Successful Cases
- Serpens, DAC'22, achieves 270 MHz on the Xilinx Alveo U280 HBM board when using 24 HBM channels. The Vitis HLS baseline failed in routing.
- Sextans, FPGA'22, achieves 260 MHz on the Xilinx Alveo U250 board when using 4 DDR channels. The Vivado baseline achieves only 189 MHz.
- SPLAG, FPGA'22, achieves up to a 4.9× speedup over state-of-the-art FPGA accelerators, up to a 2.6× speedup over a 32-thread CPU running at 4.4 GHz, and up to 0.9× the performance of an A100 GPU (which has 4.1× the power budget and 3.4× the HBM bandwidth).
- AutoSA Systolic-Array Compiler, FPGA'21:
- KNN, FPT'20, achieves 252 MHz on the Xilinx Alveo U280 board. The Vivado baseline achieves only 165 MHz.
Getting Started
TAPA Publications
- Yuze Chi, Licheng Guo, Jason Lau, Young-kyu Choi, Jie Wang, Jason Cong. Extending High-Level Synthesis for Task-Parallel Programs. In FCCM, 2021. [PDF] [Code] [Slides] [Video]
- Licheng Guo, Yuze Chi, Jie Wang, Jason Lau, Weikang Qiao, Ecenur Ustun, Zhiru Zhang, Jason Cong. AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs. In FPGA, 2021. (Best Paper Award) [PDF] [Code] [Slides] [Video]
File details
Details for the file tapa-0.0.20240825.1.tar.gz.
File metadata
- Download URL: tapa-0.0.20240825.1.tar.gz
- Upload date:
- Size: 98.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 977c16ec681a98d67580cc5166cfd96dd4c9b1968dc459e9808ccbccee8286fb
MD5 | 5e71598aa74411a87e7ec8ab6985514a
BLAKE2b-256 | 24a8796065c9a6b9859f78137ea472a39a78428ff2693a43045fad2b202c8da3
File details
Details for the file tapa-0.0.20240825.1-py3-none-any.whl.
File metadata
- Download URL: tapa-0.0.20240825.1-py3-none-any.whl
- Upload date:
- Size: 126.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | a4273af1cde651b48ba468f67d6b61ce405c09c4f49e698c40b7c95deb8d8bb6
MD5 | 725e7c310609aedd33fb8ba451c5a970
BLAKE2b-256 | d775cc22e30f16c0d57d6b78627c37cb59ae26ae8ee104a6d0f69d27c05c329f
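A downloaded file can be checked against the digests listed above with Python's standard `hashlib` module. The sketch below is one way to do it; the helper names `file_digests` and `matches` are hypothetical, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return hex digests of a file for the three listed algorithms."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256: BLAKE2b truncated to a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        while chunk := f.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def matches(path, algorithm, expected_hex):
    """True if the file's digest for 'algorithm' equals 'expected_hex'."""
    return file_digests(path)[algorithm] == expected_hex.lower()
```

For instance, after downloading the source distribution into the current directory, `matches("tapa-0.0.20240825.1.tar.gz", "SHA256", "977c16ec681a98d67580cc5166cfd96dd4c9b1968dc459e9808ccbccee8286fb")` should be `True` for an intact file.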