accel-sim 🚀
CPU-based performance estimation tool for accelerator workloads.
accel-sim is a high-performance, CPU-based estimation tool designed to predict the latency and memory footprint of Transformer-based workloads on hardware accelerators (GPUs, TPUs).
By hooking into the PyTorch computation graph, accel-sim applies an analytical Roofline Model to estimate performance without requiring access to the physical hardware.
🌟 Real-World Scenarios
1. Cost-Benefit Analysis for Cloud Renting
Before spinning up an expensive A100 or H100 cluster, use accel-sim to determine whether your model actually benefits from the extra TFLOPS. If your workload is memory-bandwidth-bound, a cheaper V100 may offer nearly identical performance.
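As a back-of-the-envelope sketch of why this matters (the bandwidth and TFLOPS figures below are approximate vendor specs used for illustration, not accel-sim output):

```python
# A memory-bound elementwise op over 1 GiB of fp16 activations:
# runtime is set by memory traffic, not TFLOPS.
bytes_moved = 2 * 2 * 512 * 1024 * 1024      # read + write, 2 bytes/element

t_v100_ms = bytes_moved / 900e9 * 1e3        # V100: ~900 GB/s
t_a100_ms = bytes_moved / 1555e9 * 1e3       # A100 40GB: ~1555 GB/s (approx.)

# The speedup follows the bandwidth ratio (~1.7x), not the
# fp16 TFLOPS ratio (~2.5x) you are paying for.
print(f"V100: {t_v100_ms:.2f} ms  A100: {t_a100_ms:.2f} ms")
```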
2. OOM (Out-of-Memory) Prediction
Predict the exact sequence length or batch size at which your model will run out of memory on a 16 GB V100 versus a 40 GB A100. This is critical for production deployment planning and for avoiding runtime failures.
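A rough sketch of the kind of estimate involved (the coefficients and model shape below are illustrative assumptions, not accel-sim's actual accounting):

```python
def peak_activation_gb(batch, seq, hidden, heads, layers, bytes_per_el=2):
    """Crude fp16 activation-memory estimate for a Transformer forward pass.

    Two dominant terms per layer: linear activations, O(batch*seq*hidden),
    and attention score matrices, O(batch*heads*seq^2) -- the quadratic
    term that usually triggers OOM as sequence length grows.
    """
    linear = 8 * batch * seq * hidden   # ~8 live activation copies (assumption)
    scores = batch * heads * seq * seq
    return layers * (linear + scores) * bytes_per_el / 1e9

# batch=8, hidden=1024, 16 heads, 24 layers:
print(peak_activation_gb(8, 1024, 1024, 16, 24))   # ~9.7 GB: fits a 16 GB V100
print(peak_activation_gb(8, 2048, 1024, 16, 24))   # ~32 GB: OOM even at fp16
```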
3. CI/CD Performance Guardrails
Integrate accel-sim into your GitHub Actions or Jenkins pipelines. Since it runs entirely on CPU, you can catch performance regressions (e.g., a change that accidentally doubles the FLOPs of your attention layer) before they are merged, without needing GPU runners.
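A CI guard around this idea might look like the following sketch (the `check_flop_budget` helper and its inputs are hypothetical; wire it to whatever estimate accel-sim reports in your pipeline):

```python
import sys

def check_flop_budget(estimated_flops: float, budget_flops: float,
                      tolerance: float = 0.05) -> None:
    """Fail the build if estimated FLOPs exceed the committed budget."""
    limit = budget_flops * (1 + tolerance)
    if estimated_flops > limit:
        sys.exit(f"FLOP regression: {estimated_flops:.3g} > {limit:.3g} "
                 f"(budget {budget_flops:.3g} + {tolerance:.0%})")

# Within budget: passes silently. A change that accidentally doubles
# attention FLOPs would exceed the limit and exit nonzero, failing CI.
check_flop_budget(1.0e11, budget_flops=1.0e11)
```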
4. Hardware-Aware Model Design
Identify which specific ops (e.g., Softmax vs. Matmul) are your primary bottlenecks. This helps you decide where to apply optimizations like operator fusion, quantization, or specialized kernels like FlashAttention.
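Arithmetic intensity (FLOPs per byte moved) is the quantity that separates these cases. A quick sketch (the operand counts are textbook approximations, not accel-sim internals):

```python
def intensity(flops, bytes_moved):
    """Arithmetic intensity in FLOPs per byte of memory traffic."""
    return flops / bytes_moved

n = 4096  # square fp16 matrices
matmul_ai = intensity(2 * n**3, 3 * n * n * 2)    # O(n^3) work, O(n^2) traffic
softmax_ai = intensity(5 * n * n, 2 * n * n * 2)  # ~5 FLOPs/element, read+write

# V100 ridge point: 125e12 FLOPS / 900e9 B/s ~ 139 FLOPs/byte.
# matmul (~1365) sits above it (compute-bound); softmax (1.25) is firmly
# memory-bound, which is why fusing it (as FlashAttention does) pays off.
print(matmul_ai, softmax_ai)
```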
5. Custom Hardware Evaluation
Designing a new AI chip? Define a custom DeviceProfile with your theoretical TFLOPS and memory bandwidth to see how state-of-the-art models like Llama-3 or GPT-4 would perform on your architecture.
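A custom profile might look like the following sketch (`DeviceProfile` is named by the project, but these exact fields and the constructor shown are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeviceProfile:
    name: str
    peak_tflops: float          # peak compute, TFLOPS
    peak_bandwidth_gbps: float  # peak memory bandwidth, GB/s
    memory_gb: float            # device memory capacity, GB

    def ridge_point(self) -> float:
        """FLOPs/byte above which an op becomes compute-bound."""
        return (self.peak_tflops * 1e12) / (self.peak_bandwidth_gbps * 1e9)

# Hypothetical in-design accelerator:
my_npu = DeviceProfile("my-npu", peak_tflops=400.0,
                       peak_bandwidth_gbps=2000.0, memory_gb=48.0)
print(my_npu.ridge_point())  # 200.0 FLOPs/byte
```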
🛠 Installation
```sh
# Clone the repository
git clone https://github.com/your-repo/accel-sim.git
cd accel-sim

# Create a virtual environment and install
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```
🚀 Usage
1. Simulate a Model
Estimate performance for a specific model script and device:
```sh
accel-sim simulate examples/gpt_block.py --device v100
```
2. Compare Across Devices
Benchmark your model across multiple hardware generations:
```sh
accel-sim compare examples/gpt_block.py --devices v100,a100,h100
```
3. List Device Profiles
See the specifications used for estimation:
```sh
accel-sim devices
```
🏗 Architecture
- Capture: Uses `torch.fx` and `make_fx` to extract a detailed ATen-level computation graph.
- Cost Model: Applies a Roofline Model: `time = max(flops / peak_tflops, bytes / peak_bandwidth)`.
- Simulator: Walks the graph sequentially, tracking live activation memory and cumulative latency.
- Reporter: Generates a detailed breakdown of bottlenecks and OOM warnings.
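The cost model step can be sketched in a few lines (the function name is illustrative; the device numbers are the V100 figures from the example output):

```python
def op_time_s(flops: float, bytes_moved: float,
              peak_tflops: float, peak_gbps: float) -> float:
    """Roofline estimate: an op is limited by compute or by memory
    traffic, whichever takes longer."""
    compute_s = flops / (peak_tflops * 1e12)
    memory_s = bytes_moved / (peak_gbps * 1e9)
    return max(compute_s, memory_s)

# fp16 4096x4096 matmul on a V100 (125 TFLOPS, 900 GB/s):
n = 4096
t = op_time_s(2 * n**3, 3 * n * n * 2, 125.0, 900.0)
# the ~1.10 ms compute term dominates the ~0.11 ms memory term: compute-bound
print(f"{t * 1e3:.2f} ms")
```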
📊 Example Output
```
Device: V100 (125.0 TFLOPS, 900.0 GB/s, 16.0 GB)
Total latency: 42.30 ms
Peak memory: 9.10 GB ✓ (fits in device memory)

Top bottlenecks:
  1. attn_qk_matmul   27.60 ms  (65.2%)
  2. mlp_fc1_matmul    8.50 ms  (20.1%)
  3. attn_softmax      3.20 ms   (7.6%)
```
📜 License
MIT