accel-sim 🚀
CPU-based performance estimation tool for accelerator workloads.
accel-sim is a high-performance, CPU-based estimation tool designed to predict the latency and memory footprint of Transformer-based workloads on hardware accelerators (GPUs, TPUs).
By hooking into the PyTorch computation graph, accel-sim applies an analytical Roofline Model to estimate performance without requiring access to the physical hardware.
🌟 Real-World Scenarios
1. Cost-Benefit Analysis for Cloud Renting
Before spinning up an expensive A100 or H100 cluster, use accel-sim to determine whether your model actually benefits from the extra TFLOPS. If your workload is memory-bandwidth-bound, a cheaper V100 may offer nearly identical performance.
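As a back-of-the-envelope sketch of why this matters (the bandwidth and TFLOPS figures below are approximate vendor specs used for illustration, not accel-sim output):

```python
# A memory-bound elementwise op over 1 GiB of fp16 activations:
# runtime is set by memory traffic, not TFLOPS.
bytes_moved = 2 * 2 * 512 * 1024 * 1024      # read + write, 2 bytes/element

t_v100_ms = bytes_moved / 900e9 * 1e3        # V100: ~900 GB/s
t_a100_ms = bytes_moved / 1555e9 * 1e3       # A100 40GB: ~1555 GB/s (approx.)

# The speedup follows the bandwidth ratio (~1.7x), not the
# fp16 TFLOPS ratio (~2.5x) you are paying for.
print(f"V100: {t_v100_ms:.2f} ms  A100: {t_a100_ms:.2f} ms")
```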
2. OOM (Out-of-Memory) Prediction
Predict the exact sequence length or batch size at which your model will run out of memory on a 16 GB V100 versus a 40 GB A100. This is critical for production deployment planning and for avoiding runtime failures.
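A rough sketch of the kind of estimate involved (the coefficients and model shape below are illustrative assumptions, not accel-sim's actual accounting):

```python
def peak_activation_gb(batch, seq, hidden, heads, layers, bytes_per_el=2):
    """Crude fp16 activation-memory estimate for a Transformer forward pass.

    Two dominant terms per layer: linear activations, O(batch*seq*hidden),
    and attention score matrices, O(batch*heads*seq^2) -- the quadratic
    term that usually triggers OOM as sequence length grows.
    """
    linear = 8 * batch * seq * hidden   # ~8 live activation copies (assumption)
    scores = batch * heads * seq * seq
    return layers * (linear + scores) * bytes_per_el / 1e9

# batch=8, hidden=1024, 16 heads, 24 layers:
print(peak_activation_gb(8, 1024, 1024, 16, 24))   # ~9.7 GB: fits a 16 GB V100
print(peak_activation_gb(8, 2048, 1024, 16, 24))   # ~32 GB: OOM even at fp16
```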
3. CI/CD Performance Guardrails
Integrate accel-sim into your GitHub Actions or Jenkins pipelines. Since it runs entirely on CPU, you can catch performance regressions (e.g., a change that accidentally doubles the FLOPs of your attention layer) before they are merged, without needing GPU runners.
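A CI guard around this idea might look like the following sketch (the `check_flop_budget` helper and its inputs are hypothetical; wire it to whatever estimate accel-sim reports in your pipeline):

```python
import sys

def check_flop_budget(estimated_flops: float, budget_flops: float,
                      tolerance: float = 0.05) -> None:
    """Fail the build if estimated FLOPs exceed the committed budget."""
    limit = budget_flops * (1 + tolerance)
    if estimated_flops > limit:
        sys.exit(f"FLOP regression: {estimated_flops:.3g} > {limit:.3g} "
                 f"(budget {budget_flops:.3g} + {tolerance:.0%})")

# Within budget: passes silently. A change that accidentally doubles
# attention FLOPs would exceed the limit and exit nonzero, failing CI.
check_flop_budget(1.0e11, budget_flops=1.0e11)
```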
4. Hardware-Aware Model Design
Identify which specific ops (e.g., Softmax vs. Matmul) are your primary bottlenecks. This helps you decide where to apply optimizations like operator fusion, quantization, or specialized kernels like FlashAttention.
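Arithmetic intensity (FLOPs per byte moved) is the quantity that separates these cases. A quick sketch (the operand counts are textbook approximations, not accel-sim internals):

```python
def intensity(flops, bytes_moved):
    """Arithmetic intensity in FLOPs per byte of memory traffic."""
    return flops / bytes_moved

n = 4096  # square fp16 matrices
matmul_ai = intensity(2 * n**3, 3 * n * n * 2)    # O(n^3) work, O(n^2) traffic
softmax_ai = intensity(5 * n * n, 2 * n * n * 2)  # ~5 FLOPs/element, read+write

# V100 ridge point: 125e12 FLOPS / 900e9 B/s ~ 139 FLOPs/byte.
# matmul (~1365) sits above it (compute-bound); softmax (1.25) is firmly
# memory-bound, which is why fusing it (as FlashAttention does) pays off.
print(matmul_ai, softmax_ai)
```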
5. Custom Hardware Evaluation
Designing a new AI chip? Define a custom DeviceProfile with your theoretical TFLOPS and memory bandwidth to see how state-of-the-art models like Llama-3 or GPT-4 would perform on your architecture.
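A custom profile might look like the following sketch (`DeviceProfile` is named by the project, but these exact fields and the constructor shown are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeviceProfile:
    name: str
    peak_tflops: float          # peak compute, TFLOPS
    peak_bandwidth_gbps: float  # peak memory bandwidth, GB/s
    memory_gb: float            # device memory capacity, GB

    def ridge_point(self) -> float:
        """FLOPs/byte above which an op becomes compute-bound."""
        return (self.peak_tflops * 1e12) / (self.peak_bandwidth_gbps * 1e9)

# Hypothetical in-design accelerator:
my_npu = DeviceProfile("my-npu", peak_tflops=400.0,
                       peak_bandwidth_gbps=2000.0, memory_gb=48.0)
print(my_npu.ridge_point())  # 200.0 FLOPs/byte
```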
🛠 Installation
```sh
# Clone the repository
git clone https://github.com/your-repo/accel-sim.git
cd accel-sim

# Create a virtual environment and install
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```
🚀 Usage
1. Simulate a Model
Estimate performance for a specific model script and device:
```sh
accel-sim simulate examples/gpt_block.py --device v100
```
2. Compare Across Devices
Benchmark your model across multiple hardware generations:
```sh
accel-sim compare examples/gpt_block.py --devices v100,a100,h100
```
3. List Device Profiles
See the specifications used for estimation:
```sh
accel-sim devices
```
🏗 Architecture
- Capture: Uses `torch.fx` and `make_fx` to extract a detailed ATen-level computation graph.
- Cost Model: Applies a Roofline Model: `time = max(flops / peak_tflops, bytes / peak_bandwidth)`.
- Simulator: Walks the graph sequentially, tracking live activation memory and cumulative latency.
- Reporter: Generates a detailed breakdown of bottlenecks and OOM warnings.
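The cost model step can be sketched in a few lines (the function name is illustrative; the device numbers are the V100 figures from the example output):

```python
def op_time_s(flops: float, bytes_moved: float,
              peak_tflops: float, peak_gbps: float) -> float:
    """Roofline estimate: an op is limited by compute or by memory
    traffic, whichever takes longer."""
    compute_s = flops / (peak_tflops * 1e12)
    memory_s = bytes_moved / (peak_gbps * 1e9)
    return max(compute_s, memory_s)

# fp16 4096x4096 matmul on a V100 (125 TFLOPS, 900 GB/s):
n = 4096
t = op_time_s(2 * n**3, 3 * n * n * 2, 125.0, 900.0)
# the ~1.10 ms compute term dominates the ~0.11 ms memory term: compute-bound
print(f"{t * 1e3:.2f} ms")
```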
📊 Example Output
```
Device: V100 (125.0 TFLOPS, 900.0 GB/s, 16.0 GB)
Total latency: 42.30 ms
Peak memory: 9.10 GB ✓ (fits in device memory)

Top bottlenecks:
  1. attn_qk_matmul   27.60 ms  (65.2%)
  2. mlp_fc1_matmul    8.50 ms  (20.1%)
  3. attn_softmax      3.20 ms   (7.6%)
```
📜 License
MIT