Direct-index quantum state simulator (GHZ-focused)

qsim-uma: Quantum Circuit Simulation on Unified Memory Architecture

Benchmark suite comparing 11 Python backends for quantum state-vector simulation on Apple Silicon (M4 Pro, 48 GB unified memory). Accompanies the paper:

"A Controlled Study of Memory Hierarchy Transitions in Quantum State-Vector Simulation on Unified Memory Architecture"


Key Findings

  • DRAM cliff at 28→29q: state vector grows from 2.1 GB to 4.3 GB, exceeding the M4 Pro L3 cache. Tensordot backends show a 3.8–4.5× step; direct-index backends show ~2.1×.
  • Direct-index is consistently DRAM-bound: scatter-write access pattern prevents cache reuse even at 27q, so there is no cache phase to collapse from — hence the flat ~2× scaling.
  • Cliff location is circuit-independent: GHZ (O(n) gates) and QFT (O(n²) gates) produce the same cliff magnitude for each backend (±0.1×).
  • JAX XLA/AMX CPU ≈ MLX Metal GPU for tensordot workloads — both are memory-bandwidth-bound on the same DRAM.
  • 30q performance (fastest → slowest): J (MLX GPU direct-index, 11.3s) → F (MLX GPU tensor, 23.4s) → H (MLX GPU flat, 32.3s) → C (JAX CPU tensordot, 38.3s) → …
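The 2.1 GB and 4.3 GB figures quoted above are consistent with one 8-byte amplitude per basis state. The sketch below reproduces that arithmetic. The 8-byte (complex64) amplitude size is an assumption inferred from those figures, and `state_vector_bytes` is a hypothetical helper, not part of this package.

```python
# Assumption: one complex64 amplitude (8 bytes) per basis state,
# inferred from the 2.1 GB (28q) and 4.3 GB (29q) figures.
BYTES_PER_AMPLITUDE = 8


def state_vector_bytes(n_qubits: int) -> int:
    """Memory needed to hold a dense n-qubit state vector."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE


for n in range(27, 31):
    print(f"{n}q: {state_vector_bytes(n) / 1e9:.1f} GB")
```

Each added qubit doubles the footprint, so a backend that is purely bandwidth-bound would be expected to slow down by roughly 2× per qubit.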

Backend Legend

Key  Backend               Framework    Device       Algorithm
A    Brute-force NumPy     NumPy        CPU          Dense matmul
B    pykronecker           pykronecker  CPU          Kronecker product
C    JAX CPU tensordot     JAX          CPU (AMX)    tensordot
D    NumPy direct-index    NumPy        CPU          Direct-index scatter-write
F    MLX GPU tensor        MLX          GPU (Metal)  tensordot
G    MLX CPU tensor        MLX          CPU          tensordot
H    MLX GPU flat          MLX          GPU (Metal)  Flat-index
I    MLX CPU flat          MLX          CPU          Flat-index
J    MLX GPU direct-index  MLX          GPU (Metal)  Direct-index scatter-write
K    MLX CPU direct-index  MLX          CPU          Direct-index scatter-write

Backend A ran out of memory (OOM) and terminated before 16q. There is no backend E; the gap comes from internal numbering.
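To make the "tensordot" and "direct-index scatter-write" labels concrete, here is a minimal NumPy sketch of both ways to apply a single-qubit gate. It illustrates the general techniques only and is not the code used by any backend in this suite. The function names and the convention that qubit 0 is the most significant bit are assumptions.

```python
import numpy as np


def apply_1q_tensordot(state, gate, target, n):
    """Apply a 2x2 gate by viewing the state as an n-axis tensor."""
    psi = state.reshape((2,) * n)
    # Contract the gate's input axis (1) with the target qubit axis.
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    # tensordot puts the new axis first; move it back to `target`.
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)


def apply_1q_direct_index(state, gate, target, n):
    """Apply a 2x2 gate by indexing amplitude pairs in the flat vector."""
    # Qubit 0 is the most significant bit, matching the reshape above.
    bit = 1 << (n - 1 - target)
    idx = np.arange(state.size)
    i0 = idx[(idx & bit) == 0]  # indices where the target bit is 0
    i1 = i0 | bit               # partner indices where the bit is 1
    a0, a1 = state[i0], state[i1]
    out = np.empty_like(state)
    out[i0] = gate[0, 0] * a0 + gate[0, 1] * a1  # scatter-write
    out[i1] = gate[1, 0] * a0 + gate[1, 1] * a1
    return out


n = 3
H = np.array([[1, 1], [1, -1]], dtype=np.complex64) * (2 ** -0.5)
state = np.zeros(2 ** n, dtype=np.complex64)
state[0] = 1.0
same = np.allclose(
    apply_1q_tensordot(state, H, 0, n),
    apply_1q_direct_index(state, H, 0, n),
)
print("methods agree:", same)
```

The direct-index variant touches memory through computed index arrays rather than contiguous slices, which is the access pattern the Key Findings describe as preventing cache reuse.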


Requirements

  • Apple Silicon Mac (tested: M4 Pro, 48 GB unified memory, macOS 15)
  • Python 3.12
  • caffeinate (built into macOS) to prevent sleep during long runs

Install the dependencies:

pip install -r requirements.txt
# or: conda env create -f environment.yml

Pinned versions used to produce the published results:

mlx==0.29.3
jax==0.4.30
jaxlib==0.4.30
numpy==2.0.2

Thermal monitoring (required for thermally isolated benchmarks): The isolation scripts read thermal pressure via sudo powermetrics. Run this command to install a sudoers rule that avoids interactive password prompts:

echo "ALL ALL=(root) NOPASSWD: /usr/bin/powermetrics" | sudo tee /etc/sudoers.d/powermetrics

Run python3 thermal_monitor.py to verify the setup before benchmarking.
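For orientation, a check of this kind might look like the sketch below. It is not the contents of thermal_monitor.py. It assumes that the powermetrics `thermal` sampler prints a line of the form "Current pressure level: Nominal"; the exact wording may differ between macOS versions, and both function names are hypothetical.

```python
import re
import subprocess

# Assumption: the thermal sampler prints a line such as
# "Current pressure level: Nominal". Adjust the pattern if yours differs.
PRESSURE_RE = re.compile(r"Current pressure level:\s*(\w+)")


def parse_thermal_pressure(output):
    """Extract the pressure level from powermetrics output, or None."""
    match = PRESSURE_RE.search(output)
    return match.group(1) if match else None


def read_thermal_pressure():
    """Take one thermal sample; requires macOS and the sudoers rule above."""
    result = subprocess.run(
        # `sudo -n` fails instead of prompting if no NOPASSWD rule applies.
        ["sudo", "-n", "/usr/bin/powermetrics",
         "--samplers", "thermal", "-n", "1", "-i", "1000"],
        capture_output=True, text=True, check=True,
    )
    return parse_thermal_pressure(result.stdout)
```

Calling `read_thermal_pressure()` on a correctly configured Mac should return a level name without asking for a password; a `CalledProcessError` suggests the sudoers rule is not in effect.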


Repository Layout

experiments/
  1_ghz_statistical/scripts/  # Exp 1: GHZ, 11 backends, N=7, 3–30q
  2_ghz_isolated/scripts/     # Exp 2: GHZ thermally isolated, N=5, 27–30q (backends C,F,G,H,I,J,K)
  3_qft_single_run/scripts/   # Exp 3: QFT, 11 backends, N=1, 3–30q
  4_qft_isolated/scripts/     # Exp 4: QFT thermally isolated, N=3, 27–30q

scripts/                      # Canonical top-level scripts (mirrors experiment scripts)
  bench_cliff_isolated.py     # Thermally isolated cliff benchmark (27–30q)
  quantum_benchmark.py        # Full 11-backend statistical benchmark (3–30q)
  bench_qft.py                # QFT benchmark
  verify_ghz.py               # Correctness checks
  verify_qft.py
  verify_direct_index.py
  stream_probe.py             # STREAM bandwidth measurement

publication_scripts/          # Regenerate paper figures (data hardcoded in scripts)
  plot_ghz_speedup.py         → fig3
  plot_qft_speedup.py         → fig4
  plot_circuit_independence.py → fig5

figures/                      # Pre-generated publication figures (fig1–fig5)

thermal_monitor.py            # Verify thermal monitoring before benchmarking
requirements.txt
environment.yml

Reproducing the Paper Figures

Figures 3–5 can be regenerated directly from the scripts (data is hardcoded):

python3 publication_scripts/plot_ghz_speedup.py
python3 publication_scripts/plot_qft_speedup.py
python3 publication_scripts/plot_circuit_independence.py

Figures 1–2 are included as pre-generated PNGs; their generation scripts depend on raw log files not distributed in this repo.


Reproducing the Benchmark Data

Run under caffeinate to prevent the system from sleeping mid-benchmark.

Thermally isolated GHZ cliff (~2.5 hours):

caffeinate -i python3 scripts/bench_cliff_isolated.py --circuit ghz --backends C,F,J,K

Thermally isolated QFT cliff (~5–6 hours):

caffeinate -i python3 scripts/bench_cliff_isolated.py --circuit qft --backends C,F,J,K

Full 11-backend GHZ statistical benchmark (several hours):

caffeinate -i python3 scripts/quantum_benchmark.py

The --no-cool flag skips thermal recovery; use it only for development and smoke testing. Results from --no-cool runs will not match the paper data.


Correctness Verification

python3 scripts/verify_ghz.py
python3 scripts/verify_qft.py
python3 scripts/verify_direct_index.py
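As an illustration of what a GHZ correctness check involves, the sketch below builds an n-qubit GHZ state with direct indexing and compares it with the expected amplitudes of 1/√2 on |0…0⟩ and |1…1⟩. It is a stand-alone NumPy example, not the logic of verify_ghz.py. The names `ghz_state` and `check_ghz` and the choice of qubit 0 as the most significant bit are assumptions.

```python
import numpy as np


def ghz_state(n):
    """Build an n-qubit GHZ state: H on qubit 0, then a CNOT chain."""
    state = np.zeros(2 ** n, dtype=np.complex64)
    state[0] = 1.0
    idx = np.arange(state.size)

    # Hadamard on qubit 0 (the most significant bit).
    top = 1 << (n - 1)
    i0 = idx[(idx & top) == 0]
    i1 = i0 | top
    a0, a1 = state[i0], state[i1]  # fancy indexing returns copies
    s = 2 ** -0.5
    state[i0] = s * (a0 + a1)
    state[i1] = s * (a0 - a1)

    # CNOT(q, q+1): swap amplitude pairs where the control bit is set.
    for q in range(n - 1):
        control = 1 << (n - 1 - q)
        target = 1 << (n - 2 - q)
        lo = idx[((idx & control) != 0) & ((idx & target) == 0)]
        hi = lo | target
        tmp = state[lo]
        state[lo] = state[hi]
        state[hi] = tmp
    return state


def check_ghz(state, atol=1e-6):
    """True if the state matches (|0...0> + |1...1>) / sqrt(2)."""
    expected = np.zeros_like(state)
    expected[0] = expected[-1] = 2 ** -0.5
    return bool(np.allclose(state, expected, atol=atol))


print("GHZ(5) correct:", check_ghz(ghz_state(5)))
```

Comparing against the full expected vector also catches stray amplitude that a check of only the two nonzero entries would miss.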

Citation


License

MIT
