mlx-qre
Quantum Relative Entropy + Petz Recovery toolkit on Apple Silicon via MLX.
$$\Sigma = D(\rho \,\|\, \sigma) = \mathrm{Tr}\left[\rho(\ln\rho - \ln\sigma)\right]$$
A complete, self-contained library for quantum information quantities (QRE, von Neumann entropy, Rényi and Jensen-Shannon divergences), quantum channels (thermal attenuator, depolarizing, dephasing), and Petz recovery analysis (recovery map, fidelity, retrodiction). Built as the computational companion to the Σ = 2 ln Q entropy-production framework; see petz-recovery-unification and tau-chrono.
Performance note. Eigendecomposition runs on the Metal GPU via MLX. For small matrices (N < 500), NumPy + Accelerate (CPU) is typically faster due to lower dispatch overhead — use NumPy if you only need a handful of small QREs. The GPU path becomes useful for batched evaluation and N ≥ 500, where it pulls ahead of NumPy. See benchmark_results.md for the full breakdown.
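For the small-N CPU path, the exact formula above can be evaluated directly with NumPy. The following is a minimal reference sketch, not part of mlx-qre; the eigenvalue clipping threshold is an illustrative choice to keep the matrix logarithms finite for rank-deficient states:

```python
import numpy as np

def qre_numpy(rho, sigma, eps=1e-12):
    """Exact D(rho || sigma) = Tr[rho (ln rho - ln sigma)] via CPU eigh."""
    w_r, V_r = np.linalg.eigh(rho)
    w_s, V_s = np.linalg.eigh(sigma)
    # Clip tiny eigenvalues so log(0) never occurs
    log_rho = (V_r * np.log(np.clip(w_r, eps, None))) @ V_r.conj().T
    log_sigma = (V_s * np.log(np.clip(w_s, eps, None))) @ V_s.conj().T
    return float(np.real(np.trace(rho @ (log_rho - log_sigma))))
```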
Installation
```bash
pip install -e .
```
Requires Python 3.10+ and Apple Silicon (M1/M2/M3/M4).
Quick Start
```python
import mlx.core as mx
from mlx_qre import quantum_relative_entropy, random_density_matrix

# Two 100x100 density matrices on GPU
rho = random_density_matrix(100)
sigma = random_density_matrix(100)

# Compute D(rho || sigma): eigendecomposition runs on the Metal GPU
D = quantum_relative_entropy(rho, sigma)
mx.eval(D)
print(f"D(rho || sigma) = {D.item():.6f}")

# Batched: 500 pairs evaluated simultaneously
rho_batch = random_density_matrix(50, batch_size=500)
sigma_batch = random_density_matrix(50, batch_size=500)
D_batch = quantum_relative_entropy(rho_batch, sigma_batch)
```
Stochastic Lanczos backend (large N)
For N >= 1000 the O(N^3) eigendecomposition is the bottleneck, and the Metal eigh kernel can hit GPU command-buffer timeouts past N = 2000. mlx-qre therefore ships a Stochastic Lanczos Quadrature (SLQ) estimator that replaces the eigendecomposition with O(k * m * N^2) matvecs (k Lanczos steps, m probe vectors).

The implementation runs entirely on MLX, with no NumPy fallback in the hot path. All m probe vectors are stacked as a single (N, m) matrix, so each Lanczos step is a single block matmul A @ V; this both amortises GPU dispatch overhead and lets MLX tile the m probes across the GPU. Only the small (k, k) tridiagonal eigh is delegated to MLX's CPU eigh stream, which is the right place for it at k <= 30.
```python
import mlx.core as mx
from mlx_qre import (
    random_density_matrix,
    quantum_relative_entropy,
    von_neumann_entropy_lanczos,
    quantum_relative_entropy_lanczos,
)

rho = random_density_matrix(2000)
sigma = random_density_matrix(2000)

# Direct API
S = von_neumann_entropy_lanczos(rho, k=25, m=20, seed=0)
D = quantum_relative_entropy_lanczos(rho, sigma, k=25, m=20, seed=0)

# Or via the main API with method="lanczos"
D_mx = quantum_relative_entropy(rho, sigma, method="lanczos", k=25, m=20, seed=0)
```
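For reference, the quantity SLQ estimates is Tr[f(A)] for a spectral function f; here f(x) = x ln x, as in stochastic_lanczos_logtr. The following is a minimal NumPy sketch of the technique with a plain per-probe loop; it is illustrative only, not the library's implementation, which blocks all m probes into one (N, m) matmul per step as described above:

```python
import numpy as np

def slq_logtr(A, k=25, m=20, seed=0):
    """Estimate Tr[A ln A] for Hermitian PSD A by Stochastic Lanczos Quadrature."""
    N = A.shape[0]
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(m):
        v = rng.standard_normal(N)
        v /= np.linalg.norm(v)                 # unit probe vector
        alphas, betas, V = [], [], [v]
        w = A @ v
        alphas.append(np.real(np.vdot(v, w)))
        w = w - alphas[-1] * v
        for _ in range(1, k):                  # k-step Lanczos recurrence
            beta = np.linalg.norm(w)
            if beta < 1e-12:                   # Krylov space exhausted early
                break
            betas.append(beta)
            V.append(w / beta)
            w = A @ V[-1] - beta * V[-2]
            alphas.append(np.real(np.vdot(V[-1], w)))
            w = w - alphas[-1] * V[-1]
        # Ritz values/weights from the small (k, k) tridiagonal matrix
        T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        theta, U = np.linalg.eigh(T)
        theta = np.clip(theta, 1e-12, None)    # guard ln of tiny Ritz values
        tau = U[0, :] ** 2                     # Gauss quadrature weights
        total += np.sum(tau * theta * np.log(theta))
    return N * total / m                       # E[v^T f(A) v] = Tr[f(A)] / N
```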
Typical accuracy at the defaults k=25, m=20:

- S(rho): ~1-2% median relative error (clean SLQ on a single matrix).
- D(rho || sigma): ~3-7% median relative error; the cross-term Tr[rho ln sigma] is not a single-matrix spectral function, so it carries higher variance. Increase m for tighter accuracy.
Speed crossover (M1 Max, dense complex Haar-random density matrices, SLQ at k=25, m=20):
| N | MLX eigh (ms) | NumPy eigh (ms) | SLQ pure-MLX (ms) |
|---|---|---|---|
| 100 | 4 | 4 | 19 |
| 500 | 160 | 287 | 20 |
| 1000 | 1077 | 1988 | 24 |
| 2000 | timeout | 24048 | 37 |
SLQ wins from N >= 500, and the gap widens sharply from N >= 1000 (~50-80x vs MLX eigh, ~80-660x vs NumPy eigh). The pure-MLX block-matvec hot path is also ~30-140x faster than the previous NumPy + Accelerate hot path on the same machine. See benchmark_results.md for the full breakdown.
Features
| Module | Function | Description |
|---|---|---|
| `qre` | `quantum_relative_entropy(rho, sigma, method="exact"\|"lanczos")` | D(rho \|\| sigma) via Metal eigendecomposition or SLQ |
| `qre` | `von_neumann_entropy(rho)` | S(rho) = -Tr[rho ln rho] |
| `qre` | `relative_entropy_pure_state(psi, sigma)` | Efficient D for pure states: -ln⟨psi\|sigma\|psi⟩ |
| `lanczos` | `von_neumann_entropy_lanczos(rho, k, m)` | S(rho) via Stochastic Lanczos Quadrature |
| `lanczos` | `quantum_relative_entropy_lanczos(rho, sigma, k, m)` | D via Hutchinson + Lanczos |
| `lanczos` | `stochastic_lanczos_logtr(A, k, m)` | Tr[A ln A] for any Hermitian PSD A |
| `classical` | `kl_divergence(p, q)` | Classical KL divergence |
| `classical` | `jensen_shannon_divergence(p, q)` | Symmetric JSD |
| `classical` | `renyi_divergence(p, q, alpha)` | Rényi divergence of order alpha |
| `channels` | `thermal_attenuator(eta)` | Gravitational channel, eta = -g_00 |
| `channels` | `channel_entropy_production(K, rho, sigma)` | Sigma through a channel |
| `channels` | `depolarizing_channel(p)` | Depolarizing noise |
| `channels` | `dephasing_channel(gamma)` | Dephasing noise |
| `petz` | `petz_recovery_map(K, sigma)` | Construct the Petz recovery map R |
| `petz` | `petz_recovery_fidelity(rho, sigma, K)` | F(rho, R∘N(rho)) |
| `petz` | `verify_petz_bound(rho, sigma, K)` | Check F >= exp(-Sigma/2) |
| `petz` | `retrodiction_quality(rho, sigma, K)` | tau = 1 - F |
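The Petz utilities in the table compose into an end-to-end check. A hypothetical sketch using the signatures above; the return conventions (a Kraus-operator set from `depolarizing_channel`, a boolean from `verify_petz_bound`) are assumptions, not confirmed by the docs:

```python
from mlx_qre import (
    random_density_matrix,
    depolarizing_channel,
    petz_recovery_fidelity,
    verify_petz_bound,
)

rho = random_density_matrix(8)
sigma = random_density_matrix(8)
K = depolarizing_channel(0.1)               # assumed: Kraus operators for p = 0.1
F = petz_recovery_fidelity(rho, sigma, K)   # F(rho, R∘N(rho))
ok = verify_petz_bound(rho, sigma, K)       # assumed: bool, F >= exp(-Sigma/2)
```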
Use Cases
- Gravitational entropy production: Sigma_grav = D(N_eta(rho) || N_eta(sigma)) with eta = 1/Q^2 (see the sketch after this list)
- Quantum channel analysis: entropy production, data processing inequality
- Petz recovery bounds: F >= exp(-Sigma/2), retrodiction quality
- Quantum ML: kernel methods using QRE as a distance measure
- Neural entropy: EEG/neural signal entropy production analysis
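As a hypothetical sketch of the first use case, assuming `thermal_attenuator(eta)` returns the channel's Kraus operators in the form `channel_entropy_production` expects (signatures taken from the Features table; the return types are assumptions):

```python
from mlx_qre import (
    random_density_matrix,
    thermal_attenuator,
    channel_entropy_production,
)

Q = 3.0
eta = 1.0 / Q**2             # eta = 1/Q^2, as in the use case above
rho = random_density_matrix(16)
sigma = random_density_matrix(16)

K = thermal_attenuator(eta)  # assumed: Kraus operators of N_eta
Sigma_grav = channel_entropy_production(K, rho, sigma)  # D(N_eta(rho) || N_eta(sigma))
```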
Benchmark
```bash
python -m mlx_qre.benchmark
```
Compares MLX (Apple Silicon GPU) vs NumPy (CPU) across matrix sizes N = 10 to 1000. Summary on M1 Max:
| N | MLX (ms) | NumPy (ms) | MLX vs NumPy |
|---|---|---|---|
| 10 | 0.74 | 0.04 | 0.06× (NumPy wins) |
| 100 | 3.95 | 3.57 | 0.90× |
| 500 | 158 | 278 | 1.76× |
| 1000 | 1042 | 2010 | 1.93× |
For batched QRE at N = 100 the GPU sustains ~460 pairs/sec. Use NumPy for one-off small problems; reach for mlx-qre for batched or large-N work and for the integrated Petz and channel utilities above. See benchmark_results.md for the full table.
Tests
pip install -e ".[dev]"
pytest tests/ -v
Theory
The quantum relative entropy D(rho || sigma) is the quantum generalization of the classical KL divergence. In the retrocausality framework:
- Sigma = 2 ln Q: unified entropy production formula
- Petz bound: F >= exp(-Sigma/2) quantifies the retrodiction cost (see the sketch after this list)
- tau = 1 - F: retrodiction deficit (0 = perfect, 1 = irreversible)
- Zero-entropy limit: Sigma -> 0 implies perfect retrodiction (no time arrow)
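For concreteness, the Petz recovery map in its textbook form is R(X) = sigma^{1/2} N†(N(sigma)^{-1/2} X N(sigma)^{-1/2}) sigma^{1/2} for a channel N(X) = Σ_i K_i X K_i† with adjoint N†(Y) = Σ_i K_i† Y K_i. A minimal NumPy sketch under that convention follows; it is illustrative only and may differ from the exact interface of `petz_recovery_map`:

```python
import numpy as np

def _herm_pow(A, p, eps=1e-12):
    # A**p for Hermitian PSD A via eigendecomposition, clipping tiny eigenvalues
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, eps, None) ** p) @ V.conj().T

def petz_recover(kraus, sigma, X):
    # Textbook Petz map with reference state sigma:
    #   R(X) = sqrt(sigma) . sum_i K_i† N(sigma)^{-1/2} X N(sigma)^{-1/2} K_i . sqrt(sigma)
    N_sigma = sum(K @ sigma @ K.conj().T for K in kraus)   # N(sigma)
    inv_sqrt = _herm_pow(N_sigma, -0.5)
    sqrt_sig = _herm_pow(sigma, 0.5)
    inner = inv_sqrt @ X @ inv_sqrt
    back = sum(K.conj().T @ inner @ K for K in kraus)      # adjoint channel N†
    return sqrt_sig @ back @ sqrt_sig
```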
License
MIT License. Copyright (c) 2026 Sheng-Kai Huang.