GPU exact arithmetic - 512-bit precision, zero accumulation error
SimGen VLA - Zero-Error GPU Arithmetic
Drop-in PyTorch replacement with exact arithmetic. No accumulation error. Ever.
Free during beta - Use freely for research, academic, and commercial projects.
Support development: ko-fi.com/kyleclouthier
The Problem: Floating-Point Lies
GPU floating-point operations round their results, and each rounding introduces a tiny error. These errors compound silently until your results are wrong.
import torch
# Classic floating-point failure
x = torch.tensor([1e16, 1.0, -1e16])
print(x.sum()) # 0.0 <- WRONG! Should be 1.0
# 10 million additions - error explodes
values = torch.ones(10_000_000) * 0.1
print(values.sum()) # rounding error accumulates, so the result can differ from 1000000.0
This affects: financial calculations, scientific simulations, physics engines, signal processing, cryptography, and any computation requiring precision.
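The first failure above can be reproduced without a GPU. The sketch below uses only the Python standard library and float64 (PyTorch defaults to float32), so it illustrates the rounding effect itself and says nothing about how SimGen VLA works internally. It accumulates in an explicit loop because the built-in `sum` may apply a more accurate algorithm to floats on recent Python versions.

```python
import math
from fractions import Fraction

values = [1e16, 1.0, -1e16]

# Plain left-to-right float64 accumulation.
naive = 0.0
for v in values:
    naive += v  # 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost

# math.fsum keeps track of the lost low-order bits and returns
# the correctly rounded sum of the inputs.
compensated = math.fsum(values)

# Rational arithmetic never rounds, so it serves as the reference.
exact = sum(Fraction(v) for v in values)

print(naive)        # 0.0
print(compensated)  # 1.0
print(exact)        # 1
```

The rational sum is the ground truth here; any exact-arithmetic library should agree with it on these inputs.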
The Solution: SimGen VLA
from simgen import vla
# Exact arithmetic - mathematically correct
x = vla.tensor([1e16, 1.0, -1e16])
print(x.sum()) # 1.0 <- CORRECT!
# 10 million additions - still exact
values = vla.ones(10_000_000) * 0.1
print(values.sum()) # 1000000.0 <- EXACTLY correct
Minimal code changes. The API mirrors PyTorch: import vla and use it where you would use torch.
Installation
pip install simgen-vla
Requirements:
- Python 3.10, 3.11, or 3.12
- PyTorch 2.0+ with CUDA
- CuPy (matching your CUDA version: pip install cupy-cuda11x or cupy-cuda12x)
- NVIDIA GPU (Pascal through Hopper: sm_60 to sm_90)
Platforms: Windows, Linux
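The requirements above can be checked before installing. The following is a minimal sketch, not part of SimGen VLA: `check_requirements` is a hypothetical helper name, and it does not verify the PyTorch version or the platform. It relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_capability()`, and it skips the GPU checks when PyTorch is missing.

```python
import importlib.util
import sys


def check_requirements():
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Python 3.10, 3.11, or 3.12
    if not ((3, 10) <= sys.version_info[:2] <= (3, 12)):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} is outside 3.10-3.12")

    # CuPy installs the importable module `cupy` regardless of CUDA flavor.
    if importlib.util.find_spec("cupy") is None:
        problems.append("CuPy is not installed")

    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems

    import torch

    if not torch.cuda.is_available():
        problems.append("PyTorch cannot see a CUDA device")
    else:
        capability = torch.cuda.get_device_capability(0)  # e.g. (8, 6) for sm_86
        if not ((6, 0) <= capability <= (9, 0)):
            problems.append(f"compute capability {capability} is outside sm_60-sm_90")

    return problems


for problem in check_requirements():
    print("requirement not met:", problem)
```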
What's New in v5.2.0
- Focused API: 57 exact GPU operations for universal computing
- SVD Support: Singular Value Decomposition via Jacobi rotations
- Exact I/O: All inputs automatically converted to exact representation
- Linear Algebra Suite: LU, QR, eigenvalues, determinant, inverse
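What "converted to exact representation" means for a float input can be illustrated with the standard library. This sketch makes no claim about SimGen VLA's internal format; it only shows that a literal such as `0.1` is already a binary approximation by the time any library receives it, so an exact conversion preserves that stored value rather than the decimal that was typed.

```python
from decimal import Decimal
from fractions import Fraction

# The float64 value actually stored for the literal 0.1, as an exact ratio.
as_binary = Fraction(0.1)

# The decimal value the programmer wrote.
as_written = Fraction("0.1")

print(as_binary)                 # 3602879701896397/36028797018963968
print(as_written)                # 1/10
print(as_binary == as_written)   # False
print(Decimal(0.1))              # full decimal expansion of the stored float
```

To carry the decimal value itself, pass it in a form that does not round first, such as a string or an integer count of cents.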
Use Cases
Financial Computing
Mixed-magnitude calculations where every cent matters:
from simgen import vla
# Portfolio with massive range - standard FP loses the pennies
positions = vla.tensor([
    1_000_000_000.00,  # $1 billion position
    0.01,              # 1 cent transaction fee
    -999_999_999.99,   # Large short position
    50_000.50,         # Medium holding
])
total = positions.sum()
print(f"Portfolio: ${float(total):,.2f}") # $50,000.52 - exact!
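The expected total can be cross-checked independently of SimGen VLA with Python's `decimal` module. This is a reference calculation only; the amounts are entered as strings so that no binary rounding happens on input.

```python
from decimal import Decimal

# Same four positions as above, entered as decimal strings.
positions = [
    Decimal("1000000000.00"),  # $1 billion position
    Decimal("0.01"),           # 1 cent transaction fee
    Decimal("-999999999.99"),  # large short position
    Decimal("50000.50"),       # medium holding
]

total = sum(positions)
print(f"Portfolio: ${total:,.2f}")  # Portfolio: $50,000.52
```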
Scientific Simulation
Physics simulations that don't drift over time:
from simgen import vla
# Chaotic system (Lorenz attractor)
def lorenz_step(state, dt=0.01):
    x, y, z = state[0], state[1], state[2]
    sigma, rho, beta = 10.0, 28.0, 8.0/3.0
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return vla.tensor([x + dx * dt, y + dy * dt, z + dz * dt])
# Run forward, then step backward with a negated dt
state = vla.tensor([1.0, 1.0, 1.0])
initial = state.clone()
for _ in range(10000):
    state = lorenz_step(state, dt=0.01)
for _ in range(10000):
    state = lorenz_step(state, dt=-0.01)
error = (state - initial).abs().sum()
# Explicit Euler is not time-reversible, so this residual reflects the
# integrator's truncation error; exact arithmetic removes only the rounding part.
print(f"Reversal error: {float(error)}")
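A cross-check in exact rational arithmetic, independent of SimGen VLA, helps separate rounding error from the integrator's own truncation error. The sketch below repeats the forward-then-backward experiment with `fractions.Fraction`, where no rounding occurs at all. `euler_roundtrip` is a hypothetical helper name, and a single step is used because rational numerators and denominators grow quickly with each step.

```python
from fractions import Fraction

SIGMA, RHO, BETA = Fraction(10), Fraction(28), Fraction(8, 3)


def lorenz_step(state, dt):
    """One explicit Euler step of the Lorenz system in exact rationals."""
    x, y, z = state
    dx = SIGMA * (y - x)
    dy = x * (RHO - z) - y
    dz = x * y - BETA * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)


def euler_roundtrip(initial, dt, steps):
    """Step forward `steps` times, then backward, and return the L1 residual."""
    state = initial
    for _ in range(steps):
        state = lorenz_step(state, dt)
    for _ in range(steps):
        state = lorenz_step(state, -dt)
    return sum(abs(a - b) for a, b in zip(state, initial))


initial = (Fraction(1), Fraction(1), Fraction(1))
residual = euler_roundtrip(initial, Fraction(1, 100), steps=1)
print(residual)          # an exact, nonzero rational
print(float(residual))
```

Because the backward step evaluates the derivative at the new point rather than the old one, the residual is nonzero even with no rounding. A time-reversible integrator would be needed for an exact return to the initial state.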
Linear Algebra
Exact matrix decompositions and solvers:
from simgen import vla
# Matrix operations
A = vla.randn((100, 100))
B = vla.randn((100, 100))
C = vla.matmul(A, B) # Exact matrix multiply
# LU Decomposition
L, U = vla.lu(A)
# QR Decomposition
Q, R = vla.qr(A)
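# Dominant eigenvalue and its eigenvector (power iteration)
eigenvalue, eigenvector = vla.eig(A)
# Matrix inverse and determinant
A_inv = vla.inv(A)
det = vla.det(A)
# Solve linear system: Ax = b
b = vla.randn((100, 1))  # right-hand side; the column-vector shape is an assumption
x = vla.solve(A, b)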
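For small systems, an exact reference solution is easy to produce on the CPU, which is useful for validating any solver. The following sketch performs Gauss-Jordan elimination over the rationals with `fractions.Fraction`. It is an illustrative reference, not SimGen VLA's algorithm, and `solve_exact` is a hypothetical name.

```python
from fractions import Fraction


def solve_exact(A, b):
    """Solve Ax = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    # Augmented matrix [A | b] with every entry converted to a Fraction.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Any nonzero pivot works: there is no rounding, so no need to pick the largest.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]

        # Scale the pivot row so the pivot becomes 1.
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]

        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [rv - factor * cv for rv, cv in zip(M[r], M[col])]

    return [M[r][n] for r in range(n)]


A = [[2, 1], [1, 3]]
b = [3, 5]
x = solve_exact(A, b)
print(x)  # [Fraction(4, 5), Fraction(7, 5)]
```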
Signal Processing
FFT and convolutions with exact arithmetic:
from simgen import vla
# 2D Convolution
signal = vla.randn((1, 3, 64, 64))
kernel = vla.randn((16, 3, 3, 3))
output = vla.conv2d(signal, kernel)
Complete API Reference
Tensor Creation
from simgen import vla
x = vla.tensor([1.0, 2.0, 3.0]) # From list
z = vla.zeros((3, 3)) # Zeros
o = vla.ones((100,)) # Ones
r = vla.randn((10, 10)) # Random normal
u = vla.rand((5, 5)) # Random uniform [0,1]
a = vla.arange(0, 10) # Range [0,1,2,...,9]
l = vla.linspace(0, 1, 100) # 100 points from 0 to 1
I = vla.eye(5) # 5x5 identity matrix
Arithmetic Operations
c = a + b # Exact addition
c = a - b # Exact subtraction
c = a * b # Exact multiplication
c = a / b # Exact division
c = -a # Negation
c = a ** 2 # Power
Reductions (Zero Drift)
total = vla.sum(x) # Exact sum
avg = vla.mean(x) # Exact mean
product = vla.prod(x) # Exact product
minimum = vla.min(x) # Minimum
maximum = vla.max(x) # Maximum
std_dev = vla.std(x) # Standard deviation
variance = vla.var(x) # Variance
Linear Algebra
C = vla.matmul(A, B) # Matrix multiplication
C = vla.mm(A, B) # Matrix-matrix multiply
y = vla.mv(A, x) # Matrix-vector multiply
d = vla.dot(a, b) # Dot product
C = vla.bmm(A, B) # Batched matrix multiply
L, U = vla.lu(A) # LU decomposition
Q, R = vla.qr(A) # QR decomposition
e, v = vla.eig(A) # Eigenvalue (power iteration)
det = vla.det(A) # Determinant
inv = vla.inv(A) # Matrix inverse
x = vla.solve(A, b) # Solve Ax = b
Math Functions
y = vla.exp(x) # Exponential
y = vla.log(x) # Natural log
y = vla.sqrt(x) # Square root
y = vla.abs(x) # Absolute value
y = vla.sin(x) # Sine
y = vla.cos(x) # Cosine
y = vla.tan(x) # Tangent
y = vla.tanh(x) # Hyperbolic tangent
y = vla.sigmoid(x) # Sigmoid
Activations
y = vla.relu(x) # ReLU
y = vla.gelu(x) # GELU
y = vla.silu(x) # SiLU/Swish
y = vla.softmax(x) # Softmax
Shape Operations
y = vla.reshape(x, (2, 3)) # Reshape
y = vla.transpose(x, 0, 1) # Transpose dims
y = vla.squeeze(x) # Remove size-1 dims
y = vla.unsqueeze(x, 0) # Add dimension
y = vla.stack([a, b, c]) # Stack tensors
y = vla.cat([a, b]) # Concatenate
Exact Output
# Get TRUE exact value as Python Decimal
result = x.sum()
exact_value = result.to_decimal() # Decimal('1.0') - mathematically exact
# SHA256 checksum for verification
hash_val = result.checksum() # Verify across systems
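Why a checksum over an exact value is portable can be sketched with the standard library: once a result is an exact decimal, a canonical string form of it hashes identically on every machine. The code below is an assumption-laden illustration and not SimGen VLA's `checksum()` implementation; `canonical_checksum` and its canonicalization rule are hypothetical.

```python
import hashlib
from decimal import Decimal, localcontext


def canonical_checksum(value: Decimal) -> str:
    """SHA-256 hex digest of a canonical string form of an exact decimal."""
    digits = len(value.as_tuple().digits)
    with localcontext() as ctx:
        # normalize() rounds to the context precision, so make the
        # precision large enough to keep every digit of the value.
        ctx.prec = max(digits, 1)
        canonical = str(value.normalize())
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


a = canonical_checksum(Decimal("1.0"))
b = canonical_checksum(Decimal("1.00"))
c = canonical_checksum(Decimal("1.0000001"))

print(a == b)  # True: trailing zeros do not change the value
print(a == c)  # False: any difference in value changes the digest
```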
Supported GPUs
| Architecture | Example GPUs | Compute Capability |
|---|---|---|
| Pascal | GTX 1080, P100, P40 | sm_60, sm_61 |
| Volta | V100, Titan V | sm_70 |
| Turing | RTX 2080, T4, Quadro RTX | sm_75 |
| Ampere | RTX 3090, A100, A10 | sm_80, sm_86 |
| Ada Lovelace | RTX 4090, 4080, 4070, L40 | sm_89 |
| Hopper | H100, H200 | sm_90 |
Cloud Support: AWS (P3, P4, G4, G5), GCP (T4, A100, L4), Azure (NC, ND series), Kaggle (T4 x2 free), Colab
Benchmarks
| Operation | Elements | PyTorch Error | VLA Error |
|---|---|---|---|
| Sum | 10M | 10^-7 relative | 0.0 |
| Dot Product | 1M | 10^-8 relative | 0.0 |
| Matrix Multiply | 1000x1000 | 10^-6 relative | 0.0 |
| Chained Ops | 1000 iterations | Diverges | Exact |
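A relative error like those in the table can be measured against an exact reference. The sketch below shows one way to do that on the CPU with float64 and `fractions.Fraction`. It is not the benchmark harness behind the table, which presumably ran on a GPU, so the magnitudes will differ; `relative_error` is a hypothetical helper.

```python
from fractions import Fraction


def relative_error(approx: float, exact: Fraction) -> float:
    """Relative error of a float result against an exact rational reference."""
    return float(abs(Fraction(approx) - exact) / abs(exact))


n = 1_000_000
term = 0.1

# Naive running sum: every addition rounds to the nearest float64.
naive = 0.0
for _ in range(n):
    naive += term

# Exact sum of the float64 value actually stored for 0.1, with no rounding.
exact = Fraction(term) * n

print(naive)
print(relative_error(naive, exact))
```

The reference is the exact sum of the stored inputs, not of the decimal 0.1, so the measurement isolates accumulation error from input conversion error.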
FAQ
Q: Is this slower than PyTorch?
A: Yes. The overhead is typically 2-5x, a cost that is acceptable for applications where correctness matters more than throughput.
Q: What about CPU?
A: GPU required. VLA's exact arithmetic relies on native CUDA kernels, so there is no CPU support.
Q: Can I verify results across systems?
A: Yes! Use to_decimal() for exact values or checksum() for verification.
Support & Contact
Website: simgen.dev
Support Development: ko-fi.com/kyleclouthier
Email: kyle@simgen.dev
GitHub: github.com/DigitalMax321/simgen
License
Proprietary. Free during beta for research, academic, and commercial use.
(c) 2025-2026 Clouthier Simulation Labs. All rights reserved.