GPU exact arithmetic - 512-bit precision, zero accumulation error

SimGen VLA - Zero-Error GPU Arithmetic

Drop-in PyTorch replacement with exact arithmetic. No accumulation error. Ever.

Free during beta - Use freely for research, academic, and commercial projects.

Support development: ko-fi.com/kyleclouthier

The Problem: Floating-Point Lies

Floating-point arithmetic rounds the result of almost every operation. On a GPU these rounding errors accumulate across long chains of operations and can compound silently until your results are wrong.

import torch

# Classic floating-point failure (torch.tensor defaults to float32)
x = torch.tensor([1e16, 1.0, -1e16])
print(x.sum())  # 0.0  <- WRONG! Should be 1.0

# 10 million additions - rounding error accumulates
values = torch.ones(10_000_000) * 0.1
print(values.sum())  # 999999.9880... <- Should be 1000000.0 (digits vary by device)

This affects: financial calculations, scientific simulations, physics engines, signal processing, cryptography, and any computation requiring precision.
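The cancellation above is not specific to GPUs or to PyTorch. As a minimal sketch using only the Python standard library (float64 on the CPU, not SimGen VLA), the same three-term sum loses the 1.0 when added left to right, while `math.fsum`, which tracks the lost low-order bits, recovers it. An explicit loop is used because the built-in `sum()` already applies compensated summation to floats in Python 3.12.

```python
import math

values = [1e16, 1.0, -1e16]

# Naive left-to-right addition: 1e16 + 1.0 rounds back to 1e16,
# so the 1.0 is lost before the two large terms cancel.
naive = 0.0
for v in values:
    naive += v

# math.fsum keeps exact partial sums and rounds only once at the end.
accurate = math.fsum(values)

print(naive)     # 0.0
print(accurate)  # 1.0
```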

The Solution: SimGen VLA

from simgen import vla

# Exact arithmetic - mathematically correct
x = vla.tensor([1e16, 1.0, -1e16])
print(x.sum())  # 1.0  <- CORRECT!

# 10 million additions - still exact
values = vla.ones(10_000_000) * 0.1
print(values.sum())  # 1000000.0  <- EXACTLY correct

Minimal code changes. The API mirrors PyTorch's, so you import vla and call it where you would call torch.
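To see what "exact" means independently of any GPU library, here is a minimal CPU sketch with Python's `fractions.Fraction`, which stores every value as a ratio of integers. It only illustrates the idea of rounding-free arithmetic; it is not how SimGen VLA is implemented, and this description does not document VLA's internals.

```python
from fractions import Fraction

# Each float converts to the exact rational number it stores.
values = [Fraction(1e16), Fraction(1.0), Fraction(-1e16)]
total = sum(values)  # integer arithmetic on numerators, no rounding
print(total)         # 1

# One tenth written as a true ratio, added 1000 times.
tenth = Fraction(1, 10)
hundred = sum(tenth for _ in range(1000))
print(hundred)       # 100
```

The cost of this approach on a CPU is that numerators and denominators can grow without bound, which is one reason fixed-width exact formats exist.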

Installation

pip install simgen-vla

Requirements:

Python 3.10, 3.11, or 3.12
PyTorch 2.0+ with CUDA
CuPy (matching your CUDA version: pip install cupy-cuda11x or cupy-cuda12x)
NVIDIA GPU (Pascal through Hopper: sm_60 to sm_90)

Platforms: Windows, Linux

What's New in v5.2.0

Focused API: 57 exact GPU operations for universal computing
SVD Support: Singular Value Decomposition via Jacobi rotations
Exact I/O: All inputs automatically converted to exact representation
Linear Algebra Suite: LU, QR, eigenvalues, determinant, inverse
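The "Exact I/O" item raises a question worth making concrete: what is the exact value of an input such as `0.1`? A binary float literal already holds a rounded value before any library sees it. The standard-library sketch below (not SimGen VLA code) shows the difference between converting the float and converting the decimal string. Which of the two conventions VLA follows is not stated in this description.

```python
from decimal import Decimal
from fractions import Fraction

# The float literal 0.1 is the nearest binary64 value, not one tenth.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Parsing the string keeps the decimal value that was typed.
from_float = Fraction(0.1)
from_text = Fraction("0.1")
print(from_text)                # 1/10
print(from_float == from_text)  # False
```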

Use Cases

Financial Computing

Mixed-magnitude calculations where every cent matters:

from simgen import vla

# Portfolio with massive range - standard FP loses the pennies
positions = vla.tensor([
    1_000_000_000.00,   # $1 billion position
    0.01,                # 1 cent transaction fee
    -999_999_999.99,     # Large short position
    50_000.50,           # Medium holding
])

total = positions.sum()
print(f"Portfolio: ${float(total):,.2f}")  # $50,000.52 - exact!
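As an independent cross-check of the expected total, the same four amounts can be summed on the CPU with Python's `decimal.Decimal`. This is a sketch using the standard library only, not part of SimGen VLA; the amounts are passed as strings so no binary rounding happens on input.

```python
from decimal import Decimal

positions = [
    Decimal("1000000000.00"),   # $1 billion position
    Decimal("0.01"),            # 1 cent transaction fee
    Decimal("-999999999.99"),   # large short position
    Decimal("50000.50"),        # medium holding
]

total = sum(positions)
print(f"Portfolio: ${total:,.2f}")  # Portfolio: $50,000.52
```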

Scientific Simulation

Long-running physics simulations without accumulated rounding drift:

from simgen import vla

# Chaotic system (Lorenz attractor)
def lorenz_step(state, dt=0.01):
    x, y, z = state[0], state[1], state[2]
    sigma, rho, beta = 10.0, 28.0, 8.0/3.0

    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z

    return vla.tensor([x + dx * dt, y + dy * dt, z + dz * dt])

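# Run forward, then step with negated dt. Explicit Euler is not
# time-reversible, so the state does not return to the initial values
# even in exact arithmetic. Exact arithmetic removes only the rounding
# error; the integrator's own truncation error remains.
state = vla.tensor([1.0, 1.0, 1.0])
initial = state.clone()

for _ in range(10000):
    state = lorenz_step(state, dt=0.01)
for _ in range(10000):
    state = lorenz_step(state, dt=-0.01)

error = (state - initial).abs().sum()
print(f"Round-trip residual: {float(error)}")  # nonzero: integrator error, not rounding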
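One point is worth checking with exact rational arithmetic on the CPU: stepping explicit Euler forward with dt and then once with -dt does not undo the step, because the derivative is evaluated at a different state on the way back. The sketch below uses Python's `fractions.Fraction` (an illustration, not SimGen VLA) and the same Lorenz parameters, so every operation is rounding-free and any residual comes from the integration scheme alone.

```python
from fractions import Fraction

SIGMA, RHO, BETA = Fraction(10), Fraction(28), Fraction(8, 3)

def lorenz_step(state, dt):
    """One explicit Euler step of the Lorenz system, in exact rationals."""
    x, y, z = state
    dx = SIGMA * (y - x)
    dy = x * (RHO - z) - y
    dz = x * y - BETA * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)

dt = Fraction(1, 100)
initial = (Fraction(1), Fraction(1), Fraction(1))

forward = lorenz_step(initial, dt)   # one step forward
back = lorenz_step(forward, -dt)     # one step with negated dt

residual = [b - a for a, b in zip(initial, back)]
print(residual[0])  # -13/500
```

A round trip that returns exactly to the start needs a time-reversible integrator in addition to exact arithmetic.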

Linear Algebra

Exact matrix decompositions and solvers:

from simgen import vla

# Matrix operations
A = vla.randn((100, 100))
B = vla.randn((100, 100))
C = vla.matmul(A, B)  # Exact matrix multiply

# LU Decomposition
L, U = vla.lu(A)

# QR Decomposition
Q, R = vla.qr(A)

# Dominant eigenvalue and eigenvector (power iteration)
eigenvalue, eigenvector = vla.eig(A)

# Matrix inverse and determinant
A_inv = vla.inv(A)
det = vla.det(A)

# Solve linear system: Ax = b
b = vla.randn((100,))  # right-hand side vector
x = vla.solve(A, b)
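For intuition about what an exact linear solve involves, the following CPU sketch runs Gauss-Jordan elimination over Python's `fractions.Fraction`. The helper name `solve_exact` is hypothetical and the code is not taken from SimGen VLA; it only shows that a system with rational entries has a solution that can be computed without any rounding.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    # Augmented matrix [A | b] with every entry converted to a Fraction.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

x = solve_exact([[2, 1], [1, 3]], [3, 5])
print(x)  # [Fraction(4, 5), Fraction(7, 5)]
```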

Signal Processing

Convolutions with exact arithmetic:

from simgen import vla

# 2D Convolution
signal = vla.randn((1, 3, 64, 64))
kernel = vla.randn((16, 3, 3, 3))
output = vla.conv2d(signal, kernel)
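If `vla.conv2d` follows the defaults of `torch.nn.functional.conv2d` (stride 1, no padding), which is an assumption based on the stated PyTorch compatibility, the output above would have shape (1, 16, 62, 62). The arithmetic behind each output element is a sliding multiply-accumulate, sketched here in plain Python for a single channel; the helper `cross_correlate2d` is hypothetical and uses integers, so every step is exact.

```python
def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Multiply the kernel with the window at (i, j) and accumulate.
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
result = cross_correlate2d(image, kernel)
print(result)  # [[-4, -4], [-4, -4]]
```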

Complete API Reference

Tensor Creation

from simgen import vla

x = vla.tensor([1.0, 2.0, 3.0])       # From list
z = vla.zeros((3, 3))                  # Zeros
o = vla.ones((100,))                   # Ones
r = vla.randn((10, 10))                # Random normal
u = vla.rand((5, 5))                   # Random uniform [0,1]
a = vla.arange(0, 10)                  # Range [0,1,2,...,9]
l = vla.linspace(0, 1, 100)            # 100 points from 0 to 1
I = vla.eye(5)                         # 5x5 identity matrix

Arithmetic Operations

c = a + b          # Exact addition
c = a - b          # Exact subtraction
c = a * b          # Exact multiplication
c = a / b          # Exact division
c = -a             # Negation
c = a ** 2         # Power

Reductions (Zero Drift)

total = vla.sum(x)         # Exact sum
avg = vla.mean(x)          # Exact mean
product = vla.prod(x)      # Exact product
minimum = vla.min(x)       # Minimum
maximum = vla.max(x)       # Maximum
std_dev = vla.std(x)       # Standard deviation
variance = vla.var(x)      # Variance
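For comparison, a common way to reduce (but not eliminate) drift in ordinary floating-point reductions is compensated summation. The sketch below implements Neumaier's variant in plain Python; it is offered as background on the problem these reductions address and is not a description of how VLA computes its sums.

```python
def neumaier_sum(values):
    """Compensated summation: carry the rounding error of each add in `comp`."""
    total = 0.0
    comp = 0.0
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            comp += (total - t) + v   # low-order bits of v were lost
        else:
            comp += (v - t) + total   # low-order bits of total were lost
        total = t
    return total + comp

recovered = neumaier_sum([1e16, 1.0, -1e16])
print(recovered)  # 1.0
```

Compensated summation still works in finite precision, so it can fail on harder inputs where an exact representation would not.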

Linear Algebra

C = vla.matmul(A, B)       # Matrix multiplication
C = vla.mm(A, B)           # Matrix-matrix multiply
y = vla.mv(A, x)           # Matrix-vector multiply
d = vla.dot(a, b)          # Dot product
C = vla.bmm(A, B)          # Batched matrix multiply
L, U = vla.lu(A)           # LU decomposition
Q, R = vla.qr(A)           # QR decomposition
e, v = vla.eig(A)          # Dominant eigenpair (power iteration)
det = vla.det(A)           # Determinant
inv = vla.inv(A)           # Matrix inverse
x = vla.solve(A, b)        # Solve Ax = b
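The reference notes that `vla.eig` uses power iteration. As a generic sketch of that technique in plain Python floats (the function name and iteration count are illustrative assumptions, not VLA's code), repeated multiplication by the matrix pulls a vector toward the eigenvector of the largest-magnitude eigenvalue:

```python
import math

def power_iteration(A, iterations=200):
    """Estimate the dominant eigenpair of a square matrix (nested lists)."""
    n = len(A)
    v = [1.0] * n
    for _ in range(iterations):
        # Multiply, then renormalize so the vector neither overflows nor vanishes.
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient v^T A v (v has unit length).
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Av[i] for i in range(n))
    return eigenvalue, v

value, vector = power_iteration([[2.0, 0.0], [0.0, 1.0]])
print(round(value, 6))  # 2.0
```

Power iteration yields a single eigenpair and assumes one eigenvalue strictly dominates in magnitude, so it is not a full eigendecomposition.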

Math Functions

y = vla.exp(x)             # Exponential
y = vla.log(x)             # Natural log
y = vla.sqrt(x)            # Square root
y = vla.abs(x)             # Absolute value
y = vla.sin(x)             # Sine
y = vla.cos(x)             # Cosine
y = vla.tan(x)             # Tangent
y = vla.tanh(x)            # Hyperbolic tangent
y = vla.sigmoid(x)         # Sigmoid

Activations

y = vla.relu(x)            # ReLU
y = vla.gelu(x)            # GELU
y = vla.silu(x)            # SiLU/Swish
y = vla.softmax(x)         # Softmax

Shape Operations

y = vla.reshape(x, (2, 3))       # Reshape
y = vla.transpose(x, 0, 1)       # Transpose dims
y = vla.squeeze(x)               # Remove size-1 dims
y = vla.unsqueeze(x, 0)          # Add dimension
y = vla.stack([a, b, c])         # Stack tensors
y = vla.cat([a, b])              # Concatenate

Exact Output

# Get TRUE exact value as Python Decimal
result = x.sum()
exact_value = result.to_decimal()  # Decimal('1.0') - mathematically exact

# SHA256 checksum for verification
hash_val = result.checksum()       # Verify across systems
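How `checksum()` serializes a value before hashing is not documented here. The sketch below shows one plausible scheme using only the standard library: hash a canonical text form of a `Decimal`, so that two systems agree on the digest exactly when they agree on the value. The helper `decimal_checksum` and its canonical form are assumptions for illustration.

```python
import hashlib
from decimal import Decimal

def decimal_checksum(value: Decimal) -> str:
    """SHA-256 of a canonical text form of an exact decimal value."""
    # normalize() strips trailing zeros so 1.0 and 1.00 hash identically;
    # the "f" format avoids exponent notation such as 1E+2.
    canonical = format(value.normalize(), "f")
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()

a = decimal_checksum(Decimal("1.0"))
b = decimal_checksum(Decimal("1.00"))
print(a == b)  # True
print(len(a))  # 64
```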

Supported GPUs

Architecture	Example GPUs	Compute Capability
Pascal	GTX 1080, P100, P40	sm_60, sm_61
Volta	V100, Titan V	sm_70
Turing	RTX 2080, T4, Quadro RTX	sm_75
Ampere	RTX 3090, A100, A10	sm_80, sm_86
Ada Lovelace	RTX 4090, 4080, 4070, L40	sm_89
Hopper	H100, H200	sm_90

Cloud Support: AWS (P3, P4, G4, G5), GCP (T4, A100, L4), Azure (NC, ND series), Kaggle (T4 x2 free), Colab
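To check whether a given machine falls inside the listed range, PyTorch's `torch.cuda.get_device_capability()` returns the compute capability as a `(major, minor)` tuple. The helper `is_supported` below is a hypothetical convenience that compares it against the sm_60 to sm_90 range from the table; it is not part of SimGen VLA, and the package may apply its own checks.

```python
def is_supported(capability):
    """True if a (major, minor) compute capability lies in sm_60..sm_90."""
    return (6, 0) <= tuple(capability) <= (9, 0)

try:
    import torch
except ImportError:  # PyTorch not installed: skip the live query
    torch = None

if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability()  # e.g. (8, 6) for sm_86
    print(cap, "supported" if is_supported(cap) else "outside the listed range")
else:
    print("No CUDA device visible to PyTorch")
```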

Benchmarks

Operation	Elements	PyTorch Error	VLA Error
Sum	10M	10^-7 relative	0.0
Dot Product	1M	10^-8 relative	0.0
Matrix Multiply	1000x1000	10^-6 relative	0.0
Chained Ops	1000 iterations	Diverges	Exact
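The benchmark methodology is not given, but a "relative error" figure of this kind can be measured by comparing a floating-point result against an exact reference. A minimal CPU sketch, assuming sequential float64 addition and using `fractions.Fraction` as the reference (the function name is hypothetical):

```python
from fractions import Fraction

def relative_error(values):
    """Relative error of sequential float addition vs. the exact rational sum."""
    running = 0.0
    for v in values:  # plain loop: one rounding per addition
        running += v
    exact = sum(Fraction(v) for v in values)  # no rounding at all
    return abs(Fraction(running) - exact) / abs(exact)

err = relative_error([0.1] * 1000)
print(float(err))

# Total cancellation: the float result is 0.0 while the exact sum is 1.
worst = relative_error([1e16, 1.0, -1e16])
print(worst)  # 1
```

The reference here is the exact sum of the stored float values, so the figure isolates accumulation error from input conversion error.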

FAQ

Q: Is this slower than PyTorch? A: Yes. The overhead is typically 2-5x, a trade-off that is usually acceptable for applications where correctness matters more than throughput.

Q: What about CPU? A: GPU required. VLA's exact arithmetic relies on native CUDA kernels - no CPU support.

Q: Can I verify results across systems? A: Yes! Use to_decimal() for exact values or checksum() for verification.

Support & Contact

Website: simgen.dev

Support Development: ko-fi.com/kyleclouthier

Email: kyle@simgen.dev

GitHub: github.com/DigitalMax321/simgen

License

Proprietary. Free during beta for research, academic, and commercial use.

Release history

Version	Release date
6.7.1	Mar 13, 2026
6.7.0	Mar 13, 2026
6.6.0	Mar 13, 2026
6.5.0	Mar 13, 2026
6.4.0	Mar 13, 2026
6.3.6	Mar 13, 2026
6.3.5	Mar 13, 2026
6.3.4	Mar 13, 2026
6.3.3	Mar 13, 2026
6.3.2	Mar 12, 2026
6.3.1	Mar 12, 2026
6.3.0	Mar 12, 2026
6.2.5	Mar 12, 2026
6.2.4	Mar 12, 2026
6.2.3	Mar 12, 2026
6.2.2	Mar 12, 2026
6.2.1	Mar 12, 2026
6.1.1	Mar 12, 2026
6.1.0 (this version)	Mar 12, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

File	Size	Uploaded	Interpreter	Platform
simgen_vla-6.1.0-cp312-cp312-win_amd64.whl	5.1 MB	Mar 12, 2026	CPython 3.12	Windows x86-64
simgen_vla-6.1.0-cp312-cp312-manylinux_2_17_x86_64.whl	7.0 MB	Mar 12, 2026	CPython 3.12	manylinux: glibc 2.17+ x86-64
simgen_vla-6.1.0-cp311-cp311-win_amd64.whl	5.1 MB	Mar 12, 2026	CPython 3.11	Windows x86-64

File details

Details for the file simgen_vla-6.1.0-cp312-cp312-win_amd64.whl.

File metadata

Download URL: simgen_vla-6.1.0-cp312-cp312-win_amd64.whl
Upload date: Mar 12, 2026
Size: 5.1 MB
Tags: CPython 3.12, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for simgen_vla-6.1.0-cp312-cp312-win_amd64.whl
Algorithm	Hash digest
SHA256	`b9dcef968cd9e66a5e11029373198f5c710e8a1ccfc95b46a7479132d19a43d9`
MD5	`a75ee29c622e772fdedc79fbd761dd7c`
BLAKE2b-256	`45da604ecf51528e14cb8a24c508e67a2fcef14a4b129565e87bcfcf7364a95b`

File details

Details for the file simgen_vla-6.1.0-cp312-cp312-manylinux_2_17_x86_64.whl.

File metadata

Download URL: simgen_vla-6.1.0-cp312-cp312-manylinux_2_17_x86_64.whl
Upload date: Mar 12, 2026
Size: 7.0 MB
Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for simgen_vla-6.1.0-cp312-cp312-manylinux_2_17_x86_64.whl
Algorithm	Hash digest
SHA256	`8a4d150ac0f6c2302e0e2c572117b51a5820324c217e65beeb9143e33111a8f4`
MD5	`d6a8f2b047f4b52c75875535ec22ec9a`
BLAKE2b-256	`ddf204774bdc7e30d3ec74ee732b09177c5491fb610c5c8653f1132060fb6220`

File details

Details for the file simgen_vla-6.1.0-cp311-cp311-win_amd64.whl.

File metadata

Download URL: simgen_vla-6.1.0-cp311-cp311-win_amd64.whl
Upload date: Mar 12, 2026
Size: 5.1 MB
Tags: CPython 3.11, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for simgen_vla-6.1.0-cp311-cp311-win_amd64.whl
Algorithm	Hash digest
SHA256	`3c6c990e695fc75ca4f759aa04aab9e9e741080b6d2a4b8c98b2bb34983c57e8`
MD5	`77fff56d153ae0e3ce0c094a05c9562c`
BLAKE2b-256	`f812c6fac6802266efbadd759f1454928e1c3881d2f2570c855f95907ce4b1fd`
