Bit-exact deterministic deep learning training across heterogeneous hardware

Project description

CARBON

Same seed. Different GPU. Identical weights.

Carbon makes training bit-exact reproducible across different GPU architectures within the same tier (consumer or datacenter).

Train on an RTX 4090. Train on an RTX 5090. Same SHA-256 hash on every weight tensor. Same optimizer state. Same loss. Different silicon, identical bits.

Built by Tushar Sharma at ALIA Labs.

Install

pip install alia-carbon

The Problem

RTX 4090:  hash = 6a6e2bc1...  loss = 0.069844
RTX 5090:  hash = 46681ef8...  loss = 0.069844

Same seed, standard PyTorch. The reported loss agrees to six decimal places, but the weight hashes differ: cuBLAS picked different internal algorithms on different silicon. Standard PyTorch cannot reproduce this training run bit-for-bit on different hardware.

The Fix

import carbon
carbon.enable(seed=42)

RTX 4090:  hash = 62118e9c...  loss = 0.070026
RTX 5090:  hash = 62118e9c...  loss = 0.070026

Identical. 10 configurations tested, up to 500 steps, and every one matches.
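
A minimal sketch of how a weight hash like the ones above could be computed. This is not Carbon's actual code: `hash_weights` and the name-to-values dict are hypothetical stand-ins for a model state dict, and serializing each value as little-endian float32 is an assumption.

```python
import hashlib
import struct


def hash_weights(named_weights):
    """Return one SHA-256 hex digest over all weights.

    named_weights is a hypothetical stand-in for a state dict:
    a mapping from parameter name to a flat list of floats.
    """
    digest = hashlib.sha256()
    # Visit parameters in sorted-name order so dict ordering cannot change the hash.
    for name in sorted(named_weights):
        values = named_weights[name]
        digest.update(name.encode("utf-8"))
        # Assumption: weights are stored as float32; "<" pins little-endian bytes.
        digest.update(struct.pack(f"<{len(values)}f", *values))
    return digest.hexdigest()


reference = {"layer.weight": [0.5, 0.25], "layer.bias": [1.0]}
# Same values, one weight moved by a single float32 step (2**-23 above 0.5).
drifted = {"layer.weight": [0.5 + 2**-23, 0.25], "layer.bias": [1.0]}

print(hash_weights(reference) == hash_weights(drifted))  # hashes differ
print(f"{0.5:.6f}", f"{0.5 + 2**-23:.6f}")  # yet both print as 0.500000
```

A hash over raw bytes catches a one-bit difference that a loss printed to six decimal places can hide, which is why the comparison is made on hashes rather than on the loss.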

How It Works

Carbon replaces every non-deterministic op with a deterministic one:

Op	Standard	Carbon
MatMul	cuBLAS (arch-dependent)	Tiled fp64 + Kahan accumulation
LayerNorm	Parallel reduction (thread-dependent)	fp64 mean/variance
AllReduce	NCCL (arrival-order-dependent)	AllGather + rank-order reduce
Scatter	Atomic race conditions	Sorted index ops
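
The AllReduce row can be illustrated with a small pure-Python simulation. This is only a sketch of the idea (gather every contribution, then reduce in rank order), not Carbon's or NCCL's code; the function names and the `(rank, value)` pair layout are hypothetical.

```python
def arrival_order_reduce(contributions):
    """Sum (rank, value) pairs in whatever order they happened to arrive."""
    total = 0.0
    for _rank, value in contributions:
        total += value
    return total


def rank_order_reduce(contributions):
    """Gather all (rank, value) pairs first, then sum by ascending rank."""
    total = 0.0
    for _rank, value in sorted(contributions, key=lambda pair: pair[0]):
        total += value
    return total


# The same three per-rank values, arriving in two different orders.
run_a = [(0, 0.1), (1, 0.2), (2, 0.3)]
run_b = [(2, 0.3), (1, 0.2), (0, 0.1)]

# Floating-point addition is not associative, so arrival order leaks into the result.
print(arrival_order_reduce(run_a) == arrival_order_reduce(run_b))
# Reducing in rank order gives the same bits regardless of arrival order.
print(rank_order_reduce(run_a) == rank_order_reduce(run_b))
```

The same reasoning applies to the Scatter row: sorting indices before applying updates fixes the order in which colliding updates are added.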

The mechanism: split every matmul into tiles, upcast to float64, and accumulate with Kahan-Babuška-Neumaier (KBN) compensation in a fixed order. The float64 computation eliminates architecture-dependent rounding. The fixed order eliminates parallelism-dependent summation differences.
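
The mechanism can be sketched in pure Python, where every float is already an IEEE float64. This is an illustrative sketch under that assumption, not Carbon's kernel: `kbn_sum` and `tiled_matmul` are hypothetical names, and the tile size is arbitrary.

```python
def kbn_sum(values):
    """Kahan-Babuska-Neumaier compensated sum, evaluated strictly left to right."""
    total = 0.0
    comp = 0.0  # running estimate of the rounding error lost so far
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            comp += (total - t) + v  # low-order bits of v were lost
        else:
            comp += (v - t) + total  # low-order bits of total were lost
        total = t
    return total + comp


def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), both lists of rows.

    The k dimension is walked in tiles, always from low index to high,
    so the order of additions never depends on scheduling.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            partials = []
            for k0 in range(0, k, tile):
                products = [a[i][p] * b[p][j] for p in range(k0, min(k0 + tile, k))]
                partials.append(kbn_sum(products))
            # Combine the per-tile partial sums, again in fixed order.
            out[i][j] = kbn_sum(partials)
    return out


print(kbn_sum([1.0, 1e100, 1.0, -1e100]))  # compensation recovers the two 1.0 terms
print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Compensation reduces rounding error but does not by itself make the sum independent of order; it is the fixed traversal order that makes the result repeatable.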

The Proof

Toy Model (500K params, 500 steps, 5 seeds)

Seed	Steps	4090 Hash	5090 Hash	Match
42	500	`d6830c89...`	`d6830c89...`	yes
123	500	`44843c64...`	`44843c64...`	yes
7	500	`7bf2c902...`	`7bf2c902...`	yes
999	500	`8cc60024...`	`8cc60024...`	yes
2024	500	`1e420d04...`	`1e420d04...`	yes

10 out of 10 configurations. Every hash matches.

GPT-2 124M Fine-Tune (60M trainable, 20 steps)

Run	Hash	Loss
Standard PyTorch (5090)	`85b72d9f...`	7.1904
Carbon run 1 (5090)	`995d4c9b...`	7.1904
Carbon run 2 (5090)	`995d4c9b...`	7.1904
Carbon cross-GPU (4090)	`995d4c9b...`	7.1904

Three Carbon runs, two GPUs, one hash. Standard PyTorch produces a different hash.

Overhead

Scale	Overhead
500K toy model	1.07x
GPT-2 124M (60M trainable)	10.1x

This is the cost of bit-exact determinism: fp64 Kahan-compensated matmul is slower than cuBLAS. For alignment research and debugging, where you need exact reproducibility, it's worth it.

Important: What Carbon Requires

Cross-architecture determinism requires replacing nn.Linear with DeterministicLinear and nn.LayerNorm with CarbonLayerNorm. The carbon.enable() call patches torch.matmul globally, but standard PyTorch modules use internal C++ paths that bypass the patch.

This is not a one-line fix for existing training code. It's a mechanism that works when you build with Carbon's layers.

Tested On

RTX 4090 (Ada) | RTX 5090 (Blackwell) | H100 SXM | A100 SXM

Consumer GPUs match each other. Datacenter GPUs match each other. Cross-tier (consumer vs datacenter) produces different hashes. Documented, not hidden.

Citation

@article{sharma2026carbon,
  title={Carbon: Bit-Exact Deterministic Training Across Consumer GPU Architectures},
  author={Sharma, Tushar},
  year={2026},
  url={https://github.com/TxsharDev/carbon}
}

Roadmap

v0.1 (current) - Proof of concept. KBN summation, tiled fp64 matmul, Python-level tiling.

v0.2 - Performance. Replace Kahan accumulation with superaccumulators for order-independent exact summation. This would eliminate the need for a fixed tile order and could close the consumer-vs-datacenter hash gap. Add a CUDA-native tiled matmul to cut the 10x overhead to under 2x.

v0.3 - Scale. Mixed precision (bf16 forward, fp32 master weights). Multi-GPU deterministic allreduce. FSDP/DDP integration. Target: deterministic fine-tuning of 7B+ models.

License

Apache-2.0 | ALIA Labs

Release history

0.2.0 (this version) - Jun 30, 2026

0.1.0 - Jun 11, 2026

Download files

Source Distribution

alia_carbon-0.2.0.tar.gz (18.4 kB)

Uploaded Jun 30, 2026 Source

Built Distribution

alia_carbon-0.2.0-py3-none-any.whl (18.3 kB)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file alia_carbon-0.2.0.tar.gz.

File metadata

Download URL: alia_carbon-0.2.0.tar.gz
Upload date: Jun 30, 2026
Size: 18.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for alia_carbon-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`717226c9655cc1f689fcbcf8a15a4744571fef98b720b03b661f393fb20cab14`
MD5	`25d385585369baf9900c7948a7dfa7a4`
BLAKE2b-256	`a38d92153fbd09426727dc75d65bc15c2190da501b769f3ab70dbdab55585ec9`

File details

Details for the file alia_carbon-0.2.0-py3-none-any.whl.

File metadata

Download URL: alia_carbon-0.2.0-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 18.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for alia_carbon-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ce7899229aeb41607833e3accff1c733dd2a4a7e0e56d4b679c4cc6f5b20e8c0`
MD5	`2155131c4575eea6326aaff6cb30b3cb`
BLAKE2b-256	`72f08bc8c8f82fd99a8dbf0a53057e7c9980eb55c0f5b970ce9d22e5cfc2c039`
