
Bit-exact deterministic deep learning training across heterogeneous hardware


CARBON

Exact copy, every time, any hardware

Bit-Exact Deterministic Training Across Heterogeneous Infrastructure



Why "Carbon"? A carbon copy is an exact duplicate: zero deviation, every detail preserved. Train on 8 GPUs or on 64, on A100s, H100s, or B200s. With Carbon you get the same weights, the same gradients, the same loss. The name is the guarantee.


The Problem

Training is not reproducible. Same model, same data, different GPU count: different results. There are three sources of divergence:

  1. Floating-point non-associativity: (a+b)+c ≠ a+(b+c). A different parallelism layout means a different summation order, which means different bits.
  2. Non-deterministic CUDA kernels: cuBLAS selects algorithms at runtime, and atomic operations race, so accumulation order varies.
  3. NCCL collectives: the arrival order in allreduce varies from run to run.
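Source 1 can be reproduced without a GPU. The plain-Python sketch below is illustrative only, not Carbon code, and its function names are hypothetical. It simulates splitting one sum across different numbers of workers. For `[1e16, 1.0, -1e16, 1.0]` the exact sum is 2.0, yet one worker returns 1.0 and two workers return 0.0, because `1e16 + 1.0` rounds back to `1e16` in float64.

```python
def sequential_sum(xs):
    """Plain left-to-right float addition with a single accumulator.
    (An explicit loop on purpose: since Python 3.12 the built-in sum()
    compensates float rounding, which would hide the effect.)"""
    total = 0.0
    for x in xs:
        total += x
    return total


def data_parallel_sum(xs, n_ranks):
    """Simulate n_ranks workers: each sums one contiguous shard, then the
    partial sums are combined. Changing n_ranks changes the summation order."""
    shard = -(-len(xs) // n_ranks)  # ceiling division
    partials = [sequential_sum(xs[i:i + shard]) for i in range(0, len(xs), shard)]
    return sequential_sum(partials)


print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

values = [1e16, 1.0, -1e16, 1.0]
print(data_parallel_sum(values, 1))  # 1.0
print(data_parallel_sum(values, 2))  # 0.0
```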

Without reproducibility, alignment teams cannot separate what a training change actually did from what was floating-point noise. This blocks interpretability research.

How Carbon Works

Carbon replaces every non-deterministic operation with a deterministic equivalent:

| Operation | Standard | Carbon |
|---|---|---|
| Summation | Non-associative accumulation | Kahan-compensated, in canonical sorted order |
| MatMul | cuBLAS (algorithm varies) | Tiled, with fixed reduction order + Kahan |
| AllReduce | NCCL (arrival order varies) | AllGather + local reduce in rank order |
| Scatter | Atomic race conditions | Sorted index operations |
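The Summation, AllReduce and Scatter rows can be simulated in plain Python. This is a sketch under stated assumptions, not Carbon's implementation: the canonical order is assumed to be ascending value, the "network" is a shuffled list, and `canonical_sum`, `allreduce_rank_order` and `scatter_add_sorted` are hypothetical names. For `[1e16, 1.0, -1e16, 1.0]`, plain left-to-right addition returns 1.0, while sorting plus Kahan returns 2.0 (here, the exact sum) for every input order.

```python
import random


def kahan_sum(xs):
    """Kahan compensated summation: `comp` carries the low-order bits lost
    by each addition and feeds them back into the next one."""
    total, comp = 0.0, 0.0
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


def canonical_sum(xs):
    """Summation row: sort into a canonical order (ascending value is an
    assumption; requires finite floats), then sum with Kahan."""
    return kahan_sum(sorted(xs))


def allreduce_rank_order(arrivals, world_size):
    """AllReduce row, simulated: `arrivals` holds (rank, vector) pairs in
    whatever order the network delivered them. Re-indexing by rank before
    reducing makes every rank add contributions in the same sequence."""
    by_rank = dict(arrivals)
    ordered = [by_rank[r] for r in range(world_size)]
    return [kahan_sum(column) for column in zip(*ordered)]


def scatter_add_sorted(size, indices, values):
    """Scatter row: instead of racing atomic adds, visit updates sorted by
    (target index, original position), so each slot accumulates in a fixed order."""
    out = [0.0] * size
    for pos in sorted(range(len(indices)), key=lambda p: (indices[p], p)):
        out[indices[pos]] += values[pos]
    return out


values = [1e16, 1.0, -1e16, 1.0]
print(canonical_sum(values))  # 2.0

grads = {0: [0.1, 1e16], 1: [0.2, 1.0], 2: [0.3, -1e16]}
outcomes = set()
for _ in range(20):
    arrivals = list(grads.items())
    random.shuffle(arrivals)  # simulate varying arrival order
    outcomes.add(tuple(allreduce_rank_order(arrivals, 3)))
print(len(outcomes))  # 1

print(scatter_add_sorted(3, [2, 0, 2], [1.0, 2.0, 3.0]))  # [2.0, 0.0, 4.0]
```

The design choice is to give up dependence on timing entirely: order is fixed by data (sorted values, rank ids, index/position pairs), so no scheduler or network can change it.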

Install

pip install alia-carbon

or, for development, from a clone of the repository:

pip install -e ".[dev]"

Quick Start

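import carbon

carbon.enable(seed=42)

# training is now bit-exact deterministic
# (model, optimizer and dataloader are assumed to be defined; model(batch) returns a scalar loss)
for batch in dataloader:
    optimizer.zero_grad()  # clear gradients from the previous step
    loss = model(batch)
    loss.backward()
    optimizer.step()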

Full Wrapper

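from carbon import DeterministicTrainer

trainer = DeterministicTrainer(model, optimizer, seed=42)

# loss_fn(model, batch) -> scalar loss; here the model output exposes a .loss attribute
def loss_fn(m, b):
    return m(b).loss

for batch in dataloader:
    loss = trainer.step(batch, loss_fn=loss_fn)

# prove it: check determinism on the last batch
assert trainer.verify_determinism(batch, loss_fn)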
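`verify_determinism` is Carbon's API, and its internals are not described on this page. To illustrate the idea only, the hypothetical helpers below (`to_bits`, `sgd_step`, `check_determinism`) rerun one toy optimizer step from the same snapshot and compare raw float64 bit patterns instead of using `==`. Float equality treats `0.0` and `-0.0` as equal and `NaN` as unequal to itself, so it is the wrong test for bit-exactness. An in-process rerun like this only catches run-to-run nondeterminism; a cross-hardware claim needs the bits (or a hash of them) compared across machines.

```python
import struct


def to_bits(values):
    """Raw little-endian IEEE-754 float64 bytes of a list of floats."""
    return struct.pack(f"<{len(values)}d", *values)


def sgd_step(params, grads, lr=0.1):
    """Hypothetical toy stand-in for one optimizer step."""
    return [p - lr * g for p, g in zip(params, grads)]


def check_determinism(step_fn, params, grads):
    """Hypothetical check: run the same step twice from copies of the same
    snapshot and require bit-identical results."""
    first = step_fn(list(params), list(grads))
    second = step_fn(list(params), list(grads))
    return to_bits(first) == to_bits(second)


print(check_determinism(sgd_step, [1.0, -2.0, 0.5], [0.3, 0.1, -0.2]))  # True
print(0.0 == -0.0, to_bits([0.0]) == to_bits([-0.0]))  # True False
```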

Overhead

| Scale | Overhead | Notes |
|---|---|---|
| 500K-parameter toy model | 1.07x | Matmuls are small; overhead negligible |
| GPT-2 124M (60M trainable) | 10.1x | fp64 tiled matmul dominates |

The overhead is the price of bit-exact cross-architecture determinism. The fp64 Kahan-compensated tiled matmul is slower than cuBLAS because it trades speed for reproducibility. For alignment research, debugging, and auditing, where reproducibility is non-negotiable, it is worth it. For production training, wait for v0.2, which is planned to use CUDA-native deterministic kernels.
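To make the cost concrete, here is a pure-Python sketch of the kind of kernel the paragraph describes. The structure is an assumption: Carbon's actual kernel, tile size and memory layout are not documented on this page, and `tiled_matmul` and `TILE` are hypothetical names. Every output element performs a fixed sequence of compensated float64 additions (four floating-point operations per Kahan step instead of one), with no runtime algorithm selection, which is why such a kernel cannot keep pace with cuBLAS.

```python
def kahan_sum(xs):
    """Kahan compensated summation over an iterable of floats."""
    total, comp = 0.0, 0.0
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


TILE = 2  # fixed constant: retuning it per GPU would change the reduction order


def tiled_matmul(A, B, tile=TILE):
    """C = A @ B in float64 (Python floats). The k dimension is cut into
    fixed-size tiles visited in ascending order; each tile's products are
    Kahan-summed, then the tile partials are Kahan-summed in tile order."""
    n, K, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            partials = []
            for k0 in range(0, K, tile):
                ks = range(k0, min(k0 + tile, K))
                partials.append(kahan_sum(A[i][k] * B[k][j] for k in ks))
            C[i][j] = kahan_sum(partials)
    return C


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(tiled_matmul(A, B))  # [[58.0, 64.0], [139.0, 154.0]]
```

The tile size must be a constant rather than tuned per device: a different tile size is a different reduction order, and therefore potentially different bits.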

Proven Results — RTX 4090 vs RTX 5090

Different GPU architectures (Ada Lovelace vs Blackwell). Same seed. Same data. 50 and 200 training steps.

Standard PyTorch

| GPU | Weight Hash | Loss |
|---|---|---|
| RTX 4090 | 6a6e2bc1b29e831b... | 0.069844 |
| RTX 5090 | 46681ef8c8420252... | 0.069844 |

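Same loss, DIFFERENT weights. The losses agree to six decimal places, but the weight hashes differ: cuBLAS picked different internal algorithms on different silicon, and the resulting rounding differences carried through training into the final weights. You cannot reproduce this run on the other GPU.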
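The weight hashes can disagree while the printed losses agree because a hash sees every bit, whereas a loss printed to six decimals does not. A minimal sketch of that effect, assuming a SHA-256 over float64 bit patterns (Carbon's actual hashing scheme is not documented here; `weight_hash` and `mean_square` are hypothetical): nudging one weight by a single unit in the last place leaves the loss identical to six decimals but changes the hash.

```python
import hashlib
import math
import struct


def weight_hash(params):
    """Hypothetical: SHA-256 over the float64 bit patterns of the parameters,
    packed in a fixed order. Any single-bit change alters the digest."""
    return hashlib.sha256(struct.pack(f"<{len(params)}d", *params)).hexdigest()


def mean_square(params):
    """Toy stand-in for a loss: mean of the squared weights."""
    total = 0.0
    for p in params:
        total += p * p
    return total / len(params)


weights_a = [0.25, -1.5, 3.0]
weights_b = [0.25, -1.5, math.nextafter(3.0, math.inf)]  # last weight one ulp larger

print(f"{mean_square(weights_a):.6f}", f"{mean_square(weights_b):.6f}")  # 3.770833 3.770833
print(weight_hash(weights_a) == weight_hash(weights_b))  # False
```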

Carbon

| GPU | Weight Hash | Optimizer Hash | Loss |
|---|---|---|---|
| RTX 4090 (50 steps) | 62118e9c641a0150... | | 0.070026 |
| RTX 5090 (50 steps) | 62118e9c641a0150... | | 0.070026 |
| RTX 4090 (200 steps) | e2aa1052f4a9dcf3... | 39dc99f0f803efe7... | 0.045067 |
| RTX 5090 (200 steps) | e2aa1052f4a9dcf3... | 39dc99f0f803efe7... | 0.045067 |

Identical weights. Identical optimizer state. Identical loss. Different silicon.

On this RTX 4090 / RTX 5090 pair, Carbon achieved what standard PyTorch did not: bit-exact reproducible training across heterogeneous GPU architectures.

Citation

@article{sharma2026carbon,
  title={Carbon: Bit-Exact Deterministic Training Across Heterogeneous Hardware},
  author={Sharma, Tushar},
  year={2026},
  url={https://github.com/TxsharDev/carbon}
}

License

Apache-2.0 — Alia Labs


Download files

Source Distribution

alia_carbon-0.1.0.tar.gz (12.5 kB)

Built Distribution

alia_carbon-0.1.0-py3-none-any.whl (13.3 kB)

File details

Details for the file alia_carbon-0.1.0.tar.gz.

File metadata

  • Download URL: alia_carbon-0.1.0.tar.gz
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for alia_carbon-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1341d5dd00450e034c06b64e09efdd6786a9de693730067adcb0306aa3312b4 |
| MD5 | e1a161ec6beca12b3bb49bd1f82eb6c3 |
| BLAKE2b-256 | df76e62e7cf7c30d51d4beb2928b4a7438c163b146de03615fdf8acd0a4aecd7 |


File details

Details for the file alia_carbon-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: alia_carbon-0.1.0-py3-none-any.whl
  • Size: 13.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for alia_carbon-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 89f13cf686f38ba4132e4c572dd8a03ed8618dc60e7b003860dc5faaee020a29 |
| MD5 | ebfa891d99561219b35aad407e8d4135 |
| BLAKE2b-256 | 43282da7c821760440507bd3fa5c534ee5836d923eb8b234d9559f9d495fbf91 |

