Safe, fast, pickle-free tensor storage for PyTorch. Rust core, Python interface. By Death Legion.

deathtensors

Safe, fast, pickle-free tensor storage for PyTorch. By Death Legion. Version 0.2.0.

deathtensors is an alternative to pickle for storing model weights. Files use the .deathtensors extension and are designed to be opened safely even when they come from an untrusted source. Opening a deathtensors file never executes arbitrary code: the header is parsed as JSON (no eval, no __reduce__, no torch.load) and the tensor data blob is treated as opaque bytes.

What's new in 0.2.0

  • .deathtensors is now the canonical file extension (.dt is kept as a legacy alias for files written by 0.1.x).
  • zstd compression of the tensor data blob (save(..., compress=True)). The reader transparently decompresses; the header stays uncompressed so you can still inspect it without decompressing the whole file.
  • Per-tensor statistics: f.stats(name) returns {dtype, shape, nbytes, count, min, max, mean, stddev, nnz} — computed in Rust without materialising the tensor in Python. f.all_stats() does it for every tensor in one pass.
  • Append mode: dt.append(path, {...}) adds tensors to an existing file without loading the existing tensors back into memory.
  • Sharded save: dt.save_sharded(path, tensors, max_shard_bytes=...) auto-splits a huge model across multiple .deathtensors files with an index. dt.open_sharded(index_path) reads them back transparently.
  • Zero-copy mmap tensors: f.get_tensor_mmap(name) returns a torch.Tensor backed directly by the mmap'd file (no copy).
  • Tensor slicing: f.get_tensor_slice(name, start, count) loads only N rows of a huge tensor, for streaming over embedding tables.
  • Sparse tensor support: dt.save_sparse(path, {"sp": sparse_coo}) and dt.get_sparse(f, "sp") round-trip torch sparse COO tensors.
  • Schema validation: dt.Schema().expect(...).validate(path) lets you declare expected dtypes/shapes and fail fast on mismatches.
  • diff(): compare two .deathtensors files and report added / removed / shape-changed / dtype-changed / value-changed tensors.
  • Manifest sidecar: dt.write_manifest(path) writes a small JSON file with the full header — handy for browsing on Hugging Face Hub without downloading the whole file.
  • CLI: python -m deathtensors info|list|verify|stats|convert|diff|manifest.
  • Path expansion: ~ and $ENV_VARS are expanded in every path.
  • Conversion: dt.from_safetensors() and dt.to_safetensors().
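
To make the tensor slicing bullet above more concrete, here is a minimal sketch of the offset arithmetic that row slicing relies on. It assumes an uncompressed, row-major (C-contiguous) tensor and computes offsets relative to the start of that tensor's bytes. The helper row_slice_byte_range is hypothetical and is not part of the deathtensors API; the Rust core may do this differently.

```python
import math

# Hypothetical helper, not part of deathtensors: compute which bytes of a
# row-major (C-contiguous) tensor hold rows [start, start + count).
def row_slice_byte_range(shape, itemsize, start, count):
    rows = shape[0]
    if start < 0 or count < 0 or start + count > rows:
        raise ValueError("slice is out of bounds")
    row_bytes = math.prod(shape[1:]) * itemsize  # bytes in one row
    begin = start * row_bytes
    return begin, begin + count * row_bytes

# A float32 embedding table with 1,000,000 rows of width 768.
begin, end = row_slice_byte_range((1_000_000, 768), 4, start=5_000, count=32)
print(begin, end - begin)  # 15360000 98304
```

Only 98,304 bytes have to be read for those 32 rows, although the full table occupies about 3 GB.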

Install

pip install deathtensors              # core only
pip install "deathtensors[torch]"     # pulls in torch + numpy
pip install "deathtensors[numpy]"     # pulls in numpy
pip install "deathtensors[dev]"       # torch + numpy + pytest

Pre-built wheels are available for CPython 3.8–3.13 on x86_64 Linux. Other platforms fall back to a source build (requires Rust ≥ 1.74).

Quickstart

import torch
import deathtensors as dt

# 1. Save with compression + checksum.
tensors = {
    "weight": torch.randn(128, 128),
    "bias":   torch.zeros(128),
}
metadata = {"weight": {"layer": "fc1", "init": "kaiming"}}
global_md = {"model": "mlp-tiny", "license": "MIT"}
dt.save("model.deathtensors", tensors, metadata=metadata,
        global_metadata=global_md, compress=True, checksum=True)

# 2. Open the file lazily — no tensors are read yet.
with dt.open("model.deathtensors", verify=True) as f:
    print(f.keys())                          # ['weight', 'bias']
    print(f.metadata())                      # global metadata dict
    print(f.info("weight"))                  # dtype/shape/offsets/metadata
    print(f.stats("weight"))                 # min/max/mean/stddev/nnz
    w = f.get_tensor("weight")               # only 'weight' is read
    print(w.shape, w.dtype)                  # torch.Size([128, 128]) torch.float32

Sharded save for huge models

import torch
import deathtensors as dt

# A 100-tensor model that we want to ship in ~10 shards.
tensors = {f"layer.{i}.weight": torch.randn(500, 500) for i in range(100)}
shards = dt.save_sharded(
    "big_model.deathtensors",
    tensors,
    max_shard_bytes=10 * 1024 * 1024,  # 10 MiB per shard
    compress=True,
    checksum=True,
)
print(f"wrote {len(shards)} shards")

# Read it back transparently.
with dt.open_sharded("big_model.deathtensors") as sr:
    print(sr.which_shard("layer.42.weight"))   # e.g. 'big_model-00005-of-00010.deathtensors'
    w = sr.get_tensor("layer.42.weight")        # only opens that one shard
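
As a rough illustration of how splitting by max_shard_bytes could work, the sketch below greedily packs tensors into shards in insertion order. It is an assumption, not the library's actual algorithm: plan_shards is a hypothetical helper, and it ignores header overhead and compression.

```python
# Hypothetical sketch, not the library's implementation: greedily pack
# tensors into shards in insertion order without exceeding max_shard_bytes.
def plan_shards(sizes, max_shard_bytes):
    shards, current, used = [], [], 0
    for name, nbytes in sizes.items():
        # Start a new shard when the next tensor would overflow this one.
        if current and used + nbytes > max_shard_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(name)
        used += nbytes
    if current:
        shards.append(current)
    return shards

# 100 float32 tensors of shape (500, 500): 500 * 500 * 4 = 1,000,000 bytes each.
sizes = {f"layer.{i}.weight": 500 * 500 * 4 for i in range(100)}
shards = plan_shards(sizes, max_shard_bytes=10 * 1024 * 1024)
print(len(shards), len(shards[0]))  # 10 10
```

Under these assumptions each 10 MiB shard holds ten tensors, and layer.42.weight lands in the fifth shard, which is consistent with the example filename in the comment above.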

CLI

python -m deathtensors info model.deathtensors           # show tensor table
python -m deathtensors list model.deathtensors           # list tensor names
python -m deathtensors verify model.deathtensors         # verify SHA-256 footer
python -m deathtensors stats model.deathtensors          # per-tensor min/max/mean/...
python -m deathtensors convert to-safetensors a.deathtensors b.safetensors
python -m deathtensors convert from-safetensors a.safetensors b.deathtensors --compress
python -m deathtensors diff a.deathtensors b.deathtensors
python -m deathtensors manifest model.deathtensors       # write JSON sidecar
python -m deathtensors --version

File format (v1)

+-----------------------+
| Magic (8 bytes)       |   b"DTLEGION"
+-----------------------+
| Version (4 bytes u32) |   1 (little-endian)
+-----------------------+
| Flags (4 bytes u32)   |   bit0: zstd compression
|                       |   bit1: SHA-256 footer
|                       |   bit2: encryption (reserved)
+-----------------------+
| Header size (8 u64)   |   byte length of JSON header
+-----------------------+
| Header (JSON, UTF-8)  |   see docs/format_spec.md
+-----------------------+
| Padding (0..7 bytes)  |   NUL bytes, 8-byte alignment
+-----------------------+
| Tensor data (blob)    |   raw bytes (or zstd-compressed)
+-----------------------+
| Footer (32 bytes)     |   optional: SHA-256(header + padding + data)
+-----------------------+

Full spec: docs/format_spec.md.
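
The layout above can be read with the standard library alone. The following is a minimal sketch, not the official reader. It assumes that every integer field is little-endian (the diagram states this only for the version field) and that padding aligns the tensor data to an 8-byte boundary measured from the start of the file. The JSON contents in the synthetic example are placeholders; the real header schema is defined in docs/format_spec.md.

```python
import json
import struct

MAGIC = b"DTLEGION"
# magic (8 bytes), version (u32), flags (u32), header size (u64); assumed little-endian.
PREFIX = struct.Struct("<8sIIQ")

def read_header(buf):
    magic, version, flags, header_size = PREFIX.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a deathtensors file")
    header_end = PREFIX.size + header_size
    header = json.loads(bytes(buf[PREFIX.size:header_end]).decode("utf-8"))
    data_start = (header_end + 7) // 8 * 8  # assumed: round up to 8-byte boundary
    return {
        "version": version,
        "compressed": bool(flags & 0b001),    # bit0: zstd compression
        "has_checksum": bool(flags & 0b010),  # bit1: SHA-256 footer
        "header": header,
        "data_start": data_start,
    }

# Build a tiny synthetic file in memory; the header fields are placeholders.
header_json = json.dumps({"bias": {"dtype": "F32", "shape": [2]}}).encode("utf-8")
body = PREFIX.pack(MAGIC, 1, 0b001, len(header_json)) + header_json
body += b"\x00" * (-len(body) % 8) + b"\x00" * 8  # padding, then 8 data bytes
info = read_header(body)
print(info["version"], info["compressed"], info["has_checksum"])  # 1 True False
```

Because the header is plain JSON in front of the data blob, nothing in this sketch needs to touch or decompress the tensor bytes.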

Why not just use pickle / torch.save?

torch.save uses pickle under the hood, which means opening a .pt file from an untrusted source can run arbitrary Python code. This has been the cause of several real-world supply-chain attacks on ML model hubs. deathtensors files are pure data: a fixed binary prefix followed by JSON metadata followed by raw tensor bytes. There is no code path in the reader that calls eval, exec, __reduce__, or any pickle-style reconstruction.
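
The difference is easy to demonstrate with the standard library. In this small, harmless sketch, the Payload class is purely illustrative: its __reduce__ makes pickle call eval during loading, whereas json.loads only ever produces plain data.

```python
import json
import pickle

class Payload:
    # pickle calls the returned callable with the given args at load time.
    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for attacker code

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs eval("6 * 7") while loading
print(result)                # 42

# JSON parsing only builds dicts, lists, strings, numbers, booleans and None.
parsed = json.loads('{"__reduce__": "6 * 7"}')
print(parsed)                # {'__reduce__': '6 * 7'}
```

A real attack would swap the arithmetic for something like a shell command; the loading side cannot tell the difference before the code has already run.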

Why not just use safetensors?

safetensors is excellent and we encourage you to use it. deathtensors exists as a separate, independent implementation because:

  1. Format diversity is good for the ecosystem. A single point of failure in any one tensor-storage library would be bad; having two interoperable libraries with different code paths reduces risk.
  2. deathtensors ships an optional SHA-256 footer for integrity verification, which is useful when files travel through untrusted channels.
  3. deathtensors ships per-tensor string metadata in addition to global metadata.
  4. deathtensors ships optional zstd compression of the tensor data blob.
  5. deathtensors ships a built-in CLI (python -m deathtensors).
  6. deathtensors ships sharded save/open out of the box.
  7. deathtensors exposes a richer dtype set including BF16, complex 64, complex 128, and unsigned 16/32/64-bit integers.
  8. deathtensors ships a Schema class for declarative validation.
  9. deathtensors ships a diff() function to compare two files.
  10. deathtensors ships a manifest sidecar writer for browsing.

We do not try to be a drop-in replacement. The Python API is similar in spirit (save, open, keys, get_tensor) but the file format is not compatible — a .deathtensors file is not a .safetensors file and vice versa. Use dt.from_safetensors() / dt.to_safetensors() to convert.
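
For readers curious about the SHA-256 footer from item 2, here is a minimal sketch of the check using only hashlib. It assumes the digest covers the bytes from the start of the JSON header through the end of the tensor data and excludes the 24-byte fixed prefix; the format diagram says "header + padding + data" but docs/format_spec.md is the authority on the exact range. footer_matches is a hypothetical helper, and in practice f.verify() does this for you.

```python
import hashlib
import hmac

FOOTER_LEN = 32  # SHA-256 digest size
PREFIX_LEN = 24  # magic + version + flags + header size (assumed not hashed)

def footer_matches(file_bytes):
    covered = file_bytes[PREFIX_LEN:-FOOTER_LEN]
    footer = file_bytes[-FOOTER_LEN:]
    digest = hashlib.sha256(covered).digest()
    return hmac.compare_digest(digest, footer)

# Synthetic stand-in for a file: fake prefix, fake header/padding/data, footer.
payload = b'{"demo": true}' + b"\x00" * 2 + bytes(range(16))
good = b"\x00" * PREFIX_LEN + payload + hashlib.sha256(payload).digest()
tampered = good[:30] + b"X" + good[31:]  # flip one byte inside the payload
print(footer_matches(good), footer_matches(tampered))  # True False
```

Note that a bare hash detects corruption in transit; it does not authenticate the sender, since anyone who can rewrite the file can also rewrite the footer.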

Public API

# Core save/open
deathtensors.save(path, tensors, metadata=None, global_metadata=None,
                  checksum=False, compress=False, compress_level=3)
deathtensors.open(path, verify=False)        # context manager
deathtensors.save_file(path, tensors, ...)   # lower-level (raw bytes)
deathtensors.append(path, tensors, metadata=None, extra_global_metadata=None)
deathtensors.DtFile(path, verify=False)      # the class returned by open()

f = deathtensors.open("model.deathtensors")
f.keys()                                      # list of tensor names
f.info(name)                                  # dtype, shape, offsets, nbytes, metadata
f.metadata()                                  # global file metadata
f.get_bytes(name)                             # raw bytes
f.get_buffer(name)                            # memoryview
f.get_tensor(name, framework="torch")         # torch.Tensor (default) or numpy.ndarray
f.get_tensor_mmap(name)                       # zero-copy mmap-backed tensor (uncompressed only)
f.get_tensor_slice(name, start, count, ...)   # load only N rows
f.get_tensors(framework="torch")              # dict of all tensors
f.stats(name)                                 # min/max/mean/stddev/nnz
f.all_stats()                                 # stats for every tensor
f.verify()                                    # verify SHA-256 footer
f.has_checksum()                              # was the file written with checksum=True?
f.is_compressed()                             # was the file written with compress=True?

# Sharded
deathtensors.save_sharded(path, tensors, max_shard_bytes=5*GiB, ...)
deathtensors.open_sharded(index_path, verify=False) -> ShardedReader

# Sparse
deathtensors.save_sparse(path, tensors, ...)
deathtensors.get_sparse(f, name) -> torch.Tensor

# Conversion
deathtensors.from_safetensors(src, dst=None, checksum=True, compress=False)
deathtensors.to_safetensors(src, dst=None)

# Schema validation
schema = deathtensors.Schema()
schema.expect(name, dtype=..., shape=..., metadata_keys=...)
schema.allow_extra()
schema.validate(path_or_file)

# Diff
deathtensors.diff(path_a, path_b) -> dict

# Manifest
deathtensors.write_manifest(dt_path, manifest_path=None)
deathtensors.read_manifest(manifest_path) -> dict

# CLI
python -m deathtensors info|list|verify|stats|convert|diff|manifest <path>

Testing

pip install "deathtensors[dev]"
pytest tests/

The test suite covers: every dtype round-trip, save/load with and without compression, save/load with and without checksum, lazy loading, per-tensor stats correctness, append mode, sharded save/open, sparse tensors, schema validation (pass and fail), diff (identical / added / removed / value-changed), manifest round-trip, path expansion, the CLI (info/list/verify/stats/manifest/diff), backward compatibility with .dt files written by 0.1.x, and safety (no eval/exec called on open).

License

MIT, © Death Legion.
