Safe, fast, pickle-free tensor storage for PyTorch. Rust core, Python interface. By Death Legion.
deathtensors
Safe, fast, pickle-free tensor storage for PyTorch. By Death Legion. Version 0.2.0.
deathtensors is a real alternative to pickle for storing model weights.
Files use the .deathtensors extension and are designed to be opened
safely even when they come from an untrusted source: opening a
deathtensors file never executes arbitrary code, because the header
is parsed as JSON (no eval, no __reduce__, no torch.load) and the
tensor data blob is treated as opaque bytes.
What's new in 0.2.0
- .deathtensors is now the canonical file extension (.dt is kept as a legacy alias for files written by 0.1.x).
- zstd compression of the tensor data blob (save(..., compress=True)). The reader transparently decompresses; the header stays uncompressed so you can still inspect it without decompressing the whole file.
- Per-tensor statistics: f.stats(name) returns {dtype, shape, nbytes, count, min, max, mean, stddev, nnz}, computed in Rust without materialising the tensor in Python. f.all_stats() does it for every tensor in one pass.
- Append mode: dt.append(path, {...}) adds tensors to an existing file without you having to load the old tensors back into memory.
- Sharded save: dt.save_sharded(path, tensors, max_shard_bytes=...) auto-splits a huge model across multiple .deathtensors files with an index. dt.open_sharded(index_path) reads them back transparently.
- Zero-copy mmap tensors: f.get_tensor_mmap(name) returns a torch.Tensor backed directly by the mmap'd file (no copy).
- Tensor slicing: f.get_tensor_slice(name, start, count) loads only N rows of a huge tensor, for streaming over embedding tables.
- Sparse tensor support: dt.save_sparse(path, {"sp": sparse_coo}) and dt.get_sparse(f, "sp") round-trip torch sparse COO tensors.
- Schema validation: dt.Schema().expect(...).validate(path) lets you declare expected dtypes/shapes and fail fast on mismatches.
- diff(): compare two .deathtensors files and report added / removed / shape-changed / dtype-changed / value-changed tensors.
- Manifest sidecar: dt.write_manifest(path) writes a small JSON file with the full header, handy for browsing on Hugging Face Hub without downloading the whole file.
- CLI: python -m deathtensors info|list|verify|stats|convert|diff|manifest.
- Path expansion: ~ and $ENV_VARS are expanded in every path.
- Conversion: dt.from_safetensors() and dt.to_safetensors().
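To make the tensor slicing entry above concrete, here is a minimal sketch of the byte arithmetic behind reading only N rows. It is not the library's implementation: it assumes the tensor is stored contiguously in row-major order, and row_slice_byte_range is a hypothetical helper name used only for illustration.

```python
import math
import struct


def row_slice_byte_range(shape, itemsize, start, count):
    """Return (offset, length) in bytes for rows [start, start + count).

    Assumes a contiguous, row-major layout (an assumption of this sketch).
    """
    if not shape:
        raise ValueError("cannot slice a 0-d tensor by rows")
    if start < 0 or count < 0 or start + count > shape[0]:
        raise IndexError("row range out of bounds")
    row_bytes = math.prod(shape[1:]) * itemsize  # bytes in one row
    return start * row_bytes, count * row_bytes


# A tiny 4 x 3 float32 "embedding table" stored as raw little-endian bytes.
shape = (4, 3)
blob = struct.pack("<12f", *[float(i) for i in range(12)])

# Read rows 1 and 2 only: 24 bytes starting at byte 12.
offset, length = row_slice_byte_range(shape, itemsize=4, start=1, count=2)
rows = struct.unpack(f"<{length // 4}f", blob[offset:offset + length])
```

With a memory-mapped file the same offset and length select the rows without touching the rest of the tensor, which is why this kind of read only makes sense on uncompressed data.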
Install
pip install deathtensors # core only
pip install deathtensors[torch] # pulls in torch + numpy
pip install deathtensors[numpy] # pulls in numpy
pip install deathtensors[dev] # torch + numpy + pytest
Pre-built wheels are available for CPython 3.8–3.13 on x86_64 Linux. Other platforms fall back to a source build (requires Rust ≥ 1.74).
Quickstart
import torch
import deathtensors as dt
# 1. Save with compression + checksum.
tensors = {
    "weight": torch.randn(128, 128),
    "bias": torch.zeros(128),
}
metadata = {"weight": {"layer": "fc1", "init": "kaiming"}}
global_md = {"model": "mlp-tiny", "license": "MIT"}
dt.save("model.deathtensors", tensors, metadata=metadata,
        global_metadata=global_md, compress=True, checksum=True)
# 2. Open the file lazily — no tensors are read yet.
with dt.open("model.deathtensors", verify=True) as f:
    print(f.keys())             # ['weight', 'bias']
    print(f.metadata())         # global metadata dict
    print(f.info("weight"))     # dtype/shape/offsets/metadata
    print(f.stats("weight"))    # min/max/mean/stddev/nnz
    w = f.get_tensor("weight")  # only 'weight' is read
    print(w.shape, w.dtype)     # torch.Size([128, 128]) torch.float32
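The statistics returned by f.stats() can be computed in a single streaming pass over the raw bytes. The sketch below shows one way to do that in pure Python for little-endian float32 data using Welford's update. It is an illustration only, not the Rust implementation: float32_stats is a hypothetical name, and reporting the population standard deviation (dividing by count) is an assumption.

```python
import math
import struct


def float32_stats(raw):
    """One-pass count/min/max/mean/stddev/nnz over little-endian float32 bytes."""
    count, nnz = 0, 0
    mean, m2 = 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for (x,) in struct.iter_unpack("<f", raw):
        count += 1
        delta = x - mean
        mean += delta / count       # running mean (Welford's method)
        m2 += delta * (x - mean)    # running sum of squared deviations
        lo, hi = min(lo, x), max(hi, x)
        if x != 0.0:
            nnz += 1
    if count == 0:
        return {"count": 0, "min": None, "max": None,
                "mean": None, "stddev": None, "nnz": 0}
    # Assumption: population standard deviation (divide by count, not count - 1).
    return {"count": count, "min": lo, "max": hi,
            "mean": mean, "stddev": math.sqrt(m2 / count), "nnz": nnz}


stats = float32_stats(struct.pack("<4f", 0.0, 1.0, 2.0, 3.0))
```

Because each value is visited once and only a handful of scalars are kept, the tensor never has to be held in memory as a whole.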
Sharded save for huge models
import torch, deathtensors as dt
# A 100-tensor model that we want to ship in ~10 shards.
tensors = {f"layer.{i}.weight": torch.randn(500, 500) for i in range(100)}
shards = dt.save_sharded(
    "big_model.deathtensors",
    tensors,
    max_shard_bytes=10 * 1024 * 1024,  # 10 MiB per shard
    compress=True,
    checksum=True,
)
print(f"wrote {len(shards)} shards")
# Read it back transparently.
with dt.open_sharded("big_model.deathtensors") as sr:
    print(sr.which_shard("layer.42.weight"))  # e.g. 'big_model-00005-of-00010.deathtensors'
    w = sr.get_tensor("layer.42.weight")      # only opens that one shard
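To show where the "~10 shards" figure comes from, here is a rough sketch of greedy shard planning on uncompressed sizes. It is not the library's algorithm: plan_shards and shard_names are hypothetical helpers, the naming pattern is inferred from the example file name in the comment above, and a real writer may also account for header bytes and compression.

```python
def plan_shards(sizes, max_shard_bytes):
    """Greedily group tensor names into shards, preserving insertion order."""
    shards, current, used = [], [], 0
    for name, nbytes in sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        # An oversized tensor still gets a shard of its own.
        if current and used + nbytes > max_shard_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(name)
        used += nbytes
    if current:
        shards.append(current)
    return shards


def shard_names(stem, n):
    """File names numbered from 1 (naming pattern is an assumption)."""
    return [f"{stem}-{i:05d}-of-{n:05d}.deathtensors" for i in range(1, n + 1)]


# 100 float32 tensors of shape (500, 500): 1,000,000 bytes each.
sizes = {f"layer.{i}.weight": 500 * 500 * 4 for i in range(100)}
shards = plan_shards(sizes, max_shard_bytes=10 * 1024 * 1024)
names = shard_names("big_model", len(shards))

# Index: tensor name -> shard file that holds it.
index = {name: names[i] for i, group in enumerate(shards) for name in group}
```

Ten tensors of 1,000,000 bytes fit under the 10 MiB limit (10,485,760 bytes) and an eleventh does not, so under these assumptions the plan yields ten shards of ten tensors each.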
CLI
python -m deathtensors info model.deathtensors # show tensor table
python -m deathtensors list model.deathtensors # list tensor names
python -m deathtensors verify model.deathtensors # verify SHA-256 footer
python -m deathtensors stats model.deathtensors # per-tensor min/max/mean/...
python -m deathtensors convert to-safetensors a.deathtensors b.safetensors
python -m deathtensors convert from-safetensors a.safetensors b.deathtensors --compress
python -m deathtensors diff a.deathtensors b.deathtensors
python -m deathtensors manifest model.deathtensors # write JSON sidecar
python -m deathtensors --version
File format (v1)
+-----------------------+
| Magic (8 bytes) | b"DTLEGION"
+-----------------------+
| Version (4 bytes u32) | 1 (little-endian)
+-----------------------+
| Flags (4 bytes u32) | bit0: zstd compression
| | bit1: SHA-256 footer
| | bit2: encryption (reserved)
+-----------------------+
| Header size (8 u64) | byte length of JSON header
+-----------------------+
| Header (JSON, UTF-8) | see docs/format_spec.md
+-----------------------+
| Padding (0..8 bytes) | NUL bytes, 8-byte alignment
+-----------------------+
| Tensor data (blob) | raw bytes (or zstd-compressed)
+-----------------------+
| Footer (32 bytes) | optional: SHA-256(header + padding + data)
+-----------------------+
Full spec: docs/format_spec.md.
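The layout above can be exercised with nothing but the standard library. The following is a minimal sketch of a writer and reader for the v1 container, not the reference implementation. It assumes that all integer fields are little-endian (the diagram only says so for the version), that padding aligns the start of the tensor data to 8 bytes counted from the start of the file, and it uses a made-up header entry, since the real header schema lives in docs/format_spec.md. Compression is left out.

```python
import hashlib
import json
import struct

MAGIC = b"DTLEGION"
FLAG_ZSTD, FLAG_SHA256 = 0b01, 0b10
PREFIX = struct.Struct("<8sIIQ")  # magic, version, flags, header size (24 bytes)


def build(header, data, checksum=True):
    """Serialize a header dict and a raw data blob into one byte string."""
    header_bytes = json.dumps(header).encode("utf-8")
    flags = FLAG_SHA256 if checksum else 0
    prefix = PREFIX.pack(MAGIC, 1, flags, len(header_bytes))
    # Assumption: NUL padding so the data blob starts on an 8-byte boundary.
    pad = b"\0" * (-(len(prefix) + len(header_bytes)) % 8)
    body = header_bytes + pad + data
    footer = hashlib.sha256(body).digest() if checksum else b""
    return prefix + body + footer


def parse(buf):
    """Return (header, flags, data); raise ValueError on a malformed file."""
    magic, version, flags, header_size = PREFIX.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    start = PREFIX.size
    header = json.loads(buf[start:start + header_size])  # JSON only, no eval
    data_start = start + header_size
    data_start += -data_start % 8  # skip the alignment padding
    data_end = len(buf)
    if flags & FLAG_SHA256:
        data_end -= 32
        # Footer covers header + padding + data, as in the diagram.
        if hashlib.sha256(buf[start:data_end]).digest() != buf[data_end:]:
            raise ValueError("checksum mismatch")
    return header, flags, buf[data_start:data_end]


# Hypothetical header entry; the real schema is defined in the spec.
example_header = {"weight": {"dtype": "F32", "shape": [2], "offsets": [0, 8]}}
example_data = struct.pack("<2f", 1.0, 2.0)
blob = build(example_header, example_data)
parsed_header, parsed_flags, parsed_data = parse(blob)
```

Note that the reader never interprets the data blob: it only slices it, which is what lets the tensor bytes be treated as opaque.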
Why not just use pickle / torch.save?
torch.save uses pickle under the hood, which means opening a
.pt file from an untrusted source can run arbitrary Python code.
This has been the cause of several real-world supply-chain attacks on
ML model hubs. deathtensors files are pure data: a fixed binary
prefix followed by JSON metadata followed by raw tensor bytes. There is
no code path in the reader that calls eval, exec, __reduce__, or
any pickle-style reconstruction.
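The difference is easy to demonstrate with a harmless payload. In the sketch below, Payload is a stand-in for a malicious object: unpickling it calls a function chosen by whoever wrote the file, while parsing JSON with the same content only ever yields plain data.

```python
import json
import pickle


class Payload:
    """Stand-in for a malicious object embedded in a pickle file."""

    def __reduce__(self):
        # pickle calls eval("6 * 7") at load time. A real attack would
        # put something like a shell command here instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # runs the callable chosen by the file's author

# The same idea expressed as JSON is just a string; nothing is executed.
header = json.loads('{"__reduce__": "eval(6 * 7)"}')
```

Here loaded is the integer 42, the result of the evaluated expression rather than a Payload instance, while header is an ordinary dict holding a string.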
Why not just use safetensors?
safetensors is excellent and we encourage you to use it. deathtensors
exists as a separate, independent implementation because:
- Format diversity is good for the ecosystem. A single point of failure in any one tensor-storage library would be bad; having two interoperable libraries with different code paths reduces risk.
- deathtensors ships an optional SHA-256 footer for integrity verification, which is useful when files travel through untrusted channels.
- deathtensors ships per-tensor string metadata in addition to global metadata.
- deathtensors ships optional zstd compression of the tensor data blob.
- deathtensors ships a built-in CLI (python -m deathtensors).
- deathtensors ships sharded save/open out of the box.
- deathtensors exposes a richer dtype set including BF16, complex 64, complex 128, and unsigned 16/32/64-bit integers.
- deathtensors ships a Schema class for declarative validation.
- deathtensors ships a diff() function to compare two files.
- deathtensors ships a manifest sidecar writer for browsing.
We do not try to be a drop-in replacement. The Python API is similar
in spirit (save, open, keys, get_tensor) but the file format
is not compatible — a .deathtensors file is not a .safetensors
file and vice versa. Use dt.from_safetensors() / dt.to_safetensors()
to convert.
Public API
# Core save/open
deathtensors.save(path, tensors, metadata=None, global_metadata=None,
                  checksum=False, compress=False, compress_level=3)
deathtensors.open(path, verify=False) # context manager
deathtensors.save_file(path, tensors, ...) # lower-level (raw bytes)
deathtensors.append(path, tensors, metadata=None, extra_global_metadata=None)
deathtensors.DtFile(path, verify=False) # the class returned by open()
f = deathtensors.open("model.deathtensors")
f.keys() # list of tensor names
f.info(name) # dtype, shape, offsets, nbytes, metadata
f.metadata() # global file metadata
f.get_bytes(name) # raw bytes
f.get_buffer(name) # memoryview
f.get_tensor(name, framework="torch") # torch.Tensor (default) or numpy.ndarray
f.get_tensor_mmap(name) # zero-copy mmap-backed tensor (uncompressed only)
f.get_tensor_slice(name, start, count, ...) # load only N rows
f.get_tensors(framework="torch") # dict of all tensors
f.stats(name) # min/max/mean/stddev/nnz
f.all_stats() # stats for every tensor
f.verify() # verify SHA-256 footer
f.has_checksum() # was the file written with checksum=True?
f.is_compressed() # was the file written with compress=True?
# Sharded
deathtensors.save_sharded(path, tensors, max_shard_bytes=5*GiB, ...)
deathtensors.open_sharded(index_path, verify=False) -> ShardedReader
# Sparse
deathtensors.save_sparse(path, tensors, ...)
deathtensors.get_sparse(f, name) -> torch.Tensor
# Conversion
deathtensors.from_safetensors(src, dst=None, checksum=True, compress=False)
deathtensors.to_safetensors(src, dst=None)
# Schema validation
schema = deathtensors.Schema()
schema.expect(name, dtype=..., shape=..., metadata_keys=...)
schema.allow_extra()
schema.validate(path_or_file)
# Diff
deathtensors.diff(path_a, path_b) -> dict
# Manifest
deathtensors.write_manifest(dt_path, manifest_path=None)
deathtensors.read_manifest(manifest_path) -> dict
# CLI
python -m deathtensors info|list|verify|stats|convert|diff|manifest <path>
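As an illustration of the categories diff() reports, here is a small sketch that classifies differences between two in-memory tensor tables. It does not call the library and is not its implementation: diff_entries, the input shape of {name: (dtype, shape, raw_bytes)}, and the report keys are all hypothetical, and the order in which dtype and shape are checked is an assumption.

```python
import hashlib


def diff_entries(a, b):
    """Classify differences between two {name: (dtype, shape, raw_bytes)} maps."""
    report = {"added": [], "removed": [], "dtype_changed": [],
              "shape_changed": [], "value_changed": []}
    for name in sorted(a.keys() | b.keys()):
        if name not in a:
            report["added"].append(name)
        elif name not in b:
            report["removed"].append(name)
        else:
            (dtype_a, shape_a, raw_a), (dtype_b, shape_b, raw_b) = a[name], b[name]
            if dtype_a != dtype_b:
                report["dtype_changed"].append(name)
            elif tuple(shape_a) != tuple(shape_b):
                report["shape_changed"].append(name)
            # Comparing digests avoids holding both blobs side by side
            # when the bytes are streamed from disk.
            elif hashlib.sha256(raw_a).digest() != hashlib.sha256(raw_b).digest():
                report["value_changed"].append(name)
    return report


old = {
    "bias": ("F32", (2,), b"\x00" * 8),
    "weight": ("F32", (2, 2), b"\x00" * 16),
    "gone": ("F32", (1,), b"\x00" * 4),
}
new = {
    "bias": ("F32", (2,), b"\x01" + b"\x00" * 7),  # same dtype/shape, new bytes
    "weight": ("F32", (4,), b"\x00" * 16),         # reshaped
    "extra": ("F16", (1,), b"\x00" * 2),           # only in the new file
}
report = diff_entries(old, new)
```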
Testing
pip install deathtensors[dev]
pytest tests/
The test suite covers: every dtype round-trip, save/load with and
without compression, save/load with and without checksum, lazy loading,
per-tensor stats correctness, append mode, sharded save/open, sparse
tensors, schema validation (pass and fail), diff (identical / added /
removed / value-changed), manifest round-trip, path expansion, the CLI
(info/list/verify/stats/manifest/diff), backward
compatibility with .dt files written by 0.1.x, and safety (no
eval/exec called on open).
License
MIT, © Death Legion.