coldcrate

Row-wise, self-describing, single-file cold-storage format with per-entry compression and encryption


ColdCrate

A row-wise, self-describing, single-file format for cold storage.
Structured rows + heavy blobs, archived once and read back by offset — with built-in compression and encryption.


What is ColdCrate?

ColdCrate is a file format (and a small, dependency-light Python library) for archiving datasets where each record is a structured row plus a heavy blob — think images with metadata, embeddings, documents, model shards.

A chunk is a single file: a small header, an embedded JSON schema that fully describes every row, then an append-only stream of length-prefixed entries. Each entry's payload can be compressed (LZ4 / Zstd) and encrypted (AES-256-XTS, derived from a passphrase) independently.

┌──────────────┬───────────────┬──────── append-only ────────────────┐
│  Header 128B │  JSON schema  │  entry · entry · entry · entry · …   │
└──────────────┴───────────────┴──────────────────────────────────────┘
                  ▲ travels with the data — the file explains itself

It deliberately ships no built-in index. Fast lookup is the caller's job via an external manifest of (resource_id, offset) — so the format stays a clean, predictable byte container.
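To make the header / schema / entry-stream idea concrete, here is a toy sketch of an append-only stream of length-prefixed, 8-byte-aligned records read back by offset. It is not ColdCrate's on-disk layout: the `<II` prefix, the CRC32 checksum (ColdCrate uses XXH64), and the helper names `write_record` / `read_record` are assumptions chosen only for illustration.

```python
import io
import struct
import zlib

ALIGN = 8  # pad every record so the next one starts on an 8-byte boundary


def write_record(f, payload: bytes) -> int:
    """Append one record at the end of the stream and return its start offset."""
    offset = f.seek(0, io.SEEK_END)
    # Hypothetical framing: u32 payload length + u32 CRC32, little-endian.
    header = struct.pack("<II", len(payload), zlib.crc32(payload))
    f.write(header + payload)
    f.write(b"\0" * (-(len(header) + len(payload)) % ALIGN))
    return offset


def read_record(f, offset: int) -> tuple[bytes, bool]:
    """Read the record at `offset`; the bool reports whether the checksum matched."""
    f.seek(offset)
    length, crc = struct.unpack("<II", f.read(8))
    payload = f.read(length)
    return payload, zlib.crc32(payload) == crc


buf = io.BytesIO()
first = write_record(buf, b"hello")
second = write_record(buf, b"cold storage")
```

The offsets returned by `write_record` play the role of manifest entries: with them, a read is a single seek, and without them the stream can still be walked record by record from the start.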

Why ColdCrate?

📦 Self-describing. The schema lives in the file. Hand someone a chunk and they can read every field name, type, and description — no side-channel docs. scan() can fully rebuild a lost manifest.
🧬 Real types, real nesting. u8…u64 / i8…i64 / f32 / f64 / bool / bytes / utf8 / uuid / timestamp, plus fixed/variable arrays and nested structs, arbitrarily composed.
🗜️ Compression built in. Per-entry LZ4 or Zstd with a tunable level; skip it per-entry for already-compressed blobs.
🔐 Encryption built in. AES-256-XTS keyed from a passphrase via scrypt (the random salt and the scrypt parameters are stored in the header). The schema is encrypted too, so field names don't leak. A wrong passphrase fails fast at open().
➕ Append-only, no seal. Keep appending anytime; a crash never corrupts prior entries. Checksummed scan() recovers what's valid; repair() truncates trailing garbage.
🧊 Built for scale. Streaming write/read (flat memory regardless of chunk size), 8-byte aligned for mmap, and embarrassingly parallel across chunks — multi-GB chunks, thousands of them, are the design target.
🪶 Light. Core install pulls in only xxhash. Compression and crypto backends are optional extras, imported lazily.
🔎 Fully typed. Ships py.typed (PEP 561) and is mypy-strict clean, so your type checker sees every signature.

Install

pip install coldcrate              # core (xxhash checksums only)
pip install "coldcrate[zstd]"      # + Zstd compression
pip install "coldcrate[lz4]"       # + LZ4 compression
pip install "coldcrate[crypto]"    # + AES-256-XTS encryption
pip install "coldcrate[all]"       # everything (quotes keep shells like zsh from globbing the brackets)

Backends are imported lazily; using one you didn't install raises a clear CompressionError / EncryptionError.

Quick start

import coldcrate as cc

schema = cc.Schema(
    description="image dataset",
    fields=[
        cc.Field("source",     "utf8", description="origin URL"),
        cc.Field("category",   "utf8"),
        cc.Field("dimensions", cc.Struct([
            cc.Field("width",  "u32"),
            cc.Field("height", "u32"),
        ])),
        cc.Field("tags",       cc.VarArray("utf8")),
        cc.Field("embedding",  cc.FixedArray("f32", 768), nullable=True),
        cc.Field("image_data", "bytes"),
    ],
)

# --- write ---
jpeg_bytes = b"\xff\xd8\xff\xe0"   # placeholder bytes; substitute a real JPEG
manifest = []
with cc.ChunkWriter.create("images.coldcrate", schema, compression="zstd") as w:
    res = w.append(b"img-001", {
        "source": "http://example.com/a.jpg",
        "category": "cat",
        "dimensions": {"width": 800, "height": 600},
        "tags": ["cute", "outdoor"],
        "embedding": None,
        "image_data": jpeg_bytes,
    })
    manifest.append((b"img-001", res.offset))   # remember where it landed

# --- read by offset (manifest-driven) ---
with cc.ChunkReader.open("images.coldcrate") as r:
    entry = r.read_at(manifest[0][1])
    print(entry.fields["category"], entry.checksum_ok)

    # or sweep everything in order
    for entry in r.scan():
        ...

A row is a dict matching the schema: nested structs are nested dicts, arrays are lists. Validation is strict — out-of-range ints, wrong types, missing non-nullable fields, and unknown keys all raise SchemaError instead of being silently coerced.
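As an illustration of what "strict" means for integers, the sketch below range-checks a value against a type string instead of wrapping or truncating it. It is a standalone approximation, not ColdCrate's validator: the `check_int` helper is hypothetical, it raises `TypeError` / `ValueError` where the library would raise SchemaError, and rejecting `bool` for integer fields is an assumption of this sketch.

```python
# Inclusive (min, max) bounds for each integer type string.
INT_RANGES = {}
for bits in (8, 16, 32, 64):
    INT_RANGES[f"u{bits}"] = (0, 2**bits - 1)
    INT_RANGES[f"i{bits}"] = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)


def check_int(type_name: str, value: object) -> int:
    """Return `value` unchanged if it is an int within range, else raise."""
    # bool is a subclass of int in Python, so it has to be excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{type_name} expects int, got {type(value).__name__}")
    lo, hi = INT_RANGES[type_name]
    if not lo <= value <= hi:
        raise ValueError(f"{value} is out of range for {type_name} [{lo}, {hi}]")
    return value
```

For example, `check_int("u8", 255)` returns 255, while `check_int("u8", 256)` raises rather than storing 0.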

When to use it — and when not to

Reach for ColdCrate when:

You archive many structured records, each with a sizable blob, and read them back by offset (or by full scan) rather than by ad-hoc query.
You want one self-contained file that explains itself, optionally compressed and encrypted, with no database to run.
Your access pattern is write-mostly-once, read-occasionally — cold storage, dataset shipping, ML training shards.

Look elsewhere when:

You need…	Use instead
Ad-hoc queries / secondary indexes	SQLite, DuckDB, a real DB
Columnar analytics over wide tables	Parquet / Arrow
In-place random update or delete	a mutable store (ColdCrate is append-only)
Authentication against active tampering, out of the box	add an HMAC/signature yourself (XTS has no MAC — see Encryption)
Reading one sub-field without touching the rest	— reads decode the whole row eagerly
A non-Python reader today	— only the Python implementation exists (the format is simple, though)

Core concepts

Chunk — one file. Created once with a fixed header + schema, then appended to.
Schema — the row definition, embedded as JSON. One schema per chunk; every entry conforms to it.
resource_id — an opaque per-entry handle (1–512 bytes), always stored in plaintext (even in encrypted chunks), used to reference entries from a manifest.
Manifest — your external index: typically (resource_id, chunk_path, offset, …) rows you collect from each append(). ColdCrate has no built-in index because without a manifest a resource_id has no meaning to look up, and with one the offset already lives there. A full scan() rebuilds it.
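One possible shape for such a manifest is a small SQLite table, sketched below with only the standard library. The table name, column names, and the helpers `open_manifest` / `record` / `locate` are all hypothetical; ColdCrate does not prescribe any manifest format, and the offset shown is a made-up value standing in for one returned by append().

```python
import sqlite3


def open_manifest(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a manifest database with one row per resource_id."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS manifest (
               resource_id  BLOB PRIMARY KEY,
               chunk_path   TEXT NOT NULL,
               entry_offset INTEGER NOT NULL
           )"""
    )
    return db


def record(db, resource_id: bytes, chunk_path: str, entry_offset: int) -> None:
    """Remember where an entry landed; a later write for the same id wins."""
    db.execute(
        "INSERT OR REPLACE INTO manifest VALUES (?, ?, ?)",
        (resource_id, chunk_path, entry_offset),
    )


def locate(db, resource_id: bytes):
    """Return (chunk_path, entry_offset), or None if the id is unknown."""
    return db.execute(
        "SELECT chunk_path, entry_offset FROM manifest WHERE resource_id = ?",
        (resource_id,),
    ).fetchone()


db = open_manifest()
record(db, b"img-001", "images.coldcrate", 4096)  # 4096 is an illustrative offset
```

Last-write-wins is a design choice of this sketch; since resource_id uniqueness is left to the caller, a manifest could just as well keep every version.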

What's stored vs what you supply

A chunk is self-describing: the algorithms and parameters needed to read it are written into the header and schema, so there's no way to mis-specify them on open and silently corrupt a read.

set at `create()`	stored in the chunk?	needed at read time?
`schema`	✅ embedded JSON	no — read from the file
`compression` algorithm	✅ header	no
`encryption` algorithm	✅ header	no
`kdf` params + random salt	✅ header	no
`chunk_id`, `created_at`	✅ header	no
`compression_level`	❌ writer-side only	never — decompression doesn't need it
`passphrase`	❌ it's the secret	yes, for encrypted chunks

So the only thing you pass to ChunkReader.open() is the passphrase, and only for encrypted chunks. You can't "mismatch" the compression/encryption algorithm, level, or KDF — they come from the file, not from you. A wrong passphrase fails fast at open() (the encrypted schema won't decrypt), never a silent garbage read.

Guide

Schema & types

Primitives are strings; composites are helper objects, nesting arbitrarily.

Type	Python value
`u8 u16 u32 u64 i8 i16 i32 i64`	`int` (range-checked)
`f32 f64`	`float`
`bool`	`bool` (strict — not `0/1`)
`bytes`	`bytes` / `bytearray` / `memoryview`
`utf8`	`str`
`uuid`	`uuid.UUID` (or 16 bytes)
`timestamp`	`int` — Unix microseconds (no timezone magic)
`Struct([Field, …])`	nested `dict`
`FixedArray(elem, n)`	`list` of exactly `n`
`VarArray(elem)`	`list` of any length

Any Field can be nullable=True (value None, or omit the key). Nesting is capped at 64 levels and the embedded schema at 8 MiB on read, so a pathological or hostile schema raises a clean error instead of exhausting the stack. A single variable-length field (bytes/utf8/array) is bounded by a u32 length prefix (~4 GiB).
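Because `timestamp` is a plain integer of Unix microseconds, any timezone handling happens before the value reaches the row. A minimal sketch of that conversion follows; the helper names `to_unix_micros` / `from_unix_micros` are hypothetical and not part of ColdCrate.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(dt: datetime) -> int:
    """Convert an aware datetime to integer microseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # Integer division of timedeltas avoids float rounding.
    return (dt - EPOCH) // timedelta(microseconds=1)


def from_unix_micros(us: int) -> datetime:
    """Convert integer Unix microseconds back to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=us)


moment = datetime(2026, 6, 18, 12, 0, tzinfo=timezone.utc)
stamp = to_unix_micros(moment)
```

Refusing naive datetimes is a choice made here so that a local-time value is never silently stored as if it were UTC.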

Compression

Set per chunk; opt out per entry. The level is a writer-side speed/ratio knob and is not stored (decompression never needs it).

with cc.ChunkWriter.create("c.coldcrate", schema, compression="zstd", compression_level=19) as w:
    w.append(rid, row, compress=False)   # this blob is already compressed (e.g. JPEG)

Encryption

The passphrase is the only secret you supply. A random salt and scrypt parameters are stored in the header, so the file fully describes how to re-derive its own key; the same passphrase yields different ciphertext across chunks.

with cc.ChunkWriter.create(
    "secret.coldcrate", schema,
    compression="zstd", encryption="aes-256-xts", passphrase="correct horse",
    kdf=(18, 8, 1),                  # optional: raise scrypt log2n for cold storage
) as w:
    w.append(b"k", {...})

with cc.ChunkReader.open("secret.coldcrate", passphrase="correct horse") as r:
    entry = r.read_at(off)           # decrypted transparently
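To show what a `(log2n, r, p)` tuple means in practice, the sketch below feeds the same three numbers to the standard library's `hashlib.scrypt`. How ColdCrate encodes the passphrase, sizes its salt, or splits the derived bytes is not documented here, so the UTF-8 encoding, the 16-byte salt, and the `derive_key` helper are assumptions. The 64-byte output reflects that AES-256-XTS takes two 256-bit keys.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, kdf=(15, 8, 1), dklen: int = 64) -> bytes:
    """Derive `dklen` key bytes with scrypt from a passphrase and salt."""
    log2n, r, p = kdf
    n = 1 << log2n  # scrypt's cost parameter N is a power of two
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=n,
        r=r,
        p=p,
        # scrypt needs roughly 128 * n * r bytes; allow headroom, capped at INT_MAX.
        maxmem=min(2**31 - 1, 2 * 128 * n * r),
        dklen=dklen,
    )


salt = os.urandom(16)
# A low cost keeps the demo fast; real chunks would use a much higher log2n.
key = derive_key("correct horse", salt, kdf=(10, 8, 1))
```

Each increment of log2n doubles both time and memory, which is why raising it is attractive for cold storage and why reopening a reader per read is expensive.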

When a chunk is encrypted, its schema is encrypted too — field names are as sensitive as values. A keyless open() still exposes the header, resource_ids, integrity checks, and scan_raw() (stored bytes), but reader.schema is None and field decoding needs the key. A wrong passphrase fails at open() (the schema won't decrypt).

Threat model. AES-256-XTS provides confidentiality, not authentication (length-preserving → nowhere for a MAC). The XXH64 checksum detects corruption, not tampering (it's unkeyed). If active modification is in scope, layer an HMAC or signature over the chunk yourself.
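If tampering is in scope, one way to layer authentication on top is a sidecar HMAC computed over the stored chunk bytes. The sketch below uses only `hmac` and `hashlib`; the helper names `chunk_tag` / `verify_chunk`, the choice of HMAC-SHA256, and the idea of a separate MAC key are assumptions, not part of ColdCrate.

```python
import hashlib
import hmac
import os
import tempfile


def chunk_tag(path, mac_key: bytes, block_size: int = 1 << 20) -> bytes:
    """Stream the file through HMAC-SHA256 and return the 32-byte tag."""
    mac = hmac.new(mac_key, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        while block := f.read(block_size):
            mac.update(block)
    return mac.digest()


def verify_chunk(path, mac_key: bytes, expected_tag: bytes) -> bool:
    """Constant-time comparison of the recomputed tag with the stored one."""
    return hmac.compare_digest(chunk_tag(path, mac_key), expected_tag)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.coldcrate")
    with open(path, "wb") as f:
        f.write(b"stored chunk bytes")
    mac_key = b"\x01" * 32  # demo key only; use a random secret in practice
    tag = chunk_tag(path, mac_key)
    ok_before = verify_chunk(path, mac_key, tag)
    with open(path, "ab") as f:
        f.write(b"!")  # simulate a modification after tagging
    ok_after = verify_chunk(path, mac_key, tag)
```

Since chunks are append-only, a whole-file tag like this has to be recomputed after every append session, so it fits best once a chunk is considered final.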

Deletion (tombstones)

Append-only ⇒ deletion is logical. append_tombstone(rid) writes a marker (tombstone flag, empty payload); a reader returns it with tombstone is True and fields is None. Resolution is caller logic, like resource_id uniqueness:

w.append(b"img-001", {...})
w.append_tombstone(b"img-001")       # later marker logically deletes it

live = {}
for e in r.scan():
    if e.tombstone:
        live.pop(e.resource_id, None)
    else:
        live[e.resource_id] = e.offset

Durability & recovery

append() writes the entry and nothing else. The header's entry_count / tail_offset counters are committed on flush() / close(); flush(sync=True) adds fsync. tail_offset is written last as a commit marker: a reader trusts the cached counters only if tail_offset == file size, otherwise both read as None — never a misleading stale value.
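The commit-marker rule can be sketched in a few lines: cached counters are believed only when the recorded tail offset equals the file's actual size. This is an illustration of the rule as described above, not ColdCrate's code, and `trusted_counters` is a hypothetical helper.

```python
import os
import tempfile


def trusted_counters(path, entry_count: int, tail_offset: int):
    """Return the cached counters if they match the file size, else (None, None)."""
    if tail_offset == os.path.getsize(path):
        return entry_count, tail_offset
    # Bytes were appended (or lost) after the last commit: report nothing
    # rather than a stale value.
    return None, None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"\0" * 100)
    clean = trusted_counters(path, 3, 100)   # counters committed at size 100
    with open(path, "ab") as f:
        f.write(b"\0" * 10)                  # simulate an append with no flush()
    dirty = trusted_counters(path, 3, 100)
```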

After a crash, coldcrate.repair(path) scans the longest valid run of entries (checksum-validated, no passphrase needed), truncates trailing partial bytes, and rewrites the counters. ChunkWriter.open() refuses to append to a dirty chunk until you do this. A corrupt chunk never crashes the reader: scan() resyncs past damage and yields what's valid; any malformed input raises a clean ColdCrateError.

Concurrency

One chunk has a single writer: create() / open() take a best-effort advisory exclusive lock (fcntl.flock where available), so a second writer fails fast instead of interleaving. Readers take no lock — many ChunkReaders (and threads sharing one, since read_at is positional) can read concurrently, including while a writer appends. Parallelism across chunks is unrestricted: each chunk is an independent file.
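The single-writer behaviour described above can be approximated with a non-blocking `fcntl.flock`, as in this sketch. It is not ColdCrate's locking code; `try_lock_exclusive` is a hypothetical helper, and on platforms without `fcntl` it simply reports success, which mirrors the "best-effort" wording.

```python
import os
import tempfile

try:
    import fcntl
except ImportError:  # e.g. Windows: no flock available
    fcntl = None


def try_lock_exclusive(f) -> bool:
    """Try to take an exclusive advisory lock without blocking."""
    if fcntl is None:
        return True  # nothing to enforce here; the caller must serialize writers
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False  # another open file description already holds the lock
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.lock")
    with open(path, "wb") as first, open(path, "ab") as second:
        first_ok = try_lock_exclusive(first)
        second_ok = try_lock_exclusive(second)  # a second "writer" on the same file
```

The lock is advisory: it only stops other processes that also ask for it, which is why readers that never lock are unaffected.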

Performance & scale

ColdCrate is streaming: it holds one entry in memory at a time for both write and scan, so memory stays flat regardless of chunk or dataset size, and a single chunk can far exceed RAM. Indicative single-core throughput on 512 KiB incompressible payloads (a realistic entry size for already-compressed media) is shown below; your numbers will depend on data and hardware:

pipeline	write	scan	random `read_at`
plain	~2.5 GiB/s	~3.8 GiB/s	~4.3 GiB/s
+ zstd	~1.7 GiB/s	~3.3 GiB/s	~3.5 GiB/s
+ AES-256-XTS	~1.0 GiB/s	~2.0 GiB/s	~2.1 GiB/s

It's memory-bandwidth-bound at these sizes. Encryption roughly halves write throughput; the gap is much larger for tiny entries (a few KiB), where the per-entry cipher setup dominates rather than the AES itself — so size your entries accordingly. Because chunks are independent, aggregate throughput scales with cores — on a 22-core host, 8 parallel encrypted+zstd writers reach ~3.5 GiB/s vs ~700 MiB/s for one (~5×). For multi-TB datasets, shard across chunks and run roughly one writer per core (or per machine):

from concurrent.futures import ProcessPoolExecutor

def write_shard(task):
    path, rows = task
    with cc.ChunkWriter.create(path, SCHEMA, compression="zstd") as w:
        for rid, row in rows:
            w.append(rid, row)

with ProcessPoolExecutor(max_workers=16) as ex:
    list(ex.map(write_shard, shard_tasks))
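The example above assumes a `shard_tasks` list of `(path, rows)` pairs. One simple way to build it is round-robin assignment, sketched here; `make_shard_tasks`, the file-name pattern, and the sample rows are hypothetical.

```python
def make_shard_tasks(rows, n_shards: int, prefix: str = "shard"):
    """Distribute (resource_id, row) pairs round-robin into per-chunk task lists."""
    buckets = [[] for _ in range(n_shards)]
    for i, item in enumerate(rows):
        buckets[i % n_shards].append(item)
    # Skip empty buckets so no empty chunk files get created.
    return [
        (f"{prefix}-{k:05d}.coldcrate", bucket)
        for k, bucket in enumerate(buckets)
        if bucket
    ]


rows = [(f"img-{i:03d}".encode(), {"category": "cat"}) for i in range(5)]
shard_tasks = make_shard_tasks(rows, n_shards=2)
```

This version holds every row in memory, which is fine for a demo; for large datasets each task would more likely carry a list of input files or an ID range that the worker reads itself.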

Measure on your own hardware:

python benchmarks/bench.py                          # compression × encryption matrix
python benchmarks/bench.py --codec                  # pure encode/decode throughput
python benchmarks/bench.py --parallel-chunks 16     # multi-process scaling
python benchmarks/bench.py --stress --target-gb 10  # sustained large write
python benchmarks/gil_scaling_probe.py              # why encryption isn't threaded

Caveats & gotchas

Things worth knowing before you depend on it:

No built-in index. You must keep a manifest of offsets, or scan() to find things. This is by design.
No authentication. XTS protects confidentiality only; the checksum is unkeyed. Layer your own MAC/signature if tampering is a threat.
Reads decode the whole row. There's no lazy/partial field access — fetching one sub-field still materializes the entire entry.
Big single payloads use a few × their size in RAM transiently during compress/encrypt/decode. A single variable-length field caps at ~4 GiB; split larger blobs across entries.
Encrypted random access: reuse the reader. open() runs scrypt (tens of ms). Opening per read makes the KDF dominate — keep readers open / pool them.
Compression level isn't stored; it only affects the writer. Decompression works regardless.
One writer per chunk. Concurrent writers are blocked where flock exists, undefined where it doesn't (e.g. Windows) — keep single-writer discipline yourself there.
Pre-1.0 format. The on-disk layout (FORMAT_VERSION = 1) may change before 1.0; no cross-version compatibility guarantee yet.
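For the large-payload caveat in the list above, splitting a blob across entries might look like the following sketch. The `/part-NNNNN` suffix convention and the `split_blob` helper are hypothetical; the only constraint taken from this document is that a resource_id must stay within 1–512 bytes.

```python
MAX_RESOURCE_ID = 512  # resource_id limit stated in this document


def split_blob(resource_id: bytes, blob: bytes, part_size: int):
    """Yield (part_id, part_bytes) pairs covering `blob` in order."""
    view = memoryview(blob)  # slice without copying the whole blob each time
    for index, start in enumerate(range(0, len(view), part_size)):
        part_id = resource_id + b"/part-%05d" % index
        if len(part_id) > MAX_RESOURCE_ID:
            raise ValueError("part id exceeds the 512-byte resource_id limit")
        yield part_id, bytes(view[start:start + part_size])


parts = list(split_blob(b"video-42", b"abcdefghij", part_size=4))
```

Each pair can then be appended as its own entry, with the caller's manifest recording the part order so the blob can be reassembled on read.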

API reference

Everything is re-exported from the top-level coldcrate package (import coldcrate as cc). The package ships py.typed (PEP 561) and is fully annotated, so mypy / pyright resolve every signature below.

Schema definition

`Schema(fields: list[Field], description: str | None = None, version: int = 1)`

The row definition embedded in a chunk. Validated on construction (SchemaError on a bad shape; nesting capped at 64).


param	meaning
`fields: list[Field]`	ordered field definitions (serialisation order)
`description: str | None`	optional human description
`version: int`	schema-format version (default `1`)

Methods: encode_row(row: dict) -> bytes, decode_row(buf: bytes | bytearray | memoryview) -> dict, to_dict() -> dict, to_json_bytes() -> bytes, and classmethods from_dict(d: dict) -> Schema, from_json_bytes(raw: bytes) -> Schema.

`Field(name: str, type: TypeExpr, nullable: bool = False, description: str | None = None)`

One field. type is a type string or a composite (Struct / FixedArray / VarArray). nullable=True allows None (or omitting the key).

Type strings

"u8" "u16" "u32" "u64" "i8" "i16" "i32" "i64" "f32" "f64" "bool" "bytes" "utf8" "uuid" "timestamp"

`Struct(fields: list[Field])` · `FixedArray(elem: TypeExpr, count: int)` · `VarArray(elem: TypeExpr)`

Composite types. Struct.fields is a list[Field]; elem is any nested type; count is the fixed length (≥ 1).

Writing

`ChunkWriter.create(path: str | os.PathLike, schema: Schema, *, compression: str = "none", compression_level: int | None = None, encryption: str = "none", passphrase: str | bytes | None = None, kdf: tuple[int, int, int] | None = None, chunk_id: uuid.UUID | None = None, created_at: int | None = None) -> ChunkWriter`

Create a new chunk (exclusive create — FileExistsError if it exists) and write its header + schema.

param	meaning
`compression`	`"none"` / `"lz4"` / `"zstd"`
`compression_level`	backend level (zstd ~1–22, lz4 HC); `None` = default. Writer-side, not stored
`encryption`	`"none"` / `"aes-256-xts"` (requires `passphrase`)
`passphrase`	`str` / `bytes`; the only encryption secret
`kdf`	`(log2n, r, p)` scrypt cost; default `(15, 8, 1)`. `log2n ≤ 32`
`chunk_id` / `created_at`	override the generated UUID / Unix-µs timestamp

`ChunkWriter.open(path: str | os.PathLike, *, passphrase: str | bytes | None = None, compression_level: int | None = None) -> ChunkWriter`

Open an existing chunk to append more entries. Requires the passphrase if encrypted; raises InvalidChunkError on a dirty chunk (call repair() first), ColdCrateError if another writer holds the lock.

`writer.append(resource_id: bytes, row: dict, *, compress: bool | None = None, encrypt: bool | None = None) -> AppendResult`

Serialize row (a dict matching the schema), optionally compress + encrypt, and append it. resource_id is 1–512 bytes. compress / encrypt default to the chunk's settings; pass False to skip for this entry. Returns where it landed.

`writer.append_tombstone(resource_id: bytes) -> AppendResult`

Append a deletion marker (tombstone flag, empty payload) for resource_id.

`writer.append_many(items: Iterable[tuple[bytes, dict]]) -> list[AppendResult]`

Convenience over append for an iterable of (resource_id, row) pairs. No implicit flush.

`writer.flush(*, sync: bool = False) -> None` · `writer.close(*, sync: bool = False) -> None`

Commit the mutable header counters (and fsync if sync=True). close flushes then closes (also a context-manager exit). Properties: header, schema, entry_count, tail_offset.

`AppendResult`

offset: int (absolute offset of the entry) · checksum: int (XXH64 of resource_id ‖ stored payload). Feed these into your manifest.

Reading

`ChunkReader.open(path: str | os.PathLike, *, passphrase: str | bytes | None = None, mmap: bool = True) -> ChunkReader`

Open a chunk for reading. passphrase is needed only to decode fields of an encrypted chunk (header / resource_id / scan_raw work without it). mmap=True memory-maps for random access.

`reader.read_at(offset: int) -> Entry`

Read a single entry by absolute offset (from a manifest). Verifies the checksum (reported via Entry.checksum_ok, never raised). Raises InvalidEntryError if the offset isn't a valid entry, EncryptionError if encrypted and opened without a key.

`reader.scan(*, verify: bool = True) -> Iterator[Entry]`

Yield decoded entries from start to end. verify=True (default) checksums each entry and resyncs past corruption (cold-storage recovery); verify=False is faster, trusts the framing, and stops at the first anomaly (checksum_ok is None).

`reader.scan_raw(*, verify: bool = True) -> Iterator[RawEntry]`

Like scan but yields stored (still compressed/encrypted) payloads — needs no passphrase. Use for integrity patrol or copying.

Properties: header -> ChunkHeader, schema -> Schema | None (None for an encrypted chunk opened without the passphrase).

`Entry`

offset: int · resource_id: bytes · fields: dict | None (None for a tombstone) · checksum_ok: bool | None (None if unverified) · flags: int. Properties: tombstone, compressed, encrypted.

`RawEntry`

offset · resource_id · payload: bytes (stored form) · checksum_ok · flags, with the same flag properties.

Maintenance

`coldcrate.repair(path: str | os.PathLike) -> RepairResult`

Recover a chunk left dirty by a crash: scan the longest valid contiguous run (checksum-validated, keyless), truncate trailing partial bytes, and rewrite the header counters. Returns RepairResult(entry_count, tail_offset, truncated_bytes).

`ChunkHeader`

Frozen dataclass returned by reader.header / writer.header: version, flags, chunk_id, created_at, schema_size, compression, encryption, kdf_salt, kdf_log2n, kdf_r, kdf_p, plus entry_count and tail_offset (both int or None — non-None ⇒ exact) and a data_start property.

Errors

All derive from ColdCrateError:

exception	raised when
`InvalidChunkError`	not a valid chunk (bad magic/version/header, oversized schema, dirty on append)
`InvalidEntryError`	a malformed / truncated entry, or a row that doesn't fit the schema
`SchemaError`	invalid schema, or a row value that doesn't match it
`CompressionError`	missing backend, or a (de)compression failure
`EncryptionError`	missing passphrase/backend, or bad KDF parameters

Constants

coldcrate.__version__ · MAGIC (b"COLDCRT\0") · FORMAT_VERSION (1).

License

MIT © larryvrh

Release history

0.2.0 (Jun 18, 2026)
0.1.0 (Jun 18, 2026)

