Core runtime primitives for Seamless: content-addressed data model, cell type system, and buffer caching

seamless-core

seamless-core is the foundational data layer of the Seamless ecosystem. It provides a content-addressed data model built around two core abstractions — Checksum and Buffer — together with a cell type system for serializing and converting structured data, and a smart in-memory buffer cache.

While seamless-core underpins higher-level Seamless packages (seamless-config, seamless-remote, seamless-transformer, seamless-dask), it is also usable on its own as a content-addressed data serialization and caching library.

Core concepts

Checksum

Checksum wraps a SHA-256 hash as a first-class Python object. Beyond simple identity, it supports:

  • Construction from hex strings, raw bytes, or other Checksum instances.
  • resolve() — retrieve the corresponding buffer from the local cache or a remote server.
  • fingertip() — resolve with fallback to recomputation (if a transformation produced this checksum).
  • incref() / decref() / tempref() — reference counting to keep buffers alive in the cache.
  • load() / save() — file I/O (auto-appends .CHECKSUM extension).
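
To make the idea of a checksum as a first-class object concrete, here is a minimal standard-library sketch. MiniChecksum and its of() helper are hypothetical names, not the seamless-core class; the sketch only mirrors construction from hex strings, raw bytes, or another instance, and leaves out resolution, reference counting, and file I/O.

```python
import hashlib


class MiniChecksum:
    """Hypothetical stand-in for a SHA-256 checksum object (not the seamless-core API)."""

    __slots__ = ("_digest",)

    def __init__(self, value):
        if isinstance(value, MiniChecksum):
            digest = value._digest
        elif isinstance(value, str):
            digest = bytes.fromhex(value)  # raises ValueError on non-hex input
        elif isinstance(value, (bytes, bytearray)):
            digest = bytes(value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
        if len(digest) != 32:
            raise ValueError("a SHA-256 digest is exactly 32 bytes")
        self._digest = digest

    @classmethod
    def of(cls, content: bytes) -> "MiniChecksum":
        """Hash raw content; an illustrative helper introduced for this sketch."""
        return cls(hashlib.sha256(content).digest())

    def hex(self) -> str:
        return self._digest.hex()

    def __eq__(self, other):
        return isinstance(other, MiniChecksum) and self._digest == other._digest

    def __hash__(self):
        return hash(self._digest)

    def __repr__(self):
        return f"MiniChecksum({self.hex()!r})"


cs = MiniChecksum.of(b"hello")
# All three construction routes yield the same identity.
assert MiniChecksum(cs.hex()) == MiniChecksum(cs) == cs
```

Because equality and hashing are defined on the digest, such objects can serve directly as dictionary keys in a cache.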

Buffer

Buffer represents raw content (bytes) paired with an optional checksum. It bridges Python values and content-addressed storage:

  • Construct from raw bytes, or from a Python value plus a cell type: Buffer(value, celltype="plain").
  • get_value(celltype) — deserialize the buffer back to a Python object.
  • get_checksum() — compute the SHA-256 checksum (lazily, cached).
  • incref() / decref() / tempref() — manage buffer lifetime in the cache.
  • load() / save() — file I/O.
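
The lazy, cached checksum can be sketched as follows. MiniBuffer is a hypothetical class that borrows the method names listed above but is not the seamless-core implementation; it supports only two cell types, and encoding "text" as plain UTF-8 with nothing appended is an assumption of this sketch.

```python
import hashlib


class MiniBuffer:
    """Hypothetical sketch: raw bytes plus a lazily computed SHA-256 checksum."""

    def __init__(self, value, celltype="bytes"):
        if celltype == "bytes":
            content = bytes(value)
        elif celltype == "text":
            content = value.encode("utf-8")  # assumption: no trailing newline added
        else:
            raise NotImplementedError(f"celltype {celltype!r} is outside this sketch")
        self.content = content
        self._checksum = None  # computed on first request only

    def get_value(self, celltype="bytes"):
        if celltype == "bytes":
            return self.content
        if celltype == "text":
            return self.content.decode("utf-8")
        raise NotImplementedError(f"celltype {celltype!r} is outside this sketch")

    def get_checksum(self) -> str:
        if self._checksum is None:
            self._checksum = hashlib.sha256(self.content).hexdigest()
        return self._checksum


buf = MiniBuffer("héllo", celltype="text")
assert buf._checksum is None           # nothing hashed yet
first = buf.get_checksum()
assert buf.get_checksum() is first     # second call reuses the cached string
assert buf.get_value("text") == "héllo"
```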

Cell types and conversions

Seamless defines 13 cell types that govern how Python values are serialized into buffers and deserialized back:

Cell type               Serialized form
plain                   JSON (2-space indent, sorted keys)
text, python, ipython   UTF-8 text (with AST/syntax validation for code types)
yaml                    UTF-8 YAML text
str, int, float, bool   JSON scalar + newline
binary                  NumPy .npy format
bytes                   Raw bytes
mixed                   Umbrella format for heterogeneous data (nested dicts/lists containing numpy arrays, scalars, and strings). "plain" and "binary" are special cases of "mixed"
checksum                Hex-encoded SHA-256 strings (or dicts/lists thereof)
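
A few of these serialized forms can be approximated with the standard library alone. The serialize() function below is a hypothetical helper, not part of seamless-core, and covers only the cell types whose format is fully specified above; whether "plain" or "text" buffers end with a newline is not stated here, so the sketch adds none.

```python
import hashlib
import json


def serialize(value, celltype: str) -> bytes:
    """Stdlib-only approximation of some serialized forms listed above."""
    if celltype == "plain":
        # 2-space indent and sorted keys make the output independent of key order.
        text = json.dumps(value, indent=2, sort_keys=True)
    elif celltype in ("str", "int", "float", "bool"):
        text = json.dumps(value) + "\n"  # JSON scalar + newline
    elif celltype == "text":
        text = value
    elif celltype == "bytes":
        return bytes(value)
    else:
        raise NotImplementedError(f"celltype {celltype!r} is outside this sketch")
    return text.encode("utf-8")


assert serialize(42, "int") == b"42\n"
assert serialize(True, "bool") == b"true\n"
assert serialize("hi", "str") == b'"hi"\n'

# Deterministic serialization is what makes content addressing reproducible:
# equal values always map to the same buffer, hence the same checksum.
a = serialize({"b": 1, "a": 2}, "plain")
b = serialize({"a": 2, "b": 1}, "plain")
assert hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest()
```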

A complete conversion matrix classifies every possible type-pair conversion:

  • Trivial: checksum-preserving, always safe (e.g. text → bytes).
  • Reinterpret: checksum-preserving, may fail (reverse of trivial).
  • Reformat: may change checksum, always safe (e.g. bytes → binary).
  • Possible: may change checksum, may fail (e.g. mixed → int).
  • Forbidden: requires value-level evaluation or is disallowed.
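
One way to picture the matrix is as a lookup from a (source, target) pair to one of these five classes. The sketch below is illustrative only: Conversion, MATRIX, and classify() are hypothetical names, and the table holds just the pairs named above, reading the examples as text to bytes, bytes to binary, and mixed to int, plus bytes to text as the reverse of the trivial case.

```python
import hashlib
from enum import Enum


class Conversion(Enum):
    # value = (preserves_checksum, always_safe)
    TRIVIAL = (True, True)
    REINTERPRET = (True, False)
    REFORMAT = (False, True)
    POSSIBLE = (False, False)
    FORBIDDEN = (None, None)

    @property
    def preserves_checksum(self):
        return self.value[0]

    @property
    def always_safe(self):
        return self.value[1]


# Only the pairs mentioned in the text; the real matrix covers every type pair.
MATRIX = {
    ("text", "bytes"): Conversion.TRIVIAL,
    ("bytes", "text"): Conversion.REINTERPRET,
    ("bytes", "binary"): Conversion.REFORMAT,
    ("mixed", "int"): Conversion.POSSIBLE,
}


def classify(source: str, target: str) -> Conversion:
    return MATRIX[(source, target)]  # KeyError for pairs outside this sketch


# Trivial: a text buffer is already valid as bytes, so the checksum is unchanged.
text_buffer = "hello".encode("utf-8")
assert hashlib.sha256(text_buffer).digest() == hashlib.sha256(bytes(text_buffer)).digest()

# Reinterpret: the same bytes are kept, but not every byte string is valid UTF-8.
try:
    b"\xff".decode("utf-8")
    reinterpret_failed = False
except UnicodeDecodeError:
    reinterpret_failed = True
assert reinterpret_failed
```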

This conversion system ensures that type coercions across the Seamless ecosystem are well-defined and reproducible.

Buffer cache

The buffer cache is a dual weak/strong in-memory store:

  • Weak cache — buffers registered without references; eligible for garbage collection.
  • Strong cache — buffers with active references (incref or tempref); kept alive.
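
The weak/strong split can be sketched with Python's weakref module. This is an assumption about one possible design, not a description of the seamless-core internals; Blob and MiniCache are hypothetical names, and temporary references are left out.

```python
import gc
import weakref


class Blob:
    """bytes objects cannot be weakly referenced, so wrap them in a small object."""

    __slots__ = ("content", "__weakref__")

    def __init__(self, content: bytes):
        self.content = content


class MiniCache:
    """Hypothetical dual cache: every entry is weak; referenced entries are also strong."""

    def __init__(self):
        self._weak = weakref.WeakValueDictionary()  # checksum -> Blob
        self._strong = {}                           # checksum -> Blob
        self._refcount = {}                         # checksum -> int

    def register(self, checksum: str, blob: Blob) -> None:
        self._weak[checksum] = blob

    def incref(self, checksum: str) -> None:
        blob = self._weak[checksum]  # KeyError if the blob was already collected
        self._strong[checksum] = blob
        self._refcount[checksum] = self._refcount.get(checksum, 0) + 1

    def decref(self, checksum: str) -> None:
        self._refcount[checksum] -= 1
        if self._refcount[checksum] == 0:
            del self._refcount[checksum]
            del self._strong[checksum]  # the entry is weak-only again

    def get(self, checksum: str):
        return self._weak.get(checksum)


cache = MiniCache()
blob = Blob(b"payload")
cache.register("abc", blob)
cache.incref("abc")
del blob
gc.collect()
assert cache.get("abc") is not None  # the strong reference keeps it alive
cache.decref("abc")
gc.collect()
assert cache.get("abc") is None      # weak-only and unreferenced: collected
```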

Temporary references (tempref) model decaying interest — useful for intermediate results that may or may not be needed again. When memory usage exceeds configurable soft/hard caps (default 5 GB / 50 GB), the cache evicts buffers in cost-aware order, considering download cost, recomputation cost, and buffer size.
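
Cost-aware eviction could look roughly like the sketch below. The text names the three factors but not how they are combined, so the scoring rule here (cheapest recovery cost per byte is evicted first) is purely an assumption, as are the names Entry and pick_evictions and the cost units.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    checksum: str
    size: int              # bytes held in memory
    download_cost: float   # hypothetical unit: seconds to fetch again
    recompute_cost: float  # hypothetical unit: seconds to recompute

    def recovery_cost(self) -> float:
        # A buffer can be recovered by whichever route is cheaper.
        return min(self.download_cost, self.recompute_cost)


def pick_evictions(entries, used: int, soft_cap: int):
    """Return checksums to evict, in order, until usage is at or below soft_cap."""
    victims = []
    by_cost_per_byte = sorted(entries, key=lambda e: e.recovery_cost() / max(e.size, 1))
    for entry in by_cost_per_byte:
        if used <= soft_cap:
            break
        victims.append(entry.checksum)
        used -= entry.size
    return victims


entries = [
    Entry("a", size=100, download_cost=1.0, recompute_cost=10.0),
    Entry("b", size=10, download_cost=5.0, recompute_cost=5.0),
    Entry("c", size=50, download_cost=0.1, recompute_cost=100.0),
]
victims = pick_evictions(entries, used=160, soft_cap=100)
assert victims == ["c", "a"]  # "b" is small and costly to recover, so it stays
```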

Installation

pip install seamless-core

CLI scripts

Installing seamless-core also provides:

  • seamless-checksum — compute the SHA-256 checksum of a file.
  • seamless-checksum-file — compute and write a .CHECKSUM sidecar file.
  • seamless-checksum-index — build checksum indices for directories.
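
For comparison, hashing a file's contents with SHA-256 takes only a few lines of standard-library Python. This sketch is not the implementation of the scripts above, and sha256_file is a hypothetical name; it reads the file in chunks so that large files do not have to fit in memory. The .CHECKSUM sidecar format is not described here, so no sidecar is written.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunk_size pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello seamless\n")
    demo_hex = sha256_file(sample, chunk_size=4)  # tiny chunks exercise the loop

assert demo_hex == hashlib.sha256(b"hello seamless\n").hexdigest()
```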

Development build

python -m pip install --upgrade build
python -m build
