
Core runtime primitives for Seamless: content-addressed data model, cell type system, and buffer caching


seamless-core

seamless-core is the foundational data layer of the Seamless ecosystem. It provides a content-addressed data model built around two core abstractions — Checksum and Buffer — together with a cell type system for serializing and converting structured data, and a smart in-memory buffer cache.

While seamless-core underpins higher-level Seamless packages (seamless-config, seamless-remote, seamless-transformer, seamless-dask), it is also usable on its own as a content-addressed data serialization and caching library.

Core concepts

Checksum

Checksum wraps a SHA-256 hash as a first-class Python object. Beyond simple identity, it supports:

  • Construction from hex strings, raw bytes, or other Checksum instances.
  • resolve() — retrieve the corresponding buffer from the local cache or a remote server.
  • fingertip() — resolve with fallback to recomputation (if a transformation produced this checksum).
  • incref() / decref() / tempref() — reference counting to keep buffers alive in the cache.
  • load() / save() — file I/O (auto-appends .CHECKSUM extension).
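
As a rough illustration of the construction rules listed above, the sketch below models a checksum as a thin wrapper around a 32-byte SHA-256 digest. `ToyChecksum` is a hypothetical stand-in written with the standard library only; it is not the seamless-core `Checksum` class and does not reproduce its API.

```python
import hashlib


class ToyChecksum:
    """Hypothetical stand-in for a SHA-256 checksum object (not the seamless-core class)."""

    def __init__(self, value):
        if isinstance(value, ToyChecksum):
            # Copy-construct from another checksum instance.
            self._digest = value._digest
        elif isinstance(value, bytes):
            # Raw digest bytes: a SHA-256 digest is exactly 32 bytes long.
            if len(value) != 32:
                raise ValueError("a SHA-256 digest is exactly 32 bytes")
            self._digest = value
        elif isinstance(value, str):
            # Hex string: 64 hexadecimal characters.
            digest = bytes.fromhex(value)
            if len(digest) != 32:
                raise ValueError("expected 64 hex characters")
            self._digest = digest
        else:
            raise TypeError(f"cannot build a checksum from {type(value).__name__}")

    def hex(self):
        return self._digest.hex()

    def __eq__(self, other):
        return isinstance(other, ToyChecksum) and self._digest == other._digest

    def __hash__(self):
        return hash(self._digest)


content = b"hello world\n"
from_bytes = ToyChecksum(hashlib.sha256(content).digest())
from_hex = ToyChecksum(hashlib.sha256(content).hexdigest())
from_copy = ToyChecksum(from_hex)
```

All three objects compare equal because they wrap the same digest, which is what lets a checksum act as an identity for content regardless of how it was obtained.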

Buffer

Buffer represents raw content (bytes) paired with an optional checksum. It bridges Python values and content-addressed storage:

  • Construct from raw bytes, or from a Python value plus a cell type: Buffer(value, celltype="plain").
  • get_value(celltype) — deserialize the buffer back to a Python object.
  • get_checksum() — compute the SHA-256 checksum (lazily, cached).
  • incref() / decref() / tempref() — manage buffer lifetime in the cache.
  • load() / save() — file I/O.
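
The "lazily, cached" behaviour of get_checksum() can be pictured with a small sketch. `ToyBuffer` and its `hash_calls` counter are hypothetical names used only for this demonstration; the real Buffer class may be implemented quite differently.

```python
import hashlib


class ToyBuffer:
    """Hypothetical buffer that hashes its content only on first request."""

    def __init__(self, content: bytes):
        self.content = bytes(content)
        self._checksum = None  # not computed until someone asks for it
        self.hash_calls = 0    # demo-only counter of actual hash computations

    def get_checksum(self) -> str:
        if self._checksum is None:
            self.hash_calls += 1
            self._checksum = hashlib.sha256(self.content).hexdigest()
        return self._checksum


buf = ToyBuffer(b"some raw content")
first = buf.get_checksum()   # computes the SHA-256 digest
second = buf.get_checksum()  # returns the cached value, no rehash
```

Deferring the hash avoids paying for it on buffers whose checksum is never requested, and caching it is safe because the content is immutable bytes.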

Cell types and conversions

Seamless defines 13 cell types that govern how Python values are serialized into buffers and deserialized back:

Cell type               Serialized form
plain                   JSON (2-space indent, sorted keys)
text, python, ipython   UTF-8 text (with AST/syntax validation for code types)
yaml                    UTF-8 YAML text
str, int, float, bool   JSON scalar + newline
binary                  NumPy .npy format
bytes                   Raw bytes
mixed                   Umbrella format for heterogeneous data (nested dicts/lists containing numpy arrays, scalars, and strings). "plain" and "binary" are special cases of "mixed".
checksum                Hex-encoded SHA-256 strings (or dicts/lists thereof)
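
To make the serialized forms more concrete, here is a minimal sketch of the "plain" and scalar rows using only the standard `json` module. The function names are hypothetical, and details the table does not state (for example, whether "plain" buffers end in a newline) are left out rather than guessed.

```python
import json


def serialize_plain(value) -> bytes:
    # "plain": JSON with 2-space indent and sorted keys, as listed in the table.
    # Assumption: no trailing newline is added here; the table does not say.
    return json.dumps(value, indent=2, sort_keys=True).encode("utf-8")


def serialize_scalar(value) -> bytes:
    # "str", "int", "float", "bool": a JSON scalar followed by a newline.
    return (json.dumps(value) + "\n").encode("utf-8")


# Two dicts with the same content but different insertion order.
a = serialize_plain({"b": 1, "a": [1, 2]})
b = serialize_plain({"a": [1, 2], "b": 1})
```

Sorting the keys makes the byte output independent of dict insertion order, so equal values serialize to identical buffers and therefore to identical checksums.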

A complete conversion matrix classifies every possible type-pair conversion:

  • Trivial — checksum-preserving, always safe (e.g. text → bytes).
  • Reinterpret — checksum-preserving, may fail (reverse of trivial).
  • Reformat — may change checksum, always safe (e.g. bytes → binary).
  • Possible — may change checksum, may fail (e.g. mixed → int).
  • Forbidden — requires value-level evaluation or is disallowed.

This conversion system ensures that type coercions across the Seamless ecosystem are well-defined and reproducible.
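
The difference between a trivial and a reinterpret conversion can be sketched for the text/bytes pair. This assumes, as the cell type table indicates, that a text buffer is simply UTF-8 encoded text; the helper names are hypothetical and not part of seamless-core.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def text_to_bytes(buffer: bytes) -> bytes:
    """Trivial: every text buffer is already a valid bytes buffer."""
    return buffer


def bytes_to_text(buffer: bytes) -> bytes:
    """Reinterpret: the bytes are unchanged, but only valid UTF-8 qualifies as text."""
    buffer.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return buffer


text_buffer = "héllo".encode("utf-8")
same = text_to_bytes(text_buffer)

try:
    bytes_to_text(b"\xff\xfe\x00")  # 0xff can never start a UTF-8 sequence
    reinterpret_failed = False
except UnicodeDecodeError:
    reinterpret_failed = True
```

In both directions the buffer is returned untouched, so the checksum is preserved; the only difference is that the reverse direction needs a validity check that can fail.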

Buffer cache

The buffer cache is a dual weak/strong in-memory store:

  • Weak cache — buffers registered without references; eligible for garbage collection.
  • Strong cache — buffers with active references (incref or tempref); kept alive.
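
One way to picture the weak/strong split is with `weakref.WeakValueDictionary` from the standard library. The sketch below is an assumption about the general idea, not the actual seamless-core implementation; `Blob` and `ToyCache` are hypothetical names, and the immediate reclamation shown relies on CPython reference counting.

```python
import gc
import weakref


class Blob:
    """Wrapper object, needed because plain bytes cannot be weakly referenced."""

    def __init__(self, content: bytes):
        self.content = content


class ToyCache:
    def __init__(self):
        self.weak = weakref.WeakValueDictionary()  # entries vanish when collected
        self.strong = {}                           # entries pinned by references
        self.refcount = {}

    def register(self, key, blob):
        self.weak[key] = blob

    def incref(self, key):
        blob = self.weak[key]  # KeyError if the blob was already collected
        self.strong[key] = blob
        self.refcount[key] = self.refcount.get(key, 0) + 1

    def decref(self, key):
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.refcount[key]
            del self.strong[key]  # back to weak-only: eligible for collection

    def get(self, key):
        return self.weak.get(key)


cache = ToyCache()
blob = Blob(b"payload")
cache.register("k", blob)
cache.incref("k")

del blob      # the caller drops its own reference
gc.collect()
alive_while_referenced = cache.get("k") is not None

cache.decref("k")
gc.collect()
gone_after_decref = cache.get("k") is None
```

While the reference count is positive the strong dictionary keeps the blob alive; once it drops to zero only the weak entry remains, and the blob disappears as soon as nothing else holds it.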

Temporary references (tempref) model decaying interest — useful for intermediate results that may or may not be needed again. When memory usage exceeds configurable soft/hard caps (default 5 GB / 50 GB), the cache evicts buffers in cost-aware order, considering download cost, recomputation cost, and buffer size.
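
The exact eviction formula is not given here, so the following is only a hypothetical sketch of what "cost-aware order" could look like: buffers that are cheapest to restore per byte are evicted first, until the total size drops to the cap. The `Entry` fields, the scoring function, and the units are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    size: int              # bytes held in memory
    download_cost: float   # hypothetical cost to fetch the buffer again
    recompute_cost: float  # hypothetical cost to recompute the buffer


def eviction_order(entries):
    """Cheapest-to-restore per byte first (an assumed scoring rule)."""

    def restore_cost_per_byte(entry):
        cheapest_restore = min(entry.download_cost, entry.recompute_cost)
        return cheapest_restore / max(entry.size, 1)

    return sorted(entries, key=restore_cost_per_byte)


def evict_until(entries, cap):
    """Return the keys to evict so that the total size is at most `cap`."""
    total = sum(entry.size for entry in entries)
    evicted = []
    for entry in eviction_order(entries):
        if total <= cap:
            break
        evicted.append(entry.key)
        total -= entry.size
    return evicted


entries = [
    Entry("big-cheap", size=4000, download_cost=1.0, recompute_cost=5.0),
    Entry("small-costly", size=100, download_cost=50.0, recompute_cost=80.0),
    Entry("mid", size=2000, download_cost=2.0, recompute_cost=1.0),
]
evicted_soft = evict_until(entries, cap=3000)
evicted_tight = evict_until(entries, cap=100)
```

Under this rule a large buffer that is cheap to download goes first, while a small buffer that is expensive to restore is kept as long as possible.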

Installation

pip install seamless-core

CLI scripts

Installing seamless-core also provides:

  • seamless-checksum — compute the SHA-256 checksum of a file.
  • seamless-checksum-file — compute and write a .CHECKSUM sidecar file.
  • seamless-checksum-index — build checksum indices for directories.
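
For readers who want to see what these scripts compute, the sketch below does the equivalent work in plain Python with `hashlib`. It does not call the seamless-core CLI, and the sidecar layout (a single line holding the hex digest) is an assumption; only the .CHECKSUM extension comes from the description above.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20) -> str:
    """Stream the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(path) -> Path:
    """Write '<name>.CHECKSUM' next to the file (assumed format: hex digest + newline)."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".CHECKSUM")
    sidecar.write_text(file_sha256(path) + "\n")
    return sidecar
```

A call such as `write_sidecar("data.bin")` would then produce `data.bin.CHECKSUM` in the same directory.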

Development build

python -m pip install --upgrade build
python -m build

Download files


Source Distribution

seamless_core-0.1.1.tar.gz (53.5 kB view details)

Uploaded Source

Built Distribution


seamless_core-0.1.1-py3-none-any.whl (61.9 kB view details)

Uploaded Python 3

File details

Details for the file seamless_core-0.1.1.tar.gz.

File metadata

  • Download URL: seamless_core-0.1.1.tar.gz
  • Size: 53.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for seamless_core-0.1.1.tar.gz
Algorithm Hash digest
SHA256 0373b7147309ff8829663dac2ee4a0b2b89febf0d6d99712defe82b7a5f88ff1
MD5 e5bd45f42a498db8515349563335334e
BLAKE2b-256 9bad650434d4d072d6d40491704d8771d5a7287fe1f63a0f4ece881ce1283b1f


File details

Details for the file seamless_core-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: seamless_core-0.1.1-py3-none-any.whl
  • Size: 61.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for seamless_core-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 dfc09406146220bc8c36e83247c6a5f598efc570f520e50e1e2ffcf29f0c72e1
MD5 9547b5818fffc35003acba8106b1ac6e
BLAKE2b-256 0a3684718ec568087ea4235738a8f766bba15e7b4171da4cfb86ba9980f73c55

