SQLite-backed metadata database service for Seamless

seamless-database

seamless-database is the checksum-based metadata and caching service for the Seamless framework. It acts as the distributed computation cache that allows Seamless workflows to avoid recomputing identical transformations, both within a single session and across the entire cluster.

How it works

Seamless uses content-addressed storage: every piece of data (buffers, code, parameters) is identified by its checksum. When a transformation (computation) is submitted, its inputs are hashed into a transformation checksum. Before executing the computation, Seamless components (such as seamless-dask) query the database: "has this transformation been computed before?" If a cached result is found, the result checksum is returned immediately, skipping the computation entirely.
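
The lookup-before-compute pattern described above can be sketched in a few lines of Python. This is only an illustration, not the Seamless implementation: the choice of SHA-256, the JSON serialization, the in-memory dictionaries standing in for the database and buffer store, and the names `checksum`, `run_cached` and `transform` are all assumptions made for the example.

```python
import hashlib
import json


def checksum(data: bytes) -> str:
    """Identify a buffer by its content (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical in-memory stand-ins for the database and the buffer store
transformation_cache = {}  # transformation checksum -> result checksum
buffers = {}               # checksum -> buffer content


def run_cached(source: str, inputs: dict):
    """Return (result checksum, cache_hit).

    `source` must define a function named `transform` (a convention made up
    for this sketch). The function only runs on a cache miss.
    """
    # The transformation is described purely by checksums of code and inputs
    description = {
        "code": checksum(source.encode()),
        "inputs": {
            name: checksum(json.dumps(value).encode())
            for name, value in sorted(inputs.items())
        },
    }
    tf_checksum = checksum(json.dumps(description, sort_keys=True).encode())

    # "Has this transformation been computed before?"
    if tf_checksum in transformation_cache:
        return transformation_cache[tf_checksum], True

    # Cache miss: execute, store the result buffer, record the mapping
    namespace = {}
    exec(source, namespace)
    result = json.dumps(namespace["transform"](**inputs)).encode()
    result_checksum = checksum(result)
    buffers[result_checksum] = result
    transformation_cache[tf_checksum] = result_checksum
    return result_checksum, False
```

Calling `run_cached` twice with the same source and the same inputs runs the function once; the second call returns the stored result checksum without executing anything.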

The database stores the following kinds of records:

| Table | Purpose |
| --- | --- |
| Transformation | Maps a transformation checksum to its result checksum |
| RevTransformation | Reverse lookup: finds which transformations produced a given result |
| BufferInfo | Stores buffer metadata (length, dtype, encoding, etc.) for a checksum |
| SyntacticToSemantic | Maps between syntactic and semantic checksums per celltype |
| Expression | Caches expression evaluation results (input checksum + path + celltype → result checksum) |
| MetaData | Stores execution metadata for transformations (executor, environment, timing) |
| IrreproducibleTransformation | Records transformations whose results are not reproducible |

All data is persisted in a single SQLite file (typically seamless.db).
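
To make the forward and reverse transformation lookups concrete, here is a small sketch using Python's built-in `sqlite3` module. The table name, column names and helper functions are hypothetical, and the reverse lookup is served by an index rather than a separate RevTransformation table; the actual schema of seamless.db may differ.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical transformation table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transformation ("
        " checksum TEXT PRIMARY KEY,"
        " result TEXT NOT NULL)"
    )
    # Index on the result column so the reverse lookup stays cheap
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_transformation_result"
        " ON transformation(result)"
    )
    return conn


def put_transformation(conn, tf_checksum, result_checksum):
    """Record that a transformation produced a given result."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO transformation VALUES (?, ?)",
            (tf_checksum, result_checksum),
        )


def get_result(conn, tf_checksum):
    """Forward lookup: transformation checksum -> result checksum or None."""
    row = conn.execute(
        "SELECT result FROM transformation WHERE checksum = ?",
        (tf_checksum,),
    ).fetchone()
    return row[0] if row else None


def get_transformations(conn, result_checksum):
    """Reverse lookup: all transformation checksums that produced a result."""
    rows = conn.execute(
        "SELECT checksum FROM transformation WHERE result = ? ORDER BY checksum",
        (result_checksum,),
    )
    return [row[0] for row in rows]
```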

Role in the Seamless ecosystem

Other Seamless components interact with the database over HTTP:

  • seamless-dask checks the database cache before scheduling a transformation on the Dask cluster, and writes results back after computation.
  • seamless-remote provides the DatabaseClient / DatabaseLaunchedClient classes that other components use to communicate with the database server.
  • seamless-config defines the launch template for the database server (port range, host, timeout, read/write mode).

The server exposes a JSON-over-HTTP protocol: clients send {"type": "<record_type>", "checksum": "<hex>", ...} via GET (read) or PUT (write) requests.
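
As a rough illustration of the message shape, the sketch below builds (but does not send) such a request with Python's standard library. Several details are assumptions not confirmed by this README: that the JSON payload travels in the request body, that a write carries its data in a field named `value`, and that `"transformation"` is a valid record type string. In practice, the DatabaseClient classes from seamless-remote handle this for you.

```python
import json
import urllib.request


def build_request(base_url, record_type, checksum, value=None):
    """Build a GET (read) or PUT (write) request for one record.

    The "value" field name and the use of the request body are assumptions.
    """
    payload = {"type": record_type, "checksum": checksum}
    method = "GET"
    if value is not None:
        payload["value"] = value  # hypothetical field for the data to store
        method = "PUT"
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Example: a read request against a server on port 5522 (not sent here)
request = build_request("http://localhost:5522", "transformation", "ab" * 32)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running server.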

Installation

pip install seamless-database

Usage

# Start a writable database server on a free port picked at random from 5520-5530
seamless-database seamless.db --port-range 5520 5530 --writable

# Start a read-only server on a fixed port
seamless-database seamless.db --port 5522

CLI options

| Option | Description |
| --- | --- |
| database_file | Path to the SQLite file (created if it doesn't exist and --writable is set) |
| --port PORT | Fixed network port |
| --port-range START END | Pick a random free port from an inclusive range |
| --host HOST | Bind address (default: 0.0.0.0) |
| --writable | Allow PUT requests; opens the database in read/write mode |
| --status-file FILE | JSON file used to report server status (for process managers) |
| --timeout SECONDS | Stop the server after this many seconds of inactivity |
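
For readers wondering what "a random free port from an inclusive range" can look like in practice, here is one possible approach in Python. It is a sketch of the general technique, not the code seamless-database uses; the function name `pick_free_port` and its default host are made up for the example.

```python
import random
import socket


def pick_free_port(start, end, host="127.0.0.1"):
    """Return a randomly chosen port in [start, end] that can be bound."""
    candidates = list(range(start, end + 1))  # inclusive on both ends
    random.shuffle(candidates)
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken or not permitted, try the next one
            return port
    raise OSError(f"no free port in range {start}-{end}")
```

The probe socket is closed before the port is returned, so another process could claim the port in the meantime. A real server would typically keep the bound socket open instead of probing first.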

CLI scripts

Installing seamless-database also provides:

  • seamless-database
