
SQLite-backed metadata database service for Seamless


seamless-database

seamless-database is the checksum-based metadata and caching service for the Seamless framework. It acts as the distributed computation cache that allows Seamless workflows to avoid recomputing identical transformations, both within a single session and across the entire cluster.

How it works

Seamless uses content-addressed storage: every piece of data (buffers, code, parameters) is identified by its checksum. When a transformation (computation) is submitted, its inputs are hashed into a transformation checksum. Before executing the computation, Seamless components (such as seamless-dask) query the database: "has this transformation been computed before?" If a cached result is found, the result checksum is returned immediately, skipping the computation entirely.
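The lookup described above can be sketched in a few lines. This is a minimal illustration, not the Seamless implementation: the hash algorithm (SHA-256), the way inputs are combined into a transformation checksum, and the names `checksum`, `transformation_checksum`, `run_cached` and `cache` are all assumptions made for the example, and a plain dictionary stands in for the database.

```python
import hashlib
import json


def checksum(data: bytes) -> str:
    """Identify a buffer by its content (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def transformation_checksum(code: bytes, inputs: dict) -> str:
    """Combine the checksums of the code and all inputs into one checksum.

    The canonical JSON layout is hypothetical; it only needs to be
    deterministic so that identical transformations hash identically.
    """
    description = {
        "code": checksum(code),
        "inputs": {name: checksum(value) for name, value in inputs.items()},
    }
    canonical = json.dumps(description, sort_keys=True).encode()
    return checksum(canonical)


# Stand-in for the database: transformation checksum -> result checksum.
cache = {}


def run_cached(code: bytes, inputs: dict, compute) -> str:
    """Return the result checksum, calling compute() only on a cache miss."""
    tf_checksum = transformation_checksum(code, inputs)
    if tf_checksum in cache:
        # Cache hit: the computation is skipped entirely.
        return cache[tf_checksum]
    result = compute()  # compute() must return the result buffer as bytes
    cache[tf_checksum] = checksum(result)
    return cache[tf_checksum]
```

Calling `run_cached` twice with the same code and inputs runs `compute` once; the second call is answered from the cache.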

The database stores the following kinds of records:

| Table | Purpose |
| --- | --- |
| Transformation | Maps a transformation checksum to its result checksum |
| RevTransformation | Reverse lookup: finds which transformations produced a given result |
| BufferInfo | Stores buffer metadata (length, dtype, encoding, etc.) for a checksum |
| SyntacticToSemantic | Maps between syntactic and semantic checksums per celltype |
| Expression | Caches expression evaluation results (input checksum + path + celltype → result checksum) |
| MetaData | Stores execution metadata for transformations (executor, environment, timing) |
| IrreproducibleTransformation | Records transformations whose results are not reproducible |

All data is persisted in a single SQLite file (typically seamless.db).
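To make the forward and reverse transformation lookups concrete, here is a small sketch using Python's built-in `sqlite3` module. The schema, column names and helper functions are hypothetical and are not taken from the seamless-database source; in particular, the reverse lookup is done here with an index on a single table rather than a separate record type, and an in-memory database replaces the `seamless.db` file.

```python
import sqlite3

# In-memory database for illustration; the real service persists to a file.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per transformation checksum.
conn.execute(
    "CREATE TABLE transformation ("
    "checksum TEXT PRIMARY KEY, result TEXT NOT NULL)"
)
# Index on the result column so that reverse lookups stay fast.
conn.execute("CREATE INDEX transformation_by_result ON transformation (result)")


def put_transformation(conn, checksum, result):
    """Record that the transformation `checksum` produced `result`."""
    conn.execute(
        "INSERT OR REPLACE INTO transformation (checksum, result) VALUES (?, ?)",
        (checksum, result),
    )
    conn.commit()


def get_result(conn, checksum):
    """Forward lookup: result checksum, or None if never computed."""
    row = conn.execute(
        "SELECT result FROM transformation WHERE checksum = ?", (checksum,)
    ).fetchone()
    return row[0] if row else None


def get_transformations(conn, result):
    """Reverse lookup: all transformation checksums that produced `result`."""
    rows = conn.execute(
        "SELECT checksum FROM transformation WHERE result = ? ORDER BY checksum",
        (result,),
    ).fetchall()
    return [row[0] for row in rows]
```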

Role in the Seamless ecosystem

Other Seamless components interact with the database over HTTP:

  • seamless-dask checks the database cache before scheduling a transformation on the Dask cluster, and writes results back after computation.
  • seamless-remote provides the DatabaseClient / DatabaseLaunchedClient classes that other components use to communicate with the database server.
  • seamless-config defines the launch template for the database server (port range, host, timeout, read/write mode).

The server exposes a JSON-over-HTTP protocol: clients send {"type": "<record_type>", "checksum": "<hex>", ...} via GET (read) or PUT (write) requests.
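As a rough illustration of this message shape, the sketch below builds (but does not send) such a request with the standard library. Only the `type` and `checksum` fields and the GET/PUT distinction come from the description above; the `value` field, the record type string `"transformation"`, the URL, and the choice to carry the JSON in the request body are assumptions made for the example. In practice, components go through the client classes in seamless-remote rather than building requests by hand.

```python
import json
import urllib.request


def build_request(base_url, record_type, checksum, value=None):
    """Build a read (GET) or write (PUT) request for one record.

    A request without `value` is a read; one with `value` is a write.
    The "value" key is a hypothetical name for the payload being stored.
    """
    payload = {"type": record_type, "checksum": checksum}
    method = "GET"
    if value is not None:
        payload["value"] = value
        method = "PUT"
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Hypothetical server address and checksums, for illustration only.
read_request = build_request("http://localhost:5522", "transformation", "ab" * 32)
write_request = build_request(
    "http://localhost:5522", "transformation", "ab" * 32, value="cd" * 32
)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; PUT requests are only accepted when the server was started with `--writable`.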

Installation

pip install seamless-database

Usage

# Start a writable database server on a random free port between 5520 and 5530
seamless-database seamless.db --port-range 5520 5530 --writable

# Start a read-only server on a fixed port
seamless-database seamless.db --port 5522

CLI options

| Option | Description |
| --- | --- |
| database_file | Path to the SQLite file (created if it doesn't exist and --writable is set) |
| --port PORT | Fixed network port |
| --port-range START END | Pick a random free port from an inclusive range |
| --host HOST | Bind address (default: 0.0.0.0) |
| --writable | Allow PUT requests; opens the database in read/write mode |
| --status-file FILE | JSON file used to report server status (for process managers) |
| --timeout SECONDS | Stop the server after this many seconds of inactivity |
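When the server is launched from a script or process manager, the command line can be assembled programmatically. The helper below is a sketch that uses only the options listed above; the function name `build_command` and its keyword arguments are hypothetical, and it performs no validation of option combinations.

```python
import shlex


def build_command(
    database_file,
    port=None,
    port_range=None,
    host=None,
    writable=False,
    status_file=None,
    timeout=None,
):
    """Return the seamless-database command as an argument list."""
    command = ["seamless-database", str(database_file)]
    if port is not None:
        command += ["--port", str(port)]
    if port_range is not None:
        start, end = port_range  # inclusive range, as documented
        command += ["--port-range", str(start), str(end)]
    if host is not None:
        command += ["--host", host]
    if writable:
        command.append("--writable")
    if status_file is not None:
        command += ["--status-file", str(status_file)]
    if timeout is not None:
        command += ["--timeout", str(timeout)]
    return command


# Equivalent to the first usage example; shlex.join gives a copy-pastable string.
writable_command = build_command(
    "seamless.db", port_range=(5520, 5530), writable=True
)
writable_command_line = shlex.join(writable_command)
```

The resulting list can be passed directly to `subprocess.Popen` to start the server.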

CLI scripts

Installing seamless-database also provides:

  • seamless-database
