SQLite-backed metadata database service for Seamless
seamless-database
seamless-database is the checksum-based metadata and caching service for the Seamless framework. It acts as the distributed computation cache that allows Seamless workflows to avoid recomputing identical transformations, both within a single session and across the entire cluster.
How it works
Seamless uses content-addressed storage: every piece of data (buffers, code, parameters) is identified by its checksum. When a transformation (computation) is submitted, its inputs are hashed into a transformation checksum. Before executing the computation, Seamless components (such as seamless-dask) query the database: "has this transformation been computed before?" If a cached result is found, the result checksum is returned immediately, skipping the computation entirely.
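The lookup-before-compute idea described above can be sketched in a few lines of Python. This is only an illustration: the use of SHA-256, the JSON layout of the transformation description, and the helper names (`checksum`, `transformation_checksum`, `run_cached`) are assumptions made for this example, not the actual Seamless implementation. An in-memory dict stands in for the database.

```python
import hashlib
import json


def checksum(buffer: bytes) -> str:
    """Identify a buffer by its content (SHA-256 is an assumption here)."""
    return hashlib.sha256(buffer).hexdigest()


def transformation_checksum(code: bytes, inputs: dict) -> str:
    """Hash the checksums of the code and of every input into one checksum."""
    description = {
        "code": checksum(code),
        "inputs": {name: checksum(value) for name, value in inputs.items()},
    }
    # sort_keys makes the serialization independent of input ordering
    return checksum(json.dumps(description, sort_keys=True).encode())


def run_cached(cache: dict, code: bytes, inputs: dict, compute) -> tuple:
    """Return (result_checksum, was_cached); call compute() only on a miss."""
    tf_checksum = transformation_checksum(code, inputs)
    if tf_checksum in cache:
        return cache[tf_checksum], True
    result_checksum = checksum(compute())
    cache[tf_checksum] = result_checksum
    return result_checksum, False


# Demo: the second submission is answered from the cache.
calls = []


def compute():
    calls.append(1)
    return b"3"


cache = {}
code = b"def add(a, b): return a + b"
inputs = {"a": b"1", "b": b"2"}
first = run_cached(cache, code, inputs, compute)
second = run_cached(cache, code, inputs, compute)
```

In the demo, `compute` runs once: the second call finds the transformation checksum in the cache and returns the stored result checksum without executing anything.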
The database stores the following kinds of records:
| Table | Purpose |
|---|---|
| Transformation | Maps a transformation checksum to its result checksum |
| RevTransformation | Reverse lookup: finds which transformations produced a given result |
| BufferInfo | Stores buffer metadata (length, dtype, encoding, etc.) for a checksum |
| SyntacticToSemantic | Maps between syntactic and semantic checksums per celltype |
| Expression | Caches expression evaluation results (input checksum + path + celltype → result checksum) |
| MetaData | Stores execution metadata for transformations (executor, environment, timing) |
| IrreproducibleTransformation | Records transformations whose results are not reproducible |
All data is persisted in a single SQLite file (typically seamless.db).
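To make the Transformation record concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table name, column names, and helper functions are hypothetical and chosen for this example; the real schema of seamless-database may differ. For brevity, the reverse lookup is done with a query on the same table rather than a separate RevTransformation table.

```python
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite file with an assumed transformation table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transformation ("
        "checksum TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )
    return conn


def put_result(conn: sqlite3.Connection, tf_checksum: str, result: str) -> None:
    """Store or overwrite the result checksum of a transformation."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO transformation VALUES (?, ?)",
            (tf_checksum, result),
        )


def get_result(conn: sqlite3.Connection, tf_checksum: str):
    """Return the cached result checksum, or None on a cache miss."""
    row = conn.execute(
        "SELECT result FROM transformation WHERE checksum = ?", (tf_checksum,)
    ).fetchone()
    return row[0] if row else None


def find_transformations(conn: sqlite3.Connection, result: str) -> list:
    """Reverse lookup: which transformations produced this result?"""
    rows = conn.execute(
        "SELECT checksum FROM transformation WHERE result = ? ORDER BY checksum",
        (result,),
    ).fetchall()
    return [row[0] for row in rows]


# Demo with placeholder checksums (not real digests).
conn = open_cache()
put_result(conn, "aa" * 32, "cc" * 32)
put_result(conn, "bb" * 32, "cc" * 32)
```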
Role in the Seamless ecosystem
Other Seamless components interact with the database over HTTP:
- seamless-dask checks the database cache before scheduling a transformation on the Dask cluster, and writes results back after computation.
- seamless-remote provides the DatabaseClient/DatabaseLaunchedClient classes that other components use to communicate with the database server.
- seamless-config defines the launch template for the database server (port range, host, timeout, read/write mode).
The server exposes a JSON-over-HTTP protocol: clients send {"type": "<record_type>", "checksum": "<hex>", ...} via GET (read) or PUT (write) requests.
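As a rough illustration of this request shape, the sketch below builds (but does not send) such requests with the standard library. Several details are assumptions not confirmed by this page: that the JSON travels in the request body, that the record type is spelled `"transformation"`, and that a write carries its payload in a `"value"` field. The `build_request` helper is hypothetical; in practice the DatabaseClient classes from seamless-remote handle this.

```python
import json
import urllib.request


def build_request(base_url, record_type, checksum, value=None, **extra):
    """Build a GET (read) or PUT (write) request carrying a JSON message."""
    payload = {"type": record_type, "checksum": checksum, **extra}
    method = "GET"
    if value is not None:
        payload["value"] = value  # assumed field name for the written value
        method = "PUT"
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Demo: a read and a write for the same placeholder checksum.
read_req = build_request("http://localhost:5522", "transformation", "ab" * 32)
write_req = build_request(
    "http://localhost:5522", "transformation", "ab" * 32, value="cd" * 32
)
```

Passing either request to `urllib.request.urlopen` would send it to a running server; a PUT is only expected to succeed when the server was started with --writable.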
Installation
pip install seamless-database
Usage
# Start a writable database server on a random free port between 5520 and 5530
seamless-database seamless.db --port-range 5520 5530 --writable
# Start a read-only server on a fixed port
seamless-database seamless.db --port 5522
If --port and --port-range are both omitted, seamless-database picks a random free port in the dynamic/private range (49152-65535).
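One common way to pick a random free port from an inclusive range is to try binding candidate ports in random order. The sketch below shows that technique; it is not necessarily how seamless-database does it, and the `pick_free_port` name and its parameters are made up for this example.

```python
import random
import socket

DYNAMIC_PORTS = (49152, 65535)  # dynamic/private port range


def pick_free_port(start=DYNAMIC_PORTS[0], end=DYNAMIC_PORTS[1],
                   host="127.0.0.1", attempts=100):
    """Return a port in [start, end] that could be bound at probe time."""
    if start > end:
        raise ValueError("start must not exceed end")
    size = end - start + 1
    # Sample distinct candidates so no port is probed twice
    candidates = random.sample(range(start, end + 1), k=min(attempts, size))
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port in use or not permitted; try the next one
            return port
    raise RuntimeError(f"no free port found in {start}-{end}")


port = pick_free_port()
```

Note that the probe socket is closed before the port is returned, so another process could claim the port in the meantime. A server that needs a guarantee should keep the bound socket open and serve on it directly.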
Status-file protocol
seamless-database does not require a status file. If --status-file is omitted, it runs independently.
If --status-file is provided, the file is used for two things:
- Report the chosen port, especially when --port-range is used.
- Report whether startup succeeded ("running") or failed ("failed").
The status-file protocol is simple. On startup, seamless-database performs these steps:
- Wait for the status file to exist and parse it as JSON.
- Reuse the existing JSON object as the base payload. An empty JSON object {} is sufficient.
- Choose or validate its listening port.
- Once the HTTP server is up, rewrite the same file with "status": "running" and the selected "port".
- If startup fails before the server is running, rewrite the file with "status": "failed" instead.
If remote-http-launcher is used, it may pre-populate the JSON with fields such as the PID, workdir, or "status": "starting". seamless-database preserves such fields when it writes back the final status.
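The server side of this protocol can be sketched as follows. The function names, the polling interval, and the write-to-temp-then-rename step are choices made for this illustration, not details taken from the seamless-database source.

```python
import json
import tempfile
import time
from pathlib import Path


def read_status_file(path, wait_seconds=5.0, poll_interval=0.1):
    """Wait until the status file exists and contains a JSON object."""
    deadline = time.monotonic() + wait_seconds
    while True:
        try:
            payload = json.loads(Path(path).read_text())
            if isinstance(payload, dict):
                return payload
        except (FileNotFoundError, json.JSONDecodeError):
            pass  # not there yet, or only partially written
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no usable status file at {path}")
        time.sleep(poll_interval)


def write_status(path, base, status, port=None):
    """Rewrite the status file, preserving the fields already in `base`."""
    payload = dict(base)
    payload["status"] = status
    if port is not None:
        payload["port"] = port
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(path)  # rename so readers never see a half-written file
    return payload


# Demo: a launcher pre-populates the file, the server reports success.
with tempfile.TemporaryDirectory() as tmpdir:
    status_path = Path(tmpdir) / "status.json"
    status_path.write_text(json.dumps({"pid": 1234, "status": "starting"}))
    base = read_status_file(status_path)
    final = write_status(status_path, base, "running", port=5522)
    on_disk = json.loads(status_path.read_text())
```

In the demo, the pre-populated `pid` field survives the rewrite, while `"status"` changes from `"starting"` to `"running"` and the port is added. On a startup failure, the same helper would be called with `"failed"` and no port.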
CLI options
| Option | Description |
|---|---|
| database_file | Path to the SQLite file (created if it doesn't exist and --writable is set) |
| --port PORT | Fixed network port |
| --port-range START END | Pick a random free port from an inclusive range |
| --host HOST | Bind address (default: 0.0.0.0) |
| --writable | Allow PUT requests; opens the database in read/write mode |
| --status-file FILE | JSON file used to report server status (for process managers) |
| --timeout SECONDS | Stop the server after this many seconds of inactivity |
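For readers who want to see how these options fit together, the table can be mirrored with `argparse`. This is a sketch derived from the table only, not the package's actual parser; in particular, the argument types and the lack of a mutual-exclusion check between --port and --port-range are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options (types are assumed, not confirmed)."""
    parser = argparse.ArgumentParser(prog="seamless-database")
    parser.add_argument("database_file", help="path to the SQLite file")
    parser.add_argument("--port", type=int, help="fixed network port")
    parser.add_argument(
        "--port-range", type=int, nargs=2, metavar=("START", "END"),
        help="pick a random free port from an inclusive range",
    )
    parser.add_argument("--host", default="0.0.0.0", help="bind address")
    parser.add_argument(
        "--writable", action="store_true",
        help="allow PUT requests; open the database in read/write mode",
    )
    parser.add_argument("--status-file", help="JSON file for server status")
    parser.add_argument(
        "--timeout", type=float,
        help="stop the server after this many seconds of inactivity",
    )
    return parser


# Demo: parse the first command line from the Usage section.
args = build_parser().parse_args(
    ["seamless.db", "--port-range", "5520", "5530", "--writable"]
)
```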
CLI scripts
Installing seamless-database also provides:
- seamless-database