Computation engine for Seamless: execute checksum-addressed transformations in Python or bash

seamless-transformer

seamless-transformer is the computation engine of the Seamless framework. It takes a transformation — a pure-functional computation defined as a checksum-addressed dict of inputs, code, and language — and executes it, returning a result checksum. It supports Python and bash transformations, multi-process worker pools with shared-memory IPC, and integration with the Seamless caching and remote infrastructure.

Core concepts

A transformation in Seamless is a deterministic computation: given the same inputs and code (identified by their checksums), it always produces the same output. seamless-transformer is responsible for:

  1. Building the transformation dict from the inputs and code, then computing its checksum (which serves as the transformation's identity for caching).
  2. Building the execution namespace: resolving input buffers, compiling modules, injecting dependencies.
  3. Executing the code — either Python (via exec) or bash (via subprocess with file-mapped pins).
  4. Returning the result as a checksum, which can be cached and reused.
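The four steps above can be sketched in plain Python. This is a minimal illustration of checksum addressing, not the seamless-transformer API: the buffers store, the helper names, the dict keys, the use of SHA-256 over JSON, and the convention that the code assigns to a variable called result are all assumptions made for the example.

```python
import hashlib
import json

# Hypothetical in-memory buffer store, keyed by checksum.
buffers = {}


def checksum(buffer: bytes) -> str:
    # SHA-256 is an assumption of this sketch.
    return hashlib.sha256(buffer).hexdigest()


def store(value) -> str:
    """Serialize a value, keep the buffer, and return its checksum."""
    buffer = json.dumps(value, sort_keys=True).encode()
    cs = checksum(buffer)
    buffers[cs] = buffer
    return cs


def load(cs: str):
    """Resolve a checksum back into a value."""
    return json.loads(buffers[cs])


def build_transformation(code: str, inputs: dict) -> dict:
    # Step 1: the dict refers to code and inputs by checksum only.
    return {
        "language": "python",
        "code": store(code),
        "inputs": {name: store(value) for name, value in inputs.items()},
    }


def transformation_checksum(transformation: dict) -> str:
    # The identity of the transformation, usable as a cache key.
    return checksum(json.dumps(transformation, sort_keys=True).encode())


def execute(transformation: dict) -> str:
    # Step 2: build the namespace by resolving input checksums.
    namespace = {name: load(cs) for name, cs in transformation["inputs"].items()}
    # Step 3: run the code; by this sketch's convention it assigns `result`.
    exec(load(transformation["code"]), namespace)
    # Step 4: return the result as a checksum.
    return store(namespace["result"])


tf = build_transformation("result = a + b", {"a": 2, "b": 3})
tf_checksum = transformation_checksum(tf)
result_checksum = execute(tf)
print(tf_checksum, "->", result_checksum)
```

Because the transformation dict contains only checksums, two transformations with the same code and inputs share the same identity, which is what makes the result reusable.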

Worker pool

For production use, seamless-transformer can spawn a pool of worker processes (seamless_transformer.worker.spawn()). Workers run in separate processes created with the spawn multiprocessing context and communicate with the parent over a custom IPC channel built on multiprocessing.connection.Connection objects and shared memory.

  • The parent distributes transformation requests to the least-loaded worker.
  • Workers can delegate sub-transformations back to the parent (which redistributes them).
  • Buffer data is exchanged through shared memory to avoid serialization overhead.
  • Crashed workers (for example, after a segfault) are restarted automatically.
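The least-loaded dispatch rule from the first bullet can be illustrated with a small bookkeeping class. This is a hedged sketch only: WorkerSlot, LeastLoadedDispatcher, and the in_flight counter are hypothetical names, and the real pool also handles IPC, shared memory, and restarts, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass
class WorkerSlot:
    name: str
    in_flight: int = 0  # transformations currently assigned to this worker


class LeastLoadedDispatcher:
    """Tracks load per worker and picks the one with the fewest jobs."""

    def __init__(self, n_workers: int):
        self.workers = [WorkerSlot(f"worker-{i}") for i in range(n_workers)]

    def assign(self) -> WorkerSlot:
        # min() returns the first worker among those tied for lowest load.
        worker = min(self.workers, key=lambda w: w.in_flight)
        worker.in_flight += 1
        return worker

    def release(self, worker: WorkerSlot) -> None:
        # Called when the worker reports that a transformation finished.
        worker.in_flight -= 1


dispatcher = LeastLoadedDispatcher(3)
assigned = [dispatcher.assign().name for _ in range(4)]
print(assigned)
```

A sub-transformation delegated back to the parent would go through the same assign() call, so nested work is spread across the pool in the same way as top-level requests.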

Bounded parallel execution

In Python, for large batches of delayed transformations, use parallel() or parallel_async() instead of manually calling .start() and .run() on thousands of objects.
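The signatures of parallel() and parallel_async() are not described here, so the following does not call them. It is a generic sketch of what bounded parallel execution means, using asyncio.Semaphore from the standard library; run_bounded, make_job, and the limit of 4 are names and values chosen for the example.

```python
import asyncio


async def run_bounded(jobs, limit):
    """Run zero-argument coroutine functions, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    # gather() preserves the order of the submitted jobs in its results.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def main():
    running = 0
    peak = 0

    def make_job(i):
        async def job():
            nonlocal running, peak
            running += 1
            peak = max(peak, running)  # record the highest concurrency seen
            await asyncio.sleep(0.01)  # stands in for a transformation
            running -= 1
            return i * i

        return job

    values = await run_bounded([make_job(i) for i in range(20)], limit=4)
    return values, peak


squares, peak_concurrency = asyncio.run(main())
print(len(squares), "jobs done, peak concurrency", peak_concurrency)
```

All 20 jobs are submitted at once, but the semaphore keeps the number in progress at or below the limit, which is the property that matters when the batch contains thousands of items.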

Integration with the Seamless ecosystem

  • seamless-core: provides the Checksum, Buffer, and buffer-cache primitives that seamless-transformer builds on.
  • seamless-dask: optionally offloads transformations to a Dask cluster (TransformationDaskMixin).
  • seamless-remote: used by the transformation cache to (a) look up cached results in the database before running, (b) access the buffer server for buffer data, and (c) submit transformations to the jobserver for remote execution (an alternative to local execution, not a cache lookup).
  • seamless-config: supplies project/stage selection for storage routing.
  • seamless-jobserver: depends on seamless-transformer to execute transformations received from the job queue.
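The lookup order described for seamless-remote (check the database first, then execute locally or remotely) can be sketched as follows. Everything in this example is hypothetical: resolve(), the plain dict standing in for the database, and the two callables are illustrative, and preferring remote submission whenever a submitter is supplied is an assumption of the sketch rather than documented behavior.

```python
def resolve(tf_checksum, database, run_local, submit_remote=None):
    """Return (result_checksum, origin) for a transformation checksum."""
    # (a) Look up a cached result before running anything.
    cached = database.get(tf_checksum)
    if cached is not None:
        return cached, "cache"
    # (c) Remote submission is an alternative to local execution.
    if submit_remote is not None:
        result_checksum, origin = submit_remote(tf_checksum), "remote"
    else:
        result_checksum, origin = run_local(tf_checksum), "local"
    # Record the result so the next request is a cache hit.
    database[tf_checksum] = result_checksum
    return result_checksum, origin


database = {}
calls = []


def fake_run_local(tf_checksum):
    # Stand-in for local execution; records that it was invoked.
    calls.append(tf_checksum)
    return "result-of-" + tf_checksum


first = resolve("tf-abc", database, fake_run_local)
second = resolve("tf-abc", database, fake_run_local)
print(first, second)
```

The second call never reaches fake_run_local, which mirrors how a database hit avoids re-executing a transformation.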

CLI scripts

Installing seamless-transformer provides:

  • seamless-run: the CLI face of Seamless. Wraps a bash command or pipeline as a transformation, using file/directory argument names as pin names.
  • seamless-upload: uploads input files/directories to the buffer server and writes .CHECKSUM sidecar files, staging inputs for seamless-run.
  • seamless-download: fetches result files/directories from the buffer server using .CHECKSUM sidecar files produced by seamless-run.
  • seamless-run-transformation: universal transformation executor. Runs any Seamless transformation (Python, bash, or other) by checksum and prints the result checksum.
  • seamless-queue: runs a queue server that executes seamless-run --qsubmit jobs concurrently. This is the CLI face's parallelization mechanism beyond backgrounding jobs with &.
  • seamless-queue-finish: signals the queue server to drain remaining jobs and shut down.
  • seamless-mode-bind.sh: shell script. Source it to bind seamless-mode commands and hotkeys into the current shell session.
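To make the idea of file-mapped pins concrete, here is a minimal sketch of running a bash pipeline whose inputs are materialized as files named after the pins. It uses only the standard library and is not how seamless-run is implemented: run_bash, the temporary working directory, and capturing standard output as the result are assumptions of the example.

```python
import subprocess
import tempfile
from pathlib import Path


def run_bash(command: str, pins: dict) -> bytes:
    """Write each pin to a file named after it, run the command, return stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        for name, content in pins.items():
            # The pin name doubles as the file name the command refers to.
            Path(workdir, name).write_bytes(content)
        completed = subprocess.run(
            ["bash", "-c", command],
            cwd=workdir,          # the command sees the pins as local files
            capture_output=True,
            check=True,           # raise if the command exits with an error
        )
        return completed.stdout


output = run_bash("cat data.txt | tr a-z A-Z", {"data.txt": b"hello\nworld\n"})
print(output)
```

Running in a fresh directory that contains only the declared pins keeps the command dependent on its inputs alone, which is what allows the result to be identified by the checksums of the command and the pins.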

Installation

pip install seamless-transformer

Setting up seamless-mode

After installation, seamless-mode-bind.sh is available on your PATH. Source it in your shell session to activate the seamless-mode-on, seamless-mode-off, and seamless-mode-toggle commands and the Ctrl-U U hotkey.

Manual (any environment) — add to ~/.bashrc or ~/.zshrc:

source "$(which seamless-mode-bind.sh)"

Conda — auto-activate with the environment:

mkdir -p "$CONDA_PREFIX/etc/conda/activate.d"
cp "$(which seamless-mode-bind.sh)" "$CONDA_PREFIX/etc/conda/activate.d/"

venv / virtualenv — append to the environment's activate script:

echo "source $(which seamless-mode-bind.sh)" >> "$VIRTUAL_ENV/bin/activate"

virtualenvwrapper — add to the environment's postactivate hook:

echo "source $(which seamless-mode-bind.sh)" >> "$VIRTUAL_ENV/bin/postactivate"
