
Remote clients and launch abstractions for Seamless services


seamless-remote

seamless-remote is the client connectivity layer of the Seamless ecosystem. It provides async HTTP clients for every remote service that a Seamless workflow can talk to — buffer storage, transformation cache, jobserver, and Dask scheduler — together with launch-aware wrappers that start those services on demand via remote-http-launcher. When seamless-config calls init() or set_stage(), it is seamless-remote that opens the actual connections and keeps them alive.

This is an internal infrastructure package. User workflow code never imports seamless-remote directly — it is activated behind the scenes by seamless-config and consumed by seamless-core, seamless-transformer, and seamless-dask. The only user-facing entry points are the two CLI scripts described below.

Core concepts

Client hierarchy

Every remote service is accessed through a base client and a launched wrapper, except the Dask scheduler, which has only a launched handle:

Base client | Launched wrapper | Service
BufferClient | BufferLaunchedClient | Content-addressed buffer store (hashserver)
DatabaseClient | DatabaseLaunchedClient | Transformation result cache (seamless-database)
JobserverClient | JobserverLaunchedClient | HTTP job dispatcher (seamless-jobserver)
(none) | DaskserverLaunchedHandle | Dask scheduler (seamless-dask-wrapper)

Base clients (BufferClient, DatabaseClient, JobserverClient) perform async HTTP against a known host and port. They inherit from a shared Client base that manages per-thread aiohttp sessions, a retry decorator for transient failures, and a background keepalive thread that healthchecks open connections.
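The retry behaviour mentioned above can be pictured with a small sketch. It is not the package's actual decorator: the names retry and TransientError, the number of attempts, and the backoff schedule are all assumptions chosen for illustration.

```python
import asyncio
import functools


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped connection)."""


def retry(attempts=3, base_delay=0.01):
    """Retry an async function on TransientError, with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return await func(*args, **kwargs)
                except TransientError:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last error
                    await asyncio.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


calls = []


@retry(attempts=3)
async def flaky_request():
    # Fails twice, then succeeds, to exercise the retry path.
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("temporary failure")
    return "ok"


result = asyncio.run(flaky_request())
```

Only errors classified as transient are retried here; anything else propagates immediately, which keeps genuine bugs from being masked by the retry loop.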

Launched wrappers extend the base clients with auto-launch: they call seamless_config.tools.configure_*() to build a launch dict, pass it to remote_http_launcher.run(), and cache the resulting server address. If the server is already running, the cache returns the existing connection. DaskserverLaunchedHandle follows the same pattern but is fully synchronous — it constructs a distributed.Client and wraps it in a SeamlessDaskClient.
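A minimal sketch of the launch-and-cache pattern follows. Everything in it is hypothetical: fake_launch stands in for the real launcher (whose signature is not shown here), get_server and the launch dict keys are invented for the example, and the JSON-based cache key is just one way to make a dict hashable.

```python
import json

_server_cache = {}  # cache key derived from the launch dict -> (host, port)
launch_count = 0


def fake_launch(launch_dict):
    """Stand-in for the real launcher; returns a made-up address."""
    global launch_count
    launch_count += 1
    return ("localhost", 5000 + launch_count)


def get_server(launch_dict, launcher=fake_launch):
    """Launch a service once per distinct configuration and reuse the address."""
    key = json.dumps(launch_dict, sort_keys=True)  # stable, hashable cache key
    if key not in _server_cache:
        _server_cache[key] = launcher(launch_dict)
    return _server_cache[key]


# Same configuration (key order differs): served from the cache.
first = get_server({"service": "hashserver", "workdir": "/tmp/buffers"})
second = get_server({"workdir": "/tmp/buffers", "service": "hashserver"})
# Different configuration: triggers a second launch.
other = get_server({"service": "database", "workdir": "/tmp/db"})
```

Keying the cache on the full launch configuration means that two callers asking for the same service share one server, while any change in configuration results in a separate launch.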

Activation modules

Each service type has an activation module that manages the active set of clients and exposes the async functions consumed by the rest of the Seamless stack:

Module | Key functions | Used by
buffer_remote | get_buffer(), write_buffer(), get_buffer_lengths(), promise() | seamless-core (Checksum.resolve, Buffer.write)
database_remote | get_transformation_result(), set_transformation_result(), get_rev_transformations() | seamless-transformer (cache lookup/store)
jobserver_remote | run_transformation() | seamless-transformer (remote job dispatch)
daskserver_remote | activate(), deactivate() | seamless-config (stage changes)

Each module maintains separate lists of read and write clients (or a single launched handle for the daskserver). activate() is called by seamless-config during stage transitions; it instantiates the appropriate clients from the cluster definition and makes them available to downstream consumers.
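The split into read and write client lists can be illustrated as follows. This is a sketch under assumptions, not the module's code: InMemoryClient is a hypothetical stand-in for a buffer client, the function names merely mirror those in the table above, and their signatures and the first-hit lookup order are guesses.

```python
import asyncio


class InMemoryClient:
    """Hypothetical stand-in for a buffer client; stores buffers in a dict."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    async def get(self, checksum):
        return self.store.get(checksum)

    async def write(self, checksum, data):
        self.store[checksum] = data


read_clients = []
write_clients = []


def activate(readers, writers):
    """Replace the active client lists, as a stage change might do."""
    read_clients[:] = readers
    write_clients[:] = writers


async def get_buffer(checksum):
    """Ask each read client in order; return the first hit, else None."""
    for client in read_clients:
        data = await client.get(checksum)
        if data is not None:
            return data
    return None


async def write_buffer(checksum, data):
    """Write to every write client concurrently."""
    await asyncio.gather(*(c.write(checksum, data) for c in write_clients))


async def demo():
    local, shared = InMemoryClient("local"), InMemoryClient("shared")
    activate(readers=[local, shared], writers=[shared])
    await write_buffer("c0ffee", b"payload")  # placeholder key, not a real checksum
    return await get_buffer("c0ffee"), await get_buffer("missing"), local.store


found, missing, local_store = asyncio.run(demo())
```

In the demo the buffer is written only to the shared client, so the lookup misses on the local client and falls through to the shared one.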

Client types: launched vs extern

Clients can be registered in two ways:

  • Launched: seamless-remote starts the service itself (via remote-http-launcher). Configuration comes from the cluster definition in seamless.yaml / seamless.profile.yaml.
  • Extern: the service is already running and a URL (or local directory, for buffer folders) is provided directly. Useful for shared infrastructure or debugging.

Both kinds are registered on the activation module, through define_launched_client() and define_extern_client() respectively, and are selected during activate().


Relation to the Seamless ecosystem

    Seamless runtime                   (Buffer/Checksum, direct/delayed, stages)
        │
        │  resolve/write buffers, check/store cached results,
        │  activate backends and delegate jobs to jobserver or daskserver
        ▼
    seamless-remote                    ◄── this package
        │
        │  async HTTP (aiohttp)
        ▼
    ┌───────────────┐  ┌───────────────────┐  ┌────────────────────┐  ┌─────────────────────────┐
    │  hashserver   │  │ seamless-database │  │ seamless-jobserver │  │ seamless-dask-wrapper   │
    │  (buffers)    │  │  (result cache)   │  │  (job dispatch)    │  │  (Dask scheduler)       │
    └───────────────┘  └───────────────────┘  └────────────────────┘  └─────────────────────────┘
                                                                          │
                                                          Dask workers run seamless-transformer,
                                                          which calls seamless-remote again
                                                          for nested transformations

The Seamless runtime above this layer consists mainly of seamless-core, seamless-transformer, and seamless-config. seamless-core uses seamless-remote for buffer resolution and writes, seamless-transformer uses it for buffer access, cache lookup/store, and job delegation, and seamless-config activates the appropriate backends during stage changes. seamless-remote in turn talks to four remote services: hashserver for buffers, seamless-database for transformation results, seamless-jobserver for lightweight job dispatch, and seamless-dask-wrapper (part of seamless-dask) for Dask-based execution. Inside Dask workers, the same path repeats — seamless-transformer runs again and calls seamless-remote for buffer/cache operations on nested transformations.

seamless-config is the only package that calls activate() / deactivate() directly. All other packages interact with seamless-remote through the module-level async functions (get_buffer, run_transformation, etc.).


Delegation levels

seamless-remote enables the tiered delegation model defined by seamless-config stages:

Level | Name | What seamless-remote provides
0 | in-process | Nothing; all buffers are held in the client.
1 | persistent storage | buffer_remote writes/reads buffers via the cluster's hashserver.
2 | cached execution | Additionally, database_remote checks and records transformation results.
3 | remote execution | Additionally, jobserver_remote or daskserver_remote delegates computation.

Moving between levels is a configuration change (seamless.yaml / seamless.profile.yaml), not a code change.
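The cumulative nature of the levels can be written down as a small lookup. The function active_modules and its executor parameter are hypothetical; only the module names and the level numbers come from the table above.

```python
# Module that each level adds on top of the previous one (from the table).
ADDED_AT_LEVEL = {1: "buffer_remote", 2: "database_remote"}

# Level 3 adds one of two execution backends.
EXECUTORS = {"jobserver": "jobserver_remote", "dask": "daskserver_remote"}


def active_modules(level, executor="jobserver"):
    """Return the activation modules in use at a given delegation level."""
    if level not in (0, 1, 2, 3):
        raise ValueError(f"unknown delegation level: {level}")
    modules = [
        name
        for threshold, name in sorted(ADDED_AT_LEVEL.items())
        if level >= threshold
    ]
    if level == 3:
        modules.append(EXECUTORS[executor])
    return modules


level0 = active_modules(0)
level2 = active_modules(2)
level3_dask = active_modules(3, executor="dask")
```

Each level is a strict superset of the one below it, which is why moving up a level never requires changes to workflow code.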


CLI scripts

Installing seamless-remote provides two utilities for working with content-addressed data from the command line:

Command | Description
seamless-resolve | Resolve a buffer from its SHA-256 checksum and write it to stdout or a file.
seamless-fingertip | Like seamless-resolve, but uses fingertip resolution (with fallback to recomputation).

Both accept --project and --stage flags to select the storage context, and read remote client configuration from environment variables or config files.

# Resolve a checksum to a file
seamless-resolve abc123...def --output result.bin

# Resolve with project/stage context
seamless-resolve abc123...def --project myproject --stage prod --output result.bin

# Fingertip (resolve with recomputation fallback)
seamless-fingertip abc123...def --output result.bin
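To check a resolved file against the checksum it was requested with, the digest can be recomputed locally. The sketch below assumes that the checksum is the plain SHA-256 hex digest of the raw buffer bytes; the helper names are invented for the example and are not part of seamless-remote.

```python
import hashlib


def sha256_checksum(buffer: bytes) -> str:
    """Hex-encoded SHA-256 digest of a buffer."""
    return hashlib.sha256(buffer).hexdigest()


def verify(buffer: bytes, checksum: str) -> bool:
    """True if the buffer hashes to the given hex checksum (case-insensitive)."""
    return sha256_checksum(buffer) == checksum.lower()


data = b"hello"
checksum = sha256_checksum(data)

matches = verify(data, checksum)            # same content, same checksum
tampered = verify(data + b"!", checksum)    # any change alters the digest
```

Because the address is derived from the content, a mismatch after download indicates corruption or a wrong buffer rather than a stale version.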

Environment variables

Variable | Default | Effect
SEAMLESS_REMOTE_CONNECT_TIMEOUT | 10 | HTTP connect timeout (seconds)
SEAMLESS_REMOTE_READ_TIMEOUT | 1200 | HTTP read timeout (seconds); set high to accommodate hashserver integrity checks on large buffers
SEAMLESS_REMOTE_TOTAL_TIMEOUT | none | Total request timeout (seconds)
SEAMLESS_REMOTE_HEALTHCHECK_TIMEOUT | 10 | Keepalive healthcheck timeout (seconds)
SEAMLESS_DATABASE_MAX_INFLIGHT | 30 | Maximum concurrent in-flight database requests (semaphore)
SEAMLESS_ALLOW_REMOTE_CLIENTS_IN_WORKER | false | Allow remote clients in child worker processes
SEAMLESS_DEBUG_REMOTE_DB |  | Enable debug logging for database_remote
SEAMLESS_CLIENT_DEBUG |  | Enable debug logging for client session lifecycle
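As an illustration of how such settings might be consumed, the sketch below reads the timeout variables with their documented defaults and shows a semaphore capping in-flight requests. RemoteSettings, load_settings, and run_bounded are hypothetical names, and the parsing rules (empty string treated as unset, values read as floats) are assumptions.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Optional


def _float_env(env, name, default):
    """Read a float from the environment, falling back to the default."""
    raw = env.get(name)
    return default if raw in (None, "") else float(raw)


@dataclass(frozen=True)
class RemoteSettings:
    connect_timeout: float
    read_timeout: float
    total_timeout: Optional[float]  # None means no overall limit
    healthcheck_timeout: float
    database_max_inflight: int


def load_settings(env=os.environ):
    """Collect the documented variables, applying the documented defaults."""
    return RemoteSettings(
        connect_timeout=_float_env(env, "SEAMLESS_REMOTE_CONNECT_TIMEOUT", 10.0),
        read_timeout=_float_env(env, "SEAMLESS_REMOTE_READ_TIMEOUT", 1200.0),
        total_timeout=_float_env(env, "SEAMLESS_REMOTE_TOTAL_TIMEOUT", None),
        healthcheck_timeout=_float_env(env, "SEAMLESS_REMOTE_HEALTHCHECK_TIMEOUT", 10.0),
        database_max_inflight=int(env.get("SEAMLESS_DATABASE_MAX_INFLIGHT") or 30),
    )


async def run_bounded(coros, limit):
    """Run coroutines with at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(coro):
        async with semaphore:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))


async def _echo(i):
    await asyncio.sleep(0)
    return i


# Override one variable; the others fall back to their defaults.
settings = load_settings({"SEAMLESS_REMOTE_READ_TIMEOUT": "60"})
results = asyncio.run(run_bounded([_echo(i) for i in range(5)], limit=2))
```

Values like these could plausibly be passed to aiohttp.ClientTimeout (which accepts total, connect, and sock_read arguments), though which fields the package actually sets is not stated here.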

Installation

pip install seamless-remote

Requires Python >= 3.10. Dependencies: seamless-core, seamless-config, aiohttp, aiofiles, frozendict.

Optional (activated at runtime when needed): remote-http-launcher (for launched clients), seamless-dask and distributed (for daskserver integration).
