Skip to main content

Metaflow extension: pre-bake conda environments into Docker images for fast cold starts

Project description

metaflow-prebuilt

A Metaflow extension that pre-bakes conda/PyPI dependencies into Docker images at deploy time. Tasks start immediately — no conda bootstrap overhead at runtime.

How it works

When you deploy a flow with --environment=prebuilt, the extension:

  1. Resolves the conda/PyPI environment spec for each step.
  2. Builds a Docker image with the environment already installed.
  3. Pushes the image to a registry.
  4. Configures the remote runner (Batch, Kubernetes, etc.) to pull that image instead of bootstrapping conda at task start.

Install

pip install metaflow-prebuilt

Optional extras for specific backends:

pip install 'metaflow-prebuilt[ecr]'       # ECR registry + CodeBuild service
pip install 'metaflow-prebuilt[gcr]'       # GCR registry
pip install 'metaflow-prebuilt[kaniko]'    # Kaniko build service (K8s)

Quick start

from metaflow import FlowSpec, step, conda

class MyFlow(FlowSpec):

    @conda(packages={"numpy": "1.26.4"})
    @step
    def start(self):
        import numpy
        print(numpy.__version__)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    MyFlow()

Deploy with the prebuilt environment (requires a configured registry and build service — see below):

# Deploy: builds and pushes Docker images, then submits the flow
python my_flow.py --environment=prebuilt run

# Or on Batch/Kubernetes:
python my_flow.py --environment=prebuilt batch run

Configuration

Select backends via environment variables:

Variable Default Purpose
METAFLOW_PREBUILT_BUILD_SERVICE docker Which build service to use
METAFLOW_PREBUILT_IMAGE_REGISTRY dockerhub Which image registry to use

Extension points

DockerBuildService

Controls how Docker images are built and pushed. Select with METAFLOW_PREBUILT_BUILD_SERVICE=<name>.

Name Class Description
docker LocalDockerBuildService Local Docker daemon (docker build + docker push)
buildx BuildxBuildService Docker Buildx (docker buildx build --push), supports remote builders and cross-platform
kaniko KanikoBuildService Kaniko in a Kubernetes Job; uploads context to GCS/S3
codebuild CodeBuildService AWS CodeBuild; uploads context to S3, polls for completion

ImageRegistry

Controls where images are stored. Select with METAFLOW_PREBUILT_IMAGE_REGISTRY=<name>.

Name Class Description
dockerhub DockerHubRegistry Docker Hub (docker.io)
ecr ECRRegistry Amazon ECR
gcr GCRRegistry Google Container Registry
local LocalRegistry Local registry:2 container (dev/CI use)

Contributing a custom backend

Custom build service

Subclass DockerBuildService and implement build_and_push:

from metaflow_extensions.prebuilt.plugins.conda.build_service import DockerBuildService

class MyBuildService(DockerBuildService):
    def build_and_push(self, dockerfile, context_files, image_tag,
                       push_credentials, echo) -> bool:
        # build and push; return True on success, False on failure
        ...

Register it in your pyproject.toml:

[project.entry-points."metaflow_prebuilt.build_services"]
mybuild = "mypackage.my_module:MyBuildService"

Then use it with METAFLOW_PREBUILT_BUILD_SERVICE=mybuild.

Custom image registry

Subclass ImageRegistry and implement image_tag, push_credentials, and pull_config:

from metaflow_extensions.prebuilt.plugins.conda.image_registry import ImageRegistry

class MyRegistry(ImageRegistry):
    def image_tag(self, env_id) -> str: ...
    def push_credentials(self) -> dict: ...
    def pull_config(self, pull_tag) -> dict: ...

Register it:

[project.entry-points."metaflow_prebuilt.image_registries"]
myregistry = "mypackage.my_module:MyRegistry"

Then use it with METAFLOW_PREBUILT_IMAGE_REGISTRY=myregistry.

License

Apache 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

metaflow_prebuilt-0.2.0.tar.gz (20.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

metaflow_prebuilt-0.2.0-py3-none-any.whl (27.5 kB view details)

Uploaded Python 3

File details

Details for the file metaflow_prebuilt-0.2.0.tar.gz.

File metadata

  • Download URL: metaflow_prebuilt-0.2.0.tar.gz
  • Upload date:
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for metaflow_prebuilt-0.2.0.tar.gz
Algorithm Hash digest
SHA256 b69c76aae8e14ba4bf8004bd0e172deeff59a8b4cfc98234e4cb6e0fd04b8aa7
MD5 680d6f6857aa59e2116751bd78a87821
BLAKE2b-256 4d2f1ce0bf8ac5de33ab19ec208603315658734dc6c23a4d82fc8af0e8b879ff

See more details on using hashes here.

Provenance

The following attestation bundles were made for metaflow_prebuilt-0.2.0.tar.gz:

Publisher: publish.yml on npow/metaflow-prebuilt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file metaflow_prebuilt-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for metaflow_prebuilt-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1ed76018a0ded292d70ef64976eb28e3a6b17557427c775a0e05fc09ba7d512f
MD5 3abf3989e9ec3c024034a075a526308b
BLAKE2b-256 fe565bd8d59a3abcb1e19d9c7098db65dafe3fcbe59cca559735c03510dbe4f3

See more details on using hashes here.

Provenance

The following attestation bundles were made for metaflow_prebuilt-0.2.0-py3-none-any.whl:

Publisher: publish.yml on npow/metaflow-prebuilt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page