perago
perago is a typed Python runtime layer for Conductor workers that operate on versioned workspaces.
The first MVP targets LakeFS as the workspace backend. Task authors write ordinary typed Python functions; Perago owns task definition extraction, validation, worker process startup, workspace download/publish, guardrails, and Conductor completion.
Status
Early internal package. APIs are still being shaped before 1.0.
The current development slice implements the parts that do not require a live Conductor server or LakeFS server:
- task author API: @task, WorkspaceSpec, guardrail helper functions, and grouped TaskControls including explicit publish budgets;
- import-time task validation for the single-task module contract;
- perago check diagnostics for task declarations and local runtime config;
- perago extract generation of local Conductor TaskDef JSON.
perago start, Conductor polling/completion, LakeFS workspace download, and
LakeFS publication are integration-phase work and are intentionally not wired
to external services yet.
The current implementation target is documented in:
Task shape
Each Python module declares exactly one task worker. The function signature is the source of the business input and output contract.
from pathlib import Path

from pydantic import BaseModel, Field

from perago import WorkspaceSpec, require_dir, require_glob, task


class BuildFeaturesParams(BaseModel):
    feature_set: str
    min_rows: int = Field(ge=1)


class BuildFeaturesOutput(BaseModel):
    row_count: int = Field(ge=0)
    feature_count: int = Field(ge=0)


@task(
    name="features.build",
    description="Build feature parquet files.",
    owner_email="data@example.com",
    workspace=WorkspaceSpec(
        prefix="/",
        pre=[
            require_dir("raw"),
            require_glob("raw/**/*.parquet", min_count=1),
        ],
        post=[
            require_dir("features"),
            require_glob("features/**/*.parquet", min_count=1),
        ],
    ),
)
def build_features(
    workspace: Path,
    params: BuildFeaturesParams,
) -> BuildFeaturesOutput:
    return BuildFeaturesOutput(row_count=100, feature_count=24)
Workspace-free tasks use the same model without workspace: Path.
@task(
    name="metadata.validate",
    description="Validate metadata.",
    owner_email="data@example.com",
)
def validate_metadata(params: ValidateMetadataParams) -> ValidateMetadataOutput:
    return ValidateMetadataOutput(valid=True)
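As a rough illustration of how a function signature can serve as the input and output contract, the sketch below reads type hints with the standard library. It is not Perago's extraction code: `contract_of`, `Params`, `Output`, and `build` are hypothetical names, and plain dataclasses stand in for the Pydantic models so the example has no third-party dependencies.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, get_type_hints


@dataclass
class Params:  # hypothetical stand-in for a Pydantic params model
    feature_set: str


@dataclass
class Output:  # hypothetical stand-in for a Pydantic output model
    row_count: int


def build(workspace: Path, params: Params) -> Output:
    return Output(row_count=0)


def contract_of(func: Callable[..., Any]) -> dict[str, Any]:
    """Derive a simple contract description from a function's type hints."""
    hints = get_type_hints(func)
    return {
        # A task is treated as a workspace task when it declares workspace: Path.
        "uses_workspace": hints.get("workspace") is Path,
        "input": hints.get("params"),
        "output": hints.get("return"),
    }


contract = contract_of(build)
```

A workspace-free function would simply report `uses_workspace` as `False` under this sketch.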
CLI
The perago command is a Typer CLI. MVP commands accept a Python module import path, not file paths or module:app targets.
perago check app.workers.features_build
perago extract app.workers.features_build --output generated/features.build.json
perago start app.workers.features_build -j 4
- perago check imports the module, validates the task declaration, validates Perago runtime config from .env, and reports CLI diagnostics.
- perago extract emits Conductor TaskDef JSON with embedded input/output schemas.
- perago start currently validates startup inputs and exits with a clear diagnostic until the Conductor/LakeFS worker integration is added.
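To make the "module import path, not file path or module:app target" rule concrete, here is a minimal sketch of one way such a target could be screened before import. The function name `is_module_import_path` is an assumption for illustration; Perago's actual check may differ.

```python
import keyword


def is_module_import_path(target: str) -> bool:
    """Return True when target looks like a dotted Python module path.

    Every dot-separated part must be a valid, non-keyword identifier, which
    rules out path separators and the colon used in module:app targets.
    Note: a bare "name.py" still passes, because "py" is a valid identifier;
    only the import itself can tell whether such a module exists.
    """
    parts = target.split(".")
    return all(part.isidentifier() and not keyword.iskeyword(part) for part in parts)


accepted = is_module_import_path("app.workers.features_build")
rejected_file = is_module_import_path("app/workers/features_build.py")
rejected_app = is_module_import_path("app.workers.features_build:app")
```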
Runtime configuration
Perago reads .env for local development. Real process environment variables take precedence over .env; .env only fills missing values.
CONDUCTOR_SERVER_URL=http://localhost:8080/api
CONDUCTOR_AUTH_KEY=...
CONDUCTOR_AUTH_SECRET=...
LAKECTL_SERVER_ENDPOINT_URL=http://localhost:8000
LAKECTL_CREDENTIALS_ACCESS_KEY_ID=...
LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY=...
PERAGO_WORKSPACE_ROOT=/var/tmp/perago/workspaces
PERAGO_LOG_ROOT=/var/tmp/perago/logs
PERAGO_LOG_FILE_MAX_SIZE=100MB
PERAGO_LOG_RETENTION=30d
PERAGO_WORKER_ID_PREFIX=prodAFeaturesBuild
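The precedence rule described above (process environment wins, .env only fills missing values) can be sketched with the standard library alone. This is an illustration, not Perago's loader: `parse_dotenv` and `merge_config` are hypothetical helpers, and the parser handles only plain KEY=VALUE lines.

```python
from typing import Mapping


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_config(dotenv_values: Mapping[str, str], environ: Mapping[str, str]) -> dict[str, str]:
    """Start from .env values, then let real environment variables override them."""
    merged = dict(dotenv_values)
    merged.update(environ)
    return merged


dotenv_text = """
CONDUCTOR_SERVER_URL=http://localhost:8080/api
PERAGO_LOG_RETENTION=30d
"""
# Stand-in for os.environ so the example is deterministic.
process_env = {"PERAGO_LOG_RETENTION": "7d"}
config = merge_config(parse_dotenv(dotenv_text), process_env)
```

Here `PERAGO_LOG_RETENTION` resolves to the process value, while `CONDUCTOR_SERVER_URL` is filled in from .env.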
Runtime models and config validation use Pydantic. CLI commands use Typer. Runtime logs use loguru JSONL files with UTC+08:00 timestamps.
Conductor and LakeFS connection environment variables are parsed into local runtime config and checked for incomplete credential groups. perago check still does not connect to either service.
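A check for incomplete credential groups can be done offline, without contacting either service. The sketch below is one possible shape. The grouping in `CREDENTIAL_GROUPS` and the helper `incomplete_groups` are assumptions for illustration, since the exact groups Perago validates are not listed here.

```python
from typing import Mapping

# Assumed grouping: each group should be either fully set or fully unset.
CREDENTIAL_GROUPS: dict[str, tuple[str, ...]] = {
    "conductor": ("CONDUCTOR_AUTH_KEY", "CONDUCTOR_AUTH_SECRET"),
    "lakefs": (
        "LAKECTL_CREDENTIALS_ACCESS_KEY_ID",
        "LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY",
    ),
}


def incomplete_groups(config: Mapping[str, str]) -> dict[str, list[str]]:
    """Map each partially configured group to the variables it is missing."""
    problems: dict[str, list[str]] = {}
    for group, keys in CREDENTIAL_GROUPS.items():
        present = [key for key in keys if config.get(key)]
        missing = [key for key in keys if not config.get(key)]
        if present and missing:
            problems[group] = missing
    return problems


partial = incomplete_groups({"CONDUCTOR_AUTH_KEY": "key-only"})
```

A group with no variables set is not reported, since leaving credentials out entirely is treated as valid in this sketch.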
Workspace guardrails
Workspace guardrails are file-shape checks over the local workspace root exposed by WorkspaceSpec(prefix=...).
- task authors declare guardrails only through require_file, require_dir, require_glob, and forbid_glob;
- the internal guardrail model is not part of the public task author API;
- pre guardrail failure returns FAILED_WITH_TERMINAL_ERROR;
- post guardrail failure returns retryable FAILED;
- guardrail paths are relative workspace paths;
- absolute paths, .. segments, backslash-separated strings, and drive-qualified paths are rejected during module import and perago check;
- invalid guardrail declarations fail import validation and perago check.
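The path rules in the list above can be expressed as a small validator. This is a minimal sketch under the stated rules, not Perago's internal guardrail model: `guardrail_path_problem` is a hypothetical helper that returns a rejection reason, or `None` when the path is acceptable.

```python
import re
from typing import Optional

# Matches drive-qualified strings such as "C:raw" or "c:/raw".
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def guardrail_path_problem(path: str) -> Optional[str]:
    """Return why a guardrail path is rejected, or None if it is acceptable."""
    if not path:
        return "path is empty"
    if "\\" in path:
        return "backslash-separated paths are not allowed"
    if _DRIVE_PREFIX.match(path):
        return "drive-qualified paths are not allowed"
    if path.startswith("/"):
        return "absolute paths are not allowed"
    if ".." in path.split("/"):
        return "'..' segments are not allowed"
    return None


ok = guardrail_path_problem("raw/**/*.parquet")
bad = guardrail_path_problem("raw/../secrets")
```

Only whole `..` segments are rejected, so a file name such as `a..b.parquet` still passes.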
Workspace runtime
For workspace tasks, Perago downloads the input workspace ref, runs the function against an attempt-local workspace directory, publishes changes through a staging LakeFS branch, attempts local cleanup, and then reports the task result to Conductor.
Every Conductor Task Attempt gets its own local workspace directory under PERAGO_WORKSPACE_ROOT; workspaces are not reused across attempts, task workers, or worker processes.
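One way to guarantee that attempts never share a directory is to give each attempt a uniquely named subdirectory and refuse to reuse an existing one. The sketch below assumes a hypothetical `create_attempt_workspace` helper and a made-up naming scheme. The layout Perago actually uses under PERAGO_WORKSPACE_ROOT is not specified here.

```python
import tempfile
import uuid
from pathlib import Path


def create_attempt_workspace(workspace_root: Path, task_id: str) -> Path:
    """Create a fresh directory for one task attempt under workspace_root."""
    # The random suffix keeps retries of the same task in separate directories.
    attempt_dir = workspace_root / f"{task_id}-{uuid.uuid4().hex}"
    # exist_ok=False makes accidental reuse fail loudly instead of silently.
    attempt_dir.mkdir(parents=True, exist_ok=False)
    return attempt_dir


# A temporary directory stands in for PERAGO_WORKSPACE_ROOT in this example.
demo_root = Path(tempfile.mkdtemp())
first_attempt = create_attempt_workspace(demo_root, "task-123")
second_attempt = create_attempt_workspace(demo_root, "task-123")
```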