
ETL Subtask management library written in Rust


subtask-manager

subtask-manager is a Rust-powered Python package for discovering, classifying, loading, and rendering ETL subtasks from a filesystem structure.

It is designed for ETL projects where task metadata is encoded in folder names (entity, stage, system) and task content lives in files (.sql, .py, .sh, etc.).


Features

  • Fast core implementation in Rust (PyO3 extension module)
  • Python-friendly API
  • Recursive file scanning by supported extensions
  • Automatic classification of tasks from folder structure
  • Lazy loading of task contents
  • Rich filtering (stage, entity, system_type, task_type, is_common)
  • Parameter extraction and rendering with multiple placeholder styles
  • Immutable parameter application (returns new objects)
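Lazy loading means a subtask's file is not read until its content is actually needed. The pure-Python sketch below only illustrates that idea and is not how the package implements it. `LazySubtask` and its `reads` counter are hypothetical names.

```python
from dataclasses import dataclass, field
from functools import cached_property
from pathlib import Path


# Hypothetical illustration of lazy loading; not the subtask_manager API.
@dataclass
class LazySubtask:
    path: Path
    # Counts disk reads so the lazy behaviour is observable.
    reads: int = field(default=0, compare=False)

    @cached_property
    def content(self) -> str:
        # Runs only on first access; the result is then cached on the instance.
        self.reads += 1
        return self.path.read_text(encoding="utf-8")
```

Constructing a `LazySubtask` for every scanned file is cheap. Only tasks whose `.content` is touched pay for a disk read, and repeated access reuses the cached string.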

Installation

From PyPI

pip install subtask-manager

From source (local dev)

# Build extension and install in editable/dev mode
maturin develop

Or build wheels:

maturin build --release

Supported task types (by extension)

  • SQL: sql, psql, tsql, plpgsql
  • Shell: sh
  • PowerShell: ps1
  • Python: py
  • GraphQL: graphql, gql
  • JSON: json, jsonl
  • YAML: yaml, yml
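Recursive scanning combined with extension-based typing can be sketched with `pathlib`. The table mirrors the list above. `scan_subtasks` is a hypothetical helper, not the package's `FileScanner`, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path

# Extension -> TaskType name, copied from the list above (illustrative only).
EXTENSION_TO_TASK_TYPE = {
    "sql": "Sql", "psql": "Sql", "tsql": "Sql", "plpgsql": "Sql",
    "sh": "Shell",
    "ps1": "Powershell",
    "py": "Python",
    "graphql": "Graphql", "gql": "Graphql",
    "json": "Json", "jsonl": "Json",
    "yaml": "Yaml", "yml": "Yaml",
}


def scan_subtasks(base: Path) -> list[tuple[Path, str]]:
    """Recursively collect supported files under `base` with their task type."""
    found = []
    for path in sorted(base.rglob("*")):
        # Path.suffix includes the leading dot; this sketch ignores case (assumption).
        ext = path.suffix.lower().lstrip(".")
        if path.is_file() and ext in EXTENSION_TO_TASK_TYPE:
            found.append((path, EXTENSION_TO_TASK_TYPE[ext]))
    return found
```

Files with unlisted extensions (for example `notes.txt`) are skipped rather than typed as `Other` in this sketch.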

Folder conventions

Classification is based on the file path relative to a base directory.

Expected relative folder depth: up to three folder levels between <base> and the file.

Typical pattern:

<base>/<entity>/<stage>/<system>/<task_file>

Examples:

  • customers/01_extract/pg/extract_data.sql
  • orders/02_transform/duck/normalize.py

Common tasks

A file directly under <base> is treated as a common task:

<base>/shared.yaml
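Path-based classification can be sketched as follows, assuming the folder levels map positionally to entity, stage and system and that a file directly under <base> is common. `classify` and `Classification` are hypothetical names. The real `FileClassifier` may also resolve aliases and may treat shallower or deeper paths differently; rejecting more than three levels is this sketch's own choice.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Classification:
    entity: Optional[str]
    stage: Optional[str]
    system: Optional[str]
    is_common: bool


def classify(base: Path, file_path: Path) -> Classification:
    # Folders between <base> and the file, e.g. ("customers", "01_extract", "pg").
    parts = file_path.relative_to(base).parts[:-1]
    if len(parts) > 3:
        # Sketch-only policy: reject paths deeper than the documented convention.
        raise ValueError(f"expected at most 3 folders, got {len(parts)}: {file_path}")
    if not parts:
        # File sits directly under <base>: a common task.
        return Classification(None, None, None, is_common=True)
    # Assumption: missing trailing levels on shallower paths are left as None.
    entity, stage, system = (list(parts) + [None, None, None])[:3]
    return Classification(entity, stage, system, is_common=False)
```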

Enums and aliases

EtlStage

  • Setup
  • Extract
  • Transform
  • Load
  • Cleanup
  • Postprocessing
  • Other

Recognized aliases include names such as the following, among others:

  • 01_extract, extract, e, 01 (all resolving to Extract)
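Alias resolution can be pictured as a case-insensitive lookup table that falls back to `Other`. Only the Extract aliases come from this page; `Stage` is a stand-in for `EtlStage`, `resolve_stage` is hypothetical, and a complete table would need the package's full alias list.

```python
from enum import Enum


class Stage(Enum):
    # Stand-in for subtask_manager.EtlStage; members mirror the list above.
    Setup = "setup"
    Extract = "extract"
    Transform = "transform"
    Load = "load"
    Cleanup = "cleanup"
    Postprocessing = "postprocessing"
    Other = "other"


# Canonical lowercase names resolve to themselves ("transform" -> Transform).
STAGE_ALIASES = {member.value: member for member in Stage}
# Extra aliases documented for Extract; the package recognizes more than these.
STAGE_ALIASES.update({"01_extract": Stage.Extract, "e": Stage.Extract, "01": Stage.Extract})


def resolve_stage(folder_name: str) -> Stage:
    # Case-insensitive lookup; unknown folder names fall back to Other.
    return STAGE_ALIASES.get(folder_name.strip().lower(), Stage.Other)
```

Falling back to `Other` instead of raising matches the presence of an `Other` member in both `EtlStage` and `SystemType`, though whether the package resolves unknown names exactly this way is an assumption.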

SystemType

Includes:

  • PostgreSQL, Duckdb, Clickhouse, MySQL, OracleDB, SQLite, SqlServer, Vertica, Other

Example aliases, among others:

  • pg, postgres (PostgreSQL)
  • duck, duckdb (Duckdb)

TaskType

  • Sql, Shell, Powershell, Python, Graphql, Json, Yaml, Other

Quick usage

from pathlib import Path
from subtask_manager import SubtaskManager, EtlStage, SystemType, ParamType

base = Path("tests/test_data/subtasks")
sm = SubtaskManager(base)

print(sm.base_path)
print(sm.num_files)
print(sm.file_paths[:3])

# Lazy-loaded subtasks
tasks = sm.subtasks
print(len(tasks))

# Get a single task
task = sm.get_task("extract_data.sql")
print(task.name, task.entity, task.stage, task.system_type)

# Filter tasks
extract_pg = sm.get_tasks(
    etl_stage=EtlStage.Extract,
    system_type=SystemType.PostgreSQL,
    include_common=False,
)
print(len(extract_pg))

# Inspect parameter names
params = task.get_params()
print(params)

# Apply parameters immutably
rendered = task.apply_parameters(
    {"date": "2025-01-01", "env": "prod"},
    styles=[ParamType.Curly, ParamType.DollarBrace],
    ignore_missing=True,
)

print(rendered.get_command())

Parameter styles

Supported placeholder styles:

  • Curly: {name}
  • Dollar: $name
  • DollarBrace: ${name}
  • DoubleCurly: {{name}}
  • DoubleUnderscore: __name__
  • Percent: %name%
  • Angle: <name>
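One way to detect these styles is a regular expression per style. The patterns below are assumptions (the allowed characters in a name and whitespace handling are not documented here), and `extract_params` is a hypothetical stand-in for `get_params(styles=None)`. Lookarounds keep Curly from matching inside `${name}` or `{{name}}`.

```python
import re
from typing import Iterable, Optional

# Illustrative patterns for each documented style; the package's rules may differ.
STYLE_PATTERNS = {
    "Curly": re.compile(r"(?<![{$])\{(\w+)\}(?!\})"),
    "Dollar": re.compile(r"\$(\w+)"),
    "DollarBrace": re.compile(r"\$\{(\w+)\}"),
    "DoubleCurly": re.compile(r"\{\{(\w+)\}\}"),
    "DoubleUnderscore": re.compile(r"__([A-Za-z][A-Za-z0-9_]*?)__"),
    "Percent": re.compile(r"%(\w+)%"),
    "Angle": re.compile(r"<(\w+)>"),
}


def extract_params(text: str, styles: Optional[Iterable[str]] = None) -> set[str]:
    """Collect placeholder names; styles=None means every style."""
    selected = STYLE_PATTERNS if styles is None else {s: STYLE_PATTERNS[s] for s in styles}
    names: set[str] = set()
    for pattern in selected.values():
        names.update(pattern.findall(text))
    return names
```

With every style enabled, this sketch would also report Python dunders such as `__name__` or SQL `LIKE '%abc%'` fragments as parameters, which is one reason to pass `styles` explicitly.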

Useful methods:

  • subtask.get_params(styles=None) -> set[str]
  • subtask.apply_parameters(params, styles=None, ignore_missing=False) -> Subtask
  • subtask.render_with_params(params, styles=None, ignore_missing=False) -> RenderedSubtask
  • subtask.render() -> Subtask
  • subtask.render_lightweight() -> RenderedSubtask
  • subtask.get_stored_params() -> dict[str, str]
  • subtask.get_command() -> str | None
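Immutable parameter application can be illustrated with a frozen dataclass and `dataclasses.replace`. This hypothetical sketch handles only the Curly and DollarBrace styles used in Quick usage, in a single regex pass so substituted values are never re-scanned. `SketchSubtask` and `apply_params_sketch` are invented names, and the assumption that `ignore_missing=True` leaves unknown placeholders untouched is not confirmed by this page.

```python
import re
from dataclasses import dataclass, replace
from typing import Mapping

# ${name} or {name} (but not {{name}}), matched in one pass.
PLACEHOLDER = re.compile(r"\$\{(?P<dollar>\w+)\}|(?<![{$])\{(?P<curly>\w+)\}(?!\})")


@dataclass(frozen=True)
class SketchSubtask:
    name: str
    command: str


def apply_params_sketch(
    task: SketchSubtask, params: Mapping[str, str], ignore_missing: bool = False
) -> SketchSubtask:
    def substitute(match: re.Match) -> str:
        key = match.group("dollar") or match.group("curly")
        if key in params:
            return params[key]
        if ignore_missing:
            # Assumption: unknown placeholders are kept verbatim.
            return match.group(0)
        raise KeyError(f"missing parameter {key!r} in {task.name}")

    # replace() builds a new frozen instance; the original task is unchanged.
    return replace(task, command=PLACEHOLDER.sub(substitute, task.command))
```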

Public classes

  • SubtaskManager
  • Subtask
  • RenderedSubtask
  • FileScanner
  • FileClassifier
  • EtlStage
  • SystemType
  • TaskType
  • ParamType

Development

Prerequisites

  • Rust toolchain
  • Python 3.12+
  • uv (recommended) or pip
  • maturin

Install dev dependencies

uv sync --dev

Run tests

cargo test
uv run -m pytest

or:

make test

Lint/format (Python)

uv run ruff check .
uv run ruff format .

Build and release

Cross-platform wheel publishing is automated with GitHub Actions.

See the full runbook:

It documents:

  • TestPyPI dry runs
  • PyPI production release flow
  • Trusted Publishing setup
  • version/tag conventions

Versioning notes

Keep versions aligned between:

  • Cargo.toml ([package].version)
  • pyproject.toml ([project].version)

Use Makefile version helpers (if present) to bump consistently.


License

MIT



Download files


Source Distribution

subtask_manager-0.2.4.tar.gz (51.1 kB)

Uploaded Source

Built Distributions


subtask_manager-0.2.4-cp39-abi3-win_amd64.whl (859.8 kB)

Uploaded CPython 3.9+, Windows x86-64

subtask_manager-0.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)

Uploaded CPython 3.8, manylinux: glibc 2.17+ x86-64

File details

Details for the file subtask_manager-0.2.4.tar.gz.

File metadata

  • Download URL: subtask_manager-0.2.4.tar.gz
  • Upload date:
  • Size: 51.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for subtask_manager-0.2.4.tar.gz
Algorithm Hash digest
SHA256 e977a34ba0633e4c68f6a1b3a1ea1e8c7fa92d5410c4de3b7bc79c44471c90f5
MD5 298055ecc5a8c0e546d4b6b256717fdb
BLAKE2b-256 56b799bb36ce93be26c351c6363be732559a232db54a95969fa758c05a986ecd
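To check a downloaded file against the SHA256 digest listed above, a short `hashlib` script is enough. `sha256_of` and `verify` are illustrative helper names; the expected digest is the one given for subtask_manager-0.2.4.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 for subtask_manager-0.2.4.tar.gz, copied from the table above.
EXPECTED_SHA256 = "e977a34ba0633e4c68f6a1b3a1ea1e8c7fa92d5410c4de3b7bc79c44471c90f5"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large wheels need not fit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalise the expected value too.
    return sha256_of(path) == expected.lower()
```

pip can also enforce this itself: pinning `subtask-manager==0.2.4 --hash=sha256:<digest>` in a requirements file and installing with `--require-hashes` makes pip reject any file whose hash differs.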


Provenance

The following attestation bundles were made for subtask_manager-0.2.4.tar.gz:

Publisher: release.yml on VladimirKosimovsky/subtask-manager

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file subtask_manager-0.2.4-cp39-abi3-win_amd64.whl.


File hashes

Hashes for subtask_manager-0.2.4-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 3f7ea6beb77ab4f5a7de541659fe673e2f2c98c06e23b36a794b14c49218e96f
MD5 6bcc76b7f3628c1e218f226b5491a5f7
BLAKE2b-256 7128ae4216b394f36e52ffdd4c752ea09442a9a34d2897218bbefc1256060224


Provenance

The following attestation bundles were made for subtask_manager-0.2.4-cp39-abi3-win_amd64.whl:

Publisher: release.yml on VladimirKosimovsky/subtask-manager


File details

Details for the file subtask_manager-0.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for subtask_manager-0.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7bbdace76c1e8092badad45b1c0a11d75e45bba9d6c6fb3fd19b46568c06d726
MD5 5c370d2d11f4fc21d40efd6108f316ad
BLAKE2b-256 e0a23c5ffdca88134143c78076ecfa0f55023047b57d875d220143eb78962c8a


Provenance

The following attestation bundles were made for subtask_manager-0.2.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on VladimirKosimovsky/subtask-manager

