
ETL Subtask management library written in Rust


subtask-manager

subtask-manager is a Rust-powered Python package for discovering, classifying, loading, and rendering ETL subtasks from a filesystem structure.

It is designed for ETL projects where task metadata is encoded in folder names (entity, stage, system) and task content lives in files (.sql, .py, .sh, etc.).


Features

  • Fast core implementation in Rust (PyO3 extension module)
  • Python-friendly API
  • Recursive file scanning by supported extensions
  • Automatic classification of tasks from folder structure
  • Lazy loading of task contents
  • Rich filtering (stage, entity, system_type, task_type, is_common)
  • Parameter extraction and rendering with multiple placeholder styles
  • Immutable parameter application (returns new objects)
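Lazy loading means a task's file contents are read only when they are first needed. The pure-Python model below illustrates the idea. It is not the library's Rust implementation: the `LazyTask` class, its `command` property and its `reads` counter are hypothetical names used only for this sketch.

```python
from functools import cached_property
from pathlib import Path


class LazyTask:
    """Hypothetical model of a task whose file contents are read on first access."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.reads = 0  # counts disk reads, to make the caching visible

    @cached_property
    def command(self) -> str:
        # Runs only on the first access; the result is then cached on the instance.
        self.reads += 1
        return self.path.read_text(encoding="utf-8")
```

Accessing `command` twice reads the file once, so scanning a large tree stays cheap until task contents are actually used.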

Installation

From PyPI

pip install subtask-manager

From source (local dev)

# Build extension and install in editable/dev mode
maturin develop

Or build wheels:

maturin build --release

Supported task types (by extension)

  • SQL: sql, psql, tsql, plpgsql
  • Shell: sh
  • PowerShell: ps1
  • Python: py
  • GraphQL: graphql, gql
  • JSON: json, jsonl
  • YAML: yaml, yml
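The extension list above amounts to a lookup table from file suffix to task type. Below is a minimal sketch of that lookup plus a recursive scan, assuming exact, case-sensitive suffix matching; `EXTENSION_TO_TASK_TYPE`, `task_type_for` and `scan` are hypothetical helpers, not part of the package's API.

```python
from pathlib import Path
from typing import List, Optional, Union

# Hypothetical table mirroring the supported extensions listed above.
EXTENSION_TO_TASK_TYPE = {
    "sql": "Sql", "psql": "Sql", "tsql": "Sql", "plpgsql": "Sql",
    "sh": "Shell",
    "ps1": "Powershell",
    "py": "Python",
    "graphql": "Graphql", "gql": "Graphql",
    "json": "Json", "jsonl": "Json",
    "yaml": "Yaml", "yml": "Yaml",
}


def task_type_for(path: Union[str, Path]) -> Optional[str]:
    """Return the task type name, or None if the extension is not supported."""
    ext = Path(path).suffix.lstrip(".")  # "normalize.py" -> "py"
    return EXTENSION_TO_TASK_TYPE.get(ext)


def scan(base: Path) -> List[Path]:
    """Recursively collect files whose extension is supported (sketch)."""
    return sorted(p for p in base.rglob("*") if p.is_file() and task_type_for(p))
```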

Folder conventions

Classification is based on the file path relative to a base directory.

Expected relative folder depth: up to 3 components before the file.

Typical pattern:

<base>/<entity>/<stage>/<system>/<task_file>

Examples:

  • customers/01_extract/pg/extract_data.sql
  • orders/02_transform/duck/normalize.py

Common tasks

A file directly under <base> is treated as a common task:

<base>/shared.yaml
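The folder convention can be read as "take up to three folder names in front of the file; no folders means a common task". The sketch below shows that split on forward-slash relative paths. `Classification` and `classify` are hypothetical, and filling the slots left to right for shallower paths (for example `entity/stage/file`) is an assumption, not documented behaviour.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    entity: Optional[str]
    stage: Optional[str]
    system: Optional[str]
    is_common: bool


def classify(relative_path: str) -> Classification:
    """Split a base-relative path into entity/stage/system (hypothetical sketch)."""
    folders = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    if not folders:
        # File directly under <base>: treated as a common task.
        return Classification(None, None, None, True)
    # Pad to three slots so shallower paths still unpack; extra depth is ignored.
    entity, stage, system = (list(folders[:3]) + [None, None, None])[:3]
    return Classification(entity, stage, system, False)
```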

Enums and aliases

EtlStage

  • Setup
  • Extract
  • Transform
  • Load
  • Cleanup
  • Postprocessing
  • Other

Recognized aliases include, for example (the list is not exhaustive):

  • 01_extract, extract, e, 01
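Aliases such as `01_extract` combine a numeric ordering prefix with a stage word. Below is a minimal normalization sketch, assuming case-insensitive matching and an `Other` fallback. Only the Extract aliases come from this README; matching the plain stage names is an assumption, and `STAGE_ALIASES` and `normalize_stage` are hypothetical.

```python
import re

# Hypothetical alias table: "01" and "e" come from the README, plain names are assumed.
STAGE_ALIASES = {
    "01": "Extract",
    "e": "Extract",
    **{name.lower(): name for name in (
        "Setup", "Extract", "Transform", "Load", "Cleanup", "Postprocessing", "Other"
    )},
}

NUMBERED = re.compile(r"(\d+)_(\w+)")  # e.g. "01_extract" -> ("01", "extract")


def normalize_stage(folder: str) -> str:
    """Map a stage folder name to a stage name, falling back to "Other"."""
    key = folder.strip().lower()
    if key in STAGE_ALIASES:
        return STAGE_ALIASES[key]
    match = NUMBERED.fullmatch(key)
    if match:
        prefix, word = match.groups()
        # Prefer the word after the prefix, then the bare number.
        return STAGE_ALIASES.get(word) or STAGE_ALIASES.get(prefix, "Other")
    return "Other"
```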

SystemType

Includes:

  • PostgreSQL, Duckdb, Clickhouse, MySQL, OracleDB, SQLite, SqlServer, Vertica, Other

Example aliases:

  • pg, postgres, duck, duckdb, etc.
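System folders can be resolved with an explicit alias table first and a case-insensitive match on the member names second. The sketch uses a stand-in `System` enum, not the package's `SystemType`. Only the `pg`, `postgres`, `duck` and `duckdb` aliases come from this README; the name fallback and the `Other` default are assumptions.

```python
from enum import Enum, auto


class System(Enum):
    """Hypothetical stand-in for SystemType; member names copied from the list above."""
    PostgreSQL = auto()
    Duckdb = auto()
    Clickhouse = auto()
    MySQL = auto()
    OracleDB = auto()
    SQLite = auto()
    SqlServer = auto()
    Vertica = auto()
    Other = auto()


SYSTEM_ALIASES = {
    "pg": System.PostgreSQL, "postgres": System.PostgreSQL,
    "duck": System.Duckdb, "duckdb": System.Duckdb,
}


def parse_system(folder: str) -> System:
    key = folder.lower()
    if key in SYSTEM_ALIASES:
        return SYSTEM_ALIASES[key]
    # Assumed fallback: match a member name case-insensitively, e.g. "clickhouse".
    for member in System:
        if member.name.lower() == key:
            return member
    return System.Other
```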

TaskType

  • Sql, Shell, Powershell, Python, Graphql, Json, Yaml, Other

Quick usage

from pathlib import Path
from subtask_manager import SubtaskManager, EtlStage, SystemType, ParamType

base = Path("tests/test_data/subtasks")
sm = SubtaskManager(base)

print(sm.base_path)
print(sm.num_files)
print(sm.file_paths[:3])

# Lazy-loaded subtasks
tasks = sm.subtasks
print(len(tasks))

# Get a single task
task = sm.get_task("extract_data.sql")
print(task.name, task.entity, task.stage, task.system_type)

# Filter tasks
extract_pg = sm.get_tasks(
    etl_stage=EtlStage.Extract,
    system_type=SystemType.PostgreSQL,
    include_common=False,
)
print(len(extract_pg))

# Inspect parameter names
params = task.get_params()
print(params)

# Apply parameters immutably
rendered = task.apply_parameters(
    {"date": "2025-01-01", "env": "prod"},
    styles=[ParamType.Curly, ParamType.DollarBrace],
    ignore_missing=True,
)

print(rendered.get_command())
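To try the quick-usage code without the repository's `tests/test_data/subtasks` folder, you could first build a small tree that follows the folder conventions. The file names and contents below are made up for illustration, and `FIXTURE` and `build_fixture` are hypothetical helpers. The sketch uses only the standard library and does not call the package itself.

```python
import tempfile
from pathlib import Path

# Hypothetical fixture: paths follow <entity>/<stage>/<system>/<file>, plus one common task.
FIXTURE = {
    "customers/01_extract/pg/extract_data.sql": "SELECT * FROM customers WHERE updated_at >= '{date}';\n",
    "orders/02_transform/duck/normalize.py": "ENV = '{env}'\n",
    "shared.yaml": "env: ${env}\n",
}


def build_fixture(base: Path) -> list:
    """Write FIXTURE under base and return the created file paths."""
    created = []
    for rel, content in FIXTURE.items():
        path = base / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        created.append(path)
    return created


base = Path(tempfile.mkdtemp(prefix="subtasks_"))
files = build_fixture(base)
# With the package installed, SubtaskManager(base) should then pick up these files.
```

The temporary directory is not deleted automatically; remove it with `shutil.rmtree(base)` when done.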

Parameter styles

Supported placeholder styles:

  • Curly: {name}
  • Dollar: $name
  • DollarBrace: ${name}
  • DoubleCurly: {{name}}
  • DoubleUnderscore: __name__
  • Percent: %name%
  • Angle: <name>
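Each placeholder style can be pictured as a regular expression whose capture group is the parameter name. The sketch below collects names per style. The patterns are assumptions, including how `{x}` is kept from matching inside `{{x}}` or `${x}`, and the library's own matching rules may differ. `PATTERNS` and `get_params` here are illustrative, not the package's implementation.

```python
import re
from typing import Iterable, Optional, Set

# Hypothetical patterns, one per style listed above.
PATTERNS = {
    "Curly": re.compile(r"(?<![{$])\{(\w+)\}(?!\})"),  # {name}, not {{name}} or ${name}
    "Dollar": re.compile(r"\$(\w+)"),                   # $name, not ${name}
    "DollarBrace": re.compile(r"\$\{(\w+)\}"),
    "DoubleCurly": re.compile(r"\{\{(\w+)\}\}"),
    "DoubleUnderscore": re.compile(r"__(\w+?)__"),
    "Percent": re.compile(r"%(\w+)%"),
    "Angle": re.compile(r"<(\w+)>"),
}


def get_params(text: str, styles: Optional[Iterable[str]] = None) -> Set[str]:
    """Collect placeholder names for the given styles (all styles if None)."""
    names: Set[str] = set()
    for style in styles or PATTERNS:
        names.update(PATTERNS[style].findall(text))
    return names
```

Naive patterns like these can produce false positives, for example `%abc%` inside a SQL `LIKE` clause, which is one reason to restrict `styles` to the ones a task actually uses.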

Useful methods:

  • subtask.get_params(styles=None) -> set[str]
  • subtask.apply_parameters(params, styles=None, ignore_missing=False) -> Subtask
  • subtask.render_with_params(params, styles=None, ignore_missing=False) -> RenderedSubtask
  • subtask.render() -> Subtask
  • subtask.render_lightweight() -> RenderedSubtask
  • subtask.get_stored_params() -> dict[str, str]
  • subtask.get_command() -> str | None
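"Immutable parameter application" means `apply_parameters` returns a new object and leaves the original untouched. The sketch below models that with a frozen dataclass and `dataclasses.replace`. `Task` is a hypothetical stand-in for `Subtask`, only the Curly style is handled, and leaving unknown placeholders in place when `ignore_missing=True` (raising `KeyError` otherwise) is an assumption about the semantics.

```python
import re
from dataclasses import dataclass, replace
from typing import Mapping

CURLY = re.compile(r"(?<![{$])\{(\w+)\}(?!\})")  # Curly style only, for brevity


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for Subtask: just a name and a command string."""
    name: str
    command: str

    def apply_parameters(self, params: Mapping[str, str], ignore_missing: bool = False) -> "Task":
        def substitute(match: re.Match) -> str:
            key = match.group(1)
            if key in params:
                return str(params[key])
            if ignore_missing:
                return match.group(0)  # keep the placeholder as-is
            raise KeyError(f"missing parameter: {key}")

        # replace() builds a new frozen instance; self is never modified.
        return replace(self, command=CURLY.sub(substitute, self.command))
```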

Public classes

  • SubtaskManager
  • Subtask
  • RenderedSubtask
  • FileScanner
  • FileClassifier
  • EtlStage
  • SystemType
  • TaskType
  • ParamType

Development

Prerequisites

  • Rust toolchain
  • Python 3.12+
  • uv (recommended) or pip
  • maturin

Install dev dependencies

uv sync --dev

Run tests

cargo test
uv run -m pytest

or:

make test

Lint/format (Python)

uv run ruff check .
uv run ruff format .

Build and release

Cross-platform wheel publishing is automated with GitHub Actions.

See the full release runbook, which documents:

  • TestPyPI dry runs
  • PyPI production release flow
  • Trusted Publishing setup
  • Version/tag conventions

Versioning notes

Keep versions aligned between:

  • Cargo.toml ([package].version)
  • pyproject.toml ([project].version)

Use the Makefile version helpers (if present) to bump both versions consistently.


License

MIT



Download files


Source Distribution

subtask_manager-0.3.0.tar.gz (51.4 kB)

Uploaded Source

Built Distributions


subtask_manager-0.3.0-cp39-abi3-win_amd64.whl (860.5 kB)

Uploaded CPython 3.9+, Windows x86-64

subtask_manager-0.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)

Uploaded CPython 3.9+, manylinux: glibc 2.17+ x86-64

subtask_manager-0.3.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB)

Uploaded CPython 3.9+, manylinux: glibc 2.17+ ARM64

subtask_manager-0.3.0-cp39-abi3-macosx_11_0_arm64.whl (950.1 kB)

Uploaded CPython 3.9+, macOS 11.0+ ARM64

File details

Details for the file subtask_manager-0.3.0.tar.gz.

File metadata

  • Download URL: subtask_manager-0.3.0.tar.gz
  • Upload date:
  • Size: 51.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for subtask_manager-0.3.0.tar.gz
Algorithm Hash digest
SHA256 8fe802ac8f83207136c44a3f865dd5fe5f17c9c20cd09351092f9c364b27d118
MD5 9b4289eab1b4ff23b6dffe0284d06171
BLAKE2b-256 7aab744d553537a9a629c675b6be65fe35415d8432d56848f52bb77d0bf2d34d
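To check a downloaded file against the SHA256 digests listed here, hash it locally and compare. Below is a minimal sketch using the standard `hashlib` module. `sha256_of` and `verify` are illustrative helper names, and the usage line assumes the sdist was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


# Example (assumes the sdist is in the current directory):
# verify(Path("subtask_manager-0.3.0.tar.gz"),
#        "8fe802ac8f83207136c44a3f865dd5fe5f17c9c20cd09351092f9c364b27d118")
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).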

Provenance

The following attestation bundles were made for subtask_manager-0.3.0.tar.gz:

Publisher: release.yml on VladimirKosimovsky/subtask-manager

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file subtask_manager-0.3.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for subtask_manager-0.3.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 5c60069f1355cd990b939004e84a8dda2928bb7ebd9585929376e5006e39b2eb
MD5 ba849fd972dcdf98261ade242852c2e6
BLAKE2b-256 46d79b59c65cc0ac799546f0f060852af79e2f2dfce47b6fc976c96cfd30c96d

Provenance

The following attestation bundles were made for subtask_manager-0.3.0-cp39-abi3-win_amd64.whl:

Publisher: release.yml on VladimirKosimovsky/subtask-manager

File details

Details for the file subtask_manager-0.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for subtask_manager-0.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5334e2b9d643c2ae60a167bf7a1f3b9707c2b377183ed0a76e1f16987337a1e0
MD5 73a517c0a1ed60a49bced6309cbf6aa3
BLAKE2b-256 b60721c9970b1e95d3ee5f6e47a8b5512c3e33a33c7ab6a329cc2ab43764e86b

Provenance

The following attestation bundles were made for subtask_manager-0.3.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on VladimirKosimovsky/subtask-manager

File details

Details for the file subtask_manager-0.3.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for subtask_manager-0.3.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 feb3cf9553befd1bb5031db8055455a11dabbd3fce4bf3bd92502c0e52dece0a
MD5 76241c0587ab02f2978a31dc107854b9
BLAKE2b-256 9141694ea67add43312d8eb582cae9a37c805364ba49c6c2cc3020981e601182

Provenance

The following attestation bundles were made for subtask_manager-0.3.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on VladimirKosimovsky/subtask-manager

File details

Details for the file subtask_manager-0.3.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for subtask_manager-0.3.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9087f776072e3a54627d298d24ab0769ac5d772f5f251df477c9dca88c7c21b6
MD5 2eba2094c1888d9f8e507d20df28c21a
BLAKE2b-256 fdd191e62b321bdaf8b7a99800a0ff9c25a254d7c426dbde0ac36ef06f0191e6

Provenance

The following attestation bundles were made for subtask_manager-0.3.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on VladimirKosimovsky/subtask-manager
