ETL Subtask management library written in Rust

subtask-manager

subtask-manager is a Rust-powered Python package for discovering, classifying, loading, and rendering ETL subtasks from a filesystem structure.

It is designed for ETL projects where task metadata is encoded in folder names (entity, stage, system) and task content lives in files (.sql, .py, .sh, etc.).


Features

  • Fast core implementation in Rust (PyO3 extension module)
  • Python-friendly API
  • Recursive file scanning by supported extensions
  • Automatic classification of tasks from folder structure
  • Lazy loading of task contents
  • Rich filtering (stage, entity, system_type, task_type, is_common)
  • Parameter extraction and rendering with multiple placeholder styles
  • Immutable parameter application (returns new objects)

Installation

From PyPI

pip install subtask-manager

From source (local dev)

# Build extension and install in editable/dev mode
maturin develop

Or build wheels:

maturin build --release

Supported task types (by extension)

  • SQL: sql, psql, tsql, plpgsql
  • Shell: sh
  • PowerShell: ps1
  • Python: py
  • GraphQL: graphql, gql
  • JSON: json, jsonl
  • YAML: yaml, yml
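
The extension list above can be read as a simple lookup table. The following is a minimal sketch in plain Python, not the package's Rust implementation: the names `EXTENSION_TO_TASK_TYPE`, `task_type_for`, and `scan` are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extension -> task type name, copied from the list above.
EXTENSION_TO_TASK_TYPE = {
    "sql": "Sql", "psql": "Sql", "tsql": "Sql", "plpgsql": "Sql",
    "sh": "Shell",
    "ps1": "Powershell",
    "py": "Python",
    "graphql": "Graphql", "gql": "Graphql",
    "json": "Json", "jsonl": "Json",
    "yaml": "Yaml", "yml": "Yaml",
}


def task_type_for(path):
    """Return the task type name for a file, or None if the extension is not listed."""
    ext = Path(path).suffix.lstrip(".").lower()  # assumption: case-insensitive match
    return EXTENSION_TO_TASK_TYPE.get(ext)


def scan(base):
    """Recursively collect files under `base` that have a listed extension."""
    return sorted(
        p for p in Path(base).rglob("*")
        if p.is_file() and task_type_for(p) is not None
    )


print(task_type_for("customers/01_extract/pg/extract_data.sql"))
print(task_type_for("notes.txt"))
```

Keeping the table as data makes it easy to compare against the list above when new extensions are added.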

Folder conventions

Classification is based on the file path relative to a base directory.

Expected relative folder depth: up to 3 components before the file.

Typical pattern:

<base>/<entity>/<stage>/<system>/<task_file>

Examples:

  • customers/01_extract/pg/extract_data.sql
  • orders/02_transform/duck/normalize.py

Common tasks

A file directly under <base> is treated as a common task:

<base>/shared.yaml
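
The conventions above amount to splitting the relative path into folder labels. Below is a rough sketch of that idea in plain Python. `Classified` and `classify` are hypothetical names, and two behaviors are assumptions the text above does not specify: folders are mapped positionally to entity, stage, and system, with missing levels left as `None`, and any folder beyond the third is ignored.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class Classified(NamedTuple):  # hypothetical container for this sketch
    name: str
    entity: Optional[str]
    stage: Optional[str]
    system: Optional[str]
    is_common: bool


def classify(relative_path: str) -> Classified:
    """Split a path relative to <base> into raw folder labels."""
    *folders, name = PurePosixPath(relative_path).parts
    if not folders:
        # A file directly under <base> is treated as a common task.
        return Classified(name, None, None, None, True)
    # Assumption: positional mapping; missing levels stay None,
    # levels beyond the third are ignored.
    entity, stage, system = (folders + [None, None, None])[:3]
    return Classified(name, entity, stage, system, False)


print(classify("customers/01_extract/pg/extract_data.sql"))
print(classify("shared.yaml"))
```

The labels returned here are still raw folder names such as `01_extract` or `pg`; mapping them to enum values is a separate alias lookup.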

Enums and aliases

EtlStage

  • Setup
  • Extract
  • Transform
  • Load
  • Cleanup
  • Postprocessing
  • Other

Stage folder names are matched against recognized aliases. For Extract, for example:

  • 01_extract, extract, e, 01
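
Alias resolution of this kind can be pictured as a dictionary lookup. The sketch below uses a local `Stage` enum as a stand-in for `EtlStage` and fills in only the Extract aliases listed above; the aliases for the other stages are not reproduced here. Lowercasing the folder name and falling back to `Other` for unknown names are assumptions, and `resolve_stage` is a hypothetical helper.

```python
from enum import Enum


class Stage(Enum):
    """Local stand-in mirroring the EtlStage member names listed above."""
    Setup = "Setup"
    Extract = "Extract"
    Transform = "Transform"
    Load = "Load"
    Cleanup = "Cleanup"
    Postprocessing = "Postprocessing"
    Other = "Other"


# Only the documented Extract aliases; intentionally incomplete.
STAGE_ALIASES = {
    "01_extract": Stage.Extract,
    "extract": Stage.Extract,
    "e": Stage.Extract,
    "01": Stage.Extract,
}


def resolve_stage(folder_name: str) -> Stage:
    key = folder_name.strip().lower()  # assumption: case-insensitive matching
    return STAGE_ALIASES.get(key, Stage.Other)  # assumption: unknown -> Other


print(resolve_stage("01_extract"))
print(resolve_stage("misc"))
```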

SystemType

Includes:

  • PostgreSQL, Duckdb, Clickhouse, MySQL, OracleDB, SQLite, SqlServer, Vertica, Other

Example aliases:

  • pg, postgres, duck, duckdb, etc.

TaskType

  • Sql, Shell, Powershell, Python, Graphql, Json, Yaml, Other

Quick usage

from pathlib import Path
from subtask_manager import SubtaskManager, EtlStage, SystemType, ParamType

base = Path("tests/test_data/subtasks")
sm = SubtaskManager(base)

print(sm.base_path)
print(sm.num_files)
print(sm.file_paths[:3])

# Lazy-loaded subtasks
tasks = sm.subtasks
print(len(tasks))

# Get a single task
task = sm.get_task("extract_data.sql")
print(task.name, task.entity, task.stage, task.system_type)

# Filter tasks
extract_pg = sm.get_tasks(
    etl_stage=EtlStage.Extract,
    system_type=SystemType.PostgreSQL,
    include_common=False,
)
print(len(extract_pg))

# Inspect parameter names
params = task.get_params()
print(params)

# Apply parameters immutably
rendered = task.apply_parameters(
    {"date": "2025-01-01", "env": "prod"},
    styles=[ParamType.Curly, ParamType.DollarBrace],
    ignore_missing=True,
)

print(rendered.get_command())
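
The snippet above points at `tests/test_data/subtasks`, which is only available in a checkout of the repository. To experiment elsewhere, a small tree that follows the folder conventions can be created with the standard library. This is a sketch: the helper name `make_sample_tree` and the file contents are made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical sample files; paths follow <entity>/<stage>/<system>/<task_file>.
SAMPLE_FILES = {
    "customers/01_extract/pg/extract_data.sql":
        "SELECT * FROM customers WHERE updated_at >= '{date}';\n",
    "orders/02_transform/duck/normalize.py":
        "print('normalizing orders for ${env}')\n",
    "shared.yaml": "env: ${env}\n",  # common task directly under the base
}


def make_sample_tree(base: Path) -> list:
    """Write SAMPLE_FILES under `base` and return the created paths."""
    created = []
    for rel, content in SAMPLE_FILES.items():
        path = base / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        created.append(path)
    return created


base = Path(tempfile.mkdtemp(prefix="subtasks_"))
for path in make_sample_tree(base):
    print(path.relative_to(base))
# `base` can then be passed to SubtaskManager(base) as in the snippet above.
```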

Parameter styles

Supported placeholder styles:

  • Curly: {name}
  • Dollar: $name
  • DollarBrace: ${name}
  • DoubleCurly: {{name}}
  • DoubleUnderscore: __name__
  • Percent: %name%
  • Angle: <name>
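
To make the differences between these styles concrete, here is a minimal regular-expression sketch that pulls placeholder names out of a string. It is not the package's parser. The patterns, the rule that a name is a letter or underscore followed by word characters, and treating `styles=None` as "all styles" are assumptions, and `extract_params` is a hypothetical helper.

```python
import re

NAME = r"[A-Za-z_]\w*"  # assumption: what counts as a parameter name

# One pattern per style; the lookarounds keep Curly from also matching
# the inside of ${name} or {{name}}.
PATTERNS = {
    "Curly": re.compile(r"(?<![{$])\{(" + NAME + r")\}(?!\})"),
    "Dollar": re.compile(r"\$(" + NAME + r")"),
    "DollarBrace": re.compile(r"\$\{(" + NAME + r")\}"),
    "DoubleCurly": re.compile(r"\{\{(" + NAME + r")\}\}"),
    "DoubleUnderscore": re.compile(r"__([A-Za-z]\w*?)__"),
    "Percent": re.compile(r"%(" + NAME + r")%"),
    "Angle": re.compile(r"<(" + NAME + r")>"),
}


def extract_params(text, styles=None):
    """Return the set of placeholder names found for the given styles."""
    selected = PATTERNS if styles is None else {s: PATTERNS[s] for s in styles}
    found = set()
    for pattern in selected.values():
        found.update(pattern.findall(text))
    return found


sql = "SELECT * FROM t WHERE d = '{date}' AND env = '${env}' AND x = {{tmpl}} AND u = $user"
print(sorted(extract_params(sql)))
print(sorted(extract_params(sql, ["Curly"])))
```

Restricting `styles` matters for SQL and shell content, where characters such as `<`, `%`, and `$` often appear for reasons unrelated to templating.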

Useful methods:

  • subtask.get_params(styles=None) -> set[str]
  • subtask.apply_parameters(params, styles=None, ignore_missing=False) -> Subtask
  • subtask.render_with_params(params, styles=None, ignore_missing=False) -> RenderedSubtask
  • subtask.render() -> Subtask
  • subtask.render_lightweight() -> RenderedSubtask
  • subtask.get_stored_params() -> dict[str, str]
  • subtask.get_command() -> str | None
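
The "returns new objects" behavior of `apply_parameters` can be illustrated with a frozen dataclass. This is a sketch of the general pattern only: `SketchTask` is a hypothetical stand-in for `Subtask`, it handles the Curly style alone, and the meaning given to `ignore_missing` here (leave placeholders without a value untouched instead of raising) is an assumption.

```python
import re
from dataclasses import dataclass, replace

CURLY = re.compile(r"(?<![{$])\{([A-Za-z_]\w*)\}(?!\})")


@dataclass(frozen=True)
class SketchTask:  # hypothetical stand-in, not the package's Subtask
    command: str
    stored_params: tuple = ()

    def apply_parameters(self, params, ignore_missing=False):
        def substitute(match):
            name = match.group(1)
            if name in params:
                return str(params[name])
            if ignore_missing:
                return match.group(0)  # keep the placeholder as written
            raise KeyError(f"no value for placeholder {name!r}")

        rendered = CURLY.sub(substitute, self.command)
        # replace() builds a new instance; `self` is never modified.
        return replace(
            self,
            command=rendered,
            stored_params=tuple(sorted(params.items())),
        )


original = SketchTask("SELECT * FROM sales WHERE day = '{date}' AND env = '{env}'")
partial = original.apply_parameters({"date": "2025-01-01"}, ignore_missing=True)
print(partial.command)
print(original.command)  # the original still contains both placeholders
```

Because the original object is left alone, the same loaded task can be rendered several times with different parameter sets.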

Public classes

  • SubtaskManager
  • Subtask
  • RenderedSubtask
  • FileScanner
  • FileClassifier
  • EtlStage
  • SystemType
  • TaskType
  • ParamType

Development

Prerequisites

  • Rust toolchain
  • Python 3.12+
  • uv (recommended) or pip
  • maturin

Install dev dependencies

uv sync --dev

Run tests

cargo test
uv run -m pytest

or:

make test

Lint/format (Python)

uv run ruff check .
uv run ruff format .

Build and release

Cross-platform wheel publishing is automated with GitHub Actions.

The full release runbook documents:

  • TestPyPI dry runs
  • PyPI production release flow
  • Trusted Publishing setup
  • version/tag conventions

Versioning notes

Keep versions aligned between:

  • Cargo.toml ([package].version)
  • pyproject.toml ([project].version)

Use Makefile version helpers (if present) to bump consistently.


License

MIT

Download files

Source Distribution

  • subtask_manager-0.2.6.tar.gz (51.2 kB): Source

Built Distributions

  • subtask_manager-0.2.6-cp39-abi3-win_amd64.whl (860.2 kB): CPython 3.9+, Windows x86-64
  • subtask_manager-0.2.6-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.9+, manylinux (glibc 2.17+) x86-64
  • subtask_manager-0.2.6-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.9+, manylinux (glibc 2.17+) ARM64
  • subtask_manager-0.2.6-cp39-abi3-macosx_11_0_arm64.whl (950.3 kB): CPython 3.9+, macOS 11.0+ ARM64
  • subtask_manager-0.2.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64
