ETL Subtask management library written in Rust

subtask-manager

subtask-manager is a Rust-powered Python package for discovering, classifying, loading, and rendering ETL subtasks from a filesystem structure.

It is designed for ETL projects where task metadata is encoded in folder names (entity, stage, system) and task content lives in files (.sql, .py, .sh, etc.).


Features

  • Fast core implementation in Rust (PyO3 extension module)
  • Python-friendly API
  • Recursive file scanning by supported extensions
  • Automatic classification of tasks from folder structure
  • Lazy loading of task contents
  • Rich filtering (stage, entity, system_type, task_type, is_common)
  • Parameter extraction and rendering with multiple placeholder styles
  • Immutable parameter application (returns new objects)
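
As a conceptual illustration of the lazy-loading feature, the sketch below defers reading a file until its content is first requested and then caches it. LazyFile is a hypothetical stand-in written for this example; it is not the package's Rust implementation or its Subtask class.

```python
from functools import cached_property
from pathlib import Path


class LazyFile:
    """Hypothetical helper: holds a path and reads the file only on first access."""

    def __init__(self, path):
        self.path = Path(path)

    @cached_property
    def content(self) -> str:
        # Runs once; the result is then stored on the instance and reused.
        return self.path.read_text(encoding="utf-8")
```

Constructing a LazyFile does no I/O, so scanning many files stays cheap until a task's content is actually needed.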

Installation

From PyPI

pip install subtask-manager

From source (local dev)

# Build extension and install in editable/dev mode
maturin develop

Or build wheels:

maturin build --release

Supported task types (by extension)

  • SQL: sql, psql, tsql, plpgsql
  • Shell: sh
  • PowerShell: ps1
  • Python: py
  • GraphQL: graphql, gql
  • JSON: json, jsonl
  • YAML: yaml, yml
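
The extension list above can be read as a lookup table. The following is a minimal sketch of that idea in plain Python; the dictionary and the guess_task_type helper are illustrative names, and the case-insensitive match and the fallback to "Other" are assumptions rather than documented behavior.

```python
from pathlib import Path

# Extension table copied from the list above; the dict itself is illustrative.
TASK_TYPE_BY_EXTENSION = {
    "sql": "Sql", "psql": "Sql", "tsql": "Sql", "plpgsql": "Sql",
    "sh": "Shell",
    "ps1": "Powershell",
    "py": "Python",
    "graphql": "Graphql", "gql": "Graphql",
    "json": "Json", "jsonl": "Json",
    "yaml": "Yaml", "yml": "Yaml",
}


def guess_task_type(path: str) -> str:
    """Return a task type name for a file path, or "Other" if the extension is unknown."""
    ext = Path(path).suffix.lstrip(".").lower()  # assumption: matching ignores case
    return TASK_TYPE_BY_EXTENSION.get(ext, "Other")
```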

Folder conventions

Classification is based on the file path relative to a base directory.

Expected relative folder depth: up to 3 components before the file.

Typical pattern:

<base>/<entity>/<stage>/<system>/<task_file>

Examples:

  • customers/01_extract/pg/extract_data.sql
  • orders/02_transform/duck/normalize.py

Common tasks

A file directly under <base> is treated as a common task:

<base>/shared.yaml
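
To make the folder convention concrete, here is a small sketch that splits a path (already relative to the base directory) into its entity, stage, and system components. PathParts and split_task_path are hypothetical names for this example. How the package treats paths with only one or two folders is not stated above, so filling the fields left to right is an assumption.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class PathParts(NamedTuple):
    entity: Optional[str]
    stage: Optional[str]
    system: Optional[str]
    name: str
    is_common: bool


def split_task_path(relative_path: str) -> PathParts:
    """Split '<entity>/<stage>/<system>/<task_file>' into named parts."""
    *folders, name = PurePosixPath(relative_path).parts
    if len(folders) > 3:
        raise ValueError(f"more than 3 folder components: {relative_path}")
    # Assumption: missing components are left as None, filled left to right.
    entity, stage, system = list(folders) + [None] * (3 - len(folders))
    # A file with no folders sits directly under <base>: a common task.
    return PathParts(entity, stage, system, name, is_common=not folders)
```

With this sketch, customers/01_extract/pg/extract_data.sql yields entity customers, stage 01_extract, and system pg, while shared.yaml is flagged as common. The raw folder names would still need to be mapped to enum values.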

Enums and aliases

EtlStage

  • Setup
  • Extract
  • Transform
  • Load
  • Cleanup
  • Postprocessing
  • Other

Stages are recognized by folder-name aliases. For example, Extract matches names such as:

  • 01_extract, extract, e, 01

SystemType

Includes:

  • PostgreSQL, Duckdb, Clickhouse, MySQL, OracleDB, SQLite, SqlServer, Vertica, Other

Example aliases:

  • pg, postgres, duck, duckdb, etc.
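
Alias resolution can be pictured as a reverse lookup from folder name to canonical value. The sketch below uses only the aliases listed above. Grouping pg and postgres under PostgreSQL and duck and duckdb under Duckdb is inferred from the names, and the case-insensitive match and the "Other" fallback are assumptions. resolve_alias is a hypothetical helper, not part of the package.

```python
# Only aliases mentioned in the text; the full alias tables live in the package.
STAGE_ALIASES = {
    "Extract": {"01_extract", "extract", "e", "01"},
}
SYSTEM_ALIASES = {
    "PostgreSQL": {"pg", "postgres"},  # grouping inferred from the alias names
    "Duckdb": {"duck", "duckdb"},
}


def resolve_alias(folder_name: str, aliases: dict) -> str:
    """Map a folder name to its canonical value, or "Other" if nothing matches."""
    key = folder_name.strip().lower()  # assumption: matching ignores case
    for canonical, names in aliases.items():
        if key in names:
            return canonical
    return "Other"
```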

TaskType

  • Sql, Shell, Powershell, Python, Graphql, Json, Yaml, Other

Quick usage

from pathlib import Path
from subtask_manager import SubtaskManager, EtlStage, SystemType, ParamType

base = Path("tests/test_data/subtasks")
sm = SubtaskManager(base)

print(sm.base_path)
print(sm.num_files)
print(sm.file_paths[:3])

# Lazy-loaded subtasks
tasks = sm.subtasks
print(len(tasks))

# Get a single task
task = sm.get_task("extract_data.sql")
print(task.name, task.entity, task.stage, task.system_type)

# Filter tasks
extract_pg = sm.get_tasks(
    etl_stage=EtlStage.Extract,
    system_type=SystemType.PostgreSQL,
    include_common=False,
)
print(len(extract_pg))

# Inspect parameter names
params = task.get_params()
print(params)

# Apply parameters immutably
rendered = task.apply_parameters(
    {"date": "2025-01-01", "env": "prod"},
    styles=[ParamType.Curly, ParamType.DollarBrace],
    ignore_missing=True,
)

print(rendered.get_command())

Parameter styles

Supported placeholder styles:

  • Curly: {name}
  • Dollar: $name
  • DollarBrace: ${name}
  • DoubleCurly: {{name}}
  • DoubleUnderscore: __name__
  • Percent: %name%
  • Angle: <name>
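
The placeholder styles above can be approximated with regular expressions. This is a rough sketch for illustration, not the package's actual matching rules: the PATTERNS table and find_params helper are hypothetical, and the choice to keep Curly from matching inside ${name} or {{name}} is an assumption about how overlapping styles should be separated.

```python
import re

# One illustrative pattern per style; names are assumed to be word characters.
PATTERNS = {
    # Skip braces that belong to ${name} or {{name}}.
    "Curly": re.compile(r"(?<![${])\{(\w+)\}(?!\})"),
    "Dollar": re.compile(r"\$(\w+)"),
    "DollarBrace": re.compile(r"\$\{(\w+)\}"),
    "DoubleCurly": re.compile(r"\{\{(\w+)\}\}"),
    "DoubleUnderscore": re.compile(r"__(\w+?)__"),
    "Percent": re.compile(r"%(\w+)%"),
    "Angle": re.compile(r"<(\w+)>"),
}


def find_params(text, styles=None):
    """Collect parameter names found in text for the chosen styles (all by default)."""
    chosen = styles if styles is not None else list(PATTERNS)
    names = set()
    for style in chosen:
        names.update(PATTERNS[style].findall(text))
    return names
```

Restricting the styles matters in practice: shell scripts and SQL often contain $, % or angle brackets that are not placeholders, so scanning with every style at once can report false positives.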

Useful methods:

  • subtask.get_params(styles=None) -> set[str]
  • subtask.apply_parameters(params, styles=None, ignore_missing=False) -> Subtask
  • subtask.render_with_params(params, styles=None, ignore_missing=False) -> RenderedSubtask
  • subtask.render() -> Subtask
  • subtask.render_lightweight() -> RenderedSubtask
  • subtask.get_stored_params() -> dict[str, str]
  • subtask.get_command() -> str | None
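
The idea behind immutable parameter application is that the original object is never modified and each call returns a new one. The sketch below shows that pattern with a frozen dataclass. SketchTask is a hypothetical stand-in rather than the package's Subtask, it handles only the Curly style, and leaving unknown placeholders untouched when ignore_missing is true is an assumption about that flag.

```python
import re
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SketchTask:
    """Hypothetical stand-in for a subtask; not the library's Subtask class."""
    command: str
    stored_params: dict = field(default_factory=dict)

    def apply_parameters(self, params, ignore_missing=False):
        def substitute(match):
            name = match.group(1)
            if name in params:
                return str(params[name])
            if ignore_missing:
                return match.group(0)  # assumption: leave the placeholder as-is
            raise KeyError(f"missing parameter: {name}")

        new_command = re.sub(r"\{(\w+)\}", substitute, self.command)
        # replace() builds a new instance; self is left unchanged.
        return replace(
            self,
            command=new_command,
            stored_params={**self.stored_params, **params},
        )
```

Because the original stays intact, the same task can be rendered repeatedly with different parameter sets, for example once per run date.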

Public classes

  • SubtaskManager
  • Subtask
  • RenderedSubtask
  • FileScanner
  • FileClassifier
  • EtlStage
  • SystemType
  • TaskType
  • ParamType

Development

Prerequisites

  • Rust toolchain
  • Python 3.12+
  • uv (recommended) or pip
  • maturin

Install dev dependencies

uv sync --dev

Run tests

cargo test
uv run -m pytest

or:

make test

Lint/format (Python)

uv run ruff check .
uv run ruff format .

Build and release

Cross-platform wheel publishing is automated with GitHub Actions.

The full release runbook documents:

  • TestPyPI dry runs
  • PyPI production release flow
  • Trusted Publishing setup
  • version/tag conventions

Versioning notes

Keep versions aligned between:

  • Cargo.toml ([package].version)
  • pyproject.toml ([project].version)

Use Makefile version helpers (if present) to bump consistently.


License

MIT

Download files

Source Distribution

subtask_manager-0.2.3.tar.gz (50.9 kB)

Uploaded: Source

Built Distributions

subtask_manager-0.2.3-cp39-abi3-win_amd64.whl (858.0 kB)

Uploaded: CPython 3.9+, Windows x86-64

subtask_manager-0.2.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)

Uploaded: CPython 3.9+, manylinux: glibc 2.17+, x86-64

File details

Details for the file subtask_manager-0.2.3.tar.gz.

File metadata

  • Download URL: subtask_manager-0.2.3.tar.gz
  • Upload date:
  • Size: 50.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for subtask_manager-0.2.3.tar.gz
Algorithm Hash digest
SHA256 1172e8b287cd45e5a71f94dcdaaddd11fa89e78850af34666271fad1f536dc03
MD5 9d7ac42b19e85fc9a3f356e36a583337
BLAKE2b-256 2f7f9950fe672639a7f581791ca5dc8b060f6d47d942688a62b97f20378a7b0f

Provenance

The following attestation bundles were made for subtask_manager-0.2.3.tar.gz:

Publisher: release.yml on VladimirKosimovsky/subtask-manager

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file subtask_manager-0.2.3-cp39-abi3-win_amd64.whl.

File hashes

Hashes for subtask_manager-0.2.3-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 260b19a2a621ab5f1349e5ba49c3ba19e7b22bc076bbf5cb3fcf99edc8b19f07
MD5 1fac2d2ab5caef24b7cd58d8614c3de7
BLAKE2b-256 aee6a9808a7c1f6e2c54ba17a1df7a4bbc694010daea87be48f6396f4f1dbffb

Provenance

The following attestation bundles were made for subtask_manager-0.2.3-cp39-abi3-win_amd64.whl:

Publisher: release.yml on VladimirKosimovsky/subtask-manager

File details

Details for the file subtask_manager-0.2.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for subtask_manager-0.2.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5f4ee5dcaaba5bb63315b2ebfa8369b528fd798d3d298d9191c40d5b9047558a
MD5 a401235acd44f46943f73dcd3fdf832a
BLAKE2b-256 f6779d85e3164fe3f8cf669adaebf05d78284e3d4c2656966ea86b886ba6337a

Provenance

The following attestation bundles were made for subtask_manager-0.2.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on VladimirKosimovsky/subtask-manager
