YTsaurus pipeline framework with utilities and common modules

YT Framework


Overview

YT Framework is a Python framework for building and running data processing pipelines on YTsaurus (YT) clusters. It simplifies pipeline development with automatic stage discovery, a configuration switch between dev and prod modes, and built-in support for common YT operations.

Architecture

YT Framework follows a pipeline-based architecture where pipelines consist of stages, and stages execute operations.

Key Components:

  • Pipeline: Orchestrates stages, their execution order, and configuration management
  • Stages: Reusable units of work that execute operations
  • Operations: Specific tasks (Map, Vanilla, YQL, S3, Table operations)
  • Configuration: YAML-based configuration system for flexible pipeline setup
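The relationship between these components can be sketched with a toy model in plain Python. This is not the yt_framework API: `ToyPipeline`, `ToyStage`, `record`, and the stage and operation labels are hypothetical names invented here, and the only point is to show that a pipeline orders stages while each stage runs its own operations.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy model, NOT the yt_framework API. It only illustrates
# "pipelines consist of stages, and stages execute operations".

Operation = Callable[[dict], dict]  # an operation transforms a shared context


@dataclass
class ToyStage:
    name: str
    operations: list[Operation] = field(default_factory=list)

    def run(self, context: dict) -> dict:
        # A stage executes its operations in order.
        for operation in self.operations:
            context = operation(context)
        return context


@dataclass
class ToyPipeline:
    stages: list[ToyStage]
    enabled_stages: list[str]  # mirrors the idea of a configured stage list

    def run(self) -> dict:
        context: dict = {"log": []}
        by_name = {stage.name: stage for stage in self.stages}
        # The pipeline decides which stages run and in what order.
        for name in self.enabled_stages:
            context = by_name[name].run(context)
        return context


def record(label: str) -> Operation:
    """Build an operation that appends a label to the context log."""
    def operation(context: dict) -> dict:
        context["log"].append(label)
        return context
    return operation


def demo() -> list[str]:
    pipeline = ToyPipeline(
        stages=[
            ToyStage("prepare", [record("prepare:map")]),
            ToyStage("train", [record("train:vanilla"), record("train:upload")]),
        ],
        enabled_stages=["prepare", "train"],
    )
    return pipeline.run()["log"]


if __name__ == "__main__":
    print(demo())
```

In the real framework the stage list comes from YAML configuration rather than a constructor argument; the toy version keeps everything in one file so the control flow is visible.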

Key Features

  • Pipeline & Stage Architecture: Organize complex workflows into reusable stages
  • Automatic Stage Discovery: No manual registration needed - just create stages and run
  • Dev/Prod Modes: Develop locally with file system simulation, deploy to YT cluster seamlessly
  • Multiple Operation Types: Support for Map, Vanilla, YQL, S3, and Table operations
  • Code Upload: Automatic code packaging and deployment to YT cluster
  • Docker Support: Custom Docker images for special dependencies
  • Checkpoint Management: Built-in support for ML model checkpoints
  • Configuration Management: Flexible YAML-based configuration with multiple config support
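As an illustration of what automatic stage discovery can look like, here is a minimal sketch that scans a `stages/<name>/stage.py` layout and loads the first class defining `run()`. It is an assumption about the general technique, not the framework's actual implementation; `discover_stage_classes` and `demo` are hypothetical helpers, and only standard-library calls are used.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def discover_stage_classes(stages_dir: Path) -> dict[str, type]:
    """Map each stage directory name to the first class in its stage.py that defines run().

    Hypothetical illustration; yt-framework's real discovery may work differently.
    """
    found: dict[str, type] = {}
    for stage_file in sorted(stages_dir.glob("*/stage.py")):
        stage_name = stage_file.parent.name
        spec = importlib.util.spec_from_file_location(f"_stage_{stage_name}", stage_file)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for _, cls in inspect.getmembers(module, inspect.isclass):
            # Only consider classes defined in this file, not imported ones.
            if cls.__module__ == module.__name__ and callable(getattr(cls, "run", None)):
                found[stage_name] = cls
                break
    return found


def demo() -> list[str]:
    """Create a throwaway stages/ tree and report what discovery finds in it."""
    with tempfile.TemporaryDirectory() as tmp:
        stage_dir = Path(tmp) / "stages" / "my_stage"
        stage_dir.mkdir(parents=True)
        (stage_dir / "stage.py").write_text(
            "class MyStage:\n"
            "    def run(self, debug):\n"
            "        return debug\n"
        )
        found = discover_stage_classes(Path(tmp) / "stages")
        return [f"{name}:{cls.__name__}" for name, cls in found.items()]


if __name__ == "__main__":
    print(demo())
```

The demo stage is a plain class so the sketch runs without yt-framework installed; a real stage would subclass `BaseStage` as in the Quick Start.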

Installation

For Users

Install from PyPI into any Python 3.11+ environment (system Python, a virtualenv, or a Conda env):

pip install yt-framework

For Developers and Contributors

Recommended: one Conda environment for tests, formatting, pre-commit, and local documentation builds (avoids reinstalling tooling for each task):

git clone https://github.com/GregoryKogan/yt-framework.git
cd yt-framework
conda create -n yt-framework python=3.11
conda activate yt-framework
pip install -e ".[dev,docs]"

Use conda-forge as the channel when creating the env if that matches your setup (conda create -n yt-framework python=3.11 -c conda-forge).

Alternative: pip only — install in editable mode from source:

git clone https://github.com/GregoryKogan/yt-framework.git
cd yt-framework
pip install -e .

For development with testing tools (without the docs extra):

pip install -e ".[dev]"

For local Sphinx builds without the full dev extra, use pip install -e ".[docs]".

See CONTRIBUTING.md for the full development setup and Installation Guide for prerequisites.
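To confirm what actually got installed, the standard-library `importlib.metadata` module can report the version and the extras a distribution declares. The sketch below assumes only that the distribution name is `yt-framework`, as in the pip command above; `describe_distribution` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def describe_distribution(dist_name: str = "yt-framework") -> Optional[dict]:
    """Return the installed version and declared extras, or None if not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "version": meta["Version"],
        # Provides-Extra lists extras such as the "dev" and "docs" groups, if declared.
        "extras": sorted(meta.get_all("Provides-Extra") or []),
    }


if __name__ == "__main__":
    info = describe_distribution()
    print(info if info is not None else "yt-framework is not installed")
```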

Quick Start

What you'll build: a minimal pipeline with a single stage that logs a message, showing the basic framework structure.

Create your first pipeline in 3 steps:

  1. Create pipeline structure:

    mkdir my_pipeline && cd my_pipeline
    mkdir -p stages/my_stage configs
    
  2. Create pipeline.py:

    from yt_framework.core.pipeline import DefaultPipeline
    
    if __name__ == "__main__":
        DefaultPipeline.main()
    
  3. Create stage and config:

    # stages/my_stage/stage.py
    from yt_framework.core.stage import BaseStage
    
    class MyStage(BaseStage):
        def run(self, debug):
            self.logger.info("Hello from YT Framework!")
            return debug
    
    # configs/config.yaml
    stages:
      enabled_stages:
        - my_stage
    
    pipeline:
      mode: "dev"  # Use "dev" for local development
    

Run your pipeline:

python pipeline.py
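If you prefer to generate the Quick Start layout from a script instead of typing it by hand, the steps above can be sketched with `pathlib`. The file contents are copied from the steps above; the `scaffold` function itself is a hypothetical helper and not part of the framework.

```python
from pathlib import Path

PIPELINE_PY = '''from yt_framework.core.pipeline import DefaultPipeline

if __name__ == "__main__":
    DefaultPipeline.main()
'''

STAGE_PY = '''from yt_framework.core.stage import BaseStage


class MyStage(BaseStage):
    def run(self, debug):
        self.logger.info("Hello from YT Framework!")
        return debug
'''

CONFIG_YAML = '''stages:
  enabled_stages:
    - my_stage

pipeline:
  mode: "dev"  # Use "dev" for local development
'''


def scaffold(root: Path) -> list[str]:
    """Write the Quick Start files under root and return their relative paths."""
    files = {
        "pipeline.py": PIPELINE_PY,
        "stages/my_stage/stage.py": STAGE_PY,
        "configs/config.yaml": CONFIG_YAML,
    }
    for relative, content in files.items():
        target = root / relative
        # Create stages/my_stage and configs on demand, like `mkdir -p`.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
    return sorted(files)
```

Calling `scaffold(Path("my_pipeline"))` creates the same tree as steps 1 to 3; running `python pipeline.py` inside it still requires yt-framework to be installed.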

Examples

The examples/ directory contains comprehensive examples demonstrating most framework features. Each example includes a README explaining what it demonstrates and how to run it.

Requirements

Prerequisites Checklist

  • Python 3.11+ installed
  • YT cluster access and credentials (for production mode)

YT Cluster Requirements

When running pipelines in production mode, code from ytjobs executes on YT cluster nodes. The cluster's Docker image (default or custom) must include:

  • Python 3.11+
  • ytsaurus-client >= 0.13.0 (for checkpoint operations)
  • boto3 == 1.35.99 (for S3 operations)
  • botocore == 1.35.99 (auto-installed with boto3)

Important: Ensure your cluster's default Docker image satisfies these dependencies, or always use custom Docker images for your pipelines. See Cluster Requirements and Custom Docker Images for details.
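One way to check an image against this list is to run a small script inside it. The sketch below is a hedged example, not a tool shipped with the framework: `check`, `parse`, and `installed` are hypothetical helpers, the version parser is a deliberate simplification that ignores suffixes such as `rc1` or `post1`, and the pins are copied from the list above.

```python
import sys
from importlib import metadata
from typing import Callable, Optional

# Requirements copied from the list above: (distribution, operator, version).
REQUIREMENTS = [
    ("ytsaurus-client", ">=", "0.13.0"),
    ("boto3", "==", "1.35.99"),
    ("botocore", "==", "1.35.99"),
]
MIN_PYTHON = (3, 11)


def parse(version: str) -> tuple[int, ...]:
    """Parse the leading numeric parts of a dotted version string."""
    parts = []
    for part in version.split("."):
        if not part.isdigit():
            break  # simplification: ignore suffixes such as "post1" or "rc1"
        parts.append(int(part))
    return tuple(parts)


def installed(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check(
    lookup: Callable[[str], Optional[str]] = installed,
    python_version: tuple = tuple(sys.version_info[:2]),
) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if python_version < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    for name, op, wanted in REQUIREMENTS:
        found = lookup(name)
        if found is None:
            problems.append(f"{name} is not installed")
        elif op == ">=" and parse(found) < parse(wanted):
            problems.append(f"{name} {found} is older than {wanted}")
        elif op == "==" and parse(found) != parse(wanted):
            problems.append(f"{name} {found} does not match pinned {wanted}")
    return problems


if __name__ == "__main__":
    issues = check()
    print("OK" if not issues else "\n".join(issues))
```

The `lookup` parameter exists so the check can be exercised against a plain dictionary without touching the environment. For strict version semantics, a dedicated library such as `packaging` is the safer choice.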

Documentation

Full documentation is available at yt-framework.readthedocs.io.

For local development, source documentation is available in the docs/ directory.

Examples - Complete working examples for most features

Contributing

We welcome contributions! Whether it's bug fixes, new features, documentation improvements, or examples, your help makes YT Framework better.
See CONTRIBUTING.md for detailed guidelines.
