YTsaurus pipeline framework with utilities and common modules
YT Framework
Overview
YT Framework is a Python framework for building and running data processing pipelines on YTsaurus (YT) clusters. It simplifies pipeline development with automatic stage discovery, a single switch between dev and prod modes, and support for the common YT operation types.
Architecture
YT Framework follows a pipeline-based architecture where pipelines consist of stages, and stages execute operations.
Key Components:
- Pipeline: Orchestrates stages, their execution order, and configuration management
- Stages: Reusable units of work that execute operations
- Operations: Specific tasks (Map, Vanilla, YQL, S3, Table operations)
- Configuration: YAML-based configuration system for flexible pipeline setup
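The relationship between these components can be sketched with a toy model. The classes below (`ToyStage`, `ToyPipeline`) and the two operation functions are hypothetical names invented for illustration. They are not the yt_framework classes and say nothing about its real API; they only show the idea of a pipeline that runs configured stages in order, with each stage executing its operations.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy classes for illustration only; these are NOT the
# yt_framework classes and do not reflect its real API.


@dataclass
class ToyStage:
    name: str
    operations: list[Callable[[dict], dict]] = field(default_factory=list)

    def run(self, context: dict) -> dict:
        # Each operation receives the context and returns an updated copy.
        for operation in self.operations:
            context = operation(context)
        return context


@dataclass
class ToyPipeline:
    stages: dict[str, ToyStage]
    enabled_stages: list[str]  # stands in for a config-driven stage order

    def run(self) -> dict:
        context: dict = {}
        for stage_name in self.enabled_stages:
            context = self.stages[stage_name].run(context)
        return context


def add_rows(context: dict) -> dict:
    return {**context, "rows": [1, 2, 3]}


def double_rows(context: dict) -> dict:
    return {**context, "rows": [row * 2 for row in context["rows"]]}


pipeline = ToyPipeline(
    stages={
        "create": ToyStage("create", [add_rows]),
        "transform": ToyStage("transform", [double_rows]),
    },
    enabled_stages=["create", "transform"],
)
result = pipeline.run()
print(result)
```

Keeping the stage order in a plain list, separate from the stage definitions, mirrors the idea of driving execution from configuration rather than from code.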
Key Features
- Pipeline & Stage Architecture: Organize complex workflows into reusable stages
- Automatic Stage Discovery: No manual registration needed; create stages and run
- Dev/Prod Modes: Develop locally with file system simulation, then deploy to a YT cluster without code changes
- Multiple Operation Types: Support for Map, Vanilla, YQL, and S3 operations
- Code Upload: Automatic code packaging and deployment to the YT cluster
- Docker Support: Custom Docker images for special dependencies
- Checkpoint Management: Built-in support for ML model checkpoints
- Configuration Management: Flexible YAML-based configuration with multiple config support
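To make the idea of automatic stage discovery concrete, here is a minimal sketch of one way such discovery could work: scan `stages/*/stage.py`, import each file, and pick the class that defines `run()`. The function `discover_stages` is a hypothetical helper written for this example. How YT Framework actually discovers stages is not described here, so treat this as an assumption about the general technique, not as the framework's implementation.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def discover_stages(stages_dir: Path) -> dict[str, type]:
    """Map each stage directory name to the first class in its stage.py that defines run().

    Hypothetical helper for illustration; not part of yt_framework.
    """
    found: dict[str, type] = {}
    for stage_file in sorted(stages_dir.glob("*/stage.py")):
        stage_name = stage_file.parent.name
        spec = importlib.util.spec_from_file_location(f"toy_stage_{stage_name}", stage_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for _, cls in inspect.getmembers(module, inspect.isclass):
            # Accept only classes defined in this file that expose a callable run().
            if cls.__module__ == module.__name__ and callable(getattr(cls, "run", None)):
                found[stage_name] = cls
                break
    return found


# Demo: build a throwaway stages/ tree and discover the stage inside it.
with tempfile.TemporaryDirectory() as tmp:
    stages_dir = Path(tmp) / "stages"
    (stages_dir / "my_stage").mkdir(parents=True)
    (stages_dir / "my_stage" / "stage.py").write_text(
        "class MyStage:\n"
        "    def run(self, debug):\n"
        "        return debug\n"
    )
    discovered = discover_stages(stages_dir)
    discovered_names = {name: cls.__name__ for name, cls in discovered.items()}

print(discovered_names)
```

The sketch uses duck typing (any class with `run()`) so that the demo stage file needs no imports; a real framework would more likely check for a subclass of its base stage class.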
Installation
For Users
Install from PyPI:
pip install yt-framework
For Developers and Contributors
Install in editable mode from source:
git clone https://github.com/GregoryKogan/yt-framework.git
cd yt-framework
pip install -e .
For development with testing tools:
pip install -e ".[dev]"
See Installation Guide for prerequisites and detailed setup instructions.
Quick Start
Create your first pipeline in 3 steps:
What you'll build: A simple pipeline that creates a stage, logs a message, and demonstrates the basic framework structure.
1. Create pipeline structure:

    mkdir my_pipeline && cd my_pipeline
    mkdir -p stages/my_stage configs

2. Create pipeline.py:

    from yt_framework.core.pipeline import DefaultPipeline

    if __name__ == "__main__":
        DefaultPipeline.main()

3. Create stage and config:

    # stages/my_stage/stage.py
    from yt_framework.core.stage import BaseStage

    class MyStage(BaseStage):
        def run(self, debug):
            self.logger.info("Hello from YT Framework!")
            return debug

    # configs/config.yaml
    stages:
      enabled_stages:
        - my_stage
    pipeline:
      mode: "dev"  # Use "dev" for local development
Run your pipeline:
python pipeline.py
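If the run fails because a stage cannot be found, a quick layout check can help. The sketch below assumes the directory layout from the steps above (`stages/<name>/stage.py`) and takes the enabled stage names as a plain list instead of parsing the YAML config. The helper `missing_stage_files` is hypothetical and is not shipped with YT Framework.

```python
import tempfile
from pathlib import Path


def missing_stage_files(project_root: Path, enabled_stages: list[str]) -> list[str]:
    """Return enabled stage names that have no stages/<name>/stage.py file.

    Hypothetical helper for illustration; not part of yt_framework.
    """
    return [
        name
        for name in enabled_stages
        if not (project_root / "stages" / name / "stage.py").is_file()
    ]


# Demo: recreate the Quick Start layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_pipeline"
    (root / "stages" / "my_stage").mkdir(parents=True)
    (root / "configs").mkdir()
    (root / "stages" / "my_stage" / "stage.py").write_text("# stage code goes here\n")

    ok = missing_stage_files(root, ["my_stage"])
    broken = missing_stage_files(root, ["my_stage", "other_stage"])

print(ok, broken)
```

An empty list means every enabled stage has a `stage.py` in the expected place.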
Next Steps:
- See the Quick Start Guide for a complete example with table operations
- Explore Examples to see more complex use cases
- Read about Pipelines and Stages in the documentation
Examples
The examples/ directory contains comprehensive examples demonstrating most framework features.
Each example includes a README explaining what it demonstrates and how to run it.
Requirements
Prerequisites Checklist
- Python 3.11+ installed
- YT cluster access and credentials (for production mode)
YT Cluster Requirements
When running pipelines in production mode, code from ytjobs executes on YT cluster nodes. The cluster's Docker image (default or custom) must include:
- Python 3.11+
- ytsaurus-client >= 0.13.0 (for checkpoint operations)
- boto3 == 1.35.99 (for S3 operations)
- botocore == 1.35.99 (auto-installed with boto3)
Important: Ensure your cluster's default Docker image satisfies these dependencies, or always use custom Docker images for your pipelines. See Cluster Requirements and Custom Docker Images for details.
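One way to verify an image against the list above is to run a small check inside it. The following is a minimal sketch that uses only the standard library. The version requirements are copied from the list above, while the checking logic, the function names, and the simplified version parser (plain dotted numbers only, no pre-release tags) are assumptions made for this example, not a tool provided by YT Framework.

```python
import sys
from importlib import metadata

# Requirements copied from the list above; the checking logic itself is an
# illustrative assumption, not a tool shipped with YT Framework.
MIN_PYTHON = (3, 11)
REQUIRED = {
    "ytsaurus-client": (">=", "0.13.0"),
    "boto3": ("==", "1.35.99"),
    "botocore": ("==", "1.35.99"),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '1.35.99'; non-numeric parts are skipped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def satisfies(installed: str, op: str, required: str) -> bool:
    """Compare two dotted versions with '>=' or '=='."""
    have, want = parse_version(installed), parse_version(required)
    return have >= want if op == ">=" else have == want


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} is older than 3.11"
        )
    for package, (op, required) in REQUIRED.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{package} {installed} does not satisfy {op} {required}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script in the target Docker image prints one line per unmet requirement and prints nothing when the image satisfies all of them.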
Documentation
Full documentation is available at yt-framework.readthedocs.io.
For local development, the documentation sources are in the docs/ directory.
- Examples: Complete working examples for most features
Getting Help
- Documentation: Check the full documentation for detailed guides
- Troubleshooting: See the Troubleshooting Guide for common issues
- Examples: Browse working examples to see how features are used
- GitHub Issues: Report bugs or request features on GitHub Issues
- Questions: Open a GitHub issue with the question label
Contributing
We welcome contributions! Whether it's bug fixes, new features, documentation improvements, or examples, your help makes YT Framework better.
See CONTRIBUTING.md for detailed guidelines.