YTsaurus pipeline framework with utilities and common modules
YT Framework
A Python framework for building and running data processing pipelines on YTsaurus (YT) clusters. YT Framework discovers stages automatically, switches between dev and prod modes without code changes, and supports Map, Vanilla, YQL, and S3 operations.
Key Features
- Pipeline & Stage Architecture: Organize complex workflows into reusable stages
- Automatic Stage Discovery: No manual registration needed - just create stages and run
- Dev/Prod Modes: Develop locally with file system simulation, deploy to YT cluster seamlessly
- Multiple Operation Types: Support for Map, Vanilla, YQL, and S3 operations
- Code Upload: Automatic code packaging and deployment to YT cluster
- Docker Support: Custom Docker images for GPU workloads and special dependencies
- Checkpoint Management: Built-in support for ML model checkpoints
- Configuration Management: Flexible YAML-based configuration with multiple config support
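To make "automatic stage discovery" concrete, here is a minimal sketch of how such discovery could work in general. It is not YT Framework's actual implementation: the function name `discover_stages`, the `stages/<name>/stage.py` scan, and the "any class with a `run` method" rule are assumptions chosen for illustration, based only on the directory layout shown in the Quick Start.

```python
import importlib.util
import inspect
from pathlib import Path


def discover_stages(pipeline_dir):
    """Return {stage_dir_name: stage_class} for every stages/<name>/stage.py.

    Hypothetical helper: the real framework may discover stages differently.
    """
    found = {}
    for stage_file in sorted(Path(pipeline_dir).glob("stages/*/stage.py")):
        name = stage_file.parent.name
        # Load the file as a throwaway module without touching sys.path.
        spec = importlib.util.spec_from_file_location(f"_stage_{name}", stage_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Keep only classes defined in this file that expose run();
            # imported base classes are skipped by the __module__ check.
            defined_here = obj.__module__ == module.__name__
            if defined_here and callable(getattr(obj, "run", None)):
                found[name] = obj
                break
    return found
```

With a scan like this, adding a stage only requires creating a new `stages/<name>/stage.py` file; nothing has to be registered by hand.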
Quick Links
- Installation & Quick Start - Get up and running in minutes
- Pipelines & Stages - Core concepts and architecture
- Operations Guide - Map, Vanilla, YQL, and S3 operations
- Advanced Topics - Docker, checkpoints, code upload, and more
- Examples - Complete working examples for every feature
Installation
For Users
Install from PyPI:
pip install yt_framework
For Developers and Contributors
Install in editable mode from source:
pip install -e .
See the Installation Guide for prerequisites and detailed setup instructions.
Quick Start
Create your first pipeline in 3 steps:
1. Create the pipeline structure:

   mkdir my_pipeline && cd my_pipeline
   mkdir -p stages/my_stage configs

2. Create pipeline.py:

   from yt_framework.core.pipeline import DefaultPipeline

   if __name__ == "__main__":
       DefaultPipeline.main()

3. Create the stage and config:

   # stages/my_stage/stage.py
   from yt_framework.core.stage import BaseStage

   class MyStage(BaseStage):
       def run(self, debug):
           self.logger.info("Hello from YT Framework!")
           return debug
See the Quick Start Guide for a complete example.
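The three steps above can also be scripted. The sketch below creates the same layout with `pathlib` and writes the two files exactly as shown in the Quick Start. The `scaffold` function and its `stage_name` parameter are hypothetical helpers, not part of YT Framework, and the script leaves `configs/` empty because the config contents are not shown here.

```python
from pathlib import Path

# File contents copied from the Quick Start steps.
PIPELINE_PY = '''from yt_framework.core.pipeline import DefaultPipeline

if __name__ == "__main__":
    DefaultPipeline.main()
'''

STAGE_PY = '''from yt_framework.core.stage import BaseStage


class MyStage(BaseStage):
    def run(self, debug):
        self.logger.info("Hello from YT Framework!")
        return debug
'''


def scaffold(root, stage_name="my_stage"):
    """Create pipeline.py, stages/<stage_name>/stage.py, and an empty configs/."""
    root = Path(root)
    stage_dir = root / "stages" / stage_name
    stage_dir.mkdir(parents=True, exist_ok=True)
    (root / "configs").mkdir(exist_ok=True)
    (root / "pipeline.py").write_text(PIPELINE_PY)
    (stage_dir / "stage.py").write_text(STAGE_PY)
    return root


if __name__ == "__main__":
    scaffold("my_pipeline")
```

Running the script produces the `my_pipeline` directory from step 1 with the files from steps 2 and 3 in place; a config file still has to be added under `configs/` as described in the Quick Start Guide.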
Examples
The examples/ directory contains comprehensive examples demonstrating all framework features:
- 01_hello_world - Basic pipeline and table operations
- 02_multi_stage_pipeline - Multiple stages with data flow
- 03_yql_operations - All YQL table operations
- 04_map_operation - Map operations with custom code
- 05_vanilla_operation - Vanilla standalone jobs
- 06_s3_integration - S3 file listing and processing
- 07_custom_docker - Custom Docker images
- 08_multiple_configs - Multiple configuration files
- 09_multiple_operations - Combining operations in one stage
- environment_log - Comprehensive environment logging
- video_gpu - GPU processing workflows
Documentation
Full documentation is available in the docs/ directory:
- Main Documentation - Installation, quick start, and overview
- Pipelines & Stages - Core architecture
- Configuration - Config files and secrets management
- Dev vs Prod - Development and production modes
- Operations - Map, Vanilla, YQL, S3 operations
- Advanced Topics - Docker, checkpoints, code upload
- API Reference - Complete API documentation
- Troubleshooting - Common issues and solutions
Requirements
- Python 3.11+
- YTsaurus cluster access (for production mode)
- YT credentials (for production mode)
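The requirements above can be checked before a run with a small preflight function. This is a sketch, not part of YT Framework: `preflight` is a hypothetical helper, and the default variable names `YT_PROXY` and `YT_TOKEN` are an assumption based on the environment variables the YTsaurus Python client conventionally reads. YT Framework may take credentials from its own config files instead, so pass different names (or an empty tuple) as needed.

```python
import os
import sys


def preflight(version_info=sys.version_info, env=os.environ,
              credential_vars=("YT_PROXY", "YT_TOKEN")):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The README states Python 3.11+ as a hard requirement.
    if tuple(version_info[:2]) < (3, 11):
        problems.append("Python 3.11+ is required")
    # Cluster access and credentials matter only in production mode.
    # The variable names are assumptions, see the note above.
    for var in credential_vars:
        if not env.get(var):
            problems.append(f"{var} is not set (needed for production mode only)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("warning:", problem)
```

In dev mode the credential warnings can be ignored, since the cluster is only needed for production runs.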