An open-source NLP framework that offers high-level wrappers designed for effortless launch, enhanced reproducibility, superior control, and unmatched flexibility for your experiments.
Urartu 🦁
The intelligent ML Pipeline Framework that chains actions into powerful workflows!
Urartu is a framework for building machine learning workflows by chaining Actions into Pipelines. Each Action is a self-contained, reusable component with built-in caching, and Pipelines orchestrate multiple Actions with automatic data flow.
Installation
pip install urartu
Or from source:
git clone git@github.com:tamohannes/urartu.git
cd urartu
pip install -e .
Quick Start
Running Pipelines
# Run a pipeline (pipeline name is the first argument)
urartu my_pipeline
# With config group selectors (unquoted = config group, quoted = string override)
urartu my_pipeline machine=local aim=local slurm=no_slurm debug=true
# With string overrides (quoted values)
urartu my_pipeline machine="custom" descr="my experiment"
Project Structure
my_project/
├── actions/              # Action implementations
│   └── my_action.py
├── pipelines/            # Pipeline implementations
│   └── my_pipeline.py
└── configs/
    ├── action/           # Action configurations
    │   └── my_action.yaml
    └── pipeline/         # Pipeline configurations
        └── my_pipeline.yaml
Core Concepts
Actions
Actions are self-contained components that perform specific ML tasks:
from urartu.common import Action


class MyAction(Action):
    def run(self):
        # Your ML task here
        data = self.load_data()
        results = self.process(data)

        # Save to cache using unified API
        cache_dir = self.get_cache_entry_dir("my_data")
        # Save machine-readable data to cache

        # Save plots to run directory (always regenerated)
        plots_dir = self.get_run_dir("plots")
        # Save human-readable outputs here

    def get_outputs(self):
        return {
            "results_path": str(self.get_cache_entry_dir("results")),
            "run_dir": str(self.get_run_dir()),
        }
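The two "save" placeholders in `run` can be filled with ordinary file I/O. The following is a minimal sketch using only the standard library, with plain temporary directories standing in for what `get_cache_entry_dir` and `get_run_dir` would return. The helpers `save_results` and `write_report` and the file names are hypothetical and not part of Urartu.

```python
import json
import tempfile
from pathlib import Path


def save_results(cache_dir: Path, results: dict) -> Path:
    """Write machine-readable results as JSON into the cache directory."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / "results.json"
    path.write_text(json.dumps(results, sort_keys=True))
    return path


def write_report(results_path: Path, run_dir: Path) -> Path:
    """Regenerate a human-readable report from the cached JSON file."""
    run_dir.mkdir(parents=True, exist_ok=True)
    results = json.loads(results_path.read_text())
    lines = [f"{name}: {value}" for name, value in sorted(results.items())]
    report = run_dir / "report.txt"
    report.write_text("\n".join(lines) + "\n")
    return report


# Stand-ins for the directories an Action would obtain from
# self.get_cache_entry_dir("my_data") and self.get_run_dir("plots").
root = Path(tempfile.mkdtemp())
results_path = save_results(root / "cache" / "my_data", {"accuracy": 0.9, "loss": 0.25})
report_path = write_report(results_path, root / "run" / "plots")
report_text = report_path.read_text()
```

Keeping the report a pure function of the cached file means it can be rebuilt on every run without repeating the expensive step.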
Pipelines
Pipelines chain Actions together with automatic dependency resolution:
# configs/pipeline/my_pipeline.yaml
pipeline_name: my_pipeline

pipeline:
  device: cuda
  seed: 42

  actions:
    - action_name: data_preprocessing
      dataset:
        source: "data.csv"

    - action_name: model_training
      depends_on:
        data_preprocessing:
          processed_data: dataset.data_path
      model:
        architecture: "transformer"
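A quick sanity check on such a config is that every `depends_on` entry names an action listed earlier. The sketch below assumes actions run in the order they are listed. Whether Urartu enforces or reorders this itself is not stated here, and `check_dependencies` is a hypothetical helper, not a Urartu function.

```python
def check_dependencies(actions: list[dict]) -> None:
    """Raise ValueError if an action depends on one that is not listed before it."""
    seen: set[str] = set()
    for action in actions:
        name = action["action_name"]
        for upstream in action.get("depends_on", {}):
            if upstream not in seen:
                raise ValueError(
                    f"{name!r} depends on {upstream!r}, which is not listed before it"
                )
        seen.add(name)


# The `actions` list from the pipeline config above, written as Python data.
actions = [
    {"action_name": "data_preprocessing", "dataset": {"source": "data.csv"}},
    {
        "action_name": "model_training",
        "depends_on": {"data_preprocessing": {"processed_data": "dataset.data_path"}},
        "model": {"architecture": "transformer"},
    },
]

check_dependencies(actions)  # passes: data_preprocessing is listed first
```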
Configuration
Action Config
# configs/action/my_action.yaml
action_name: my_action

action:
  experiment_name: "My Experiment"
  device: cuda
  dataset:
    source: "data.csv"
Pipeline Config
# configs/pipeline/my_pipeline.yaml
pipeline_name: my_pipeline

pipeline:
  experiment_name: "My Pipeline"
  device: cuda
  actions:
    - action_name: action1
    - action_name: action2
Key Features
Unified Caching
Actions automatically cache results. Use the unified APIs:
# For machine-readable cached data
cache_dir = self.get_cache_entry_dir("subdirectory")
# Structure: cache/{action_name}/{cache_hash}/{subdirectory}/
# For human-readable outputs (plots, reports)
run_dir = self.get_run_dir("plots")
# Structure: .runs/{pipeline_name}/{timestamp}/{subdirectory}/
Important: Plots should always be saved to run_dir and regenerated from cached data.
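The `{cache_hash}` path component implies that a cache entry is keyed by the action's configuration. How Urartu computes that hash is not described here. The sketch below shows one common approach, hashing a canonical JSON dump of the config, purely as an illustration; `cache_hash`, `cache_entry_dir`, and the 12-character length are assumptions.

```python
import hashlib
import json
from pathlib import PurePosixPath


def cache_hash(config: dict, length: int = 12) -> str:
    """Hash a JSON-serializable config; key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def cache_entry_dir(action_name: str, config: dict, subdirectory: str) -> PurePosixPath:
    """Build cache/{action_name}/{cache_hash}/{subdirectory}."""
    return PurePosixPath("cache") / action_name / cache_hash(config) / subdirectory


# Same settings in a different key order map to the same directory;
# changing any value (here the seed) produces a different one.
a = cache_entry_dir("my_action", {"device": "cuda", "seed": 42}, "results")
b = cache_entry_dir("my_action", {"seed": 42, "device": "cuda"}, "results")
c = cache_entry_dir("my_action", {"seed": 43, "device": "cuda"}, "results")
```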
Dependency Resolution
Pipelines automatically inject outputs from previous actions:
- action_name: model_training
  depends_on:
    data_preprocessing:
      processed_data: dataset.data_path
      stats: model.feature_stats
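One way to read this mapping is "take output key `processed_data` from `data_preprocessing` and write it to `dataset.data_path` in this action's config". That direction is an assumption drawn from the example, and the sketch below is not Urartu's implementation; `inject_outputs` and the output values are hypothetical.

```python
import copy


def inject_outputs(action_cfg: dict, upstream_outputs: dict) -> dict:
    """Return a copy of action_cfg with upstream outputs written at dotted target paths."""
    resolved = copy.deepcopy(action_cfg)
    depends_on = resolved.pop("depends_on", {})
    for upstream, mapping in depends_on.items():
        for output_key, target_path in mapping.items():
            *parents, leaf = target_path.split(".")
            node = resolved
            for key in parents:
                # Create intermediate sections such as "dataset" on demand.
                node = node.setdefault(key, {})
            node[leaf] = upstream_outputs[upstream][output_key]
    return resolved


action_cfg = {
    "action_name": "model_training",
    "depends_on": {
        "data_preprocessing": {
            "processed_data": "dataset.data_path",
            "stats": "model.feature_stats",
        }
    },
}
# Hypothetical values that data_preprocessing's get_outputs() might return.
upstream_outputs = {
    "data_preprocessing": {"processed_data": "/tmp/processed", "stats": "/tmp/stats.json"}
}
resolved = inject_outputs(action_cfg, upstream_outputs)
```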
Caching Configuration
action:
  cache_enabled: true
  force_rerun: false
  cache_max_age_days: 7

pipeline:
  cache_enabled: true
  force_rerun: false
  cache_max_age_days: 7
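The three settings suggest a simple decision: skip the cache when it is disabled or a rerun is forced, otherwise reuse an entry only if it is recent enough. The sketch below encodes that reading. It is an assumption about the semantics, not Urartu's code, and it does not model how action-level and pipeline-level settings interact.

```python
from datetime import datetime, timedelta


def should_use_cache(
    created_at: datetime,
    now: datetime,
    cache_enabled: bool = True,
    force_rerun: bool = False,
    cache_max_age_days: int = 7,
) -> bool:
    """Return True if a cache entry created at `created_at` may be reused at `now`."""
    if not cache_enabled or force_rerun:
        return False
    return now - created_at <= timedelta(days=cache_max_age_days)


now = datetime(2024, 1, 10)
fresh = should_use_cache(datetime(2024, 1, 5), now)    # entry is 5 days old
stale = should_use_cache(datetime(2024, 1, 1), now)    # entry is 9 days old
forced = should_use_cache(datetime(2024, 1, 5), now, force_rerun=True)
```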
Advanced Usage
Remote Execution
Execute workflows on remote machines:
# configs_tamoyan/machine/remote.yaml
type: remote
host: "cluster.example.com"
username: "user"
ssh_key: "~/.ssh/id_rsa"
remote_workdir: "/path/to/workspace"
project_name: "my_project"
urartu my_pipeline machine=remote slurm=slurm
Multi-run
# Note: Multirun/sweep functionality is not yet implemented in the new CLI
# For now, use nested loops in your pipeline code or run multiple times manually
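Until sweeps are supported, "run multiple times manually" can be scripted. The sketch below only builds the command lines for every combination in a grid. The `machine` and `debug` keys are taken from the examples earlier in this document, `debug=false` is assumed to be accepted, and `sweep_commands` is a hypothetical helper. Substitute the overrides your own configs define.

```python
import itertools
import shlex


def sweep_commands(pipeline: str, grid: dict[str, list[str]]) -> list[list[str]]:
    """Build one `urartu <pipeline> key=value ...` argv per combination in the grid."""
    keys = list(grid)
    commands = []
    for values in itertools.product(*(grid[k] for k in keys)):
        overrides = [f"{k}={v}" for k, v in zip(keys, values)]
        commands.append(["urartu", pipeline, *overrides])
    return commands


commands = sweep_commands(
    "my_pipeline",
    {"machine": ["local", "remote"], "debug": ["true", "false"]},
)
for argv in commands:
    # Print the command; to launch it, use subprocess.run(argv, check=True).
    print(shlex.join(argv))
```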
Citation
If you find Urartu helpful in your research, please cite it:
@software{Tamoyan_Urartu_2023,
  author = {Hovhannes Tamoyan},
  license = {Apache-2.0},
  month = {8},
  title = {{Urartu}},
  url = {https://github.com/tamohannes/urartu},
  year = {2023}
}