An open-source NLP framework that offers high-level wrappers designed for effortless launch, enhanced reproducibility, superior control, and unmatched flexibility for your experiments.

Urartu 🦁

The intelligent ML Pipeline Framework that chains actions into powerful workflows!

Urartu is a framework for building machine learning workflows by chaining Actions into Pipelines. Each Action is a self-contained, reusable component with built-in caching, and Pipelines orchestrate multiple Actions with automatic data flow.

Installation

pip install urartu

Or from source:

git clone https://github.com/tamohannes/urartu.git
cd urartu
pip install -e .

Quick Start

Running Actions and Pipelines

# Run an action
urartu action=my_action

# Run a pipeline
urartu pipeline=my_pipeline

# With options
urartu pipeline=my_pipeline aim=local slurm=no_slurm machine=local

Project Structure

my_project/
├── actions/              # Action implementations
│   └── my_action.py
├── pipelines/            # Pipeline implementations
│   └── my_pipeline.py
└── configs/
    ├── action/           # Action configurations
    │   └── my_action.yaml
    └── pipeline/         # Pipeline configurations
        └── my_pipeline.yaml
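
The layout above can be scaffolded with a few lines of standard-library Python. This is only a convenience sketch: the `scaffold` helper is hypothetical and not part of Urartu, and it creates empty placeholder files that you still need to fill in.

```python
from pathlib import Path


def scaffold(root: str, action: str = "my_action", pipeline: str = "my_pipeline") -> list[Path]:
    """Create the directory layout shown above with empty placeholder files.

    Hypothetical helper for illustration; not part of Urartu.
    """
    base = Path(root)
    files = [
        base / "actions" / f"{action}.py",
        base / "pipelines" / f"{pipeline}.py",
        base / "configs" / "action" / f"{action}.yaml",
        base / "configs" / "pipeline" / f"{pipeline}.yaml",
    ]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # existing files keep their content
    return files
```

Calling `scaffold("my_project")` would produce the tree shown above, ready for you to add the Action, Pipeline, and YAML contents.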

Core Concepts

Actions

Actions are self-contained components that perform specific ML tasks:

from urartu.common import Action

class MyAction(Action):
    def run(self):
        # Your ML task here
        data = self.load_data()
        results = self.process(data)
        
        # Save to cache using unified API (same entry that get_outputs returns)
        cache_dir = self.get_cache_entry_dir("results")
        # Save machine-readable data to cache
        
        # Save plots to run directory (always regenerated)
        plots_dir = self.get_run_dir("plots")
        # Save human-readable outputs here
    
    def get_outputs(self):
        return {
            "results_path": str(self.get_cache_entry_dir("results")),
            "run_dir": str(self.get_run_dir())
        }

Pipelines

Pipelines chain Actions together with automatic dependency resolution:

# configs/pipeline/my_pipeline.yaml
pipeline_name: my_pipeline

pipeline:
  device: cuda
  seed: 42
  actions:
    - action_name: data_preprocessing
      dataset:
        source: "data.csv"
    
    - action_name: model_training
      depends_on:
        data_preprocessing:
          processed_data: dataset.data_path
      model:
        architecture: "transformer"

Configuration

Action Config

# configs/action/my_action.yaml
action_name: my_action

action:
  experiment_name: "My Experiment"
  device: cuda
  dataset:
    source: "data.csv"

Pipeline Config

# configs/pipeline/my_pipeline.yaml
pipeline_name: my_pipeline

pipeline:
  experiment_name: "My Pipeline"
  device: cuda
  actions:
    - action_name: action1
    - action_name: action2

Key Features

Unified Caching

Actions automatically cache results. Use the unified APIs:

# For machine-readable cached data
cache_dir = self.get_cache_entry_dir("subdirectory")
# Structure: cache/{action_name}/{cache_hash}/{subdirectory}/

# For human-readable outputs (plots, reports)
run_dir = self.get_run_dir("plots")
# Structure: .runs/{pipeline_name}/{timestamp}/{subdirectory}/

Important: Plots should always be saved to run_dir and regenerated from cached data.
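
A minimal sketch of that split, written with plain functions so it runs without Urartu. Here `cache_dir` and `run_dir` stand in for the directories returned by `get_cache_entry_dir` and `get_run_dir`; the helper names and the `results.json` / `report.txt` file names are hypothetical.

```python
import json
from pathlib import Path


def compute_stats(values):
    """Placeholder for an expensive computation (hypothetical)."""
    return {"n": len(values), "mean": sum(values) / len(values)}


def load_or_compute(cache_dir: Path, values) -> dict:
    """Reuse machine-readable results if cached, otherwise compute and store them."""
    cache_file = cache_dir / "results.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    stats = compute_stats(values)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(stats))
    return stats


def render_report(run_dir: Path, stats: dict) -> Path:
    """Always rewrite the human-readable output from the cached data."""
    run_dir.mkdir(parents=True, exist_ok=True)
    report = run_dir / "report.txt"
    report.write_text(f"n={stats['n']} mean={stats['mean']:.2f}\n")
    return report
```

The expensive step runs only when the cache entry is missing, while the report (or a plot) is rewritten on every run, so human-readable outputs never drift from the cached data.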

Dependency Resolution

Pipelines automatically inject outputs from previous actions:

- action_name: model_training
  depends_on:
    data_preprocessing:
      processed_data: dataset.data_path
      stats: model.feature_stats
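
To make the mapping concrete, here is a rough sketch of how such injection could work. It assumes each `depends_on` entry maps an output key of the upstream action to a dotted path in the downstream action's config; that reading, and the `inject_dependencies` function itself, are illustrative assumptions rather than Urartu's actual implementation.

```python
import copy


def inject_dependencies(action_cfg: dict, outputs_by_action: dict) -> dict:
    """Return a copy of action_cfg with upstream outputs written to dotted paths.

    Assumed semantics: depends_on[upstream][output_key] = "dotted.target.path".
    """
    resolved = copy.deepcopy(action_cfg)
    depends_on = resolved.pop("depends_on", {})
    for upstream, mapping in depends_on.items():
        outputs = outputs_by_action[upstream]  # KeyError if upstream has not run
        for output_key, target_path in mapping.items():
            *parents, leaf = target_path.split(".")
            node = resolved
            for part in parents:
                node = node.setdefault(part, {})  # create missing sections
            node[leaf] = outputs[output_key]
    return resolved


# Example values are made up for illustration.
cfg = {
    "action_name": "model_training",
    "depends_on": {
        "data_preprocessing": {
            "processed_data": "dataset.data_path",
            "stats": "model.feature_stats",
        }
    },
    "model": {"architecture": "transformer"},
}
outputs = {
    "data_preprocessing": {
        "processed_data": "/tmp/processed.parquet",
        "stats": {"mean": 0.0},
    }
}
resolved = inject_dependencies(cfg, outputs)
```

Under these assumptions, `resolved["dataset"]["data_path"]` holds the upstream `processed_data` value and `resolved["model"]` gains a `feature_stats` entry next to `architecture`.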

Caching Configuration

The same three options can be set at the action level or at the pipeline level:

action:
  cache_enabled: true
  force_rerun: false
  cache_max_age_days: 7

pipeline:
  cache_enabled: true
  force_rerun: false
  cache_max_age_days: 7
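
One plausible way these options interact is sketched below. The precedence shown (a disabled cache or `force_rerun` always triggers a rerun, otherwise the entry must be younger than `cache_max_age_days`) is an assumption for illustration, and `should_use_cache` is a hypothetical function, not part of Urartu.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def should_use_cache(
    entry_mtime: Optional[float],
    cache_enabled: bool = True,
    force_rerun: bool = False,
    cache_max_age_days: Optional[float] = 7,
    now: Optional[float] = None,
) -> bool:
    """Decide whether a cache entry may be reused (assumed semantics)."""
    if not cache_enabled or force_rerun:
        return False
    if entry_mtime is None:  # nothing cached yet
        return False
    if cache_max_age_days is None:  # no age limit configured
        return True
    now = time.time() if now is None else now
    age_days = (now - entry_mtime) / SECONDS_PER_DAY
    return age_days <= cache_max_age_days
```

With the defaults above, an entry written three days ago would be reused, while one written eight days ago would be recomputed.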

Advanced Usage

Remote Execution

Execute workflows on remote machines:

# configs_tamoyan/machine/remote.yaml
type: remote
host: "cluster.example.com"
username: "user"
ssh_key: "~/.ssh/id_rsa"
remote_workdir: "/path/to/workspace"
project_name: "my_project"

Then launch the pipeline on the remote machine:

urartu pipeline=my_pipeline machine=remote slurm=slurm

Multi-run

Pass comma-separated values to launch one run per value:

urartu --multirun pipeline=my_pipeline pipeline.actions[0].learning_rate=1e-3,1e-4,1e-5
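
The sweep in the command above can be pictured as follows. This sketch assumes that comma-separated values expand into the Cartesian product of all choices, one run per combination, as is typical for Hydra-style multirun sweeps; `expand_sweep` is a hypothetical helper and not Urartu's launcher.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated override values into one override list per run."""
    keys, choices = [], []
    for item in overrides:
        key, _, value = item.partition("=")
        keys.append(key)
        choices.append(value.split(","))
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*choices)
    ]


runs = expand_sweep([
    "pipeline=my_pipeline",
    "pipeline.actions[0].learning_rate=1e-3,1e-4,1e-5",
])
```

Under this assumption the command yields three runs, one per learning rate. Adding a second swept parameter with two values would give six.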

Citation

If you find Urartu helpful in your research, please cite it:

@software{Tamoyan_Urartu_2023,
  author = {Hovhannes Tamoyan},
  license = {Apache-2.0},
  month = {8},
  title = {{Urartu}},
  url = {https://github.com/tamohannes/urartu},
  year = {2023}
}
