A streamlined MLflow orchestrator for hybrid LLM training and tuning

A Lightweight Orchestrator for Machine Learning Experiments

Running machine learning experiments often involves a series of steps, such as data processing, training, and evaluation. Managing the dependencies and parameters for each step can become complex. Flowkestra simplifies this process by allowing you to define your entire workflow in a single configuration file.

This approach makes it easier to run, reproduce, and track your experiments, whether you are doing initial exploration on your local machine or preparing for more complex workflows.


Core Features

  • YAML-based Workflows: Define your experiment as a series of tasks in a simple config.yml file.
  • Sequential Task Execution: Runs your Python scripts in the order you define them.
  • MLflow Integration: Automatically logs your runs, parameters, and artifacts to an MLflow tracking server.
  • Local Execution: Currently supports running experiments on your local machine.

Getting Started

1. Installation

pip install flowkestra

(Note: To install from a local checkout of the source instead, run pip install . from the project root.)

2. Create a Configuration File

Create a config.yml file to define your experiment. This file specifies the scripts to run, their inputs/outputs, and any parameters.

Here is an example for a local run:

# The URI of your MLflow tracking server.
mlflow_uri: "http://localhost:5000"
# A descriptive name for your MLflow experiment.
experiment_name: "example_experiment"

# Define your experiment instances. Each instance represents a distinct run
# with its own configuration.
instances:
  - mode: local # 'local' is currently the only supported execution mode.
    # The working directory for the instance.
    workdir: "./test_data"
    # The target directory where training runs and the virtual environment is created.
    target_workdir: "./local_train"
    # Path to a requirements.txt file for this instance's environment.
    requirements: "requirements_local.txt"
    # Define pipelines (e.g., 'train', 'evaluate') with their scripts and arguments.
    pipelines:
      train:
        script: "mlflow_example.py" # The Python script to execute.
        args: # Optional arguments to pass to the script.
          [
            "--epoch", "30"
          ]
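
To make the link between the args list and the script concrete, here is a minimal sketch of what a script such as mlflow_example.py might look like. The file contents, the default of 10 epochs, and the toy loss values are assumptions for illustration only; the actual example script shipped with the project may differ, and any MLflow logging is left to Flowkestra's integration here.

```python
# mlflow_example.py (hypothetical contents, for illustration only)
import argparse


def parse_args(argv=None):
    """Parse the flags listed under `args` in config.yml."""
    parser = argparse.ArgumentParser(description="Toy training script")
    # The default value is an assumption; the config above passes "--epoch 30".
    parser.add_argument("--epoch", type=int, default=10,
                        help="number of training epochs")
    return parser.parse_args(argv)


def train(epochs):
    """Stand-in for a real training loop: returns one fake loss per epoch."""
    return [1.0 / epoch for epoch in range(1, epochs + 1)]


def main(argv=None):
    args = parse_args(argv)
    for epoch, loss in enumerate(train(args.epoch), start=1):
        print(f"epoch={epoch} loss={loss:.4f}")


if __name__ == "__main__":
    main()
```

With the configuration above, the script would receive the equivalent of python mlflow_example.py --epoch 30. Values in args arrive as strings, so the script converts them itself (here via type=int).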

3. Run Your Experiment

Execute your experiment using the Flowkestra CLI, pointing it to your configuration file.

flowkestra -f config.yml

Flowkestra will then run your defined tasks in order.
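
As a rough mental model of "run your defined tasks in order", the sketch below walks the pipelines mapping from the configuration and launches each script as a subprocess. This is not Flowkestra's actual implementation: the function names build_command and run_pipelines are hypothetical, and the real tool also handles the virtual environment and MLflow logging, which are omitted here.

```python
import subprocess
import sys


def build_command(pipeline, python=sys.executable):
    """Turn one pipeline entry (script plus optional args) into an argv list."""
    return [python, pipeline["script"], *pipeline.get("args", [])]


def run_pipelines(pipelines, cwd=None):
    """Run each pipeline in definition order, stopping at the first failure."""
    # Dicts preserve insertion order, so this follows the order in the config.
    for name, pipeline in pipelines.items():
        cmd = build_command(pipeline)
        print(f"[{name}] running: {' '.join(cmd)}")
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, cwd=cwd, check=True)
```

For the example configuration, calling run_pipelines({"train": {"script": "mlflow_example.py", "args": ["--epoch", "30"]}}, cwd="./local_train") would run the training script once. Stopping at the first failure is a design choice in this sketch, so that a later step never runs on the output of a failed earlier one.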


Potential Use Cases

  • Organizing Experiments: Structure your ML code into reusable scripts and orchestrate them for different experiments.
  • Reproducible Runs: Keep your configuration, parameters, and scripts together, ensuring that you can easily rerun an experiment.
  • Basic ML Pipelines: Create simple, sequential pipelines for tasks like data preprocessing followed by model training.

Roadmap & Next Features

  • Remote Training: Support for executing tasks on remote machines via SSH.
  • Automated Parameter Tuning: Integrate with libraries to automate hyperparameter searches.
  • Expanded Cloud Support: Add direct support for cloud environments.

Notes

  • This project is in its early stages. Contributions and feedback are welcome.
