
Local ML workbench configured for Databricks using uv

Project description

ML Workbench

Installation

Install ML Workbench using pip:

pip install ml_workbench

Basic Usage

Command Line Interface (CLI)

Run experiments directly from YAML configuration files:

# Run all experiments in a YAML file
cli-experiment experiment.yaml

# Run specific experiment(s)
cli-experiment experiment.yaml --experiments my_experiment

# Run with variable substitution
cli-experiment experiment.yaml --var path=/data/datasets

# Inspect configuration without running
cli-experiment experiment.yaml --show-config

Python API

Use the Python API for programmatic control:

from ml_workbench import YamlConfig, Experiment, Runner

# Load configuration
config = YamlConfig("experiment.yaml")

# Create experiment
experiment = Experiment(config, "my_experiment")

# Run experiment
runner = Runner(experiment, verbose=True)
results = runner.run()

# Access results
print(f"Best model: {results['best_model']}")
print(f"Best score: {results['best_model_score']}")

Documentation

CLI Experiment Guide

Complete guide to using the command-line interface for running experiments. Learn how to execute experiments from YAML files, use variable substitution, inspect configurations, and view dataset statistics without running experiments.

Runner Class Documentation

Comprehensive documentation for the Runner class, the core execution engine for ML experiments. Includes workflow orchestration, dataset management, preprocessing pipelines, model training, evaluation metrics, feature analysis, and MLflow integration.

YAML Configuration Specification

Detailed specification for YAML configuration files. Covers all sections including datasets, features, models, experiments, and MLflow settings. Includes validation rules and complete examples for defining ML experiments declaratively.
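
As a rough illustration of those sections, a configuration file might look like the sketch below. The top-level section names (datasets, features, models, experiments, mlflow) come from this specification; every key and value inside them is a hypothetical placeholder, so consult the full specification for the actual schema.

```yaml
# Hypothetical sketch -- inner field names are illustrative, not the real schema
datasets:
  training_data:
    path: "${path}/train.csv"   # variable substitution, e.g. --var path=/data/datasets
features:
  numeric: [age, income]
models:
  baseline:
    type: logistic_regression
experiments:
  my_experiment:
    dataset: training_data
    models: [baseline]
mlflow:
  experiment_name: "my_experiment"
```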

Implementation Summary

Technical overview of the Runner implementation, including architecture decisions, testing strategy, and implementation details. Useful for understanding the internal workings of the ML Workbench.

Packaging and CodeArtifact Guide

Step-by-step guide for packaging and publishing ML Workbench to AWS CodeArtifact. Covers prerequisites, authentication, version management, and CI/CD integration for distributing the package within your organization.

Setup

Environment Configuration for MLflow Databricks Integration

To point MLflow at your Databricks workspace (dev-internal), create a .env file in the project root with the following configuration:

# Set MLflow tracking URI to your Databricks workspace
MLFLOW_TRACKING_URI="databricks"

# Set the Databricks host to match your workspace (this one is for dev-internal)
DATABRICKS_HOST="https://dbc-787720e9-26e6.cloud.databricks.com"

# Getting Your Databricks Token
# - Go to your Databricks workspace: https://dbc-787720e9-26e6.cloud.databricks.com
# - Click on your profile icon (top-right)
# - Select "Settings"
# - In "User" section, select "Developer"
# - Go to Access Tokens tab
# - Click Generate New Token
# - Give it a name (e.g., "MLFlow Local Development") and expiry
# - Copy the token (you'll only see it once!)
DATABRICKS_TOKEN="dapi123456781234567890"   # <- replace with your own

Steps to set up:

  1. Copy .env.template to .env:

    cp .env.template .env
    
  2. Edit .env and replace DATABRICKS_TOKEN with your personal access token (see instructions in the comments above).

  3. The .env file is already in .gitignore, so your token won't be committed to version control.

Once configured, MLflow will automatically log experiments to your Databricks workspace when you run them with the ML Workbench.
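
In practice this loading step is usually handled by a library such as python-dotenv; purely as a minimal stdlib sketch of what it does, the snippet below reads KEY="value" lines from a .env file (in the format shown above) into the process environment. The helper load_env_file is an illustrative assumption, not part of ML Workbench, and it ignores quoted `#` characters only in trailing comments.

```python
import os

def load_env_file(env_path=".env"):
    """Minimal .env loader: parse KEY="value" lines, skipping blanks and comments.

    Illustrative sketch only -- real projects typically use python-dotenv.
    Limitation: a '#' inside a quoted value is treated as a comment.
    """
    values = {}
    with open(env_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and full-line comments
            key, sep, raw = line.partition("=")
            if not sep:
                continue  # not a KEY=value line
            # drop trailing comments, surrounding whitespace, and quotes
            values[key.strip()] = raw.split("#")[0].strip().strip('"')
    os.environ.update(values)
    return values
```

With the .env file above in place, `load_env_file()` would leave MLFLOW_TRACKING_URI, DATABRICKS_HOST, and DATABRICKS_TOKEN set in the environment for MLflow to pick up.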

Git Pre-commit Hook for Automatic Version Increment

This project includes a pre-commit hook that automatically increments the patch version (last number) in pyproject.toml on each commit. For example, 0.0.2 → 0.0.3.

To set up the pre-commit hook:

Option 1: Use the setup script (recommended)

./scripts/setup-pre-commit.sh

Option 2: Manual installation

cp scripts/pre-commit .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit

Verify the hook is set up correctly:

ls -la .git/hooks/pre-commit

You should see the file is executable (-rwxr-xr-x).

How it works:

  • On each commit, the hook automatically:
    • Reads the current version from pyproject.toml
    • Increments the patch version (e.g., 0.0.2 → 0.0.3)
    • Updates pyproject.toml with the new version
    • Stages the updated file so it's included in your commit

Note: The hook only increments the patch version (last number). To bump minor or major versions, manually edit pyproject.toml before committing.
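
The bump logic the steps above describe can be sketched in a few lines of Python. This is an illustrative reimplementation, not the actual scripts/pre-commit code, and it assumes a three-part X.Y.Z version string in pyproject.toml:

```python
import re

def bump_patch(pyproject_text):
    """Increment the patch component of `version = "X.Y.Z"` in pyproject.toml text.

    Illustrative sketch of what the pre-commit hook does, not the hook itself.
    """
    def _bump(match):
        major, minor, patch = match.group(1).split(".")
        return f'version = "{major}.{minor}.{int(patch) + 1}"'

    # Only the first version assignment is rewritten; other keys are untouched.
    return re.sub(r'version\s*=\s*"(\d+\.\d+\.\d+)"', _bump, pyproject_text, count=1)
```

For example, `bump_patch('version = "0.0.2"')` returns `version = "0.0.3"`; bumping minor or major versions would still be a manual edit, as noted above.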



Download files

Download the file for your platform.

Source Distribution

ml_workbench-0.2.4.tar.gz (1.2 MB)


Built Distribution


ml_workbench-0.2.4-py3-none-any.whl (41.6 kB)


File details

Details for the file ml_workbench-0.2.4.tar.gz.

File metadata

  • Download URL: ml_workbench-0.2.4.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.6

File hashes

Hashes for ml_workbench-0.2.4.tar.gz:

  • SHA256: 4bc10c688b72d701ad8d047c9b8121f9dbd8a453261fdd66738fef3c9a6132fe
  • MD5: fbe010e477a33bcb09b4649f8c4ff752
  • BLAKE2b-256: 5f54ff33a93f1ab0a512192d37afc89d1d30c27613245eff4a73203e148bd6e6

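To verify a downloaded archive against the SHA256 digest listed above, a small stdlib sketch (the file name is the one published here; the helper itself is illustrative):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, streaming it in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Usage: compare against the digest published above
# expected = "4bc10c688b72d701ad8d047c9b8121f9dbd8a453261fdd66738fef3c9a6132fe"
# assert sha256_of("ml_workbench-0.2.4.tar.gz") == expected
```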

File details

Details for the file ml_workbench-0.2.4-py3-none-any.whl.

File metadata

File hashes

Hashes for ml_workbench-0.2.4-py3-none-any.whl:

  • SHA256: 370cec8155e4350d22891babea5c8eddb99b9799585cf400aa6b60811cacf547
  • MD5: 638bbba57c5c635a110715443d595f75
  • BLAKE2b-256: 4377bf8dbb0f6292b86807135b7937b4c164aebad92b2dc7a29059542b9ee867

