Local ML workbench configured for Databricks using uv


ML Workbench

Installation

Install ML Workbench using pip:

pip install ml_workbench

Basic Usage

Command Line Interface (CLI)

Run experiments directly from YAML configuration files:

# Run all experiments in a YAML file
cli-experiment experiment.yaml

# Run specific experiment(s)
cli-experiment experiment.yaml --experiments my_experiment

# Run with variable substitution
cli-experiment experiment.yaml --var path=/data/datasets

# Inspect configuration without running
cli-experiment experiment.yaml --show-config
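Variable substitution lets one YAML file serve multiple environments. The exact placeholder syntax is defined by ML Workbench; purely as an illustration, here is how `--var path=/data/datasets` could be resolved with Python's `string.Template` (the `${path}` placeholder form and the `substitute_vars` helper are hypothetical):

```python
from string import Template

def substitute_vars(raw_yaml: str, variables: dict) -> str:
    """Replace ${name} placeholders with values passed via --var name=value.
    Hypothetical helper: the real ML Workbench syntax may differ."""
    return Template(raw_yaml).substitute(variables)

raw = "dataset_path: ${path}/train.csv"
print(substitute_vars(raw, {"path": "/data/datasets"}))
# dataset_path: /data/datasets/train.csv
```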

Python API

Use the Python API for programmatic control:

from ml_workbench import YamlConfig, Experiment, Runner

# Load configuration
config = YamlConfig("experiment.yaml")

# Create experiment
experiment = Experiment(config, "my_experiment")

# Run experiment
runner = Runner(experiment, verbose=True)
results = runner.run()

# Access results
print(f"Best model: {results['best_model']}")
print(f"Best score: {results['best_model_score']}")

Documentation

CLI Experiment Guide

Complete guide to using the command-line interface for running experiments. Learn how to execute experiments from YAML files, use variable substitution, inspect configurations, and view dataset statistics without running experiments.

Runner Class Documentation

Comprehensive documentation for the Runner class, the core execution engine for ML experiments. Includes workflow orchestration, dataset management, preprocessing pipelines, model training, evaluation metrics, feature analysis, and MLflow integration.

YAML Configuration Specification

Detailed specification for YAML configuration files. Covers all sections including datasets, features, models, experiments, and MLflow settings. Includes validation rules and complete examples for defining ML experiments declaratively.
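To give a rough sense of the shape such a file takes, the top-level sections named in the specification can be outlined as follows (everything inside them is illustrative, not the actual schema):

```yaml
# Hypothetical outline only -- consult the YAML Configuration Specification
# for the real field names and validation rules.
datasets:      # where data is loaded from
features:      # feature definitions and preprocessing
models:        # models and hyperparameters to try
experiments:   # which datasets/features/models each experiment combines
mlflow:        # tracking settings
```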

Implementation Summary

Technical overview of the Runner implementation, including architecture decisions, testing strategy, and implementation details. Useful for understanding the internal workings of the ML Workbench.

Packaging and CodeArtifact Guide

Step-by-step guide for packaging and publishing ML Workbench to AWS CodeArtifact. Covers prerequisites, authentication, version management, and CI/CD integration for distributing the package within your organization.

Setup

Environment Configuration for MLflow Databricks Integration

To point MLflow at your Databricks workspace (dev-internal), create a .env file in the project root with the following configuration:

# Set MLflow tracking URI to your Databricks workspace
MLFLOW_TRACKING_URI="databricks"

# Set the Databricks host that matches your workspace (this one is for dev-internal)
DATABRICKS_HOST="https://dbc-787720e9-26e6.cloud.databricks.com"

# Getting Your Databricks Token
# - Go to your Databricks workspace: https://dbc-787720e9-26e6.cloud.databricks.com
# - Click on your profile icon (top-right)
# - Select "Settings"
# - In "User" section, select "Developer"
# - Go to Access Tokens tab
# - Click Generate New Token
# - Give it a name (e.g., "MLFlow Local Development") and expiry
# - Copy the token (you'll only see it once!)
DATABRICKS_TOKEN="dapi123456781234567890"   # <- replace with your own

Steps to set up:

  1. Copy .env.template to .env:

    cp .env.template .env
    
  2. Edit .env and replace DATABRICKS_TOKEN with your personal access token (see instructions in the comments above).

  3. The .env file is already in .gitignore, so your token won't be committed to version control.

Once configured, MLflow will automatically log experiments to your Databricks workspace when you run experiments using the ML Workbench.
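ML Workbench presumably reads the .env file itself; if you need the same values in a standalone script, a minimal stdlib-only loader looks like this (the parsing is deliberately naive, and quoted values containing `#` are not handled):

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Naive .env loader: handles KEY=value and KEY="value" lines plus '#' comments.
    A simplified sketch; python-dotenv (or ML Workbench itself) handles edge cases."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop an inline comment, then surrounding quotes.
            value = value.split("#", 1)[0].strip().strip('"').strip("'")
            os.environ.setdefault(key.strip(), value)

if os.path.exists(".env"):
    load_dotenv()
```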

Git Pre-commit Hook for Automatic Version Increment

This project includes a pre-commit hook that automatically increments the patch version (the last number) in pyproject.toml on each commit. For example, 0.0.2 becomes 0.0.3.

To set up the pre-commit hook:

Option 1: Use the setup script (recommended)

./scripts/setup-pre-commit.sh

Option 2: Manual installation

cp scripts/pre-commit .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit

Verify the hook is set up correctly:

ls -la .git/hooks/pre-commit

You should see the file is executable (-rwxr-xr-x).

How it works:

  • On each commit, the hook automatically:
    • Reads the current version from pyproject.toml
    • Increments the patch version (e.g., 0.0.2 → 0.0.3)
    • Updates pyproject.toml with the new version
    • Stages the updated file so it's included in your commit

Note: The hook only increments the patch version (last number). To bump minor or major versions, manually edit pyproject.toml before committing.
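The bump the hook performs can be sketched in a few lines of Python (a simplified stand-in for the actual logic in scripts/pre-commit, which may differ):

```python
import re

def bump_patch(pyproject_text: str) -> str:
    """Increment the last component of version = "X.Y.Z" in pyproject.toml text.
    Illustrative sketch of the hook's behaviour, not the real script."""
    def repl(match: re.Match) -> str:
        major, minor, patch = match.group(1), match.group(2), int(match.group(3))
        return f'version = "{major}.{minor}.{patch + 1}"'
    return re.sub(r'version\s*=\s*"(\d+)\.(\d+)\.(\d+)"', repl, pyproject_text, count=1)

print(bump_patch('version = "0.0.2"'))  # version = "0.0.3"
```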


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ml_workbench-0.2.2.tar.gz (1.2 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ml_workbench-0.2.2-py3-none-any.whl (26.0 kB)

Uploaded Python 3

File details

Details for the file ml_workbench-0.2.2.tar.gz.

File metadata

  • Download URL: ml_workbench-0.2.2.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.6

File hashes

Hashes for ml_workbench-0.2.2.tar.gz:

  • SHA256: f741a72241ef88565e8260a2247530e3206518b49871a6f7c7fa5f6dffd0a1b2
  • MD5: af56681e747021f68bd7e5f523816561
  • BLAKE2b-256: 85a3a33de5b4b33f8009132181dbf1b4b614eeadc2cab7983ee7861ce354bb43

See more details on using hashes here.
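To check a downloaded archive against the published digests, Python's hashlib is enough:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the published digest, e.g. for the sdist:
expected = "f741a72241ef88565e8260a2247530e3206518b49871a6f7c7fa5f6dffd0a1b2"
# assert sha256_of("ml_workbench-0.2.2.tar.gz") == expected
```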

File details

Details for the file ml_workbench-0.2.2-py3-none-any.whl.

File metadata

File hashes

Hashes for ml_workbench-0.2.2-py3-none-any.whl:

  • SHA256: f52d998e554e819b9335a080af1b3e6a7c32323c992fb5555eae99e2d657f89a
  • MD5: e1335977587f39d0979e48616b6ae3cd
  • BLAKE2b-256: 5ec7c82f218cb165dd01e9070a02facb957ab0b9331bed1708440e4a726b360a

See more details on using hashes here.
