# ML Workbench

Local ML workbench configured for Databricks using uv.
## Installation

Install ML Workbench using pip:

```shell
pip install ml_workbench
```
## Basic Usage

### Command Line Interface (CLI)

Run experiments directly from YAML configuration files:

```shell
# Run all experiments in a YAML file
cli-experiment experiment.yaml

# Run specific experiment(s)
cli-experiment experiment.yaml --experiments my_experiment

# Run with variable substitution
cli-experiment experiment.yaml --var path=/data/datasets

# Inspect configuration without running
cli-experiment experiment.yaml --show-config
```
### Python API

Use the Python API for programmatic control:

```python
from ml_workbench import YamlConfig, Experiment, Runner

# Load configuration
config = YamlConfig("experiment.yaml")

# Create experiment
experiment = Experiment(config, "my_experiment")

# Run experiment
runner = Runner(experiment, verbose=True)
results = runner.run()

# Access results
print(f"Best model: {results['best_model']}")
print(f"Best score: {results['best_model_score']}")
```
## Documentation

### CLI Experiment Guide

Complete guide to using the command-line interface for running experiments. Learn how to execute experiments from YAML files, use variable substitution, inspect configurations, and view dataset statistics without running experiments.

### Runner Class Documentation

Comprehensive documentation for the Runner class, the core execution engine for ML experiments. Includes workflow orchestration, dataset management, preprocessing pipelines, model training, evaluation metrics, feature analysis, and MLflow integration.

### YAML Configuration Specification

Detailed specification for YAML configuration files. Covers all sections, including datasets, features, models, experiments, and MLflow settings. Includes validation rules and complete examples for defining ML experiments declaratively.
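To give a feel for the declarative style, a minimal configuration might look like the sketch below. The top-level sections match those named in the specification, but every field name and value here is an illustrative assumption, not the actual schema; consult the specification for the real format.

```yaml
# Hypothetical experiment.yaml sketch -- field names are illustrative only
datasets:
  training_data:
    path: ${path}/train.csv        # ${path} supplied via --var path=...

features:
  numeric: [age, income]

models:
  random_forest:
    type: sklearn.ensemble.RandomForestClassifier
    params:
      n_estimators: 100

experiments:
  my_experiment:
    dataset: training_data
    models: [random_forest]

mlflow:
  experiment_name: /Shared/my_experiment
```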
### Implementation Summary

Technical overview of the Runner implementation, including architecture decisions, testing strategy, and implementation details. Useful for understanding the internal workings of the ML Workbench.

### Packaging and CodeArtifact Guide

Step-by-step guide for packaging and publishing ML Workbench to AWS CodeArtifact. Covers prerequisites, authentication, version management, and CI/CD integration for distributing the package within your organization.
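The publishing flow described in that guide typically reduces to a few commands; the domain, repository, and account ID below are placeholders, and the exact steps may differ from the guide:

```shell
# Build the source distribution and wheel (the project uses uv)
uv build

# Authenticate twine against CodeArtifact (the token is valid for 12 hours by default)
aws codeartifact login --tool twine \
    --domain my-domain --repository my-repo --domain-owner 123456789012

# Upload using the 'codeartifact' repository entry that the login step
# wrote to ~/.pypirc
twine upload --repository codeartifact dist/*
```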
## Setup

### Environment Configuration for MLflow Databricks Integration

To point MLflow at your Databricks workspace (dev-internal), create a `.env` file in the project root with the following configuration:

```shell
# Set MLflow tracking URI to your Databricks workspace
MLFLOW_TRACKING_URI="databricks"

# Set the Databricks host matching your workspace (this one is for dev-internal)
DATABRICKS_HOST="https://dbc-787720e9-26e6.cloud.databricks.com"

# Getting your Databricks token:
# - Go to your Databricks workspace: https://dbc-787720e9-26e6.cloud.databricks.com
# - Click your profile icon (top-right)
# - Select "Settings"
# - In the "User" section, select "Developer"
# - Go to the "Access Tokens" tab
# - Click "Generate New Token"
# - Give it a name (e.g., "MLflow Local Development") and an expiry
# - Copy the token (you'll only see it once!)
DATABRICKS_TOKEN="dapi123456781234567890"  # <- replace with your own
```
Steps to set up:

1. Copy `.env.template` to `.env`:

   ```shell
   cp .env.template .env
   ```

2. Edit `.env` and replace `DATABRICKS_TOKEN` with your personal access token (see the instructions in the comments above).

3. The `.env` file is already in `.gitignore`, so your token won't be committed to version control.

Once configured, MLflow will automatically log experiments to your Databricks workspace when you run experiments using the ML Workbench.
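How these values reach MLflow: the workbench presumably loads `.env` at startup (for example via `python-dotenv`), after which the MLflow client picks up the tracking URI and Databricks credentials from the environment. A stdlib-only sketch of that loading step (`load_env_lines` is a hypothetical helper for illustration, not part of ML Workbench):

```python
import os

def load_env_lines(lines):
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    env = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop any inline trailing comment, then surrounding quotes
        value = value.split(" #")[0].strip().strip('"')
        env[key.strip()] = value
    return env

# In practice: with open(".env") as f: os.environ.update(load_env_lines(f))
sample = [
    '# Set MLflow tracking URI to your Databricks workspace',
    'MLFLOW_TRACKING_URI="databricks"',
    'DATABRICKS_TOKEN="dapi123456781234567890"  # <- replace with your own',
]
os.environ.update(load_env_lines(sample))
print(os.environ["MLFLOW_TRACKING_URI"])  # -> databricks
```

With `MLFLOW_TRACKING_URI="databricks"` set, MLflow resolves the workspace from `DATABRICKS_HOST` and authenticates with `DATABRICKS_TOKEN`.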
## Git Pre-commit Hook for Automatic Version Increment

This project includes a pre-commit hook that automatically increments the patch version (last number) in `pyproject.toml` on each commit. For example, `0.0.2` → `0.0.3`.

To set up the pre-commit hook:

**Option 1: Use the setup script (recommended)**

```shell
./scripts/setup-pre-commit.sh
```

**Option 2: Manual installation**

```shell
cp scripts/pre-commit .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
```

Verify the hook is set up correctly:

```shell
ls -la .git/hooks/pre-commit
```

You should see that the file is executable (`-rwxr-xr-x`).
How it works: on each commit, the hook automatically

1. reads the current version from `pyproject.toml`,
2. increments the patch version (e.g., `0.0.2` → `0.0.3`),
3. updates `pyproject.toml` with the new version, and
4. stages the updated file so it's included in your commit.

**Note:** The hook only increments the patch version (last number). To bump minor or major versions, manually edit `pyproject.toml` before committing.
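The bump logic the hook performs can be sketched in a few lines of Python. This is an illustrative re-implementation, not the actual hook script (which lives in `scripts/pre-commit`):

```python
import re

def bump_patch(version: str) -> str:
    """Increment the last component of an X.Y.Z version: '0.0.2' -> '0.0.3'."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"

def bump_pyproject(text: str) -> str:
    """Rewrite the first version = "X.Y.Z" line in pyproject.toml content."""
    return re.sub(
        r'(?m)^(version\s*=\s*")(\d+\.\d+\.\d+)(")',
        lambda m: m.group(1) + bump_patch(m.group(2)) + m.group(3),
        text,
        count=1,
    )

print(bump_patch("0.0.2"))  # -> 0.0.3
```

A real hook would then write the rewritten file back to disk and run `git add pyproject.toml` so the bump lands in the same commit.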
## Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Two distributions are available: a source distribution (`ml_workbench-0.2.4.tar.gz`) and a built distribution (`ml_workbench-0.2.4-py3-none-any.whl`).
### File details: ml_workbench-0.2.4.tar.gz

**File metadata**

- Download URL: ml_workbench-0.2.4.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.6

**File hashes**

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4bc10c688b72d701ad8d047c9b8121f9dbd8a453261fdd66738fef3c9a6132fe` |
| MD5 | `fbe010e477a33bcb09b4649f8c4ff752` |
| BLAKE2b-256 | `5f54ff33a93f1ab0a512192d37afc89d1d30c27613245eff4a73203e148bd6e6` |
### File details: ml_workbench-0.2.4-py3-none-any.whl

**File metadata**

- Download URL: ml_workbench-0.2.4-py3-none-any.whl
- Upload date:
- Size: 41.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.6

**File hashes**

| Algorithm | Hash digest |
|---|---|
| SHA256 | `370cec8155e4350d22891babea5c8eddb99b9799585cf400aa6b60811cacf547` |
| MD5 | `638bbba57c5c635a110715443d595f75` |
| BLAKE2b-256 | `4377bf8dbb0f6292b86807135b7937b4c164aebad92b2dc7a29059542b9ee867` |