A transparent AutoML library built from scratch with NumPy

GlassBox AutoML

GlassBox AutoML is a transparent machine learning library built from scratch with NumPy. It aims to provide an end-to-end AutoML pipeline that stays readable, explainable, and easy to debug.

Project Goal

The project covers the full machine learning workflow inside the glassbox/ package:

  • exploratory data analysis
  • preprocessing
  • models
  • evaluation
  • hyperparameter optimization
  • agent-level AutoFit integration

Core library modules are built from scratch with NumPy only. No Scikit-Learn code belongs inside glassbox/.

Installation

Create and activate a virtual environment, then install dependencies:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt

If you need local development tools:

python3 -m pip install pytest jupyter

Quick Start

Run the full AutoFit pipeline on the included sample CSV:

from glassbox.agent import auto_fit

report = auto_fit(
    "data/sample.csv",
    target_column="purchased",
    task="auto",
    search="random",
    time_budget=20,
)

print(report["best_model"])
print(report["cv_score"])
print(report["eda_summary"]["overview"])

The returned report is JSON-safe and includes:

  • EDA overview, numerical profile table, correlations, and outlier rows
  • selected task type
  • candidate model leaderboard
  • best model, best parameters, and cross-validation score
  • feature importances or coefficient-style importances when available
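
Because the report is JSON-safe, it can be serialized and stored without a custom encoder. The snippet below is a minimal sketch that uses a hand-written stand-in dictionary rather than a real auto_fit result: only the key names best_model, cv_score, and eda_summary["overview"] are taken from the Quick Start, and every value is a placeholder.

```python
import json

# Hypothetical stand-in for the dict returned by auto_fit(). Only the key
# names best_model, cv_score, and eda_summary["overview"] come from the
# Quick Start; every value here is a placeholder, not a real result.
report = {
    "best_model": "placeholder-model-name",
    "cv_score": 0.0,
    "eda_summary": {"overview": {}},
}

# A JSON-safe object serializes without a custom encoder...
text = json.dumps(report, indent=2, sort_keys=True)

# ...and survives a round trip unchanged.
restored = json.loads(text)
print(restored == report)
```

The same json.dumps call can write the report to a file or return it from a service endpoint.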

Manual Workflow Example

import numpy as np

from glassbox.preprocessing import OneHotEncoder, SimpleImputer, StandardScaler
from glassbox.models import RandomForestClassifier
from glassbox.evaluation.classification import classification_report

X_num = np.array([
    [22.0, 32000.0],
    [24.0, np.nan],
    [42.0, 76000.0],
])
X_cat = np.array([["basic"], ["basic"], ["plus"]], dtype=object)
y = np.array([0, 0, 1])

X_num = SimpleImputer(strategy="mean").fit_transform(X_num)
X_num = StandardScaler().fit_transform(X_num)
X_cat = OneHotEncoder().fit_transform(X_cat)
X = np.hstack([X_num, X_cat])

model = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=42)
model.fit(X, y)
predictions = model.predict(X)

print(classification_report(y, predictions))
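
To make the three preprocessing steps above concrete, here is a rough sketch of what mean imputation, standardization, and one-hot encoding compute, written in plain Python so each step is visible. It is not the GlassBox implementation: the function names are hypothetical, and the use of the population standard deviation, the handling of constant columns, and the sorted category order are all assumptions.

```python
import math

def impute_mean(column):
    """Replace NaN entries with the mean of the observed values."""
    observed = [v for v in column if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in column]

def standardize(column):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    # Assumption: a constant column is only centered, to avoid dividing by zero.
    return [(v - mean) / std if std > 0 else v - mean for v in column]

def one_hot(values):
    """Encode categories as 0/1 indicator rows, one column per sorted category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

# Second numeric column and the categorical column from the example above.
income = impute_mean([32000.0, float("nan"), 76000.0])
print(income)                               # [32000.0, 54000.0, 76000.0]
scaled = standardize(income)
print(one_hot(["basic", "basic", "plus"]))  # [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```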

Model Zoo

Classification:

  • LogisticRegression
  • DecisionTreeClassifier
  • RandomForestClassifier
  • GaussianNaiveBayes
  • KNearestNeighbors(task="classification")

Regression:

  • LinearRegression
  • DecisionTreeRegressor
  • RandomForestRegressor
  • KNearestNeighbors(task="regression")
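
KNearestNeighbors appears in both lists because the neighbor search is the same for both tasks and only the final aggregation differs. The sketch below illustrates that idea with a hypothetical knn_predict function in plain Python. It assumes Euclidean distance, majority voting for classification, and an unweighted mean for regression, and it does not mirror the library's class or its signature.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict for one query point from its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    neighbors = [label for _, label in ranked[:k]]
    if task == "classification":
        # Majority vote among the neighbors.
        return Counter(neighbors).most_common(1)[0][0]
    if task == "regression":
        # Mean of the neighbors' target values.
        return sum(neighbors) / len(neighbors)
    raise ValueError(f"unknown task: {task!r}")

X = [[0.0], [1.0], [2.0], [10.0]]
print(knn_predict(X, [0, 0, 1, 1], [0.5], k=3))                             # 0
print(knn_predict(X, [1.0, 2.0, 3.0, 9.0], [0.5], k=3, task="regression"))  # 2.0
```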

Demo And Benchmarks

Launch the notebook:

jupyter notebook notebooks/demo.ipynb

Run the Scikit-Learn comparison benchmark for regression:

python benchmarks/sklearn_comparison.py --task regression --csv data/_uploaded.csv --target Delay

Run the Scikit-Learn comparison benchmark for classification:

python benchmarks/sklearn_comparison.py --task classification --csv data/classification.csv --target stroke

If --target is omitted, the script uses the last column in the CSV as the prediction target.
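
The fallback to the last column can be sketched as follows. The resolve_target helper is hypothetical and is not taken from the benchmark script, the column names are made up, and raising an error on an unknown column is an assumption about sensible behavior.

```python
import csv
import io

def resolve_target(header, target=None):
    """Return the target column name, defaulting to the last header entry."""
    if target is None:
        return header[-1]
    if target not in header:
        raise ValueError(f"target column {target!r} not found in CSV header")
    return target

# In-memory CSV with made-up column names, used only for illustration.
sample = io.StringIO("age,income,purchased\n22,32000,0\n")
header = next(csv.reader(sample))

print(resolve_target(header))            # purchased
print(resolve_target(header, "income"))  # income
```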

Scikit-Learn is used only in the benchmark script. The glassbox/ package itself remains NumPy-only.

Repository Structure

GlassBox-AutoML-Agent/
|-- .github/
|   `-- pull_request_template.md
|-- glassbox/
|   |-- agent/
|   |-- eda/
|   |-- evaluation/
|   |-- models/
|   |-- optimization/
|   |-- preprocessing/
|   `-- utils/
|-- tests/
|-- notebooks/
|-- benchmarks/
|-- data/
|-- README.md
|-- pyproject.toml
`-- requirements.txt

Testing

Run the test suite from the repository root:

python3 -m pytest -q

IronClaw / MCP Tool

The library exposes a single tool, auto_fit, through three surfaces:

  • glassbox.agent.mcp_server — a FastMCP server over stdio, the IronClaw deployment target.
  • glassbox.agent.mcp_tool — a JSON-in/JSON-out CLI shim for scripted/sandbox testing.
  • mcp.json — reference manifest documenting the tool schema (not consumed by IronClaw directly).

Register with IronClaw

SSH into your IronClaw box, run pip install -e .[mcp], then register the server:

ironclaw mcp add glassbox \
  --transport stdio \
  --command python \
  --arg -m --arg glassbox.agent.mcp_server

IronClaw stores the registration in ~/.ironclaw/mcp-servers.json and spawns the server over stdio whenever the agent calls the tool. Verify with ironclaw mcp list.

Run the tool directly (path mode)

python -m glassbox.agent.mcp_tool --input '{
  "csv_path": "data/sample.csv",
  "target_column": "purchased",
  "task": "auto",
  "search": "random",
  "time_budget": 20
}'

In a sandbox with no host filesystem, pass the CSV as base64 in a JSON payload on stdin (bytes mode):

python -c "import base64,json,sys; \
  print(json.dumps({'csv_b64': base64.b64encode(open('data/sample.csv','rb').read()).decode(), \
                    'target_column':'purchased'}))" \
  | python -m glassbox.agent.mcp_tool
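
On the receiving side, a bytes-mode payload can be turned back into rows without touching disk. The following is a sketch of that idea only, not the code in glassbox.agent.mcp_tool: rows_from_b64 is a hypothetical helper, the CSV content is made up, and UTF-8 encoding is an assumption.

```python
import base64
import csv
import io

def rows_from_b64(csv_b64):
    """Decode a base64-encoded CSV payload and parse it without touching disk."""
    raw = base64.b64decode(csv_b64)
    text = raw.decode("utf-8")  # assumption: the CSV is UTF-8 encoded
    return list(csv.reader(io.StringIO(text)))

# Build a payload the same way the shell one-liner does, from in-memory bytes.
csv_bytes = b"age,purchased\n22,0\n42,1\n"
payload = {
    "csv_b64": base64.b64encode(csv_bytes).decode(),
    "target_column": "purchased",
}

rows = rows_from_b64(payload["csv_b64"])
print(rows[0])    # ['age', 'purchased']
print(len(rows))  # 3
```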

The response is a single JSON object: {"ok": true, "report": {...}} on success, or {"ok": false, "error": "..."} on failure. The report includes an explanation array of short bullets the agent can repeat back to the user.
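
A caller can branch on the ok field to separate reports from errors. The sketch below assumes the tool's stdout has already been captured as a string. ToolError and unwrap_response are hypothetical names, and the two sample envelopes are hand-written placeholders in the documented shape.

```python
import json

class ToolError(RuntimeError):
    """Raised when the tool reports ok=false."""

def unwrap_response(stdout_text):
    """Parse the tool's JSON envelope and return the report, or raise."""
    response = json.loads(stdout_text)
    if response.get("ok"):
        return response["report"]
    raise ToolError(response.get("error", "unknown error"))

# Hand-written envelopes in the documented shape; the contents are placeholders.
success = '{"ok": true, "report": {"explanation": ["placeholder bullet"]}}'
failure = '{"ok": false, "error": "placeholder message"}'

for bullet in unwrap_response(success)["explanation"]:
    print("-", bullet)

try:
    unwrap_response(failure)
except ToolError as exc:
    print("tool failed:", exc)
```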

Agent private key

Never commit the IronClaw agent private key. Use one of:

  1. Environment variable: export IRONCLAW_AGENT_PRIVATE_KEY=... (or place it in a local .env — already gitignored).
  2. The IronClaw CLI's own keystore at ~/.ironclaw/credentials (preferred for production).

The glassbox/ package itself never reads the key; only the IronClaw runtime does, when registering the agent.
