A transparent AutoML library built from scratch with NumPy
GlassBox AutoML
GlassBox AutoML is a transparent machine learning library built from scratch with NumPy. The aim is to provide an end-to-end AutoML pipeline that remains readable, explainable, and easy to debug.
Project Goal
The project covers the full machine learning workflow inside the glassbox/ package:
- exploratory data analysis
- preprocessing
- models
- evaluation
- hyperparameter optimization
- agent-level AutoFit integration
Core library modules are built from scratch with NumPy only. No Scikit-Learn code belongs inside glassbox/.
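One way to keep that rule enforceable is a small import check run over the package. The sketch below is not part of GlassBox; `find_forbidden_imports` is a hypothetical helper, and the demo runs against a throwaway directory standing in for glassbox/.

```python
import ast
import tempfile
from pathlib import Path


def find_forbidden_imports(package_dir, forbidden=("sklearn",)):
    """Return (path, line, module) for every import of a forbidden top-level package."""
    hits = []
    for path in sorted(Path(package_dir).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                # Relative imports such as "from . import x" have module=None.
                names = [node.module or ""]
            else:
                continue
            for name in names:
                if name.split(".")[0] in forbidden:
                    hits.append((str(path), node.lineno, name))
    return hits


# Demo on a temporary directory (an assumption; point it at glassbox/ in practice).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "clean.py").write_text("import numpy as np\n", encoding="utf-8")
    Path(tmp, "leaky.py").write_text(
        "from sklearn.tree import DecisionTreeClassifier\n", encoding="utf-8"
    )
    hits = find_forbidden_imports(tmp)

print(hits)
```

The check parses source files without importing them, so it works even when Scikit-Learn is not installed.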
Installation
Create and activate a virtual environment, then install dependencies:
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt
If you need local development tools:
python3 -m pip install pytest jupyter
Quick Start
Run the full AutoFit pipeline on the included sample CSV:
from glassbox.agent import auto_fit
report = auto_fit(
    "data/sample.csv",
    target_column="purchased",
    task="auto",
    search="random",
    time_budget=20,
)
print(report["best_model"])
print(report["cv_score"])
print(report["eda_summary"]["overview"])
The returned report is JSON-safe and includes:
- EDA overview, numerical profile table, correlations, and outlier rows
- selected task type
- candidate model leaderboard
- best model, best parameters, and cross-validation score
- feature importances or coefficient-style importances when available
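Because the report is JSON-safe, it can be written out with the standard `json` module. The sketch below uses a stand-in dictionary rather than a real `auto_fit` result: the three top-level keys come from the Quick Start, while the values and the contents of `overview` are made up for illustration.

```python
import json

# Stand-in for the dict returned by auto_fit; values are illustrative only.
report = {
    "best_model": "RandomForestClassifier",
    "cv_score": 0.9,
    "eda_summary": {"overview": {"n_rows": 3}},  # inner keys are hypothetical
}


def report_to_json(report):
    """Serialize a JSON-safe report to a stable, human-readable string."""
    return json.dumps(report, indent=2, sort_keys=True)


text = report_to_json(report)

# A JSON-safe report survives a round trip unchanged.
restored = json.loads(text)
print(restored["best_model"])
```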
Manual Workflow Example
import numpy as np
from glassbox.preprocessing import OneHotEncoder, SimpleImputer, StandardScaler
from glassbox.models import RandomForestClassifier
from glassbox.evaluation.classification import classification_report
X_num = np.array([
    [22.0, 32000.0],
    [24.0, np.nan],
    [42.0, 76000.0],
])
X_cat = np.array([["basic"], ["basic"], ["plus"]], dtype=object)
y = np.array([0, 0, 1])
X_num = SimpleImputer(strategy="mean").fit_transform(X_num)
X_num = StandardScaler().fit_transform(X_num)
X_cat = OneHotEncoder().fit_transform(X_cat)
X = np.hstack([X_num, X_cat])
model = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=42)
model.fit(X, y)
predictions = model.predict(X)
print(classification_report(y, predictions))
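For readers curious what the preprocessing steps above compute, here is a plain NumPy sketch of mean imputation, standardization, and one-hot encoding on the same data. It illustrates the techniques only; it is not the GlassBox implementation, and the library's classes may handle edge cases differently.

```python
import numpy as np

X_num = np.array([
    [22.0, 32000.0],
    [24.0, np.nan],
    [42.0, 76000.0],
])
labels = np.array(["basic", "basic", "plus"])

# Mean imputation: replace each NaN with the mean of its column's observed values.
col_means = np.nanmean(X_num, axis=0)
X_imputed = np.where(np.isnan(X_num), col_means, X_num)

# Standardization: zero mean and unit variance per column.
mean = X_imputed.mean(axis=0)
std = X_imputed.std(axis=0)
std[std == 0] = 1.0  # avoid dividing by zero for constant columns
X_scaled = (X_imputed - mean) / std

# One-hot encoding: one indicator column per sorted category.
categories, codes = np.unique(labels, return_inverse=True)
X_onehot = np.eye(len(categories))[codes]

X = np.hstack([X_scaled, X_onehot])
print(X.shape)
```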
Model Zoo
Classification:
- LogisticRegression
- DecisionTreeClassifier
- RandomForestClassifier
- GaussianNaiveBayes
- KNearestNeighbors(task="classification")
Regression:
- LinearRegression
- DecisionTreeRegressor
- RandomForestRegressor
- KNearestNeighbors(task="regression")
Demo And Benchmarks
Launch the notebook:
jupyter notebook notebooks/demo.ipynb
Run the Scikit-Learn comparison benchmark for regression:
python benchmarks/sklearn_comparison.py --task regression --csv data/_uploaded.csv --target Delay
Run the Scikit-Learn comparison benchmark for classification:
python benchmarks/sklearn_comparison.py --task classification --csv data/classification.csv --target stroke
If --target is omitted, the script uses the last column in the CSV as the prediction target.
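The last-column fallback can be sketched as follows. `resolve_target` is a hypothetical helper, not the benchmark script's actual code, and the `age` and `income` column names in the sample header are invented for the example.

```python
import csv
import io


def resolve_target(header, target=None):
    """Pick the prediction target: the given column, or the last one by default."""
    if not header:
        raise ValueError("CSV has no header row")
    if target is None:
        return header[-1]
    if target not in header:
        raise ValueError(f"column {target!r} not found in CSV header")
    return target


# In-memory CSV standing in for a file on disk.
sample = io.StringIO("age,income,purchased\n22,32000,0\n")
header = next(csv.reader(sample))

print(resolve_target(header))
print(resolve_target(header, "age"))
```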
Scikit-Learn is used only in the benchmark script. The glassbox/ package itself remains NumPy-only.
Repository Structure
GlassBox-AutoML-Agent/
|-- .github/
|   `-- pull_request_template.md
|-- glassbox/
|   |-- agent/
|   |-- eda/
|   |-- evaluation/
|   |-- models/
|   |-- optimization/
|   |-- preprocessing/
|   `-- utils/
|-- tests/
|-- notebooks/
|-- benchmarks/
|-- data/
|-- README.md
|-- pyproject.toml
`-- requirements.txt
Testing
Run the test suite from the repository root:
python3 -m pytest -q
IronClaw / MCP Tool
The library exposes a single tool, auto_fit, through three surfaces:
- glassbox.agent.mcp_server: a FastMCP server over stdio, the IronClaw deployment target.
- glassbox.agent.mcp_tool: a JSON-in/JSON-out CLI shim for scripted/sandbox testing.
- mcp.json: reference manifest documenting the tool schema (not consumed by IronClaw directly).
Register with IronClaw
SSH into your IronClaw box, run pip install -e .[mcp], then register the server:
ironclaw mcp add glassbox \
  --transport stdio \
  --command python \
  --arg -m --arg glassbox.agent.mcp_server
IronClaw stores the registration in ~/.ironclaw/mcp-servers.json and spawns the server over stdio whenever the agent calls the tool. Verify with ironclaw mcp list.
Run the tool directly (path mode)
python -m glassbox.agent.mcp_tool --input '{
  "csv_path": "data/sample.csv",
  "target_column": "purchased",
  "task": "auto",
  "search": "random",
  "time_budget": 20
}'
Run the tool inside a sandbox where there is no host filesystem (bytes mode):
python -c "import base64,json,sys; \
print(json.dumps({'csv_b64': base64.b64encode(open('data/sample.csv','rb').read()).decode(), \
'target_column':'purchased'}))" \
| python -m glassbox.agent.mcp_tool
The response is a single JSON object: {"ok": true, "report": {...}} on success, or {"ok": false, "error": "..."} on failure. The report includes an explanation array of short bullets the agent can repeat back to the user.
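A caller can unwrap that envelope in a few lines. The sketch below assumes only the response shape described above; `ToolError` and `unwrap_response` are hypothetical names, and the sample payloads are invented.

```python
import json


class ToolError(RuntimeError):
    """Raised when the tool reports {"ok": false, ...}."""


def unwrap_response(raw):
    """Return the report from a success response, or raise ToolError."""
    payload = json.loads(raw)
    if payload.get("ok") is True:
        return payload["report"]
    raise ToolError(payload.get("error", "unknown error"))


# Invented payloads that follow the documented envelope.
ok_raw = json.dumps({
    "ok": True,
    "report": {"best_model": "LogisticRegression", "explanation": ["Example bullet."]},
})
fail_raw = json.dumps({"ok": False, "error": "target column not found"})

report = unwrap_response(ok_raw)
for bullet in report["explanation"]:
    print("-", bullet)
```

Raising on failure keeps the error message from the tool intact while letting callers use ordinary exception handling.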
Agent private key
Never commit the IronClaw agent private key. Use one of:
- Environment variable: export IRONCLAW_AGENT_PRIVATE_KEY=... (or place it in a local .env, already gitignored).
- The IronClaw CLI's own keystore at ~/.ironclaw/credentials (preferred for production).
The glassbox/ package itself never reads the key; only the IronClaw runtime does, when registering the agent.