An LLM agent that sits next to you through your whole ML pipeline

mlcompass

An LLM agent that sits next to you through your whole ML pipeline: from data, through training, all the way to deployment.

🚧 Pre-alpha (v0.0.1), under active development. APIs will change before v0.1.

What it does

mlcompass is a single CLI that follows your ML project from start to finish, keeping context across every step.

data.csv          train.py             results.csv         production
   │                  │                     │                  │
   ▼                  ▼                     ▼                  ▼
 advise   ────►   audit + watch  ────►  evaluate  ────►  deploy
                      compare

Each command writes to and reads from a shared project context (.mlcompass/), so by the time you reach deploy, the tool already knows your dataset, your model choice, your training history, and your evaluation results.

Seven commands, one tool

| Command | When you run it | What you get |
| --- | --- | --- |
| init | Starting a new project | A .mlcompass/ folder that tracks decisions |
| advise | You have a CSV, what now? | Models to try, features to derive, pitfalls to avoid |
| audit | Before you press train | Static analysis of training script (seed, val, etc.) |
| watch | While training runs | Live plateau / overfit / NaN detection |
| compare | After several runs | Hypothesis-driven diff between two runs |
| evaluate | Training done | Threshold tuning, confusion matrix, hard examples |
| deploy | Going to production | Latency estimate, dependency check, ONNX advice |

Quick example: advise mode

mlcompass init churn-project
mlcompass advise data/customers.csv --target churn

Output:

📊 Dataset analysis (data/customers.csv)
   • 10,000 rows × 23 columns
   • Target: churn (binary, 12% positive)
   • 4 categorical, 18 numerical, 1 datetime
   • 3 columns with >50% missing values (consider dropping)

💡 Recommended models
   1. XGBoost / LightGBM   → tabular binary baseline
                             expected AUC: 0.82 – 0.87
   2. Logistic Regression  → interpretable baseline
                             expected AUC: 0.76 – 0.80
   3. FT-Transformer       → if GPU budget allows
                             expected AUC: 0.83 – 0.86

🔧 Suggested feature engineering
   • signup_date → derive days_since_signup, month, dayofweek
   • income (3 outliers >3σ) → winsorize at 99th percentile
   • country (47 categories) → target encoding or top-N

⚠️  Class imbalance (12% positive)
   • Don't optimize accuracy; use AUC, F1, or recall@k
   • Consider class_weight='balanced' or focal loss

Generate a baseline notebook? [y/N]
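
The class-imbalance warning in the sample output can be checked with simple arithmetic. The following is a minimal sketch, not mlcompass code: it assumes a hypothetical classifier that always predicts "no churn" on a dataset with the 10,000 rows and 12% positive rate from the example, and computes accuracy, recall, and F1 by hand. The function names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Return accuracy, recall, and F1, treating undefined ratios as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# 10,000 rows with 12% positives, as in the sample output.
y_true = [1] * 1200 + [0] * 8800
# Hypothetical model that never predicts churn.
always_negative = [0] * 10_000

scores = summarize(y_true, always_negative)
print(scores)
```

The always-negative model scores 88% accuracy while finding none of the churners (recall and F1 are both zero), which is why the output steers toward AUC, F1, or recall@k instead.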

Quick example: watch mode (Phase 2)

mlcompass watch train.py

After 8 epochs:

⚠️  Epoch 8: overfitting detected
   Train loss: 0.118  |  Val loss: 0.387  (gap 0.27, normal <0.1)

   Likely cause: regularization is too weak for the model capacity.

   Suggested fix: increase dropout 0.1 → 0.3
   Apply and restart training? [y/N]
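
The kind of check behind a message like the one above could be sketched as follows. This is a hypothetical illustration, not mlcompass's implementation: the function names, the plateau `patience` and `min_delta` values, and the rule of comparing raw loss gaps are all assumptions. Only the 0.1 gap threshold is taken from the "normal <0.1" note in the sample output.

```python
import math


def detect_nan(loss):
    """True if the loss is NaN or infinite."""
    return math.isnan(loss) or math.isinf(loss)


def detect_overfit(train_loss, val_loss, max_gap=0.1):
    """True if validation loss exceeds training loss by more than max_gap."""
    return (val_loss - train_loss) > max_gap


def detect_plateau(val_losses, patience=3, min_delta=1e-3):
    """True if the last `patience` epochs did not beat the earlier best by min_delta."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return (best_before - best_recent) < min_delta


def check_epoch(train_losses, val_losses):
    """Return a list of findings for the most recent epoch."""
    train_loss, val_loss = train_losses[-1], val_losses[-1]
    if detect_nan(train_loss) or detect_nan(val_loss):
        return ["nan"]  # nothing else is meaningful once the loss is NaN
    findings = []
    if detect_overfit(train_loss, val_loss):
        findings.append("overfit")
    if detect_plateau(val_losses):
        findings.append("plateau")
    return findings


# Final-epoch values from the sample output: gap 0.387 - 0.118 = 0.269 > 0.1.
print(check_epoch([0.5, 0.118], [0.5, 0.387]))
```

A fixed absolute gap is the simplest possible rule; a real detector would likely look at trends over several epochs rather than a single snapshot.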

Why mlcompass

The ML ecosystem already has great tools, but each owns one slice of the pipeline, and none of them advise:

| Capability | pandas-profiling | W&B / TensorBoard | Cursor / Devin | mlcompass |
| --- | --- | --- | --- | --- |
| Analyzes raw data | ✅ | ❌ | ❌ | ✅ |
| Recommends models + features | ❌ | ❌ | partial | ✅ |
| Audits training scripts | ❌ | ❌ | reactive | ✅ |
| Watches training in real time | ❌ | dashboard | ❌ | ✅ |
| Diagnoses problems proactively | ❌ | ❌ | reactive | ✅ |
| Post-training evaluation advice | ❌ | basic | ❌ | ✅ |
| Deployment readiness check | ❌ | ❌ | ❌ | ✅ |
| Persistent project memory | ❌ | per-run | ❌ | ✅ |
| Permission-gated actions | ❌ | ❌ | partial | first-class |

mlcompass is the advisor that sits next to all of these tools, not a replacement for any.

Install

pip install mlcompass
export ANTHROPIC_API_KEY="sk-ant-..."

Usage

# Start a project
mlcompass init my-project

# Pre-training
mlcompass advise data.csv --target label

# Training-time          (Phase 2)
mlcompass audit train.py
mlcompass watch train.py
mlcompass compare run-3 run-7

# Post-training          (Phase 3)
mlcompass evaluate results.csv

# Deployment             (Phase 4)
mlcompass deploy --target sagemaker

How it works

Built on agentlite, a small Claude agent library, mlcompass uses one orchestrator agent per command, plus focused sub-agents for sub-tasks:

       cli.py
         │
   ┌─────┴─────┐
   ▼           ▼
 advise      watch                ... deploy
 agent       agent
   │           │
   ▼           ▼
 ModelAdvisor  MetricsWatcher (Haiku, polls)
  (Opus)       Diagnostician  (Opus, called on anomaly)
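
The watch branch of the diagram pairs a cheap agent that polls with a stronger one that is called only on an anomaly. A minimal sketch of that escalation pattern follows. Plain Python callables stand in for the agents, because the agentlite API is not documented here; `Escalator`, `simple_watcher`, and `simple_diagnostician` are hypothetical names, and the metrics dictionary layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Escalator:
    # Cheap check run on every metrics snapshot; returns an anomaly label or None.
    watcher: Callable[[dict], Optional[str]]
    # Expensive analysis run only when the watcher reports an anomaly.
    diagnostician: Callable[[dict, str], str]
    diagnoses: list = field(default_factory=list)
    expensive_calls: int = 0

    def run(self, snapshots: Iterable[dict]) -> list:
        for snap in snapshots:
            anomaly = self.watcher(snap)
            if anomaly is None:
                continue  # normal epoch: the expensive agent is never invoked
            self.expensive_calls += 1
            self.diagnoses.append(self.diagnostician(snap, anomaly))
        return self.diagnoses


def simple_watcher(snap: dict) -> Optional[str]:
    """Stand-in for the polling agent: flag a large train/val gap."""
    if snap["val_loss"] - snap["train_loss"] > 0.1:
        return "overfit"
    return None


def simple_diagnostician(snap: dict, anomaly: str) -> str:
    """Stand-in for the stronger agent: produce a short diagnosis string."""
    return f"epoch {snap['epoch']}: {anomaly}"


snapshots = [
    {"epoch": 7, "train_loss": 0.20, "val_loss": 0.25},
    {"epoch": 8, "train_loss": 0.118, "val_loss": 0.387},
]
escalator = Escalator(simple_watcher, simple_diagnostician)
print(escalator.run(snapshots), escalator.expensive_calls)
```

The point of the split is cost: the expensive call count grows with the number of anomalies, not with the number of polls.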

Every action that would modify your code or config, or run a training process, asks permission first; agentlite's permission system is first-class, not an afterthought.
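
A permission gate of this kind can be sketched in a few lines. This is an illustration of the idea only, not agentlite's actual permission system: `gated`, `PermissionDenied`, and the injectable `ask` parameter are hypothetical names. The default answer is "no", matching the `[y/N]` prompts shown in the examples.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class PermissionDenied(Exception):
    """Raised when the user declines a proposed action."""


def gated(description: str, action: Callable[[], T],
          ask: Callable[[str], str] = input) -> T:
    """Run `action` only if the user explicitly answers yes.

    Anything other than "y" or "yes" (including an empty answer)
    is treated as a refusal, so the safe choice is the default.
    """
    answer = ask(f"{description} Apply? [y/N] ").strip().lower()
    if answer not in ("y", "yes"):
        raise PermissionDenied(description)
    return action()


# Hypothetical config change, mirroring the dropout suggestion in watch mode.
config = {"dropout": 0.1}


def raise_dropout() -> dict:
    config["dropout"] = 0.3
    return config


# `ask` is injected here so the example runs without a terminal.
result = gated("Increase dropout 0.1 -> 0.3.", raise_dropout, ask=lambda _: "y")
print(result)
```

Passing `ask` as a parameter keeps the gate testable and lets a non-interactive caller supply its own policy.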

See ARCHITECTURE.md for the full design.

Project context

Each mlcompass project keeps a small folder, similar in spirit to .git/:

.mlcompass/
├── project.yaml        # metadata
├── context.json        # decisions, recommendations, active state
├── datasets/           # registered datasets
└── runs/               # training run history

This is what makes mlcompass more than a chat tool: by the time you run deploy, every earlier decision is still in memory.
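
How commands might share state through `context.json` can be sketched as below. The schema of that file is not specified here, so the per-command keys (`"advise"`, `"evaluate"`) and their contents are assumptions made for illustration, as are the helper names `load_context` and `record`. Only the `.mlcompass/context.json` path comes from the layout above.

```python
import json
import tempfile
from pathlib import Path


def context_path(project_dir) -> Path:
    """Location of the shared context file inside a project."""
    return Path(project_dir) / ".mlcompass" / "context.json"


def load_context(project_dir) -> dict:
    """Read the shared context, or return an empty dict for a new project."""
    path = context_path(project_dir)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def record(project_dir, step: str, data: dict) -> dict:
    """Merge one command's results into the context and write it back."""
    ctx = load_context(project_dir)
    ctx[step] = data
    path = context_path(project_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(ctx, indent=2), encoding="utf-8")
    return ctx


# Demo in a throwaway directory: two steps write, a later step reads both.
with tempfile.TemporaryDirectory() as project:
    record(project, "advise", {"target": "churn", "model": "xgboost"})
    record(project, "evaluate", {"threshold": 0.35})
    final = load_context(project)

print(final)
```

Because each step rereads the file before writing, a later command sees everything earlier commands recorded, which is the behavior the paragraph above describes.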

Roadmap

| Phase | Commands | Status |
| --- | --- | --- |
| Phase 1 (v0.1) | init, advise | 🚧 In progress |
| Phase 2 (v0.2) | audit, watch, compare | 📅 Planned |
| Phase 3 (v0.3) | evaluate | 📅 Planned |
| Phase 4 (v0.4) | deploy | 📅 Planned |

See CHANGELOG.md for detailed plans and ARCHITECTURE.md for the design.

Non-goals

To stay focused, mlcompass will not try to be:

  • AutoML (use AutoGluon, AutoSklearn)
  • Experiment tracker (use MLflow, W&B)
  • Code assistant (use Cursor, Copilot, aider)
  • Monitoring dashboard (use Grafana, Streamlit)

mlcompass advises; you decide.

Contributing

Pre-alpha: issues and discussions are welcome; PRs will be accepted after v0.1.

License

MIT © 2026 Hakan Sabunis
