mlcompass

An LLM agent that sits next to you through your whole ML pipeline — from data, through training, all the way to deployment.

🚧 Alpha (v0.2.0) — under active development. APIs may change before v1.0.

What it does

mlcompass is a single CLI that follows your ML project from data to production, keeping context across every step.

data.csv         train.py          two runs        results.csv      production
   │                │                  │                │                │
   ▼                ▼                  ▼                ▼                ▼
 advise   ────►   audit   ────►    compare     ────► evaluate ─────►  deploy
                  watch

Each command writes to and reads from a shared project context (.mlcompass/), so by the time you reach deploy, the tool already knows your dataset, your model choice, your training history, and your evaluation results.

What's in v0.2

Five commands are implemented; two are planned.

Command	When you run it	What you get	Status
`init`	Starting a new project	A `.mlcompass/` folder that tracks decisions	✅ v0.1
`advise`	You have a CSV, what now?	Models to try, features to derive, pitfalls to avoid	✅ v0.1
`audit`	Before you press train	Static analysis of training script (seed, val, optimizer, …)	✅ v0.2
`watch`	While training runs	Plateau / overfit / NaN / divergence detection	✅ v0.2
`compare`	After several runs	Side-by-side config + final-metric diff with verdict	✅ v0.2
`evaluate`	Training done	Threshold tuning, confusion matrix, hard examples	📅 v0.3
`deploy`	Going to production	Latency estimate, dependency check, ONNX advice	📅 v0.4

Every Phase 2 command (audit, watch, compare) keeps a fully deterministic default path and gains an opt-in --llm flag that adds a Claude-driven interpretation step on top.

Install

pip install mlcompass
export ANTHROPIC_API_KEY="sk-ant-..."   # only needed for --llm modes

Five-minute tour

mlcompass init my-project

# Pre-training
mlcompass advise data.csv --target churn

# Training-time
mlcompass audit train.py                     # static checks
mlcompass audit train.py --llm               # + prioritized synthesis
mlcompass watch train.log                    # one-shot anomaly scan
mlcompass watch train.log --follow           # live tail mode
mlcompass watch train.log --llm              # + diagnostician

# Comparing runs
mlcompass compare run-3 run-7                # deterministic diff
mlcompass compare run-3 run-7 --llm          # + hypothesis + next experiment

Example — `advise`

mlcompass advise examples/customer_churn.csv

📊 Dataset analysis
   Path:    examples/customer_churn.csv
   Shape:   500 rows × 8 columns
   Target:  churn (high confidence)
   Task:    binary classification (0=98%, 1=2%)

⚠ Warnings
  • Class imbalance detected (1.6% minority class). Don't optimise
    accuracy — use AUC/F1/recall@k. Consider class_weight='balanced'
    or focal loss.

✨ Recommended models  (with --llm)
  • XGBoost                 AUC 0.78 – 0.83
  • Logistic Regression     AUC 0.70 – 0.74
  • LightGBM                AUC 0.78 – 0.84
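
The class-imbalance warning above can be approximated with a few lines of standard-library Python. This is only a sketch of the idea, not mlcompass's implementation: the function names and the 10% threshold are assumptions made for illustration, and the label list is synthetic (8 positives out of 500 rows, which matches the 1.6% figure in the sample output).

```python
from collections import Counter


def minority_share(labels):
    """Return (minority_label, share_of_rows) for a list of class labels."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    label, n = min(counts.items(), key=lambda kv: kv[1])
    return label, n / len(labels)


def imbalance_warning(labels, threshold=0.10):
    """Return a warning string when the minority class falls below `threshold`.

    The 10% default is a hypothetical cut-off chosen for this example.
    """
    _, share = minority_share(labels)
    if share < threshold:
        return f"Class imbalance detected ({share:.1%} minority class)."
    return None


# Synthetic labels: 492 negatives, 8 positives.
labels = [0] * 492 + [1] * 8
print(imbalance_warning(labels))
```

With a share this small, accuracy is dominated by the majority class, which is why the warning points to AUC, F1, or recall@k instead.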

Example — `audit`

mlcompass audit train.py

🔎 Script audit
   Path: train.py | Lines: 23 | Frameworks: torch

   ✗ error    seed              No random seed set anywhere
   ✗ error    optimizer   L17   Adam does not accept momentum=
   ⚠ warning  val_split         No validation split detected
   ⚠ warning  grad_clipping L8  LSTM but no clip_grad_norm_
   ⚠ warning  dataloader  L20   DataLoader missing shuffle=
   ⚠ warning  loss_stability L23 log(x) without epsilon clipping
   ℹ info     batch_size  L20   batch_size=1 is very small

   Summary: 2 error   4 warning   1 info

Eight pure-AST rules:

Rule	Catches
`seed`	No `torch.manual_seed` / `np.random.seed` / `set_seed` call
`val_split`	No split detected, or split implausibly small
`optimizer`	Adam-family + `momentum=`, weird lr, SGD without momentum
`loss_stability`	`log(x)` / `np.log(x)` without clamp or epsilon
`dataloader`	`DataLoader(...)` without explicit `shuffle=`
`grad_clipping`	RNN / Transformer built but `clip_grad_norm_` never called
`eval_mode`	`model.train()` appears but `.eval()` never does
`batch_size`	Implausibly small (<4) or huge (>4096)
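
To make "pure-AST rule" concrete, here is a minimal sketch of how two of the rules in the table (`seed` and `optimizer`) could be written with Python's `ast` module. It is an illustration under assumptions, not the project's actual code: the `audit` function, the tuple layout of a finding, and the exact sets of names are all hypothetical.

```python
import ast

# Assumed name sets for illustration; the real rules may match differently.
SEED_CALLS = {"manual_seed", "seed", "set_seed"}
ADAM_FAMILY = {"Adam", "AdamW", "Adamax"}


def _call_name(node):
    """Return the last component of a call target, e.g. 'manual_seed'."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def audit(source):
    """Return a list of (severity, rule, line, message) findings."""
    tree = ast.parse(source)
    calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
    findings = []

    # seed: no recognised seeding call anywhere in the script
    if not any(_call_name(c) in SEED_CALLS for c in calls):
        findings.append(("error", "seed", None, "No random seed set anywhere"))

    # optimizer: Adam-family constructor called with momentum=
    for c in calls:
        name = _call_name(c)
        if name in ADAM_FAMILY and any(k.arg == "momentum" for k in c.keywords):
            findings.append(
                ("error", "optimizer", c.lineno, f"{name} does not accept momentum=")
            )
    return findings


script = """\
import torch
model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, momentum=0.9)
"""

for finding in audit(script):
    print(finding)
```

Because the script is only parsed and never imported or executed, a check like this needs neither torch nor a GPU, and it gives the same answer on every run.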

Example — `watch`

mlcompass watch train.log

👁  Watch report
   Log:        train.log
   Snapshots:  9
   Last epoch: 7
   Findings:   1 warning

Recent metrics (last 8)
┌───────┬────────────┬──────────┬─────────┐
│ Epoch │ train_loss │ val_loss │ val_acc │
├───────┼────────────┼──────────┼─────────┤
│   0   │       0.65 │     0.68 │   0.612 │
│   …   │        …   │      …   │    …    │
│   7   │       0.08 │     0.59 │   0.773 │
└───────┴────────────┴──────────┴─────────┘

⚠ warning  overfitting  L7  train_loss dropped -0.17 but val_loss
                            rose +0.11; current gap is 0.51

Four detectors:

Rule	Triggers when
`nan`	Any loss-like metric becomes NaN or ±Inf
`divergence`	Train loss jumps ≥10× between consecutive snapshots
`plateau`	Primary loss flat across the last 5 snapshots
`overfitting`	Train falling, val rising, with a meaningful gap

Add --follow to tail the log file and surface new findings live.
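
The four detectors can be sketched as plain functions over a list of metric snapshots. The code below is a rough illustration, not the tool's implementation: the `detect` signature, the `lookback`, `plateau_tol`, and `min_gap` parameters, and the snapshot values are assumptions. Only the 10× jump and the 5-snapshot plateau window come from the table above.

```python
import math


def detect(snapshots, plateau_window=5, plateau_tol=1e-3, lookback=2, min_gap=0.1):
    """Scan snapshots (dicts with 'train_loss' and 'val_loss') for anomalies.

    Returns a list of (severity, rule, snapshot_index) tuples.
    """
    train = [s["train_loss"] for s in snapshots]
    val = [s["val_loss"] for s in snapshots]

    # nan: any non-finite loss; later checks would be meaningless, so stop here
    for i, (t, v) in enumerate(zip(train, val)):
        if not (math.isfinite(t) and math.isfinite(v)):
            return [("error", "nan", i)]

    findings = []

    # divergence: train loss jumps >= 10x between consecutive snapshots
    for i in range(1, len(train)):
        if train[i - 1] > 0 and train[i] >= 10 * train[i - 1]:
            findings.append(("error", "divergence", i))

    # plateau: train loss essentially flat across the last `plateau_window` snapshots
    if len(train) >= plateau_window:
        recent = train[-plateau_window:]
        if max(recent) - min(recent) <= plateau_tol:
            findings.append(("warning", "plateau", len(train) - 1))

    # overfitting: train falling, val rising, and a meaningful gap between them
    if len(train) > lookback:
        train_falling = train[-1] < train[-1 - lookback]
        val_rising = val[-1] > val[-1 - lookback]
        gap = val[-1] - train[-1]
        if train_falling and val_rising and gap >= min_gap:
            findings.append(("warning", "overfitting", len(train) - 1))

    return findings


# Illustrative history, not taken from a real run.
history = [
    {"train_loss": 0.65, "val_loss": 0.68},
    {"train_loss": 0.40, "val_loss": 0.50},
    {"train_loss": 0.25, "val_loss": 0.48},
    {"train_loss": 0.08, "val_loss": 0.59},
]
print(detect(history))  # [('warning', 'overfitting', 3)]
```

A follow mode could call the same function each time a new snapshot is parsed and report only findings it has not shown before.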

Example — `compare`

mlcompass compare run-3 run-7

🆚 Run comparison
   Run A  run-3  (baseline)             · 20 epochs
   Run B  run-7  (lower-lr-more-dropout) · 20 epochs

Final-epoch metrics
   Metric      Run A    Run B    Δ (B − A)   Winner
   train_loss  0.18     0.24     +0.06       A
   val_acc     0.79     0.87     +0.08       B
   val_loss    0.42     0.28     -0.14       B

Config differences
   dropout     0.1      0.3
   lr          0.001    0.0003

⚖️ Mixed result: A wins 1, B wins 2, 0 tie(s).
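
The deterministic part of the comparison comes down to a per-metric difference plus a rule for which direction counts as better. The sketch below reproduces the table above from its own numbers. The function names and the "metric names containing `loss` are lower-is-better" heuristic are assumptions for illustration, not documented mlcompass behaviour.

```python
from collections import Counter


def compare_metrics(run_a, run_b):
    """Return rows of (metric, a, b, delta, winner) for metrics present in both runs."""
    rows = []
    for name in sorted(run_a.keys() & run_b.keys()):
        delta = round(run_b[name] - run_a[name], 6)
        lower_is_better = "loss" in name  # hypothetical naming heuristic
        if delta == 0:
            winner = "tie"
        elif (delta < 0) == lower_is_better:
            winner = "B"
        else:
            winner = "A"
        rows.append((name, run_a[name], run_b[name], delta, winner))
    return rows


def verdict(rows):
    """Summarise the winners in one line."""
    wins = Counter(row[4] for row in rows)
    return f"A wins {wins['A']}, B wins {wins['B']}, {wins['tie']} tie(s)."


# Final-epoch metrics copied from the example output above.
run_3 = {"train_loss": 0.18, "val_acc": 0.79, "val_loss": 0.42}
run_7 = {"train_loss": 0.24, "val_acc": 0.87, "val_loss": 0.28}

rows = compare_metrics(run_3, run_7)
for row in rows:
    print(row)
print(verdict(rows))  # A wins 1, B wins 2, 0 tie(s).
```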

Why mlcompass

The ML ecosystem already has great tools — but each owns one slice of the pipeline, and none of them advise:

Capability	pandas-profiling	W&B / TensorBoard	Cursor / Devin	mlcompass
Analyzes raw data	✅	❌	❌	✅
Recommends models + features	❌	❌	partial	✅
Audits training scripts	❌	❌	reactive	✅
Watches training in real time	❌	dashboard	❌	✅
Diagnoses problems proactively	❌	❌	reactive	✅
Persistent project memory	❌	per-run	❌	✅
Permission-gated actions	❌	❌	partial	first-class

mlcompass is the advisor that sits next to all of these tools — not a replacement for any.

How it works

Built on agentlite — a small Claude agent library — mlcompass uses one deterministic analyzer per command (pure pandas / pure AST / pure log parser) plus an optional LLM agent layer that runs on top of the analyzer's structured output.

        cli.py
          │
   ┌──────┼──────┬─────────┬──────────┐
   ▼      ▼      ▼         ▼          ▼
 init  advise  audit     watch     compare
                │         │           │
                ▼         ▼           ▼
            (--llm)    (--llm)     (--llm)
            priori-   diagnos-   hypothes-
            tizer     tician     izer
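
One way to picture the split between the deterministic analyzer and the optional LLM layer is a function that always runs the analyzer and only calls an interpreter when one is supplied. Everything in this sketch (`Report`, `run_command`, the stub analyzer and interpreter) is hypothetical and stands in for the real components. No Claude or agentlite API is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Report:
    findings: List[str]                   # structured output of the analyzer
    interpretation: Optional[str] = None  # filled in only in --llm mode


def run_command(
    analyze: Callable[[str], List[str]],
    target: str,
    interpret: Optional[Callable[[List[str]], str]] = None,
) -> Report:
    """Run the deterministic analyzer; add an interpretation only if requested."""
    report = Report(findings=analyze(target))
    if interpret is not None:
        # The interpreter sees the analyzer's structured output, not the raw file.
        report.interpretation = interpret(report.findings)
    return report


# Stub analyzer and interpreter, for illustration only.
def fake_audit(path: str) -> List[str]:
    return [f"{path}: no random seed set"]


def fake_prioritizer(findings: List[str]) -> str:
    return f"{len(findings)} finding(s); fix the first one before training."


print(run_command(fake_audit, "train.py"))
print(run_command(fake_audit, "train.py", interpret=fake_prioritizer))
```

Keeping the interpreter optional means the default path never depends on an API key or a network call.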

Every action that would modify your code or config, or start a training process, asks permission first. agentlite's permission system is first-class, not an afterthought.
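
A permission gate of this kind can be sketched as a wrapper that refuses to run an action until a callback approves it. This is not agentlite's API: `gated`, `ask_on_terminal`, and the prompt wording are made up here to show the general pattern.

```python
from typing import Callable, Tuple


def gated(action: str, run: Callable[[], object], ask: Callable[[str], bool]) -> Tuple[bool, object]:
    """Run `run` only if `ask` approves the described action.

    Returns (allowed, result); result is None when permission is denied.
    """
    if not ask(f"Allow mlcompass to {action}? [y/N] "):
        return False, None
    return True, run()


def ask_on_terminal(prompt: str) -> bool:
    """Interactive approver; anything other than y/yes counts as a refusal."""
    return input(prompt).strip().lower() in {"y", "yes"}


# Non-interactive demonstration: an approver that always refuses.
allowed, result = gated("edit train.py", lambda: "edited", ask=lambda prompt: False)
print(allowed, result)  # False None
```

Passing the approver in as a parameter keeps the default answer at "no" and makes the gate easy to test without a terminal.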

See ARCHITECTURE.md for the full design.

Project context

Each mlcompass project keeps a small folder, similar in spirit to .git/:

.mlcompass/
├── project.yaml        # metadata
├── context.json        # decisions, recommendations, active state
├── datasets/           # registered datasets
├── runs/               # training run history (consumed by compare)
└── advice.log          # JSONL of every command run

This is what makes mlcompass more than a chat tool: by the time you run deploy, every earlier decision is still in memory.
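
As a rough picture of how commands could share state through this folder, the sketch below appends one JSON line per command to `advice.log` and merges a summary into `context.json`. The file names come from the layout above. The `record` function and the fields it writes (`ts`, `command`, `summary`) are assumptions, not the project's real schema.

```python
import json
import tempfile
import time
from pathlib import Path


def record(project_dir, command, summary):
    """Log one command run and merge its summary into the shared context."""
    root = Path(project_dir) / ".mlcompass"
    root.mkdir(parents=True, exist_ok=True)

    # advice.log: append-only JSONL, one entry per command run
    entry = {"ts": time.time(), "command": command, "summary": summary}
    with open(root / "advice.log", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")

    # context.json: latest summary per command (hypothetical schema)
    ctx_path = root / "context.json"
    ctx = json.loads(ctx_path.read_text(encoding="utf-8")) if ctx_path.exists() else {}
    ctx[command] = summary
    ctx_path.write_text(json.dumps(ctx, indent=2), encoding="utf-8")
    return ctx


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    record(tmp, "advise", {"target": "churn", "task": "binary classification"})
    ctx = record(tmp, "audit", {"errors": 2, "warnings": 4})
    log_lines = (Path(tmp) / ".mlcompass" / "advice.log").read_text(encoding="utf-8").splitlines()

print(sorted(ctx))     # ['advise', 'audit']
print(len(log_lines))  # 2
```

A later command would read `context.json` the same way to pick up what earlier commands recorded.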

Roadmap

Phase	Commands	Status
Phase 1 (v0.1)	`init`, `advise`	✅ Shipped
Phase 2 (v0.2)	`audit`, `watch`, `compare` + `--llm`	✅ Shipped
Phase 2.x (planned)	TensorBoard / W&B log support, permission-gated config edits	🚧 In progress
Phase 3 (v0.3)	`evaluate`	📅 Planned
Phase 4 (v0.4)	`deploy`	📅 Planned

See CHANGELOG.md for the detailed log and ARCHITECTURE.md for the design.

Non-goals

To stay focused, mlcompass will not try to be:

An AutoML system (use AutoGluon, auto-sklearn)
An experiment tracker (use MLflow, W&B)
A code assistant (use Cursor, Copilot, aider)
A monitoring dashboard (use Grafana, Streamlit)

mlcompass advises; you decide.

Contributing

mlcompass is alpha-stage; issues and discussions are welcome. See CONTRIBUTING.md for the dev setup.

License
