mlcompass
An LLM agent that sits next to you through your whole ML pipeline: from data, through training, all the way to deployment.
🚧 Alpha (v0.2.0): under active development. APIs may change before v1.0.
What it does
mlcompass is a single CLI that follows your ML project from data to production, keeping context across every step.
data.csv   train.py     two runs     results.csv     production
   │           │            │             │              │
   ▼           ▼            ▼             ▼              ▼
advise ────► audit ────► compare ────► evaluate ─────► deploy
             watch
Each command writes to and reads from a shared project context
(.mlcompass/), so by the time you reach deploy, the tool already
knows your dataset, your model choice, your training history, and your
evaluation results.
What's in v0.2
Five commands are implemented; two are planned.
| Command | When you run it | What you get | Status |
|---|---|---|---|
| init | Starting a new project | A .mlcompass/ folder that tracks decisions | Shipped (v0.1) |
| advise | You have a CSV, what now? | Models to try, features to derive, pitfalls to avoid | Shipped (v0.1) |
| audit | Before you press train | Static analysis of training script (seed, val, optimizer, …) | Shipped (v0.2) |
| watch | While training runs | Plateau / overfit / NaN / divergence detection | Shipped (v0.2) |
| compare | After several runs | Side-by-side config + final-metric diff with verdict | Shipped (v0.2) |
| evaluate | Training done | Threshold tuning, confusion matrix, hard examples | Planned (v0.3) |
| deploy | Going to production | Latency estimate, dependency check, ONNX advice | Planned (v0.4) |
Every Phase 2 command (audit, watch, compare) keeps a fully
deterministic default path and gains an opt-in --llm flag that adds
a Claude-driven interpretation step on top.
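The split between the deterministic path and the opt-in --llm step can be pictured roughly as follows. This is only a sketch, not mlcompass's actual code: `run_audit`, `interpret_with_llm`, and the finding format are hypothetical names, and the LLM call is replaced by a local stub so the example runs offline.

```python
import argparse


def run_audit(path: str) -> list[dict]:
    """Deterministic step (stubbed): always returns the same structured findings."""
    # Hypothetical finding format; a real analyzer would parse the file at `path`.
    return [{"rule": "seed", "severity": "error", "message": "No random seed set"}]


def interpret_with_llm(findings: list[dict]) -> str:
    """Optional step (stubbed): a real version would send `findings` to an LLM."""
    errors_first = sorted(findings, key=lambda f: f["severity"] != "error")
    return f"Fix first: {errors_first[0]['rule']}"


def main(argv: list[str]) -> list[str]:
    parser = argparse.ArgumentParser(prog="audit-sketch")
    parser.add_argument("script")
    parser.add_argument("--llm", action="store_true", help="add an interpretation step")
    args = parser.parse_args(argv)

    # The deterministic findings are produced whether or not --llm is set.
    findings = run_audit(args.script)
    lines = [f"{f['severity']} {f['rule']} {f['message']}" for f in findings]
    if args.llm:
        # The interpretation is layered on top; it never replaces the findings.
        lines.append(interpret_with_llm(findings))
    return lines


print(main(["train.py"]))
print(main(["train.py", "--llm"]))
```

The point of the design is that the second call only appends to the output of the first, so the default path stays reproducible without an API key.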
Install
pip install mlcompass
export ANTHROPIC_API_KEY="sk-ant-..." # only needed for --llm modes
Five-minute tour
mlcompass init my-project
# Pre-training
mlcompass advise data.csv --target churn
# Training-time
mlcompass audit train.py # static checks
mlcompass audit train.py --llm # + prioritized synthesis
mlcompass watch train.log # one-shot anomaly scan
mlcompass watch train.log --follow # live tail mode
mlcompass watch train.log --llm # + diagnostician
# Comparing runs
mlcompass compare run-3 run-7 # deterministic diff
mlcompass compare run-3 run-7 --llm # + hypothesis + next experiment
Example: advise
mlcompass advise examples/customer_churn.csv
Dataset analysis
Path: examples/customer_churn.csv
Shape: 500 rows × 8 columns
Target: churn (high confidence)
Task: binary classification (0=98%, 1=2%)
Warnings
• Class imbalance detected (1.6% minority class). Don't optimise
  accuracy; use AUC/F1/recall@k. Consider class_weight='balanced'
  or focal loss.
✨ Recommended models (with --llm)
• XGBoost AUC 0.78 to 0.83
• Logistic Regression AUC 0.70 to 0.74
• LightGBM AUC 0.78 to 0.84
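A class-imbalance warning like the one in this output could come from a check along these lines. It is a minimal standard-library sketch rather than the tool's real analyzer (which the README describes as pure pandas); the function names and the 5% threshold are assumptions made for illustration.

```python
from collections import Counter


def minority_share(labels):
    """Return (minority_label, share) for a non-empty sequence of class labels."""
    counts = Counter(labels)
    label, count = min(counts.items(), key=lambda kv: kv[1])
    return label, count / len(labels)


def imbalance_warning(labels, threshold=0.05):
    """Return a warning string if the rarest class falls below `threshold`, else None."""
    _, share = minority_share(labels)
    if share < threshold:
        return f"Class imbalance detected ({share:.1%} minority class)"
    return None


# 8 positives out of 500 rows is the 1.6% share reported in the example output.
labels = [1] * 8 + [0] * 492
print(imbalance_warning(labels))
```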
Example: audit
mlcompass audit train.py
Script audit
Path: train.py | Lines: 23 | Frameworks: torch
error    seed                  No random seed set anywhere
error    optimizer       L17   Adam does not accept momentum=
warning  val_split             No validation split detected
warning  grad_clipping   L8    LSTM but no clip_grad_norm_
warning  dataloader      L20   DataLoader missing shuffle=
warning  loss_stability  L23   log(x) without epsilon clipping
info     batch_size      L20   batch_size=1 is very small
Summary: 2 error 4 warning 1 info
Eight pure-AST rules:
| Rule | Catches |
|---|---|
| seed | No torch.manual_seed / np.random.seed / set_seed call |
| val_split | No split detected, or split implausibly small |
| optimizer | Adam-family + momentum=, weird lr, SGD without momentum |
| loss_stability | log(x) / np.log(x) without clamp or epsilon |
| dataloader | DataLoader(...) without explicit shuffle= |
| grad_clipping | RNN / Transformer built but clip_grad_norm_ never called |
| eval_mode | model.train() appears but .eval() never does |
| batch_size | Implausibly small (<4) or huge (>4096) |
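To give a feel for what a pure-AST rule looks like, here is a hedged sketch of two of the rules above (seed, and Adam-family + momentum=) using Python's standard `ast` module. The function names, the finding tuples, and the name-only matching are assumptions for illustration; the real rules are likely more careful about imports and aliases.

```python
import ast
from typing import Optional

SEED_CALLS = {"manual_seed", "seed", "set_seed"}
ADAM_FAMILY = {"Adam", "AdamW", "Adamax", "NAdam", "RAdam"}


def _call_name(node: ast.Call) -> str:
    """Last component of the called name: torch.optim.Adam(...) -> 'Adam'."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def audit_source(source: str) -> list[tuple[str, str, Optional[int]]]:
    """Return (severity, rule, line) findings for two illustrative rules."""
    calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
    findings: list[tuple[str, str, Optional[int]]] = []

    # seed: no call whose name looks like a seeding function
    if not any(_call_name(c) in SEED_CALLS for c in calls):
        findings.append(("error", "seed", None))

    # optimizer: an Adam-family constructor given a momentum= keyword
    for call in calls:
        has_momentum = any(kw.arg == "momentum" for kw in call.keywords)
        if _call_name(call) in ADAM_FAMILY and has_momentum:
            findings.append(("error", "optimizer", call.lineno))
    return findings


sample = '''
import torch
model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, momentum=0.9)
'''
print(audit_source(sample))
```

Because the script is parsed and never imported or executed, this kind of check works without torch installed and cannot trigger training by accident.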
Example: watch
mlcompass watch train.log
Watch report
Log: train.log
Snapshots: 9
Last epoch: 7
Findings: 1 warning
Recent metrics (last 8)
┌───────┬────────────┬──────────┬─────────┐
│ Epoch │ train_loss │ val_loss │ val_acc │
├───────┼────────────┼──────────┼─────────┤
│ 0     │ 0.65       │ 0.68     │ 0.612   │
│ …     │ …          │ …        │ …       │
│ 7     │ 0.08       │ 0.59     │ 0.773   │
└───────┴────────────┴──────────┴─────────┘
warning overfitting L7 train_loss dropped -0.17 but val_loss
rose +0.11; current gap is 0.51
Four detectors:
| Rule | Triggers when |
|---|---|
| nan | Any loss-like metric becomes NaN or ±Inf |
| divergence | Train loss jumps ≥10× between consecutive snapshots |
| plateau | Primary loss flat across the last 5 snapshots |
| overfitting | Train falling, val rising, with a meaningful gap |
Add --follow to tail the log file and surface new findings live.
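The four detectors can be approximated in a few lines of plain Python. The sketch below is an assumption-laden illustration, not the shipped logic: the 10× factor and the 5-snapshot window come from the table, while the flatness tolerance, the minimum gap, the choice of train loss as the "primary" loss, and the comparison of only the last two snapshots for overfitting are guesses.

```python
import math


def detect(snapshots, window=5, flat_tol=1e-3, min_gap=0.1):
    """Return rule names triggered by a list of {'train_loss', 'val_loss'} dicts."""
    train = [s["train_loss"] for s in snapshots]
    val = [s["val_loss"] for s in snapshots]

    # nan: any loss value that is NaN or +/-Inf
    if any(not math.isfinite(x) for x in train + val):
        return ["nan"]  # the remaining checks are meaningless on non-finite values

    findings = []

    # divergence: train loss grows by 10x or more between consecutive snapshots
    if any(prev > 0 and cur >= 10 * prev for prev, cur in zip(train, train[1:])):
        findings.append("divergence")

    # plateau: train loss barely moves across the last `window` snapshots
    recent = train[-window:]
    if len(recent) == window and max(recent) - min(recent) < flat_tol:
        findings.append("plateau")

    # overfitting: train falling, val rising, and val above train by `min_gap`
    if len(train) >= 2:
        train_delta = train[-1] - train[-2]
        val_delta = val[-1] - val[-2]
        if train_delta < 0 and val_delta > 0 and val[-1] - train[-1] >= min_gap:
            findings.append("overfitting")
    return findings


# Hypothetical last two snapshots, back-computed from the deltas in the report above.
history = [
    {"train_loss": 0.25, "val_loss": 0.48},
    {"train_loss": 0.08, "val_loss": 0.59},
]
print(detect(history))
```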
Example: compare
mlcompass compare run-3 run-7
Run comparison
Run A  run-3 (baseline) · 20 epochs
Run B  run-7 (lower-lr-more-dropout) · 20 epochs
Final-epoch metrics
Metric      Run A  Run B  Δ (B - A)  Winner
train_loss  0.18   0.24   +0.06      A
val_acc     0.79   0.87   +0.08      B
val_loss    0.42   0.28   -0.14      B
Config differences
dropout  0.1    0.3
lr       0.001  0.0003
Mixed result: A wins 1, B wins 2, 0 tie(s).
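The deterministic diff in this report boils down to a per-metric delta plus a set difference over config keys. The following sketch reproduces the numbers above under one stated assumption: metrics whose name ends in "loss" are lower-is-better and everything else is higher-is-better. `compare_runs` and its return shape are hypothetical, not the tool's API.

```python
def compare_runs(metrics_a, metrics_b, config_a, config_b):
    """Diff two runs: per-metric delta and winner, plus the config keys that differ."""
    rows = []
    for name in sorted(metrics_a.keys() & metrics_b.keys()):
        a, b = metrics_a[name], metrics_b[name]
        delta = round(b - a, 6)  # rounding hides floating-point noise in the delta
        if delta == 0:
            winner = "tie"
        else:
            # Assumption: "*loss" metrics are lower-is-better, others higher-is-better.
            b_better = delta < 0 if name.endswith("loss") else delta > 0
            winner = "B" if b_better else "A"
        rows.append((name, a, b, delta, winner))

    changed = {
        key: (config_a.get(key), config_b.get(key))
        for key in sorted(config_a.keys() | config_b.keys())
        if config_a.get(key) != config_b.get(key)
    }
    return rows, changed


rows, changed = compare_runs(
    {"train_loss": 0.18, "val_acc": 0.79, "val_loss": 0.42},
    {"train_loss": 0.24, "val_acc": 0.87, "val_loss": 0.28},
    {"dropout": 0.1, "lr": 0.001, "epochs": 20},
    {"dropout": 0.3, "lr": 0.0003, "epochs": 20},
)
for row in rows:
    print(row)
print(changed)
```

Unchanged keys such as `epochs` drop out of the config diff, which keeps the report focused on what could explain the metric differences.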
Why mlcompass
The ML ecosystem already has great tools, but each owns one slice of the pipeline, and none of them advise:
| | pandas-profiling | W&B / TensorBoard | Cursor / Devin | mlcompass |
|---|---|---|---|---|
| Analyzes raw data | yes | no | no | yes |
| Recommends models + features | no | no | partial | yes |
| Audits training scripts | no | no | reactive | yes |
| Watches training in real time | no | dashboard | no | yes |
| Diagnoses problems proactively | no | no | reactive | yes |
| Persistent project memory | no | per-run | no | yes |
| Permission-gated actions | no | no | partial | first-class |
mlcompass is the advisor that sits next to all of these tools, not a replacement for any.
How it works
Built on agentlite, a small Claude agent library, mlcompass uses one deterministic analyzer per command (pure pandas / pure AST / pure log parser) plus an optional LLM agent layer that runs on top of the analyzer's structured output.
          cli.py
             │
   ┌─────────┼──────────┬───────────────┬───────────────┐
   ▼         ▼          ▼               ▼               ▼
  init    advise      audit           watch          compare
                        │               │               │
                        ▼               ▼               ▼
                     (--llm)         (--llm)         (--llm)
                   prioritizer    diagnostician    hypothesizer
Every action that would modify your code or config, or start a training process, asks permission first; agentlite's permission system is first-class, not an afterthought.
See ARCHITECTURE.md for the full design.
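The idea of permission-gating can be illustrated with a small wrapper that refuses to run a side-effecting action until the user says yes. This is a generic sketch and does not reflect agentlite's actual API, which is not shown in this README; `gated` and its `ask` parameter are hypothetical.

```python
from typing import Callable


def gated(action: Callable[[], str], description: str,
          ask: Callable[[str], str] = input) -> str:
    """Run `action` only if the user approves; otherwise do nothing."""
    answer = ask(f"{description} Proceed? [y/N] ").strip().lower()
    if answer in {"y", "yes"}:
        return action()
    return "skipped"  # default is to refuse: anything but an explicit yes is a no


# `ask` is injectable so the gate can be exercised without a terminal.
print(gated(lambda: "config updated", "About to set lr=0.0003.", ask=lambda prompt: "n"))
print(gated(lambda: "config updated", "About to set lr=0.0003.", ask=lambda prompt: "y"))
```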
Project context
Each mlcompass project keeps a small folder, similar in spirit to
.git/:
.mlcompass/
├── project.yaml     # metadata
├── context.json     # decisions, recommendations, active state
├── datasets/        # registered datasets
├── runs/            # training run history (consumed by compare)
└── advice.log       # JSONL of every command run
This is what makes mlcompass more than a chat tool: by the time you
run deploy, every earlier decision is still in memory.
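As a rough picture of how such a folder can carry state between commands, the sketch below appends one JSON object per line to advice.log and merges a summary into context.json. The file names come from the layout above; the entry fields (`ts`, `command`, `summary`) and both function names are assumptions, since the real schema is not documented here.

```python
import json
import tempfile
import time
from pathlib import Path


def record(root: Path, command: str, summary: dict) -> None:
    """Append one JSONL entry to advice.log and merge `summary` into context.json."""
    root.mkdir(parents=True, exist_ok=True)
    entry = {"ts": time.time(), "command": command, "summary": summary}
    with (root / "advice.log").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")  # one JSON object per line

    context = load_context(root)
    context[command] = summary  # later commands can read earlier decisions
    (root / "context.json").write_text(json.dumps(context, indent=2), encoding="utf-8")


def load_context(root: Path) -> dict:
    """Return the stored context, or an empty dict for a fresh project."""
    path = root / "context.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


# Demo in a throwaway directory so nothing is written to the current project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".mlcompass"
    record(root, "advise", {"target": "churn"})
    record(root, "audit", {"errors": 2, "warnings": 4})
    print(load_context(root)["audit"])
```

An append-only log plus a small merged state file keeps the history auditable while letting each command load only the current state.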
Roadmap
| Phase | Commands | Status |
|---|---|---|
| Phase 1 (v0.1) | init, advise | Shipped |
| Phase 2 (v0.2) | audit, watch, compare + --llm | Shipped |
| Phase 2.x (planned) | TensorBoard / W&B log support, permission-gated config edits | In progress |
| Phase 3 (v0.3) | evaluate | Planned |
| Phase 4 (v0.4) | deploy | Planned |
See CHANGELOG.md for the detailed log and ARCHITECTURE.md for the design.
Non-goals
To stay focused, mlcompass will not try to be:
- AutoML (use AutoGluon, AutoSklearn)
- Experiment tracker (use MLflow, W&B)
- Code assistant (use Cursor, Copilot, aider)
- Monitoring dashboard (use Grafana, Streamlit)
mlcompass advises; you decide.
Contributing
Alpha-stage: issues and discussions are welcome. See CONTRIBUTING.md for the dev setup.
License
MIT © 2026 Hakan Sabunis