AI-powered autonomous ML research framework — agent runs experiments against frozen evaluation contracts on your existing pipelines.

🥋 Dojo — An AI-powered autonomous ML research framework.

Run controlled, reproducible ML experiments on your existing pipelines and build a memory of what actually works.

What is Dojo?

You define a domain — a research area pointing at your data with a fixed evaluation contract. An AI agent runs experiments inside that contract: writing training code, calling frozen load_data and evaluate tools, logging metrics, and recording findings as durable knowledge atoms.

Domain (you define)
  ├── Task            — the contract: load_data + evaluate (frozen, AI-generated at setup)
  ├── Workspace       — your repo / pipeline (local path or git url)
  └── Experiments     — agent-created, many per domain
        └── Knowledge atoms — linked across experiments, accumulating over time

The agent owns the training code. The framework owns evaluation. That separation is what makes the metrics trustworthy run-over-run, and what makes it safe to leave the agent unsupervised.

Inspired by Karpathy's autoresearch — prepare.py is frozen, train.py is fair game, program.md is what the human iterates on. Dojo generalises that pattern to any well-defined ML problem class.
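The split between agent-owned training and framework-owned evaluation can be sketched in plain Python. The names load_data, evaluate and train come from the text above; their signatures, the toy data and the run_experiment helper are assumptions made for illustration and are not Dojo's actual API.

```python
from math import sqrt
from typing import Callable

Predictor = Callable[[float], float]


# Frozen side (hypothetical signatures): fixed at setup, never edited by the agent.
def load_data() -> dict:
    """Return a fixed train/test split (toy numbers for illustration)."""
    return {
        "X_train": [1.0, 2.0, 3.0, 4.0],
        "y_train": [2.0, 4.0, 6.0, 8.0],
        "X_test": [5.0, 6.0],
        "y_test": [10.0, 12.0],
    }


def evaluate(predict: Predictor, data: dict) -> dict:
    """Score a fitted predictor on the held-out split."""
    y_true = data["y_test"]
    errors = [predict(x) - y for x, y in zip(data["X_test"], y_true)]
    ss_res = sum(e * e for e in errors)
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return {
        "rmse": sqrt(ss_res / len(errors)),
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / len(errors),
    }


# Agent side: free to change from one experiment to the next.
def train(data: dict) -> Predictor:
    """Fit y = w * x by least squares through the origin."""
    num = sum(x * y for x, y in zip(data["X_train"], data["y_train"]))
    den = sum(x * x for x in data["X_train"])
    w = num / den
    return lambda x: w * x


def run_experiment(train_fn: Callable[[dict], Predictor]) -> dict:
    """Hypothetical runner: the agent supplies train_fn, the frozen code scores it."""
    data = load_data()
    return evaluate(train_fn(data), data)


metrics = run_experiment(train)
```

Because only train changes between experiments, two metric dicts produced this way are comparable by construction.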

Status

⚠️ Proof of Concept — under active development. Open source. Single-tenant, local-first, by design.

Agent: Claude Agent SDK (uses your local claude CLI auth — no API key needed for runs)
Storage: Local JSON files in .dojo/ — your data stays on your machine
Tasks supported: RegressionTask (more types to come once regression is solid)

Install

The package on PyPI is dojoml; the CLI binary it installs is dojo.

uv tool install dojoml         # recommended — isolated, on your PATH
# or
pipx install dojoml
# or
pip install dojoml

Prerequisites:

Python 3.13+
The claude CLI logged in (Claude Code) — Dojo shells out to it; no ANTHROPIC_API_KEY needed for agent runs.

Quickstart — `dojo onboard`

dojo onboard is the recommended entry point. Run it inside an existing Python project — it adds .dojo/ next to your code, reuses your pyproject.toml / requirements.txt for dependencies, and walks you through everything else:

cd path/to/your/python/project
uv tool install dojoml          # one-time
dojo onboard                    # answers a few questions, generates load_data + evaluate, freezes the task
dojo run                        # the agent starts running experiments

That's it. Your research lives at .dojo/ in the project (knowledge, runs, frozen tools), and your code stays where it always was. dojo onboard asks for:

the agent + tracking + linker backends (sensible defaults — hit enter)
a domain name + description
how to fill in PROGRAM.md + SETUP.md: open them in $EDITOR now (recommended for short content), or skip and finish manually. Skipping writes default templates and stops before tool generation, so you can edit at your own pace and run dojo task setup when ready.

When you pick "open in $EDITOR", onboard then runs the AI tool generator, verifies load_data + evaluate against the frozen regression contract, and freezes the task. If the verifier hits a missing import, onboard offers to install it into the workspace venv and retries automatically.

When you pick "skip", onboard stops cleanly after writing the templates. Edit PROGRAM.md + SETUP.md, then run dojo task setup to generate + verify + freeze when you're ready.
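The install-and-retry behaviour described for the $EDITOR path follows a common pattern, sketched below. This is not Dojo's code: verify_with_install_retry, its parameters and the fake verifier are hypothetical, and a real installer would also need to map module names to package names (for example sklearn to scikit-learn).

```python
from typing import Callable


def verify_with_install_retry(
    verify: Callable[[], None],
    install: Callable[[str], None],
    max_installs: int = 3,
) -> list[str]:
    """Run verify(); on a missing import, install the module and try again.

    Returns the modules installed along the way. Gives up (re-raises) once
    max_installs installs have been attempted.
    """
    installed: list[str] = []
    while True:
        try:
            verify()
            return installed
        except ModuleNotFoundError as exc:
            if exc.name is None or len(installed) >= max_installs:
                raise
            install(exc.name)
            installed.append(exc.name)


# Toy stand-ins: a "venv" that is just a set of available module names.
available: set[str] = set()


def fake_verify() -> None:
    """Pretend the generated tools import pandas and numpy."""
    for module in ("pandas", "numpy"):
        if module not in available:
            raise ModuleNotFoundError(f"No module named {module!r}", name=module)


installed = verify_with_install_retry(fake_verify, available.add)
```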

Existing codebase? Use the `dojo-onboard` Claude Code skill

For real projects — where you already have data loaders, a metric, and paragraphs of context — the Typer prompts are the wrong UI. We ship a Claude Code skill that runs the whole flow as a conversation: it reads your code, asks a few targeted questions, drafts PROGRAM.md + SETUP.md from your answers, drives dojo task setup, and iterates on verifier failures until the AI-written load_data.py + evaluate.py connectors verify cleanly against your data.

uv tool install dojoml      # if you haven't already
dojo skill install dojo-onboard
# then in Claude Code, from your project directory:
/dojo-onboard

dojo skill install fetches the skill from this repo into ~/.claude/skills/dojo-onboard/. Pass --scope project to install into ./.claude/skills/ for the current project only, or --ref main to pull the latest from main instead of the installed version's tag. The skill requires Claude Code installed locally — it's not invoked by dojo directly.

Don't have a project yet? Try a preset

If you just want to see Dojo work end-to-end on a canned dataset:

mkdir housing && cd housing
dojo onboard --preset california_housing   # ready-to-run PROGRAM.md + SETUP.md
dojo run --max-turns 30

The california_housing preset uses sklearn.datasets.fetch_california_housing and pre-installs scikit-learn, pandas, numpy, matplotlib into a fresh venv. More presets coming.

Scripted setup (`dojo init`)

For CI or non-interactive use where prompts aren't acceptable, the older four-step path is still available:

dojo init --name housing --task-type regression --non-interactive
$EDITOR PROGRAM.md SETUP.md
dojo task setup
dojo run

If the AI keeps generating the wrong adapters on real-world pipelines (unusual pandas multi-indexes, custom dataset APIs, wrapping an existing evaluator), set Opus 4.7 as the tool-generation model:
DOJO_AGENT__TOOL_GENERATION_MODEL=claude-opus-4-7 dojo task setup
Opus is slower (~30–60s vs 15–30s) but noticeably better at translating a messy SETUP.md into correct load_data / evaluate modules. Set it permanently in .dojo/config.yaml under agent.tool_generation_model.

What happens under the hood:

dojo onboard / dojo init writes .dojo/config.yaml, creates the domain + regression task with expected_metrics = [rmse, r2, mae], scaffolds PROGRAM.md and SETUP.md, and sets current_domain_id.
dojo task setup reads SETUP.md, asks the AI to generate load_data + evaluate, runs each tool in a sandbox against its ToolContract, and freezes the task. Verification failures tell you which tool failed and why — fix SETUP.md (or the tool code) and re-run. (dojo onboard runs this automatically as its last step.)
dojo run starts the agent in-process. The agent writes training code; load_data and evaluate stay frozen. The metric dict from evaluate is the only source of truth — complete_experiment rejects metric keys outside the contract, so the agent can't smuggle in custom numbers.
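The metric-key rule in the last step can be illustrated with a short check. The expected keys are the ones listed above for the regression task; the function name check_metric_keys and the error message are assumptions, not the real complete_experiment implementation.

```python
EXPECTED_METRICS = ("rmse", "r2", "mae")  # from the regression task described above


def check_metric_keys(metrics: dict, expected=EXPECTED_METRICS) -> dict:
    """Hypothetical check: refuse any metric key that is not in the contract."""
    unexpected = sorted(set(metrics) - set(expected))
    if unexpected:
        raise ValueError(f"metric keys outside the contract: {unexpected}")
    return metrics


accepted = check_metric_keys({"rmse": 0.52, "r2": 0.81, "mae": 0.37})

try:
    check_metric_keys({"rmse": 0.52, "my_custom_score": 0.99})
    rejected = False
except ValueError:
    rejected = True
```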

Useful neighbours:

dojo task show               # current task status, tools, frozen?
dojo runs ls                 # recent runs
dojo runs show               # last run's events + cost
dojo program show            # print the live PROGRAM.md
dojo domain use <name>       # switch active domain

Stopping a run

dojo run runs in the foreground and blocks until the agent finishes. To stop it early:

Ctrl-C in the running terminal — the canonical path. The orchestrator is interrupted, the framework asks the backend to summarise any durable findings as knowledge atoms (a small one-shot LLM call), then prints a final cost line. A second Ctrl-C aborts the cleanup immediately.
dojo stop [run_id] from another terminal — marks the run STOPPED on disk. This does not halt an in-process foreground run (the orchestrator lives inside the other terminal's Python process); use it to recover records left RUNNING after a hard kill, or to stop server-mode runs.
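The two-stage Ctrl-C behaviour can be pictured as nested interrupt handling. The sketch below is an assumption about the general pattern, not Dojo's orchestrator; run_with_graceful_stop and its callbacks are hypothetical names.

```python
from typing import Callable


def run_with_graceful_stop(
    work: Callable[[], None],
    summarise: Callable[[], None],
) -> str:
    """Run work(); on Ctrl-C run summarise(); a second Ctrl-C skips the cleanup."""
    try:
        work()
        return "completed"
    except KeyboardInterrupt:
        # First interrupt: stop the work and try to save findings.
        try:
            summarise()
            return "stopped"
        except KeyboardInterrupt:
            # Second interrupt, during cleanup: give up immediately.
            return "aborted"


def interrupted() -> None:
    """Stand-in for a long agent run (or cleanup) that the user interrupts."""
    raise KeyboardInterrupt


saved: list[str] = []
status = run_with_graceful_stop(interrupted, lambda: saved.append("knowledge atoms"))
```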

Reviewing what happened

dojo experiments ls          # rank experiments by the primary metric (best first)
dojo experiments best        # show the single best experiment so far
dojo experiments show <id>   # full detail: hypothesis, metrics, code path, errors
dojo runs show               # last run's events + total cost

dojo experiments ls orders by the task's primary_metric and direction (e.g. rmse minimised), so the leader sits on top regardless of run order. The agent's training code is preserved per-experiment in the workspace as __dojo_train_<experiment_id>.py — cat it to reproduce a run by hand.
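Ordering by a primary metric and a direction amounts to a sort with a conditional reverse. In this sketch the record layout and the "minimize" / "maximize" spellings are assumptions chosen for illustration; only the idea of primary_metric plus direction comes from the text above.

```python
def rank_experiments(experiments: list[dict], primary_metric: str, direction: str) -> list[dict]:
    """Return experiments best-first for the given metric and direction."""
    if direction not in ("minimize", "maximize"):
        raise ValueError(f"unknown direction: {direction!r}")
    return sorted(
        experiments,
        key=lambda exp: exp["metrics"][primary_metric],
        reverse=(direction == "maximize"),
    )


# Hypothetical records, listed in run order.
experiments = [
    {"id": "exp-a", "metrics": {"rmse": 0.71, "r2": 0.62}},
    {"id": "exp-b", "metrics": {"rmse": 0.48, "r2": 0.83}},
    {"id": "exp-c", "metrics": {"rmse": 0.55, "r2": 0.78}},
]

by_rmse = [exp["id"] for exp in rank_experiments(experiments, "rmse", "minimize")]
by_r2 = [exp["id"] for exp in rank_experiments(experiments, "r2", "maximize")]
```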

Artifacts

Each experiment gets a fresh .dojo/domains/{id}/runs/{eid}/artifacts/ directory. The runner passes its path as artifacts_dir to both train() and evaluate().

evaluate(..., artifacts_dir) writes durable per-run diagnostics — residual plots, calibration curves, error breakdowns. These are produced on every run and are part of the user-defined evaluation contract in SETUP.md.
train(..., artifacts_dir) writes opportunistic artifacts — model checkpoints (joblib.dump(model, artifacts_dir / "model.pkl")), training curves, feature importances. The agent decides when an artifact is worth keeping; not every run will write here.

Everything written to artifacts_dir is:

Copied into the durable Dojo archive at .dojo/artifacts/experiments/{eid}/....
Forwarded to the active tracking backend (MlflowTracker.log_artifact uploads to MLflow; FileTracker records a reference; NoopTracker drops it).
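A rough picture of the artifact flow, using only the standard library. The train and evaluate signatures, the toy mean model, the archive helper and the directory names are all assumptions; pickle stands in for the joblib.dump call mentioned above, and a real evaluate would score held-out data rather than the training targets.

```python
import json
import pickle
import shutil
import tempfile
from pathlib import Path


def train(data: dict, artifacts_dir: Path) -> dict:
    """Agent side: fit a trivial mean model and keep a checkpoint."""
    model = {"mean": sum(data["y"]) / len(data["y"])}
    with open(artifacts_dir / "model.pkl", "wb") as fh:
        pickle.dump(model, fh)  # stdlib stand-in for joblib.dump
    return model


def evaluate(model: dict, data: dict, artifacts_dir: Path) -> dict:
    """Frozen side: always write diagnostics next to the metrics."""
    residuals = [y - model["mean"] for y in data["y"]]
    (artifacts_dir / "residuals.json").write_text(json.dumps(residuals))
    return {"mae": sum(abs(r) for r in residuals) / len(residuals)}


def archive(artifacts_dir: Path, archive_root: Path, eid: str) -> Path:
    """Hypothetical copy step into a durable per-experiment archive."""
    dest = archive_root / "experiments" / eid
    shutil.copytree(artifacts_dir, dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    artifacts_dir = root / "runs" / "exp1" / "artifacts"
    artifacts_dir.mkdir(parents=True)
    data = {"y": [1.0, 2.0, 3.0]}
    model = train(data, artifacts_dir)
    metrics = evaluate(model, data, artifacts_dir)
    dest = archive(artifacts_dir, root / "archive", "exp1")
    archived = sorted(p.name for p in dest.iterdir())
```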

Configuration

dojo onboard and dojo init write .dojo/config.yaml in your project root. Edit it (or create it by hand) to choose backends:

agent:
  backend: claude      # "stub" (no LLM, deterministic) or "claude"
tracking:
  backend: file        # "file" or "mlflow"

Or override via environment variables (note the double underscore for nested fields):

DOJO_AGENT__BACKEND=claude
DOJO_TRACKING__BACKEND=mlflow
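The double-underscore convention maps a flat variable name onto a nested config path. The sketch below shows that mapping only; env_overrides is a hypothetical helper, and how Dojo actually parses its environment is not shown in this README.

```python
def env_overrides(environ: dict, prefix: str = "DOJO_") -> dict:
    """Turn DOJO_AGENT__BACKEND=claude into {"agent": {"backend": "claude"}}."""
    config: dict = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # unrelated variable
        # "AGENT__BACKEND" -> ["agent", "backend"]; single underscores stay in names.
        path = key[len(prefix):].lower().split("__")
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return config


overrides = env_overrides(
    {
        "DOJO_AGENT__BACKEND": "claude",
        "DOJO_AGENT__TOOL_GENERATION_MODEL": "claude-opus-4-7",
        "DOJO_TRACKING__BACKEND": "mlflow",
        "PATH": "/usr/bin",
    }
)
```

In practice you would pass os.environ instead of a literal dict.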

Web UI / HTTP API (optional)

dojo start                   # FastAPI server on http://localhost:8000

The server reads the same .dojo/ your CLI commands write to, so a CLI-started run is visible to the API and vice versa.

Note: the React frontend is not bundled in the PyPI release yet. If you want the web UI, run it from a checkout — see Development below.

Migrating from v0.0.10

If your domain has a v0.0.10 PROGRAM.md with mixed Goal/Dataset/Evaluate content:

Create SETUP.md next to PROGRAM.md with the existing ## Dataset and ## Evaluate sections.
Trim PROGRAM.md to ## Goal, ## Target, ## Success, ## Notes.
Run dojo task setup again — the regression contract is now v4 (train receives artifacts_dir), so any frozen task needs re-verification anyway.
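The first two steps can be done by hand, or with a small script along these lines. It is a sketch that assumes the sections are level-two markdown headings named exactly Dataset and Evaluate; split_program is not a Dojo command.

```python
SETUP_SECTIONS = {"Dataset", "Evaluate"}  # sections that move to SETUP.md


def split_program(text: str) -> tuple[str, str]:
    """Return (program_md, setup_md), routing each '## ' section by its title."""
    program: list[str] = []
    setup: list[str] = []
    target = program  # anything before the first '## ' heading stays in PROGRAM.md
    for line in text.splitlines(keepends=True):
        if line.startswith("## "):
            title = line[3:].strip()
            target = setup if title in SETUP_SECTIONS else program
        target.append(line)
    return "".join(program), "".join(setup)


old_program = (
    "# Housing\n\n"
    "## Goal\nPredict prices.\n\n"
    "## Dataset\ndata.csv\n\n"
    "## Evaluate\nRMSE on holdout.\n\n"
    "## Notes\nNone.\n"
)
program_md, setup_md = split_program(old_program)
```

Write program_md and setup_md back to PROGRAM.md and SETUP.md, review both, then run dojo task setup.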

Development

Most of the contributor reference lives in CLAUDE.md (architecture, directory map, "how do I add X" recipes, conventions). This section is the minimum to clone and run tests.

Additional prerequisites for the dev path:

uv
just
Node.js 18+ (only if you want to run the web UI)

git clone https://github.com/Garsdal/Dojo.git && cd Dojo
just dev                     # install backend + frontend deps
just test                    # run the test suite
just lint                    # ruff check
just format                  # auto-fix lint + format

For the full server + web UI dev loop:

just run-stub                # API + frontend with the stub agent (no LLM, deterministic)
just run-claude              # API + frontend with the Claude agent

Backend → http://localhost:8000 · Frontend → http://localhost:5173.

Pointers

CLAUDE.md — architecture, directory map, conventions, recipes.
docs/MASTER_PLAN.md — vision and the typed-Task design.
docs/RELEASING.md — release flow.

HTTP API endpoints

| Method | Path | Description |
| --- | --- | --- |
| `POST` | `/domains` | Create a research domain |
| `POST` | `/domains/{id}/task` | Attach a Task (regression today) |
| `POST` | `/domains/{id}/tools/generate` | AI-generate `load_data` / `evaluate` from SETUP.md, verify against contract |
| `POST` | `/domains/{id}/task/freeze` | Freeze the task (gated on every required tool's verification) |
| `POST` | `/domains/{id}/workspace/setup` | One-time workspace prep (venv + deps) |
| `POST` | `/agent/run` | Start an agent run on a domain (requires a frozen task) |
| `GET` | `/agent/runs/{id}/events` | Live SSE event stream |
| `GET` | `/experiments?domain_id=` | List experiments |
| `GET` | `/knowledge?domain_id=` | List knowledge atoms |
| `GET` | `/health` | Health check |
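To consume the live event stream from a script, a client needs to parse Server-Sent Events framing. The parser below follows the standard SSE text format; the event name and JSON payload in the sample are invented, since this README does not document what Dojo actually emits.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from text lines in SSE framing."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends one event; events without data are dropped.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream contents.
sample = [
    "event: metric\n",
    'data: {"rmse": 0.5}\n',
    "\n",
    ": keep-alive\n",
    "data: first line\n",
    "data: second line\n",
    "\n",
]
events = list(parse_sse(sample))
```

The same function works on any iterable of decoded lines, for example lines read from an open HTTP response to /agent/runs/{id}/events.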

Release history

0.0.23 (May 14, 2026)
0.0.22 (May 14, 2026)
0.0.21 (May 14, 2026)
0.0.20 (May 11, 2026), this version
0.0.19 (May 11, 2026)
0.0.18 (May 9, 2026)
0.0.17 (May 9, 2026)
0.0.16 (May 9, 2026)
0.0.15 (May 9, 2026)
0.0.14 (May 9, 2026)
0.0.13 (May 9, 2026)
0.0.12 (May 6, 2026)
0.0.11 (May 6, 2026)
0.0.10 (May 5, 2026)
0.0.9 (May 5, 2026)
0.0.8 (May 5, 2026)
0.0.7 (May 4, 2026)
0.0.6 (May 4, 2026)
0.0.5 (May 4, 2026)
0.0.4 (May 3, 2026)
0.0.3 (May 3, 2026)
0.0.2 (May 3, 2026)
0.0.1 (May 3, 2026)

Download files

Source Distribution

dojoml-0.0.20.tar.gz (122.8 kB view details)

Uploaded May 11, 2026 Source

Built Distribution

dojoml-0.0.20-py3-none-any.whl (169.6 kB view details)

Uploaded May 11, 2026 Python 3

File details

Details for the file dojoml-0.0.20.tar.gz.

File metadata

Download URL: dojoml-0.0.20.tar.gz
Upload date: May 11, 2026
Size: 122.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dojoml-0.0.20.tar.gz
Algorithm	Hash digest
SHA256	`848a54151d71480af9077d51686b20a0a5acc362507a1e5fd65bb24a26de7952`
MD5	`f8df2c1961d23d0e4cfbccd682a0a57d`
BLAKE2b-256	`590a40b91b43677bb9a9095bb4094b5f64b942e2b26a99bc177ce1018357ae0a`

Provenance

The following attestation bundles were made for dojoml-0.0.20.tar.gz:

Publisher: release.yml on Garsdal/Dojo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dojoml-0.0.20.tar.gz
- Subject digest: 848a54151d71480af9077d51686b20a0a5acc362507a1e5fd65bb24a26de7952
- Sigstore transparency entry: 1510756100
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: Garsdal/Dojo@a194df27d870e384ccae157c6d9b575145bb4b68
- Branch / Tag: refs/tags/v0.0.20
- Owner: https://github.com/Garsdal
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a194df27d870e384ccae157c6d9b575145bb4b68
- Trigger Event: push

File details

Details for the file dojoml-0.0.20-py3-none-any.whl.

File metadata

Download URL: dojoml-0.0.20-py3-none-any.whl
Upload date: May 11, 2026
Size: 169.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dojoml-0.0.20-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ba091eb7c1a4a21eebe8b2f920eff6d61cea9cdea2900d7baf758cadbb0d1adb`
MD5	`a514192ba7064a47a9ca228a23386cf7`
BLAKE2b-256	`ea517cc0665c746910da3d6861fc0c54262b7fb1b852fa7e35c5279600085c33`

Provenance

The following attestation bundles were made for dojoml-0.0.20-py3-none-any.whl:

Publisher: release.yml on Garsdal/Dojo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dojoml-0.0.20-py3-none-any.whl
- Subject digest: ba091eb7c1a4a21eebe8b2f920eff6d61cea9cdea2900d7baf758cadbb0d1adb
- Sigstore transparency entry: 1510756236
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: Garsdal/Dojo@a194df27d870e384ccae157c6d9b575145bb4b68
- Branch / Tag: refs/tags/v0.0.20
- Owner: https://github.com/Garsdal
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a194df27d870e384ccae157c6d9b575145bb4b68
- Trigger Event: push
