
tccw-autoresearch

Agnostic autonomous improvement engine. Point it at any codebase — it makes it measurably better overnight while you sleep.

Inspired by karpathy/autoresearch — methodology only, zero code dependency. Clean-room implementation of the loop pattern, stripped of all ML assumptions.


What It Does

LOOP:
  1. Edit target files
  2. Run immutable harness
  3. Measure single metric
  4. If improved → keep (git commit)
  5. If worse  → discard (git reset)
  6. REPEAT until budget exhausted

Works on anything with a measurable outcome: test pass rates, build times, response latency, error counts, coverage percentages. No GPU. No ML. Tests replace training runs.
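The loop above can be sketched in a few lines of Python. This is a minimal illustration of the pattern, not the engine's actual implementation; `run_harness` and the inline git calls are stand-ins:

```python
import subprocess

def run_harness(command: str, extract: str) -> float:
    """Run the metric command, pipe it through the extract filter, parse a number."""
    out = subprocess.run(f"{command} | {extract}", shell=True,
                         capture_output=True, text=True).stdout.strip()
    return float(out)

def improved(candidate: float, best: float, direction: str) -> bool:
    """Compare against the best value seen so far, honoring the metric direction."""
    return candidate > best if direction == "higher" else candidate < best

def loop(edit_step, command, extract, direction, baseline, max_experiments):
    best = baseline
    for _ in range(max_experiments):
        edit_step()                                 # 1. agent edits target files
        metric = run_harness(command, extract)      # 2-3. immutable harness -> metric
        if improved(metric, best, direction):       # 4. improved -> keep
            best = metric
            subprocess.run(["git", "commit", "-am", f"metric {metric}"])
        else:                                       # 5. worse -> discard
            subprocess.run(["git", "reset", "--hard"])
    return best                                     # 6. budget exhausted
```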


Hard Dependency: Claude Code

AutoResearch does not edit code itself. It is an orchestrator. The actual code changes are made by Claude Code agents — Anthropic's CLI for autonomous coding.

You must have claude installed and on your PATH. The engine spawns claude as a subprocess for each experiment, passing it the marker's mutable/immutable file rules, the agent profile, and permission flags. Claude Code reads the code, forms hypotheses, makes edits, and runs the harness. AutoResearch decides whether to keep or discard the result.

autoresearch (orchestrator)
  └── claude (agent) ← does the actual coding
        ├── reads mutable files
        ├── edits code
        ├── runs metric harness
        └── commits if improved

Install Claude Code: https://docs.anthropic.com/en/docs/claude-code

# Verify it's available
claude --version

Without claude on PATH, autoresearch run will fail.
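A preflight check of the kind the engine presumably performs can be written with `shutil.which`; the helper name `require_claude` is hypothetical:

```python
import shutil

def require_claude() -> str:
    """Fail fast if the claude CLI is not on PATH (autoresearch's hard dependency)."""
    path = shutil.which("claude")
    if path is None:
        raise RuntimeError("claude not found on PATH -- install Claude Code first")
    return path

# When present, each experiment can then spawn it as a subprocess, e.g.:
# subprocess.run([require_claude(), "-p", prompt, ...])
```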


The Marker

A .autoresearch/config.yaml in any repository declares what to improve:

markers:
  - name: test-suite-health
    description: "Improve test coverage and reduce test runtime"
    status: active
    target:
      mutable:
        - tests/test_daemon.py
        - tests/test_engine.py
      immutable:
        - src/autoresearch/daemon.py
        - src/autoresearch/engine.py
    metric:
      command: "python3 -m pytest --tb=no -q 2>&1 | tail -1"
      extract: "grep -oP '\\d+(?= passed)'"
      direction: higher
      baseline: 2541
    guard:
      command: "python3 -m pytest --tb=short -q 2>&1"
      extract: "grep -oP '\\d+(?= passed)'"
      threshold: 2541
      rework_attempts: 2
    loop:
      model: sonnet
      budget_per_experiment: 25m
      max_experiments: 10
    agent:
      name: copilot
      model: sonnet
      permission_mode: bypassPermissions
      allowed_tools:
        - "Edit(tests/*)"
        - "Bash(python3 *)"
        - "Bash(pytest *)"
      disallowed_tools:
        - "Bash(rm *)"
        - "Bash(git push *)"
        - "Bash(curl *)"

Add the file, run autoresearch — the engine handles the rest.
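Under this config, keep/discard is dual-gated: the metric must beat the baseline and the guard must stay at or above its threshold. A sketch of the decision, with semantics inferred from the field names rather than taken from the engine's source:

```python
def keep_experiment(metric: float, baseline: float, direction: str,
                    guard_value: float, guard_threshold: float) -> bool:
    """Dual gate: the metric gate AND the regression guard must both pass."""
    metric_gate = metric > baseline if direction == "higher" else metric < baseline
    regression_guard = guard_value >= guard_threshold
    return metric_gate and regression_guard

# 2560 passing tests beats baseline 2541, guard still >= 2541: keep
assert keep_experiment(2560, 2541, "higher", 2560, 2541)
# metric improved but guard regressed below threshold: discard
assert not keep_experiment(2560, 2541, "higher", 2500, 2541)
```

This is what prevents the agent from gaming the metric: speeding up the suite by deleting tests would raise the metric but trip the guard.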


Usage

# Interactive TUI (select marker, press 'r' to run)
autoresearch

# Headless — for AI agents, CI/CD, cron
autoresearch run -m test-suite-health --headless

# Initialize .autoresearch/ in a repo with default agent profile
autoresearch init

# Status, results, confidence
autoresearch status -m tccw-autoresearch:test-suite-health --headless
autoresearch results -m tccw-autoresearch:test-suite-health --headless
autoresearch confidence -m tccw-autoresearch:test-suite-health --headless

# Finalize: clean branches from messy experiment history
autoresearch finalize -m tccw-autoresearch:test-suite-health --headless

# Daemon — scheduled overnight runs
autoresearch daemon start
autoresearch daemon status
autoresearch daemon stop

Intelligence Features

  • Ideas backlog: failed experiments log why they were interesting, so future sessions don't repeat mistakes
  • Graduated escalation: 3 failures → REFINE, 5 → PIVOT, 2 PIVOTs → SEARCH, 3 PIVOTs → HALT
  • Statistical confidence: MAD-based scoring after 3+ experiments ignores benchmark noise
  • Dual-gate guard: metric gate plus regression guard prevents gaming the metric by breaking something else
  • Finalization: clean, reviewable branches from messy experimental history
  • Agent profiles: per-marker Claude Code settings.json and CLAUDE.md generated at runtime
  • Permission enforcement: mutable/immutable rules translated to --allowedTools/--disallowedTools CLI flags
  • Telemetry: stream-json parsing into a TelemetryReport (tokens, cost, tools, errors)
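The MAD-based confidence check can be sketched as follows. This is a plausible formulation, not necessarily the one metrics.py implements: treat an improvement as significant only when the median of the observed metrics beats the baseline by a few median absolute deviations.

```python
from statistics import median

def mad(values: list[float]) -> float:
    """Median absolute deviation: a spread estimate robust to outlier runs."""
    m = median(values)
    return median(abs(v - m) for v in values)

def significant_improvement(metrics: list[float], baseline: float,
                            k: float = 3.0) -> bool:
    """Require >= 3 experiments, then demand the median beat baseline by k * MAD."""
    if len(metrics) < 3:
        return False
    spread = mad(metrics)
    return median(metrics) - baseline > k * max(spread, 1e-9)
```

A tight cluster of 2550-2552 passing tests against a 2541 baseline clears the bar; a noisy spread straddling the baseline does not.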

Quick Start — Set Up Any Repo in 3 Steps

1. Install the CLI

Prerequisites:

  • Python 3.10+
  • Claude Code (claude CLI) installed and authenticated — this is what actually edits code
# Install autoresearch
git clone git@github.com:The-Cloud-Clock-Work/tccw-autoresearch.git
cd tccw-autoresearch
pip install -e .

# Verify both are available
autoresearch --help
claude --version

2. Initialize your target repo

cd /path/to/your-project
autoresearch init

This creates .autoresearch/config.yaml with a starter marker and .autoresearch/agents/ with the default agent profile.

3. Configure your marker

Edit .autoresearch/config.yaml to match your project:

markers:
  - name: my-improvement          # Pick a short, descriptive name
    description: "What you want to improve"
    status: active                 # Set to 'active' to run

    target:
      mutable:                     # Files the engine CAN edit
        - src/**/*.py
      immutable:                   # Test/harness files — NEVER touched
        - tests/test_main.py

    metric:
      command: "pytest tests/test_main.py -q --tb=no 2>&1 | tail -1"
      extract: "grep -oP '\\d+(?= passed)'"
      direction: higher            # 'higher' = more is better, 'lower' = less is better
      baseline: 10                 # Current value before any improvement

    loop:
      model: sonnet                # AI model: sonnet, opus, haiku
      budget_per_experiment: 10m   # Time limit per experiment
      max_experiments: 20          # Stop after N experiments

Key rules for the marker:

  • metric.command must be a shell command that produces output (not a bare regex)
  • metric.extract must be a shell command that filters the output to a single number
  • target.immutable files are protected — the agent cannot edit them
  • target.mutable files are the only ones the agent is allowed to change
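Note that the extract patterns in these examples are PCRE lookaheads: `grep -oP` needs GNU grep and will not work with BSD grep on macOS. For illustration, here is how the same patterns behave, emulated with Python's re module (the sample output lines are typical, not verbatim tool output):

```python
import re

# pattern -> (sample tail line, expected extracted number)
samples = {
    r"\d+(?= passed)": ("2541 passed, 3 skipped in 41.2s", "2541"),  # pytest
    r"\d+(?=%)":       ("TOTAL   1042    365    65%",       "65"),   # coverage
    r"\d+(?= error)":  ("Found 30 errors.",                 "30"),   # ruff
}

for pattern, (line, expected) in samples.items():
    match = re.search(pattern, line)
    assert match and match.group() == expected
```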

Run it

# Interactive — pick marker from TUI menu
autoresearch

# Headless — for AI agents, CI/CD, cron, scripts
autoresearch run -m my-improvement --headless

# Check progress
autoresearch status --headless
autoresearch results -m my-improvement --headless

Common marker examples

Increase test pass count:

metric:
  command: "pytest tests/ -q --tb=no 2>&1 | tail -1"
  extract: "grep -oP '\\d+(?= passed)'"
  direction: higher
  baseline: 42

Reduce build time (seconds):

metric:
  command: "bash -c 'TIMEFORMAT=%R; { time make build >/dev/null 2>&1; } 2>&1'"
  extract: "tail -1"
  direction: lower
  baseline: 120

Increase code coverage (%):

metric:
  command: "pytest --cov=src --cov-report=term 2>&1 | tail -1"
  extract: "grep -oP '\\d+(?=%)'"
  direction: higher
  baseline: 65

Reduce lint warnings:

metric:
  command: "ruff check src/ 2>&1 | tail -1"
  extract: "grep -oP '\\d+(?= error)'"
  direction: lower
  baseline: 30

Architecture

src/autoresearch/
  marker.py          # .autoresearch/config.yaml schema + parser (Pydantic)
  engine.py          # Core experiment loop + AgentRunner ABC
  worktree.py        # Git worktree isolation per marker
  metrics.py         # Harness execution + metric extraction + MAD confidence
  program.py         # Runtime program.md generation (string.Template)
  agent_profile.py   # settings.json + CLAUDE.md generation + permission flags
  telemetry.py       # Stream-json telemetry parsing
  finalize.py        # Cherry-pick + squash winning commits
  cli.py             # CLI entry point (Typer, 13 commands, dual-mode)
  cli_utils.py       # Headless helpers (JSON output, prompts)
  daemon.py          # Daemon service (cron, double-fork, concurrent runs)
  state.py           # Global state (~/.autoresearch/state.json)
  config.py          # Config defaults (~/.autoresearch/config.yaml)
  results.py         # results.tsv read/write
  ideas.py           # ideas.md backlog
  utils.py           # Shared utilities (parse_duration)
  agents/default/    # Default agent profile (CLAUDE.md, settings, rules)
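The `budget_per_experiment: 25m` values in the marker imply a small duration parser in utils.py. A plausible sketch, assuming simple single-unit strings; the real signature and accepted formats may differ:

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text: str) -> int:
    """Parse '90s', '25m', '1h' style durations into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]
```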

Design principles:

  • Agnostic — no assumptions about what it improves
  • Self-contained — repo + marker file = everything needed to run
  • Dual-mode — every command works interactively AND headlessly (--headless)
  • Permission-locked — agent can only edit mutable files, enforced via CLI flags
  • No ML dependencies — no torch, no CUDA, no GPU
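The permission-locked principle maps the marker's agent.allowed_tools / disallowed_tools lists onto the claude CLI's --allowedTools / --disallowedTools flags. A sketch of one plausible translation; how the CLI expects multiple patterns to be joined is an assumption here:

```python
def permission_flags(allowed: list[str], disallowed: list[str]) -> list[str]:
    """Translate marker tool rules into claude CLI argv fragments."""
    args: list[str] = []
    if allowed:
        args += ["--allowedTools", ",".join(allowed)]
    if disallowed:
        args += ["--disallowedTools", ",".join(disallowed)]
    return args

flags = permission_flags(["Edit(tests/*)", "Bash(pytest *)"], ["Bash(rm *)"])
# -> ['--allowedTools', 'Edit(tests/*),Bash(pytest *)',
#     '--disallowedTools', 'Bash(rm *)']
```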

Status

All blocks complete. 2,557 tests passing. CI gates PRs via .github/workflows/ci.yml.

  • Block 1, Foundation: marker schema, state, results, ideas (done)
  • Block 2, Engine: loop, worktree, metrics, escalation, confidence (done)
  • Block 3, CLI: interactive TUI + headless JSON, 13 commands (done)
  • Block 4, Daemon + packaging: cron, double-fork, pip install (done)
  • Block 5, Agent profiles + telemetry: permissions, stream-json (done)

Dependencies

  • pydantic>=2.0 — schema validation
  • pyyaml>=6.0 — YAML parsing
  • rich>=13.0 — TUI rendering
  • typer>=0.12 — CLI framework
  • croniter>=2.0 — cron schedule evaluation

License

MIT
