Agentic synthetic-data generation framework inspired by Meta FAIR's Autodata / Agentic Self-Instruct.

autosynth

Generate synthetic datasets with an LLM loop that proposes, audits, solves, and judges its own work. Inspired by Meta FAIR's Autodata / Agentic Self-Instruct paper, but rewritten to be domain-agnostic: every domain-specific piece lives in a small Python plugin, and the runtime is the same regardless of whether you're generating math word problems, support-ticket triage data, or QA pairs from your own docs.

The headline trick: for each candidate datapoint, run a weak solver and a strong solver, score both against an LLM-generated rubric, and keep the example only if it passes the quality audit and the strong solver clearly beats the weak one. Failed rounds are reflected on and fed back into the next attempt.

Status: alpha (0.1.0). The API is still moving. Pin a commit if you're depending on it.

Install

uv venv
uv pip install -e .             # core
uv pip install -e ".[dev]"      # + pytest, ruff
uv pip install -e ".[hf]"       # + Hugging Face export

Requires Python 3.10+. Either activate the venv (source .venv/bin/activate) or prefix commands with uv run.

Quick start (no API keys)

uv run autosynth run --config configs/mock_demo.yaml
uv run autosynth status outputs/mock-demo
uv run autosynth export --run outputs/mock-demo --format jsonl

The mock demo uses an in-process scripted "provider" and finishes in about a second. It writes outputs/mock-demo/run.db plus a frozen config snapshot. The export step is opt-in — the SQLite database is the source of truth.

Real providers

LLM calls go through LiteLLM, so any provider it supports should work. Set the relevant key and reference the model in YAML:

export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...

challenger:    { provider_model: anthropic/claude-haiku-4-5, temperature: 0.8 }
weak_solver:   { provider_model: openai/gpt-4o-mini }
strong_solver: { provider_model: openai/gpt-4o }
judge:         { provider_model: anthropic/claude-haiku-4-5, temperature: 0.0 }

You can mix providers across roles. The cheaper-vs-frontier split between the two solvers is the whole point — that's what produces the weak/strong gap that drives acceptance.

${VAR} and ${VAR:default} substitution works in any string field, so api_base: ${OLLAMA_HOST:http://localhost:11434} does what you'd expect.
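
To make the substitution rule concrete, here is a minimal sketch of how ${VAR} and ${VAR:default} expansion could be implemented. It is not autosynth's actual code: the function name substitute and the choice to raise on an unset variable without a default are assumptions made for illustration.

```python
import os
import re

# Matches ${VAR} or ${VAR:default}. The default runs up to the closing brace,
# so it may itself contain colons (as in a URL).
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def substitute(text, env=None):
    """Expand ${VAR} and ${VAR:default} references in a string (hypothetical helper)."""
    env = os.environ if env is None else env

    def _replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Assumption: an unset variable with no default is treated as an error.
        raise KeyError(f"environment variable {name} is not set")

    return _PATTERN.sub(_replace, text)
```

With OLLAMA_HOST unset, substitute("${OLLAMA_HOST:http://localhost:11434}") falls back to the URL after the first colon; with it set, the environment value wins.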

See configs/example_qa.yaml and configs/example_math.yaml for full real-provider configs.

How it works

For each source item, autosynth runs the same five-step loop until the candidate is accepted or loop.max_rounds is exhausted:

Challenger proposes a candidate (input, reference_output, rubric).
Quality audits the candidate for obvious problems.
Weak and strong solvers each take N attempts at the input.
Judge scores every attempt against the rubric.
Evaluator decides accept / reject. If reject, reflector writes feedback for the next round.
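
The five steps above can be sketched as a plain loop. This is a simplified, synchronous illustration rather than the real runtime (which is an event-sourced state machine): the function name run_item, the callable parameters, and the default of three rounds are all hypothetical. Whether the real pipeline skips the solvers after a failed audit is not stated, so this sketch always runs them.

```python
from typing import Callable, Optional


def run_item(
    source_item,
    propose: Callable,          # (source_item, feedback) -> candidate
    audit: Callable,            # (candidate) -> bool
    solve_and_judge: Callable,  # (candidate) -> scores
    decide: Callable,           # (scores, quality_ok) -> bool
    reflect: Callable,          # (candidate, scores) -> feedback text
    max_rounds: int = 3,        # assumed default, stands in for loop.max_rounds
) -> Optional[dict]:
    """Repeat propose/audit/solve/judge/decide until accepted or out of rounds."""
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(source_item, feedback)  # step 1: challenger
        quality_ok = audit(candidate)               # step 2: quality audit
        scores = solve_and_judge(candidate)         # steps 3-4: solvers, then judge
        if decide(scores, quality_ok):              # step 5: evaluator
            return candidate
        feedback = reflect(candidate, scores)       # reject branch: reflector
    return None                                     # rounds exhausted
```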

The acceptance defaults come from §3 of the paper:

weak average ≤ 0.65, weak max ≤ 0.75
strong average in [0.60, 0.95)
strong − weak gap ≥ 0.20
quality must have passed

All of these are overridable in acceptance: in your config.
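
As a worked illustration of the defaults above, the acceptance check could look like the following. The Acceptance field names and the accept function are hypothetical, and the gap is assumed to be the difference between the strong and weak averages; check the bundled configs for the real keys.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Acceptance:
    # Field names are illustrative; the values are the documented defaults.
    weak_avg_max: float = 0.65
    weak_max_max: float = 0.75
    strong_avg_min: float = 0.60
    strong_avg_max: float = 0.95  # exclusive upper bound
    min_gap: float = 0.20


def accept(weak, strong, quality_passed, cfg=Acceptance()):
    """Return True if per-attempt scores satisfy every acceptance threshold."""
    if not quality_passed or not weak or not strong:
        return False
    weak_avg, strong_avg = mean(weak), mean(strong)
    return (
        weak_avg <= cfg.weak_avg_max
        and max(weak) <= cfg.weak_max_max
        and cfg.strong_avg_min <= strong_avg < cfg.strong_avg_max
        and strong_avg - weak_avg >= cfg.min_gap  # assumed: gap of averages
    )
```

For example, weak scores of 0.4 and 0.5 against strong scores of 0.8 and 0.9 pass every check, while a single weak attempt scoring 0.8 fails the weak-max cap even if the averages look fine.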

Architecture

The runtime is an event-sourced pipeline over a SQLite database. A pure step() function advances item state; the dispatcher fulfills LLM requests and writes responses back; the store is the durable record.

pipeline.step()        pure state machine: (state, responses) -> (state, requests)
dispatcher             reads ready items, calls step(), fulfills requests
  ├─ fulfill_local     threadpool over HTTP
  └─ fulfill_batch     provider batch APIs (see "Batch" below)
store                  SQLite + WAL, one run.db per run
llm                    provider routing, rate-limit, retry, cost accounting

Item states: PENDING → NEED_CANDIDATE → NEED_QUALITY → NEED_SCORES, with NEED_REFLECTION on the reject branch and ACCEPTED / REJECTED as terminal states. NEED_SCORES fans out N × weak + N × strong solver requests in parallel; each judge fires the moment its solver lands. Concurrency is bounded by cfg.dispatcher.concurrency.

The fact that step() is pure is the only reason resume works. Kill the process at any point — including mid-batch — and autosynth resume picks up exactly where it left off. In-flight local requests revert to pending; in-flight batch requests stay tagged and get polled.
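
To show why purity makes resume straightforward, here is a heavily simplified sketch of a pure transition function over the item states named above. It is not the real step(): the Item fields, the request labels, the "accept" response key, and the max_rounds default are all assumptions, and the real function also tracks per-solver fan-out and judge requests.

```python
from dataclasses import dataclass, replace
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    NEED_CANDIDATE = "NEED_CANDIDATE"
    NEED_QUALITY = "NEED_QUALITY"
    NEED_SCORES = "NEED_SCORES"
    NEED_REFLECTION = "NEED_REFLECTION"
    ACCEPTED = "ACCEPTED"
    REJECTED = "REJECTED"


@dataclass(frozen=True)
class Item:
    state: State = State.PENDING
    round_no: int = 0


def step(item, responses, max_rounds=3):
    """Pure transition: (item, responses) -> (new_item, requests). No I/O."""
    s = item.state
    if s is State.PENDING:
        return replace(item, state=State.NEED_CANDIDATE), ["challenger"]
    if s in (State.ACCEPTED, State.REJECTED) or not responses:
        return item, []  # terminal, or still waiting on responses: no-op
    if s is State.NEED_CANDIDATE:
        return replace(item, state=State.NEED_QUALITY), ["quality"]
    if s is State.NEED_QUALITY:
        return replace(item, state=State.NEED_SCORES), ["weak_solver", "strong_solver"]
    if s is State.NEED_SCORES:
        if responses.get("accept"):
            return replace(item, state=State.ACCEPTED), []
        if item.round_no + 1 >= max_rounds:
            return replace(item, state=State.REJECTED), []
        return replace(item, state=State.NEED_REFLECTION), ["reflector"]
    # NEED_REFLECTION: feedback has arrived, so start the next round.
    return replace(item, state=State.NEED_CANDIDATE, round_no=item.round_no + 1), ["challenger"]
```

Because the function depends only on its arguments, replaying the stored state and responses after a crash yields the same next state and the same outstanding requests.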

CLI

autosynth run --config CONFIG.yaml [--run-id ID] [--resume RUN_ID] [-v]
autosynth resume RUN_DIR
autosynth status RUN_DIR
autosynth inspect-run RUN_DIR [--stuck]
autosynth export --run RUN_DIR --format jsonl|hf [--out PATH]
autosynth metaopt --config CONFIG.yaml
autosynth init-domain NAME --out my_domain.py

status is the one-liner; inspect-run is the detailed per-item table. --stuck filters to items that haven't reached a terminal state, which is what you want when something looks wrong.

Run outputs

Everything for a run lives under outputs/<run_id>/:

run.db — SQLite. Tables: runs, items, rounds, requests, responses, solver_scores, accepted. Queryable with the sqlite3 CLI and safe to share.
config.snapshot.yaml — the exact config used. Resume reads this if you don't pass --config.
accepted.jsonl / hf_export/ — produced on autosynth export, not written automatically.

Each accepted record contains input, reference_output, rubric, domain, source_id, metadata, the weak/strong/gap scores, per-attempt solver scores, and the acceptance rationale.
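
Since run.db is an ordinary SQLite file, it can also be inspected from Python. The sketch below counts rows per table using only the standard sqlite3 module; it discovers table names from sqlite_master, so it makes no assumptions about autosynth's column layout. The helper name table_counts is hypothetical.

```python
import sqlite3


def table_counts(conn):
    """Return {table_name: row_count} for every user table on a connection."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Names come straight from sqlite_master, so double-quoting them is enough.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }
```

For the mock demo this would be called as table_counts(sqlite3.connect("outputs/mock-demo/run.db")). Note that sqlite3.connect creates an empty file if the path does not exist, so check the path first.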

Writing a domain

A domain plugin is one class subclassing DomainAdapter with six methods. Scaffold one with:

uv run autosynth init-domain customer_support --out my_domain.py

Fill in load_grounding, generation_prompt, validate_candidate, solver_prompt, quality_prompt, and judge_prompt, then point your config at it:

domain:
  path: ./my_domain.py:CustomerSupport
  params:
    source_csv: ./tickets.csv

The two bundled domains (src/autosynth/domains/qa_from_documents.py, src/autosynth/domains/math_word_problems.py) are short and worth reading before you write your own.
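
For orientation, a plugin with the six methods might be shaped roughly like this. Every signature here is a guess: the DomainAdapter class below is a local stand-in so the sketch runs on its own, the "text" CSV column is hypothetical, and the file generated by init-domain is the authoritative reference for the real interface.

```python
import csv


class DomainAdapter:
    """Stand-in for autosynth's base class; the real one may differ."""

    def __init__(self, params=None):
        self.params = dict(params or {})


class CustomerSupport(DomainAdapter):
    # All method signatures below are assumptions made for illustration.

    def load_grounding(self):
        """Read source tickets from the CSV named in domain.params.source_csv."""
        with open(self.params["source_csv"], newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))

    def generation_prompt(self, source, feedback=None):
        prompt = f"Write a triage task grounded in this ticket:\n{source['text']}"
        if feedback:
            prompt += f"\nAddress this feedback from the last round:\n{feedback}"
        return prompt

    def validate_candidate(self, candidate):
        # A candidate is the (input, reference_output, rubric) triple.
        required = ("input", "reference_output", "rubric")
        return all(candidate.get(key) for key in required)

    def solver_prompt(self, candidate):
        return candidate["input"]

    def quality_prompt(self, candidate):
        return f"Is this task clear and answerable?\n{candidate['input']}"

    def judge_prompt(self, candidate, attempt):
        return f"Rubric:\n{candidate['rubric']}\n\nAnswer to score:\n{attempt}"
```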

Meta-optimization

autosynth metaopt --config CONFIG.yaml runs the paper's secondary loop: evolve the orchestrator's prompts over generations. The unit of evolution is a HarnessSpec — a structured bag of rule strings that get injected into each agent's system prompt, plus a couple of numeric knobs.

The loop, roughly:

Score the seed harness on training and validation source items.
Each iteration: Boltzmann-sample a parent from the population (T=0.1 over training scores), summarize that parent's most recent rejection reasons, ask the mutator LLM for a structured diff, apply it, dedupe, and re-evaluate.
Accept the mutation only if child.val > parent.val — the paper's gate.
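
The parent-selection and gating steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: population members are represented as plain dicts with hypothetical "train" and "val" keys, and Boltzmann sampling is taken to mean a softmax over training scores divided by the temperature.

```python
import math
import random


def boltzmann_probs(scores, temperature=0.1):
    """Softmax over scores / temperature, shifted by the max for stability."""
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_parent(population, rng, temperature=0.1):
    """Pick a parent with probability proportional to exp(train_score / T)."""
    probs = boltzmann_probs([p["train"] for p in population], temperature)
    return rng.choices(population, weights=probs, k=1)[0]


def gate(parent, child):
    """Keep the child only if it strictly improves the validation score."""
    return child["val"] > parent["val"]
```

At T=0.1 a training-score advantage of 0.1 makes a harness about e (roughly 2.7) times more likely to be picked, so selection leans strongly toward the best performers while still occasionally exploring weaker ones.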

Mutations operate on the harness, not on Python source. That preserves the main lever the paper exercises (prompt-text edits) without the sandboxing headache of a code-editing agent. Swap in your own mutator if you want richer edits.

Try it without keys:

uv run autosynth metaopt --config configs/metaopt_mock.yaml

The mock scenario seeds at a 0% accept rate. On iteration 1 the mutator proposes a source-specificity rule that lifts both train and val to 100%; that mutation is accepted, and subsequent iterations are deduplicated. Population, lineage, and per-iteration decisions are written under outputs/metaopt/<run_id>/iterations/.

To run for real, add metaopt: { enabled: true, max_iterations: 50, ... } to your existing config and point metaopt.mutator at a strong reasoning model. Meta-opt reuses your existing domain, acceptance, loop, and agent settings.

Batch mode

The dispatcher can submit requests through provider batch APIs (OpenAI /v1/batches, Anthropic message batches) for the 50% cost discount. The BatchProvider protocol and a MockBatchProvider are in the box. Real provider implementations are not — wiring those up is the next piece of work. If you only have a few thousand requests, fulfill_local is fine.

Safety and quality notes

Every accepted datapoint carries an acceptance_rationale and a serialized EvalReport. There is no silent acceptance path.
The built-in PII filter (safety.enabled: true) is a conservative heuristic, not a real DLP. For anything regulated, plug your own module in via safety.filter.
Solvers are never told they're the weak or strong solver — the differential comes from the model/temperature choice. The paper flags adversarial prompting here as a gaming vector, so don't.
There is no diversity / near-duplicate check on accepted examples yet. If you need that, extend store.insert_accepted with MinHash or embedding-based dedupe.
LLM-as-judge bias is a known, unresolved limitation. The rubric weight cap (≤ 7) and the positive-only rule from the paper help, but they do not eliminate it.
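
For the near-duplicate gap noted in this list, a small MinHash check is one option. The sketch below is a standalone illustration using only hashlib: the function names, the 3-word shingles, the 64 hash seeds, and the 0.8 threshold are all assumptions, and hooking it into store.insert_accepted is left to the reader.

```python
import hashlib


def shingles(text, k=3):
    """Lower-cased word k-grams; short texts collapse to a single shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(shingle, seed):
    # One salted 64-bit hash per seed stands in for one random permutation.
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash(text, num_perm=64, k=3):
    """Signature: the minimum hash of the shingle set under each seed."""
    grams = shingles(text, k)
    return [min(_hash(g, seed) for g in grams) for seed in range(num_perm)]


def similarity(sig_a, sig_b):
    """Fraction of matching slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def is_near_duplicate(text, accepted_signatures, threshold=0.8):
    """True if text resembles any already-accepted signature (assumed threshold)."""
    sig = minhash(text)
    return any(similarity(sig, other) >= threshold for other in accepted_signatures)
```

Comparing each new input against every stored signature is linear in the number of accepted examples; at larger scale, locality-sensitive hashing over signature bands is the usual next step.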

Tests

uv run pytest

The full suite (~130 tests) runs against the in-process mock provider — no keys, no network. The interesting bits to look at if you're touching the core:

test_pure_pipeline.py — exhaustive state-transition coverage of step(), including the partial-completion no-op invariant.
test_store.py — claim_pending atomicity under threads, resume normalization.
test_dispatcher.py — end-to-end accept, 100-request concurrent fulfill, budget abort, kill/resume.

License

MIT. See LICENSE.

Project details

Maintainers

Ahmad8864

Release history

This version

0.1.1

May 13, 2026

0.1.0

May 13, 2026

Download files

Source Distribution

autosynth-0.1.1.tar.gz (98.1 kB view details)

Uploaded May 13, 2026 Source

Built Distribution

autosynth-0.1.1-py3-none-any.whl (83.5 kB view details)

Uploaded May 13, 2026 Python 3

File details

Details for the file autosynth-0.1.1.tar.gz.

File metadata

Download URL: autosynth-0.1.1.tar.gz
Upload date: May 13, 2026
Size: 98.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for autosynth-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`992bea30c81c28490eddb2089a69e18cda254c2408a04ec48a722033b2727c7d`
MD5	`7f6f527f93c5c202f821aaf09f9b4d94`
BLAKE2b-256	`f94f336483743e90df7b6a0a5680fc065c208339be0a6e3c5bafd9286b45c215`

File details

Details for the file autosynth-0.1.1-py3-none-any.whl.

File metadata

Download URL: autosynth-0.1.1-py3-none-any.whl
Upload date: May 13, 2026
Size: 83.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for autosynth-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5cad9ecaf46227b46eae0d99c0f451d05698947c9b1d0627fa56e73203266066`
MD5	`a349da2560ca09593c1f3dd533eef63c`
BLAKE2b-256	`003b3019cb99770756fe3ea07aa55c21867d7d389b02adf18ac0ddf379992842`
