# MultiAgentTrainer
Collect data from multiple sources, run autonomous LLM training experiments using autoresearch, and fine-tune open-source or managed models on your corpus. Configure everything in a YAML file and let `mat` handle ingestion, corpus building, training, and fine-tuning.
MultiAgentTrainer can be used standalone, but is designed as a companion to AgentTester — use AgentTester to evaluate and compare coding agents, then use MultiAgentTrainer to train models on the data those agents produce and consume.
## Install

```bash
uv pip install -e ".[dev]"

# For open-source fine-tuning (HuggingFace + PEFT/LoRA):
uv pip install -e ".[opensource]"
```
## Quick Start

```bash
# List configured data sources
mat sources

# Ingest data sources without training (inspect the corpus)
mat ingest

# Run the full pipeline: ingest → corpus → train
mat train

# Run with overrides
mat train --max-experiments 10 --output-dir ./my-runs

# Label a run so it shows up clearly in `mat watch`
mat train --name llama3

# Run multiple models in parallel and watch progress live
mat train --config llama3.yaml --name llama3 &
mat train --config mistral.yaml --name mistral &
mat watch

# Check past training runs
mat status
```
## Data Sources

Configure data sources in `multiagenttrainer.yaml`:

```yaml
sources:
  # Local git repository
  - type: local_repo
    path: /home/user/my-project

  # Any git-cloneable URL
  - type: remote_repo
    url: "https://github.com/user/repo.git"
    branch: main

  # GitHub repository (web URL)
  - type: github_repo
    url: "https://github.com/user/repo"

  # All repos in a GitHub organisation
  - type: github_org
    url: "https://github.com/my-org"
    max_repos: 50
    visibility: all  # all | public | private

  # AWS Bedrock knowledge base
  - type: bedrock_knowledge_base
    knowledge_base_id: "ABCDEF1234"
    region: "us-east-1"
    query: "training data for code generation"
    max_results: 100
```
## Configuration

Copy `config.example.yaml` to `multiagenttrainer.yaml` in your working directory:
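```bash
cp config.example.yaml multiagenttrainer.yaml
```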
### Top-level sections

| Section | Description |
|---|---|
| `autoresearch` | Autoresearch repo URL/path, branch, train time, optional `program.md` override |
| `sources` | List of data sources to ingest |
| `training` | Agent command, max experiments, output directory, execution target |
### Execution targets

By default, experiments run locally. Set `training.execution.type` to run on a remote host or inside a container instead.

**SSH** — rsync the workspace to a remote machine and run experiments over SSH:

```yaml
training:
  execution:
    type: ssh
    ssh_host: user@gpu-box.example.com  # required
    ssh_key: ~/.ssh/id_ed25519          # optional; uses SSH defaults otherwise
    remote_dir: /tmp/mat-runs           # base dir on the remote host
```

**Docker** — copy the workspace into a running container and exec commands inside it:

```yaml
training:
  execution:
    type: docker
    container: my-training-container  # required; must already be running
    container_dir: /tmp/mat-runs      # base dir inside the container
```
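Since the docker target execs into an existing container rather than starting one, you need to keep the container alive yourself. A minimal sketch, assuming a hypothetical `my-training-image` with your training dependencies installed:

```bash
# Start a long-lived container for mat to exec into
# (image name and GPU flag are illustrative)
docker run -d --name my-training-container --gpus all \
  my-training-image sleep infinity
```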
## Source Types

| Type | Required Fields | Optional Fields |
|---|---|---|
| `local_repo` | `path` | `include`, `exclude`, `name` |
| `remote_repo` | `url` | `branch`, `name` |
| `github_repo` | `url` | `branch`, `name` |
| `github_org` | `url` | `max_repos`, `visibility`, `name` |
| `bedrock_knowledge_base` | `knowledge_base_id` | `region`, `query`, `max_results`, `name` |
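As a sketch of the optional fields in use (the glob-style `include`/`exclude` patterns below are an assumption, based on the glob filtering described under How It Works):

```yaml
sources:
  - type: local_repo
    name: my-project-src        # optional label for this source
    path: /home/user/my-project
    include:
      - "src/**/*.py"           # only ingest Python sources
    exclude:
      - "**/test_*.py"          # skip tests
```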
## Fine-Tuning

Fine-tune models directly on your ingested corpus using `mat finetune`. Two backends are supported today; more can be added by subclassing `FineTuner`.

### Open-source models (HuggingFace + LoRA/QLoRA)

Requires a local GPU and `pip install 'multiagenttrainer[opensource]'`.
```yaml
# multiagenttrainer.yaml
finetuner:
  backend: opensource
  jobs_dir: ./finetune-jobs
  opensource:
    model_id: meta-llama/Llama-3.2-1B
    output_dir: ./finetuned-models
    lora_r: 16
    lora_alpha: 32
    num_epochs: 3
    batch_size: 4
    use_4bit: true  # QLoRA — requires bitsandbytes + CUDA
```
```bash
# Ingest sources first (or skip if you already have a corpus)
mat ingest

# Fine-tune on the ingested corpus
mat finetune start

# Or point at an arbitrary corpus file
mat finetune start --corpus /path/to/corpus.txt --name my-run

# List all jobs
mat finetune list

# Check a job
mat finetune status <job-id>
```
Training runs in-process and blocks until complete. The LoRA adapter and tokenizer are saved to `output_dir/<job-id>/`.
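To use the result outside of mat, the saved adapter can be loaded with the standard HuggingFace + PEFT APIs. A minimal sketch, assuming the `meta-llama/Llama-3.2-1B` base model from the config above:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then apply the LoRA adapter saved by mat
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "./finetuned-models/<job-id>")
tokenizer = AutoTokenizer.from_pretrained("./finetuned-models/<job-id>")

# Quick smoke test of the fine-tuned model
prompt = tokenizer("def fibonacci(", return_tensors="pt")
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=32)[0]))
```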
### AWS Bedrock model customization

Uses your existing boto3 credentials. Submits a Bedrock customization job and returns immediately — poll with `mat finetune status`.
```yaml
finetuner:
  backend: bedrock
  jobs_dir: ./finetune-jobs
  bedrock:
    base_model_id: amazon.titan-text-lite-v1
    region: us-east-1
    role_arn: arn:aws:iam::123456789012:role/BedrockFineTuningRole
    output_s3_uri: s3://my-bucket/finetuned-models/
    training_data_s3_uri: s3://my-bucket/training-data/
    customization_type: CONTINUED_PRE_TRAINING  # or FINE_TUNING
    epochs: 1
```
```bash
mat finetune start
mat finetune status <job-arn>
mat finetune cancel <job-arn>
```
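`mat finetune status` presumably wraps the standard Bedrock customization API; if you want to poll from your own code, the equivalent boto3 call is roughly:

```python
import boto3

# Poll a Bedrock model-customization job directly (the same data
# mat finetune status reports for the bedrock backend)
bedrock = boto3.client("bedrock", region_name="us-east-1")
job = bedrock.get_model_customization_job(jobIdentifier="<job-arn>")
print(job["status"])  # InProgress | Completed | Failed | Stopping | Stopped
```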
### Adding a new backend

Subclass `FineTuner`, implement its abstract methods, add a config dataclass, and register it in `finetuner/registry.py`:

```python
# finetuner/finetuner.py
class AnthropicFineTuner(FineTuner):
    def prepare_dataset(self, corpus_path): ...
    def start_job(self, dataset, job_name): ...
    def get_status(self, job_id): ...
    def cancel_job(self, job_id): ...
    def describe(self): ...

# finetuner/registry.py
if cfg.backend == "anthropic":
    return AnthropicFineTuner(cfg.anthropic, jobs_dir, console)
```
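The matching config dataclass might look like the sketch below; the field names are purely illustrative, so use whatever your backend actually needs:

```python
# finetuner/config.py (hypothetical fields for the example backend)
from dataclasses import dataclass

@dataclass
class AnthropicConfig:
    base_model_id: str
    api_key_env: str = "ANTHROPIC_API_KEY"  # env var holding the API key
    epochs: int = 1
```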
## How It Works

- **Ingest** — Fetch data from all configured sources (clone repos, query Bedrock KBs)
- **Build corpus** — Walk fetched files, filter by include/exclude globs, concatenate into a single corpus
- **Setup** — Clone autoresearch, inject the corpus, optionally override `program.md`
- **Train** — Launch the agent command iteratively for up to `max_experiments` rounds
- **Report** — Generate a markdown report with experiment results, best `val_bpb`, and stats
- **Fine-tune** (optional) — Run `mat finetune start` to fine-tune a model on the same corpus
## Development

```bash
uv pip install -e ".[dev]"
ruff check src/ tests/
ruff format src/ tests/
pytest
```
## Docker

```bash
docker compose run --rm mat train
docker compose run --rm mat sources
```
## Library Usage

```python
import asyncio
from pathlib import Path

from multiagenttrainer import Ingester, Runner, load_config

async def main():
    cfg = load_config()

    # Fetch all configured sources and build a single corpus file
    ingester = Ingester(cfg.sources, Path(".staging"))
    ingester.fetch_all()
    ingester.build_corpus(Path("corpus.txt"))

    # Set up an autoresearch workspace and run the experiment loop
    runner = Runner(cfg.autoresearch, cfg.training, name="my-run")
    workspace = runner.setup_workspace(Path("corpus.txt"))
    results = await runner.run_experiments(workspace)

    for r in results:
        print(f"experiment {r.experiment_id}: val_bpb={r.val_bpb}")

asyncio.run(main())
```