Local-first Harness Operating System for defining, evaluating, evolving, versioning, and deploying agent harnesses.
AgentRL
Local-first Harness Operating System for agents.
Install package: agentrl-os
Import package: agentrl
CLI command: agentrl
AgentRL is a systems layer for defining, evaluating, evolving, versioning, and deploying agent harnesses through one unified interface. It standardizes how agent systems are tested and improved over time without forcing teams to rebuild task formats, reward schemas, trajectory traces, registries, and local deployment plumbing from scratch.
AgentRL is not an orchestration framework and not primarily an RL framework. RL, prompt optimization, skill optimization, memory optimization, preference learning, and tool optimization are implementation details behind the harness abstraction.
Why AgentRL exists
Agentic systems are becoming easier to prototype and harder to operate.
A team can assemble a capable agent from a model, a prompt, tools, memory, an orchestration library, and a few eval scripts. The hard part starts after the demo:
How are tasks represented?
How are rewards represented?
How are evaluation results compared over time?
How are trajectories stored and replayed?
How are harness changes versioned?
How do you know a prompt, skill, tool policy, or memory policy actually improved behavior?
How do you safely deploy a new behavior and roll it back?
Most agent projects answer these questions with ad-hoc glue code. That glue code becomes the real operating system for the agent, but it is rarely standardized, versioned, or portable.
AgentRL exists to make that systems layer explicit.
The goal is to make building, improving, and operating agent harnesses feel closer to using scikit-learn-style project primitives:
from agentrl import Project
project = Project("./my-agent-system")
project.compile()
project.train(strategy="verification")
project.evaluate()
project.auto_harness()
project.deploy()
The gap AgentRL fills
The ecosystem already has strong tools, but they solve different parts of the lifecycle:
- LangGraph helps build stateful agent orchestration graphs.
- TRL helps train models with reinforcement learning and preference methods.
- Ray helps scale distributed execution.
- Verifiers and RLVR-style systems help score verifiable tasks.
- Atropos-style systems help collect rollouts and trajectories.
- Repo2RLEnv/Harbor-style systems turn repositories into verifiable coding tasks.
- Evaluation frameworks help run benchmark suites.
AgentRL is the layer above those pieces:
tasks + harness + rewards + evaluation + traces + versions + deployment
It gives those pieces a common operating model so teams can move from experiments to repeatable harness evolution without locking into a single runtime or training method.
What AgentRL is and is not
AgentRL is a Harness Operating System.
It owns:
- Project layout
- Harness definitions
- Task and reward schemas
- Evaluation records
- Trajectory/trace observability
- Local version registry
- Self-evolution candidate management
- Local deployment records
- Adapter boundaries to external systems
It does not try to replace:
- agent runtimes
- graph orchestration frameworks
- RL training libraries
- distributed compute systems
- repository-to-task synthesis pipelines
- hosted experiment platforms
Those are integrations or backends. AgentRL standardizes the harness lifecycle around them.
Differentiation from existing tools
AgentRL vs LangGraph
LangGraph is for orchestrating stateful agent workflows.
AgentRL is for defining, evaluating, evolving, versioning, and deploying the harnesses those workflows run inside.
A LangGraph app can be wrapped by an AgentRL harness. AgentRL should not become a competing graph runtime.
LangGraph: how should the agent transition between steps?
AgentRL: how is this behavior evaluated, improved, versioned, and deployed?
AgentRL vs TRL/RL libraries
TRL and similar libraries help train models.
AgentRL starts before and around training: task schemas, reward specs, evaluations, traces, versioning, and deployment. Training is one possible optimization backend, not the identity of the project.
TRL: optimize model weights.
AgentRL: operate harnesses; use training only when cheaper optimizations are insufficient.
AgentRL’s preferred improvement order is intentionally practical:
prompts → skills → memory policies → routing → tools → fine-tuning → RL
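One way to read this ordering is as an escalation ladder: try the cheapest lever first and move to the next only when the target is not met. The sketch below is illustrative only. `escalate`, `try_stage`, and the scores are hypothetical names and values, not part of AgentRL's API.

```python
from typing import Callable, Optional

# Improvement levers in the order listed above, cheapest first.
IMPROVEMENT_ORDER = [
    "prompts", "skills", "memory policies", "routing",
    "tools", "fine-tuning", "RL",
]


def escalate(try_stage: Callable[[str], float], target: float) -> Optional[str]:
    """Return the first lever whose score meets the target, or None.

    `try_stage` is a hypothetical callback that applies one lever and
    returns the resulting evaluation score.
    """
    for stage in IMPROVEMENT_ORDER:
        if try_stage(stage) >= target:
            return stage
    return None


# Made-up scores for illustration; real scores would come from an evaluation run.
fake_scores = {"prompts": 0.62, "skills": 0.71, "memory policies": 0.83}
chosen = escalate(lambda stage: fake_scores.get(stage, 0.0), target=0.8)
print(chosen)  # memory policies
```

With these made-up numbers the loop stops at memory policies and never reaches fine-tuning or RL, which is the point of the ordering.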
AgentRL vs eval frameworks
Eval frameworks usually run tests and produce scores.
AgentRL includes evaluation, but connects it to harness compilation, candidate evolution, trace replay, version registry, deployment preflight checks, and rollback.
Eval framework: did this system pass a test?
AgentRL: should this harness change be promoted and deployed?
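The promotion question can be sketched as a comparison between a baseline and a candidate evaluation. This is a minimal illustration under assumed names: `EvalSummary`, `should_promote`, and the `min_gain` threshold are hypothetical and do not describe AgentRL's actual gating logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSummary:
    """Hypothetical summary of one evaluation run."""
    pass_rate: float        # fraction of tasks passed, 0.0 to 1.0
    safety_violations: int  # count of flagged unsafe actions


def should_promote(baseline: EvalSummary, candidate: EvalSummary,
                   min_gain: float = 0.02) -> bool:
    """Promote only if the candidate is no less safe and clearly better."""
    if candidate.safety_violations > baseline.safety_violations:
        return False
    return candidate.pass_rate - baseline.pass_rate >= min_gain


baseline = EvalSummary(pass_rate=0.70, safety_violations=0)
better = EvalSummary(pass_rate=0.75, safety_violations=0)
unsafe = EvalSummary(pass_rate=0.90, safety_violations=2)
print(should_promote(baseline, better), should_promote(baseline, unsafe))
```

A bare score answers "did it pass?"; the gate adds the comparison against what is currently deployed.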
AgentRL vs Repo2RLEnv / Harbor
Repo2RLEnv/Harbor-style systems generate verifiable coding tasks from repositories.
AgentRL does not reimplement that synthesis. It imports those tasks into CodingHarness using Repo2RLEnvAdapter, preserving provenance, content hashes, sandbox metadata, and executable verification rewards.
Repo2RLEnv: repo → verifiable coding tasks
AgentRL: tasks + harness → evaluate, optimize, version, deploy
If Repo2RLEnv output is unavailable or invalid, AgentRL records provenance errors and imports no fabricated passing tasks.
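The "record errors, import nothing fabricated" behavior could look roughly like the following. It is a sketch, not the adapter's implementation: the record fields `body` and `sha256` and the function `import_tasks` are assumptions made for illustration.

```python
import hashlib
from typing import Iterable


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the task body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def import_tasks(records: Iterable[dict]) -> tuple[list[dict], list[str]]:
    """Split records into verified tasks and provenance errors.

    A record is accepted only if its declared hash matches its body;
    nothing is repaired or invented for records that fail the check.
    """
    tasks: list[dict] = []
    errors: list[str] = []
    for index, record in enumerate(records):
        body = record.get("body")
        declared = record.get("sha256")
        if not isinstance(body, str) or not isinstance(declared, str):
            errors.append(f"record {index}: missing body or sha256")
        elif content_hash(body) != declared:
            errors.append(f"record {index}: content hash mismatch")
        else:
            tasks.append(record)
    return tasks, errors


good = {"body": "fix the parser", "sha256": content_hash("fix the parser")}
bad = {"body": "tampered", "sha256": "0" * 64}
tasks, errors = import_tasks([good, bad, {}])
print(len(tasks), errors)
```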
AgentRL vs hosted experiment platforms
AgentRL is local-first. The MVP works without a hosted service:
- harness compilation
- local registry
- local traces
- local evaluation
- local self-evolution candidates
- local deployment records
Hosted registries, managed evals, GPU training, or enterprise governance can exist later as optional services, not prerequisites.
Research and library lineage
AgentRL is designed to sit on top of, or interoperate with, research and libraries such as:
- RLVR / execution rewards for verifiable tasks
- DPO and preference learning for subjective tasks
- GEPA-style reflective prompt evolution
- SkillOpt-style skill evolution
- learned reward models and LLM judges
- TRL for training backends
- Ray for distributed execution
- LangGraph for orchestration backends
- Verifiers for executable reward environments
- Atropos-style trajectory collection
- Repo2RLEnv / Harbor for repo-derived coding tasks
These remain implementation details. AgentRL’s public abstraction stays centered on Project and Harness.
Install
pip install agentrl-os
For local development:
git clone https://github.com/junaidahmed361/agentrl.git
cd agentrl
uv sync --extra dev
uv run pytest -q
Quick start
agentrl init my-agent-system
cd my-agent-system
agentrl compile
agentrl train
agentrl evaluate
agentrl deploy
Example demo
For a local Hermes-style agent replication demo, see:
examples/local-hermes-agent-os.md
The demo shows how AgentRL can represent a local agent OS with router, coding, RAG, tool-use, memory, skills, registry, traces, and local deployment while keeping Hermes-style execution as a harness capability rather than a competing runtime.
Python API
from agentrl import Project
project = Project("./my-agent-system")
project.compile()
project.train(strategy="verification")
project.evaluate()
project.auto_harness()
project.deploy()
Core operating model
Project
├── Harnesses
│ ├── Tasks
│ ├── Rewards
│ ├── Evaluations
│ ├── Policies
│ └── Goal Workflows
├── Memory
├── Skills
├── Version Registry
├── Observability
└── Deployment
Public top-level concepts stay intentionally small:
- Project
- Harness
- Memory
- Skills
- Version Registry
- Observability
- Deployment
- Goal Workflows
- Auto-Harness
Advanced methods such as RLVR, DPO, GEPA, SkillOpt, TRL, Ray, LangGraph, Verifiers, and Atropos are adapters or backend implementation details.
Built-in harnesses
- coding: verifiable coding tasks using filesystem/terminal evidence
- rag: retrieval-grounded question answering with citation/hallucination reward dimensions
- tool_use: safe tool-selection and tool-call evaluation
Repo2RLEnv adapter
from agentrl import Project
from agentrl.adapters import Repo2RLEnvAdapter
project = Project.init("./coding-agent")
source = Repo2RLEnvAdapter.from_repo(
    repo="pallets/click",
    pipeline="pr_runtime",
    limit=10,
)
project.harness("coding").add_tasks(source.to_taskset())
project.compile()
project.train(strategy="verification")
project.evaluate()
The adapter maps Repo2RLEnv/Harbor-style metadata into AgentRL TaskSet objects and attaches executable verification rewards to the coding harness.
CLI
agentrl --version
agentrl init my-project
agentrl compile
agentrl train --strategy verification
agentrl evaluate
agentrl evolve --targets prompts,skills,memory
agentrl auto-harness --mode static
agentrl run-goal "Fix the failing login test."
agentrl deploy
agentrl version list
agentrl version diff <left-version-id> <right-version-id>
agentrl version rollback <version-id>
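To make the list/diff/rollback semantics of the version commands concrete, here is a small in-memory model. It is a sketch under stated assumptions: `LocalRegistry`, its methods, and the content-derived version ids are hypothetical and do not describe how AgentRL stores or names versions.

```python
import hashlib
import json
from typing import Optional


class LocalRegistry:
    """Hypothetical in-memory stand-in for a local version registry."""

    def __init__(self) -> None:
        self._versions: dict = {}
        self._order: list = []
        self.active: Optional[str] = None

    def commit(self, spec: dict) -> str:
        """Store a spec under a short content-derived id and activate it."""
        payload = json.dumps(spec, sort_keys=True).encode("utf-8")
        version_id = hashlib.sha256(payload).hexdigest()[:12]
        if version_id not in self._versions:
            self._versions[version_id] = spec
            self._order.append(version_id)
        self.active = version_id
        return version_id

    def versions(self) -> list:
        """Version ids in commit order."""
        return list(self._order)

    def diff(self, left: str, right: str) -> dict:
        """Map each changed key to its (left, right) values."""
        a, b = self._versions[left], self._versions[right]
        return {key: (a.get(key), b.get(key))
                for key in sorted(set(a) | set(b)) if a.get(key) != b.get(key)}

    def rollback(self, version_id: str) -> None:
        """Re-activate an earlier version without deleting later ones."""
        if version_id not in self._versions:
            raise KeyError(version_id)
        self.active = version_id


registry = LocalRegistry()
v1 = registry.commit({"prompt": "Be concise.", "tools": ["search"]})
v2 = registry.commit({"prompt": "Be concise and cite sources.", "tools": ["search"]})
changes = registry.diff(v1, v2)
registry.rollback(v1)
print(registry.versions(), changes, registry.active == v1)
```

In this model a rollback only moves the active pointer, so the newer version stays available for a later diff or re-promotion.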
Local-first artifacts
AgentRL stores project-local state under .agentrl/:
.agentrl/
├── compiled/ # compiled harness specs
├── registry/ # local version registry artifacts
├── traces/ # JSONL evaluation traces
├── candidates/ # promoted self-evolution candidates
├── rejected/ # rejected candidates
└── deployments/local/ # local deployment records
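JSONL traces are append-only files with one JSON object per line, which keeps them easy to replay and diff. The sketch below writes and reads such a file in a temporary directory. The file name `eval.jsonl` and the record fields `task_id` and `reward` are assumptions for illustration, not AgentRL's trace schema.

```python
import json
import tempfile
from pathlib import Path


def append_trace(path: Path, record: dict) -> None:
    """Append one record as a single JSON line, creating folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def read_traces(path: Path) -> list:
    """Load every non-empty line back into a dict, in write order."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as root:
    trace_file = Path(root) / ".agentrl" / "traces" / "eval.jsonl"
    append_trace(trace_file, {"task_id": "t1", "reward": 1.0})
    append_trace(trace_file, {"task_id": "t2", "reward": 0.0})
    replayed = read_traces(trace_file)

print([record["task_id"] for record in replayed])
```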
MVP features
- Project abstraction
- Harness compilation
- TaskSet, RewardSpec, EvaluationResult schemas
- Local version registry with list/diff/rollback
- Built-in coding, RAG, and tool-use harnesses
- Repo2RLEnvAdapter for Harbor-style coding tasks
- Evaluation engine with JSONL traces
- Basic self-evolution and auto-harness candidate promotion/archive
- Local deployment artifacts with evaluation preflight gating
- Typed Python package and console script
Development
uv sync --extra dev
uv run pytest -q
uv run python -m build
uv run twine check dist/*
License
Apache-2.0. See LICENSE.