Agent execution harness — wraps LLMs in structured, inspectable workflow specs

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

Armature

A lightweight, declarative agent execution harness. Define multi-agent workflows as YAML specs. Run them with a single Python call or from the CLI.

Armature demo — pip install, a YAML spec, and a real workflow run

No framework dependency. No prescribed team structure. Just a DAG executor, an LLM adapter, and your workflow spec.

Armature is the execution engine for Reasoning Automation — end-to-end business processes where multi-agent deliberation replaces brittle rule-based logic. The harness owns orchestration, retries, safety, telemetry, and human approval gates. You supply the domain logic as YAML workflow specs and Python tool modules. The same engine that runs a code-review pipeline can run a contract risk assessment, a social media creative chain, or a compliance audit — without any changes to Armature itself.

For more information, docs, and examples, visit armature.now.

What it does

Armature reads a YAML spec that defines a workflow as a directed acyclic graph (DAG) of stages. Each stage is one of four things:

An LLM call — a role with a system prompt, model tier, and output format
A script/adapter — a Python function or shell command
A human gate — pauses execution for human approval
A direct tool call — invokes a registered tool deterministically, no LLM involved
A subagent — spawns a child workflow (with optional fan-out/fan-in for parallelism)

Stages declare depends_on relationships. The engine resolves execution order automatically, passes accumulated results downstream as context, and handles retries, safety hooks, and telemetry.

Installation

pip install armature-agents

With optional extras:

pip install "armature-agents[service]"   # FastAPI HTTP service (armature serve)
pip install "armature-agents[telemetry]" # OpenTelemetry export
pip install "armature-agents[wizard]"    # interactive spec wizard (armature new)

[service] adds FastAPI and uvicorn, needed only if you run armature serve to expose workflows as an HTTP API. The core armature run command works without it.

[telemetry] adds the OpenTelemetry SDK for span export to OTLP backends (Jaeger, Honeycomb, etc.). Without it, armature.telemetry degrades silently to no-ops — traces are written to the local SQLite store regardless.

[wizard] adds questionary for the interactive armature new spec-creation wizard. Without it, the other commands work normally and armature new will tell you to install the extra.

Verify:

armature --version

Set your LLM provider key:

export ANTHROPIC_API_KEY=sk-...
# or OPENAI_API_KEY, OPENROUTER_API_KEY, or any litellm-supported provider

No API key? Armature runs against local models via Ollama — no key, no cloud, nothing leaves your machine. Pull a model, save the spec below as hello_ollama.yml, and run it:

ollama pull llama3.2        # any model from `ollama list` works

# hello_ollama.yml
name: hello_world_local
version: "1.0"

model_tiers:
  local:
    provider: ollama
    model: llama3.2

stages:
  - id: explainer
    role:
      name: Explainer
      type: worker
      model_tier: local
      description: |
        Explain the following topic clearly and concisely in 3-5 sentences.
    output_mode: text
    depends_on: []

armature run hello_ollama.yml --input topic="what a DAG is"

This same spec ships as examples/00_hello_ollama.yml once you clone the repo.

Quick start

1. Write a spec (my_workflow.yml):

name: summarize
version: "1.0"

model_tiers:
  small:
    provider: anthropic
    model: claude-haiku-4-5-20251001

# Optional: map role types to tiers so stages don't need explicit model_tier
role_type_defaults:
  worker: small
  judge: small

stages:
  - id: summarizer
    role:
      name: Summarizer
      type: worker        # picks up "small" from role_type_defaults
      description: |
        Summarize the provided text in 3 bullet points.
        Be concise and capture the key ideas.
    output_mode: text
    depends_on: []

2. Run it from Python:

import asyncio
from armature import Harness

async def main():
    harness = Harness.from_spec("my_workflow.yml")
    result = await harness.run({"text": "Your content here..."})
    print(result["summarizer"]["content"])

asyncio.run(main())

3. Or from the CLI:

armature run my_workflow.yml --input text="Your content here..."

This spec uses Anthropic (ANTHROPIC_API_KEY). To run on OpenAI or a local model instead, change the model_tiers block — see No API key? above for the Ollama version.

CLI

armature run <spec>                           # execute a workflow
armature run <spec> --no-cache               # run without LLM response cache
armature run <spec> --auto-improve           # run then auto-apply spec improvements when HQS < 0.75
armature loop <spec>                         # run a workflow back-to-back under a central budget
armature loop <spec> --until "{{ judge.done }}" --max-llm-calls 500
armature validate <spec>                      # validate spec + show KYA-inspired risk score (LOW/MEDIUM/HIGH/CRITICAL)
armature new [output]                         # interactive spec creation wizard
armature doctor                               # environment health check
armature serve                                # start HTTP service (requires armature[service])
armature serve --specs-dir ./specs/          # serve with named workflow registry (/workflows API)
armature optimize <spec>                      # single-shot meta-harness optimizer
armature improve <spec>                       # analyze traces, propose + auto-apply a spec improvement
armature improve <spec> --no-apply            # propose only; review the diff before applying
armature report --run-id <id>                 # per-run text report with failure signatures
armature replay <run_id>                      # display a recorded run stage-by-stage
armature dashboard <spec>                     # Rich 4-panel aggregate health dashboard
armature dashboard <spec> --watch             # auto-refresh every 5 seconds
armature dashboard <spec> --format json       # machine-readable JSON output
armature export-traces                        # export traces as SFT/DPO training data
armature channels start                       # messaging channel connectors
armature watch <spec>                         # listen for cron/webhook triggers and fire runs

Built-in tools

Armature ships with a tool registry pre-loaded with the following tools. Any stage can invoke them via tool_call or by listing them in role.tools.

Tool name	Permission	Description
`file_read`	READ_ONLY	Read a file from disk
`file_write`	WORKSPACE	Write content to a file
`shell`	WORKSPACE	Run a shell command; returns stdout, stderr, exit_code
`http_get`	NETWORK	HTTP GET request; returns status and body
`http_post`	NETWORK	Authenticated HTTP POST with JSON body and custom headers; returns status and body

http_post is the general-purpose adapter for any external API — image generation, ad platforms, analytics services, webhooks, etc. Pass auth credentials in headers:

- id: generate_image
  tool_call:
    name: http_post
    args:
      url: "https://api.openai.com/v1/images/generations"
      headers:
        Authorization: "Bearer {{ env.OPENAI_API_KEY }}"
        Content-Type: "application/json"
      body:
        model: "dall-e-3"
        prompt: "{{ visual_prompt }}"
        size: "1024x1024"
        n: 1

Reasoning Automation

Armature's tools: spec section lets any workflow load external Python modules that register additional tools. This is the primary extension point for building Reasoning Automation applications — end-to-end processes that connect LLM reasoning to real external systems.

The pattern

Create a Python package alongside your workflows. Each module exposes a register(registry) function:

# myapp/tools/dalle.py
import openai
from armature.registry.registry import ToolRegistry, ToolDescriptor, PermissionLevel

_client = openai.AsyncOpenAI()

async def generate_image(args: dict) -> dict:
    response = await _client.images.generate(
        model="dall-e-3",
        prompt=args["prompt"],
        size=args.get("size", "1024x1024"),
        n=1,
    )
    return {"url": response.data[0].url, "revised_prompt": response.data[0].revised_prompt}

def register(registry: ToolRegistry) -> None:
    registry.register(ToolDescriptor(
        name="dalle.generate_image",
        description="Generate an image using DALL-E 3",
        permission=PermissionLevel.NETWORK,
        handler=generate_image,
        parameters={
            "prompt": {"type": "string"},
            "size":   {"type": "string", "optional": True},
        },
    ))

Declare it in your workflow spec:

tools:
  - module: myapp.tools.dalle
  - module: myapp.tools.meta_publisher
  - module: myapp.tools.analytics

stages:
  - id: generate_image
    tool_call:
      name: dalle.generate_image
      args:
        prompt: "{{ visual_director.prompt_a }}"

The tool modules live entirely in your application project. Armature imports them at startup. No changes to Armature are required.

What you can build

Use case	Tool modules needed
Social ad campaign automation	Image gen (DALL-E 3), platform publishers (Meta, TikTok), analytics collectors
Contract risk review	Document extractor, clause classifier, risk scorer
Vendor assessment	Web search, company lookup, scoring rubric
Compliance documentation	Regulatory corpus retrieval, template filler, diff checker
Code review pipeline	GitHub API, static analysis runner, security scanner

Each use case is a YAML workflow spec + a small set of Python tool modules. The Armature engine is the shared execution layer across all of them.

Research foundation

Armature is built from eleven academic papers, one industry governance framework, and one open-source agent architecture project — all but one published this year (each paper's date is listed below). Every major design decision traces to an experimentally validated finding: the harness matters more than the model.

The papers

[NLAH] Natural-Language Agent Harnesses — Tsinghua University, March 2026 (arXiv:2603.25723)

Establishes the architectural model. NLAH defines six core harness components (contracts, roles, stage structure, adapters/scripts, state semantics, and a failure taxonomy) and shows that workflows defined in structured natural language outperform code-based equivalents on complex benchmark tasks (47.2 vs. 30.4 on OSWorld). It also specifies parallel fan-out as a core orchestration primitive.

[Meta-Harness] Automated Optimization End-to-End — Stanford University, March 2026 (arXiv:2603.28052)

The paper behind the optimizer. Meta-Harness introduces an outer optimization loop where a frontier model reads execution traces and proposes improvements to the harness spec itself. Key finding: giving the optimizer access to full execution traces — not just pass/fail scores — raises best-run accuracy from 41.3% to 56.7% by enabling causal reasoning about why runs failed. Armature keeps a ProposalStore of prior proposals and re-runs the loop via run_loop().

[AutoHarness] LLM-Synthesized Harnesses — February 2026 (arXiv:2603.03329)

Demonstrates that LLMs can iteratively write their own harness code and produce systems that outperform larger models without harnesses. The concept most directly applied: the harness-as-verifier, where the harness validates outputs meet domain-specific legality constraints before accepting them — the ancestor of the judge role type and SpecDrafter.

[AgentSpec] Runtime Enforcement for Safe Agents — March 2025 (arXiv:2503.18666)

Introduces a declarative rule language for constraining agent behavior at runtime. Rules are composable, lightweight (millisecond-scale evaluation), and LLM-generatable. Armature implements the full enforcement architecture: pre/post-tool hooks wired into the engine and a declarative condition DSL (ToolSafetyRule + SafetyCondition) written directly in YAML.

[Continual Harness] Reset-Free Self-Improvement — May 2026 (arXiv:2605.09998)

Formalizes the two-loop self-improvement design: an inner loop (a post_run refiner stage that sees the full transcript after the DAG completes) and an outer loop (SelfImproveRunner — load traces → diagnose → propose YAML revision → auto-apply). Its emphasis on recurring failure signatures informs Armature's own diagnostic taxonomy (stage_failed, output_invalid, low_confidence, high_escalation, low_skill_activation), and its fine-tuning bridge — high-quality judge traces exported as SFT/DPO training data — is implemented directly.

[AHE] Agentic Harness Engineering — April 2026 (arXiv:2604.25850)

The accountability paper. AHE introduces the prediction-verification loop: every proposed spec revision carries a falsifiable contract (predicted_fixes, predicted_regressions), and the next cycle verifies those predictions against observed diagnostic shift. Implements component-level improvement targeting — long-term memory evolution alone yielded +5.6pp; system prompt evolution alone caused -2.3pp regression, validating the "one component at a time" discipline.

[System Scaling] From Model Scaling to System Scaling — May 2026 (arXiv:2605.26112)

Identifies three system-level failure modes — which it terms "exposure without access," "stale-but-confident," and "confident-but-unchecked": memory that is present but unreachable, aging memory trusted without warning, and tool side effects assumed rather than verified. Armature answers with staleness penalties, context provenance tracking, post-condition verification, its own drift score (regression detection across improvement cycles), and component governance (auto-apply vs. human-review classification for spec changes).

[AGT] Microsoft Agent Governance Toolkit — 2025

Five governance primitives borrowed directly: reversibility classification for every tool call (FULL / PARTIAL / NONE), tamper-evident SHA-256 hashing of trace inputs and the governing policy, a require_approval gate wired into the tool-call path, and safety_mode: strict (fail-closed — deny on no-match).

[The Log is the Agent] — Yohei Nakajima, May 2026 (arXiv:2605.21997)

Event-sourced, graph-memory agent architecture with content-addressed caching of LLM responses and event-triggered reactive behaviors. Adopted concepts: SHA-256 cache keying by model + messages + kwargs (LLMCache), audit replay from the trace store (armature replay), and the BehaviorRule/BehaviorRegistry hook layer for pattern-triggered post-run behaviors.

[KYA] Know Your Agents — Veldt Labs, May 2026 (arXiv:2605.25376)

Governance layer operating at definition-time (static risk scoring), runtime-trust (anomaly counting), and composition (only-tighten). Adopted: five-factor static spec risk score surfaced by armature validate, RogueSignalCounter wired into safety hooks and the run summary, and CONFLICTING_SAFETY_RULES validation enforcing the only-tighten composition principle.

[Skill-to-LoRA] From Using Skills to Learning Behaviors for Token-Efficient LLM Agents — The Chinese University of Hong Kong, June 2026 (arXiv:2606.16769)

The paper behind LoRA adapter skills. S2L shows that agent skills expressed as SKILL.md procedural documents can be distilled into lightweight, task-specific LoRA adapters, then plugged in at runtime instead of injecting the full skill text into the prompt. Armature's skill_library.adapter references, adapter_support: dynamic tiers, and the pluggable adapter factory implement this pattern directly: skill text is omitted when the adapter loads, cutting prefill tokens while preserving behavior. The s2l backend trains adapters from skill documents; the trace backend trains from exported high-quality traces.

[C-LoRA] Continual Low-Rank Adaptation for Pre-trained Models — Shanxi University / University of Manchester, February 2025 (arXiv:2502.17920)

The paper behind continual adapter learning. C-LoRA keeps a single LoRA adapter and updates it sequentially from new tasks by splitting the routing matrix into a frozen R_old (preserving prior knowledge) and a trainable near-zero R_delta for the new task, regularized by λ ||A^T · R_delta||_F². This maps directly to Armature's versioned adapter registry: new skill/trace batches can update the prior adapter instead of training a fresh one or keeping a growing set of per-version adapters. adapter_factory.continual_learning exposes the C-LoRA settings; adapter_factory.use_dora enables DoRA for richer adapter representations.

What's implemented

Source	Concept	Status
NLAH	Declarative NL spec, four role types, fan-out/fan-in	✅
Meta-Harness	Single-shot + multi-iteration optimizer, proposal history, prompt bootstrapping	✅
AutoHarness	Harness-as-verifier, NL-to-spec synthesis (`SpecDrafter`), `AutoHarness` loop	✅
AgentSpec	Pre/post-tool hooks, declarative safety DSL (6 operators, 5 actions)	✅
Continual Harness	Diagnostic failure taxonomy, inner refiner loop, `SelfImproveRunner`, `TraceExporter`	✅
Harness Benefit (arXiv:2605.30621v1)	Cheap-evolver (medium-tier `SpecRefiner`), HFR as 5th HQS component, SLR `low_skill_activation` diagnostic	✅
AHE	Falsifiable improvement contract, prediction-verification, `_verify_predictions()`	✅
System Scaling	Memory staleness, context provenance, drift score, postcondition verification, consensus fan-in, component governance	✅
AGT	Reversibility classification, trace hashing, policy version, `require_approval`, strict mode	✅
The Log is the Agent	LLM response caching, audit replay, trace-triggered behaviors (`BehaviorRule`), `--auto-improve`	✅
KYA	Static spec risk score, rogue signal counter, only-tighten safety rule validation	✅
Skill-to-LoRA (arXiv:2606.16769)	LoRA adapter skills: `skill_library.adapter`, `adapter_support: dynamic`, adapter factory	✅
C-LoRA (arXiv:2502.17920)	Continual adapter updates via `adapter_factory.continual_learning`	✅
DoRA	`adapter_factory.use_dora` — Weight-Decomposed Low-Rank Adaptation	✅

The self-improvement flywheel

Armature is the execution layer — the first component in a larger system designed to improve itself the more it runs. The chart below shows where the current implementation stands and where the flywheel leads aspirationally.

  TODAY                         NEAR-TERM                    ASPIRATIONAL
  ─────────────────────────────────────────────────────────────────────────

  ┌──────────────────┐
  │  Armature        │  ─── every run records ──►  ┌─────────────────────┐
  │  Harness         │                              │  TraceStore         │
  │                  │  ◄── optimizer proposes ───  │  (SQLite, per run)  │
  │  • DAG executor  │        spec improvements     └──────────┬──────────┘
  │  • Role routing  │                                         │
  │  • Safety hooks  │                              ┌──────────▼──────────┐
  │  • HQS scoring   │                              │  Loop 1:            │
  │  • Session log   │                              │  Harness Optimizer  │
  └──────────────────┘                              │                     │
                                                    │  Reads traces +     │
                                                    │  proposal history   │
                                                    │  → proposes YAML    │
                                                    │  spec improvements  │
                                                    │  → A/B tests by HQS │
                                                    └──────────┬──────────┘
                                                               │ accepted diffs
                                                    ┌──────────▼──────────┐
                                                    │  Loop 2:            │
                                                    │  SLM Fine-Tuning    │
                                                    │                     │
                                                    │  High-quality       │
                                                    │  traces → LoRA      │
                                                    │  fine-tune workers  │
                                                    │  → register as      │
                                                    │  new model tier     │
                                                    └──────────┬──────────┘
                                                               │ better workers
                                                    ┌──────────▼──────────┐
                                                    │  Loop 3:            │
                                                    │  RAG                │
                                                    │                     │
                                                    │  Trace failures     │
                                                    │  reveal knowledge   │
                                                    │  gaps → improve     │
                                                    │  retrieval index    │
                                                    └──────────┬──────────┘
                                                               │ richer context
                                                    ┌──────────▼──────────┐
                                                    │  Loop 4:            │
                                                    │  Consensus          │
                                                    │  deliberation       │
                                                    │                     │
                                                    │  Calibrate          │
                                                    │  deliberation       │
                                                    │  priors from        │
                                                    │  outcomes →         │
                                                    │  cleaner quality    │
                                                    │  signal back to     │
                                                    │  Loop 1             │
                                                    └─────────────────────┘

  ─────────────────────────────────────────────────────────────────────────
  All four loops are implemented. 1,516 tests passing.

The compounding property: Each loop feeds the next. Better traces → better optimizer proposals → better specs → better traces. Fine-tuned worker models produce better outputs → fewer judge rejections → cleaner quality signal. The harness measurably improves the more it runs, without engineering effort after initial deployment.

Key concepts

Concept	Description
Spec	YAML file defining the complete workflow — model tiers, stages, safety rules, memory
Stage	One unit of work: an LLM call, script, gate, direct tool call, or subagent
DAG	Stages declare `depends_on`; the engine resolves execution order
Context	Shared dict that accumulates stage outputs; every stage sees all upstream results
Model tiers	Named model slots (`tiny`, `small`, `medium`, `large`, `frontier`); the using app defines what each name maps to (provider, model, temperature, max_tokens)
Role type defaults	Maps role types to tiers automatically (`worker → small`, `judge → frontier`, etc.); stages can omit `model_tier` and inherit from this mapping
Native tool calling	Stages declare `role.tools` to scope which registry tools they can call; the engine runs a ReAct dispatch loop — tool calls returned by the model are executed and results fed back until a final response is produced
Direct tool call	A `tool_call` stage invokes a registered tool without an LLM — deterministic, zero-latency, no JSON hallucination. Args are Jinja2-rendered against context.
Mission context	A `mission:` field on the spec is automatically injected into every LLM stage's system prompt, anchoring agents to the stated goal across long-running workflows and including a compact prior-stage breadcrumb
Continuation	A `continuation:` block carries selected stage outputs from a prior run into the next activation via `carry_forward` key references; the merged values arrive under an `inject_as` context key (default: `prior_run`). Enables long-horizon workflows that accumulate state across repeated executions without custom code.
Iteration loops	`loop` on any stage for deliberate iteration with `until` conditions, selective `carry_forward`, and per-iteration `_iteration` context
Triggers	A `triggers:` list declares `cron` (schedule expression) and `webhook` (HTTP path) trigger sources. `armature watch <spec>` runs a persistent dispatcher that fires `Harness.run()` on every matching event.
Response stage	Mark one text-mode LLM stage as `response_stage: true` to enable token streaming; the HTTP service forwards each token to the SSE stream immediately and fires a `response_stage_complete` event so clients can render the answer before background stages finish
Context filtering	A stage's `signature.input` declares which context keys appear in its prompt — keeps prompts focused, hides internal state from irrelevant stages
Cross-run memory	The `memory:` spec section captures stage outputs across runs and injects them into subsequent runs — lets workflows accumulate knowledge without code changes
HQS	Harness Quality Score — Armature's own 5-component quality score: output validity (35%), success rate (25%), quorum score (20%), latency (10%), harness-following rate / HFR (10%). HFR = fraction of stages that succeed without escalation, a metric adapted from arXiv:2605.30621v1
Sandbox isolation	`sandbox.mode: docker` routes shell, file_write, and file_read tool calls through ephemeral Docker containers — network-isolated, CPU/memory bounded, workspace-scoped. Per-stage image overrides with `sandbox_image`. Image content digest recorded on every trace for audit.
LoRA adapter skills	`skill_library` entries can reference a registered LoRA adapter via `skill.adapter`. On tiers with `adapter_support: dynamic`, the adapter replaces the skill text at runtime; on `none` tiers the skill text is used or the configured `fallback` policy is applied. Developed from the Skill-to-LoRA paper (arXiv:2606.16769); continual updates follow C-LoRA (arXiv:2502.17920)
Templates	Pre-built spec files for common patterns (Six Thinking Hats deliberation, etc.)

Examples

examples/ — annotated workflows you can copy and modify:

Most examples run on OpenRouter open models (Qwen / Gemini) — set OPENROUTER_API_KEY to run them as-is, or edit the model_tiers block to point at any provider you prefer. 00_hello_ollama.yml needs no key at all (local Ollama); starter_template.yml uses Anthropic.

File	What it demonstrates
`00_hello_ollama.yml`	Zero-API-key quickstart — a single stage on a local Ollama model
`01_hello_world.yml`	Minimal single-stage LLM workflow
`02_research_pipeline.yml`	Sequential pipeline (researcher → writer → critic) with a human approval gate
`03_deliberation_standard.yml`	Three-round deliberation — specialist analysts, a challenger, and a synthesizer
`04_fan_out.yml`	Parallel fan-out / fan-in to a single synthesizer
`05_enterprise_slm_tiers.yml`	Multi-tier cost pattern — local SLM workers with a frontier judge
`06_human_in_the_loop.yml`	Confidence-gated human escalation (HITL)
`07_lora_adapter.yml`	LoRA adapter skills — replace skill text with a fine-tuned adapter at runtime
`11_iterative_refinement.yml`	Deliberate iteration with `loop:` and an `until:` stop condition
`starter_template.yml`	Full-featured reference — every section documented inline, showing model tiers, context filtering, cross-run memory, safety rules, guided JSON, and a human gate

Templates

Ready-to-use deliberation patterns in armature/templates/:

Template	Pattern
`six_thinking_hats.yml`	Edward de Bono's Six Thinking Hats — structured multi-perspective deliberation

Built with Armature

Open-source applications built on Armature — reference implementations you can clone, run, and adapt:

Project	What it does	Key Armature features
Research	Given any topic, plans search queries, fetches web/Reddit/YouTube sources in parallel via a looping child workflow, and produces a structured Markdown research briefing. Supports iterative deepening across multiple runs.	`subagent_spec` (child workflow loop), multiple fan-out stages, tool call stages, continuation, checkpoint, strict safety mode
Argus	Scans a code repository for security vulnerabilities and ISO/IEC 25010 quality issues, analyzing up to 40 source files in parallel and producing prioritized hardening and improvement reports.	Fan-out (up to 40 parallel file scans), `skip_if`, continuation, checkpoint, two independent workflow types
Sentinel	Weekly Python dependency digest — scans a project manifest, fetches PyPI data for all dependencies in parallel, classifies each update by severity (security / breaking / feature / patch), and writes a prioritized Markdown report.	Fan-out/fan-in, strict safety mode, continuation, `skip_if`, direct tool call stages

Each repo is a self-contained reference implementation: a YAML workflow spec, Python tool modules, and a CLI runner. Use them as starting points for your own Armature-based applications.

Project layout

armature/
├── nodes/          # Stage executors (LLMNode, ScriptNode, HumanGateNode, SubagentNode)
├── registry/       # Tool registry, built-in tools, ToolDescriptor, reversibility
├── runtime/        # DAG executor, engine, prompt assembler, context manager
├── spec/           # YAML loader, Pydantic models (HarnessSpec, Stage, SandboxConfig, ...)
├── hooks/          # Lifecycle hooks, safety rule evaluation, PostconditionFailed
├── permissions/    # PermissionLevel, PermissionChecker
├── optimizer/      # Meta-Harness: trace-driven spec optimization, ProposalStore
├── synthesis/      # SelfImproveRunner, SpecRefiner, DiagnosticAnalyzer, TraceExporter
├── state/          # TraceStore, MemoryStore, SessionLog, ArtifactStore (SQLite + JSONL)
├── report/         # Rich dashboard, sparkline, aggregator, panels
├── sandbox/        # DockerSandboxProvider — shell/file tool sandboxing
├── emitters/       # HermesEmitter — agent bundle generation
├── adapters/       # Observability adapters (LangFuse, LangSmith)
├── templates/      # Reusable workflow spec templates
├── service/        # FastAPI HTTP service — WorkflowRegistry, build_app(), /workflows API
└── cli.py          # CLI entry point

examples/           # Annotated workflow YAML specs (copy and modify)
docs/               # Full documentation (see index below)

Documentation

Getting started

Document	Purpose
BUILD_FIRST_WORKFLOW	Hands-on tutorial — build a working workflow from scratch
USER-GUIDE	Full spec reference — every field, every option, worked examples
ARMATURE-SPEC-REF	All spec fields and valid values on one page
ADAPTER-POWERED-TEAMS	How to set up, train, and continually refresh LoRA adapter-backed skills
FAQ	Common questions — positioning, capabilities, comparisons

Design & philosophy

Document	Purpose
ARCHITECTURE	Design rationale, research foundation, implementation table
ARMATURE-PHILOSOPHY	Why a harness — philosophy, research papers, architecture deep-dive
DECLARATIVE-CONTROL-FLOW	YAML-first control flow — branching, loops, conditions
DAG-vs-LANGGRAPH	How Armature's DAG model compares to LangGraph
MISSION-AS-CONTEXT	Mission statements as persistent agent context
ROLE-TAXONOMY	Agent role definitions and the role system
MODEL-TIERS	Routing work across SLM workers and frontier orchestrators

Patterns & features

Document	Purpose
JUDGE-PATTERN	Output validation with judge agents
QUORUM-SCORING	Deliberative quality scoring across agents
FAN-IN_FAN-OUT	Parallel fan-out and aggregation patterns
SUBAGENT-COMPOSITION	Composing workflows from subagent stages
CONTEXT-ISOLATION	Isolating subagent context for focus and safety
MEMORY-AND-CONTEXT	Memory persistence and context management
CHECKPOINT-AND-RESUME	Execution state persistence and resumption
CHATBOT-AND-STREAMING	Chat applications and streaming responses
HUMAN-IN-THE-LOOP	Approval gates and human decision points
HQS-AND-SELF-IMPROVEMENT	The HQS formula and self-improvement loop

Operations & safety

Document	Purpose
ARMATURE-IN-PRODUCTION	Running Armature in production — patterns and case studies
SAFETY-AND-GOVERNANCE	Safety rules, governance, and guardrails
SANDBOX-AND-ISOLATION	Sandboxed tool execution (Docker isolation)
INTEGRATION	LangGraph sidecar pattern, HTTP endpoint reference

Project

Document	Purpose
CONTRIBUTING	How to run tests, PR conventions, adding tools and commands
CHANGELOG	Release history
ROADMAP	Where Armature is headed
SECURITY	Reporting vulnerabilities

Learn more: full docs, examples, and the story behind Armature live at armature.now.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

bryan-elftech

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.5.0

Jun 30, 2026

0.4.0

Jun 29, 2026

0.3.6

Jun 23, 2026

0.3.5

Jun 17, 2026

0.3.4

Jun 16, 2026

0.3.2

Jun 15, 2026

0.3.1

Jun 15, 2026

0.3.0

Jun 14, 2026

0.2.0

Jun 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

armature_agents-0.5.0.tar.gz (596.8 kB view details)

Uploaded Jun 30, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

armature_agents-0.5.0-py3-none-any.whl (206.6 kB view details)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file armature_agents-0.5.0.tar.gz.

File metadata

Download URL: armature_agents-0.5.0.tar.gz
Upload date: Jun 30, 2026
Size: 596.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for armature_agents-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`ee1bfdb3d8b80e6797bd1d66f69dd27dbe4519e4c42feae0901c7fbf1b6b5f85`
MD5	`7536c09dadfc027476368a45ebc700cf`
BLAKE2b-256	`82ea1d48f4f8f0159facc82ccdf2d8e3ceb5a50ad0b35810a051dd61f50c2f27`

See more details on using hashes here.

Provenance

The following attestation bundles were made for armature_agents-0.5.0.tar.gz:

Publisher: publish.yml on bryansparks/armature

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: armature_agents-0.5.0.tar.gz
- Subject digest: ee1bfdb3d8b80e6797bd1d66f69dd27dbe4519e4c42feae0901c7fbf1b6b5f85
- Sigstore transparency entry: 2028210462
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: bryansparks/armature@037a03d7d0d48b58127e3b5cb4f223c67460b516
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/bryansparks
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@037a03d7d0d48b58127e3b5cb4f223c67460b516
- Trigger Event: push

File details

Details for the file armature_agents-0.5.0-py3-none-any.whl.

File metadata

Download URL: armature_agents-0.5.0-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 206.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for armature_agents-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6346d7b16a8c8f7559abbce54a9d6058aabd7487eef4beb2ebf167dea04234aa`
MD5	`e0913c46025f7cbee10aae9244d7b4ba`
BLAKE2b-256	`f615782fb51aea8105896bf1a679d17a89ce4e79c743851b35456d6b98420d04`

See more details on using hashes here.

Provenance

The following attestation bundles were made for armature_agents-0.5.0-py3-none-any.whl:

Publisher: publish.yml on bryansparks/armature

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: armature_agents-0.5.0-py3-none-any.whl
- Subject digest: 6346d7b16a8c8f7559abbce54a9d6058aabd7487eef4beb2ebf167dea04234aa
- Sigstore transparency entry: 2028210587
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: bryansparks/armature@037a03d7d0d48b58127e3b5cb4f223c67460b516
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/bryansparks
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@037a03d7d0d48b58127e3b5cb4f223c67460b516
- Trigger Event: push

armature-agents 0.5.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Armature

What it does

Installation

Quick start

CLI

Built-in tools

Reasoning Automation

The pattern

What you can build

Research foundation

The papers

What's implemented

The self-improvement flywheel

Key concepts

Examples

Templates

Built with Armature

Project layout

Documentation

Getting started

Design & philosophy

Patterns & features

Operations & safety

Project

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance