Skip to main content

Reliability for LLM agents through enforcement, not model size โ€” Agent-Contract-Kernel (ACK) + a fail-closed orchestration engine.

Project description

Ironclad

CI License: Apache-2.0 Python Status Stars

Reliability for LLM agents through enforcement, not model size. ๐Ÿ‡ฆ๐Ÿ‡ช Built in the UAE by MJWC-AI-LAB.

Ironclad is a generic, model-agnostic framework for building reliable agentic systems. It pairs an Agent-Contract-Kernel (ACK) โ€” schema-as-single-source- of-truth, validateโ†’reaskโ†’retry, a generator and a preflight doctor โ€” with a lean orchestration engine that turns multi-step agent workflows into deterministic, fail-closed pipelines.

The guiding principle: a small fast model with hard schema/validation enforcement beats a large model you "trust" to format its output. You get production-grade tool-calling reliability without depending on any specific model or parser โ€” the kernel enforces the contract, not the weights.

๐Ÿšง Status: proven core, mid-redesign (pre-release)

Ironclad's engine comes from a proven, in-production orchestrator, but it is currently undergoing a complete redesign (single process โ†’ headless server + thin client, containerized, reasoning-worker fan-out, full-screen TUI). Not everything is re-wired or re-tested yet, and some features are still placeholders while the rebuild finishes (notably memory and autoplan). There is no tagged release; APIs, layout and config may change. Treat main as a development snapshot.

๐Ÿ‘‰ docs/status.md is the honest, per-component wiring status (proven / wired+tested / placeholder / opt-in), the module reference, the memory situation, and the latest load-test results. Read it before relying on anything.

Reference environment. Ironclad is developed and exercised on an NVIDIA DGX Spark (GB10, Blackwell sm_121, 128 GB unified memory) running a local vLLM server with Qwen3.6-35B-A3B-NVFP4. Nothing is hard-wired to that box โ€” any OpenAI-compatible endpoint works โ€” but the defaults (localhost:8000, qwen3.6-35b), the throughput numbers and the constrained-decoding findings (NVFP4, XGrammar on the CUDA 13 nightly) reflect that hardware. See docs/dgx-spark.md for the full reference stack and a one-shot bootstrap (scripts/spark-bootstrap.sh).

Why

  • Contract-first. One Pydantic schema drives the prompt, the validator, the docs, and (where the hardware allows) constrained decoding. No drift between what you ask for and what you check.
  • Fail-closed pipelines. Macro steps (e.g. task hand-off, advancement) do the mechanical file work deterministically in code, not in model turns โ€” fewer round-trips, no silent half-completions.
  • Model-agnostic. Swap the orchestrator model freely; reliability comes from the kernel, not the weights.
  • Standalone. No hidden dependency on any private deployment. Bring your own OpenAI-compatible endpoint (vLLM, etc.).

Demo

The full-screen client over the server/client split โ€” a turn streams live, the toolbar shows live status (model ยท throughput ยท tasks ยท watcher). Illustrative transcript from a real session:

[You] > what is 17 times 23?
  [Qwen (planning)]
  17 times 23 is 391.
  [perf] TTFT 0.5s ยท 183 tok/2.9s = 64 tok/s ยท prompt 1739
  ======== โœ“ DONE ยท ready ยท 1 gen ยท 3s ยท 183 tok ========
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
โ”‚ [You] >
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
 โ–ˆโ–ˆ Ironclad  powered by MJWC-AI-LAB
 โ–ˆโ–ˆ  Orchestrator client ยท streaming   |   /help ยท exit ยท PageUp=history
     qwen3.6-35b ยท 64 tok/s ยท tasks 0P/0IP/0D ยท http://<server>:8100

Reply language is a setting (GX10_LANGUAGE โ€” en default, ar, fr, โ€ฆ). The model answers in the configured language regardless of the input language. Real output with GX10_LANGUAGE=ar, same question:

[You] > what is 17 times 23?
  ุญุงุตู„ ุถุฑุจ 17 ููŠ 23 ู‡ูˆ 391.
  ======== โœ“ DONE ยท ready ยท 1 gen ยท 2s ========

/command routing (local + forwarded), scrollback (PageUp/PageDown) and compressed multi-line paste are built in. There's also a plain line REPL and a monolithic CLI.

Benchmarks

Measured on the reference stack โ€” a single NVIDIA DGX Spark (GB10) serving Qwen3.6-35B-A3B-NVFP4 via vLLM, driven over the LAN:

Workload Result
Reasoning fan-out, 8 independent prompts 5.8ร— faster than serial (1.2 s vs 7.1 s), ~118 tok/s aggregate
Conversational turn (single agent) ~55โ€“68 tok/s, ~2.1 s mean latency
Structured emission (ACK, thinking-off) 100% schema-valid in earlier measurement

Reproduce with your own endpoint; numbers scale with the model and GPU. Full method and the per-component wiring status live in docs/status.md.

A reliability layer, not another model

Ironclad doesn't compete with the open models โ€” it makes them dependable to build on. It's model-agnostic by design: reliability comes from the contract kernel (schema โ†’ validate โ†’ reask), not the weights. So it's a natural agent/reliability layer for regional open models like Falcon, Jais and K2 Think โ€” point it at any of them (running on other models) and get fail-closed pipelines, structured tool-calls and a thin local client, without forking or retraining anything.

A starting point to build on

Ironclad is a foundation, not a finished product โ€” a generic agentic core meant to be extended. Concrete use cases dock onto it as vessels (see examples/demo-vessel/; a generator scaffolds new ones), so a broad audience can build their own domain agents on a reliable, self-hosted base rather than starting from scratch. Realistic directions the architecture supports today:

  • Edge & energy efficiency โ€” a small, enforced model on local/edge hardware instead of a large cloud one. That efficiency bet is the whole premise of the project.
  • Education ยท healthcare ยท logistics โ€” build a vessel for your domain: reliable tool-using agents and retrieval/RAG assistants over your own data, kept on-prem.

The repo's job is to give you a working starting point, not to ship every vertical โ€” the verticals are yours to build.

Setup

Requires Python 3.10+ and an OpenAI-compatible endpoint (e.g. vLLM).

git clone https://github.com/GrokBuildMJW/ironclad.git
cd ironclad
python -m venv .venv && . .venv/bin/activate     # Windows: .venv\Scripts\Activate.ps1
pip install -e ".[engine]"                         # ACK + engine (openai, prompt_toolkit)

# Point at your model endpoint (defaults: http://localhost:8000/v1, qwen3.6-35b):
export GX10_BASE_URL=http://localhost:8000/v1
export GX10_MODEL=your-served-model-name
export GX10_API_KEY=...                             # only if your endpoint needs one

# Monolithic full-screen CLI (one process):
python engine/gx10.py --workdir ./my-workspace
  • Full walkthrough, the server/client split, and the reference vLLM launch: see SETUP.md โ€” including copy-paste shell shortcuts (so you can just type ironclad) for Windows PowerShell, macOS and Linux.
  • Let an AI coding agent set it up for you (deterministic, verifiable runbook): see AGENTS.md.

A runnable demo vessel lives in examples/demo-vessel/ โ€” a minimal, self-contained workspace showing a contract spec, a pipeline, and the doctor preflight. Real vessels stay in the operators' own private repos.

Layout

core/
  ack/                 # Agent-Contract-Kernel: case-spec, validated-emit, registry, doctor, generator
  engine/              # orchestration engine: agent loop, task store, fail-closed macros
  examples/demo-vessel # runnable example workspace
  LICENSE  NOTICE      # Apache-2.0

Roadmap

Honest near-term plan (the rebuild's placeholders are now wired โ€” see docs/status.md for the full per-component status):

  • Broaden test coverage and harden the new server/client paths.
  • One-command compose for model + orchestrator + optional memory ships now (docker compose --profile model --profile memory up).
  • First tagged release once the APIs settle.

Sovereign AI / local deployments. Ironclad is model-agnostic and fully self-hostable โ€” it talks to any OpenAI-compatible endpoint, so it already runs against locally-served open models (e.g. Falcon, Jais, K2 Think via vLLM โ€” see running on other models) with no cloud dependency and data kept on your own infrastructure. On the roadmap: verified config recipes for those models, retrieval/RAG over local datasets through the memory hook, and on-prem agent templates for enterprise/government use cases. These are integration directions the architecture already supports, not shipped features yet.

Issues and discussions are welcome โ€” this is an early, openly-developed project.

License

Apache License 2.0 โ€” see LICENSE and NOTICE. Copyright 2026 MJWC-AI-LAB and Ironclad contributors.


๐Ÿ‡ฆ๐Ÿ‡ช Built in the United Arab Emirates by MJWC-AI-LAB.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ironclad_ai-0.0.1.tar.gz (64.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ironclad_ai-0.0.1-py3-none-any.whl (70.1 kB view details)

Uploaded Python 3

File details

Details for the file ironclad_ai-0.0.1.tar.gz.

File metadata

  • Download URL: ironclad_ai-0.0.1.tar.gz
  • Upload date:
  • Size: 64.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ironclad_ai-0.0.1.tar.gz
Algorithm Hash digest
SHA256 a13fb75b366d10ab809f06f26f50aede63a5fdda2705e22b209544bd13b1388f
MD5 0920ce1b35a802983da6c48564027d97
BLAKE2b-256 c3a33c0320337b918fb6fa491c8066f90ccc7ccf044be64f25227437a5c02c86

See more details on using hashes here.

Provenance

The following attestation bundles were made for ironclad_ai-0.0.1.tar.gz:

Publisher: publish.yml on GrokBuildMJW/ironclad

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ironclad_ai-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: ironclad_ai-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 70.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ironclad_ai-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0f5cc4327a5d782b4f618c1f1d885ac632253952c4df17c7fbecadf29702519f
MD5 5c1913749aaff4754c3288f02abb6c09
BLAKE2b-256 5b712614073fe0f9d645a9631316266fb6787578ce803f1ac5363a8963f0888c

See more details on using hashes here.

Provenance

The following attestation bundles were made for ironclad_ai-0.0.1-py3-none-any.whl:

Publisher: publish.yml on GrokBuildMJW/ironclad

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page