Reliability for LLM agents through enforcement, not model size: Agent-Contract-Kernel (ACK) + a fail-closed orchestration engine.
Ironclad
Reliability for LLM agents through enforcement, not model size. 🇦🇪 Built in the UAE by MJWC-AI-LAB.
Ironclad is a generic, model-agnostic framework for building reliable agentic systems. It pairs an Agent-Contract-Kernel (ACK: schema-as-single-source-of-truth, validate→reask→retry, a generator and a preflight doctor) with a lean orchestration engine that turns multi-step agent workflows into deterministic, fail-closed pipelines.
The guiding principle: a small fast model with hard schema/validation enforcement beats a large model you "trust" to format its output. You get production-grade tool-calling reliability without depending on any specific model or parser: the kernel enforces the contract, not the weights.
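The validate→reask→retry loop can be sketched in a few lines. This is a minimal illustration, not ACK's actual API: `validate`, `emit`, `ContractError`, `fake_model` and the two-field contract are hypothetical names, and a hand-rolled check stands in for the Pydantic schema so the sketch runs on the standard library alone.

```python
import json
from typing import Callable

# Hypothetical contract: a tool call carries a string "tool" and an object "args".
REQUIRED_FIELDS = {"tool": str, "args": dict}


class ContractError(Exception):
    """Raised when a reply violates the contract."""


def validate(raw: str) -> dict:
    """Parse one model reply and check it against the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ContractError("top level must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ContractError(f"field '{name}' must be of type {expected.__name__}")
    return data


def emit(model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Validate each reply, reask with the error on failure, fail closed at the end."""
    current = prompt
    for _ in range(max_attempts):
        raw = model(current)
        try:
            return validate(raw)
        except ContractError as err:
            # Reask: feed the validation error back instead of accepting the reply.
            current = f"{prompt}\n\nYour last reply was rejected: {err}. Reply with JSON only."
    raise ContractError(f"no valid reply after {max_attempts} attempts")


# Stand-in model: the first reply is chatty prose, the second honours the contract.
_replies = iter(["sure! here you go", '{"tool": "search", "args": {"q": "ironclad"}}'])


def fake_model(prompt: str) -> str:
    return next(_replies)


result = emit(fake_model, "Call a tool.")
print(result)
```

The caller only ever sees a validated object or an exception; an unparseable reply never leaks downstream, which is the sense in which the loop fails closed.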
🚧 Status: proven core, mid-redesign (pre-release)
Ironclad's engine comes from a proven, in-production orchestrator, but it is
currently undergoing a complete redesign (single process → headless server + thin
client, containerized, reasoning-worker fan-out, full-screen TUI). Not everything is
re-wired or re-tested yet, and some features are still placeholders while the
rebuild finishes (notably memory and autoplan). There is no tagged release; APIs,
layout and config may change. Treat main as a development snapshot.
docs/status.md holds the honest, per-component wiring status
(proven / wired+tested / placeholder / opt-in), the module reference, the memory
situation, and the latest load-test results. Read it before relying on anything.
Reference environment. Ironclad is developed and exercised on an NVIDIA DGX
Spark (GB10, Blackwell sm_121, 128 GB unified memory) running a local vLLM
server with Qwen3.6-35B-A3B-NVFP4. Nothing is hard-wired to that box (any
OpenAI-compatible endpoint works), but the defaults (localhost:8000,
qwen3.6-35b), the throughput numbers and the constrained-decoding findings (NVFP4,
XGrammar on the CUDA 13 nightly) reflect that hardware. See
docs/dgx-spark.md for the full reference stack and a one-shot
bootstrap (scripts/spark-bootstrap.sh).
Why
- Contract-first. One Pydantic schema drives the prompt, the validator, the docs, and (where the hardware allows) constrained decoding. No drift between what you ask for and what you check.
- Fail-closed pipelines. Macro steps (e.g. task hand-off, advancement) do the mechanical file work deterministically in code, not in model turns: fewer round-trips, no silent half-completions.
- Model-agnostic. Swap the orchestrator model freely; reliability comes from the kernel, not the weights.
- Standalone. No hidden dependency on any private deployment. Bring your own OpenAI-compatible endpoint (vLLM, etc.).
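One way to picture a fail-closed macro step is a staged commit: every step writes into a scratch directory, and the result is published only if all steps succeed. The sketch below is an assumption about the general technique, not Ironclad's engine code; `run_fail_closed`, `write_task` and `broken_step` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Sequence

Step = Callable[[Path], None]


def run_fail_closed(target: Path, steps: Sequence[Step]) -> None:
    """Run every step against a staging directory; publish only if all succeed."""
    # Stage next to the target so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix="staging-", dir=target.parent))
    try:
        for step in steps:
            step(staging)  # any exception aborts the whole pipeline
        staging.rename(target)  # publish; assumes the target does not exist yet
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)  # leave no half-completed output
        raise


def write_task(directory: Path) -> None:
    """Mechanical file work done in code rather than in a model turn."""
    (directory / "task.json").write_text('{"id": 1, "state": "handed-off"}')


def broken_step(directory: Path) -> None:
    """Simulates a step whose check fails."""
    raise RuntimeError("validation failed")
```

Calling `run_fail_closed(root / "handoff", [write_task])` leaves a finished `handoff/` directory, while adding `broken_step` to the list raises and leaves nothing behind.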
Demo
The full-screen client over the server/client split: a turn streams live and the toolbar shows live status (model · throughput · tasks · watcher). Illustrative transcript from a real session:
[You] > what is 17 times 23?
[Qwen (planning)]
17 times 23 is 391.
[perf] TTFT 0.5s · 183 tok/2.9s = 64 tok/s · prompt 1739
======== ✓ DONE · ready · 1 gen · 3s · 183 tok ========
──────────────────────────────────────────────────────────────────────
[You] >
──────────────────────────────────────────────────────────────────────
Ironclad powered by MJWC-AI-LAB
Orchestrator client · streaming | /help · exit · PageUp=history
qwen3.6-35b · 64 tok/s · tasks 0P/0IP/0D · http://<server>:8100
Reply language is a setting (GX10_LANGUAGE: en default, ar, fr, …). The
model answers in the configured language regardless of the input language. Real output
with GX10_LANGUAGE=ar, same question:
[You] > what is 17 times 23?
حاصل ضرب 17 في 23 هو 391.
======== ✓ DONE · ready · 1 gen · 2s ========
/command routing (local + forwarded), scrollback (PageUp/PageDown) and compressed
multi-line paste are built in. There's also a plain line REPL and a monolithic CLI.
Benchmarks
Measured on the reference stack, a single NVIDIA DGX Spark (GB10) serving Qwen3.6-35B-A3B-NVFP4 via vLLM, driven over the LAN:
| Workload | Result |
|---|---|
| Reasoning fan-out, 8 independent prompts | 5.8× faster than serial (1.2 s vs 7.1 s), ~118 tok/s aggregate |
| Conversational turn (single agent) | ~55–68 tok/s, ~2.1 s mean latency |
| Structured emission (ACK, thinking-off) | 100% schema-valid in an earlier measurement |
Reproduce with your own endpoint; numbers scale with the model and GPU. Full method
and the per-component wiring status live in docs/status.md.
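The fan-out row compares issuing independent prompts concurrently against issuing them one after another. A minimal sketch of that comparison follows, assuming a simulated request (`fake_completion` sleeps instead of calling an endpoint) and hypothetical helper names; the timings it prints come from the simulation and say nothing about the measured figures in the table.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple

Worker = Callable[[str], Awaitable[str]]


async def fake_completion(prompt: str, latency: float = 0.05) -> str:
    """Stand-in for one request to an OpenAI-compatible endpoint."""
    await asyncio.sleep(latency)
    return f"answer to: {prompt}"


async def run_serial(prompts: List[str], worker: Worker) -> List[str]:
    """Send one prompt at a time, waiting for each reply."""
    return [await worker(p) for p in prompts]


async def run_fan_out(prompts: List[str], worker: Worker) -> List[str]:
    """Send all prompts at once; gather returns results in input order."""
    return list(await asyncio.gather(*(worker(p) for p in prompts)))


def timed(coro: Awaitable[List[str]]) -> Tuple[List[str], float]:
    """Run a coroutine to completion and report wall-clock seconds."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


prompts = [f"prompt {i}" for i in range(8)]
serial_out, serial_s = timed(run_serial(prompts, fake_completion))
fanout_out, fanout_s = timed(run_fan_out(prompts, fake_completion))
print(f"serial {serial_s:.2f}s, fan-out {fanout_s:.2f}s")
```

Against a real server the gain depends on how well the backend batches concurrent requests, so the ratio should be measured rather than assumed.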
A reliability layer, not another model
Ironclad doesn't compete with the open models; it makes them dependable to build on. It's model-agnostic by design: reliability comes from the contract kernel (schema → validate → reask), not the weights. So it's a natural agent/reliability layer for regional open models like Falcon, Jais and K2 Think: point it at any of them (see running on other models) and get fail-closed pipelines, structured tool-calls and a thin local client, without forking or retraining anything.
A starting point to build on
Ironclad is a foundation, not a finished product: a generic agentic core meant
to be extended. Concrete use cases dock onto it as vessels (see
examples/demo-vessel/; a generator scaffolds new ones), so a
broad audience can build their own domain agents on a reliable, self-hosted base
rather than starting from scratch. Realistic directions the architecture supports today:
- Edge & energy efficiency: a small, enforced model on local/edge hardware instead of a large cloud one. That efficiency bet is the whole premise of the project.
- Education · healthcare · logistics: build a vessel for your domain, with reliable tool-using agents and retrieval/RAG assistants over your own data, kept on-prem.
The repo's job is to give you a working starting point, not to ship every vertical; the verticals are yours to build.
Setup
Requires Python 3.10+ and an OpenAI-compatible endpoint (e.g. vLLM).
git clone https://github.com/GrokBuildMJW/ironclad.git
cd ironclad
python -m venv .venv && . .venv/bin/activate # Windows: .venv\Scripts\Activate.ps1
pip install -e ".[engine]" # ACK + engine (openai, prompt_toolkit)
# Point at your model endpoint (defaults: http://localhost:8000/v1, qwen3.6-35b):
export GX10_BASE_URL=http://localhost:8000/v1
export GX10_MODEL=your-served-model-name
export GX10_API_KEY=... # only if your endpoint needs one
# Monolithic full-screen CLI (one process):
python engine/gx10.py --workdir ./my-workspace
- Full walkthrough, the server/client split, and the reference vLLM launch: see SETUP.md, including copy-paste shell shortcuts (so you can just type ironclad) for Windows PowerShell, macOS and Linux.
- Let an AI coding agent set it up for you (deterministic, verifiable runbook): see AGENTS.md.
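The GX10_* variables used in this README can be read with their documented defaults in one place. The sketch below only illustrates that resolution; `EndpointConfig` and `load_config` are hypothetical names, and how the engine itself parses these variables is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EndpointConfig:
    base_url: str
    model: str
    api_key: Optional[str]
    language: str


def load_config(env: Mapping[str, str] = os.environ) -> EndpointConfig:
    """Resolve settings from the environment, falling back to the README defaults."""
    return EndpointConfig(
        base_url=env.get("GX10_BASE_URL", "http://localhost:8000/v1"),
        model=env.get("GX10_MODEL", "qwen3.6-35b"),
        api_key=env.get("GX10_API_KEY") or None,  # only needed by some endpoints
        language=env.get("GX10_LANGUAGE", "en"),
    )


print(load_config({"GX10_MODEL": "your-served-model-name", "GX10_LANGUAGE": "ar"}))
```

Passing the mapping explicitly, as in the last line, keeps the function easy to exercise without touching the real process environment.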
A runnable demo vessel lives in examples/demo-vessel/:
a minimal, self-contained workspace showing a contract spec, a pipeline, and
the doctor preflight. Real vessels stay in the operators' own private repos.
Layout
core/
ack/ # Agent-Contract-Kernel: case-spec, validated-emit, registry, doctor, generator
engine/ # orchestration engine: agent loop, task store, fail-closed macros
examples/demo-vessel # runnable example workspace
LICENSE NOTICE # Apache-2.0
Roadmap
Honest near-term plan (the rebuild's placeholders are now wired; see
docs/status.md for the full per-component status):
- Broaden test coverage and harden the new server/client paths.
- One-command compose for model + orchestrator + optional memory ships now (docker compose --profile model --profile memory up).
- First tagged release once the APIs settle.
Sovereign AI / local deployments. Ironclad is model-agnostic and fully self-hostable: it talks to any OpenAI-compatible endpoint, so it already runs against locally-served open models (e.g. Falcon, Jais, K2 Think via vLLM; see running on other models) with no cloud dependency and data kept on your own infrastructure. On the roadmap: verified config recipes for those models, retrieval/RAG over local datasets through the memory hook, and on-prem agent templates for enterprise/government use cases. These are integration directions the architecture already supports, not shipped features yet.
Issues and discussions are welcome; this is an early, openly-developed project.
License
Apache License 2.0: see LICENSE and NOTICE.
Copyright 2026 MJWC-AI-LAB and Ironclad contributors.
🇦🇪 Built in the United Arab Emirates by MJWC-AI-LAB.