MARL — Model-Agnostic Runtime Middleware for LLMs. Multi-stage multi-agent reasoning pipeline.

These details have not been verified by PyPI

Project links

Project description

MARL — Model-Agnostic Runtime Middleware for LLMs

The 3rd approach after fine-tuning & RAG — restructure how LLMs reason at runtime, not their weights.

pip install marl-middleware · docker run vidraft/marl

Multi-stage multi-agent reasoning pipeline that works with any LLM.

What is MARL?

MARL (Model-Agnostic Runtime Middleware) is a multi-stage multi-agent reasoning pipeline that decomposes a single LLM call into multiple independent expert roles.

The two dominant approaches to improving LLM quality — fine-tuning and RAG — each have fundamental limitations. Fine-tuning requires millions of dollars in GPU costs and weeks of training. RAG supplements external knowledge but cannot improve the model's reasoning ability itself.

MARL is the 3rd approach: it redesigns the structure of reasoning at runtime without touching model weights. It works with any LLM — GPT-5.4, Claude Opus, Gemini, DeepSeek, Llama, or local open-source models — with a single line of code. Switch models freely; MARL's effect remains.

┌─ Your App ─────────────────────────────────────────┐
│  OpenClaw / Cursor / Custom App / Any LLM Client    │
│  client = OpenAI(base_url="http://MARL:8080/v1")    │
└────────────────────┬───────────────────────────────┘
                     │ HTTP (OpenAI API format)
                     ▼
┌─ MARL Middleware ──────────────────────────────────┐
│  Multi-stage Multi-agent Reasoning Pipeline         │
│  9 Emergence Engines · 70%+ Hallucination Reduction │
│  FINAL Bench: MA=0.694 vs ER=0.302                  │
└────────────────────┬───────────────────────────────┘
                     │ API calls
                     ▼
┌─ Any LLM ──────────────────────────────────────────┐
│  GPT-5.4 · Claude · Gemini · DeepSeek · Ollama …   │
└────────────────────────────────────────────────────┘

Why MARL?

The Problem: LLMs answer confidently even when wrong — and can't stop themselves

Metacognition is the ability of an AI to recognize that its own answer might be wrong and self-correct. Research from FINAL Bench, the world's first AI metacognition benchmark, confirmed that even the most advanced models — GPT-5.2, Claude Opus 4.6, Gemini 3 Pro — have critically insufficient self-correction capabilities.

Due to their autoregressive architecture, once an LLM begins generating a response, earlier tokens determine later ones. The model cannot pause mid-generation to say "I was wrong" and change direction.

The Solution: Multi-stage Independent Expert Pipeline

MARL routes a single question through multiple independent expert agents in sequence. Instead of one chef handling everything alone, MARL assigns a recipe planner to design the optimal approach, a head chef to execute, a quality inspector to audit, and a Michelin judge to deliver the final verdict.

Each agent carries a distinct role and perspective — hypothesis exploration → core solving → consistency auditing → adversarial verification → metacognitive synthesis — producing emergent reasoning and self-correction capabilities that no single model call can achieve.

This architecture resolves two fundamental LLM limitations:

Emergent reasoning: New perspectives arise through multi-stage interaction that never surface in a single model call.
Structural self-correction: While a standard LLM cannot reverse course mid-generation, MARL's adversarial verification stage re-examines the draft for errors, and the final synthesis stage produces an entirely new answer incorporating all corrections.

Results

Metric	Value
Hallucination reduction	70%+ (FINAL Bench verified)
Self-correction contribution	94.8% (Error Recovery drives 94.8% of total improvement)
FINAL Bench MA vs ER	0.694 vs 0.302
FINAL Bench Dataset ranking	HuggingFace Global Top 5
Monthly Active Users (MAU)	2M+
Public AI models & tools	1,500+
HuggingFace STAR AI	2024 Top 12 (only Korean company selected)
FACTS Grounding	Google DeepMind Leaderboard #2 worldwide

Installation

pip (Linux x86_64 / Python 3.12)

pip install marl-middleware

Docker (All OS — Mac · Windows · Linux)

docker run -p 8080:8080 vidraft/marl

HuggingFace Space (No install required)

https://huggingface.co/spaces/VIDraft/MARL

Usage

Python SDK

from marl import Marl, MarlConfig

# OpenAI
ml = Marl.from_openai("sk-...", config=MarlConfig(
    mode="emergence", emergence_type="create"
))
result = ml.run("Generate 10 movie loglines never seen before")
print(result.answer)

# Anthropic
ml = Marl.from_anthropic("sk-ant-...", model="claude-sonnet-4-20250514")

# Ollama (local)
ml = Marl.from_ollama("llama3.1")

# Groq (free)
ml = Marl.from_openai_compatible(
    "https://api.groq.com/openai/v1", "gsk-...", "openai/gpt-oss-120b"
)

# Any OpenAI-compatible API (DeepSeek, xAI, Friendli, etc.)
ml = Marl.from_openai_compatible("https://api.deepseek.com/v1", "key", "deepseek-v3")

1-Line Integration (Any LLM App)

Add base_url — one line — and every call passes through the MARL pipeline:

# Before
client = OpenAI(api_key="sk-...")

# After — just add base_url
client = OpenAI(api_key="sk-...", base_url="http://localhost:8080/v1")

Works With Every LLM

MARL is model-agnostic middleware. It works instantly with any LLM that supports the OpenAI API format. You are never locked into a specific provider — when a better model launches, switch immediately.

OpenAI        — GPT-5.4, GPT-5.2, GPT-4.1, o4-mini, GPT-OSS-120B
Anthropic     — Claude Opus 4.6, Sonnet 4.6, Haiku 4.5
Google        — Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro/Flash
DeepSeek      — DeepSeek-V3, DeepSeek-R1, R2
xAI           — Grok-4, Grok-3
Groq          — gpt-oss-120b, Llama 4, DeepSeek-R1, QwQ-32b (free)
Meta          — Llama 4 Scout/Maverick
Alibaba       — Qwen3.5 series
Ollama        — Any local open-source model
Custom        — Any OpenAI-compatible endpoint

This list is illustrative. MARL works with any LLM supporting the OpenAI API format — zero code changes required.

9 Emergence Engines

Beyond the default reasoning enhancement (Insight mode), MARL includes 9 specialized emergence engines. Each engine is powered by a proprietary knowledge matrix and emergence rules designed to generate domain-specific ideas that a single LLM cannot produce alone.

Engine	Strengths
🔧 Invent	Technology fusion–based invention. Cross-applies TRIZ principles, bio-inspired patterns, and contradiction resolution strategies to generate patent-level concepts.
✨ Create	Universal creative engine. Drives idea generation through cliché inversion, paradox engines, genre fusion, sensory collision, and cultural cross-pollination.
🍳 Recipe	Culinary emergence. Crosses cooking methods, textures, architectural structures, and flavor grammar to develop novel dish concepts with built-in taste chemistry validation.
💊 Pharma	Drug & compound ideation. Drug repositioning, mechanism crossing, delivery innovation, multi-target design, and scaffold hopping for novel therapeutic concepts.
🧬 Genomics	Genomics & bio emergence. Pathway crosstalk discovery, synthetic lethality exploration, phenotype bridging, and technology platform transfer.
🧪 Chemistry	Materials & chemistry emergence. Achieving contradictory properties simultaneously, nano↔macro scale shifting, biomimicry, and waste-to-value transformation.
🌍 Ecology	Environmental & conservation emergence. Conservation model transfer, species protection network effects, threat-to-resource inversion, and ecosystem service stacking.
⚖️ Law	Legal & regulatory emergence. Cross-jurisdiction framework transplantation, regulatory mechanism inversion, tech-law collision resolution, and legal instrument innovation.
📄 Document	Report & document engine. Metacognitive document generation based on structured writing principles and policy dilemma frameworks.

Mode Switching

Append ::mode to any model name to switch engines:

model="gpt-5.4"             →  🔬 Insight (default — fact-check · strategy)
model="gpt-5.4::invent"     →  🔧 Invent
model="gpt-5.4::create"     →  ✨ Create
model="gpt-5.4::recipe"     →  🍳 Recipe
model="gpt-5.4::pharma"     →  💊 Pharma
model="gpt-5.4::genomics"   →  🧬 Genomics
model="gpt-5.4::chemistry"  →  🧪 Chemistry
model="gpt-5.4::ecology"    →  🌍 Ecology
model="gpt-5.4::law"        →  ⚖️ Law
model="gpt-5.4::document"   →  📄 Document

Replace gpt-5.4 with any model name you prefer.

OpenClaw Integration

MARL serves as the brain for OpenClaw. Before OpenClaw acts (emails, files, scheduling), MARL thinks deeply first.

# Step 1: Install MARL locally
docker run -p 8080:8080 vidraft/marl

# Step 2: Set OpenClaw config.json
{
  "llm": {
    "baseURL": "http://localhost:8080/v1",
    "model": "gpt-5.4::create"
  }
}

# Step 3: Chat naturally
"Analyze this with MARL"
"Use MARL pharma mode for drug repositioning candidates"

IP Protection

MARL's core reasoning engine — the multi-stage pipeline, weighted attention matrices, and agent prompts — is delivered as compiled binaries (.so) to protect proprietary technology. At the same time, all interface code needed for installation, testing, configuration, and API integration is openly available, so anyone can try MARL immediately and integrate it into their own environment.

About VIDRAFT

VIDRAFT is an AI startup founded in 2024 with the goal of developing True-AGI by 2030. Based at Seoul AI Hub, the team focuses on LLM metacognition enhancement and multi-agent reasoning technology.

Achievement	Detail
FINAL Bench	World's first AI metacognition benchmark · HuggingFace Global Top 5 dataset
FINAL Bench Leaderboard	HuggingFace "Spaces of the Week" selected
HuggingFace Heatmap Leaderboard	Global #4
STAR AI Top 12	2024 selected (only Korean company)
Medical AI	Google DeepMind FACTS Grounding Leaderboard #2 worldwide
MAU	2M+
Cumulative visitors	30M+
Public AI models & tools	1,500+
NIPA H200 GPU×8	National AI infrastructure grant recipient
Seoul AI Hub	Resident company
NH Open Innovation	2025 selected
Press	Seoul Shinmun, Asia Economy, IT Chosun, and more

Roadmap

MARL Enterprise Edition — Private deployment, custom pipelines, SLA support (H1 2026)
Academic publication — FINAL Bench–based validation paper for international journal
US market entry — Silicon Valley partnership program
OpenClaw ClawHub — Skill registration for the 247K+ community

Links


🌐 Website	https://vidraft.net
🤗 MARL Demo	https://huggingface.co/spaces/VIDraft/MARL
🏆 FINAL Bench	https://huggingface.co/spaces/FINAL-Bench/Leaderboard
📦 PyPI	https://pypi.org/project/marl-middleware/
⚡ GitHub	https://github.com/Vidraft/MARL
✉️ Contact	arxivgpt@gmail.com (Minsik KIM, CEO)

License

MARL · Model-Agnostic Runtime Middleware for LLMs

"Don't change the model. Change how the model thinks."

— Minsik KIM, CEO of VIDRAFT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.1.0

Mar 9, 2026

1.0.0

Mar 8, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

marl_middleware-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (365.0 kB view details)

Uploaded Mar 9, 2026 CPython 3.12manylinux: glibc 2.17+ x86-64

File details

Details for the file marl_middleware-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: marl_middleware-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Mar 9, 2026
Size: 365.0 kB
Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for marl_middleware-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`273ddc103ebb7c5af77b08df0230b6ee3c72c72a94ac452440dcf331bfc96a86`
MD5	`e016ec53a818fba817e5883614ecafe2`
BLAKE2b-256	`6a65872ae3342bc69f508f27401ef23e3a47597c0d27a4ed5765c428bd902080`

See more details on using hashes here.

marl-middleware 1.1.0

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

MARL — Model-Agnostic Runtime Middleware for LLMs

What is MARL?

Why MARL?

The Problem: LLMs answer confidently even when wrong — and can't stop themselves

The Solution: Multi-stage Independent Expert Pipeline

Results

Installation

pip (Linux x86_64 / Python 3.12)

Docker (All OS — Mac · Windows · Linux)

HuggingFace Space (No install required)

Usage

Python SDK

1-Line Integration (Any LLM App)

Works With Every LLM

9 Emergence Engines

Mode Switching

OpenClaw Integration

IP Protection

About VIDRAFT

Roadmap

Links

License

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes