Policy-aware reasoning orchestration layer for LLM systems
Project description
MoralStack
Your LLM thinks. MoralStack judges.
A deliberative governance engine that decides whether, how, and under what constraints an LLM should respond — before a single token is generated.
Most AI safety tools are filters. MoralStack is a judge.
It runs a full deliberative pipeline — risk estimation, constitutional critique, consequence simulation, multi-perspective reasoning — and issues an explicit, auditable decision before your LLM generates anything.
Table of Contents
- What it does
- Decision Model
- Architecture
- Benchmark Results
- Installation
- 30-Second Quickstart
- SDK Usage
- Configuration
- Running the Benchmark
- Web UI
- Why not just use a filter?
- Documentation
- Limitations & Trade-offs
What it does
```
Traditional pipeline: prompt ──► generate ──► (maybe filter)
MoralStack:           prompt ──► deliberate ──► decide ──► generate within bounds
```
Traditional LLM pipelines optimize for helpfulness first. MoralStack adds an explicit policy layer that separates:
- Decision: `NORMAL_COMPLETE`, `SAFE_COMPLETE`, or `REFUSE`
- Generation: produce text consistent with the selected decision
This keeps decision logic auditable and minimizes unsafe false negatives in sensitive contexts.
Decision Model
Every request produces an explicit final_action:
| Action | Meaning |
|---|---|
| `NORMAL_COMPLETE` | Direct response |
| `SAFE_COMPLETE` | Responsible response with safeguards |
| `REFUSE` | Refusal with safe redirection |
Single source of truth for bounds and action selection:
- Module: `moralstack/runtime/decision/safe_complete_policy.py`
- API: `compute_action_bounds(...)`, `decide_final_action(...)`
SAFE_COMPLETE is a first-class policy action and is not inferred from text disclaimers.
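The shape of this policy can be illustrated with a simplified sketch. The thresholds and signatures below are hypothetical, chosen only for illustration; the real implementations live in `safe_complete_policy.py` and are not reproduced here.

```python
# Illustrative only: hypothetical thresholds, NOT MoralStack's actual policy.
def compute_action_bounds(risk_score: float) -> list[str]:
    """Map a risk score in [0, 1] to the permitted actions, least restrictive first."""
    if risk_score < 0.3:
        return ["NORMAL_COMPLETE", "SAFE_COMPLETE", "REFUSE"]
    if risk_score < 0.8:
        return ["SAFE_COMPLETE", "REFUSE"]
    return ["REFUSE"]

def decide_final_action(risk_score: float) -> str:
    """Pick the least restrictive permitted action."""
    return compute_action_bounds(risk_score)[0]

print(decide_final_action(0.12))  # NORMAL_COMPLETE
print(decide_final_action(0.55))  # SAFE_COMPLETE
print(decide_final_action(0.91))  # REFUSE
```

The key property is that the action is selected from explicit bounds computed in code, not inferred from the generated text.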
Architecture
High-level flow:
```
Request
  │
  ▼
[Risk Estimator] ─────────── parallel mini-estimators:
  │                          intent · operational risk · signal detection
  ▼
[Policy Router] ──────────── applies domain overlay, computes action bounds
  │
  ├── FAST_PATH ──────────────────────────────────────────────────┐
  │   (clearly benign or clearly harmful — deliberation skipped)  │
  │                                                               │
  └── DELIBERATIVE_PATH                                           │
        │                                                         │
        ├── [Constitutional Critic]  checks principle violations  │
        ├── [Consequence Simulator]  projects downstream harm     │
        ├── [Perspectives Ensemble]  multi-stakeholder reasoning  │
        └── [Hindsight Evaluator]    retrospective quality check  │
        │                                                         │
        ▼                                                         │
[Convergence Engine] ──── issues final_action ◄───────────────────┘
  │
  ▼
[Response Assembler] ─── generates within the decided bounds
```
Main packages:
- `moralstack/sdk/` — Python SDK (`govern()`, `GovernedClient`, `GovernanceConfig`)
- `moralstack/runtime/` — orchestration runtime
- `moralstack/orchestration/` — controller, routing, deliberation services
- `moralstack/models/risk/` — risk estimation and calibration
- `moralstack/constitution/` — constitution schema, loader, store (YAML-driven)
- `moralstack/persistence/` — DB and file persistence modes
- `moralstack/ui/` — FastAPI dashboard (`moralstack-ui`)
Benchmark Results
Evaluated on 84 questions spanning adversarial prompts, dual-use domains, regulated topics (legal, medical, financial), and false-positive torture tests. The judge model (GPT-5.2) is independent of both the baseline and MoralStack generation.
Safety & Utility
| Metric | Baseline (GPT-4o) | MoralStack |
|---|---|---|
| False Negatives (no refusal when needed) | 13 | 0 |
| Information Leakage | 14 (16.7%) | 0 (0%) |
| False Positives (refusal on legitimate queries) | 0 | 0 |
| Utility Preservation (legitimate queries answered) | 62/62 | 62/62 |
| Safe Redirection on Refusal | 1/22 (4.5%) | 22/22 (100%) |
Head-to-Head (judged by GPT-5.2)
| | Baseline | MoralStack | Tie |
|---|---|---|---|
| Wins | 6 | 54 | 24 |
| Avg Safety Score | 7.83/10 | 9.27/10 | — |
(Latest full run: benchmark 12, same 84-question suite and judge.)
Decision Accuracy
98.8% compliance rate. Zero system errors.
```
           Predicted
Expected   NC   SC   REFUSE
───────────────────────────
NC          9    1    0
SC          0   52    0
REFUSE      0    0   22
```
The single off-diagonal cell (1 NC→SC) is a health-domain query where MoralStack adds a professional-consultation disclaimer — a reasonable policy choice for regulated content.
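The reported compliance rate follows directly from the matrix: 83 of 84 decisions fall on the diagonal.

```python
# Recompute the decision-accuracy figure from the confusion matrix above.
matrix = {
    ("NC", "NC"): 9,     ("NC", "SC"): 1,     ("NC", "REFUSE"): 0,
    ("SC", "NC"): 0,     ("SC", "SC"): 52,    ("SC", "REFUSE"): 0,
    ("REFUSE", "NC"): 0, ("REFUSE", "SC"): 0, ("REFUSE", "REFUSE"): 22,
}
total = sum(matrix.values())                                 # 84 questions
correct = sum(v for (exp, pred), v in matrix.items() if exp == pred)  # 83
print(f"{100 * correct / total:.1f}%")  # 98.8%
```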
Note: This benchmark demonstrates proof-of-concept effectiveness on 84 curated questions. It is not a claim of production-grade coverage across all possible inputs. We encourage independent evaluation.
Avg Response Time
| | Baseline | MoralStack |
|---|---|---|
| Mean wall-clock | ~6s | ~36s |
| Median wall-clock | — | ~26s |
(Benchmark 12, 84 questions. Mean latency is ~51% lower than the original benchmark configuration's ~73s mean; the fast-path rate rose from ~11% to ~37%, largely because REFUSE queries are now routed through the fast path.)
Deliberative paths add latency by design. See Limitations & Trade-offs.
Installation
From PyPI (recommended for users)
```bash
pip install moralstack
```
For the optional admin UI:
```bash
pip install "moralstack[ui]"
```
From source (for contributors)
```bash
git clone https://github.com/fdidonato/moralstack.git
cd moralstack
python -m venv venv
source venv/bin/activate
pip install -e ".[dev,ui]"
```
Configure
```bash
cp .env.minimal .env
# edit .env and set OPENAI_API_KEY=sk-...
```
30-Second Quickstart
```python
from moralstack import govern
from openai import OpenAI

# Wrap any OpenAI-compatible client with MoralStack governance.
client = govern(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What are the symptoms of vitamin D deficiency?"}],
)

# Full OpenAI API compatibility:
print(response.choices[0].message.content)

# Plus governance metadata on every call:
meta = response.governance_metadata
print(f"Decision: {meta.final_action}")
print(f"Risk: {meta.risk_score:.2f} ({meta.risk_category})")
print(f"Overlay: {meta.domain_overlay}")
```
More patterns in examples/.
SDK Usage
Use MoralStack as a governance wrapper around your existing OpenAI client — no server, no HTTP, no separate process.
```python
from moralstack import govern
from openai import OpenAI

client = govern(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How do I pick a lock?"}],
)

print(response.content)
# I'm unable to assist with that request.

print(response.governance_metadata.final_action)
# REFUSE

print(response.governance_metadata.risk_score)
# 0.87
```
`govern()` wraps any OpenAI-compatible client. All non-`chat.completions.create()` calls pass through transparently (`client.models.list()`, `client.files.*`, etc.).
Decision routing
| `final_action` | What happens |
|---|---|
| `NORMAL_COMPLETE` | Request passes unchanged to your OpenAI client |
| `SAFE_COMPLETE` | Governance constraints injected into the system prompt, then your client is called |
| `REFUSE` | OpenAI is not called — refusal text returned directly |
Governance metadata
Every response carries response.governance_metadata:
```python
meta = response.governance_metadata

meta.final_action          # NORMAL_COMPLETE | SAFE_COMPLETE | REFUSE
meta.risk_score            # 0.0 (benign) — 1.0 (harmful)
meta.risk_category         # CLEARLY_BENIGN | SENSITIVE | CLEARLY_HARMFUL
meta.path                  # FAST_PATH | DELIBERATIVE_PATH
meta.reason_codes          # ["DUAL_USE", "SENSITIVE_DOMAIN", ...]
meta.triggered_principles  # constitution principles activated
meta.decision_reason       # human-readable explanation
meta.conversation_id       # session tracking (multi-turn)
meta.turn_index            # turn counter within session
```
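For audit pipelines, this metadata serializes naturally to one JSONL record per call. A minimal sketch, assuming a hypothetical `AuditRecord` type whose field names mirror the attributes above (this is not the SDK's own persistence code):

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record type mirroring a subset of governance_metadata fields.
@dataclass
class AuditRecord:
    final_action: str
    risk_score: float
    path: str
    reason_codes: list

def to_jsonl(rec: AuditRecord) -> str:
    """Serialize one governance decision as a single JSONL line."""
    return json.dumps(asdict(rec), sort_keys=True)

line = to_jsonl(AuditRecord("SAFE_COMPLETE", 0.42, "DELIBERATIVE_PATH", ["DUAL_USE"]))
print(line)
```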
GovernanceConfig
```python
from moralstack import govern, GovernanceConfig
from openai import OpenAI

client = govern(
    OpenAI(),
    config=GovernanceConfig(
        domain_overlay="healthcare",     # enforce a specific domain overlay
        failure_policy="passthrough",    # on pipeline error: call OpenAI directly (unsafe)
        observability_mode="file_only",  # write JSONL audit trail
        jsonl_dir="logs/audit",
    ),
)
```
All parameters default to sensible values. Minimum required: OPENAI_API_KEY in environment.
SDK model resolution
When you use govern(), MoralStack runs two separate model planes:
- Governance plane (internal): risk estimation, deliberation modules, speculative draft, policy rewrite, and refusal text.
- Generation plane (your client): the final user-visible response when `final_action` is `NORMAL_COMPLETE` or `SAFE_COMPLETE`.
The model passed in client.chat.completions.create(model="...") controls only the final response. OPENAI_MODEL and MORALSTACK_*_MODEL variables control only the internal governance pipeline. Neither side overrides the other.
| Stage | Model source |
|---|---|
| Final response (`NORMAL_COMPLETE`) | `model=` passed to `chat.completions.create(...)` |
| Final response (`SAFE_COMPLETE`) | same `model=`, with governance constraints injected into the system message |
| `REFUSE` response text | internal policy model (`OPENAI_MODEL`) |
| Speculative overlap draft | internal policy model (`OPENAI_MODEL`) |
| Policy `rewrite()` (cycle 2+) | `MORALSTACK_POLICY_REWRITE_MODEL` (fallback: `OPENAI_MODEL`) |
| Risk / Critic / Simulator / Perspectives / Hindsight | `MORALSTACK_*_MODEL` per module, fallback to `OPENAI_MODEL` |
When final_action is REFUSE, your wrapped client is not called for generation.
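For example, a configuration that pins lighter models to the governance plane might look like the following; the final response model still comes only from the `model=` you pass to `create(...)`. Values here are illustrative, not defaults:

```
# .env — governance plane only; does not affect model= in create(...)
OPENAI_MODEL=gpt-4o
MORALSTACK_RISK_MODEL=gpt-4o-mini
MORALSTACK_POLICY_REWRITE_MODEL=gpt-4.1-nano
```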
Streaming
```python
for chunk in client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
Governance deliberation happens before streaming starts. If REFUSE, a single synthetic chunk is yielded with the refusal text.
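Conceptually, the streaming behavior reduces to yielding one pre-built chunk on refusal and passing the upstream iterator through otherwise. The following is a simplified sketch of that idea, not the SDK's internal types:

```python
# Simplified model of the streaming paths (not the real SDK internals).
def stream_response(final_action: str, refusal_text: str, upstream_chunks):
    if final_action == "REFUSE":
        yield refusal_text          # single synthetic chunk; upstream never called
        return
    yield from upstream_chunks      # NORMAL/SAFE_COMPLETE: pass chunks through

chunks = list(stream_response("REFUSE", "I'm unable to assist with that request.", iter(())))
print(chunks)  # ["I'm unable to assist with that request."]
```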
Configuration
Environment is loaded via `moralstack/utils/env_loader.py`.

- `.env` values are loaded with `override=True` (non-empty `.env` values override existing env vars)
- Optional empty values are purged after load to avoid invalid client configuration
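These two rules can be illustrated with a small simulation. This mimics the documented behavior in pure Python; it is not the `env_loader` implementation:

```python
# Mimics the documented .env semantics: non-empty .env values override
# existing environment variables; empty optional values are purged after load.
def apply_dotenv(env: dict, dotenv: dict) -> dict:
    merged = dict(env)
    for key, value in dotenv.items():
        if value:                # non-empty .env value wins (override=True)
            merged[key] = value
    return {k: v for k, v in merged.items() if v}  # purge empty values

env = {"OPENAI_MODEL": "gpt-4o-mini", "OPENAI_TOP_P": ""}
dotenv = {"OPENAI_MODEL": "gpt-4o", "OPENAI_TIMEOUT_MS": ""}
print(apply_dotenv(env, dotenv))
# {'OPENAI_MODEL': 'gpt-4o'}
```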
Key variables
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | Your OpenAI API key |
| `OPENAI_MODEL` | `gpt-4o` | Model used by all pipeline modules |
| `MORALSTACK_POLICY_REWRITE_MODEL` | same as `OPENAI_MODEL` | Model for deliberative `rewrite()` at cycle 2+. `.env.template` sets `gpt-4.1-nano` for lower rewrite latency |
| `OPENAI_TIMEOUT_MS` | `60000` | Per-call timeout in milliseconds |
| `OPENAI_MAX_RETRIES` | `3` | Retry count on transient errors |
| `OPENAI_TEMPERATURE` | `0.1` | Temperature for all modules |
| `OPENAI_TOP_P` | `0.8` | Top-p sampling parameter |
| `MORALSTACK_OBSERVABILITY_DB_PATH` | (unset) | Enables SQLite persistence |
| `MORALSTACK_OBSERVABILITY_MODE` | `file_only` | `db_only` · `dual` · `file_only` |
| `MORALSTACK_OBSERVABILITY_JSONL_DIR` | `logs/observability` | JSONL output directory |
| `MORALSTACK_DB_PATH` / `MORALSTACK_PERSIST_MODE` | — | Deprecated aliases — still work |
| `MORALSTACK_ORCHESTRATOR_BORDERLINE_REFUSE_UPPER` | `0.95` | Upper boundary for the borderline-refuse zone |
Default models by component
| Component | Default model | Override variable |
|---|---|---|
| Policy (generation) | `gpt-4o` | `OPENAI_MODEL` |
| Policy (rewrite) | same as primary, or `gpt-4.1-nano` in `.env.template` | `MORALSTACK_POLICY_REWRITE_MODEL` |
| Risk estimator | follows `OPENAI_MODEL` | `MORALSTACK_RISK_MODEL` |
| Critic | follows `OPENAI_MODEL` | `MORALSTACK_CRITIC_MODEL` |
| Simulator | follows `OPENAI_MODEL` | `MORALSTACK_SIMULATOR_MODEL` |
| Perspectives | follows `OPENAI_MODEL` | `MORALSTACK_PERSPECTIVES_MODEL` |
| Hindsight | follows `OPENAI_MODEL` | `MORALSTACK_HINDSIGHT_MODEL` |
For the full variable reference see INSTALL.md and docs/modules/*.md.
Running the Benchmark
```bash
python scripts/benchmark_moralstack.py
```
Override baseline and judge models independently:
```bash
MORALSTACK_BENCHMARK_BASELINE_MODEL=gpt-4o \
MORALSTACK_BENCHMARK_JUDGE_MODEL=gpt-5.2 \
python scripts/benchmark_moralstack.py
```
When the judge model differs from the generation model, it is treated as independent.
Web UI
Install UI extras:
```bash
pip install -e ".[ui]"
```
Configure persistence and credentials:
```
# .env
MORALSTACK_DB_PATH=moralstack.db
MORALSTACK_UI_USERNAME=admin
MORALSTACK_UI_PASSWORD=your_password
```
Start:
```bash
moralstack-ui
# → http://localhost:8765 (override with MORALSTACK_UI_PORT)
```
Inspect every decision: LLM calls, critic scores, risk traces, decision explanation, convergence steps, and benchmark comparisons.
Why not just use a filter?
| | Regex / keywords | Moderation APIs | MoralStack |
|---|---|---|---|
| Understands context | ✗ | Partial | ✓ |
| Auditable decisions | ✗ | ✗ | ✓ |
| Domain-configurable | ✗ | ✗ | ✓ |
| Handles dual-use | ✗ | Partial | ✓ |
| Safe redirection | ✗ | ✗ | ✓ |
| Counterfactual reasoning | ✗ | ✗ | ✓ |
| Zero false negatives* | ✗ | ✗ | ✓ |
\* On our benchmark set — see full methodology above.
- Deliberative, not reactive — runs a multi-module reasoning pipeline, not a classifier
- First-class `SAFE_COMPLETE` — "respond with safeguards" is an explicit policy action, not a post-hoc disclaimer
- Full audit trail — every decision is explainable, logged, and queryable
- Domain overlay system — YAML-configurable per sector, no code changes required
Documentation
- INSTALL.md — detailed installation guide
- examples/ — runnable code examples
- docs/architecture_spec.md
- docs/decision_policy.md
- docs/constitution.md
- docs/creating_overlays.md
- docs/modules/
- docs/DEVELOPMENT.md
- docs/limitations_and_tradeoffs.md
Limitations & Trade-offs
MoralStack makes deliberate trade-offs:
Latency over speed: Deliberative paths run multiple LLM calls (risk → critic → simulator → perspectives → hindsight). On the latest benchmark run, mean wall-clock is ~36s (median ~26s) vs ~6s for raw GPT-4o. This is a design choice — governance takes time. Latency has been reduced through speculative decoding, parallel risk estimation, lighter models for simulator and rewrite (gpt-4.1-nano), structured JSON output enforcement, and soft-revision prompt constraints. Further optimizations (early-exit on low-risk queries, context-mode switching) are planned.
Multi-model cost: A single deliberative request makes 7–9 LLM calls. Example profiles: .env.minimal uses gpt-4.1-nano for policy rewrite and simulator, and gpt-4o-mini for perspectives — all overridable via env.
LLM non-determinism: Despite low temperature settings across all modules, LLM outputs can vary between runs. The system includes deterministic guardrails in code to bound this variance, but perfect reproducibility is not guaranteed.
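One common guardrail pattern of this kind is to validate and clamp model output in code. The sketch below shows the generic idea (defensive parsing with a fail-closed fallback); it is not MoralStack's actual guardrail code:

```python
import json

# Generic guardrail: parse a module's JSON output defensively and clamp the
# risk score into [0, 1]; fall back to a conservative default on any failure.
def parse_risk(raw: str, fallback: float = 1.0) -> float:
    try:
        score = float(json.loads(raw)["risk_score"])
    except (ValueError, KeyError, TypeError):
        return fallback            # fail closed: unparseable output = high risk
    return min(1.0, max(0.0, score))

print(parse_risk('{"risk_score": 1.7}'))   # 1.0 (clamped)
print(parse_risk('not json'))              # 1.0 (fallback)
```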
Benchmark scope: 84 curated questions demonstrate the approach but do not cover all edge cases. We recommend running your own evaluations on domain-specific inputs.
See full discussion in docs/limitations_and_tradeoffs.md.
Apache 2.0 · Built with deliberation, not just parameters.
File details
Details for the file moralstack-0.3.1.tar.gz.
File metadata
- Download URL: moralstack-0.3.1.tar.gz
- Upload date:
- Size: 508.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4c67f55128a75885fefae75635b7a4e89e29c89ccdf287c4c82629cb0c3b3d57` |
| MD5 | `0c5dad34cbd6e15eab350851afd433bd` |
| BLAKE2b-256 | `03d97fb5c41595ccdc3bf3d292b15809aa0a2accd4283784c2281561ea00dcf3` |
Provenance
The following attestation bundles were made for moralstack-0.3.1.tar.gz:
Publisher: `publish.yml` on `fdidonato/moralstack`

- Statement:
  - Statement type: `https://in-toto.io/Statement/v1`
  - Predicate type: `https://docs.pypi.org/attestations/publish/v1`
  - Subject name: `moralstack-0.3.1.tar.gz`
  - Subject digest: `4c67f55128a75885fefae75635b7a4e89e29c89ccdf287c4c82629cb0c3b3d57`
- Sigstore transparency entry: 1350577450
- Sigstore integration time:
- Permalink: `fdidonato/moralstack@4891b28e4fd9618987cf2100443df12c6cb0ec9d`
- Branch / Tag: `refs/tags/v0.3.1`
- Owner: https://github.com/fdidonato
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `publish.yml@4891b28e4fd9618987cf2100443df12c6cb0ec9d`
- Trigger Event: push
File details
Details for the file moralstack-0.3.1-py3-none-any.whl.
File metadata
- Download URL: moralstack-0.3.1-py3-none-any.whl
- Upload date:
- Size: 454.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `29b225214e43a0a681e89e3156c5a09b84a137110f3ce64c2b4b7bbe2c0d554c` |
| MD5 | `94caed425cdf79e1bad3ce9279c79122` |
| BLAKE2b-256 | `d534fc292c2e6e762c62fd26d925642148f0b963db36375d53b29a56e59080c9` |
Provenance
The following attestation bundles were made for moralstack-0.3.1-py3-none-any.whl:
Publisher: `publish.yml` on `fdidonato/moralstack`

- Statement:
  - Statement type: `https://in-toto.io/Statement/v1`
  - Predicate type: `https://docs.pypi.org/attestations/publish/v1`
  - Subject name: `moralstack-0.3.1-py3-none-any.whl`
  - Subject digest: `29b225214e43a0a681e89e3156c5a09b84a137110f3ce64c2b4b7bbe2c0d554c`
- Sigstore transparency entry: 1350577538
- Sigstore integration time:
- Permalink: `fdidonato/moralstack@4891b28e4fd9618987cf2100443df12c6cb0ec9d`
- Branch / Tag: `refs/tags/v0.3.1`
- Owner: https://github.com/fdidonato
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `publish.yml@4891b28e4fd9618987cf2100443df12c6cb0ec9d`
- Trigger Event: push