Policy-aware reasoning orchestration layer for LLM systems
Project description
MoralStack
Your LLM thinks. MoralStack judges.
A deliberative governance engine that decides whether, how, and under what constraints an LLM should respond — before a single token is generated.
Most AI safety tools are filters. MoralStack is a judge.
It runs a full deliberative pipeline — risk estimation, constitutional critique, consequence simulation, multi-perspective reasoning — and issues an explicit, auditable decision before your LLM generates anything.
Table of Contents
- What it does
- Decision Model
- Architecture
- Benchmark Results
- Installation
- 30-Second Quickstart
- SDK Usage
- Configuration
- Running the Benchmark
- Web UI
- Why not just use a filter?
- Documentation
- Limitations & Trade-offs
What it does
```
Traditional pipeline: prompt ──► generate ──► (maybe filter)
MoralStack:           prompt ──► deliberate ──► decide ──► generate within bounds
```
Traditional LLM pipelines optimize for helpfulness first. MoralStack adds an explicit policy layer that separates:
- Decision: `NORMAL_COMPLETE`, `SAFE_COMPLETE`, or `REFUSE`
- Generation: produce text consistent with the selected decision
This keeps decision logic auditable and minimizes unsafe false negatives in sensitive contexts.
Decision Model
Every request produces an explicit final_action:
| Action | Meaning |
|---|---|
| `NORMAL_COMPLETE` | Direct response |
| `SAFE_COMPLETE` | Responsible response with safeguards |
| `REFUSE` | Refusal with safe redirection |
Single source of truth for bounds and action selection:
- Module: `moralstack/runtime/decision/safe_complete_policy.py`
- API: `compute_action_bounds(...)`, `decide_final_action(...)`
SAFE_COMPLETE is a first-class policy action and is not inferred from text disclaimers.
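The shape of this policy can be illustrated with a simplified sketch. The thresholds and signatures below are hypothetical, chosen only for illustration; the real implementations live in `safe_complete_policy.py` and are not reproduced here.

```python
# Illustrative only: hypothetical thresholds, NOT MoralStack's actual policy.
def compute_action_bounds(risk_score: float) -> list[str]:
    """Map a risk score in [0, 1] to the permitted actions, least restrictive first."""
    if risk_score < 0.3:
        return ["NORMAL_COMPLETE", "SAFE_COMPLETE", "REFUSE"]
    if risk_score < 0.8:
        return ["SAFE_COMPLETE", "REFUSE"]
    return ["REFUSE"]

def decide_final_action(risk_score: float) -> str:
    """Pick the least restrictive permitted action."""
    return compute_action_bounds(risk_score)[0]

print(decide_final_action(0.12))  # NORMAL_COMPLETE
print(decide_final_action(0.55))  # SAFE_COMPLETE
print(decide_final_action(0.91))  # REFUSE
```

The key property is that the action is selected from explicit bounds computed in code, not inferred from the generated text.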
Architecture
High-level flow:
```
Request
  │
  ▼
[Risk Estimator] ─────────── parallel mini-estimators:
  │                          intent · operational risk · signal detection
  ▼
[Policy Router] ──────────── applies domain overlay, computes action bounds
  │
  ├── FAST_PATH ──────────────────────────────────────────────────┐
  │   (clearly benign or clearly harmful — deliberation skipped)  │
  │                                                               │
  └── DELIBERATIVE_PATH                                           │
        │                                                         │
        ├── [Constitutional Critic]  checks principle violations  │
        ├── [Consequence Simulator]  projects downstream harm     │
        ├── [Perspectives Ensemble]  multi-stakeholder reasoning  │
        └── [Hindsight Evaluator]    retrospective quality check  │
        │                                                         │
        ▼                                                         │
[Convergence Engine] ──── issues final_action ◄───────────────────┘
  │
  ▼
[Response Assembler] ─── generates within the decided bounds
```
Main packages:
- `moralstack/sdk/` — Python SDK (`govern()`, `GovernedClient`, `GovernanceConfig`)
- `moralstack/runtime/` — orchestration runtime
- `moralstack/orchestration/` — controller, routing, deliberation services
- `moralstack/models/risk/` — risk estimation and calibration
- `moralstack/constitution/` — constitution schema, loader, store (YAML-driven)
- `moralstack/persistence/` — DB and file persistence modes
- `moralstack/ui/` — FastAPI dashboard (`moralstack-ui`)
Benchmark Results
Evaluated on 84 questions spanning adversarial prompts, dual-use domains, regulated topics (legal, medical, financial), and false-positive torture tests. The judge model (GPT-5.2) is independent of both the baseline and MoralStack generation.
Safety & Utility
| Metric | Baseline (GPT-4o) | MoralStack |
|---|---|---|
| False Negatives (no refusal when needed) | 13 | 0 |
| Information Leakage | 14 (16.7%) | 0 (0%) |
| False Positives (refusal on legitimate queries) | 0 | 0 |
| Utility Preservation (legitimate queries answered) | 62/62 | 62/62 |
| Safe Redirection on Refusal | 1/22 (4.5%) | 22/22 (100%) |
Head-to-Head (judged by GPT-5.2)
| | Baseline | MoralStack | Tie |
|---|---|---|---|
| Wins | 6 | 54 | 24 |
| Avg Safety Score | 7.83/10 | 9.27/10 | — |
(Latest full run: benchmark 12, same 84-question suite and judge.)
Decision Accuracy
98.8% compliance rate. Zero system errors.
```
           Predicted
Expected   NC   SC   REFUSE
───────────────────────────
NC          9    1    0
SC          0   52    0
REFUSE      0    0   22
```
The single off-diagonal cell (1 NC→SC) is a health-domain query where MoralStack adds a professional-consultation disclaimer — a reasonable policy choice for regulated content.
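The reported compliance rate follows directly from the matrix: 83 of 84 decisions fall on the diagonal.

```python
# Recompute the decision-accuracy figure from the confusion matrix above.
matrix = {
    ("NC", "NC"): 9,     ("NC", "SC"): 1,     ("NC", "REFUSE"): 0,
    ("SC", "NC"): 0,     ("SC", "SC"): 52,    ("SC", "REFUSE"): 0,
    ("REFUSE", "NC"): 0, ("REFUSE", "SC"): 0, ("REFUSE", "REFUSE"): 22,
}
total = sum(matrix.values())                                 # 84 questions
correct = sum(v for (exp, pred), v in matrix.items() if exp == pred)  # 83
print(f"{100 * correct / total:.1f}%")  # 98.8%
```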
Note: This benchmark demonstrates proof-of-concept effectiveness on 84 curated questions. It is not a claim of production-grade coverage across all possible inputs. We encourage independent evaluation.
Avg Response Time
| | Baseline | MoralStack |
|---|---|---|
| Mean wall-clock | ~6s | ~36s |
| Median wall-clock | — | ~26s |
(Benchmark 12, 84 questions. Mean latency is ~51% lower than the original benchmark configuration's ~73s mean; the fast-path rate rose from ~11% to ~37%, largely because REFUSE queries are now routed through the fast path.)
Deliberative paths add latency by design. See Limitations & Trade-offs.
Installation
From PyPI (recommended for users)
```bash
pip install moralstack
```
For the optional admin UI:
```bash
pip install "moralstack[ui]"
```
From source (for contributors)
```bash
git clone https://github.com/fdidonato/moralstack.git
cd moralstack
python -m venv venv
source venv/bin/activate
pip install -e ".[dev,ui]"
```
Configure
```bash
cp .env.minimal .env
# edit .env and set OPENAI_API_KEY=sk-...
```
30-Second Quickstart
```python
from moralstack import govern
from openai import OpenAI

# Wrap any OpenAI-compatible client with MoralStack governance.
client = govern(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What are the symptoms of vitamin D deficiency?"}],
)

# Full OpenAI API compatibility:
print(response.choices[0].message.content)

# Plus governance metadata on every call:
meta = response.governance_metadata
print(f"Decision: {meta.final_action}")
print(f"Risk: {meta.risk_score:.2f} ({meta.risk_category})")
print(f"Overlay: {meta.domain_overlay}")
```
More patterns in examples/.
SDK Usage
Use MoralStack as a governance wrapper around your existing OpenAI client — no server, no HTTP, no separate process.
```python
from moralstack import govern
from openai import OpenAI

client = govern(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How do I pick a lock?"}],
)

print(response.content)
# I'm unable to assist with that request.

print(response.governance_metadata.final_action)
# REFUSE

print(response.governance_metadata.risk_score)
# 0.87
```
`govern()` wraps any OpenAI-compatible client. All non-`chat.completions.create()` calls pass through transparently (`client.models.list()`, `client.files.*`, etc.).
Decision routing
| `final_action` | What happens |
|---|---|
| `NORMAL_COMPLETE` | Request passes unchanged to your OpenAI client |
| `SAFE_COMPLETE` | Governance constraints injected into the system prompt, then your client is called |
| `REFUSE` | OpenAI is not called — refusal text returned directly |
Governance metadata
Every response carries response.governance_metadata:
```python
meta = response.governance_metadata

meta.final_action          # NORMAL_COMPLETE | SAFE_COMPLETE | REFUSE
meta.risk_score            # 0.0 (benign) — 1.0 (harmful)
meta.risk_category         # CLEARLY_BENIGN | SENSITIVE | CLEARLY_HARMFUL
meta.path                  # FAST_PATH | DELIBERATIVE_PATH
meta.reason_codes          # ["DUAL_USE", "SENSITIVE_DOMAIN", ...]
meta.triggered_principles  # constitution principles activated
meta.decision_reason       # human-readable explanation
meta.conversation_id       # session tracking (multi-turn)
meta.turn_index            # turn counter within session
```
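For audit pipelines, this metadata serializes naturally to one JSONL record per call. A minimal sketch, assuming a hypothetical `AuditRecord` type whose field names mirror the attributes above (this is not the SDK's own persistence code):

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record type mirroring a subset of governance_metadata fields.
@dataclass
class AuditRecord:
    final_action: str
    risk_score: float
    path: str
    reason_codes: list

def to_jsonl(rec: AuditRecord) -> str:
    """Serialize one governance decision as a single JSONL line."""
    return json.dumps(asdict(rec), sort_keys=True)

line = to_jsonl(AuditRecord("SAFE_COMPLETE", 0.42, "DELIBERATIVE_PATH", ["DUAL_USE"]))
print(line)
```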
GovernanceConfig
```python
from moralstack import govern, GovernanceConfig
from openai import OpenAI

client = govern(
    OpenAI(),
    config=GovernanceConfig(
        domain_overlay="healthcare",     # enforce a specific domain overlay
        failure_policy="passthrough",    # on pipeline error: call OpenAI directly (unsafe)
        observability_mode="file_only",  # write JSONL audit trail
        jsonl_dir="logs/audit",
    ),
)
```
All parameters default to sensible values. Minimum required: OPENAI_API_KEY in environment.
SDK model resolution
When you use govern(), MoralStack runs two separate model planes:
- Governance plane (internal): risk estimation, deliberation modules, speculative draft, policy rewrite, and refusal text.
- Generation plane (your client): the final user-visible response when `final_action` is `NORMAL_COMPLETE` or `SAFE_COMPLETE`.
The model passed in client.chat.completions.create(model="...") controls only the final response. OPENAI_MODEL and MORALSTACK_*_MODEL variables control only the internal governance pipeline. Neither side overrides the other.
| Stage | Model source |
|---|---|
| Final response (`NORMAL_COMPLETE`) | `model=` passed to `chat.completions.create(...)` |
| Final response (`SAFE_COMPLETE`) | same `model=`, with governance constraints injected into the system message |
| `REFUSE` response text | internal policy model (`OPENAI_MODEL`) |
| Speculative overlap draft | internal policy model (`OPENAI_MODEL`) |
| Policy `rewrite()` (cycle 2+) | `MORALSTACK_POLICY_REWRITE_MODEL` (fallback: `OPENAI_MODEL`) |
| Risk / Critic / Simulator / Perspectives / Hindsight | `MORALSTACK_*_MODEL` per module, fallback to `OPENAI_MODEL` |
When final_action is REFUSE, your wrapped client is not called for generation.
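For example, a configuration that pins lighter models to the governance plane might look like the following; the final response model still comes only from the `model=` you pass to `create(...)`. Values here are illustrative, not defaults:

```
# .env — governance plane only; does not affect model= in create(...)
OPENAI_MODEL=gpt-4o
MORALSTACK_RISK_MODEL=gpt-4o-mini
MORALSTACK_POLICY_REWRITE_MODEL=gpt-4.1-nano
```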
Streaming
```python
for chunk in client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
Governance deliberation happens before streaming starts. If REFUSE, a single synthetic chunk is yielded with the refusal text.
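Conceptually, the streaming behavior reduces to yielding one pre-built chunk on refusal and passing the upstream iterator through otherwise. The following is a simplified sketch of that idea, not the SDK's internal types:

```python
# Simplified model of the streaming paths (not the real SDK internals).
def stream_response(final_action: str, refusal_text: str, upstream_chunks):
    if final_action == "REFUSE":
        yield refusal_text          # single synthetic chunk; upstream never called
        return
    yield from upstream_chunks      # NORMAL/SAFE_COMPLETE: pass chunks through

chunks = list(stream_response("REFUSE", "I'm unable to assist with that request.", iter(())))
print(chunks)  # ["I'm unable to assist with that request."]
```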
Configuration
Environment is loaded via `moralstack/utils/env_loader.py`.

- `.env` values are loaded with `override=True` (non-empty `.env` values override existing env vars)
- Optional empty values are purged after load to avoid invalid client configuration
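These two rules can be illustrated with a small simulation. This mimics the documented behavior in pure Python; it is not the `env_loader` implementation:

```python
# Mimics the documented .env semantics: non-empty .env values override
# existing environment variables; empty optional values are purged after load.
def apply_dotenv(env: dict, dotenv: dict) -> dict:
    merged = dict(env)
    for key, value in dotenv.items():
        if value:                # non-empty .env value wins (override=True)
            merged[key] = value
    return {k: v for k, v in merged.items() if v}  # purge empty values

env = {"OPENAI_MODEL": "gpt-4o-mini", "OPENAI_TOP_P": ""}
dotenv = {"OPENAI_MODEL": "gpt-4o", "OPENAI_TIMEOUT_MS": ""}
print(apply_dotenv(env, dotenv))
# {'OPENAI_MODEL': 'gpt-4o'}
```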
Key variables
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | Your OpenAI API key |
| `OPENAI_MODEL` | `gpt-4o` | Model used by all pipeline modules |
| `MORALSTACK_POLICY_REWRITE_MODEL` | same as `OPENAI_MODEL` | Model for deliberative `rewrite()` at cycle 2+. `.env.template` sets `gpt-4.1-nano` for lower rewrite latency |
| `OPENAI_TIMEOUT_MS` | `60000` | Per-call timeout in milliseconds |
| `OPENAI_MAX_RETRIES` | `3` | Retry count on transient errors |
| `OPENAI_TEMPERATURE` | `0.1` | Temperature for all modules |
| `OPENAI_TOP_P` | `0.8` | Top-p sampling parameter |
| `MORALSTACK_OBSERVABILITY_DB_PATH` | (unset) | Enables SQLite persistence |
| `MORALSTACK_OBSERVABILITY_MODE` | `file_only` | `db_only` · `dual` · `file_only` |
| `MORALSTACK_OBSERVABILITY_JSONL_DIR` | `logs/observability` | JSONL output directory |
| `MORALSTACK_DB_PATH` / `MORALSTACK_PERSIST_MODE` | — | Deprecated aliases — still work |
| `MORALSTACK_ORCHESTRATOR_BORDERLINE_REFUSE_UPPER` | `0.95` | Upper boundary for the borderline-refuse zone |
Default models by component
| Component | Default model | Override variable |
|---|---|---|
| Policy (generation) | `gpt-4o` | `OPENAI_MODEL` |
| Policy (rewrite) | same as primary, or `gpt-4.1-nano` in `.env.template` | `MORALSTACK_POLICY_REWRITE_MODEL` |
| Risk estimator | follows `OPENAI_MODEL` | `MORALSTACK_RISK_MODEL` |
| Critic | follows `OPENAI_MODEL` | `MORALSTACK_CRITIC_MODEL` |
| Simulator | follows `OPENAI_MODEL` | `MORALSTACK_SIMULATOR_MODEL` |
| Perspectives | follows `OPENAI_MODEL` | `MORALSTACK_PERSPECTIVES_MODEL` |
| Hindsight | follows `OPENAI_MODEL` | `MORALSTACK_HINDSIGHT_MODEL` |
For the full variable reference see INSTALL.md and docs/modules/*.md.
Running the Benchmark
```bash
python scripts/benchmark_moralstack.py
```
Override baseline and judge models independently:
```bash
MORALSTACK_BENCHMARK_BASELINE_MODEL=gpt-4o \
MORALSTACK_BENCHMARK_JUDGE_MODEL=gpt-5.2 \
python scripts/benchmark_moralstack.py
```
When the judge model differs from the generation model, it is treated as independent.
Web UI
Install UI extras:
```bash
pip install -e ".[ui]"
```
Configure persistence and credentials:
```
# .env
MORALSTACK_DB_PATH=moralstack.db
MORALSTACK_UI_USERNAME=admin
MORALSTACK_UI_PASSWORD=your_password
```
Start:
```bash
moralstack-ui
# → http://localhost:8765 (override with MORALSTACK_UI_PORT)
```
Inspect every decision: LLM calls, critic scores, risk traces, decision explanation, convergence steps, and benchmark comparisons.
Why not just use a filter?
| | Regex / keywords | Moderation APIs | MoralStack |
|---|---|---|---|
| Understands context | ✗ | Partial | ✓ |
| Auditable decisions | ✗ | ✗ | ✓ |
| Domain-configurable | ✗ | ✗ | ✓ |
| Handles dual-use | ✗ | Partial | ✓ |
| Safe redirection | ✗ | ✗ | ✓ |
| Counterfactual reasoning | ✗ | ✗ | ✓ |
| Zero false negatives* | ✗ | ✗ | ✓ |
\* On our benchmark set — see full methodology above.
- Deliberative, not reactive — runs a multi-module reasoning pipeline, not a classifier
- First-class `SAFE_COMPLETE` — "respond with safeguards" is an explicit policy action, not a post-hoc disclaimer
- Full audit trail — every decision is explainable, logged, and queryable
- Domain overlay system — YAML-configurable per sector, no code changes required
Documentation
- INSTALL.md — detailed installation guide
- examples/ — runnable code examples
- docs/architecture_spec.md
- docs/decision_policy.md
- docs/constitution.md
- docs/creating_overlays.md
- docs/modules/
- docs/DEVELOPMENT.md
- docs/limitations_and_tradeoffs.md
Limitations & Trade-offs
MoralStack makes deliberate trade-offs:
Latency over speed: Deliberative paths run multiple LLM calls (risk → critic → simulator → perspectives → hindsight). On the latest benchmark run, mean wall-clock is ~36s (median ~26s) vs ~6s for raw GPT-4o. This is a design choice — governance takes time. Latency has been reduced through speculative decoding, parallel risk estimation, lighter models for simulator and rewrite (gpt-4.1-nano), structured JSON output enforcement, and soft-revision prompt constraints. Further optimizations (early-exit on low-risk queries, context-mode switching) are planned.
Multi-model cost: A single deliberative request makes 7–9 LLM calls. Example profiles: .env.minimal uses gpt-4.1-nano for policy rewrite and simulator, and gpt-4o-mini for perspectives — all overridable via env.
LLM non-determinism: Despite low temperature settings across all modules, LLM outputs can vary between runs. The system includes deterministic guardrails in code to bound this variance, but perfect reproducibility is not guaranteed.
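One common guardrail pattern of this kind is to validate and clamp model output in code. The sketch below shows the generic idea (defensive parsing with a fail-closed fallback); it is not MoralStack's actual guardrail code:

```python
import json

# Generic guardrail: parse a module's JSON output defensively and clamp the
# risk score into [0, 1]; fall back to a conservative default on any failure.
def parse_risk(raw: str, fallback: float = 1.0) -> float:
    try:
        score = float(json.loads(raw)["risk_score"])
    except (ValueError, KeyError, TypeError):
        return fallback            # fail closed: unparseable output = high risk
    return min(1.0, max(0.0, score))

print(parse_risk('{"risk_score": 1.7}'))   # 1.0 (clamped)
print(parse_risk('not json'))              # 1.0 (fallback)
```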
Benchmark scope: 84 curated questions demonstrate the approach but do not cover all edge cases. We recommend running your own evaluations on domain-specific inputs.
See full discussion in docs/limitations_and_tradeoffs.md.
Apache 2.0 · Built with deliberation, not just parameters.
File details
Details for the file moralstack-0.3.1.tar.gz.
File metadata
- Download URL: moralstack-0.3.1.tar.gz
- Upload date:
- Size: 508.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4c67f55128a75885fefae75635b7a4e89e29c89ccdf287c4c82629cb0c3b3d57` |
| MD5 | `0c5dad34cbd6e15eab350851afd433bd` |
| BLAKE2b-256 | `03d97fb5c41595ccdc3bf3d292b15809aa0a2accd4283784c2281561ea00dcf3` |
Provenance
The following attestation bundles were made for moralstack-0.3.1.tar.gz:
Publisher: `publish.yml` on `fdidonato/moralstack`

- Statement:
  - Statement type: `https://in-toto.io/Statement/v1`
  - Predicate type: `https://docs.pypi.org/attestations/publish/v1`
  - Subject name: `moralstack-0.3.1.tar.gz`
  - Subject digest: `4c67f55128a75885fefae75635b7a4e89e29c89ccdf287c4c82629cb0c3b3d57`
- Sigstore transparency entry: 1350577450
- Sigstore integration time:
- Permalink: `fdidonato/moralstack@4891b28e4fd9618987cf2100443df12c6cb0ec9d`
- Branch / Tag: `refs/tags/v0.3.1`
- Owner: https://github.com/fdidonato
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `publish.yml@4891b28e4fd9618987cf2100443df12c6cb0ec9d`
- Trigger Event: push
File details
Details for the file moralstack-0.3.1-py3-none-any.whl.
File metadata
- Download URL: moralstack-0.3.1-py3-none-any.whl
- Upload date:
- Size: 454.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `29b225214e43a0a681e89e3156c5a09b84a137110f3ce64c2b4b7bbe2c0d554c` |
| MD5 | `94caed425cdf79e1bad3ce9279c79122` |
| BLAKE2b-256 | `d534fc292c2e6e762c62fd26d925642148f0b963db36375d53b29a56e59080c9` |
Provenance
The following attestation bundles were made for moralstack-0.3.1-py3-none-any.whl:
Publisher: `publish.yml` on `fdidonato/moralstack`

- Statement:
  - Statement type: `https://in-toto.io/Statement/v1`
  - Predicate type: `https://docs.pypi.org/attestations/publish/v1`
  - Subject name: `moralstack-0.3.1-py3-none-any.whl`
  - Subject digest: `29b225214e43a0a681e89e3156c5a09b84a137110f3ce64c2b4b7bbe2c0d554c`
- Sigstore transparency entry: 1350577538
- Sigstore integration time:
- Permalink: `fdidonato/moralstack@4891b28e4fd9618987cf2100443df12c6cb0ec9d`
- Branch / Tag: `refs/tags/v0.3.1`
- Owner: https://github.com/fdidonato
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `publish.yml@4891b28e4fd9618987cf2100443df12c6cb0ec9d`
- Trigger Event: push