Cross-layer policy regression testing for LLM data-agent stacks

These details have not been verified by PyPI

Project links

Paper

Project description

PolicyStrata

PolicyStrata is a deterministic regression-testing framework for cross-layer policy drift in LLM data-agent stacks.

It generates principals, requests, semantic plans, database states, lowered queries, and release decisions; compares each layer against a canonical reference policy; and minimizes failures into small reproducible witnesses.
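The minimization step can be pictured as a greedy reduction: drop one element of a failing case at a time, and keep the drop whenever the failure still reproduces. The sketch below is a generic illustration of that idea, not PolicyStrata's algorithm (docs/methodology.md covers the actual witness minimization). The names `minimize` and `still_fails` and the toy field list are hypothetical.

```python
from typing import Callable, Sequence


def minimize(case: Sequence[str], still_fails: Callable[[list[str]], bool]) -> list[str]:
    """Greedily remove elements while the failure still reproduces."""
    current = list(case)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan over the smaller case
    return current


# Toy failure (an assumption for this sketch): it reproduces whenever a PII
# dimension is requested and no tenant filter is present.
def still_fails(fields: list[str]) -> bool:
    return "customer_email" in fields and "tenant_filter" not in fields


failing_case = ["ticket_count", "created_week", "customer_email", "priority"]
witness = minimize(failing_case, still_fails)
print(witness)  # ['customer_email']
```

The result is 1-minimal: removing any single remaining element makes the failure disappear, which is what keeps a witness small enough to read.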

Use it when you are building text-to-SQL agents, BI copilots, internal analytics agents, warehouse chat systems, or governed enterprise LLM tools and need to know whether prompts, manifests, semantic plans, validators, SQL compilers, database controls, and output filters still agree about policy.

PolicyStrata is not an authorization boundary, and it is not another generic text-to-SQL benchmark. It is a reproducible research artifact and regression gate for finding reachable disagreements between layers.

Quick Start

From PyPI:

uvx policystrata demo
pipx run policystrata demo

From a source checkout:

uv sync --extra dev
uv run policystrata demo

The demo runs the built-in support_saas fixture, writes traces and minimized witnesses to runs/demo, and prints the drift classes it found. Pass --out to set the output directory explicitly:

uv run policystrata demo --out runs/demo

No LLM API key is required for deterministic tests, benchmark runs, or the built-in demo.

Install

PolicyStrata is a CLI-first Python package. The public package provides the policystrata console script and importable Python modules.

python -m pip install policystrata
policystrata demo

For one-off CLI use without managing an environment:

uvx policystrata demo
pipx run policystrata demo

Repository examples under examples/, Docker Compose fixtures, and evidence scripts are available from a GitHub checkout or source distribution. The wheel installs the runtime package and built-in domain fixtures used by policystrata demo, run, init-domain, and scan.

Use As A Template

Click "Use this template" on GitHub, then start with the deterministic fixtures:

uv sync --extra dev
uv run policystrata run --domain support_saas --suite seeded --out runs/example
uv run policystrata summarize runs/example

To copy a built-in domain fixture into your tree:

uv run policystrata init-domain support_saas --out examples/my-policystrata-domain

Keep custom integrations as adapters. The policy oracle should stay independent from SQL compiler behavior, external eval frameworks, and model-provider behavior.

What It Tests

The core failure class is cross-layer policy drift:

Canonical policy:
  Analysts may view tenant-scoped aggregate ticket counts, but not customer-level PII.

Model-visible manifest or grammar:
  Accidentally exposes customer_email as a dimension.

SQL compiler:
  Accidentally drops the tenant predicate while lowering an authorized aggregate.

Output layer:
  Releases the result because the final answer looks like a summary.

PolicyStrata result:
  A minimized witness localizes the violated layer and failed obligation.
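A toy version of the scenario above may help make "drift" concrete: each layer's behavior is compared against the canonical policy, and every disagreement is reported with the layer that introduced it. All names here (`CANONICAL`, `find_drift`, the sample manifest and SQL) are hypothetical and are not PolicyStrata's data model.

```python
# Canonical policy for the analyst role in this toy example.
CANONICAL = {
    "allowed_dimensions": {"created_week", "priority"},
    "requires_tenant_predicate": True,
}

# What two layers actually do, each with one seeded mistake.
manifest_dimensions = {"created_week", "priority", "customer_email"}
lowered_sql = "SELECT priority, COUNT(*) FROM tickets GROUP BY priority"


def find_drift(manifest: set[str], sql: str) -> list[tuple[str, str]]:
    """Return (layer, failed obligation) pairs that disagree with CANONICAL."""
    findings = []
    for dim in sorted(manifest - CANONICAL["allowed_dimensions"]):
        findings.append(("manifest", f"exposes forbidden dimension {dim}"))
    # Naive substring check, good enough for a toy; real SQL needs parsing.
    if CANONICAL["requires_tenant_predicate"] and "tenant_id" not in sql:
        findings.append(("compiler", "dropped tenant predicate"))
    return findings


drift = find_drift(manifest_dimensions, lowered_sql)
for layer, obligation in drift:
    print(f"{layer}: {obligation}")
```

Each finding names a layer and an obligation, mirroring the way a witness localizes the violated layer rather than only flagging the final answer.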

PolicyStrata does not assume every layer should behave identically. Each surface has a declared responsibility:

manifest: expose model-visible capabilities without stale or forbidden options.
grammar: parse the declared intent space and preserve untrusted intent for validation.
validator: authorize semantic queries and bind principal, tenant, time, and budget obligations.
compiler: lower authorized semantic IR into SQL while preserving metric, tenant, time, and row obligations.
database: contain row access with RLS and other database-side controls.
release: withhold contained or unauthorized results.

See docs/failure-taxonomy.md for how witness classes map to concrete policy-drift failures.

Run Benchmarks

PolicyStrata ships with deterministic support_saas and finance_saas benchmarks, generated mutation suites, minimized witnesses, JSONL traces, baseline comparisons, and evidence tables.

uv run policystrata run --domain support_saas --suite seeded --out runs/example
uv run policystrata run \
  --domain support_saas \
  --suite generated \
  --count 500 \
  --seed 1729 \
  --out runs/generated
uv run policystrata run --domain finance_saas --suite seeded --out runs/finance
uv run policystrata baselines runs/example
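The --seed value is presumably what makes a generated suite repeatable. As a general illustration of seed-driven generation (not PolicyStrata's generator; `generate_cases` and its fields are invented for this sketch), a private `random.Random` instance seeded with the same value yields the same cases on every run:

```python
import random


def generate_cases(count: int, seed: int) -> list[dict]:
    """Produce a repeatable list of toy request cases from a seed."""
    rng = random.Random(seed)  # private RNG; global random state is untouched
    roles = ["analyst", "support_agent", "admin"]
    metrics = ["ticket_count", "median_resolution_hours"]
    return [
        {
            "case_id": i,
            "role": rng.choice(roles),
            "metric": rng.choice(metrics),
            "tenant": f"tenant_{rng.randint(1, 5)}",
        }
        for i in range(count)
    ]


first = generate_cases(count=500, seed=1729)
second = generate_cases(count=500, seed=1729)
print(first == second)  # same seed, same cases
```

Keeping the generator off the module-level `random` functions means unrelated code that also draws random numbers cannot perturb the suite.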

The default run command writes:

runs/<id>/traces.jsonl
runs/<id>/summary.json
runs/<id>/metadata.json
runs/<id>/witnesses/*.json

metadata.json records the mutation operator set, suite provenance, evidence level, and detector-freeze status. Static suite YAML can declare suite_metadata so externally authored, detector-frozen, or incident-reconstruction cases stay separate from public/generated benchmark scores.
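Because the run outputs are plain JSON and JSONL, they can be inspected with the standard library alone. The sketch below relies only on the file layout listed above and makes no assumption about the fields inside each record. `load_run` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def load_run(run_dir: str) -> dict:
    """Load traces, summary, metadata, and witnesses from one run directory."""
    root = Path(run_dir)
    with open(root / "traces.jsonl", encoding="utf-8") as handle:
        # JSONL: one JSON document per non-empty line.
        traces = [json.loads(line) for line in handle if line.strip()]
    witnesses = {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted((root / "witnesses").glob("*.json"))
    }
    return {
        "traces": traces,
        "summary": json.loads((root / "summary.json").read_text(encoding="utf-8")),
        "metadata": json.loads((root / "metadata.json").read_text(encoding="utf-8")),
        "witnesses": witnesses,
    }
```

For example, `len(load_run("runs/example")["traces"])` would give the number of trace records in a run written to runs/example.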

Regenerate paper-style evidence tables with:

scripts/reproduce-evidence.sh

Generate reviewer-facing artifact metrics for a run:

uv run policystrata artifact-report runs/repro/seeded

Current benchmark details are in docs/evidence.md, with methodology and claim boundaries in docs/methodology.md and EVAL_CARD.md.

Run The Scanner

policystrata scan is the production-oriented path. It treats PolicyStrata as a scanner and release gate, not as the authorization boundary.

Clean smoke test:

uv run policystrata scan --config examples/postgres_dbt/policystrata_clean.yaml --out runs/scan-clean

Intentional gate-failure fixture:

uv run policystrata scan --config examples/postgres_dbt/policystrata.yaml --out runs/scan

That run should exit with status 1 because the fixture contains imported traces with known authorization, unsafe-release, and tenant-scope findings.
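When the scanner is driven from a Python script rather than a shell, the gate reduces to checking the process exit status. The sketch below assumes that a clean scan exits 0; the text above only states that the failing fixture exits 1. `run_gate` is a hypothetical wrapper, and stand-in commands are used so the example runs without PolicyStrata installed.

```python
import subprocess
import sys


def run_gate(command: list[str]) -> bool:
    """Run a command and report whether it passed (exit status 0)."""
    completed = subprocess.run(command, check=False)
    if completed.returncode != 0:
        print(f"gate failed with exit status {completed.returncode}", file=sys.stderr)
    return completed.returncode == 0


# Stand-ins for a clean and a failing scan. In practice the list would hold
# the scan invocation shown above, for example:
# ["uv", "run", "policystrata", "scan",
#  "--config", "examples/postgres_dbt/policystrata.yaml", "--out", "runs/scan"]
clean = run_gate([sys.executable, "-c", "raise SystemExit(0)"])
failing = run_gate([sys.executable, "-c", "raise SystemExit(1)"])
print(clean, failing)  # True False
```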

Scanner outputs include:

runs/scan-clean/scan.json
runs/scan-clean/findings.jsonl
runs/scan-clean/summary.json
runs/scan-clean/report.md
runs/scan-clean/witnesses/*.json
runs/scan-clean/scan.sarif  # when sarif: true
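If SARIF output is enabled, the file can be summarized with a few lines of standard-library code. This sketch assumes scan.sarif follows the usual SARIF 2.1.0 layout (a top-level `runs` array whose entries carry a `results` array). The rule IDs in the sample are invented, since the IDs PolicyStrata emits are not listed here.

```python
import json
from collections import Counter


def count_sarif_results(sarif_text: str) -> Counter:
    """Count SARIF results per ruleId across all runs in a log."""
    log = json.loads(sarif_text)
    counts: Counter = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("ruleId", "<no ruleId>")] += 1
    return counts


# Hand-written sample in SARIF 2.1.0 shape; the rule IDs are invented.
sample = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-tool"}},
        "results": [
            {"ruleId": "tenant-scope", "message": {"text": "missing tenant predicate"}},
            {"ruleId": "tenant-scope", "message": {"text": "missing tenant predicate"}},
            {"ruleId": "unsafe-release", "message": {"text": "released contained rows"}},
        ],
    }],
})
print(count_sarif_results(sample))
```

For a real run, pass the contents of the SARIF file, for example `Path("runs/scan-clean/scan.sarif").read_text(encoding="utf-8")`.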

For a scanner run that also executes imported SQL alongside the canonical compiler SQL against the Docker/PostgreSQL fixture:

docker compose up -d postgres
uv run policystrata scan --config examples/postgres_dbt/policystrata_real_db_clean.yaml --out runs/scan-real-db-clean

Postgres access goes through Python/psycopg; host psql is not required. See docs/scanner.md for scanner configuration, gate behavior, state assertions, and real-database fixture details.

GitHub Action

Use the first-party action to run policystrata scan as a pull-request or release gate:

name: PolicyStrata

on:
  pull_request:
  push:
    branches: [main]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: raintree-technology/policystrata@v0.1.0
        with:
          config: policystrata.yaml
          out: runs/policystrata

See docs/github-action.md for inputs, artifact upload, and database fixture guidance.

Integrations And Exports

PolicyStrata keeps core execution independent from external eval frameworks. Adapter exports are available for downstream systems:

uv run policystrata export runs/example --format inspect --out runs/example/inspect.jsonl
uv run policystrata export runs/example --format benchflow --out runs/example/benchflow.json

The repo also includes a small dbt Semantic Layer adapter and fixture:

uv run policystrata check-integration dbt-semantic \
  --domain finance_saas \
  --path examples/integrations/dbt_semantic/finance_saas/semantic_models.yml

See docs/trace-interop.md for adapter field mappings.

Reference Docs

docs/benchmark-reference.md: domains, generated mutants, baselines, and witness shape.
docs/scanner.md: scanner inputs, gates, state assertions, and PostgreSQL fixture use.
docs/github-action.md: CI wrapper for policystrata scan.
docs/distribution-roadmap.md: CLI, GitHub Action, SDK, MCP, and GitHub CLI extension sequence.
docs/evidence.md: current evidence snapshot and reproduction commands.
docs/methodology.md: claims, limitations, mutant definitions, and witness minimization.
EVAL_CARD.md: benchmark provenance, evidence levels, and eval boundaries.
docs/open-source-commercial-strategy.md: packaging and product boundary.

Development

uv run pytest
uv run ruff check .
uv run mypy src

The built-in support_saas domain is deterministic and seed-driven. Preserve JSON/YAML trace stability when extending artifacts; add fields compatibly.
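One way to read "trace stability" and "add fields compatibly" is sketched below: serialize with sorted keys so equal data always produces equal bytes, and have readers supply a default for any field that older traces lack. This is an illustration, not the project's serializer. The field name `evidence_level` and both helper functions are hypothetical.

```python
import json


def dump_trace(record: dict) -> str:
    """Serialize a record with sorted keys so equal data gives equal bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def read_trace(line: str) -> dict:
    """Parse a record, defaulting a newer optional field when it is absent."""
    record = json.loads(line)
    record.setdefault("evidence_level", None)  # hypothetical added field
    return record


old_line = dump_trace({"layer": "compiler", "case_id": 7})
new_line = dump_trace({"case_id": 7, "layer": "compiler", "evidence_level": "seeded"})
print(old_line)  # {"case_id":7,"layer":"compiler"}
print(read_trace(old_line)["evidence_level"], read_trace(new_line)["evidence_level"])
```

Adding an optional field this way leaves old traces readable and keeps diffs between runs limited to real changes.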

Status

PolicyStrata is an early research artifact. It is useful for reproducing the paper's core failure model and for building regression gates around real stacks. It does not prove recall on unknown production incidents, and it should not be presented as a production security scanner on its own.

Project details

Release history

1.0.0: Jun 27, 2026
0.1.6: Jun 27, 2026
0.1.5: Jun 26, 2026
0.1.4: Jun 26, 2026
0.1.3: Jun 26, 2026
0.1.2: Jun 26, 2026
0.1.1: Jun 26, 2026
0.1.0: Jun 25, 2026 (this version)

Download files

Source Distribution

policystrata-0.1.0.tar.gz (132.4 kB)

Uploaded Jun 25, 2026 Source

Built Distribution

policystrata-0.1.0-py3-none-any.whl (59.4 kB)

Uploaded Jun 25, 2026 Python 3

File details

Details for the file policystrata-0.1.0.tar.gz.

File metadata

Download URL: policystrata-0.1.0.tar.gz
Upload date: Jun 25, 2026
Size: 132.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for policystrata-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`86b3899063b5191b2d4af3b21c93c231b735ab593d29b0e2a3aaf77a74149373`
MD5	`fbf1731b87f4666199b135d4d82f2b2e`
BLAKE2b-256	`f1b40a2e64c58be549431a5a91d7c031de69625963575baaf5f7ad9ad4d37f8d`
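A downloaded file can be checked against the published SHA256 digest with `hashlib`. The helper below is a generic sketch (`sha256_matches` is a hypothetical name); it streams the file in chunks so large archives do not need to fit in memory.

```python
import hashlib


def sha256_matches(path: str, expected_hex: str) -> bool:
    """Stream a file through SHA-256 and compare with the published digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

For the source distribution, the call would be `sha256_matches("policystrata-0.1.0.tar.gz", "86b3899063b5191b2d4af3b21c93c231b735ab593d29b0e2a3aaf77a74149373")`, assuming the file sits in the current directory.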


Provenance

The following attestation bundles were made for policystrata-0.1.0.tar.gz:

Publisher: publish.yml on raintree-technology/policystrata

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: policystrata-0.1.0.tar.gz
- Subject digest: 86b3899063b5191b2d4af3b21c93c231b735ab593d29b0e2a3aaf77a74149373
- Sigstore transparency entry: 1958927635
- Sigstore integration time: Jun 25, 2026
Source repository:
- Permalink: raintree-technology/policystrata@6e1a01aab904c7c46164ee1c953ba5419c3041e9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/raintree-technology
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6e1a01aab904c7c46164ee1c953ba5419c3041e9
- Trigger Event: workflow_dispatch

File details

Details for the file policystrata-0.1.0-py3-none-any.whl.

File metadata

Download URL: policystrata-0.1.0-py3-none-any.whl
Upload date: Jun 25, 2026
Size: 59.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for policystrata-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`51d83b728cdf188fdfb43a0f43e3a63fc6fca2e1bcc445fb4ff55fa1c78dc0b5`
MD5	`6ff917a6a7ff19169c0815521b3d64b9`
BLAKE2b-256	`c533efcef31632d9011e2a626f4965c58f905c5b88fd30c95ee51f632e41af7a`


Provenance

The following attestation bundles were made for policystrata-0.1.0-py3-none-any.whl:

Publisher: publish.yml on raintree-technology/policystrata

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: policystrata-0.1.0-py3-none-any.whl
- Subject digest: 51d83b728cdf188fdfb43a0f43e3a63fc6fca2e1bcc445fb4ff55fa1c78dc0b5
- Sigstore transparency entry: 1958927781
- Sigstore integration time: Jun 25, 2026
Source repository:
- Permalink: raintree-technology/policystrata@6e1a01aab904c7c46164ee1c953ba5419c3041e9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/raintree-technology
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6e1a01aab904c7c46164ee1c953ba5419c3041e9
- Trigger Event: workflow_dispatch
