
pynop


Async LLM safety pipeline with input/output guardrails (Guardrails-AI, NeMo Guardrails) and Langfuse observability tracing.

Install

pynop is published on PyPI under the distribution name pynop-ai. The Python import name is pynop.

pip install pynop-ai
# or
uv add pynop-ai

import pynop
from pynop import SafetyPipeline

The core install ships with OpenAI support, Guardrails-AI, and Langfuse tracing. Additional providers and tools are available as optional extras:

| Extra | Adds | Use when |
|---|---|---|
| pynop-ai[anthropic] | langchain-anthropic | You configure provider: anthropic in YAML |
| pynop-ai[google] | langchain-google-genai | You configure provider: google in YAML |
| pynop-ai[nemo] | nemoguardrails | You add a type: nemo guard |
| pynop-ai[eval] | garak, giskard | You call EvalRunner.run_garak / run_giskard |
| pynop-ai[all] | All of the above | You want everything |

pip install "pynop-ai[anthropic,nemo]"
# or, install everything
uv add "pynop-ai[all]"

pynop imports the optional dependencies lazily, so selecting a provider or tool you didn't install raises a clear ModuleNotFoundError at from_yaml / run_* time.
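
A minimal sketch of what that failure looks like (the printed message is illustrative, not pynop's exact wording):

from pynop import SafetyPipeline

try:
    # config.yaml selects provider: anthropic, but pynop-ai[anthropic] isn't installed
    pipeline = SafetyPipeline.from_yaml("config.yaml")
except ModuleNotFoundError as exc:
    print(f"missing optional dependency: {exc.name}")  # e.g. langchain_anthropic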

Setup

LLM provider

pynop requires an API key from your chosen LLM provider. Sign up and obtain a key from one of:

  • OpenAI (platform.openai.com)
  • Anthropic (console.anthropic.com)
  • Google AI Studio (aistudio.google.com)

The local provider (Ollama, vLLM, LM Studio) needs no API key.

pynop does not cap or monitor LLM spend. Every pipeline run, and each reask retry, incurs token costs. Cost management is the user's responsibility.

Guardrails-AI validators

Validators must be installed before use via the Guardrails Hub CLI:

guardrails hub install hub://guardrails/detect_pii
guardrails hub install hub://guardrails/toxic_language

Browse available validators at hub.guardrailsai.com. Any validator referenced in your config that is not installed will cause an AttributeError at pipeline startup.

Pin validators to a specific version to avoid silent behavioral changes when a validator package updates:

guardrails hub install "hub://guardrails/detect_pii~=1.4"

Validators do not update automatically. To update to the latest version, run:

guardrails hub install hub://guardrails/detect_pii --upgrade

Langfuse (tracing)

Tracing requires a Langfuse instance. Sign up at langfuse.com or self-host. Obtain a public key and secret key from your project settings.

Environment variables

Set the required env vars before running pynop. Missing vars referenced in config raise a ValueError at startup:

export OPENAI_API_KEY=sk-...          # or your provider's key
export LANGFUSE_PUBLIC_KEY=pk-...     # if tracing is enabled
export LANGFUSE_SECRET_KEY=sk-...     # if tracing is enabled

Usage

import asyncio
from pynop import SafetyPipeline

async def main():
    pipeline = SafetyPipeline.from_yaml("config.yaml")
    result = await pipeline.run("Summarize this document for me.")
    print(result.output)

    # Select an environment profile
    pipeline = SafetyPipeline.from_yaml("config.yaml", env="prod")

asyncio.run(main())

Config

See config.yaml for the default configuration. It supports:

  • LLM: Multi-backend via LangChain — OpenAI, Anthropic, Google, and local (Ollama/vLLM/LM Studio)
  • Guards: Guardrails-AI validators (PII, schema) and NeMo Guardrails (jailbreak, content safety) — configurable per input/output slot, run in config order
  • Tracing: Langfuse observability (optional, auto-reads env vars)
  • Eval thresholds: Configurable pass/fail criteria for evaluation runs
  • Environment profiles: Per-environment config overrides (dev, staging, prod)

LLM providers

# OpenAI
llm:
  provider: openai
  model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}

# Anthropic
llm:
  provider: anthropic
  model: claude-sonnet-4-20250514
  api_key: ${ANTHROPIC_API_KEY}

# Google Gemini
llm:
  provider: google
  model: gemini-2.0-flash
  api_key: ${GOOGLE_API_KEY}

# Local (OpenAI-compatible server — Ollama, vLLM, LM Studio)
llm:
  provider: local
  model: llama3
  base_url: http://localhost:11434/v1
  api_key: not-needed

You can also pass a pre-built LangChain BaseChatModel directly to the constructor (skipping from_yaml):

from langchain_openai import ChatOpenAI
from pynop import SafetyPipeline
from pynop.tracing import Tracer
from pynop.types import GuardSlot

custom_llm = ChatOpenAI(model="gpt-4o", temperature=0)

pipeline = SafetyPipeline(
    llm_config={"provider": "openai", "model": "gpt-4o"},
    input_slot=GuardSlot(),     # add guards if you want input validation
    output_slot=GuardSlot(),    # add guards if you want output validation
    tracer=Tracer(enabled=False),
    llm=custom_llm,
)

Guard slots

Each guard slot (input/output) supports configurable rejection and error strategies:

guards:
  input:
    on_guard_fail: reject           # reject | return_canned | include_reason
    on_guard_error: reject          # reject | pass
    canned_response: "I can't process that request."  # required for return_canned
    guards:
      - type: guardrails_ai
        validators:
          - name: DetectPII
            on_fail: exception
      - type: nemo
        config_path: ./nemo_configs/input_rails

on_guard_fail — what happens when a guard rejects input/output. Set at slot level as a default; individual guards can override:

  • reject (default): raise GuardRejection with generic message
  • return_canned: return a PipelineResult with the canned_response string, skip LLM call
  • include_reason: raise GuardRejection with the guard's rejection reason attached
  • reask (output guards only): re-call the LLM with the rejection reason appended, then re-run all output guards. Falls back to reject after max_reask retries (default: 2)

For example, a per-guard reask override:

guards:
  output:
    on_guard_fail: reject                # slot default
    guards:
      - type: guardrails_ai
        on_guard_fail: reask             # per-guard override
        max_reask: 3
        reask_instruction: "Your response was flagged: {reason}. Rewrite it."
        validators:
          - name: ToxicLanguage
            on_fail: exception
      - type: guardrails_ai
        # inherits slot default: reject
        validators:
          - name: DetectPII
            on_fail: exception

Guard ordering matters when mixing strategies — guards run in config order and stop at the first failure.

on_guard_error — what happens when a guard crashes (unexpected exception):

  • reject (default): treat the error as a guard failure (applies on_guard_fail strategy)
  • pass: log the error, skip the failed guard, continue to next guard
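
From the caller's side these strategies surface either as a normal PipelineResult or as a raised GuardRejection. A minimal sketch, assuming GuardRejection is importable from the top-level pynop package (only the class name appears above; the import path is an assumption):

import asyncio

from pynop import GuardRejection, SafetyPipeline  # GuardRejection import path is an assumption

async def main():
    pipeline = SafetyPipeline.from_yaml("config.yaml")
    try:
        result = await pipeline.run("Some input that trips a guard.")
        # with return_canned, result.output is the configured canned_response
        print(result.output)
    except GuardRejection as exc:
        # with include_reason, the guard's rejection reason is attached to the exception
        print(f"blocked: {exc}")

asyncio.run(main())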

NeMo Guardrails

NeMo guards can be configured in two ways:

Inline rails (recommended) — declare rails by name directly in config. pynop generates the NeMo config automatically. Built-in NeMo rails (jailbreak, content safety, PII) are referenced directly; parameterized rails (topic control) accept custom parameters:

guards:
  input:
    guards:
      - type: nemo
        rails:
          - jailbreak
          - topic_control:
              allowed: [coding, data science]
              denied: [politics, violence]
  output:
    guards:
      - type: nemo
        rails:
          - content_safety
          - pii

Custom config directory — for rails that require custom Colang flows, point to a directory containing a config.yml and .co files:

      - type: nemo
        config_path: ./my_custom_rails

rails and config_path are mutually exclusive on a single guard entry. See nemo_configs/ for custom config examples.
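
A typical custom config directory (file names other than config.yml are illustrative):

my_custom_rails/
├── config.yml       # NeMo Guardrails settings (models, rails)
└── input_flows.co   # custom Colang flows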

Environment profiles

Define per-environment overrides in the environments: section. Each profile replaces entire top-level sections (no deep merge). Sections not defined in a profile fall through to the base config.

eval:
  max_issues: 0

environments:
  dev:
    tracing:
      enabled: false
    eval:
      max_issues: 10
      ignore_severities: [minor]
  prod:
    eval:
      max_issues: 0

With the config above, env="dev" resolves eval to max_issues: 10 with ignore_severities: [minor]; the base max_issues: 0 is replaced outright, not merged. Select an environment via the env parameter or the PYNOP_ENV env var:

pipeline = SafetyPipeline.from_yaml("config.yaml", env="dev")
# or: export PYNOP_ENV=dev

Eval thresholds

The eval: section configures pass/fail criteria for evaluation runs:

eval:
  max_issues: 0                # maximum issues before failing (default: 0)
  ignore_severities: [minor]   # exclude these severity levels from the count
  garak_severities:            # map Garak probe families to severity levels
    dan: major
    glitch: minor
    # unlisted probes default to major

Severity levels are major, medium, and minor. Without an eval: section, the default is zero-tolerance (any issue fails).
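
How a threshold is applied, as a hedged sketch (the counting logic is inferred from the keys above; pynop's internals may differ):

def passes(issue_severities, max_issues=0, ignore_severities=()):
    """Count issues whose severity isn't ignored; pass if within max_issues."""
    counted = [s for s in issue_severities if s not in ignore_severities]
    return len(counted) <= max_issues

passes(["major", "minor"], ignore_severities=["minor"])  # False: the major issue still counts
passes(["minor", "minor"], ignore_severities=["minor"])  # True: every issue is ignored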

Per-tool thresholds

Garak and Giskard can have different thresholds within the same pipeline. Add a garak: or giskard: block under eval: — each block inherits from the top-level defaults and only overrides the keys you set:

eval:
  max_issues: 0
  ignore_severities: [minor]   # default applied to both tools
  garak:
    max_issues: 0              # zero tolerance for vulnerability scans
    ignore_severities: []      # don't ignore minor either
  giskard:
    max_issues: 3              # lenient for quality checks
    ignore_severities: [minor]

Use pipeline.eval_threshold_for("garak") (or "giskard") to inspect the resolved threshold from Python. EvalRunner uses the per-tool threshold automatically when computing EvalResult.passed.
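
For example (the attribute names on the returned threshold object are assumptions):

garak_threshold = pipeline.eval_threshold_for("garak")
print(garak_threshold.max_issues)         # 0, from the garak: block above
print(garak_threshold.ignore_severities)  # [], overriding the top-level default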

Evaluation

Run pre-deployment security evaluations using Garak and Giskard:

import asyncio
from pynop import SafetyPipeline
from pynop.eval import EvalRunner

async def main():
    pipeline = SafetyPipeline.from_yaml("config.yaml")
    runner = EvalRunner(pipeline)

    # Garak vulnerability scan
    garak_result = await runner.run_garak(probes=["dan", "promptinject"])
    print(garak_result.summary)
    print(garak_result.passed)    # uses the eval threshold from config

    # Giskard quality scan
    giskard_result = await runner.run_giskard(detectors=["prompt_injection"])
    print(giskard_result.summary)
    print(giskard_result.issues)

asyncio.run(main())
Both tools evaluate the full pipeline (guards + LLM). Results are traced in Langfuse when tracing is enabled.

Before running evaluations, review the available Garak probe families and Giskard detectors to determine which are relevant to your use case.

pynop does not provide CI/CD integration. EvalRunner returns a Python result object — wiring evaluations into a CI pipeline (e.g. failing a build on low scores) is the user's responsibility.
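
A minimal CI gate built on that result object might look like this (a sketch; probe choice and exit-code wiring are yours):

import asyncio
import sys

from pynop import SafetyPipeline
from pynop.eval import EvalRunner

async def main() -> int:
    pipeline = SafetyPipeline.from_yaml("config.yaml")
    runner = EvalRunner(pipeline)
    result = await runner.run_garak(probes=["dan", "promptinject"])
    print(result.summary)
    return 0 if result.passed else 1  # non-zero exit fails the CI job

sys.exit(asyncio.run(main()))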

Latency benchmarking

LatencyBenchmark compares the per-guard latency of two pipeline configurations side-by-side. It runs a prompt set through both pipelines, fetches the resulting traces from Langfuse, and reports per-span p50/p95/p99.

import asyncio
from pynop import LatencyBenchmark, SafetyPipeline

async def main():
    baseline = SafetyPipeline.from_yaml("config.baseline.yaml")
    candidate = SafetyPipeline.from_yaml("config.candidate.yaml")

    benchmark = LatencyBenchmark(baseline, candidate, label_a="baseline", label_b="candidate")
    report = await benchmark.run([
        "Summarize this document.",
        "Explain quantum computing in one paragraph.",
        "Write a haiku about CI pipelines.",
    ])

    for span in report.stats_a:
        print(f"{span.name:30s} p50={span.p50:.3f}s p95={span.p95:.3f}s")
    print(f"baseline total p95: {report.total_a.p95:.3f}s")
    print(f"candidate total p95: {report.total_b.p95:.3f}s")

asyncio.run(main())

LatencyBenchmark requires both pipelines to have Langfuse tracing enabled — it reads the per-span latency from Langfuse rather than instrumenting timers itself.

Integration testing

The default uv run pytest command runs the unit suite with mocked OpenAI, Langfuse, and Guardrails-AI. End-to-end integration tests live in tests/integration/ and are opt-in — they hit real OpenAI, Garak, Giskard, and Langfuse, so they require API keys and network access.

Enable them by setting PYNOP_INTEGRATION=1:

export PYNOP_INTEGRATION=1
export OPENAI_API_KEY=sk-...
export LANGFUSE_PUBLIC_KEY=pk-...
export LANGFUSE_SECRET_KEY=sk-...
uv run pytest tests/integration/

Without PYNOP_INTEGRATION=1, every test in tests/integration/ is skipped — safe to run on a developer laptop or in PR-level CI.

Development

uv sync
uv run pytest
