Guardrails for checking the inputs and outputs of functions, designed to work with Openlayer tracing.

Openlayer Guardrails

Open source guardrail implementations that work with Openlayer tracing.

Installation

pip install openlayer-guardrails

Usage

Standalone Usage

from openlayer_guardrails import PIIGuardrail

# Create guardrail
pii_guard = PIIGuardrail(
    block_entities={"CREDIT_CARD", "US_SSN"},
    redact_entities={"EMAIL_ADDRESS", "PHONE_NUMBER"}
)

# Check inputs manually
data = {"message": "My email is john@example.com and SSN is 123-45-6789"}
result = pii_guard.check_input(data)

if result.action.value == "block":
    print(f"Blocked: {result.reason}")
elif result.action.value == "modify":
    print(f"Modified data: {result.modified_data}")
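Conceptually, a PII guardrail decides per entity type: entities in the block set stop the request outright, while entities in the redact set are replaced in place. A minimal illustrative sketch of that decision, assuming simplified regex detectors and an `Action` enum (this is not the library's implementation; real PII detection is far more robust):

```python
import re
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    MODIFY = "modify"

# Simplified patterns for illustration only; production PII detection
# covers many more entity types and formats.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def check_text(text, block_entities, redact_entities):
    # Any blocked entity found means the whole input is rejected.
    for entity in block_entities:
        if PATTERNS[entity].search(text):
            return Action.BLOCK, text
    # Otherwise, redactable entities are masked in place.
    modified = text
    for entity in redact_entities:
        modified = PATTERNS[entity].sub(f"[{entity}-REDACTED]", modified)
    return (Action.MODIFY, modified) if modified != text else (Action.ALLOW, text)

action, out = check_text(
    "My email is john@example.com and SSN is 123-45-6789",
    block_entities={"US_SSN"},
    redact_entities={"EMAIL_ADDRESS"},
)
# action is Action.BLOCK: the SSN pattern matches, so no redaction happens
```

The same precedence (block before redact) is what makes the standalone example above print "Blocked" rather than a redacted message when an SSN is present.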

With Openlayer Tracing

from openlayer_guardrails import PIIGuardrail
from openlayer.lib.tracing import trace

# Create guardrail
pii_guard = PIIGuardrail()

# Apply to traced functions
@trace(guardrails=[pii_guard])
def process_user_data(user_input: str):
    return f"Processed: {user_input}"

# PII is automatically handled
result = process_user_data("My email is john@example.com")
# Output: "Processed: My email is [EMAIL-REDACTED]"

Toxicity Guardrail (Brazilian Portuguese)

Detects toxic content in Brazilian Portuguese using the ToxiGuardrailPT model.

pip install 'openlayer-guardrails[toxicity]'

from openlayer_guardrails import ToxicityPTGuardrail

# Create guardrail (default threshold=0.0; positive scores = safe, negative = toxic)
toxicity_guard = ToxicityPTGuardrail()

# Check inputs
result = toxicity_guard.check_input({"message": "Você é um idiota!"})
print(result.action)  # GuardrailAction.BLOCK

# Check outputs with contextual scoring (sentence-pair encoding)
result = toxicity_guard.check_output(
    output="Claro, aqui está a informação solicitada.",
    inputs={"prompt": "Me ajude com meu trabalho."},
)
print(result.action)  # GuardrailAction.ALLOW
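Because the PT-BR model emits a single scalar reward score (positive = safe, negative = toxic), the threshold decision reduces to a one-line comparison. A minimal sketch of that mapping, assuming a `GuardrailAction` enum like the one printed above (the scoring itself is the model's job and is not shown):

```python
from enum import Enum

class GuardrailAction(Enum):
    ALLOW = "allow"
    BLOCK = "block"

def decide(score: float, threshold: float = 0.0) -> GuardrailAction:
    # Positive reward scores are safe; negative scores indicate toxicity.
    # With the default threshold of 0.0, anything below zero is blocked.
    return GuardrailAction.BLOCK if score < threshold else GuardrailAction.ALLOW

print(decide(2.3))   # GuardrailAction.ALLOW
print(decide(-1.7))  # GuardrailAction.BLOCK
```

Raising the threshold above 0.0 makes the guardrail stricter, since mildly positive (safe-leaning) scores would then also be blocked.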

Toxicity Guardrail (English)

Detects toxic content in English across six categories using unitary/toxic-bert.

pip install 'openlayer-guardrails[toxicity]'

from openlayer_guardrails import ToxicityENGuardrail

# Create guardrail (default threshold=0.5)
toxicity_guard = ToxicityENGuardrail()

# Check inputs
result = toxicity_guard.check_input({"message": "You are terrible and should die"})
print(result.action)  # GuardrailAction.BLOCK
print(result.metadata["triggered_categories"])
# e.g. {'toxic': 0.98, 'severe_toxic': 0.72, 'insult': 0.89, 'threat': 0.81}

# Monitor only specific categories
guard = ToxicityENGuardrail(categories={"threat", "severe_toxic"})
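Under the hood, a multi-label classifier like toxic-bert returns an independent probability per category, and the guardrail blocks when any monitored category meets the threshold. An illustrative sketch of that aggregation step (the scores below are hand-written for illustration, not model output):

```python
def triggered(scores, threshold=0.5, categories=None):
    """Return the monitored categories whose probability meets the threshold."""
    monitored = categories or set(scores)
    return {c: s for c, s in scores.items() if c in monitored and s >= threshold}

scores = {"toxic": 0.98, "severe_toxic": 0.72, "obscene": 0.10,
          "threat": 0.81, "insult": 0.89, "identity_hate": 0.05}

print(triggered(scores))
# {'toxic': 0.98, 'severe_toxic': 0.72, 'threat': 0.81, 'insult': 0.89}

# Restrict monitoring to a subset, mirroring the categories= parameter above
print(triggered(scores, categories={"threat", "severe_toxic"}))
# {'severe_toxic': 0.72, 'threat': 0.81}
```

A non-empty result maps to BLOCK (with the triggered categories surfaced in metadata, as shown above); an empty one maps to ALLOW.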

Handling long texts

By default, all guardrails truncate inputs to 512 tokens for fast inference. To evaluate the full text, enable chunking mode by setting max_length=None:

guard = ToxicityPTGuardrail(max_length=None)   # or ToxicityENGuardrail(max_length=None)

In chunking mode, long texts are split into overlapping 512-token windows and each window is scored independently. The most toxic score across all windows is used. Latency scales linearly with the number of chunks.
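The chunking strategy can be sketched as follows; `score` stands in for the model's tokenize-and-classify step, and the 64-token overlap is an illustrative choice, not necessarily the library's actual stride:

```python
def max_toxicity(tokens, score, window=512, overlap=64):
    """Split tokens into overlapping windows and keep the worst score."""
    if len(tokens) <= window:
        return score(tokens)
    stride = window - overlap
    chunks = [tokens[i:i + window] for i in range(0, len(tokens), stride)]
    # Each window is scored independently; the most toxic one wins.
    return max(score(chunk) for chunk in chunks)

# Toy scorer: pretend each "token" carries its own toxicity value,
# with one toxic token sitting past position 512.
tokens = [0.1] * 600 + [0.9] + [0.1] * 100
print(max_toxicity(tokens, score=max))  # 0.9 — caught despite being past token 512
```

Plain truncation would have missed the toxic token at position 600; chunking trades that blind spot for latency that grows linearly with `len(tokens) / stride`.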

Model Limitations

Prompt Injection Guardrail

Model: meta-llama/Prompt-Guard-86M
Max tokens: 512
Language: English
Parameters: 86M
Scope: Input-only (outputs are not checked)

Texts longer than 512 tokens are truncated. Only the first 512 tokens are evaluated.

Toxicity Guardrail (PT-BR)

Model: nicholasKluge/ToxiGuardrailPT
Max tokens: 512
Language: Brazilian Portuguese
Parameters: 109M
Architecture: BERTimbau (bert-base-portuguese-cased)
Output type: Single scalar reward score (positive = safe, negative = toxic)
Reported accuracy: 70.36% (hatecheck-portuguese), 74.04% (told-br)
Scope: Input and output (output uses sentence-pair encoding for context)

By default, texts longer than 512 tokens are truncated. Set max_length=None to enable chunking for full-text coverage. The model was trained on Brazilian Portuguese data and may not generalize well to European Portuguese or other languages.

Toxicity Guardrail (EN)

Model: unitary/toxic-bert
Max tokens: 512 (chunking available via max_length=None)
Language: English
Parameters: 110M
Architecture: BERT (bert-base-uncased)
Output type: Multi-label probabilities across 6 categories
Categories: toxic, severe_toxic, obscene, threat, insult, identity_hate
Reported AUC: 0.98636 (Jigsaw Toxic Comment Challenge)
Scope: Input and output

By default, texts longer than 512 tokens are truncated. Set max_length=None to enable chunking for full-text coverage. The model was trained on English data and may not generalize well to other languages.
