Skip to main content

Guardrails that can be used to check inputs and outputs of functions and works well with Openlayer tracing.

Project description

Openlayer Guardrails

Open source guardrail implementations that work with Openlayer tracing.

Installation

pip install openlayer-guardrails

Usage

Standalone Usage

from openlayer_guardrails import PIIGuardrail

# Create guardrail
pii_guard = PIIGuardrail(
    block_entities={"CREDIT_CARD", "US_SSN"},
    redact_entities={"EMAIL_ADDRESS", "PHONE_NUMBER"}
)

# Check inputs manually
data = {"message": "My email is john@example.com and SSN is 123-45-6789"}
result = pii_guard.check_input(data)

if result.action.value == "block":
    print(f"Blocked: {result.reason}")
elif result.action.value == "modify":
    print(f"Modified data: {result.modified_data}")

With Openlayer Tracing

from openlayer_guardrails import PIIGuardrail
from openlayer.lib.tracing import trace

# Create guardrail
pii_guard = PIIGuardrail()

# Apply to traced functions
@trace(guardrails=[pii_guard])
def process_user_data(user_input: str):
    return f"Processed: {user_input}"

# PII is automatically handled
result = process_user_data("My email is john@example.com")
# Output: "Processed: My email is [EMAIL-REDACTED]"

Toxicity Guardrail (Brazilian Portuguese)

Detects toxic content in Brazilian Portuguese using the ToxiGuardrailPT model.

pip install 'openlayer-guardrails[toxicity]'
from openlayer_guardrails import ToxicityPTGuardrail

# Create guardrail (default threshold=0.0; positive scores = safe, negative = toxic)
toxicity_guard = ToxicityPTGuardrail()

# Check inputs
result = toxicity_guard.check_input({"message": "Você é um idiota!"})
print(result.action)  # GuardrailAction.BLOCK

# Check outputs with contextual scoring (sentence-pair encoding)
result = toxicity_guard.check_output(
    output="Claro, aqui está a informação solicitada.",
    inputs={"prompt": "Me ajude com meu trabalho."},
)
print(result.action)  # GuardrailAction.ALLOW

Toxicity Guardrail (English)

Detects toxic content in English across six categories using unitary/toxic-bert.

pip install 'openlayer-guardrails[toxicity]'
from openlayer_guardrails import ToxicityENGuardrail

# Create guardrail (default threshold=0.5)
toxicity_guard = ToxicityENGuardrail()

# Check inputs
result = toxicity_guard.check_input({"message": "You are terrible and should die"})
print(result.action)  # GuardrailAction.BLOCK
print(result.metadata["triggered_categories"])
# e.g. {'toxic': 0.98, 'severe_toxic': 0.72, 'insult': 0.89, 'threat': 0.81}

# Monitor only specific categories
guard = ToxicityENGuardrail(categories={"threat", "severe_toxic"})

Handling long texts

By default, all guardrails truncate inputs to 512 tokens for fast inference. To evaluate the full text, enable chunking mode by setting max_length=None:

guard = ToxicityPTGuardrail(max_length=None)   # or ToxicityENGuardrail(max_length=None)

In chunking mode, long texts are split into overlapping 512-token windows and each window is scored independently. The most toxic score across all windows is used. Latency scales linearly with the number of chunks.

Model Limitations

Prompt Injection Guardrail

Property Value
Model meta-llama/Prompt-Guard-86M
Max tokens 512
Language English
Parameters 86M
Scope Input-only (outputs are not checked)

Texts longer than 512 tokens are truncated. Only the first 512 tokens are evaluated.

Toxicity Guardrail (PT-BR)

Property Value
Model nicholasKluge/ToxiGuardrailPT
Max tokens 512
Language Brazilian Portuguese
Parameters 109M
Architecture BERTimbau (bert-base-portuguese-cased)
Output type Single scalar reward score (positive = safe, negative = toxic)
Reported accuracy 70.36% (hatecheck-portuguese), 74.04% (told-br)
Scope Input and output (output uses sentence-pair encoding for context)

By default, texts longer than 512 tokens are truncated. Set max_length=None to enable chunking for full-text coverage. The model was trained on Brazilian Portuguese data and may not generalize well to European Portuguese or other languages.

Toxicity Guardrail (EN)

Property Value
Model unitary/toxic-bert
Max tokens 512 (chunking available via max_length=None)
Language English
Parameters 110M
Architecture BERT (bert-base-uncased)
Output type Multi-label probabilities across 6 categories
Categories toxic, severe_toxic, obscene, threat, insult, identity_hate
Reported AUC 0.98636 (Jigsaw Toxic Comment Challenge)
Scope Input and output

By default, texts longer than 512 tokens are truncated. Set max_length=None to enable chunking for full-text coverage. The model was trained on English data.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openlayer_guardrails-0.4.1.tar.gz (166.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

openlayer_guardrails-0.4.1-py3-none-any.whl (19.3 kB view details)

Uploaded Python 3

File details

Details for the file openlayer_guardrails-0.4.1.tar.gz.

File metadata

  • Download URL: openlayer_guardrails-0.4.1.tar.gz
  • Upload date:
  • Size: 166.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for openlayer_guardrails-0.4.1.tar.gz
Algorithm Hash digest
SHA256 a60096da8ae4f2aa4152f0dbcb3b8ecc9da1c048aed45bc78b288999713a1760
MD5 1fb29c6e72fb57c6958749a389efc79b
BLAKE2b-256 3f4abbcf83832eb680e917bbf2a77112fb1dbd39a5bc2eb9b91a7e5b46254135

See more details on using hashes here.

File details

Details for the file openlayer_guardrails-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: openlayer_guardrails-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for openlayer_guardrails-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 82d215eb9bf463f2e8ef860c50ccb8b6c9fb214e179c3cfc7e984573b2c86c3a
MD5 02f57965923e5a715fbd9eefe707d3ba
BLAKE2b-256 7cb9aef720bb837ebcc2d83f5316b27e30182bb825838deb497833d02b74e1fd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page