Guardrails that check the inputs and outputs of functions and work well with Openlayer tracing.

Project description

Openlayer Guardrails

Open source guardrail implementations that work with Openlayer tracing.

Installation

pip install openlayer-guardrails

Usage

Standalone Usage

from openlayer_guardrails import PIIGuardrail

# Create guardrail
pii_guard = PIIGuardrail(
    block_entities={"CREDIT_CARD", "US_SSN"},
    redact_entities={"EMAIL_ADDRESS", "PHONE_NUMBER"}
)

# Check inputs manually
data = {"message": "My email is john@example.com and SSN is 123-45-6789"}
result = pii_guard.check_input(data)

if result.action.value == "block":
    print(f"Blocked: {result.reason}")
elif result.action.value == "modify":
    print(f"Modified data: {result.modified_data}")

With Openlayer Tracing

from openlayer_guardrails import PIIGuardrail
from openlayer.lib.tracing import trace

# Create guardrail
pii_guard = PIIGuardrail()

# Apply to traced functions
@trace(guardrails=[pii_guard])
def process_user_data(user_input: str):
    return f"Processed: {user_input}"

# PII is automatically handled
result = process_user_data("My email is john@example.com")
# Output: "Processed: My email is [EMAIL-REDACTED]"
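
Because guardrails takes a list, multiple guardrails can be attached to a single traced function. A sketch under that assumption (the evaluation order across guardrails is not documented here):

from openlayer_guardrails import PIIGuardrail, ToxicityENGuardrail
from openlayer.lib.tracing import trace

pii_guard = PIIGuardrail()
toxicity_guard = ToxicityENGuardrail()  # requires the [toxicity] extra

@trace(guardrails=[pii_guard, toxicity_guard])
def answer_user(user_input: str) -> str:
    # Both guardrails are applied to the traced call.
    return f"Answer for: {user_input}"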

Toxicity Guardrail (Brazilian Portuguese)

Detects toxic content in Brazilian Portuguese using the ToxiGuardrailPT model.

pip install 'openlayer-guardrails[toxicity]'

from openlayer_guardrails import ToxicityPTGuardrail

# Create guardrail (default threshold=0.0; positive scores = safe, negative = toxic)
toxicity_guard = ToxicityPTGuardrail()

# Check inputs
result = toxicity_guard.check_input({"message": "Você é um idiota!"})
print(result.action)  # GuardrailAction.BLOCK

# Check outputs with contextual scoring (sentence-pair encoding)
result = toxicity_guard.check_output(
    output="Claro, aqui está a informação solicitada.",
    inputs={"prompt": "Me ajude com meu trabalho."},
)
print(result.action)  # GuardrailAction.ALLOW
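
The threshold can be adjusted at construction time. A sketch assuming the constructor accepts a threshold argument (implied by the default threshold=0.0 above) and that texts scoring below it are blocked, per the "positive = safe, negative = toxic" convention:

# Stricter: also block borderline texts that score only slightly positive (safe)
strict_guard = ToxicityPTGuardrail(threshold=0.2)

# More permissive: block only clearly negative (toxic) scores
lenient_guard = ToxicityPTGuardrail(threshold=-0.5)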

Toxicity Guardrail (English)

Detects toxic content in English across six categories using unitary/toxic-bert.

pip install 'openlayer-guardrails[toxicity]'

from openlayer_guardrails import ToxicityENGuardrail

# Create guardrail (default threshold=0.5)
toxicity_guard = ToxicityENGuardrail()

# Check inputs
result = toxicity_guard.check_input({"message": "You are terrible and should die"})
print(result.action)  # GuardrailAction.BLOCK
print(result.metadata["triggered_categories"])
# e.g. {'toxic': 0.98, 'severe_toxic': 0.72, 'insult': 0.89, 'threat': 0.81}

# Monitor only specific categories
guard = ToxicityENGuardrail(categories={"threat", "severe_toxic"})
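
The two options can be combined. For instance, a guard that only blocks high-confidence threats, assuming the threshold is compared against each category's probability:

# Assumed semantics: block only when P(threat) >= 0.8
threat_guard = ToxicityENGuardrail(threshold=0.8, categories={"threat"})
result = threat_guard.check_input({"message": "Some user message to screen"})
print(result.action)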

Handling long texts

By default, all guardrails truncate inputs to 512 tokens for fast inference. To evaluate the full text, enable chunking mode by setting max_length=None:

guard = ToxicityPTGuardrail(max_length=None)   # or ToxicityENGuardrail(max_length=None)

In chunking mode, long texts are split into overlapping 512-token windows and each window is scored independently. The most toxic score across all windows is used. Latency scales linearly with the number of chunks.
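
Conceptually, the chunking pass works like the sketch below. The 256-token stride and the score_window callable are illustrative; the library's actual overlap is not specified here, and "most toxic" assumes higher scores mean more toxic (as with the English model's probabilities):

def score_long_text(tokens: list, score_window) -> float:
    # Illustrative only: score overlapping 512-token windows, keep the worst one.
    window, stride = 512, 256  # stride/overlap amount is an assumption
    if len(tokens) <= window:
        return score_window(tokens)
    scores = [
        score_window(tokens[start:start + window])
        for start in range(0, len(tokens) - window + stride, stride)
    ]
    return max(scores)  # the most toxic window determines the result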

Model Limitations

Prompt Injection Guardrail

  • Model: meta-llama/Prompt-Guard-86M
  • Max tokens: 512
  • Language: English
  • Parameters: 86M
  • Scope: Input-only (outputs are not checked)

Texts longer than 512 tokens are truncated. Only the first 512 tokens are evaluated.
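
No usage snippet is shown above for this guardrail. Assuming it follows the same interface as the other guardrails, usage would look roughly as follows; the class name PromptInjectionGuardrail is an assumption, not confirmed by this page:

from openlayer_guardrails import PromptInjectionGuardrail  # class name assumed

injection_guard = PromptInjectionGuardrail()

# Input-only, per the table above: outputs are not checked.
result = injection_guard.check_input({"message": "Ignore all previous instructions."})
print(result.action)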

Toxicity Guardrail (PT-BR)

  • Model: nicholasKluge/ToxiGuardrailPT
  • Max tokens: 512
  • Language: Brazilian Portuguese
  • Parameters: 109M
  • Architecture: BERTimbau (bert-base-portuguese-cased)
  • Output type: Single scalar reward score (positive = safe, negative = toxic)
  • Reported accuracy: 70.36% (hatecheck-portuguese), 74.04% (told-br)
  • Scope: Input and output (output uses sentence-pair encoding for context)

By default, texts longer than 512 tokens are truncated. Set max_length=None to enable chunking for full-text coverage. The model was trained on Brazilian Portuguese data and may not generalize well to European Portuguese or other languages.

Toxicity Guardrail (EN)

  • Model: unitary/toxic-bert
  • Max tokens: 512 (chunking available via max_length=None)
  • Language: English
  • Parameters: 110M
  • Architecture: BERT (bert-base-uncased)
  • Output type: Multi-label probabilities across 6 categories
  • Categories: toxic, severe_toxic, obscene, threat, insult, identity_hate
  • Reported AUC: 0.98636 (Jigsaw Toxic Comment Challenge)
  • Scope: Input and output

By default, texts longer than 512 tokens are truncated. Set max_length=None to enable chunking for full-text coverage. The model was trained on English data.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openlayer_guardrails-0.3.0.tar.gz (164.0 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

openlayer_guardrails-0.3.0-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file openlayer_guardrails-0.3.0.tar.gz.

File metadata

  • Download URL: openlayer_guardrails-0.3.0.tar.gz
  • Size: 164.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7

File hashes

Hashes for openlayer_guardrails-0.3.0.tar.gz

  • SHA256: a9b065f8c7acf9703b3d25c2e4bf01cf702299a461d4376c9bc606181048aad4
  • MD5: 20d28d0d8542b7677ff1f8c64253b0f2
  • BLAKE2b-256: 05ae5f031b575cdb5c32f224bd3b104ab89729d3d7e28cc4e1d3ea7d08089e13

See more details on using hashes here.
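
As a quick local check, the published SHA256 digest above can be verified with Python's standard hashlib:

import hashlib

# SHA256 digest published above for the 0.3.0 source distribution
EXPECTED = "a9b065f8c7acf9703b3d25c2e4bf01cf702299a461d4376c9bc606181048aad4"

with open("openlayer_guardrails-0.3.0.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

assert actual == EXPECTED, "SHA256 mismatch: do not install this file"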

File details

Details for the file openlayer_guardrails-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: openlayer_guardrails-0.3.0-py3-none-any.whl
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7

File hashes

Hashes for openlayer_guardrails-0.3.0-py3-none-any.whl

  • SHA256: afcdc21646b443386513f616b0b712f4ed3c7ec147a268a28bd3b11fd2c9e4e4
  • MD5: 8a1dbf90d9e4fa7665c2db16b9c4d0b3
  • BLAKE2b-256: 05c0e650db55ec3bfb5edb68a422c30d92fcfe8007874892d1fd946ef3fb4001

See more details on using hashes here.
