Guardrails that check the inputs and outputs of functions and work well with Openlayer tracing.

Project description

Openlayer Guardrails

Open source guardrail implementations that work with Openlayer tracing.

Installation

pip install openlayer-guardrails

Usage

Standalone Usage

from openlayer_guardrails import PIIGuardrail

# Create guardrail
pii_guard = PIIGuardrail(
    block_entities={"CREDIT_CARD", "US_SSN"},
    redact_entities={"EMAIL_ADDRESS", "PHONE_NUMBER"}
)

# Check inputs manually
data = {"message": "My email is john@example.com and SSN is 123-45-6789"}
result = pii_guard.check_input(data)

if result.action.value == "block":
    print(f"Blocked: {result.reason}")
elif result.action.value == "modify":
    print(f"Modified data: {result.modified_data}")

With Openlayer Tracing

from openlayer_guardrails import PIIGuardrail
from openlayer.lib.tracing import trace

# Create guardrail
pii_guard = PIIGuardrail()

# Apply to traced functions
@trace(guardrails=[pii_guard])
def process_user_data(user_input: str):
    return f"Processed: {user_input}"

# PII is automatically handled
result = process_user_data("My email is john@example.com")
# Output: "Processed: My email is [EMAIL-REDACTED]"
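
Because guardrails takes a list, multiple guardrails can be attached to a single traced function. A sketch under that assumption (the evaluation order across guardrails is not documented here):

from openlayer_guardrails import PIIGuardrail, ToxicityENGuardrail
from openlayer.lib.tracing import trace

pii_guard = PIIGuardrail()
toxicity_guard = ToxicityENGuardrail()  # requires the [toxicity] extra

@trace(guardrails=[pii_guard, toxicity_guard])
def answer_user(user_input: str) -> str:
    # Both guardrails are applied to the traced call.
    return f"Answer for: {user_input}"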

Toxicity Guardrail (Brazilian Portuguese)

Detects toxic content in Brazilian Portuguese using the ToxiGuardrailPT model.

pip install 'openlayer-guardrails[toxicity]'

from openlayer_guardrails import ToxicityPTGuardrail

# Create guardrail (default threshold=0.0; positive scores = safe, negative = toxic)
toxicity_guard = ToxicityPTGuardrail()

# Check inputs
result = toxicity_guard.check_input({"message": "Você é um idiota!"})
print(result.action)  # GuardrailAction.BLOCK

# Check outputs with contextual scoring (sentence-pair encoding)
result = toxicity_guard.check_output(
    output="Claro, aqui está a informação solicitada.",
    inputs={"prompt": "Me ajude com meu trabalho."},
)
print(result.action)  # GuardrailAction.ALLOW
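
The threshold can be adjusted at construction time. A sketch assuming the constructor accepts a threshold argument (implied by the default threshold=0.0 above) and that texts scoring below it are blocked, per the "positive = safe, negative = toxic" convention:

# Stricter: also block borderline texts that score only slightly positive (safe)
strict_guard = ToxicityPTGuardrail(threshold=0.2)

# More permissive: block only clearly negative (toxic) scores
lenient_guard = ToxicityPTGuardrail(threshold=-0.5)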

Toxicity Guardrail (English)

Detects toxic content in English across six categories using unitary/toxic-bert.

pip install 'openlayer-guardrails[toxicity]'

from openlayer_guardrails import ToxicityENGuardrail

# Create guardrail (default threshold=0.5)
toxicity_guard = ToxicityENGuardrail()

# Check inputs
result = toxicity_guard.check_input({"message": "You are terrible and should die"})
print(result.action)  # GuardrailAction.BLOCK
print(result.metadata["triggered_categories"])
# e.g. {'toxic': 0.98, 'severe_toxic': 0.72, 'insult': 0.89, 'threat': 0.81}

# Monitor only specific categories
guard = ToxicityENGuardrail(categories={"threat", "severe_toxic"})
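
The two options can be combined. For instance, a guard that only blocks high-confidence threats, assuming the threshold is compared against each category's probability:

# Assumed semantics: block only when P(threat) >= 0.8
threat_guard = ToxicityENGuardrail(threshold=0.8, categories={"threat"})
result = threat_guard.check_input({"message": "Some user message to screen"})
print(result.action)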

Handling long texts

By default, all guardrails truncate inputs to 512 tokens for fast inference. To evaluate the full text, enable chunking mode by setting max_length=None:

guard = ToxicityPTGuardrail(max_length=None)   # or ToxicityENGuardrail(max_length=None)

In chunking mode, long texts are split into overlapping 512-token windows and each window is scored independently. The most toxic score across all windows is used. Latency scales linearly with the number of chunks.
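
Conceptually, the chunking pass works like the sketch below. The 256-token stride and the score_window callable are illustrative; the library's actual overlap is not specified here, and "most toxic" assumes higher scores mean more toxic (as with the English model's probabilities):

def score_long_text(tokens: list, score_window) -> float:
    # Illustrative only: score overlapping 512-token windows, keep the worst one.
    window, stride = 512, 256  # stride/overlap amount is an assumption
    if len(tokens) <= window:
        return score_window(tokens)
    scores = [
        score_window(tokens[start:start + window])
        for start in range(0, len(tokens) - window + stride, stride)
    ]
    return max(scores)  # the most toxic window determines the result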

Model Limitations

Prompt Injection Guardrail

  • Model: meta-llama/Prompt-Guard-86M
  • Max tokens: 512
  • Language: English
  • Parameters: 86M
  • Scope: Input-only (outputs are not checked)

Texts longer than 512 tokens are truncated. Only the first 512 tokens are evaluated.
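
No usage snippet is shown above for this guardrail. Assuming it follows the same interface as the other guardrails, usage would look roughly as follows; the class name PromptInjectionGuardrail is an assumption, not confirmed by this page:

from openlayer_guardrails import PromptInjectionGuardrail  # class name assumed

injection_guard = PromptInjectionGuardrail()

# Input-only, per the table above: outputs are not checked.
result = injection_guard.check_input({"message": "Ignore all previous instructions."})
print(result.action)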

Toxicity Guardrail (PT-BR)

  • Model: nicholasKluge/ToxiGuardrailPT
  • Max tokens: 512
  • Language: Brazilian Portuguese
  • Parameters: 109M
  • Architecture: BERTimbau (bert-base-portuguese-cased)
  • Output type: Single scalar reward score (positive = safe, negative = toxic)
  • Reported accuracy: 70.36% (hatecheck-portuguese), 74.04% (told-br)
  • Scope: Input and output (output uses sentence-pair encoding for context)

By default, texts longer than 512 tokens are truncated. Set max_length=None to enable chunking for full-text coverage. The model was trained on Brazilian Portuguese data and may not generalize well to European Portuguese or other languages.

Toxicity Guardrail (EN)

  • Model: unitary/toxic-bert
  • Max tokens: 512 (chunking available via max_length=None)
  • Language: English
  • Parameters: 110M
  • Architecture: BERT (bert-base-uncased)
  • Output type: Multi-label probabilities across 6 categories
  • Categories: toxic, severe_toxic, obscene, threat, insult, identity_hate
  • Reported AUC: 0.98636 (Jigsaw Toxic Comment Challenge)
  • Scope: Input and output

By default, texts longer than 512 tokens are truncated. Set max_length=None to enable chunking for full-text coverage. The model was trained on English data.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openlayer_guardrails-0.3.0.tar.gz (164.0 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

openlayer_guardrails-0.3.0-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file openlayer_guardrails-0.3.0.tar.gz.

File metadata

  • Download URL: openlayer_guardrails-0.3.0.tar.gz
  • Size: 164.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7

File hashes

Hashes for openlayer_guardrails-0.3.0.tar.gz

  • SHA256: a9b065f8c7acf9703b3d25c2e4bf01cf702299a461d4376c9bc606181048aad4
  • MD5: 20d28d0d8542b7677ff1f8c64253b0f2
  • BLAKE2b-256: 05ae5f031b575cdb5c32f224bd3b104ab89729d3d7e28cc4e1d3ea7d08089e13

See more details on using hashes here.
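
As a quick local check, the published SHA256 digest above can be verified with Python's standard hashlib:

import hashlib

# SHA256 digest published above for the 0.3.0 source distribution
EXPECTED = "a9b065f8c7acf9703b3d25c2e4bf01cf702299a461d4376c9bc606181048aad4"

with open("openlayer_guardrails-0.3.0.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()

assert actual == EXPECTED, "SHA256 mismatch: do not install this file"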

File details

Details for the file openlayer_guardrails-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: openlayer_guardrails-0.3.0-py3-none-any.whl
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7

File hashes

Hashes for openlayer_guardrails-0.3.0-py3-none-any.whl

  • SHA256: afcdc21646b443386513f616b0b712f4ed3c7ec147a268a28bd3b11fd2c9e4e4
  • MD5: 8a1dbf90d9e4fa7665c2db16b9c4d0b3
  • BLAKE2b-256: 05c0e650db55ec3bfb5edb68a422c30d92fcfe8007874892d1fd946ef3fb4001

See more details on using hashes here.
