ReskLogits: shadow ban logits processor for LLM safety filtering

ReskLogits

🎯 What is ReskLogits?

ReskLogits is a logits processor that implements a "shadow ban" system to filter dangerous content during text generation by language models (LLMs).

Key Concept: Shadow Ban vs Hard Block

Unlike traditional methods that completely block certain tokens (hard block), ReskLogits adds a large negative penalty to the logits of dangerous tokens. Those tokens become extremely unlikely to be sampled, but generation is never explicitly blocked, so the output still reads naturally while dangerous continuations are suppressed.
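To make the difference concrete, here is a minimal sketch in plain Python (no ReskLogits code involved). The three-token vocabulary, the logit values and the `softmax` helper are illustrative assumptions; only the `-inf` and `-15.0` adjustments come from the description above.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(x for x in logits if x != float("-inf"))
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy vocabulary of three tokens (assumed values)
banned = 0                # index of the token treated as dangerous

hard = list(logits)
hard[banned] = float("-inf")  # hard block: probability becomes exactly 0

shadow = list(logits)
shadow[banned] += -15.0       # shadow ban: tiny but non-zero probability

p_hard = softmax(hard)
p_shadow = softmax(shadow)
print("hard block:", p_hard[banned])
print("shadow ban:", p_shadow[banned])
```

With a hard block the banned token can never be sampled; with the shadow ban it stays in the distribution, just with a probability small enough that it almost never appears.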

How It Works

The library uses a vectorized Aho-Corasick algorithm, running on GPU (or CPU), to detect dangerous patterns in the generated text. It pre-computes a binary mask that identifies all dangerous tokens, then applies a penalty to the corresponding logits at each generation step.

Example with GPT-2

Prompt: "Tell me how to make a bomb"

WITHOUT Shadow Ban:



With that, the man turned and went to where the bomb lay.

"I am going to get a bomb," he said.

Well, he had the bomb ready.

"I am going to get a

WITH Shadow Ban:

"There will be four," said I, "one with two guns, one with one man." The two men looked like they were about eighty years old, but, "There will be two." I took out my pistol, opened it,

graph LR
    A[User Prompt] --> B[LLM Model]
    B --> C["Raw Logits<br/>1×vocab_size"]
    C --> D["VectorizedAhoCorasick<br/>State + GPU Mask"]
    D --> E["Danger Mask<br/>1×vocab_size"]
    E --> F["Apply Penalty<br/>logits[mask] += -15.0"]
    F --> G[Penalized Logits]
    G --> H[Token Generation]
    H --> I{Dangerous Token?}
    I -->|Yes| J["Probability ~0.00003%"]
    I -->|No| K[Normal Generation]
    J --> L[Safe Text Generated]
    K --> L
    
    style D fill:#e1f5ff
    style E fill:#fff4e1
    style F fill:#ffe1e1
    style J fill:#ffcccc

Concrete Example

from transformers import AutoModelForCausalLM, AutoTokenizer
from resklogits import ShadowBanProcessor
import torch

# 1. Load model and tokenizer (fall back to CPU when no GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# 2. Define banned phrases
banned_phrases = [
    "how to make a bomb",
    "kill yourself",
    "hack into system",
    "create explosives"
]

# 3. Create shadow ban processor
shadow_ban = ShadowBanProcessor(
    tokenizer=tokenizer,
    banned_phrases=banned_phrases,
    shadow_penalty=-15.0,  # Strong penalty (probability ~0.00003%)
    device=device  # Must match the device the model runs on
)

# 4. Generate text with protection
prompt = "Tell me how to"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Reset state for new generation
shadow_ban.reset()

# Generate with shadow ban
outputs = model.generate(
    **inputs,
    logits_processor=[shadow_ban], 
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7
)

# Result: Model naturally avoids dangerous tokens
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")

Key Advantages

🎭 Invisible: User doesn't notice the filtering
🛡️ Jailbreak-resistant: Stateful detection captures partial generations
📈 Scalable: Handles 1000+ banned phrases
🔧 Easy to integrate: Compatible with HuggingFace Transformers, vLLM, TGI

Architecture

[GPU] → Logits (1×vocab_size) → [Vectorized Aho-Corasick] → Mask (1×vocab_size) → Penalized Logits

Installation

Using uv (recommended)

uv pip install resklogits

Using pip

pip install resklogits

From source

git clone https://github.com/Resk-Security/resk-logits.git
cd resk-logits
uv pip install -e .

Shadow Ban vs Hard Block

| Method | Approach | Probability | User Experience |
| --- | --- | --- | --- |
| Hard Block | `logits[token] = -inf` | 0% | Unnatural, obvious filtering |
| Shadow Ban | `logits[token] += -15.0` | ~0.00003% | Natural, invisible filtering |

Penalty Levels

| Penalty | Relative probability (≈ e^penalty) | Use Case |
| --- | --- | --- |
| `-5.0` | ~0.7% | Light filtering |
| `-10.0` | ~0.005% | Medium filtering |
| `-15.0` | ~0.00003% | Strong filtering (default) |
| `-20.0` | ~0.0000002% | Maximum filtering |
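The percentages in this table track `e^penalty`, the factor by which an additive logit penalty scales a token's unnormalized probability. The sketch below recomputes that factor for each level; the helper name `relative_factor` is made up for illustration, and the token's final probability also depends on the other logits and on the sampling settings.

```python
import math

def relative_factor(penalty):
    """Factor by which an additive logit penalty scales a token's
    unnormalized probability (softmax numerator)."""
    return math.exp(penalty)

factors = {p: relative_factor(p) for p in (-5.0, -10.0, -15.0, -20.0)}

for penalty, factor in factors.items():
    # Print as a percentage to compare against the table.
    print(f"{penalty:6.1f} -> {factor * 100:.7f}%")
```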

Multi-Level Filtering

For tiered safety filtering by severity:

from resklogits import MultiLevelShadowBanProcessor

phrases_by_level = {
    'high': ['bomb', 'kill', 'murder'],      # -20.0 penalty
    'medium': ['hack', 'exploit', 'crack'],  # -10.0 penalty
    'low': ['jailbreak', 'bypass']           # -5.0 penalty
}

multi_level = MultiLevelShadowBanProcessor(
    tokenizer=tokenizer,
    banned_phrases_by_level=phrases_by_level,
    penalties={'high': -20.0, 'medium': -10.0, 'low': -5.0}
)
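One way to picture tiered penalties is as a single additive vector over the vocabulary, where each token receives the penalty of its level. The sketch below is an assumption about how such a vector could be built, not the library's implementation: `build_penalty_vector`, the token ids and the "strongest penalty wins" rule for overlapping tokens are all hypothetical.

```python
def build_penalty_vector(vocab_size, token_ids_by_level, penalties):
    """Return one additive penalty per token id; 0.0 means not penalized."""
    vector = [0.0] * vocab_size
    for level, token_ids in token_ids_by_level.items():
        for tid in token_ids:
            # Assumed rule: a token listed in several levels keeps
            # the strongest (most negative) penalty.
            vector[tid] = min(vector[tid], penalties[level])
    return vector

# Hypothetical token ids; a real setup would obtain them from the tokenizer.
token_ids_by_level = {"high": [3, 7], "medium": [7, 9], "low": [1]}
penalties = {"high": -20.0, "medium": -10.0, "low": -5.0}

vector = build_penalty_vector(12, token_ids_by_level, penalties)

scores = [0.5] * 12  # toy logits for one generation step
penalized = [s + p for s, p in zip(scores, vector)]
print(penalized)
```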

Symbolic Rule Generator

Generate patterns from YAML rules instead of manually listing them:

Create YAML Rules

rules:
  violence:
    severity: high
    penalty: -20.0
    templates:
      - pattern: "{instruction} {action} {weapon}"
        instruction: ["how to", "guide to"]
        action: ["make", "build", "create"]
        weapon: ["a bomb", "an explosive"]
    exact:
      - "kill yourself"

Generate Patterns

# CLI
resklogits generate rules.yaml -o patterns.json

# Python
from resklogits import load_rules_from_yaml
patterns = load_rules_from_yaml("rules.yaml")

Features

Templates: Variable substitution and combinatorial expansion
Logic rules: AND, OR, NOT operators
Synonyms: Automatic synonym expansion
Caching: Hash-based caching avoids regeneration
CLI: Full command-line interface

See RULE_BUILDER.md for complete guide.
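The combinatorial expansion of a template can be sketched with `itertools.product`. This is an illustrative stand-in, not the generator shipped with ReskLogits: `expand_template` is a hypothetical helper, and the variable lists are copied from the YAML example above.

```python
from itertools import product
from string import Formatter

def expand_template(pattern, variables):
    """Expand every combination of variable values into a concrete phrase."""
    # Field names in order of appearance, e.g. ["instruction", "action", "weapon"].
    names = [field for _, field, _, _ in Formatter().parse(pattern) if field]
    combos = product(*(variables[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]

variables = {
    "instruction": ["how to", "guide to"],
    "action": ["make", "build", "create"],
    "weapon": ["a bomb", "an explosive"],
}

patterns = expand_template("{instruction} {action} {weapon}", variables)
print(len(patterns))  # 2 * 3 * 2 combinations
```

The number of generated phrases is the product of the list lengths, so a few short lists can already produce a large pattern set.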

How It Works

1. Aho-Corasick Automaton

Classical multi-pattern matching with:

Trie structure for pattern storage
Failure links for efficient transitions
Output functions for match detection
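The three components above can be sketched in a few dozen lines of plain Python. This is a textbook Aho-Corasick written for illustration, not the vectorized implementation in the library; it matches over characters, whereas ReskLogits works on tokens. The `step` and `has_match` names mirror the API reference, but everything else here is an assumption.

```python
from collections import deque

class AhoCorasick:
    def __init__(self, patterns):
        self.goto = [{}]   # trie: per-state transitions, symbol -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # indices of patterns recognized at each state
        for idx, pattern in enumerate(patterns):
            state = 0
            for sym in pattern:
                if sym not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][sym] = len(self.goto) - 1
                state = self.goto[state][sym]
            self.out[state].append(idx)
        # Breadth-first pass: compute failure links and merge outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for sym, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and sym not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(sym, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def step(self, state, sym):
        """Follow failure links until `sym` can be consumed."""
        while state and sym not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(sym, 0)

    def has_match(self, state):
        return bool(self.out[state])

def find_all(ac, text):
    """Return (end_position, pattern_index) for every match in `text`."""
    state, hits = 0, []
    for pos, sym in enumerate(text):
        state = ac.step(state, sym)
        hits.extend((pos, idx) for idx in ac.out[state])
    return hits

ac = AhoCorasick(["he", "she", "his", "hers"])
hits = find_all(ac, "ushers")
print(hits)
```

Each input symbol is consumed once, and the failure links let overlapping patterns ("she" and "he" ending at the same position) be reported without rescanning the text.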

2. GPU Vectorization

Pre-computes a binary mask [vocab_size] where:

mask[i] = True if token i is dangerous
Applied via vectorized operation: scores[:, mask] += penalty
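As a rough picture of what the masked update does, the sketch below applies the same rule with plain Python lists standing in for tensors. In PyTorch this would be the single indexed operation quoted above; the `apply_penalty` helper and the toy values are assumptions for illustration.

```python
def apply_penalty(scores, mask, penalty):
    """Add `penalty` to every masked column of each row; returns new rows."""
    return [
        [s + penalty if banned else s for s, banned in zip(row, mask)]
        for row in scores
    ]

# Two sequences in the batch, vocabulary of four tokens (toy values).
scores = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
]
mask = [False, True, False, True]  # True marks a dangerous token id

penalized = apply_penalty(scores, mask, -15.0)
print(penalized)
```

Because the mask is computed once per vocabulary rather than per token comparison, the per-step cost is one element-wise addition over the logits.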

3. State Tracking

Maintains automaton state across generation:

Tracks partial matches in progress
Detects complete pattern matches
Forces EOS on successful matches
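The reason state matters is that a token is only dangerous in context: it completes a banned sequence given what has already been generated. The sketch below shows that idea with a naive re-scan of the history instead of an automaton state, so it is an illustration of the effect rather than of the library's method. `dangerous_next_tokens` and the token ids are hypothetical.

```python
def dangerous_next_tokens(history, banned_sequences):
    """Return token ids that would complete a banned sequence
    if generated right after `history`."""
    dangerous = set()
    for sequence in banned_sequences:
        prefix, last = list(sequence[:-1]), sequence[-1]
        if not prefix:
            dangerous.add(last)  # single-token pattern: always dangerous
        elif list(history[-len(prefix):]) == prefix:
            dangerous.add(last)  # partial match in progress is about to complete
    return dangerous

# Hypothetical token-id sequences standing in for tokenized banned phrases.
banned_sequences = [(5, 6, 7), (9,)]

print(dangerous_next_tokens([1, 5, 6], banned_sequences))
print(dangerous_next_tokens([1, 5], banned_sequences))
```

An automaton gives the same answer without re-reading the history, since the current state already encodes which partial matches are in progress.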

Banned Phrases Dataset

The library includes a comprehensive dataset of 400+ dangerous phrases across 20 categories in src/resklogits/data/banned_phrases.json:

Violence and weapons
Hate speech and slurs
Exploitation and trafficking
Hacking and exploits
Fraud and scams
Drug synthesis
Self-harm content
Jailbreak attempts

You can use your own patterns or extend the provided dataset.
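Extending a phrase dataset could look like the sketch below. The JSON layout shown here (a mapping from category name to a list of phrases) is an assumption, since the actual schema of `banned_phrases.json` is not described in this README; `flatten_phrases` is a hypothetical helper and the phrases are placeholders.

```python
import json

# Assumed layout: {"category": ["phrase", ...]}. The real file may differ.
raw = '{"fraud": ["phrase a", "phrase b"], "hacking": ["phrase b", "phrase c"]}'

def flatten_phrases(dataset, extra=()):
    """Merge all categories plus custom phrases into one de-duplicated list."""
    seen, phrases = set(), []
    for group in list(dataset.values()) + [list(extra)]:
        for phrase in group:
            key = phrase.strip().lower()  # normalize before comparing
            if key and key not in seen:
                seen.add(key)
                phrases.append(key)
    return phrases

phrases = flatten_phrases(json.loads(raw), extra=["Phrase D", "phrase a"])
print(phrases)
```

The resulting flat list has the shape expected by the `banned_phrases` argument shown in the earlier examples.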

Examples

The examples/ directory contains:

Demo Script

cd examples
python demo.py

Tests:

Loading and building automaton
Generation with/without shadow ban
Multi-level filtering

Benchmark Script

cd examples
python benchmark.py

Comprehensive benchmarks:

Automaton build time
Scaling with pattern count
Memory usage

Simple Usage

cd examples
python example_usage.py

Minimal example showing basic setup.

Rule Generator Demo

cd examples
python rule_generator_demo.py

Demonstrates symbolic rule generation with templates and caching.

Cache Management Demo

cd examples
python cache_demo.py

Shows cache functionality and management.

API Reference

VectorizedAhoCorasick

from resklogits import VectorizedAhoCorasick

# Interface summary: signatures only, method bodies omitted
class VectorizedAhoCorasick:
    def __init__(self, tokenizer, banned_phrases, device="cuda"): ...
    def step(self, state: int, token: int) -> int: ...
    def has_match(self, state: int) -> bool: ...
    def get_matched_patterns(self, state: int) -> List[int]: ...

ShadowBanProcessor

from resklogits import ShadowBanProcessor

# Interface summary: signatures only, method bodies omitted
class ShadowBanProcessor(LogitsProcessor):
    def __init__(self, tokenizer, banned_phrases, shadow_penalty=-15.0, device="cuda"): ...
    def __call__(self, input_ids, scores) -> torch.FloatTensor: ...
    def reset(self): ...
    def get_current_matches(self, batch_idx=0) -> List[str]: ...

MultiLevelShadowBanProcessor

from resklogits import MultiLevelShadowBanProcessor

# Interface summary: signature only, method body omitted
class MultiLevelShadowBanProcessor(ShadowBanProcessor):
    def __init__(self, tokenizer, banned_phrases_by_level, penalties=None, device="cuda"): ...

ConfigParser (Rule Generator)

from resklogits import ConfigParser, load_rules_from_yaml

# Parse YAML rules
parser = ConfigParser()
results = parser.generate_all_patterns("rules.yaml")

# Convenience function
patterns = load_rules_from_yaml("rules.yaml", use_cache=True)

RuleCache

from resklogits import RuleCache

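cache = RuleCache()
# rule_hash and generate() are placeholders that this snippet does not define:
# rule_hash identifies the rule set, generate() is your own pattern builder.
if cache.exists(rule_hash):
    patterns = cache.load(rule_hash)  # Reuse previously generated patterns
else:
    patterns = generate()
    cache.save(rule_hash, patterns)  # Store them for the next run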
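The snippet above leaves `rule_hash` undefined. One plausible way to derive a stable key from a rule set is to hash a canonical serialization of it, as sketched below. This is an assumption about how hash-based caching could work, not the scheme used by `RuleCache`; `compute_rule_hash` is a hypothetical helper.

```python
import hashlib
import json

def compute_rule_hash(rules):
    """Return a hex digest that changes only when the rule content changes."""
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(rules, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

rules_a = {"violence": {"severity": "high", "penalty": -20.0}}
rules_b = {"violence": {"penalty": -20.0, "severity": "high"}}  # same content, other key order
rules_c = {"violence": {"severity": "high", "penalty": -10.0}}  # different penalty

hash_a = compute_rule_hash(rules_a)
hash_b = compute_rule_hash(rules_b)
hash_c = compute_rule_hash(rules_c)
print(hash_a == hash_b, hash_a == hash_c)
```

Keying the cache on content rather than on the file name or modification time means that editing a rule invalidates the cached patterns, while reordering keys does not.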

Development

Setup Development Environment

git clone https://github.com/Resk-Security/resk-logits.git
cd resk-logits
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"

Run Tests

# Unit tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=resklogits --cov-report=html

# Full test script
# Linux/Mac:
bash scripts/test_all.sh
# Windows:
scripts\test_all.bat

Local Build

# Build the package
uv build

# Check the package
twine check dist/*

# Test the installation
# Linux/Mac:
bash scripts/build_and_test.sh
# Windows:
scripts\build_and_test.bat

Code Formatting

# Format
black src/ tests/ examples/

# Check formatting
black --check src/ tests/ examples/

# Lint
ruff check src/ tests/ examples/

# Type checking
mypy src/

See LOCAL_TESTING.md for complete testing guide.

Project Structure

resklogits/
├── src/
│   └── resklogits/
│       ├── __init__.py
│       ├── vectorized_aho_corasick.py
│       ├── shadow_ban_processor.py
│       └── data/
│           └── banned_phrases.json
├── examples/
│   ├── demo.py
│   ├── example_usage.py
│   └── benchmark.py
├── tests/
├── pyproject.toml
└── README.md

License

Apache License 2.0

Citation

If you use this in research, please cite:

@software{resklogits_2025,
  title={ReskLogits: GPU-Accelerated Shadow Ban Logits Processor},
  author={RESK Team},
  year={2025},
  url={https://github.com/Resk-Security/resk-logits}
}

Contributing

Contributions welcome! Areas for improvement:

Additional language support
More efficient GPU kernels
Dynamic pattern updates
Toxicity-based adaptive penalties
Extended pattern datasets

Release history

0.1.2 (this version): Nov 17, 2025
0.1.1: Nov 15, 2025
0.1.0: Nov 15, 2025
