
PrivySHA (privacy focused secure hashing library) — drop-in security + optimization layer for LLM apps (developer preview)


PrivySHA

Status: Developer Preview (v0.3.0) — Early development. APIs and features may change before 1.0.0. See Developer Preview for scope, limitations, and how to give feedback.

Privacy-first prompt compilation for AI systems

Transform raw prompts into optimized, structured, privacy-safe prompts before they reach LLMs.



Quick try (60 seconds)

pip install -e .
python examples/developer_preview_demo.py

from privysha import process

result = process(
    "My email is alex@company.com — analyze this dataset.",
    return_metrics=True,
)
print(result["optimized"])

Scope (what to expect in 0.x)

| Included now | Preview / evolving | Not yet |
|---|---|---|
| process(), wrap_llm(), optimize(), sanitize() | PrivyFit (recommend_local_model) | Stable 1.0 API guarantee |
| PII masking, token compression | Agent, routing, pipeline stages | Enterprise compliance reports |
| CLI demo & benchmarks | Multi-provider routing at scale | Full HF catalog tooling |

Full details: Developer Preview · Roadmap


Overview

PrivySHA is an open-source prompt optimization and compilation framework designed for modern AI applications.

Instead of sending raw user prompts directly to Large Language Models, PrivySHA introduces a compiler-style processing pipeline that transforms prompts into structured, optimized instructions.

This improves:

  • privacy
  • token efficiency
  • prompt reliability
  • system observability

PrivySHA acts as a prompt compiler layer between your application and any LLM.

Release track: 0.3.0 developer preview — use for feedback and experiments. Requires Python 3.10+. Install with pip install privysha (or pip install -e . from source). Primary APIs: process() and wrap_llm().


Motivation

Most LLM applications look like this:

User Prompt → LLM

This causes problems:

| Problem | Result |
|---|---|
| Unstructured prompts | Inconsistent responses |
| Excess tokens | Higher API costs |
| PII leakage | Privacy risk |
| Prompt drift | Unreliable outputs |
| No observability | Hard debugging |

PrivySHA introduces a structured pipeline:

User Prompt → PrivySHA → Optimized Prompt → LLM
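
The idea of placing a processing layer in front of the model can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not PrivySHA's implementation: `with_preprocessing`, `mask_emails`, and `echo_llm` are hypothetical names, and the email pattern is deliberately simplistic.

```python
import re
from typing import Callable

def mask_emails(prompt: str) -> str:
    """Hypothetical preprocessing step: replace email-like tokens."""
    return re.sub(r"\S+@\S+\.\S+", "<EMAIL_HASH>", prompt)

def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call; it just reports what it received."""
    return f"LLM saw: {prompt}"

def with_preprocessing(
    llm: Callable[[str], str],
    preprocess: Callable[[str], str],
) -> Callable[[str], str]:
    """Return a callable that preprocesses the prompt before calling the model."""
    def wrapped(prompt: str) -> str:
        return llm(preprocess(prompt))
    return wrapped

safe_llm = with_preprocessing(echo_llm, mask_emails)
reply = safe_llm("Email alex@company.com the report")
print(reply)
```

Because the wrapper has the same signature as the model call, application code does not need to change when the layer is added or removed.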

Key Features

Privacy-First Processing

PrivySHA detects and masks sensitive information such as:

  • email addresses
  • phone numbers
  • personal identifiers

Example:

Input

John's email is john@email.com analyze this dataset

Output

<PERSON_HASH> email <EMAIL_HASH> analyze dataset
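
As a rough idea of what rule-based masking involves, here is a minimal sketch using regular expressions. It is not PrivySHA's detector: the patterns are simplified assumptions, `mask_pii` is a hypothetical helper, and the `<PHONE_HASH>` placeholder is made up for this example (only `<EMAIL_HASH>` and `<PERSON_HASH>` appear above).

```python
import re

# Simplified patterns for illustration only; real PII rules are broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def mask_pii(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholder tokens."""
    text = EMAIL_RE.sub("<EMAIL_HASH>", text)
    text = PHONE_RE.sub("<PHONE_HASH>", text)  # hypothetical placeholder name
    return text

masked = mask_pii("John's email is john@email.com, phone +1 415-555-0100")
print(masked)
```

Names such as "John" are not caught by patterns like these, which is one reason a model-based detector can help with person names.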

PrivyFit — Local Model Advisor

Recommend local LLMs for your app's compiled workload on your hardware:

from privysha import recommend_local_model

report = recommend_local_model(
    prompts=["My email is john@x.com — analyze this dataset."],
    mode="strict",
    top=3,
)
print(report.top_pick.ollama_pull_name)

CLI: privysha recommend --prompts ./samples.json --gpu "RTX 4090"

See docs/local-advisor.md.


Prompt Sanitization

Removes conversational filler.

Example

Hey bro can you analyze this dataset for anomalies?

becomes

analyze dataset for anomalies
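
A minimal sketch of filler removal, assuming a small hand-written phrase list. `FILLER_PHRASES` and `strip_filler` are hypothetical and are not taken from PrivySHA's code; the real sanitizer may use different rules.

```python
import re

# Hypothetical filler list for illustration; not PrivySHA's actual word list.
FILLER_PHRASES = ["can you", "could you", "please", "hey", "bro", "this"]

def strip_filler(prompt: str) -> str:
    """Lowercase the prompt, drop filler phrases, and tidy whitespace."""
    text = prompt.lower()
    for phrase in FILLER_PHRASES:
        text = re.sub(rf"\b{re.escape(phrase)}\b", " ", text)
    text = re.sub(r"[?!.]+\s*$", "", text)  # drop trailing punctuation
    return " ".join(text.split())

cleaned = strip_filler("Hey bro can you analyze this dataset for anomalies?")
print(cleaned)
```

Lowercasing keeps the sketch short, but it would also alter case-sensitive content such as identifiers, so a production sanitizer would likely preserve case.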

Prompt AST

PrivySHA converts prompts into structured representations.

Example

intent: analyze
object: dataset
task: anomaly_detection

This allows the system to perform compiler-style optimizations.
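
One way to picture such a structured representation is a small dataclass. The sketch below is an assumption about shape only: `PromptNode` and `parse_prompt` are hypothetical, and the keyword matching is a toy stand-in for a real parser.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class PromptNode:
    """Hypothetical structured form of a prompt."""
    intent: str
    obj: str   # "object" in the example above; renamed to avoid the Python built-in
    task: str

def parse_prompt(text: str) -> PromptNode:
    """Toy keyword-based parser; a real parser would cover far more cases."""
    words = text.lower().split()
    intent = "analyze" if "analyze" in words else "unknown"
    obj = "dataset" if "dataset" in words else "unknown"
    task = "anomaly_detection" if "anomalies" in words else "general"
    return PromptNode(intent=intent, obj=obj, task=task)

node = parse_prompt("analyze dataset for anomalies")
print(asdict(node))
```

Once the prompt is data rather than free text, later stages can inspect or rewrite individual fields instead of editing strings.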


Token Optimization

Prompts are compressed to reduce token usage.

Example

Analyze this dataset for anomalies and patterns

becomes

@analyze(dataset)
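
To make the compact form concrete, here is a sketch that maps a sentence to `@intent(object)` using two lookup tables. The tables and the `compress` function are hypothetical; PrivySHA's optimizer may work quite differently.

```python
# Hypothetical vocabularies; illustrative only.
KNOWN_INTENTS = {"analyze", "summarize", "translate"}
KNOWN_OBJECTS = {"dataset", "document", "text"}

def compress(prompt: str) -> str:
    """Return '@intent(object)' when both parts are recognized, else the prompt."""
    words = [w.strip(".,?!").lower() for w in prompt.split()]
    intent = next((w for w in words if w in KNOWN_INTENTS), None)
    target = next((w for w in words if w in KNOWN_OBJECTS), None)
    if intent is None or target is None:
        return prompt  # leave unrecognized prompts untouched
    return f"@{intent}({target})"

compact = compress("Analyze this dataset for anomalies and patterns")
print(compact)
```

Compression of this kind is lossy: the compact form above no longer mentions anomalies or patterns, so it is worth checking that the compressed prompt still carries what the task needs.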

Modular Prompt Pipeline

PrivySHA processes prompts through multiple stages.

User Prompt
   │
   ▼
Parser
   │
   ▼
Sanitizer
   │
   ▼
PII Detection
   │
   ▼
Optimizer
   │
   ▼
Context Injector
   │
   ▼
Prompt Compiler
   │
   ▼
Model Adapter
   │
   ▼
LLM Response

Each stage can be customized or replaced.
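
The "customize or replace a stage" idea can be sketched with a pipeline that is just a list of functions. Everything here is hypothetical (`run_pipeline`, `drop_filler`, `mask_emails`, `redact_emails`) and shows the pattern only, not PrivySHA's stage interface.

```python
import re
from typing import Callable, List

Stage = Callable[[str], str]

def drop_filler(text: str) -> str:
    """Remove a couple of filler words."""
    return " ".join(w for w in text.split() if w.lower() not in {"hey", "bro"})

def mask_emails(text: str) -> str:
    """Default masking stage."""
    return re.sub(r"\S+@\S+", "<EMAIL_HASH>", text)

def redact_emails(text: str) -> str:
    """Replacement stage with a different placeholder."""
    return re.sub(r"\S+@\S+", "[REDACTED]", text)

def run_pipeline(prompt: str, stages: List[Stage]) -> str:
    """Feed the prompt through each stage in order."""
    for stage in stages:
        prompt = stage(prompt)
    return prompt

default_stages: List[Stage] = [drop_filler, mask_emails]
# Swap one stage without touching the others.
custom_stages = [redact_emails if s is mask_emails else s for s in default_stages]

default_out = run_pipeline("Hey bro analyze dataset from john@example.com", default_stages)
custom_out = run_pipeline("Hey bro analyze dataset from john@example.com", custom_stages)
print(default_out)
print(custom_out)
```

Keeping every stage to the same `str -> str` shape is what makes swapping one out a one-line change in this sketch.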


Installation

Basic Installation (Lightweight)

pip install privysha

Instant setup: no model downloads, and rule-based PII detection works immediately.

Advanced Features (Optional)

For ML-enhanced PII detection and advanced features:

pip install "privysha[ml]"

ML features include:

  • Enhanced PII detection with transformer models
  • Higher accuracy for complex PII patterns
  • Context-aware entity recognition

Provider-Specific

# Quote the extras so shells such as zsh do not expand the brackets

# OpenAI support
pip install "privysha[openai]"

# Anthropic Claude support
pip install "privysha[anthropic]"

# Google Gemini support
pip install "privysha[gemini]"

# All providers + ML features
pip install "privysha[all]"

Requirements:

  • Python 3.10+

Quick Start

Drop-in Functions (Easiest)

from privysha import process

# Simple processing
result = process("Hey bro analyze my dataset with john@example.com")
print(result)  # "analyze dataset with <EMAIL_HASH>"

# With ML-enhanced PII detection (requires pip install privysha[ml])
result = process("Contact john@example.com for details", pii_mode="hybrid")
print(result)  # Enhanced PII detection with transformer models

Agent Class (Full Control)

from privysha import Agent

agent = Agent(
    model="mock",  # Use "gpt-4o-mini" for OpenAI, "llama3" for Ollama
    privacy=True,
    token_budget=1200
)

response = agent.run(
    "Hey bro can you analyze this dataset for anomalies?"
)

print(response)

PrivySHA automatically:

  1. sanitizes the prompt
  2. removes personal language
  3. masks sensitive data
  4. optimizes token usage
  5. compiles a structured prompt

Progressive Enhancement

Choose your PII detection level:

# Rule-based only (lightweight, default)
process("Contact john@example.com", pii_mode="rule")

# Hybrid: Rules + ML (requires pip install privysha[ml])
process("Contact john@example.com", pii_mode="hybrid")

# ML-only (experimental, requires pip install privysha[ml])
process("Contact john@example.com", pii_mode="ml_only")
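
To illustrate what "rules + ML" can mean in practice, the sketch below merges character spans from two detectors and masks the union. It is an assumption about the general technique, not PrivySHA's hybrid mode: `fake_ml_spans` is a stand-in for a model, and the `<PII>` token and all function names are hypothetical.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]
Detector = Callable[[str], List[Span]]

def rule_spans(text: str) -> List[Span]:
    """Rule-based detector: email-like matches."""
    return [m.span() for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)]

def fake_ml_spans(text: str) -> List[Span]:
    """Stand-in for a model-based detector; it only 'recognizes' the name John."""
    i = text.find("John")
    return [(i, i + 4)] if i >= 0 else []

def merge_spans(spans: List[Span]) -> List[Span]:
    """Sort spans and merge any that overlap or touch."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def mask(text: str, detectors: List[Detector], token: str = "<PII>") -> str:
    """Mask the union of all spans reported by the detectors."""
    spans = merge_spans([s for d in detectors for s in d(text)])
    parts, last = [], 0
    for start, end in spans:
        parts.append(text[last:start])
        parts.append(token)
        last = end
    parts.append(text[last:])
    return "".join(parts)

hybrid = mask("Contact John at john@example.com", [rule_spans, fake_ml_spans])
print(hybrid)
```

Merging before masking avoids double replacement when both detectors flag the same region.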

Usage Examples

Model Providers

OpenAI (Requires API Key)

import os
from privysha import Agent

os.environ["OPENAI_API_KEY"] = "your-api-key"

agent = Agent(model="gpt-4o-mini")
response = agent.run("Analyze this data")

Ollama (Requires Local Server)

# Install and start Ollama (shell)
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve          # keeps running; use a second terminal for the next command
ollama pull llama3

from privysha import Agent

agent = Agent(model="llama3")
response = agent.run("Analyze this data")

HuggingFace (Requires Transformers)

from privysha import Agent

agent = Agent(model="microsoft/DialoGPT-medium")
response = agent.run("Analyze this data")

Real-World Applications

Data Analysis Pipeline

from privysha import Agent

agent = Agent(model="gpt-4o-mini", privacy=True)

def analyze_data(data_description):
    prompt = f"Analyze this dataset for patterns: {data_description}"
    return agent.run(prompt)

# Usage
result = analyze_data("Sales data from Q1 2024 with customer emails")

Customer Support

from privysha import Agent

agent = Agent(model="gpt-4o-mini", privacy=True)

def support_query(customer_message):
    # PII will be automatically masked
    return agent.run(customer_message)

# Usage
response = support_query("Help me with order #12345, email john@example.com")

Content Moderation

from privysha import Agent

agent = Agent(model="gpt-4o-mini", privacy=True)

def moderate_content(user_content):
    return agent.run(f"Review this content for policy violations: {user_content}")

# Usage
moderation_result = moderate_content("Check this post from user@social.com")

Debugging Prompt Transformations

PrivySHA exposes the full pipeline trace.

result = agent.run(prompt, trace=True)

print(result)

Example output

RAW PROMPT
Hey bro analyze this dataset

SANITIZED
analyze dataset

OPTIMIZED
@analyze(dataset)

COMPILED
SYSTEM:
You are a data scientist

TASK:
analyze dataset

This allows developers to debug prompt engineering systematically.
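
A trace like the one above can be produced by recording the output of each named stage. The following is a minimal sketch of that idea with hypothetical stage functions; it does not reproduce PrivySHA's actual trace format or keys beyond those shown in this README.

```python
from typing import Callable, Dict, List, Tuple

NamedStage = Tuple[str, Callable[[str], str]]

def drop_filler(text: str) -> str:
    """Toy sanitizer."""
    return " ".join(w for w in text.split() if w.lower() not in {"hey", "bro", "this"})

def to_compact(text: str) -> str:
    """Toy optimizer: 'verb noun' becomes '@verb(noun)'."""
    words = text.split()
    return f"@{words[0]}({words[1]})" if len(words) == 2 else text

def run_with_trace(prompt: str, stages: List[NamedStage]) -> Dict[str, str]:
    """Run the stages in order and keep every intermediate result."""
    trace = {"raw_prompt": prompt}
    current = prompt
    for name, stage in stages:
        current = stage(current)
        trace[name] = current
    return trace

trace = run_with_trace(
    "Hey bro analyze this dataset",
    [("sanitized", drop_filler), ("optimized", to_compact)],
)
for name, value in trace.items():
    print(f"{name.upper()}\n{value}\n")
```

Comparing adjacent entries in the trace shows which stage introduced an unexpected change.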


Production Deployment

Security Best Practices

import os
from privysha import Agent

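# Read the API key from the environment; never hard-code it.
# Fail fast if it is missing (assigning None to os.environ raises TypeError).
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set")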

# Production configuration
agent = Agent(
    model="gpt-4o-mini",
    privacy=True,  # Always enable in production
    token_budget=2000  # Adjust based on your needs
)

def process_prompt(user_input):
    """Process user input with privacy protection"""
    try:
        response = agent.run(user_input)
        return response
    except Exception as e:
        return f"Error processing prompt: {e}"

Monitoring & Debugging

import time
from privysha import Agent

agent = Agent(model="gpt-4o-mini", privacy=True)

def monitored_process(prompt):
    start_time = time.time()
    
    result = agent.run(prompt, trace=True)
    
    processing_time = time.time() - start_time
    
    # Log metrics (without sensitive data)
    print(f"Processing time: {processing_time:.2f}s")
    # len() counts characters, not tokens, so report it as prompt length
    print(f"Prompt length (chars): {len(result['raw_prompt'])} -> {len(result['optimized'])}")
    
    return result["response"]
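
If you want metrics that are safe to ship to a log aggregator, one option is to record only numbers and never the prompt text. The helper below is a sketch under that assumption; `measure` is a hypothetical name and it works with any `str -> str` callable, not specifically with `Agent`.

```python
import time
from typing import Callable, Dict, Tuple

def measure(fn: Callable[[str], str], prompt: str) -> Tuple[str, Dict[str, float]]:
    """Call fn(prompt) and return its output plus numeric-only metrics."""
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    output = fn(prompt)
    elapsed_ms = (time.perf_counter() - start) * 1000
    metrics = {
        "elapsed_ms": round(elapsed_ms, 2),
        "chars_in": len(prompt),
        "chars_out": len(output),
    }
    return output, metrics

output, metrics = measure(str.upper, "hello a@b.com")
print(metrics)
```

The metrics dictionary contains no text from the prompt, so it can be logged without a separate redaction step.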

Testing Your Setup

from privysha import Agent

# Test without external services
agent = Agent(model="mock", privacy=True)
response = agent.run("Test prompt with email@example.com")
print(response)

# Test pipeline stages
result = agent.run("Hey bro analyze my dataset john@example.com", trace=True)

# Verify PII masking
assert "john@example.com" not in result["sanitized"]
assert "<EMAIL_HASH>" in result["sanitized"]

# Verify sanitization
assert "bro" not in result["sanitized"]

Supported Model Providers

PrivySHA integrates with multiple model providers.

| Provider | Type |
|---|---|
| OpenAI | Hosted APIs |
| Ollama | Local LLM runtime |
| HuggingFace | Transformer models |

Example:

Agent(model="gpt-4o-mini")

or

Agent(model="llama3")

Architecture

PrivySHA follows a compiler-inspired, modular pipeline architecture.

flowchart LR
    UserInput[User Input] --> Security[Security Stage]
    Security --> IR[IR Generation]
    IR --> Routing[Model Routing]
    Routing --> Compile[Compilation]
    Compile --> Optimize[Optimization]
    Optimize --> LLM[LLM Provider]
    LLM --> Result[Result Assembly]

Package layout:

privysha/
├── agent.py                 # High-level Agent API
├── utils/
│   ├── dropin.py            # process(), wrap_llm(), optimize(), sanitize()
│   └── pii_detector.py      # Rule-based PII detection
├── pipeline/
│   ├── pipeline.py          # 7-stage orchestrator
│   └── stages/              # Security, IR, Routing, Compilation, Optimization, Generation, Result
├── core/
│   └── pii_pipeline/        # Multi-stage PII detection pipeline
├── compiler/
│   └── msdpc/               # Token optimization engine
├── security/                # Threat detection and masking
├── adapters/                # OpenAI, Claude, Gemini, Grok, Ollama, HuggingFace, Mock
├── integrations/            # FastAPI, Flask, Django, LangChain, LlamaIndex
├── cli/                     # privysha command-line tool
└── ir/                      # Prompt intermediate representation

Documentation

Build the docs site locally:

pip install -e ".[docs]"
mkdocs serve

Optional integrations: pip install "privysha[integrations]" or pip install "privysha[fastapi,langchain,instructor]"

See docs/publishing.md for PyPI trusted publishing setup.


Running Tests

# Unit tests (no API keys required)
pytest -m "not integration"

# Full suite including integration tests (requires GEMINI_API_KEY)
pytest

Or run the readiness check:

pytest tests/comprehensive_test.py -v

Tests validate:

  • prompt sanitization
  • token optimization
  • pipeline execution
  • PII masking
  • adapter functionality

Troubleshooting

Common Issues

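  1. Import error: when working from a source checkout, run pip install -e . from the repository root
  2. Connection refused: start the Ollama server (ollama serve) or check your API keys
  3. Memory issues: reduce token_budget or use a smaller model
  4. PII not masked: make sure the agent is created with privacy=True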

Debug Mode

# Enable full debugging
result = agent.run(prompt, trace=True)

# Print all stages
for stage, output in result.items():
    if stage != "response":
        print(f"{stage.upper()}:")
        print(f"  {output}")
        print()

Comparison

| Feature | PrivySHA | Traditional Prompting |
|---|---|---|
| Prompt Sanitization | Yes | No |
| PII Protection | Yes | No |
| Token Optimization | Yes | No |
| Pipeline Debugging | Yes | No |

PrivySHA introduces a structured prompt lifecycle rather than raw prompt usage.


Performance Benchmarks

Reproducible benchmarks are included in the repo. Typical results (rule-based PII, no ML):

| Metric | Typical range |
|---|---|
| Token reduction | 5–15% on verbose prompts |
| Processing latency | 20–80 ms |
| Fail-safe rate | ~100% |

To reproduce:

pip install -e .
python benchmarks/run_benchmarks.py --save

Results are written to benchmarks/output/. See benchmarks/results.md for methodology and reference numbers. Benchmarks also run in CI on every push.
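
For a quick sanity check on your own prompts, a reduction percentage can be estimated as (before - after) / before. The sketch below uses whitespace-separated words as a rough stand-in for tokens, which is an assumption: real tokenizers count differently, and this is not the repo's benchmark script.

```python
def token_reduction_pct(before: str, after: str) -> float:
    """Percentage reduction in whitespace-separated words (a rough token proxy)."""
    n_before = len(before.split())
    n_after = len(after.split())
    if n_before == 0:
        return 0.0
    return 100.0 * (n_before - n_after) / n_before

pct = token_reduction_pct(
    "Hey bro can you analyze this dataset for anomalies?",
    "analyze dataset for anomalies",
)
print(f"{pct:.1f}%")
```

This hand-picked prompt is mostly filler, so its reduction is far above the typical range quoted above; realistic prompts usually leave much less to remove.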


Contributing

Contributions are welcome.

Steps:

  1. Fork the repository
  2. Create a feature branch
  3. Write tests for your changes
  4. Submit a pull request

Before submitting:

pytest

Roadmap

Future versions will include:

  • advanced prompt AST analysis
  • prompt caching engine
  • cost-aware optimization
  • multi-model routing

License

This project is licensed under the Apache 2.0 License.

See the LICENSE file for details.


Acknowledgements

PrivySHA is inspired by ideas from modern AI tooling ecosystems and compiler design.

It explores the idea of treating prompts as structured programs rather than raw text.


Support the Project

If you find this project useful:

  • ⭐ Star the repository
  • 🐛 Report issues
  • 💡 Suggest improvements

