PrivySHA (privacy-focused secure hashing library): drop-in security and optimization layer for LLM apps (developer preview)
PrivySHA
Status: Developer Preview (v0.3.0). Early development: APIs and features may change before 1.0.0. See Developer Preview for scope, limitations, and how to give feedback.
PrivySHA: privacy-focused secure hashing library
Privacy-first prompt compilation for AI systems
Transform raw prompts into optimized, structured, privacy-safe prompts before they reach LLMs.
Quick try (60 seconds)
pip install -e .
python examples/developer_preview_demo.py
from privysha import process
result = process(
"My email is alex@company.com — analyze this dataset.",
return_metrics=True,
)
print(result["optimized"])
Scope (what to expect in 0.x)
| Included now | Preview / evolving | Not yet |
|---|---|---|
| process(), wrap_llm(), optimize(), sanitize() | PrivyFit (recommend_local_model) | Stable 1.0 API guarantee |
| PII masking, token compression | Agent, routing, pipeline stages | Enterprise compliance reports |
| CLI demo & benchmarks | Multi-provider routing at scale | Full HF catalog tooling |
Full details: Developer Preview · Roadmap
Overview
PrivySHA is an open-source prompt optimization and compilation framework designed for modern AI applications.
Instead of sending raw user prompts directly to Large Language Models, PrivySHA introduces a compiler-style processing pipeline that transforms prompts into structured, optimized instructions.
This improves:
- privacy
- token efficiency
- prompt reliability
- system observability
PrivySHA acts as a prompt compiler layer between your application and any LLM.
Release track:
0.3.0 developer preview: use for feedback and experiments. Requires Python 3.10+. Install with pip install privysha (or pip install -e . from source). Primary APIs: process() and wrap_llm().
Motivation
Most LLM applications look like this:
User Prompt → LLM
This causes problems:
| Problem | Result |
|---|---|
| Unstructured prompts | inconsistent responses |
| Excess tokens | higher API costs |
| PII leakage | privacy risk |
| Prompt drift | unreliable outputs |
| No observability | hard debugging |
PrivySHA introduces a structured pipeline:
User Prompt → PrivySHA → Optimized Prompt → LLM
Key Features
Privacy-First Processing
PrivySHA detects and masks sensitive information such as:
- email addresses
- phone numbers
- personal identifiers
Example:
Input
John's email is john@email.com analyze this dataset
Output
<PERSON_HASH> email <EMAIL_HASH> analyze dataset
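Rule-based masking can be pictured as pattern matching followed by substitution. The sketch below is a minimal, hypothetical illustration using Python's re module; the regexes, the mask_pii() name, and the <PHONE_HASH> token are assumptions, not PrivySHA's actual patterns or output format.

```python
import re

# Hypothetical, simplified patterns; production PII detectors cover many more formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("<EMAIL_HASH>", text)   # mask emails first
    text = PHONE_RE.sub("<PHONE_HASH>", text)   # then digit runs that look like phone numbers
    return text


print(mask_pii("John's email is john@email.com, call +1 555 123 4567"))
```

Names such as "John" are not caught by patterns like these; detecting them generally needs a name list or an ML model, which is what the optional [ml] extra is described as adding.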
PrivyFit — Local Model Advisor
Recommend local LLMs for your app's compiled workload on your hardware:
from privysha import recommend_local_model
report = recommend_local_model(
prompts=["My email is john@x.com — analyze this dataset."],
mode="strict",
top=3,
)
print(report.top_pick.ollama_pull_name)
CLI: privysha recommend --prompts ./samples.json --gpu "RTX 4090"
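PrivyFit's scoring logic isn't described here. As a rough illustration of the kind of hardware check a local-model advisor needs, the sketch below filters candidates by estimated weight memory (parameters × bits ÷ 8). The Candidate class, the model names, and the 0.8 headroom factor are all hypothetical; this is not the recommend_local_model() algorithm.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str              # hypothetical model label
    params_billion: float  # parameter count in billions
    bits: int              # quantization width in bits (e.g. 4 or 8)


def weight_memory_gb(c: Candidate) -> float:
    """Rough weight footprint: params x bits / 8 bytes. Ignores KV cache and runtime overhead."""
    return c.params_billion * c.bits / 8


def fits(candidates: list[Candidate], vram_gb: float, headroom: float = 0.8) -> list[Candidate]:
    """Keep models whose weights use at most `headroom` of VRAM, largest first."""
    usable = vram_gb * headroom
    ok = [c for c in candidates if weight_memory_gb(c) <= usable]
    return sorted(ok, key=lambda c: c.params_billion, reverse=True)


candidates = [
    Candidate("model-8b-q4", 8, 4),
    Candidate("model-70b-q4", 70, 4),
    Candidate("model-13b-q8", 13, 8),
]
print([c.name for c in fits(candidates, vram_gb=24)])  # 24 GB, e.g. an RTX 4090
```

With 24 GB and 80% headroom, the 70B model (about 35 GB of weights at 4-bit) is excluded, leaving the 13B and 8B candidates. A real advisor would also account for context length and the workload's prompt sizes.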
Prompt Sanitization
Removes conversational filler.
Example
Hey bro can you analyze this dataset for anomalies?
becomes
analyze dataset for anomalies
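A minimal way to picture filler removal is a stop-list applied word by word. The FILLER_WORDS set and strip_filler() below are hypothetical, not PrivySHA's sanitizer rules; the sketch strips only surrounding punctuation so that email addresses survive for the later PII stage.

```python
# Hypothetical filler list for illustration only.
FILLER_WORDS = {"hey", "hi", "bro", "can", "you", "please", "this", "my"}


def strip_filler(prompt: str) -> str:
    """Lowercase each word and drop conversational filler, keeping inner punctuation."""
    kept = []
    for token in prompt.split():
        word = token.strip("?!.,").lower()  # remove only leading/trailing punctuation
        if word and word not in FILLER_WORDS:
            kept.append(word)
    return " ".join(kept)


print(strip_filler("Hey bro can you analyze this dataset for anomalies?"))
```

A fixed stop-list is crude (it would also drop a meaningful "this" or "can"), which is one reason a structured representation such as the prompt AST is useful downstream.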
Prompt AST
PrivySHA converts prompts into structured representations.
Example
intent: analyze
object: dataset
task: anomaly_detection
This allows the system to perform compiler-style optimizations.
Token Optimization
Prompts are compressed to reduce token usage.
Example
Analyze this dataset for anomalies and patterns
becomes
@analyze(dataset)
Modular Prompt Pipeline
PrivySHA processes prompts through multiple stages.
User Prompt
│
▼
Parser
│
▼
Sanitizer
│
▼
PII Detection
│
▼
Optimizer
│
▼
Context Injector
│
▼
Prompt Compiler
│
▼
Model Adapter
│
▼
LLM Response
Each stage can be customized or replaced.
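Because each stage maps a prompt to a new prompt, a pipeline like this can be modeled as an ordered list of named callables. The sketch below is an illustration under assumptions: run_pipeline() and the toy stages are hypothetical, not PrivySHA's pipeline API. Recording each stage's output yields a trace similar to the one shown under Debugging Prompt Transformations.

```python
from collections.abc import Callable

Stage = Callable[[str], str]


def run_pipeline(prompt: str, stages: list[tuple[str, Stage]]) -> dict[str, str]:
    """Run named stages in order, recording every intermediate result."""
    trace = {"raw_prompt": prompt}
    current = prompt
    for name, stage in stages:
        current = stage(current)
        trace[name] = current
    return trace


# Toy stages for illustration; replacing an entry customizes that step.
stages: list[tuple[str, Stage]] = [
    ("sanitized", lambda p: p.replace("Hey bro ", "").replace("this ", "")),
    ("optimized", lambda p: "@analyze(dataset)" if p.startswith("analyze dataset") else p),
]
print(run_pipeline("Hey bro analyze this dataset", stages))
```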
Installation
Basic Installation (Lightweight)
pip install privysha
Instant setup: no model downloads; works immediately with rule-based PII detection.
Advanced Features (Optional)
For ML-enhanced PII detection and advanced features:
pip install privysha[ml]
ML features include:
- Enhanced PII detection with transformer models
- Higher accuracy for complex PII patterns
- Context-aware entity recognition
Provider-Specific
# OpenAI support
pip install privysha[openai]
# Anthropic Claude support
pip install privysha[anthropic]
# Google Gemini support
pip install privysha[gemini]
# All providers + ML features
pip install privysha[all]
Requirements:
- Python 3.10+
Quick Start
Drop-in Functions (Easiest)
from privysha import process
# Simple processing
result = process("Hey bro analyze my dataset with john@example.com")
print(result) # "analyze dataset with <EMAIL_HASH>"
# With ML-enhanced PII detection (requires pip install privysha[ml])
result = process("Contact john@example.com for details", pii_mode="hybrid")
print(result) # Enhanced PII detection with transformer models
Agent Class (Full Control)
from privysha import Agent
agent = Agent(
model="mock", # Use "gpt-4o-mini" for OpenAI, "llama3" for Ollama
privacy=True,
token_budget=1200
)
response = agent.run(
"Hey bro can you analyze this dataset for anomalies?"
)
print(response)
PrivySHA automatically:
- sanitizes the prompt
- removes personal language
- masks sensitive data
- optimizes token usage
- compiles a structured prompt
Progressive Enhancement
Choose your PII detection level:
# Rule-based only (lightweight, default)
process("Contact john@example.com", pii_mode="rule")
# Hybrid: Rules + ML (requires pip install privysha[ml])
process("Contact john@example.com", pii_mode="hybrid")
# ML-only (experimental, requires pip install privysha[ml])
process("Contact john@example.com", pii_mode="ml_only")
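The internals of hybrid mode aren't documented here. One common way to combine a rule-based detector with an ML detector is to take the union of their character spans and merge overlaps before masking. In the sketch below, ml_detector_stub() is a placeholder standing in for a transformer NER model, and all function names are hypothetical.

```python
import re

Span = tuple[int, int, str]  # (start, end, label)


def rule_detector(text: str) -> list[Span]:
    return [(m.start(), m.end(), "EMAIL") for m in re.finditer(r"\S+@\S+\.\w+", text)]


def ml_detector_stub(text: str) -> list[Span]:
    # Placeholder for an NER model: here it only flags the name "John".
    return [(m.start(), m.end(), "PERSON") for m in re.finditer(r"\bJohn\b", text)]


def merge_spans(*span_lists: list[Span]) -> list[Span]:
    """Union spans from all detectors; overlapping spans merge and keep the earlier label."""
    merged: list[Span] = []
    for start, end, label in sorted(s for spans in span_lists for s in spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def mask(text: str, spans: list[Span]) -> str:
    # Replace from the end so earlier offsets stay valid.
    for start, end, label in reversed(spans):
        text = text[:start] + f"<{label}_HASH>" + text[end:]
    return text


text = "Contact John at john@example.com"
print(mask(text, merge_spans(rule_detector(text), ml_detector_stub(text))))
```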
Usage Examples
Model Providers
OpenAI (Requires API Key)
import os
from privysha import Agent
os.environ["OPENAI_API_KEY"] = "your-api-key"
agent = Agent(model="gpt-4o-mini")
response = agent.run("Analyze this data")
Ollama (Requires Local Server)
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve  # runs in the foreground; use a second terminal for the next command
ollama pull llama3
from privysha import Agent
agent = Agent(model="llama3")
response = agent.run("Analyze this data")
HuggingFace (Requires Transformers)
from privysha import Agent
agent = Agent(model="microsoft/DialoGPT-medium")
response = agent.run("Analyze this data")
Real-World Applications
Data Analysis Pipeline
from privysha import Agent
agent = Agent(model="gpt-4o-mini", privacy=True)
def analyze_data(data_description):
    prompt = f"Analyze this dataset for patterns: {data_description}"
    return agent.run(prompt)
# Usage
result = analyze_data("Sales data from Q1 2024 with customer emails")
Customer Support
from privysha import Agent
agent = Agent(model="gpt-4o-mini", privacy=True)
def support_query(customer_message):
    # PII will be automatically masked
    return agent.run(customer_message)
# Usage
response = support_query("Help me with order #12345, email john@example.com")
Content Moderation
from privysha import Agent
agent = Agent(model="gpt-4o-mini", privacy=True)
def moderate_content(user_content):
    return agent.run(f"Review this content for policy violations: {user_content}")
# Usage
moderation_result = moderate_content("Check this post from user@social.com")
Debugging Prompt Transformations
PrivySHA exposes the full pipeline trace.
result = agent.run(prompt, trace=True)
print(result)
Example output
RAW PROMPT
Hey bro analyze this dataset
SANITIZED
analyze dataset
OPTIMIZED
@analyze(dataset)
COMPILED
SYSTEM:
You are a data scientist
TASK:
analyze dataset
This allows developers to debug prompt engineering systematically.
Production Deployment
Security Best Practices
import logging
import os
from privysha import Agent
logger = logging.getLogger(__name__)
# Read API keys from the environment; never hard-code them
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set")
# Production configuration
agent = Agent(
    model="gpt-4o-mini",
    privacy=True,  # Always enable in production
    token_budget=2000,  # Adjust based on your needs
)
def process_prompt(user_input):
    """Process user input with privacy protection."""
    try:
        return agent.run(user_input)
    except Exception:
        # Log server-side; don't echo exception details (which may contain user data) to the caller
        logger.exception("PrivySHA processing failed")
        return "Error processing prompt"
Monitoring & Debugging
import time
from privysha import Agent
agent = Agent(model="gpt-4o-mini", privacy=True)
def monitored_process(prompt):
    start_time = time.perf_counter()  # monotonic clock for measuring durations
    result = agent.run(prompt, trace=True)
    processing_time = time.perf_counter() - start_time
    # Log metrics (without sensitive data)
    print(f"Processing time: {processing_time:.2f}s")
    print(f"Prompt length (chars): {len(result['raw_prompt'])} -> {len(result['optimized'])}")
    return result["response"]
Testing Your Setup
from privysha import Agent
# Test without external services
agent = Agent(model="mock", privacy=True)
response = agent.run("Test prompt with email@example.com")
print(response)
# Test pipeline stages
result = agent.run("Hey bro analyze my dataset john@example.com", trace=True)
# Verify PII masking
assert "john@example.com" not in result["sanitized"]
assert "<EMAIL_HASH>" in result["sanitized"]
# Verify sanitization
assert "bro" not in result["sanitized"]
Supported Model Providers
PrivySHA integrates with multiple model providers.
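| Provider | Type |
|---|---|
| OpenAI | hosted API |
| Anthropic Claude | hosted API |
| Google Gemini | hosted API |
| Grok | hosted API |
| Ollama | local LLM runtime |
| HuggingFace | transformer models |
| Mock | offline testing (no API key) |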
Example:
Agent(model="gpt-4o-mini")
or
Agent(model="llama3")
Architecture
PrivySHA follows a compiler-inspired, modular pipeline architecture.
flowchart LR
UserInput[User Input] --> Security[Security Stage]
Security --> IR[IR Generation]
IR --> Routing[Model Routing]
Routing --> Compile[Compilation]
Compile --> Optimize[Optimization]
Optimize --> LLM[LLM Provider]
LLM --> Result[Result Assembly]
privysha/
├── agent.py # High-level Agent API
├── utils/
│ ├── dropin.py # process(), wrap_llm(), optimize(), sanitize()
│ └── pii_detector.py # Rule-based PII detection
├── pipeline/
│ ├── pipeline.py # 7-stage orchestrator
│ └── stages/ # Security, IR, Routing, Compilation, Optimization, Generation, Result
├── core/
│ └── pii_pipeline/ # Multi-stage PII detection pipeline
├── compiler/
│ └── msdpc/ # Token optimization engine
├── security/ # Threat detection and masking
├── adapters/ # OpenAI, Claude, Gemini, Grok, Ollama, HuggingFace, Mock
├── integrations/ # FastAPI, Flask, Django, LangChain, LlamaIndex
├── cli/ # privysha command-line tool
└── ir/ # Prompt intermediate representation
Documentation
- 📖 Quickstart Guide - 5-minute walkthrough
- 🔧 Troubleshooting Guide - Common issues & solutions
- ⚡ Performance Tuning - Optimization guide
- 🔄 Migration Guide - From other solutions
- 📚 API Reference - Complete documentation
- 🏗️ Architecture - System design
- 🤝 Contributing - Development guide
Build the docs site locally:
pip install -e ".[docs]"
mkdocs serve
Optional integrations: pip install privysha[integrations] or pip install privysha[fastapi,langchain,instructor]
See docs/publishing.md for PyPI trusted publishing setup.
Running Tests
# Unit tests (no API keys required)
pytest -m "not integration"
# Full suite including integration tests (requires GEMINI_API_KEY)
pytest
Or run the readiness check:
pytest tests/comprehensive_test.py -v
Tests validate:
- prompt sanitization
- token optimization
- pipeline execution
- PII masking
- adapter functionality
Troubleshooting
Common Issues
- Import Error: run pip install -e . in development
- Connection Refused: start the Ollama server or check your API keys
- Memory Issues: reduce token_budget or use smaller models
- PII Not Masked: ensure privacy=True
Debug Mode
# Enable full debugging
result = agent.run(prompt, trace=True)
# Print all stages
for stage, output in result.items():
    if stage != "response":
        print(f"{stage.upper()}:")
        print(f"  {output}")
        print()
Comparison
| Feature | PrivySHA | Traditional Prompting |
|---|---|---|
| Prompt Sanitization | ✓ | ✗ |
| PII Protection | ✓ | ✗ |
| Token Optimization | ✓ | ✗ |
| Pipeline Debugging | ✓ | ✗ |
PrivySHA introduces a structured prompt lifecycle rather than raw prompt usage.
Performance Benchmarks
Reproducible benchmarks are included in the repo. Typical results (rule-based PII, no ML):
| Metric | Typical range |
|---|---|
| Token reduction | 5–15% on verbose prompts |
| Processing latency | 20–80 ms |
| Fail-safe rate | ~100% |
pip install -e .
python benchmarks/run_benchmarks.py --save
Results are written to benchmarks/output/. See benchmarks/results.md for methodology and reference numbers. Benchmarks also run in CI on every push.
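Token reduction is a relative measure: (tokens before − tokens after) ÷ tokens before. The helper below is a hypothetical sketch that uses whitespace word counts as a stand-in for a real tokenizer, so its numbers will differ from those produced by the benchmark script.

```python
def word_count(text: str) -> int:
    """Rough proxy for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def token_reduction(before: str, after: str) -> float:
    """Percentage reduction in (approximate) token count."""
    n_before, n_after = word_count(before), word_count(after)
    if n_before == 0:
        return 0.0
    return 100.0 * (n_before - n_after) / n_before


print(token_reduction(
    "Please analyze this dataset for anomalies in the sales numbers",
    "analyze dataset for anomalies in sales numbers",
))
```

Here 10 words shrink to 7, a 30% reduction; this is arithmetic on an illustrative string, not a benchmark result.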
Contributing
Contributions are welcome.
Steps:
- Fork the repository
- Create a feature branch
- Write tests for your changes
- Submit a pull request
Before submitting:
pytest
Roadmap
Future versions will include:
- advanced prompt AST analysis
- prompt caching engine
- cost-aware optimization
- multi-model routing
License
This project is licensed under the Apache 2.0 License.
See the LICENSE file for details.
Acknowledgements
PrivySHA is inspired by ideas from modern AI tooling ecosystems and compiler design.
It explores the idea of treating prompts as structured programs rather than raw text.
Support the Project
If you find this project useful:
⭐ Star the repository 🐛 Report issues 💡 Suggest improvements
Download files
File details
Details for the file privysha-0.3.0.tar.gz.
File metadata
- Download URL: privysha-0.3.0.tar.gz
- Upload date:
- Size: 291.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b13cb5dc476442847d04330a62abd60526b5825acff948b9d8fc840b16dadb54 |
| MD5 | 5cd8a04aeeeda1f22b5732093f5e60a3 |
| BLAKE2b-256 | 90817ae0c08fc27ed19012e5b7156980c19529c2ad676266e3ae0fc8c2031364 |
Provenance
The following attestation bundles were made for privysha-0.3.0.tar.gz:
Publisher: publish.yml on AjayRajan05/privySHA
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: privysha-0.3.0.tar.gz
- Subject digest: b13cb5dc476442847d04330a62abd60526b5825acff948b9d8fc840b16dadb54
- Sigstore transparency entry: 1821478792
- Sigstore integration time:
- Permalink: AjayRajan05/privySHA@5e1440d1a60d6d8cdc8afe57b9cc5a90c807be94
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/AjayRajan05
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5e1440d1a60d6d8cdc8afe57b9cc5a90c807be94
- Trigger Event: release
File details
Details for the file privysha-0.3.0-py3-none-any.whl.
File metadata
- Download URL: privysha-0.3.0-py3-none-any.whl
- Upload date:
- Size: 377.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8fea1c64fc67c6b068d2394744f5dacd16ac7afa947b524b505fd2db34c2cae |
| MD5 | 6da787b00fdd3c63354eba7a975b0630 |
| BLAKE2b-256 | 3916a3427fdb2b5ad7e1a0ef64bd8ce8d5147d493d126c86b0bcaf50af2c344d |
Provenance
The following attestation bundles were made for privysha-0.3.0-py3-none-any.whl:
Publisher: publish.yml on AjayRajan05/privySHA
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: privysha-0.3.0-py3-none-any.whl
- Subject digest: b8fea1c64fc67c6b068d2394744f5dacd16ac7afa947b524b505fd2db34c2cae
- Sigstore transparency entry: 1821478825
- Sigstore integration time:
- Permalink: AjayRajan05/privySHA@5e1440d1a60d6d8cdc8afe57b9cc5a90c807be94
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/AjayRajan05
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5e1440d1a60d6d8cdc8afe57b9cc5a90c807be94
- Trigger Event: release