Privacy scanner with GDPR compliance reports - Zero config, instant insights
🔒 Privalyse – Catch Privacy Leaks in AI-Assisted Codebases
Code can be a black box. Data moves through invisible paths. Privalyse makes these paths explicit.
We are generating code faster than ever, but we are losing sight of where our data actually goes. LLMs write logic, but they don't see the flow. They happily pipe PII into logs, send secrets to third-party APIs, or expose internal state.
Privalyse is not just a linter. It builds a Semantic Data Flow Graph of your application to tell Flow Stories:
- ❌ Traditional Linter: "Variable user_email used in line 42."
- ✅ Privalyse: "User Email (Source) → Prompt Template → OpenAI API (Sink) → Logs (Leak)."
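To make the idea of a flow story concrete, here is a minimal sketch that walks a small, hand-written graph from a source to a sink and prints the path. The graph contents and the `flow_story` helper are illustrative assumptions, not Privalyse's internal data model.

```python
from collections import deque

# Hypothetical data flow graph: each key is a node, each value lists the
# nodes that receive data from it. Node names mirror the example above.
FLOW_GRAPH = {
    "User Email": ["Prompt Template"],
    "Prompt Template": ["OpenAI API"],
    "OpenAI API": ["Logs"],
    "Logs": [],
}


def flow_story(graph, source, sink):
    """Return the shortest source-to-sink path as an arrow-joined string, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return " → ".join(path)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(flow_story(FLOW_GRAPH, "User Email", "Logs"))
# prints: User Email → Prompt Template → OpenAI API → Logs
```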
With its deterministic static analysis engine, it serves as the perfect counterpart to AI-assisted coding: ensuring reproducible results and providing a safety net to recheck your entire codebase before deployment.
⭐️ Star if you believe in visible data flows.
🚀 Alpha Release - We're building the privacy scanner that modern development deserves. Zero config, instant insights, built for speed.
📚 Quick Start • 🔍 What We Detect • 🗺️ Roadmap • 🐛 Report Bug • ✨ Request Feature
pip install privalyse-cli
privalyse
# ✅ Done. Markdown report ready (scan_results.md).
✨ AI-Native Privacy & Guardrails (New in v0.3.0)
Privalyse now includes specialized features for AI-Model integrations:
- 🤖 AI Guardrails: Detects PII leaking into LLM prompts (OpenAI, LangChain, etc.).
- 🌍 Data Sovereignty: Flags data transfers to non-EU providers (AWS, Azure, OpenAI) to help with GDPR compliance.
- 🛡️ Policy as Code: Enforce blocked countries or providers via privalyse.toml.
- 🧼 Smart Sanitization: Recognizes hash(), anonymize() and other cleaning functions to reduce false positives.
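The sanitization idea can be illustrated with a small sketch: the raw email never reaches the log, only a digest does. The `pseudonymize` helper is a hypothetical name chosen for this example; whether Privalyse recognizes a given helper depends on its rules, which are not documented here.

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("signup")


def pseudonymize(value: str) -> str:
    """Replace a raw identifier with a truncated SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def register(user_email: str) -> str:
    # Leaky variant that a privacy scanner would be expected to flag,
    # because raw PII reaches the log sink:
    # log.info("New signup: %s", user_email)

    # Sanitized variant: only a digest reaches the log.
    token = pseudonymize(user_email)
    log.info("New signup: %s", token)
    return token
```

Note that an unsalted hash of an email is pseudonymization rather than anonymization, since common addresses can be recovered by hashing guesses.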
🔄 Continuous Monitoring (CI/CD)
👁️ Data Flow Visibility & Monitoring
Modern applications are complex webs of data movement. Privalyse provides Data Flow Visibility to help you oversee where sensitive data travels.
Continuous Monitoring
Privalyse is designed to be run in your CI/CD pipeline to provide continuous monitoring of data flows.
- Detect: Catch new leaks before they merge.
- Visualize: See the path of data from Source to Sink.
- Comply: Ensure every data flow has a legal basis.
To achieve true visibility, Privalyse should be part of your continuous integration pipeline. This ensures that every code change is monitored for new data leaks.
GitHub Actions Example
name: Privacy Monitor
on: [push, pull_request]
jobs:
  privalyse-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Privalyse Scanner
        uses: privalyse/privalyse-cli@v0.3.0
        with:
          root: '.'
          format: 'markdown'
          output: 'report.md'
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: privacy-report
          path: report.md
The generated report.md is a human-readable Markdown report that you can view directly in GitHub Actions artifacts or attach to Pull Requests. It provides a clear, visual summary of all findings, compliance risks, and data flow stories.
Installation
pip install privalyse-cli
Quick Start
# Scan current directory (defaults to Markdown output)
privalyse
# Scan specific folder
privalyse --root ./backend
# Output as JSON (Structured)
privalyse --root ./backend --format json --out results.json
# Output as HTML (Visual Dashboard)
privalyse --root ./backend --format html --out report.html
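If you prefer to drive the scanner from a Python script, the sketch below assembles the same command lines with `subprocess`. It uses only the `--root`, `--format`, and `--out` flags shown above; the helper names `build_command` and `run_scan` are assumptions made for this example.

```python
import subprocess


def build_command(root=".", fmt="markdown", out=None):
    """Assemble a privalyse invocation from the flags shown in the Quick Start."""
    cmd = ["privalyse", "--root", root, "--format", fmt]
    if out is not None:
        cmd += ["--out", out]
    return cmd


def run_scan(root=".", fmt="json", out="results.json"):
    """Run the scanner (must be installed and on PATH) and return its exit code."""
    completed = subprocess.run(build_command(root, fmt, out), check=False)
    return completed.returncode


# Example: run_scan("./backend") would execute
# privalyse --root ./backend --format json --out results.json
```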
⚙️ Configuration (Policy as Code)
You can enforce privacy policies using a privalyse.toml file in your project root.
# privalyse.toml
[policy]
blocked_countries = ["US", "CN"] # Fail if data flows to these countries
blocked_providers = ["openai"] # Fail if data flows to these providers
When a policy violation is detected (e.g., sending PII to a US server), Privalyse will report a CRITICAL finding and exit with a failure code.
📊 Example Reports
See how Privalyse analyzes different types of projects:
| Project Type | Description | Report |
|---|---|---|
| Bad Practice App | A vulnerable app full of security holes and GDPR violations. | View Report |
| Modern Fullstack | A typical React/Node.js stack with some common issues. | View Report |
| Best Practice App | A secure, compliant application following GDPR standards. | View Report |
⚡ Try It Now (30 seconds)
No configuration needed - works in any Python project:
pip install privalyse-cli && privalyse --root . --out report.md && head -50 report.md
🎯 Boom. Privacy report generated in 3 seconds.
🤖 AI Agent Integration
Privalyse is designed to be "Agent-Ready". If you are building an AI coding agent or using LLMs to fix code, Privalyse provides structured, context-rich output that agents can understand.
For Coding Agents
When using Privalyse as a tool for an agent:
- Run with JSON output: privalyse --format json --out report.json
- Parse the findings array: Each finding now includes:
  - code_context: The actual lines of code (with surrounding context) where the issue was found.
  - context_start_line / context_end_line: Precise line numbers.
  - suggested_fix: A human-readable suggestion for fixing the issue.
  - confidence_score: To help the agent decide whether to act.
Example JSON Output for Agents
{
  "rule": "HARDCODED_SECRET",
  "file": "src/config.py",
  "line": 15,
  "severity": "critical",
  "suggested_fix": "Move secret to environment variable (os.environ.get) or secrets manager.",
  "confidence_score": 1.0,
  "code_context": [
    "def connect_db():",
    "    db_password = \"super_secret_password_123\" # <--- Finding here",
    "    return connect(password=db_password)"
  ]
}
This allows agents to self-correct code without needing to read the file separately.
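A minimal sketch of how an agent might consume this output: load the report, keep only findings above a confidence threshold, and print one line per finding. It assumes the report's top level is an object with a `findings` key, as the steps above suggest; the second sample finding and its rule name `PII_IN_LOG` are invented for illustration.

```python
import json

# Sample report. The first finding is taken from the example above;
# the second is a made-up low-confidence entry for contrast.
SAMPLE_REPORT = """
{
  "findings": [
    {"rule": "HARDCODED_SECRET", "file": "src/config.py", "line": 15,
     "severity": "critical", "confidence_score": 1.0,
     "suggested_fix": "Move secret to environment variable (os.environ.get) or secrets manager."},
    {"rule": "PII_IN_LOG", "file": "src/api.py", "line": 40,
     "severity": "high", "confidence_score": 0.4,
     "suggested_fix": "Mask the value before logging."}
  ]
}
"""


def actionable(report: dict, min_confidence: float = 0.8) -> list[dict]:
    """Keep findings whose confidence_score meets the threshold."""
    return [
        f for f in report.get("findings", [])
        if f.get("confidence_score", 0.0) >= min_confidence
    ]


report = json.loads(SAMPLE_REPORT)
for finding in actionable(report):
    print(f"{finding['file']}:{finding['line']} [{finding['rule']}] {finding['suggested_fix']}")
```

In practice you would replace `SAMPLE_REPORT` with the contents of report.json; the threshold of 0.8 is an arbitrary choice for this sketch.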
JSON Schema for Agents
For strict validation, you can use the official JSON schema located at privalyse_scanner/models/output_schema.json. This helps LLM agents understand the exact structure of the output they are processing.
What It Does
Privalyse performs Static Monitoring (Detection) to ensure data safety:
- Data Flow Visualization: Tracking where user data moves across your codebase (Source -> Sink).
- Hardcoded Secrets: Detecting API keys, passwords, and tokens.
- PII Leakage: Identifying Personally Identifiable Information in logs and external calls.
- GDPR Violations: Mapping findings to specific GDPR articles (Art. 5, 6, 9, 32).
- Security Misconfigurations: Checking for HTTP vs HTTPS, CORS, and security headers.
Note on Monitoring: In the context of this CLI, "Monitoring" refers to the continuous detection of vulnerabilities in your codebase (e.g., via CI/CD or pre-commit hooks). It does not currently perform live runtime traffic interception.
The scanner uses AST (Abstract Syntax Tree) parsing for both Python and JavaScript/TypeScript to ensure deep understanding of your code structure.
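To give a feel for what AST-based detection looks like, here is a toy example built on Python's standard `ast` module. It flags `print()` calls that receive a variable whose name looks like PII. This is a simplified illustration only, not Privalyse's engine; the `PII_HINTS` list and `find_pii_prints` function are assumptions made for the example.

```python
import ast

# Illustrative name fragments that suggest personal data.
PII_HINTS = ("email", "password", "ssn", "phone")

SOURCE = '''
def signup(user_email, plan):
    print("signing up", user_email)
    print("plan", plan)
'''


def find_pii_prints(source: str) -> list[tuple[int, str]]:
    """Return (line, variable) pairs where a PII-looking name is passed to print()."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            for arg in node.args:
                if isinstance(arg, ast.Name) and any(h in arg.id.lower() for h in PII_HINTS):
                    hits.append((node.lineno, arg.id))
    return hits


print(find_pii_prints(SOURCE))
# prints: [(3, 'user_email')]
```

Because the check works on the syntax tree rather than on raw text, a string literal that merely contains the word "email" is not reported.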
Features
- Python & JavaScript/TypeScript support
- AST-based analysis for Python and JS/TS (deterministic, deep data flow tracking)
- Cross-file taint tracking (follows data flows across imports and modules)
- Cross-stack tracing (links Frontend API calls to Backend routes)
- GDPR article mapping (Art. 5, 6, 9, 32)
- Structured Reports (Executive Summary, Compliance View, File Hotspots)
- Multiple output formats (JSON, Markdown, HTML)
- Ignore file support (.privalyseignore for false positives)
- 100% Local Execution (no code leaves your machine)
💡 Why Privalyse?
We believe security shouldn't be a question of price. Everyone deserves data safety and secure code. That's why Privalyse is MIT Licensed and free to use.
1. The "Audit-Ready" Approach
Don't just find bugs—generate documentation. When your CTO asks "Are we GDPR compliant?", you can't send them a JSON file. Privalyse generates reports you can actually hand to your Data Protection Officer (DPO).
2. Focus on Data Flows
We find problems even in massive codebases. Privalyse goes beyond simple pattern matching by implementing Cross-File & Cross-Stack Taint Tracking. It traces the journey of sensitive data throughout your application—from database models to API endpoints, across network calls to the frontend, and finally to sinks like logging or third-party APIs. By understanding how modules and services interact, we can detect when a variable defined in one file is insecurely used in another, effectively connecting the dots across your entire project structure.
Note: Visual data flow graphs are on the Roadmap!
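The core of taint tracking can be sketched in a few lines: start from known sources, follow "data flows from A to B" edges until nothing new is reached, then check which sinks were hit. The edge list below is a hand-written stand-in for what a scanner would extract from several files; all names are hypothetical and the algorithm is a generic fixed-point propagation, not necessarily the one Privalyse uses.

```python
# Hypothetical edges gathered across files: (origin, destination) means
# "data flows from origin into destination".
EDGES = [
    ("forms.py:email_input", "models.py:user.email"),
    ("models.py:user.email", "api.py:payload"),
    ("api.py:payload", "client.ts:console.log"),
    ("config.py:PAGE_SIZE", "api.py:logger.debug"),
]
SOURCES = {"forms.py:email_input"}
SINKS = {"client.ts:console.log", "api.py:logger.debug"}


def propagate_taint(edges, sources):
    """Fixed-point propagation: a destination becomes tainted once its origin is."""
    tainted = set(sources)
    changed = True
    while changed:
        changed = False
        for origin, destination in edges:
            if origin in tainted and destination not in tainted:
                tainted.add(destination)
                changed = True
    return tainted


tainted = propagate_taint(EDGES, SOURCES)
print(sorted(tainted & SINKS))
# prints: ['client.ts:console.log']
```

The debug log of `PAGE_SIZE` is not reported because no path connects it to a sensitive source.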
3. The Human-in-the-Loop
The Markdown results are perfect for reviewing AI-generated code before merging. This helps keep control where it really counts.
The Problem: ChatGPT just wrote 500 lines. Did it leak user emails into logs?
The Solution: privalyse --root ./new-feature --format markdown
🎯 Use Cases
For Developers
- ✅ Review AI-Generated Code: Catch hardcoded secrets and PII leaks before merging.
- ✅ Clean Up Debug Code: Find forgotten print() and console.log() statements.
- ✅ Learn GDPR: Understand privacy requirements while you code.
For Security Teams
- ✅ Quick Audits: Generate compliance reports in seconds.
- ✅ Track Progress: Monitor privacy improvements over time.
- ✅ CI/CD Integration: Catch issues early in the pipeline.
🗺️ Roadmap
Current (Alpha v0.1):
- ✅ Python & JavaScript/TypeScript analysis
- ✅ Cross-file taint tracking
- ✅ GDPR article mapping (Art. 5, 6, 9, 32)
- ✅ JSON, Markdown, HTML export
- ✅ .privalyseignore support
Next Up:
- 🔜 Data Flow display
- 🔜 Smarter detection: Improving the rules and patterns.
- 🔜 More Compliance Standards (CCPA, HIPAA, etc.)
- 🔜 GitHub Actions integration (CI/CD ready)
- 🔜 Enhanced test coverage
Vision (Future):
- 🎯 Multi-language (Java, Go, Ruby, C#)
- 🔜 VS Code extension (lint as you code)
- 🎯 Team features (shared reports, trends)
- 🎯 AI-assisted fixes (not just detection)
- 🎯 Pre-commit hooks
Contributing
We're building this in the open. Contributions welcome!
- Report bugs or suggest features via Issues
- See CONTRIBUTING.md for guidelines
License & Disclaimer
MIT License - See LICENSE for details.
⚠️ Alpha Software: Privalyse helps identify privacy issues but:
- Does not guarantee complete GDPR compliance
- Not a substitute for legal counsel
- Should be part of a broader security strategy
- May have false positives/negatives as we improve
Always consult privacy professionals for compliance decisions.
Built by developers who care about privacy.
Report a bug • Request a feature • Contribute