Privacy scanner with GDPR compliance reports - Zero config, instant insights

🔒 Privalyse – Catch Privacy Leaks in AI-Assisted Codebases

License: MIT • PyPI • Python 3.8+

Code can be a black box. Data moves through invisible paths. Privalyse makes these paths explicit.

We are generating code faster than ever, but we are losing sight of where our data actually goes. LLMs write logic, but they don't see the flow. They happily pipe PII into logs, send secrets to third-party APIs, or expose internal state.

Privalyse is not just a linter. It builds a Semantic Data Flow Graph of your application to tell Flow Stories:

  • Traditional Linter: "Variable user_email used in line 42."
  • Privalyse: "User Email (Source) → Prompt Template → OpenAI API (Sink) → Logs (Leak)."

With its deterministic static analysis engine, it serves as the perfect counterpart to AI-assisted coding: ensuring reproducible results and providing a safety net to recheck your entire codebase before deployment.
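As a rough illustration of what a flow story is, the sketch below walks a tiny hand-written graph from a source to a sink and prints the path. The `FLOW_GRAPH` data and the `flow_story` helper are hypothetical and only mirror the example wording above; they are not Privalyse's internal graph model.

```python
from collections import deque

# Hypothetical edges of a data flow graph: node -> nodes that receive its data.
FLOW_GRAPH = {
    "User Email": ["Prompt Template"],
    "Prompt Template": ["OpenAI API"],
    "OpenAI API": ["Logs"],
    "Logs": [],
}


def flow_story(graph, source, sink):
    """Return the shortest source-to-sink path as 'A → B → C', or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return " → ".join(path)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the sink is not reachable from the source


print(flow_story(FLOW_GRAPH, "User Email", "Logs"))
```

A breadth-first search is used here so the reported story is the shortest path; a real analyzer would likely report every distinct path.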

⭐️ Star if you believe in visible data flows.

🚀 Alpha Release - We're building the privacy scanner that modern development deserves. Zero config, instant insights, built for speed.

📚 Quick Start • 🔍 What We Detect • 🗺️ Roadmap • 🐛 Report Bug • ✨ Request Feature

pip install privalyse-cli
privalyse
# ✅ Done. Markdown report ready (scan_results.md).

✨ AI-Native Privacy & Guardrails (New in v0.3.0)

Privalyse now includes specialized features for AI-Model integrations:

  • 🤖 AI Guardrails: Detects PII leaking into LLM prompts (OpenAI, LangChain, etc.).
  • 🌍 Data Sovereignty: Flags data transfers to non-EU providers (AWS, Azure, OpenAI) to help with GDPR compliance.
  • 🛡️ Policy as Code: Enforce blocked countries or providers via privalyse.toml.
  • 🧼 Smart Sanitization: Recognizes hash(), anonymize() and other cleaning functions to reduce false positives.
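To make the Smart Sanitization idea concrete, here is a minimal sketch built on Python's standard `ast` module. It treats anything wrapped in a sanitizer call as clean and flags a tainted name only when it reaches a call unwrapped. The `SANITIZERS` set and the `leaks_raw_value` helper are assumptions for illustration, not the scanner's actual rules.

```python
import ast

# Assumed sanitizer names, taken from the Smart Sanitization bullet above.
SANITIZERS = {"hash", "anonymize"}


def leaks_raw_value(call_source, tainted):
    """True if a tainted name is passed to the call without a sanitizer around it.

    `call_source` must be a single call expression such as "log(user_email)".
    """
    call = ast.parse(call_source, mode="eval").body

    def raw_use(node):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in SANITIZERS):
            return False  # everything inside a sanitizer call counts as clean
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        return any(raw_use(child) for child in ast.iter_child_nodes(node))

    arguments = list(call.args) + [kw.value for kw in call.keywords]
    return any(raw_use(arg) for arg in arguments)


print(leaks_raw_value("logger.info(user_email)", {"user_email"}))
print(leaks_raw_value("logger.info(hash(user_email))", {"user_email"}))
```

The first call is flagged and the second is not, which is the false-positive reduction the bullet describes.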

🔄 Continuous Monitoring (CI/CD)

👁️ Data Flow Visibility & Monitoring

Modern applications are complex webs of data movement. Privalyse provides Data Flow Visibility to help you oversee where sensitive data travels.

Continuous Monitoring

Privalyse is designed to be run in your CI/CD pipeline to provide continuous monitoring of data flows.

  • Detect: Catch new leaks before they merge.
  • Visualize: See the path of data from Source to Sink.
  • Comply: Ensure every data flow has a legal basis.

Running the scan on every push and pull request means each code change is checked for new data leaks.

GitHub Actions Example

name: Privacy Monitor
on: [push, pull_request]
jobs:
  privalyse-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Privalyse Scanner
        uses: privalyse/privalyse-cli@v0.3.0
        with:
          root: '.'
          format: 'markdown'
          output: 'report.md'
          
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: privacy-report
          path: report.md

The generated report.md is a human-readable Markdown report that you can view directly in GitHub Actions artifacts or attach to Pull Requests. It provides a clear, visual summary of all findings, compliance risks, and data flow stories.

Installation

pip install privalyse-cli

Quick Start

# Scan current directory (defaults to Markdown output)
privalyse

# Scan specific folder
privalyse --root ./backend

# Output as JSON (Structured)
privalyse --root ./backend --format json --out results.json

# Output as HTML (Visual Dashboard)
privalyse --root ./backend --format html --out report.html

⚙️ Configuration (Policy as Code)

You can enforce privacy policies using a privalyse.toml file in your project root.

# privalyse.toml
[policy]
blocked_countries = ["US", "CN"]  # Fail if data flows to these countries
blocked_providers = ["openai"]    # Fail if data flows to these providers

When a policy violation is detected (e.g., sending PII to a US server), Privalyse will report a CRITICAL finding and exit with a failure code.

🎥 See It In Action

Privalyse CLI Demo

📊 Example Reports

See how Privalyse analyzes different types of projects:

Project Type | Description | Report
Bad Practice App | A vulnerable app full of security holes and GDPR violations. | View Report
Modern Fullstack | A typical React/Node.js stack with some common issues. | View Report
Best Practice App | A secure, compliant application following GDPR standards. | View Report

⚡ Try It Now (30 seconds)

No configuration needed - works in any Python project:

pip install privalyse-cli && privalyse --root . --out report.md && head -n 50 report.md

🎯 Boom. Privacy report generated in 3 seconds.


🤖 AI Agent Integration

Privalyse is designed to be "Agent-Ready". If you are building an AI coding agent or using LLMs to fix code, Privalyse provides structured, context-rich output that agents can understand.

For Coding Agents

When using Privalyse as a tool for an agent:

  1. Run with JSON output: privalyse --format json --out report.json
  2. Parse the findings array: Each finding now includes:
    • code_context: The actual lines of code (with surrounding context) where the issue was found.
    • context_start_line / context_end_line: Precise line numbers.
    • suggested_fix: A human-readable suggestion for fixing the issue.
    • confidence_score: To help the agent decide whether to act.

Example JSON Output for Agents

{
  "rule": "HARDCODED_SECRET",
  "file": "src/config.py",
  "line": 15,
  "severity": "critical",
  "suggested_fix": "Move secret to environment variable (os.environ.get) or secrets manager.",
  "confidence_score": 1.0,
  "code_context": [
    "def connect_db():",
    "    db_password = \"super_secret_password_123\"  # <--- Finding here",
    "    return connect(password=db_password)"
  ]
}

This allows agents to self-correct code without needing to read the file separately.
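An agent-side consumer could look something like the sketch below. It assumes the report's top level holds a `findings` key (inferred from the "findings array" wording above) and reuses the per-finding fields from the example. The second finding and its `PII_IN_LOG` rule name are made up for illustration.

```python
import json

# Sample report. The top-level "findings" key and the second finding are
# assumptions; the field names mirror the example above.
REPORT = json.loads("""
{
  "findings": [
    {"rule": "HARDCODED_SECRET", "file": "src/config.py", "line": 15,
     "severity": "critical", "confidence_score": 1.0,
     "suggested_fix": "Move secret to environment variable (os.environ.get) or secrets manager."},
    {"rule": "PII_IN_LOG", "file": "src/api.py", "line": 88,
     "severity": "high", "confidence_score": 0.4,
     "suggested_fix": "Mask the value before logging."}
  ]
}
""")


def actionable(report, min_confidence=0.8):
    """Keep findings the agent should act on, most confident first."""
    keep = [f for f in report.get("findings", [])
            if f.get("confidence_score", 0.0) >= min_confidence]
    return sorted(keep, key=lambda f: f["confidence_score"], reverse=True)


for finding in actionable(REPORT):
    print(f"{finding['file']}:{finding['line']} [{finding['rule']}] {finding['suggested_fix']}")
```

Filtering on `confidence_score` first keeps an agent from rewriting code over low-confidence findings; the 0.8 threshold is an arbitrary choice.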

JSON Schema for Agents

For strict validation, you can use the official JSON schema located at privalyse_scanner/models/output_schema.json. This helps LLM agents understand the exact structure of the output they are processing.


What It Does

Privalyse performs Static Monitoring (Detection) to ensure data safety:

  • Data Flow Visualization: Tracking where user data moves across your codebase (Source -> Sink).
  • Hardcoded Secrets: Detecting API keys, passwords, and tokens.
  • PII Leakage: Identifying Personally Identifiable Information in logs and external calls.
  • GDPR Violations: Mapping findings to specific GDPR articles (Art. 5, 6, 9, 32).
  • Security Misconfigurations: Checking for HTTP vs HTTPS, CORS, and security headers.

Note on Monitoring: In the context of this CLI, "Monitoring" refers to the continuous detection of vulnerabilities in your codebase (e.g., via CI/CD or pre-commit hooks). It does not currently perform live runtime traffic interception.

The scanner uses AST (Abstract Syntax Tree) parsing for both Python and JavaScript/TypeScript to ensure deep understanding of your code structure.
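For readers new to AST-based analysis, the sketch below shows the general technique with Python's built-in `ast` module: parse the source into a tree, then look for `print()` calls whose arguments mention a PII-looking name. The `PII_HINTS` fragments and the `pii_in_prints` helper are illustrative assumptions and far simpler than a real scanner's rules.

```python
import ast

SOURCE = '''
def register(user_email, plan):
    print("new signup", user_email)
    print("plan", plan)
'''

PII_HINTS = ("email", "password", "ssn")  # illustrative name fragments


def pii_in_prints(source):
    """Return (line, variable) pairs where a PII-looking name is passed to print()."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_print = (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "print")
        if not is_print:
            continue
        for arg in node.args:
            for name in ast.walk(arg):
                if (isinstance(name, ast.Name)
                        and any(hint in name.id.lower() for hint in PII_HINTS)):
                    hits.append((node.lineno, name.id))
    return hits


print(pii_in_prints(SOURCE))
```

Because the check works on the syntax tree rather than on raw text, the string literal "new signup" and the harmless `plan` argument are not reported.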

Features

  • Python & JavaScript/TypeScript support
  • AST-based analysis for Python and JS/TS (deterministic, deep data flow tracking)
  • Cross-file taint tracking (follows data flows across imports and modules)
  • Cross-stack tracing (links Frontend API calls to Backend routes)
  • GDPR article mapping (Art. 5, 6, 9, 32)
  • Structured Reports (Executive Summary, Compliance View, File Hotspots)
  • Multiple output formats (JSON, Markdown, HTML)
  • Ignore file support (.privalyseignore for false positives)
  • 100% Local Execution (no code leaves your machine)

💡 Why Privalyse?

We believe security shouldn't be a question of price. Everyone deserves data safety and secure code. That's why Privalyse is MIT Licensed and free to use.

1. The "Audit-Ready" Approach

Don't just find bugs—generate documentation. When your CTO asks "Are we GDPR compliant?", you can't send them a JSON file. Privalyse generates reports you can actually hand to your Data Protection Officer (DPO).

2. Focus on Data Flows

We find problems even in massive codebases. Privalyse goes beyond simple pattern matching by implementing Cross-File & Cross-Stack Taint Tracking. It traces the journey of sensitive data throughout your application—from database models to API endpoints, across network calls to the frontend, and finally to sinks like logging or third-party APIs. By understanding how modules and services interact, we can detect when a variable defined in one file is insecurely used in another, effectively connecting the dots across your entire project structure.
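The core of cross-file taint tracking can be sketched as a fixed-point propagation over "built from" relationships. In the toy below, each variable is keyed by a hypothetical `(module, name)` pair; the `ASSIGNMENTS`, `SOURCES`, and `SINKS` data are invented for illustration and do not reflect how Privalyse represents a project.

```python
# Hypothetical facts extracted per file: (module, variable) -> values it is built from.
ASSIGNMENTS = {
    ("models", "email"): [],                      # source: read from the database
    ("api", "payload"): [("models", "email")],    # api.py imports from models.py
    ("api", "status"): [],
    ("notify", "message"): [("api", "payload")],  # notify.py imports from api.py
}
SOURCES = {("models", "email")}
SINKS = {("notify", "message"), ("api", "status")}  # e.g. values passed to a logger


def tainted_sinks(assignments, sources, sinks):
    """Propagate taint until nothing changes, then report which sinks receive it."""
    tainted = set(sources)
    changed = True
    while changed:
        changed = False
        for target, inputs in assignments.items():
            if target not in tainted and any(i in tainted for i in inputs):
                tainted.add(target)
                changed = True
    return sorted(tainted & sinks)


print(tainted_sinks(ASSIGNMENTS, SOURCES, SINKS))
```

Here the value defined in `models` reaches the sink in `notify` through `api`, two files away, while `api.status` stays clean.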

Note: Visual data flow graphs are on the Roadmap!

3. The Human-in-the-Loop

The Markdown results are well suited to reviewing AI-generated code before merging, which keeps control where it really counts.

  • The Problem: ChatGPT just wrote 500 lines. Did it leak user emails into logs?
  • The Solution: privalyse --root ./new-feature --format markdown

🎯 Use Cases

For Developers

  • Review AI-Generated Code: Catch hardcoded secrets and PII leaks before merging.
  • Clean Up Debug Code: Find forgotten print() and console.log() statements.
  • Learn GDPR: Understand privacy requirements while you code.

For Security Teams

  • Quick Audits: Generate compliance reports in seconds.
  • Track Progress: Monitor privacy improvements over time.
  • CI/CD Integration (Roadmap): Catch issues early in the pipeline.

🗺️ Roadmap

Current (Alpha v0.1):

  • ✅ Python & JavaScript/TypeScript analysis
  • ✅ Cross-file taint tracking
  • ✅ GDPR article mapping (Art. 5, 6, 9, 32)
  • ✅ JSON, Markdown, HTML export
  • ✅ .privalyseignore support

Next Up:

  • 🔜 Data Flow display
  • 🔜 Smarter detection: improving the rules and patterns
  • 🔜 More Compliance Standards (CCPA, HIPAA, etc.)
  • 🔜 GitHub Actions integration (CI/CD ready)
  • 🔜 Enhanced test coverage

Vision (Future):

  • 🎯 Multi-language (Java, Go, Ruby, C#)
  • 🔜 VS Code extension (lint as you code)
  • 🎯 Team features (shared reports, trends)
  • 🎯 AI-assisted fixes (not just detection)
  • 🎯 Pre-commit hooks

Contributing

We're building this in the open. Contributions welcome!


License & Disclaimer

MIT License - See LICENSE for details.

⚠️ Alpha Software: Privalyse helps identify privacy issues but:

  • Does not guarantee complete GDPR compliance
  • Not a substitute for legal counsel
  • Should be part of a broader security strategy
  • May have false positives/negatives as we improve

Always consult privacy professionals for compliance decisions.


Built by developers who care about privacy.
Report a bug • Request a feature • Contribute
