Production-grade PII/GDPR detection CLI with multi-level analysis

Levox - Production-Grade PII/GDPR Detection CLI

Python 3.8+ | License: MIT | Code style: black

Levox is a high-performance, enterprise-grade CLI application that detects Personally Identifiable Information (PII) and checks codebases for GDPR compliance issues. Its multi-stage detection pipeline is built for fast scans with a low false positive rate (target: under 10%).

🚀 Features

  • 7-Stage Detection Pipeline: Regex → AST Analysis → Context Analysis → Dataflow → CFG Analysis → ML Filtering → GDPR Compliance
  • Multi-Language Support: Python, JavaScript, and extensible parser architecture
  • Performance Optimized: <10s incremental scans, <30s full repository scans
  • Enterprise Licensing: Standard, Premium, and Enterprise tiers with feature gates
  • Low False Positives: Target <10% false positive rate
  • Memory Efficient: Memory-mapped file operations for large codebases
  • Comprehensive Logging: Structured logging with performance metrics

๐Ÿ—๏ธ Architecture

levox/
├── levox/
│   ├── cli.py                 # Main CLI entry point
│   ├── core/
│   │   ├── engine.py          # Detection engine orchestrator
│   │   ├── config.py          # Configuration management
│   │   └── exceptions.py      # Custom exceptions
│   ├── detection/
│   │   ├── regex_engine.py    # Stage 1: Optimized regex detection
│   │   ├── ast_analyzer.py    # Stage 2: AST-based context analysis
│   │   ├── context_analyzer.py # Stage 3: Semantic context analysis
│   │   ├── dataflow.py        # Stage 4: Taint/dataflow analysis
│   │   ├── cfg_analyzer.py    # Stage 5: Control Flow Graph analysis
│   │   └── ml_filter.py       # Stage 6: ML-based false positive reduction
│   ├── parsers/
│   │   ├── base.py           # Base parser interface
│   │   ├── python_parser.py  # Python AST parser
│   │   ├── javascript_parser.py # JS parser
│   │   └── multi_lang.py     # Multi-language coordinator
│   ├── utils/
│   │   ├── file_handler.py   # Memory-mapped file operations
│   │   ├── validators.py     # Luhn, format validators
│   │   └── performance.py    # Performance monitoring
│   └── models/
│       ├── detection_result.py # Result data models
│       └── confidence.py       # Confidence scoring

📦 Installation

From Source

git clone https://github.com/levox/levox.git
cd levox
pip install -e .

From PyPI

pip install levox

Upgrade an Existing PyPI Installation

pip install --upgrade levox

🚀 Quick Start

Basic Usage

# Scan current directory
levox scan

# Scan specific directory
levox scan /path/to/codebase

# Scan with CFG analysis (Premium+)
levox scan --cfg

# Generate detailed report
levox scan --output report.json --format json

# Configure detection rules
levox configure --rules custom-rules.yaml

Advanced CFG Analysis

# Enable deep scanning with CFG analysis
levox scan --cfg --cfg-confidence 0.7

# Alternative flag name for --cfg
levox scan --deep-scan

# Full enterprise scan with all stages
levox scan --license-tier enterprise --cfg --format json

CLI Commands

  • levox scan - Scan codebase for PII/GDPR violations
  • levox configure - Configure detection rules and settings
  • levox report - Generate and view reports
  • levox feedback - Provide feedback to improve detection

โš™๏ธ Configuration

Detection Pipeline Stages

STAGE 1: Regex Detection (Standard+)

  • Fast pattern matching for basic PII patterns
  • Optimized regex engine with minimal false positives
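As a rough illustration of this stage combined with the memory-mapped file handling listed under Features, the sketch below scans a file through `mmap` with a compiled bytes regex. The pattern and the `scan_file` helper are assumptions made for this example; the README does not show Levox's built-in patterns or its internal API.

```python
import mmap
import re
import tempfile
from pathlib import Path

# Hypothetical pattern for illustration; not Levox's built-in rule set.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_file(path):
    """Return (line_number, matched_text) pairs for every email-like match."""
    findings = []
    # mmap cannot map an empty file, so skip those up front.
    if Path(path).stat().st_size == 0:
        return findings
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # The re module accepts any bytes-like object, including an mmap.
            for match in EMAIL_RE.finditer(view):
                # Count newlines before the match to get a 1-based line number.
                line_no = view[: match.start()].count(b"\n") + 1
                findings.append((line_no, match.group().decode("ascii")))
    return findings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "settings.py"
        sample.write_text('DEBUG = True\nADMIN = "admin@example.com"\n')
        print(scan_file(sample))  # [(2, 'admin@example.com')]
```

Mapping the file lets the operating system page data in on demand, so the whole file is never copied into a Python string before matching.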

STAGE 2: AST Analysis (Premium+)

  • Abstract syntax tree parsing for code structure
  • Multi-language support with Tree-sitter
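To give a feel for what AST-level detection adds over plain regex, here is a minimal sketch that uses Python's standard-library `ast` module instead of Tree-sitter. It flags string literals assigned to variables whose names suggest PII. The hint list and the `find_pii_assignments` function are hypothetical and not taken from Levox.

```python
import ast

# Hypothetical name hints; Levox's actual heuristics are not documented here.
PII_NAME_HINTS = ("email", "ssn", "phone", "password", "credit_card")


def find_pii_assignments(source):
    """Return (line, variable_name) for string literals bound to PII-like names."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only flag hard-coded string values, not computed expressions.
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(
                hint in target.id.lower() for hint in PII_NAME_HINTS
            ):
                findings.append((node.lineno, target.id))
    return sorted(findings)


SAMPLE = '''
user_email = "jane@example.com"
retries = 3
def load():
    customer_ssn = "078-05-1120"
    return customer_ssn
'''

print(find_pii_assignments(SAMPLE))  # [(2, 'user_email'), (5, 'customer_ssn')]
```

Because the parser knows which tokens are assignment targets and which are literals, a match inside a comment or an unrelated identifier is not reported.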

STAGE 3: Context Analysis (Premium+)

  • Semantic analysis of variable/function names
  • Context-aware false positive reduction
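One way to picture context-aware false positive reduction is as a confidence adjustment applied to each earlier finding. The sketch below is an assumption-laden example: the `Finding` fields, the marker lists, and the weights are all invented for illustration and do not describe Levox's scoring.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    path: str          # file the match was found in
    variable: str      # name of the variable holding the value
    value: str         # matched text
    confidence: float  # score from an earlier stage, between 0.0 and 1.0


# Hypothetical markers and weights chosen only for this example.
PLACEHOLDER_MARKERS = ("example.com", "dummy", "xxxx")
SENSITIVE_NAME_HINTS = ("email", "ssn", "card", "phone")


def adjust_confidence(finding):
    """Lower the score for test fixtures and placeholders, raise it for telling names."""
    score = finding.confidence
    if finding.path.startswith("tests/") or "/tests/" in finding.path:
        score -= 0.3  # values in test code are usually fixtures
    if any(marker in finding.value.lower() for marker in PLACEHOLDER_MARKERS):
        score -= 0.3  # obvious placeholder data
    if any(hint in finding.variable.lower() for hint in SENSITIVE_NAME_HINTS):
        score += 0.1  # the variable name supports the match
    return round(min(1.0, max(0.0, score)), 2)


real = Finding("src/app.py", "customer_email", "jane@corp.io", 0.8)
fixture = Finding("tests/test_app.py", "email", "user@example.com", 0.9)
print(adjust_confidence(real))     # 0.9
print(adjust_confidence(fixture))  # 0.4
```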

STAGE 4: Dataflow Analysis (Enterprise)

  • Tracks data movement through code
  • Taint analysis for sensitive data flows
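Taint analysis can be sketched in a few lines for straight-line code: mark values returned by a PII source as tainted, propagate the mark through assignments, and report when a tainted name reaches a sink. The source and sink names below are hypothetical, and the sketch only looks at top-level statements in order, so it ignores branches, loops, and function boundaries that a full dataflow stage would handle.

```python
import ast

SOURCES = {"get_user_email"}  # hypothetical functions that return PII
SINKS = {"print"}             # hypothetical calls that expose data


def _names(node):
    """All variable names referenced anywhere inside node."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def _called(node):
    """Names of plain functions called anywhere inside node."""
    return {
        n.func.id
        for n in ast.walk(node)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }


def find_tainted_sinks(source):
    """Return (line, variable) for each tainted variable passed to a sink."""
    tainted = set()
    findings = []
    for stmt in ast.parse(source).body:  # top-level statements, in order
        if isinstance(stmt, ast.Assign):
            # Taint the targets if the value calls a source or uses tainted data.
            if _called(stmt.value) & SOURCES or _names(stmt.value) & tainted:
                tainted |= {t.id for t in stmt.targets if isinstance(t, ast.Name)}
        for call in ast.walk(stmt):
            if (
                isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
                and call.func.id in SINKS
            ):
                leaked = set()
                for arg in call.args:
                    leaked |= _names(arg) & tainted
                findings.extend((call.lineno, name) for name in sorted(leaked))
    return findings


SAMPLE = """email = get_user_email()
greeting = "Hello " + email
count = 3
print(greeting)
print(count)
"""

print(find_tainted_sinks(SAMPLE))  # [(4, 'greeting')]
```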

STAGE 5: CFG Analysis (Premium+)

  • Control Flow Graph analysis for complex PII flows
  • Detects conditional exposure, loop accumulation, transformation chains

STAGE 6: ML Filtering (Enterprise)

  • Machine learning false positive reduction
  • Confidence scoring and validation

STAGE 7: GDPR Compliance (Premium+)

  • Regulatory compliance checking
  • Audit logging and reporting

License Tiers

  • Standard: Basic regex detection, limited language support
  • Premium: AST analysis, context analysis, CFG analysis, GDPR compliance
  • Enterprise: Full 7-stage pipeline including dataflow and ML filtering
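The tier-to-stage mapping above can be expressed as a small feature-gate table. This is only a sketch of the idea: the `Tier` enum, the stage keys, and `enabled_stages` are names invented here, with the minimum tiers copied from the stage list in this README.

```python
from enum import IntEnum


class Tier(IntEnum):
    STANDARD = 1
    PREMIUM = 2
    ENTERPRISE = 3


# Minimum tier per stage, as listed under "Detection Pipeline Stages".
# The stage keys are hypothetical identifiers for this example.
STAGE_MIN_TIER = {
    "regex": Tier.STANDARD,
    "ast": Tier.PREMIUM,
    "context": Tier.PREMIUM,
    "dataflow": Tier.ENTERPRISE,
    "cfg": Tier.PREMIUM,
    "ml_filter": Tier.ENTERPRISE,
    "gdpr": Tier.PREMIUM,
}


def enabled_stages(tier):
    """Return the stages available at the given tier, in pipeline order."""
    return [stage for stage, minimum in STAGE_MIN_TIER.items() if tier >= minimum]


print(enabled_stages(Tier.STANDARD))  # ['regex']
print(enabled_stages(Tier.PREMIUM))   # ['regex', 'ast', 'context', 'cfg', 'gdpr']
print(len(enabled_stages(Tier.ENTERPRISE)))  # 7
```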

Detection Rules

Create custom detection rules in configs/rules.yaml:

patterns:
  credit_card:
    regex: '\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'
    confidence: 0.8
    risk_level: high
    
  email:
    regex: '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    confidence: 0.9
    risk_level: medium
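To show how rules of this shape might be applied, the sketch below mirrors the two patterns as a Python dictionary (so it runs without a YAML library) and adds a Luhn checksum to discard 16-digit strings that cannot be real card numbers. The `apply_rules` and `luhn_valid` helpers are assumptions for illustration; the README only states that `utils/validators.py` contains Luhn and format validators, not how they are wired in.

```python
import re

# Mirrors the YAML rules above. The email top-level-domain class is written
# as [A-Za-z]; a "|" inside square brackets would match a literal pipe.
RULES = {
    "credit_card": {
        "regex": r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b",
        "confidence": 0.8,
        "risk_level": "high",
    },
    "email": {
        "regex": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        "confidence": 0.9,
        "risk_level": "medium",
    },
}


def luhn_valid(number):
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0


def apply_rules(text):
    """Return (rule_name, match, confidence, risk_level) for each accepted match."""
    findings = []
    for name, rule in RULES.items():
        for match in re.finditer(rule["regex"], text):
            # Hypothetical post-filter: drop card-like numbers that fail Luhn.
            if name == "credit_card" and not luhn_valid(match.group()):
                continue
            findings.append((name, match.group(), rule["confidence"], rule["risk_level"]))
    return findings


TEXT = 'card = "4111 1111 1111 1111"; fake = "1234 5678 9012 3456"; mail = "ops@example.org"'
for finding in apply_rules(TEXT):
    print(finding)
# ('credit_card', '4111 1111 1111 1111', 0.8, 'high')
# ('email', 'ops@example.org', 0.9, 'medium')
```

The checksum step is one way a regex match can be validated before it is reported, which helps keep the false positive rate down for numeric patterns.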

๐Ÿ” Detection Pipeline

Stage 1: Regex Engine

  • High-performance pattern matching
  • Optimized for common PII formats
  • Fast initial screening

Stage 2: AST Analysis

  • Context-aware detection
  • Variable name analysis
  • Comment and string extraction

Stage 4: Dataflow Analysis

  • Taint tracking
  • Variable propagation
  • Cross-function analysis

Stage 6: ML Filtering

  • False positive reduction
  • Context classification
  • Confidence scoring

📊 Performance

  • Incremental Scans: <10 seconds for modified files
  • Full Repository: <30 seconds for 10,000 files
  • Memory Usage: <500MB for large codebases
  • False Positive Rate: Target <10%
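Incremental scans are fast because only modified files need to be re-analyzed. The README does not say how Levox detects modifications, so the following is just one plausible approach: keep a cache of SHA-256 digests and rescan files whose digest has changed. The `changed_files` helper and the cache layout are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents, as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(root, cache):
    """Return files under root whose digest differs from cache; update cache in place."""
    changed = []
    for path in sorted(Path(root).rglob("*.py")):
        key = path.relative_to(root).as_posix()
        digest = file_digest(path)
        if cache.get(key) != digest:
            changed.append(key)
            cache[key] = digest
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.py").write_text("x = 1\n")
        (root / "b.py").write_text("y = 2\n")
        cache = {}
        print(changed_files(root, cache))  # ['a.py', 'b.py'] on the first run
        print(changed_files(root, cache))  # [] when nothing changed
        (root / "b.py").write_text("y = 3\n")
        print(changed_files(root, cache))  # ['b.py']
```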

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=levox

# Run specific test suite
pytest tests/test_detection/

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

๐Ÿ† Enterprise Support

For enterprise customers, we offer:

  • Custom detection rules
  • API integration
  • Dedicated support
  • Training and consulting

Contact us at enterprise@levox.ai for more information.
