Production-grade PII/GDPR detection CLI with multi-level analysis
Levox - Production-Grade PII/GDPR Detection CLI
Levox is a high-performance, enterprise-grade CLI application for detecting Personally Identifiable Information (PII) and ensuring GDPR compliance in codebases. Built with a multi-tier detection architecture, it provides fast, accurate scanning with minimal false positives.
Features
- 7-Stage Detection Pipeline: Regex → AST Analysis → Context Analysis → Dataflow → CFG Analysis → ML Filtering → GDPR Compliance
- Multi-Language Support: Python, JavaScript, and extensible parser architecture
- Performance Optimized: <10s incremental scans, <30s full repository scans
- Enterprise Licensing: Standard, Premium, and Enterprise tiers with feature gates
- Low False Positives: Target <10% false positive rate
- Memory Efficient: Memory-mapped file operations for large codebases
- Comprehensive Logging: Structured logging with performance metrics
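As a rough illustration of the memory-mapped scanning mentioned above, the sketch below maps a file read-only and runs a bytes regex over it without loading the whole file into a Python string. The function name `scan_file_mmap` and the single email pattern are assumptions made for this example, not Levox's actual implementation.

```python
import mmap
import re
from pathlib import Path

# Hypothetical pattern for illustration; not Levox's actual rule set.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_file_mmap(path):
    """Return (line_number, match_text) pairs found in a file via mmap."""
    findings = []
    # mmap cannot map an empty file, so skip those up front.
    if Path(path).stat().st_size == 0:
        return findings
    with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        line_no, last_pos = 1, 0
        for match in EMAIL_RE.finditer(mm):
            # Count only the newlines between the previous match and this one.
            line_no += mm[last_pos:match.start()].count(b"\n")
            last_pos = match.start()
            findings.append((line_no, match.group().decode("utf-8", "replace")))
    return findings
```

The operating system pages the file in on demand, so peak memory stays roughly independent of file size as long as matches are sparse.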
Architecture
levox/
├── levox/
│   ├── cli.py                      # Main CLI entry point
│   ├── core/
│   │   ├── engine.py               # Detection engine orchestrator
│   │   ├── config.py               # Configuration management
│   │   └── exceptions.py           # Custom exceptions
│   ├── detection/
│   │   ├── regex_engine.py         # Stage 1: Optimized regex detection
│   │   ├── ast_analyzer.py         # Stage 2: AST-based context analysis
│   │   ├── context_analyzer.py     # Stage 3: Semantic context analysis
│   │   ├── dataflow.py             # Stage 4: Taint/dataflow analysis
│   │   ├── cfg_analyzer.py         # Stage 5: Control Flow Graph analysis
│   │   └── ml_filter.py            # Stage 6: ML-based false positive reduction
│   ├── parsers/
│   │   ├── base.py                 # Base parser interface
│   │   ├── python_parser.py        # Python AST parser
│   │   ├── javascript_parser.py    # JS parser
│   │   └── multi_lang.py           # Multi-language coordinator
│   ├── utils/
│   │   ├── file_handler.py         # Memory-mapped file operations
│   │   ├── validators.py           # Luhn, format validators
│   │   └── performance.py          # Performance monitoring
│   └── models/
│       ├── detection_result.py     # Result data models
│       └── confidence.py           # Confidence scoring
Installation
From Source
git clone https://github.com/levox/levox.git
cd levox
pip install -e .
From PyPI
pip install levox
From PyPI (Upgrade to the Latest Release)
pip install --upgrade levox
Quick Start
Basic Usage
# Scan current directory
levox scan
# Scan specific directory
levox scan /path/to/codebase
# Scan with CFG analysis (Premium+)
levox scan --cfg
# Generate detailed report
levox scan --output report.json --format json
# Configure detection rules
levox configure --rules custom-rules.yaml
Advanced CFG Analysis
# Enable deep scanning with CFG analysis
levox scan --cfg --cfg-confidence 0.7
# Alternative flag name
levox scan --deep-scan
# Full enterprise scan with all stages
levox scan --license-tier enterprise --cfg --format json
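For use in scripts or CI jobs, the commands above can be driven from Python. The sketch below is a thin wrapper under stated assumptions: the helper names `build_scan_command` and `run_scan` are hypothetical, combining a target path with `--output` and `--format json` in a single call is assumed to work, and no particular report schema or exit-code convention is assumed.

```python
import json
import subprocess
from pathlib import Path


def build_scan_command(target=".", output="report.json", cfg=False):
    """Assemble a `levox scan` argv list from flags shown in this README."""
    cmd = ["levox", "scan", str(target), "--output", str(output), "--format", "json"]
    if cfg:
        cmd.append("--cfg")  # CFG analysis is listed as Premium+ in this README
    return cmd


def run_scan(target=".", output="report.json", cfg=False):
    """Run the scan and return the parsed JSON report (schema not assumed)."""
    # check=False: the CLI's exit-code convention is not documented here,
    # so a non-zero status is not treated as an error by this sketch.
    subprocess.run(build_scan_command(target, output, cfg), check=False)
    return json.loads(Path(output).read_text(encoding="utf-8"))
```

Passing the command as a list avoids shell quoting problems when the target path contains spaces.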
CLI Commands
- levox scan: Scan codebase for PII/GDPR violations
- levox configure: Configure detection rules and settings
- levox report: Generate and view reports
- levox feedback: Provide feedback to improve detection
Configuration
Detection Pipeline Stages
STAGE 1: Regex Detection (Standard+)
- Fast pattern matching for basic PII patterns
- Optimized regex engine with minimal false positives
STAGE 2: AST Analysis (Premium+)
- Abstract syntax tree parsing for code structure
- Multi-language support with Tree-sitter
STAGE 3: Context Analysis (Premium+)
- Semantic analysis of variable/function names
- Context-aware false positive reduction
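To make the idea of name-based context analysis concrete, here is a minimal sketch using Python's standard `ast` module. It flags string-literal assignments whose variable name hints at PII. The hint list `PII_NAME_HINTS` and the function name are hypothetical; the heuristics Levox actually applies are not documented in this README.

```python
import ast

# Hypothetical name fragments that hint at PII; illustrative only.
PII_NAME_HINTS = ("email", "ssn", "phone", "passport", "credit_card")


def find_pii_named_assignments(source):
    """Return (line, name) for string-literal assignments whose target name hints at PII."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        # Only consider string-literal values, e.g. user_email = "a@b.com".
        if not (isinstance(node.value, ast.Constant) and isinstance(node.value.value, str)):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(h in target.id.lower() for h in PII_NAME_HINTS):
                hits.append((node.lineno, target.id))
    return sorted(hits)
```

A variable name is weak evidence on its own, so a signal like this is better suited to raising or lowering the confidence of a regex match than to producing findings by itself.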
STAGE 4: Dataflow Analysis (Enterprise)
- Tracks data movement through code
- Taint analysis for sensitive data flows
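Taint analysis can be sketched in a few lines for the simplest case: straight-line, top-level Python code. A variable becomes tainted when its name looks sensitive or when it is assigned from a tainted variable, and a finding is reported when a tainted variable reaches a sink call. The source hints, the sink names, and the function name are assumptions for illustration; a real analysis would also follow branches, loops, and function calls.

```python
import ast

# Hypothetical sources and sinks for illustration only.
SOURCE_HINTS = ("email", "ssn", "password")
SINKS = {"print", "log"}


def _names(node):
    """Collect every variable name that appears inside an expression."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def find_tainted_sinks(source):
    """Return (line, sink, variable) where a tainted variable reaches a sink call."""
    tainted = set()
    findings = []
    # Walk top-level statements in order; control flow is ignored in this sketch.
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            rhs_tainted = bool(_names(stmt.value) & tainted)
            for target in stmt.targets:
                if not isinstance(target, ast.Name):
                    continue
                if rhs_tainted or any(h in target.id.lower() for h in SOURCE_HINTS):
                    tainted.add(target.id)
                else:
                    tainted.discard(target.id)  # overwritten with clean data
        for call in (n for n in ast.walk(stmt) if isinstance(n, ast.Call)):
            if isinstance(call.func, ast.Name) and call.func.id in SINKS:
                for arg in call.args:
                    for name in sorted(_names(arg) & tainted):
                        findings.append((call.lineno, call.func.id, name))
    return findings
```

In the test input used to check this sketch, `message` is built from `user_email`, so printing `message` is reported even though its own name looks harmless.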
STAGE 5: CFG Analysis (Premium+)
- Control Flow Graph analysis for complex PII flows
- Detects conditional exposure, loop accumulation, transformation chains
STAGE 6: ML Filtering (Enterprise)
- Machine learning false positive reduction
- Confidence scoring and validation
STAGE 7: GDPR Compliance (Premium+)
- Regulatory compliance checking
- Audit logging and reporting
License Tiers
- Standard: Basic regex detection, limited language support
- Premium: AST analysis, context analysis, CFG analysis, GDPR compliance
- Enterprise: Full 7-stage pipeline including dataflow and ML filtering
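The tier descriptions above amount to a minimum tier per stage. The sketch below encodes that mapping, transcribed from the stage list in this README, and derives which stages a given tier can run. The names `Tier`, `STAGE_MIN_TIER`, and `enabled_stages` are hypothetical, and Stage 1 is assumed to be available from the Standard tier upward.

```python
from enum import IntEnum


class Tier(IntEnum):
    STANDARD = 1
    PREMIUM = 2
    ENTERPRISE = 3


# Minimum tier per stage, in pipeline order (hypothetical stage keys).
STAGE_MIN_TIER = {
    "regex": Tier.STANDARD,
    "ast": Tier.PREMIUM,
    "context": Tier.PREMIUM,
    "dataflow": Tier.ENTERPRISE,
    "cfg": Tier.PREMIUM,
    "ml_filter": Tier.ENTERPRISE,
    "gdpr": Tier.PREMIUM,
}


def enabled_stages(tier):
    """Return the stages available to a tier, in pipeline order."""
    return [stage for stage, minimum in STAGE_MIN_TIER.items() if tier >= minimum]
```

Using an `IntEnum` keeps the gate to a single comparison, so a higher tier automatically includes everything a lower tier offers.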
Detection Rules
Create custom detection rules in configs/rules.yaml:
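patterns:
  credit_card:
    # Four groups of four digits, optionally separated by "-" or whitespace
    regex: '\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b'
    confidence: 0.8
    risk_level: high
  email:
    # TLD class is [A-Za-z]; a "|" inside a character class would match a literal pipe
    regex: '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    confidence: 0.9
    risk_level: medium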
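To show how rules like these might be applied, the sketch below inlines the two patterns as a Python dict (so no YAML parser is needed, with the email top-level-domain class written as `[A-Za-z]`) and pairs the credit card regex with a Luhn checksum, the kind of validator the architecture lists under `utils/validators.py`. The function names and the tuple layout of a finding are assumptions, not the format Levox emits.

```python
import re

# The two rules from this section, inlined as a dict for a dependency-free sketch.
RULES = {
    "credit_card": {
        "regex": r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b",
        "confidence": 0.8,
        "risk_level": "high",
    },
    "email": {
        "regex": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        "confidence": 0.9,
        "risk_level": "medium",
    },
}


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Double every second digit, counting from the right.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text):
    """Apply every rule; drop credit card matches that fail the Luhn check."""
    findings = []
    for name, rule in RULES.items():
        for m in re.finditer(rule["regex"], text):
            if name == "credit_card" and not luhn_valid(m.group()):
                continue
            findings.append((name, m.group(), rule["confidence"]))
    return findings
```

The checksum step is what keeps arbitrary 16-digit identifiers from being reported as card numbers, which is one cheap way to lower the false positive rate of a pure regex pass.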
Detection Pipeline
Level 1: Regex Engine
- High-performance pattern matching
- Optimized for common PII formats
- Fast initial screening
Level 2: AST Analysis
- Context-aware detection
- Variable name analysis
- Comment and string extraction
Level 3: Dataflow Analysis
- Taint tracking
- Variable propagation
- Cross-function analysis
Level 4: ML Filtering
- False positive reduction
- Context classification
- Confidence scoring
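The confidence-scoring idea can be illustrated with a small logistic model that adjusts a rule's base confidence using context features. Everything here is an assumption for illustration: the feature names, the hand-picked weights, and the 0.7 threshold are hypothetical, and a real filter would learn its weights from labeled findings rather than hard-code them.

```python
import math

# Hand-picked weights for illustration; not a trained model.
WEIGHTS = {
    "in_test_file": -2.0,       # findings in test fixtures are often fake data
    "placeholder_value": -3.0,  # e.g. example.com addresses
    "validator_passed": 2.5,    # e.g. a Luhn check succeeded
}


def adjusted_confidence(base_confidence, features):
    """Combine a rule's base confidence with context features in log-odds space."""
    # Clamp so the log-odds transform stays finite at 0 and 1.
    base = min(max(base_confidence, 1e-6), 1 - 1e-6)
    logit = math.log(base / (1 - base))
    for name, weight in WEIGHTS.items():
        if features.get(name):
            logit += weight
    return 1 / (1 + math.exp(-logit))


def keep_finding(base_confidence, features, threshold=0.7):
    """Keep a finding only if its adjusted confidence reaches the threshold."""
    return adjusted_confidence(base_confidence, features) >= threshold
```

Working in log-odds means that with no features set the adjusted confidence equals the base confidence, and each piece of evidence shifts the score by a fixed amount regardless of the order in which it is applied.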
Performance
- Incremental Scans: <10 seconds for modified files
- Full Repository: <30 seconds for 10,000 files
- Memory Usage: <500MB for large codebases
- False Positive Rate: Target <10%
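Incremental scans depend on knowing which files changed since the last run. How Levox tracks this is not described in this README; the sketch below shows one common approach, a content-hash cache, with hypothetical function names and a JSON cache file chosen purely for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(root, cache_path, pattern="*.py"):
    """Return files whose content differs from the cached digests, then update the cache."""
    cache_file = Path(cache_path)
    old = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    new = {str(p): file_digest(p) for p in sorted(Path(root).rglob(pattern)) if p.is_file()}
    cache_file.write_text(json.dumps(new))
    # New files have no cached digest, so they count as changed.
    return [path for path, digest in new.items() if old.get(path) != digest]
```

Hashing content rather than comparing modification times avoids rescanning files that were touched but not edited, at the cost of reading every file once per run.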
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=levox
# Run specific test suite
pytest tests/test_detection/
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation: docs.levox.ai
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Enterprise Support
For enterprise customers, we offer:
- Custom detection rules
- API integration
- Dedicated support
- Training and consulting
Contact us at enterprise@levox.ai for more information.
File details
Details for the file levox_cli-1.0.0.tar.gz.
File metadata
- Download URL: levox_cli-1.0.0.tar.gz
- Upload date:
- Size: 402.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ee4dc3b42685442e853f6c71cde6da9974f7f2499ab9e290b71a529501d280e |
| MD5 | cd1d0ee370c01ec563624e87caa0b287 |
| BLAKE2b-256 | f81a74758f4b05c68b213c06e40fc5405d555f037bbe4c5e2e4663a6b4f1b7a6 |
File details
Details for the file levox_cli-1.0.0-py3-none-any.whl.
File metadata
- Download URL: levox_cli-1.0.0-py3-none-any.whl
- Upload date:
- Size: 326.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4dc9cceb82ccbb05bed5cb7842d37ec9795e55094d5b7df3f2f5543fbd7ea78a |
| MD5 | 2f2126cbb1f16e7aa67d044bcafe604c |
| BLAKE2b-256 | a7e85e504b09bdbe27ec0859b8b86e01aec6d07ec5f6d43d536c3aed1535a9e9 |