Open-Source Static Analysis for Privacy Data Flows

truScanner from truConsent

truScanner is a static code analysis tool designed to discover and analyze personal data elements in your source code. It helps developers and security teams identify privacy-related data flows and generate comprehensive reports.

📦 PyPI Project • 🌐 App Dashboard

🚀 Features

  • Comprehensive Detection: Identifies 300+ personal data elements (PII, financial data, device identifiers, etc.)
  • Full Catalog Coverage: Loads and scans against all configured data elements from data_elements/ (not a truncated subset)
  • Interactive Menu: Arrow-key navigable menu for selecting output formats
  • Real-time Progress: Visual progress indicator during scanning
  • Multiple Report Formats: Generate reports in TXT, Markdown, or JSON format
  • AI-Powered Enhancement: Optional integration with Ollama or OpenAI for deeper context
  • Backend Integration: Optional upload to backend API for centralized storage
  • Auto-incrementing Reports: Automatically manages report file naming to prevent overwrites

📦 Installation

Prerequisites

  • Python 3.9 or higher
  • ollama (optional, for local AI scanning)

Quick Install

Using pip:

pip install truscanner

Using uv:

uv pip install truscanner

Verify installation:

truscanner --help
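
The same check can be made from Python. This is a minimal sketch that assumes only what the install commands above imply: the distribution is named truscanner, and the CLI is exposed as a truscanner executable on PATH. The helper name installation_status is hypothetical and not part of the package.

```python
import shutil
from importlib import metadata


def installation_status(dist_name="truscanner"):
    """Report whether the distribution and its CLI executable are visible."""
    try:
        # Version string read from the installed package metadata.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        version = None
    return {
        "version": version,
        # Assumption: the CLI executable shares the distribution name.
        "cli_on_path": shutil.which(dist_name) is not None,
    }


if __name__ == "__main__":
    print(installation_status())
```

A version of None, or cli_on_path set to False, suggests the package was installed into a different environment than the one currently active.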

๐Ÿ› ๏ธ Usage

Basic Usage

Scan a directory with the interactive menu:

truscanner scan <directory_path>

Example

truscanner scan ./src
truscanner scan ./my-project
truscanner scan C:\Users\username\projects\my-app

Python API Usage

Use truScanner directly from Python:

import truscanner

# Local path
check = truscanner("/path/to/project")

# file:// URL also works
check = truscanner("file:///Users/username/project")

# Optional explicit call style
check = truscanner.scan("/path/to/project", with_ai=False)

# API metadata: total configured catalog size
print(check["configured_data_elements"])

Minimal script style:

import truscanner
scan = truscanner("folder_path")

Runnable root example:

python3 simple_truscanner_usage.py ./src

Quick smoke check script:

uv run python scripts/check_truscanner_api.py ./src

Installed-package verification (CLI + Python API):

python3 verify_truscanner_install.py

Interactive Workflow

  1. Select Output Format:

    • Use arrow keys (↑↓) to navigate
    • Press Enter to select
    • Options: txt, md, json, or All (generates all three formats)
  2. Scanning Progress:

    • Real-time progress bar shows file count and percentage
    • Prints configured definition count at start (example: Loaded data element definitions: 380)
    • Example: Scanning: 50/200 (25%) [████████░░░░░░░░░░░░] filename.js
  3. AI Enhanced Scan (Optional):

    • After the initial scan, you'll be prompted: Do you want to use Ollama/AI for enhanced PII detection (find what regex missed)? (Y, N):
    • This uses local LLMs (via Ollama) or OpenAI to find complex PII.
    • Live scanning timer: AI Scanning: filename.js... (5.2s taken)
  4. Report Generation:

    • Reports are saved in reports/{directory_name}/ folder
    • Files are named: truscan_report.txt, truscan_report.md, truscan_report.json
    • Subsequent scans auto-increment: truscan_report1.txt, truscan_report2.txt, etc.
    • AI findings are saved with an _llm suffix in the file name.
  5. Backend Upload (Optional):

    • After reports are saved, you'll be prompted: Do you want to upload the scan report for the above purpose? (Y, N):
    • Enter Y to upload the scan results to the backend API
    • View your uploaded scans and analytics at app.truconsent.io
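
The auto-increment naming in step 4 (truscan_report.txt, then truscan_report1.txt, truscan_report2.txt, and so on) can be sketched as follows. This illustrates the naming rule only and is not truScanner's actual implementation. The function name next_report_path and its parameters are hypothetical.

```python
from pathlib import Path


def next_report_path(report_dir, stem="truscan_report", ext="txt"):
    """Return the first unused path: stem.ext, stem1.ext, stem2.ext, ..."""
    report_dir = Path(report_dir)
    candidate = report_dir / f"{stem}.{ext}"  # first scan gets no numeric suffix
    counter = 0
    while candidate.exists():
        # Name already taken: try the next number.
        counter += 1
        candidate = report_dir / f"{stem}{counter}.{ext}"
    return candidate
```

Each format is checked on its own in this sketch, so a directory that already holds truscan_report.txt but no JSON report would still yield truscan_report.json for the JSON output.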

Command Options

truscanner scan <directory> [OPTIONS]

Options:
  --with-ai          Enable AI/LLM scanner directly
  --ai-mode          AI scan mode: fast, balanced, or full (default: balanced)
  --personal-only    Only report personally identifiable information (PII)
  --help             Show help message

AI Speed vs Coverage Modes

Use --ai-mode to control AI scan behavior:

  • fast: Small prompts, fastest runtime, may skip very large low-signal files
  • balanced (default): Good speed while keeping broad file coverage
  • full: Largest context and highest coverage, slowest runtime

Examples:

truscanner scan ./src --ai-mode fast
truscanner scan ./src --ai-mode balanced
truscanner scan ./src --ai-mode full
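
When the command is assembled by a script, it can help to validate the mode before launching anything. The sketch below only builds the argument list from the flags documented under Command Options. The helper name build_scan_command is hypothetical. Because the scan prompts interactively (see Interactive Workflow), running the resulting command unattended may still require answering those prompts.

```python
AI_MODES = ("fast", "balanced", "full")  # modes listed in this README


def build_scan_command(directory, ai_mode=None, with_ai=False, personal_only=False):
    """Build the argv list for `truscanner scan` using only documented flags."""
    if ai_mode is not None and ai_mode not in AI_MODES:
        raise ValueError(f"ai_mode must be one of {AI_MODES}, got {ai_mode!r}")
    cmd = ["truscanner", "scan", str(directory)]
    if with_ai:
        cmd.append("--with-ai")
    if ai_mode is not None:
        cmd += ["--ai-mode", ai_mode]
    if personal_only:
        cmd.append("--personal-only")
    return cmd


if __name__ == "__main__":
    print(build_scan_command("./src", ai_mode="fast"))
```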

📊 Report Output

Report Location

Reports are saved in: reports/{sanitized_directory_name}/

Report Formats

  • TXT Report (truscan_report.txt): Plain text format, easy to read
  • Markdown Report (truscan_report.md): Formatted markdown with headers and code blocks
  • JSON Report (truscan_report.json): Structured JSON data for programmatic access

Report Contents

Each report includes:

  • Scan Report ID: Unique MD5 hash identifier (32 hexadecimal characters)
  • Summary: Configured data elements, distinct detected elements, total findings, and time taken
  • Findings by File: Detailed list of data elements found in each file
  • Summary by Category: Aggregated statistics by data category

JSON reports also include:

  • configured_data_elements
  • distinct_detected_elements

Report ID

Each scan generates a unique Scan Report ID (an MD5 hash, written as 32 hexadecimal characters) that:

  • Appears in the terminal after scanning
  • Is included at the top of all generated report files
  • Can be used to track and reference specific scans
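
An MD5 digest is 128 bits long and prints as 32 hexadecimal characters. The sketch below shows how an ID of that shape could be produced with Python's hashlib. The README does not say what truScanner actually hashes, so the inputs here (scanned path plus a timestamp) are an assumption made for illustration, and make_report_id is a hypothetical name.

```python
import hashlib
import time


def make_report_id(directory, timestamp=None):
    """Derive a 32-character hexadecimal MD5 digest from a path and a timestamp."""
    if timestamp is None:
        timestamp = time.time()  # makes repeated scans of one path differ
    # Assumption: path and timestamp as the hashed payload (illustrative only).
    payload = f"{directory}|{timestamp}".encode("utf-8")
    return hashlib.md5(payload).hexdigest()
```

MD5 serves here only as an identifier and provides no security guarantee.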

🔧 Configuration

The truscanner package is pre-configured with the live backend URL for seamless scan uploads. No additional configuration is required.

๐Ÿ“ Project Structure

truscanner/
├── src/
│   ├── main.py              # CLI entry point
│   ├── regex_scanner.py     # Core scanning engine
│   ├── ai_scanner.py        # AI/LLM scanning engine
│   ├── report_utils.py      # Report utilities
│   └── utils.py             # Utilities
├── data_elements/           # Data element definitions
├── reports/                 # Generated reports
├── pyproject.toml           # Project configuration
└── README.md

๐Ÿ“ Change Policy

For this repository, every code or behavior change must include a matching README update in the same change.

This includes:

  • CLI flags, prompts, defaults, scan behavior, output format changes
  • Python API changes (import truscanner, return schema, parameters)
  • Dependency/runtime requirements
  • Report format/location updates

๐Ÿค Support

For issues, questions, or contributions, please contact: hello@truconsent.io

MIT License - see LICENSE file for details
