Video Content Redaction Tool - OCR-Based Sensitive Information Recognition and Masking System

Privision - Video Content Privacy Masking Tool

Universal Video Content Privacy Solution - Intelligent Information Recognition and Masking System Based on OCR

Privision is a video privacy masking tool that uses OCR (PaddleOCR) to automatically find and mask sensitive text in videos. It detects phone numbers, ID card numbers, and custom keywords, and offers three interfaces: a command-line tool, a batch processor, and a RESTful API.

🌟 Key Features

🎯 Multi-Scenario Detection Support

📱 Phone Number Detection: Accurately identifies 11-digit Chinese mainland phone numbers with smart false-positive filtering
🆔 ID Card Number Detection: Recognizes 18-digit Chinese mainland ID card numbers with basic validity verification
🔑 Keyword Detection: Custom keyword lists for flexible detection of any sensitive terms
🔌 Extensible Architecture: Factory pattern design for easy addition of new detector types

⚡ High-Performance Processing

🚀 Dual Processing Modes:
- Frame-by-Frame Mode: Runs OCR on every frame for maximum accuracy
- Smart Sampling Mode: Runs OCR only on periodically sampled frames; roughly 10-30x faster and suitable for most scenarios
💎 GPU Acceleration: CUDA support for significantly improved processing speed
🎯 Precise Positioning: Iterative refinement shrinks each mask to the target text, so surrounding content is not masked unnecessarily

🎨 Flexible Masking Methods

Gaussian Blur: Natural and smooth blur effect
Pixelate: Classic mosaic effect
Black Mask: Complete coverage for strong protection

🛠 Multiple Usage Methods

Command-Line Tool: Simple and easy to use for single video processing
Batch Processing: Directory-level batch processing with recursive subdirectory support
RESTful API: Complete HTTP API with asynchronous task queue
Visual Debugging: Real-time preview of detection results and masking effects

⚡ Quick Start

For Users (Recommended)

# Install from PyPI
pip install privision

# Verify installation
privision --help

For GPU acceleration, see the GPU Acceleration Installation section.

Basic Usage

# 1. Detect and mask phone numbers
privision input.mp4 output.mp4

# 2. Detect ID card numbers
privision input.mp4 output.mp4 --detector idcard

# 3. Detect custom keywords
privision input.mp4 output.mp4 --detector keyword --keywords password account name

# 4. Smart sampling mode (fast)
privision input.mp4 output.mp4 --mode smart

# 5. GPU acceleration
privision input.mp4 output.mp4 --device gpu:0 --mode smart

🚀 Installation

System Requirements

Python 3.8+
pip
(Optional) NVIDIA GPU + CUDA Toolkit

For Users: Install from PyPI

Install the latest stable version:

# Install from PyPI
pip install privision

# Verify installation
privision --help

After installation, you can directly use the following commands:

privision - Single video processing
privision-batch - Batch processing
privision-server - API server

Basic usage example:

# Detect and mask phone numbers
privision input.mp4 output.mp4

# Smart sampling mode (10-30x faster)
privision input.mp4 output.mp4 --mode smart

For Developers: Install from Source

Method 1: Development Mode Installation (Recommended)

# Clone the repository
git clone https://github.com/0xyk3r/Privision.git
cd Privision

# Install in development mode
pip install -e .

# Verify installation
privision --help

After installation, you can directly use the following commands:

privision - Single video processing
privision-batch - Batch processing
privision-server - API server

Method 2: Install Dependencies Only

# Clone the repository
git clone https://github.com/0xyk3r/Privision.git
cd Privision

# Install dependencies only
pip install -r requirements.txt

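With this method the package is not installed, so run it as a module with python -m privision.main. Because the package lives under src/, run that command from the src/ directory (or add src/ to PYTHONPATH).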

GPU Acceleration Installation

Check CUDA Version:

nvidia-smi  # Check "CUDA Version: xx.x" in the upper right corner

Install GPU Dependencies:

# Install common dependencies first
pip install -r requirements.txt

# Choose installation based on CUDA version
# CUDA 11.8
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# CUDA 12.6
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# CUDA 12.9
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/

Verify GPU Installation:

python -c "import paddle; print('Compiled with CUDA:', paddle.device.is_compiled_with_cuda())"
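Note that paddle.device.is_compiled_with_cuda() only reports whether the installed PaddlePaddle build includes CUDA support, not whether a GPU is actually visible. A slightly fuller check, sketched below, also counts visible devices with paddle.device.cuda.device_count(). The helper name gpu_status is hypothetical, and the script degrades gracefully when PaddlePaddle is not installed.

```python
def gpu_status() -> dict:
    """Report PaddlePaddle's CUDA build flag and the number of visible GPUs."""
    try:
        import paddle
    except ImportError:
        # PaddlePaddle is not installed at all
        return {"paddle_installed": False, "compiled_with_cuda": False, "gpu_count": 0}

    compiled = bool(paddle.device.is_compiled_with_cuda())
    # device_count() is only meaningful on a CUDA build
    count = int(paddle.device.cuda.device_count()) if compiled else 0
    return {"paddle_installed": True, "compiled_with_cuda": compiled, "gpu_count": count}


if __name__ == "__main__":
    status = gpu_status()
    print(status)
    if status["gpu_count"] > 0:
        print("GPU ready: try --device gpu:0")
    else:
        print("No usable GPU: use --device cpu")
```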

Core Dependencies

paddlepaddle >= 3.0.0 - Deep learning framework
paddleocr >= 3.0.0 - OCR recognition engine
opencv-python >= 4.8.0 - Video processing
numpy >= 1.24.0 - Numerical computing
fastapi >= 0.104.0 - API framework
rich >= 13.0.0 - Terminal beautification

📖 Usage Guide

1. Command-Line Tool

Basic Usage

# Detect phone numbers (default)
privision input.mp4 output.mp4

# Detect ID card numbers
privision input.mp4 output.mp4 --detector idcard

# Detect custom keywords
privision input.mp4 output.mp4 --detector keyword --keywords password account username

# Smart sampling mode (recommended)
privision input.mp4 output.mp4 --mode smart

# GPU acceleration
privision input.mp4 output.mp4 --device gpu:0

Advanced Options

Choose Masking Method:

# Gaussian blur (default)
privision input.mp4 output.mp4 --blur-method gaussian

# Pixelate (mosaic)
privision input.mp4 output.mp4 --blur-method pixelate

# Black mask
privision input.mp4 output.mp4 --blur-method black
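Conceptually, the methods differ only in what replaces the pixels inside a detected box. The dependency-free sketch below shows pixelate and black mask on a grayscale image stored as a list of rows. The function names are hypothetical, and the project itself uses OpenCV (see blur.py) rather than this code. Gaussian blur is left out because it needs a convolution; OpenCV's cv2.GaussianBlur requires an odd kernel size, which is likely why --blur-strength must be odd.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0-255


def pixelate_region(img: Image, x1: int, y1: int, x2: int, y2: int, block: int = 8) -> Image:
    """Replace each block x block tile inside [x1, x2) x [y1, y2) with its mean value."""
    out = [row[:] for row in img]  # never modify the input frame
    for by in range(y1, y2, block):
        for bx in range(x1, x2, block):
            ys = range(by, min(by + block, y2))  # clip tiles at the region edge
            xs = range(bx, min(bx + block, x2))
            tile = [img[y][x] for y in ys for x in xs]
            mean = round(sum(tile) / len(tile))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def black_region(img: Image, x1: int, y1: int, x2: int, y2: int) -> Image:
    """Overwrite every pixel inside [x1, x2) x [y1, y2) with 0 (black)."""
    out = [row[:] for row in img]
    for y in range(y1, y2):
        for x in range(x1, x2):
            out[y][x] = 0
    return out
```

Larger block values give coarser mosaics; the black mask discards the pixels entirely, which is why it offers the strongest protection.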

Precise Location Mode:

# Enable precise location to avoid masking irrelevant content
privision input.mp4 output.mp4 --precise-location
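OCR returns one box per text line, so without refinement a phone number inside "Tel: 13812345678" would mask the whole line. One simple way to narrow the box, sketched below as an assumption, is to split the line box in proportion to character widths, counting full-width CJK characters as two units. The project's precise_locator.py and bbox_calculator.py may work differently, for example by re-running OCR on the narrowed crop up to --precise-max-iterations times; substring_box is a hypothetical helper.

```python
import unicodedata
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def _char_units(ch: str) -> int:
    """Wide/full-width characters (e.g. CJK) take roughly two half-width units."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def substring_box(line_box: Box, text: str, start: int, end: int) -> Box:
    """Estimate the box of text[start:end] inside an OCR line box."""
    x1, y1, x2, y2 = line_box
    total_units = sum(_char_units(c) for c in text)
    if total_units == 0:
        return line_box  # nothing to split
    unit_w = (x2 - x1) / total_units
    left = sum(_char_units(c) for c in text[:start])
    right = left + sum(_char_units(c) for c in text[start:end])
    return (round(x1 + left * unit_w), y1, round(x1 + right * unit_w), y2)
```

For example, substring_box((100, 50, 260, 80), "Tel: 13812345678", 5, 16) narrows the mask to (150, 50, 260, 80), leaving "Tel: " visible.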

Visual Debugging:

# Show real-time processing window
privision input.mp4 output.mp4 --visualize

Complete Parameters

Positional Arguments:
  input                         Input video file path
  output                        Output video file path

Detector Settings:
  --detector {phone,keyword,idcard}
                                Detector type [default: phone]
                                  phone   - Phone number detection
                                  keyword - Keyword detection
                                  idcard  - ID card number detection

  --keywords WORD [WORD ...]    Keyword list (keyword detector only)
  --case-sensitive              Keywords are case-sensitive (keyword detector only)

Processing Mode:
  --mode {frame-by-frame,smart}
                                Processing mode [default: frame-by-frame]
                                  frame-by-frame - Frame-by-frame processing
                                  smart          - Smart sampling

Masking Settings:
  --blur-method {gaussian,pixelate,black}
                                Masking method [default: gaussian]
  --blur-strength INT           Blur strength (must be odd) [default: 51]

Device Settings:
  --device DEVICE               Computing device (cpu, gpu:0, gpu:1, ...) [default: cpu]

Sampling Settings (smart mode only):
  --sample-interval FLOAT       Sampling interval (seconds) [default: 1.0]
  --buffer-time FLOAT           Buffer time (seconds)

Precise Location:
  --precise-location            Enable precise location mode
  --precise-max-iterations INT  Maximum iterations [default: 3]

Interface Settings:
  --visualize                   Enable visualization window
  --no-rich                     Disable Rich UI

Other:
  -h, --help                    Show help message
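The project is described as configuration-driven, with a ProcessConfig object collecting these settings. Its real fields are not documented here, so the dataclass below is a hypothetical mirror of the CLI options above. It shows the kind of up-front validation those constraints imply, such as an odd --blur-strength and a cpu or gpu:N device.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessConfig:
    """Hypothetical mirror of the CLI options; not the project's actual class."""
    input_path: str
    output_path: str
    detector: str = "phone"                # phone | keyword | idcard
    keywords: List[str] = field(default_factory=list)
    case_sensitive: bool = False
    mode: str = "frame-by-frame"           # frame-by-frame | smart
    blur_method: str = "gaussian"          # gaussian | pixelate | black
    blur_strength: int = 51                # must be odd
    device: str = "cpu"                    # cpu | gpu:0 | gpu:1 | ...
    sample_interval: float = 1.0           # seconds, smart mode only
    buffer_time: Optional[float] = None    # seconds, smart mode only (default not documented)
    precise_location: bool = False
    precise_max_iterations: int = 3

    def __post_init__(self) -> None:
        # Reject invalid settings before any video is opened
        if self.detector not in {"phone", "keyword", "idcard"}:
            raise ValueError(f"unknown detector: {self.detector}")
        if self.mode not in {"frame-by-frame", "smart"}:
            raise ValueError(f"unknown mode: {self.mode}")
        if self.blur_method not in {"gaussian", "pixelate", "black"}:
            raise ValueError(f"unknown blur method: {self.blur_method}")
        if self.blur_strength <= 0 or self.blur_strength % 2 == 0:
            raise ValueError("blur_strength must be a positive odd integer")
        if self.device != "cpu" and not re.fullmatch(r"gpu:\d+", self.device):
            raise ValueError(f"device must be 'cpu' or 'gpu:N', got {self.device!r}")
        if self.sample_interval <= 0:
            raise ValueError("sample_interval must be positive")
```

For example, ProcessConfig("in.mp4", "out.mp4", mode="smart", device="gpu:0") is accepted, while blur_strength=50 raises a ValueError.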

2. Batch Processing

Use the privision-batch command to batch process all videos in a directory.

Basic Usage

# Batch process directory
privision-batch input_dir/ output_dir/

# Process subdirectories recursively
privision-batch input_dir/ output_dir/ --recursive

# Use ID card detector for batch processing
privision-batch input_dir/ output_dir/ --detector idcard

# Smart mode + GPU acceleration
privision-batch input_dir/ output_dir/ --mode smart --device gpu:0

Parameters

Positional Arguments:
  input_dir                     Input video directory
  output_dir                    Output video directory

Detector Settings:
  --detector {phone,keyword,idcard}
                                Detector type [default: phone]
  --keywords WORD [WORD ...]    Keyword list (keyword detector only)
  --case-sensitive              Keywords are case-sensitive

Optional Arguments:
  --blur-method {gaussian,pixelate,black}
                                Masking method [default: gaussian]
  --device DEVICE               Computing device [default: cpu]
  --mode {frame-by-frame,smart}
                                Processing mode [default: frame-by-frame]
  --recursive                   Process subdirectories recursively
  --output-suffix SUFFIX        Output file suffix [default: _masked]

Supported video formats: .mp4, .avi, .mov, .mkv, .flv, .wmv, .webm
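The options above suggest how files are discovered and named. The sketch below is an assumption, not privision-batch's actual code: it filters by the supported extensions (case-insensitively), optionally recurses, mirrors subdirectories under the output directory, and appends --output-suffix to each file stem. It always writes a .mp4 name because output is currently MP4 only (see Q6); the real tool's naming may differ. plan_batch is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterator, Tuple

VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".webm"}


def plan_batch(input_dir: str, output_dir: str, recursive: bool = False,
               suffix: str = "_masked") -> Iterator[Tuple[Path, Path]]:
    """Yield (input_path, output_path) pairs for every supported video."""
    src, dst = Path(input_dir), Path(output_dir)
    pattern = "**/*" if recursive else "*"
    for path in sorted(src.glob(pattern)):
        if path.is_file() and path.suffix.lower() in VIDEO_EXTS:
            rel = path.relative_to(src)
            # Mirror the subdirectory layout; output container is always MP4
            yield path, dst / rel.parent / f"{path.stem}{suffix}.mp4"
```

For example, list(plan_batch("input_dir", "output_dir", recursive=True)) previews the input/output pairs a run would produce without touching any video.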

3. API Service

Use privision-server to start the FastAPI server, providing RESTful API endpoints.

Start Server

# Start with default configuration
privision-server

# Custom port
privision-server --port 9000

# Custom data directory
privision-server --data-dir /path/to/data

Once the server is running (default port 8000):

API Service: http://localhost:8000
Interactive Docs: http://localhost:8000/docs
API Documentation: http://localhost:8000/redoc

API Endpoints

1. Create Task

POST /api/tasks

curl -X POST "http://localhost:8000/api/tasks" \
  -F "file=@test.mp4" \
  -F "detector_type=phone" \
  -F "blur_method=gaussian" \
  -F "device=cpu"

Supported parameters:

file: Video file to process (required)
detector_type: Detector type (phone/keyword/idcard)
keywords: Keyword list (keyword detector only)
case_sensitive: Case-sensitive (keyword detector only)
blur_method: Masking method (gaussian/pixelate/black)
blur_strength: Blur strength (Gaussian blur only, odd number, default 51)
device: Computing device (cpu, gpu:0, gpu:1, etc.)
sample_interval: Sampling interval (seconds)
buffer_time: Buffer time (seconds)
precise_location: Enable precise location
precise_max_iterations: Maximum iterations for precise location (default 3)

Response:

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Task created successfully"
}

2. Query Task Progress

GET /api/tasks/{task_id}

curl "http://localhost:8000/api/tasks/{task_id}"

3. Download Result

GET /api/tasks/{task_id}/download

curl -O -J "http://localhost:8000/api/tasks/{task_id}/download"

4. Get Task List

GET /api/tasks?status={status}&limit={limit}

Supported parameters:

status (optional): Filter by status (pending/processing/completed/failed)
limit (optional): Maximum number of tasks to return, default 100

5. Delete Task

DELETE /api/tasks/{task_id}
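The list and delete endpoints have no example here. A minimal standard-library sketch follows: it only builds urllib.request.Request objects for GET /api/tasks and DELETE /api/tasks/{task_id}, so the query string and HTTP method are easy to check, and you pass them to urllib.request.urlopen to send them. The helper names are hypothetical, and since the response bodies are not documented, the sketch does not parse them.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "http://localhost:8000"
VALID_STATUSES = {"pending", "processing", "completed", "failed"}


def list_tasks_request(status: Optional[str] = None, limit: int = 100) -> Request:
    """Build GET /api/tasks with an optional status filter and a limit."""
    params = {"limit": limit}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status}")
        params["status"] = status
    return Request(f"{API_BASE}/api/tasks?{urlencode(params)}", method="GET")


def delete_task_request(task_id: str) -> Request:
    """Build DELETE /api/tasks/{task_id}."""
    return Request(f"{API_BASE}/api/tasks/{quote(task_id)}", method="DELETE")


# Usage (requires a running server):
#   from urllib.request import urlopen
#   with urlopen(list_tasks_request(status="completed", limit=10)) as resp:
#       print(resp.read().decode())
```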

Python Client Example

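import time

import requests

API_BASE = "http://localhost:8000"

# 1. Upload the video and create a task
with open("test.mp4", "rb") as f:
    files = {"file": f}
    data = {
        "detector_type": "phone",
        "blur_method": "gaussian",
        "device": "cpu",
    }
    response = requests.post(f"{API_BASE}/api/tasks", files=files, data=data, timeout=300)
    response.raise_for_status()
    task_id = response.json()["task_id"]

# 2. Poll until the task finishes; stop on failure instead of looping forever
while True:
    response = requests.get(f"{API_BASE}/api/tasks/{task_id}", timeout=10)
    response.raise_for_status()
    task = response.json()
    if task["status"] == "completed":
        break
    if task["status"] == "failed":
        raise RuntimeError(f"Task {task_id} failed: {task}")
    time.sleep(2)

# 3. Download the result, streaming it to disk in 1 MiB chunks
url = f"{API_BASE}/api/tasks/{task_id}/download"
with requests.get(url, stream=True, timeout=60) as response:
    response.raise_for_status()
    with open("output.mp4", "wb") as f:
        for chunk in response.iter_content(chunk_size=1 << 20):
            f.write(chunk)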

🎯 Detector Documentation

1. Phone Number Detector (phone)

Function: Identifies 11-digit Chinese mainland phone numbers

Features:

Regex matching: 1[3-9]\d{9}
Smart filtering of long digit strings and false positives
Context validation to avoid misidentification
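On its own, the regex above also matches 11-digit fragments of longer digit strings such as order numbers. A minimal sketch of a long-digit-string filter adds lookarounds so a match cannot be preceded or followed by another digit. This is an assumption about how the filter behaves; the project's context validation rules are not reproduced here, and find_phone_numbers is a hypothetical name.

```python
import re
from typing import List, Tuple

# 1[3-9]\d{9}, but not when embedded in a longer run of digits
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


def find_phone_numbers(text: str) -> List[Tuple[str, int, int]]:
    """Return (number, start, end) for each standalone 11-digit mobile number."""
    return [(m.group(), m.start(), m.end()) for m in PHONE_RE.finditer(text)]
```

The start/end offsets are what a locator needs to narrow the mask to the number within the OCR line.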

Usage:

privision input.mp4 output.mp4 --detector phone

2. ID Card Number Detector (idcard)

Function: Identifies 18-digit Chinese mainland ID card numbers

Features:

Regex matching: \d{17}[\dXx]
Date validity verification
Excludes invalid numbers
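A sketch of what such a validity check could look like: a regex candidate is accepted only if characters 7-14 form a real calendar date (the YYYYMMDD birth date) and the last character matches the ISO 7064 MOD 11-2 check code defined in GB 11643-1999. The checksum step and the 1900 lower bound on birth dates are additions for illustration; the original only lists date verification, so the project may not perform them.

```python
import re
from datetime import date, datetime
from typing import List

IDCARD_RE = re.compile(r"(?<!\d)\d{17}[\dXx](?![\dXx])")
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CODES = "10X98765432"  # indexed by (weighted sum mod 11)


def is_valid_idcard(number: str) -> bool:
    """Check the embedded birth date and the final check code."""
    if not re.fullmatch(r"\d{17}[\dXx]", number):
        return False
    try:
        birth = datetime.strptime(number[6:14], "%Y%m%d").date()
    except ValueError:
        return False  # e.g. month 13 or February 30
    if not (date(1900, 1, 1) <= birth <= date.today()):  # assumed plausible range
        return False
    total = sum(int(d) * w for d, w in zip(number[:17], WEIGHTS))
    return CHECK_CODES[total % 11] == number[17].upper()


def find_idcards(text: str) -> List[str]:
    """Return regex candidates that also pass the validity checks."""
    return [m.group() for m in IDCARD_RE.finditer(text) if is_valid_idcard(m.group())]
```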

Usage:

privision input.mp4 output.mp4 --detector idcard

3. Keyword Detector (keyword)

Function: Detects custom keywords

Features:

Custom keyword list support
Chinese and English mixed support
Optional case sensitivity
Smart boundary matching
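A hypothetical sketch of mixed Chinese/English matching: ASCII keywords match only as whole words (so "password" does not fire inside "passwords"), while CJK keywords, which are not separated by spaces, match as plain substrings. This boundary rule is an assumption about what "smart boundary matching" means here, and find_keywords is not the project's function.

```python
import re
from typing import List, Sequence, Tuple

_ASCII_WORD = re.compile(r"[A-Za-z0-9_]+")


def find_keywords(text: str, keywords: Sequence[str],
                  case_sensitive: bool = False) -> List[Tuple[str, int, int]]:
    """Return (matched_text, start, end) for every keyword hit, sorted by position."""
    flags = 0 if case_sensitive else re.IGNORECASE
    hits = []
    for kw in keywords:
        pattern = re.escape(kw)
        if _ASCII_WORD.fullmatch(kw):
            # English-style keyword: no ASCII word character directly before or after
            pattern = rf"(?<![A-Za-z0-9_]){pattern}(?![A-Za-z0-9_])"
        # Otherwise (e.g. CJK keywords) the escaped keyword is a plain substring match
        for m in re.finditer(pattern, text, flags):
            hits.append((m.group(), m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])
```

Explicit ASCII lookarounds are used instead of \b because Python's Unicode-aware \b treats CJK characters as word characters, so "password密码" would otherwise have no boundary after "password".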

Usage:

# Default keywords (password, account, username, etc.)
privision input.mp4 output.mp4 --detector keyword

# Custom keywords
privision input.mp4 output.mp4 --detector keyword --keywords name phone address

# Case-sensitive
privision input.mp4 output.mp4 --detector keyword --keywords Password --case-sensitive

Extending Custom Detectors

Detectors are created through a factory, so adding a new one involves three steps:

Inherit from the BaseDetector base class
Implement its required abstract methods
Register the new class in DetectorFactory

See src/privision/core/detector_base.py and src/privision/core/detector_factory.py for details.
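The three steps can be illustrated with a self-contained imitation of the pattern. Everything below (the detect method signature, the register decorator, the email detector) is hypothetical and written only for illustration; the real abstract methods live in detector_base.py, and registration, per the FAQ, goes through a DetectorFactory._detectors mapping.

```python
import re
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple, Type


class BaseDetector(ABC):
    """Stand-in base class: find sensitive spans in a line of OCR text."""

    @abstractmethod
    def detect(self, text: str) -> List[Tuple[int, int]]:
        """Return (start, end) character spans to mask."""


class DetectorFactory:
    _detectors: Dict[str, Type[BaseDetector]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type[BaseDetector]], Type[BaseDetector]]:
        def decorator(detector_cls: Type[BaseDetector]) -> Type[BaseDetector]:
            cls._detectors[name] = detector_cls
            return detector_cls
        return decorator

    @classmethod
    def create_detector(cls, name: str, **kwargs) -> BaseDetector:
        detector_cls = cls._detectors.get(name)
        if detector_cls is None:
            raise ValueError(f"unknown detector type: {name}")
        return detector_cls(**kwargs)


@DetectorFactory.register("email")
class EmailDetector(BaseDetector):
    """Example new detector: e-mail addresses."""
    _pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def detect(self, text: str) -> List[Tuple[int, int]]:
        return [m.span() for m in self._pattern.finditer(text)]
```

With the real project, the new name would also need to be added to the --detector choices (see Q5).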

🏗 Project Architecture

Directory Structure

Privision/
├── src/                              # Source code
│   └── privision/                    # Main package
│       ├── main.py                   # CLI entry point
│       ├── batch.py                  # Batch processing entry
│       ├── server.py                 # API server entry
│       │
│       ├── core/                     # Core functionality
│       │   ├── video_processor.py    # Video processor (frame-by-frame/smart)
│       │   ├── ocr_detector.py       # OCR detection
│       │   ├── detector_base.py      # Detector base class
│       │   ├── detector_factory.py   # Detector factory
│       │   ├── detectors/            # Detector implementations
│       │   │   ├── phone_detector.py
│       │   │   ├── idcard_detector.py
│       │   │   └── keyword_detector.py
│       │   ├── precise_locator.py    # Precise location
│       │   ├── blur.py               # Masking effects
│       │   └── bbox_calculator.py    # Bounding box calculation
│       │
│       ├── api/                      # API service
│       │   └── task_queue.py         # Task queue management
│       │
│       ├── ui/                       # User interface
│       │   ├── rich_ui.py            # Rich terminal UI
│       │   ├── progress.py           # Progress callback interface
│       │   └── visualizer.py         # Visualization window
│       │
│       ├── config/                   # Configuration management
│       │   └── args.py               # Argument parsing
│       │
│       └── test/                     # Test modules
│
├── pyproject.toml                    # Project configuration
├── requirements.txt                  # Dependency list
└── README.md                         # This document

Core Modules

Detector Architecture

BaseDetector (Abstract Base Class)
    ├── PhoneDetector (Phone numbers)
    ├── IDCardDetector (ID card numbers)
    └── KeywordDetector (Keywords)

DetectorFactory (Factory)
    └── create_detector()

Processing Flow

Frame-by-Frame Mode:

Video Input → Frame-by-Frame Read → OCR → Detector → Precise Location (optional) → Mask → Output

Smart Sampling Mode:

Video Input → Periodic Sampling → OCR → Detector → Record Areas → Batch Mask → Output
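One plausible reading of the smart-mode parameters, sketched below as an assumption rather than the project's actual logic: OCR runs only on every N-th frame (N = fps × --sample-interval), and a region found at a sampled frame stays masked until the next sample, widened by --buffer-time on each side so text that appears or moves between samples is still covered. Both function names are hypothetical.

```python
from typing import List, Tuple


def sampled_frames(total_frames: int, fps: float, sample_interval: float) -> List[int]:
    """Indices of the frames that would be sent to OCR."""
    step = max(1, round(fps * sample_interval))
    return list(range(0, total_frames, step))


def mask_window(frame_idx: int, total_frames: int, fps: float,
                sample_interval: float, buffer_time: float) -> Tuple[int, int]:
    """Inclusive frame range over which a region detected at frame_idx is masked."""
    step = max(1, round(fps * sample_interval))
    buffer = round(fps * buffer_time)
    start = max(0, frame_idx - buffer)
    end = min(total_frames - 1, frame_idx + step + buffer)  # through the next sample
    return start, end


if __name__ == "__main__":
    # A 60 s clip at 30 fps has 1800 frames; sampling once per second needs 60 OCR calls
    frames = sampled_frames(total_frames=1800, fps=30, sample_interval=1.0)
    print(len(frames), "OCR calls instead of 1800")
```

With a 1.0 s interval on 30 fps video, OCR runs on 1 frame in 30. Because decoding and masking still touch every frame, the end-to-end speed-up is smaller than that ratio, which fits the 10-30x range quoted for smart mode.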

Technology Stack

PaddleOCR: Text detection and recognition
OpenCV: Video processing and masking
FastAPI: RESTful API framework
Rich: Terminal beautification
NumPy: Numerical computing

🚀 Performance Optimization

Recommended Configuration

1. Use GPU Acceleration

privision input.mp4 output.mp4 --device gpu:0

GPU can improve OCR speed by 3-10x

2. Use Smart Sampling Mode

privision input.mp4 output.mp4 --mode smart

Speed improvement of 10-30x, suitable for most scenarios

3. Adjust Sampling Interval

# Static scenes (sensitive text moves slowly): sample less often
privision input.mp4 output.mp4 --mode smart --sample-interval 2.0

# Dynamic scenes (sensitive text moves quickly): sample more often
privision input.mp4 output.mp4 --mode smart --sample-interval 0.5

4. Video Preprocessing

Ultra-high resolution videos should be downscaled first
Use H.264 encoding for faster processing

5. API Concurrent Processing

Modify the max_workers parameter in src/privision/api/task_queue.py:

get_task_queue(storage_dir=TASKS_DIR, max_workers=2)  # Increase concurrency
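max_workers presumably bounds how many tasks run at once. The sketch below is a hypothetical minimal queue built on concurrent.futures.ThreadPoolExecutor that tracks the same statuses the API reports (pending, processing, completed, failed); it is not the project's task_queue.py. Raising max_workers lets more videos run in parallel, but each running task does its own OCR, so memory use (including GPU memory with --device gpu:N) likely grows with it.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class TaskQueue:
    """Hypothetical minimal task queue with a bounded worker pool."""

    def __init__(self, max_workers: int = 1) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._status: Dict[str, str] = {}
        self._lock = threading.Lock()

    def _set(self, task_id: str, status: str) -> None:
        with self._lock:
            self._status[task_id] = status

    def status(self, task_id: str) -> str:
        with self._lock:
            return self._status[task_id]

    def submit(self, job: Callable[[], None]) -> str:
        task_id = str(uuid.uuid4())
        self._set(task_id, "pending")  # visible before a worker picks it up
        self._pool.submit(self._run, task_id, job)
        return task_id

    def _run(self, task_id: str, job: Callable[[], None]) -> None:
        self._set(task_id, "processing")
        try:
            job()
        except Exception:
            self._set(task_id, "failed")
        else:
            self._set(task_id, "completed")

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)  # wait for running and queued tasks
```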

🔧 FAQ

Q1: How to verify GPU availability?

# Check CUDA
nvidia-smi

# Check PaddlePaddle GPU support
python -c "import paddle; print('Compiled with CUDA:', paddle.device.is_compiled_with_cuda())"

Q2: Why can't I directly run `python privision/main.py`?

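The modules use absolute imports of the form privision.xxx, so privision must be importable as a package. Running main.py directly as a script puts only that file's own directory on sys.path, so those imports fail.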

Solutions:

Run using python -m privision.main
Or install using pip install -e . and directly use the privision command

Q3: Why is the first run slow?

The first run automatically downloads the PaddleOCR model files (about 100-200 MB), so it needs a network connection. The files are cached locally afterwards.

Q4: How to improve recognition accuracy?

Use sharp, sufficiently high-resolution source video
Use frame-by-frame mode instead of smart sampling
Enable precise location mode: --precise-location
Be aware that stylized fonts and busy backgrounds reduce OCR accuracy

Q5: How to add new detectors?

Create a new detector class in src/privision/core/detectors/
Inherit from BaseDetector and implement required methods
Register in DetectorFactory._detectors
Update command-line arguments and documentation

Q6: What video formats are supported?

Input: any format your OpenCV build can decode, typically .mp4, .avi, .mov, .mkv, .flv, .wmv, and .webm (actual support depends on the codecs available to OpenCV).

Output: currently MP4 only.

Q7: How to deploy API service in production?

Use reverse proxy (such as Nginx)
Configure HTTPS
Modify CORS settings (in src/privision/server.py)
Use process management tools (such as systemd, supervisor)
Configure logging and monitoring

🛠 Development Guide

Development Environment Setup

# Clone repository
git clone https://github.com/0xyk3r/Privision.git
cd Privision

# Install development dependencies
pip install -e ".[dev]"

Run Tests

# Run all tests
pytest src/privision/test/

# Run specific tests
python -m privision.test.test_phone_filter
python -m privision.test.test_ocr_and_detector

Code Structure Design

Separation of Concerns: Core functionality, API, UI, and configuration are independently modularized
Configuration-Driven: Use ProcessConfig for unified configuration management
Interface Abstraction: ProgressCallback interface decouples business logic from UI
Factory Pattern: DetectorFactory manages detector creation
Extensibility: Easy to add new detectors, masking methods, and UIs
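The ProgressCallback idea can be sketched with a typing.Protocol: the processing loop only calls a method on whatever object it is given, so a Rich progress bar, a plain logger, or an API task-status updater can be swapped in without touching the loop. The on_progress method name and the process_frames function are hypothetical; the real interface is in src/privision/ui/progress.py.

```python
from typing import List, Protocol, Tuple


class ProgressCallback(Protocol):
    """Anything with an on_progress method can receive progress updates."""

    def on_progress(self, current: int, total: int) -> None: ...


class RecordingCallback:
    """Test double that records every update (a Rich UI would redraw a bar instead)."""

    def __init__(self) -> None:
        self.events: List[Tuple[int, int]] = []

    def on_progress(self, current: int, total: int) -> None:
        self.events.append((current, total))


def process_frames(total_frames: int, callback: ProgressCallback) -> None:
    """Hypothetical processing loop that knows nothing about how progress is shown."""
    for i in range(1, total_frames + 1):
        # ... OCR, detection and masking for frame i would happen here ...
        callback.on_progress(i, total_frames)
```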

Contributing

Issues and Pull Requests are welcome!

Fork the project
Create a feature branch (git checkout -b feature/YourFeature)
Commit changes (git commit -m 'Add some YourFeature')
Push to branch (git push origin feature/YourFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

PaddleOCR - OCR toolkit
OpenCV - Computer vision library
FastAPI - Modern web framework
Rich - Terminal beautification library

📧 Contact

Author: 0xyk3r
GitHub: https://github.com/0xyk3r/Privision
Issues: https://github.com/0xyk3r/Privision/issues

Note: This tool is intended only for lawful privacy protection and must not be used for illegal purposes. Users bear sole legal responsibility for any videos they process with it.

Release History

1.0.1 (this version): Oct 7, 2025
1.0.0: Oct 7, 2025
Download Files

Source distribution: privision-1.0.1.tar.gz (73.4 kB), uploaded Oct 7, 2025 via twine/6.2.0 CPython/3.13.5 (not via Trusted Publishing)

Algorithm	Hash digest
SHA256	`8ff3fa9ec6d631819ccf0b61904931a3b986e7c7ec6dc817ca3a73330ffc9ac9`
MD5	`ea22fc6baca3947ff6b41685ccd1679b`
BLAKE2b-256	`fb8342afc5bd01494b0d58c9efc942895353a03ffb85a49e8d168dab0c5dab74`

Built distribution: privision-1.0.1-py3-none-any.whl (83.2 kB, Python 3), uploaded Oct 7, 2025 via twine/6.2.0 CPython/3.13.5 (not via Trusted Publishing)

Algorithm	Hash digest
SHA256	`fd8fb8aa8c6b21d2a6c130d9549ae9c3bc9e602ea0d43ff6c658c62f01c61db6`
MD5	`bb81c9f29abe2e3b3d9580d6c9d4dc50`
BLAKE2b-256	`220dd3ddf4da377f693c05971ba8b2aa0906e85c2332ece37ff3ec043b826256`
