
This package censors sensitive data in images and extracts their text content. Named-entity recognition (NER) is planned for a future release.

Project description

AGL Anonymizer Pipeline

This module is designed to work with the AGL Anonymizer Django API.

The AGL Anonymizer Pipeline is a Python module for image processing. It provides anonymization based on common German names, blurring, OCR (Optical Character Recognition), and saving of processed images. It is particularly useful when sensitive information must be redacted from images or documents while the overall context and visual structure are retained.
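The README does not say exactly how the list of common German names is used. One plausible approach, sketched here under that assumption, is to replace each detected name with a pseudonym drawn from such a list, choosing it deterministically so the same person always gets the same replacement. The name pool and the functions `pseudonymize` and `anonymize_text` are hypothetical and not part of the package's API.

```python
import hashlib
import re

# Hypothetical pool of common German first names used as pseudonyms;
# the list the package actually uses is not shown in this README.
COMMON_GERMAN_NAMES = ["Anna", "Lukas", "Maria", "Paul", "Sophie", "Jonas"]


def pseudonymize(name: str, pool: list[str] = COMMON_GERMAN_NAMES) -> str:
    """Deterministically map a detected name to a pseudonym from `pool`.

    Hashing the UTF-8 bytes of the name means the same person always
    receives the same replacement across an image or document.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:4], "big") % len(pool)]


def anonymize_text(text: str, detected_names: list[str]) -> str:
    """Replace whole-word occurrences of each detected name in one pass.

    A single regex pass avoids re-replacing a pseudonym that happens to
    equal another detected name.
    """
    if not detected_names:
        return text
    # Longest names first so "Anna Maria" is matched before "Anna".
    alternatives = sorted(detected_names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    return pattern.sub(lambda match: pseudonymize(match.group(0)), text)
```

The deterministic hash is a design choice: it keeps references to one person consistent without storing a lookup table, at the cost of possible collisions when the pool is small.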

Features

  • Text Detection and Anonymization: Uses OCR to detect text in images and anonymizes the sensitive information it finds.
  • Blurring Functionality: Offers customizable blurring options to obscure parts of an image, providing an additional layer of privacy.
  • Image Saving: Efficiently saves processed images in a desired format, maintaining high-quality output.
  • Extensive Format Support: Capable of handling various image and document formats for a wide range of applications.
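The blurring options are not documented in detail here. The following is a minimal, dependency-free sketch of region blurring, assuming a grayscale image stored as a list of pixel rows and a rectangular region such as a detected text box. It applies a box blur whose radius stands in for "blur intensity"; the function name and parameters are hypothetical, and the package may use a different blur (a real pipeline would normally rely on an image library).

```python
def blur_region(image, x, y, w, h, radius=2):
    """Return a copy of `image` with the rectangle (x, y, w, h) box-blurred.

    Hypothetical helper. `image` is a list of rows of grayscale values
    (0-255). Each pixel inside the rectangle becomes the integer mean of its
    (2*radius+1) x (2*radius+1) neighbourhood, clipped at the image borders.
    Pixels outside the rectangle are copied unchanged.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # never modify the input image
    for row in range(max(0, y), min(height, y + h)):
        for col in range(max(0, x), min(width, x + w)):
            total = count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    r, c = row + dy, col + dx
                    if 0 <= r < height and 0 <= c < width:
                        total += image[r][c]  # read from the original pixels
                        count += 1
            result[row][col] = total // count
    return result
```

A larger `radius` averages over more pixels and therefore obscures the region more strongly.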

Installation

To get started with AGL Anonymizer, clone this repository and install the required dependencies.

git clone https://github.com/maxhild/agl_anonymizer_pipeline.git
cd agl_anonymizer_pipeline
nix develop

Then download a text detection model such as frozen_east_text_detection.pb and place it inside the agl_anonymizer_pipeline folder.

Usage

To use AGL Anonymizer Pipeline, follow these steps:

  1. Prepare Your Images: Place the images you want to process in the designated folder.
  2. Configure Settings: Adjust the settings in the configuration file (if applicable) to suit your anonymization and blurring needs.
  3. Run the Module: Execute the main script from the command line to process the images.

Command:

python main.py --image images/your_image.jpg --east frozen_east_text_detection.pb

Example:

python main.py --image images/lebron_james.jpg --east frozen_east_text_detection.pb
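The commands above pass two options, `--image` and `--east`. As an illustration only, a script could parse them with Python's standard `argparse` module as sketched below; this is not the contents of the project's main.py, and the help texts and the choice to make both options required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two options used in the commands above.

    Illustrative sketch; making both options required is an assumption.
    """
    parser = argparse.ArgumentParser(
        description="Anonymize text regions in an image (illustrative sketch)."
    )
    parser.add_argument("--image", required=True, help="path to the input image")
    parser.add_argument(
        "--east",
        required=True,
        help="path to the EAST text detection model (.pb file)",
    )
    return parser
```

Calling `build_parser().parse_args()` with no arguments reads `sys.argv`, so `args.image` and `args.east` then hold the two paths given on the command line.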

Modules

AGL Anonymizer consists of several key modules:

  • OCR Module: Detects and extracts text from images.
  • Anonymizer Module: Applies anonymization techniques to identified sensitive text regions.
  • Blur Module: Provides functions to blur specific areas in the image.
  • Save Module: Handles saving processed images in a chosen format.

Customization

You can customize the behavior of AGL Anonymizer by modifying the parameters in the config.py file, if one is included. This includes adjusting the OCR sensitivity, blur intensity, and more.
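The README does not list the actual configuration parameters. A hypothetical config.py might group them in a frozen dataclass with basic validation, as sketched below; every field name and default here is an assumption chosen to mirror the options mentioned (OCR sensitivity, blur intensity, output format).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnonymizerConfig:
    """Hypothetical settings; the real config.py (if present) may differ."""

    # Minimum detection confidence (0-1) for a region to count as text.
    # Lower values make detection more sensitive but add false positives.
    min_text_confidence: float = 0.5
    # Blur radius in pixels; larger values obscure regions more strongly.
    blur_radius: int = 5
    # File format used when saving processed images.
    output_format: str = "png"

    def __post_init__(self) -> None:
        # Reject out-of-range values early instead of failing mid-pipeline.
        if not 0.0 <= self.min_text_confidence <= 1.0:
            raise ValueError("min_text_confidence must be between 0 and 1")
        if self.blur_radius < 0:
            raise ValueError("blur_radius must be non-negative")


DEFAULT_CONFIG = AnonymizerConfig()
```

Freezing the dataclass keeps one configuration consistent for a whole run; a variant is created with keyword arguments, e.g. `AnonymizerConfig(blur_radius=9)`.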

Contributing

Contributions to the AGL Anonymizer Pipeline are welcome! If you have suggestions for improvements or bug fixes, please open an issue or a pull request.

To Do

  • UTF-8 Handling of Names
  • Improving the text region detection
  • Re-adding Tesseract OCR or adding Paddle OCR for improved full text document OCR.
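For the UTF-8 item, one common pitfall is that a name such as "Müller" can be stored either with a precomposed "ü" (U+00FC) or with "u" followed by a combining diaeresis (U+0308). Both render identically but compare unequal as strings. A minimal sketch, assuming names are matched after NFC normalization and case folding with Python's standard `unicodedata` module (the function names are hypothetical):

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Return a canonical form of `name` for matching (hypothetical helper).

    NFC composes "u" + U+0308 into "ü", so both encodings of the same name
    become identical; casefold() removes case differences and maps the
    German "ß" to "ss".
    """
    return unicodedata.normalize("NFC", name).casefold()


def names_match(a: str, b: str) -> bool:
    """Compare two names independently of Unicode form and letter case."""
    return normalize_name(a) == normalize_name(b)
```

Normalizing before matching lets a name list stored in one Unicode form still recognize OCR output produced in the other.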

License

This project is licensed under the MIT License.

Contact

For any inquiries or assistance with AGL Anonymizer, please contact Max Hild at Maxhild10@gmail.com.



Download files

Download the file for your platform.

Source Distribution

agl_anonymizer_pipeline-0.1.5.tar.gz (173.6 kB)

Uploaded Source

Built Distribution

agl_anonymizer_pipeline-0.1.5-py3-none-any.whl (225.1 kB)

Uploaded Python 3

File details

Details for the file agl_anonymizer_pipeline-0.1.5.tar.gz.

File metadata

  • Download URL: agl_anonymizer_pipeline-0.1.5.tar.gz
  • Upload date:
  • Size: 173.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.9 Linux/6.10.10

File hashes

Hashes for agl_anonymizer_pipeline-0.1.5.tar.gz
Algorithm Hash digest
SHA256 6bdf3d69a3e7f850bfc68e3f63959b0e2586a1ff3b403fb57168ee82f09d1426
MD5 b595e034ab740ea7be4eb3477f62e99c
BLAKE2b-256 153363fa4bb0b04e2c2441e2c1e8e1dd93aa6d3acd1459345c0b8528af3f1cf9
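To check a downloaded archive against the SHA256 digest listed above, Python's standard `hashlib` module is sufficient. A minimal sketch (the helper names are illustrative):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex value."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify(Path("agl_anonymizer_pipeline-0.1.5.tar.gz"), "6bdf3d69a3e7f850bfc68e3f63959b0e2586a1ff3b403fb57168ee82f09d1426")` returns `True` only if the downloaded source archive matches the published digest.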


File details

Details for the file agl_anonymizer_pipeline-0.1.5-py3-none-any.whl.

File metadata

File hashes

Hashes for agl_anonymizer_pipeline-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 2f5c42aa1701a987eb0e0a5092e7c609fd426a65a27c1a8b0cb814cdf3b3a09e
MD5 210ca95f8e50b920bb0a3ef994188b46
BLAKE2b-256 c1b7eb1ca86ad4768cab2bc8ccf824b64753be4c533c712819b9c05eb8bd5a2e

