A Python library for pseudonymizing Ukrainian text using the Presidio analyzer with a Ukrainian NER model

Ukrainian Text Pseudonymizer

A Python library for pseudonymizing Ukrainian text using the Presidio analyzer framework with a Ukrainian NER model. This tool can identify and anonymize various types of entities in Ukrainian text, including:

  • Person names
  • Job titles
  • Locations
  • Organizations
  • Date/Time expressions
  • Email addresses
  • Credit card numbers
  • URLs
  • Phone numbers

Requirements

  • Python 3.8
  • Git LFS (for model download)

Installation

  1. Install the package using uv:
uv pip install pseudonymizer-uk
  2. Set up Git LFS (required for model download; the git-lfs package itself must already be installed on your system):
git lfs install
  3. Download the Ukrainian NER model:
git clone https://huggingface.co/dchaplinsky/uk_ner_web_trf_13class
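
If the clone runs without Git LFS active, the large model files are left as small text pointer stubs, and loading the model is likely to fail. The sketch below is one way to detect that before initializing the pseudonymizer. It uses only the standard library and relies on the Git LFS pointer format, whose first line starts with `version https://git-lfs.github.com/spec/v1`. The helper name `find_lfs_pointers` is hypothetical and not part of this package.

```python
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still Git LFS pointer stubs."""
    pointers = []
    for path in sorted(Path(model_dir).rglob("*")):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.parts:
            continue
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers
```

Calling `find_lfs_pointers("./uk_ner_web_trf_13class")` should return an empty list after a successful download. Any paths it returns point to files that were not fetched, in which case running `git lfs pull` inside the cloned directory is the usual remedy.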

Usage

from pseudonymizer_uk import UkPseudonymizer

# Initialize the pseudonymizer with the path to the downloaded model
pseudonymizer = UkPseudonymizer(path_to_model="./uk_ner_web_trf_13class")

# Pseudonymize text
text = "Іван Франко народився в селі Нагуєвичі"
anonymized_text = pseudonymizer.pseudonymize(text)
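
To make the idea of pseudonymization concrete, the sketch below shows only the replacement step, with hard-coded spans standing in for NER output. It is not the library's implementation: the `replace_spans` helper and the `<ENTITY_TYPE>` placeholder format are assumptions for illustration, and the actual return value of `pseudonymize` may look different.

```python
def replace_spans(text, spans):
    """Replace (start, end, entity_type) spans in text with <ENTITY_TYPE> tags."""
    result = text
    # Work from the last span to the first so earlier offsets stay valid
    # after each replacement changes the string length.
    for start, end, entity_type in sorted(spans, reverse=True):
        result = result[:start] + "<" + entity_type + ">" + result[end:]
    return result


text = "Іван Франко народився в селі Нагуєвичі"

# Hard-coded spans standing in for NER output (hypothetical, for illustration).
person = "Іван Франко"
place = "Нагуєвичі"
spans = [
    (text.index(person), text.index(person) + len(person), "PERSON"),
    (text.index(place), text.index(place) + len(place), "LOCATION"),
]

masked = replace_spans(text, spans)
print(masked)  # <PERSON> народився в селі <LOCATION>
```

The sketch assumes the spans do not overlap. Replacing from right to left avoids recomputing offsets after each substitution.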

Supported Entity Types

By default, the pseudonymizer recognizes the following entity types:

  • PERSON
  • JOB
  • LOCATION
  • ORGANIZATION
  • DATE_TIME

You can customize which entities to recognize by passing the entities parameter:

pseudonymizer = UkPseudonymizer(
    path_to_model="uk_ner_web_trf_13class",
    entities=['PERSON', 'LOCATION']  # Only recognize persons and locations
)
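
A misspelled label in the `entities` list is easy to miss. The sketch below checks a requested list against the five default labels listed above before it is passed to the constructor. It assumes those five labels are the complete set of accepted values, which this README does not confirm. The `check_entities` helper is hypothetical.

```python
# Entity labels listed above as the defaults. Whether the library accepts
# further labels is not confirmed here, so treat this set as an assumption.
DEFAULT_ENTITIES = ("PERSON", "JOB", "LOCATION", "ORGANIZATION", "DATE_TIME")


def check_entities(entities, known=DEFAULT_ENTITIES):
    """Return the requested labels, raising ValueError on unknown ones."""
    unknown = [name for name in entities if name not in known]
    if unknown:
        raise ValueError("Unknown entity types: " + ", ".join(unknown))
    return list(entities)
```

For example, `check_entities(['PERSON', 'LOCATION'])` returns the list unchanged, while `check_entities(['PERSONS'])` raises a `ValueError`.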

Custom Recognizers and Operators

You can extend the functionality by adding custom recognizers and operators:

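from presidio_analyzer import Pattern, PatternRecognizer
from presidio_anonymizer.entities import OperatorConfig

# Example recognizer: a regex-based Presidio PatternRecognizer.
# The entity name, pattern, and "uk" language code are illustrative assumptions.
your_custom_recognizer = PatternRecognizer(
    supported_entity="CUSTOM_ENTITY",
    patterns=[Pattern(name="custom_id", regex=r"\bID-\d{6}\b", score=0.8)],
    supported_language="uk",
)

# Add custom recognizer
pseudonymizer.add_custom_recognizer(your_custom_recognizer)

# Add custom operator
# Presidio's "custom" operator expects a callable under the "lambda" key
pseudonymizer.add_custom_operator(
    "CUSTOM_ENTITY",
    OperatorConfig("custom", {"lambda": lambda text: "<CUSTOM>"})
)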

Development

To set up the development environment:

  1. Clone the repository:
git clone https://github.com/fox-rudie/pseudonymizer-uk.git
cd pseudonymizer-uk
  2. Create a virtual environment and install dependencies using uv:
uv venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows

uv pip install -e ".[dev]"
  3. Install pre-commit hooks (optional):
uv pip install pre-commit
pre-commit install

Publishing

To publish a new version to PyPI:

  1. Update version in pyproject.toml and __init__.py
  2. Build the package:
uv pip install build
python -m build
  3. Upload to PyPI:
uv pip install twine
python -m twine upload dist/*
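
Because the version has to be updated in two places, a quick consistency check before building can catch a missed edit. The sketch below compares the two declarations from the file contents. It assumes pyproject.toml contains a static `version = "..."` line and that `__init__.py` defines `__version__ = "..."`; neither detail is confirmed by this README, and the helper names are hypothetical.

```python
import re


def extract_version(source, key):
    """Return the quoted value assigned to key at the start of a line, or None."""
    pattern = r"^{}\s*=\s*['\"]([^'\"]+)['\"]".format(re.escape(key))
    match = re.search(pattern, source, re.MULTILINE)
    return match.group(1) if match else None


def versions_match(pyproject_text, init_text):
    """True when both files declare the same, non-missing version string."""
    project_version = extract_version(pyproject_text, "version")
    package_version = extract_version(init_text, "__version__")
    return project_version is not None and project_version == package_version
```

Read both files with `Path(...).read_text()` and pass the contents to `versions_match`. A `False` result means one of the two files still carries the old version or lacks a recognizable declaration.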

License

MIT License

Download files

Source Distribution

pseudonymizer_uk-0.1.3.tar.gz (8.6 kB)

Uploaded Source

Built Distribution

pseudonymizer_uk-0.1.3-py3-none-any.whl (5.3 kB)

Uploaded Python 3

File details

Details for the file pseudonymizer_uk-0.1.3.tar.gz.

File metadata

  • Download URL: pseudonymizer_uk-0.1.3.tar.gz
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for pseudonymizer_uk-0.1.3.tar.gz
Algorithm    Hash digest
SHA256       8488f49337a02159272be39cb5ad0373c8a1d1318214d78241f31a84c4836d2f
MD5          e6c1d67218d938caa8ade1078ac56749
BLAKE2b-256  ef6a320d1b06357dd0a5272bfd7fe95fce9901a9a9bad3f4ed1c84741625fd0c

File details

Details for the file pseudonymizer_uk-0.1.3-py3-none-any.whl.

File hashes

Hashes for pseudonymizer_uk-0.1.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       d996856590b1cae51402a3daac9a5a35679ab82e97a815b222059e05a00cd8ee
MD5          9da91637e68a97ccc2ebf5d7184e54ca
BLAKE2b-256  5c87b8c5057bb1a1a8a0f536a26ae3c7ee53cb3aadb91d9fd63f580b57f96f3e
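
The published digests can be used to confirm that a downloaded file is intact. The sketch below computes a file's SHA256 digest with the standard library and compares it with a published value. The helper names are hypothetical, and the file path in the usage note assumes the file was saved under its original name.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel, `matches_published_hash("pseudonymizer_uk-0.1.3-py3-none-any.whl", "d996856590b1cae51402a3daac9a5a35679ab82e97a815b222059e05a00cd8ee")` should return `True` for an unmodified download.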
