
Ukrainian Text Pseudonymizer

A Python library for pseudonymizing Ukrainian text using the Presidio analyzer framework with a Ukrainian NER model. This tool can identify and anonymize various types of entities in Ukrainian text, including:

  • Person names
  • Job titles
  • Locations
  • Organizations
  • Date/Time expressions
  • Email addresses
  • Credit card numbers
  • URLs
  • Phone numbers

Requirements

  • Python 3.8
  • Git LFS (for model download)

Installation

  1. Install the package using uv:
uv pip install pseudonymizer-uk
  2. Install Git LFS if needed, then enable it (required for the model download):
git lfs install
  3. Download the Ukrainian NER model:
git clone https://huggingface.co/dchaplinsky/uk_ner_web_trf_13class
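If Git LFS was not enabled before cloning, the large model files are usually left as small text pointer files that begin with `version https://git-lfs.github.com/spec/v1`. The sketch below uses a hypothetical helper, `find_lfs_pointers`, which is not part of pseudonymizer-uk. It scans the cloned model directory and lists any remaining pointer files. An empty result suggests the real weights were downloaded.

```python
from pathlib import Path
from typing import List

# Every Git LFS pointer file starts with this line (pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# The Git LFS spec requires pointer files to stay under 1024 bytes.
MAX_POINTER_SIZE = 1024


def find_lfs_pointers(model_dir: str) -> List[Path]:
    """Return files under model_dir that are still unfetched Git LFS pointers."""
    root = Path(model_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"model directory not found: {model_dir}")
    pointers = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside Git's own .git metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        # Larger files cannot be pointers, so they are real content.
        if path.stat().st_size > MAX_POINTER_SIZE:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(path)
    return pointers
```

For example, `find_lfs_pointers("./uk_ner_web_trf_13class")` should return an empty list after a successful download. If it lists files, running `git lfs pull` inside the model directory should fetch them.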

Usage

from pseudonymizer_uk import UkPseudonymizer

# Initialize the pseudonymizer with the path to the downloaded model
pseudonymizer = UkPseudonymizer(path_to_model="./uk_ner_web_trf_13class")

# Pseudonymize text
text = "Іван Франко народився в селі Нагуєвичі"
anonymized_text = pseudonymizer.pseudonymize(text)

Supported Entity Types

By default, the pseudonymizer recognizes the following entity types:

  • PERSON
  • JOB
  • LOCATION
  • ORGANIZATION
  • DATE_TIME

You can customize which entities to recognize by passing the entities parameter:

pseudonymizer = UkPseudonymizer(
    path_to_model="./uk_ner_web_trf_13class",
    entities=['PERSON', 'LOCATION']  # Only recognize persons and locations
)
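Conceptually, restricting `entities` means that detected spans of other types are left unchanged. The plain-Python sketch below illustrates that idea. It uses hard-coded, hypothetical detection results and `<ENTITY_TYPE>` placeholders. It is not how pseudonymizer-uk or Presidio is implemented, and the library's actual replacement text may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Span:
    """A detected entity: character offsets [start, end) plus its type."""
    entity_type: str
    start: int
    end: int


def replace_spans(text: str, spans: Iterable[Span], entities: List[str]) -> str:
    """Replace spans whose type is in `entities` with <TYPE> placeholders.

    Assumes the spans do not overlap.
    """
    selected = [s for s in spans if s.entity_type in entities]
    # Replace from right to left so earlier offsets stay valid after each edit.
    for span in sorted(selected, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + f"<{span.entity_type}>" + text[span.end :]
    return text


text = "Іван Франко народився в селі Нагуєвичі"
# Hypothetical analyzer output for the example sentence.
spans = [
    Span("PERSON", 0, 11),
    Span("LOCATION", text.index("Нагуєвичі"), len(text)),
]

both = replace_spans(text, spans, entities=["PERSON", "LOCATION"])
# both == "<PERSON> народився в селі <LOCATION>"
locations_only = replace_spans(text, spans, entities=["LOCATION"])
# locations_only == "Іван Франко народився в селі <LOCATION>"
```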

Custom Recognizers and Operators

You can extend the functionality by adding custom recognizers and operators:

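from presidio_analyzer import Pattern, PatternRecognizer
from presidio_anonymizer.entities import OperatorConfig

# Example recognizer: matches IDs such as "ID-123456" (illustrative pattern)
your_custom_recognizer = PatternRecognizer(
    supported_entity="CUSTOM_ENTITY",
    supported_language="uk",  # assumes the analyzer runs with language "uk"
    patterns=[Pattern(name="custom_id", regex=r"\bID-\d{6}\b", score=0.8)],
)

# Add custom recognizer
pseudonymizer.add_custom_recognizer(your_custom_recognizer)

# Add custom operator; Presidio's "custom" operator expects a callable
# under the "lambda" key that returns the replacement string
pseudonymizer.add_custom_operator(
    "CUSTOM_ENTITY",
    OperatorConfig("custom", {"lambda": lambda value: "<CUSTOM_ID>"})
)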
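Presidio's `custom` operator takes a callable under the `lambda` key. The callable receives the matched text and returns a string. One choice that suits pseudonymization is sketched below with a hypothetical helper, `make_consistent_pseudonymizer`. It assigns each distinct value a stable numbered pseudonym, so repeated mentions of the same person get the same replacement within one run. The mapping is kept only in memory here. Whether pseudonymizer-uk passes such a callable through unchanged is an assumption.

```python
from typing import Callable, Dict


def make_consistent_pseudonymizer(prefix: str) -> Callable[[str], str]:
    """Return a str -> str function that maps equal inputs to equal pseudonyms."""
    mapping: Dict[str, str] = {}

    def pseudonymize(value: str) -> str:
        # Assign the next number the first time a value is seen.
        if value not in mapping:
            mapping[value] = f"{prefix}_{len(mapping) + 1}"
        return mapping[value]

    return pseudonymize


person_pseudonym = make_consistent_pseudonymizer("PERSON")
print(person_pseudonym("Іван Франко"))    # PERSON_1
print(person_pseudonym("Леся Українка"))  # PERSON_2
print(person_pseudonym("Іван Франко"))    # PERSON_1 again

# Possible wiring (assumption, not confirmed by the library docs):
# pseudonymizer.add_custom_operator(
#     "PERSON", OperatorConfig("custom", {"lambda": person_pseudonym})
# )
```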

Development

To set up the development environment:

  1. Clone the repository:
git clone https://github.com/fox-rudie/pseudonymizer-uk.git
cd pseudonymizer-uk
  2. Create a virtual environment and install dependencies using uv:
uv venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows

uv pip install -e ".[dev]"
  3. Install pre-commit hooks (optional):
uv pip install pre-commit
pre-commit install

Publishing

To publish a new version to PyPI:

  1. Update the version number in pyproject.toml and __init__.py.

  2. Build the package:

uv pip install build
python -m build
  3. Upload to PyPI:
uv pip install twine
python -m twine upload dist/*
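Step 1 requires updating the version in two places. A small pre-release check can catch a mismatch before building. The sketch below assumes that `pyproject.toml` contains a `version = "..."` line and that `__init__.py` assigns `__version__ = "..."`. The regular expressions and helper names are hypothetical and may need adjusting to the project's actual layout.

```python
import re
from typing import Optional

# Hypothetical patterns: the first line starting with `version = "..."` in
# pyproject.toml, and a `__version__ = "..."` assignment in __init__.py.
PYPROJECT_VERSION = re.compile(r"""^version\s*=\s*["']([^"']+)["']""", re.MULTILINE)
INIT_VERSION = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def extract(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the first captured version string, or None if absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


def check_versions(pyproject_text: str, init_text: str) -> str:
    """Return the shared version string, or raise ValueError on a mismatch."""
    project_version = extract(PYPROJECT_VERSION, pyproject_text)
    package_version = extract(INIT_VERSION, init_text)
    if project_version is None or package_version is None:
        raise ValueError("version string not found in one of the files")
    if project_version != package_version:
        raise ValueError(
            f"pyproject.toml has {project_version} but __init__.py has {package_version}"
        )
    return project_version


# Usage (the __init__.py path is a placeholder for the real package path):
# from pathlib import Path
# check_versions(Path("pyproject.toml").read_text(), Path("path/to/__init__.py").read_text())
```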

License

MIT License

Download files

Source Distribution

pseudonymizer_uk-0.1.2.tar.gz (8.6 kB)

Built Distribution

pseudonymizer_uk-0.1.2-py3-none-any.whl (5.3 kB)

File details

Details for the file pseudonymizer_uk-0.1.2.tar.gz.

File metadata

  • Download URL: pseudonymizer_uk-0.1.2.tar.gz
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for pseudonymizer_uk-0.1.2.tar.gz
Algorithm Hash digest
SHA256 d22c1b0f87deb3be61ea7f6b2066e5991da2d6bfb5143fa94f7b128fd817da6c
MD5 502fe6655130cc6290ba6d1646c690dd
BLAKE2b-256 afef7c89623b65da7c7a862bb3bf98393f8961dcfc2176c800d0c3f9848ca23b
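To check a downloaded file against a SHA256 digest listed on this page, hash the file locally and compare the two values. Below is a minimal standard-library sketch. The local file name in the usage comment assumes the download was saved in the current directory. `pip hash <file>` gives the same SHA256 digest from the command line.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 for pseudonymizer_uk-0.1.2.tar.gz as listed above.
EXPECTED_SDIST_SHA256 = "d22c1b0f87deb3be61ea7f6b2066e5991da2d6bfb5143fa94f7b128fd817da6c"

# Usage (assumes the sdist is in the current directory):
# print(matches_digest("pseudonymizer_uk-0.1.2.tar.gz", EXPECTED_SDIST_SHA256))
```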


File details

Details for the file pseudonymizer_uk-0.1.2-py3-none-any.whl.

File hashes

Hashes for pseudonymizer_uk-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 13769899c54ec2f72f785148ca03b8842b8161db89e2a8f4708ff27813a11183
MD5 3daba180a810dce466a1d9e750bbee55
BLAKE2b-256 47cbb5e26487afb81f4738cdb6916b2207ef03c34cb7f41202a4219afadec5d5

