Ukrainian Text Pseudonymizer

A Python library for pseudonymizing Ukrainian text using the Presidio analyzer framework with a Ukrainian NER model. This tool can identify and anonymize various types of entities in Ukrainian text, including:

  • Person names
  • Job titles
  • Locations
  • Organizations
  • Date/Time expressions
  • Email addresses
  • Credit card numbers
  • URLs
  • Phone numbers
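Some of the entity types above (email addresses, URLs, phone numbers) have a regular surface form, so they are typically found by pattern matching rather than by the NER model. The sketch below only illustrates that idea with the standard library. It is not this library's implementation: the patterns are deliberately simple, and the `mask_patterns` helper and the placeholder labels are assumptions made for the example.

```python
import re

# Illustrative patterns only; production recognizers are considerably stricter.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s<>]+"),
    "PHONE_NUMBER": re.compile(r"\+380\d{9}"),  # Ukrainian numbers in +380XXXXXXXXX form
}


def mask_patterns(text: str) -> str:
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


sample = "Пишіть на ivan@example.com або телефонуйте +380441234567"
masked = mask_patterns(sample)
print(masked)
```

Names, job titles, locations and organizations have no such fixed shape, which is why they rely on the NER model.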

Requirements

  • Python 3.8
  • Git LFS (for model download)

Installation

  1. Install the package using uv:
uv pip install pseudonymizer-uk
  2. Install Git LFS (required for model download):
git lfs install
  3. Download the Ukrainian NER model:
git clone https://huggingface.co/dchaplinsky/uk_ner_web_trf_13class
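If Git LFS was not active when the repository was cloned, the large model files are left as small text "pointer" stubs that begin with the line `version https://git-lfs.github.com/spec/v1`. The following is a minimal sketch for spotting that situation before loading the model. The helper name `find_lfs_pointers` is an assumption made for this example and is not part of pseudonymizer-uk.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still Git LFS pointer stubs."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"model directory not found: {root}")
    stubs = []
    for path in sorted(root.rglob("*")):
        # Skip directories and Git's own metadata.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as handle:
            if handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(path)
    return stubs
```

After a successful clone, `find_lfs_pointers("uk_ner_web_trf_13class")` should return an empty list. If it does not, running `git lfs pull` inside the model directory fetches the missing files.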

Usage

from pseudonymizer_uk import UkPseudonymizer

# Initialize the pseudonymizer with the path to the downloaded model
pseudonymizer = UkPseudonymizer(path_to_model="./uk_ner_web_trf_13class")

# Pseudonymize text
text = "Іван Франко народився в селі Нагуєвичі"
anonymized_text = pseudonymizer.pseudonymize(text)

Supported Entity Types

By default, the pseudonymizer recognizes the following entity types:

  • PERSON
  • JOB
  • LOCATION
  • ORGANIZATION
  • DATE_TIME

You can customize which entities to recognize by passing the entities parameter:

pseudonymizer = UkPseudonymizer(
    path_to_model="uk_ner_web_trf_13class",
    entities=['PERSON', 'LOCATION']  # Only recognize persons and locations
)

Custom Recognizers and Operators

You can extend the functionality by adding custom recognizers and operators:

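from presidio_analyzer import EntityRecognizer
from presidio_anonymizer.entities import OperatorConfig

# Add a custom recognizer; `your_custom_recognizer` is a placeholder for
# your own EntityRecognizer instance
pseudonymizer.add_custom_recognizer(your_custom_recognizer)

# Add a custom operator for a given entity type; Presidio's "custom"
# operator expects a callable under the "lambda" key
pseudonymizer.add_custom_operator(
    "CUSTOM_ENTITY",
    OperatorConfig("custom", {"lambda": lambda value: "<CUSTOM_ENTITY>"})
)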
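Presidio's built-in "custom" operator calls a function, supplied under the `"lambda"` key, on each detected value and uses the returned string as the replacement. One possible function is sketched below: it maps equal inputs to equal placeholders, so repeated mentions of the same entity stay linked after pseudonymization. The name `hashed_pseudonym` and its `prefix` parameter are assumptions made for this example, and the commented wiring assumes `add_custom_operator` behaves as shown above.

```python
import hashlib


def hashed_pseudonym(value: str, prefix: str = "ENTITY") -> str:
    """Return a stable placeholder derived from the original value."""
    # The first 8 hex characters of SHA-256 keep the placeholder short.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"<{prefix}_{digest}>"


# Hypothetical wiring (requires presidio-anonymizer and a pseudonymizer instance):
# pseudonymizer.add_custom_operator(
#     "PERSON",
#     OperatorConfig("custom", {"lambda": hashed_pseudonym}),
# )
```

An unsalted hash of a short name can be reversed by guessing, so a keyed hash such as `hmac` with a secret key may be the better choice where re-identification is a concern.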

Development

To set up the development environment:

  1. Clone the repository:
git clone https://github.com/fox-rudie/pseudonymizer-uk.git
cd pseudonymizer-uk
  2. Create a virtual environment and install dependencies using uv:
uv venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows

uv pip install -e ".[dev]"
  3. Install pre-commit hooks (optional):
uv pip install pre-commit
pre-commit install

Publishing

To publish a new version to PyPI:

  1. Update version in pyproject.toml and __init__.py
  2. Build the package:
uv pip install build
python -m build
  3. Upload to PyPI:
uv pip install twine
python -m twine upload dist/*
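Because the version has to be updated in two files, it is easy for them to drift apart. The sketch below is one way to check this before building. It assumes that `__init__.py` defines `__version__ = "..."` and that the first `version = "..."` line in `pyproject.toml` is the project version. Both are assumptions about the repository layout, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as: version = "0.1.1"  or  __version__ = '0.1.1'
VERSION_RE = re.compile(
    r"""^\s*(?:__version__|version)\s*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def read_version(path):
    """Return the first version string found in the given file."""
    match = VERSION_RE.search(Path(path).read_text(encoding="utf-8"))
    if match is None:
        raise ValueError(f"no version string found in {path}")
    return match.group(1)


def versions_match(pyproject_path, init_path):
    """True when both files declare the same version."""
    return read_version(pyproject_path) == read_version(init_path)
```

Pass the paths of the two files as they exist in your checkout, and run the check before `python -m build`.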

License

MIT License

