A Python library for pseudonymizing Ukrainian text using the Presidio analyzer with a Ukrainian NER model

Ukrainian Text Pseudonymizer

A Python library for pseudonymizing Ukrainian text using the Presidio analyzer framework with a Ukrainian NER model. This tool can identify and anonymize various types of entities in Ukrainian text, including:

  • Person names
  • Job titles
  • Locations
  • Organizations
  • Date/Time expressions
  • Email addresses
  • Credit card numbers
  • URLs
  • Phone numbers

Requirements

  • Python 3.8
  • Git LFS (for model download)

Installation

  1. Install the package using uv:
uv pip install pseudonymizer-uk
  2. Enable Git LFS for your user account (required for model download; the git-lfs package itself must already be installed):
git lfs install
  3. Download the Ukrainian NER model:
git clone https://huggingface.co/dchaplinsky/uk_ner_web_trf_13class
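
If the clone ran before Git LFS was enabled, the model directory can end up holding small pointer files instead of the real weights. The sketch below is one way to detect that. It relies on the fact that a Git LFS pointer file is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`. The helper name `find_lfs_pointers` is illustrative and is not part of pseudonymizer-uk.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still Git LFS pointers.

    Hypothetical helper for illustration; not part of pseudonymizer-uk.
    """
    pointers = []
    for path in sorted(Path(model_dir).rglob("*")):
        # Skip directories and anything inside the .git metadata folder.
        if not path.is_file() or ".git" in path.parts:
            continue
        # Only the first bytes are needed to recognise a pointer file.
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    model_dir = Path("./uk_ner_web_trf_13class")
    if model_dir.is_dir():
        leftover = find_lfs_pointers(model_dir)
        if leftover:
            print("Pointer files found; run `git lfs pull` inside the model directory:")
            for path in leftover:
                print("  ", path)
        else:
            print("No LFS pointer files found.")
```

An empty result suggests the large files were fetched; a non-empty one means the weights still need to be pulled.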

Usage

from pseudonymizer_uk import UkPseudonymizer

# Initialize the pseudonymizer with the path to the downloaded model
pseudonymizer = UkPseudonymizer(path_to_model="./uk_ner_web_trf_13class")

# Pseudonymize text
text = "Іван Франко народився в селі Нагуєвичі"
anonymized_text = pseudonymizer.pseudonymize(text)
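
For intuition about what happens to the text, the sketch below shows the general replace-by-span technique that entity-based anonymizers rely on: each detected entity is a (start, end, type) span, and spans are replaced from right to left so that earlier offsets stay valid. This is a standalone illustration, not the library's implementation. The function `replace_spans`, the hand-written spans, and the `<TYPE>` placeholder format are all assumptions, and the actual output of `pseudonymize` may look different.

```python
def replace_spans(text, spans):
    """Replace each (start, end, entity_type) span with a <TYPE> placeholder.

    Illustrative only: spans are assumed not to overlap and would normally
    come from an NER model rather than being written by hand.
    """
    result = text
    # Work from the last span to the first so earlier offsets are not shifted.
    for start, end, entity_type in sorted(spans, reverse=True):
        result = result[:start] + "<" + entity_type + ">" + result[end:]
    return result


text = "Іван Франко народився в селі Нагуєвичі"

# Hand-written spans standing in for model output.
person = "Іван Франко"
village = "Нагуєвичі"
spans = [
    (text.index(person), text.index(person) + len(person), "PERSON"),
    (text.index(village), text.index(village) + len(village), "LOCATION"),
]

print(replace_spans(text, spans))
# <PERSON> народився в селі <LOCATION>
```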

Supported Entity Types

By default, the pseudonymizer recognizes the following entity types:

  • PERSON
  • JOB
  • LOCATION
  • ORGANIZATION
  • DATE_TIME

You can customize which entities to recognize by passing the entities parameter:

pseudonymizer = UkPseudonymizer(
    path_to_model="uk_ner_web_trf_13class",
    entities=['PERSON', 'LOCATION']  # Only recognize persons and locations
)

Custom Recognizers and Operators

You can extend the functionality by adding custom recognizers and operators:

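from presidio_analyzer import Pattern, PatternRecognizer
from presidio_anonymizer import OperatorConfig

# Example custom recognizer: a regex-based PatternRecognizer for a
# hypothetical CUSTOM_ENTITY (here, IDs such as "AB-123456").
# supported_language="uk" is an assumption; match the language the analyzer uses.
your_custom_recognizer = PatternRecognizer(
    supported_entity="CUSTOM_ENTITY",
    patterns=[Pattern(name="custom_id", regex=r"\b[A-Z]{2}-\d{6}\b", score=0.6)],
    supported_language="uk",
)
pseudonymizer.add_custom_recognizer(your_custom_recognizer)

# Add a custom operator for that entity. Presidio's built-in "custom" operator
# expects a "lambda" callable that maps the matched text to its replacement.
pseudonymizer.add_custom_operator(
    "CUSTOM_ENTITY",
    OperatorConfig("custom", {"lambda": lambda value: "<CUSTOM_ENTITY>"})
)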

Development

To set up the development environment:

  1. Clone the repository:
git clone https://github.com/fox-rudie/pseudonymizer-uk.git
cd pseudonymizer-uk
  2. Create a virtual environment and install dependencies using uv:
uv venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows

uv pip install -e ".[dev]"
  3. Install pre-commit hooks (optional):
uv pip install pre-commit
pre-commit install

Publishing

To publish a new version to PyPI:

  1. Update the version in pyproject.toml and __init__.py.
  2. Build the package:
uv pip install build
python -m build
  3. Upload to PyPI:
uv pip install twine
python -m twine upload dist/*
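
Because the version lives in two files, the version bump in the first step is easy to do only halfway. The sketch below is one way to confirm that both files agree before building. It assumes pyproject.toml contains a `version = "..."` line and that __init__.py defines `__version__ = "..."`. The helper names and the path `pseudonymizer_uk/__init__.py` are assumptions, not confirmed by the project. A regular expression is used instead of `tomllib` because that module only joined the standard library in Python 3.11.

```python
import re
from pathlib import Path

# Matches lines such as: version = "0.1.0"  or  __version__ = '0.1.0'
VERSION_RE = re.compile(
    r"""^\s*(?:__version__|version)\s*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def extract_version(source):
    """Return the first version string found in the given text, or None."""
    match = VERSION_RE.search(source)
    return match.group(1) if match else None


def versions_match(pyproject_text, init_text):
    """Return True when both texts declare the same version."""
    pyproject_version = extract_version(pyproject_text)
    init_version = extract_version(init_text)
    return pyproject_version is not None and pyproject_version == init_version


if __name__ == "__main__":
    # Assumed paths; adjust them to the actual repository layout.
    pyproject = Path("pyproject.toml")
    init_file = Path("pseudonymizer_uk/__init__.py")
    if pyproject.exists() and init_file.exists():
        same = versions_match(
            pyproject.read_text(encoding="utf-8"),
            init_file.read_text(encoding="utf-8"),
        )
        print("versions match" if same else "version mismatch")
```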

License

MIT License

Download files

Source Distribution

pseudonymizer_uk-0.1.0.tar.gz (8.6 kB)

Built Distribution

pseudonymizer_uk-0.1.0-py3-none-any.whl (5.3 kB)

File details

Details for the file pseudonymizer_uk-0.1.0.tar.gz.

File metadata

  • Download URL: pseudonymizer_uk-0.1.0.tar.gz
  • Upload date:
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for pseudonymizer_uk-0.1.0.tar.gz

  • SHA256: a6cd41abf941db98f3b0bc0481dd50c05772c4d7602ceef4f3d82d5d43fe99ed
  • MD5: 21c40231dd74cad82f40cc69c5808474
  • BLAKE2b-256: c50ec02805089ca4d3158435292e7dfd02ab3ee76580246488aa4c7111262369
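
The SHA256 digest listed above can be checked locally after downloading the file. A minimal sketch using only the standard library follows. The helper name `sha256_of` and the local file location are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for pseudonymizer_uk-0.1.0.tar.gz.
EXPECTED = "a6cd41abf941db98f3b0bc0481dd50c05772c4d7602ceef4f3d82d5d43fe99ed"

if __name__ == "__main__":
    # Assumed download location; adjust as needed.
    archive = Path("pseudonymizer_uk-0.1.0.tar.gz")
    if archive.exists():
        actual = sha256_of(archive)
        print("OK" if actual == EXPECTED else "MISMATCH: " + actual)
```

Reading in chunks keeps memory use flat, which matters little for an 8.6 kB archive but lets the same helper be reused for the much larger model files.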

File details

Details for the file pseudonymizer_uk-0.1.0-py3-none-any.whl.

File hashes

Hashes for pseudonymizer_uk-0.1.0-py3-none-any.whl

  • SHA256: 63e6cf61004a68772bc9803c395014ae920af8e243a40f88e521bdd33254c802
  • MD5: adce5cb62299b4d61a26118422ebe986
  • BLAKE2b-256: 1808aa08bb1ae3e5095e6f9fada3b2be747502a2af601bbb4afbf63172504fb5
