A Python library for masking personal information in text using Named Entity Recognition models.

kuronuri — PII redaction library

kuronuri (黒塗り) is a Python library for masking personal information (PII) in text using Named Entity Recognition (NER) models.

kuronuri (黒塗り) is the Japanese word for redaction — the act of blacking out sensitive information in documents.

Why kuronuri?

Sending text to an external LLM API is often the most capable approach, but it means personal information leaves your environment. kuronuri is designed for the use case where you want to redact PII before passing text to an LLM (or any other external service), reducing the risk of inadvertent data exposure.

Inference runs entirely on your machine: after the model is downloaded on first run, kuronuri works fully offline. kuronuri never logs, stores, or transmits the text you process.

Warning: NER models are not perfect. kuronuri will miss some entities and may flag false positives. Always have a human review the output before treating it as fully anonymised. The goal is to reduce the manual redaction burden, not to replace human judgement entirely.

Features

  • 🌐 Built-in models for English (EN_MODEL) and Japanese (JA_MODEL); other languages are supported through any Hugging Face token-classification model
  • ✏️ Three masking strategies — block fill, human-readable labels, or fixed string
  • 🖥️ CLI included — mask files or inline strings from the terminal
  • 🐍 Requires Python 3.10 or later

Installation

pip:

pip install kuronuri

uv:

uv add kuronuri

CPU-only environments

If you do not have a GPU, install the CPU-only build of PyTorch before installing kuronuri. On Linux, the default PyTorch build from PyPI bundles CUDA, so this replaces a download of roughly 2 GB with one of roughly 200 MB.

pip:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install kuronuri

uv: First, add the following to your pyproject.toml:

[[tool.uv.index]]
explicit = true
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"

[tool.uv.sources]
torch = {index = "pytorch-cpu"}

Then run:

uv add torch
uv add kuronuri

Quick Start

from kuronuri import mask

# English (default)
mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.")
# → "Hello, I'm██████████████. My email address is█████████████████████."

# Japanese
from kuronuri import JA_MODEL
mask("こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。", model=JA_MODEL)
# → 'こんにちは、███です。私のメールアドレスは ██████████@gmail.com です。'

Built-in Models

Constant             Model                              Language
EN_MODEL (default)   openai/privacy-filter              English
JA_MODEL             tsmatz/xlm-roberta-ner-japanese    Japanese

Custom model

Build a NERModel for any Hugging Face token-classification model:

from kuronuri import NERModel, mask

my_model = NERModel(
    model_name="my-org/my-ner-model",
    default_mask_tags=frozenset({"PERSON", "ORG"}),  # field is typed frozenset[str]
    tag_labels={"PERSON": "Person", "ORG": "Organization"},
)
mask("...", model=my_model)
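The tags in default_mask_tags and tag_labels must match the entity groups the model actually emits. Assuming kuronuri runs a Hugging Face token-classification pipeline with aggregation (NERModel defaults aggregation_strategy to "simple"), the "B-"/"I-" prefixes of the raw labels are folded into a single entity_group. The sketch below derives those groups from a model's id2label mapping; entity_groups is a hypothetical helper, and with transformers installed the mapping could be read from AutoConfig.from_pretrained(model_name).id2label.

```python
def entity_groups(id2label: dict[int, str]) -> set[str]:
    """Collect the entity groups a BIO-tagged token-classification model can emit.

    Hypothetical helper: mimics how Hugging Face aggregation strips "B-"/"I-"
    prefixes and skips the "O" (outside any entity) label.
    """
    groups: set[str] = set()
    for label in id2label.values():
        if label == "O":
            continue  # "O" marks tokens that belong to no entity
        if label.startswith(("B-", "I-")):
            label = label[2:]  # drop the BIO prefix, keep the entity type
        groups.add(label)
    return groups


# id2label of a hypothetical BIO-tagged model
example = {0: "O", 1: "B-PERSON", 2: "I-PERSON", 3: "B-ORG", 4: "I-ORG", 5: "B-MISC", 6: "I-MISC"}
print(sorted(entity_groups(example)))  # ['MISC', 'ORG', 'PERSON']
```

In this example, MISC would stay unmasked unless it is added to default_mask_tags or passed through mask_tags.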

Masking Strategies

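Three strategies are provided out of the box. You can also pass any callable with the signature (entity: dict, labels: dict[str, str]) -> str, which receives the detected entity and a tag-to-label mapping and returns the replacement text.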

Strategy                        Example output   Description
mask_with_block (default)       ███              Fills the span with █ characters, one per character of the entity
mask_with_label                 <Person>         Replaces the span with a human-readable label
mask_with_fixed(char, length)   ***              Replaces the span with a fixed string

from kuronuri import mask, mask_with_label, mask_with_fixed

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.")
# → "Hello, I'm██████████████. My email address is█████████████████████."

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=mask_with_label)
# → "Hello, I'm<Person><Person>. My email address is<Email><Email>."

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=mask_with_fixed(char="*", length=5))
# → "Hello, I'm*****. My email address is*****."

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=lambda e, _labels: f"[{e['entity_group']}]")
# → "Hello, I'm[private_person][private_person]. My email address is[private_email][private_email]."
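Because a strategy is just a callable, richer behaviour can be layered on top. The sketch below gives each distinct entity a stable numbered placeholder, so a downstream LLM can still tell different people apart without seeing their names. It assumes each entity dict has the entity_group key used in the lambda above and a word key holding the matched text (the key Hugging Face pipelines use); make_pseudonym_strategy is a hypothetical name, not part of kuronuri.

```python
from typing import Callable

Strategy = Callable[[dict, dict[str, str]], str]


def make_pseudonym_strategy() -> Strategy:
    """Return a strategy mapping each distinct (tag, text) pair to a stable placeholder.

    Hypothetical example; assumes entity dicts carry "entity_group" and "word" keys.
    """
    seen: dict[tuple[str, str], str] = {}  # (tag, text) -> placeholder already issued
    counters: dict[str, int] = {}  # tag -> number of placeholders issued so far

    def strategy(entity: dict, labels: dict[str, str]) -> str:
        group = entity["entity_group"]
        text = entity["word"].strip()  # ignore leading spaces some models include
        key = (group, text)
        if key not in seen:
            counters[group] = counters.get(group, 0) + 1
            # Fall back to the raw tag when no human-readable label is defined.
            label = labels.get(group, group)
            seen[key] = f"<{label}_{counters[group]}>"
        return seen[key]

    return strategy


labels = {"private_person": "Person"}
pseudonym = make_pseudonym_strategy()
print(pseudonym({"entity_group": "private_person", "word": " Alice"}, labels))  # <Person_1>
print(pseudonym({"entity_group": "private_person", "word": " Bob"}, labels))  # <Person_2>
print(pseudonym({"entity_group": "private_person", "word": "Alice"}, labels))  # <Person_1>
```

Create a fresh strategy per document if placeholders should not carry over between texts. As the label output above suggests, a model may split one name into several entities, and each fragment would then get its own placeholder.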

NER Tags

EN_MODEL: openai/privacy-filter

openai/privacy-filter is designed specifically for PII detection, so all of its tags are masked by default.

Tag               mask_with_label output   Description
private_person    <Person>                 Person name
private_address   <Address>                Physical address
private_email     <Email>                  Email address
private_phone     <Phone>                  Phone number
private_url       <URL>                    Private URL
private_date      <Date>                   Private date
account_number    <AccountNumber>          Account / card number
secret            <Secret>                 API keys, passwords, etc.

JA_MODEL: tsmatz/xlm-roberta-ner-japanese

Tag     mask_with_label output    Description              Masked by default
PER     <Person>                  Person name
ORG     <Organization>            General organisation
LOC     <Location>                Location
ORG-P   <PoliticalOrganization>   Political organisation
ORG-O   <OtherOrganization>       Other organisation
INS     <Institution>             Institution / facility
PRD     <Product>                 Product
EVT     <Event>                   Event

Use mask_tags to override the default set for any model:

from kuronuri import JA_MODEL, mask

# Mask only person names
mask("こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。", model=JA_MODEL, mask_tags={"PER"})
# → 'こんにちは、███です。私のメールアドレスは sincekmori@gmail.com です。'

CLI

Usage: kuronuri [OPTIONS] INPUT

  Mask PII in a text file or an inline string.

Arguments:
  INPUT  Path to a text file, or a literal string to mask inline.

Options:
  -o, --output PATH         Output file path. Defaults to stdout.
  -s, --strategy TEXT       Masking strategy: 'block' (███, default),
                            'label' (<Person>), or 'fixed'.
  --fixed-char TEXT         Character used by the 'fixed' strategy.
  --fixed-length INTEGER    Length used by the 'fixed' strategy.  [default: 3]
  -t, --tag TEXT            Entity tag to mask. Repeatable.
      --lang TEXT           Built-in language: 'en' (default) or 'ja'.
                            Mutually exclusive with --model.
  -m, --model TEXT          Hugging Face model identifier for a custom model.
                            Mutually exclusive with --lang.
  -v, --version             Show version and exit.
  --help                    Show this message and exit.

Examples:

# Inline string (default: English)
kuronuri "Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com."

# Japanese text
kuronuri --lang ja "こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。"

# File → stdout with label strategy
kuronuri --strategy label report.txt

# File → output file
kuronuri input.txt -o output.txt

# Custom model
kuronuri --model my-org/my-ner-model input.txt

# Show version
kuronuri --version

The CLI preserves the original file encoding (including BOM) and line endings.
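Preserving the encoding and line endings matters because masking should change only the redacted spans. Below is a minimal sketch of one way to do this in plain Python, assuming UTF-8 input with an optional BOM (other encodings are out of scope). transform_file_preserving_format is a hypothetical helper, not kuronuri's implementation, and its transform argument stands in for a call such as mask.

```python
import codecs
import tempfile
from pathlib import Path
from typing import Callable


def transform_file_preserving_format(
    src: Path, dst: Path, transform: Callable[[str], str]
) -> None:
    """Apply `transform` to a UTF-8 text file, keeping its BOM and line endings.

    Hypothetical helper; assumes UTF-8 input with an optional BOM.
    """
    raw = src.read_bytes()
    # "utf-8-sig" strips a BOM on decode and writes one back on encode, so use it
    # only when the source had a BOM; otherwise plain "utf-8" adds nothing.
    encoding = "utf-8-sig" if raw.startswith(codecs.BOM_UTF8) else "utf-8"
    # Decoding bytes directly (not via text-mode open) leaves "\r\n" untranslated.
    text = raw.decode(encoding)
    dst.write_bytes(transform(text).encode(encoding))


# Round-trip a UTF-8 file that has a BOM and Windows line endings.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "in.txt", Path(tmp) / "out.txt"
    src.write_bytes(codecs.BOM_UTF8 + "Hi Ann\r\nBye\r\n".encode("utf-8"))
    transform_file_preserving_format(src, dst, lambda t: t.replace("Ann", "███"))
    result = dst.read_bytes()

print(result == codecs.BOM_UTF8 + "Hi ███\r\nBye\r\n".encode("utf-8"))  # True
```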

API Reference

mask(text, *, model, mask_tags, strategy) -> str

Parameter   Type                                    Default           Description
text        str                                     (required)        Input string
model       NERModel                                EN_MODEL          NER model to use
mask_tags   set[str] | None                         None              Tags to mask. None uses model.default_mask_tags.
strategy    Callable[[dict, dict[str, str]], str]   mask_with_block   Masking strategy

NERModel

from dataclasses import dataclass

@dataclass
class NERModel:
    model_name: str                       # Hugging Face model identifier
    default_mask_tags: frozenset[str]     # tags redacted when mask_tags=None
    tag_labels: dict[str, str]            # tag → label for mask_with_label
    aggregation_strategy: str = "simple"  # entity aggregation mode

Built-in strategies

Function                        Description
mask_with_block                 █ × entity character length (default)
mask_with_label                 <Person>-style label
mask_with_fixed(char, length)   Factory for a fixed replacement string
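To make the strategy contract concrete, here are hypothetical re-implementations of the three built-ins that follow the documented Callable[[dict, dict[str, str]], str] signature. They are illustrative only: the block length assumes start/end character offsets in the entity dict, the label fallback to the raw tag is a guess, and the fixed defaults (char="*", length=3) are inferred from the examples and CLI help above rather than confirmed from the library source.

```python
from typing import Callable

Strategy = Callable[[dict, dict[str, str]], str]


def block_strategy(entity: dict, labels: dict[str, str]) -> str:
    """Like mask_with_block: one █ per character of the original span."""
    return "█" * (entity["end"] - entity["start"])


def label_strategy(entity: dict, labels: dict[str, str]) -> str:
    """Like mask_with_label: <Label>, falling back to the raw tag if unmapped."""
    tag = entity["entity_group"]
    return f"<{labels.get(tag, tag)}>"


def fixed_strategy(char: str = "*", length: int = 3) -> Strategy:
    """Like mask_with_fixed: a factory returning a strategy that ignores the entity."""
    replacement = char * length

    def strategy(entity: dict, labels: dict[str, str]) -> str:
        return replacement

    return strategy


entity = {"entity_group": "private_person", "start": 11, "end": 24}
labels = {"private_person": "Person"}
print(len(block_strategy(entity, labels)))  # 13
print(label_strategy(entity, labels))  # <Person>
print(fixed_strategy("*", 5)(entity, labels))  # *****
```

The factory form of mask_with_fixed is what lets it take configuration (char, length) while still presenting the same two-argument signature as the other strategies.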


Download files

Source Distribution

kuronuri-0.2.0.tar.gz (14.0 kB)

Uploaded Source

Built Distribution

kuronuri-0.2.0-py3-none-any.whl (15.2 kB)

Uploaded Python 3

File details

Details for the file kuronuri-0.2.0.tar.gz.

File metadata

  • Download URL: kuronuri-0.2.0.tar.gz
  • Upload date:
  • Size: 14.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kuronuri-0.2.0.tar.gz
Algorithm Hash digest
SHA256 1838c2fa09a093f81aef811d90c7a70a625830fe2e0b64c3d1b3b519dffe71b4
MD5 7511b7ad9ee80e849efb9dae85d5b1bd
BLAKE2b-256 1ca8f0229b9fb96c8ba27691a800ab7f796638da17af6f5e04e7123d528bc895

Provenance

The following attestation bundles were made for kuronuri-0.2.0.tar.gz:

Publisher: publish.yml on sincekmori/kuronuri

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file kuronuri-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: kuronuri-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 15.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kuronuri-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9413060f3ded3953bc451555ebdc3dd56d470589b2a2d5a2628dc96f169ac487
MD5 3c0192acb26bd445673ba3c3402492a8
BLAKE2b-256 81c2599f04de79db28df0c8e543260c29daa14f4cad442900d9baebf37ccaf9d

Provenance

The following attestation bundles were made for kuronuri-0.2.0-py3-none-any.whl:

Publisher: publish.yml on sincekmori/kuronuri
