A Python library for masking personal information in text using Named Entity Recognition models.

kuronuri — PII redaction library

kuronuri (黒塗り) is a Python library for masking personal information (PII) in text using Named Entity Recognition (NER) models.

kuronuri (黒塗り) is the Japanese word for redaction — the act of blacking out sensitive information in documents.

Why kuronuri?

Sending text to an external LLM API is often the most capable approach, but it means personal information leaves your environment. kuronuri is designed for the use case where you want to redact PII before passing text to an LLM (or any other external service), reducing the risk of inadvertent data exposure.

Inference runs entirely on your machine: after the model is downloaded on first run, kuronuri works fully offline. kuronuri never logs, stores, or transmits the text you process.

Warning: NER models are not perfect. kuronuri will miss some entities and may flag text that is not PII. Always have a human review the output before treating it as fully anonymised. The goal is to reduce the manual redaction burden, not to replace human judgement.
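The redact-before-sending workflow described above can be sketched as a small wrapper. This is only an illustration: `redact_then_send`, `fake_masker`, and `fake_llm` are hypothetical names, and the stand-ins exist so the sketch runs without a model download or a network call. In real use, `masker` would presumably be `kuronuri.mask` and `send` your own API client.

```python
from typing import Callable, List

sent: List[str] = []  # records what would have left the machine


def redact_then_send(
    text: str,
    masker: Callable[[str], str],
    send: Callable[[str], str],
) -> str:
    """Mask `text` locally, then hand only the masked copy to `send`."""
    masked = masker(text)
    return send(masked)


def fake_masker(text: str) -> str:
    # Hypothetical stand-in for a real masker such as kuronuri.mask.
    name = "Shinsuke Mori"
    return text.replace(name, "█" * len(name))


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for an external service call.
    sent.append(prompt)
    return f"received {len(prompt)} characters"


reply = redact_then_send("Hello, I'm Shinsuke Mori.", fake_masker, fake_llm)
```

Because the external call only ever sees the return value of `masker`, a human review step can be slotted in between the two calls without changing the rest of the pipeline.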

Features

  • 🌐 Built-in models for English (EN_MODEL) and Japanese (JA_MODEL); other languages are supported through any Hugging Face token-classification model
  • ✏️ Three masking strategies — block fill, human-readable labels, or fixed string
  • 🖥️ CLI included — mask files or inline strings from the terminal
  • 🐍 Requires Python 3.10 or later

Installation

pip:

pip install kuronuri

uv:

uv add kuronuri

CPU-only environments

If you do not have a GPU, install the CPU build of PyTorch before installing kuronuri. This avoids downloading the default CUDA build: the CPU build is roughly 200 MB, compared with roughly 2 GB for the CUDA build.

pip:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install kuronuri

uv: First, add the following to your pyproject.toml:

[[tool.uv.index]]
explicit = true
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"

[tool.uv.sources]
torch = {index = "pytorch-cpu"}

Then run:

uv add torch
uv add kuronuri

Quick Start

from kuronuri import mask

# English (default)
mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.")
# → "Hello, I'm██████████████. My email address is█████████████████████."

# Japanese
from kuronuri import JA_MODEL
mask("こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。", model=JA_MODEL)
# → 'こんにちは、███です。私のメールアドレスは ██████████@gmail.com です。'

Built-in Models

| Constant | Model | Default language |
|---|---|---|
| EN_MODEL (default) | openai/privacy-filter | English |
| JA_MODEL | tsmatz/xlm-roberta-ner-japanese | Japanese |

Custom model

Build a NERModel for any Hugging Face token-classification model:

from kuronuri import NERModel, mask

my_model = NERModel(
    model_name="my-org/my-ner-model",
    default_mask_tags={"PERSON", "ORG"},
    tag_labels={"PERSON": "Person", "ORG": "Organization"},
)
mask("...", model=my_model)
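When choosing `default_mask_tags` for a custom model, it helps to know which tags the model can emit. For Hugging Face models the label inventory is available as `config.id2label`, usually with BIO-style prefixes such as `B-PER` and `I-PER`. The sketch below strips those prefixes from a plain mapping. `tags_from_id2label` is a hypothetical helper, and it assumes that kuronuri expects the un-prefixed names that the pipeline reports after aggregation.

```python
from typing import Dict, Set

# Common chunk-position prefixes used by BIO / BILOU / BIOES tagging schemes.
_POSITION_PREFIXES = {"B", "I", "E", "S", "L", "U"}


def tags_from_id2label(id2label: Dict[int, str]) -> Set[str]:
    """Return the distinct entity tags in an id2label mapping, without prefixes."""
    tags: Set[str] = set()
    for label in id2label.values():
        if label == "O":  # "outside" marks tokens that belong to no entity
            continue
        prefix, sep, rest = label.partition("-")
        if sep and prefix in _POSITION_PREFIXES:
            tags.add(rest)  # "B-ORG-P" -> "ORG-P"
        else:
            tags.add(label)  # already un-prefixed, e.g. "private_person"
    return tags


example = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG-P", 4: "I-ORG-P"}
available = tags_from_id2label(example)
```

The resulting set can be narrowed by hand to the tags that count as PII for your use case before passing it as `default_mask_tags`.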

Masking Strategies

Three strategies are provided out of the box. You can also pass any callable (entity: dict) -> str.

| Strategy | Example output | Description |
|---|---|---|
| mask_with_block (default) | ███ | Fills with █, matching the entity's character length |
| mask_with_label | <Person> | Replaces with a human-readable label |
| mask_with_fixed(char, length) | *** | Replaces with a fixed string |

from kuronuri import mask, mask_with_label, mask_with_fixed

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.")
# → "Hello, I'm██████████████. My email address is█████████████████████."

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=mask_with_label)
# → "Hello, I'm<Person><Person>. My email address is<Email><Email>."

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=mask_with_fixed(char="*", length=5))
# → "Hello, I'm*****. My email address is*****."

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=lambda e: f"[{e['entity_group']}]")
# → "Hello, I'm[private_person][private_person]. My email address is[private_email][private_email]."
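Since a strategy is just a callable that takes the entity dict and returns a string, it can also be built by a small factory. The sketch below is one possible shape: `make_bracket_strategy` is a hypothetical helper, and it relies only on the `entity_group` key used in the lambda example above.

```python
from typing import Callable, Dict


def make_bracket_strategy(
    names: Dict[str, str], fallback: str = "REDACTED"
) -> Callable[[dict], str]:
    """Build a strategy that maps each tag to a bracketed placeholder."""

    def strategy(entity: dict) -> str:
        tag = entity["entity_group"]
        # Tags without an explicit name fall back to a generic placeholder.
        return f"[{names.get(tag, fallback)}]"

    return strategy


bracket = make_bracket_strategy(
    {"private_person": "NAME", "private_email": "EMAIL"}
)
person_mask = bracket({"entity_group": "private_person"})
other_mask = bracket({"entity_group": "private_phone"})
```

The returned callable would then be passed as `strategy=bracket` in the same way as the lambda above.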

NER Tags

EN_MODEL: openai/privacy-filter

openai/privacy-filter is designed specifically for PII detection, so all of its tags are masked by default.

| Tag | mask_with_label output | Description |
|---|---|---|
| private_person | <Person> | Person name |
| private_address | <Address> | Physical address |
| private_email | <Email> | Email address |
| private_phone | <Phone> | Phone number |
| private_url | <URL> | Private URL |
| private_date | <Date> | Private date |
| account_number | <AccountNumber> | Account / card number |
| secret | <Secret> | API keys, passwords, etc. |

JA_MODEL: tsmatz/xlm-roberta-ner-japanese

| Tag | mask_with_label output | Description | Masked by default |
|---|---|---|---|
| PER | <Person> | Person name | |
| ORG | <Organization> | General organisation | |
| LOC | <Location> | Location | |
| ORG-P | <PoliticalOrganization> | Political organisation | |
| ORG-O | <OtherOrganization> | Other organisation | |
| INS | <Institution> | Institution / facility | |
| PRD | <Product> | Product | |
| EVT | <Event> | Event | |

Use mask_tags to override the default set for any model:

from kuronuri import JA_MODEL, mask

# Mask only person names
mask("こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。", model=JA_MODEL, mask_tags={"PER"})
# → 'こんにちは、███です。私のメールアドレスは sincekmori@gmail.com です。'

CLI

Usage: kuronuri [OPTIONS] INPUT

  Mask PII in a text file or an inline string.

Arguments:
  INPUT  Path to a text file, or a literal string to mask inline.

Options:
  -o, --output PATH         Output file path. Defaults to stdout.
  -s, --strategy TEXT       Masking strategy: 'block' (███, default),
                            'label' (<Person>), or 'fixed'.
  --fixed-char TEXT         Character used by the 'fixed' strategy.
  --fixed-length INTEGER    Length used by the 'fixed' strategy.  [default: 3]
  -t, --tag TEXT            Entity tag to mask. Repeatable.
      --lang TEXT           Built-in language: 'en' (default) or 'ja'.
                            Mutually exclusive with --model.
  -m, --model TEXT          Hugging Face model identifier for a custom model.
                            Mutually exclusive with --lang.
  -v, --version             Show version and exit.
  --help                    Show this message and exit.

Examples:

# Inline string (default: English)
kuronuri "Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com."

# Japanese text
kuronuri --lang ja "こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。"

# File → stdout with label strategy
kuronuri --strategy label report.txt

# File → output file
kuronuri input.txt -o output.txt

# Custom model
kuronuri --model my-org/my-ner-model input.txt

# Show version
kuronuri --version
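When the CLI is driven from a script, the argument list can be assembled programmatically. The sketch below uses only the options listed in the help text above and rejects the documented `--lang` / `--model` conflict before the process is started. `build_kuronuri_argv` is a hypothetical helper, not part of kuronuri.

```python
from typing import List, Optional, Sequence


def build_kuronuri_argv(
    input_: str,
    *,
    output: Optional[str] = None,
    strategy: Optional[str] = None,
    tags: Sequence[str] = (),
    lang: Optional[str] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list of the form: kuronuri [OPTIONS] INPUT."""
    if lang is not None and model is not None:
        raise ValueError("--lang and --model are mutually exclusive")
    argv = ["kuronuri"]
    if output is not None:
        argv += ["--output", output]
    if strategy is not None:
        argv += ["--strategy", strategy]
    for tag in tags:  # --tag is repeatable, one flag per tag
        argv += ["--tag", tag]
    if lang is not None:
        argv += ["--lang", lang]
    if model is not None:
        argv += ["--model", model]
    argv.append(input_)
    return argv


label_argv = build_kuronuri_argv("report.txt", strategy="label")
ja_argv = build_kuronuri_argv("input.txt", lang="ja", tags=["PER", "LOC"])
```

The list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with inline strings that contain apostrophes or spaces.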

The CLI preserves the original file encoding (including BOM) and line endings.
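To make the encoding guarantee concrete, here is a minimal sketch of how a BOM and the original line endings can survive a text transformation. It is not kuronuri's implementation, which the source does not describe: it handles UTF-8 only, and `rewrite_preserving_layout` is a hypothetical name.

```python
import codecs
from typing import Callable


def rewrite_preserving_layout(raw: bytes, transform: Callable[[str], str]) -> bytes:
    """Apply `transform` to UTF-8 text while keeping the BOM and line endings."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    body = raw[len(codecs.BOM_UTF8):] if has_bom else raw
    # bytes.decode performs no newline translation, so "\r\n" stays "\r\n".
    text = body.decode("utf-8")
    result = transform(text).encode("utf-8")
    # Put the BOM back only if the input had one.
    return codecs.BOM_UTF8 + result if has_bom else result


original = codecs.BOM_UTF8 + "name: bob\r\ncity: kyoto\r\n".encode("utf-8")
rewritten = rewrite_preserving_layout(original, str.upper)
```

Working on bytes rather than opening the file in text mode is what keeps the line endings intact, since text mode would normalise them on read by default.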

API Reference

mask(text, *, model, mask_tags, strategy) -> str

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | | Input string |
| model | NERModel | EN_MODEL | NER model to use |
| mask_tags | set[str] \| None | None | Tags to mask. None uses model.default_mask_tags. |
| strategy | Callable[[dict], str] | mask_with_block | Masking strategy |

NERModel

@dataclass
class NERModel:
    model_name: str                   # Hugging Face model identifier
    default_mask_tags: frozenset[str] # tags redacted when mask_tags=None
    tag_labels: dict[str, str]        # tag → label for mask_with_label
    aggregation_strategy: str         # default: "simple"
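The relationship between `mask_tags` and `default_mask_tags` can be illustrated with a stand-in dataclass. `ModelSpec` and `resolve_mask_tags` are hypothetical and only mirror the documented fields and the documented rule that `mask_tags=None` falls back to the model's defaults. They are not kuronuri's own code.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


@dataclass
class ModelSpec:
    """Hypothetical stand-in that mirrors the documented NERModel fields."""

    model_name: str
    default_mask_tags: FrozenSet[str]
    tag_labels: Dict[str, str] = field(default_factory=dict)
    aggregation_strategy: str = "simple"


def resolve_mask_tags(
    model: ModelSpec, mask_tags: Optional[Set[str]] = None
) -> FrozenSet[str]:
    """Return the tags to redact: an explicit override, else the model default."""
    if mask_tags is None:
        return model.default_mask_tags
    return frozenset(mask_tags)


spec = ModelSpec(
    model_name="my-org/my-ner-model",
    default_mask_tags=frozenset({"PERSON", "ORG"}),
    tag_labels={"PERSON": "Person", "ORG": "Organization"},
)
defaults = resolve_mask_tags(spec)
only_people = resolve_mask_tags(spec, {"PERSON"})
```

Note that under this rule an explicit `mask_tags` replaces the default set rather than adding to it, which matches the "Mask only person names" example earlier.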

Built-in strategies

| Function | Description |
|---|---|
| mask_with_block | █ × entity character length (default) |
| mask_with_label | <Person> style label |
| mask_with_fixed(char, length) | Factory for a fixed replacement string |
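As a rough mental model of how a strategy is applied, the sketch below replaces each detected span with the strategy's output. It is not kuronuri's implementation. `block_fill`, `fixed_fill`, and `apply_strategy` are hypothetical names, and the `start` and `end` keys are assumed to be present as in the Hugging Face token-classification pipeline's aggregated output.

```python
from typing import Callable, Iterable

Strategy = Callable[[dict], str]


def block_fill(entity: dict) -> str:
    """One block character per character of the original span."""
    return "█" * (entity["end"] - entity["start"])


def fixed_fill(char: str, length: int) -> Strategy:
    """Factory: every entity becomes the same fixed-length string."""
    return lambda entity: char * length


def apply_strategy(text: str, entities: Iterable[dict], strategy: Strategy) -> str:
    """Replace each entity span in `text` with the strategy's output."""
    # Work from the last span to the first so earlier offsets stay valid
    # even when a replacement changes the length of the text.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: entity["start"]] + strategy(entity) + text[entity["end"]:]
    return text


entities = [
    {"entity_group": "private_person", "start": 0, "end": 3},
    {"entity_group": "private_person", "start": 8, "end": 11},
]
blocked = apply_strategy("Ann met Bob", entities, block_fill)
starred = apply_strategy("Ann met Bob", entities, fixed_fill("*", 5))
```

A block fill preserves the length of the text, whereas a fixed fill hides the length of each entity as well, which can matter when name length alone is identifying.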
