A Python library for masking personal information in text using Named Entity Recognition models.
kuronuri (黒塗り) is a Python library for masking personally identifiable information (PII) in text using Named Entity Recognition (NER) models. Its name is the Japanese word for redaction: the act of blacking out sensitive information in documents.
Why kuronuri?
Sending text to an external LLM API is often the most capable approach, but it means personal information leaves your environment. kuronuri is designed for the use case where you want to redact PII before passing text to an LLM (or any other external service), reducing the risk of inadvertent data exposure.
Inference runs entirely on your machine: after the model is downloaded on first run, kuronuri works fully offline. kuronuri never logs, stores, or transmits the text you process.
> [!WARNING]
> NER models are not perfect: kuronuri will miss some entities and may flag false positives. Always have a human review the output before treating it as fully anonymised. The goal is to reduce the manual redaction burden, not to replace human judgement entirely.
Features
- 🌐 Built-in models for English (`EN_MODEL`) and Japanese (`JA_MODEL`); other languages can be handled with any suitable Hugging Face `token-classification` model
- ✏️ Three masking strategies: block fill, human-readable labels, or a fixed string
- 🖥️ CLI included — mask files or inline strings from the terminal
- 🐍 Requires Python 3.10 or later
Installation
pip:

```shell
pip install kuronuri
```

uv:

```shell
uv add kuronuri
```
CPU-only environments
If you do not have a GPU, installing the CPU build of PyTorch before installing kuronuri avoids downloading the default CUDA build (~200 MB vs ~2 GB).
pip:

```shell
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install kuronuri
```
uv: first, add the following to your `pyproject.toml`:

```toml
[[tool.uv.index]]
explicit = true
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"

[tool.uv.sources]
torch = { index = "pytorch-cpu" }
```
Then run:

```shell
uv add torch
uv add kuronuri
```
Quick Start
```python
from kuronuri import JA_MODEL, mask

# English (default)
mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.")
# → "Hello, I'm██████████████. My email address is█████████████████████."

# Japanese ("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.")
mask("こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。", model=JA_MODEL)
# → 'こんにちは、███です。私のメールアドレスは ██████████@gmail.com です。'
```
Built-in Models
| Constant | Model | Language |
|---|---|---|
| `EN_MODEL` (default) | `openai/privacy-filter` | English |
| `JA_MODEL` | `tsmatz/xlm-roberta-ner-japanese` | Japanese |
Custom model
Build a NERModel for any Hugging Face token-classification model:
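```python
from kuronuri import NERModel, mask

my_model = NERModel(
    model_name="my-org/my-ner-model",                # Hugging Face model identifier
    default_mask_tags=frozenset({"PERSON", "ORG"}),  # tags masked when mask_tags=None
    tag_labels={"PERSON": "Person", "ORG": "Organization"},  # labels for mask_with_label
)
mask("...", model=my_model)
```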
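To fill in `default_mask_tags` and `tag_labels` you need the tag names the model actually emits. The sketch below is a hypothetical helper (`entity_types_from_id2label` is not part of kuronuri) that derives them from a model's `id2label` mapping, which Transformers exposes as `AutoConfig.from_pretrained(model_name).id2label`. It assumes the common `B-`/`I-` prefix convention; unprefixed labels such as `ORG-P` are kept whole.

```python
def entity_types_from_id2label(id2label: dict[int, str]) -> set[str]:
    """Collect entity types from a token-classification label map."""
    types: set[str] = set()
    for label in id2label.values():
        if label == "O":
            continue  # "outside" tokens are not entities
        # Strip a leading "B-" or "I-" prefix; leave other labels untouched.
        prefix, sep, rest = label.partition("-")
        if sep and prefix in {"B", "I"}:
            types.add(rest)
        else:
            types.add(label)
    return types


# Illustrative label map, not taken from any specific model.
id2label = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG", 5: "ORG-P"}
print(sorted(entity_types_from_id2label(id2label)))
# → ['ORG', 'ORG-P', 'PER']
```

The resulting names are candidates for `default_mask_tags`; which of them count as PII for your use case is a judgement call.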
Masking Strategies
Three strategies are provided out of the box. You can also pass any callable with the signature `(entity: dict, labels: dict[str, str]) -> str`.
| Strategy | Example output | Description |
|---|---|---|
| `mask_with_block` (default) | `███` | Fills with `█` matching the entity's character length |
| `mask_with_label` | `<Person>` | Replaces with a human-readable label |
| `mask_with_fixed(char, length)` | `***` | Replaces with a fixed string |
```python
from kuronuri import mask, mask_with_fixed, mask_with_label

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.")
# → "Hello, I'm██████████████. My email address is█████████████████████."

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=mask_with_label)
# → "Hello, I'm<Person><Person>. My email address is<Email><Email>."

mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=mask_with_fixed(char="*", length=5))
# → "Hello, I'm*****. My email address is*****."

# Custom strategy: any callable taking (entity, labels) and returning a string
mask("Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com.", strategy=lambda e, _labels: f"[{e['entity_group']}]")
# → "Hello, I'm[private_person][private_person]. My email address is[private_email][private_email]."
```
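As a further example of a custom strategy, the sketch below replaces each entity with a stable pseudonym, so repeated mentions of the same person remain linkable after masking. It is a hypothetical helper (`pseudonym_strategy` is not part of kuronuri) and assumes the entity dict carries `entity_group` (as in the lambda example above) and `word` (the detected text, as in Hugging Face pipeline output), and that the second argument maps tags to labels.

```python
import hashlib
import hmac
from collections.abc import Callable

Strategy = Callable[[dict, dict[str, str]], str]


def pseudonym_strategy(secret: bytes) -> Strategy:
    """Build a strategy that maps identical entity text to an identical placeholder."""

    def strategy(entity: dict, labels: dict[str, str]) -> str:
        tag = entity["entity_group"]
        label = labels.get(tag, tag)  # fall back to the raw tag name
        # Assumption: entity["word"] holds the detected text (Hugging Face pipeline key).
        surface = entity.get("word", "").strip()
        # Keying the hash with a secret stops anyone from confirming a guessed
        # name by hashing candidates themselves.
        digest = hmac.new(secret, surface.encode("utf-8"), hashlib.sha256).hexdigest()[:6]
        return f"<{label}_{digest}>"

    return strategy
```

Usage would look like `mask(text, strategy=pseudonym_strategy(b"replace-with-a-real-secret"))`. A plain unkeyed hash of a name is easy to reverse by hashing a list of likely names, which is why the sketch uses `hmac`.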
NER Tags
EN_MODEL — openai/privacy-filter
openai/privacy-filter is designed specifically for PII detection, so all of its tags are masked by default.
| Tag | `mask_with_label` output | Description |
|---|---|---|
| `private_person` | `<Person>` | Person name |
| `private_address` | `<Address>` | Physical address |
| `private_email` | `<Email>` | Email address |
| `private_phone` | `<Phone>` | Phone number |
| `private_url` | `<URL>` | Private URL |
| `private_date` | `<Date>` | Private date |
| `account_number` | `<AccountNumber>` | Account / card number |
| `secret` | `<Secret>` | API keys, passwords, etc. |
JA_MODEL — tsmatz/xlm-roberta-ner-japanese
| Tag | `mask_with_label` output | Description | Masked by default |
|---|---|---|---|
| `PER` | `<Person>` | Person name | ✅ |
| `ORG` | `<Organization>` | General organisation | ✅ |
| `LOC` | `<Location>` | Location | ✅ |
| `ORG-P` | `<PoliticalOrganization>` | Political organisation | ❌ |
| `ORG-O` | `<OtherOrganization>` | Other organisation | ❌ |
| `INS` | `<Institution>` | Institution / facility | ❌ |
| `PRD` | `<Product>` | Product | ❌ |
| `EVT` | `<Event>` | Event | ❌ |
Use mask_tags to override the default set for any model:
```python
from kuronuri import JA_MODEL, mask

# Mask only person names
mask("こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。", model=JA_MODEL, mask_tags={"PER"})
# → 'こんにちは、███です。私のメールアドレスは sincekmori@gmail.com です。'
```
CLI
```text
Usage: kuronuri [OPTIONS] INPUT

  Mask PII in a text file or an inline string.

Arguments:
  INPUT  Path to a text file, or a literal string to mask inline.

Options:
  -o, --output PATH        Output file path. Defaults to stdout.
  -s, --strategy TEXT      Masking strategy: 'block' (███, default),
                           'label' (<Person>), or 'fixed'.
  --fixed-char TEXT        Character used by the 'fixed' strategy.
  --fixed-length INTEGER   Length used by the 'fixed' strategy. [default: 3]
  -t, --tag TEXT           Entity tag to mask. Repeatable.
  --lang TEXT              Built-in language: 'en' (default) or 'ja'.
                           Mutually exclusive with --model.
  -m, --model TEXT         Hugging Face model identifier for a custom model.
                           Mutually exclusive with --lang.
  -v, --version            Show version and exit.
  --help                   Show this message and exit.
```
Examples:
```shell
# Inline string (default: English)
kuronuri "Hello, I'm Shinsuke Mori. My email address is sincekmori@gmail.com."

# Japanese text
kuronuri --lang ja "こんにちは、森信輔です。私のメールアドレスは sincekmori@gmail.com です。"

# File → stdout with label strategy
kuronuri --strategy label report.txt

# File → output file
kuronuri input.txt -o output.txt

# Custom model
kuronuri --model my-org/my-ner-model input.txt

# Show version
kuronuri --version
```
The CLI preserves the original file encoding (including BOM) and line endings.
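One way to keep a BOM and the original line endings when rewriting a UTF-8 file is to work on bytes rather than text-mode file objects. The sketch below is a hypothetical helper (`transform_file` is not kuronuri's code) and covers UTF-8 only; any other encodings the CLI may handle are out of scope here.

```python
import codecs
from collections.abc import Callable
from pathlib import Path


def transform_file(src: Path, dst: Path, transform: Callable[[str], str]) -> None:
    """Apply `transform` to a UTF-8 text file, keeping its BOM and line endings."""
    raw = src.read_bytes()
    has_bom = raw.startswith(codecs.BOM_UTF8)
    if has_bom:
        raw = raw[len(codecs.BOM_UTF8):]
    # bytes.decode does no newline translation, so "\r\n" and "\n" survive untouched.
    text = raw.decode("utf-8")
    out = transform(text).encode("utf-8")
    # Put the BOM back only if the source had one.
    dst.write_bytes(codecs.BOM_UTF8 + out if has_bom else out)
```

With kuronuri installed, `transform_file(Path("input.txt"), Path("output.txt"), mask)` would apply the default English masking. `Path.read_text` is avoided on purpose: it opens the file in universal-newline mode and would turn `\r\n` into `\n`.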
API Reference
mask(text, *, model, mask_tags, strategy) -> str
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | (required) | Input string |
| `model` | `NERModel` | `EN_MODEL` | NER model to use |
| `mask_tags` | `set[str] \| None` | `None` | Tags to mask. `None` uses `model.default_mask_tags`. |
| `strategy` | `Callable[[dict, dict[str, str]], str]` | `mask_with_block` | Masking strategy |
NERModel
```python
from dataclasses import dataclass


@dataclass
class NERModel:
    model_name: str                       # Hugging Face model identifier
    default_mask_tags: frozenset[str]     # tags redacted when mask_tags=None
    tag_labels: dict[str, str]            # tag → label for mask_with_label
    aggregation_strategy: str = "simple"  # how token predictions are grouped into entities
```
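`aggregation_strategy` presumably maps to the parameter of the same name on the Hugging Face Transformers `token-classification` pipeline, which merges per-token predictions into entity spans. The sketch below is a simplified re-implementation of the `"simple"` grouping rule for illustration only: `group_simple` is a hypothetical name, and the real pipeline also reconstructs the entity text (`word`) and handles subword details omitted here.

```python
def group_simple(tokens: list[dict]) -> list[dict]:
    """Merge token-level predictions into entity spans (simplified "simple" rule)."""

    def split(label: str) -> tuple[str, str]:
        # "B-PER" -> ("B", "PER"); unprefixed labels such as "O" count as continuations.
        if label.startswith(("B-", "I-")):
            return label[0], label[2:]
        return "I", label

    groups: list[list[dict]] = []
    for tok in tokens:
        bi, tag = split(tok["entity"])
        if groups:
            _, last_tag = split(groups[-1][-1]["entity"])
            # Extend the current group only for the same type without a fresh "B-".
            if tag == last_tag and bi != "B":
                groups[-1].append(tok)
                continue
        groups.append([tok])

    entities = []
    for group in groups:
        _, tag = split(group[0]["entity"])
        if tag == "O":  # "outside" runs are not entities
            continue
        entities.append(
            {
                "entity_group": tag,
                "score": sum(t["score"] for t in group) / len(group),  # mean token score
                "start": group[0]["start"],
                "end": group[-1]["end"],
            }
        )
    return entities


tokens = [
    {"entity": "B-PER", "start": 0, "end": 2, "score": 1.0},
    {"entity": "I-PER", "start": 2, "end": 4, "score": 0.5},
    {"entity": "O", "start": 4, "end": 5, "score": 0.9},
    {"entity": "B-LOC", "start": 5, "end": 7, "score": 0.8},
    {"entity": "B-LOC", "start": 8, "end": 10, "score": 0.6},
]
for entity in group_simple(tokens):
    print(entity)
```

The two adjacent `B-LOC` tokens come out as two separate entities, because a `B-` tag always opens a new group.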
Built-in strategies
| Function | Description |
|---|---|
| `mask_with_block` | `█` × entity character length (default) |
| `mask_with_label` | `<Person>`-style label |
| `mask_with_fixed(char, length)` | Factory for a fixed replacement string |
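The three built-in strategies can be approximated in a few lines each, which also shows why `mask_with_fixed` is a factory: it captures `char` and `length` and returns a strategy with the standard `(entity, labels)` signature. This is a sketch, not kuronuri's source; the names `block_fill`, `label_tag` and `fixed_string` are hypothetical, and the fallback to the raw tag and the default `char="*"` are assumptions (the default length of 3 follows the CLI help).

```python
from collections.abc import Callable

Strategy = Callable[[dict, dict[str, str]], str]


def block_fill(entity: dict, labels: dict[str, str]) -> str:
    # One █ per character of the original span.
    return "█" * (entity["end"] - entity["start"])


def label_tag(entity: dict, labels: dict[str, str]) -> str:
    tag = entity["entity_group"]
    return f"<{labels.get(tag, tag)}>"  # fall back to the raw tag if unlabelled


def fixed_string(char: str = "*", length: int = 3) -> Strategy:
    # Factory: the returned strategy ignores the entity and emits char * length.
    def strategy(entity: dict, labels: dict[str, str]) -> str:
        return char * length

    return strategy


entity = {"entity_group": "PER", "start": 6, "end": 9}
print(block_fill(entity, {}))                   # ███
print(label_tag(entity, {"PER": "Person"}))     # <Person>
print(fixed_string("*", 5)(entity, {}))         # *****
```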
Download files
File details
Details for the file kuronuri-0.2.0.tar.gz.
File metadata
- Download URL: kuronuri-0.2.0.tar.gz
- Upload date:
- Size: 14.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `1838c2fa09a093f81aef811d90c7a70a625830fe2e0b64c3d1b3b519dffe71b4` |
| MD5 | `7511b7ad9ee80e849efb9dae85d5b1bd` |
| BLAKE2b-256 | `1ca8f0229b9fb96c8ba27691a800ab7f796638da17af6f5e04e7123d528bc895` |
Provenance
The following attestation bundles were made for kuronuri-0.2.0.tar.gz:
- Publisher: publish.yml on sincekmori/kuronuri
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: kuronuri-0.2.0.tar.gz
  - Subject digest: 1838c2fa09a093f81aef811d90c7a70a625830fe2e0b64c3d1b3b519dffe71b4
  - Sigstore transparency entry: 1436809363
  - Sigstore integration time:
- Permalink: sincekmori/kuronuri@222fdf3a701a0a7e5873520c796773afd796e476
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/sincekmori
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@222fdf3a701a0a7e5873520c796773afd796e476
- Trigger Event: push
File details
Details for the file kuronuri-0.2.0-py3-none-any.whl.
File metadata
- Download URL: kuronuri-0.2.0-py3-none-any.whl
- Upload date:
- Size: 15.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9413060f3ded3953bc451555ebdc3dd56d470589b2a2d5a2628dc96f169ac487` |
| MD5 | `3c0192acb26bd445673ba3c3402492a8` |
| BLAKE2b-256 | `81c2599f04de79db28df0c8e543260c29daa14f4cad442900d9baebf37ccaf9d` |
Provenance
The following attestation bundles were made for kuronuri-0.2.0-py3-none-any.whl:
- Publisher: publish.yml on sincekmori/kuronuri
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: kuronuri-0.2.0-py3-none-any.whl
  - Subject digest: 9413060f3ded3953bc451555ebdc3dd56d470589b2a2d5a2628dc96f169ac487
  - Sigstore transparency entry: 1436809379
  - Sigstore integration time:
- Permalink: sincekmori/kuronuri@222fdf3a701a0a7e5873520c796773afd796e476
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/sincekmori
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@222fdf3a701a0a7e5873520c796773afd796e476
- Trigger Event: push