Local-first Japanese PII detection and redaction. SDK + `pleno-anonymize` CLI sharing the same recognizer registry and NER pipeline as the pleno-anonymize server.
pleno-anonymize
Local-first Japanese PII detection and redaction — SDK + CLI.
The package ships:
- A `PlenoAnonymize` factory that defaults to running Presidio + the spaCy `pleno_anonymize_ja` / `pleno_anonymize_en` models in-process (no network at scan time).
- An optional remote mode (`--base-url` / `base_url=`) that talks to a hosted `pleno-anonymize` server, using the same wire protocol as `https://pleno-anonymize.fly.dev`.
- A filesystem scanner (`scan_paths`) that walks paths and reports PII per file.
- The `pleno-anonymize` CLI, installed as a `[project.scripts]` entry. Run it with `uvx pleno-anonymize`, `pipx run pleno-anonymize`, or after `pip install pleno-anonymize`.
Install
# one-shot via uvx (no install)
uvx pleno-anonymize scan .
# or as a dependency
uv add pleno-anonymize
pip install pleno-anonymize
Requires Python 3.12+.
The first time you scan a language, the matching NER wheel
(pleno_anonymize_ja / pleno_anonymize_en, hosted on Hugging Face) is fetched and pip-installed
into the active environment. Pre-install with:
uvx pleno-anonymize models install ja
uvx pleno-anonymize models install en
Or disable auto-install with `--no-auto-download`. The engine then falls back to a blank
tokenizer plus pattern recognizers: regex/checksum classes are still detected, but
free-text NER classes are not.
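To see ahead of time whether a scan would trigger the download, a check along these lines could work. It is only a sketch: it assumes each NER wheel installs a top-level importable module named after the model (`pleno_anonymize_ja` / `pleno_anonymize_en`), and `model_installed` is a hypothetical helper, not part of the SDK.

```python
import importlib.util

# Assumption: each NER wheel exposes a top-level module with the same name
# as the model (pleno_anonymize_ja / pleno_anonymize_en).
MODEL_MODULES = {"ja": "pleno_anonymize_ja", "en": "pleno_anonymize_en"}


def model_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for lang, module in MODEL_MODULES.items():
        state = "installed" if model_installed(module) else "missing"
        print(f"{lang}: {module} is {state}")
```

`pleno-anonymize models status` remains the supported way to inspect model state; the snippet is useful mainly inside scripts that already run in the target environment.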
CLI
pleno-anonymize scan <path...> # walk paths, detect PII per file
pleno-anonymize analyze [text] # detect entities in text / stdin / --file
pleno-anonymize redact [text] # replace detected PII with <PLACEHOLDERS>
pleno-anonymize models {install,status}
pleno-anonymize health # ping --base-url (remote mode only)
Common flags:
| Flag | Description |
|---|---|
| `--base-url <url>` | Use a hosted endpoint instead of running locally (env: `PLENO_ANONYMIZE_BASE_URL`) |
| `--api-key <key>` | Bearer token for `--base-url` (env: `PLENO_ANONYMIZE_API_KEY`) |
| `--language ja\|en` | Detection language (default `ja`) |
| `--entities A,B,C` | Restrict to specific entity types |
| `--no-auto-download` | Do not pip-install missing NER wheels (local mode only) |
| `--json` | Emit JSON |
| `--fail-on-findings` | Exit 2 from `scan` when PII is found (CI gate) |
| `--workers <n>` | Parallel scan workers (default 4) |
| `--max-bytes <n>` | Per-file byte cap for `scan` (default 262144) |
| `--ignore a,b` | Extra directory names to skip |
| `--ext .md,.py` | Restrict `scan` to extensions |
| `-f, --file <path>` | Read input text from file |
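The `--ignore`, `--ext`, and `--max-bytes` flags describe a file-selection step. The sketch below shows one way such a step could look; it is not the scanner's actual code. `iter_candidate_files` and `read_capped` are hypothetical names, and reading only the first `max_bytes` of each file is an assumed interpretation of the byte cap.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator


def iter_candidate_files(
    root: str,
    ignore: Iterable[str] = (),
    exts: Iterable[str] = (),
) -> Iterator[Path]:
    """Yield files under root, skipping ignored directory names.

    If exts is non-empty, only files with one of those suffixes are yielded.
    """
    ignored = set(ignore)
    allowed = {e.lower() for e in exts}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into ignored directories.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if allowed and path.suffix.lower() not in allowed:
                continue
            yield path


def read_capped(path: Path, max_bytes: int = 262144) -> bytes:
    """Read at most max_bytes from the file (assumed meaning of the cap)."""
    with open(path, "rb") as fh:
        return fh.read(max_bytes)
```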
Examples
# scan the current repo locally, fail CI on any finding
uvx pleno-anonymize scan . --fail-on-findings
# analyze a Japanese string with the local model
echo "山田太郎 090-1234-5678 yamada@example.com" \
  | uvx pleno-anonymize analyze --language ja
# same call, but offload to the hosted server
echo "山田太郎 090-1234-5678" \
  | uvx pleno-anonymize analyze \
      --base-url https://pleno-anonymize.fly.dev
# redact and pipe to file
uvx pleno-anonymize redact -f notes.md > notes.redacted.md
# JSON output for tooling
uvx pleno-anonymize scan src --json | jq '.byEntity'
SDK
from pleno_anonymize import PlenoAnonymize, scan_paths
# default: local engine, auto-downloads pleno_anonymize_ja on first call
engine = PlenoAnonymize()
findings = engine.analyze("山田太郎 090-1234-5678", language="ja")
# [Finding(entity_type='PERSON', start=0, end=4, score=0.85, text='山田太郎'), ...]
result = engine.redact("Contact john@example.com", language="en")
# RedactResult(text='Contact <EMAIL_ADDRESS>')
summary = scan_paths(
    engine,
    ["src", "docs"],
    language="ja",
    ignore=["fixtures"],
    on_file=lambda f: f.findings and print(f.path, len(f.findings)),
)
print(summary.by_entity, summary.total_findings)
# remote mode — same surface, no local model footprint
remote = PlenoAnonymize(base_url="https://pleno-anonymize.fly.dev")
remote.analyze("...")
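To make the relationship between `analyze` findings and `redact` output concrete, here is a self-contained sketch of span-based placeholder substitution. It is not the library's implementation: the `Finding` class below is a stand-in that mirrors only the fields visible in the repr above, and `apply_placeholders` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Fields mirror the repr shown above; the real dataclass may carry more.
    entity_type: str
    start: int
    end: int
    score: float
    text: str


def apply_placeholders(text: str, findings: list[Finding]) -> str:
    """Replace each finding's span with <ENTITY_TYPE>.

    Spans are applied right to left so earlier offsets stay valid.
    Overlapping spans are not handled in this sketch.
    """
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[: f.start] + f"<{f.entity_type}>" + text[f.end :]
    return text


if __name__ == "__main__":
    sample = "Contact john@example.com"
    hit = Finding("EMAIL_ADDRESS", 8, 24, 1.0, "john@example.com")
    print(apply_placeholders(sample, [hit]))  # Contact <EMAIL_ADDRESS>
```

Working from the rightmost span first avoids recomputing offsets after each substitution, since placeholders rarely have the same length as the text they replace.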
API surface
| Export | Purpose |
|---|---|
| `PlenoAnonymize(base_url=None, ...)` | Factory: returns `LocalEngine` (default) or `RemoteEngine` |
| `LocalEngine` | In-process Presidio + spaCy + recognizer registry |
| `RemoteEngine` | HTTP client (stdlib `urllib`) for a hosted server |
| `PlenoAnonymizeError` | Raised by `RemoteEngine` on HTTP / transport failures |
| `scan_file(engine, path, ...)` | Analyze a single file |
| `scan_paths(engine, paths, ...)` | Walk paths with a worker pool, return `ScanSummary` |
| `Finding`, `RedactResult`, `FileScanResult`, `ScanSummary` | Dataclasses |
Environment variables
| Var | Purpose |
|---|---|
| `PLENO_ANONYMIZE_BASE_URL` | Default `--base-url` |
| `PLENO_ANONYMIZE_API_KEY` | Default `--api-key` |
| `NO_COLOR` | Disable ANSI colors in CLI output |
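When wrapping the SDK in your own tooling, the same two variables can be resolved by hand. The helper below is a hypothetical sketch, and it assumes that an explicitly passed value takes precedence over the environment, which is the usual convention for a "default" but is not confirmed here.

```python
import os
from typing import Mapping, Optional


def resolve_remote_config(
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Pick explicit values first, then fall back to the environment.

    Assumption: explicit arguments win over environment variables.
    A base_url of None means no remote endpoint was configured.
    """
    return {
        "base_url": base_url or env.get("PLENO_ANONYMIZE_BASE_URL"),
        "api_key": api_key or env.get("PLENO_ANONYMIZE_API_KEY"),
    }
```

The result could then be passed on, for example as `PlenoAnonymize(base_url=cfg["base_url"])`, using the `base_url` parameter shown in the SDK section.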
Detected entities
Free-text NER (PERSON, ADDRESS, ORGANIZATION, DATE_OF_BIRTH, BANK_ACCOUNT) and structured / regex+checksum classes (PHONE_NUMBER, MY_NUMBER, MY_NUMBER_CORPORATE, CREDIT_CARD, PASSPORT, DRIVER_LICENSE, HEALTH_INSURANCE, RESIDENCE_CARD, POSTAL_CODE, EMAIL_ADDRESS, IP_ADDRESS, URL).
See the server README for the full list.
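As an illustration of what a "regex+checksum" class means, the sketch below pairs a loose digit pattern with the Luhn checksum commonly used for payment card numbers. This is an assumed, simplified stand-in: the package's actual `CREDIT_CARD` recognizer, its pattern, and its scoring may differ, and `find_card_candidates` is a hypothetical name.

```python
import re

# Loose pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> list[str]:
    """Return pattern matches that also pass the checksum."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]
```

The checksum step is what keeps arbitrary 16-digit strings, such as order numbers, from being reported as card numbers.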
Exit codes (CLI)
| Code | Meaning |
|---|---|
| `0` | Success |
| `1` | Usage / runtime error |
| `2` | `scan --fail-on-findings` was set and findings were detected |
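A script that drives the CLI can branch on these codes. The following is a minimal sketch using only the documented `scan` subcommand and `--fail-on-findings` flag; `describe_exit` and `run_scan_gate` are hypothetical helper names, and the CLI is assumed to be on `PATH`.

```python
import subprocess
from typing import Sequence

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "usage or runtime error",
    2: "findings detected",
}


def describe_exit(code: int) -> str:
    """Map a CLI exit code to a short human-readable description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_scan_gate(paths: Sequence[str], executable: str = "pleno-anonymize") -> int:
    """Run `scan --fail-on-findings` on the given paths and return the exit code."""
    cmd = [executable, "scan", *paths, "--fail-on-findings"]
    completed = subprocess.run(cmd, check=False)
    return completed.returncode


if __name__ == "__main__":
    code = run_scan_gate(["."])
    print(describe_exit(code))
    raise SystemExit(code)
```

Passing `check=False` keeps exit code 2 from raising an exception, so the caller can distinguish "PII found" from a genuine failure.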
License