Local-first Japanese PII detection and redaction. SDK + `pleno-anonymize` CLI sharing the same recognizer registry and NER pipeline as the pleno-anonymize server.

pleno-anonymize

Local-first Japanese PII detection and redaction — SDK + CLI.

The package ships:

  • A PlenoAnonymize factory that defaults to running Presidio + the spaCy pleno_anonymize_ja / pleno_anonymize_en models in-process (no network at scan time).
  • An optional remote mode (--base-url / base_url=) that talks to a hosted pleno-anonymize server — same wire protocol as https://pleno-anonymize.fly.dev.
  • A filesystem scanner (scan_paths) that walks paths and reports PII per file.
  • The pleno-anonymize CLI installed as a [project.scripts] entry — run with uvx pleno-anonymize, pipx run pleno-anonymize, or after pip install pleno-anonymize.
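
The local/remote split in the list above can be pictured as a small factory. The sketch below is illustrative only: `LocalEngineSketch`, `RemoteEngineSketch`, and `make_engine` are hypothetical stand-ins rather than the package's own classes, and the real engines do far more than record their configuration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LocalEngineSketch:
    """Stand-in for an in-process engine (hypothetical, not the real class)."""
    auto_download: bool = True


@dataclass
class RemoteEngineSketch:
    """Stand-in for an HTTP-backed engine (hypothetical, not the real class)."""
    base_url: str
    api_key: Optional[str] = None


def make_engine(
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Union[LocalEngineSketch, RemoteEngineSketch]:
    """Return a remote engine when a base URL is given, otherwise a local one."""
    if base_url:
        # Normalise the trailing slash so later path joins stay predictable.
        return RemoteEngineSketch(base_url=base_url.rstrip("/"), api_key=api_key)
    return LocalEngineSketch()
```

Keeping the choice behind one callable means calling code does not need to know which backend it received, which matches the "same surface" promise of remote mode.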

Install

# one-shot via uvx (no install)
uvx pleno-anonymize scan .

# or as a dependency
uv add pleno-anonymize
pip install pleno-anonymize

Requires Python 3.12+.

The first time you scan text in a given language, the matching NER wheel (pleno_anonymize_ja or pleno_anonymize_en, hosted on Hugging Face) is fetched and pip-installed into the active environment. To install the models ahead of time:

uvx pleno-anonymize models install ja
uvx pleno-anonymize models install en

Alternatively, disable auto-install with --no-auto-download. The engine then falls back to a blank tokenizer plus the pattern recognizers: regex/checksum classes are still detected, but free-text NER classes are not.
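
Before running with --no-auto-download, it can help to check whether the model packages are already present. The snippet below is a minimal sketch using only the standard library. It assumes the wheels install importable top-level modules named pleno_anonymize_ja and pleno_anonymize_en, which the text above suggests but does not state outright. `model_installed` is a hypothetical helper. The CLI's models status subcommand remains the supported route.

```python
import importlib.util


def model_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the NER wheels expose modules named after the models.
for name in ("pleno_anonymize_ja", "pleno_anonymize_en"):
    status = "installed" if model_installed(name) else "missing"
    print(f"{name}: {status}")
```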

CLI

pleno-anonymize scan <path...>     # walk paths, detect PII per file
pleno-anonymize analyze [text]     # detect entities in text / stdin / --file
pleno-anonymize redact  [text]     # replace detected PII with <PLACEHOLDERS>
pleno-anonymize models {install,status}
pleno-anonymize health             # ping --base-url (remote mode only)

Common flags:

| Flag | Description |
| --- | --- |
| `--base-url <url>` | Use a hosted endpoint instead of running locally (env: PLENO_ANONYMIZE_BASE_URL) |
| `--api-key <key>` | Bearer token for --base-url (env: PLENO_ANONYMIZE_API_KEY) |
| `--language ja\|en` | Detection language (default ja) |
| `--entities A,B,C` | Restrict to specific entity types |
| `--no-auto-download` | Do not pip-install missing NER wheels (local mode only) |
| `--json` | Emit JSON |
| `--fail-on-findings` | Exit 2 from scan when PII is found (CI gate) |
| `--workers <n>` | Parallel scan workers (default 4) |
| `--max-bytes <n>` | Per-file byte cap for scan (default 262144) |
| `--ignore a,b` | Extra directory names to skip |
| `--ext .md,.py` | Restrict scan to extensions |
| `-f, --file <path>` | Read input text from file |
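
To make the scan-related flags concrete, here is a rough, self-contained sketch of a directory walker with an ignore list, an extension filter, a per-file byte cap, and a worker pool. It is not the package's scanner. The email regex is a stand-in detector, all function names are hypothetical, and reading only the first max_bytes of each file is an assumed interpretation of the byte cap.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, Optional, Sequence

# Stand-in detector: a simple email regex instead of the real recognizers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def iter_files(
    roots: Iterable[str],
    ignore: Sequence[str] = (),
    exts: Optional[Sequence[str]] = None,
) -> Iterator[str]:
    """Yield file paths under roots, skipping ignored directory names."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune in place so os.walk never descends into ignored directories.
            dirnames[:] = [d for d in dirnames if d not in ignore]
            for name in filenames:
                if exts is None or os.path.splitext(name)[1] in exts:
                    yield os.path.join(dirpath, name)


def count_emails(path: str, max_bytes: int = 262144) -> int:
    """Read at most max_bytes from the file and count email-like matches."""
    with open(path, "rb") as handle:
        text = handle.read(max_bytes).decode("utf-8", errors="replace")
    return len(EMAIL.findall(text))


def scan(
    roots: Iterable[str],
    ignore: Sequence[str] = (),
    exts: Optional[Sequence[str]] = None,
    workers: int = 4,
    max_bytes: int = 262144,
) -> Dict[str, int]:
    """Scan files in parallel; return {path: count} for files with matches."""
    paths = list(iter_files(roots, ignore, exts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(lambda p: count_emails(p, max_bytes), paths))
    return {p: c for p, c in zip(paths, counts) if c}
```

Threads suit this sketch because the work is mostly file I/O. Whether the real scanner uses threads or processes is not stated here.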

Examples

# scan the current repo locally, fail CI on any finding
uvx pleno-anonymize scan . --fail-on-findings

# analyze a Japanese string with the local model
echo "山田太郎 090-1234-5678 yamada@example.com" \
  | uvx pleno-anonymize analyze --language ja

# same call, but offload to the hosted server
echo "山田太郎 090-1234-5678" \
  | uvx pleno-anonymize analyze \
      --base-url https://pleno-anonymize.fly.dev

# redact and pipe to file
uvx pleno-anonymize redact -f notes.md > notes.redacted.md

# JSON output for tooling
uvx pleno-anonymize scan src --json | jq '.byEntity'
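
For tooling written in Python rather than jq, the JSON report can be consumed along these lines. This is a sketch under assumptions: only the byEntity key appears in the example above, and the code assumes it maps entity type names to integer counts. The sample payload is made up to exercise the function, and `top_entities` is a hypothetical helper.

```python
import json
from typing import List, Tuple


def top_entities(report_json: str, limit: int = 5) -> List[Tuple[str, int]]:
    """Return the most frequent entity types from a scan report."""
    report = json.loads(report_json)
    # Assumption: "byEntity" maps entity type names to integer counts.
    by_entity = report.get("byEntity", {})
    # Sort by descending count, then by name for a stable order.
    ranked = sorted(by_entity.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


# Made-up sample payload, only to exercise the function.
sample = '{"byEntity": {"EMAIL_ADDRESS": 3, "PERSON": 7, "PHONE_NUMBER": 3}}'
print(top_entities(sample, limit=2))
```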

SDK

from pleno_anonymize import PlenoAnonymize, scan_paths

# default: local engine, auto-downloads pleno_anonymize_ja on first call
engine = PlenoAnonymize()
findings = engine.analyze("山田太郎 090-1234-5678", language="ja")
# [Finding(entity_type='PERSON', start=0, end=4, score=0.85, text='山田太郎'), ...]

result = engine.redact("Contact john@example.com", language="en")
# RedactResult(text='Contact <EMAIL_ADDRESS>')

def report(result):
    # Print only files that contain at least one finding.
    if result.findings:
        print(result.path, len(result.findings))

summary = scan_paths(
    engine,
    ["src", "docs"],
    language="ja",
    ignore=["fixtures"],
    on_file=report,
)
print(summary.by_entity, summary.total_findings)

# remote mode — same surface, no local model footprint
remote = PlenoAnonymize(base_url="https://pleno-anonymize.fly.dev")
remote.analyze("...")
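
Conceptually, redaction turns a list of findings with character offsets into placeholder text. The sketch below shows one common way to do that and is not taken from the package: `Span` is a hypothetical stand-in for a finding, and the function assumes the spans do not overlap.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Span:
    """Minimal stand-in for a finding: an entity type plus character offsets."""
    entity_type: str
    start: int
    end: int


def redact_spans(text: str, spans: Iterable[Span]) -> str:
    """Replace each span with <ENTITY_TYPE>, working right to left."""
    out = text
    # Right-to-left replacement keeps earlier offsets valid as lengths change.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        out = out[: span.start] + f"<{span.entity_type}>" + out[span.end :]
    return out


text = "Contact john@example.com"
print(redact_spans(text, [Span("EMAIL_ADDRESS", 8, 24)]))
```

Offsets here are Python string indices (code points), so a four-character Japanese name such as 山田太郎 spans 0 to 4.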

API surface

| Export | Purpose |
| --- | --- |
| `PlenoAnonymize(base_url=None, ...)` | Factory: returns LocalEngine (default) or RemoteEngine |
| `LocalEngine` | In-process Presidio + spaCy + recognizer registry |
| `RemoteEngine` | HTTP client (stdlib urllib) for a hosted server |
| `PlenoAnonymizeError` | Raised by RemoteEngine on HTTP / transport failures |
| `scan_file(engine, path, ...)` | Analyze a single file |
| `scan_paths(engine, paths, ...)` | Walk paths with worker pool, return ScanSummary |
| `Finding`, `RedactResult`, `FileScanResult`, `ScanSummary` | Dataclasses |
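
As a rough picture of what a stdlib urllib client with a bearer token involves, the sketch below builds (but does not send) a JSON POST request. The "/analyze" path, the payload keys, and the `build_analyze_request` name are assumptions made for illustration. The actual wire protocol is defined by the server and is not documented here.

```python
import json
import urllib.request
from typing import Optional


def build_analyze_request(
    base_url: str,
    text: str,
    language: str = "ja",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a hosted server."""
    # Assumption: the "/analyze" path and the payload keys are hypothetical.
    url = base_url.rstrip("/") + "/analyze"
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    if api_key:
        # Bearer token, as described for the --api-key flag.
        request.add_header("Authorization", f"Bearer {api_key}")
    return request
```

Sending such a request with urllib.request.urlopen can raise urllib.error.HTTPError or urllib.error.URLError. A client would typically wrap those in its own exception type, which is the role the table gives to PlenoAnonymizeError.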

Environment variables

| Var | Purpose |
| --- | --- |
| `PLENO_ANONYMIZE_BASE_URL` | Default --base-url |
| `PLENO_ANONYMIZE_API_KEY` | Default --api-key |
| `NO_COLOR` | Disable ANSI colors in CLI output |
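
Since the variables act as defaults for the corresponding flags, an explicit flag should win over the environment. A minimal sketch of that precedence follows. `resolve_option` and `color_enabled` are hypothetical helpers, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_option(
    flag_value: Optional[str],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer an explicit flag value; otherwise fall back to the environment."""
    if flag_value:
        return flag_value
    # Treat an unset or empty variable as "not configured".
    return env.get(env_name) or None


def color_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Follow the NO_COLOR convention: any non-empty value disables color."""
    return not env.get("NO_COLOR")
```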

Detected entities

Two groups of entity types are detected:

  • Free-text NER: PERSON, ADDRESS, ORGANIZATION, DATE_OF_BIRTH, BANK_ACCOUNT.
  • Structured (regex + checksum): PHONE_NUMBER, MY_NUMBER, MY_NUMBER_CORPORATE, CREDIT_CARD, PASSPORT, DRIVER_LICENSE, HEALTH_INSURANCE, RESIDENCE_CARD, POSTAL_CODE, EMAIL_ADDRESS, IP_ADDRESS, URL.

See the server README for the full list.
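
To illustrate what a checksum adds on top of a regex, here is the Luhn check, the standard checksum for payment card numbers. Which checksum each recognizer in this package applies is not stated above, so treat this as a generic example of the technique rather than the package's code. `luhn_valid` is a hypothetical helper.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    # Ignore separators such as spaces and hyphens.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a well-known test card number
```

A regex alone would accept any 16-digit string. The checksum rejects most random digit runs and so reduces false positives.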

Exit codes (CLI)

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Usage / runtime error |
| 2 | scan --fail-on-findings and findings were detected |
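
A CI wrapper can branch on these codes to tell "PII found" apart from "the tool failed". The sketch below uses only the standard library. `run_gate` and the outcome labels are hypothetical, and the command to run is supplied by the caller.

```python
import subprocess
import sys
from typing import Sequence

# Meanings taken from the documented exit codes.
EXIT_MEANINGS = {0: "success", 1: "error", 2: "findings"}


def run_gate(command: Sequence[str]) -> str:
    """Run a command and classify its exit status for a CI step."""
    completed = subprocess.run(command, check=False)
    return EXIT_MEANINGS.get(completed.returncode, "unknown")


# Demonstration with a child Python process that exits with status 2.
print(run_gate([sys.executable, "-c", "raise SystemExit(2)"]))
```

In a pipeline the command would be something like ["pleno-anonymize", "scan", ".", "--fail-on-findings"], with "findings" blocking the merge and "error" flagged for investigation.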

License

AGPL-3.0
