
Local-first Japanese PII detection and redaction. SDK + `pleno-anonymize` CLI sharing the same recognizer registry and NER pipeline as the pleno-anonymize server.


pleno-anonymize

Local-first Japanese PII detection and redaction — SDK + CLI.

The package ships:

  • A PlenoAnonymize factory that defaults to running Presidio + the spaCy pleno_anonymize_ja / pleno_anonymize_en models in-process (no network access at scan time once the models are installed).
  • An optional remote mode (--base-url / base_url=) that talks to a hosted pleno-anonymize server — same wire protocol as https://pleno-anonymize.fly.dev.
  • A filesystem scanner (scan_paths) that walks paths and reports PII per file.
  • The pleno-anonymize CLI, installed as a [project.scripts] entry point: run it with uvx pleno-anonymize or pipx run pleno-anonymize, or as plain pleno-anonymize after pip install pleno-anonymize.

Install

# one-shot via uvx (no install)
uvx pleno-anonymize scan .

# or as a dependency
uv add pleno-anonymize
pip install pleno-anonymize

Requires Python 3.12+.

The first time you scan a language, the matching NER wheel (pleno_anonymize_ja / pleno_anonymize_en, hosted on Hugging Face) is fetched and pip-installed into the active environment. Pre-install with:

uvx pleno-anonymize models install ja
uvx pleno-anonymize models install en

Alternatively, disable auto-install with --no-auto-download. The engine then falls back to a blank tokenizer plus the pattern recognizers: regex/checksum classes are still detected, but free-text NER classes are not.
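To see ahead of time what the fallback means for a given environment, you can check whether the model package is importable. This is a minimal sketch, assuming the NER wheel imports under the name pleno_anonymize_<language>; model_installed and detectable are hypothetical helpers, not part of the package.

```python
import importlib.util

# Free-text NER classes from the "Detected entities" list; the
# regex/checksum classes do not need the model wheel.
NER_CLASSES = {"PERSON", "ADDRESS", "ORGANIZATION", "DATE_OF_BIRTH", "BANK_ACCOUNT"}


def model_installed(package: str) -> bool:
    """True if a top-level package with this import name is available."""
    return importlib.util.find_spec(package) is not None


def detectable(requested: set[str], language: str) -> set[str]:
    """Entity types still detectable; assumes the wheel imports as pleno_anonymize_<language>."""
    if model_installed(f"pleno_anonymize_{language}"):
        return set(requested)
    # Without the model, only pattern-based classes remain.
    return set(requested) - NER_CLASSES
```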

CLI

pleno-anonymize scan <path...>     # walk paths, detect PII per file
pleno-anonymize analyze [text]     # detect entities in text / stdin / --file
pleno-anonymize redact  [text]     # replace detected PII with <PLACEHOLDERS>
pleno-anonymize models {install,status}
pleno-anonymize health             # ping --base-url (remote mode only)

Common flags:

Flag                   Description
--base-url <url>       Use a hosted endpoint instead of running locally (env: PLENO_ANONYMIZE_BASE_URL)
--api-key <key>        Bearer token for --base-url (env: PLENO_ANONYMIZE_API_KEY)
--language ja|en       Detection language (default: ja)
--entities A,B,C       Restrict detection to the listed entity types
--no-auto-download     Do not pip-install missing NER wheels (local mode only)
--json                 Emit JSON output
--fail-on-findings     Make scan exit with code 2 when PII is found (CI gate)
--workers <n>          Number of parallel scan workers (default: 4)
--max-bytes <n>        Per-file byte cap for scan (default: 262144, i.e. 256 KiB)
--ignore a,b           Extra directory names to skip during scan
--ext .md,.py          Restrict scan to these file extensions
-f, --file <path>      Read input text from a file

Examples

# scan the current repo locally, fail CI on any finding
uvx pleno-anonymize scan . --fail-on-findings

# analyze a Japanese string with the local model
echo "山田太郎 090-1234-5678 yamada@example.com" \
  | uvx pleno-anonymize analyze --language ja

# same call, but offload to the hosted server
echo "山田太郎 090-1234-5678" \
  | uvx pleno-anonymize analyze \
      --base-url https://pleno-anonymize.fly.dev

# redact and pipe to file
uvx pleno-anonymize redact -f notes.md > notes.redacted.md

# JSON output for tooling
uvx pleno-anonymize scan src --json | jq '.byEntity'
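If jq is not available, the same report can be post-processed in Python. This sketch assumes, beyond what is documented above, that byEntity maps each entity type to a finding count; top_entities and top_entities_from_json are hypothetical helpers.

```python
import json


def top_entities(report: dict, n: int = 3) -> list[tuple[str, int]]:
    """Most frequent entity types in a scan report, ties broken alphabetically.

    Assumes report["byEntity"] maps entity type -> finding count.
    """
    by_entity = report.get("byEntity", {})
    return sorted(by_entity.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def top_entities_from_json(raw: str, n: int = 3) -> list[tuple[str, int]]:
    """Parse the text printed by `scan ... --json` and summarize it."""
    return top_entities(json.loads(raw), n)
```

Feed it the captured stdout of pleno-anonymize scan src --json (for example via subprocess.run(..., capture_output=True, text=True).stdout).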

SDK

from pleno_anonymize import PlenoAnonymize, scan_paths

# default: local engine, auto-downloads pleno_anonymize_ja on first call
engine = PlenoAnonymize()
findings = engine.analyze("山田太郎 090-1234-5678", language="ja")
# [Finding(entity_type='PERSON', start=0, end=4, score=0.85, text='山田太郎'), ...]

result = engine.redact("Contact john@example.com", language="en")
# RedactResult(text='Contact <EMAIL_ADDRESS>')

def report(f):
    # print only files that contain at least one finding
    if f.findings:
        print(f.path, len(f.findings))

summary = scan_paths(
    engine,
    ["src", "docs"],
    language="ja",
    ignore=["fixtures"],
    on_file=report,
)
print(summary.by_entity, summary.total_findings)

# remote mode — same surface, no local model footprint
remote = PlenoAnonymize(base_url="https://pleno-anonymize.fly.dev")
remote.analyze("...")
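Redaction follows directly from the start/end offsets on each Finding: every detected span is replaced by <ENTITY_TYPE>. The sketch below uses a local Finding stand-in whose fields mirror the repr shown above; redact_spans and its overlap policy (the earliest span wins, and ties on start go to the higher score) are assumptions, not the package's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Local stand-in; field names mirror the Finding repr in the SDK example.
    entity_type: str
    start: int
    end: int
    score: float
    text: str


def redact_spans(text: str, findings: list[Finding]) -> str:
    """Replace each finding's [start, end) span with <ENTITY_TYPE>."""
    out: list[str] = []
    cursor = 0
    # Left to right; among findings with the same start, the higher score comes first.
    for f in sorted(findings, key=lambda f: (f.start, -f.score)):
        if f.start < cursor:
            continue  # overlaps a span that was already replaced
        out.append(text[cursor:f.start])
        out.append(f"<{f.entity_type}>")
        cursor = f.end
    out.append(text[cursor:])
    return "".join(out)
```

Offsets are Python string (code point) indices, which is why 山田太郎 spans start=0, end=4 in the example output.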

API surface

Export                                              Purpose
PlenoAnonymize(base_url=None, ...)                  Factory: returns LocalEngine (default) or RemoteEngine
LocalEngine                                         In-process Presidio + spaCy + recognizer registry
RemoteEngine                                        HTTP client (stdlib urllib) for a hosted server
PlenoAnonymizeError                                 Raised by RemoteEngine on HTTP / transport failures
scan_file(engine, path, ...)                        Analyze a single file
scan_paths(engine, paths, ...)                      Walk paths with a worker pool and return a ScanSummary
Finding, RedactResult, FileScanResult, ScanSummary  Result dataclasses

Environment variables

Variable                    Purpose
PLENO_ANONYMIZE_BASE_URL    Default for --base-url
PLENO_ANONYMIZE_API_KEY     Default for --api-key
NO_COLOR                    Disable ANSI colors in CLI output

Detected entities

Free-text NER (PERSON, ADDRESS, ORGANIZATION, DATE_OF_BIRTH, BANK_ACCOUNT) and structured / regex+checksum classes (PHONE_NUMBER, MY_NUMBER, MY_NUMBER_CORPORATE, CREDIT_CARD, PASSPORT, DRIVER_LICENSE, HEALTH_INSURANCE, RESIDENCE_CARD, POSTAL_CODE, EMAIL_ADDRESS, IP_ADDRESS, URL).

See the server README for the full list.
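MY_NUMBER is one of the checksum-backed classes. For context, the 12th digit of an individual My Number is a check digit over the first 11 digits: with P_n the n-th digit counted from the right and weights Q_n = n + 1 for n ≤ 6 and n − 5 for 7 ≤ n ≤ 11, the check digit is 0 when the weighted sum mod 11 is at most 1, otherwise 11 minus that remainder. The sketch illustrates that published rule; it is not code taken from this package, and the function names are hypothetical.

```python
def my_number_check_digit(first11: str) -> int:
    """Check digit for the first 11 digits of an individual My Number."""
    total = 0
    # n counts from 1 at the rightmost of the 11 digits.
    for n, ch in enumerate(reversed(first11), start=1):
        weight = n + 1 if n <= 6 else n - 5
        total += int(ch) * weight
    remainder = total % 11
    return 0 if remainder <= 1 else 11 - remainder


def is_valid_my_number(candidate: str) -> bool:
    """Accept 12 ASCII digits (spaces/hyphens ignored) whose last digit matches the checksum."""
    digits = candidate.replace("-", "").replace(" ", "")
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        return False
    return my_number_check_digit(digits[:11]) == int(digits[11])
```

For example, is_valid_my_number("1234 5678 9018") returns True, while changing the last digit to 7 makes it fail. A checksum like this lets a recognizer reject most random 12-digit numbers that merely match the regex.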

Exit codes (CLI)

Code   Meaning
0      Success
1      Usage or runtime error
2      scan ran with --fail-on-findings and found PII
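In a CI script, these exit codes let you distinguish "PII found" from "the tool itself failed". A small sketch of such a wrapper; run_gate and classify_exit are hypothetical names. Note that 0 means "no findings" only when --fail-on-findings is passed.

```python
import subprocess


def classify_exit(returncode: int) -> str:
    """Map a `scan --fail-on-findings` exit code to an outcome."""
    return {0: "clean", 2: "findings"}.get(returncode, "error")


def run_gate(cmd: list[str]) -> str:
    # check=False: a non-zero exit is an expected outcome here, not an exception.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return classify_exit(result.returncode)
```

Usage: run_gate(["uvx", "pleno-anonymize", "scan", ".", "--fail-on-findings"]) returns "clean", "findings" or "error".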

License

AGPL-3.0



Download files


Source Distribution

pleno_anonymize-0.2.3.tar.gz (28.1 kB)

Uploaded Source

Built Distribution


pleno_anonymize-0.2.3-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file pleno_anonymize-0.2.3.tar.gz.

File metadata

  • Download URL: pleno_anonymize-0.2.3.tar.gz
  • Size: 28.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pleno_anonymize-0.2.3.tar.gz
Algorithm     Hash digest
SHA256        96a2e95b04ff2093af2bec54dd40300efd5b6764e0afc7548e0b15fb03680d74
MD5           0e4f43b6b48d306883caa8657e131f84
BLAKE2b-256   096ab5dcaa763feaaf81be015854170f158b0131287d880ad4a3a5a039e00c71


Provenance

The following attestation bundles were made for pleno_anonymize-0.2.3.tar.gz:

Publisher: release-pypi.yml on plenoai/pleno-anonymize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pleno_anonymize-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: pleno_anonymize-0.2.3-py3-none-any.whl
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pleno_anonymize-0.2.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        6787f7454f2066078d1640be177cd3e36542f3bb2fa4143a1da18f6b302d4fa5
MD5           3032ba4909312d7cfd23ddbc6c675aad
BLAKE2b-256   19f5676040cf505ae49b5d2cb1fd1bed9588a8ffca5d2f5f3b3db56f8e63374a


Provenance

The following attestation bundles were made for pleno_anonymize-0.2.3-py3-none-any.whl:

Publisher: release-pypi.yml on plenoai/pleno-anonymize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
