Local-first Japanese PII anonymization engine

Reason this release was yanked:

Critical bugs identified

Project description

Besshouka (別称化)

A local-first Japanese PII anonymization engine. Besshouka detects personally identifiable information (PII), payment card industry (PCI) data, and protected health information (PHI) in Japanese text and transforms it using configurable rules, all without sending data to any external service.

Note: Besshouka is in early development (alpha). It is not yet recommended for production use. Contributions to improve accuracy, coverage, and robustness are welcome — see CONTRIBUTING.md.

Why Besshouka?

  • Japanese-native — built specifically for Japanese data patterns: マイナンバー, Japanese phone formats, postal codes, full-width character handling, and GiNZA-powered NER for names, organizations, and locations.
  • Local-first — everything runs on your machine. No cloud APIs, no data leaves the device.
  • Pluggable — add custom regex recognizers via YAML, write your own operators in Python, or plug in any importable function as a custom operator. No forking required.
  • Auditable — every anonymization operation is logged in an audit trail with the original text, the operator used, and the new indices.
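
The full-width character handling mentioned above can be illustrated with a minimal sketch. It assumes Unicode NFKC normalization is applied before regex matching; the source does not say how Besshouka does this, and `MOBILE_PATTERN`, `normalize_width`, and `find_mobile_numbers` are hypothetical names used only for illustration.

```python
import re
import unicodedata

# Hypothetical pattern for illustration; not Besshouka's actual recognizer.
MOBILE_PATTERN = re.compile(r"0[789]0-\d{4}-\d{4}")


def normalize_width(text: str) -> str:
    """Apply NFKC, which folds full-width ASCII variants to half-width
    (among other compatibility mappings)."""
    return unicodedata.normalize("NFKC", text)


def find_mobile_numbers(text: str) -> list[str]:
    """Return mobile-style numbers found after width normalization."""
    return MOBILE_PATTERN.findall(normalize_width(text))


# \uff0d is the full-width hyphen-minus; the digits are full-width as well.
full_width = "電話は０９０\uff0d１２３４\uff0d５６７８です"
print(find_mobile_numbers(full_width))
```

NFKC can change string length for some inputs (for example, half-width katakana followed by a voicing mark), so an engine that reports offsets into the original text would need to map indices back after normalizing.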

Quick Start

pip install besshouka

Anonymize text

besshouka anonymize "田中太郎の電話番号は090-1234-5678です"
# Output: <氏名>の電話番号は090-1234-****です

Analyze (detect only)

besshouka analyze --explain "田中太郎の電話番号は090-1234-5678です"

Use custom rules

besshouka anonymize \
  --recognizers my_patterns.yaml \
  --rules my_operators.yaml \
  --input document.txt \
  --output anonymized.txt

Programmatic Usage

from besshouka.config.loader import load_recognizer_config, load_operator_config
from besshouka.orchestrator.pipeline import run

rec_config = load_recognizer_config("path/to/recognizers.yaml")
op_config = load_operator_config("path/to/operators.yaml")

ctx = run("田中太郎の電話番号は090-1234-5678です", rec_config, op_config)

print(ctx.engine_result.text)   # anonymized text
print(ctx.engine_result.items)  # audit trail

Architecture

Text In → [Analyzer] → [Anonymizer] → Text Out
Module        Role
Analyzer      Detects PII using regex patterns + GiNZA NER
Anonymizer    Transforms PII using pluggable operators
Orchestrator  Wires analyzer and anonymizer into a pipeline

Each module has its own README with extension guides. See the besshouka/ directory.
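
The analyzer-to-anonymizer flow can be sketched in a few lines of standalone Python. This is an illustration of the architecture only, not Besshouka's code: `Detection`, `AuditItem`, `analyze`, `anonymize`, and `run_pipeline` are hypothetical names, the single regex stands in for the full recognizer set, and detections are assumed not to overlap.

```python
import re
from dataclasses import dataclass


@dataclass
class Detection:
    entity_type: str
    start: int
    end: int


@dataclass
class AuditItem:
    entity_type: str
    original: str   # text that was replaced
    operator: str   # operator that produced the replacement
    start: int      # start index in the anonymized text
    end: int        # end index in the anonymized text


def analyze(text: str) -> list[Detection]:
    """Detect spans with one illustrative regex (no NER in this sketch)."""
    pattern = re.compile(r"0[789]0-\d{4}-\d{4}")
    return [Detection("PHONE_NUMBER", m.start(), m.end())
            for m in pattern.finditer(text)]


def anonymize(text: str, detections: list[Detection]) -> tuple[str, list[AuditItem]]:
    """Replace each span with a placeholder, tracking indices in the output."""
    parts: list[str] = []
    items: list[AuditItem] = []
    cursor = 0    # position in the original text
    out_len = 0   # length of the output built so far
    for det in sorted(detections, key=lambda d: d.start):
        unchanged = text[cursor:det.start]
        replacement = f"<{det.entity_type}>"
        parts += [unchanged, replacement]
        out_len += len(unchanged)
        items.append(AuditItem(det.entity_type, text[det.start:det.end],
                               "replace", out_len, out_len + len(replacement)))
        out_len += len(replacement)
        cursor = det.end
    parts.append(text[cursor:])
    return "".join(parts), items


def run_pipeline(text: str) -> tuple[str, list[AuditItem]]:
    """Orchestrator role: wire the analyzer output into the anonymizer."""
    return anonymize(text, analyze(text))


result, audit = run_pipeline("電話番号は090-1234-5678です")
print(result)
print(audit)
```

Recording the new indices while the output is being built keeps each audit entry pointing at the right span even when a replacement is longer or shorter than the original.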

Built-in Recognizers

Pattern           Entity Type
Mobile phone      PHONE_NUMBER
Landline phone    PHONE_NUMBER
Toll-free phone   PHONE_NUMBER
Email address     EMAIL
マイナンバー      MY_NUMBER
Postal code       POSTAL_CODE
Credit card       CREDIT_CARD
Bank account      BANK_ACCOUNT
Driver's license  DRIVERS_LICENSE
Passport          PASSPORT
Person names      PERSON (GiNZA)
Organizations     ORGANIZATION (GiNZA)
Locations         LOCATION (GiNZA)
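
To make the mapping from pattern to entity type concrete, here is a rough sketch of regex-based detection for a few of the entity types listed above. The patterns are simplified assumptions written for this example and are not Besshouka's built-in expressions; `PATTERNS` and `detect` are hypothetical names.

```python
import re

# Simplified, assumed patterns; real-world formats have more variants.
PATTERNS = {
    "PHONE_NUMBER": re.compile(r"0[789]0-\d{4}-\d{4}"),
    # Lookarounds stop the postal pattern from matching inside a phone number.
    "POSTAL_CODE": re.compile(r"(?<![\d-])\d{3}-\d{4}(?![\d-])"),
    # Explicit ASCII classes: \w would also match adjacent Japanese characters.
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"),
}


def detect(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_text) pairs ordered by position."""
    hits = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), entity_type, match.group()))
    hits.sort()
    return [(entity_type, value) for _, entity_type, value in hits]


sample = "〒100-0001 連絡先: taro@example.jp / 090-1234-5678"
print(detect(sample))
```

The lookarounds on the postal-code pattern show why recognizers for digit-heavy formats need boundary checks: without them, `090-1234` inside the phone number would also be reported as a postal code.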

Built-in Operators

Operator  What it does
replace   Substitute with a fixed value
mask      Mask characters from end with a symbol
redact    Remove entirely
hash      Salted SHA-256 hex digest
encrypt   Fernet symmetric encryption
keep      Pass through unchanged
custom    Call any importable Python function
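
As an illustration of two of these operators, the sketch below shows one plausible reading of `mask` and `hash` using only the standard library. The function names, the `chars_to_mask` and `symbol` parameters, and the choice to prepend the salt are assumptions made for this example; the source does not specify Besshouka's parameter names or how the salt is combined.

```python
import hashlib


def mask(text: str, chars_to_mask: int, symbol: str = "*") -> str:
    """Replace the last `chars_to_mask` characters with `symbol`."""
    if chars_to_mask <= 0:
        return text
    keep = max(len(text) - chars_to_mask, 0)
    return text[:keep] + symbol * (len(text) - keep)


def salted_hash(text: str, salt: str) -> str:
    """Return the SHA-256 hex digest of salt + text (assumed construction)."""
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()


print(mask("090-1234-5678", 4))
print(salted_hash("田中太郎", salt="example-salt"))
```

Masking keeps part of the value readable, while hashing produces a stable pseudonym: the same input and salt always yield the same digest, so records can still be joined without exposing the original value.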

Extending Besshouka

Add a regex recognizer (no code)

Add an entry to your recognizers YAML:

recognizers:
  - name: employee_id
    entity_type: EMPLOYEE_ID
    pattern: 'EMP-[A-Z]{2}\d{6}'
    score: 1.0
    source: custom
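
To check what a pattern like this matches before adding it, a small standalone sketch can help. It mirrors the YAML entry as a plain dict rather than assuming a particular YAML loader, and `apply_recognizer` is a hypothetical helper, not part of Besshouka.

```python
import re

# The YAML entry above, written as a dict for this standalone example.
recognizer = {
    "name": "employee_id",
    "entity_type": "EMPLOYEE_ID",
    "pattern": r"EMP-[A-Z]{2}\d{6}",
    "score": 1.0,
    "source": "custom",
}


def apply_recognizer(config: dict, text: str) -> list[dict]:
    """Run one regex recognizer and return its matches with offsets."""
    compiled = re.compile(config["pattern"])
    return [
        {
            "entity_type": config["entity_type"],
            "text": match.group(),
            "start": match.start(),
            "end": match.end(),
            "score": config["score"],
        }
        for match in compiled.finditer(text)
    ]


hits = apply_recognizer(recognizer, "担当者: EMP-AB123456, emp-ab123456")
print(hits)
```

Only the upper-case ID is reported here, since the pattern is case-sensitive as written.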

Add a custom operator (no subclassing)

Write a function anywhere importable:

def my_transform(text: str, params: dict) -> str:
    return text[::-1]  # reverse it, or whatever you need

Reference it in your operators YAML:

operators:
  EMPLOYEE_ID:
    method: custom
    function: "my_module.my_transform"
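
A dotted path such as `my_module.my_transform` is typically resolved with `importlib`. The sketch below shows that general technique under the assumption that the path has the form `module.function`; `resolve_function` is a hypothetical helper, and Besshouka's own loader may work differently.

```python
import importlib
import sys
import types
from typing import Callable


def resolve_function(dotted_path: str) -> Callable[[str, dict], str]:
    """Import 'package.module.function' and return the function object."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} is not callable")
    return func


def my_transform(text: str, params: dict) -> str:
    return text[::-1]


# Register a throwaway module so the example runs without extra files.
demo_module = types.ModuleType("my_module")
demo_module.my_transform = my_transform
sys.modules["my_module"] = demo_module

transform = resolve_function("my_module.my_transform")
print(transform("EMP-AB123456", {}))
```

Resolving the function by name at load time is what lets a YAML file reference arbitrary Python code, so the module must be importable from the environment in which the tool runs.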

Development

git clone https://github.com/go-akhi/besshouka.git
cd besshouka
pip install -e ".[dev]"

Running Tests

# All tests (excluding slow GiNZA model tests)
pytest tests/ -m "not slow"

# All tests including GiNZA
pytest tests/

# With coverage
pytest tests/ --cov=besshouka --cov-report=term-missing

Requirements

  • Python >=3.11, <3.14. Python 3.14 is not yet supported because of a PyO3 compatibility issue in SudachiPy (GiNZA's tokenizer). Python 3.13 is recommended.
  • GiNZA / spaCy (for NER)
  • See requirements.txt for full list
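
The supported interpreter range can be checked before installing. This is a small sketch with hypothetical names (`MIN_VERSION`, `UNSUPPORTED_FROM`, `is_supported`) that encodes the bounds stated above.

```python
import sys

MIN_VERSION = (3, 11)       # lowest supported version (inclusive)
UNSUPPORTED_FROM = (3, 14)  # first unsupported version (exclusive bound)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) falls within >=3.11, <3.14."""
    return MIN_VERSION <= version < UNSUPPORTED_FROM


current = sys.version_info[:2]
print(f"Python {current[0]}.{current[1]} supported: {is_supported(current)}")
```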

License

MIT

Download files

Source Distribution

besshouka-0.1.1a1.tar.gz (34.2 kB view details)

Uploaded Source

Built Distribution

besshouka-0.1.1a1-py3-none-any.whl (34.0 kB view details)

Uploaded Python 3

File details

Details for the file besshouka-0.1.1a1.tar.gz.

File metadata

  • Download URL: besshouka-0.1.1a1.tar.gz
  • Upload date:
  • Size: 34.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for besshouka-0.1.1a1.tar.gz
Algorithm    Hash digest
SHA256       d2aeef525e67afc790972b13fa4204e600542453cc5a40b908bf821b3a50945f
MD5          516317fcd1de851a5f23a409a9d253f2
BLAKE2b-256  551abcf5b44b0a7de2ac2b13c6f7a19844eb089d9fade1ced31ea5e78af46966

Provenance

The following attestation bundles were made for besshouka-0.1.1a1.tar.gz:

Publisher: release.yml on go-akhi/besshouka

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file besshouka-0.1.1a1-py3-none-any.whl.

File metadata

  • Download URL: besshouka-0.1.1a1-py3-none-any.whl
  • Upload date:
  • Size: 34.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for besshouka-0.1.1a1-py3-none-any.whl
Algorithm    Hash digest
SHA256       c41e219f2602a7aa41a2146484fa7acff0f4398a9e44dac4ce477b9c14458fee
MD5          264282176788e62de6eb78c4d74473d8
BLAKE2b-256  8f37a2daa99c44344717239b8c4464fff8d141d5f415a38dcec73bbe36ace449

Provenance

The following attestation bundles were made for besshouka-0.1.1a1-py3-none-any.whl:

Publisher: release.yml on go-akhi/besshouka

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
