Local-first Japanese PII anonymization engine

Reason this release was yanked:

Critical bugs identified

Project description

Besshouka (別称化)

A local-first Japanese PII anonymization engine. Besshouka detects personally identifiable information (PII), payment card data (PCI), and protected health information (PHI) in Japanese text and transforms it using configurable rules — all without sending data to any external service.

Note: Besshouka is in early development (alpha). It is not yet recommended for production use. Contributions to improve accuracy, coverage, and robustness are welcome — see CONTRIBUTING.md.

Why Besshouka?

  • Japanese-native — built specifically for Japanese data patterns: マイナンバー, Japanese phone formats, postal codes, full-width character handling, and GiNZA-powered NER for names, organizations, and locations.
  • Local-first — everything runs on your machine. No cloud APIs, no data leaves the device.
  • Pluggable — add custom regex recognizers via YAML, write your own operators in Python, or plug in any importable function as a custom operator. No forking required.
  • Auditable — every anonymization operation is logged in an audit trail with the original text, the operator used, and the new indices.
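As an illustration of the audit trail idea, here is a minimal sketch of applying replacements while recording where each one lands in the output text. The `AuditItem` class, its field names, and `apply_with_audit` are hypothetical and are not taken from Besshouka's code; the span offsets are written by hand rather than produced by a real analyzer.

```python
from dataclasses import dataclass


@dataclass
class AuditItem:
    """Hypothetical audit record; the real field names may differ."""
    original: str   # text that was replaced
    operator: str   # name of the operator applied
    start: int      # start index in the anonymized text
    end: int        # end index in the anonymized text


def apply_with_audit(text, spans):
    """Apply (start, end, operator_name, replacement) spans left to right.

    Spans must not overlap and must refer to offsets in the original text.
    """
    out = []
    audit = []
    cursor = 0   # position in the original text
    length = 0   # length of the output built so far
    for start, end, operator, replacement in sorted(spans):
        unchanged = text[cursor:start]
        out.append(unchanged)
        length += len(unchanged)
        out.append(replacement)
        audit.append(
            AuditItem(text[start:end], operator, length, length + len(replacement))
        )
        length += len(replacement)
        cursor = end
    out.append(text[cursor:])
    return "".join(out), audit


text = "田中太郎の電話番号は090-1234-5678です"
spans = [(0, 4, "replace", "<氏名>"), (10, 23, "mask", "090-1234-****")]
anonymized, audit = apply_with_audit(text, spans)
print(anonymized)
for item in audit:
    print(item)
```

Recording indices relative to the output text means each audit entry can be checked against the anonymized document without re-running detection.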

Quick Start

pip install besshouka

Anonymize text

besshouka anonymize "田中太郎の電話番号は090-1234-5678です"
# Output: <氏名>の電話番号は090-1234-****です

Analyze (detect only)

besshouka analyze --explain "田中太郎の電話番号は090-1234-5678です"

Use custom rules

besshouka anonymize \
  --recognizers my_patterns.yaml \
  --rules my_operators.yaml \
  --input document.txt \
  --output anonymized.txt

Programmatic Usage

from besshouka.config.loader import load_recognizer_config, load_operator_config
from besshouka.orchestrator.pipeline import run

rec_config = load_recognizer_config("path/to/recognizers.yaml")
op_config = load_operator_config("path/to/operators.yaml")

ctx = run("田中太郎の電話番号は090-1234-5678です", rec_config, op_config)

print(ctx.engine_result.text)   # anonymized text
print(ctx.engine_result.items)  # audit trail

Architecture

Text In → [Analyzer] → [Anonymizer] → Text Out

Module        Role
Analyzer      Detects PII using regex patterns + GiNZA NER
Anonymizer    Transforms PII using pluggable operators
Orchestrator  Wires analyzer and anonymizer into a pipeline

Each module has its own README with extension guides. See the besshouka/ directory.
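The three roles can be sketched in a few lines of plain Python. This is a simplified illustration and not Besshouka's implementation: `Detection`, `analyze`, `anonymize`, and `run_pipeline` are made-up names, the phone pattern is an assumption, and the GiNZA NER step is left out entirely.

```python
import re
from typing import Callable, NamedTuple


class Detection(NamedTuple):
    start: int
    end: int
    entity_type: str


def analyze(text: str, patterns: dict[str, str]) -> list[Detection]:
    """Analyzer stage: find spans with regex (the NER step is omitted here)."""
    found = []
    for entity_type, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            found.append(Detection(m.start(), m.end(), entity_type))
    return sorted(found)


def anonymize(
    text: str,
    detections: list[Detection],
    operators: dict[str, Callable[[str], str]],
) -> str:
    """Anonymizer stage: rewrite spans right to left so earlier offsets stay valid."""
    for d in sorted(detections, reverse=True):
        op = operators.get(d.entity_type, lambda s: s)  # default: keep unchanged
        text = text[:d.start] + op(text[d.start:d.end]) + text[d.end:]
    return text


def run_pipeline(text, patterns, operators):
    """Orchestrator stage: wire the two stages together."""
    return anonymize(text, analyze(text, patterns), operators)


patterns = {"PHONE_NUMBER": r"0[789]0-\d{4}-\d{4}"}
operators = {"PHONE_NUMBER": lambda s: s[:-4] + "****"}
result = run_pipeline("電話番号は090-1234-5678です", patterns, operators)
print(result)  # 電話番号は090-1234-****です
```

Keeping detection and transformation separate is what lets recognizers and operators be swapped independently.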

Built-in Recognizers

Pattern           Entity Type
Mobile phone      PHONE_NUMBER
Landline phone    PHONE_NUMBER
Toll-free phone   PHONE_NUMBER
Email address     EMAIL
マイナンバー      MY_NUMBER
Postal code       POSTAL_CODE
Credit card       CREDIT_CARD
Bank account      BANK_ACCOUNT
Driver's license  DRIVERS_LICENSE
Passport          PASSPORT
Person names      PERSON (GiNZA)
Organizations     ORGANIZATION (GiNZA)
Locations         LOCATION (GiNZA)
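To show what a regex recognizer with full-width handling might look like, here is a small sketch for mobile numbers. The pattern and the `fold_width` and `find_mobile_numbers` helpers are assumptions made for this example; the built-in recognizers may use different patterns and a different normalization strategy.

```python
import re
import unicodedata

# Assumed pattern for Japanese mobile numbers (070/080/090 prefixes).
MOBILE = re.compile(r"0[789]0-?\d{4}-?\d{4}")


def fold_width(text: str) -> str:
    """Fold full-width characters to half-width without changing string length.

    Characters whose NFKC form is longer than one character are left as-is,
    so offsets in the folded text remain valid in the original text.
    """
    out = []
    for ch in text:
        folded = unicodedata.normalize("NFKC", ch)
        out.append(folded if len(folded) == 1 else ch)
    return "".join(out)


def find_mobile_numbers(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, folded_match) for each mobile number found."""
    folded = fold_width(text)
    return [(m.start(), m.end(), m.group(0)) for m in MOBILE.finditer(folded)]


# Full-width digits and hyphens, as often produced by Japanese input methods.
print(find_mobile_numbers("電話は０９０－１２３４－５６７８です"))
print(find_mobile_numbers("電話は090-1234-5678です"))
```

Folding character by character keeps match offsets aligned with the original string, which matters when the anonymizer later replaces spans in the unmodified input.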

Built-in Operators

Operator  What it does
replace   Substitute with a fixed value
mask      Mask characters from end with a symbol
redact    Remove entirely
hash      Salted SHA-256 hex digest
encrypt   Fernet symmetric encryption
keep      Pass through unchanged
custom    Call any importable Python function
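For intuition, three of these operators can be approximated with the standard library alone. The function names, parameter names, and the salt-prefix scheme below are assumptions for illustration; Besshouka's own operators may be parameterized differently.

```python
import hashlib


def mask(text: str, chars_to_mask: int = 4, symbol: str = "*") -> str:
    """Replace the last `chars_to_mask` characters with `symbol`."""
    n = min(chars_to_mask, len(text))
    return text[:len(text) - n] + symbol * n


def redact(text: str) -> str:
    """Remove the value entirely."""
    return ""


def salted_hash(text: str, salt: str) -> str:
    """SHA-256 hex digest of salt + text (the concatenation order is an assumption)."""
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()


print(mask("090-1234-5678"))  # 090-1234-****
print(repr(redact("田中太郎")))
print(salted_hash("田中太郎", salt="example-salt"))
```

A salted hash maps the same input to the same digest for a given salt, which allows records to be joined after anonymization without exposing the original value.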

Extending Besshouka

Add a regex recognizer (no code)

Add an entry to your recognizers YAML:

recognizers:
  - name: employee_id
    entity_type: EMPLOYEE_ID
    pattern: 'EMP-[A-Z]{2}\d{6}'
    score: 1.0
    source: custom
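Before adding a pattern to the YAML file, it can help to try it against a few strings in Python. The sample strings below are invented for this check.

```python
import re

# Same pattern as in the YAML entry above.
EMPLOYEE_ID = re.compile(r"EMP-[A-Z]{2}\d{6}")

samples = [
    "EMP-AB123456",                # matches
    "EMP-ab123456",                # lowercase letters: no match
    "EMP-AB12345",                 # only five digits: no match
    "担当者: EMP-XY000042 (東京)",  # matches inside longer text
    "EMP-AB1234567",               # matches the first six digits only
]
for s in samples:
    m = EMPLOYEE_ID.search(s)
    print(s, "->", m.group(0) if m else None)
```

The last sample shows that the pattern has no boundary checks, so a longer digit run still yields a match on its prefix. If that is not wanted, a trailing `(?!\d)` lookahead is one way to tighten it.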

Add a custom operator (no subclassing)

Write a function anywhere importable:

def my_transform(text: str, params: dict) -> str:
    return text[::-1]  # reverse it, or whatever you need

Reference it in your operators YAML:

operators:
  EMPLOYEE_ID:
    method: custom
    function: "my_module.my_transform"
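A dotted path such as "my_module.my_transform" can be turned into a callable with `importlib`. The sketch below shows one common way to do this; `resolve_callable` is a hypothetical helper, and how Besshouka resolves the path internally is not documented here.

```python
import importlib
from typing import Callable


def resolve_callable(dotted_path: str) -> Callable:
    """Import 'package.module.function' and return the function object."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} is not callable")
    return func


# Demonstrated with a standard-library function so the example runs anywhere;
# a real custom operator would take (text, params) as shown above.
capwords = resolve_callable("string.capwords")
print(capwords("hello world"))  # Hello World
```

For the lookup to succeed, the module has to be importable from the environment in which the `besshouka` command runs, for example by installing it or placing it on `PYTHONPATH`.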

Development

git clone https://github.com/go-akhi/besshouka.git
cd besshouka
pip install -e ".[dev]"

Running Tests

# Fast tests only (skips the slow GiNZA model tests)
pytest tests/ -m "not slow"

# All tests including GiNZA
pytest tests/

# With coverage
pytest tests/ --cov=besshouka --cov-report=term-missing

Requirements

  • Python >=3.11, <3.14 — Python 3.14 is not yet supported due to PyO3 compatibility with SudachiPy (GiNZA's tokenizer). Python 3.13 is recommended.
  • GiNZA / spaCy (for NER)
  • See requirements.txt for full list

License

MIT

Download files

Source Distribution

besshouka-0.1.1a0.tar.gz (34.0 kB view details)

Uploaded Source

Built Distribution

besshouka-0.1.1a0-py3-none-any.whl (33.7 kB view details)

Uploaded Python 3

File details

Details for the file besshouka-0.1.1a0.tar.gz.

File metadata

  • Download URL: besshouka-0.1.1a0.tar.gz
  • Upload date:
  • Size: 34.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for besshouka-0.1.1a0.tar.gz
Algorithm Hash digest
SHA256 fb4c7330b19a0ef14814bb380e5ef1a4e0db0aacaeecc5197c8f6647e3847c74
MD5 6134702d6ebba727b3e3b873791f9724
BLAKE2b-256 39d99de0eb756611de7c71eada98f27341ebed255b22d04e88b4ccfd367836ca

Provenance

The following attestation bundles were made for besshouka-0.1.1a0.tar.gz:

Publisher: release.yml on go-akhi/besshouka

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file besshouka-0.1.1a0-py3-none-any.whl.

File metadata

  • Download URL: besshouka-0.1.1a0-py3-none-any.whl
  • Upload date:
  • Size: 33.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for besshouka-0.1.1a0-py3-none-any.whl
Algorithm Hash digest
SHA256 52eb8753aee75ad089d8a9b364aa7ea9fc7ec972d9446667cc0f2f186958170d
MD5 0bc9f5f13766e5f7ff40fb5722b0fe9e
BLAKE2b-256 ad2f52db8956b9799348a114809201fe428a7ba89d68966ca75161edab734f31

Provenance

The following attestation bundles were made for besshouka-0.1.1a0-py3-none-any.whl:

Publisher: release.yml on go-akhi/besshouka
