namedivider-python🦒

NameDivider is a tool that divides Japanese full names into family and given names.

🚀 Try Live Demo • 📖 Documentation (Japanese) • 🐳 Docker API • ⚡ Rust Version


💡 Why NameDivider?

Japanese full names like "菅義偉" are typically stored as single strings with no clear boundary between family and given names. NameDivider finds that boundary automatically, with up to 99.91% accuracy on real-world Japanese names.

Unlike cloud-based AI solutions, NameDivider processes all data locally — no external API calls, no data transmission, and full privacy control.

# Before
person_name = "菅義偉"  # How do you know where to divide?

# After  
from namedivider import BasicNameDivider
divider = BasicNameDivider()
result = divider.divide_name("菅義偉")
print(f"Family: {result.family}, Given: {result.given}")
# Family: 菅, Given: 義偉

✨ Key Features

  • 🎯 99.91% accuracy - Tested on real-world Japanese names
  • Multiple algorithms - Choose between speed (Basic) or accuracy (GBDT)
  • 🔐 Privacy-first - Local-only processing, ideal for sensitive data
  • 🔧 Production ready - CLI, Python library, and Docker support
  • 🎨 Interactive demo - Try it live with Streamlit
  • 📊 Confidence scoring - Know when to trust the results
  • 🛠️ Customizable rules - Add domain-specific patterns

🚀 Quick Start

Installation

pip install namedivider-python

Basic Usage

from namedivider import BasicNameDivider, GBDTNameDivider

# Fast, with good accuracy (99.3%)
basic_divider = BasicNameDivider()
result = basic_divider.divide_name("菅義偉")
print(result)  # 菅 義偉

# Slower, with the best accuracy (99.9%)
gbdt_divider = GBDTNameDivider()
result = gbdt_divider.divide_name("菅義偉")
print(result.to_dict())
# {
#   'algorithm': 'gbdt',
#   'family': '菅',
#   'given': '義偉',
#   'score': 0.7300634880343344,
#   'separator': ' '
# }
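
Each result carries a `score` alongside `family` and `given`, so low-scoring divisions can be routed to manual review. The following is a minimal sketch, not part of the library: `split_by_confidence` is a hypothetical helper, and the default `threshold` of 0.5 is an arbitrary assumption. Scores differ between algorithms (the examples in this README show correct results scoring around 0.3 and 0.73), so any cut-off should be calibrated on your own data. `divide` is any callable returning an object with those three attributes, such as `GBDTNameDivider().divide_name`.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, str, float]  # (original name, family, given, score)


def split_by_confidence(
    names: Iterable[str],
    divide: Callable[[str], object],
    threshold: float = 0.5,  # hypothetical cut-off; calibrate on your own data
) -> Tuple[List[Row], List[Row]]:
    """Partition names into (trusted, needs_review) using each result's score."""
    trusted: List[Row] = []
    needs_review: List[Row] = []
    for name in names:
        result = divide(name)
        row = (name, result.family, result.given, result.score)
        # Results at or above the threshold are trusted; the rest need review.
        (trusted if result.score >= threshold else needs_review).append(row)
    return trusted, needs_review
```

With the library installed, a call might look like `split_by_confidence(names, GBDTNameDivider().divide_name)`.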

🔧 Multiple Interfaces

🖥️ Command Line Interface

Perfect for batch processing and automation:

# Single name
$ nmdiv name 菅義偉
菅 義偉

# Process file with progress bar
$ nmdiv file customer_names.txt
100%|██████████| 1000/1000 [00:02<00:00, 431.2it/s]

# Check accuracy on labeled data
$ nmdiv accuracy test_data.txt
Accuracy: 99.1%
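
A similar check can be approximated from Python as exact-match accuracy over labeled names. This sketch assumes each labeled entry is a family name and a given name separated by a single space (for example `菅 義偉`). The input format that `nmdiv accuracy` actually expects is not documented here, and `exact_match_accuracy` is a hypothetical helper rather than a library function.

```python
from typing import Callable, Iterable


def exact_match_accuracy(labeled: Iterable[str], divide: Callable[[str], object]) -> float:
    """Return the fraction of labeled names that `divide` splits correctly."""
    total = 0
    correct = 0
    for line in labeled:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed label format: "<family> <given>"; a line without a space raises ValueError.
        family, given = line.split(" ", 1)
        result = divide(family + given)
        total += 1
        if (result.family, result.given) == (family, given):
            correct += 1
    return correct / total if total else 0.0
```

Passing `BasicNameDivider().divide_name` as `divide` would score the Basic algorithm against your own labeled set.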

🐳 REST API (Docker)

For environments where Python cannot be used, we provide a containerized REST API:

# Run the API server
docker run -d -p 8000:8000 rskmoi/namedivider-api

# Send batch requests
curl -X POST localhost:8000/divide \
  -H "Content-Type: application/json" \
  -d '{"names": ["竈門炭治郎", "竈門禰豆子"]}'

Response:

{
  "divided_names": [
    {"family": "竈門", "given": "炭治郎", "separator": " ", "score": 0.3004587452426102, "algorithm": "kanji_feature"},
    {"family": "竈門", "given": "禰豆子", "separator": " ", "score": 0.30480429696983175, "algorithm": "kanji_feature"}
  ]
}
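
The same request can be sent from any language with an HTTP client. As an illustration, here is a sketch using only the Python standard library. The endpoint, payload shape, and response fields are taken from the example above; the helper names and the 10-second timeout are assumptions made for this sketch.

```python
import json
import urllib.request
from typing import List, Tuple


def build_divide_request(names: List[str], base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for the /divide endpoint shown above."""
    payload = json.dumps({"names": names}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/divide",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_divided_names(body: str) -> List[Tuple[str, str, float]]:
    """Extract (family, given, score) tuples from a response body."""
    return [
        (item["family"], item["given"], item["score"])
        for item in json.loads(body)["divided_names"]
    ]


def divide_via_api(names: List[str], base_url: str = "http://localhost:8000") -> List[Tuple[str, str, float]]:
    """Send names to a running container and return the parsed results."""
    request = build_divide_request(names, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:  # timeout is an arbitrary choice
        return parse_divided_names(response.read().decode("utf-8"))
```

`divide_via_api(["竈門炭治郎", "竈門禰豆子"])` requires the container from the `docker run` command to be listening on port 8000.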

🎯 Interactive Web Demo

Try NameDivider instantly in your browser: Live Demo →

Run locally:

cd examples/demo
pip install -r requirements.txt
streamlit run example_streamlit.py

📊 Performance & Benchmarks

| Algorithm                         | Accuracy | Speed (names/sec) | Use Case                       |
|-----------------------------------|----------|-------------------|--------------------------------|
| BasicNameDivider / backend=python | 99.3%    | 4152.8            | Stable & compatible            |
| BasicNameDivider / backend=rust   | 99.3%    | 18597.7           | Max performance (if available) |
| GBDTNameDivider / backend=python  | 99.9%    | 1143.3            | Best accuracy, guaranteed      |
| GBDTNameDivider / backend=rust    | 99.9%    | 6277.4            | Fast + accurate (if available) |

Run your own benchmarks:

bash scripts/benchmark_sample.sh
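
For a quick throughput estimate on your own data from Python, a timing loop along these lines may be enough. This is a sketch and not the project's benchmark script, whose contents are not shown here; `names_per_second` and its `repeats` parameter are hypothetical.

```python
import time
from typing import Callable, Sequence


def names_per_second(names: Sequence[str], divide: Callable[[str], object], repeats: int = 3) -> float:
    """Return the best observed throughput over `repeats` timed passes."""
    if not names:
        raise ValueError("names must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for name in names:
            divide(name)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on very coarse timers.
    return len(names) / max(best, 1e-9)
```

Taking the best of several passes reduces noise from warm-up and other processes; numbers measured this way depend on hardware and will not necessarily match the table above.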

🛠️ Advanced Features

Custom Rules

Handle domain-specific names with custom patterns:

from namedivider import BasicNameDivider, BasicNameDividerConfig
from namedivider import SpecificFamilyNameRule

config = BasicNameDividerConfig(
    custom_rules=[
        SpecificFamilyNameRule(family_names=["竜胆"]),  # Rare family names
    ]
)
divider = BasicNameDivider(config=config)
result = divider.divide_name("竜胆尊")
# DividedName(family='竜胆', given='尊', separator=' ', score=1.0, algorithm='rule_specific_family')
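
Conceptually, a rule like this can be thought of as a prefix match against the listed family names. The sketch below illustrates that idea only. It is not the library's implementation, which this README does not show and which may differ, and `split_on_known_family_name` is a hypothetical function.

```python
from typing import Optional, Sequence, Tuple


def split_on_known_family_name(full_name: str, family_names: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (family, given) if full_name starts with a listed family name."""
    # Try longer family names first so a longer listed name wins over a shorter prefix.
    for family in sorted(family_names, key=len, reverse=True):
        if full_name.startswith(family) and len(full_name) > len(family):
            return family, full_name[len(family):]
    return None  # no rule matched; a divider would fall back to its regular algorithm
```

For instance, `split_on_known_family_name("竜胆尊", ["竜胆"])` returns `("竜胆", "尊")`, while a name that starts with no listed family name returns `None`.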

Speed Up

For high-volume processing, NameDivider offers several optimization options:

from namedivider import BasicNameDivider, BasicNameDividerConfig

# Load your names
with open("names.txt", "r", encoding="utf-8") as f:
    names = [line.strip() for line in f]

# Option 1: Enable caching (faster repeated processing)
config = BasicNameDividerConfig(cache_mask=True)
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]

# Option 2: (beta) Use Rust backend (about 4.5x faster for BasicNameDivider in the benchmark table)
# First install: pip install namedivider-core
config = BasicNameDividerConfig(backend="rust")
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
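
Because the Rust backend is optional, code that must run everywhere may want to pick the backend at runtime. The sketch below checks whether the extension module is importable. It assumes that `namedivider-core` installs a module named `namedivider_core` and that `"python"` is an accepted `backend` value, as the benchmark table labels suggest; neither is confirmed here. `choose_backend` is a hypothetical helper.

```python
import importlib.util


def choose_backend(rust_module: str = "namedivider_core") -> str:
    """Return "rust" if the (assumed) Rust extension module is importable, else "python"."""
    # find_spec returns None when a top-level module cannot be found.
    return "rust" if importlib.util.find_spec(rust_module) is not None else "python"


# Intended use, with the library installed:
# config = BasicNameDividerConfig(backend=choose_backend())
# divider = BasicNameDivider(config=config)
```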

🏢 Typical Use Cases

  • Customer Data Processing - Clean and standardize name databases
  • Form Validation - Real-time name splitting in web applications
  • Analytics & Reports - Generate family name statistics
  • Data Migration - Convert legacy systems with combined name fields
  • Government & Municipal - Process citizen registration data
  • Security-sensitive Environments - Process names without sending data to external APIs
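
As an illustration of the analytics use case, family name statistics can be built by counting the `family` attribute of each result. This is a sketch: `family_name_counts` is a hypothetical helper, and `divide` stands for any divider method such as `BasicNameDivider().divide_name`.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def family_name_counts(
    names: Iterable[str],
    divide: Callable[[str], object],
    top: int = 10,
) -> List[Tuple[str, int]]:
    """Count family names after division and return the most common ones."""
    counts = Counter(divide(name).family for name in names)
    return counts.most_common(top)
```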

📚 Examples & Tutorials

📄 License

Source code and gbdt_model_v1.txt

MIT License

bert_katakana_v0_3_0.pt

cc-by-sa-4.0

family_name_repository.pickle

English

(1) Purpose of use

family_name_repository.pickle is available for commercial/non-commercial use if you use this software to divide names, and to develop algorithms for dividing names.

Any other use of family_name_repository.pickle is prohibited.

(2) Liability

The author or copyright holder assumes no responsibility for the software.

Japanese / 日本語

(1) 利用目的

このソフトウェアを用いて姓名分割、および姓名分割アルゴリズムの開発をする場合、family_name_repository.pickleは商用/非商用問わず利用可能です。

それ以外の目的でのfamily_name_repository.pickleの利用を禁じます。

(2) 責任

作者または著作権者は、family_name_repository.pickleに関して一切の責任を負いません。

The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net (名字由来net).

🔗 Related Projects

Made with ❤️ by @rskmoi • Contact @rskmoi
