A tool that divides Japanese full names into family and given names.
namedivider-python🦒
NameDivider is a tool that divides Japanese full names into family and given names.
🚀 Try Live Demo • 📖 Documentation (Japanese) • 🐳 Docker API • ⚡ Rust Version
💡 Why NameDivider?
Japanese full names like "菅義偉" are typically stored as single strings with no explicit boundary between family and given names. NameDivider locates that boundary automatically, with a reported accuracy of up to 99.91% on real-world names.
Unlike cloud-based AI solutions, NameDivider processes all data locally: no external API calls, no data transmission, and full privacy control.
# Before
person_name = "菅義偉" # How do you know where to divide?
# After
from namedivider import BasicNameDivider
divider = BasicNameDivider()
result = divider.divide_name("菅義偉")
print(f"Family: {result.family}, Given: {result.given}")
# Family: 菅, Given: 義偉
✨ Key Features
- 🎯 99.91% accuracy - Tested on real-world Japanese names
- ⚡ Multiple algorithms - Choose between speed (Basic) and accuracy (GBDT)
- 🔐 Privacy-first - Local-only processing, ideal for sensitive data
- 🔧 Production ready - CLI, Python library, and Docker support
- 🎨 Interactive demo - Try it live with Streamlit
- 📊 Confidence scoring - Know when to trust the results
- 🛠️ Customizable rules - Add domain-specific patterns
🚀 Quick Start
Installation
pip install namedivider-python
Basic Usage
from namedivider import BasicNameDivider, GBDTNameDivider
# Faster, with good accuracy (99.3%)
basic_divider = BasicNameDivider()
result = basic_divider.divide_name("菅義偉")
print(result) # 菅 義偉
# Slower, with the best accuracy (99.9%)
gbdt_divider = GBDTNameDivider()
result = gbdt_divider.divide_name("菅義偉")
print(result.to_dict())
# {
# 'algorithm': 'gbdt',
# 'family': '菅',
# 'given': '義偉',
# 'score': 0.7300634880343344,
# 'separator': ' '
# }
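The `score` field can drive a simple review queue. The sketch below works on plain dictionaries shaped like the `to_dict()` output above, so it runs without NameDivider installed. The `split_by_confidence` helper and the 0.5 cutoff are illustrative assumptions, not part of the library, and scores from different algorithms may not be directly comparable.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical cutoff for illustration only; tune it on your own labeled data.
REVIEW_THRESHOLD = 0.5


def split_by_confidence(
    results: Iterable[Dict], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Dict], List[Dict]]:
    """Separate results into (accepted, needs_review) by their 'score' field."""
    accepted, needs_review = [], []
    for result in results:
        if result["score"] >= threshold:
            accepted.append(result)
        else:
            needs_review.append(result)
    return accepted, needs_review


# Dictionaries copied from outputs shown in this README.
sample = [
    {"algorithm": "gbdt", "family": "菅", "given": "義偉",
     "score": 0.7300634880343344, "separator": " "},
    {"algorithm": "kanji_feature", "family": "竈門", "given": "炭治郎",
     "score": 0.3004587452426102, "separator": " "},
]
accepted, needs_review = split_by_confidence(sample)
print(len(accepted), len(needs_review))
```

With real results, `[divider.divide_name(n).to_dict() for n in names]` would supply the input list.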
🔧 Multiple Interfaces
🖥️ Command Line Interface
Perfect for batch processing and automation:
# Single name
$ nmdiv name 菅義偉
菅 義偉
# Process file with progress bar
$ nmdiv file customer_names.txt
100%|██████████| 1000/1000 [00:02<00:00, 431.2it/s]
# Check accuracy on labeled data
$ nmdiv accuracy test_data.txt
Accuracy: 99.1%
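An accuracy check of this kind can be approximated in a few lines of Python. This sketch assumes each labeled line holds the family and given name separated by a single space (for example `菅 義偉`). That file format and the `exact_match_accuracy` helper are assumptions made for illustration, so check the CLI documentation for the format `nmdiv accuracy` actually expects.

```python
from typing import Callable, Iterable, Tuple


def exact_match_accuracy(
    divide: Callable[[str], Tuple[str, str]], labeled_lines: Iterable[str]
) -> float:
    """Fraction of names whose predicted (family, given) equals the label."""
    total = 0
    correct = 0
    for line in labeled_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        family, given = line.split(" ", 1)
        total += 1
        if divide(family + given) == (family, given):
            correct += 1
    return correct / total if total else 0.0


def first_char_divider(name: str) -> Tuple[str, str]:
    """Stand-in divider for the demo: treats the first character as the family name."""
    return name[0], name[1:]


labels = ["菅 義偉", "竈門 炭治郎"]
print(exact_match_accuracy(first_char_divider, labels))  # 0.5
```

To evaluate NameDivider itself, pass a small wrapper that calls `divide_name` and returns `(result.family, result.given)`.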
🐳 REST API (Docker)
For environments where Python cannot be used, we provide a containerized REST API:
# Run the API server
docker run -d -p 8000:8000 rskmoi/namedivider-api
# Send batch requests
curl -X POST localhost:8000/divide \
-H "Content-Type: application/json" \
-d '{"names": ["竈門炭治郎", "竈門禰豆子"]}'
Response:
{
"divided_names": [
{"family": "竈門", "given": "炭治郎", "separator": " ", "score": 0.3004587452426102, "algorithm": "kanji_feature"},
{"family": "竈門", "given": "禰豆子", "separator": " ", "score": 0.30480429696983175, "algorithm": "kanji_feature"}
]
}
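The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the container started with the `docker run` command above is listening on `localhost:8000`; the helper names are hypothetical, and the request and response shapes are taken from the curl example and response shown here.

```python
import json
import urllib.request
from typing import Dict, List

# Assumes the container from the `docker run` command above is listening here.
API_URL = "http://localhost:8000/divide"


def build_request(names: List[str], url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the same JSON body as the curl example."""
    body = json.dumps({"names": names}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_response(raw: bytes) -> List[Dict]:
    """Return the list under 'divided_names' from a response body."""
    return json.loads(raw.decode("utf-8"))["divided_names"]


def divide_names(names: List[str], url: str = API_URL) -> List[Dict]:
    """Send the names to the API and return the results (needs a running server)."""
    with urllib.request.urlopen(build_request(names, url), timeout=10) as response:
        return parse_response(response.read())
```

Calling `divide_names(["竈門炭治郎", "竈門禰豆子"])` against a running container should return a list of dictionaries like those in the response above.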
🎯 Interactive Web Demo
Try NameDivider instantly in your browser: Live Demo →
Run locally:
cd examples/demo
pip install -r requirements.txt
streamlit run example_streamlit.py
📊 Performance & Benchmarks
| Algorithm | Accuracy | Speed (names/sec) | Use Case |
|---|---|---|---|
| BasicNameDivider / backend=python | 99.3% | 4152.8 | Stable & compatible |
| BasicNameDivider / backend=rust | 99.3% | 18597.7 | Max performance (if available) |
| GBDTNameDivider / backend=python | 99.9% | 1143.3 | Best accuracy, guaranteed |
| GBDTNameDivider / backend=rust | 99.9% | 6277.4 | Fast + accurate (if available) |
Run your own benchmarks:
bash scripts/benchmark_sample.sh
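For a quick throughput check on your own data, a timing loop like the following is usually enough. This is an independent sketch, not a description of what `scripts/benchmark_sample.sh` does; the `names_per_second` helper is hypothetical and uses a stand-in divider so it runs without NameDivider installed.

```python
import time
from typing import Callable, Sequence


def names_per_second(
    divide: Callable[[str], object], names: Sequence[str], repeats: int = 3
) -> float:
    """Return the best throughput (names/sec) over several timed passes."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for name in names:
            divide(name)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(names) / elapsed)
    return best


# Stand-in workload; replace the lambda with `divider.divide_name` to measure NameDivider.
rate = names_per_second(lambda name: (name[0], name[1:]), ["菅義偉"] * 10_000)
print(f"{rate:.1f} names/sec")
```

Taking the best of several passes reduces noise from one-off costs such as model loading or cache warm-up.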
🛠️ Advanced Features
Custom Rules
Handle domain-specific names with custom patterns:
from namedivider import BasicNameDivider, BasicNameDividerConfig
from namedivider import SpecificFamilyNameRule
config = BasicNameDividerConfig(
custom_rules=[
SpecificFamilyNameRule(family_names=["竜胆"]), # Rare family names
]
)
divider = BasicNameDivider(config=config)
result = divider.divide_name("竜胆尊")
# DividedName(family='竜胆', given='尊', separator=' ', score=1.0, algorithm='rule_specific_family')
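As the `algorithm='rule_specific_family'` and `score=1.0` in the output above suggest, the rule decides the split itself when a listed family name matches. Conceptually this resembles a prefix lookup. The following is an illustration of that idea only, not NameDivider's implementation, and `split_on_known_family` is a hypothetical name.

```python
from typing import Dict, Iterable, Optional


def split_on_known_family(name: str, family_names: Iterable[str]) -> Optional[Dict]:
    """Split `name` if it starts with a listed family name; otherwise return None."""
    # Try longer family names first so "竜胆" wins over a shorter prefix such as "竜".
    for family in sorted(family_names, key=len, reverse=True):
        if name.startswith(family) and len(name) > len(family):
            return {"family": family, "given": name[len(family):]}
    return None


print(split_on_known_family("竜胆尊", ["竜胆"]))  # {'family': '竜胆', 'given': '尊'}
print(split_on_known_family("菅義偉", ["竜胆"]))  # None
```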
Speed Up
For high-volume processing, NameDivider offers several optimization options:
from namedivider import BasicNameDivider, BasicNameDividerConfig
# Load your names
with open("names.txt", "r", encoding="utf-8") as f:
names = [line.strip() for line in f]
# Option 1: Enable caching (faster repeated processing)
config = BasicNameDividerConfig(cache_mask=True)
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
# Option 2 (beta): Use the Rust backend (about 4.5x faster for BasicNameDivider in the benchmark table above)
# First install: pip install namedivider-core
config = BasicNameDividerConfig(backend="rust")
divider = BasicNameDivider(config=config)
results = [divider.divide_name(name) for name in names]
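Because the Rust backend is an optional install, code that must run in both environments can choose the backend at startup. The sketch below assumes the `namedivider-core` package installs an importable module named `namedivider_core`; that module name and the `pick_backend` helper are assumptions, so verify them against your installation.

```python
import importlib.util


def pick_backend(rust_module: str = "namedivider_core") -> str:
    """Return "rust" when the given module is importable, else "python"."""
    # Assumption: the namedivider-core package installs a module named
    # `namedivider_core`; adjust the name if your installation differs.
    return "rust" if importlib.util.find_spec(rust_module) is not None else "python"


backend = pick_backend()
print(backend)
```

The result can then be passed along as `BasicNameDividerConfig(backend=backend)`.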
🏢 Typical Use Cases
- Customer Data Processing - Clean and standardize name databases
- Form Validation - Real-time name splitting in web applications
- Analytics & Reports - Generate family name statistics
- Data Migration - Convert legacy systems with combined name fields
- Government & Municipal - Process citizen registration data
- Security-sensitive Environments - Process names without sending data to external APIs
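As an example of the data migration case, the sketch below copies a CSV file and adds separate family and given name columns. The column names and the `split_name_column` helper are hypothetical, and a stand-in divider with in-memory files keeps the example runnable on its own.

```python
import csv
import io
from typing import Callable, Tuple


def split_name_column(
    source, target, divide: Callable[[str], Tuple[str, str]], column: str = "full_name"
) -> None:
    """Copy CSV rows from source to target, adding family_name and given_name columns."""
    reader = csv.DictReader(source)
    fieldnames = list(reader.fieldnames or []) + ["family_name", "given_name"]
    writer = csv.DictWriter(target, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["family_name"], row["given_name"] = divide(row[column])
        writer.writerow(row)


# Stand-in divider and in-memory files so the sketch runs on its own.
legacy = io.StringIO("id,full_name\n1,菅義偉\n")
migrated = io.StringIO()
split_name_column(legacy, migrated, lambda name: (name[0], name[1:]))
print(migrated.getvalue())
```

For a real migration, open the files with `encoding="utf-8"` and `newline=""`, and pass a wrapper around `divide_name` that returns `(result.family, result.given)`.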
📚 Examples & Tutorials
- 🌐 Use REST API with minimal client samples - Integration examples (7 languages available in namedivider-rs)
- ⚡ Performance Optimization - Handle large datasets efficiently
- 🔧 Custom Rules Examples - Domain-specific configurations
📄 License
Source code and gbdt_model_v1.txt
MIT License
bert_katakana_v0_3_0.pt
cc-by-sa-4.0
family_name_repository.pickle
English
(1) Purpose of use
family_name_repository.pickle is available for commercial/non-commercial use if you use this software to divide name, and to develop algorithms for dividing name.
Any other use of family_name_repository.pickle is prohibited.
(2) Liability
The author or copyright holder assumes no responsibility for the software.
Japanese / 日本語
(1) 利用目的
このソフトウェアを用いて姓名分割、および姓名分割アルゴリズムの開発をする場合、family_name_repository.pickleは商用/非商用問わず利用可能です。
それ以外の目的でのfamily_name_repository.pickleの利用を禁じます。
(2) 責任
作者または著作権者は、family_name_repository.pickleに関して一切の責任を負いません。
The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net (名字由来net).
🔗 Related Projects
- ⚡ namedivider-rs - High-performance Rust implementation
- 🧠 BERT Katakana Divider - Deep learning approach for katakana names