
A tool for dividing a Japanese full name into a family name and a given name.

Project description

namedivider-python


NameDivider is a tool for dividing a Japanese full name into a family name and a given name.

input: 菅義偉 -> output: 菅 義偉

NameDivider divides names using statistical information about the kanji that appear in Japanese names.

Measured on a privately held dataset, its accuracy is 99.91%.
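To make the general shape of the problem concrete, here is a toy sketch: enumerate every position where a full name could be cut, score each candidate split, and keep the best one. This is not NameDivider's algorithm. The real library derives its scores from kanji statistics, while this sketch substitutes a tiny, made-up table of family-name scores; TOY_FAMILY_SCORES, candidate_splits and toy_divide are hypothetical names.

```python
# Toy illustration only: the scores below are invented numbers,
# not NameDivider's statistics or its actual method.
TOY_FAMILY_SCORES = {"菅": 0.9, "菅義": 0.1, "原": 0.8, "阿": 0.05, "阿部": 0.95}


def candidate_splits(full_name: str) -> list[tuple[str, str]]:
    """Every way to cut the name into a non-empty family part and given part."""
    return [(full_name[:i], full_name[i:]) for i in range(1, len(full_name))]


def toy_divide(full_name: str, separator: str = " ") -> str:
    """Pick the split whose family part has the highest toy score."""
    if len(full_name) < 2:
        raise ValueError("a full name needs at least two characters")
    family, given = max(
        candidate_splits(full_name),
        # Unknown family candidates score 0.0; ties keep the earliest split.
        key=lambda split: TOY_FAMILY_SCORES.get(split[0], 0.0),
    )
    return family + separator + given


print(toy_divide("菅義偉"))  # 菅 義偉
```

A real divider needs a score for any candidate, including family names it has never seen, which is why NameDivider works from per-kanji statistics rather than a fixed lookup table like this one.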

You can see how it works with this demo.

Documents

NameDivider (Japanese)

Installation

pip install namedivider-python

Usage

It's simple to use.

from namedivider import BasicNameDivider, GBDTNameDivider
from pprint import pprint

# BasicNameDivider: fast, accuracy about 99.2%
basic_divider = BasicNameDivider()
basic_divided_name = basic_divider.divide_name("菅義偉")

# GBDTNameDivider: slower, accuracy about 99.9%
gbdt_divider = GBDTNameDivider()
gbdt_divided_name = gbdt_divider.divide_name("菅義偉")

print(gbdt_divided_name)
# 菅 義偉

# The 'kanji_feature' algorithm is BasicNameDivider's
pprint(basic_divided_name.to_dict())
# {'algorithm': 'kanji_feature',
#  'family': '菅',
#  'given': '義偉',
#  'score': 0.7300634880343344,
#  'separator': ' '}
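Because to_dict() returns plain values, results can be post-processed with the standard library alone. The following minimal sketch rebuilds the printed form (family + separator + given) and exports results as CSV. The dictionary literal is copied from the output above rather than produced by the library, so the snippet runs without namedivider installed; joined and to_csv are hypothetical helper names.

```python
import csv
import io

# Copied from the to_dict() output above (not computed here).
divided = {
    "algorithm": "kanji_feature",
    "family": "菅",
    "given": "義偉",
    "score": 0.7300634880343344,
    "separator": " ",
}


def joined(row: dict) -> str:
    """Rebuild the printed form: family + separator + given."""
    return row["family"] + row["separator"] + row["given"]


def to_csv(rows: list[dict]) -> str:
    """Write divided-name dicts as CSV with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["family", "given", "separator", "score", "algorithm"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(joined(divided))  # 菅 義偉
print(to_csv([divided]))
```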

For more advanced features, see here.

NameDivider API

NameDivider API is a Docker container that provides a RESTful API for dividing a Japanese full name into a family name and a given name.

I am developing NameDivider API to make NameDivider's functionality available to users of languages other than Python.

Installation

docker pull rskmoi/namedivider-api

Usage

  • Run Docker Image
docker run -d --rm -p 8000:8000 rskmoi/namedivider-api
  • Send HTTP request
curl -X POST -H "Content-Type: application/json" -d '{"names":["竈門炭治郎", "竈門禰豆子"]}' localhost:8000/divide
  • Response
{
    "divided_names":
        [
            {"family":"竈門","given":"炭治郎","separator":" ","score":0.3004587452426102,"algorithm":"kanji_feature"},
            {"family":"竈門","given":"禰豆子","separator":" ","score":0.30480429696983175,"algorithm":"kanji_feature"}
        ]
}
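The curl call above is the reference for any language. As an illustration of the same protocol, here is a minimal standard-library sketch, assuming the container is running on localhost:8000 as started above. It sends {"names": [...]} to /divide and joins each returned family, separator and given field. API_URL, build_request, parse_response and divide_via_api are hypothetical names, not part of NameDivider.

```python
import json
import urllib.request

# Assumes the container from "Run Docker Image" is listening here.
API_URL = "http://localhost:8000/divide"


def build_request(names: list[str], url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps({"names": names}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> list[str]:
    """Turn the JSON response into 'family<separator>given' strings."""
    payload = json.loads(raw)
    return [
        item["family"] + item["separator"] + item["given"]
        for item in payload["divided_names"]
    ]


def divide_via_api(names: list[str]) -> list[str]:
    """Send one request and return the divided names (needs the running container)."""
    with urllib.request.urlopen(build_request(names)) as response:
        return parse_response(response.read())
```

With the container running, divide_via_api(["竈門炭治郎", "竈門禰豆子"]) should return the two divided names from the sample response above.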

Notice

  • names is a list of undivided names. It can contain at most 1000 names.
  • If you require speed or want to use GBDTNameDivider, please try v0.2.0-beta.
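Because a single request accepts at most 1000 names, longer lists have to be split on the client side. Here is a minimal sketch of that batching step. Each batch could then be posted exactly like the curl example. MAX_NAMES_PER_REQUEST and chunked are hypothetical names.

```python
from typing import Iterator

MAX_NAMES_PER_REQUEST = 1000  # limit stated in the Notice above


def chunked(names: list[str], size: int = MAX_NAMES_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` names, preserving order.

    Raises ValueError (when iteration starts) if size is not positive.
    """
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(names), size):
        yield names[start:start + size]
```

For example, 2500 names produce three requests of 1000, 1000 and 500 names, and the responses can be concatenated in the same order.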

CLI

Read namedivider/cli.py for more information.

$ nmdiv name 菅義偉
菅 義偉
$ nmdiv file undivided_names.txt
100%|███████████████████████████████████████████| 4/4 [00:00<00:00, 4194.30it/s]
原 敬
菅 義偉
阿部 晋三
中曽根 康弘
$ nmdiv accuracy divided_names.txt
100%|███████████████████████████████████████████| 5/5 [00:00<00:00, 3673.41it/s]
0.8
True: 滝 登喜男, Pred: 滝登 喜男
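The accuracy subcommand reports the fraction of correctly divided names and lists the mistakes. The output above is consistent with 4 of 5 names divided correctly (4 / 5 = 0.8). This is not nmdiv's implementation (see namedivider/cli.py for that). It is a sketch of the same computation, assuming each line of the file has the form family + space + given. The accuracy function and its predict parameter are hypothetical. With the library installed, predict could be something like lambda name: str(basic_divider.divide_name(name)), since printing a divided name gives the family + space + given form shown in Usage.

```python
from typing import Callable


def accuracy(
    divided_names: list[str],
    predict: Callable[[str], str],
    separator: str = " ",
) -> tuple[float, list[tuple[str, str]]]:
    """Score `predict` against names that are already divided correctly.

    Each entry is assumed to look like "family<separator>given"; removing the
    separator gives the undivided input passed to `predict`.
    Returns (fraction correct, list of (truth, prediction) mismatches).
    """
    if not divided_names:
        raise ValueError("need at least one name")
    mistakes = []
    for truth in divided_names:
        pred = predict(truth.replace(separator, ""))
        if pred != truth:
            mistakes.append((truth, pred))
    correct = len(divided_names) - len(mistakes)
    return correct / len(divided_names), mistakes
```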

License

Source code and gbdt_model_v1.txt

MIT License

bert_katakana_v0_3_0.pt

cc-by-sa-4.0

family_name_repository.pickle

  • English

(1) Purpose of use

family_name_repository.pickle is available for commercial and non-commercial use if you use this software to divide names or to develop algorithms for dividing names.

Any other use of family_name_repository.pickle is prohibited.

(2) Liability

The author or copyright holder assumes no responsibility for the software.

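  • Japanese (English translation)

(1) Purpose of use

When this software is used to divide names or to develop name-division algorithms, family_name_repository.pickle may be used for both commercial and non-commercial purposes.

Use of family_name_repository.pickle for any other purpose is prohibited.

(2) Liability

The author or copyright holder assumes no responsibility whatsoever with respect to family_name_repository.pickle.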

The family name data used in family_name_repository.pickle is provided by Myoji-Yurai.net (名字由来net).

Ongoing Projects

  • Porting Python to Rust

https://github.com/rskmoi/namedivider-rs



Download files


Source Distribution

namedivider_python-0.3.1.tar.gz (34.9 kB)

Uploaded Source

Built Distribution

namedivider_python-0.3.1-py2.py3-none-any.whl (46.2 kB)

Uploaded Python 2 Python 3

File details

Details for the file namedivider_python-0.3.1.tar.gz.

File metadata

  • Download URL: namedivider_python-0.3.1.tar.gz
  • Upload date:
  • Size: 34.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.27.2

File hashes

Hashes for namedivider_python-0.3.1.tar.gz
Algorithm Hash digest
SHA256 d07c7ef1adcfea9dce8c2adffd7a4a5717d557a145ab403c2ca2093be85606a0
MD5 3beffebe84841ad762dea650d9251a48
BLAKE2b-256 3d109d0e993e8701ceaf287a73bfa15e649b7feebb2f17c1b49444160950c068
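To check a downloaded file against the SHA256 digest listed above, the standard hashlib module is enough. Here is a minimal sketch that reads the file in chunks so memory use stays bounded. sha256_of and verify are hypothetical helper names. pip can also enforce hashes itself through its hash-checking mode (--require-hashes).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's digest matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()
```

For example, verify(Path("namedivider_python-0.3.1.tar.gz"), "d07c7ef1adcfea9dce8c2adffd7a4a5717d557a145ab403c2ca2093be85606a0") should return True for an intact download of the source distribution.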


File details

Details for the file namedivider_python-0.3.1-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for namedivider_python-0.3.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 f462c46154bc6bd3eb8c2073db088b66aaea0774ff980538fb48d96f9a81537c
MD5 9d76a615a051e44167765233f0bc32e3
BLAKE2b-256 8d56a22edb010f8b30e7a32c16e18a7eaa3d304dd8c6905393d20a6fdc0b3d33

