neologdn

Japanese text normalizer for mecab-neologd

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- Japanese
Programming Language
Topic
- Text Processing

Project description

neologdn

neologdn is a Japanese text normalizer for mecab-neologd.

The normalization is based on neologd's rules: https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja

Contributions are welcome!

NOTE: Installing this module requires a C++11 compiler.

Installation

$ pip install neologdn

Usage

import neologdn
neologdn.normalize("ﾊﾝｶｸｶﾅ")
# => 'ハンカクカナ'
neologdn.normalize("全角記号！？＠＃")
# => '全角記号!?@#'
neologdn.normalize("全角記号例外「・」")
# => '全角記号例外「・」'
neologdn.normalize("長音短縮ウェーーーーイ")
# => '長音短縮ウェーイ'
neologdn.normalize("チルダ削除ウェ~∼∾〜〰～イ")
# => 'チルダ削除ウェイ'
neologdn.normalize("いろんなハイフン˗֊‐‑‒–⁃⁻₋−")
# => 'いろんなハイフン-'
neologdn.normalize("　　　ＰＲＭＬ　　副　読　本　　　")
# => 'PRML副読本'
neologdn.normalize(" Natural Language Processing ")
# => 'Natural Language Processing'
neologdn.normalize("かわいいいいいいいいい", repeat=6)
# => 'かわいいいいいい'
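
To make a few of the rules above concrete, here is a rough pure-Python sketch using only the standard library. It is not neologdn's implementation and covers only part of the behavior: it uses Unicode NFKC folding as a stand-in for the half-width/full-width conversions, collapses runs of the long-vowel mark, and caps repeated characters. The name `rough_normalize` and the convention that `repeat=0` disables the cap are assumptions made for this illustration. Tilde removal, hyphen unification, and removal of spaces between Japanese characters are not attempted.

```python
import re
import unicodedata


def rough_normalize(text: str, repeat: int = 0) -> str:
    """Rough stdlib approximation of a few rules; NOT neologdn's implementation."""
    # NFKC folds half-width katakana to full-width and full-width ASCII
    # letters/symbols to their half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Collapse any run of the long-vowel mark into a single mark.
    text = re.sub("ー+", "ー", text)
    # Optionally cap a run of one repeated character at `repeat` occurrences
    # (repeat=0 means "no cap" in this sketch).
    if repeat > 0:
        pattern = r"(.)\1{%d,}" % repeat
        text = re.sub(pattern, lambda m: m.group(1) * repeat, text)
    # Trim leading and trailing whitespace.
    return text.strip()


print(rough_normalize("ﾊﾝｶｸｶﾅ"))
print(rough_normalize("かわ" + "い" * 9, repeat=6))
```

Because NFKC changes more characters than the neologd rules do (and fewer in other cases), outputs can differ from `neologdn.normalize` on real text; the sketch is only meant to show the shape of the rules.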

Benchmark

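# Sample code from
# https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja#python-written-by-hideaki-t--overlast
# Run in IPython/Jupyter (%timeit is an IPython magic). `normalize` is a
# benchmark helper that takes the function under test; it is not defined in
# this excerpt and is presumably defined in the notebook linked below.
import normalize_neologd

%timeit normalize(normalize_neologd.normalize_neologd)
# => 1000 loops, best of 3: 1.21 ms per loop

import neologdn
%timeit normalize(neologdn.normalize)
# => 10000 loops, best of 3: 145 µs per loop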

In this run, neologdn is roughly 8 times faster than the sample code (145 µs vs. 1.21 ms per loop).

Details are in the following notebook: https://github.com/ikegami-yukino/neologdn/blob/master/benchmark/benchmark.ipynb
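
Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The helper `per_call_seconds` and the stand-in function below are hypothetical names introduced for this example; the stand-in uses `unicodedata.normalize` so the snippet runs without neologdn installed, and it would be replaced by `neologdn.normalize` to time the real function. This is not the code from the benchmark notebook.

```python
import timeit
import unicodedata


def stand_in_normalize(text: str) -> str:
    # Hypothetical stand-in so the snippet runs without neologdn installed;
    # pass neologdn.normalize instead to time the real function.
    return unicodedata.normalize("NFKC", text)


def per_call_seconds(func, text: str, number: int = 1000, repeat: int = 3) -> float:
    """Best average time per call over `repeat` runs of `number` calls each."""
    timer = timeit.Timer(lambda: func(text))
    return min(timer.repeat(repeat=repeat, number=number)) / number


sample = "ﾊﾝｶｸｶﾅ　全角記号！？＠＃"
best = per_call_seconds(stand_in_normalize, sample)
print(f"{best * 1e6:.2f} µs per call")
```

Taking the minimum over several runs mirrors the "best of 3" figure that `%timeit` reports; absolute numbers depend on the machine and the input text.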

License

Apache Software License.

CHANGES

0.2 (2016-04-12)

Add a threshold for lengthened expressions (repeated characters)

0.1.2 (2016-03-29)

Fix installation bug

0.1.1.1 (2016-03-19)

Support Windows
Explicitly specify -std=c++11 in the build (many thanks to id774)

0.1.1 (2015-10-10)

Initial release.

Release history

0.5.6 (Dec 2, 2025)
0.5.5 (Dec 1, 2025)
0.5.4 (Mar 15, 2025)
0.5.3 (May 2, 2024)
0.5.2 (Aug 3, 2023)
0.5.1 (May 2, 2021)
0.4 (Dec 6, 2018)
0.3.2 (May 16, 2018)
0.3.1 (May 16, 2018)
0.3 (May 16, 2018)
0.2.3.1 (May 16, 2018)
0.2.2 (Mar 9, 2018)
0.2.1 (Jan 23, 2017)
0.2 (Apr 12, 2016), this version
0.1.2 (Mar 29, 2016)
0.1.1.1 (Mar 19, 2016)
0.1.1 (Oct 9, 2015)
0.1 (Oct 9, 2015)

Download files

Source Distribution

neologdn-0.2.tar.gz (45.2 kB)

Uploaded Apr 12, 2016 Source

File details

Details for the file neologdn-0.2.tar.gz.

File metadata

Download URL: neologdn-0.2.tar.gz
Upload date: Apr 12, 2016
Size: 45.2 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for neologdn-0.2.tar.gz
Algorithm	Hash digest
SHA256	`ff80a6d30f9f81769000e9da950946a76411cd3437e975ff90984dfd88c3bfc9`
MD5	`f8a3ec77608f09422cb352ce0217fdb0`
BLAKE2b-256	`a91485547ee42ab893a65b97e4d14c08df0fa84c0c49fff64b117de389e30882`
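
A downloaded archive can be checked against the SHA256 digest listed above with the standard `hashlib` module. The sketch below is one way to do it; the helper names `sha256_of` and `matches_expected` are introduced for this example, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for neologdn-0.2.tar.gz.
EXPECTED_SHA256 = "ff80a6d30f9f81769000e9da950946a76411cd3437e975ff90984dfd88c3bfc9"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("neologdn-0.2.tar.gz")` would return `True` for an intact copy saved in the current directory (the path is hypothetical).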
