nagisa

A Japanese tokenizer based on recurrent neural networks

--------------

|Build Status| |Documentation Status| |PyPI|

| Nagisa is a Python module for Japanese word segmentation/POS-tagging.
| It is designed to be a simple and easy-to-use tool.

This tool has the following features.

- Based on recurrent neural networks.
- The word segmentation model uses character- and word-level features
  `[池田+] <http://www.anlp.jp/proceedings/annual_meeting/2017/pdf_dir/B6-2.pdf>`__.
- The POS-tagging model uses tag dictionary information
  `[Inoue+] <http://www.aclweb.org/anthology/K17-1042>`__.

For more details, refer to the following links.

- The slides (in Japanese) are available
  `here <https://drive.google.com/open?id=1AzR5wh5502u_OI_Jxwsq24t-er_rnJBP>`__.
- The documentation is available
  `here <https://nagisa.readthedocs.io/en/latest/?badge=latest>`__.

Installation
============

| Python 2.7.x or 3.5+ is required.
| This tool uses `DyNet <https://github.com/clab/dynet>`__ (the Dynamic
  Neural Network Toolkit) to perform its neural network computations.
| You can install nagisa with the following command.

.. code:: bash

    pip install nagisa
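
The version requirement stated above can be checked before installing. The following is a minimal sketch; ``is_supported`` is a hypothetical helper written for this example and is not part of nagisa.

.. code:: python

    import sys


    def is_supported(version_info):
        """Return True for Python 2.7.x or 3.5+, the versions listed above.

        Hypothetical helper for illustration; nagisa does not provide it.
        """
        major, minor = version_info[0], version_info[1]
        return (major, minor) == (2, 7) or (major, minor) >= (3, 5)


    # Check the interpreter that is running this script.
    print(is_supported(sys.version_info))

Passing a plain tuple such as ``(3, 4, 0)`` makes it possible to check a version other than the running interpreter.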

Usage
=====

Basic usage.

.. code:: python

    import nagisa

    # Sample of word segmentation and POS-tagging for Japanese
    text = 'Pythonで簡単に使えるツールです'
    words = nagisa.tagging(text)
    print(words)
    #=> Python/名詞 で/助詞 簡単/形状詞 に/助動詞 使える/動詞 ツール/名詞 です/助動詞

    # Get a list of words
    print(words.words)
    #=> ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']

    # Get a list of POS-tags
    print(words.postags)
    #=> ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

    # The nagisa.wakati method is faster than the nagisa.tagging method.
    words = nagisa.wakati(text)
    print(words)
    #=> ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
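
Because ``words.words`` and ``words.postags`` are parallel lists, they can be combined with standard Python tools. The sketch below does not import nagisa: the two lists are copied from the example output above, and the variable names are chosen for this example only.

.. code:: python

    from collections import Counter

    # Copied from the example output above. With nagisa installed, these
    # would come from words.words and words.postags.
    tokens = ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
    postags = ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

    # Pair each word with its POS-tag.
    pairs = list(zip(tokens, postags))
    print(pairs[0])
    #=> ('Python', '名詞')

    # Count how often each POS-tag occurs.
    tag_counts = Counter(postags)
    print(tag_counts['名詞'])
    #=> 2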

Post-processing functions.

.. code:: python

    import nagisa

    # The same sample text as in the basic usage example
    text = 'Pythonで簡単に使えるツールです'

    # Extracting all nouns from a text
    words = nagisa.extract(text, extract_postags=['名詞'])
    print(words)
    #=> Python/名詞 ツール/名詞

    # Filtering specific POS-tags from a text
    words = nagisa.filter(text, filter_postags=['助詞', '助動詞'])
    print(words)
    #=> Python/名詞 簡単/形状詞 使える/動詞 ツール/名詞

    # A list of available POS-tags
    print(nagisa.tagger.postags)
    #=> ['補助記号', '名詞', ... , 'URL']

    # A word can be forced to be recognized as a single word.
    # By default, this text is segmented as follows.
    text = 'ニューラルネットワークを使ってます。'
    print(nagisa.tagging(text))
    #=> ニューラル/名詞 ネットワーク/名詞 を/助詞 使っ/動詞 て/助動詞 ます/助動詞 。/補助記号

    # If a word is included in the single_word_list, it is recognized as a single word.
    tagger_nn = nagisa.Tagger(single_word_list=['ニューラルネットワーク'])
    print(tagger_nn.tagging(text))
    #=> ニューラルネットワーク/名詞 を/助詞 使っ/動詞 て/助動詞 ます/助動詞 。/補助記号

    # Nagisa is good at capturing URLs and kaomoji in an input text.
    url = 'https://github.com/taishi-i/nagisaでコードを公開中(๑¯ω¯๑)'
    words = nagisa.tagging(url)
    print(words)
    #=> https://github.com/taishi-i/nagisa/URL で/助詞 コード/名詞 を/助詞 公開/名詞 中/接尾辞 (๑　̄ω　̄๑)/補助記号
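
The effect of ``extract`` and ``filter`` on the tagged result can be pictured as selecting word/POS-tag pairs. The following is a rough sketch in plain Python and is not nagisa's implementation: ``extract_by_postags`` and ``filter_by_postags`` are hypothetical names, and the input lists are copied from the basic usage output.

.. code:: python

    def extract_by_postags(tokens, postags, wanted):
        """Keep only the (word, tag) pairs whose tag is in `wanted`.

        Hypothetical helper for illustration; not part of nagisa.
        """
        wanted = set(wanted)
        return [(t, p) for t, p in zip(tokens, postags) if p in wanted]


    def filter_by_postags(tokens, postags, unwanted):
        """Drop the (word, tag) pairs whose tag is in `unwanted`.

        Hypothetical helper for illustration; not part of nagisa.
        """
        unwanted = set(unwanted)
        return [(t, p) for t, p in zip(tokens, postags) if p not in unwanted]


    # Copied from the basic usage output (words.words and words.postags).
    tokens = ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
    postags = ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

    nouns = extract_by_postags(tokens, postags, ['名詞'])
    print(nouns)
    #=> [('Python', '名詞'), ('ツール', '名詞')]

    content = filter_by_postags(tokens, postags, ['助詞', '助動詞'])
    print([t for t, _ in content])
    #=> ['Python', '簡単', '使える', 'ツール']

The two helpers are complements of each other: one keeps the listed tags, the other removes them.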

.. |Build Status| image:: https://travis-ci.org/taishi-i/nagisa.svg?branch=master
   :target: https://travis-ci.org/taishi-i/nagisa
.. |Documentation Status| image:: https://readthedocs.org/projects/nagisa/badge/?version=latest
   :target: https://nagisa.readthedocs.io/en/latest/?badge=latest
.. |PyPI| image:: https://img.shields.io/pypi/v/nagisa.svg
   :target: https://pypi.python.org/pypi/nagisa
