
Jadoc: Tokenizes Japanese Documents to Enable CRUD Operations


Installation

Install MeCab

Jadoc depends on MeCab, a Japanese morphological analyzer. If MeCab is not already installed on your system, install it first.
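On common platforms, MeCab is typically available from the system package manager. A minimal sketch, assuming Debian/Ubuntu or Homebrew package names (these may vary by distribution and version):

```shell
# Debian/Ubuntu: MeCab binary, development headers, and the IPA dictionary
sudo apt-get install mecab libmecab-dev mecab-ipadic-utf8

# macOS (Homebrew)
brew install mecab mecab-ipadic
```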

Install Jadoc

$ pip install jadoc

Examples

from jadoc.doc import Doc


doc = Doc("本を書きました。")

# print surface forms of the tokens.
surfaces = [word.surface for word in doc.words]
print("/".join(surfaces))  # 本/を/書き/まし/た/。

# print plain text
print(doc.get_text())  # 本を書きました。

# delete the word at index 3 (まし)
doc.delete(3)  # Surrounding words are re-conjugated as needed.
print(doc.get_text())  # 本を書いた。

# update a word: replace the word at index 2 (書き) with 読む
word = doc.conjugation.tokenize("読む")
# In addition to conjugating the new word, peripheral words are transformed as needed.
doc.update(2, word)
print(doc.get_text())  # 本を読んだ。

