Jadoc: Tokenizes Japanese Documents to Enable CRUD Operations
Installation
Install MeCab
MeCab is required for Jadoc to work. If it is not already installed, install MeCab first.
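On most systems MeCab is available from the package manager; the package names below are assumptions for common platforms, so check your distribution if they differ:

```shell
# Debian/Ubuntu (package names may vary by distribution)
sudo apt install mecab libmecab-dev mecab-ipadic-utf8

# macOS with Homebrew
brew install mecab mecab-ipadic
```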
Install Jadoc
```shell
$ pip install jadoc
```
Examples
```python
from jadoc.doc import Doc

doc = Doc("本を書きました。")

# Print the surface forms of the tokens.
surfaces = [word.surface for word in doc.words]
print("/".join(surfaces))  # 本/を/書き/まし/た/。

# Print the plain text.
print(doc.get_text())  # 本を書きました。

# Delete a word. Word conjugation will be adjusted as needed.
doc.delete(3)
print(doc.get_text())  # 本を書いた。

# Update a word. In addition to conjugation,
# peripheral words are transformed as needed.
word = doc.conjugation.tokenize("読む")
doc.update(2, word)
print(doc.get_text())  # 本を読んだ。
```
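The delete example above operates on token indices: index 3 is まし in the tokenized list. A minimal sketch of that index-based removal, using plain strings in place of jadoc's Word objects (jadoc's actual delete additionally re-conjugates 書き to 書い, which this sketch does not do):

```python
# Surface forms as tokenized above; plain strings stand in for Word objects.
surfaces = ["本", "を", "書き", "まし", "た", "。"]

# Remove the token at index 3 (まし), analogous to doc.delete(3).
del surfaces[3]

# Naive re-join; jadoc would also adjust the conjugation of 書き here.
print("".join(surfaces))  # 本を書きた。
```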
Download files
Source distribution: jadoc-0.2.3.tar.gz (16.4 kB)
Built distribution: jadoc-0.2.3-py3-none-any.whl (18.9 kB)